CN111008660A - Semantic map generation method, device and system, storage medium and electronic equipment - Google Patents

Semantic map generation method, device and system, storage medium and electronic equipment

Info

Publication number
CN111008660A
CN111008660A (application CN201911221893.1A)
Authority
CN
China
Prior art keywords
semantic
stereo
image
generating
dense point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911221893.1A
Other languages
Chinese (zh)
Inventor
Sun Xiaofeng (孙晓峰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee
Beijing Jingdong Qianshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Qianshi Technology Co Ltd filed Critical Beijing Jingdong Qianshi Technology Co Ltd
Priority to CN201911221893.1A priority Critical patent/CN111008660A/en
Publication of CN111008660A publication Critical patent/CN111008660A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/344 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85 Stereo camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the invention relate to a semantic map generation method, device, and system, a storage medium, and an electronic device in the technical field of vehicle navigation. The method comprises the following steps: performing image rectification on a plurality of left and right images included in the original images to obtain a plurality of stereo pairs, and performing semantic segmentation on each stereo pair to obtain the semantic categories corresponding to each stereo pair; performing stereo matching on each stereo pair together with its corresponding semantic categories to obtain a disparity map of each stereo pair, and generating a single-frame semantic dense point cloud of each left and right image from the disparity map of each stereo pair; and generating a global semantic dense point cloud of the original images from the single-frame semantic dense point clouds, and generating the semantic map from the global semantic dense point cloud. Embodiments of the invention improve the efficiency of semantic map generation.

Description

Semantic map generation method, device and system, storage medium and electronic equipment
Technical Field
Embodiments of the invention relate to the technical field of vehicle navigation, and in particular to a semantic map generation method, a semantic map generation device, a semantic map generation system, a computer-readable storage medium, and an electronic device.
Background
As an important component of autonomous or unmanned driving systems, high-precision maps play an important role in assisting perception of the vehicle's surroundings, high-precision positioning, lane-level path planning, and decision making.
At present, the production of high-precision maps mainly comprises five basic links: data acquisition, data preprocessing, automatic recognition, manual checking and correction, and compilation and publication. Specifically, at the current stage, high-precision map production mostly combines three-dimensional laser point cloud data obtained by a lidar with image data obtained by a camera, fusing the multi-source data.
However, the above approach has the following drawback: during automatic recognition, the laser point cloud data and the image data must be processed by different algorithms, and their results must finally be fused, so map generation efficiency is low.
Therefore, it is necessary to provide a new semantic map generation method.
It is to be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the invention, and therefore may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The invention aims to provide a semantic map generation method, a semantic map generation device, a semantic map generation system, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to some extent, the problem of low map generation efficiency caused by the limitations and defects of the related art.
According to an aspect of the present disclosure, there is provided a semantic map generation method, including:
performing image rectification on a plurality of left and right images included in the original images to obtain a plurality of stereo pairs, and performing semantic segmentation on each stereo pair to obtain the semantic categories corresponding to each stereo pair;
performing stereo matching on each stereo pair together with its corresponding semantic categories to obtain a disparity map of each stereo pair, and generating a single-frame semantic dense point cloud of each left and right image from the disparity map of each stereo pair; and
generating a global semantic dense point cloud of the original images from the single-frame semantic dense point clouds, and generating the semantic map from the global semantic dense point cloud.
In an exemplary embodiment of the present disclosure, the semantic map generation method further includes:
acquiring the original images captured by a binocular camera;
wherein performing image rectification on the plurality of left and right images included in the original images to obtain the plurality of stereo pairs includes:
performing distortion correction and/or stereo rectification on the plurality of left and right images included in the original images; and
obtaining the plurality of stereo pairs from the rectified left and right images.
In an exemplary embodiment of the present disclosure, performing distortion correction on the plurality of left and right images included in the original images includes:
performing distortion correction on the plurality of left and right images included in the original images using a parametric distortion model;
wherein the parametric distortion model comprises a plurality of radial distortion parameters and a plurality of tangential distortion parameters.
In an exemplary embodiment of the present disclosure, performing stereo rectification on the plurality of left and right images included in the original images includes:
performing stereo rectification on the plurality of left and right images included in the original images using the first and second intrinsic parameter matrices of the left and right cameras of the binocular camera and the extrinsic calibration matrix between the left and right cameras.
In an exemplary embodiment of the present disclosure, performing stereo matching on each stereo pair and its corresponding semantic categories to obtain the disparity map of each stereo pair includes:
constructing an energy function from the weight coefficient corresponding to each semantic category, the matching cost function of the left and right images in each stereo pair, and the disparity between a pixel in the left image and its neighborhood pixels; and
performing stereo matching on each stereo pair and its corresponding semantic categories according to the energy function to obtain the disparity map of each stereo pair.
In an exemplary embodiment of the present disclosure, generating a single-frame semantic dense point cloud of each left and right image from the disparity map of each stereo pair includes:
converting the disparity map of each stereo pair into a three-dimensional point cloud, and assigning the semantic categories corresponding to each stereo pair to the three-dimensional points corresponding to each category; and
filtering the three-dimensional points using a preset filtering rule, and generating the single-frame semantic dense point cloud of each left and right image from the filtered three-dimensional points.
In an exemplary embodiment of the disclosure, generating the global semantic dense point cloud of the original images from the single-frame semantic dense point clouds includes:
generating a global pose matrix of the vehicle from the vehicle trajectory information acquired by the integrated inertial navigation system, where the integrated inertial navigation coordinate system is a combined inertial sensor/global positioning system coordinate system; and
generating the global semantic dense point cloud of the original images from the extrinsic calibration matrix between the left camera of the binocular camera and the integrated inertial navigation coordinate system, the global pose matrix, and the single-frame semantic dense point clouds.
According to an aspect of the present disclosure, there is provided a semantic map generation apparatus, including:
an image rectification module, configured to perform image rectification on a plurality of left and right images included in the original images to obtain a plurality of stereo pairs, and perform semantic segmentation on each stereo pair to obtain the semantic categories corresponding to each stereo pair;
a stereo matching module, configured to perform stereo matching on each stereo pair and its corresponding semantic categories to obtain a disparity map of each stereo pair, and generate a single-frame semantic dense point cloud of each left and right image from the disparity map of each stereo pair; and
a semantic map generation module, configured to generate a global semantic dense point cloud of the original images from the single-frame semantic dense point clouds, and generate the semantic map from the global semantic dense point cloud.
According to an aspect of the present disclosure, there is provided a semantic map generation system, including:
a binocular stereo camera, configured to acquire the original images;
an integrated navigation system, configured to collect the trajectory information of a vehicle; and
a processing system, connected to the binocular stereo camera and the integrated navigation system via a network, and configured to: perform image rectification on a plurality of left and right images included in the original images to obtain a plurality of stereo pairs, and perform semantic segmentation on each stereo pair to obtain the semantic categories corresponding to each stereo pair; and
perform stereo matching on each stereo pair and its corresponding semantic categories to obtain a disparity map of each stereo pair, and generate a single-frame semantic dense point cloud of each left and right image from the disparity map of each stereo pair; and
generate a global semantic dense point cloud of the original images from the single-frame semantic dense point clouds and the trajectory information of the vehicle, and generate the semantic map from the global semantic dense point cloud.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the semantic map generation method of any one of the above.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute the semantic map generation method of any one of the above via execution of the executable instructions.
On the one hand, a plurality of stereo pairs are obtained by performing image rectification on the plurality of left and right images included in the original images, and semantic segmentation is performed on each stereo pair to obtain its corresponding semantic categories; each stereo pair is then stereo-matched together with its semantic categories to obtain a disparity map of each stereo pair, and single-frame semantic dense point clouds of the left and right images are generated from the disparity maps; finally, a global semantic dense point cloud of the original images is generated from the single-frame semantic dense point clouds, and the semantic map is generated from the global semantic dense point cloud. This avoids the prior-art problem that the laser point cloud data and the image data must be processed by different algorithms during automatic recognition and their results then fused, which makes map generation inefficient, and thereby improves the efficiency of semantic map generation. On the other hand, generating the single-frame semantic dense point clouds of the left and right images from the disparity maps, and then the global semantic dense point cloud from the single-frame point clouds, improves the accuracy of the semantic map. On yet another hand, incorporating the semantic categories obtained by semantic segmentation into the stereo matching improves the accuracy of the disparity maps, and thus further improves the accuracy of the semantic map.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 schematically illustrates a block diagram of a current high-precision map production flow;
FIG. 2 schematically illustrates a flow chart of a method of generating a semantic map according to an exemplary embodiment of the invention;
FIG. 3 schematically illustrates a flowchart of a method for performing stereo matching on stereo pairs and their corresponding semantic categories to obtain disparity maps of the stereo pairs, according to an exemplary embodiment of the present invention;
FIG. 4 schematically illustrates a flowchart of a method of generating single-frame semantic dense point clouds of the left and right images from the disparity maps of the stereo pairs, according to an exemplary embodiment of the present invention;
FIG. 5 schematically illustrates a flowchart of a method for generating a global semantic dense point cloud of the original images from the single-frame semantic dense point clouds, according to an exemplary embodiment of the present invention;
FIG. 6 schematically illustrates a flowchart of another semantic map generation method according to an exemplary embodiment of the present invention;
FIG. 7 schematically illustrates a block diagram of a semantic map generation system in accordance with an exemplary embodiment of the present invention;
FIG. 8 schematically shows a block diagram of a semantic map generation apparatus according to an exemplary embodiment of the present invention;
FIG. 9 schematically illustrates an electronic device for implementing the semantic map generation method according to an exemplary embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the invention.
Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Referring to FIG. 1, the current production of a high-precision map comprises five basic links: data acquisition, data preprocessing, automatic recognition, manual checking and correction, and compilation and publication. Specifically, data are first acquired by the lidar 101, the camera 102, and the GPS integrated navigation system 103, yielding a point cloud image 104, real-time image data 105, and vehicle trajectory information 106. The point cloud automatic recognition module 107 then performs automatic recognition on the point cloud image to generate point cloud data, and the image automatic recognition module 108 performs automatic recognition on the real-time image data to obtain an image recognition result. Further, the fusion module 109 fuses the point cloud data, the image recognition result, and the vehicle trajectory information acquired by the GPS integrated navigation system; the checking and correction module 110 checks and corrects the fusion result; the corrected fusion result is verified on site; and the fusion result that passes verification is compiled and published as a map.
However, the currently mainstream high-precision map production scheme combining a lidar and a camera mainly has the following disadvantages:
First, the data acquisition system is relatively complicated and difficult to maintain: most current map acquisition systems are composed of multiple sensors such as lidars and cameras, and when any sensor fails, the whole system cannot work normally, which poses a considerable challenge for system maintenance.
Second, the acquisition equipment is expensive and difficult to mass-produce: a set of reliable acquisition equipment often costs hundreds of thousands or even millions of yuan, with the lidar sensor and the high-precision GPS integrated inertial navigation system always among the most expensive parts, which is not conducive to large-scale mass production of the acquisition device.
Third, the volume of acquired data is large, placing higher demands on data storage and transmission: because dense three-dimensional point cloud data and image data of the road scene are recorded simultaneously, the acquisition equipment collects a large amount of data every minute.
Fourth, the coordinate systems of the sensors are inconsistent, requiring additional calibration and registration: different sensors have different coordinate references, and a reliable map result can be obtained only by accurately calibrating the relative poses between the sensors; since the calibration accuracy affects the final mapping accuracy, the accuracy of the map is limited.
Fifth, the multi-source data must be processed and fused separately, making the processing flow complex: because the laser point cloud data and the image data must be processed by different algorithms during automatic recognition, and their results must finally be fused, the complex processing flow is not conducive to improving map generation efficiency.
In the present exemplary embodiment, a semantic map generation method is first provided. The method may run on a server, a server cluster, or a cloud server, and may also run on a terminal device; of course, those skilled in the art may run the method of the invention on other platforms as needed, which is not particularly limited in this exemplary embodiment. Referring to FIG. 2, the semantic map generation method may include the following steps:
step S210, image rectification is carried out on a plurality of left and right images included in an original image to obtain a plurality of stereopair, and semantic segmentation is carried out on each stereopair to obtain a semantic category corresponding to each stereopair.
And S220, performing stereo matching on each stereo pair and the semantic categories corresponding to each stereo pair to obtain a disparity map of each stereo pair, and generating single-frame semantization dense point clouds of each left image and each right image according to the disparity map of each stereo pair.
Step S230, generating global semantic dense point clouds of the original images according to the single-frame semantic dense point clouds, and generating the semantic map according to the global semantic dense point clouds.
In this semantic map generation method, on the one hand, a plurality of stereo pairs are obtained by performing image rectification on the plurality of left and right images included in the original images, and semantic segmentation is performed on each stereo pair to obtain its corresponding semantic categories; each stereo pair is then stereo-matched together with its semantic categories to obtain a disparity map, and single-frame semantic dense point clouds of the left and right images are generated from the disparity maps; finally, a global semantic dense point cloud of the original images is generated from the single-frame semantic dense point clouds, and the semantic map is generated from it. This avoids the prior-art problem that the laser point cloud data and the image data must be processed by different algorithms during automatic recognition and their results then fused, which makes map generation inefficient, and thus improves the efficiency of semantic map generation. On the other hand, generating the single-frame semantic dense point clouds from the disparity maps, and the global semantic dense point cloud from them, improves the accuracy of the semantic map. On yet another hand, incorporating the semantic categories obtained by semantic segmentation into the stereo matching improves the accuracy of the disparity maps and thereby further improves the accuracy of the semantic map.
Hereinafter, each step involved in the semantic map generation method of the exemplary embodiment of the invention will be explained in detail with reference to the drawings.
First, the objects of the embodiments of the invention will be explained. Aiming at the many shortcomings of the currently mainstream high-precision map production scheme based on the combination of a lidar and a camera, the starting point of the invention is how to establish a high-precision map production system that is reliable, cost-controllable, simple in process, and efficient in production; to this end, the invention provides a high-precision map production scheme based on binocular stereo vision. In the invention, the expensive lidar sensor is replaced by a binocular stereo camera: on the one hand, this reduces the complexity and cost of the map acquisition device; on the other hand, it simplifies later data processing and avoids the problems of pose calibration between the lidar and the camera and of multi-source data fusion.
Specifically, the invention provides a high-precision map production scheme based on binocular stereo vision for the main links of the high-precision map production process, and introduces in detail a method for generating a semantic dense point cloud of a road scene based on a binocular stereo camera and a GPS integrated navigation system. Based on the generated semantic dense point cloud, map production operators can conveniently perform vectorized extraction of map elements of interest, which improves map generation efficiency.
The method takes the left and right image pairs acquired by a binocular stereo camera and the vehicle trajectory information acquired by a GPS/IMU integrated navigation system as input, and obtains the semantic dense point cloud of the road scene under a global coordinate system through five steps: image rectification, pixel-level semantic segmentation, stereo matching, point cloud generation, and global projection transformation. In the invention, it is assumed that the intrinsic parameters of each sensor and the relative extrinsic parameters between sensors are known: $K_1, K_2$ are the intrinsic parameter matrices of the left and right cameras, respectively; $D_1, D_2$ are the vectors formed by the distortion coefficients of the left and right cameras; $T_{12}$ is the extrinsic calibration matrix between the left and right cameras; and $T_{IC}$ is the extrinsic calibration matrix between the left camera and the GPS/IMU integrated navigation system. The implementation of each step is described in detail below.
In step S210, image rectification is performed on a plurality of left and right images included in the original images to obtain a plurality of stereo pairs, and semantic segmentation is performed on each stereo pair to obtain the semantic categories corresponding to each stereo pair.
In this exemplary embodiment, the original images captured by a binocular camera are first acquired, and image rectification is then performed on the plurality of left and right images included in the original images to obtain a plurality of stereo pairs. This may specifically include: performing distortion correction and/or stereo rectification on the plurality of left and right images included in the original images; and obtaining the plurality of stereo pairs from the rectified left and right images.
The distortion correction may be performed on the plurality of left and right images using a parametric distortion model, where the parametric distortion model comprises a plurality of radial distortion parameters and a plurality of tangential distortion parameters. The stereo rectification may be performed on the plurality of left and right images using the first and second intrinsic parameter matrices of the left and right cameras of the binocular camera and the extrinsic calibration matrix between the left and right cameras.
Specifically, because the original images acquired by the camera inevitably exhibit distortion, the images acquired by the left and right cameras must each be corrected according to the calibrated distortion parameters. The invention adopts the five-parameter distortion model $D = [k_1, k_2, k_3, p_1, p_2]$, where $k_1, k_2, k_3$ are radial distortion parameters and $p_1, p_2$ are tangential distortion parameters. The correction formula is as follows, where $(x_{corr}, y_{corr})$ are the pixel coordinates after distortion correction, $(x, y)$ are the pixel coordinates before correction, and $r^2 = x^2 + y^2$:

$$x_{corr} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2)$$

$$y_{corr} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y$$
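For concreteness, the following Python sketch first applies the forward five-parameter model to a pair of normalized coordinates and then shows the equivalent whole-image correction in OpenCV; the intrinsic matrix K, the coefficient values, and the file name are illustrative placeholders, not values from this patent. Note that OpenCV stores the coefficients in the order [k1, k2, p1, p2, k3] rather than the order used in the text.

```python
import numpy as np
import cv2

def distort(x, y, k1, k2, k3, p1, p2):
    """Five-parameter model on normalized coordinates (x, y):
    radial terms k1, k2, k3 and tangential terms p1, p2."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d

# Whole-image correction with OpenCV; K and D are illustrative placeholders
# standing in for the calibrated intrinsics and distortion coefficients.
K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])
D = np.array([-0.30, 0.12, 1e-4, -2e-4, 0.0])   # [k1, k2, p1, p2, k3]
raw = cv2.imread("left_0001.png")               # placeholder file name
if raw is not None:
    corrected = cv2.undistort(raw, K, D)        # inverts the model above per pixel
```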
in addition, in order to generate a stereo pair without vertical parallax and facilitate subsequent stereo matching, besides distortion correction, parameter matrices K in the left and right two-phase cameras acquired in a calibration stage of the binocular camera are required1,K2And an external parameter calibration matrix between the two cameras
Figure BDA0002301083380000103
And (5) performing stereo correction. For example, the Bouguet epipolar line correction algorithm implemented in the opencv toolkit can be used for stereo correction purposes to generate an undistorted stereopair.
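As an illustration of this step, the sketch below uses the Bouguet rectification implemented by OpenCV's cv2.stereoRectify; all calibration values (the intrinsics, distortion vectors, the inter-camera rotation R and translation t that make up T12, the image size, and the file names) are placeholder assumptions.

```python
import numpy as np
import cv2

# Placeholder calibration: identical pinhole intrinsics, zero distortion,
# and a purely horizontal baseline standing in for the real T12 extrinsics.
K1 = K2 = np.array([[700.0, 0.0, 640.0],
                    [0.0, 700.0, 360.0],
                    [0.0, 0.0, 1.0]])
D1 = D2 = np.zeros(5)
R = np.eye(3)                          # rotation of right camera w.r.t. left
t = np.array([[-0.12], [0.0], [0.0]])  # illustrative 12 cm baseline
size = (1280, 720)

# Bouguet epipolar rectification; Q is reused later to reproject disparities to 3D.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, size, R, t)

map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)

left_raw = cv2.imread("left_0001.png")    # placeholder file names
right_raw = cv2.imread("right_0001.png")
if left_raw is not None and right_raw is not None:
    left_rect = cv2.remap(left_raw, map1x, map1y, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right_raw, map2x, map2y, cv2.INTER_LINEAR)
```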
Further, after the rectified stereo pairs are obtained, semantic segmentation can be performed on each stereo pair to obtain its corresponding semantic categories. For example, according to the semantic requirements of the high-precision map, the street scene image can be divided into 14 semantic categories: road, car, motorcycle, bicycle, person, rider, building, railing, pole, traffic light, traffic sign, vegetation, sky, and background; and a deep convolutional neural network model for semantic image segmentation can be trained using a neural architecture search algorithm. Finally, the distortion-corrected left and right images are input into the network model respectively to obtain the pixel-level semantic category image corresponding to each stereo pair.
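A minimal inference sketch for this segmentation step is given below. The patent trains its own network via neural architecture search over the 14 categories listed above; here a generic pre-trained torchvision model stands in for it, so the model, its weights, and its class set are all placeholder assumptions.

```python
import numpy as np
import torch
import torchvision

model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()

def segment(image_bgr: np.ndarray) -> np.ndarray:
    """Return an (H, W) array of per-pixel semantic class ids for one view."""
    rgb = torch.from_numpy(image_bgr[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255.0
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    with torch.no_grad():
        logits = model(((rgb - mean) / std).unsqueeze(0))["out"]  # (1, C, H, W)
    return logits.argmax(dim=1)[0].numpy()                        # (H, W)

# Run on both rectified views of a stereo pair:
# sem_left = segment(left_rect)
# sem_right = segment(right_rect)
```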
In step S220, stereo matching is performed on each stereo pair together with its corresponding semantic categories to obtain a disparity map of each stereo pair, and a single-frame semantic dense point cloud of each left and right image is generated from the disparity map of each stereo pair.
In this exemplary embodiment, stereo matching is first performed on each stereo pair and its corresponding semantic categories to obtain the disparity map of each stereo pair. Specifically, as shown in FIG. 3, this may include step S310 and step S320, which are described in detail below.
In step S310, an energy function is constructed from the weight coefficient corresponding to each semantic category, the matching cost function of the left and right images in each stereo pair, and the disparity between a pixel in the left image and its neighborhood pixels.
In step S320, stereo matching is performed on each stereo pair and its corresponding semantic categories according to the energy function, so as to obtain the disparity map of each stereo pair.
Steps S310 and S320 are explained below. Specifically, this step performs pixel-by-pixel matching on the stereo-rectified left and right images: for each pixel in the left image, a corresponding matching pixel is searched for in the right image, and the difference of the pixel coordinates of the matched pair in the X direction is recorded to generate a disparity map. To improve matching accuracy and robustness, and unlike existing stereo matching algorithms, the semantic categories obtained in the previous step are incorporated into the stereo matching process. The constructed energy function is shown in formula (1), where $d$ is the disparity to be optimized, $E_{data}(d)$ is the data term, $E_{smooth}(d)$ is the smoothing term, $C(d_p)$ is the matching cost function, $S(d_q, d_p)$ is a penalty term, $W_1(p)$ and $W_2(p, q)$ are weight coefficients obtained from the semantic category map, and $O(p)$ is the semantic category corresponding to pixel $p$:

$$E(d) = E_{data}(d) + E_{smooth}(d) = \sum_{p} W_1(p)\,C(d_p) + \sum_{p} \sum_{q \in N(p)} W_2(p, q)\,S(d_q, d_p) \qquad (1)$$

The matching cost compares the gray values of corresponding pixels, where pixel $q$ is a neighborhood pixel of pixel $p$, i.e. $q \in N(p)$, $I_L(q)$ is the gray value of pixel $q$ in the left image, and $I_R(q, d)$ is the gray value of the corresponding pixel in the right image when the disparity is $d$:

$$C(d_q) = \left| I_L(q) - I_R(q, d_q) \right| \qquad (2)$$

$$S(d_q, d_p) = \min(K, |d_q - d_p|), \quad K > 0 \qquad (3)$$

Here pixels $q$ and $p$ are adjacent pixels with disparities $d_q$ and $d_p$, respectively. In general the two can be assumed to have consistent disparity; if their disparities differ, a penalty is applied. Formula (3) truncates the absolute difference of the two disparities, and $K$ is an empirical value greater than 0.

(Formulas (4) and (5) define the semantic weight coefficients $W_1(p)$ and $W_2(p, q)$ in terms of the semantic categories $O(p)$ and $O(q)$; they appear only as images in the original text.)

After the energy function is obtained, stereo matching can be performed on each stereo pair and its corresponding semantic categories by minimizing the energy function $E(d)$, so as to obtain the disparity map of each stereo pair.
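To make the role of the semantic weights concrete, the following NumPy sketch evaluates the energy of formula (1) for a candidate integer disparity map. Because formulas (4) and (5) survive only as images, the per-class data weight table w1_table (standing in for W1(p)) and the constant smoothness weight lam (standing in for W2(p, q)) are illustrative assumptions; a real implementation would minimize this energy, for example with a semi-global or graph-cut solver, rather than merely evaluate it.

```python
import numpy as np

def energy(disp, left, right, sem, w1_table, K=3.0, lam=1.0):
    """Evaluate E(d) for an integer disparity map `disp` over grayscale
    images `left`/`right` (H, W) and a semantic class image `sem`."""
    h, w = disp.shape
    rows = np.arange(h)[:, None]
    src = np.clip(np.arange(w)[None, :] - disp, 0, w - 1)
    # Data term (formulas 1-2): per-pixel absolute gray difference C(d_p),
    # weighted by the class-dependent coefficient W1(p) = w1_table[sem].
    C = np.abs(left.astype(np.float64) - right[rows, src])
    e_data = float(np.sum(w1_table[sem] * C))
    # Smoothness term (formulas 1 and 3): truncated disparity jumps between
    # horizontal and vertical neighbors, S(d_q, d_p) = min(K, |d_q - d_p|).
    s_h = np.minimum(K, np.abs(np.diff(disp, axis=1)))
    s_v = np.minimum(K, np.abs(np.diff(disp, axis=0)))
    return e_data + lam * float(s_h.sum() + s_v.sum())

# Illustrative call with random data and a 14-entry class weight table.
h, w = 64, 96
disp = np.random.randint(0, 16, (h, w))
left = np.random.randint(0, 256, (h, w)).astype(np.uint8)
right = np.random.randint(0, 256, (h, w)).astype(np.uint8)
sem = np.random.randint(0, 14, (h, w))
print(energy(disp, left, right, sem, w1_table=np.ones(14)))
```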
Next, a single-frame semantic dense point cloud of each left and right image is generated from the disparity map of each stereo pair. Specifically, referring to FIG. 4, this may include step S410 and step S420, which are described in detail below.
In step S410, the disparity map of each stereo pair is converted into a three-dimensional point cloud, and the semantic categories corresponding to each stereo pair are assigned to the three-dimensional points corresponding to each category.
In step S420, the three-dimensional points are filtered using a preset filtering rule, and the single-frame semantic dense point cloud of each left and right image is generated from the filtered three-dimensional points.
Steps S410 and S420 are explained below. Specifically, the disparity map obtained by the matching in the previous step is first converted into a three-dimensional point cloud using a point cloud generation algorithm implemented in the OpenCV vision library; then, the semantic category of each pixel is assigned to the corresponding three-dimensional point; finally, the unreliable or dynamic-target point clouds belonging to the six categories of sky, person, rider, car, motorcycle, and bicycle are filtered out, yielding the single-frame semantic dense point cloud.
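The point cloud generation and filtering just described can be sketched as follows; the disparity map, the reprojection matrix Q (normally the output of cv2.stereoRectify), the semantic class image, and the numeric ids chosen for the six filtered categories are all illustrative assumptions.

```python
import numpy as np
import cv2

h, w = 720, 1280
disparity = np.full((h, w), 32.0, dtype=np.float32)   # placeholder disparity map
sem_left = np.zeros((h, w), dtype=np.int32)           # placeholder class image
Q = np.float32([[1, 0, 0, -w / 2],                    # placeholder reprojection
                [0, 1, 0, -h / 2],                    # matrix; normally the Q
                [0, 0, 0, 700.0],                     # returned by stereoRectify
                [0, 0, 1 / 0.12, 0]])

# Disparity -> metric 3D points, via the OpenCV vision library.
points = cv2.reprojectImageTo3D(disparity, Q)         # (H, W, 3)

# Attach the semantic class of each pixel to its 3D point: rows are [x, y, z, class].
cloud = np.dstack([points, sem_left.astype(np.float32)]).reshape(-1, 4)

# Filter rule from the text: drop invalid points and the six unreliable /
# dynamic categories. The numeric ids below are hypothetical.
DROP = np.array([1, 2, 3, 4, 5, 12])   # car, motorcycle, bicycle, person, rider, sky
keep = np.isfinite(cloud[:, :3]).all(axis=1) & (disparity.reshape(-1) > 0)
keep &= ~np.isin(cloud[:, 3], DROP)
single_frame_cloud = cloud[keep]
```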
In step S230, a global semantic dense point cloud of the original images is generated from the single-frame semantic dense point clouds, and the semantic map is generated from the global semantic dense point cloud.
In the present exemplary embodiment, the global semantic dense point cloud of the original images is first generated from the single-frame semantic dense point clouds. Specifically, referring to FIG. 5, this may include step S510 and step S520, which are described in detail below.
In step S510, a global pose matrix of the vehicle is generated from the vehicle trajectory information acquired by the integrated inertial navigation system, where the integrated inertial navigation coordinate system is a combined inertial sensor/global positioning system (IMU/GPS) coordinate system.
In step S520, the global semantic dense point cloud of the original images is generated from the extrinsic calibration matrix between the left camera of the binocular camera and the integrated inertial navigation coordinate system, the global pose matrix, and the single-frame semantic dense point clouds.
Steps S510 and S520 are explained below. Specifically, in order to obtain a point cloud map with a globally unified coordinate reference, this step transforms the single-frame semantic dense point cloud obtained in the previous step into the GPS/IMU integrated inertial navigation coordinate system (the combined inertial sensor/global positioning system coordinate system) using the following formula:

$$P_W^{(t)} = G^{(t)}\, T_{IC}\, P_C^{(t)}$$

where $P_C^{(t)}$ is the single-frame semantic dense point cloud generated from the stereo pair acquired at time $t$, with the left camera as the coordinate reference; $T_{IC}$ is the extrinsic calibration matrix between the left camera and the integrated inertial navigation system; $G^{(t)}$ is the global pose matrix of the vehicle at time $t$ measured by the integrated navigation system; and $P_W^{(t)}$ is the transformed single-frame dense point cloud under the global coordinate reference.
and then, single-frame dense point clouds which are acquired at different moments and have global unified coordinate reference are accumulated in sequence, and finally, large-range semantization dense point clouds covering the whole road scene can be generated, so that the method is used for efficiently generating a high-precision map. And finally, generating a semantic map according to the global semantization dense point cloud.
Hereinafter, the semantic map generation method according to the exemplary embodiment of the invention is further explained with reference to FIG. 6. Referring to FIG. 6, the method may include the following steps:
Step S601, acquiring the original images captured by a binocular camera;
Step S602, rectifying the left and right images in the original images to obtain stereo pairs;
Step S603, performing pixel-level semantic segmentation on the stereo pairs to obtain semantic categories;
Step S604, performing stereo matching on the stereo pairs and the semantic categories to obtain disparity maps (depth maps);
Step S605, generating single-frame semantic dense point clouds from the semantic categories and the disparity maps;
Step S606, collecting vehicle trajectory information;
Step S607, performing global projection transformation on the single-frame semantic dense point clouds according to the vehicle trajectory information to generate a global semantic dense point cloud;
Step S608, generating the semantic map from the global semantic dense point cloud.
In the semantic map generation method provided by this exemplary embodiment of the invention, first, data acquisition can be performed with only a binocular camera and an integrated navigation system, without a lidar; this avoids the prior-art problem that acquisition relies on multiple sensors, so that the failure of any one sensor prevents the whole system from working normally and makes the system difficult to maintain.
Second, it avoids the prior-art problem that a set of reliable acquisition equipment often costs hundreds of thousands or even millions of yuan, with the lidar sensor always among the most expensive parts, which is not conducive to mass production of the acquisition device.
Third, it avoids the prior-art problem that, because dense three-dimensional point cloud data and image data of the road scene must be recorded simultaneously, the acquisition equipment collects a large amount of data every minute, placing high demands on data storage and transmission.
Fourth, coordinate conversion is needed only when the single-frame semantic dense point clouds are globally transformed into the global semantic dense point cloud; this avoids the prior-art problem that, because different sensors have different coordinate references, the relative poses between sensors must be accurately calibrated to obtain a reliable map result, with the calibration accuracy limiting the final mapping accuracy and thus the map accuracy; the accuracy of the semantic map is thereby improved.
Fifth, it avoids the prior-art problem that the laser point cloud data and the image data must be processed by different algorithms during automatic recognition and their results finally fused, a complex processing flow that hinders map generation efficiency; the efficiency of semantic map generation is thereby improved.
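Tying the above together, the flow of steps S601 to S608 can be sketched end to end as follows; every callable passed in is a hypothetical stand-in for the corresponding step described above, and vectorize_map stands in for the final map compilation that the text leaves to map production operators.

```python
import numpy as np

def build_semantic_map(frames, trajectory, calib,
                       rectify_pair, segment, match, to_cloud, to_global,
                       vectorize_map):
    """Sketch of steps S601-S608; all callables are injected stand-ins."""
    clouds = []
    for (left_raw, right_raw), pose in zip(frames, trajectory):    # S601, S606
        left, right = rectify_pair(left_raw, right_raw, calib)     # S602
        sem_left = segment(left)                                   # S603
        disparity = match(left, right, sem_left)                   # S604: minimize E(d)
        cloud = to_cloud(disparity, sem_left, calib["Q"])          # S605
        clouds.append(to_global(cloud, pose, calib["T_IC"]))       # S607
    return vectorize_map(np.vstack(clouds))                        # S608
```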
An embodiment of the invention also provides a semantic map generation system. Referring to FIG. 7, the system may include a binocular stereo camera 701, an integrated navigation system 702, and a processing system 703, wherein:
the binocular stereo camera 701 may be used to acquire the original images;
the integrated navigation system 702 may be used to collect the trajectory information of the vehicle;
the processing system 703 (for example, an image processing system) is connected to the binocular stereo camera and the integrated navigation system via a network and is configured to: perform image rectification on the plurality of left and right images included in the original images to obtain a plurality of stereo pairs, and perform semantic segmentation on each stereo pair to obtain its corresponding semantic categories; perform stereo matching on each stereo pair and its semantic categories to obtain a disparity map of each stereo pair, and generate a single-frame semantic dense point cloud of each left and right image from the disparity map of each stereo pair; and generate a global semantic dense point cloud of the original images from the single-frame semantic dense point clouds and the trajectory information of the vehicle, and generate the semantic map from the global semantic dense point cloud.
An embodiment of the invention also provides a semantic map generation apparatus. Referring to FIG. 8, the apparatus may include an image rectification module 810, a stereo matching module 820, and a semantic map generation module 830, wherein:
the image rectification module 810 may be configured to perform image rectification on the plurality of left and right images included in the original images to obtain a plurality of stereo pairs, and perform semantic segmentation on each stereo pair to obtain its corresponding semantic categories;
the stereo matching module 820 may be configured to perform stereo matching on each stereo pair and its corresponding semantic categories to obtain a disparity map of each stereo pair, and generate a single-frame semantic dense point cloud of each left and right image from the disparity map of each stereo pair;
the semantic map generation module 830 may be configured to generate a global semantic dense point cloud of the original images from the single-frame semantic dense point clouds, and generate the semantic map from the global semantic dense point cloud.
In an exemplary embodiment of the present disclosure, the semantic map generation apparatus further includes:
an original image acquisition module, which may be configured to acquire the original images captured by a binocular camera;
wherein performing image rectification on the plurality of left and right images included in the original images to obtain the plurality of stereo pairs includes:
performing distortion correction and/or stereo rectification on the plurality of left and right images included in the original images; and obtaining the plurality of stereo pairs from the rectified left and right images.
In an exemplary embodiment of the present disclosure, performing distortion correction on the plurality of left and right images included in the original images includes:
performing distortion correction on the plurality of left and right images included in the original images using a parametric distortion model, wherein the parametric distortion model comprises a plurality of radial distortion parameters and a plurality of tangential distortion parameters.
In an exemplary embodiment of the present disclosure, performing stereo rectification on the plurality of left and right images included in the original images includes:
performing stereo rectification on the plurality of left and right images included in the original images using the first and second intrinsic parameter matrices of the left and right cameras of the binocular camera and the extrinsic calibration matrix between the left and right cameras.
In an exemplary embodiment of the present disclosure, performing stereo matching on each stereo pair and its corresponding semantic categories to obtain the disparity map of each stereo pair includes:
constructing an energy function from the weight coefficient corresponding to each semantic category, the matching cost function of the left and right images in each stereo pair, and the disparity between a pixel in the left image and its neighborhood pixels; and performing stereo matching on each stereo pair and its corresponding semantic categories according to the energy function to obtain the disparity map of each stereo pair.
In an exemplary embodiment of the present disclosure, generating a single-frame semantic dense point cloud of each left and right image from the disparity map of each stereo pair includes:
converting the disparity map of each stereo pair into a three-dimensional point cloud, and assigning the semantic categories corresponding to each stereo pair to the three-dimensional points corresponding to each category; and filtering the three-dimensional points using a preset filtering rule, and generating the single-frame semantic dense point cloud of each left and right image from the filtered three-dimensional points.
In an exemplary embodiment of the disclosure, generating the global semantic dense point cloud of the original images from the single-frame semantic dense point clouds includes:
generating a global pose matrix of the vehicle from the vehicle trajectory information acquired by the integrated inertial navigation system, where the integrated inertial navigation coordinate system is a combined inertial sensor/global positioning system coordinate system; and generating the global semantic dense point cloud of the original images from the extrinsic calibration matrix between the left camera of the binocular camera and the integrated inertial navigation coordinate system, the global pose matrix, and the single-frame semantic dense point clouds.
The specific details of each module in the semantic map generation apparatus have been described in detail in the corresponding semantic map generation method, and are therefore not repeated here.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among and embodied by a plurality of modules or units.
Moreover, although the steps of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
In an exemplary embodiment of the present invention, there is also provided an electronic device capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 900 according to this embodiment of the invention is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.
As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. Components of electronic device 900 may include, but are not limited to: the at least one processing unit 910, the at least one memory unit 920, and a bus 930 that couples various system components including the memory unit 920 and the processing unit 910.
Wherein the storage unit stores program code that is executable by the processing unit 910 to cause the processing unit 910 to perform steps according to various exemplary embodiments of the present invention described in the above section "exemplary methods" of the present specification. For example, the processing unit 910 may perform step S210 as shown in fig. 2: carrying out image correction on a plurality of left and right images included in an original image to obtain a plurality of stereopair, and carrying out semantic segmentation on each stereopair to obtain a semantic category corresponding to each stereopair; step S220: stereo matching is carried out on each stereo pair and the semantic categories corresponding to each stereo pair to obtain a disparity map of each stereo pair, and single-frame semantization dense point clouds of each left image and each right image are generated according to the disparity map of each stereo pair; step S230: and generating global semantic dense point clouds of the original images according to the single-frame semantic dense point clouds, and generating the semantic map according to the global semantic dense point clouds.
The storage unit 920 may include a readable medium in the form of a volatile storage unit, such as a random access memory unit (RAM)9201 and/or a cache memory unit 9202, and may further include a read only memory unit (ROM) 9203.
Storage unit 920 may also include a program/utility 9204 having a set (at least one) of program modules 9205, such program modules 9205 including but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 930 may be any of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 1000 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication may occur via the input/output (I/O) interface 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 960. As shown, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions for causing a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the method according to the embodiments of the present invention.
In an exemplary embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in this specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to carry out the steps according to the various exemplary embodiments of the invention described in the "exemplary methods" section of this specification.
The program product for implementing the above method may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims (11)

1. A semantic map generation method is characterized by comprising the following steps:
performing image rectification on a plurality of left and right images included in an original image to obtain a plurality of stereo pairs, and performing semantic segmentation on each stereo pair to obtain a semantic category corresponding to each stereo pair;
performing stereo matching on each stereo pair together with the semantic category corresponding to that stereo pair to obtain a disparity map of each stereo pair, and generating a single-frame semantic dense point cloud for each of the left and right images according to the disparity map of each stereo pair; and
generating a global semantic dense point cloud of the original image according to each single-frame semantic dense point cloud, and generating the semantic map according to the global semantic dense point cloud.
2. The semantic map generation method according to claim 1, further comprising:
acquiring the original image captured by a binocular camera;
wherein performing image rectification on the plurality of left and right images included in the original image to obtain the plurality of stereo pairs comprises:
performing distortion correction and/or stereo rectification on the plurality of left and right images included in the original image; and
obtaining the plurality of stereo pairs according to the rectified left and right images.
3. The semantic map generation method according to claim 2, wherein performing distortion correction on the plurality of left and right images included in the original image comprises:
performing distortion correction on the plurality of left and right images included in the original image by using a parametric distortion model;
wherein the parametric distortion model comprises a plurality of radial distortion parameters and a plurality of tangential distortion parameters.
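By way of editorial illustration (the claim does not name a specific model): one common parametric distortion model with a plurality of radial and tangential parameters is the Brown-Conrady model as exposed by OpenCV; all coefficient values and the image path below are placeholder assumptions.

    import cv2
    import numpy as np

    # Brown-Conrady model in normalized coordinates, with radial
    # parameters (k1, k2, k3) and tangential parameters (p1, p2):
    #   x_d = x(1 + k1 r^2 + k2 r^4 + k3 r^6) + 2 p1 x y + p2 (r^2 + 2 x^2)
    #   y_d = y(1 + k1 r^2 + k2 r^4 + k3 r^6) + p1 (r^2 + 2 y^2) + 2 p2 x y
    K = np.array([[700.0, 0.0, 640.0],
                  [0.0, 700.0, 360.0],
                  [0.0, 0.0, 1.0]])                   # placeholder intrinsics
    dist = np.array([-0.28, 0.07, 1e-4, -2e-4, 0.0])  # k1, k2, p1, p2, k3
    image = cv2.imread("left_000.png")                # hypothetical frame
    undistorted = cv2.undistort(image, K, dist)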
4. The semantic map generation method according to claim 2, wherein performing stereo rectification on the plurality of left and right images included in the original image comprises:
performing stereo rectification on the plurality of left and right images included in the original image by using a first parameter matrix of a left camera and a second parameter matrix of a right camera of the binocular camera, together with an extrinsic calibration matrix between the left camera and the right camera.
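Again as an editorial sketch: in OpenCV terms, the first and second parameter matrices would correspond to the left and right intrinsic matrices, and the extrinsic calibration matrix to the rigid transform between the two cameras. The numeric values below are placeholder assumptions.

    import cv2
    import numpy as np

    K1 = K2 = np.array([[700.0, 0.0, 640.0],
                        [0.0, 700.0, 360.0],
                        [0.0, 0.0, 1.0]])
    D1 = D2 = np.zeros(5)            # distortion assumed already corrected
    T_lr = np.eye(4)                 # extrinsic calibration, left -> right
    T_lr[0, 3] = -0.12               # e.g. a 12 cm baseline along x
    R, t = T_lr[:3, :3], T_lr[:3, 3]
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(
        K1, D1, K2, D2, (1280, 720), R, t)
    # R1/R2 rotate each camera so epipolar lines become horizontal scanlines;
    # P1/P2 are the rectified projections and Q the disparity-to-depth matrix.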
5. The semantic map generation method according to claim 1, wherein performing stereo matching on each stereo pair and the semantic category corresponding to each stereo pair to obtain the disparity map of each stereo pair comprises:
constructing an energy function according to a weight coefficient corresponding to each semantic category, a matching cost function of the left and right images in each stereo pair, and the disparity between a given pixel in the left image and the neighborhood pixels corresponding to that pixel; and
performing stereo matching on each stereo pair and the semantic category corresponding to each stereo pair according to the energy function to obtain the disparity map of each stereo pair.
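The claim leaves the exact form of the energy function open. One sketch consistent with its three ingredients, written here in the style of semi-global matching purely as an assumed example, is

    E(D) = \sum_{p} w_{s(p)} \, C(p, d_p)
         + \sum_{p} \sum_{q \in N(p)} P_1 \, \mathbf{1}\big[\, |d_p - d_q| = 1 \,\big]
         + \sum_{p} \sum_{q \in N(p)} P_2 \, \mathbf{1}\big[\, |d_p - d_q| > 1 \,\big]

where w_{s(p)} is the weight coefficient of the semantic category s(p) of pixel p, C(p, d_p) is the left/right matching cost of assigning disparity d_p to p, and the P_1 and P_2 terms penalize small and large disparity differences between p and its neighborhood pixels q in N(p). The disparity map is then the labeling D that minimizes E(D); the multiplicative entry of the semantic weight is this sketch's assumption, not a form fixed by the claim.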
6. The semantic map generation method according to claim 1, wherein generating the single-frame semantic dense point cloud of each of the left and right images from the disparity map of each stereo pair comprises:
converting the disparity map of each stereo pair into a three-dimensional point cloud, and assigning the semantic categories corresponding to each stereo pair to the three-dimensional points corresponding to each semantic category; and
filtering the three-dimensional points by using a preset filtering rule, and generating the single-frame semantic dense point clouds of the left and right images according to the filtered three-dimensional points.
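The embodiment does not specify the preset filtering rule. The following sketch assumes one plausible rule, a depth-range gate followed by voxel thinning; the thresholds and voxel size are illustrative only.

    import numpy as np

    def filter_cloud(points, labels, z_min=0.5, z_max=60.0, voxel=0.1):
        """Drop 3-D points outside a plausible depth range, then keep
        one point per voxel cell so the cloud stays dense but bounded.
        labels carries the per-point semantic category alongside."""
        keep = (points[:, 2] > z_min) & (points[:, 2] < z_max)
        points, labels = points[keep], labels[keep]
        cells = np.floor(points / voxel).astype(np.int64)
        _, idx = np.unique(cells, axis=0, return_index=True)
        return points[idx], labels[idx]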
7. The semantic map generation method according to claim 1, wherein generating the global semantic dense point cloud of the original image from each single-frame semantic dense point cloud comprises:
generating a global pose matrix of a vehicle according to trajectory information of the vehicle acquired by an integrated inertial navigation system, wherein the integrated inertial navigation system is an inertial sensor/global positioning system combination; and
generating the global semantic dense point cloud of the original image according to an extrinsic calibration matrix between a left camera of the binocular camera and the integrated inertial navigation coordinate frame, the global pose matrix, and each single-frame semantic dense point cloud.
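As a final illustrative sketch, the fusion in this claim amounts to composing two rigid transforms. Representing both as 4x4 homogeneous matrices is this sketch's assumption, not something mandated by the claim wording.

    import numpy as np

    def to_global(points_cam, T_world_ins, T_ins_cam):
        """Map a single-frame cloud from the left-camera frame into the
        global frame: T_ins_cam is the camera-to-inertial-navigation
        extrinsic calibration, and T_world_ins is the global pose matrix
        recovered from the vehicle trajectory."""
        homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])
        return (homo @ (T_world_ins @ T_ins_cam).T)[:, :3]

Semantic labels ride along unchanged, so the global semantic dense point cloud is simply the concatenation of all transformed single-frame clouds together with their labels.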
8. A semantic map generation apparatus, comprising:
an image rectification module configured to perform image rectification on a plurality of left and right images included in an original image to obtain a plurality of stereo pairs, and to perform semantic segmentation on each stereo pair to obtain a semantic category corresponding to each stereo pair;
a stereo matching module configured to perform stereo matching on each stereo pair and the semantic category corresponding to each stereo pair to obtain a disparity map of each stereo pair, and to generate a single-frame semantic dense point cloud for each of the left and right images according to the disparity map of each stereo pair; and
a semantic map generation module configured to generate a global semantic dense point cloud of the original image according to each single-frame semantic dense point cloud, and to generate the semantic map according to the global semantic dense point cloud.
9. A semantic map generation system, comprising:
a binocular stereo camera configured to acquire an original image;
an integrated inertial navigation system configured to acquire trajectory information of a vehicle; and
a processing system connected with the binocular stereo camera and the integrated inertial navigation system through a network and configured to: perform image rectification on a plurality of left and right images included in the original image to obtain a plurality of stereo pairs, and perform semantic segmentation on each stereo pair to obtain a semantic category corresponding to each stereo pair;
perform stereo matching on each stereo pair and the semantic category corresponding to each stereo pair to obtain a disparity map of each stereo pair, and generate a single-frame semantic dense point cloud for each of the left and right images according to the disparity map of each stereo pair; and
generate a global semantic dense point cloud of the original image according to each single-frame semantic dense point cloud and the trajectory information of the vehicle, and generate the semantic map according to the global semantic dense point cloud.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the semantic map generation method according to any one of claims 1 to 7.
11. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the semantic map generation method of any one of claims 1 to 7 via execution of the executable instructions.
CN201911221893.1A 2019-12-03 2019-12-03 Semantic map generation method, device and system, storage medium and electronic equipment Pending CN111008660A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911221893.1A CN111008660A (en) 2019-12-03 2019-12-03 Semantic map generation method, device and system, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111008660A 2020-04-14

Family

ID=70114387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911221893.1A Pending CN111008660A (en) 2019-12-03 2019-12-03 Semantic map generation method, device and system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111008660A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016180325A1 (en) * 2015-05-12 2016-11-17 努比亚技术有限公司 Image processing method and device
WO2019153245A1 (en) * 2018-02-09 2019-08-15 Baidu.Com Times Technology (Beijing) Co., Ltd. Systems and methods for deep localization and segmentation with 3d semantic map
CN108629812A (en) * 2018-04-11 2018-10-09 深圳市逗映科技有限公司 A kind of distance measuring method based on binocular camera
CN109117718A (en) * 2018-07-02 2019-01-01 东南大学 A kind of semantic map structuring of three-dimensional towards road scene and storage method
CN109461211A (en) * 2018-11-12 2019-03-12 南京人工智能高等研究院有限公司 Semantic vector map constructing method, device and the electronic equipment of view-based access control model point cloud
CN110057373A (en) * 2019-04-22 2019-07-26 上海蔚来汽车有限公司 For generating the method, apparatus and computer storage medium of fine semanteme map

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LI LINHUI: "Dense 3D Semantic SLAM of traffic environment based on stereo vision", 2018 IEEE Intelligent Vehicles Symposium *
QI XUXIANG: "Deep Learning Based Semantic Labelling of 3D Point Cloud in visual SLAM", IOP Conference Series: Materials Science and Engineering *
WU HAO: "Object instance recognition and semantic map construction based on visual SLAM", Journal of Huazhong University of Science and Technology (Natural Science Edition) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111815687A (en) * 2020-06-19 2020-10-23 浙江大华技术股份有限公司 Point cloud matching method, positioning method, device and storage medium
CN113763438A (en) * 2020-06-28 2021-12-07 北京京东叁佰陆拾度电子商务有限公司 Point cloud registration method, device, equipment and storage medium
CN113763438B (en) * 2020-06-28 2024-04-19 北京京东叁佰陆拾度电子商务有限公司 Point cloud registration method, device, equipment and storage medium
CN112767421A (en) * 2021-01-15 2021-05-07 重庆大学 Stereo image dense matching method and system combining semantic information
CN112767421B (en) * 2021-01-15 2023-09-15 重庆大学 Stereoscopic image dense matching method and system combining semantic information
WO2022188154A1 (en) * 2021-03-12 2022-09-15 深圳市大疆创新科技有限公司 Front view to top view semantic segmentation projection calibration parameter determination method and adaptive conversion method, image processing device, mobile platform, and storage medium
CN113191323A (en) * 2021-05-24 2021-07-30 上海商汤临港智能科技有限公司 Semantic element processing method and device, electronic equipment and storage medium
CN113733354A (en) * 2021-08-09 2021-12-03 中科云谷科技有限公司 Control method, processor and control device for mixer truck
CN114440856A (en) * 2022-01-21 2022-05-06 北京地平线信息技术有限公司 Method and device for constructing semantic map
CN117911465A (en) * 2024-03-20 2024-04-19 深圳市森歌数据技术有限公司 Natural protection area dense point cloud generation method based on binocular stereo matching
CN117911465B (en) * 2024-03-20 2024-05-17 深圳市森歌数据技术有限公司 Natural protection area dense point cloud generation method based on binocular stereo matching

Similar Documents

Publication Publication Date Title
CN111008660A (en) Semantic map generation method, device and system, storage medium and electronic equipment
CN111144388B (en) Monocular image-based road sign line updating method
JP6812404B2 (en) Methods, devices, computer-readable storage media, and computer programs for fusing point cloud data
JP6745328B2 (en) Method and apparatus for recovering point cloud data
KR102145109B1 (en) Methods and apparatuses for map generation and moving entity localization
CN111886609B (en) System and method for reducing data storage in machine learning
CN109596121B (en) Automatic target detection and space positioning method for mobile station
CN110009675B (en) Method, apparatus, medium, and device for generating disparity map
CN111060924A (en) SLAM and target tracking method
CN114459471B (en) Positioning information determining method and device, electronic equipment and storage medium
WO2021017211A1 (en) Vehicle positioning method and device employing visual sensing, and vehicle-mounted terminal
GB2572025A (en) Urban environment labelling
CN117036300A (en) Road surface crack identification method based on point cloud-RGB heterogeneous image multistage registration mapping
CN114550117A (en) Image detection method and device
Zhang et al. Opensight: A simple open-vocabulary framework for lidar-based object detection
CN114863096B (en) Semantic map construction and positioning method and device for indoor parking lot
CN115761164A (en) Method and device for generating inverse perspective IPM image
CN116630528A (en) Static scene reconstruction method based on neural network
CN116543143A (en) Training method of target detection model, target detection method and device
CN115937449A (en) High-precision map generation method and device, electronic equipment and storage medium
CN116071721A (en) Transformer-based high-precision map real-time prediction method and system
CN113591640B (en) Road guardrail detection method and device and vehicle
CN113628265B (en) Vehicle Zhou Shidian cloud generation method, depth estimation model training method and device
CN117011481A (en) Method and device for constructing three-dimensional map, electronic equipment and storage medium
CN114111817A (en) Vehicle positioning method and system based on SLAM map and high-precision map matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination