CN113505725A - Multi-sensor semantic fusion method for two-dimensional voxel level in unmanned driving - Google Patents


Info

Publication number
CN113505725A
Authority
CN
China
Prior art keywords
dimensional
semantic
grid
map
category
Prior art date
Legal status
Withdrawn
Application number
CN202110839333.3A
Other languages
Chinese (zh)
Inventor
张雨
陈东
Current Assignee
Beijing Qingzhou Zhihang Technology Co ltd
Original Assignee
Beijing Qingzhou Zhihang Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qingzhou Zhihang Technology Co ltd
Priority to CN202110839333.3A
Publication of CN113505725A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention relates to a two-dimensional voxel-level multi-sensor semantic fusion method in unmanned driving, which comprises the following steps: creating a first two-dimensional voxel map at the initial moment of unmanned driving; carrying out grid semantic initialization processing; at any moment during unmanned driving, acquiring a previous position, a previous two-dimensional voxel map, a first acceleration vector, a first angular velocity vector and first data frames; performing two-dimensional voxel map reconstruction processing to generate a current two-dimensional voxel map; generating first three-dimensional feature point sets; generating first two-dimensional feature maps; forming first comparison semantic array sets; performing sensor semantic fusion processing on the first grid semantic array of each first two-dimensional voxel grid in the current two-dimensional voxel map by using the corresponding first comparison semantic array set; and taking the current two-dimensional voxel map after the fusion processing as the first two-dimensional voxel map corresponding to the current moment. By the method, the problem of low manual labeling efficiency is solved and the labeling precision is improved.

Description

Multi-sensor semantic fusion method for two-dimensional voxel level in unmanned driving
Technical Field
The invention relates to the technical field of data processing, in particular to a two-dimensional voxel level multi-sensor semantic fusion method in unmanned driving.
Background
In the field of unmanned driving, the process of inputting data acquired by sensors into a target detection or target classification model to obtain a scene or obstacle classification result is called semantic extraction; the obtained scene or obstacle classification result is called semantic information, and bringing this semantic information into an image to label the related entities is semantic labeling. The main purpose of semantically labeling images is to meet the data requirements of various target detection, target classification and automatic driving simulation models: only images with semantic labels can be used as training data for target detection and target classification models and as simulation data for automatic driving simulation models. With the wide application of such models in the fields of unmanned driving and unmanned driving simulation, the demand for training data and simulation data is very large, and the corresponding semantic labeling workload for training images and simulation images is also very large. However, most current image semantic labeling work is still completed manually or semi-manually, which obviously cannot meet the current demands of technical development.
Disclosure of Invention
The invention aims to provide a two-dimensional voxel-level multi-sensor semantic fusion method in unmanned driving, an electronic device and a computer-readable storage medium, which continuously and automatically generate an aerial view with semantic labeling information during unmanned driving, with the labeling granularity accurate to the two-dimensional voxel level. This solves the problem of low labeling efficiency caused by manual labeling and also improves labeling precision.
In order to achieve the above object, a first aspect of the embodiments of the present invention provides a method for semantic fusion of multiple sensors at two-dimensional voxel level in unmanned driving, where the method includes:
at the initial moment t0 of unmanned driving, according to a preset first coverage area and a first grid threshold, creating a first two-dimensional voxel map which takes the current position of the self-vehicle as a reference point and the self-vehicle angle as the aerial view angle; carrying out grid semantic initialization processing on the first two-dimensional voxel map according to a preset initialization mode; the first two-dimensional voxel map comprises a plurality of first two-dimensional voxel grids; each first two-dimensional voxel grid comprises a first grid semantic array;
at any moment ti during unmanned driving, acquiring the self-vehicle position information corresponding to the previous moment ti-1 to generate the previous position, and acquiring the first two-dimensional voxel map corresponding to the previous moment ti-1 as the previous two-dimensional voxel map, i > 0; acquiring the acceleration information and angular velocity information measured in real time by the vehicle-mounted first-class sensor, and generating the corresponding first acceleration vector and first angular velocity vector; acquiring the 360-degree data frame acquired in real time by each vehicle-mounted second-class sensor, and generating the corresponding first data frames;
performing two-dimensional voxel map reconstruction processing according to the first acceleration vector, the first angular velocity vector, the previous position, the current position of the vehicle and the previous two-dimensional voxel map to generate a current two-dimensional voxel map;
inputting each first data frame into a corresponding sensor data pipeline for carrying out three-dimensional data semantic recognition processing, and generating a corresponding first three-dimensional feature point set; performing two-dimensional aerial view conversion processing on each first three-dimensional feature point set to generate a corresponding first two-dimensional feature map; the first two-dimensional feature map comprises a plurality of first two-dimensional feature points; the first two-dimensional feature point comprises a first two-dimensional point semantic array;
according to the corresponding relation of the two-dimensional coordinates of the current two-dimensional voxel map and each first two-dimensional feature map, grouping the first two-dimensional feature points of each first two-dimensional feature map in the same first two-dimensional voxel grid in the current two-dimensional voxel map, and forming a corresponding first comparison semantic array set by the first two-dimensional point semantic arrays of the first two-dimensional feature points in the same group;
performing sensor semantic fusion processing on the first grid semantic array of each first two-dimensional voxel grid in the current two-dimensional voxel map by using the corresponding first comparison semantic array set;
and taking the current two-dimensional voxel map after the fusion processing as the first two-dimensional voxel map corresponding to the current moment ti.
Preferably, the first-class sensor is an inertial navigation sensor comprising three groups of mutually orthogonal single-axis gyroscopes and three groups of mutually orthogonal single-axis accelerometers;
the second type of sensor comprises a camera, a laser radar, a millimeter wave radar and an ultrasonic radar.
Preferably, the shape of the first two-dimensional voxel map is M × N, M is the maximum number of columns, and N is the maximum number of rows; the first two-dimensional voxel map comprises M x N of the first two-dimensional voxel grids; the first two-dimensional voxel grid comprises a first grid coordinate and the first grid semantic array; the first grid semantic array comprises a first grid semantic category and a first grid category confidence; the side length of the first two-dimensional voxel grid is the first grid threshold;
the first three-dimensional feature point set comprises a plurality of first three-dimensional feature points; the first three-dimensional feature point comprises a first three-dimensional point coordinate and a first three-dimensional point semantic array; the first three-dimensional point semantic array comprises a first three-dimensional point semantic category and a first three-dimensional point category confidence coefficient;
the first two-dimensional feature map comprises a plurality of the first two-dimensional feature points; the first two-dimensional feature point comprises a first two-dimensional point coordinate and the first two-dimensional point semantic array; the first two-dimensional point semantic array includes a first two-dimensional point semantic category and a first two-dimensional point category confidence.
Preferably, the performing grid semantic initialization processing on the first two-dimensional voxel map according to a preset initialization mode specifically includes:
identifying the initialization mode;
if the initialization mode is the first mode, acquiring a 360-degree data frame acquired in real time from a preset main sensor among the vehicle-mounted second-class sensors to generate a main sensor data frame; inputting the main sensor data frame into the corresponding sensor data pipeline for three-dimensional data semantic recognition processing, and generating a corresponding main sensor three-dimensional feature point set; performing two-dimensional aerial view conversion processing on the main sensor three-dimensional feature point set to generate an initial two-dimensional feature map;
if the initialization mode is the second mode, acquiring 360-degree data frames acquired by each vehicle-mounted second-class sensor in real time, and generating corresponding sensor data frames; inputting each sensor data frame into a corresponding sensor data pipeline to perform three-dimensional data semantic recognition processing, and generating a corresponding sensor three-dimensional feature point set; performing two-dimensional aerial view conversion processing on the three-dimensional characteristic point set of each sensor to generate a corresponding two-dimensional characteristic map of the sensor; performing sensor feature fusion processing on the obtained sensor two-dimensional feature maps to generate an initial two-dimensional feature map;
according to the two-dimensional coordinate corresponding relation between the initial two-dimensional feature map and the first two-dimensional voxel map, using an initial two-dimensional point semantic array of initial two-dimensional feature points of the initial two-dimensional feature map to perform initialization setting on the first grid semantic array of the first two-dimensional voxel grid of the first two-dimensional voxel map;
wherein the initial two-dimensional feature map comprises a plurality of the initial two-dimensional feature points; the initial two-dimensional feature points comprise initial two-dimensional point coordinates and the initial two-dimensional point semantic arrays; the initial two-dimensional point semantic array includes an initial two-dimensional point semantic category and an initial two-dimensional point category confidence.
Preferably, the performing two-dimensional voxel map reconstruction processing according to the first acceleration vector, the first angular velocity vector, the previous position, the current position of the vehicle, and the previous two-dimensional voxel map to generate a current two-dimensional voxel map specifically includes:
establishing a first two-dimensional voxel map with a relative self-vehicle angle as an aerial view angle by taking the current position of the self-vehicle as a reference point as the current two-dimensional voxel map;
dividing the current two-dimensional voxel map into a first grid area and a second grid area; the first grid area is an area where a grid intersection is generated with the previous two-dimensional voxel map; the second grid area is an area without grid intersection with the previous two-dimensional voxel map;
setting the current two-dimensional voxel map; determining the displacement relation between the previous two-dimensional voxel map and the current two-dimensional voxel map according to the first acceleration vector, the first angular velocity vector, the previous position and the current position of the self-vehicle; according to the displacement relation, setting the voxel grids of the first grid area by using the corresponding voxel grids in the previous two-dimensional voxel map; and setting the first grid semantic category and the first grid category confidence of the first grid semantic array of each first two-dimensional voxel grid in the second grid area to null.
Preferably, the performing, by using the corresponding first comparison semantic array set, sensor semantic fusion processing on the first grid semantic array of each first two-dimensional voxel grid in the current two-dimensional voxel map specifically includes:
determining whether the first two-dimensional point semantic category of the first comparison semantic array set is unique;
if the judgment result is not unique, performing redundant category deletion processing on the first comparison semantic array set; in the first comparison semantic array set, performing summation calculation on the first two-dimensional point category confidences of all the first two-dimensional point semantic arrays with the same first two-dimensional point semantic category to obtain a plurality of first confidence sums; taking the first two-dimensional point semantic category corresponding to the largest first confidence sum as the reserved category; and deleting each first two-dimensional point semantic array whose first two-dimensional point semantic category is not the reserved category;
if the judgment result is unique, taking the unique first two-dimensional point semantic category in the first comparison semantic array set as the reserved category;
when the first grid semantic category of the first grid semantic array is empty, setting the first grid semantic category by using the reserved category; performing weighted average calculation on all the first two-dimensional point category confidences corresponding to the reserved category to generate a first average confidence; and setting the first grid category confidence of the first grid semantic array by using the first average confidence;
when the first grid semantic category of the first grid semantic array is not empty, identifying whether the reserved category is the same as the first grid semantic category of the first grid semantic array;
if the recognition results are different, setting the first grid semantic category by using the reserved category; performing weighted average calculation on all the first two-dimensional point category confidences corresponding to the reserved category to generate a second average confidence; and setting the first grid category confidence of the first grid semantic array by using the second average confidence;
if the identification results are the same, taking the class confidence of each first two-dimensional point as a first class observation parameter; taking the first grid category confidence as a state parameter at the previous moment; inputting the state parameter of the previous moment and the plurality of first-class observation parameters into a Bayesian filter model fusing a plurality of observations to estimate the state of the current moment, and replacing the first grid class confidence by using the estimated state of the current moment.
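The same-category branch above relies on a Bayesian filter that fuses several observations with the previous-moment state. The patent does not fix the exact filter form, so the following is only a minimal sketch, assuming a binary Bayesian update in log-odds form; the function name and the example values are hypothetical.

```python
import math

def bayes_fuse_confidence(prev_grid_conf, point_confs):
    """Estimate the current-moment grid category confidence from the
    previous-moment state parameter (prev_grid_conf) and several first-class
    observation parameters (point_confs), one per sensor.  A minimal binary
    Bayesian update in log-odds form; the concrete filter used by the patent
    may differ."""
    def log_odds(p):
        p = min(max(p, 1e-6), 1.0 - 1e-6)   # clamp to keep the log finite
        return math.log(p / (1.0 - p))

    l = log_odds(prev_grid_conf)             # prior: state at the previous moment
    for p in point_confs:                    # accumulate the sensor observations
        l += log_odds(p)
    return 1.0 / (1.0 + math.exp(-l))        # back to a probability

# example: previous confidence 0.7, three sensors report 0.8, 0.6 and 0.9
print(round(bayes_fuse_confidence(0.7, [0.8, 0.6, 0.9]), 3))   # 0.992
```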
A second aspect of an embodiment of the present invention provides an electronic device, including: a memory, a processor, and a transceiver;
the processor is configured to be coupled to the memory, read and execute instructions in the memory, so as to implement the method steps of the first aspect;
the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.
A third aspect of embodiments of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the method of the first aspect.
The embodiment of the invention provides a two-dimensional voxel-level multi-sensor semantic fusion method in unmanned driving, an electronic device and a computer-readable storage medium. During unmanned driving, on the one hand, the two-dimensional voxel map is continuously reconstructed according to the pose difference between adjacent sampling moments, so as to ensure the scene accuracy of the two-dimensional voxel map output at each moment; on the other hand, a sensor data pipeline is used to perform semantic recognition on the data collected by each sensor to obtain a three-dimensional feature point set with semantic features, the three-dimensional feature point set is projected into an overhead view to obtain a two-dimensional feature point set carrying the same semantic features, namely a two-dimensional feature map, and sensor semantic fusion processing is performed on each two-dimensional voxel grid, with the corresponding two-dimensional feature points of each sensor as references, according to the coordinate correspondence between the two-dimensional voxel grids and the two-dimensional feature points, so as to obtain the latest semantic information of each two-dimensional voxel grid at the current moment; after the semantic update of all two-dimensional voxel grids is completed, the latest two-dimensional voxel map is used as the output labeled image for the current moment. The embodiment of the invention solves the problem of low labeling efficiency caused by manual labeling, ensures the scene accuracy and semantic labeling precision of the output semantic labeling map, namely the two-dimensional voxel map, by means of multi-sensor fusion and pose difference adjustment, and improves the labeling quality.
Drawings
Fig. 1 is a schematic diagram of a semantic fusion method of two-dimensional voxel level multi-sensors in unmanned driving according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Throughout the unmanned driving process, the automatic driving system makes an aerial-view two-dimensional voxel map of a fixed-area space around the self-vehicle through the two-dimensional voxel-level multi-sensor semantic fusion method in unmanned driving, and reconstructs the two-dimensional voxel map in real time from the position difference between adjacent moments so as to ensure the scene accuracy of the image; at the initial moment of the whole process, each two-dimensional voxel grid of the two-dimensional voxel map is semantically initialized using the semantic recognition result of the data acquired by a sensor, and at every intermediate moment of the whole process, the semantic information of each two-dimensional voxel grid of the two-dimensional voxel map at the current moment is updated using the multi-sensor semantic fusion result, so that the real-time image and semantic labeling precision and labeling quality can be ensured; fig. 1 is a schematic diagram of a two-dimensional voxel-level multi-sensor semantic fusion method in unmanned driving according to an embodiment of the present invention, and as shown in fig. 1, the method mainly includes the following steps:
step 1, at the initial time t of unmanned driving0According to a preset first coverage area and a first grid threshold value, a first two-dimensional pixel map which takes the current position of the self-vehicle as a reference point and takes the angle of the self-vehicle as an aerial view angle is created; carrying out grid semantic initialization processing on the first two-dimensional pixel graph according to a preset initialization mode;
the shape of the first two-dimensional pixel graph is MxN, M is the maximum column number, and N is the maximum row number; the first two-dimensional voxel map comprises M x N first two-dimensional voxel grids; the first two-dimensional voxel grid comprises a first grid coordinate and a first grid semantic array; the first grid semantic array comprises a first grid semantic category and a first grid category confidence; the side length of the first two-dimensional voxel grid is a first grid threshold;
here, at the initial moment t0 of unmanned driving, the self-vehicle is by default in a static state, and the first two-dimensional voxel map is constructed and initialized in advance while the vehicle is static;
the method specifically comprises the following steps: step 11, creating a first two-dimensional map with a self-vehicle angle as an aerial view angle by taking the current position of the self-vehicle as a reference point according to a preset first coverage area and a first grid threshold;
here, the first two-dimensional voxel map constructed according to the embodiment of the present invention is actually an overhead view; the shape of its overhead coverage area is a rectangle with a fixed aspect ratio, its overhead viewing angle looks straight down from above the self-vehicle, and the size of its overhead coverage area is determined by the preset first coverage area, with the vehicle body completely inside the coverage area; the current position of the self-vehicle is by default a point toward the rear on the lateral center line of the overhead coverage area, and this current position is the vehicle anchor point information obtained by the vehicle positioning device;
unlike the pixel-level data structure of a conventional overhead image, the first two-dimensional voxel map is a two-dimensional voxel-level data structure consisting of M × N first two-dimensional voxel grids, where M is the maximum number of columns and N is the maximum number of rows; each first two-dimensional voxel grid is actually a square, and the actual measuring range corresponding to the unit side length of the first two-dimensional voxel grid is the preset first grid threshold; thus, the first coverage area is M × N × (first grid threshold)². For example, if the first grid threshold is 0.2 m, the size of each first two-dimensional voxel grid is 0.2 × 0.2 = 0.04 square meters; if the first coverage area is 400 square meters and the fixed aspect ratio is 4:1, then M = 40/0.2 = 200 and N = 10/0.2 = 50, so the first two-dimensional voxel map will include 200 × 50 = 10000 first two-dimensional voxel grids, and M × N × (first grid threshold)² = 200 × 50 × 0.04 = 400 = the first coverage area;
each first two-dimensional voxel grid corresponds to a data set formed by a first grid coordinate and a first grid semantic array; the first grid coordinate (x, y) is a two-dimensional grid coordinate, with x the column coordinate and y the row coordinate; the first grid semantic array includes two sub-data: a first grid semantic category and a first grid category confidence; the first grid semantic category is the semantic labeling information of the current two-dimensional voxel grid, such as lane line, traffic light, building, car, motorcycle, bicycle, person, animal, and so on; the first grid category confidence is the semantic confidence of the current two-dimensional voxel grid, indicating the probability that the semantic feature of the current two-dimensional voxel grid is specifically the category pointed to by the first grid semantic category; for example, if the first grid semantic category is lane line and the first grid category confidence is 0.9, the probability that the image of the current first two-dimensional voxel grid is a lane line is 0.9;
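As a concrete illustration of this data structure and of the sizing arithmetic above, the following minimal sketch builds such a voxel map from a coverage area, aspect ratio and grid threshold; the class and field names are hypothetical and reflect only one possible layout.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GridSemantics:                    # first grid semantic array
    category: Optional[str] = None      # first grid semantic category, e.g. "lane_line"
    confidence: Optional[float] = None  # first grid category confidence, e.g. 0.9

@dataclass
class VoxelMap:                         # first two-dimensional voxel map
    coverage_m2: float                  # first coverage area, in square meters
    aspect_ratio: float                 # fixed width-to-height ratio of the rectangle
    grid_threshold_m: float             # first grid threshold, grid side length in meters

    def __post_init__(self):
        height_m = (self.coverage_m2 / self.aspect_ratio) ** 0.5
        width_m = height_m * self.aspect_ratio
        self.M = round(width_m / self.grid_threshold_m)    # maximum number of columns
        self.N = round(height_m / self.grid_threshold_m)   # maximum number of rows
        # one semantic array per first two-dimensional voxel grid, keyed by (x, y)
        self.grids = {(x, y): GridSemantics()
                      for x in range(self.M) for y in range(self.N)}

# the worked example from the text: 400 m^2 coverage, 4:1 aspect ratio, 0.2 m threshold
vm = VoxelMap(coverage_m2=400.0, aspect_ratio=4.0, grid_threshold_m=0.2)
print(vm.M, vm.N, len(vm.grids))              # 200 50 10000
print(vm.M * vm.N * vm.grid_threshold_m**2)   # approximately 400, the first coverage area
```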
step 12, carrying out grid semantic initialization processing on the first two-dimensional voxel map according to the preset initialization mode;
here, after the previous step completes the construction of the first two-dimensional voxel map, the first grid semantic array of each first two-dimensional voxel grid in the map is not filled with actual content, and therefore, the first two-dimensional voxel grid is initialized by the present step;
the method specifically comprises the following steps: step 121, identifying an initialization mode; if the initialization mode is the first mode, go to step 122; if the initialization mode is the second mode, go to step 123;
wherein the initialization mode comprises a first mode and a second mode;
here, the embodiment of the present invention provides two initialization methods, corresponding to the first mode and the second mode respectively; for the first mode, one of the second-class sensors is selected as the main sensor, and grid semantic initialization of the first two-dimensional voxel map is performed with reference to the main sensor's environment and obstacle detection results; for the second mode, the embodiment of the invention fuses the semantic recognition results of multiple sensors and performs grid semantic initialization of the first two-dimensional voxel map according to the fusion result;
step 122, acquiring a 360-degree data frame acquired in real time from a preset main sensor in each vehicle-mounted second-class sensor to generate a main sensor data frame; inputting the main sensor data frame into a corresponding sensor data pipeline for semantic recognition processing of three-dimensional data, and generating a corresponding main sensor three-dimensional feature point set; performing two-dimensional aerial view conversion processing on the three-dimensional characteristic point set of the main sensor to generate an initial two-dimensional characteristic map; go to step 124;
the second sensor comprises a camera, a laser radar, a millimeter wave radar and an ultrasonic radar; the initial two-dimensional feature map comprises a plurality of initial two-dimensional feature points; the initial two-dimensional feature points comprise initial two-dimensional point coordinates and initial two-dimensional point semantic arrays; the initial two-dimensional point semantic array comprises an initial two-dimensional point semantic category and an initial two-dimensional point category confidence coefficient;
here, the second type of sensor is a sensor for identifying scenes and obstacles, and may specifically be a camera, a laser radar, a millimeter wave radar, and an ultrasonic radar;
here, if the initialization mode is preset to the first mode, one sensor needs to be selected in advance from the above camera, laser radar, millimeter wave radar and ultrasonic radar as the main sensor, conventionally the camera or the laser radar; because the area covered by the first two-dimensional voxel map surrounds the vehicle body, the data collected by the main sensor, namely the main sensor data frame, must be a data frame covering 360 degrees around the vehicle;
after the data frame of the main sensor is obtained, the data frame is input into a sensor data pipeline to carry out data cleaning, three-dimensional feature point coordinate conversion and feature identification of three-dimensional feature points, so that a feature point set, namely a three-dimensional feature point set of the main sensor, is obtained, wherein each feature point carries semantic information and confidence coefficient information;
after the main sensor three-dimensional feature point set is obtained, in order to correspond to the two-dimensional first two-dimensional voxel map, the three-dimensional point coordinates of the main sensor three-dimensional feature point set need to be converted into two-dimensional projection coordinates according to the aerial view logic, and the point set with converted coordinates forms an aerial view that reflects the features of the surrounding scene and obstacles, namely the initial two-dimensional feature map;
it should be noted that the coverage area of the initial two-dimensional feature map at least includes the coverage area of the first two-dimensional voxel map, the initial two-dimensional feature map is an aerial view at a pixel level, and the pixel density of the initial two-dimensional feature map is greater than the voxel grid density of the first two-dimensional voxel map; if the initial two-dimensional feature point has a projection relation with the three-dimensional feature point of the main sensor three-dimensional feature point set, the pixel point of the initial two-dimensional feature map carries semantic information and semantic confidence information of the corresponding three-dimensional feature point; the initial two-dimensional point coordinates of the initial two-dimensional feature points are the two-dimensional overhead view image pixel point coordinates, the initial two-dimensional point semantic category of the initial two-dimensional point semantic array is semantic information carried by the pixel point, and the initial two-dimensional point category confidence coefficient is semantic confidence coefficient information carried by the pixel point;
step 123, acquiring 360-degree data frames acquired by each vehicle-mounted second-class sensor in real time, and generating corresponding sensor data frames; inputting each sensor data frame into a corresponding sensor data pipeline to perform three-dimensional data semantic recognition processing, and generating a corresponding sensor three-dimensional feature point set; performing two-dimensional aerial view conversion processing on the three-dimensional characteristic point set of each sensor to generate a corresponding two-dimensional characteristic map of the sensor; performing sensor feature fusion processing on the obtained two-dimensional feature maps of the plurality of sensors to generate an initial two-dimensional feature map;
here, if the initialization mode is preset to the second mode, all the second-class sensors currently installed on the vehicle are required to acquire data synchronously; because the area covered by the first two-dimensional voxel map surrounds the vehicle body, the data of all the second-class sensors, namely all the sensor data frames, must be data frames covering 360 degrees around the vehicle;
after all sensor data frames are obtained, inputting each sensor data frame into a corresponding sensor data pipeline for data cleaning, three-dimensional feature point coordinate conversion and feature identification of three-dimensional feature points, thereby obtaining a corresponding feature point set, namely a sensor three-dimensional feature point set, wherein each feature point of the feature point set carries semantic information and confidence information;
after all the sensor three-dimensional feature point sets are obtained, in order to correspond to the two-dimensional first two-dimensional voxel map, the three-dimensional point coordinates of each sensor three-dimensional feature point set need to be converted into two-dimensional projection coordinates according to the aerial view logic, and each point set with converted coordinates forms an aerial view that reflects the features of the surrounding scene and obstacles, namely a sensor two-dimensional feature map;
here, if the two-dimensional feature points of all the sensor two-dimensional feature maps are placed in the same two-dimensional coordinate system, a large number of redundant feature points will be found; therefore, after obtaining the multiple sensor two-dimensional feature maps, sensor feature fusion processing needs to be performed on them. This sensor fusion is what is conventionally called post-fusion or detection-layer fusion, and there are many corresponding processing methods, including the weighted average method, Bayesian estimation, the D-S evidence theory method, Kalman filtering and artificial intelligence model processing; in the embodiment of the present invention, the appropriate method is selected according to the data quantity and data distribution features, and the processing result, that is, the two-dimensional feature map on which multi-sensor feature fusion has been completed, is used as the initial two-dimensional feature map; the data structure of the output initial two-dimensional feature map is consistent with that of the initial two-dimensional feature map output in step 122, and is not described further here;
step 124, according to the two-dimensional coordinate corresponding relation between the initial two-dimensional feature map and the first two-dimensional voxel map, using the initial two-dimensional point semantic array of the initial two-dimensional feature points of the initial two-dimensional feature map to perform initialization setting on the first grid semantic array of the first two-dimensional voxel grid of the first two-dimensional voxel map.
Here, from the way the initial two-dimensional feature map is obtained in step 122 or 123, its overhead coverage area includes that of the first two-dimensional voxel map, and its pixel density is greater than the voxel grid density of the first two-dimensional voxel map; therefore, if the initial two-dimensional feature map and the first two-dimensional voxel map are overlaid in the same coordinate system, initial two-dimensional feature points are found to fall into every first two-dimensional voxel grid; according to this correspondence, the initial two-dimensional feature points falling into the same first two-dimensional voxel grid are grouped together; the semantic categories of the initial two-dimensional feature points in the same group, namely their initial two-dimensional point semantic categories, may be all the same, partially different or all different;
after the corresponding grouping is completed, the first grid semantic array of the first two-dimensional voxel grid corresponding to the current grouping is set according to the maximum confidence rule, specifically: the method comprises the steps of obtaining a plurality of initial two-dimensional point confidence coefficient sums by performing summation calculation on all initial two-dimensional point category confidence coefficients corresponding to the same semantic category (initial two-dimensional point semantic category) in a group, taking the initial two-dimensional point confidence coefficient sum with the largest value as the semantic category of a current first two-dimensional voxel grid, namely a first grid semantic category of a first grid semantic array, performing weighted average calculation on all initial two-dimensional point category confidence coefficients of the semantic category in the current group, which are the same as the first grid semantic category, and taking the obtained weighted average value as the semantic confidence coefficient of the current first two-dimensional voxel grid, namely the first grid category confidence coefficient of the first grid semantic array.
After the initialization of the first two-dimensional voxel map at the initial moment of unmanned driving is completed through step 1, the updating of the first two-dimensional voxel map during the unmanned driving process can be handled through the subsequent steps 2-7.
Step 2, at any moment ti during unmanned driving, acquiring the self-vehicle position information corresponding to the previous moment ti-1 to generate the previous position, and acquiring the first two-dimensional voxel map corresponding to the previous moment ti-1 as the previous two-dimensional voxel map, i > 0; acquiring the acceleration information and angular velocity information measured in real time by the vehicle-mounted first-class sensor, and generating the corresponding first acceleration vector and first angular velocity vector; acquiring the 360-degree data frames acquired in real time by each vehicle-mounted second-class sensor, and generating the corresponding first data frames;
one type of sensor is an inertial navigation sensor and comprises three groups of mutually orthogonal single-axis gyroscopes and three groups of mutually orthogonal single-axis accelerometers.
One type of sensor is an inertial navigation sensor which is mainly used for detecting and measuring acceleration, inclination, impact, vibration, rotation angle, angular velocity and multiple-degree-of-freedom motion states, and the inertial navigation sensor consists of a gyroscope and an accelerometer; in the unmanned driving process, the semantic updating of the first two-dimensional pixel map is processed in a multi-sensor fusion mode, so that all the two types of sensors currently installed on the self-vehicle are required to be used for synchronous data acquisition, and the data of all the two types of sensors, namely all the first data frames, surround the periphery of the self-vehicle body, so that the data of all the two types of sensors are required to be data frames covering 360 degrees of the periphery of the self-vehicle.
Step 3, performing two-dimensional voxel map reconstruction processing according to the first acceleration vector, the first angular velocity vector, the previous position, the current position of the vehicle and the previous two-dimensional voxel map to generate a current two-dimensional voxel map;
while driving, the scene and obstacle information around the self-vehicle changes from one moment to the next, but consecutive frames remain continuous rather than changing abruptly, and in fact share a large amount of overlapping content; the relative displacement of this overlapping content within the image is related to the acceleration and the angular velocity, which are called the pose difference parameters; based on this principle, when generating the first two-dimensional voxel map at the current moment, namely the current two-dimensional voxel map, the embodiment of the invention reconstructs the two-dimensional voxel map at the previous moment, namely the previous two-dimensional voxel map, according to the pose difference between the two moments to obtain the current two-dimensional voxel map;
the method specifically comprises the following steps: step 31, establishing a first two-dimensional voxel map with a relative self-vehicle angle as an aerial view angle by taking the current position of the self-vehicle as a reference point as a current two-dimensional voxel map;
here, this is the same as step 11, so further description is not provided;
step 32, dividing the current two-dimensional voxel map into a first grid area and a second grid area;
the first grid area is an area which generates grid intersection with a previous two-dimensional voxel map; the second grid area is an area without grid intersection with the previous two-dimensional voxel map;
here, the previous position and the current position of the self-vehicle are both vehicle positioning point information obtained by the vehicle positioning device, so their coordinates can be regarded as lying in the same map coordinate system; using this coordinate system as a reference, the four corners of the previous two-dimensional voxel map and of the current two-dimensional voxel map are marked in the coordinate system, the intersection area of the two rectangles can be obtained, and thus the first grid area and the second grid area can be obtained;
step 33, setting the current two-dimensional voxel map;
the method specifically comprises the following steps: determining the displacement relation between the previous two-dimensional voxel map and the current two-dimensional voxel map according to the first acceleration vector, the first angular velocity vector, the previous position and the current position of the vehicle; according to the displacement relation, setting the voxel grid of the first grid area by using the corresponding voxel grid in the previous two-dimensional voxel map; and setting the first grid semantic category and the first grid category confidence coefficient of the first grid semantic array of the first two-dimensional voxel grid of the second grid area to be null.
Here, when the current two-dimensional voxel map is set, the first mesh region and the second mesh region are set, respectively;
when setting the first grid area, the displacement relation between the previous two-dimensional voxel map and the current two-dimensional voxel map needs to be determined: using the first acceleration vector, the first angular velocity vector, the previous position and the current position of the self-vehicle as estimation parameters, the relative displacement path S between the two moments is estimated from the relation between the position difference and the relative displacement path, and the displacement path S is reversed to obtain the reverse relative displacement path S'; a first grid of the previous two-dimensional voxel map inside the intersection area is taken as the displacement starting grid and its coordinate is recorded as the first coordinate; moving from the first coordinate within the previous two-dimensional voxel map along the reverse relative displacement path S' yields the displacement ending grid, whose coordinate is the second coordinate; this means that for the first grid inside the intersection area of the previous two-dimensional voxel map, the corresponding grid in the current two-dimensional voxel map should be the second grid with the second coordinate; after the grid corresponding to the first grid, namely the second grid, is found, the first grid semantic array of the first grid is copied to the first grid semantic array of the second grid;
when the second grid area is set, the previous two-dimensional voxel map contains no semantic information for this area, so it is set to null.
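The reconstruction can be pictured as shifting the previous voxel map by the inverse of the estimated relative displacement expressed in whole grids. The sketch below assumes that displacement has already been estimated from the pose difference and ignores the rotational component for brevity, so it only approximates the procedure described above; all names are hypothetical.

```python
def rebuild_voxel_map(prev_grids, M, N, shift_cols, shift_rows):
    """prev_grids: dict mapping (x, y) grid coordinates to (category, confidence)
    for the previous two-dimensional voxel map.  (shift_cols, shift_rows) is the
    ego displacement between the two moments expressed in whole grids (estimated
    elsewhere from the acceleration/angular-velocity vectors and the two positions).
    Grids in the intersection area (first grid area) are copied from the previous
    map; grids outside it (second grid area) are set to (None, None)."""
    current = {}
    for x in range(M):
        for y in range(N):
            # the grid of the previous map that lands on (x, y) after the ego moved
            src = (x + shift_cols, y + shift_rows)
            if src in prev_grids:
                current[(x, y)] = prev_grids[src]     # first grid area: copy semantics
            else:
                current[(x, y)] = (None, None)        # second grid area: empty
    return current

# example: the ego advanced by 3 grids along the row axis, so content shifts back by 3 rows
prev = {(x, y): ("road", 0.8) for x in range(200) for y in range(50)}
cur = rebuild_voxel_map(prev, M=200, N=50, shift_cols=0, shift_rows=3)
print(cur[(0, 0)], cur[(0, 49)])   # ('road', 0.8) (None, None)
```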
Step 4, inputting each first data frame into a corresponding sensor data pipeline for carrying out three-dimensional data semantic recognition processing, and generating a corresponding first three-dimensional feature point set; performing two-dimensional aerial view conversion processing on each first three-dimensional feature point set to generate a corresponding first two-dimensional feature map;
the method specifically comprises the following steps: step 41, inputting each first data frame into a corresponding sensor data pipeline for semantic recognition processing of three-dimensional data, and generating a corresponding first three-dimensional feature point set;
the first three-dimensional feature point set comprises a plurality of first three-dimensional feature points; the first three-dimensional feature point comprises a first three-dimensional point coordinate and a first three-dimensional point semantic array; the first three-dimensional point semantic array comprises a first three-dimensional point semantic category and a first three-dimensional point category confidence coefficient;
after all the first data frames are obtained, firstly, inputting each first data frame into a corresponding sensor data pipeline for data cleaning and three-dimensional coordinate conversion processing;
the sensor data pipelines are different according to different sensor types, for example, a camera data pipeline corresponds to a camera, a laser radar data pipeline corresponds to a laser radar, a millimeter wave radar data pipeline corresponds to a millimeter wave radar, and an ultrasonic radar data pipeline corresponds to an ultrasonic radar;
when each sensor data pipeline carries out data cleaning and three-dimensional coordinate conversion, data denoising and redundancy removal are performed according to the data characteristics of the corresponding sensor, and then the cleaned data are converted into three-dimensional coordinates according to the conversion relation between the coordinate system of the corresponding sensor and the designated three-dimensional coordinate system, obtaining a corresponding three-dimensional point set; for example, if the designated three-dimensional coordinate system is the world three-dimensional coordinate system, then for the laser radar, millimeter wave radar or ultrasonic radar data pipelines, a conversion from a spherical or cylindrical three-dimensional coordinate system to the world three-dimensional coordinate system needs to be performed; for the camera data pipeline, depth estimation is first performed on the two-dimensional image to obtain the depth value of each pixel, and then the two-dimensional image coordinates of the pixels, combined with their depth values, are converted into the world three-dimensional coordinate system;
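As an illustration of the coordinate conversions mentioned above, the sketch below converts a spherical radar or lidar return and a camera pixel with an estimated depth into Cartesian coordinates; the intrinsic parameters are hypothetical, and the extrinsic sensor-to-world transforms that a real pipeline would also apply are omitted.

```python
import math

def spherical_to_cartesian(r, azimuth, elevation):
    """Radar/lidar style conversion: range r (m), azimuth and elevation (rad)
    to a Cartesian (x, y, z) point in the sensor frame."""
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return x, y, z

def pixel_depth_to_cartesian(u, v, depth, fx, fy, cx, cy):
    """Camera style conversion: image pixel (u, v) plus an estimated depth (m)
    back-projected through a pinhole model with intrinsics fx, fy, cx, cy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    z = depth
    return x, y, z

print(spherical_to_cartesian(10.0, math.radians(30), math.radians(5)))
print(pixel_depth_to_cartesian(700, 400, 12.0, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0))
```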
after completing data cleaning and three-dimensional coordinate conversion, each sensor data pipeline continues to perform semantic segmentation and target recognition on the obtained three-dimensional point set, and labels the three-dimensional point set semantically according to the target recognition result, thereby obtaining a semantically labeled three-dimensional point set, namely the first three-dimensional feature point set; each first three-dimensional feature point of this point set carries, in addition to a group of three-dimensional coordinates (the first three-dimensional point coordinate), a group of semantic information and confidence information, namely the first three-dimensional point semantic category and the first three-dimensional point category confidence of the first three-dimensional point semantic array;
step 42, performing two-dimensional aerial view image conversion processing on each first three-dimensional feature point set to generate a corresponding first two-dimensional feature image;
the method specifically comprises the following steps: step 421, initializing a first two-dimensional feature map;
wherein the first two-dimensional feature map comprises a plurality of first two-dimensional feature points; the first two-dimensional feature point comprises a first two-dimensional point coordinate and a first two-dimensional point semantic array; the first two-dimensional point semantic array comprises a first two-dimensional point semantic category and a first two-dimensional point category confidence coefficient;
here, if the three-dimensional coordinates of the current first three-dimensional feature point set consist of the three dimensions length, width and height, the maximum values Hmax and Wmax in the length and width dimensions are taken to generate a first two-dimensional feature map of shape Hmax × Wmax, and the two-dimensional aerial view coordinates of the first two-dimensional feature map are calibrated according to an image coordinate calibration method; it should be noted that the coverage area of the first two-dimensional feature map at least includes the coverage area of the first two-dimensional voxel map, the first two-dimensional feature map is a pixel-level aerial view, and its pixel density should be greater than the voxel grid density of the first two-dimensional voxel map; the pixel points of the first two-dimensional feature map are the first two-dimensional feature points; if a first two-dimensional feature point has a projection relation with a three-dimensional feature point of the first three-dimensional feature point set, the first two-dimensional point semantic category and the first two-dimensional point category confidence carried by that pixel point are consistent with those of the corresponding first three-dimensional feature point;
step 422, according to the projection relation between the three-dimensional feature point coordinates and the two-dimensional aerial view coordinate, performing two-dimensional aerial view projection coordinate conversion processing on the first three-dimensional point coordinates of each first three-dimensional feature point in the first three-dimensional feature point set to obtain corresponding first two-dimensional projection coordinates;
here, the first two-dimensional projection coordinates are overhead view projection coordinates of each first three-dimensional feature point;
step 423, screening all the first three-dimensional feature points: if several first three-dimensional feature points have the same first two-dimensional projection coordinate, only the first three-dimensional feature point whose three-dimensional coordinate has the largest height value is retained, following the overhead view logic;
here, during the overhead view projection conversion, several first three-dimensional feature points may lie above the same two-dimensional coordinate, and from the overhead viewing angle only the uppermost first three-dimensional feature point is actually visible, so the embodiment of the present invention retains only the first three-dimensional feature point with the largest height value; after this screening, every first two-dimensional projection coordinate corresponds to exactly one first three-dimensional feature point;
step 424, polling each first two-dimensional projection coordinate; taking a first three-dimensional feature point uniquely corresponding to a current first two-dimensional projection coordinate as a current first three-dimensional feature point, taking a first two-dimensional feature point uniquely corresponding to the current first two-dimensional projection coordinate as a current first two-dimensional feature point, setting a first two-dimensional point semantic category of the current first two-dimensional feature point by using a first three-dimensional point semantic category of the current first three-dimensional feature point, and setting a first two-dimensional point category confidence coefficient of the current first two-dimensional feature point by using a first three-dimensional point category confidence coefficient of the current first three-dimensional feature point;
here, the current step is a processing step of copying the semantic features of the first three-dimensional feature point to the projected first two-dimensional feature point, and it is ensured that the semantic features of the first three-dimensional feature point are consistent with the semantic features of the overhead projection point, that is, the first two-dimensional feature point.
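A minimal sketch of steps 421 to 424, assuming one pixel per integer aerial-view coordinate: each three-dimensional feature point is projected by dropping its height, and where several points share the same projection only the highest one keeps its semantics, matching the screening rule above. The function and field names are hypothetical.

```python
def project_to_bev(points3d):
    """points3d: list of ((x, y, z), category, confidence) first three-dimensional
    feature points, with z the height.  Returns a dict mapping first two-dimensional
    projection coordinates to (category, confidence), keeping for every projected
    coordinate only the point with the largest height value."""
    best = {}   # (u, v) -> (z, category, confidence)
    for (x, y, z), category, conf in points3d:
        uv = (int(round(x)), int(round(y)))      # overhead projection: drop the height
        if uv not in best or z > best[uv][0]:    # keep only the uppermost point
            best[uv] = (z, category, conf)
    return {uv: (category, conf) for uv, (z, category, conf) in best.items()}

# example: a traffic light above a road point at the same (x, y)
pts = [((5.0, 2.0, 0.0), "road", 0.8), ((5.0, 2.0, 4.5), "traffic_light", 0.9)]
print(project_to_bev(pts))   # {(5, 2): ('traffic_light', 0.9)}
```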
Step 5, according to the two-dimensional coordinate correspondence between the current two-dimensional voxel map and each first two-dimensional feature map, grouping the first two-dimensional feature points of each first two-dimensional feature map that fall into the same first two-dimensional voxel grid of the current two-dimensional voxel map, and forming the corresponding first comparison semantic array set from the first two-dimensional point semantic arrays of the first two-dimensional feature points in the same group.
Here, from the way the first two-dimensional feature maps are generated, the overhead coverage area of each first two-dimensional feature map includes that of the first two-dimensional voxel map, and its pixel density is greater than the voxel grid density of the first two-dimensional voxel map; therefore, if each first two-dimensional feature map and the first two-dimensional voxel map are overlaid in the same coordinate system, first two-dimensional feature points are found to fall into every first two-dimensional voxel grid; according to this correspondence, the first two-dimensional feature points falling into the same first two-dimensional voxel grid are grouped together, and the first two-dimensional point semantic arrays of the first two-dimensional feature points in the same group are extracted as the first comparison semantic array set; here, the first two-dimensional point semantic categories of the first two-dimensional point semantic arrays of the first comparison semantic array set may be all the same, partially different or all different.
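A minimal sketch of this grouping step, assuming the feature maps and the voxel map share an origin and axes so a feature point can be binned into a grid by dividing its coordinates by the first grid threshold; a real implementation would also account for the map offset given by the coordinate correspondence. Names and example values are hypothetical.

```python
from collections import defaultdict

def group_points_by_grid(feature_points, grid_threshold_m, M, N):
    """feature_points: list of ((x_m, y_m), category, confidence) first
    two-dimensional feature points in meters, pooled from all sensors.
    Returns a dict mapping each first two-dimensional voxel grid coordinate to
    its first comparison semantic array set (a list of (category, confidence))."""
    groups = defaultdict(list)
    for (x_m, y_m), category, conf in feature_points:
        gx, gy = int(x_m // grid_threshold_m), int(y_m // grid_threshold_m)
        if 0 <= gx < M and 0 <= gy < N:          # ignore points outside the voxel map
            groups[(gx, gy)].append((category, conf))
    return groups

# two camera points and one lidar point falling into the same 0.2 m grid
pts = [((1.03, 0.41), "lane_line", 0.9), ((1.07, 0.44), "lane_line", 0.7),
       ((1.05, 0.42), "vehicle", 0.6)]
print(dict(group_points_by_grid(pts, 0.2, M=200, N=50)))
# {(5, 2): [('lane_line', 0.9), ('lane_line', 0.7), ('vehicle', 0.6)]}
```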
Step 6, performing sensor semantic fusion processing on the first grid semantic array of each first two-dimensional voxel grid in the current two-dimensional voxel map by using the corresponding first comparison semantic array set;
the method specifically comprises the following steps: step 61, judging whether the semantic category of the first two-dimensional point of the first comparison semantic array set is unique; if the judgment result is not unique, go to step 62; if the judgment result is unique, go to step 63;
step 62, carrying out redundant category deletion processing on the first comparison semantic array set; in the first comparison semantic array set, summing the first two-dimensional point category confidences of all first two-dimensional point semantic arrays having the same first two-dimensional point semantic category to obtain a plurality of first confidence sums; taking the first two-dimensional point semantic category corresponding to the largest first confidence sum as the reserved category; deleting every first two-dimensional point semantic array whose first two-dimensional point semantic category is not the reserved category; go to step 64;
here, as described above, the first two-dimensional point semantic categories of the first two-dimensional point semantic arrays in the first comparison semantic array set may be partially or entirely different; in both cases, redundant category deletion is performed on the first comparison semantic array set according to a maximum-confidence rule: all first two-dimensional point category confidences belonging to the same semantic category (first two-dimensional point semantic category) within the group are summed, yielding a plurality of first confidence sums; the first two-dimensional point semantic category corresponding to the largest first confidence sum is taken as the reserved category of the current first comparison semantic array set, and the remaining first two-dimensional point semantic arrays whose semantic category does not match the reserved category are deleted;
step 63, taking the only first two-dimensional point semantic category in the first comparison semantic array set as a reserved category;
here, as described above, the first two-dimensional point semantic categories of the first two-dimensional point semantic arrays in the first comparison semantic array set may all be the same; in that case no redundant category deletion is needed, and this single category is directly used as the reserved category;
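Steps 61-63 amount to a maximum-confidence vote over the first comparison semantic array set. A sketch follows, under the assumption that the set is given as a plain list of (category, confidence) pairs; when only one category is present, the vote reduces to step 63 automatically.

```python
def select_reserved_category(comparison_set):
    """Sum the confidences per semantic category and keep the category with the
    largest first confidence sum (the reserved category), dropping the rest."""
    sums = {}
    for cat, conf in comparison_set:
        sums[cat] = sums.get(cat, 0.0) + conf
    reserved = max(sums, key=sums.get)
    # Redundant category deletion: keep only the arrays of the reserved category.
    kept = [(cat, conf) for cat, conf in comparison_set if cat == reserved]
    return reserved, kept
```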
step 64, when the first grid semantic category of the first grid semantic array is empty, setting the first grid semantic category by using a reserved category; performing weighted average calculation on all the first two-dimensional point category confidence degrees corresponding to the reserved categories to generate a first average confidence degree; setting a first grid category confidence coefficient of the first grid semantic array by using the first average confidence coefficient;
here, an empty first grid semantic category indicates that the current first two-dimensional voxel grid is in fact a grid of the second grid region of step 32; as can be seen from step 33, the grids of the second grid region are all initialized to empty; for a grid that carried no semantic information at the previous moment, the first grid semantic category of the current grid is set directly from the sensor semantics, i.e. the maximum-confidence category of the first comparison semantic array set, namely the reserved category, and the first grid category confidence of the current grid is set to the weighted average of the first two-dimensional point category confidences of the reserved category, namely the first average confidence;
step 65, when the first grid semantic category of the first grid semantic array is not empty, identifying whether the reserved category is the same as the first grid semantic category of the first grid semantic array; if the identification results are not the same, go to step 66; if the recognition results are the same, go to step 67;
here, if the first grid semantic category is not empty, the current first two-dimensional voxel grid is a grid of the first grid region of step 32; as can be seen from step 33, the grids of the first grid region all carry semantic information from the previous moment; the semantic category of the previous moment is therefore compared with the maximum-confidence sensor category of the current moment: if the two differ, go to step 66; if they are the same, go to step 67;
step 66, setting the semantic category of the first grid by using the reserved category; performing weighted average calculation on all the first two-dimensional point category confidence degrees corresponding to the reserved categories to generate a second average confidence degree; setting a first grid category confidence coefficient of the first grid semantic array by using the second average confidence coefficient; turning to step 7;
here, the semantic category of the current grid at the previous moment is inconsistent with the category predicted from the sensor data of the current moment, i.e. the reserved category, which indicates that the physical object covered by the current first two-dimensional voxel grid has undergone a qualitative change; this situation is particularly common near pedestrian crossings, where nobody is passing one second and a crowd appears the next, and is likewise common for people, animals or motor vehicles that suddenly cut across the driving path of the self-vehicle; for this case the data priority rule set by the embodiment of the present invention is that the reserved category predicted from the real-time sensor data takes priority; the first grid semantic category of the current grid is therefore set directly from the reserved category, and the first grid category confidence of the current grid is set to the weighted average of the first two-dimensional point category confidences of the reserved category, namely the second average confidence;
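Steps 64 and 66 perform the same overwrite, differing only in whether the previous grid category was empty or merely different from the reserved category. A sketch follows, with the grid modelled as a small dict; the embodiment does not specify the averaging weights, so uniform weights are assumed when none are supplied.

```python
def overwrite_grid_semantics(grid, reserved_category, kept_arrays, weights=None):
    """Set the grid's semantic category to the reserved category and its
    confidence to the weighted average of the surviving point confidences
    (steps 64 and 66). `grid` is a dict with 'category' and 'confidence'."""
    confidences = [conf for _, conf in kept_arrays]
    if weights is None:
        weights = [1.0] * len(confidences)  # uniform weighting assumed
    average_confidence = sum(w * c for w, c in zip(weights, confidences)) / sum(weights)
    grid["category"] = reserved_category
    grid["confidence"] = average_confidence
    return grid
```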
step 67, using the category confidence of each first two-dimensional point as a first category observation parameter; taking the confidence coefficient of the first grid category as a state parameter at the previous moment; and inputting the state parameters of the previous moment and a plurality of first-class observation parameters into a Bayesian filter model fusing a plurality of observations to estimate the state of the current moment, and replacing the first grid class confidence by using the estimated state of the current moment.
Here, the semantic category of the current grid at the previous moment is consistent with the category predicted from the sensor data of the current moment, i.e. the reserved category, which indicates that the physical object covered by the current first two-dimensional voxel grid has not changed qualitatively; in this case the semantic recognition probability of each sensor, i.e. each first two-dimensional point category confidence, is used as an observation, the semantic recognition probability of the previous moment, i.e. the not-yet-updated first grid category confidence, is used as the state of the previous moment, and both are fed into a Bayesian filter model fusing multiple observations to obtain the state of the current moment; this state is in fact the updated probability that the grid semantic category is still the reserved category at the current moment, so it replaces the first grid category confidence.
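The embodiment does not fix the concrete form of the Bayesian filter in step 67; one common choice is a log-odds (binary Bayes) update under a uniform 0.5 measurement prior, sketched here purely as an illustration.

```python
import math

def bayes_fuse(prior_confidence, observations, eps=1e-6):
    """Fuse the previous-moment grid confidence (state) with the per-sensor
    point confidences (observations) for the same reserved category.

    Implemented as a log-odds binary Bayes update; a uniform 0.5 measurement
    prior is assumed, so each observation simply adds its log-odds.
    """
    def logit(p):
        p = min(max(p, eps), 1.0 - eps)  # clamp away from 0 and 1
        return math.log(p / (1.0 - p))

    log_odds = logit(prior_confidence)
    for obs in observations:
        log_odds += logit(obs)              # accumulate evidence from each sensor
    return 1.0 / (1.0 + math.exp(-log_odds))  # back to a probability
```

With this form, several moderately confident observations of the same category push the grid confidence towards 1, while observations below 0.5 pull it back down from the prior.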
Step 7, taking the current two-dimensional voxel map after fusion processing as the first two-dimensional voxel map corresponding to the current time ti.
At this point the automatic driving system proceeds to the next time ti+1, and steps 2-7 are executed in a loop until the automatic driving system stops outputting the first two-dimensional voxel map.
Fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention. The electronic device may be the terminal device or the server, or may be a terminal device or a server connected to the terminal device or the server and implementing the method according to the embodiment of the present invention. As shown in fig. 2, the electronic device may include: a processor 301 (e.g., a CPU), a memory 302, a transceiver 303; the transceiver 303 is coupled to the processor 301, and the processor 301 controls the transceiving operation of the transceiver 303. Various instructions may be stored in memory 302 for performing various processing functions and implementing the processing steps described in the foregoing method embodiments. Preferably, the electronic device according to an embodiment of the present invention further includes: a power supply 304, a system bus 305, and a communication port 306. The system bus 305 is used to implement communication connections between the elements. The communication port 306 is used for connection communication between the electronic device and other peripherals.
The system bus mentioned in fig. 2 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication port is used for communication between the electronic device and other devices (such as a client, a read-write library and a read-only library). The memory may include a Random Access Memory (RAM) and may also include a Non-Volatile Memory, such as at least one disk memory.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), a Graphics Processing Unit (GPU) and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It should be noted that the embodiment of the present invention also provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to execute the method and the processing procedure provided in the above-mentioned embodiment.
The embodiment of the present invention further provides a chip for executing the instructions, where the chip is configured to execute the processing steps described in the foregoing method embodiment.
The embodiments of the present invention provide a two-dimensional voxel-level multi-sensor semantic fusion method for unmanned driving, an electronic device and a computer-readable storage medium. During unmanned driving, on one hand, the two-dimensional voxel map is continuously reconstructed according to the pose difference between adjacent sampling moments, so that the scene accuracy of the two-dimensional voxel map output at each moment is guaranteed. On the other hand, a data pipeline performs semantic recognition on the data collected by each sensor to obtain a three-dimensional feature point set with semantic features, and a bird's-eye-view projection of that set yields a two-dimensional feature point set with the same semantic features, i.e. a two-dimensional feature map. According to the coordinate correspondence between the two-dimensional voxel grids and the two-dimensional feature points of each sensor, sensor semantic fusion is then performed with the corresponding two-dimensional feature points as reference, producing the latest semantic information of each two-dimensional voxel grid at the current moment. Once the semantic update of all two-dimensional voxel grids is completed, the latest two-dimensional voxel map serves as the output labeling image of the current moment. The embodiments of the present invention thus avoid the low efficiency of manual labeling, ensure the scene accuracy and semantic labeling precision of the output semantic labeling map, i.e. the two-dimensional voxel map, by means of multi-sensor fusion and pose-difference adjustment, and improve labeling quality.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method for two-dimensional voxel-level multi-sensor semantic fusion in unmanned driving, the method comprising:
at an initial moment t0 of unmanned driving, creating, according to a preset first coverage area and a first grid threshold, a first two-dimensional voxel map that takes the current position of the self-vehicle as a reference point and the relative self-vehicle angle as the aerial view angle; carrying out grid semantic initialization processing on the first two-dimensional voxel map according to a preset initialization mode; the first two-dimensional voxel map comprises a plurality of first two-dimensional voxel grids; the first two-dimensional voxel grid comprises a first grid semantic array;
at any time ti during unmanned driving, obtaining the self-vehicle position information corresponding to the previous time ti-1 to generate a previous position, and obtaining the first two-dimensional voxel map corresponding to the previous time ti-1 as a previous two-dimensional voxel map, i > 0; acquiring acceleration information and angular velocity information measured in real time by a vehicle-mounted first-class sensor, and generating a corresponding first acceleration vector and first angular velocity vector; acquiring 360-degree data frames collected in real time by each vehicle-mounted second-class sensor, and generating corresponding first data frames;
performing two-dimensional voxel map reconstruction processing according to the first acceleration vector, the first angular velocity vector, the previous position, the current position of the vehicle and the previous two-dimensional voxel map to generate a current two-dimensional voxel map;
inputting each first data frame into a corresponding sensor data pipeline for carrying out three-dimensional data semantic recognition processing, and generating a corresponding first three-dimensional feature point set; performing two-dimensional aerial view conversion processing on each first three-dimensional feature point set to generate a corresponding first two-dimensional feature map; the first two-dimensional feature map comprises a plurality of first two-dimensional feature points; the first two-dimensional feature point comprises a first two-dimensional point semantic array;
according to the corresponding relation of the two-dimensional coordinates of the current two-dimensional voxel map and each first two-dimensional feature map, grouping the first two-dimensional feature points of each first two-dimensional feature map in the same first two-dimensional voxel grid in the current two-dimensional voxel map, and forming a corresponding first comparison semantic array set by the first two-dimensional point semantic arrays of the first two-dimensional feature points in the same group;
performing sensor semantic fusion processing on the first grid semantic array of each first two-dimensional voxel grid in the current two-dimensional voxel map by using the corresponding first comparison semantic array set;
and taking the current two-dimensional voxel map subjected to the fusion processing as the first two-dimensional voxel map corresponding to the current time ti.
2. The method for two-dimensional voxel-level multi-sensor semantic fusion in unmanned driving according to claim 1, wherein
the first-class sensor is an inertial navigation sensor and comprises three groups of mutually orthogonal single-axis gyroscopes and three groups of mutually orthogonal single-axis accelerometers;
the second type of sensor comprises a camera, a laser radar, a millimeter wave radar and an ultrasonic radar.
3. The method for two-dimensional voxel-level multi-sensor semantic fusion in unmanned driving according to claim 1, wherein
the shape of the first two-dimensional voxel map is MxN, M is the maximum column number, and N is the maximum row number; the first two-dimensional voxel map comprises M x N of the first two-dimensional voxel grids; the first two-dimensional voxel grid comprises a first grid coordinate and the first grid semantic array; the first grid semantic array comprises a first grid semantic category and a first grid category confidence; the side length of the first two-dimensional voxel grid is the first grid threshold;
the first three-dimensional feature point set comprises a plurality of first three-dimensional feature points; the first three-dimensional feature point comprises a first three-dimensional point coordinate and a first three-dimensional point semantic array; the first three-dimensional point semantic array comprises a first three-dimensional point semantic category and a first three-dimensional point category confidence coefficient;
the first two-dimensional feature map comprises a plurality of the first two-dimensional feature points; the first two-dimensional feature point comprises a first two-dimensional point coordinate and the first two-dimensional point semantic array; the first two-dimensional point semantic array includes a first two-dimensional point semantic category and a first two-dimensional point category confidence.
4. The method for two-dimensional voxel-level multi-sensor semantic fusion in unmanned driving according to claim 3, wherein the grid semantic initialization processing performed on the first two-dimensional voxel map according to the preset initialization mode specifically comprises:
identifying the initialization mode;
if the initialization mode is the first mode, acquiring the 360-degree data frame collected in real time by a preset main sensor among the vehicle-mounted second-class sensors to generate a main sensor data frame; inputting the main sensor data frame into the corresponding sensor data pipeline for three-dimensional data semantic recognition processing, and generating a corresponding main sensor three-dimensional feature point set; performing two-dimensional aerial view conversion processing on the main sensor three-dimensional feature point set to generate an initial two-dimensional feature map;
if the initialization mode is the second mode, acquiring the 360-degree data frames collected in real time by each vehicle-mounted second-class sensor, and generating corresponding sensor data frames; inputting each sensor data frame into the corresponding sensor data pipeline for three-dimensional data semantic recognition processing, and generating a corresponding sensor three-dimensional feature point set; performing two-dimensional aerial view conversion processing on each sensor three-dimensional feature point set to generate a corresponding sensor two-dimensional feature map; performing sensor feature fusion processing on the obtained sensor two-dimensional feature maps to generate the initial two-dimensional feature map;
according to the two-dimensional coordinate corresponding relation between the initial two-dimensional feature map and the first two-dimensional voxel map, using an initial two-dimensional point semantic array of initial two-dimensional feature points of the initial two-dimensional feature map to perform initialization setting on the first grid semantic array of the first two-dimensional voxel grid of the first two-dimensional voxel map;
wherein the initial two-dimensional feature map comprises a plurality of the initial two-dimensional feature points; the initial two-dimensional feature points comprise initial two-dimensional point coordinates and the initial two-dimensional point semantic arrays; the initial two-dimensional point semantic array includes an initial two-dimensional point semantic category and an initial two-dimensional point category confidence.
5. The method according to claim 3, wherein the generating a current two-dimensional voxel map by performing two-dimensional voxel map reconstruction processing according to the first acceleration vector, the first angular velocity vector, the previous position, the current position of the vehicle, and the previous two-dimensional voxel map specifically comprises:
establishing, with the current position of the self-vehicle as a reference point and the relative self-vehicle angle as the aerial view angle, a first two-dimensional voxel map as the current two-dimensional voxel map;
dividing the current two-dimensional voxel map into a first grid area and a second grid area; the first grid area is an area where a grid intersection is generated with the previous two-dimensional voxel map; the second grid area is an area without grid intersection with the previous two-dimensional voxel map;
setting the current two-dimensional voxel map as follows: determining the displacement relation between the previous two-dimensional voxel map and the current two-dimensional voxel map according to the first acceleration vector, the first angular velocity vector, the previous position and the current position of the self-vehicle; according to the displacement relation, setting the voxel grids of the first grid area by using the corresponding voxel grids in the previous two-dimensional voxel map; and setting the first grid semantic category and the first grid category confidence of the first grid semantic array of each first two-dimensional voxel grid in the second grid area to null.
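Purely as an illustration of this reconstruction, the sketch below shifts a structured NumPy voxel map by an integer cell displacement already derived from the pose difference, copies the overlapping first grid area, and leaves the second grid area empty; the array layout and the use of -1 for the empty category are assumptions of the example.

```python
import numpy as np

VOXEL_DTYPE = np.dtype([("category", np.int32), ("confidence", np.float32)])

def reconstruct_voxel_map(prev_map, shift_cells):
    """Build the current voxel map from the previous one.

    prev_map: structured array of shape (N, M) with 'category' and 'confidence'.
    shift_cells: integer displacement (dy, dx) of the current map relative to
                 the previous one, i.e. current[i, j] corresponds to
                 previous[i + dy, j + dx].
    """
    new_map = np.empty_like(prev_map)
    new_map["category"] = -1        # second grid area: semantic category empty
    new_map["confidence"] = 0.0
    dy, dx = shift_cells
    n, m = prev_map.shape
    # First grid area: the region that overlaps the previous map is copied over.
    src_rows = slice(max(dy, 0), min(n + dy, n))
    src_cols = slice(max(dx, 0), min(m + dx, m))
    dst_rows = slice(max(-dy, 0), min(n - dy, n))
    dst_cols = slice(max(-dx, 0), min(m - dx, m))
    new_map[dst_rows, dst_cols] = prev_map[src_rows, src_cols]
    return new_map
```

For example, a map created with `np.zeros((200, 200), dtype=VOXEL_DTYPE)` can be shifted with `reconstruct_voxel_map(prev_map, (3, -1))`; the sizes and cell counts are again illustrative.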
6. The method according to claim 3, wherein the performing the sensor semantic fusion processing on the first grid semantic array of each first two-dimensional voxel grid in the current two-dimensional voxel map by using the corresponding first comparison semantic array set specifically comprises:
determining whether the first two-dimensional point semantic category of the first comparison semantic array set is unique;
if the judgment result is not unique, performing redundant category deletion processing on the first comparison semantic array set; in the first comparison semantic array set, summing the first two-dimensional point category confidences of all the first two-dimensional point semantic arrays with the same first two-dimensional point semantic category to obtain a plurality of first confidence sums; taking the first two-dimensional point semantic category corresponding to the largest first confidence sum as a reserved category; and deleting the first two-dimensional point semantic arrays whose first two-dimensional point semantic category is not the reserved category;
if the judgment result is unique, taking the unique first two-dimensional point semantic category in the first comparison semantic array set as the reserved category;
when the first grid semantic category of the first grid semantic array is empty, setting the first grid semantic category by using the reserved category; performing weighted average calculation on all the first two-dimensional point category confidence degrees corresponding to the reserved categories to generate a first average confidence degree; setting the first mesh category confidence for the first mesh semantic array using the first average confidence;
when the first grid semantic category of the first grid semantic array is not empty, identifying whether the reserved category is the same as the first grid semantic category of the first grid semantic array;
if the recognition results are different, setting the first grid semantic category by using the reserved category; performing weighted average calculation on all the first two-dimensional point category confidence degrees corresponding to the reserved categories to generate a second average confidence degree; setting the first mesh category confidence for the first mesh semantic array using the second average confidence;
if the identification results are the same, taking the class confidence of each first two-dimensional point as a first class observation parameter; taking the first grid category confidence as a state parameter at the previous moment; inputting the state parameter of the previous moment and the plurality of first-class observation parameters into a Bayesian filter model fusing a plurality of observations to estimate the state of the current moment, and replacing the first grid class confidence by using the estimated state of the current moment.
7. An electronic device, comprising: a memory, a processor, and a transceiver;
the processor is used for being coupled with the memory, reading and executing the instructions in the memory to realize the method steps of any one of claims 1-6;
the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.
8. A computer-readable storage medium having stored thereon computer instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1-6.
CN202110839333.3A 2021-07-23 2021-07-23 Multi-sensor semantic fusion method for two-dimensional voxel level in unmanned driving Withdrawn CN113505725A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110839333.3A CN113505725A (en) 2021-07-23 2021-07-23 Multi-sensor semantic fusion method for two-dimensional voxel level in unmanned driving

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110839333.3A CN113505725A (en) 2021-07-23 2021-07-23 Multi-sensor semantic fusion method for two-dimensional voxel level in unmanned driving

Publications (1)

Publication Number Publication Date
CN113505725A true CN113505725A (en) 2021-10-15

Family

ID=78014436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110839333.3A Withdrawn CN113505725A (en) 2021-07-23 2021-07-23 Multi-sensor semantic fusion method for two-dimensional voxel level in unmanned driving

Country Status (1)

Country Link
CN (1) CN113505725A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117932548A (en) * 2024-03-25 2024-04-26 西北工业大学 Confidence-based flight obstacle recognition method based on multi-source heterogeneous sensor fusion
CN117932548B (en) * 2024-03-25 2024-06-04 西北工业大学 Confidence-based flight obstacle recognition method based on multi-source heterogeneous sensor fusion

Similar Documents

Publication Publication Date Title
JP6831414B2 (en) Methods for positioning, devices, devices and computers for positioning Readable storage media
US10229363B2 (en) Probabilistic inference using weighted-integrals-and-sums-by-hashing for object tracking
CN111192295B (en) Target detection and tracking method, apparatus, and computer-readable storage medium
CN109435955B (en) Performance evaluation method, device and equipment for automatic driving system and storage medium
CN111027401A (en) End-to-end target detection method with integration of camera and laser radar
CN113009506B (en) Virtual-real combined real-time laser radar data generation method, system and equipment
US11608058B2 (en) Method of and system for predicting future event in self driving car (SDC)
RU2750243C2 (en) Method and system for generating a trajectory for a self-driving car (sdc)
RU2744012C1 (en) Methods and systems for automated determination of objects presence
CN111257882B (en) Data fusion method and device, unmanned equipment and readable storage medium
US20230237210A1 (en) 3d multi-object simulation
WO2024012211A1 (en) Autonomous-driving environmental perception method, medium and vehicle
CN114325634A (en) Method for extracting passable area in high-robustness field environment based on laser radar
US8823555B2 (en) Apparatus for displaying terrain on a display apparatus of an airborne vehicle
CN115147328A (en) Three-dimensional target detection method and device
CN113505725A (en) Multi-sensor semantic fusion method for two-dimensional voxel level in unmanned driving
EP3734226A1 (en) Methods and systems for determining trajectory estimation order for vehicles
CN114419573A (en) Dynamic occupancy grid estimation method and device
CN112507891A (en) Method and device for automatically identifying high-speed intersection and constructing intersection vector
CN115965847A (en) Three-dimensional target detection method and system based on multi-modal feature fusion under cross view angle
CN115359332A (en) Data fusion method and device based on vehicle-road cooperation, electronic equipment and system
CN112651405B (en) Target detection method and device
CN115240168A (en) Perception result obtaining method and device, computer equipment and storage medium
WO2022133986A1 (en) Accuracy estimation method and system
CN114648639A (en) Target vehicle detection method, system and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211015