CN114332796A - Multi-sensor fusion voxel characteristic map generation method and system - Google Patents

Multi-sensor fusion voxel characteristic map generation method and system Download PDF

Info

Publication number
CN114332796A
CN114332796A (Application CN202111597823.3A)
Authority
CN
China
Prior art keywords
point cloud
voxel
mapping
characteristic
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111597823.3A
Other languages
Chinese (zh)
Inventor
孔德明
李晓伟
曹尚杰
张文宇
沈阅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Yandayan Soft Information System Co ltd
QINHUANGDAO PORT CO Ltd
Yanshan University
Original Assignee
Hebei Yandayan Soft Information System Co ltd
QINHUANGDAO PORT CO Ltd
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Yandayan Soft Information System Co ltd, QINHUANGDAO PORT CO Ltd, Yanshan University filed Critical Hebei Yandayan Soft Information System Co ltd
Priority to CN202111597823.3A priority Critical patent/CN114332796A/en
Publication of CN114332796A publication Critical patent/CN114332796A/en
Pending legal-status Critical Current

Landscapes

  • Optical Radar Systems And Details Thereof (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a multi-sensor fusion voxel characteristic map generation method and system. The method establishes the mapping relation between the original image and the laser radar points and the mapping relation between the laser radar points and the forward-looking image, and generates a lightweight sparse image, providing a basis for fast processing of the image information. Meanwhile, the method fuses the features extracted by the two sensor feature extraction networks, constructs environment information features with richer information, and improves the accuracy of three-dimensional target detection.

Description

Multi-sensor fusion voxel characteristic map generation method and system
Technical Field
The invention relates to the technical field of computer vision and automatic driving, in particular to a method and a system for generating a multi-sensor fusion voxel characteristic map.
Background
Over the past decade, many researchers have studied computer vision technology and achieved fruitful results. As one of its important applications, environmental perception in the field of automatic driving has been widely studied in recent years and has become a research focus. An automatic driving vehicle is usually equipped with a camera and a laser radar (lidar) sensor to obtain, respectively, a high-resolution image and a lidar point cloud of the surrounding environment. The high-resolution image consists of regularly and densely distributed pixels and contains color information; the lidar point cloud consists of discrete points that are irregularly and sparsely distributed in space and contains position information. These are the two most common inputs in environment-perception technology, and most computer vision tasks are based on them. In the field of computer vision, the convolutional neural network is a common and effective technique: a deep learning network can be constructed from convolutional layers, and data containing various kinds of information can be fed into it to extract feature information and complete visual tasks such as three-dimensional target detection, semantic segmentation and instance segmentation in automatic driving scenes.
At present, most existing high-performance three-dimensional target detection methods use only point cloud data as input, so they lack the RGB channel information contained in images; yet both the RGB information in an image and the three-dimensional structure information contained in a point cloud are important cues for recognizing an object. Existing multi-modal fusion methods typically fuse the point cloud feature map and the image feature map pixel by pixel, but such fusion is constrained by the alignment between the two feature maps, and as a result the accuracy is often no higher than that of single-modality methods.
Therefore, how to design a multi-sensor fusion voxel characteristic map generation method and system capable of improving the three-dimensional target detection accuracy rate becomes a technical problem to be solved in the field.
Disclosure of Invention
The invention aims to provide a method and a system for generating a multi-sensor fusion voxel characteristic map, which can improve the accuracy of three-dimensional target detection.
In order to achieve the purpose, the invention provides the following scheme:
a multi-sensor fused voxel feature map generation method, the method comprising the steps of:
acquiring a laser radar point and an original image;
mapping pixels of the original image to corresponding laser radar points according to the mapping relation between the original image and the laser radar points to obtain first mapping data;
mapping the first mapping data to pixels of a forward-looking image according to the mapping relation between the laser radar point and the forward-looking image to obtain second mapping data;
carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
performing voxelization on the laser radar point to obtain a three-dimensional voxel of the laser radar point;
carrying out feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain pixel features and voxel features;
coding the pixel characteristics and the voxel characteristics to corresponding point cloud positions to obtain pixel point cloud characteristics and voxel point cloud characteristics;
performing feature fusion on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature;
and performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
Optionally, mapping the pixels of the original image to corresponding lidar points according to the mapping relationship between the original image and the lidar points to obtain first mapping data, which specifically includes:
establishing a first mapping relation between the original image and the laser radar points according to a calibration matrix between a camera and the laser radar; the first mapping relationship is:
[u_r, v_r, 1]^T ∝ I · E · [x, y, z, 1]^T
wherein x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, u_r is the abscissa of the pixel in the original image, v_r is the ordinate of the pixel in the original image, I is the camera intrinsic matrix, and E is the extrinsic matrix.
Optionally, mapping the first mapping data to pixels of a forward-looking image according to a mapping relationship between the lidar point and the forward-looking image to obtain second mapping data, specifically including:
establishing a second mapping relation between the laser radar point and the forward-looking image by utilizing a spherical projection principle; the second mapping relation is as follows:
u = (1/2)·[1 − arctan(y, x)/π]·w
v = [1 − (arcsin(z/√(x² + y² + z²)) + fov_down)/fov]·h
wherein u is the abscissa of the pixel in the forward-looking image, v is the ordinate of the pixel in the forward-looking image, x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, r is the reflection intensity, w is the width of the forward-looking image, h is the height of the forward-looking image, fov_down is the vertical view angle below the laser radar, and fov is the total vertical view angle of the laser radar;
and mapping the first mapping data to a forward-looking image pixel according to the second mapping relation to obtain second mapping data.
Optionally, performing sparse coding on the second mapping data to generate a lightweight sparse image, specifically including:
judging whether any mapping data in the second mapping data is obtained by mapping a plurality of mapping data in the first mapping data, and obtaining a first judgment result;
if the first judgment result is yes, taking an average value of the plurality of mapping data in the first mapping data as a code value of any one mapping data in the second mapping data;
judging whether any mapping data in the second mapping data is obtained by mapping one mapping data in the first mapping data to obtain a second judgment result;
if the second judgment result is yes, taking the corresponding mapping data in the first mapping data as the coding value of any mapping data in the second mapping data;
and if the first judgment result and the second judgment result are both negative, not coding.
Optionally, the voxelization of the laser radar point is performed to obtain a three-dimensional voxel of the laser radar point, and the method specifically includes:
judging whether any voxel in the voxels contains a plurality of laser radar points to obtain a third judgment result;
if the third judgment result is yes, using the average value of the plurality of laser radar points as the coding value of any voxel in the voxels;
judging whether any voxel in the voxels contains one laser radar point or not to obtain a fourth judgment result;
if the fourth judgment result is yes, using the corresponding laser radar point as the coding value of any one of the voxels;
and if the third judgment result and the fourth judgment result are both negative, not coding.
Optionally, performing feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain a pixel feature and a voxel feature, specifically including:
respectively constructing a two-dimensional sparse convolution feature extraction network and a three-dimensional sparse convolution feature extraction network;
performing feature extraction on the lightweight sparse image by using the feature extraction network of the two-dimensional sparse convolution to obtain pixel features;
and performing feature extraction on the three-dimensional voxels by using the feature extraction network of the three-dimensional sparse convolution to obtain voxel features.
Optionally, the pixel feature and the voxel feature are encoded to a corresponding point cloud position to obtain a pixel point cloud feature and a voxel point cloud feature, and the method specifically includes:
encoding the pixel characteristics to the point cloud positions by using a bilinear interpolation algorithm based on the four nearest neighbors to obtain the pixel point cloud characteristics;
and encoding the voxel characteristics to the point cloud positions by using a trilinear interpolation algorithm based on the inverse distance weighting method to obtain the voxel point cloud characteristics.
Optionally, feature fusion is performed on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature, which specifically includes:
processing the pixel point cloud feature by using a first full-connection block to obtain a one-dimensional pixel point cloud feature; the first full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the voxel point cloud characteristics by using a second full connecting block to obtain one-dimensional voxel point cloud characteristics; the second full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the one-dimensional pixel point cloud characteristic and the one-dimensional voxel point cloud characteristic by using a sigmoid function to obtain a pixel point cloud characteristic weight and a voxel point cloud characteristic weight; the sigmoid function is as follows:
w_pp = 1/(1 + e^(−f'_pp)),  w_pv = 1/(1 + e^(−f'_pv))
wherein w_pp is the pixel point cloud characteristic weight, w_pv is the voxel point cloud characteristic weight, f'_pp is the one-dimensional pixel point cloud characteristic, and f'_pv is the one-dimensional voxel point cloud characteristic;
fusing the pixel point cloud characteristic and the voxel point cloud characteristic by using the pixel point cloud characteristic weight and the voxel point cloud characteristic weight to obtain a point cloud fusion characteristic; the fusion expression is:
f_fuse = [f_pp(1 + w_pp), f_pv(1 + w_pv)]
wherein f_fuse is the point cloud fusion characteristic, f_pp is the pixel point cloud characteristic, and f_pv is the voxel point cloud characteristic.
Optionally, the point cloud fusion features are subjected to inverse mapping to obtain a multi-sensor fusion voxel feature map, which specifically includes:
using a trilinear interpolation method based on inverse distance weighting to inversely map the point cloud fusion characteristics to the positions of the non-empty voxels in the voxel characteristics to obtain replacement point cloud fusion characteristics;
and replacing the non-empty voxel characteristics of the corresponding positions in the voxel characteristics by the replacement point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
The invention also provides a multi-sensor fusion voxel characteristic map generation system, which comprises:
the data acquisition module is used for acquiring laser radar points and original images;
the first mapping module is used for mapping the pixels of the original image to corresponding laser radar points according to the mapping relation between the original image and the laser radar points to obtain first mapping data;
the second mapping module is used for mapping the first mapping data to pixels of a forward-looking image according to the mapping relation between the laser radar point and the forward-looking image to obtain second mapping data;
the sparse coding module is used for carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
the voxelization module is used for voxelizing the laser radar point to obtain a three-dimensional voxel of the laser radar point;
the characteristic extraction module is used for carrying out characteristic extraction on the lightweight sparse image and the three-dimensional voxel to obtain pixel characteristics and voxel characteristics;
the point cloud characteristic acquisition module is used for coding the pixel characteristic and the voxel characteristic to corresponding point cloud positions to obtain the pixel point cloud characteristic and the voxel point cloud characteristic;
the characteristic fusion module is used for carrying out characteristic fusion on the pixel point cloud characteristic and the voxel point cloud characteristic to obtain a point cloud fusion characteristic;
and the inverse mapping module is used for performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the method, the mapping relation between the original image and the laser radar point and the mapping relation between the laser radar point and the foresight image are established, the light sparse image is generated, and a foundation is provided for the rapid processing of image information. Meanwhile, the method provided by the invention integrates the characteristics extracted by two sensor characteristic extraction networks, constructs the environment information characteristics with richer information, and improves the accuracy of three-dimensional target detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a flowchart of a multi-sensor fusion voxel feature map generation method according to embodiment 1 of the present invention;
fig. 2 is a raw image of sample number 7312 in the KITTI autopilot dataset;
FIG. 3 is a laser radar spot for sample number 7312 in the KITTI autonomous driving dataset;
fig. 4 is a lightweight sparse image of sample number 7312 in the KITTI autonomous driving dataset;
FIG. 5 is a diagram of a feature fusion network architecture;
FIG. 6 is a schematic diagram of multi-sensor fusion voxel feature map generation.
Fig. 7 is a structural diagram of a multi-sensor fusion voxel feature map generation system according to embodiment 2 of the present invention.
Description of the symbols:
1. a data acquisition module; 2. a first mapping module; 3. a second mapping module; 4. a sparse coding module; 5. a voxelization module; 6. a feature extraction module; 7. a point cloud feature acquisition module; 8. a feature fusion module; 9. an inverse mapping module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a method and a system for generating a multi-sensor fusion voxel characteristic map, which can improve the accuracy of three-dimensional target detection.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Example 1:
referring to fig. 1, the present invention provides a method for generating a multi-sensor fused voxel feature map, which includes the following steps:
s1: acquiring a laser radar point and an original image;
s2: mapping pixels of the original image to corresponding laser radar points according to the mapping relation between the original image and the laser radar points to obtain first mapping data;
s3: mapping the first mapping data to pixels of a forward-looking image according to the mapping relation between the laser radar point and the forward-looking image to obtain second mapping data;
s4: carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
s5: performing voxelization on the laser radar point to obtain a three-dimensional voxel of the laser radar point;
s6: carrying out feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain pixel features and voxel features;
s7: coding the pixel characteristics and the voxel characteristics to corresponding point cloud positions to obtain pixel point cloud characteristics and voxel point cloud characteristics;
s8: performing feature fusion on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature;
s9: and performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
In step S2, mapping the pixels of the original image to corresponding lidar points according to the mapping relationship between the original image and the lidar points to obtain first mapping data, which specifically includes:
establishing a first mapping relation between the original image and the laser radar points according to a calibration matrix between a camera and the laser radar; the first mapping relationship is:
[u_r, v_r, 1]^T ∝ I · E · [x, y, z, 1]^T
wherein x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, u_r is the abscissa of the pixel in the original image, v_r is the ordinate of the pixel in the original image, I is the camera intrinsic matrix, and E is the extrinsic matrix.
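As an illustration of this step, the following Python sketch (not part of the patent; the function name, the 3 × 4 intrinsic matrix and the 4 × 4 lidar-to-camera extrinsic matrix are assumptions made for the example) projects the laser radar points into the original image and attaches the RGB value of the hit pixel to each point:

```python
import numpy as np

def map_image_to_points(points_xyz, image_rgb, intrinsic, extrinsic):
    """Attach the RGB value of the projected image pixel to every lidar point.

    points_xyz : (N, 3) lidar coordinates (x depth, y width, z height)
    image_rgb  : (H, W, 3) original camera image
    intrinsic  : (3, 4) camera intrinsic matrix I (padded with a zero column)
    extrinsic  : (4, 4) lidar-to-camera extrinsic matrix E
    """
    n = points_xyz.shape[0]
    homog = np.hstack([points_xyz, np.ones((n, 1))])        # homogeneous coordinates (N, 4)
    cam = (intrinsic @ extrinsic @ homog.T).T               # projective image coordinates (N, 3)
    ur = cam[:, 0] / cam[:, 2]                              # pixel abscissa u_r
    vr = cam[:, 1] / cam[:, 2]                              # pixel ordinate v_r
    h, w, _ = image_rgb.shape
    valid = (cam[:, 2] > 0) & (ur >= 0) & (ur < w) & (vr >= 0) & (vr < h)
    rgb = np.zeros((n, 3), dtype=np.float32)
    rgb[valid] = image_rgb[vr[valid].astype(int), ur[valid].astype(int)]
    return np.hstack([points_xyz, rgb]), valid              # first mapping data + validity mask
```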
In step S3, mapping the first mapping data to pixels of a forward-looking image according to a mapping relationship between the lidar point and the forward-looking image, to obtain second mapping data, specifically including:
establishing a second mapping relation between the laser radar point and the forward-looking image by utilizing a spherical projection principle; the second mapping relation is as follows:
u = (1/2)·[1 − arctan(y, x)/π]·w
v = [1 − (arcsin(z/√(x² + y² + z²)) + fov_down)/fov]·h
wherein u is the abscissa of the pixel in the forward-looking image, v is the ordinate of the pixel in the forward-looking image, x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, r is the reflection intensity, w is the width of the forward-looking image, h is the height of the forward-looking image, fov_down is the vertical view angle below the laser radar, and fov is the total vertical view angle of the laser radar;
and mapping the first mapping data to a forward-looking image pixel according to the second mapping relation to obtain second mapping data.
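A minimal sketch of this spherical projection, written under the standard range-image convention (fov_down and fov in radians); the helper name and the floor/clip handling of pixel indices are assumptions for illustration:

```python
import numpy as np

def spherical_projection(points_xyz, width, height, fov_down, fov):
    """Map lidar points to forward-looking image pixel coordinates (u, v)."""
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    depth = np.sqrt(x ** 2 + y ** 2 + z ** 2)                     # range of each point
    u = 0.5 * (1.0 - np.arctan2(y, x) / np.pi) * width            # horizontal pixel coordinate
    v = (1.0 - (np.arcsin(z / depth) + fov_down) / fov) * height  # vertical pixel coordinate
    u = np.clip(np.floor(u), 0, width - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int32)
    return u, v
```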
In step S4, the sparse coding is performed on the second mapping data to generate a lightweight sparse image, specifically including:
judging whether any mapping data in the second mapping data is obtained by mapping a plurality of mapping data in the first mapping data, and obtaining a first judgment result;
if the first judgment result is yes, taking an average value of the plurality of mapping data in the first mapping data as a code value of any one mapping data in the second mapping data;
judging whether any mapping data in the second mapping data is obtained by mapping one mapping data in the first mapping data to obtain a second judgment result;
if the second judgment result is yes, taking the corresponding mapping data in the first mapping data as the coding value of any mapping data in the second mapping data;
and if the first judgment result and the second judgment result are both negative, not coding.
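The averaging rule above can be implemented as a scatter-mean over the forward-looking image grid; the sketch below is an assumed minimal implementation (names and zero-initialization of unencoded pixels are illustrative), not the patent's own code:

```python
import numpy as np

def sparse_encode_front_view(u, v, point_features, width, height):
    """Average the features of all points falling into the same front-view pixel.

    Pixels hit by several points get the mean of their features, pixels hit by one
    point keep that point's features, and pixels hit by no point stay unencoded (zero).
    """
    channels = point_features.shape[1]
    image = np.zeros((height, width, channels), dtype=np.float32)
    counts = np.zeros((height, width, 1), dtype=np.float32)
    np.add.at(image, (v, u), point_features)   # accumulate features per pixel
    np.add.at(counts, (v, u), 1.0)             # count contributing points per pixel
    hit = counts[..., 0] > 0
    image[hit] /= counts[hit]                  # mean over pixels hit by at least one point
    return image, hit
```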
In step S5, the voxelizing the lidar point to obtain a three-dimensional voxel of the lidar point specifically includes:
judging whether any voxel in the voxels contains a plurality of laser radar points to obtain a third judgment result;
if the third judgment result is yes, using the average value of the plurality of laser radar points as the coding value of any voxel in the voxels;
judging whether any voxel in the voxels contains one laser radar point or not to obtain a fourth judgment result;
if the fourth judgment result is yes, using the corresponding laser radar point as the coding value of any one of the voxels;
and if the third judgment result and the fourth judgment result are both negative, not coding.
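The voxel mean-encoding rule can be sketched the same way; the code below is a hedged illustration (the array layout, the 16000-voxel cap as a simple truncation, and the helper name are assumptions):

```python
import numpy as np

def voxelize_mean(points, pc_range, voxel_size, max_voxels=16000):
    """Group lidar points into voxels and use the per-voxel mean as the encoding.

    points     : (N, C) array whose first three columns are x, y, z
    pc_range   : (x_min, y_min, z_min, x_max, y_max, z_max)
    voxel_size : (dx, dy, dz)
    """
    mins, maxs = np.array(pc_range[:3]), np.array(pc_range[3:])
    size = np.array(voxel_size)
    inside = np.all((points[:, :3] >= mins) & (points[:, :3] < maxs), axis=1)
    pts = points[inside]
    idx = np.floor((pts[:, :3] - mins) / size).astype(np.int64)    # voxel index of every point
    uniq, inverse = np.unique(idx, axis=0, return_inverse=True)    # non-empty voxels
    inverse = inverse.reshape(-1)
    feats = np.zeros((uniq.shape[0], pts.shape[1]), dtype=np.float32)
    counts = np.zeros(uniq.shape[0], dtype=np.float32)
    np.add.at(feats, inverse, pts)
    np.add.at(counts, inverse, 1.0)
    feats /= counts[:, None]                                       # mean encoding per voxel
    return uniq[:max_voxels], feats[:max_voxels]                   # keep at most max_voxels non-empty voxels
```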
In step S6, performing feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain a pixel feature and a voxel feature, specifically including:
respectively constructing a two-dimensional sparse convolution feature extraction network and a three-dimensional sparse convolution feature extraction network;
performing feature extraction on the lightweight sparse image by using the feature extraction network of the two-dimensional sparse convolution to obtain pixel features;
and performing feature extraction on the three-dimensional voxels by using the feature extraction network of the three-dimensional sparse convolution to obtain voxel features.
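For readability, the sketch below uses ordinary dense 2D convolutions as a stand-in for the two-dimensional sparse convolution backbone (a real implementation would use a sparse convolution library); the channel sizes, stage count and overall stride of 8 are assumptions for illustration:

```python
import torch.nn as nn

def conv_block(cin, cout, stride):
    # One convolution stage followed by batch normalization and ReLU activation,
    # mirroring the "sparse convolution + BN + ReLU" pattern described in the text.
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class PixelBackbone2D(nn.Module):
    """Toy dense stand-in for the 2D feature extraction network (down-sampling scale 8)."""

    def __init__(self, cin=3, cout=64):
        super().__init__()
        self.stages = nn.Sequential(
            conv_block(cin, 16, stride=2),
            conv_block(16, 32, stride=2),
            conv_block(32, cout, stride=2),
        )

    def forward(self, x):           # x: (B, cin, H, W) lightweight sparse image
        return self.stages(x)       # (B, cout, H/8, W/8) pixel feature map
```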
In step S7, the pixel feature and the voxel feature are encoded to a corresponding point cloud position to obtain a pixel point cloud feature and a voxel point cloud feature, which specifically include:
encoding the pixel characteristics to the point cloud positions by using a bilinear interpolation algorithm based on the four nearest neighbors to obtain the pixel point cloud characteristics;
and encoding the voxel characteristics to the point cloud positions by using a trilinear interpolation algorithm based on the inverse distance weighting method to obtain the voxel point cloud characteristics.
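The inverse-distance-weighted gathering of voxel features at point positions can be sketched as below (a brute-force illustration; a KD-tree or GPU kernel would be used in practice, and the search-radius handling is an assumption). For the pixel branch, bilinear sampling of the 2D pixel feature map at the projected point coordinates plays the corresponding role.

```python
import numpy as np

def idw_gather(query_xyz, source_xyz, source_feat, radius):
    """Inverse-distance-weighted interpolation of source features at query positions.

    Used here to carry voxel features to point positions (and, in step S9, to carry
    the point cloud fusion features back to the non-empty voxel positions).
    """
    out = np.zeros((query_xyz.shape[0], source_feat.shape[1]), dtype=np.float32)
    for i, q in enumerate(query_xyz):
        d = np.linalg.norm(source_xyz - q, axis=1)       # distances to all source positions
        mask = d < radius                                # keep neighbors inside the search radius
        if not np.any(mask):
            continue                                     # no neighbor: the feature stays zero
        w = 1.0 / np.maximum(d[mask], 1e-6)              # inverse-distance weights
        out[i] = (w[:, None] * source_feat[mask]).sum(axis=0) / w.sum()
    return out
```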
In step S8, performing feature fusion on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature, which specifically includes:
processing the pixel point cloud feature by using a first full-connection block to obtain a one-dimensional pixel point cloud feature; the first full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the voxel point cloud characteristics by using a second full connecting block to obtain one-dimensional voxel point cloud characteristics; the second full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the one-dimensional pixel point cloud characteristic and the one-dimensional voxel point cloud characteristic by using a sigmoid function to obtain a pixel point cloud characteristic weight and a voxel point cloud characteristic weight; the sigmoid function is as follows:
w_pp = 1/(1 + e^(−f'_pp)),  w_pv = 1/(1 + e^(−f'_pv))
wherein w_pp is the pixel point cloud characteristic weight, w_pv is the voxel point cloud characteristic weight, f'_pp is the one-dimensional pixel point cloud characteristic, and f'_pv is the one-dimensional voxel point cloud characteristic;
fusing the pixel point cloud characteristic and the voxel point cloud characteristic by using the pixel point cloud characteristic weight and the voxel point cloud characteristic weight to obtain a point cloud fusion characteristic; the fusion expression is:
f_fuse = [f_pp(1 + w_pp), f_pv(1 + w_pv)]
wherein f_fuse is the point cloud fusion characteristic, f_pp is the pixel point cloud characteristic, and f_pv is the voxel point cloud characteristic.
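A possible PyTorch sketch of the fusion described above; the module and block names are assumptions, and whether the sigmoid gate is a scalar or a per-channel (64-dimensional) weight is also an assumption, resolved here in favour of the (64, 64) input/output dimensions listed for the fully connected blocks:

```python
import torch
import torch.nn as nn

def fc_block(c):
    # Three fully connected layers with two BatchNorm + ReLU pairs in between,
    # matching the structure stated for fully connected blocks 1 and 2.
    return nn.Sequential(
        nn.Linear(c, c), nn.BatchNorm1d(c), nn.ReLU(inplace=True),
        nn.Linear(c, c), nn.BatchNorm1d(c), nn.ReLU(inplace=True),
        nn.Linear(c, c),
    )

class PointFeatureFusion(nn.Module):
    """Gated concatenation of pixel and voxel point cloud features."""

    def __init__(self, c=64):
        super().__init__()
        self.fc_pp = fc_block(c)   # produces the one-dimensional feature f'_pp
        self.fc_pv = fc_block(c)   # produces the one-dimensional feature f'_pv

    def forward(self, f_pp, f_pv):                         # both of shape (N, c)
        w_pp = torch.sigmoid(self.fc_pp(f_pp))             # pixel point cloud feature weight
        w_pv = torch.sigmoid(self.fc_pv(f_pv))             # voxel point cloud feature weight
        return torch.cat([f_pp * (1 + w_pp), f_pv * (1 + w_pv)], dim=1)   # f_fuse
```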
In step S9, inverse mapping is performed on the point cloud fusion features to obtain a multi-sensor fusion voxel feature map, which specifically includes:
using a trilinear interpolation method based on inverse distance weighting to inversely map the point cloud fusion feature to the positions of the non-empty voxels in the voxel feature to obtain the replacement point cloud fusion feature;
and replacing the non-empty voxel characteristics of the corresponding positions in the voxel characteristics by the replacement point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
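Reusing the idw_gather helper sketched under step S7, the inverse mapping and replacement of step S9 might look as follows (a hedged sketch: the dense voxel-map layout, the assumption that the fusion feature has already been brought to the voxel feature width, and all names are illustrative):

```python
import numpy as np

def build_fused_voxel_map(voxel_feat_map, nonempty_idx, nonempty_xyz,
                          point_xyz, point_fused_feat, radius):
    """Replace the features of non-empty voxels with the inverse-mapped fusion features.

    voxel_feat_map   : (D, H, W, C) dense voxel feature map FM_v (zeros at empty voxels)
    nonempty_idx     : (M, 3) integer indices of the non-empty voxels inside FM_v
    nonempty_xyz     : (M, 3) metric centers of those non-empty voxels
    point_xyz        : (N, 3) positions of the points carrying the fusion features
    point_fused_feat : (N, C) point cloud fusion features (assumed already C-dimensional)
    """
    replaced = idw_gather(nonempty_xyz, point_xyz, point_fused_feat, radius)  # (M, C)
    fused_map = voxel_feat_map.copy()
    d, h, w = nonempty_idx[:, 0], nonempty_idx[:, 1], nonempty_idx[:, 2]
    fused_map[d, h, w] = replaced        # overwrite only the non-empty voxel positions
    return fused_map                     # multi-sensor fusion voxel feature map
```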
After step S9, the method further includes: feeding the obtained multi-sensor fusion voxel feature map into a subsequent network to perform a three-dimensional target detection task. In this embodiment, the subsequent network refers to the network structure of PV-RCNN or another network structure (such as SA-SSD); that is, the generated multi-sensor fusion voxel feature map can be connected to many existing network structures, as long as they are based on a voxel feature map.
The method establishes the mapping relation between the original image and the laser radar points and the mapping relation between the laser radar points and the forward-looking image, and generates a lightweight sparse image, which provides a basis for fast processing of the image information. Meanwhile, the method fuses the features extracted by the two sensor feature extraction networks, constructing environment information features with richer information and improving the accuracy of three-dimensional target detection.
To facilitate an understanding of the present invention, the present invention is described below with reference to fig. 2 to 6.
Step one, acquiring three-dimensional laser radar point cloud data of the surrounding environment and an image of the surrounding environment.
In this embodiment, the method of the present invention is described in detail using the KITTI autonomous driving dataset, the most commonly used and authoritative dataset in the autonomous driving field. The data were acquired by two grayscale cameras (No. 0 and No. 1), two color cameras (No. 2 and No. 3) and a laser radar, with camera No. 0 serving as the reference camera. One data iteration cycle is illustrated using the No. 2 color camera original image of sample 7312 of the training set and the corresponding laser radar point cloud data. The resolution of the original image is 1242 × 375, i.e., 465750 pixels in total, as shown in FIG. 2. The laser radar point cloud data within the corresponding range contains 18656 points, as shown in FIG. 3.
A mapping relation between the original image and the laser radar points is established. Define P = {p_i = (x_i, y_i, z_i, r_i), i = 1, 2, 3, …, 18656}, where x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension, z is the coordinate in the height dimension, r is the reflection intensity of the point, and p_i is a point in the laser radar point cloud. Define RP = {rp_i = (ur_i, vr_i), i = 1, 2, 3, …, 465750} as the pixels of the No. 2 color camera original image, where ur is the abscissa and vr the ordinate of a pixel in the original image. The mapping relation between the original image pixels captured by the No. 2 color camera and the laser radar points can be expressed by equation (1).
And step two, establishing a mapping relation between the original image pixels and the laser radar points and a mapping relation between the laser radar points and the forward-looking image.
[ur, vr, 1]^T ∝ I_2 · R_0 · Tr_velo_to_cam · [x, y, z, 1]^T    (1)
wherein the intrinsic matrix I_2 of the No. 2 color camera is:
(matrix values provided as an image in the original publication)
the rectification rotation matrix R_0 of the No. 0 grayscale camera is:
(matrix values provided as an image in the original publication)
and the extrinsic matrix Tr_velo_to_cam that maps the laser radar points to the No. 0 grayscale camera is:
(matrix values provided as an image in the original publication)
and then, establishing a mapping relation between the laser radar point and the forward-looking image. The upper vertical viewing angle of the lidar is fovdownSetting the width and height w and h of the generated front view to 512 and 48 respectively at 0.43rad and fov rad and 0.47rad, can obtain the mapping relationship between the lidar point and the front view pixel as shown in equation (5).
Figure BDA0003431947350000122
And step three, mapping pixels in the original image to corresponding laser radar points, further mapping the pixels to pixels of the generated foresight image for sparse coding, generating a lightweight sparse image, and simultaneously converting the original point cloud into three-dimensional voxels.
According to the mapping relation between the original image pixels and the laser radar points shown in equation (1), the color information of the original image pixels is mapped onto the laser radar points, and the feature of each laser radar point is updated to P = {p_i = (rd_i, gr_i, bl_i), i = 1, 2, 3, …, N}, where rd, gr and bl are the red, green and blue channel values of the original image pixel corresponding to the laser radar point. The laser radar points are then mapped to the forward-looking image pixels according to the mapping relation shown in equation (5) and sparsely encoded. The sparse encoding of forward-looking image pixels follows these rules: if several laser radar points map to the same forward-looking image pixel, the average of their features is used as the encoding value of that pixel; if exactly one laser radar point maps to a forward-looking image pixel, the feature value of that laser radar point is used as the encoding value of the pixel; forward-looking image pixels with no corresponding laser radar point are not encoded, which preserves sparsity. Through this sparse encoding operation, the forward-looking image is converted into a lightweight sparse image, and the color information of the original high-resolution image is transferred to the generated lightweight sparse image. The lightweight sparse image of sample 7312 is shown in FIG. 4, where black pixels are unencoded pixels. Compared with a dense forward-looking image without sparse encoding, the sparsely encoded image contains unencoded pixels and therefore occupies less memory.
The point cloud space is defined to range from [0, −40, −1] meters to [70.4, 40, 3] meters in the depth, width and height dimensions, and the voxel size is [0.05, 0.05, 0.1] meters in the depth, width and height dimensions, respectively, resulting in a three-dimensional voxel grid with a resolution of 1408 × 1600 × 40. Similar to the generation of the lightweight sparse image, if a voxel contains several laser radar points, the average value of these points is used as the encoding value of the voxel; if a voxel contains no laser radar point, it is not encoded. Taking the computational cost into account, at most 16000 non-empty three-dimensional voxels are retained.
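As a quick check (not from the patent) that the stated resolution follows from these ranges and voxel sizes:

```python
import numpy as np

# Point cloud range and voxel size stated in this embodiment.
pc_range = np.array([0.0, -40.0, -1.0, 70.4, 40.0, 3.0])
voxel_size = np.array([0.05, 0.05, 0.1])

# (70.4 - 0)/0.05 = 1408, (40 - (-40))/0.05 = 1600, (3 - (-1))/0.1 = 40
resolution = np.round((pc_range[3:] - pc_range[:3]) / voxel_size).astype(int)
print(resolution)   # -> [1408 1600   40]
```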
And step four, performing feature extraction on the lightweight sparse image and the three-dimensional voxels by using a feature extraction network to obtain pixel features and voxel features, and encoding the two features on the point cloud to obtain two point cloud features.
TABLE 1 voxel characteristic extraction network architecture and parameters thereof
(The network structure and its parameters are provided as a table image in the original publication.)
A feature extraction network based on sparse convolution is constructed to extract features from the generated lightweight sparse image and the three-dimensional voxels, yielding a pixel feature map FM_p and a voxel feature map FM_v, both at a down-sampling scale of 8, whose features are denoted f_p and f_v, respectively. The structure of the feature extraction networks and their parameters are shown in Table 1: two-dimensional sparse convolution is used to extract the pixel features and three-dimensional sparse convolution to extract the voxel features, and every sparse convolution or submanifold sparse convolution is followed by a batch normalization operation and a ReLU activation function. FM_p is a two-dimensional sparse pixel map of size 8 × 64, sparse in the sense described above, containing 348 non-empty pixels, each described by 64 feature channels. FM_v is a three-dimensional sparse voxel map of size 5 × 200 × 176 (i.e., these 5 × 200 × 176 voxels comprise both empty and non-empty voxels), containing 10632 non-empty voxels, each described by 64 feature channels.
The pixel features are encoded onto the point cloud using a bilinear interpolation algorithm based on the four nearest neighbors, yielding the pixel point cloud features f_pp. The calculation is shown in equation (6), where p_ul, p_ur, p_bl and p_br denote the four sparse pixels in the neighborhood of the encoding point p_i ∈ P, and the subscripts x and y denote the coordinates along the width and height directions of the pixel feature map, respectively:

f_pp(p_i) = f(p_ul)(p_br,x − p_i,x)(p_br,y − p_i,y) + f(p_ur)(p_i,x − p_ul,x)(p_br,y − p_i,y) + f(p_bl)(p_br,x − p_i,x)(p_i,y − p_ul,y) + f(p_br)(p_i,x − p_ul,x)(p_i,y − p_ul,y)    (6)

The voxel features are encoded onto the point cloud positions using a trilinear interpolation algorithm based on the inverse distance weighting method, yielding the voxel point cloud features f_pv. The calculation is shown in equation (7), where j is the index of a voxel, f_vj is the feature of the j-th voxel, w_j(p_i) is the inverse-distance weight between the j-th voxel and the encoding point p_i ∈ P computed by equation (8), r_itp is the search radius of the inverse distance weighting, and η(v_j) and η(p_i) are the three-dimensional coordinates of the j-th voxel and the i-th point, respectively:

f_pv(p_i) = Σ_j w_j(p_i)·f_vj / Σ_j w_j(p_i)    (7)

w_j(p_i) = 1/‖η(v_j) − η(p_i)‖ for ‖η(v_j) − η(p_i)‖ < r_itp, and 0 otherwise    (8)
And step five, sending the two point cloud characteristics into a characteristic fusion network to obtain point cloud fusion characteristics.
Fig. 5 shows the constructed feature fusion network; the parameters of the fully connected layers are shown in Table 2, and fully connected blocks 1 and 2 each consist of three fully connected layers, two batch normalization layers and two ReLU activation function layers. First, the pixel point cloud features and the voxel point cloud features are processed by the two fully connected blocks to obtain the one-dimensional point cloud features f'_pp and f'_pv.
TABLE 2 Feature fusion network structure and parameters
Fully connected layer      Feature input/output dimensions
Fully connected block 1    (64, 64)
Fully connected block 2    (64, 64)
Then, the obtained one-dimensional point cloud features are processed with a sigmoid function, which compresses them to the range [0, 1] and yields the point cloud weights. The sigmoid function is shown in equation (9):

w_pp = 1/(1 + e^(−f'_pp)),  w_pv = 1/(1 + e^(−f'_pv))    (9)

wherein w_pp is the pixel point cloud feature weight and w_pv is the voxel point cloud feature weight.
Finally, the two point cloud features are fused using the point cloud weights to obtain the point cloud fusion feature f_fuse, whose expression is shown in equation (10):

f_fuse = [f_pp(1 + w_pp), f_pv(1 + w_pv)]    (10)
and step six, obtaining a multi-sensor fusion voxel characteristic diagram from the point cloud fusion characteristics and sending the multi-sensor fusion voxel characteristic diagram into a subsequent network for a three-dimensional target detection task.
The point cloud fusion features are inverse-mapped, using the trilinear interpolation method based on inverse distance weighting shown in FIG. 6, to the positions of the non-empty voxels of the feature extraction network output (the voxel feature map FM_v at down-sampling scale 8). The calculation is shown in equation (11), where i is the index of a point, w_i(v_j) is the inverse-distance weight between the i-th point and the voxel v_j ∈ FM_v computed by equation (12), r_itp is the search radius of the inverse distance weighting, and η(v_j) and η(p_i) are the three-dimensional coordinates of the j-th voxel and the i-th point, respectively:

f'_fuse(v_j) = Σ_i w_i(v_j)·f_fuse(p_i) / Σ_i w_i(v_j)    (11)

w_i(v_j) = 1/‖η(p_i) − η(v_j)‖ for ‖η(p_i) − η(v_j)‖ < r_itp, and 0 otherwise    (12)

The features of the non-empty voxels of the voxel feature map FM_v are then replaced by f'_fuse, generating the multi-sensor fusion voxel feature map; here f'_fuse denotes the fusion features inverse-mapped to the non-empty voxel positions of FM_v. Finally, this feature map is fed into a subsequent network to perform the target detection task.
All experiments in this example were carried out on the same experimental platform (NVIDIA GeForce RTX 2080 Ti graphics card, 64 GB of memory). The training set and the validation set are taken from the training samples of the public KITTI dataset, with 3712 samples in the training set and 3769 samples in the validation set; the vehicles in the validation set are divided into three difficulty levels, easy, moderate and hard, according to the size of the vehicle bounding box and the degree of occlusion. The training batch size is 2, the learning rate is 0.00125, and the number of training epochs is 50. Two sets of experiments were performed in this embodiment: one with the original PV-RCNN target detection network, and one in which the voxel feature map generation network of the original PV-RCNN is replaced by the multi-sensor fusion voxel feature map generation network for three-dimensional target detection; the results are shown in Table 3. Table 3 evaluates the car detection results from the three-dimensional viewpoint using average precision: a detection is considered correct if the intersection over union between the detected car bounding box and the ground truth is greater than 70%, and incorrect otherwise.
As shown in Table 3, the method of the present invention slightly reduces the average precision for easy-difficulty vehicles, but effectively improves the detection accuracy for moderate-difficulty vehicles, which is the most important and most commonly used category in three-dimensional target detection evaluation, and it also has a beneficial effect on the detection of hard-difficulty vehicles. Overall, using the method of the present invention, the three-dimensional target detection accuracy of the PV-RCNN target detection network is improved by 0.19%, which verifies the effectiveness of the method.
TABLE 3 PV-RCNN and target detection results of the method of the invention
(The detection results are provided as a table image in the original publication.)
In conclusion, the method establishes the mapping relation between the original image pixels and the laser radar points and the mapping relation between the laser radar points and the forward-looking image pixels, and generates a lightweight sparse image, providing a basis for fast processing of the image information. Meanwhile, the method fuses the features extracted by the two sensor feature extraction networks, constructing environment information features with richer information and providing a good foundation for subsequent computer vision tasks. The method uses linear interpolation both when encoding the point cloud features and when generating the multi-sensor fusion voxel feature map, which is computationally more efficient than existing point-set abstraction algorithms. Compared with methods that use the original image as input, the feature map generated by the method fuses additional three-dimensional coordinate information; compared with methods that use only the point cloud as input, the method fuses additional image RGB information.
Example 2:
referring to fig. 7, the present invention provides a multi-sensor fused voxel feature map generating system, which includes:
the data acquisition module 1 is used for acquiring laser radar points and original images;
the first mapping module 2 is configured to map pixels of the original image to corresponding lidar points according to a mapping relationship between the original image and the lidar points to obtain first mapping data;
the second mapping module 3 is configured to map the first mapping data to pixels of a forward-looking image according to a mapping relationship between the lidar point and the forward-looking image, so as to obtain second mapping data;
the sparse coding module 4 is used for carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
the voxelization module 5 is configured to voxelize the laser radar point to obtain a three-dimensional voxel of the laser radar point;
the feature extraction module 6 is configured to perform feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain a pixel feature and a voxel feature;
the point cloud characteristic acquisition module 7 is used for encoding the pixel characteristic and the voxel characteristic to corresponding point cloud positions to obtain the pixel point cloud characteristic and the voxel point cloud characteristic;
the feature fusion module 8 is configured to perform feature fusion on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature;
and the inverse mapping module 9 is used for performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A multi-sensor fusion voxel characteristic map generation method is characterized by comprising the following steps:
acquiring a laser radar point and an original image;
mapping pixels of the original image to corresponding laser radar points according to the mapping relation between the original image and the laser radar points to obtain first mapping data;
mapping the first mapping data to pixels of a forward-looking image according to the mapping relation between the laser radar point and the forward-looking image to obtain second mapping data;
carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
performing voxelization on the laser radar point to obtain a three-dimensional voxel of the laser radar point;
carrying out feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain pixel features and voxel features;
coding the pixel characteristics and the voxel characteristics to corresponding point cloud positions to obtain pixel point cloud characteristics and voxel point cloud characteristics;
performing feature fusion on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature;
and performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
2. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein mapping pixels of the original image to corresponding lidar points according to a mapping relationship between the original image and the lidar points to obtain first mapping data specifically comprises:
establishing a first mapping relation between the original image and the laser radar points according to a calibration matrix between a camera and the laser radar; the first mapping relationship is:
[u_r, v_r, 1]^T ∝ I · E · [x, y, z, 1]^T
wherein x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, u_r is the abscissa of the pixel in the original image, v_r is the ordinate of the pixel in the original image, I is the camera intrinsic matrix, and E is the extrinsic matrix.
3. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein mapping the first mapping data to pixels of a forward-looking image according to a mapping relationship between the lidar point and the forward-looking image to obtain second mapping data specifically comprises:
establishing a second mapping relation between the laser radar point and the forward-looking image by utilizing a spherical projection principle; the second mapping relation is as follows:
u = (1/2)·[1 − arctan(y, x)/π]·w
v = [1 − (arcsin(z/√(x² + y² + z²)) + fov_down)/fov]·h
wherein u is the abscissa of the pixel in the forward-looking image, v is the ordinate of the pixel in the forward-looking image, x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, r is the reflection intensity, w is the width of the forward-looking image, h is the height of the forward-looking image, fov_down is the vertical view angle below the laser radar, and fov is the total vertical view angle of the laser radar;
and mapping the first mapping data to a forward-looking image pixel according to the second mapping relation to obtain second mapping data.
4. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein the sparse coding is performed on the second mapping data to generate a lightweight sparse image, and specifically comprises:
judging whether any mapping data in the second mapping data is obtained by mapping a plurality of mapping data in the first mapping data, and obtaining a first judgment result;
if the first judgment result is yes, taking an average value of the plurality of mapping data in the first mapping data as a code value of any one mapping data in the second mapping data;
judging whether any mapping data in the second mapping data is obtained by mapping one mapping data in the first mapping data to obtain a second judgment result;
if the second judgment result is yes, taking the corresponding mapping data in the first mapping data as the coding value of any mapping data in the second mapping data;
and if the first judgment result and the second judgment result are both negative, not coding.
5. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein the laser radar point is voxelized to obtain a three-dimensional voxel of the laser radar point, and specifically comprises:
judging whether any voxel in the voxels contains a plurality of laser radar points to obtain a third judgment result;
if the third judgment result is yes, using the average value of the plurality of laser radar points as the coding value of any voxel in the voxels;
judging whether any voxel in the voxels contains one laser radar point or not to obtain a fourth judgment result;
if the fourth judgment result is yes, using the corresponding laser radar point as the coding value of any one of the voxels;
and if the third judgment result and the fourth judgment result are both negative, not coding.
6. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein feature extraction is performed on the lightweight sparse image and the three-dimensional voxel to obtain pixel features and voxel features, and specifically comprises:
respectively constructing a two-dimensional sparse convolution feature extraction network and a three-dimensional sparse convolution feature extraction network;
performing feature extraction on the lightweight sparse image by using the feature extraction network of the two-dimensional sparse convolution to obtain pixel features;
and performing feature extraction on the three-dimensional voxels by using the feature extraction network of the three-dimensional sparse convolution to obtain voxel features.
7. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein the pixel features and the voxel features are encoded to corresponding point cloud positions to obtain pixel point cloud features and voxel point cloud features, and specifically comprises:
encoding the pixel characteristics to the point cloud positions by using a bilinear interpolation algorithm based on the four nearest neighbors to obtain the pixel point cloud characteristics;
and encoding the voxel characteristics to the point cloud positions by using a trilinear interpolation algorithm based on the inverse distance weighting method to obtain the voxel point cloud characteristics.
8. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein feature fusion is performed on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature, and specifically comprises:
processing the pixel point cloud feature by using a first full-connection block to obtain a one-dimensional pixel point cloud feature; the first full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the voxel point cloud characteristics by using a second full connecting block to obtain one-dimensional voxel point cloud characteristics; the second full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the one-dimensional pixel point cloud characteristic and the one-dimensional voxel point cloud characteristic by using a sigmoid function to obtain a pixel point cloud characteristic weight and a voxel point cloud characteristic weight; the sigmoid function is as follows:
w_pp = 1/(1 + e^(−f'_pp)),  w_pv = 1/(1 + e^(−f'_pv))
wherein w_pp is the pixel point cloud characteristic weight, w_pv is the voxel point cloud characteristic weight, f'_pp is the one-dimensional pixel point cloud characteristic, and f'_pv is the one-dimensional voxel point cloud characteristic;
fusing the pixel point cloud characteristic and the voxel point cloud characteristic by using the pixel point cloud characteristic weight and the voxel point cloud characteristic weight to obtain a point cloud fusion characteristic; the fusion expression is:
f_fuse = [f_pp(1 + w_pp), f_pv(1 + w_pv)]
wherein f_fuse is the point cloud fusion characteristic, f_pp is the pixel point cloud characteristic, and f_pv is the voxel point cloud characteristic.
9. The method for generating a multi-sensor fusion voxel feature map according to claim 1, wherein the point cloud fusion feature is subjected to inverse mapping to obtain the multi-sensor fusion voxel feature map, and specifically comprises:
using a trilinear interpolation method based on inverse distance weighting to inversely map the point cloud fusion feature to the positions of the non-empty voxels in the voxel feature to obtain the replacement point cloud fusion feature;
and replacing the non-empty voxel characteristics of the corresponding positions in the voxel characteristics by the replacement point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
10. A multi-sensor fused voxel signature generation system, comprising:
the data acquisition module is used for acquiring laser radar points and original images;
the first mapping module is used for mapping the pixels of the original image to corresponding laser radar points according to the mapping relation between the original image and the laser radar points to obtain first mapping data;
the second mapping module is used for mapping the first mapping data to pixels of a forward-looking image according to the mapping relation between the laser radar point and the forward-looking image to obtain second mapping data;
the sparse coding module is used for carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
the voxelization module is used for voxelizing the laser radar point to obtain a three-dimensional voxel of the laser radar point;
the characteristic extraction module is used for carrying out characteristic extraction on the lightweight sparse image and the three-dimensional voxel to obtain pixel characteristics and voxel characteristics;
the point cloud characteristic acquisition module is used for coding the pixel characteristic and the voxel characteristic to corresponding point cloud positions to obtain the pixel point cloud characteristic and the voxel point cloud characteristic;
the characteristic fusion module is used for carrying out characteristic fusion on the pixel point cloud characteristic and the voxel point cloud characteristic to obtain a point cloud fusion characteristic;
and the inverse mapping module is used for performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
CN202111597823.3A 2021-12-24 2021-12-24 Multi-sensor fusion voxel characteristic map generation method and system Pending CN114332796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111597823.3A CN114332796A (en) 2021-12-24 2021-12-24 Multi-sensor fusion voxel characteristic map generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111597823.3A CN114332796A (en) 2021-12-24 2021-12-24 Multi-sensor fusion voxel characteristic map generation method and system

Publications (1)

Publication Number Publication Date
CN114332796A true CN114332796A (en) 2022-04-12

Family

ID=81013374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111597823.3A Pending CN114332796A (en) 2021-12-24 2021-12-24 Multi-sensor fusion voxel characteristic map generation method and system

Country Status (1)

Country Link
CN (1) CN114332796A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115240249A (en) * 2022-07-07 2022-10-25 湖北大学 Feature extraction classification measurement learning method and system for face recognition and storage medium
CN115240249B (en) * 2022-07-07 2023-06-06 湖北大学 Feature extraction classification metric learning method, system and storage medium for face recognition
CN115471561A (en) * 2022-11-14 2022-12-13 科大讯飞股份有限公司 Object key point positioning method, cleaning robot control method and related equipment

Similar Documents

Publication Publication Date Title
CN110738697B (en) Monocular depth estimation method based on deep learning
CN110570429B (en) Lightweight real-time semantic segmentation method based on three-dimensional point cloud
CN114724120B (en) Vehicle target detection method and system based on radar vision semantic segmentation adaptive fusion
CN111476242B (en) Laser point cloud semantic segmentation method and device
CN113284163B (en) Three-dimensional target self-adaptive detection method and system based on vehicle-mounted laser radar point cloud
CN112949633B (en) Improved YOLOv 3-based infrared target detection method
CN114332796A (en) Multi-sensor fusion voxel characteristic map generation method and system
CN112347987A (en) Multimode data fusion three-dimensional target detection method
CN113269040A (en) Driving environment sensing method combining image recognition and laser radar point cloud segmentation
CN113095152B (en) Regression-based lane line detection method and system
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN114463736A (en) Multi-target detection method and device based on multi-mode information fusion
CN112270694B (en) Method for detecting urban environment dynamic target based on laser radar scanning pattern
CN114298151A (en) 3D target detection method based on point cloud data and image data fusion
CN112288667A (en) Three-dimensional target detection method based on fusion of laser radar and camera
CN116612468A (en) Three-dimensional target detection method based on multi-mode fusion and depth attention mechanism
CN116279592A (en) Method for dividing travelable area of unmanned logistics vehicle
TW202225730A (en) High-efficiency LiDAR object detection method based on deep learning through direct processing of 3D point data to obtain a concise and fast 3D feature to solve the shortcomings of complexity and time-consuming of the current voxel network model
CN116246119A (en) 3D target detection method, electronic device and storage medium
CN115115917A (en) 3D point cloud target detection method based on attention mechanism and image feature fusion
CN117475428A (en) Three-dimensional target detection method, system and equipment
CN112924037A (en) Infrared body temperature detection system and detection method based on image registration
CN117115359A (en) Multi-view power grid three-dimensional space data reconstruction method based on depth map fusion
CN116704307A (en) Target detection method and system based on fusion of image virtual point cloud and laser point cloud
CN116797894A (en) Radar and video fusion target detection method for enhancing characteristic information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination