CN114332796A - Multi-sensor fusion voxel characteristic map generation method and system - Google Patents

Multi-sensor fusion voxel characteristic map generation method and system Download PDF

Info

Publication number
CN114332796A
CN114332796A (Application CN202111597823.3A)
Authority
CN
China
Prior art keywords
point cloud
voxel
mapping
characteristic
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111597823.3A
Other languages
Chinese (zh)
Inventor
孔德明
李晓伟
曹尚杰
张文宇
沈阅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Yandayan Soft Information System Co ltd
QINHUANGDAO PORT CO Ltd
Yanshan University
Original Assignee
Hebei Yandayan Soft Information System Co ltd
QINHUANGDAO PORT CO Ltd
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Yandayan Soft Information System Co ltd, QINHUANGDAO PORT CO Ltd, Yanshan University filed Critical Hebei Yandayan Soft Information System Co ltd
Priority to CN202111597823.3A priority Critical patent/CN114332796A/en
Publication of CN114332796A publication Critical patent/CN114332796A/en
Pending legal-status Critical Current

Landscapes

  • Optical Radar Systems And Details Thereof (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a multi-sensor fusion voxel characteristic map generation method and system. The method establishes the mapping relation between the original image and the laser radar points and the mapping relation between the laser radar points and the forward-looking image, and generates a lightweight sparse image, providing a basis for fast processing of the image information. Meanwhile, the method fuses the features extracted by the two sensor feature extraction networks, constructs environment information features with richer information, and improves the accuracy of three-dimensional target detection.

Description

Multi-sensor fusion voxel characteristic map generation method and system
Technical Field
The invention relates to the technical field of computer vision and automatic driving, in particular to a method and a system for generating a multi-sensor fusion voxel characteristic map.
Background
Over the past decade, many researchers have studied computer vision technology and achieved fruitful results. As one of its important applications, environmental perception in the field of automatic driving has been widely studied in recent years and has become a research focus. An automatic driving vehicle is usually equipped with a camera and a laser radar (lidar) sensor to obtain, respectively, a high-resolution image and a lidar point cloud of the surrounding environment. The high-resolution image consists of regularly and densely distributed pixels and contains color information; the lidar point cloud consists of discrete points that are irregularly and sparsely distributed in space and contains position information. These are the two most common inputs in environment-perception technology, and most computer vision tasks are based on them. In the field of computer vision, the convolutional neural network is a common and effective technique: a deep learning network can be constructed from convolutional layers, and data containing various kinds of information can be fed into it to extract feature information and complete visual tasks such as three-dimensional target detection, semantic segmentation and instance segmentation in automatic driving scenes.
At present, most existing high-performance three-dimensional target detection methods use only point cloud data as input, so they lack the RGB channel information contained in images; yet both the RGB information in an image and the three-dimensional structure information contained in a point cloud are important cues for recognizing an object. Existing multi-modal fusion methods typically fuse the point cloud feature map and the image feature map pixel by pixel, but such fusion is constrained by the alignment between the two feature maps, and as a result the accuracy is often no higher than that of single-modality methods.
Therefore, how to design a multi-sensor fusion voxel characteristic map generation method and system capable of improving the three-dimensional target detection accuracy rate becomes a technical problem to be solved in the field.
Disclosure of Invention
The invention aims to provide a method and a system for generating a multi-sensor fusion voxel characteristic map, which can improve the accuracy of three-dimensional target detection.
In order to achieve the purpose, the invention provides the following scheme:
a multi-sensor fused voxel feature map generation method, the method comprising the steps of:
acquiring a laser radar point and an original image;
mapping pixels of the original image to corresponding laser radar points according to the mapping relation between the original image and the laser radar points to obtain first mapping data;
mapping the first mapping data to pixels of a forward-looking image according to the mapping relation between the laser radar point and the forward-looking image to obtain second mapping data;
carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
performing voxelization on the laser radar point to obtain a three-dimensional voxel of the laser radar point;
carrying out feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain pixel features and voxel features;
coding the pixel characteristics and the voxel characteristics to corresponding point cloud positions to obtain pixel point cloud characteristics and voxel point cloud characteristics;
performing feature fusion on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature;
and performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
Optionally, mapping the pixels of the original image to corresponding lidar points according to the mapping relationship between the original image and the lidar points to obtain first mapping data, which specifically includes:
establishing a first mapping relation between the original image and the laser radar points according to a calibration matrix between a camera and the laser radar; the first mapping relationship is:
[u_r, v_r, 1]^T ∝ I · E · [x, y, z, 1]^T
wherein x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, u_r is the abscissa of the pixel in the original image, v_r is the ordinate of the pixel in the original image, I is the camera intrinsic matrix, and E is the extrinsic matrix.
Optionally, mapping the first mapping data to pixels of a forward-looking image according to a mapping relationship between the lidar point and the forward-looking image to obtain second mapping data, specifically including:
establishing a second mapping relation between the laser radar point and the forward-looking image by utilizing a spherical projection principle; the second mapping relation is as follows:
u = (1/2)·[1 − arctan(y, x)/π]·w
v = [1 − (arcsin(z/√(x² + y² + z²)) + fov_down)/fov]·h
wherein u is the abscissa of the pixel in the forward-looking image, v is the ordinate of the pixel in the forward-looking image, x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, r is the reflection intensity, w is the width of the forward-looking image, h is the height of the forward-looking image, fov_down is the vertical view angle below the laser radar, and fov is the total vertical view angle of the laser radar;
and mapping the first mapping data to a forward-looking image pixel according to the second mapping relation to obtain second mapping data.
Optionally, performing sparse coding on the second mapping data to generate a lightweight sparse image, specifically including:
judging whether any mapping data in the second mapping data is obtained by mapping a plurality of mapping data in the first mapping data, and obtaining a first judgment result;
if the first judgment result is yes, taking an average value of the plurality of mapping data in the first mapping data as a code value of any one mapping data in the second mapping data;
judging whether any mapping data in the second mapping data is obtained by mapping one mapping data in the first mapping data to obtain a second judgment result;
if the second judgment result is yes, taking the corresponding mapping data in the first mapping data as the coding value of any mapping data in the second mapping data;
and if the first judgment result and the second judgment result are both negative, not coding.
Optionally, the voxelization of the laser radar point is performed to obtain a three-dimensional voxel of the laser radar point, and the method specifically includes:
judging whether any voxel in the voxels contains a plurality of laser radar points to obtain a third judgment result;
if the third judgment result is yes, using the average value of the plurality of laser radar points as the coding value of any voxel in the voxels;
judging whether any voxel in the voxels contains one laser radar point or not to obtain a fourth judgment result;
if the fourth judgment result is yes, using the corresponding laser radar point as the coding value of any one of the voxels;
and if the third judgment result and the fourth judgment result are both negative, not coding.
Optionally, performing feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain a pixel feature and a voxel feature, specifically including:
respectively constructing a two-dimensional sparse convolution feature extraction network and a three-dimensional sparse convolution feature extraction network;
performing feature extraction on the lightweight sparse image by using the feature extraction network of the two-dimensional sparse convolution to obtain pixel features;
and performing feature extraction on the three-dimensional voxels by using the feature extraction network of the three-dimensional sparse convolution to obtain voxel features.
Optionally, the pixel feature and the voxel feature are encoded to a corresponding point cloud position to obtain a pixel point cloud feature and a voxel point cloud feature, and the method specifically includes:
encoding the pixel characteristics to the point cloud positions by using a bilinear interpolation algorithm based on the four nearest neighbors to obtain the pixel point cloud characteristics;
and encoding the voxel characteristics to the point cloud positions by using a trilinear interpolation algorithm based on the inverse distance weighting method to obtain the voxel point cloud characteristics.
Optionally, feature fusion is performed on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature, which specifically includes:
processing the pixel point cloud feature by using a first full-connection block to obtain a one-dimensional pixel point cloud feature; the first full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the voxel point cloud characteristics by using a second full connecting block to obtain one-dimensional voxel point cloud characteristics; the second full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the one-dimensional pixel point cloud characteristic and the one-dimensional voxel point cloud characteristic by using a sigmoid function to obtain a pixel point cloud characteristic weight and a voxel point cloud characteristic weight; the sigmoid function is as follows:
w_pp = 1/(1 + e^(−f'_pp)),  w_pv = 1/(1 + e^(−f'_pv))
wherein w_pp is the pixel point cloud characteristic weight, w_pv is the voxel point cloud characteristic weight, f'_pp is the one-dimensional pixel point cloud characteristic, and f'_pv is the one-dimensional voxel point cloud characteristic;
fusing the pixel point cloud characteristic and the voxel point cloud characteristic by using the pixel point cloud characteristic weight and the voxel point cloud characteristic weight to obtain a point cloud fusion characteristic; the fusion expression is:
f_fuse = [f_pp(1 + w_pp), f_pv(1 + w_pv)]
wherein f_fuse is the point cloud fusion characteristic, f_pp is the pixel point cloud characteristic, and f_pv is the voxel point cloud characteristic.
Optionally, the point cloud fusion features are subjected to inverse mapping to obtain a multi-sensor fusion voxel feature map, which specifically includes:
using a trilinear interpolation method based on inverse distance weighting to inversely map the point cloud fusion characteristics to the positions of the non-empty voxels in the voxel characteristics to obtain replacement point cloud fusion characteristics;
and replacing the non-empty voxel characteristics of the corresponding positions in the voxel characteristics by the replacement point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
The invention also provides a multi-sensor fusion voxel characteristic map generation system, which comprises:
the data acquisition module is used for acquiring laser radar points and original images;
the first mapping module is used for mapping the pixels of the original image to corresponding laser radar points according to the mapping relation between the original image and the laser radar points to obtain first mapping data;
the second mapping module is used for mapping the first mapping data to pixels of a forward-looking image according to the mapping relation between the laser radar point and the forward-looking image to obtain second mapping data;
the sparse coding module is used for carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
the voxelization module is used for voxelizing the laser radar point to obtain a three-dimensional voxel of the laser radar point;
the characteristic extraction module is used for carrying out characteristic extraction on the lightweight sparse image and the three-dimensional voxel to obtain pixel characteristics and voxel characteristics;
the point cloud characteristic acquisition module is used for coding the pixel characteristic and the voxel characteristic to corresponding point cloud positions to obtain the pixel point cloud characteristic and the voxel point cloud characteristic;
the characteristic fusion module is used for carrying out characteristic fusion on the pixel point cloud characteristic and the voxel point cloud characteristic to obtain a point cloud fusion characteristic;
and the inverse mapping module is used for performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the method, the mapping relation between the original image and the laser radar point and the mapping relation between the laser radar point and the foresight image are established, the light sparse image is generated, and a foundation is provided for the rapid processing of image information. Meanwhile, the method provided by the invention integrates the characteristics extracted by two sensor characteristic extraction networks, constructs the environment information characteristics with richer information, and improves the accuracy of three-dimensional target detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a flowchart of a multi-sensor fusion voxel feature map generation method according to embodiment 1 of the present invention;
fig. 2 is a raw image of sample number 7312 in the KITTI autopilot dataset;
FIG. 3 is a laser radar spot for sample number 7312 in the KITTI autonomous driving dataset;
fig. 4 is a lightweight sparse image of sample number 7312 in the KITTI autonomous driving dataset;
FIG. 5 is a diagram of a feature fusion network architecture;
FIG. 6 is a schematic diagram of multi-sensor fusion voxel feature map generation.
Fig. 7 is a structural diagram of a multi-sensor fusion voxel feature map generation system according to embodiment 2 of the present invention.
Description of the symbols:
1. a data acquisition module; 2. a first mapping module; 3. a second mapping module; 4. a sparse coding module; 5. a voxelization module; 6. a feature extraction module; 7. a point cloud feature acquisition module; 8. a feature fusion module; 9. an inverse mapping module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a method and a system for generating a multi-sensor fusion voxel characteristic map, which can improve the accuracy of three-dimensional target detection.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Example 1:
referring to fig. 1, the present invention provides a method for generating a multi-sensor fused voxel feature map, which includes the following steps:
s1: acquiring a laser radar point and an original image;
s2: mapping pixels of the original image to corresponding laser radar points according to the mapping relation between the original image and the laser radar points to obtain first mapping data;
s3: mapping the first mapping data to pixels of a forward-looking image according to the mapping relation between the laser radar point and the forward-looking image to obtain second mapping data;
s4: carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
s5: performing voxelization on the laser radar point to obtain a three-dimensional voxel of the laser radar point;
s6: carrying out feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain pixel features and voxel features;
s7: coding the pixel characteristics and the voxel characteristics to corresponding point cloud positions to obtain pixel point cloud characteristics and voxel point cloud characteristics;
s8: performing feature fusion on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature;
s9: and performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
In step S2, mapping the pixels of the original image to corresponding lidar points according to the mapping relationship between the original image and the lidar points to obtain first mapping data, which specifically includes:
establishing a first mapping relation between the original image and the laser radar points according to a calibration matrix between a camera and the laser radar; the first mapping relationship is:
[u_r, v_r, 1]^T ∝ I · E · [x, y, z, 1]^T
wherein x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, u_r is the abscissa of the pixel in the original image, v_r is the ordinate of the pixel in the original image, I is the camera intrinsic matrix, and E is the extrinsic matrix.
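As an illustration of this step, the following Python sketch (not part of the patent; the function name, the 3 × 4 intrinsic matrix and the 4 × 4 lidar-to-camera extrinsic matrix are assumptions made for the example) projects the laser radar points into the original image and attaches the RGB value of the hit pixel to each point:

```python
import numpy as np

def map_image_to_points(points_xyz, image_rgb, intrinsic, extrinsic):
    """Attach the RGB value of the projected image pixel to every lidar point.

    points_xyz : (N, 3) lidar coordinates (x depth, y width, z height)
    image_rgb  : (H, W, 3) original camera image
    intrinsic  : (3, 4) camera intrinsic matrix I (padded with a zero column)
    extrinsic  : (4, 4) lidar-to-camera extrinsic matrix E
    """
    n = points_xyz.shape[0]
    homog = np.hstack([points_xyz, np.ones((n, 1))])        # homogeneous coordinates (N, 4)
    cam = (intrinsic @ extrinsic @ homog.T).T               # projective image coordinates (N, 3)
    ur = cam[:, 0] / cam[:, 2]                              # pixel abscissa u_r
    vr = cam[:, 1] / cam[:, 2]                              # pixel ordinate v_r
    h, w, _ = image_rgb.shape
    valid = (cam[:, 2] > 0) & (ur >= 0) & (ur < w) & (vr >= 0) & (vr < h)
    rgb = np.zeros((n, 3), dtype=np.float32)
    rgb[valid] = image_rgb[vr[valid].astype(int), ur[valid].astype(int)]
    return np.hstack([points_xyz, rgb]), valid              # first mapping data + validity mask
```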
In step S3, mapping the first mapping data to pixels of a forward-looking image according to a mapping relationship between the lidar point and the forward-looking image, to obtain second mapping data, specifically including:
establishing a second mapping relation between the laser radar point and the forward-looking image by utilizing a spherical projection principle; the second mapping relation is as follows:
u = (1/2)·[1 − arctan(y, x)/π]·w
v = [1 − (arcsin(z/√(x² + y² + z²)) + fov_down)/fov]·h
wherein u is the abscissa of the pixel in the forward-looking image, v is the ordinate of the pixel in the forward-looking image, x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, r is the reflection intensity, w is the width of the forward-looking image, h is the height of the forward-looking image, fov_down is the vertical view angle below the laser radar, and fov is the total vertical view angle of the laser radar;
and mapping the first mapping data to a forward-looking image pixel according to the second mapping relation to obtain second mapping data.
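A minimal sketch of this spherical projection, written under the standard range-image convention (fov_down and fov in radians); the helper name and the floor/clip handling of pixel indices are assumptions for illustration:

```python
import numpy as np

def spherical_projection(points_xyz, width, height, fov_down, fov):
    """Map lidar points to forward-looking image pixel coordinates (u, v)."""
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    depth = np.sqrt(x ** 2 + y ** 2 + z ** 2)                     # range of each point
    u = 0.5 * (1.0 - np.arctan2(y, x) / np.pi) * width            # horizontal pixel coordinate
    v = (1.0 - (np.arcsin(z / depth) + fov_down) / fov) * height  # vertical pixel coordinate
    u = np.clip(np.floor(u), 0, width - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int32)
    return u, v
```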
In step S4, the sparse coding is performed on the second mapping data to generate a lightweight sparse image, specifically including:
judging whether any mapping data in the second mapping data is obtained by mapping a plurality of mapping data in the first mapping data, and obtaining a first judgment result;
if the first judgment result is yes, taking an average value of the plurality of mapping data in the first mapping data as a code value of any one mapping data in the second mapping data;
judging whether any mapping data in the second mapping data is obtained by mapping one mapping data in the first mapping data to obtain a second judgment result;
if the second judgment result is yes, taking the corresponding mapping data in the first mapping data as the coding value of any mapping data in the second mapping data;
and if the first judgment result and the second judgment result are both negative, not coding.
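The averaging rule above can be implemented as a scatter-mean over the forward-looking image grid; the sketch below is an assumed minimal implementation (names and zero-initialization of unencoded pixels are illustrative), not the patent's own code:

```python
import numpy as np

def sparse_encode_front_view(u, v, point_features, width, height):
    """Average the features of all points falling into the same front-view pixel.

    Pixels hit by several points get the mean of their features, pixels hit by one
    point keep that point's features, and pixels hit by no point stay unencoded (zero).
    """
    channels = point_features.shape[1]
    image = np.zeros((height, width, channels), dtype=np.float32)
    counts = np.zeros((height, width, 1), dtype=np.float32)
    np.add.at(image, (v, u), point_features)   # accumulate features per pixel
    np.add.at(counts, (v, u), 1.0)             # count contributing points per pixel
    hit = counts[..., 0] > 0
    image[hit] /= counts[hit]                  # mean over pixels hit by at least one point
    return image, hit
```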
In step S5, the voxelizing the lidar point to obtain a three-dimensional voxel of the lidar point specifically includes:
judging whether any voxel in the voxels contains a plurality of laser radar points to obtain a third judgment result;
if the third judgment result is yes, using the average value of the plurality of laser radar points as the coding value of any voxel in the voxels;
judging whether any voxel in the voxels contains one laser radar point or not to obtain a fourth judgment result;
if the fourth judgment result is yes, using the corresponding laser radar point as the coding value of any one of the voxels;
and if the third judgment result and the fourth judgment result are both negative, not coding.
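The voxel mean-encoding rule can be sketched the same way; the code below is a hedged illustration (the array layout, the 16000-voxel cap as a simple truncation, and the helper name are assumptions):

```python
import numpy as np

def voxelize_mean(points, pc_range, voxel_size, max_voxels=16000):
    """Group lidar points into voxels and use the per-voxel mean as the encoding.

    points     : (N, C) array whose first three columns are x, y, z
    pc_range   : (x_min, y_min, z_min, x_max, y_max, z_max)
    voxel_size : (dx, dy, dz)
    """
    mins, maxs = np.array(pc_range[:3]), np.array(pc_range[3:])
    size = np.array(voxel_size)
    inside = np.all((points[:, :3] >= mins) & (points[:, :3] < maxs), axis=1)
    pts = points[inside]
    idx = np.floor((pts[:, :3] - mins) / size).astype(np.int64)    # voxel index of every point
    uniq, inverse = np.unique(idx, axis=0, return_inverse=True)    # non-empty voxels
    inverse = inverse.reshape(-1)
    feats = np.zeros((uniq.shape[0], pts.shape[1]), dtype=np.float32)
    counts = np.zeros(uniq.shape[0], dtype=np.float32)
    np.add.at(feats, inverse, pts)
    np.add.at(counts, inverse, 1.0)
    feats /= counts[:, None]                                       # mean encoding per voxel
    return uniq[:max_voxels], feats[:max_voxels]                   # keep at most max_voxels non-empty voxels
```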
In step S6, performing feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain a pixel feature and a voxel feature, specifically including:
respectively constructing a two-dimensional sparse convolution feature extraction network and a three-dimensional sparse convolution feature extraction network;
performing feature extraction on the lightweight sparse image by using the feature extraction network of the two-dimensional sparse convolution to obtain pixel features;
and performing feature extraction on the three-dimensional voxels by using the feature extraction network of the three-dimensional sparse convolution to obtain voxel features.
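For readability, the sketch below uses ordinary dense 2D convolutions as a stand-in for the two-dimensional sparse convolution backbone (a real implementation would use a sparse convolution library); the channel sizes, stage count and overall stride of 8 are assumptions for illustration:

```python
import torch.nn as nn

def conv_block(cin, cout, stride):
    # One convolution stage followed by batch normalization and ReLU activation,
    # mirroring the "sparse convolution + BN + ReLU" pattern described in the text.
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class PixelBackbone2D(nn.Module):
    """Toy dense stand-in for the 2D feature extraction network (down-sampling scale 8)."""

    def __init__(self, cin=3, cout=64):
        super().__init__()
        self.stages = nn.Sequential(
            conv_block(cin, 16, stride=2),
            conv_block(16, 32, stride=2),
            conv_block(32, cout, stride=2),
        )

    def forward(self, x):           # x: (B, cin, H, W) lightweight sparse image
        return self.stages(x)       # (B, cout, H/8, W/8) pixel feature map
```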
In step S7, the pixel feature and the voxel feature are encoded to a corresponding point cloud position to obtain a pixel point cloud feature and a voxel point cloud feature, which specifically include:
encoding the pixel characteristics to the point cloud positions by using a bilinear interpolation algorithm based on the four nearest neighbors to obtain the pixel point cloud characteristics;
and encoding the voxel characteristics to the point cloud positions by using a trilinear interpolation algorithm based on the inverse distance weighting method to obtain the voxel point cloud characteristics.
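The inverse-distance-weighted gathering of voxel features at point positions can be sketched as below (a brute-force illustration; a KD-tree or GPU kernel would be used in practice, and the search-radius handling is an assumption). For the pixel branch, bilinear sampling of the 2D pixel feature map at the projected point coordinates plays the corresponding role.

```python
import numpy as np

def idw_gather(query_xyz, source_xyz, source_feat, radius):
    """Inverse-distance-weighted interpolation of source features at query positions.

    Used here to carry voxel features to point positions (and, in step S9, to carry
    the point cloud fusion features back to the non-empty voxel positions).
    """
    out = np.zeros((query_xyz.shape[0], source_feat.shape[1]), dtype=np.float32)
    for i, q in enumerate(query_xyz):
        d = np.linalg.norm(source_xyz - q, axis=1)       # distances to all source positions
        mask = d < radius                                # keep neighbors inside the search radius
        if not np.any(mask):
            continue                                     # no neighbor: the feature stays zero
        w = 1.0 / np.maximum(d[mask], 1e-6)              # inverse-distance weights
        out[i] = (w[:, None] * source_feat[mask]).sum(axis=0) / w.sum()
    return out
```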
In step S8, performing feature fusion on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature, which specifically includes:
processing the pixel point cloud feature by using a first full-connection block to obtain a one-dimensional pixel point cloud feature; the first full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the voxel point cloud characteristics by using a second full connecting block to obtain one-dimensional voxel point cloud characteristics; the second full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the one-dimensional pixel point cloud characteristic and the one-dimensional voxel point cloud characteristic by using a sigmoid function to obtain a pixel point cloud characteristic weight and a voxel point cloud characteristic weight; the sigmoid function is as follows:
w_pp = 1/(1 + e^(−f'_pp)),  w_pv = 1/(1 + e^(−f'_pv))
wherein w_pp is the pixel point cloud characteristic weight, w_pv is the voxel point cloud characteristic weight, f'_pp is the one-dimensional pixel point cloud characteristic, and f'_pv is the one-dimensional voxel point cloud characteristic;
fusing the pixel point cloud characteristic and the voxel point cloud characteristic by using the pixel point cloud characteristic weight and the voxel point cloud characteristic weight to obtain a point cloud fusion characteristic; the fusion expression is:
f_fuse = [f_pp(1 + w_pp), f_pv(1 + w_pv)]
wherein f_fuse is the point cloud fusion characteristic, f_pp is the pixel point cloud characteristic, and f_pv is the voxel point cloud characteristic.
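A possible PyTorch sketch of the fusion described above; the module and block names are assumptions, and whether the sigmoid gate is a scalar or a per-channel (64-dimensional) weight is also an assumption, resolved here in favour of the (64, 64) input/output dimensions listed for the fully connected blocks:

```python
import torch
import torch.nn as nn

def fc_block(c):
    # Three fully connected layers with two BatchNorm + ReLU pairs in between,
    # matching the structure stated for fully connected blocks 1 and 2.
    return nn.Sequential(
        nn.Linear(c, c), nn.BatchNorm1d(c), nn.ReLU(inplace=True),
        nn.Linear(c, c), nn.BatchNorm1d(c), nn.ReLU(inplace=True),
        nn.Linear(c, c),
    )

class PointFeatureFusion(nn.Module):
    """Gated concatenation of pixel and voxel point cloud features."""

    def __init__(self, c=64):
        super().__init__()
        self.fc_pp = fc_block(c)   # produces the one-dimensional feature f'_pp
        self.fc_pv = fc_block(c)   # produces the one-dimensional feature f'_pv

    def forward(self, f_pp, f_pv):                         # both of shape (N, c)
        w_pp = torch.sigmoid(self.fc_pp(f_pp))             # pixel point cloud feature weight
        w_pv = torch.sigmoid(self.fc_pv(f_pv))             # voxel point cloud feature weight
        return torch.cat([f_pp * (1 + w_pp), f_pv * (1 + w_pv)], dim=1)   # f_fuse
```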
In step S9, inverse mapping is performed on the point cloud fusion features to obtain a multi-sensor fusion voxel feature map, which specifically includes:
using a trilinear interpolation method based on inverse distance weighting to inversely map the point cloud fusion feature to the positions of the non-empty voxels in the voxel feature to obtain the replacement point cloud fusion feature;
and replacing the non-empty voxel characteristics of the corresponding positions in the voxel characteristics by the replacement point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
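Reusing the idw_gather helper sketched under step S7, the inverse mapping and replacement of step S9 might look as follows (a hedged sketch: the dense voxel-map layout, the assumption that the fusion feature has already been brought to the voxel feature width, and all names are illustrative):

```python
import numpy as np

def build_fused_voxel_map(voxel_feat_map, nonempty_idx, nonempty_xyz,
                          point_xyz, point_fused_feat, radius):
    """Replace the features of non-empty voxels with the inverse-mapped fusion features.

    voxel_feat_map   : (D, H, W, C) dense voxel feature map FM_v (zeros at empty voxels)
    nonempty_idx     : (M, 3) integer indices of the non-empty voxels inside FM_v
    nonempty_xyz     : (M, 3) metric centers of those non-empty voxels
    point_xyz        : (N, 3) positions of the points carrying the fusion features
    point_fused_feat : (N, C) point cloud fusion features (assumed already C-dimensional)
    """
    replaced = idw_gather(nonempty_xyz, point_xyz, point_fused_feat, radius)  # (M, C)
    fused_map = voxel_feat_map.copy()
    d, h, w = nonempty_idx[:, 0], nonempty_idx[:, 1], nonempty_idx[:, 2]
    fused_map[d, h, w] = replaced        # overwrite only the non-empty voxel positions
    return fused_map                     # multi-sensor fusion voxel feature map
```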
After step S9, the method further includes: feeding the obtained multi-sensor fusion voxel feature map into a subsequent network to perform a three-dimensional target detection task. In this embodiment, the subsequent network refers to the network structure of PV-RCNN or another network structure (such as SA-SSD); that is, the generated multi-sensor fusion voxel feature map can be connected to many existing network structures, as long as they are based on a voxel feature map.
The method establishes the mapping relation between the original image and the laser radar points and the mapping relation between the laser radar points and the forward-looking image, and generates a lightweight sparse image, which provides a basis for fast processing of the image information. Meanwhile, the method fuses the features extracted by the two sensor feature extraction networks, constructing environment information features with richer information and improving the accuracy of three-dimensional target detection.
To facilitate an understanding of the present invention, the present invention is described below with reference to fig. 2 to 6.
Step one, acquiring three-dimensional laser radar point cloud data of the surrounding environment and an image of the surrounding environment.
In this embodiment, the method of the present invention is described in detail using the KITTI autonomous driving dataset, the most commonly used and authoritative dataset in the autonomous driving field. The data were acquired by two grayscale cameras (No. 0 and No. 1), two color cameras (No. 2 and No. 3) and a laser radar, with camera No. 0 serving as the reference camera. One data iteration cycle is illustrated using the No. 2 color camera original image of sample 7312 of the training set and the corresponding laser radar point cloud data. The resolution of the original image is 1242 × 375, i.e., 465750 pixels in total, as shown in FIG. 2. The laser radar point cloud data within the corresponding range contains 18656 points, as shown in FIG. 3.
A mapping relation between the original image and the laser radar points is established. Define P = {p_i = (x_i, y_i, z_i, r_i), i = 1, 2, 3, …, 18656}, where x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension, z is the coordinate in the height dimension, r is the reflection intensity of the point, and p_i is a point in the laser radar point cloud. Define RP = {rp_i = (ur_i, vr_i), i = 1, 2, 3, …, 465750} as the pixels of the No. 2 color camera original image, where ur is the abscissa and vr the ordinate of a pixel in the original image. The mapping relation between the original image pixels captured by the No. 2 color camera and the laser radar points can be expressed by equation (1).
And step two, establishing a mapping relation between the original image pixels and the laser radar points and a mapping relation between the laser radar points and the forward-looking image.
[ur, vr, 1]^T ∝ I_2 · R_0 · Tr_velo_to_cam · [x, y, z, 1]^T    (1)
wherein the intrinsic matrix I_2 of the No. 2 color camera is:
(matrix values provided as an image in the original publication)
the rectification rotation matrix R_0 of the No. 0 grayscale camera is:
(matrix values provided as an image in the original publication)
and the extrinsic matrix Tr_velo_to_cam that maps the laser radar points to the No. 0 grayscale camera is:
(matrix values provided as an image in the original publication)
and then, establishing a mapping relation between the laser radar point and the forward-looking image. The upper vertical viewing angle of the lidar is fovdownSetting the width and height w and h of the generated front view to 512 and 48 respectively at 0.43rad and fov rad and 0.47rad, can obtain the mapping relationship between the lidar point and the front view pixel as shown in equation (5).
Figure BDA0003431947350000122
And step three, mapping pixels in the original image to corresponding laser radar points, further mapping the pixels to pixels of the generated foresight image for sparse coding, generating a lightweight sparse image, and simultaneously converting the original point cloud into three-dimensional voxels.
According to the mapping relation between the original image pixels and the laser radar points shown in equation (1), the color information of the original image pixels is mapped onto the laser radar points, and the feature of each laser radar point is updated to P = {p_i = (rd_i, gr_i, bl_i), i = 1, 2, 3, …, N}, where rd, gr and bl are the red, green and blue channel values of the original image pixel corresponding to the laser radar point. The laser radar points are then mapped to the forward-looking image pixels according to the mapping relation shown in equation (5) and sparsely encoded. The sparse encoding of forward-looking image pixels follows these rules: if several laser radar points map to the same forward-looking image pixel, the average of their features is used as the encoding value of that pixel; if exactly one laser radar point maps to a forward-looking image pixel, the feature value of that laser radar point is used as the encoding value of the pixel; forward-looking image pixels with no corresponding laser radar point are not encoded, which preserves sparsity. Through this sparse encoding operation, the forward-looking image is converted into a lightweight sparse image, and the color information of the original high-resolution image is transferred to the generated lightweight sparse image. The lightweight sparse image of sample 7312 is shown in FIG. 4, where black pixels are unencoded pixels. Compared with a dense forward-looking image without sparse encoding, the sparsely encoded image contains unencoded pixels and therefore occupies less memory.
The point cloud space is defined to range from [0, −40, −1] meters to [70.4, 40, 3] meters in the depth, width and height dimensions, and the voxel size is [0.05, 0.05, 0.1] meters in the depth, width and height dimensions, respectively, resulting in a three-dimensional voxel grid with a resolution of 1408 × 1600 × 40. Similar to the generation of the lightweight sparse image, if a voxel contains several laser radar points, the average value of these points is used as the encoding value of the voxel; if a voxel contains no laser radar point, it is not encoded. Taking the computational cost into account, at most 16000 non-empty three-dimensional voxels are retained.
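As a quick check (not from the patent) that the stated resolution follows from these ranges and voxel sizes:

```python
import numpy as np

# Point cloud range and voxel size stated in this embodiment.
pc_range = np.array([0.0, -40.0, -1.0, 70.4, 40.0, 3.0])
voxel_size = np.array([0.05, 0.05, 0.1])

# (70.4 - 0)/0.05 = 1408, (40 - (-40))/0.05 = 1600, (3 - (-1))/0.1 = 40
resolution = np.round((pc_range[3:] - pc_range[:3]) / voxel_size).astype(int)
print(resolution)   # -> [1408 1600   40]
```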
And step four, performing feature extraction on the lightweight sparse image and the three-dimensional voxels by using a feature extraction network to obtain pixel features and voxel features, and encoding the two features on the point cloud to obtain two point cloud features.
TABLE 1 voxel characteristic extraction network architecture and parameters thereof
(The network structure and its parameters are provided as a table image in the original publication.)
A feature extraction network based on sparse convolution is constructed to extract features from the generated lightweight sparse image and the three-dimensional voxels, yielding a pixel feature map FM_p and a voxel feature map FM_v, both at a down-sampling scale of 8, whose features are denoted f_p and f_v, respectively. The structure of the feature extraction networks and their parameters are shown in Table 1: two-dimensional sparse convolution is used to extract the pixel features and three-dimensional sparse convolution to extract the voxel features, and every sparse convolution or submanifold sparse convolution is followed by a batch normalization operation and a ReLU activation function. FM_p is a two-dimensional sparse pixel map of size 8 × 64, sparse in the sense described above, containing 348 non-empty pixels, each described by 64 feature channels. FM_v is a three-dimensional sparse voxel map of size 5 × 200 × 176 (i.e., these 5 × 200 × 176 voxels comprise both empty and non-empty voxels), containing 10632 non-empty voxels, each described by 64 feature channels.
The pixel features are encoded onto the point cloud using a bilinear interpolation algorithm based on the four nearest neighbors, yielding the pixel point cloud features f_pp. The calculation is shown in equation (6), where p_ul, p_ur, p_bl and p_br denote the four sparse pixels in the neighborhood of the encoding point p_i ∈ P, and the subscripts x and y denote the coordinates along the width and height directions of the pixel feature map, respectively:

f_pp(p_i) = f(p_ul)(p_br,x − p_i,x)(p_br,y − p_i,y) + f(p_ur)(p_i,x − p_ul,x)(p_br,y − p_i,y) + f(p_bl)(p_br,x − p_i,x)(p_i,y − p_ul,y) + f(p_br)(p_i,x − p_ul,x)(p_i,y − p_ul,y)    (6)

The voxel features are encoded onto the point cloud positions using a trilinear interpolation algorithm based on the inverse distance weighting method, yielding the voxel point cloud features f_pv. The calculation is shown in equation (7), where j is the index of a voxel, f_vj is the feature of the j-th voxel, w_j(p_i) is the inverse-distance weight between the j-th voxel and the encoding point p_i ∈ P computed by equation (8), r_itp is the search radius of the inverse distance weighting, and η(v_j) and η(p_i) are the three-dimensional coordinates of the j-th voxel and the i-th point, respectively:

f_pv(p_i) = Σ_j w_j(p_i)·f_vj / Σ_j w_j(p_i)    (7)

w_j(p_i) = 1/‖η(v_j) − η(p_i)‖ for ‖η(v_j) − η(p_i)‖ < r_itp, and 0 otherwise    (8)
And step five, sending the two point cloud characteristics into a characteristic fusion network to obtain point cloud fusion characteristics.
Fig. 5 shows the constructed feature fusion network; the parameters of the fully connected layers are shown in Table 2, and fully connected blocks 1 and 2 each consist of three fully connected layers, two batch normalization layers and two ReLU activation function layers. First, the pixel point cloud features and the voxel point cloud features are processed by the two fully connected blocks to obtain the one-dimensional point cloud features f'_pp and f'_pv.
TABLE 2 Feature fusion network structure and parameters
Fully connected layer      Feature input/output dimensions
Fully connected block 1    (64, 64)
Fully connected block 2    (64, 64)
Then, the obtained one-dimensional point cloud features are processed with a sigmoid function, which compresses them to the range [0, 1] and yields the point cloud weights. The sigmoid function is shown in equation (9):

w_pp = 1/(1 + e^(−f'_pp)),  w_pv = 1/(1 + e^(−f'_pv))    (9)

wherein w_pp is the pixel point cloud feature weight and w_pv is the voxel point cloud feature weight.
Finally, the two point cloud features are fused using the point cloud weights to obtain the point cloud fusion feature f_fuse, whose expression is shown in equation (10):

f_fuse = [f_pp(1 + w_pp), f_pv(1 + w_pv)]    (10)
and step six, obtaining a multi-sensor fusion voxel characteristic diagram from the point cloud fusion characteristics and sending the multi-sensor fusion voxel characteristic diagram into a subsequent network for a three-dimensional target detection task.
The point cloud fusion features are inverse-mapped, using the trilinear interpolation method based on inverse distance weighting shown in FIG. 6, to the positions of the non-empty voxels of the feature extraction network output (the voxel feature map FM_v at down-sampling scale 8). The calculation is shown in equation (11), where i is the index of a point, w_i(v_j) is the inverse-distance weight between the i-th point and the voxel v_j ∈ FM_v computed by equation (12), r_itp is the search radius of the inverse distance weighting, and η(v_j) and η(p_i) are the three-dimensional coordinates of the j-th voxel and the i-th point, respectively:

f'_fuse(v_j) = Σ_i w_i(v_j)·f_fuse(p_i) / Σ_i w_i(v_j)    (11)

w_i(v_j) = 1/‖η(p_i) − η(v_j)‖ for ‖η(p_i) − η(v_j)‖ < r_itp, and 0 otherwise    (12)

The features of the non-empty voxels of the voxel feature map FM_v are then replaced by f'_fuse, generating the multi-sensor fusion voxel feature map; here f'_fuse denotes the fusion features inverse-mapped to the non-empty voxel positions of FM_v. Finally, this feature map is fed into a subsequent network to perform the target detection task.
All experiments in this example were carried out on the same experimental platform (NVIDIA GeForce RTX 2080 Ti graphics card, 64 GB of memory). The training set and the validation set are taken from the training samples of the public KITTI dataset, with 3712 samples in the training set and 3769 samples in the validation set; the vehicles in the validation set are divided into three difficulty levels, easy, moderate and hard, according to the size of the vehicle bounding box and the degree of occlusion. The training batch size is 2, the learning rate is 0.00125, and the number of training epochs is 50. Two sets of experiments were performed in this embodiment: one with the original PV-RCNN target detection network, and one in which the voxel feature map generation network of the original PV-RCNN is replaced by the multi-sensor fusion voxel feature map generation network for three-dimensional target detection; the results are shown in Table 3. Table 3 evaluates the car detection results from the three-dimensional viewpoint using average precision: a detection is considered correct if the intersection over union between the detected car bounding box and the ground truth is greater than 70%, and incorrect otherwise.
As shown in Table 3, the method of the present invention slightly reduces the average precision for easy-difficulty vehicles, but effectively improves the detection accuracy for moderate-difficulty vehicles, which is the most important and most commonly used category in three-dimensional target detection evaluation, and it also has a beneficial effect on the detection of hard-difficulty vehicles. Overall, using the method of the present invention, the three-dimensional target detection accuracy of the PV-RCNN target detection network is improved by 0.19%, which verifies the effectiveness of the method.
TABLE 3 PV-RCNN and target detection results of the method of the invention
(The detection results are provided as a table image in the original publication.)
In conclusion, the method establishes the mapping relation between the original image pixels and the laser radar points and the mapping relation between the laser radar points and the forward-looking image pixels, and generates a lightweight sparse image, providing a basis for fast processing of the image information. Meanwhile, the method fuses the features extracted by the two sensor feature extraction networks, constructing environment information features with richer information and providing a good foundation for subsequent computer vision tasks. The method uses linear interpolation both when encoding the point cloud features and when generating the multi-sensor fusion voxel feature map, which is computationally more efficient than existing point-set abstraction algorithms. Compared with methods that use the original image as input, the feature map generated by the method fuses additional three-dimensional coordinate information; compared with methods that use only the point cloud as input, the method fuses additional image RGB information.
Example 2:
referring to fig. 7, the present invention provides a multi-sensor fused voxel feature map generating system, which includes:
the data acquisition module 1 is used for acquiring laser radar points and original images;
the first mapping module 2 is configured to map pixels of the original image to corresponding lidar points according to a mapping relationship between the original image and the lidar points to obtain first mapping data;
the second mapping module 3 is configured to map the first mapping data to pixels of a forward-looking image according to a mapping relationship between the lidar point and the forward-looking image, so as to obtain second mapping data;
the sparse coding module 4 is used for carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
the voxelization module 5 is configured to voxelize the laser radar point to obtain a three-dimensional voxel of the laser radar point;
the feature extraction module 6 is configured to perform feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain a pixel feature and a voxel feature;
the point cloud characteristic acquisition module 7 is used for encoding the pixel characteristic and the voxel characteristic to corresponding point cloud positions to obtain the pixel point cloud characteristic and the voxel point cloud characteristic;
the feature fusion module 8 is configured to perform feature fusion on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature;
and the inverse mapping module 9 is used for performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A multi-sensor fusion voxel characteristic map generation method is characterized by comprising the following steps:
acquiring a laser radar point and an original image;
mapping pixels of the original image to corresponding laser radar points according to the mapping relation between the original image and the laser radar points to obtain first mapping data;
mapping the first mapping data to pixels of a forward-looking image according to the mapping relation between the laser radar point and the forward-looking image to obtain second mapping data;
carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
performing voxelization on the laser radar point to obtain a three-dimensional voxel of the laser radar point;
carrying out feature extraction on the lightweight sparse image and the three-dimensional voxel to obtain pixel features and voxel features;
coding the pixel characteristics and the voxel characteristics to corresponding point cloud positions to obtain pixel point cloud characteristics and voxel point cloud characteristics;
performing feature fusion on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature;
and performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
2. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein mapping pixels of the original image to corresponding lidar points according to a mapping relationship between the original image and the lidar points to obtain first mapping data specifically comprises:
establishing a first mapping relation between the original image and the laser radar points according to a calibration matrix between a camera and the laser radar; the first mapping relationship is:
[u_r, v_r, 1]^T ∝ I · E · [x, y, z, 1]^T
wherein x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, u_r is the abscissa of the pixel in the original image, v_r is the ordinate of the pixel in the original image, I is the camera intrinsic matrix, and E is the extrinsic matrix.
3. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein mapping the first mapping data to pixels of a forward-looking image according to a mapping relationship between the lidar point and the forward-looking image to obtain second mapping data specifically comprises:
establishing a second mapping relation between the laser radar point and the forward-looking image by utilizing a spherical projection principle; the second mapping relation is as follows:
u = (1/2)·[1 − arctan(y, x)/π]·w
v = [1 − (arcsin(z/√(x² + y² + z²)) + fov_down)/fov]·h
wherein u is the abscissa of the pixel in the forward-looking image, v is the ordinate of the pixel in the forward-looking image, x is the coordinate in the depth dimension of the point cloud, y is the coordinate in the width dimension of the point cloud, z is the coordinate in the height dimension of the point cloud, r is the reflection intensity, w is the width of the forward-looking image, h is the height of the forward-looking image, fov_down is the vertical view angle below the laser radar, and fov is the total vertical view angle of the laser radar;
and mapping the first mapping data to a forward-looking image pixel according to the second mapping relation to obtain second mapping data.
4. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein the sparse coding is performed on the second mapping data to generate a lightweight sparse image, and specifically comprises:
judging whether any mapping data in the second mapping data is obtained by mapping a plurality of mapping data in the first mapping data, and obtaining a first judgment result;
if the first judgment result is yes, taking an average value of the plurality of mapping data in the first mapping data as a code value of any one mapping data in the second mapping data;
judging whether any mapping data in the second mapping data is obtained by mapping one mapping data in the first mapping data to obtain a second judgment result;
if the second judgment result is yes, taking the corresponding mapping data in the first mapping data as the coding value of any mapping data in the second mapping data;
and if the first judgment result and the second judgment result are both negative, not coding.
5. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein the laser radar point is voxelized to obtain a three-dimensional voxel of the laser radar point, and specifically comprises:
judging whether any voxel in the voxels contains a plurality of laser radar points to obtain a third judgment result;
if the third judgment result is yes, using the average value of the plurality of laser radar points as the coding value of any voxel in the voxels;
judging whether any voxel in the voxels contains one laser radar point or not to obtain a fourth judgment result;
if the fourth judgment result is yes, using the corresponding laser radar point as the coding value of any one of the voxels;
and if the third judgment result and the fourth judgment result are both negative, not coding.
6. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein feature extraction is performed on the lightweight sparse image and the three-dimensional voxel to obtain pixel features and voxel features, and specifically comprises:
respectively constructing a two-dimensional sparse convolution feature extraction network and a three-dimensional sparse convolution feature extraction network;
performing feature extraction on the lightweight sparse image by using the feature extraction network of the two-dimensional sparse convolution to obtain pixel features;
and performing feature extraction on the three-dimensional voxels by using the feature extraction network of the three-dimensional sparse convolution to obtain voxel features.
7. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein the pixel features and the voxel features are encoded to corresponding point cloud positions to obtain pixel point cloud features and voxel point cloud features, and specifically comprises:
encoding the pixel characteristics to the point cloud positions by using a bilinear interpolation algorithm based on the four nearest neighbors to obtain the pixel point cloud characteristics;
and encoding the voxel characteristics to the point cloud positions by using a trilinear interpolation algorithm based on the inverse distance weighting method to obtain the voxel point cloud characteristics.
8. The multi-sensor fusion voxel feature map generation method according to claim 1, wherein feature fusion is performed on the pixel point cloud feature and the voxel point cloud feature to obtain a point cloud fusion feature, and specifically comprises:
processing the pixel point cloud feature by using a first full-connection block to obtain a one-dimensional pixel point cloud feature; the first full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the voxel point cloud characteristics by using a second full connecting block to obtain one-dimensional voxel point cloud characteristics; the second full connection block comprises three full connection layers, two batch normalization layers and two ReLU activation function layers;
processing the one-dimensional pixel point cloud characteristic and the one-dimensional voxel point cloud characteristic by using a sigmoid function to obtain a pixel point cloud characteristic weight and a voxel point cloud characteristic weight; the sigmoid function is as follows:
w_pp = 1/(1 + e^(−f'_pp)),  w_pv = 1/(1 + e^(−f'_pv))
wherein w_pp is the pixel point cloud characteristic weight, w_pv is the voxel point cloud characteristic weight, f'_pp is the one-dimensional pixel point cloud characteristic, and f'_pv is the one-dimensional voxel point cloud characteristic;
fusing the pixel point cloud characteristic and the voxel point cloud characteristic by using the pixel point cloud characteristic weight and the voxel point cloud characteristic weight to obtain a point cloud fusion characteristic; the fusion expression is:
f_fuse = [f_pp(1 + w_pp), f_pv(1 + w_pv)]
wherein f_fuse is the point cloud fusion characteristic, f_pp is the pixel point cloud characteristic, and f_pv is the voxel point cloud characteristic.
9. The method for generating a multi-sensor fusion voxel feature map according to claim 1, wherein the point cloud fusion feature is subjected to inverse mapping to obtain the multi-sensor fusion voxel feature map, and specifically comprises:
using a trilinear interpolation method based on inverse distance weighting to inversely map the point cloud fusion feature to the positions of the non-empty voxels in the voxel feature to obtain the replacement point cloud fusion feature;
and replacing the non-empty voxel characteristics of the corresponding positions in the voxel characteristics by the replacement point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
10. A multi-sensor fused voxel signature generation system, comprising:
the data acquisition module is used for acquiring laser radar points and original images;
the first mapping module is used for mapping the pixels of the original image to corresponding laser radar points according to the mapping relation between the original image and the laser radar points to obtain first mapping data;
the second mapping module is used for mapping the first mapping data to pixels of a forward-looking image according to the mapping relation between the laser radar point and the forward-looking image to obtain second mapping data;
the sparse coding module is used for carrying out sparse coding on the second mapping data to generate a lightweight sparse image;
the voxelization module is used for voxelizing the laser radar point to obtain a three-dimensional voxel of the laser radar point;
the characteristic extraction module is used for carrying out characteristic extraction on the lightweight sparse image and the three-dimensional voxel to obtain pixel characteristics and voxel characteristics;
the point cloud characteristic acquisition module is used for coding the pixel characteristic and the voxel characteristic to corresponding point cloud positions to obtain the pixel point cloud characteristic and the voxel point cloud characteristic;
the characteristic fusion module is used for carrying out characteristic fusion on the pixel point cloud characteristic and the voxel point cloud characteristic to obtain a point cloud fusion characteristic;
and the inverse mapping module is used for performing inverse mapping on the point cloud fusion characteristics to obtain a multi-sensor fusion voxel characteristic map.
CN202111597823.3A 2021-12-24 2021-12-24 Multi-sensor fusion voxel characteristic map generation method and system Pending CN114332796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111597823.3A CN114332796A (en) 2021-12-24 2021-12-24 Multi-sensor fusion voxel characteristic map generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111597823.3A CN114332796A (en) 2021-12-24 2021-12-24 Multi-sensor fusion voxel characteristic map generation method and system

Publications (1)

Publication Number Publication Date
CN114332796A true CN114332796A (en) 2022-04-12

Family

ID=81013374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111597823.3A Pending CN114332796A (en) 2021-12-24 2021-12-24 Multi-sensor fusion voxel characteristic map generation method and system

Country Status (1)

Country Link
CN (1) CN114332796A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115240249A (en) * 2022-07-07 2022-10-25 湖北大学 Feature extraction classification measurement learning method and system for face recognition and storage medium
CN115240249B (en) * 2022-07-07 2023-06-06 湖北大学 Feature extraction classification metric learning method, system and storage medium for face recognition
CN115471561A (en) * 2022-11-14 2022-12-13 科大讯飞股份有限公司 Object key point positioning method, cleaning robot control method and related equipment

Similar Documents

Publication Publication Date Title
CN110738697B (en) Monocular depth estimation method based on deep learning
CN110570429B (en) Lightweight real-time semantic segmentation method based on three-dimensional point cloud
CN114724120B (en) Vehicle target detection method and system based on radar vision semantic segmentation adaptive fusion
CN111476242B (en) Laser point cloud semantic segmentation method and device
CN113284163B (en) Three-dimensional target self-adaptive detection method and system based on vehicle-mounted laser radar point cloud
CN112949633B (en) Improved YOLOv 3-based infrared target detection method
CN114332796A (en) Multi-sensor fusion voxel characteristic map generation method and system
CN112347987A (en) Multimode data fusion three-dimensional target detection method
CN113269040A (en) Driving environment sensing method combining image recognition and laser radar point cloud segmentation
CN113095152B (en) Regression-based lane line detection method and system
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN114463736A (en) Multi-target detection method and device based on multi-mode information fusion
CN112270694B (en) Method for detecting urban environment dynamic target based on laser radar scanning pattern
CN114298151A (en) 3D target detection method based on point cloud data and image data fusion
CN112288667A (en) Three-dimensional target detection method based on fusion of laser radar and camera
CN116612468A (en) Three-dimensional target detection method based on multi-mode fusion and depth attention mechanism
CN116279592A (en) Method for dividing travelable area of unmanned logistics vehicle
TW202225730A (en) High-efficiency LiDAR object detection method based on deep learning through direct processing of 3D point data to obtain a concise and fast 3D feature to solve the shortcomings of complexity and time-consuming of the current voxel network model
CN116246119A (en) 3D target detection method, electronic device and storage medium
CN115115917A (en) 3D point cloud target detection method based on attention mechanism and image feature fusion
CN117475428A (en) Three-dimensional target detection method, system and equipment
CN112924037A (en) Infrared body temperature detection system and detection method based on image registration
CN117115359A (en) Multi-view power grid three-dimensional space data reconstruction method based on depth map fusion
CN116704307A (en) Target detection method and system based on fusion of image virtual point cloud and laser point cloud
CN116797894A (en) Radar and video fusion target detection method for enhancing characteristic information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination