CN115393680A - 3D target detection method and system for multi-mode information space-time fusion in foggy day scene - Google Patents
3D target detection method and system for multi-mode information space-time fusion in foggy day scene
- Publication number
- CN115393680A (application number CN202210945302.0A)
- Authority
- CN
- China
- Prior art keywords
- space
- point cloud
- time
- millimeter wave
- laser radar
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a 3D target detection method with multi-modal information space-time fusion for foggy scenes, comprising the following steps: acquire laser radar and millimeter wave radar point cloud data in a foggy scene and preprocess each; perform space-time feature matching, resample the laser radar point cloud features with the aid of the millimeter wave point cloud features, and fuse the resampled point clouds over the time sequence to obtain preliminary space-time fusion features under the bird's-eye view; input these features into a self-attention-based Transformer encoder, which performs feature encoding in a self-attention manner to obtain higher-dimensional space-time semantic features; input those features into a target classification detection head and a bounding box regression detection head, and output the final detection result, comprising the object category and position in space. The method effectively fuses the laser radar and millimeter wave radar point clouds so that their complementary data representations reinforce each other, thereby achieving robust and efficient 3D target detection in foggy scenes.
Description
Technical Field
The invention relates to the field of environment perception in autonomous driving, and in particular to a 3D target detection method and system with multi-modal information space-time fusion for foggy scenes.
Background
In recent years, deploying high-level autonomous driving has become a major challenge for the field. 3D target detection is a key research direction within it, and its central difficulty is all-weather, multi-scene detection: accurately recognizing surrounding objects under any weather condition. Today, target detection for autonomous vehicles is mostly performed with a multi-sensor configuration, such as cameras, laser radar, and millimeter wave radar. Fusing multiple sensors overcomes the system failures caused by the occasional failure of a single sensor and produces more accurate detection results than any single sensor alone.
Existing multi-sensor fusion detection methods mainly complete the perception task with laser radar, cameras, and the like, which in good weather produce fine-grained point clouds or high-resolution images carrying rich, redundant visual information. However, these visual sensors are sensitive to weather: in bad weather such as fog, opaque particles scatter the light and significantly reduce the sensing range of the laser radar and camera, making the detection results unreliable.
Millimeter wave radar, by contrast, is inexpensive and widely deployed compared with laser radar and cameras, and its millimeter-wave signal has a wavelength much larger than the particles of fog, rain, and snow, so it easily penetrates or diffracts around them. Millimeter wave radar data are therefore little affected by rain and fog, and fusing millimeter wave radar with other sensors can support a robust 3D target detection task in foggy scenes.
Disclosure of Invention
The invention mainly aims to mitigate the interference caused by weather conditions and by target motion and occlusion, and to achieve robust and efficient 3D target detection in foggy scenes.
The technical scheme adopted by the invention is as follows:
The 3D target detection method with multi-modal information space-time fusion for foggy scenes comprises the following steps:
S1, acquiring laser radar point cloud data and millimeter wave radar point cloud data in a foggy scene and preprocessing each;
S2, performing space-time feature matching on the preprocessed multi-frame laser radar and millimeter wave radar point clouds, resampling the laser radar point cloud features with the aid of the millimeter wave point cloud features, and then fusing over the time sequence to obtain preliminary space-time fusion features under the bird's-eye view;
S3, inputting the space-time fusion features under the bird's-eye view into a self-attention-based Transformer encoder, which performs feature encoding in a self-attention manner to obtain space-time semantic features of the same size as the original feature map but of higher dimension;
and S4, inputting the higher-dimensional space-time semantic features into the two branches of a target classification detection head and a bounding box regression detection head, and outputting the final detection result, comprising the object category and position in space.
In connection with the above technical solution, the laser radar point cloud data are extracted into voxels, and the millimeter wave radar point cloud is preprocessed in PointNet fashion.
In connection with the above technical solution, in step S2 the millimeter wave radar point cloud is converted into the laser radar coordinate system and matched with the voxels, after which all voxels and point cloud features have their spatial positions converted and are projected onto the bird's-eye view.
In connection with the above technical solution, step S2 specifically takes each millimeter wave radar point as a center, searches with KNN for laser radar voxels within a certain range, and randomly samples them; finally, the selected laser radar voxels are associated and feature-spliced with the millimeter wave radar points to obtain enhanced fusion features.
In connection with the above technical solution, step S3 specifically comprises the following steps:
taking 2.5 m × 2.5 m as the size of the space-time window, and taking all voxel features at different moments but the same position as the elements of that window;
inputting the divided 40 × 40 space-time windows as one batch into the self-attention-based Transformer encoder for feature encoding, outputting high-dimensional semantic space-time features;
and remapping the high-dimensional semantic features, by means of the voxel coordinates, into a regularly rasterized feature map under the bird's-eye view.
In connection with the above technical solution, step S4 specifically comprises the following steps:
arranging reference boxes oriented at 0° and 90° at each position of the high-dimensional space-time semantic feature map;
and inputting the space-time semantic feature map with the arranged reference boxes into the fully connected layers of the two branches, the target classification detection head and the bounding box regression detection head, to obtain object classification scores and predicted boxes, and filtering out detection boxes whose scores fall below a threshold to keep high-quality detection boxes.
In connection with the above technical solution, the size of each reference box is taken from the average of the annotated data of a given category in the dataset, which reduces the difficulty of network learning.
The invention also provides a 3D target detection system with multi-modal information space-time fusion for foggy scenes, comprising:
the preprocessing module is used for acquiring laser radar point cloud data and millimeter wave radar point cloud data in a foggy weather scene and respectively preprocessing the laser radar point cloud data and the millimeter wave radar point cloud data;
the space-time feature matching module is used for performing space-time feature matching on the preprocessed multi-frame laser radar point cloud and the millimeter wave radar point cloud, resampling the laser radar point cloud features by means of the millimeter wave point cloud features, and then further performing fusion on a time sequence to obtain a preliminary space-time fusion feature under a bird's-eye view angle;
the feature encoding module is used for inputting the space-time fusion features under the bird's-eye view into a self-attention-based Transformer encoder, which performs feature encoding in a self-attention manner to obtain space-time semantic features of the same size as the original feature map but of higher dimension;
and the classification module is used for inputting the higher-dimensional space-time semantic features into the two branches of the target classification detection head and the bounding box regression detection head and outputting the final detection result, comprising the object category and position in space.
The invention also provides a computer storage medium storing a computer program executable by a processor; when executed, the program performs the above 3D target detection method with multi-modal information space-time fusion in a foggy scene.
The invention also provides a vehicle-mounted foggy-scene target detection system comprising a data collector, a vehicle-mounted storage and computing platform, and vehicle actuators. The data collector comprises a laser radar, a millimeter wave radar, and vehicle data sensors; the vehicle-mounted storage and computing platform contains the computer storage medium described above; and the vehicle actuators execute corresponding actions according to control instructions output by the platform.
The invention has the following beneficial effects: it uses the perception robustness of millimeter wave radar in fog to enhance the laser radar point cloud features while still localizing and detecting targets from the laser radar point cloud, thereby compensating for the millimeter wave radar's large height-measurement error and low data resolution and fully combining the advantages of the two sensors in foggy scenes. In addition, multi-frame data are fused across space and time, further strengthening the data representation and partly mitigating interference from weather and from target motion and occlusion, so that robust and efficient 3D target detection in foggy scenes is achieved.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a first flowchart of a 3D target detection method of multi-modal information space-time fusion in a foggy day scene according to an embodiment of the invention;
FIG. 2 is a second flowchart of the 3D target detection method with multi-modal information space-time fusion in a foggy scene according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a 3D target detection system for multi-modal information space-time fusion in a foggy day scene according to an embodiment of the invention;
FIG. 4 is a block diagram of a vehicle-mounted foggy day scene target detection system implemented in accordance with the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The method is mainly used to improve target detection performance under severe weather such as fog and to improve the reliability of a 3D target detection system in extreme weather.
As shown in fig. 1, the 3D target detection method with multi-modal information space-time fusion in a foggy scene according to an embodiment of the invention mainly relies on a space-time point cloud feature fusion detection mechanism between the laser radar and the millimeter wave radar, and mainly comprises the following steps:
s1, spatial point cloud data are obtained and preprocessed respectively, wherein the spatial point cloud data comprise laser radar point cloud data and millimeter wave radar point cloud. The laser radar point cloud data volume is large, the laser radar point cloud data volume can be extracted into voxel, the millimeter wave radar point cloud is too sparse, and a pointNet form can be adopted for preprocessing.
S2, space-time feature matching and adaptive sampling: space-time feature matching is performed on the preprocessed multi-frame laser radar and millimeter wave radar point clouds. Since the millimeter wave radar point cloud suffers little interference in fog, the laser radar point cloud features are resampled with the aid of the millimeter wave point cloud features and then further fused over the time sequence to obtain preliminary space-time fusion features under the bird's-eye view.
S3, space-time feature encoding: the matched and resampled fusion features are input into a Transformer encoder, which performs feature encoding in a self-attention manner to strengthen global dependencies and finally outputs space-time semantic features of the same size as the original feature map but of higher dimension.
S4, target classification and bounding box regression: the higher-dimensional space-time semantic features under the bird's-eye view are input into the two branches of the target classification detection head and the bounding box regression detection head, which output the final detection result, namely the object category and position in space.
A specific implementation flow of the foggy day scene target detection method according to another embodiment of the present invention is shown in fig. 2, and may specifically include the following steps:
S100, spatial point cloud data are acquired and preprocessed separately; this specifically comprises the following steps:
S110, grid division: the detection range is set according to the specific scene and the sensor mounting position; in this embodiment, [-50 m, 50 m], [-40 m, 40 m], and [-3 m, 5 m] are used as the perception ranges along the X, Y, and Z directions of the laser radar coordinate system. The point cloud within this range is divided into grids of equal size at a resolution of 0.25 m × 0.25 m × 8 m, and preliminary features are extracted from the millimeter wave radar point cloud in PointNet fashion.
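The grid division above can be sketched in a few lines of numpy. The ranges follow the embodiment; the 0.25 m × 0.25 m × 8 m voxel resolution and the `voxelize` helper name are assumptions for illustration only.

```python
import numpy as np

# Perception ranges from the embodiment (lidar frame, meters).
X_RANGE = (-50.0, 50.0)
Y_RANGE = (-40.0, 40.0)
Z_RANGE = (-3.0, 5.0)
# Assumed voxel size: 0.25 m x 0.25 m in the plane, full 8 m height.
VOXEL = np.array([0.25, 0.25, 8.0])

def voxelize(points):
    """Map N x 3 lidar points to integer voxel indices,
    dropping points outside the perception range."""
    lo = np.array([X_RANGE[0], Y_RANGE[0], Z_RANGE[0]])
    hi = np.array([X_RANGE[1], Y_RANGE[1], Z_RANGE[1]])
    mask = np.all((points >= lo) & (points < hi), axis=1)
    idx = np.floor((points[mask] - lo) / VOXEL).astype(np.int64)
    return points[mask], idx

pts = np.array([[0.0, 0.0, 0.0],      # center of the range
                [120.0, 0.0, 0.0],    # outside, discarded
                [-49.9, -39.9, -2.9]])  # near the range corner
kept, idx = voxelize(pts)
```

With the assumed resolution this yields a 400 × 320 BEV grid with a single vertical cell, so each voxel is directly a bird's-eye-view pixel.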
S120, point cloud grouping and feature aggregation: the laser radar point cloud is grouped according to the grid divided in S110, and the grouped points are aggregated per grid cell to obtain voxel features. To balance computational cost against feature robustness, the voxel features are generated by combining average pooling and max pooling.
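The combined average/max pooling of S120 might look like the following minimal sketch; concatenating the two pooled vectors per voxel is a simplifying assumption about the channel layout.

```python
import numpy as np

def voxel_feature(group):
    """Aggregate the points of one voxel by concatenating
    average pooling and max pooling over the point axis."""
    return np.concatenate([group.mean(axis=0), group.max(axis=0)])

# Two points with 2 channels each -> one 4-channel voxel feature.
g = np.array([[1.0, 2.0],
              [3.0, 4.0]])
f = voxel_feature(g)
```

Average pooling keeps the voxel feature stable under noise, while max pooling preserves the strongest responses; the concatenation keeps both at twice the channel count.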
S200, space-time feature matching and adaptive sampling: space-time feature matching is performed on the processed multi-frame laser radar voxel features and the millimeter wave radar point cloud. Since the millimeter wave radar point cloud suffers little interference in fog, the laser radar point cloud features are resampled with the aid of the millimeter wave point cloud features and then further fused over the time sequence to obtain preliminary space-time fusion features under the bird's-eye view. Step S200 specifically comprises:
S210, unified coordinate system and bird's-eye-view conversion: the millimeter wave radar point cloud is converted into the laser radar coordinate system and matched with the laser radar voxels. All voxels and point cloud features then have their spatial positions converted and are projected onto the bird's-eye view.
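A minimal sketch of the coordinate unification and bird's-eye-view projection of S210; the extrinsic rotation `R` and translation `t` are illustrative placeholders rather than calibrated values.

```python
import numpy as np

def radar_to_lidar(points, R, t):
    """Rigidly transform N x 3 radar points into the lidar frame."""
    return points @ R.T + t

def to_bev(points_xyz):
    """Keep only (x, y): height is dropped for the bird's-eye view."""
    return points_xyz[:, :2]

# Placeholder extrinsic: identity rotation, small translation.
R = np.eye(3)
t = np.array([0.5, 0.0, -0.2])
p = radar_to_lidar(np.array([[1.0, 2.0, 0.0]]), R, t)
bev = to_bev(p)
```

Once both modalities share the lidar frame and the BEV plane, voxel/point matching reduces to a 2D nearest-neighbor problem.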
S220, feature resampling and enhancement: taking each millimeter wave radar point as a center, KNN is used to search for laser radar voxels within a certain range, and multi-scale random sampling is performed according to the statistical relationship between valid laser radar voxels and millimeter wave radar points; finally, the selected laser radar voxels are associated and feature-spliced with the millimeter wave radar points to obtain enhanced fusion features. The statistical relationship means that at a given fog density the amounts of laser radar and millimeter wave radar data stand in a certain proportion. Multi-scale means that the KNN search can be run over multiple ranges, sampling different amounts of data at each range.
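The KNN search and feature splicing of S220 could be sketched as below. This single-scale version with the hypothetical helper `knn_resample` omits the multi-scale random sampling and the fog-density statistics described above.

```python
import numpy as np

def knn_resample(radar_xy, voxel_xy, voxel_feat, radar_feat,
                 k=2, radius=5.0):
    """For each radar point, gather up to k nearest lidar voxels
    within `radius` and splice the radar feature onto each matched
    voxel feature (brute-force distances; no random subsampling)."""
    fused = []
    for i, c in enumerate(radar_xy):
        d = np.linalg.norm(voxel_xy - c, axis=1)
        near = np.argsort(d)[:k]
        near = near[d[near] <= radius]
        for j in near:
            fused.append(np.concatenate([voxel_feat[j], radar_feat[i]]))
    return np.array(fused)

voxel_xy = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
voxel_feat = np.array([[1.0], [2.0], [3.0]])
fused = knn_resample(np.array([[0.0, 0.0]]),   # one radar point
                     voxel_xy, voxel_feat,
                     np.array([[7.0]]))        # its radar feature
```

A production version would use a KD-tree for the neighbor search and run it at several radii, with a sampling budget per radius tied to the observed lidar/radar data ratio.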
S230, the multi-frame point clouds and voxels are processed in the above manner and arranged in timestamp order to obtain the space-time fusion features.
S300, space-time feature encoding: the matched and resampled space-time fusion features are input into a Transformer encoder, which performs feature encoding in a self-attention manner to strengthen global dependencies and finally outputs space-time semantic features of the same size as the original feature map but of higher dimension. Step S300 specifically comprises:
S310, space-time window division and feature encoding: specifically, 2.5 m × 2.5 m is taken as the size of the space-time window, and all voxel features from S230 at different moments but the same window position serve as the elements of that window. The elements of each window are then input into the self-attention-based Transformer encoder. Through the global dependencies of the self-attention mechanism, the encoder can, on one hand, learn all geometric and positional features of the environment and targets at the current spatial position and, on the other hand, model the temporal dependency of the same target over a period of time, fully exploiting historical frames to strengthen the feature representation and thereby resist fog interference.
S320, space-time feature batching: the divided 40 × 40 space-time windows are input as one batch into the self-attention-based Transformer encoder and all encoded in the same way, outputting high-dimensional semantic space-time features; this reduces computation while improving the global dependency of feature extraction.
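A toy numpy stand-in for the batched window encoding of S310/S320. A real implementation would use learned Q/K/V projections, multiple heads, and feed-forward layers; here identity projections keep the sketch short while preserving the attention shape arithmetic.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x):
    """Single-head self-attention over a batch of windows,
    x: (batch, elements_per_window, channels). Q = K = V = x
    stands in for the learned projections of a real encoder."""
    q, k, v = x, x, x
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1]))
    return attn @ v

# 40 x 40 windows processed as one batch, each with
# 5 voxel elements of dimension 8 (sizes are illustrative).
batch = np.zeros((1600, 5, 8))
out = window_self_attention(batch)
```

Because attention mixes only the elements inside each window, the 1600 windows are independent along the batch axis, which is exactly what lets them be encoded in one pass.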
S330, feature re-rasterization: the high-dimensional semantic features output by S320 are an unordered set of element features, but their form differs little from the voxel representation, so by means of the voxel coordinates they can be remapped into a regularly rasterized feature map under the bird's-eye view.
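The re-rasterization of S330 amounts to scattering unordered features back onto the BEV grid by their voxel coordinates, roughly as follows; the `rasterize` helper name and the zero-fill of empty cells are assumptions.

```python
import numpy as np

def rasterize(features, coords, grid_hw):
    """Scatter M x C unordered voxel features onto an H x W x C
    BEV feature map using their (row, col) coordinates; cells
    without a voxel stay zero."""
    H, W = grid_hw
    C = features.shape[1]
    bev = np.zeros((H, W, C))
    bev[coords[:, 0], coords[:, 1]] = features
    return bev

f = np.array([[1.0], [2.0]])       # two encoded voxel features
c = np.array([[0, 1], [2, 3]])     # their BEV grid coordinates
bev = rasterize(f, c, (4, 4))
```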
S400, target classification and bounding box regression: the high-dimensional semantic features under the bird's-eye view are input into the two branches of the target classification detection head and the bounding box regression detection head, which output the final detection result, namely the object category and position in space. This comprises the following steps:
S410, setting regression reference boxes: reference boxes oriented at 0° and 90° are arranged at each position of the feature map obtained in S330; the size of each reference box is taken from the average of the annotated data of a given category in the dataset, which reduces the difficulty of network learning.
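Reference-box placement as in S410 can be sketched like this; the 4.6 m × 1.8 m "car" size is an invented stand-in for the dataset average, not a value from the patent.

```python
import numpy as np

def make_anchors(grid_hw, cell, size=(4.6, 1.8)):
    """Place two reference boxes (yaw 0 and 90 degrees) at the
    center of every BEV cell; rows are (x, y, length, width, yaw)."""
    H, W = grid_hw
    ys, xs = np.mgrid[0:H, 0:W]
    centers = np.stack([(xs + 0.5) * cell, (ys + 0.5) * cell], axis=-1)
    anchors = []
    for yaw in (0.0, np.pi / 2):
        a = np.concatenate(
            [centers,
             np.broadcast_to(np.array(size), (H, W, 2)),
             np.full((H, W, 1), yaw)], axis=-1)
        anchors.append(a)
    return np.stack(anchors, axis=2).reshape(-1, 5)

# Tiny 2 x 2 grid at 0.25 m per cell -> 8 anchors.
anchors = make_anchors((2, 2), cell=0.25)
```

Fixing the anchor size to a per-category average means the regression branch only has to learn small residuals, which is what "reduces the difficulty of network learning" refers to.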
S420, target classification and bounding box regression: the feature map obtained in S330 is input into the fully connected layers of the classification branch and the bounding box regression branch to obtain object class scores and predicted boxes, and detection boxes whose scores fall below a threshold are filtered out, leaving high-quality detection boxes.
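The score-based filtering at the end of S420 reduces to a boolean mask over the predictions; the 0.5 threshold below is illustrative.

```python
import numpy as np

def score_filter(boxes, scores, thresh=0.5):
    """Keep only predicted boxes whose classification score
    clears the threshold."""
    keep = scores >= thresh
    return boxes[keep], scores[keep]

boxes = np.array([[0.0, 0.0, 4.6, 1.8, 0.0],
                  [5.0, 5.0, 4.6, 1.8, 0.0]])
scores = np.array([0.9, 0.2])
kept, s = score_filter(boxes, scores)
```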
The 3D target detection system with multi-modal information space-time fusion in a foggy scene according to an embodiment of the invention, shown in fig. 3, mainly implements the method of the above embodiment and comprises:
the preprocessing module is used for acquiring laser radar point cloud data and millimeter wave radar point cloud data in a foggy weather scene and respectively preprocessing the laser radar point cloud data and the millimeter wave radar point cloud data;
the space-time feature matching module is used for performing space-time feature matching on the preprocessed multi-frame laser radar point cloud and the millimeter wave radar point cloud, resampling the laser radar point cloud features by means of the millimeter wave point cloud features, and then further fusing in time sequence to obtain preliminary space-time fusion features under the bird's-eye view angle;
the feature encoding module is used for inputting the space-time fusion features under the bird's-eye view into a self-attention-based Transformer encoder, which performs feature encoding in a self-attention manner to obtain space-time semantic features of the same size as the original feature map but of higher dimension;
and the classification module is used for inputting the higher-dimensional space-time semantic features into the two branches of the target classification detection head and the bounding box regression detection head and outputting the final detection result, comprising the object category and position in space.
The present application also provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, a server, or an app store, on which a computer program is stored that performs the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment, when its program is executed by a processor, realizes the 3D target detection method with multi-modal information space-time fusion in a foggy scene.
Based on the above foggy-scene space-time feature fusion target detection method, the invention further constructs a vehicle-mounted foggy-scene target detection system, whose architecture is shown in fig. 4. It comprises sensors for data acquisition (laser radar, millimeter wave radar, vehicle data sensors, and optionally cameras) and a vehicle-mounted storage and computing platform (memory plus a localization-perception computing platform). The sensors communicate with the platform through data transmission interfaces (Ethernet, USB, CAN). The system executes as follows:
(1) The foggy-scene space-time feature fusion target detection algorithm proposed by the invention is converted into instruction code and deployed in the memory of the vehicle-mounted computing platform.
(2) Drivers for the laser radar and millimeter wave radar sensors are configured to parse and forward the sensor data, the format of the forwarded data matching the instruction code of step (1).
(3) On the perception-positioning computing platform, the parsed and forwarded data are processed by the instruction code of step (1); the detection results are written to memory, from which the planning control platform reads them in real time and, together with the positioning and perception results of other algorithms, completes the downstream tasks.
(4) The vehicle actuator executes actions according to the control commands of the downstream tasks.
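The four-step flow above can be sketched as a minimal data pipeline. Everything below — the stage functions, the dictionary fields, and the toy "detector" — is a placeholder standing in for the deployed instruction code, not the patented software:

```python
from queue import Queue

def parse_and_forward(raw_frame):
    # step (2): sensor drivers parse raw data into the format the detector expects
    return {"lidar": raw_frame["lidar"], "radar": raw_frame["radar"]}

def run_detector(frame):
    # step (3): perception compute; a trivial stand-in for the fused 3D detector
    return {"num_objects": min(len(frame["lidar"]), len(frame["radar"]))}

def plan(detection):
    # step (3)/(4): planning control platform turns detections into commands
    return "brake" if detection["num_objects"] > 0 else "cruise"

memory = Queue()  # stands in for the shared on-board memory handoff
for raw in ({"lidar": [1, 2, 3], "radar": [4]}, {"lidar": [], "radar": []}):
    memory.put(run_detector(parse_and_forward(raw)))   # detector writes results

commands = [plan(memory.get()) for _ in range(2)]      # actuator-bound commands
```

The real system moves data over Ethernet/USB/CAN rather than an in-process queue; the queue only illustrates the producer-consumer handoff between the perception and planning platforms.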
In summary, the method exploits the sensing robustness of the millimeter wave radar in fog to enhance the laser radar point cloud features, while still localizing and detecting targets from the laser radar point cloud. This overcomes the millimeter wave radar's large error in height estimation and its low data resolution, and fully combines the strengths of the two sensors in foggy scenes. In addition, multi-frame data are fused across space and time, which further enriches the data representation and mitigates, to some extent, the interference caused by weather conditions and target motion, thereby achieving robust and efficient 3D target detection in foggy scenes.
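A minimal sketch of this radar-guided enhancement step, assuming BEV (x, y) distance for the nearest-neighbor search; the function name, the feature dimensions, and the default `k`, `radius`, and `n_sample` values are illustrative assumptions, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_enhance(voxel_centers, voxel_feats, radar_points, radar_feats,
                k=4, radius=2.5, n_sample=2):
    """For each millimeter-wave radar point, find the k nearest lidar voxels
    within `radius` metres in the BEV plane, randomly sample some of them,
    and splice the radar features onto the sampled voxel features."""
    enhanced_idx, enhanced_feats = [], []
    for p, rf in zip(radar_points, radar_feats):
        d = np.linalg.norm(voxel_centers[:, :2] - p[:2], axis=1)
        near = np.argsort(d)[:k]
        near = near[d[near] <= radius]          # keep only voxels inside the range
        if near.size == 0:
            continue
        picked = rng.choice(near, size=min(n_sample, near.size), replace=False)
        for i in picked:
            enhanced_idx.append(i)
            enhanced_feats.append(np.concatenate([voxel_feats[i], rf]))
    return np.array(enhanced_idx), np.array(enhanced_feats)

# toy data: 100 lidar voxels with 8-dim features, 5 radar points with 3-dim features
centers = rng.uniform(0, 50, size=(100, 3))
vfeats = rng.normal(size=(100, 8))
rpoints = centers[rng.choice(100, 5, replace=False)] + rng.normal(scale=0.1, size=(5, 3))
rfeats = rng.normal(size=(5, 3))
idx, fused = knn_enhance(centers, vfeats, rpoints, rfeats)
```

Each fused feature is the concatenation of a lidar voxel feature and a radar point feature, so the fused dimension is the sum of the two.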
It will be appreciated that those skilled in the art may make modifications and variations in light of the above teachings, and all such modifications and variations are intended to fall within the scope of the appended claims.
Claims (10)
1. A 3D target detection method based on multi-modal information space-time fusion in a foggy scene, characterized by comprising the following steps:
S1, obtaining laser radar point cloud data and millimeter wave radar point cloud data in a foggy scene and preprocessing each respectively;
S2, performing space-time feature matching on the preprocessed multi-frame laser radar point clouds and millimeter wave radar point clouds, resampling the laser radar point cloud features by means of the millimeter wave point cloud features, and then further fusing along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view angle;
S3, inputting the space-time fusion features under the bird's-eye view angle into a self-attention-based Transformer encoder and performing feature coding in a self-attention manner to obtain space-time semantic features with the same spatial size as the original feature map but a higher feature dimension;
and S4, inputting the higher-dimensional space-time semantic features into the two branches of the target classification detection head and the frame regression detection head respectively, and outputting the final target detection result, including the object class and its position in space.
2. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 1, characterized in that the laser radar point cloud data are extracted into voxel points, and the millimeter wave radar point cloud is preprocessed in a PointNet manner.
3. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 1, characterized in that in step S2, the millimeter wave radar point cloud is converted into the laser radar coordinate system to be matched with the laser radar voxels, and then all voxels and the spatial positions corresponding to the point cloud features are projected onto the bird's-eye view.
4. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 1, characterized in that in step S2, with each millimeter wave radar point as a center, KNN (K-nearest neighbor) search is adopted to find the laser radar voxels within a certain range, and random sampling is performed; finally, the screened laser radar point cloud voxels are associated and feature-spliced with the millimeter wave radar point cloud to obtain enhanced fusion features.
5. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 1, characterized in that step S3 specifically comprises the following steps:
taking 2.5 m × 2.5 m as the size of a space-time window, and taking all voxel features that lie at the same window position but at different moments as the elements of that space-time window;
inputting the divided 40 × 40 space-time windows together, as one batch, into the self-attention-based Transformer encoder for feature coding to output high-dimensional semantic space-time features;
and remapping the high-dimensional semantic features, by means of the voxel coordinates, into a regularly rasterized feature map under the bird's-eye view angle.
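The window partition described above can be illustrated with a toy sketch. It assumes a 100 m × 100 m BEV area, so that 2.5 m windows form the stated 40 × 40 grid; the number of frames and voxels per frame are made up for the example:

```python
import numpy as np

window, n_win = 2.5, 40   # 2.5 m windows, 40 x 40 grid (from the claim)
rng = np.random.default_rng(0)
# 3 frames of 200 voxel BEV (x, y) coordinates each, covering 100 m x 100 m
frames = [rng.uniform(0, window * n_win, size=(200, 2)) for _ in range(3)]

buckets = {}
for t, coords in enumerate(frames):
    ij = np.floor(coords / window).astype(int)          # window index of each voxel
    for (i, j), xy in zip(map(tuple, ij), coords):
        # same window position, all timestamps pooled together
        buckets.setdefault((i, j), []).append((t, xy))
```

Each bucket then holds the voxels of one space-time window across all frames, ready to be encoded as one token sequence by the Transformer.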
6. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 1, characterized in that step S4 specifically comprises the following steps:
arranging, at each position of the high-dimensional space-time semantic feature map, reference frames oriented at 0 degrees and 90 degrees respectively;
and inputting the space-time semantic feature map with the arranged reference frames into the fully connected layers of the two branches of the target classification detection head and the frame regression detection head to obtain the network's object classification scores and prediction frames, and filtering out, based on the scores, the detection frames below an input threshold to obtain high-quality detection frames.
7. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 6, characterized in that the size of the reference frame is taken as the mean of the labeled data of a given category in the dataset, so as to reduce the difficulty of network learning.
8. A 3D target detection system for multi-modal information space-time fusion in a foggy scene, characterized by comprising:
the preprocessing module, used for obtaining laser radar point cloud data and millimeter wave radar point cloud data in a foggy scene and preprocessing each respectively;
the space-time feature matching module, used for performing space-time feature matching on the preprocessed multi-frame laser radar point clouds and millimeter wave radar point clouds, resampling the laser radar point cloud features by means of the millimeter wave point cloud features, and then further fusing along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view angle;
the feature coding module, used for inputting the space-time fusion features under the bird's-eye view angle into a self-attention-based Transformer encoder and performing feature coding in a self-attention manner to obtain space-time semantic features with the same spatial size as the original feature map but a higher feature dimension;
and the classification module, used for inputting the higher-dimensional space-time semantic features into the two branches of the target classification detection head and the frame regression detection head respectively, and outputting the final target detection result, comprising the object class and its position in space.
9. A computer storage medium, characterized in that a computer program executable by a processor is stored therein, the computer program, when executed, performing the 3D target detection method for multi-modal information space-time fusion in a foggy scene according to any one of claims 1 to 7.
10. A vehicle-mounted foggy-scene target detection system, characterized by comprising a data collector, a vehicle-mounted storage computing platform and a vehicle actuator, wherein the data collector comprises a laser radar, a millimeter wave radar and vehicle data sensors, the vehicle-mounted storage computing platform contains the computer storage medium according to claim 9, and the vehicle actuator executes corresponding actions according to the control instructions output by the vehicle-mounted storage computing platform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210945302.0A CN115393680B (en) | 2022-08-08 | 2022-08-08 | 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210945302.0A CN115393680B (en) | 2022-08-08 | 2022-08-08 | 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115393680A true CN115393680A (en) | 2022-11-25 |
CN115393680B CN115393680B (en) | 2023-06-06 |
Family
ID=84118249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210945302.0A Active CN115393680B (en) | 2022-08-08 | 2022-08-08 | 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115393680B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115965961A (en) * | 2023-02-23 | 2023-04-14 | 上海人工智能创新中心 | Local-to-global multi-modal fusion method, system, device and storage medium |
CN116363615A (en) * | 2023-03-27 | 2023-06-30 | 小米汽车科技有限公司 | Data fusion method, device, vehicle and storage medium |
CN116467848A (en) * | 2023-03-21 | 2023-07-21 | 之江实验室 | Millimeter wave radar point cloud simulation method and device |
CN117576150A (en) * | 2023-11-03 | 2024-02-20 | 扬州万方科技股份有限公司 | Multi-mode multi-target 3D tracking method and device considering far-frame dependency relationship |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113158763A (en) * | 2021-02-23 | 2021-07-23 | 清华大学 | Three-dimensional target detection method based on multi-view feature fusion of 4D millimeter waves and laser point clouds |
CN113506372A (en) * | 2021-07-26 | 2021-10-15 | 西北工业大学 | Environment reconstruction method and device |
WO2022000857A1 (en) * | 2020-06-30 | 2022-01-06 | 广东小鹏汽车科技有限公司 | Dataset establishment method, vehicle, and storage medium |
CN114708585A (en) * | 2022-04-15 | 2022-07-05 | 电子科技大学 | Three-dimensional target detection method based on attention mechanism and integrating millimeter wave radar with vision |
CN114763997A (en) * | 2022-04-14 | 2022-07-19 | 中国第一汽车股份有限公司 | Method and device for processing radar point cloud data acquired by vehicle and electronic equipment |
CN114814826A (en) * | 2022-04-08 | 2022-07-29 | 苏州大学 | Radar rail-mounted area environment sensing method based on target grid |
Non-Patent Citations (1)
Title |
---|
LI Chao; LAN Hai; WEI Xian: "Attention-based millimeter wave radar and lidar fusion for object detection", Journal of Computer Applications * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115965961A (en) * | 2023-02-23 | 2023-04-14 | 上海人工智能创新中心 | Local-to-global multi-modal fusion method, system, device and storage medium |
CN115965961B (en) * | 2023-02-23 | 2024-04-05 | 上海人工智能创新中心 | Local-global multi-mode fusion method, system, equipment and storage medium |
CN116467848A (en) * | 2023-03-21 | 2023-07-21 | 之江实验室 | Millimeter wave radar point cloud simulation method and device |
CN116467848B (en) * | 2023-03-21 | 2023-11-03 | 之江实验室 | Millimeter wave radar point cloud simulation method and device |
CN116363615A (en) * | 2023-03-27 | 2023-06-30 | 小米汽车科技有限公司 | Data fusion method, device, vehicle and storage medium |
CN116363615B (en) * | 2023-03-27 | 2024-02-23 | 小米汽车科技有限公司 | Data fusion method, device, vehicle and storage medium |
CN117576150A (en) * | 2023-11-03 | 2024-02-20 | 扬州万方科技股份有限公司 | Multi-mode multi-target 3D tracking method and device considering far-frame dependency relationship |
Also Published As
Publication number | Publication date |
---|---|
CN115393680B (en) | 2023-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111027401B (en) | End-to-end target detection method with integration of camera and laser radar | |
CN109635685B (en) | Target object 3D detection method, device, medium and equipment | |
CN115393680B (en) | 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene | |
JP7033373B2 (en) | Target detection method and device, smart operation method, device and storage medium | |
CN113673425B (en) | Multi-view target detection method and system based on Transformer | |
CN113761999A (en) | Target detection method and device, electronic equipment and storage medium | |
JP2023549036A (en) | Efficient 3D object detection from point clouds | |
US20220269900A1 (en) | Low level sensor fusion based on lightweight semantic segmentation of 3d point clouds | |
CN116229408A (en) | Target identification method for fusing image information and laser radar point cloud information | |
CN116503803A (en) | Obstacle detection method, obstacle detection device, electronic device and storage medium | |
CN115830265A (en) | Automatic driving movement obstacle segmentation method based on laser radar | |
CN116486368A (en) | Multi-mode fusion three-dimensional target robust detection method based on automatic driving scene | |
CN116258859A (en) | Semantic segmentation method, semantic segmentation device, electronic equipment and storage medium | |
CN114283343A (en) | Map updating method, training method and equipment based on remote sensing satellite image | |
CN114241448A (en) | Method and device for obtaining heading angle of obstacle, electronic equipment and vehicle | |
US20240193788A1 (en) | Method, device, computer system for detecting pedestrian based on 3d point clouds | |
CN112529011A (en) | Target detection method and related device | |
CN114581748B (en) | Multi-agent perception fusion system based on machine learning and implementation method thereof | |
US20230105331A1 (en) | Methods and systems for semantic scene completion for sparse 3d data | |
CN115937259A (en) | Moving object detection method and device, flight equipment and storage medium | |
CN114926637A (en) | Garden map construction method based on multi-scale distance map and point cloud semantic segmentation | |
KR20230119334A (en) | 3d object detection method applying self-attention module for removing radar clutter | |
CN112766100A (en) | 3D target detection method based on key points | |
CN113222111A (en) | Automatic driving 4D perception method, system and medium suitable for all-weather environment | |
CN115082902B (en) | Vehicle target detection method based on laser radar point cloud |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |