CN115393680A - 3D target detection method and system for multi-mode information space-time fusion in foggy day scene - Google Patents
3D target detection method and system for multi-mode information space-time fusion in foggy day scene
- Publication number
- CN115393680A (application number CN202210945302.0A)
- Authority
- CN
- China
- Prior art keywords
- space
- point cloud
- time
- millimeter wave
- laser radar
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a 3D target detection method with multi-modal information space-time fusion for foggy scenes, comprising the following steps: acquire laser radar and millimeter wave radar point cloud data in a foggy scene and preprocess each; perform space-time feature matching, resample the laser radar point cloud features with the aid of the millimeter wave point cloud features, and fuse the resampled point clouds over the time sequence to obtain preliminary space-time fusion features under the bird's-eye view; input these features into a self-attention-based Transformer encoder, which performs feature encoding in a self-attention manner to obtain higher-dimensional space-time semantic features; input those features into a target classification detection head and a bounding box regression detection head, and output the final detection result, comprising the object category and position in space. The method effectively fuses the laser radar and millimeter wave radar point clouds so that their complementary data representations reinforce each other, thereby achieving robust and efficient 3D target detection in foggy scenes.
Description
Technical Field
The invention relates to the field of environment perception in autonomous driving, and in particular to a 3D target detection method and system with multi-modal information space-time fusion for foggy scenes.
Background
In recent years, deploying high-level autonomous driving has become a major challenge for the field. 3D target detection is a key research direction within it, and its central difficulty is all-weather, multi-scene detection: accurately recognizing surrounding objects under any weather condition. Today, target detection for autonomous vehicles is mostly performed with a multi-sensor configuration, such as cameras, laser radar, and millimeter wave radar. Fusing multiple sensors overcomes the system failures caused by the occasional failure of a single sensor and produces more accurate detection results than any single sensor alone.
Existing multi-sensor fusion detection methods mainly complete the perception task with laser radar, cameras, and the like, which in good weather produce fine-grained point clouds or high-resolution images carrying rich, redundant visual information. However, these visual sensors are sensitive to weather: in bad weather such as fog, opaque particles scatter the light and significantly reduce the sensing range of the laser radar and camera, making the detection results unreliable.
Millimeter wave radar, by contrast, is inexpensive and widely deployed compared with laser radar and cameras, and its millimeter-wave signal has a wavelength much larger than the particles of fog, rain, and snow, so it easily penetrates or diffracts around them. Millimeter wave radar data are therefore little affected by rain and fog, and fusing millimeter wave radar with other sensors can support a robust 3D target detection task in foggy scenes.
Disclosure of Invention
The invention mainly aims to mitigate the interference caused by weather conditions and by target motion and occlusion, and to achieve robust and efficient 3D target detection in foggy scenes.
The technical scheme adopted by the invention is as follows:
The 3D target detection method with multi-modal information space-time fusion for foggy scenes comprises the following steps:
S1, acquiring laser radar point cloud data and millimeter wave radar point cloud data in a foggy scene and preprocessing each;
S2, performing space-time feature matching on the preprocessed multi-frame laser radar and millimeter wave radar point clouds, resampling the laser radar point cloud features with the aid of the millimeter wave point cloud features, and then fusing over the time sequence to obtain preliminary space-time fusion features under the bird's-eye view;
S3, inputting the space-time fusion features under the bird's-eye view into a self-attention-based Transformer encoder, which performs feature encoding in a self-attention manner to obtain space-time semantic features of the same size as the original feature map but of higher dimension;
and S4, inputting the higher-dimensional space-time semantic features into the two branches of a target classification detection head and a bounding box regression detection head, and outputting the final detection result, comprising the object category and position in space.
In connection with the above technical solution, the laser radar point cloud data are extracted into voxels, and the millimeter wave radar point cloud is preprocessed in PointNet fashion.
In connection with the above technical solution, in step S2 the millimeter wave radar point cloud is converted into the laser radar coordinate system and matched with the voxels, after which all voxels and point cloud features have their spatial positions converted and are projected onto the bird's-eye view.
In connection with the above technical solution, step S2 specifically takes each millimeter wave radar point as a center, searches with KNN for laser radar voxels within a certain range, and randomly samples them; finally, the selected laser radar voxels are associated and feature-spliced with the millimeter wave radar points to obtain enhanced fusion features.
In connection with the above technical solution, step S3 specifically comprises the following steps:
taking 2.5 m × 2.5 m as the size of the space-time window, and taking all voxel features at different moments but the same position as the elements of that window;
inputting the divided 40 × 40 space-time windows as one batch into the self-attention-based Transformer encoder for feature encoding, outputting high-dimensional semantic space-time features;
and remapping the high-dimensional semantic features, by means of the voxel coordinates, into a regularly rasterized feature map under the bird's-eye view.
In connection with the above technical solution, step S4 specifically comprises the following steps:
arranging reference boxes oriented at 0° and 90° at each position of the high-dimensional space-time semantic feature map;
and inputting the space-time semantic feature map with the arranged reference boxes into the fully connected layers of the two branches, the target classification detection head and the bounding box regression detection head, to obtain object classification scores and predicted boxes, and filtering out detection boxes whose scores fall below a threshold to keep high-quality detection boxes.
In connection with the above technical solution, the size of each reference box is taken from the average of the annotated data of a given category in the dataset, which reduces the difficulty of network learning.
The invention also provides a 3D target detection system with multi-modal information space-time fusion for foggy scenes, comprising:
the preprocessing module is used for acquiring laser radar point cloud data and millimeter wave radar point cloud data in a foggy weather scene and respectively preprocessing the laser radar point cloud data and the millimeter wave radar point cloud data;
the space-time feature matching module is used for performing space-time feature matching on the preprocessed multi-frame laser radar point cloud and the millimeter wave radar point cloud, resampling the laser radar point cloud features by means of the millimeter wave point cloud features, and then further performing fusion on a time sequence to obtain a preliminary space-time fusion feature under a bird's-eye view angle;
the feature encoding module is used for inputting the space-time fusion features under the bird's-eye view into a self-attention-based Transformer encoder, which performs feature encoding in a self-attention manner to obtain space-time semantic features of the same size as the original feature map but of higher dimension;
and the classification module is used for inputting the higher-dimensional space-time semantic features into the two branches of the target classification detection head and the bounding box regression detection head and outputting the final detection result, comprising the object category and position in space.
The invention also provides a computer storage medium storing a computer program executable by a processor; when executed, the program performs the above 3D target detection method with multi-modal information space-time fusion in a foggy scene.
The invention also provides a vehicle-mounted foggy-scene target detection system comprising a data collector, a vehicle-mounted storage and computing platform, and vehicle actuators. The data collector comprises a laser radar, a millimeter wave radar, and vehicle data sensors; the vehicle-mounted storage and computing platform contains the computer storage medium described above; and the vehicle actuators execute corresponding actions according to control instructions output by the platform.
The invention has the following beneficial effects: it uses the perception robustness of millimeter wave radar in fog to enhance the laser radar point cloud features while still localizing and detecting targets from the laser radar point cloud, thereby compensating for the millimeter wave radar's large height-measurement error and low data resolution and fully combining the advantages of the two sensors in foggy scenes. In addition, multi-frame data are fused across space and time, further strengthening the data representation and partly mitigating interference from weather and from target motion and occlusion, so that robust and efficient 3D target detection in foggy scenes is achieved.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a first flowchart of a 3D target detection method of multi-modal information space-time fusion in a foggy day scene according to an embodiment of the invention;
FIG. 2 is a second flowchart of the 3D target detection method with multi-modal information space-time fusion in a foggy scene according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a 3D target detection system for multi-modal information space-time fusion in a foggy day scene according to an embodiment of the invention;
FIG. 4 is a block diagram of a vehicle-mounted foggy day scene target detection system implemented in accordance with the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The method is mainly used to improve target detection performance under severe weather such as fog and to improve the reliability of a 3D target detection system in extreme weather.
As shown in fig. 1, the 3D target detection method with multi-modal information space-time fusion in a foggy scene according to an embodiment of the invention mainly relies on a space-time point cloud feature fusion detection mechanism between the laser radar and the millimeter wave radar, and mainly comprises the following steps:
s1, spatial point cloud data are obtained and preprocessed respectively, wherein the spatial point cloud data comprise laser radar point cloud data and millimeter wave radar point cloud. The laser radar point cloud data volume is large, the laser radar point cloud data volume can be extracted into voxel, the millimeter wave radar point cloud is too sparse, and a pointNet form can be adopted for preprocessing.
S2, space-time feature matching and adaptive sampling: space-time feature matching is performed on the preprocessed multi-frame laser radar and millimeter wave radar point clouds. Since the millimeter wave radar point cloud suffers little interference in fog, the laser radar point cloud features are resampled with the aid of the millimeter wave point cloud features and then further fused over the time sequence to obtain preliminary space-time fusion features under the bird's-eye view.
S3, space-time feature encoding: the matched and resampled fusion features are input into a Transformer encoder, which performs feature encoding in a self-attention manner to strengthen global dependencies and finally outputs space-time semantic features of the same size as the original feature map but of higher dimension.
S4, target classification and bounding box regression: the higher-dimensional space-time semantic features under the bird's-eye view are input into the two branches of the target classification detection head and the bounding box regression detection head, which output the final detection result, namely the object category and position in space.
A specific implementation flow of the foggy day scene target detection method according to another embodiment of the present invention is shown in fig. 2, and may specifically include the following steps:
S100, spatial point cloud data are acquired and preprocessed separately; this specifically comprises the following steps:
S110, grid division: the detection range is set according to the specific scene and the sensor mounting position; in this embodiment, [-50 m, 50 m], [-40 m, 40 m], and [-3 m, 5 m] are used as the perception ranges along the X, Y, and Z directions of the laser radar coordinate system. The point cloud within this range is divided into grids of equal size at a resolution of 0.25 m × 0.25 m × 8 m, and preliminary features are extracted from the millimeter wave radar point cloud in PointNet fashion.
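The grid division above can be sketched in a few lines of numpy. The ranges follow the embodiment; the 0.25 m × 0.25 m × 8 m voxel resolution and the `voxelize` helper name are assumptions for illustration only.

```python
import numpy as np

# Perception ranges from the embodiment (lidar frame, meters).
X_RANGE = (-50.0, 50.0)
Y_RANGE = (-40.0, 40.0)
Z_RANGE = (-3.0, 5.0)
# Assumed voxel size: 0.25 m x 0.25 m in the plane, full 8 m height.
VOXEL = np.array([0.25, 0.25, 8.0])

def voxelize(points):
    """Map N x 3 lidar points to integer voxel indices,
    dropping points outside the perception range."""
    lo = np.array([X_RANGE[0], Y_RANGE[0], Z_RANGE[0]])
    hi = np.array([X_RANGE[1], Y_RANGE[1], Z_RANGE[1]])
    mask = np.all((points >= lo) & (points < hi), axis=1)
    idx = np.floor((points[mask] - lo) / VOXEL).astype(np.int64)
    return points[mask], idx

pts = np.array([[0.0, 0.0, 0.0],      # center of the range
                [120.0, 0.0, 0.0],    # outside, discarded
                [-49.9, -39.9, -2.9]])  # near the range corner
kept, idx = voxelize(pts)
```

With the assumed resolution this yields a 400 × 320 BEV grid with a single vertical cell, so each voxel is directly a bird's-eye-view pixel.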
S120, point cloud grouping and feature aggregation: the laser radar point cloud is grouped according to the grid divided in S110, and the grouped points are aggregated per grid cell to obtain voxel features. To balance computational cost against feature robustness, the voxel features are generated by combining average pooling and max pooling.
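The combined average/max pooling of S120 might look like the following minimal sketch; concatenating the two pooled vectors per voxel is a simplifying assumption about the channel layout.

```python
import numpy as np

def voxel_feature(group):
    """Aggregate the points of one voxel by concatenating
    average pooling and max pooling over the point axis."""
    return np.concatenate([group.mean(axis=0), group.max(axis=0)])

# Two points with 2 channels each -> one 4-channel voxel feature.
g = np.array([[1.0, 2.0],
              [3.0, 4.0]])
f = voxel_feature(g)
```

Average pooling keeps the voxel feature stable under noise, while max pooling preserves the strongest responses; the concatenation keeps both at twice the channel count.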
S200, space-time feature matching and adaptive sampling: space-time feature matching is performed on the processed multi-frame laser radar voxel features and the millimeter wave radar point cloud. Since the millimeter wave radar point cloud suffers little interference in fog, the laser radar point cloud features are resampled with the aid of the millimeter wave point cloud features and then further fused over the time sequence to obtain preliminary space-time fusion features under the bird's-eye view. Step S200 specifically comprises:
S210, unified coordinate system and bird's-eye-view conversion: the millimeter wave radar point cloud is converted into the laser radar coordinate system and matched with the laser radar voxels. All voxels and point cloud features then have their spatial positions converted and are projected onto the bird's-eye view.
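A minimal sketch of the coordinate unification and bird's-eye-view projection of S210; the extrinsic rotation `R` and translation `t` are illustrative placeholders rather than calibrated values.

```python
import numpy as np

def radar_to_lidar(points, R, t):
    """Rigidly transform N x 3 radar points into the lidar frame."""
    return points @ R.T + t

def to_bev(points_xyz):
    """Keep only (x, y): height is dropped for the bird's-eye view."""
    return points_xyz[:, :2]

# Placeholder extrinsic: identity rotation, small translation.
R = np.eye(3)
t = np.array([0.5, 0.0, -0.2])
p = radar_to_lidar(np.array([[1.0, 2.0, 0.0]]), R, t)
bev = to_bev(p)
```

Once both modalities share the lidar frame and the BEV plane, voxel/point matching reduces to a 2D nearest-neighbor problem.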
S220, feature resampling and enhancement: taking each millimeter wave radar point as a center, KNN is used to search for laser radar voxels within a certain range, and multi-scale random sampling is performed according to the statistical relationship between valid laser radar voxels and millimeter wave radar points; finally, the selected laser radar voxels are associated and feature-spliced with the millimeter wave radar points to obtain enhanced fusion features. The statistical relationship means that at a given fog density the amounts of laser radar and millimeter wave radar data stand in a certain proportion. Multi-scale means that the KNN search can be run over multiple ranges, sampling different amounts of data at each range.
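The KNN search and feature splicing of S220 could be sketched as below. This single-scale version with the hypothetical helper `knn_resample` omits the multi-scale random sampling and the fog-density statistics described above.

```python
import numpy as np

def knn_resample(radar_xy, voxel_xy, voxel_feat, radar_feat,
                 k=2, radius=5.0):
    """For each radar point, gather up to k nearest lidar voxels
    within `radius` and splice the radar feature onto each matched
    voxel feature (brute-force distances; no random subsampling)."""
    fused = []
    for i, c in enumerate(radar_xy):
        d = np.linalg.norm(voxel_xy - c, axis=1)
        near = np.argsort(d)[:k]
        near = near[d[near] <= radius]
        for j in near:
            fused.append(np.concatenate([voxel_feat[j], radar_feat[i]]))
    return np.array(fused)

voxel_xy = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
voxel_feat = np.array([[1.0], [2.0], [3.0]])
fused = knn_resample(np.array([[0.0, 0.0]]),   # one radar point
                     voxel_xy, voxel_feat,
                     np.array([[7.0]]))        # its radar feature
```

A production version would use a KD-tree for the neighbor search and run it at several radii, with a sampling budget per radius tied to the observed lidar/radar data ratio.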
S230, the multi-frame point clouds and voxels are processed in the above manner and arranged in timestamp order to obtain the space-time fusion features.
S300, space-time feature encoding: the matched and resampled space-time fusion features are input into a Transformer encoder, which performs feature encoding in a self-attention manner to strengthen global dependencies and finally outputs space-time semantic features of the same size as the original feature map but of higher dimension. Step S300 specifically comprises:
S310, space-time window division and feature encoding: specifically, 2.5 m × 2.5 m is taken as the size of the space-time window, and all voxel features from S230 at different moments but the same window position serve as the elements of that window. The elements of each window are then input into the self-attention-based Transformer encoder. Through the global dependencies of the self-attention mechanism, the encoder can, on one hand, learn all geometric and positional features of the environment and targets at the current spatial position and, on the other hand, model the temporal dependency of the same target over a period of time, fully exploiting historical frames to strengthen the feature representation and thereby resist fog interference.
S320, space-time feature batching: the divided 40 × 40 space-time windows are input as one batch into the self-attention-based Transformer encoder and all encoded in the same way, outputting high-dimensional semantic space-time features; this reduces computation while improving the global dependency of feature extraction.
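A toy numpy stand-in for the batched window encoding of S310/S320. A real implementation would use learned Q/K/V projections, multiple heads, and feed-forward layers; here identity projections keep the sketch short while preserving the attention shape arithmetic.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x):
    """Single-head self-attention over a batch of windows,
    x: (batch, elements_per_window, channels). Q = K = V = x
    stands in for the learned projections of a real encoder."""
    q, k, v = x, x, x
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1]))
    return attn @ v

# 40 x 40 windows processed as one batch, each with
# 5 voxel elements of dimension 8 (sizes are illustrative).
batch = np.zeros((1600, 5, 8))
out = window_self_attention(batch)
```

Because attention mixes only the elements inside each window, the 1600 windows are independent along the batch axis, which is exactly what lets them be encoded in one pass.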
S330, feature re-rasterization: the high-dimensional semantic features output by S320 are an unordered set of element features, but their form differs little from the voxel representation, so by means of the voxel coordinates they can be remapped into a regularly rasterized feature map under the bird's-eye view.
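The re-rasterization of S330 amounts to scattering unordered features back onto the BEV grid by their voxel coordinates, roughly as follows; the `rasterize` helper name and the zero-fill of empty cells are assumptions.

```python
import numpy as np

def rasterize(features, coords, grid_hw):
    """Scatter M x C unordered voxel features onto an H x W x C
    BEV feature map using their (row, col) coordinates; cells
    without a voxel stay zero."""
    H, W = grid_hw
    C = features.shape[1]
    bev = np.zeros((H, W, C))
    bev[coords[:, 0], coords[:, 1]] = features
    return bev

f = np.array([[1.0], [2.0]])       # two encoded voxel features
c = np.array([[0, 1], [2, 3]])     # their BEV grid coordinates
bev = rasterize(f, c, (4, 4))
```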
S400, target classification and bounding box regression: the high-dimensional semantic features under the bird's-eye view are input into the two branches of the target classification detection head and the bounding box regression detection head, which output the final detection result, namely the object category and position in space. This comprises the following steps:
S410, setting regression reference boxes: reference boxes oriented at 0° and 90° are arranged at each position of the feature map obtained in S330; the size of each reference box is taken from the average of the annotated data of a given category in the dataset, which reduces the difficulty of network learning.
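Reference-box placement as in S410 can be sketched like this; the 4.6 m × 1.8 m "car" size is an invented stand-in for the dataset average, not a value from the patent.

```python
import numpy as np

def make_anchors(grid_hw, cell, size=(4.6, 1.8)):
    """Place two reference boxes (yaw 0 and 90 degrees) at the
    center of every BEV cell; rows are (x, y, length, width, yaw)."""
    H, W = grid_hw
    ys, xs = np.mgrid[0:H, 0:W]
    centers = np.stack([(xs + 0.5) * cell, (ys + 0.5) * cell], axis=-1)
    anchors = []
    for yaw in (0.0, np.pi / 2):
        a = np.concatenate(
            [centers,
             np.broadcast_to(np.array(size), (H, W, 2)),
             np.full((H, W, 1), yaw)], axis=-1)
        anchors.append(a)
    return np.stack(anchors, axis=2).reshape(-1, 5)

# Tiny 2 x 2 grid at 0.25 m per cell -> 8 anchors.
anchors = make_anchors((2, 2), cell=0.25)
```

Fixing the anchor size to a per-category average means the regression branch only has to learn small residuals, which is what "reduces the difficulty of network learning" refers to.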
S420, target classification and bounding box regression: the feature map obtained in S330 is input into the fully connected layers of the classification branch and the bounding box regression branch to obtain object class scores and predicted boxes, and detection boxes whose scores fall below a threshold are filtered out, leaving high-quality detection boxes.
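The score-based filtering at the end of S420 reduces to a boolean mask over the predictions; the 0.5 threshold below is illustrative.

```python
import numpy as np

def score_filter(boxes, scores, thresh=0.5):
    """Keep only predicted boxes whose classification score
    clears the threshold."""
    keep = scores >= thresh
    return boxes[keep], scores[keep]

boxes = np.array([[0.0, 0.0, 4.6, 1.8, 0.0],
                  [5.0, 5.0, 4.6, 1.8, 0.0]])
scores = np.array([0.9, 0.2])
kept, s = score_filter(boxes, scores)
```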
The 3D target detection system with multi-modal information space-time fusion in a foggy scene according to an embodiment of the invention, shown in fig. 3, mainly implements the method of the above embodiment and comprises:
the preprocessing module is used for acquiring laser radar point cloud data and millimeter wave radar point cloud data in a foggy weather scene and respectively preprocessing the laser radar point cloud data and the millimeter wave radar point cloud data;
the space-time feature matching module is used for performing space-time feature matching on the preprocessed multi-frame laser radar point cloud and the millimeter wave radar point cloud, resampling the laser radar point cloud features by means of the millimeter wave point cloud features, and then further fusing in time sequence to obtain preliminary space-time fusion features under the bird's-eye view angle;
the feature encoding module is used for inputting the space-time fusion features under the bird's-eye view into a self-attention-based Transformer encoder, which performs feature encoding in a self-attention manner to obtain space-time semantic features of the same size as the original feature map but of higher dimension;
and the classification module is used for inputting the higher-dimensional space-time semantic features into the two branches of the target classification detection head and the bounding box regression detection head and outputting the final detection result, comprising the object category and position in space.
The present application also provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, a server, or an app store, on which a computer program is stored that performs the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment, when its program is executed by a processor, realizes the 3D target detection method with multi-modal information space-time fusion in a foggy scene.
Based on the above foggy-scene space-time feature fusion target detection method, the invention further constructs a vehicle-mounted foggy-scene target detection system, whose architecture is shown in fig. 4. It comprises sensors for data acquisition (laser radar, millimeter wave radar, vehicle data sensors, and optionally cameras) and a vehicle-mounted storage and computing platform (memory plus a localization-perception computing platform). The sensors communicate with the platform through data transmission interfaces (Ethernet, USB, CAN). The system executes as follows:
(1) The foggy-scene space-time feature fusion target detection algorithm proposed by the invention is converted into instruction code and deployed in the memory of the vehicle-mounted computing platform.
(2) Drivers for the laser radar and millimeter wave radar sensors are configured to parse and forward the sensor data, the format of the forwarded data matching the instruction code of step (1).
(3) On the perception-positioning computing platform, the parsed and forwarded data are processed by the instruction code of step (1); the detection results are written to memory, from which the planning control platform reads them in real time and, together with the positioning and perception results of other algorithms, completes the downstream tasks.
(4) The vehicle actuator executes actions according to the control commands of the downstream tasks.
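The four-step flow above can be sketched as a minimal data pipeline. Everything below — the stage functions, the dictionary fields, and the toy "detector" — is a placeholder standing in for the deployed instruction code, not the patented software:

```python
from queue import Queue

def parse_and_forward(raw_frame):
    # step (2): sensor drivers parse raw data into the format the detector expects
    return {"lidar": raw_frame["lidar"], "radar": raw_frame["radar"]}

def run_detector(frame):
    # step (3): perception compute; a trivial stand-in for the fused 3D detector
    return {"num_objects": min(len(frame["lidar"]), len(frame["radar"]))}

def plan(detection):
    # step (3)/(4): planning control platform turns detections into commands
    return "brake" if detection["num_objects"] > 0 else "cruise"

memory = Queue()  # stands in for the shared on-board memory handoff
for raw in ({"lidar": [1, 2, 3], "radar": [4]}, {"lidar": [], "radar": []}):
    memory.put(run_detector(parse_and_forward(raw)))   # detector writes results

commands = [plan(memory.get()) for _ in range(2)]      # actuator-bound commands
```

The real system moves data over Ethernet/USB/CAN rather than an in-process queue; the queue only illustrates the producer-consumer handoff between the perception and planning platforms.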
In summary, the method exploits the sensing robustness of the millimeter wave radar in fog to enhance the laser radar point cloud features, while still localizing and detecting targets from the laser radar point cloud. This overcomes the millimeter wave radar's large error in height estimation and its low data resolution, and fully combines the strengths of the two sensors in foggy scenes. In addition, multi-frame data are fused across space and time, which further enriches the data representation and mitigates, to some extent, the interference caused by weather conditions and target motion, thereby achieving robust and efficient 3D target detection in foggy scenes.
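A minimal sketch of this radar-guided enhancement step, assuming BEV (x, y) distance for the nearest-neighbor search; the function name, the feature dimensions, and the default `k`, `radius`, and `n_sample` values are illustrative assumptions, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_enhance(voxel_centers, voxel_feats, radar_points, radar_feats,
                k=4, radius=2.5, n_sample=2):
    """For each millimeter-wave radar point, find the k nearest lidar voxels
    within `radius` metres in the BEV plane, randomly sample some of them,
    and splice the radar features onto the sampled voxel features."""
    enhanced_idx, enhanced_feats = [], []
    for p, rf in zip(radar_points, radar_feats):
        d = np.linalg.norm(voxel_centers[:, :2] - p[:2], axis=1)
        near = np.argsort(d)[:k]
        near = near[d[near] <= radius]          # keep only voxels inside the range
        if near.size == 0:
            continue
        picked = rng.choice(near, size=min(n_sample, near.size), replace=False)
        for i in picked:
            enhanced_idx.append(i)
            enhanced_feats.append(np.concatenate([voxel_feats[i], rf]))
    return np.array(enhanced_idx), np.array(enhanced_feats)

# toy data: 100 lidar voxels with 8-dim features, 5 radar points with 3-dim features
centers = rng.uniform(0, 50, size=(100, 3))
vfeats = rng.normal(size=(100, 8))
rpoints = centers[rng.choice(100, 5, replace=False)] + rng.normal(scale=0.1, size=(5, 3))
rfeats = rng.normal(size=(5, 3))
idx, fused = knn_enhance(centers, vfeats, rpoints, rfeats)
```

Each fused feature is the concatenation of a lidar voxel feature and a radar point feature, so the fused dimension is the sum of the two.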
It will be appreciated that those skilled in the art may make modifications and variations in light of the above teachings, and all such modifications and variations are intended to fall within the scope of the appended claims.
Claims (10)
1. A 3D target detection method based on multi-modal information space-time fusion in a foggy scene, characterized by comprising the following steps:
S1, obtaining laser radar point cloud data and millimeter wave radar point cloud data in a foggy scene and preprocessing each respectively;
S2, performing space-time feature matching on the preprocessed multi-frame laser radar point clouds and millimeter wave radar point clouds, resampling the laser radar point cloud features by means of the millimeter wave point cloud features, and then further fusing along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view angle;
S3, inputting the space-time fusion features under the bird's-eye view angle into a self-attention-based Transformer encoder and performing feature coding in a self-attention manner to obtain space-time semantic features with the same spatial size as the original feature map but a higher feature dimension;
and S4, inputting the higher-dimensional space-time semantic features into the two branches of the target classification detection head and the frame regression detection head respectively, and outputting the final target detection result, including the object class and its position in space.
2. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 1, characterized in that the laser radar point cloud data are extracted into voxel points, and the millimeter wave radar point cloud is preprocessed in a PointNet manner.
3. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 1, characterized in that in step S2, the millimeter wave radar point cloud is converted into the laser radar coordinate system to be matched with the laser radar voxels, and then all voxels and the spatial positions corresponding to the point cloud features are projected onto the bird's-eye view.
4. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 1, characterized in that in step S2, with each millimeter wave radar point as a center, KNN (K-nearest neighbor) search is adopted to find the laser radar voxels within a certain range, and random sampling is performed; finally, the screened laser radar point cloud voxels are associated and feature-spliced with the millimeter wave radar point cloud to obtain enhanced fusion features.
5. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 1, characterized in that step S3 specifically comprises the following steps:
taking 2.5 m × 2.5 m as the size of a space-time window, and taking all voxel features that lie at the same window position but at different moments as the elements of that space-time window;
inputting the divided 40 × 40 space-time windows together, as one batch, into the self-attention-based Transformer encoder for feature coding to output high-dimensional semantic space-time features;
and remapping the high-dimensional semantic features, by means of the voxel coordinates, into a regularly rasterized feature map under the bird's-eye view angle.
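The window partition described above can be illustrated with a toy sketch. It assumes a 100 m × 100 m BEV area, so that 2.5 m windows form the stated 40 × 40 grid; the number of frames and voxels per frame are made up for the example:

```python
import numpy as np

window, n_win = 2.5, 40   # 2.5 m windows, 40 x 40 grid (from the claim)
rng = np.random.default_rng(0)
# 3 frames of 200 voxel BEV (x, y) coordinates each, covering 100 m x 100 m
frames = [rng.uniform(0, window * n_win, size=(200, 2)) for _ in range(3)]

buckets = {}
for t, coords in enumerate(frames):
    ij = np.floor(coords / window).astype(int)          # window index of each voxel
    for (i, j), xy in zip(map(tuple, ij), coords):
        # same window position, all timestamps pooled together
        buckets.setdefault((i, j), []).append((t, xy))
```

Each bucket then holds the voxels of one space-time window across all frames, ready to be encoded as one token sequence by the Transformer.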
6. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 1, characterized in that step S4 specifically comprises the following steps:
arranging, at each position of the high-dimensional space-time semantic feature map, reference frames oriented at 0 degrees and 90 degrees respectively;
and inputting the space-time semantic feature map with the arranged reference frames into the fully connected layers of the two branches of the target classification detection head and the frame regression detection head to obtain the network's object classification scores and prediction frames, and filtering out, based on the scores, the detection frames below an input threshold to obtain high-quality detection frames.
7. The 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to claim 6, characterized in that the size of the reference frame is taken as the mean of the labeled data of a given category in the dataset, so as to reduce the difficulty of network learning.
8. A 3D target detection system for multi-modal information space-time fusion in a foggy scene, characterized by comprising:
the preprocessing module, used for obtaining laser radar point cloud data and millimeter wave radar point cloud data in a foggy scene and preprocessing each respectively;
the space-time feature matching module, used for performing space-time feature matching on the preprocessed multi-frame laser radar point clouds and millimeter wave radar point clouds, resampling the laser radar point cloud features by means of the millimeter wave point cloud features, and then further fusing along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view angle;
the feature coding module, used for inputting the space-time fusion features under the bird's-eye view angle into a self-attention-based Transformer encoder and performing feature coding in a self-attention manner to obtain space-time semantic features with the same spatial size as the original feature map but a higher feature dimension;
and the classification module, used for inputting the higher-dimensional space-time semantic features into the two branches of the target classification detection head and the frame regression detection head respectively, and outputting the final target detection result, comprising the object class and its position in space.
9. A computer storage medium, characterized in that a computer program executable by a processor is stored therein, the computer program, when executed, performing the 3D target detection method for multi-modal information space-time fusion in a foggy scene according to any one of claims 1 to 7.
10. A vehicle-mounted foggy-scene target detection system, characterized by comprising a data collector, a vehicle-mounted storage computing platform and a vehicle actuator, wherein the data collector comprises a laser radar, a millimeter wave radar and vehicle data sensors, the vehicle-mounted storage computing platform contains the computer storage medium according to claim 9, and the vehicle actuator executes corresponding actions according to the control instructions output by the vehicle-mounted storage computing platform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210945302.0A CN115393680B (en) | 2022-08-08 | 2022-08-08 | 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210945302.0A CN115393680B (en) | 2022-08-08 | 2022-08-08 | 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115393680A true CN115393680A (en) | 2022-11-25 |
CN115393680B CN115393680B (en) | 2023-06-06 |
Family
ID=84118249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210945302.0A Active CN115393680B (en) | 2022-08-08 | 2022-08-08 | 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115393680B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115965961A (en) * | 2023-02-23 | 2023-04-14 | 上海人工智能创新中心 | Local-to-global multi-modal fusion method, system, device and storage medium |
CN116363615A (en) * | 2023-03-27 | 2023-06-30 | 小米汽车科技有限公司 | Data fusion method, device, vehicle and storage medium |
CN116467848A (en) * | 2023-03-21 | 2023-07-21 | 之江实验室 | Millimeter wave radar point cloud simulation method and device |
CN117576150A (en) * | 2023-11-03 | 2024-02-20 | 扬州万方科技股份有限公司 | Multi-mode multi-target 3D tracking method and device considering far-frame dependency relationship |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113158763A (en) * | 2021-02-23 | 2021-07-23 | 清华大学 | Three-dimensional target detection method based on multi-view feature fusion of 4D millimeter waves and laser point clouds |
CN113506372A (en) * | 2021-07-26 | 2021-10-15 | 西北工业大学 | Environment reconstruction method and device |
WO2022000857A1 (en) * | 2020-06-30 | 2022-01-06 | 广东小鹏汽车科技有限公司 | Dataset establishment method, vehicle, and storage medium |
CN114708585A (en) * | 2022-04-15 | 2022-07-05 | 电子科技大学 | Three-dimensional target detection method based on attention mechanism and integrating millimeter wave radar with vision |
CN114763997A (en) * | 2022-04-14 | 2022-07-19 | 中国第一汽车股份有限公司 | Method and device for processing radar point cloud data acquired by vehicle and electronic equipment |
CN114814826A (en) * | 2022-04-08 | 2022-07-29 | 苏州大学 | Radar rail-mounted area environment sensing method based on target grid |
Non-Patent Citations (1)
Title |
---|
LI Chao; LAN Hai; WEI Xian: "Attention-based millimeter wave radar and lidar fusion for object detection", Journal of Computer Applications * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115965961A (en) * | 2023-02-23 | 2023-04-14 | 上海人工智能创新中心 | Local-to-global multi-modal fusion method, system, device and storage medium |
CN115965961B (en) * | 2023-02-23 | 2024-04-05 | 上海人工智能创新中心 | Local-global multi-mode fusion method, system, equipment and storage medium |
CN116467848A (en) * | 2023-03-21 | 2023-07-21 | 之江实验室 | Millimeter wave radar point cloud simulation method and device |
CN116467848B (en) * | 2023-03-21 | 2023-11-03 | 之江实验室 | Millimeter wave radar point cloud simulation method and device |
CN116363615A (en) * | 2023-03-27 | 2023-06-30 | 小米汽车科技有限公司 | Data fusion method, device, vehicle and storage medium |
CN116363615B (en) * | 2023-03-27 | 2024-02-23 | 小米汽车科技有限公司 | Data fusion method, device, vehicle and storage medium |
CN117576150A (en) * | 2023-11-03 | 2024-02-20 | 扬州万方科技股份有限公司 | Multi-mode multi-target 3D tracking method and device considering far-frame dependency relationship |
Also Published As
Publication number | Publication date |
---|---|
CN115393680B (en) | 2023-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111027401B (en) | End-to-end target detection method with integration of camera and laser radar | |
CN109635685B (en) | Target object 3D detection method, device, medium and equipment | |
CN115393680B (en) | 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene | |
JP7033373B2 (en) | Target detection method and device, smart operation method, device and storage medium | |
CN113673425B (en) | Multi-view target detection method and system based on Transformer | |
CN113761999A (en) | Target detection method and device, electronic equipment and storage medium | |
JP2023549036A (en) | Efficient 3D object detection from point clouds | |
US20220269900A1 (en) | Low level sensor fusion based on lightweight semantic segmentation of 3d point clouds | |
CN116229408A (en) | Target identification method for fusing image information and laser radar point cloud information | |
CN116503803A (en) | Obstacle detection method, obstacle detection device, electronic device and storage medium | |
CN115830265A (en) | Automatic driving movement obstacle segmentation method based on laser radar | |
CN116486368A (en) | Multi-mode fusion three-dimensional target robust detection method based on automatic driving scene | |
CN116258859A (en) | Semantic segmentation method, semantic segmentation device, electronic equipment and storage medium | |
CN114283343A (en) | Map updating method, training method and equipment based on remote sensing satellite image | |
CN114241448A (en) | Method and device for obtaining heading angle of obstacle, electronic equipment and vehicle | |
US20240193788A1 (en) | Method, device, computer system for detecting pedestrian based on 3d point clouds | |
CN112529011A (en) | Target detection method and related device | |
CN114581748B (en) | Multi-agent perception fusion system based on machine learning and implementation method thereof | |
US20230105331A1 (en) | Methods and systems for semantic scene completion for sparse 3d data | |
CN115937259A (en) | Moving object detection method and device, flight equipment and storage medium | |
CN114926637A (en) | Garden map construction method based on multi-scale distance map and point cloud semantic segmentation | |
KR20230119334A (en) | 3d object detection method applying self-attention module for removing radar clutter | |
CN112766100A (en) | 3D target detection method based on key points | |
CN113222111A (en) | Automatic driving 4D perception method, system and medium suitable for all-weather environment | |
CN115082902B (en) | Vehicle target detection method based on laser radar point cloud |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |