CN115393680A - 3D target detection method and system for multi-mode information space-time fusion in foggy day scene - Google Patents

3D target detection method and system for multi-mode information space-time fusion in foggy day scene

Info

Publication number
CN115393680A
CN115393680A (Application CN202210945302.0A)
Authority
CN
China
Prior art keywords
space
point cloud
time
millimeter wave
laser radar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210945302.0A
Other languages
Chinese (zh)
Other versions
CN115393680B (en)
Inventor
尹智帅
焦钰军
刘峻恺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202210945302.0A priority Critical patent/CN115393680B/en
Publication of CN115393680A publication Critical patent/CN115393680A/en
Application granted granted Critical
Publication of CN115393680B publication Critical patent/CN115393680B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Optical Radar Systems And Details Thereof (AREA)

Abstract

The invention discloses a 3D target detection method for multi-modal information space-time fusion in a foggy scene, which comprises the following steps: acquiring lidar point cloud and millimeter-wave radar point cloud data in a foggy scene and preprocessing them; performing space-time feature matching, resampling the lidar point cloud features with the aid of the millimeter-wave point cloud features, and fusing the resampled features along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view; inputting the preliminary space-time fusion features into a self-attention-based Transformer encoder and encoding them with self-attention to obtain higher-dimensional space-time semantic features; and inputting these features into a target classification detection head and a bounding-box regression detection head, and outputting the final detection result, including the object category and its position in space. The method effectively fuses the lidar point cloud and the millimeter-wave radar point cloud so that their data representations complement each other, thereby achieving robust and efficient 3D target detection in foggy scenes.

Description

3D target detection method and system based on multi-modal information space-time fusion in a foggy scene
Technical Field
The invention relates to the field of environmental perception in autonomous driving, and in particular to a 3D target detection method and system for multi-modal information space-time fusion in a foggy scene.
Background
In recent years, bringing high-level autonomous driving to deployment has become a major challenge for the field. 3D target detection is a key research direction in autonomous driving, and its central challenge is all-weather, multi-scene detection, that is, accurately identifying surrounding objects under any weather condition. Today, target detection for autonomous vehicles is mostly performed with a multi-sensor configuration, such as cameras, lidar, and millimeter-wave radar. Fusing multiple sensors overcomes system failures caused by the occasional failure of a single sensor and produces more accurate detection results than any single sensor alone.
Existing multi-sensor fusion detection methods mainly rely on lidar, cameras, and similar sensors, which in good weather produce fine-grained point clouds or high-resolution images and provide rich, redundant visual information. However, these visual sensors are sensitive to weather: in bad weather such as fog, opaque particles scatter the light, significantly reducing the sensing range of the lidar and camera and making the detection results unreliable.
Millimeter-wave radar, besides being inexpensive and widely deployed compared with lidar and cameras, emits millimeter-wave signals whose wavelength is much larger than the particles of fog, rain, and snow, so the signals easily penetrate or diffract around them. Millimeter-wave radar data are therefore only slightly affected by rain and fog, and fusing millimeter-wave radar with other sensors can accomplish robust 3D target detection in foggy scenes.
Disclosure of Invention
The invention mainly aims to mitigate the interference caused by weather conditions and by target motion and occlusion, and to achieve robust and efficient 3D target detection in foggy scenes.
The technical solution adopted by the invention is as follows:
A 3D target detection method for multi-modal information space-time fusion in a foggy scene comprises the following steps:
S1, acquiring lidar point cloud data and millimeter-wave radar point cloud data in a foggy scene and preprocessing each of them;
S2, performing space-time feature matching on the preprocessed multi-frame lidar point clouds and millimeter-wave radar point clouds, resampling the lidar point cloud features with the aid of the millimeter-wave point cloud features, and then fusing them along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view;
S3, inputting the space-time fusion features under the bird's-eye view into a self-attention-based Transformer encoder and encoding them with self-attention to obtain space-time semantic features of the same size as the original feature map but of higher dimension;
S4, inputting the higher-dimensional space-time semantic features into the two branches of a target classification detection head and a bounding-box regression detection head, and outputting the final detection result, including the object category and its position in space.
In connection with the above technical solution, the lidar point cloud data are aggregated into voxels, and the millimeter-wave radar point cloud is preprocessed in a PointNet manner.
In connection with the above technical solution, in step S2 the millimeter-wave radar point cloud is transformed into the lidar coordinate system to be matched with the lidar voxels, and the spatial positions of all voxels and point cloud features are then projected onto the bird's-eye view.
In connection with the above technical solution, in step S2, specifically, with each millimeter-wave radar point as a center, KNN is used to search for lidar voxels within a certain range and random sampling is performed; finally, the selected lidar voxels are associated and feature-concatenated with the millimeter-wave radar points to obtain enhanced fusion features.
In connection with the above technical solution, step S3 specifically comprises the following steps:
taking 2.5 m × 2.5 m as the size of the space-time window, and taking all voxel features at different moments but at the same position as the elements of that window;
inputting the divided 40 × 40 space-time windows together as one batch into the self-attention-based Transformer encoder for feature encoding, and outputting high-dimensional semantic space-time features;
remapping the high-dimensional semantic features, with the aid of the voxel coordinates, into a regularly rasterized feature map under the bird's-eye view.
In connection with the above technical solution, step S4 specifically comprises the following steps:
placing reference boxes oriented at 0° and 90° at each position of the high-dimensional space-time semantic feature map;
inputting the space-time semantic feature map with the placed reference boxes into the fully connected layers of the two branches of the target classification detection head and the bounding-box regression detection head to obtain the network's object classification scores and predicted boxes, and filtering out detection boxes whose scores fall below the input threshold to obtain high-quality detection boxes.
In connection with the above technical solution, the size of each reference box is taken from the average of the annotated boxes of the corresponding category in the dataset, so as to reduce the difficulty of network learning.
The invention also provides a 3D target detection system for multi-modal information space-time fusion in a foggy scene, which comprises:
a preprocessing module for acquiring lidar point cloud data and millimeter-wave radar point cloud data in a foggy scene and preprocessing each of them;
a space-time feature matching module for performing space-time feature matching on the preprocessed multi-frame lidar point clouds and millimeter-wave radar point clouds, resampling the lidar point cloud features with the aid of the millimeter-wave point cloud features, and then fusing them along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view;
a feature encoding module for inputting the space-time fusion features under the bird's-eye view into a self-attention-based Transformer encoder and encoding them with self-attention to obtain space-time semantic features of the same size as the original feature map but of higher dimension;
a classification module for inputting the higher-dimensional space-time semantic features into the two branches of a target classification detection head and a bounding-box regression detection head and outputting the final detection result, including the object category and its position in space.
The invention also provides a computer storage medium storing a computer program executable by a processor, the computer program, when executed, performing the above 3D target detection method for multi-modal information space-time fusion in a foggy scene.
The invention also provides a vehicle-mounted foggy-scene target detection system comprising a data collector, a vehicle-mounted storage and computing platform, and vehicle actuators, wherein the data collector includes a lidar, a millimeter-wave radar, and vehicle data sensors, the above computer storage medium is arranged in the vehicle-mounted storage and computing platform, and the vehicle actuators execute corresponding actions according to the control instructions output by the vehicle-mounted storage and computing platform.
The invention has the following beneficial effects: the perception robustness of the millimeter-wave radar in fog is used to enhance the lidar point cloud features, while the target is still localized and detected based on the lidar point cloud, which overcomes the millimeter-wave radar's large errors in height perception and low data resolution and fully combines the advantages of the two sensors in foggy scenes. In addition, multi-frame data are used for spatio-temporal fusion, which further strengthens the data representation and, to some extent, mitigates the interference caused by weather conditions and by target motion and occlusion, thereby achieving robust and efficient 3D target detection in foggy scenes.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a first flowchart of a 3D target detection method of multi-modal information space-time fusion in a foggy day scene according to an embodiment of the invention;
FIG. 2 is a second flowchart of the 3D target detection method of multi-modal information space-time fusion in a foggy scene according to the embodiment of the invention;
FIG. 3 is a schematic structural diagram of a 3D target detection system for multi-modal information space-time fusion in a foggy day scene according to an embodiment of the invention;
FIG. 4 is a block diagram of a vehicle-mounted foggy day scene target detection system implemented in accordance with the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The method is mainly used to improve target detection performance under severe weather such as fog and to improve the reliability of a 3D target detection system in extreme weather.
As shown in fig. 1, the 3D target detection method based on multi-modal information space-time fusion in a foggy scene according to the embodiment of the invention mainly relies on a space-time feature fusion detection mechanism for lidar and millimeter-wave radar point clouds, and mainly comprises the following steps:
S1, acquiring spatial point cloud data and preprocessing them respectively, where the spatial point cloud data include lidar point cloud data and millimeter-wave radar point clouds. The lidar point cloud is large in volume and can be aggregated into voxels; the millimeter-wave radar point cloud is very sparse and can be preprocessed in a PointNet manner.
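As an illustration of the PointNet-style preprocessing mentioned above, the following is a minimal sketch assuming a PyTorch environment; the feature width (64) and the per-point input fields (x, y, z, radial velocity, RCS) are illustrative assumptions rather than values specified by the patent.

```python
import torch
import torch.nn as nn

class RadarPointNet(nn.Module):
    """Shared-MLP (PointNet-style) extractor producing per-point features
    for a sparse millimeter-wave radar point cloud (widths are assumptions)."""
    def __init__(self, in_dim=5, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, points):
        # points: (N, in_dim), e.g. x, y, z, radial velocity, RCS per point
        return self.mlp(points)              # (N, feat_dim) per-point features

# hypothetical usage on one radar frame
radar_points = torch.randn(200, 5)
radar_feats = RadarPointNet()(radar_points)  # (200, 64)
```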
S2, space-time feature matching and adaptive sampling: performing space-time feature matching on the preprocessed multi-frame lidar point clouds and millimeter-wave radar point clouds. In a foggy scene the millimeter-wave radar point cloud suffers less interference, so the lidar point cloud features are resampled with the aid of the millimeter-wave point cloud features and then further fused along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view.
S3, space-time feature encoding: inputting the matched and resampled fusion features into a Transformer encoder, encoding them with self-attention to improve the global dependency of the features, and finally outputting space-time semantic features of the same size as the original feature map but of higher dimension.
S4, target classification and box regression: after the higher-dimensional space-time semantic features under the bird's-eye view are obtained, they are input into the two branches of a target classification detection head and a bounding-box regression detection head, and the final detection result, namely the object category and its position in space, is output.
A specific implementation flow of the foggy day scene target detection method according to another embodiment of the present invention is shown in fig. 2, and may specifically include the following steps:
S100, acquiring spatial point cloud data and preprocessing them respectively, which specifically comprises the following steps:
S110, grid division: the detection range is set according to the specific scene and the mounting position of the sensors; in the embodiment of the invention, the ranges [-50 m, 50 m], [-40 m, 40 m], and [-3 m, 5 m] in the lidar coordinate system are taken as the perception ranges in the X, Y, and Z directions respectively. The point cloud within this range is divided into grids of equal size at a resolution of 0.25 m × 0.25 m × 8 m, and preliminary feature extraction is performed on the millimeter-wave radar point cloud in a PointNet manner.
S120, point cloud grouping and feature aggregation: the lidar point clouds are grouped according to the grids divided in S110, and the grouped points are aggregated grid by grid to obtain voxel features. To balance computation and feature robustness, the voxel features are generated by combining average pooling and max pooling.
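The grid division and pooling-based aggregation of S110-S120 could look roughly like the following sketch; the perception range and voxel size follow the embodiment above, while concatenating the average-pooled and max-pooled vectors is an illustrative way of "combining" the two poolings.

```python
import torch

def voxelize(points, feats, pc_range=(-50, -40, -3, 50, 40, 5),
             voxel_size=(0.25, 0.25, 8.0)):
    """Group lidar points into BEV grid cells (one 8 m bin in Z) and aggregate
    per-cell features by concatenating average and max pooling.
    points: (N, 3) xyz; feats: (N, C) per-point features (e.g. xyz + intensity)."""
    x0, y0, z0, x1, y1, z1 = pc_range
    mask = ((points[:, 0] >= x0) & (points[:, 0] < x1) &
            (points[:, 1] >= y0) & (points[:, 1] < y1) &
            (points[:, 2] >= z0) & (points[:, 2] < z1))
    points, feats = points[mask], feats[mask]

    # integer cell index per point (BEV, since the Z voxel spans the full range)
    ij = torch.stack([((points[:, 0] - x0) / voxel_size[0]).long(),
                      ((points[:, 1] - y0) / voxel_size[1]).long()], dim=1)
    cell_ids, inverse = torch.unique(ij, dim=0, return_inverse=True)

    V, C = len(cell_ids), feats.shape[1]
    # average pooling per cell
    avg = torch.zeros(V, C).index_add_(0, inverse, feats)
    counts = torch.zeros(V).index_add_(0, inverse, torch.ones(len(feats)))
    avg = avg / counts.clamp(min=1).unsqueeze(1)
    # max pooling per cell
    mx = torch.full((V, C), -1e9)
    mx = mx.scatter_reduce(0, inverse.unsqueeze(1).expand(-1, C), feats, reduce="amax")

    return cell_ids, torch.cat([avg, mx], dim=1)   # (V, 2) cell coords, (V, 2C) voxel features
```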
S200, space-time feature matching and adaptive sampling: performing space-time feature matching on the processed multi-frame lidar voxel features and the millimeter-wave radar point cloud. In a foggy scene the millimeter-wave radar point cloud suffers less interference, so the lidar point cloud features are resampled with the aid of the millimeter-wave point cloud features and then further fused along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view. Step S200 specifically comprises:
S210, unifying the coordinate system and projecting to the bird's-eye view: the millimeter-wave radar point cloud is transformed into the lidar coordinate system to be matched with the lidar voxels, and the spatial positions of all voxels and point cloud features are then projected onto the bird's-eye view.
S220, feature resampling and enhancement: with each millimeter-wave radar point as a center, KNN is used to search for lidar voxels within a certain range, and multi-scale random sampling is performed according to the statistical relationship between the valid lidar voxels and the millimeter-wave radar point cloud data; finally, the selected lidar voxels are associated and feature-concatenated with the millimeter-wave radar points to obtain enhanced fusion features. The statistical relationship means that, for a given fog density, the lidar data and the millimeter-wave radar data keep a certain proportional relationship. Multi-scale means that the KNN search can be run over several ranges, sampling different amounts of data at different ranges.
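A minimal sketch of S210-S220 under stated assumptions: the radar points are transformed into the lidar frame with a homogeneous matrix, the k nearest lidar voxels in the bird's-eye view are found around each radar point, a random subset is kept, and lidar and radar features are concatenated. The function name, the values of k and n_sample, and the single-scale search are illustrative; the patent describes a multi-scale search governed by the statistical ratio above.

```python
import torch

def resample_lidar_with_radar(radar_xyz, radar_feat, voxel_xy, voxel_feat,
                              T_radar_to_lidar, k=8, n_sample=4):
    """radar_xyz: (R, 3); radar_feat: (R, C_r); voxel_xy: (V, 2) BEV centers;
    voxel_feat: (V, C_l); T_radar_to_lidar: (4, 4) homogeneous transform."""
    # S210: transform radar points into the lidar coordinate system
    ones = torch.ones(len(radar_xyz), 1)
    radar_in_lidar = (torch.cat([radar_xyz, ones], 1) @ T_radar_to_lidar.T)[:, :3]
    radar_bev = radar_in_lidar[:, :2]                  # project to bird's-eye view

    # S220: KNN search in BEV around each radar point
    d = torch.cdist(radar_bev, voxel_xy)               # (R, V) pairwise distances
    knn_idx = d.topk(k, largest=False).indices         # (R, k) nearest lidar voxels

    fused = []
    for r in range(len(radar_bev)):
        keep = knn_idx[r][torch.randperm(k)[:n_sample]]           # random sub-sampling
        lidar_part = voxel_feat[keep]                             # (n_sample, C_l)
        radar_part = radar_feat[r].expand(n_sample, -1)           # (n_sample, C_r)
        fused.append(torch.cat([lidar_part, radar_part], dim=1))  # feature splicing
    return torch.cat(fused, dim=0)                                # enhanced fused features
```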
S230, the multi-frame point clouds and voxels are processed as above and arranged in timestamp order to obtain the space-time fusion features.
S300, space-time feature encoding: inputting the matched and resampled space-time fusion features into a Transformer encoder, encoding them with self-attention to improve the global dependency of the features, and finally outputting space-time semantic features of the same size as the original feature map but of higher dimension. Step S300 specifically comprises:
S310, space-time window division and feature encoding: specifically, 2.5 m × 2.5 m is taken as the size of the space-time window, and all voxel features from S230 at different moments but at the same window position are taken as the elements of that window. The elements within the same space-time window are then fed into a self-attention-based Transformer encoder. Through the global dependency of the self-attention mechanism, the geometric and positional features of the environment and the target at the current spatial position are learned, the temporal dependency of the same target over a period of time is modeled, and historical frame information is fully exploited to strengthen the feature representation, thereby resisting foggy-weather interference.
S320, space-time feature batch processing: the divided 40 × 40 space-time windows are fed together as one batch into the self-attention-based Transformer encoder with the same feature encoding scheme to output high-dimensional semantic space-time features, which reduces the computation and improves the global dependency during feature extraction.
S330, feature re-rasterization: the high-dimensional semantic features output in S320 are an unordered set of element features, but they differ little from the voxel representation, so with the aid of the voxel coordinates they can be remapped into a regularly rasterized feature map under the bird's-eye view.
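A rough sketch of S310-S330 under stated assumptions (PyTorch): multi-frame voxels are bucketed into 2.5 m × 2.5 m bird's-eye-view windows, each window is encoded as one sequence by a self-attention Transformer encoder, and the encoded voxels are scattered back onto a regular grid by their coordinates. The model width, head count, layer count, and the dense-grid size are illustrative choices, not values from the patent, and the per-window Python loop stands in for the batched processing of S320.

```python
import torch
import torch.nn as nn

def encode_spacetime_windows(voxel_xy, voxel_feat, window=2.5, d_model=128):
    """voxel_xy: (N, 2) BEV centers of voxels stacked from several frames;
    voxel_feat: (N, C) fused features from S230. Returns a dense BEV map."""
    proj = nn.Linear(voxel_feat.shape[1], d_model)
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
        num_layers=2)

    # S310: assign every voxel (from any frame) to a 2.5 m x 2.5 m BEV window
    win_id = (voxel_xy / window).floor().long()
    uniq, inverse = torch.unique(win_id, dim=0, return_inverse=True)

    x = proj(voxel_feat)
    out = x.clone()
    for w in range(len(uniq)):                              # one window = one sequence
        idx = (inverse == w).nonzero(as_tuple=True)[0]
        out[idx] = encoder(x[idx].unsqueeze(0)).squeeze(0)  # self-attention over the window

    # S330: remap the unordered encoded voxels to a regular BEV grid (illustrative size)
    H, W = 400, 320                                         # 100 m x 80 m at 0.25 m resolution
    grid = torch.zeros(H, W, d_model)
    gx = ((voxel_xy[:, 0] + 50) / 0.25).long().clamp(0, H - 1)
    gy = ((voxel_xy[:, 1] + 40) / 0.25).long().clamp(0, W - 1)
    grid[gx, gy] = out
    return grid
```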
S400, target classification and box regression: after the high-dimensional semantic features under the bird's-eye view are obtained, they are input into the two branches of a target classification detection head and a bounding-box regression detection head, and the final detection result, namely the object category and its position in space, is output. This comprises the following steps:
S410, setting regression reference boxes: reference boxes oriented at 0° and 90° are placed at each position of the feature map obtained in S330, and the size of each reference box is taken from the average of the annotated boxes of the corresponding category in the dataset, so as to reduce the difficulty of network learning.
S420, target classification and box regression: the feature map obtained in S330 is fed into the fully connected layers of the classification branch and the box regression branch respectively to obtain the network's object class scores and predicted boxes, and detection boxes whose scores fall below the input threshold are filtered out to obtain high-quality detection boxes.
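A minimal sketch of S410-S420 under stated assumptions: two fully connected branches over the bird's-eye-view feature map, one scoring object classes and one regressing box parameters against the two reference boxes (0° and 90°) per cell, followed by score-threshold filtering. The class count, box encoding (x, y, z, w, l, h, yaw), and threshold value are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Classification and box-regression branches over a (H, W, C) BEV map."""
    def __init__(self, in_ch=128, num_classes=3, anchors_per_cell=2, box_dim=7):
        super().__init__()
        self.cls_head = nn.Linear(in_ch, anchors_per_cell * num_classes)
        self.reg_head = nn.Linear(in_ch, anchors_per_cell * box_dim)
        self.num_classes, self.anchors, self.box_dim = num_classes, anchors_per_cell, box_dim

    def forward(self, bev_feat, score_thresh=0.3):
        # bev_feat: (H, W, in_ch) spatio-temporal semantic feature map
        H, W, _ = bev_feat.shape
        scores = self.cls_head(bev_feat).reshape(H * W * self.anchors, self.num_classes).sigmoid()
        boxes = self.reg_head(bev_feat).reshape(H * W * self.anchors, self.box_dim)
        best, _ = scores.max(dim=1)
        keep = best > score_thresh          # drop low-scoring detection boxes
        return boxes[keep], scores[keep]

# hypothetical usage on the map produced by encode_spacetime_windows
heads = DetectionHeads()
boxes, scores = heads(torch.randn(400, 320, 128))
```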
The 3D target detection system for multi-modal information space-time fusion in a foggy scene according to the embodiment of the invention, shown in fig. 3, is mainly used to implement the method of the above embodiment and comprises:
a preprocessing module for acquiring lidar point cloud data and millimeter-wave radar point cloud data in a foggy scene and preprocessing each of them;
a space-time feature matching module for performing space-time feature matching on the preprocessed multi-frame lidar point clouds and millimeter-wave radar point clouds, resampling the lidar point cloud features with the aid of the millimeter-wave point cloud features, and then fusing them along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view;
a feature encoding module for inputting the space-time fusion features under the bird's-eye view into a self-attention-based Transformer encoder and encoding them with self-attention to obtain space-time semantic features of the same size as the original feature map but of higher dimension;
a classification module for inputting the higher-dimensional space-time semantic features into the two branches of a target classification detection head and a bounding-box regression detection head and outputting the final detection result, including the object category and its position in space.
The present application also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an app store, on which a computer program is stored. When executed by a processor, the computer program of this embodiment carries out the 3D target detection method for multi-modal information space-time fusion in a foggy scene.
Based on the above target detection method with space-time feature fusion for foggy scenes, the invention further builds a vehicle-mounted foggy-scene target detection system, whose architecture is shown in fig. 4. It comprises sensors (for data acquisition, including lidar, millimeter-wave radar, vehicle data sensors and the like, and optionally cameras) and a vehicle-mounted storage and computing platform (memory, localization-perception computing platform), among others. The sensors communicate with the vehicle-mounted storage and computing platform through data transmission interfaces (Ethernet, USB, and CAN), and the system executes as follows:
(1) The foggy-scene space-time feature fusion target detection algorithm proposed by the invention is converted into instruction code and deployed in the memory of the vehicle-mounted computing platform.
(2) The drivers of the lidar and millimeter-wave radar sensors are configured to parse and forward the sensor data, with the forwarded data format matched to the instruction code of step (1).
(3) The parsed and forwarded data are processed on the perception-localization computing platform according to the instruction code of step (1); the detection result is obtained and sent to the memory, from which the planning and control platform reads the real-time detection result and, together with the localization and perception results obtained by other algorithms, completes the downstream tasks on the planning and control computing platform.
(4) The vehicle actuators execute actions according to the control commands corresponding to the downstream tasks.
In conclusion, the method uses the perception robustness of the millimeter-wave radar in fog to enhance the lidar point cloud features, while still localizing and detecting the target based on the lidar point cloud, which overcomes the millimeter-wave radar's large errors in height perception and low data resolution and fully combines the advantages of the two sensors in foggy scenes. In addition, multi-frame data are used for spatio-temporal fusion, which further strengthens the data representation and, to some extent, mitigates the interference caused by weather conditions and target motion, thereby achieving robust and efficient 3D target detection in foggy scenes.
It will be appreciated that modifications and variations are possible to those skilled in the art in light of the above teachings, and it is intended to cover all such modifications and variations as fall within the scope of the appended claims.

Claims (10)

1. A 3D target detection method based on multi-modal information space-time fusion in a foggy scene, characterized by comprising the following steps:
S1, acquiring lidar point cloud data and millimeter-wave radar point cloud data in a foggy scene and preprocessing each of them;
S2, performing space-time feature matching on the preprocessed multi-frame lidar point clouds and millimeter-wave radar point clouds, resampling the lidar point cloud features with the aid of the millimeter-wave point cloud features, and then fusing them along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view;
S3, inputting the space-time fusion features under the bird's-eye view into a self-attention-based Transformer encoder and encoding them with self-attention to obtain space-time semantic features of the same size as the original feature map but of higher dimension;
S4, inputting the higher-dimensional space-time semantic features into the two branches of a target classification detection head and a bounding-box regression detection head, and outputting the final detection result, including the object category and its position in space.
2. The 3D target detection method for multi-modal information space-time fusion in a foggy scene according to claim 1, wherein the lidar point cloud data are aggregated into voxels, and the millimeter-wave radar point cloud is preprocessed in a PointNet manner.
3. The 3D target detection method for multi-modal information space-time fusion in a foggy scene according to claim 1, wherein in step S2 the millimeter-wave radar point cloud is transformed into the lidar coordinate system to be matched with the lidar voxels, and the spatial positions of all voxels and point cloud features are then projected onto the bird's-eye view.
4. The 3D target detection method for multi-modal information space-time fusion in a foggy scene according to claim 1, wherein in step S2, with each millimeter-wave radar point as a center, KNN (K-nearest neighbor) is used to search for lidar voxels within a certain range and random sampling is performed; finally, the selected lidar voxels are associated and feature-concatenated with the millimeter-wave radar points to obtain enhanced fusion features.
5. The 3D target detection method for multi-modal information space-time fusion in a foggy scene according to claim 1, wherein step S3 specifically comprises the following steps:
taking 2.5 m × 2.5 m as the size of the space-time window, and taking all voxel features at different moments but at the same window position as the elements of that window;
inputting the divided 40 × 40 space-time windows together as one batch into the self-attention-based Transformer encoder for feature encoding, and outputting high-dimensional semantic space-time features;
remapping the high-dimensional semantic features, with the aid of the voxel coordinates, into a regularly rasterized feature map under the bird's-eye view.
6. The 3D target detection method for multi-modal information space-time fusion in a foggy scene according to claim 1, wherein step S4 specifically comprises the following steps:
placing reference boxes oriented at 0° and 90° at each position of the high-dimensional space-time semantic feature map;
inputting the space-time semantic feature map with the placed reference boxes into the fully connected layers of the two branches of the target classification detection head and the bounding-box regression detection head to obtain the network's object classification scores and predicted boxes, and filtering out detection boxes whose scores fall below the input threshold to obtain high-quality detection boxes.
7. The 3D target detection method for multi-modal information space-time fusion in a foggy scene according to claim 6, wherein the size of each reference box is taken from the average of the annotated boxes of the corresponding category in the dataset, so as to reduce the difficulty of network learning.
8. A 3D target detection system for multi-modal information space-time fusion in a foggy scene, characterized by comprising:
a preprocessing module for acquiring lidar point cloud data and millimeter-wave radar point cloud data in a foggy scene and preprocessing each of them;
a space-time feature matching module for performing space-time feature matching on the preprocessed multi-frame lidar point clouds and millimeter-wave radar point clouds, resampling the lidar point cloud features with the aid of the millimeter-wave point cloud features, and then fusing them along the time sequence to obtain preliminary space-time fusion features under the bird's-eye view;
a feature encoding module for inputting the space-time fusion features under the bird's-eye view into a self-attention-based Transformer encoder and encoding them with self-attention to obtain space-time semantic features of the same size as the original feature map but of higher dimension;
a classification module for inputting the higher-dimensional space-time semantic features into the two branches of a target classification detection head and a bounding-box regression detection head and outputting the final detection result, including the object category and its position in space.
9. A computer storage medium having stored therein a computer program executable by a processor, the computer program performing the 3D target detection method for multi-modal information space-time fusion in a foggy scene according to any one of claims 1 to 7.
10. A vehicle-mounted foggy-scene target detection system, characterized by comprising a data collector, a vehicle-mounted storage and computing platform, and vehicle actuators, wherein the data collector includes a lidar, a millimeter-wave radar, and vehicle data sensors, the computer storage medium according to claim 9 is arranged in the vehicle-mounted storage and computing platform, and the vehicle actuators execute corresponding actions according to the control instructions output by the vehicle-mounted storage and computing platform.
CN202210945302.0A 2022-08-08 2022-08-08 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene Active CN115393680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210945302.0A CN115393680B (en) 2022-08-08 2022-08-08 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210945302.0A CN115393680B (en) 2022-08-08 2022-08-08 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene

Publications (2)

Publication Number Publication Date
CN115393680A true CN115393680A (en) 2022-11-25
CN115393680B (en) 2023-06-06

Family

ID=84118249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210945302.0A Active CN115393680B (en) 2022-08-08 2022-08-08 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene

Country Status (1)

Country Link
CN (1) CN115393680B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115965961A (en) * 2023-02-23 2023-04-14 上海人工智能创新中心 Local-to-global multi-modal fusion method, system, device and storage medium
CN116363615A (en) * 2023-03-27 2023-06-30 小米汽车科技有限公司 Data fusion method, device, vehicle and storage medium
CN116467848A (en) * 2023-03-21 2023-07-21 之江实验室 Millimeter wave radar point cloud simulation method and device
CN117576150A (en) * 2023-11-03 2024-02-20 扬州万方科技股份有限公司 Multi-mode multi-target 3D tracking method and device considering far-frame dependency relationship

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158763A (en) * 2021-02-23 2021-07-23 清华大学 Three-dimensional target detection method based on multi-view feature fusion of 4D millimeter waves and laser point clouds
CN113506372A (en) * 2021-07-26 2021-10-15 西北工业大学 Environment reconstruction method and device
WO2022000857A1 (en) * 2020-06-30 2022-01-06 广东小鹏汽车科技有限公司 Dataset establishment method, vehicle, and storage medium
CN114708585A (en) * 2022-04-15 2022-07-05 电子科技大学 Three-dimensional target detection method based on attention mechanism and integrating millimeter wave radar with vision
CN114763997A (en) * 2022-04-14 2022-07-19 中国第一汽车股份有限公司 Method and device for processing radar point cloud data acquired by vehicle and electronic equipment
CN114814826A (en) * 2022-04-08 2022-07-29 苏州大学 Radar rail-mounted area environment sensing method based on target grid

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022000857A1 (en) * 2020-06-30 2022-01-06 广东小鹏汽车科技有限公司 Dataset establishment method, vehicle, and storage medium
CN113158763A (en) * 2021-02-23 2021-07-23 清华大学 Three-dimensional target detection method based on multi-view feature fusion of 4D millimeter waves and laser point clouds
CN113506372A (en) * 2021-07-26 2021-10-15 西北工业大学 Environment reconstruction method and device
CN114814826A (en) * 2022-04-08 2022-07-29 苏州大学 Radar rail-mounted area environment sensing method based on target grid
CN114763997A (en) * 2022-04-14 2022-07-19 中国第一汽车股份有限公司 Method and device for processing radar point cloud data acquired by vehicle and electronic equipment
CN114708585A (en) * 2022-04-15 2022-07-05 电子科技大学 Three-dimensional target detection method based on attention mechanism and integrating millimeter wave radar with vision

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李朝; 兰海; 魏宪: "Attention-based millimeter-wave radar and lidar fusion for object detection", 计算机应用 (Journal of Computer Applications) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115965961A (en) * 2023-02-23 2023-04-14 上海人工智能创新中心 Local-to-global multi-modal fusion method, system, device and storage medium
CN115965961B (en) * 2023-02-23 2024-04-05 上海人工智能创新中心 Local-global multi-mode fusion method, system, equipment and storage medium
CN116467848A (en) * 2023-03-21 2023-07-21 之江实验室 Millimeter wave radar point cloud simulation method and device
CN116467848B (en) * 2023-03-21 2023-11-03 之江实验室 Millimeter wave radar point cloud simulation method and device
CN116363615A (en) * 2023-03-27 2023-06-30 小米汽车科技有限公司 Data fusion method, device, vehicle and storage medium
CN116363615B (en) * 2023-03-27 2024-02-23 小米汽车科技有限公司 Data fusion method, device, vehicle and storage medium
CN117576150A (en) * 2023-11-03 2024-02-20 扬州万方科技股份有限公司 Multi-mode multi-target 3D tracking method and device considering far-frame dependency relationship

Also Published As

Publication number Publication date
CN115393680B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN111027401B (en) End-to-end target detection method with integration of camera and laser radar
CN109635685B (en) Target object 3D detection method, device, medium and equipment
CN115393680B (en) 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene
JP7033373B2 (en) Target detection method and device, smart operation method, device and storage medium
CN113673425B (en) Multi-view target detection method and system based on Transformer
CN113761999A (en) Target detection method and device, electronic equipment and storage medium
JP2023549036A (en) Efficient 3D object detection from point clouds
US20220269900A1 (en) Low level sensor fusion based on lightweight semantic segmentation of 3d point clouds
CN116229408A (en) Target identification method for fusing image information and laser radar point cloud information
CN116503803A (en) Obstacle detection method, obstacle detection device, electronic device and storage medium
CN115830265A (en) Automatic driving movement obstacle segmentation method based on laser radar
CN116486368A (en) Multi-mode fusion three-dimensional target robust detection method based on automatic driving scene
CN116258859A (en) Semantic segmentation method, semantic segmentation device, electronic equipment and storage medium
CN114283343A (en) Map updating method, training method and equipment based on remote sensing satellite image
CN114241448A (en) Method and device for obtaining heading angle of obstacle, electronic equipment and vehicle
US20240193788A1 (en) Method, device, computer system for detecting pedestrian based on 3d point clouds
CN112529011A (en) Target detection method and related device
CN114581748B (en) Multi-agent perception fusion system based on machine learning and implementation method thereof
US20230105331A1 (en) Methods and systems for semantic scene completion for sparse 3d data
CN115937259A (en) Moving object detection method and device, flight equipment and storage medium
CN114926637A (en) Garden map construction method based on multi-scale distance map and point cloud semantic segmentation
KR20230119334A (en) 3d object detection method applying self-attention module for removing radar clutter
CN112766100A (en) 3D target detection method based on key points
CN113222111A (en) Automatic driving 4D perception method, system and medium suitable for all-weather environment
CN115082902B (en) Vehicle target detection method based on laser radar point cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant