CN116129371A - Traffic target detection method and device and electronic equipment - Google Patents

Traffic target detection method and device and electronic equipment

Info

Publication number
CN116129371A
CN116129371A
Authority
CN
China
Prior art keywords
image
data
fusion
perception
rgb
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310090468.3A
Other languages
Chinese (zh)
Inventor
张磊
时尧成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202310090468.3A priority Critical patent/CN116129371A/en
Publication of CN116129371A publication Critical patent/CN116129371A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S 17/86 Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S 17/88 Lidar systems specially adapted for specific applications
    • G01S 17/89 Lidar systems specially adapted for specific applications for mapping or imaging
    • G01S 17/894 3D imaging with simultaneous measurement of time-of-flight at a 2D array of receiver pixels, e.g. time-of-flight cameras or flash lidar
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 7/00 Details of systems according to groups G01S 13/00, G01S 15/00, G01S 17/00
    • G01S 7/48 Details of systems according to groups G01S 13/00, G01S 15/00, G01S 17/00 of systems according to group G01S 17/00
    • G01S 7/4802 Details of systems according to groups G01S 13/00, G01S 15/00, G01S 17/00 of systems according to group G01S 17/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 7/00 Details of systems according to groups G01S 13/00, G01S 15/00, G01S 17/00
    • G01S 7/48 Details of systems according to groups G01S 13/00, G01S 15/00, G01S 17/00 of systems according to group G01S 17/00
    • G01S 7/483 Details of pulse systems
    • G01S 7/486 Receivers
    • G01S 7/487 Extracting wanted echo signals, e.g. pulse detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Evolutionary Computation (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Electromagnetism (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a traffic target detection method, a traffic target detection device and electronic equipment. The method comprises: acquiring first perception data collected by an FMCW laser radar and second perception data collected by a video camera; filtering out data in the first perception data whose scattered echo intensity falls outside an intensity threshold range, where the intensity threshold range represents the range to which the scattered echo intensity of pedestrians and/or non-motor vehicles belongs; structuring the filtered first perception data to obtain image processing structure data comprising a 3D point cloud voxel feature vector, an SAR image and a Doppler image; performing video preprocessing on the second perception data to obtain an RGB image synchronized with the first perception data; and performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image, and performing pedestrian and/or non-motor vehicle detection based on the image fusion result, thereby improving the accuracy of pedestrian and/or non-motor vehicle detection.

Description

Traffic target detection method and device and electronic equipment
Technical Field
The invention relates to the technical field of intelligent traffic, in particular to a traffic target detection method, a traffic target detection device and electronic equipment.
Background
Intelligent transportation systems commonly detect targets through multi-sensor fusion, fusing the data collected by a lidar and a video camera before performing detection. In the prior art, whether at the vehicle end or the road end, the data collected by a TOF (Time of Flight) lidar and a video camera are used as the input of a deep learning model, which recognizes targets after learning.
Target detection methods based on a TOF lidar and a video camera detect large targets such as vehicles well, but detect pedestrians, non-motor vehicles and similar targets poorly, often with missed detections and false detections. This is one of the important factors restricting autonomous vehicles from entering the road: pedestrians and non-motor vehicles are important participants on actual roads, and their behavior is abrupt and changeable, so the accuracy of detecting pedestrians and/or non-motor vehicles on the road needs to be improved in order to improve the safety of automated driving.
Disclosure of Invention
The invention provides a traffic target detection method, a traffic target detection device and electronic equipment, which are used for solving the technical problem of low detection accuracy for pedestrians and/or non-motor vehicles on the road in the prior art.
In a first aspect, the present invention provides a traffic target detection method applied to a road side sensing system, where the road side sensing system includes at least one group of FMCW lidar and a video camera, the FMCW lidar and the video camera are disposed on a road side and have overlapping sensing areas, and the method includes:
acquiring first perception data acquired by the FMCW laser radar and second perception data acquired by the video camera;
filtering data, in the first perception data, of which the scattered echo intensity does not belong to an intensity threshold range, wherein the intensity threshold range represents a range to which the scattered echo intensity of pedestrians and/or non-motor vehicles belongs;
carrying out structuring processing on the filtered first perception data to obtain image processing structure data, wherein the image processing structure data comprises a 3D point cloud voxel feature vector, an SAR image and a Doppler image;
performing video preprocessing on the second perception data to obtain an RGB image synchronous with the first perception data;
and performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image, and performing pedestrian and/or non-motor vehicle detection based on an image fusion result.
Optionally, after performing video preprocessing on the second perception data to obtain an RGB image synchronized with the first perception data, the method further includes:
acquiring a target sensing area where pedestrians and/or non-motor vehicles are located in a sensing area of the video camera;
and extracting target RGB data in the RGB image based on the target perception area, and updating the RGB image based on the target RGB data.
Optionally, the Doppler image comprises a time-Doppler spectrum graph and a range-Doppler graph.
Optionally, performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image, and performing pedestrian and/or non-motor vehicle detection based on the image fusion result, includes:
performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image to obtain a multi-source feature fusion feature map;
and performing secondary fusion on the multi-source feature fusion feature map and the original point cloud data acquired by the FMCW laser radar, and performing pedestrian and/or non-motor vehicle detection based on the secondarily fused data.
Optionally, the step of performing pedestrian and/or non-motor vehicle detection based on the image fusion result includes:
inputting the multi-source feature fusion feature map, or the multi-source feature fusion feature map together with the original point cloud data, into a trained convolutional neural network for pedestrian and/or non-motor vehicle detection;
the convolutional neural network comprises three cascaded feature extraction layers, each feature extraction layer consists of two convolutional layers and one pooling layer, and the convolution kernel size of each convolutional layer is 3×3.
In a second aspect, the present invention provides a traffic target detection device applied to a road side sensing system, the road side sensing system including at least one group of FMCW lidar and video camera, the FMCW lidar and the video camera being disposed on the road side with overlapping sensing areas, the device comprising:
the acquisition unit is used for acquiring first perception data acquired by the FMCW laser radar and second perception data acquired by the video camera;
the filtering unit is used for filtering data, in the first perception data, of which the scattered echo intensity does not belong to an intensity threshold range, wherein the intensity threshold range represents the range of the scattered echo intensity of pedestrians and/or non-motor vehicles;
the structuring unit is used for carrying out structuring processing on the filtered first perception data to obtain image processing structure data, wherein the image processing structure data comprises a 3D point cloud voxel feature vector, an SAR image and a Doppler image;
the video processing unit is used for carrying out video preprocessing on the second perception data to obtain an RGB image synchronous with the first perception data;
and the fusion detection unit is used for performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image, and performing pedestrian and/or non-motor vehicle detection based on an image fusion result.
Optionally, the apparatus further includes: the extraction unit is used for acquiring a target sensing area where pedestrians and/or non-motor vehicles are located in a sensing area of the video camera after the second sensing data are subjected to video preprocessing to acquire RGB images synchronous with the first sensing data; and extracting target RGB data in the RGB image based on the target perception area, and updating the RGB image based on the target RGB data.
Optionally, the fusion detection unit is further configured to:
performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image to obtain a multi-source feature fusion feature map; and performing secondary fusion on the multi-source feature fusion feature map and the original point cloud data acquired by the FMCW laser radar, and performing pedestrian and/or non-motor vehicle detection based on the secondarily fused data.
In a third aspect, the present invention provides an electronic device comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to implement any of the methods of the first aspect by execution of the one or more programs by one or more processors.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements any of the methods as described in the first aspect.
The above technical solutions in the embodiments of the present application at least have the following technical effects:
according to the traffic target detection method provided by the invention, the FMCW laser radar and the video camera are adopted as road side sensing equipment, the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image are obtained based on the sensing data of the FMCW laser radar and the video camera, and the resolution and the micro-motion characteristic of the pedestrian/the non-motor vehicle are increased through the SAR image and the Doppler image, so that the accuracy of data fusion and detection of the pedestrian/the non-motor vehicle based on the four dimensions is greatly improved, and the technical problem of lower detection accuracy of the pedestrian/the non-motor vehicle on the road in the prior art is solved. Meanwhile, the radar sensing data is filtered through the scattered wave echo intensity, most of noise except pedestrians and/or non-motor vehicles is filtered, and the detection accuracy of the pedestrians and/or the non-motor vehicles is further improved.
Drawings
Fig. 1 is a schematic diagram of a road side sensing system according to an embodiment of the present application;
FIG. 2 is a flow chart of traffic target detection provided in an embodiment of the present application;
fig. 3 is a flowchart of an image fusion process provided in an embodiment of the present application;
fig. 4 is a schematic diagram of a network structure of a convolutional neural network according to an embodiment of the present application;
fig. 5 is a schematic diagram of a traffic target detection device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Before describing embodiments of the present disclosure, it should be noted that:
some embodiments of the disclosure are described as process flows, in which the various operational steps of the flows may be numbered sequentially, but may be performed in parallel, concurrently, or simultaneously.
The term "and/or," "and/or" may be used in embodiments of the present disclosure to include any and all combinations of one or more of the associated features listed.
It will be understood that when two elements are described as connected or communicating, unless a direct connection or direct communication between them is explicitly stated, the connection or communication may be either direct or indirect via intermediate elements.
In order to make the technical solutions and advantages of the embodiments of the present disclosure more apparent, exemplary embodiments of the present disclosure are described in detail below in conjunction with the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present disclosure. It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features of the embodiments may be combined with each other.
Example 1
Referring to fig. 1, embodiment 1 provides a road side sensing system, which includes an FMCW (Frequency Modulated Continuous Wave) lidar, a video camera, a roadside computing module, and a perception-result data communication module. The roadside computing module can be an edge computing device, and the perception-result data communication module can be a device with data communication capability, such as a 5G base station or an Internet of Things base station. The FMCW lidar and the video camera are mounted on a roadside support. There may be n groups of FMCW lidars and video cameras, with n ≥ 1; each group comprises at least one FMCW lidar and at least one video camera whose sensing areas overlap completely or partially, so that the data acquired by each group can be fused for detection.
Without considering the target's Doppler shift, the transmitted signal and echo signal of a stationary target have the same beat frequency at the same radial distance and different beat frequencies at different radial distances; if the target has a Doppler shift, the beat signal differs accordingly, so the FMCW lidar is sensitive to the target's distance and velocity parameters. The inventors found that small road targets below a set size threshold, such as pedestrians and non-motor vehicles, exhibit obvious micro-Doppler characteristics; however, compared with motor vehicles, the positions of pedestrians and/or non-motor vehicles on the road are more changeable and their resolution is far lower, so the accuracy of detecting them is poor. This embodiment therefore uses the FMCW lidar to acquire data over a perception area and generate SAR images, obtaining small-target information at high resolution, and combines this with the RGB information acquired by the video camera to improve the accuracy and recall of pedestrian and/or non-motor vehicle detection.
Example 2:
based on the above-mentioned road side sensing system, embodiment 2 provides a traffic target detection method, please refer to fig. 2, which includes:
s210, acquiring first perception data acquired by FMCW laser radar and second perception data acquired by a video camera;
s220, carrying out structuring processing on the first perception data to obtain image processing structure data, wherein the image processing structure data comprises a 3D point cloud voxel feature vector, an SAR image and a Doppler image;
s230, performing video preprocessing on the second perception data to obtain an RGB image synchronous with the first perception data;
s240, performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image, and performing pedestrian and/or non-motor vehicle detection based on the image fusion result.
In the specific implementation process, SAR imaging technology forms a large synthetic aperture from the relative motion of the radar and the target, breaking through the limit of the real antenna aperture to achieve high-resolution imaging. Its principle is as follows: the small antenna on the radar serves as a single radiating element that is moved continuously along a straight line, receiving and processing echo signals of the same target at different positions, which yields a higher-resolution image of the target; by moving, the small antenna is equivalent to a combined large antenna. SAR imaging therefore requires relative motion between the radar and the target. In the roadside-radar application scenario, the roadside radar is stationary while pedestrians and vehicles on the road are moving, so SAR imaging can be used to obtain high-resolution images of the vehicles and/or pedestrians.
In S220, when processing the first perception data to obtain the SAR image, a series of data optimization steps such as Doppler parameter estimation, motion compensation and range compression can be performed on the first perception data, followed by filtering. Of course, the first perception data may also be filtered directly, or directly synthetic-aperture imaged. Preferably, data optimization plus filtering effectively improves the accuracy and pertinence of the data.
It should be understood that, in the roadside-radar application scenario, the echo data acquired by the roadside radar include both echoes of the targets to be detected, such as pedestrians and non-motor vehicles, and echoes of guardrails, motor vehicles and other objects. That is, the first perception data contain not only the pedestrian and/or non-motor vehicle data to be detected but also data of other targets. Ground targets with different positions, structures, surface morphologies and dielectric properties scatter the radar beam with different echo intensities: for example, bridges, power transmission lines, houses, road edges and green belts return very strong echoes, while the ground and road markings return relatively weak echoes. Each kind of target has its own scattered-echo intensity signature.
In order to form SAR images that are more relevant to pedestrians and/or non-motor vehicles and more accurate, the data are filtered after data optimization: data whose scattered echo intensity falls outside an intensity threshold range are deleted from the optimized data, where the intensity threshold range represents the range to which the scattered echo intensity of pedestrians and/or non-motor vehicles belongs. The filtering thus retains only data whose scattered echo intensity matches pedestrians and/or non-motor vehicles. A synthetic aperture preprocessing algorithm is then executed on the filtered data to obtain SAR images containing pedestrians and/or non-motor vehicles. This processing is simple and feasible, effectively reduces the computational load on the data processing equipment, and improves the efficiency of target detection.
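A minimal sketch of this intensity-gate filtering in Python, assuming the first perception data are available as an N×4 NumPy array of [x, y, z, intensity] points; the bounds I_MIN and I_MAX are hypothetical placeholders that would have to be calibrated to the scattered-echo intensity range of pedestrians and non-motor vehicles for the specific FMCW lidar:

```python
import numpy as np

# Hypothetical intensity gate for pedestrian / non-motor-vehicle echoes;
# the real bounds must be calibrated for the specific lidar and scene.
I_MIN, I_MAX = 0.05, 0.35

def filter_by_echo_intensity(points: np.ndarray) -> np.ndarray:
    """Keep only points whose scattered-echo intensity lies inside the
    pedestrian / non-motor-vehicle intensity threshold range.

    points: (N, 4) array of [x, y, z, intensity] samples.
    """
    intensity = points[:, 3]
    mask = (intensity >= I_MIN) & (intensity <= I_MAX)
    return points[mask]
```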
The synthetic aperture preprocessing algorithm decomposes the two-dimensional signal received by the radar receiver into cascaded one-dimensional signals in the range and azimuth directions, and then processes the two directions separately. The range-direction signal is compressed by de-chirp processing; in the azimuth direction, range migration couples the range and azimuth signals, so the azimuth signal cannot be compressed directly, and an interpolation-based correction is applied before azimuth compression. Finally, the azimuth signal is focused by matched filtering to output the SAR image.
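As a rough illustration of this range/azimuth cascade, the sketch below implements range compression and azimuth compression as FFT-domain matched filters; it assumes ideal reference functions are supplied and omits the interpolation-based range cell migration correction, so it is a simplification of the preprocessing described above rather than the exact algorithm:

```python
import numpy as np

def sar_focus(raw: np.ndarray, range_ref: np.ndarray,
              azimuth_ref: np.ndarray) -> np.ndarray:
    """Simplified SAR focusing: range compression, then azimuth matched
    filtering, both via FFT-domain multiplication. Range cell migration
    correction (interpolation) is deliberately omitted in this sketch.

    raw:         (n_az, n_rg) complex echo matrix (azimuth x range)
    range_ref:   (n_rg,) complex range reference chirp
    azimuth_ref: (n_az,) complex azimuth reference function
    """
    # Range compression: matched-filter every pulse along the range axis.
    rg = np.fft.ifft(
        np.fft.fft(raw, axis=1) * np.conj(np.fft.fft(range_ref))[None, :], axis=1)
    # Azimuth compression: matched-filter every range bin along azimuth.
    img = np.fft.ifft(
        np.fft.fft(rg, axis=0) * np.conj(np.fft.fft(azimuth_ref))[:, None], axis=0)
    return np.abs(img)  # magnitude SAR image
```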
When processing the first perception data to obtain the Doppler image, S220 extracts two Doppler features used to identify pedestrians and/or non-motor vehicles: a range-Doppler plot and a time-Doppler spectrum plot.
The range-Doppler plot describes the range and radial speed between the radar and the target object within one frame of radar data. It may be obtained by sequentially performing range-dimension windowing, an FFT, Doppler-dimension windowing, and another FFT on the frame. First, windowing is applied along the range dimension of the FMCW radar data frame to reduce spectral leakage, and an FFT along the range dimension yields the target's range information. Then, windowing and an FFT along the Doppler dimension yield the target's velocity information, finally producing the range-Doppler plot.
Assume that a radar data frame is represented by the following formula:
$$s[l,k] = A_m\,e^{\,j2\pi\left(f_r k + f_D l\right)},\qquad l = 0,\dots,L-1,\;\; k = 0,\dots,K-1$$

where $L$ denotes the number of chirps in a radar data frame, $K$ the number of sampling points of each chirp, $A_m$ the amplitude of the signal reflected from the target, $f_r$ the range frequency received from the target (expressed as a normalized frequency), $f_D$ the Doppler frequency caused by the radial velocity of the target (likewise normalized), and $j$ the imaginary unit.
The range-Doppler plot finally obtained through range-dimension windowing, FFT, Doppler-dimension windowing, and FFT can then be expressed as:

$$RD[p,q] = \sum_{l=0}^{L-1}\sum_{k=0}^{K-1} w_r[k]\,w_D[l]\,s[l,k]\,e^{-j2\pi\left(\frac{pk}{K}+\frac{ql}{L}\right)}$$

where $w_r$ and $w_D$ are the range- and Doppler-dimension window functions.
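A sketch of this windowed two-dimensional FFT pipeline, assuming one frame of L chirps × K samples per chirp is available as a NumPy array; Hann windows stand in for the unspecified window functions $w_r$ and $w_D$:

```python
import numpy as np

def range_doppler_map(frame: np.ndarray) -> np.ndarray:
    """Compute a range-Doppler map from one FMCW radar frame.

    frame: (L, K) array, L chirps per frame, K samples per chirp.
    Returns the (L, K) magnitude range-Doppler map RD[p, q].
    """
    L, K = frame.shape
    w_r = np.hanning(K)  # range-dimension window (reduces spectral leakage)
    w_d = np.hanning(L)  # Doppler-dimension window
    rng = np.fft.fft(frame * w_r[None, :], axis=1)  # range FFT per chirp
    rd = np.fft.fft(rng * w_d[:, None], axis=0)     # Doppler FFT per range bin
    return np.abs(np.fft.fftshift(rd, axes=0))      # center zero Doppler
```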
and (3) carrying out time-frequency analysis on the time-domain radar signal to obtain a Doppler frequency time-varying image of the echo signal. Using short-time fourier transform STFT:
Figure BDA0004070137410000072
where x [ n ] is the discrete time signal, ω [ n ] is the window function, m is the sliding position of the window function, ω is the angular frequency. The result of STFT is a distribution in a two-dimensional plane of time and frequency, the square of the modulus of the result of STFT is taken to represent the power distribution of the input signal x [ n ] in the plane of time and frequency, and the power distribution is represented by a spectrogram.
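A sketch of the time-Doppler spectrogram computation using SciPy's STFT; the sampling rate, window choice and segment length here are assumptions, not values taken from this embodiment:

```python
import numpy as np
from scipy.signal import stft

def time_doppler_spectrogram(x: np.ndarray, fs: float, nperseg: int = 128):
    """Time-Doppler spectrogram of an echo signal via the STFT.

    x:  1-D (possibly complex) time-domain radar signal.
    fs: sampling rate in Hz.
    Returns (f, t, power): frequency bins, time bins, |STFT|^2 power map.
    """
    f, t, Z = stft(x, fs=fs, window="hann", nperseg=nperseg,
                   noverlap=nperseg // 2, return_onesided=False)
    power = np.abs(Z) ** 2  # squared modulus -> power distribution
    # Reorder so negative Doppler frequencies come first for plotting.
    return np.fft.fftshift(f), t, np.fft.fftshift(power, axes=0)
```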
Experimental data show that the time-Doppler spectrum patterns, i.e. the spectrogram features, of pedestrian/non-motor-vehicle echo signals differ from those of motor vehicles. If these spectrogram feature differences can be extracted from the Doppler images of the echo signals acquired by the radar, pedestrians/non-motor vehicles can be identified by type based on them.
When processing the first perception data to obtain the 3D point cloud voxel feature vector, S220 may group the radar data into voxels and then process the points inside each voxel. Specifically, a network such as VoxelNet (Voxel Network) or Voxel-FPN (Voxel Feature Pyramid Network) can be trained to encode the interior points of each voxel, finally forming a feature vector per voxel. Voxel-based methods perform well, compute reasonably fast, and facilitate the subsequent multi-source data fusion.
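As a simplified stand-in for the learned VoxelNet/Voxel-FPN encoders (which encode the interior points of each voxel with a trained network), the sketch below groups points into voxels and represents each occupied voxel by the mean of its interior points; the voxel size is an assumed placeholder:

```python
import numpy as np

def voxelize_mean(points: np.ndarray, voxel_size=(0.2, 0.2, 0.2)):
    """Group points into voxels and encode each voxel by simple statistics
    (here, the mean of its interior points) instead of a learned encoder.

    points: (N, 4) array of [x, y, z, intensity].
    Returns (coords, feats): integer voxel coordinates and per-voxel features.
    """
    vox = np.floor(points[:, :3] / np.asarray(voxel_size)).astype(np.int64)
    coords, inverse = np.unique(vox, axis=0, return_inverse=True)
    feats = np.zeros((len(coords), points.shape[1]))
    np.add.at(feats, inverse, points)  # accumulate point features per voxel
    counts = np.bincount(inverse, minlength=len(coords))
    return coords, feats / counts[:, None]  # mean feature per voxel
```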
Converting the first perception data into the voxel feature vector, the SAR image and the Doppler image greatly increases the data dimensions available for a small target, but it also introduces considerable noise.
Before, after, or simultaneously with step S220, step S230 performs video preprocessing on the second perception data to obtain an RGB image synchronized with the first perception data. The preprocessing includes image processing such as frame extraction, filtering, image enhancement and image differencing, as well as temporal and spatial synchronization, to ensure the data quality of the RGB image and its synchronization with the first perception data.
After video preprocessing of the second perception data yields an RGB image synchronized with the first perception data, a target perception region where pedestrians and/or non-motor vehicles are located within the camera's perception area can further be acquired; target RGB data are then extracted from the RGB image based on the target perception region, and the RGB image is updated based on the target RGB data, so as to extract the region of interest and improve the efficiency of subsequent image fusion. The target region may be delimited manually, constructed from traffic markings (for example, a road area covered by lane lines, non-motor-vehicle lanes and zebra crossings), or obtained by image recognition. The target RGB data can be obtained by building a mask of the target perception region and multiplying it with the RGB image to obtain a new RGB image: image values inside the target perception region remain unchanged (these are the extracted target RGB data), image values outside the region become 0, and the original RGB image is replaced by the new one.
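A minimal sketch of the mask-multiplication step, assuming the target perception region has already been rasterized (manually, from traffic markings, or by image recognition) into a binary (H, W) mask:

```python
import numpy as np

def apply_target_region_mask(rgb: np.ndarray, region_mask: np.ndarray) -> np.ndarray:
    """Keep RGB values inside the target perception region, zero outside.

    rgb:         (H, W, 3) image from the video camera.
    region_mask: (H, W) binary mask of the target perception region.
    """
    return rgb * region_mask[:, :, None].astype(rgb.dtype)
```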
Referring to fig. 3, for accuracy of subsequent data fusion, the present embodiment may further perform the following operations after S230:
and S301, judging whether the data anomalies exist in the 3D point cloud voxel feature vectors, the RGB image, the SAR image and the Doppler image. If yes, screening frame by frame, and obtaining an abnormal data frame for repairing, discarding or secondary sampling; if not, the process proceeds to S302.
S302, judge whether data imbalance exists. This step is used during the model training phase; the model detection phase need not perform it. Data imbalance comes in two types: imbalance between the data of different targets, and imbalance among the three-dimensional feature data of the same target. When target-data imbalance exists, resampling is further adopted to balance the target types; when feature-data imbalance exists, data augmentation may be performed before proceeding to S303. If S302 finds no imbalance, S303 is performed directly.
S303, data normalization. The original image to be processed is converted into a unique standard form through a series of transformations, using the invariant moments of the image to find a set of parameters that eliminate the effect of the other transformation functions; the resulting standard-form image is invariant to affine transformations such as translation, rotation and scaling.
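A simplified illustration of moment-based normalization that handles only translation (centroid centering) and total-intensity scaling; the full standard form described above would also remove rotation and scale using higher-order invariant moments:

```python
import numpy as np

def moment_normalize(img: np.ndarray) -> np.ndarray:
    """Shift the image so its intensity centroid sits at the image center
    and rescale it to unit total mass (a partial standard form)."""
    h, w = img.shape
    total = float(img.sum())
    if total == 0.0:
        return img
    ys, xs = np.mgrid[0:h, 0:w]
    cy = float((ys * img).sum()) / total  # intensity centroid, row
    cx = float((xs * img).sum()) / total  # intensity centroid, column
    dy, dx = int(round(h / 2 - cy)), int(round(w / 2 - cx))
    centered = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return centered / total  # unit total mass
```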
S240 is performed after S303 or S230 to carry out image fusion and target recognition. Image fusion outputs a multi-source feature fusion feature map, and target recognition is performed based on this feature map and a trained convolutional neural network to obtain the target object. Alternatively, the multi-source feature fusion feature map is secondarily fused with the raw radar point cloud data, incorporating the corresponding range, velocity and other information to generate more complete target information as features, which are input into the subsequent convolutional neural network for target recognition.
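This embodiment does not fix the fusion operator, so the sketch below implements one plausible reading: channel-wise concatenation after resampling all sources to a common resolution, with the optional secondary fusion appending rasterized raw point-cloud channels (e.g. range/velocity maps). The tensor layouts are assumptions:

```python
from typing import Optional

import torch
import torch.nn.functional as F

def fuse_features(voxel_bev: torch.Tensor, sar: torch.Tensor,
                  doppler: torch.Tensor, rgb: torch.Tensor,
                  raw_cloud_bev: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Concatenation fusion of the four sources, each (N, C_i, H_i, W_i),
    resampled to the RGB resolution; optionally append raw-cloud channels."""
    size = rgb.shape[-2:]
    maps = [F.interpolate(m, size=size, mode="bilinear", align_corners=False)
            for m in (voxel_bev, sar, doppler, rgb)]
    fused = torch.cat(maps, dim=1)  # multi-source feature fusion feature map
    if raw_cloud_bev is not None:   # secondary fusion with the raw point cloud
        extra = F.interpolate(raw_cloud_bev, size=size, mode="bilinear",
                              align_corners=False)
        fused = torch.cat([fused, extra], dim=1)
    return fused
```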
This embodiment builds a cascaded convolutional neural network by stacking convolution and pooling layers, with each convolution layer adopting small kernels such as 3×3. Small filters are used because they more easily extract the detail features of the target image, and as the network's forward computation proceeds, the extracted features become richer and more abstract. The essence of a convolution layer is to map the features of the target image linearly; to fit more complex features, an activation function is introduced to make the features nonlinear.
Referring to fig. 4, the convolutional neural network for target recognition provided in this embodiment comprises convolution layers, pooling layers, and a fully connected layer. Two convolution layers and one pooling layer form one feature extraction layer; three feature extraction layers are cascaded and then propagate to the final layer of the network, the fully connected layer. Each convolution layer has 32 convolution kernels of size 3×3. The fully connected layer links the feature maps formed by the hidden-layer neurons one by one and expands them into a feature vector for the subsequent classification output. The fully connected layer can also be implemented by convolution: for a fully connected layer whose input is itself a fully connected layer, a filter can linearly convolve the preceding fully connected layer to map all target features.
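A PyTorch sketch consistent with this description: three cascaded feature extraction layers, each consisting of two 3×3 convolution layers with 32 kernels and one pooling layer, followed by a fully connected layer. The input channel count, the class count, and the global pooling ahead of the fully connected layer are assumptions:

```python
import torch
import torch.nn as nn

class FusionDetectionCNN(nn.Module):
    """Three cascaded feature extraction layers (2 x conv3x3(32) + pool)
    followed by a fully connected classification head."""

    def __init__(self, in_channels: int = 8, num_classes: int = 3):
        super().__init__()
        blocks, c = [], in_channels
        for _ in range(3):  # three cascaded feature extraction layers
            blocks += [nn.Conv2d(c, 32, 3, padding=1), nn.ReLU(inplace=True),
                       nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]
            c = 32
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)   # fixed-size input to the head
        self.fc = nn.Linear(32, num_classes)  # fully connected output layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.features(x)).flatten(1)
        return self.fc(x)
```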
To train a convolutional neural network with the above structure, training samples are constructed using the 3D point cloud voxel feature vector, the SAR image and the Doppler image as model inputs and pedestrian and/or non-motor-vehicle object identifiers as labels; or using the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the original radar point cloud as model inputs with the same labels. Network training is performed on a large number of samples until the network converges.
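A generic supervised training loop for such a network, assuming a dataset that yields (fused feature map, label) pairs; the optimizer, batch size and learning rate are arbitrary choices rather than values from this embodiment:

```python
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs: int = 50, lr: float = 1e-3, device: str = "cuda"):
    """Train on (fused_feature_map, label) pairs until the loss converges."""
    model = model.to(device)
    loader = DataLoader(dataset, batch_size=16, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
```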
In this embodiment, the FMCW lidar and the video camera arranged on the road side serve as roadside sensing equipment; the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image are obtained from their perception data, and the SAR image and the Doppler image increase the resolution and capture the micro-motion characteristics of pedestrians/non-motor vehicles, so that fusing these four data dimensions greatly improves the accuracy of pedestrian/non-motor-vehicle detection, solving the prior-art problem of low detection accuracy for pedestrians/non-motor vehicles on the road. Meanwhile, filtering the radar perception data by scattered echo intensity removes most of the noise other than pedestrians and/or non-motor vehicles, further improving detection accuracy.
Example 3:
based on the traffic target detection method provided in fig. 2, 3 further provides a traffic target detection device correspondingly, which is applied to a road side sensing system, wherein the road side sensing system comprises at least one group of FMCW laser radar and a video camera, the FMCW laser radar and the video camera are arranged on the road side and have overlapping sensing areas, and referring to fig. 5, the traffic target detection device comprises:
an acquiring unit 51, configured to acquire first sensing data acquired by the FMCW lidar and second sensing data acquired by the video camera;
a filtering unit 52, configured to filter data, of the first perceived data, for which the scattered echo intensity does not belong to an intensity threshold range, where the intensity threshold range represents a range to which the scattered echo intensity of a pedestrian and/or a non-motor vehicle belongs;
a structuring unit 53, configured to perform structuring processing on the filtered first sensing data to obtain image processing structure data, where the image processing structure data includes a 3D point cloud voxel feature vector, a SAR image, and a doppler image;
a video processing unit 54, configured to perform video preprocessing on the second perceived data, to obtain an RGB image synchronized with the first perceived data;
and the fusion detection unit 55 is used for performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image, and performing pedestrian and/or non-motor vehicle detection based on the image fusion result.
As an alternative embodiment, the apparatus further comprises:
an extracting unit 56, configured to obtain a target sensing area where a pedestrian and/or a non-motor vehicle is located in a sensing area of the video camera after performing video preprocessing on the second sensing data to obtain an RGB image synchronized with the first sensing data;
and extracting target RGB data in the RGB image based on the target perception area, and updating the RGB image based on the target RGB data.
As an alternative embodiment, the fusion detection unit 55 is further configured to:
performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image to obtain a multi-source feature fusion feature map;
and performing secondary fusion on the multi-source feature fusion feature map and the original point cloud data acquired by the FMCW laser radar, and performing pedestrian and/or non-motor vehicle detection based on the secondarily fused data.
As an alternative embodiment, the fusion detection unit is further configured to:
inputting the multi-source feature fusion feature map, or the multi-source feature fusion feature map together with the original point cloud data, into a trained convolutional neural network for pedestrian and/or non-motor vehicle detection;
the convolutional neural network comprises three cascaded feature extraction layers, each feature extraction layer consists of two convolutional layers and one pooling layer, and the convolution kernel size of each convolutional layer is 3×3.
The specific manner in which the individual units perform the operations in relation to the apparatus of the above embodiments has been described in detail in relation to the embodiments of the method and will not be described in detail here.
Example 4:
fig. 6 is a block diagram illustrating an electronic device 600 for implementing a traffic target detection method according to an example embodiment. For example, the electronic device 600 may be an industrial personal computer, a computer, an edge server, an edge computing device, or the like.
Referring to fig. 6, an electronic device 600 may include one or more of the following components: a processing component 602, a memory 604, a power supply component 606, an input/output (I/O) interface 608, and a communication component 610.
The processing component 602 generally controls the overall operation of the electronic device 600, such as operations associated with data computation, control, instruction issuing, and camera triggering. The processing component 602 may include one or more processors 620 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components.
The memory 604 is configured to store various types of data to support operations at the device 600. Examples of such data include instructions, image data, association data, configuration data, and the like for any application or method operating on electronic device 600. The memory 604 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 606 provides power to the various components of the electronic device 600. The power supply components 606 can include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 600.
The communication component 610 is configured to facilitate wired or wireless communication between the electronic device 600 and other devices. The electronic device 600 may access a wireless network based on a communication standard, such as WiFi, 4G, or 5G, or a combination thereof. In one exemplary embodiment, the communication component 610 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 610 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as the memory 604 including instructions executable by the processor 620 of the electronic device 600 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc. The traffic target detection method of the above-described embodiments may be implemented when the instructions in the non-transitory computer-readable storage medium are executed by the processor 620 of the electronic device 600.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings and described above, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims; any modifications, equivalents and improvements made within the spirit and principles of the present invention are intended to be included within its scope.

Claims (10)

1. The traffic target detection method is characterized by being applied to a road side sensing system, wherein the road side sensing system comprises at least one group of FMCW laser radar and a video camera, and the FMCW laser radar and the video camera are arranged on the road side and have overlapping sensing areas; the method comprises the following steps:
acquiring first perception data acquired by the FMCW laser radar and second perception data acquired by the video camera;
filtering data, in the first perception data, of which the scattered echo intensity does not belong to an intensity threshold range, wherein the intensity threshold range represents a range to which the scattered echo intensity of pedestrians and/or non-motor vehicles belongs;
carrying out structuring processing on the filtered first perception data to obtain image processing structure data; the image processing structure data comprises a 3D point cloud voxel feature vector, an SAR image and a Doppler image;
performing video preprocessing on the second perception data to obtain an RGB image synchronous with the first perception data;
and carrying out image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image, and carrying out pedestrian and/or non-motor vehicle detection based on an image fusion result.
2. The traffic target detection method according to claim 1, wherein after said video preprocessing of said second perceived data to obtain RGB images synchronized with the first perceived data, said method further comprises:
acquiring a target sensing area where pedestrians and/or non-motor vehicles are located in a sensing area of the video camera;
and extracting target RGB data in the RGB image based on the target perception area, and updating the RGB image based on the target RGB data.
3. The traffic target detection method according to claim 1, wherein the Doppler image comprises a time-Doppler spectrum graph and a range-Doppler graph.
4. A traffic target detection method according to any one of claims 1 to 3, wherein performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image, and performing pedestrian and/or non-motor vehicle detection based on the image fusion result, comprises:
performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image to obtain a multi-source feature fusion feature map;
and performing secondary fusion on the multi-source feature fusion feature map and the original point cloud data acquired by the FMCW laser radar, and performing pedestrian and/or non-motor vehicle detection based on the secondarily fused data.
5. The traffic target detection method according to claim 4, wherein the pedestrian and/or non-motor vehicle detection based on the image fusion result comprises:
inputting the multi-source feature fusion feature map, or the data obtained by secondarily fusing the multi-source feature fusion feature map with the original point cloud data, into a trained convolutional neural network for pedestrian and/or non-motor vehicle detection;
the convolutional neural network comprises three cascaded feature extraction layers, each feature extraction layer consists of two convolutional layers and one pooling layer, and the convolution kernel size of each convolutional layer is 3×3.
6. A traffic target detection device, the device being characterized in that it is applied to a road side perception system, the road side perception system comprising at least one group of FMCW lidar and a video camera, the FMCW lidar and the video camera being arranged on the road side with overlapping perception areas, the device comprising:
the acquisition unit is used for acquiring first perception data acquired by the FMCW laser radar and second perception data acquired by the video camera;
the filtering unit is used for filtering data, in the first perception data, of which the scattered echo intensity does not belong to an intensity threshold range, wherein the intensity threshold range represents the range of the scattered echo intensity of pedestrians and/or non-motor vehicles;
the structuring unit is used for carrying out structuring processing on the filtered first perception data to obtain image processing structure data, wherein the image processing structure data comprises a 3D point cloud voxel feature vector, an SAR image and a Doppler image;
the video processing unit is used for carrying out video preprocessing on the second perception data to obtain an RGB image synchronous with the first perception data;
and the fusion detection unit is used for performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image, and performing pedestrian and/or non-motor vehicle detection based on an image fusion result.
7. The traffic target detection device according to claim 6, further comprising:
the extraction unit is used for acquiring a target sensing area where pedestrians and/or non-motor vehicles are located in a sensing area of the video camera after the second sensing data are subjected to video preprocessing to acquire RGB images synchronous with the first sensing data;
and extracting target RGB data in the RGB image based on the target perception area, and updating the RGB image based on the target RGB data.
8. The traffic target detection device according to any one of claims 6 to 7, wherein the fusion detection unit is further configured to:
performing image fusion on the 3D point cloud voxel feature vector, the SAR image, the Doppler image and the RGB image to obtain a multi-source feature fusion feature map;
and performing secondary fusion on the multi-source feature fusion feature map and the original point cloud data acquired by the FMCW laser radar, and performing pedestrian and/or non-motor vehicle detection based on the secondarily fused data.
9. An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to implement the method of any of claims 1-5 by execution of the one or more programs by one or more processors.
10. A computer readable storage medium, characterized in that a computer program is stored thereon, which program, when being executed by a processor, implements the steps of the method according to any of claims 1-5.
CN202310090468.3A 2023-02-09 2023-02-09 Traffic target detection method and device and electronic equipment Pending CN116129371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310090468.3A CN116129371A (en) 2023-02-09 2023-02-09 Traffic target detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310090468.3A CN116129371A (en) 2023-02-09 2023-02-09 Traffic target detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116129371A true CN116129371A (en) 2023-05-16

Family

ID=86309774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310090468.3A Pending CN116129371A (en) 2023-02-09 2023-02-09 Traffic target detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116129371A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173693A (en) * 2023-11-02 2023-12-05 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device
CN117173693B (en) * 2023-11-02 2024-02-27 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device
CN118068318A (en) * 2024-04-17 2024-05-24 德心智能科技(常州)有限公司 Multimode sensing method and system based on millimeter wave radar and environment sensor

Similar Documents

Publication Publication Date Title
CN116129371A (en) Traffic target detection method and device and electronic equipment
EP3825728A1 (en) Method and device to improve radar data using reference data background
KR20200144862A (en) Method and device to improve resolution of radar
CN111521989A (en) Deep learning for super-resolution in radar systems
Armanious et al. An adversarial super-resolution remedy for radar design trade-offs
EP4254137A1 (en) Gesture recognition method and apparatus
Amiri et al. Micro-Doppler based target classification in ground surveillance radar systems
Rizik et al. Cost-efficient FMCW radar for multi-target classification in security gate monitoring
CN113219462B (en) Target identification method and device based on time-frequency diagram and terminal equipment
Bhatia et al. Object classification technique for mmWave FMCW radars using range-FFT features
CN111323757B (en) Target detection method and device for marine radar
CN111323756A (en) Deep learning-based marine radar target detection method and device
Kazemi et al. Deep learning for direct automatic target recognition from SAR data
Sun et al. A target recognition algorithm of multi-source remote sensing image based on visual Internet of Things
Krysik et al. Moving target detection and imaging using GSM-based passive radar
Lee et al. Background adaptive division filtering for hand-held ground penetrating radar
Pardhu et al. Human motion classification using Impulse Radio Ultra Wide Band through-wall RADAR model
Kondapalli et al. Real-time rain severity detection for autonomous driving applications
Hoffmann et al. Filter-based segmentation of automotive sar images
CN105223571B (en) The ISAR imaging method significantly paid attention to based on weighting L1 optimization with vision
Zhang et al. Cnn based target classification in vehicular networks with millimeter-wave radar
Pavlov et al. Investigation of the Influence of Speckle Noise on the Accuracy of Object Detection by Convolutional Neural Networks
Mardiev et al. Convolutional Neural Networks for Processing Micro-Doppler Signatures and Range-Azimuth Radar Maps of Frequency Modulated Continuous Wave Radars
Linnehan et al. Detecting slow moving targets in SAR images
Guo et al. Deep Model Based Road User Classification Using mm-Wave Radar

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination