Multi-target vehicle detection and re-identification method based on radar-vision fusion
Technical Field
The invention belongs to the technical field of mobile vehicle target detection, multi-sensor data fusion and vehicle re-identification, and relates to a method for detecting and re-identifying a multi-target vehicle by fusing video camera image data and millimeter wave radar data.
Background
In recent years, with the continuous development of China's economy and the steady improvement of living standards, driving has become the preferred way for many people to travel, and commuting by car has become commonplace. The rapid growth in the number of motor vehicles has put even greater pressure on traffic regulators. As video monitoring takes an increasingly important position in the public safety field, vehicle-related tasks such as vehicle target detection, vehicle classification, vehicle tracking and driver behavior analysis are receiving more and more attention. Determining whether vehicles in different image frames of a single video camera, or in images from different video cameras, are the same vehicle is an important requirement of vehicle management and control; follow-up work such as vehicle trajectory tracking and driver behavior analysis can be carried out more effectively on this basis.
Vehicle re-identification is the problem of determining whether vehicle data captured at different times or positions in non-overlapping areas by roadside sensors (video cameras, millimeter-wave radars, etc.) belong to the same vehicle, i.e., recognizing the identity of the same vehicle within a traffic control scene of a specific range. Solving this problem is of great significance for accurate vehicle control within a region, regional security, vehicle-road cooperation and other applications.
At present, existing vehicle re-identification methods mainly fall into four types:
The first type: sensor-based methods.
Using various sensors to detect vehicles and infer their identity is the most basic and earliest form of vehicle re-identification. For each vehicle detected by a sensor, such methods typically extract vehicle features by some specific means and determine vehicle identity through feature matching. The earliest vehicle re-identification methods relied on various hardware detectors (e.g., infrared, ultrasonic) to extract characteristic information of the vehicle. Thereafter, many methods using other sensors or inductors were proposed, such as using three-dimensional magnetic inductors to detect multidimensional vehicle characteristics and to obtain timing information from the inductors for training a Gaussian maximum-likelihood classifier. Induction coils are the most common means of acquiring data in traffic scenarios; they can monitor various vehicle attributes (such as speed, volume and vehicle footprint) and are generally deployed on urban arterials and highways. Real-time vehicle re-identification methods based on induction coils use the data they provide to extract vehicle features and estimate vehicle travel time, thereby completing the re-identification task. With the advent of emerging sensors and technologies such as the Global Positioning System (GPS), Radio Frequency Identification (RFID) and cell phones, beacon-based vehicle tracking and monitoring systems have been explored for re-identification: for example, a re-identification algorithm based on RFID tags suitable for toll stations, and a GPS-based vehicle travel-time estimation method for solving the re-identification problem. Most sensor-based methods require installing a large amount of hardware, and the experimental environment is demanding and difficult to reproduce.
In addition, many of these methods are susceptible to environmental influences such as weather conditions, signal strength, traffic congestion and vehicle speed, which reduce sensor sensitivity to varying degrees. There is also no uniform performance evaluation criterion, so this type of method cannot be regarded as an ideal vehicle re-identification method.
The second type: based on the vehicle license plate information.
With the development of computer vision technology, identifying vehicle identity from image or video data has become possible, greatly reducing the amount of hardware that must be deployed and saving hardware and installation costs. In the initial stage, vehicle re-identification methods based on computer vision mainly extracted license plate information through plate localization, character segmentation and character recognition. Localization is mainly realized with gray-level, color and texture information, while segmentation and recognition rely on methods such as template matching and neural networks. License-plate-based vehicle re-identification has high accuracy but obvious defects: traffic monitoring suffers from changing shooting angles, weather, illumination changes and low image resolution, and once the license plate information of a target vehicle is lost, the method fails. In real, complex traffic environments, license plate information is frequently hard to acquire due to plate occlusion, low camera resolution, long shooting distance and unfavorable shooting angles, which greatly reduces plate recognition accuracy. Moreover, fake, cloned or missing license plates are not uncommon. Therefore, although license plate recognition is the simplest and most direct way to distinguish vehicles, in many cases the re-identification task cannot be completed by license plate information alone.
The third type: based on vehicle image characteristics.
These methods do not depend entirely on the license plate; they integrate other, non-plate information to complete vehicle re-identification, improving its stability and accuracy. Traditional plate-free vehicle re-identification mainly extracts features such as HSV (hue, saturation, value) and HOG (histogram of oriented gradients) features from the vehicle image for feature matching, and then realizes re-identification by classifying vehicle color, vehicle type, windshield and the like. Plate-free re-identification has strong interpretability, but is easily affected by viewpoint changes and occlusion, and its accuracy remains low.
The fourth type: based on machine learning and deep neural networks.
In recent years, the development of artificial intelligence algorithms and deep neural networks in computer vision has provided a new technical route for vehicle re-identification. More and more re-identification methods based on machine learning and deep learning have been proposed; they greatly improve accuracy and have gradually become the mainstream approach.
A convolutional neural network takes the image as input and needs no complex hand-crafted features extracted in advance; it extracts features through repeated forward learning and backward feedback. Each layer mainly comprises a feature extraction operation and a feature mapping operation. In feature extraction, each neuron takes the output of the previous layer as input, and a convolution kernel is convolved over the input to obtain local features; a layer may use multiple kernels, each extracting a different feature from the input. Because kernel weights are shared, the number of network parameters is greatly reduced. In feature mapping, a sigmoid or tanh function is used as the activation function of the convolutional network, giving the extracted features a degree of shift invariance. Because feature extraction is learned automatically from the training data, hand-defined feature extractors are avoided; and because kernel weights are shared, learning can be parallelized, improving computational efficiency.
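The shared-weight convolution and feature mapping described above can be sketched in a few lines of NumPy. This is an illustration only, not the invention's network; the kernel values are arbitrary:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution with a single shared-weight kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def feature_map(image, kernel):
    """One conv layer: convolution (feature extraction) + tanh (feature mapping)."""
    return np.tanh(conv2d(image, kernel))

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # arbitrary example kernel
fmap = feature_map(image, edge_kernel)
print(fmap.shape)  # (3, 3)
```

The same 9 kernel weights are reused at every image position, which is exactly the parameter sharing the paragraph above credits for the reduced parameter count.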
One prior method is based on a convolutional neural network and takes into account the influence of vehicle pose on recognition accuracy. It obtains multiple region segmentation results of the target vehicle from the image to be identified, extracts regional feature vectors from them with a convolutional neural network (CNN), and fuses these with the global feature vector to obtain the appearance feature vector of the target vehicle. The fused feature vectors are finally used for vehicle re-identification and retrieval. Although this scheme considers the influence of pose, the model's accuracy is limited by the diversity of the data set: it must include vehicle pictures at various angles at a sufficiently large scale, and in real scenes it is difficult to collect pictures of all vehicles at different angles in quantities reaching the hundreds of thousands. In addition, key points must be labeled on the collected pictures; because the angles differ between pictures, the number and positions of labeled key points differ too, resulting in an enormous workload. The method is thus problematic in both feasibility and workload.
In addition, camera viewpoint and illumination changes cause large intra-class differences for the same vehicle under different viewing or lighting conditions, while identical models make different vehicles highly similar to each other; these remain the main challenges limiting vehicle re-identification accuracy.
The above vehicle re-identification methods are all based on image or video data shot by roadside cameras and re-identify multi-target vehicles from vehicle appearance alone. The flow, shown in fig. 1, comprises vehicle image or video acquisition, vehicle detection, feature extraction and representation, similarity measurement, and display of detection results. However, different vehicles that are identical in appearance except for their license plates are common. In such cases, re-identifying the target vehicle from appearance alone greatly increases the misjudgment rate.
Millimeter-wave radar is radar that operates in the millimeter-wave band. Generally, millimeter waves refer to the 30-300 GHz frequency range (wavelength 1-10 mm). Because this wavelength lies between those of centimeter waves and light waves, millimeter-wave radar combines some of the advantages of microwave guidance and photoelectric guidance. Compared with a centimeter-wave seeker, a millimeter-wave seeker is small, light and has high spatial resolution. Compared with optical seekers such as infrared, laser and television, it penetrates fog, smoke and dust strongly and works around the clock in all weather (except heavy rain). Light waves, by contrast, attenuate severely when propagating in the atmosphere, and optical devices demand high processing precision. Millimeter waves transmitted through atmospheric windows (frequencies at which resonance absorption by gas molecules causes extremely small attenuation) suffer little attenuation and are less affected by natural light and thermal radiation sources. For these reasons they are of great significance in communication, radar, guidance, remote sensing, radio astronomy and spectroscopy. The main advantages are as follows:
(1) Small antenna aperture and narrow beam: high tracking and guidance accuracy; easy low-elevation tracking, resisting ground multipath and clutter interference; high lateral resolution for near-space targets; high angular resolution for area imaging and target monitoring; strong anti-jamming from the narrow beam; high antenna gain; easy detection of small targets; etc.
(2) Large bandwidth: high information rate, and narrow-pulse or wideband frequency-modulated signals make it easy to obtain the detailed structural characteristics of a target; wide spectrum-spreading capability, reducing multipath and clutter and enhancing anti-jamming; radars or millimeter-wave seekers operating at adjacent frequencies can easily overcome mutual interference; high range resolution, easily yielding accurate target tracking and identification.
(3) High Doppler frequency: good detection and identification of slow-moving and vibrating targets; target characteristics are easily identified using the Doppler frequency signature; penetration of dry atmospheric pollutants provides good detection capability under dust, smoke and dry-snow conditions.
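The band limits stated above (30-300 GHz corresponding to 1-10 mm wavelength) follow directly from the relation wavelength = c / frequency, which can be checked in a couple of lines:

```python
# Wavelength = c / f for the millimeter-wave band limits.
c = 299_792_458.0  # speed of light, m/s

def wavelength_mm(freq_ghz):
    """Wavelength in millimetres for a frequency given in GHz."""
    return c / (freq_ghz * 1e9) * 1e3  # metres -> millimetres

print(wavelength_mm(30))   # ~10 mm
print(wavelength_mm(300))  # ~1 mm
```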
A video camera can extract the image features of multiple target vehicles, and with camera calibration and a target detection algorithm it can also yield their positioning data. Millimeter-wave radar can likewise detect the positioning data of multiple target vehicles, and a data fusion algorithm can then obtain higher-precision positioning data. In addition, millimeter-wave radar accurately captures geospatial information such as vehicle speed and driving direction, adding new dimensions to vehicle attributes; combined with video image data, it offers a solution to the difficulties of vehicle re-identification.
Prior Art
Patent document CN108875754A
Patent document CN111582178A
Patent document CN111553205A
Patent document CN109508731A
Patent document CN111435421A
Description of related terms
1. Single video camera: a single video camera, into and out of whose field of view multiple target vehicles may drive.
2. Cross-video camera: after driving out of the field of view of one video camera, a target vehicle may enter the field of view of another video camera.
3. Vehicle image characteristics: image HSV values, LBP and HOG features all belong to vehicle image characteristics, which further include vehicle color, vehicle type, size and license plate information.
4. Vehicle geospatial features: the longitude and latitude coordinates, speed and heading angle information of the vehicle.
5. HSV: HSV (Hue, Saturation, Value) is a color space created by A. R. Smith in 1978 based on the intuitive properties of color, also known as the hexcone model. Its color parameters are hue (H), saturation (S) and value (V).
6. LBP: the LBP (Local Binary Pattern) operator is an effective texture descriptor with notable advantages such as rotation invariance and gray-scale invariance. Its basic idea is to use the gray value of the central pixel as a threshold and express the local texture by the binary code obtained from comparing the neighborhood pixels against it.
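The basic 3x3 LBP code described above can be sketched as follows. Neighbor ordering and the tie-breaking rule (>= versus >) are conventions assumed here; implementations vary:

```python
import numpy as np

def lbp_code(patch):
    """Basic LBP for a 3x3 patch: threshold the 8 neighbors at the
    center pixel's value and pack the resulting bits into one byte."""
    center = patch[1, 1]
    # neighbors taken clockwise starting from the top-left corner
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = [1 if n >= center else 0 for n in neighbors]
    return sum(b << i for i, b in enumerate(bits))

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_code(patch))  # 241
```

Adding or subtracting a constant to every pixel leaves the code unchanged, which is the gray-scale invariance the definition mentions.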
7. HOG: the HOG (Histogram of Oriented Gradients) feature is a feature descriptor used for object detection in computer vision and image processing. It forms features by computing and accumulating histograms of gradient orientations over local regions of an image. HOG features combined with SVM classifiers are widely applied in image recognition.
8. Large intra-class difference: targets of the same class show large differences in their features. In vehicle re-identification this means that, for the same vehicle, image features differ considerably due to factors such as different camera shooting angles and different illumination intensities.
9. High inter-class similarity: targets of different classes show a high degree of similarity in their features. In vehicle re-identification this means that different vehicles with similar type, color and size, or of the same brand and model, yield image features of high similarity.
10. Video camera data: video image data; after the camera is calibrated and target detection is performed on the image, the pixel coordinates and world geographic coordinates of each detected target can be obtained.
11. Video camera positioning data: the target positioning data output by a calibrated video camera, i.e., coordinates in the world geographic coordinate system converted from the pixel coordinate system.
12. Millimeter wave radar data: the distance, azimuth angle, speed, vehicle heading angle, etc. of a vehicle relative to the millimeter-wave radar. From these data the vehicle's coordinates in the radar coordinate system can be calculated, and after the radar is calibrated, its coordinates in the world coordinate system.
13. Millimeter wave radar positioning data: target positioning coordinate data in the millimeter wave radar data.
14. RTK: high-precision GPS measurement must use carrier-phase observations; RTK equipment uses differential positioning, i.e., real-time kinematic positioning based on carrier-phase observations, and can provide the three-dimensional position of a station in a specified coordinate system in real time with centimeter-level accuracy. By use, RTK devices are classified as handheld or vehicle-mounted: the handheld type is used for single-point coordinate measurement, while the vehicle-mounted type can be installed on a test vehicle for continuous coordinate measurement.
15. RTK positioning data: longitude and latitude coordinates obtained by handheld or vehicle-mounted RTK measurement.
16. Data set: in this patent, the data acquired by the data acquisition equipment, i.e., the positioning data of the experimental test vehicle collected by the video camera, the millimeter-wave radar and the vehicle-mounted RTK equipment.
17. Central computing server: the computer equipment used for receiving, processing and storing the data (including video camera image data and millimeter-wave radar positioning data) collected by each data acquisition device.
18. Space-time matching: making the positioning data acquired for the same target by different devices synchronized in time and consistent in space through an optimization-based matching method, i.e., with error within an acceptable range (0.5 m).
19. World geographic coordinate system: the world geographic coordinate system described in this patent is the WGS-84 coordinate system in the geographic coordinate system.
20. Pixel coordinate system: the pixel coordinate system represents the position of an image pixel in an image, and usually, the upper left pixel of the image is taken as an origin, the right direction is defined as a positive x-axis direction, the lower direction is defined as a positive y-axis direction, and the horizontal and vertical pixel coordinates represent the number of pixels from the y-axis and the x-axis, respectively.
21. Radar coordinate system: namely, a millimeter wave radar coordinate system, and a three-dimensional space coordinate system with the millimeter wave radar itself as the origin of the coordinate system.
22. Homography transformation matrix: a coordinate transformation matrix between different coordinate systems, which can be obtained by selecting corresponding key points (at least 4 pairs of non-collinear points) in the two coordinate systems.
23. Repeatedly: at least 2 times.
24. A target vehicle: vehicles coming within the field of view of the sensors (video camera and millimeter wave radar).
25. Multi-target vehicle: a vehicle collection comprising at least 1 target vehicle.
26. Multi-target vehicle data: and the positioning data of at least 1 vehicle is obtained by the video camera and the millimeter wave radar.
27. Deep learning target detection algorithm: in this patent, the YOLO v5 target detection algorithm, used for detecting vehicle targets on the road.
28. bb: Bounding box, the rectangular target detection frame returned by the target detection algorithm, framing the contour of the target detected in the video image.
29. Multi-sensor data fusion algorithm: synthesizing multi-source data to form an optimal, consistent estimate of the measured object and its properties. In this patent, a Kalman filtering algorithm is used for data fusion.
30. Mahalanobis distance: proposed by the Indian statistician P. C. Mahalanobis to represent the distance between a point and a distribution, it is an effective way to measure the similarity of two unknown sample sets. Unlike Euclidean distance, it accounts for correlations between features (e.g., information about height carries information about weight, since the two are correlated) and is scale-invariant, i.e., independent of the measurement scale.
31. Cosine distance: also called cosine similarity; the similarity of two vectors is evaluated by computing the cosine of the angle between them.
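The two measures defined above can be sketched in NumPy as follows; the vectors and covariance used here are illustrative values only:

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Distance from point x to a distribution with the given mean/covariance."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x = np.array([2.0, 0.0])
mean = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0], [0.0, 1.0]])
print(mahalanobis(x, mean, cov))  # 1.0: 2 units along an axis whose std is 2
print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707
```

Note how the Mahalanobis distance of the point (2, 0) is 1.0, not 2.0: the larger variance along the first axis rescales the distance, which is the scale-invariance the definition refers to.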
32. Roadside: located at positions not belonging to the road surface itself, such as beside the road or above it (supported by uprights or gantries).
33. Lost vehicle: a vehicle that was present in the field of view of the sensor suite (video camera, millimeter-wave radar) and is not currently captured by one of the devices.
34. Sensor device: an instrument capable of acquiring data, including the video camera, the millimeter-wave radar and the RTK positioning equipment (handheld and vehicle-mounted).
35. Vehicle re-identification system: a system framework for re-identifying lost vehicles.
36. Lost-vehicle database: records the features of the last n frames before a vehicle is lost (the value of n can be adjusted according to effect), including the vehicle's image features and geospatial features.
37. ID reassignment: when the vehicle re-identification system retrieves a lost vehicle, the vehicle's historical ID is assigned to the retrieved vehicle.
Summary of the Invention
In the existing video-image-based vehicle re-identification technology, camera viewpoint and illumination changes cause large intra-class differences for the same vehicle under different viewing or lighting conditions, while identical models make different vehicles highly similar between classes, resulting in low recognition accuracy. To solve this problem, the invention adopts the following technical scheme:
a multi-target vehicle detection and re-identification method based on radar-vision fusion is characterized in that the geographic spatial information dimensionality is increased for vehicle characteristics by adding millimeter wave radar detection data.
The specific scheme is as follows:
1) Calibrate the video camera S1 and the millimeter-wave radar S2, taking the vehicle-mounted RTK positioning data D0 as the relative ground truth (the RTK positioning error is at the centimeter level, while the errors of the camera positioning data D1 and the radar positioning data D2 are at the meter level, so the RTK accuracy is far higher than that of the camera and radar positioning data). Establish space-time matching optimization models between D1 and D0 and between D2 and D0, and use an optimization algorithm to space-time match D1 and D2, completing the sensor calibration.
2) Using the multi-target vehicle data acquired by the video camera and the millimeter-wave radar, perform target detection on the vehicles in the camera's video image with a deep learning target detection algorithm, and extract the image features and geospatial features of each target vehicle. With the vehicle-mounted RTK positioning data D0 as the relative ground truth, apply a multi-sensor data fusion algorithm to fuse the multi-target vehicle data from the camera and the radar, obtaining the best estimate of multi-target vehicle position; the fused positioning data D3 has higher positioning accuracy than D1 and D2.
3) Using the fused multi-target vehicle data, continuously track and perform re-identification judgment on the multi-target vehicles between different image frames of a single video camera and across the images of different video cameras.
The technical problems mainly involved in the invention include the following aspects:
1. RTK differential positioning technology;
2. Camera calibration technology;
3. Image-data multi-target vehicle detection technology;
4. Millimeter wave radar calibration technology;
5. Multi-sensor data space-time matching optimization technology;
6. Multi-sensor data fusion technology;
7. Single-video-camera multi-target vehicle continuous tracking and re-identification technology;
8. Cross-video-camera multi-target vehicle continuous tracking and re-identification technology.
the technical scheme adopted by the invention for solving the technical problems is as follows:
1. calibration:
1) Camera calibration: use a handheld RTK device to accurately determine the world geographic coordinates of key points in the camera's image view, selecting at least 4 non-collinear key points. Determine the pixel coordinates of each key point in the camera image pixel coordinate system. Compute the homography transformation matrix between the pixel coordinate system and the world geographic coordinate system to complete the camera calibration.
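The homography estimation from at least 4 non-collinear point correspondences can be sketched with the standard direct linear transform (DLT) in NumPy. The pixel and world coordinates below are made-up illustrations, not surveyed RTK values:

```python
import numpy as np

def find_homography(src, dst):
    """DLT: solve for the 3x3 H (up to scale) mapping src points to dst points.
    src, dst: (N, 2) arrays, N >= 4, no three points collinear."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(A))
    H = vt[-1].reshape(3, 3)   # null-space vector = homography entries
    return H / H[2, 2]

def apply_homography(H, pt):
    """Map a pixel coordinate to the world plane."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

# made-up correspondences: image pixels -> flat-plane world coordinates (m)
pixels = np.array([[100, 400], [500, 400], [80, 120], [540, 130]], float)
world = np.array([[0, 0], [3.5, 0], [0, 30], [3.5, 30]], float)
H = find_homography(pixels, world)
print(apply_homography(H, (100, 400)))  # ~[0, 0]
```

With exactly 4 correspondences the mapping reproduces the key points exactly; with more points the same SVD gives a least-squares fit. Production code would typically use `cv2.findHomography` instead.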
2) Target detection frame coordinate calibration: fit the experimental test vehicle with a vehicle-mounted RTK device to acquire its real-time positioning data. Drive the test vehicle into the camera's field of view, and label it with a detection frame in the video image using the deep learning target detection algorithm. Taking the midpoint pixel of the lower edge of the detection frame as the test vehicle's pixel coordinate, compute the vehicle's world geographic coordinates, i.e., the test vehicle positioning data detected by the video camera, from the previously computed homography matrix between the camera's pixel coordinate system and the world geographic coordinate system. Collect the camera-output positioning data of the test vehicle over a period of time (duration over 2 minutes, with the vehicle appearing repeatedly in the camera's field of view) together with the synchronized vehicle-mounted RTK positioning data D0; establish a space-time matching optimization model between the two data sets D0 and D1 and solve its parameters to complete the camera target detection frame coordinate calibration.
3) Millimeter wave radar calibration: from the radar's raw data (including the target's distance and azimuth angle relative to the radar), compute the test vehicle's coordinates relative to the radar, i.e., its coordinates in the radar coordinate system.
Collect the radar-acquired positioning data D2 of the test vehicle over a period of time (duration over 2 minutes, with the vehicle appearing repeatedly in the radar's field of view) together with the synchronized vehicle-mounted RTK positioning data D0. Compute the homography transformation matrix between the two data sets, establish a space-time matching optimization model between them, and solve its parameters to complete the radar calibration. Data collection for radar calibration and for detection-frame coordinate calibration can be carried out simultaneously or one after the other, in either order.
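The conversion from raw radar range and azimuth to coordinates in the radar coordinate system can be sketched as follows. The axis convention (y along the radar boresight, x to the right, azimuth measured from boresight) is an assumption for illustration; the actual convention depends on the radar:

```python
import math

def radar_to_cartesian(range_m, azimuth_deg):
    """Range + azimuth -> (x, y) in the radar coordinate system.
    Assumed convention: y along the boresight, x to the right."""
    a = math.radians(azimuth_deg)
    return range_m * math.sin(a), range_m * math.cos(a)

x, y = radar_to_cartesian(50.0, 30.0)
print(round(x, 2), round(y, 2))  # 25.0 43.3
```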
2. Data acquisition: obtain video image data within the camera's field of view using the roadside video camera, and obtain multi-target vehicle data within the radar's field of view, including each vehicle target's ID and millimeter-wave radar data (positioning coordinates in the world geographic coordinate system, speed, heading angle, etc.), using the roadside millimeter-wave radar. Transmit the data collected by the camera and radar (including video image data and radar positioning data) to the central computing server for subsequent computation.
3. Vehicle target identification and feature extraction: after receiving the data uploaded by the camera and the radar (including video image data and radar positioning data), the central computing server performs vehicle target identification and feature extraction. For video image data: perform target recognition on the multi-target vehicles within the camera's field of view with the pre-trained deep learning target detection algorithm, extract the target vehicle detection-frame images with high confidence (confidence greater than 0.6; these are the portions of the video image selected by the detection frame, a subset of the complete image), and compute each target vehicle's image features (including vehicle color, vehicle type, size and, optionally, license plate information) and geospatial features, i.e., positioning data (world geographic coordinates). For millimeter wave radar data: extract the target vehicle's geospatial features, including world geographic coordinates, speed, heading angle, etc.
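Selecting detections above the 0.6 confidence threshold and taking the bottom-edge midpoint of each bounding box as the vehicle's pixel coordinate (the same point the calibration step uses) can be sketched as:

```python
def vehicle_pixel_points(detections, conf_thresh=0.6):
    """detections: list of (x1, y1, x2, y2, confidence) bounding boxes.
    Returns the bottom-edge midpoint of each box above the threshold."""
    points = []
    for x1, y1, x2, y2, conf in detections:
        if conf > conf_thresh:
            points.append(((x1 + x2) / 2.0, y2))  # bottom-centre pixel
    return points

dets = [(100, 50, 180, 120, 0.92),  # kept: high confidence
        (300, 60, 360, 110, 0.45)]  # dropped: below 0.6
print(vehicle_pixel_points(dets))   # [(140.0, 120)]
```

Each returned point would then be mapped to world geographic coordinates through the camera's homography matrix.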
4. Feature matching and data fusion: the geospatial features of the multi-target vehicles acquired by the video camera and the millimeter-wave radar are matched, and a multi-sensor data fusion method is applied to fuse the positioning data from the two sensors, improving positioning accuracy.
5. Vehicle re-tracking:
1) Vehicle re-tracking within a single video camera's field of view: a pre-trained deep learning target detection algorithm identifies the vehicles in the camera's field of view and marks target frames; the Deepsort multi-target tracking algorithm assigns IDs to the multi-target vehicles; and the target frames between different frames are re-identified and matched, realizing vehicle re-tracking within a single camera's field of view.
2) Vehicle re-tracking across video cameras: when a target vehicle V_x leaves the field of view of camera 1, the vehicle re-identification system records the image features and geospatial features of V_x before it was lost and stores them in a lost-vehicle database. When the lost target vehicle V_x enters the field of view of camera 2, the system captures its vehicle features (image features and geospatial features) and performs feature matching against the lost-vehicle database. For a vehicle with high similarity (similarity greater than 0.5), the ID is re-assigned, taking the lost-vehicle ID with the highest similarity in the database, thereby realizing vehicle re-tracking. If no matching vehicle is found in the lost-vehicle database (i.e., all similarities with entries in the database are less than 0.5), the target vehicle V_x is assigned a new ID.
Brief description of the drawings
FIG. 1 is a flow diagram of a conventional vehicle re-identification technique route;
FIG. 2 is a flow chart of a vehicle re-identification technology route of the present invention;
FIG. 3 is a schematic diagram of a sensor layout;
FIG. 4 is a schematic view of camera calibration;
FIG. 5 is a schematic view of a camera calibration image;
FIG. 6 is a schematic diagram of coordinate transformation of a camera image inspection vehicle target frame;
FIG. 7 is a flowchart of a model for time-synchronized optimization of target detection frame data and vehicle RTK positioning data;
FIG. 8 is a flowchart of a spatial error calibration of target detection frame data and vehicle RTK positioning data;
FIG. 9 is a schematic diagram illustrating a rotation offset angle and target detection of the millimeter wave radar;
FIG. 10 is a schematic diagram illustrating world geographic coordinate calculation of a millimeter wave radar detection target;
FIG. 11 is a flowchart of a millimeter wave radar detection data and vehicle RTK positioning data time synchronization optimization model;
FIG. 12 is a flowchart of a spatial calibration of millimeter wave radar detection data and vehicle RTK positioning data;
FIG. 13 is a schematic view of a single video camera vehicle re-identification;
FIG. 14 is a cross-video-camera vehicle re-identification flow diagram;
FIG. 15 is a schematic view of vehicle re-identification across video cameras.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a multi-target vehicle detection and re-identification method based on radar-vision fusion. The general technical route is shown in FIG. 2. The specific process is divided into five steps: sensor layout and data acquisition, sensor calibration (video camera and millimeter-wave radar), video image target identification and feature extraction, multi-sensor data fusion, and vehicle re-identification (single camera and cross-camera).
The first step is as follows: sensor layout and data acquisition
The invention realizes detection, tracking, data fusion and re-identification of multiple vehicle targets by means of two types of sensor data: video camera and millimeter-wave radar. The video camera is a directional camera, and the millimeter-wave radar is a long-range radar in the 79 GHz band. The basic layout scheme is shown in FIG. 3: the video camera and the millimeter-wave radar are mounted at the same position on a roadside pole (the distance between the longitude/latitude coordinates of the two sensors is kept below 0.5 m), so that the world geographic coordinates of the two sensors remain consistent. The detection range can be adjusted via mounting height and angle; with a mounting height of 6 m, a downward tilt angle of 10 degrees, and no occlusions in the scene, the field of view can reach 100-150 m. Both sensors acquire data at 25 Hz and connect to the central server for storage and processing. For video image data: a pre-trained deep learning target detection algorithm recognizes the multiple target vehicles in the camera's field of view and extracts the image features of each target vehicle (vehicle color, vehicle type, size, and optional license plate information) and its geospatial features (world geographic coordinates). For millimeter-wave radar data: the geospatial features of each target vehicle are extracted (world geographic coordinates, speed, heading angle, etc.).
Variant scheme A: the millimeter wave radar and the video camera are arranged on different road side vertical rods, and the coincidence rate of the fields of vision of the two sensors is more than 90%. The sensing range of the sensing area of the road section is the intersection of the visual field range of the video camera and the millimeter wave radar.
Variant scheme B: the millimeter wave radar and the video camera are arranged on the same portal frame above the road, and the coincidence rate of the visual fields of the two sensors is more than 90 percent. The sensing range of the sensing area of the road section is the intersection of the visual field range of the video camera and the millimeter wave radar.
Variant protocol C: the millimeter wave radar and the video camera are arranged at different portal frames above the road, so that the coincidence rate of the visual fields of the two sensors is more than 90 percent. The sensing range of the sensing area of the road section is the intersection of the visual field range of the video camera and the millimeter wave radar.
The second step: calibration
(1) Video camera calibration
Video camera calibration in the invention refers to establishing the mapping relationship between the pixel coordinate system of the camera image and the world geographic coordinate system, i.e., the homography transformation matrix H, as shown in formula (1). The world geographic coordinate system is the reference coordinate system used to describe the positions of the camera and of objects in the environment, and is defined in the invention as the WGS-84 geographic coordinate system. World geographic coordinates are acquired by RTK equipment. High-precision GPS measurement requires carrier-phase observations; RTK equipment uses differential positioning, i.e., real-time kinematic positioning based on carrier-phase observations, which can provide the three-dimensional position of a station in a specified coordinate system in real time with centimeter-level accuracy. Because of this high positioning accuracy, the world geographic coordinates acquired by RTK are treated as relative ground truth in the invention. The pixel coordinate system represents the position of a pixel in an image: the upper-left pixel is usually taken as the origin, rightward is the positive x-axis, downward is the positive y-axis, and the horizontal and vertical pixel coordinates count pixels from the y-axis and x-axis, respectively. Formula (2) describes how the world geographic coordinate system and the pixel coordinate system are related by the homography transformation, where longitude denotes longitude, latitude denotes latitude, x denotes the pixel abscissa, and y denotes the pixel ordinate.
The equations (3) and (4) are conversion calculation equations.
As shown in FIGS. 4 and 5, a handheld RTK rover is placed within the field of view of the video camera, and at least 4 non-collinear key points are selected at which its world geographic coordinates are acquired. In the camera image, the pixel coordinate of the bottom of the rover's handheld pole is selected as the pixel coordinate corresponding to each positioning point. Finally, the homography transformation matrix between the two coordinate systems is calculated, the mapping between video image pixel coordinates and world geographic coordinates is established, and the parameter calibration of the video camera is completed.
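The homography estimation from at least 4 non-collinear point pairs can be sketched with a direct linear transform (DLT) solved by least squares. This is an illustrative implementation, not the invention's exact solver, and the coordinate values below are synthetic:

```python
import numpy as np


def fit_homography(pixel_pts, world_pts):
    """Solve for H (3x3, with h33 fixed to 1) mapping pixel to world coords."""
    A, b = [], []
    for (x, y), (lon, lat) in zip(pixel_pts, world_pts):
        # lon = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), similarly lat
        A.append([x, y, 1, 0, 0, 0, -lon * x, -lon * y]); b.append(lon)
        A.append([0, 0, 0, x, y, 1, -lat * x, -lat * y]); b.append(lat)
    h = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)[0]
    return np.append(h, 1.0).reshape(3, 3)


def pixel_to_world(H, x, y):
    v = H @ np.array([x, y, 1.0])
    return v[0] / v[2], v[1] / v[2]      # (longitude, latitude)


# toy example: 4 synthetic correspondences generated from a known mapping
pixel_pts = [(0, 0), (1000, 0), (0, 700), (1000, 700)]
world_pts = [(116.30 + 1e-4 * x, 39.90 + 1e-4 * y) for x, y in pixel_pts]
H = fit_homography(pixel_pts, world_pts)
lon, lat = pixel_to_world(H, 500, 350)
```

In practice more than 4 key points give an overdetermined system, which the same least-squares call handles directly.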
(2) Target detection frame coordinate calibration
Target detection frame coordinate calibration establishes the mapping relationship between the pixel coordinates of a vehicle's target detection frame in the camera image and the vehicle's world geographic coordinates. The pixel coordinate of a vehicle target detection frame refers to the pixel at the midpoint of the lower edge of the frame (for a camera located directly above the road).
Variant protocol A: the camera is positioned on the left upper side of the road, and the pixel coordinates of the vehicle target detection frame refer to the pixel coordinates of the vertex pixels at the right lower side of the detection frame.
Variant scheme B: the camera is positioned to the upper right of the road, and the pixel coordinate of the vehicle target detection frame refers to the pixel at the lower-left vertex of the frame.
The target detection frame is obtained by the deep learning target detection algorithm at a data frequency of 25 Hz; the vehicle's world geographic coordinates are obtained by the on-board RTK at a data frequency of 5 Hz. An on-board RTK is installed on an experimental test vehicle to obtain its real-time world geographic coordinates; the vehicle is driven into the camera's field of view, and the deep learning target detection algorithm labels and detects the experimental vehicle in the video image to obtain its pixel coordinates. The world geographic coordinates of the detected target vehicle are then calculated from its pixel coordinates using the homography transformation matrix between camera pixel coordinates and world geographic coordinates. Clock asynchrony between different sensors is ubiquitous, and in addition the world geographic coordinates converted from the pixel coordinates of the target detection frame differ from the vehicle's true world geographic coordinates, so a space-time matching optimization model between the two types of world geographic coordinates must be established, consisting of a time synchronization optimization model and a spatial error calibration.
As shown in FIG. 7, the time synchronization optimization model first requires unifying the acquisition frequencies of the two types of data. The world geographic coordinates acquired by the on-board RTK are up-sampled by linear interpolation so that their sampling frequency matches the data frequency of the target detection frame. The objective function of the time synchronization optimization model is established as shown in formula (5), and an optimization algorithm searches for a suitable parameter Δt that minimizes the Euclidean distance between the two world geographic coordinate sequences over the time series.
where longitude_bb_t and latitude_bb_t denote the world geographic coordinates corresponding to the pixel coordinates of the target detection frame at time t, and longitude_rtk_t+Δt and latitude_rtk_t+Δt denote the world geographic coordinates measured by the on-board RTK at time t+Δt.
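A minimal sketch of this time-synchronization step: up-sample the 5 Hz RTK track by linear interpolation and scan candidate offsets Δt for the one minimizing the mean Euclidean distance between the two coordinate sequences. The search grid and the synthetic trajectory below are illustrative assumptions, not values from the invention:

```python
import numpy as np


def best_time_offset(t_bb, lon_bb, lat_bb, t_rtk, lon_rtk, lat_rtk,
                     dt_grid=np.arange(-1.0, 1.0, 0.01)):
    """Grid search for the clock offset minimizing formula-(5)-style cost."""
    costs = []
    for dt in dt_grid:
        # interpolate RTK coordinates at the shifted detection timestamps
        lon_i = np.interp(t_bb + dt, t_rtk, lon_rtk)
        lat_i = np.interp(t_bb + dt, t_rtk, lat_rtk)
        costs.append(np.mean(np.hypot(lon_bb - lon_i, lat_bb - lat_i)))
    return dt_grid[int(np.argmin(costs))]


# synthetic check: the RTK clock leads the detection clock by 0.2 s
t_bb = np.arange(0.0, 10.0, 0.04)                 # 25 Hz detection frames
t_rtk = np.arange(0.0, 10.0, 0.2)                 # 5 Hz RTK fixes
lon_bb = 116.0 + 1e-4 * t_bb
lon_rtk = 116.0 + 1e-4 * (t_rtk - 0.2)
lat_bb = np.full_like(t_bb, 39.9)
lat_rtk = np.full_like(t_rtk, 39.9)
dt_est = best_time_offset(t_bb, lon_bb, lat_bb, t_rtk, lon_rtk, lat_rtk)
```

A gradient-free optimizer could replace the grid search; the cost landscape is one-dimensional and smooth, so a coarse grid is usually sufficient.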
As shown in FIG. 8, in the spatial error calibration, the spatial error distribution of the two types of world geographic coordinates over the pixel coordinate system is established; the error calculation formula is shown in formula (6); the error spatial distribution surface is computed by surface fitting; and the world geographic coordinates calculated from the target detection frame are corrected according to formula (7).
Error = (longitude_bb_t, latitude_bb_t) - (longitude_rtk_t+Δt, latitude_rtk_t+Δt)   (6)
coordinate_correction = (longitude_bb_t, latitude_bb_t) - Error_Curved_Surface   (7)
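One way to realize the surface fitting is a low-order polynomial surface Error(x, y) over the pixel plane, fitted to the residuals by least squares. The quadratic order is an assumption on my part; the text only says "surface fitting":

```python
import numpy as np


def fit_error_surface(px, py, err):
    """Least-squares quadratic surface err ~ f(px, py); returns coefficients."""
    A = np.column_stack([px**2, py**2, px * py, px, py, np.ones_like(px)])
    coef, *_ = np.linalg.lstsq(A, err, rcond=None)
    return coef


def eval_error_surface(coef, px, py):
    A = np.column_stack([px**2, py**2, px * py, px, py, np.ones_like(px)])
    return A @ coef


# synthetic check: recover a known quadratic error field
rng = np.random.default_rng(0)
px = rng.uniform(0, 1280, 200)
py = rng.uniform(0, 720, 200)
err = 1e-6 * px**2 - 2e-6 * py**2 + 1e-3 * px + 0.5
coef = fit_error_surface(px, py, err)
pred = eval_error_surface(coef, px, py)
```

The correction of formula (7) then amounts to subtracting `eval_error_surface(coef, px, py)` from each converted coordinate; the longitude and latitude errors would each get their own fitted surface.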
(3) Millimeter wave radar calibration
The millimeter-wave radar is calibrated using an experimental vehicle equipped with on-board RTK, exploiting the radar's suitability for detecting moving targets. The radar returns target-level data, including the distance, azimuth, speed, etc. of each detected target. FIG. 9 shows a schematic diagram of target detection by the millimeter-wave radar; the relative coordinates of the experimental vehicle with respect to the radar are calculated from the raw radar data according to formulas (8) to (10).
x = dis·cosθ_1·cosθ_2 (8)
y = dis·cosθ_1·sinθ_2 (9)
z = -dis·sinθ_1 (10)
where dis denotes the distance between the target vehicle and the radar, θ_1 and θ_2 denote the pitch angle and horizontal angle between the target vehicle and the radar, respectively, and x, y and z denote the coordinates of the radar-detected target in the radar coordinate system.
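The polar-to-Cartesian conversion can be sketched as below. The angle conventions (azimuth measured in the horizontal plane, positive pitch meaning the target is below the radar's boresight) are one common choice and would need to be checked against the specific radar's datasheet:

```python
import math


def radar_to_cartesian(dis, pitch, azimuth):
    """Convert radar range/pitch/azimuth to (x, y, z) in the radar frame.

    dis: slant range to the target (m); pitch and azimuth in radians.
    """
    x = dis * math.cos(pitch) * math.cos(azimuth)
    y = dis * math.cos(pitch) * math.sin(azimuth)
    z = -dis * math.sin(pitch)   # target below the horizontal plane for positive pitch
    return x, y, z


x, y, z = radar_to_cartesian(10.0, 0.0, 0.0)   # target straight ahead
```

Whatever the convention, the conversion must preserve the slant range: x² + y² + z² should equal dis², which is a cheap sanity check on the angle definitions.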
As shown in FIG. 10, there are an attitude angular deviation and a spatial position deviation between the radar coordinate system and the world geographic coordinate system. The attitude angular deviation comprises deviation angles in 3 directions, denoted α, β and γ, respectively. The spatial position deviation is the offset between the origin of the millimeter-wave radar coordinate system and the origin of the world geographic coordinate system, i.e., the coordinates of the millimeter-wave radar in the world geographic coordinate system, denoted longitude_radar, latitude_radar and height_radar.
The pose deviation angle correction matrices in 3 directions are respectively shown as formula (11), formula (12) and formula (13).
The conversion formula of the radar coordinate system to the world geographic coordinate system is shown in formula (14).
where x, y and z denote the coordinates of the radar-detected target in the radar coordinate system, and longitude, latitude and height denote those coordinates converted into the world geographic coordinate system.
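A sketch of this attitude-correction step: compose the three single-axis rotation matrices for α, β and γ and add the radar's own position. A local metric (ENU-style) frame is assumed here for clarity; the real conversion to longitude/latitude additionally needs a degrees-per-meter scale, which is omitted:

```python
import numpy as np


def rotation_matrix(alpha, beta, gamma):
    """Compose rotations about the x, y and z axes (Rz @ Ry @ Rx order)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx


def radar_to_world(p_radar, alpha, beta, gamma, radar_origin_world):
    """Rotate a radar-frame point by the attitude correction, then translate."""
    return rotation_matrix(alpha, beta, gamma) @ np.asarray(p_radar, float) \
        + np.asarray(radar_origin_world, float)


p = radar_to_world([1.0, 0.0, 0.0], 0.0, 0.0, np.pi / 2, [10.0, 20.0, 5.0])
```

The rotation order Rz·Ry·Rx is one convention among several; the calibration only needs the chosen order to be used consistently in both the forward model and the optimizer.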
The world geographic coordinates of the detected target computed from the radar coordinates are recorded, and simultaneously the vehicle world geographic coordinates obtained by the on-board RTK are recorded as longitude_rtk, latitude_rtk and height_rtk. A space-time matching optimization model between the two types of world geographic coordinates is then established, comprising a time synchronization optimization model and a spatial error calibration.
As shown in FIG. 11, the time synchronization optimization model first requires unifying the acquisition frequencies of the two types of data. The world geographic coordinates acquired by the on-board RTK are up-sampled by linear interpolation so that their sampling frequency matches the data frequency of the millimeter-wave radar. An objective function of the form of formula (5) is established, and an optimization algorithm searches for a suitable set of parameters α, β, γ and Δt that minimizes the Euclidean distance between the two world geographic coordinate sequences over the time series.
where longitude_t and latitude_t denote the world geographic coordinates converted from the radar-coordinate-system coordinates of the radar-detected target at time t, and longitude_rtk_t+Δt and latitude_rtk_t+Δt denote the vehicle world geographic coordinates obtained by the on-board RTK at time t+Δt.
As shown in FIG. 12, in the spatial error calibration, the spatial error distribution of the two types of world geographic coordinates over the pixel coordinate system is established; the error calculation formula is shown in formula (16); the error spatial distribution surface is computed by surface fitting; and the world geographic coordinates calculated from the radar data are corrected according to formula (17).
Error = (longitude_t, latitude_t) - (longitude_rtk_t+Δt, latitude_rtk_t+Δt)   (16)
coordinate_correction = (longitude_t, latitude_t) - Error_Curved_Surface   (17)
The third step: video image target identification and feature extraction
A pre-trained deep learning target detection algorithm recognizes the multiple target vehicles in the camera's field of view and extracts the image features of each target vehicle (vehicle color, vehicle type, size, and optional license plate information) and its geospatial features (world geographic coordinates); when license plate information cannot be acquired, the corresponding column is filled with a null value. For millimeter-wave radar data: the geospatial features of each target vehicle are extracted (world geographic coordinates, speed, heading angle, etc.).
The fourth step: feature matching and data fusion
The target vehicle data acquired by the video camera and the millimeter-wave radar are matched according to their geospatial features, and a multi-sensor data fusion method is used to fuse the vehicle speed and geographic position data from the two sensors, improving data accuracy. In the fusion algorithm, the high-precision positioning coordinates acquired by the experimental vehicle's on-board RTK serve as the reference ground truth, and sensor data fusion methods such as Kalman filtering, multi-Bayesian estimation, fuzzy logic inference and deep neural networks are applied to the fusion calculation of the multi-target vehicles' geographic position data.
The Kalman filtering algorithm is one of the most widely applied algorithms for multi-source data fusion. Kalman filtering is a recursive filtering algorithm: it does not need to store past history; instead, a new estimate is obtained by combining the new measurement with the estimate from the previous frame (or previous moment) and the system's own state equation. The Kalman filtering principle can be expressed by the following 5 formulas:
Prediction:
x̂_t^- = F·x̂_{t-1} + B·u_{t-1}
P_t^- = F·P_{t-1}·F^T + Q
Update:
K_t = P_t^-·H^T·(H·P_t^-·H^T + R)^{-1}
x̂_t = x̂_t^- + K_t·(z_t - H·x̂_t^-)
P_t = (I - K_t·H)·P_t^-
where the parameters have the following meanings:
F: state transition matrix;
B: control matrix;
P: state covariance matrix;
Q: state transition (process) covariance matrix;
H: observation matrix;
R: observation noise covariance;
x̂_t^-: the state variable at time t predicted from time t-1, not yet corrected with the current observation;
x̂_t: the state variable at time t, corrected with the current observation;
z: actual observation;
K: Kalman gain;
t: time t.
The fifth step: vehicle re-tracking
1) Vehicle re-tracking within a single video camera's field of view: a pre-trained deep learning target detection algorithm recognizes the vehicles in the camera's field of view and marks target frames; the Deepsort multi-target tracking algorithm assigns IDs to the multi-target vehicles; and the target frames between different frames are re-identified and matched, realizing vehicle re-tracking within a single camera's field of view. A schematic diagram of the re-identification result is shown in FIG. 13.
The Deepsort algorithm, proposed by Nicolai Wojke et al., is an improvement on the Sort algorithm. Sort and Deepsort are commonly used algorithms in multi-object tracking (MOT). The Sort algorithm uses a simple Kalman filter to handle frame-by-frame data association and the Hungarian algorithm for the association metric; this simple approach achieves good performance at high frame rates. However, because Sort ignores the appearance features of the detected object, it is accurate only when the uncertainty of the target state estimate is low. Deepsort replaces the association metric with a more reliable one: a CNN is trained on a large-scale dataset to extract appearance features, improving the tracker's robustness to target loss and occlusion.
The Deepsort algorithm constructs a detector and trackers for target re-identification. The detector is fed the target detection frames, and the trackers track and re-identify the targets. The inputs of the Deepsort algorithm are: the target detection frame, the target detection confidence, and the target features (image features and motion features). The detection confidence is mainly used to screen out part of the detection frames, while the detection frames and target features are used for tracker construction and subsequent tracking computation. In the prediction module, a Kalman filter predicts each tracker, using a constant-velocity motion and linear observation model. The update module comprises matching, tracker updating, and target feature set updating.
Cascade matching: one tracker is assigned to each detected target, and each tracker maintains a timer parameter. If the tracker completes matching and is updated, the timer is reset to 0; otherwise it is incremented by 1. In cascade matching, the trackers are sorted by their timer parameter: trackers with smaller values are matched first, trackers with larger values later. That is, a tracker that matched in the most recent frame is given high priority, while a tracker that has gone unmatched for several frames is given low priority.
Feature comparison: the Mahalanobis distance and the cosine distance are introduced for comparing motion information and image information, respectively. The Mahalanobis distance avoids the risk posed by unequal variances of the data features in the Euclidean distance: by incorporating the covariance matrix in the calculation, the variances are normalized, so the distance value better reflects the data characteristics and their actual significance. The Mahalanobis distance is a dissimilarity measure, whereas the cosine distance is a similarity measure: the former discriminates by location, the latter by direction. The cosine distance can be used to measure differences between the dimensions of different individuals. By combining the two feature distances, the computed result comprehensively measures the differences between different targets' features.
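The two distance measures can be sketched as follows. This is a Deepsort-style illustration, not the library's exact code, and the blending weight `lam` is a free parameter I introduce for the example:

```python
import numpy as np


def mahalanobis(x, mean, cov):
    """Mahalanobis distance of x from a distribution with given mean/covariance."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal ones."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def combined_cost(motion_d, appearance_d, lam=0.5):
    # weighted blend of the motion and appearance terms
    return lam * motion_d + (1 - lam) * appearance_d
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance, which makes the normalizing role of the covariance matrix easy to see.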
And (3) feature matching: and matching the target characteristics between different image frames based on the Mahalanobis distance and the cosine distance characteristics to complete the process of target re-identification (ID transmission).
2) Vehicle re-tracking across video cameras: the overall logic flow of cross-camera re-tracking is shown in FIG. 14. When a target vehicle is lost from the field of view of camera 1, the system records the vehicle's image features and geospatial features for the n frames (n >= 10) before the loss and stores them in a lost-vehicle database; besides the world geographic coordinates, the geospatial features include the vehicle speed and driving direction acquired by the millimeter-wave radar. After the vehicle is lost, its predicted speed is calculated using methods such as linear fitting and Kalman filtering, its predicted position is calculated from the driving direction before the loss, and the lost-vehicle database is dynamically updated. When a target vehicle enters the field of view of camera 2, the system captures its features over the first n frames (n >= 10) after it appears and performs feature matching against the lost-vehicle database; for a vehicle with high similarity, the stored ID is re-assigned, realizing vehicle re-tracking. If there is no matching vehicle in the lost-vehicle database, a new ID is assigned to the target vehicle.
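The lost-vehicle database logic can be sketched schematically. Representing a vehicle by a single feature vector and scoring matches with cosine similarity are simplifying assumptions; only the 0.5 threshold and the re-assign-or-new-ID rule come from the text:

```python
import numpy as np

SIM_THRESHOLD = 0.5   # the text re-assigns an ID when similarity > 0.5


class LostVehicleDB:
    def __init__(self):
        self.entries = {}                     # vehicle_id -> feature vector

    def add(self, vid, feature):
        """Store the features of a vehicle that left a camera's field of view."""
        self.entries[vid] = np.asarray(feature, float)

    def match(self, feature, next_id):
        """Return the best-matching stored ID if similar enough, else next_id."""
        feature = np.asarray(feature, float)
        best_id, best_sim = None, -1.0
        for vid, f in self.entries.items():
            sim = float(feature @ f / (np.linalg.norm(feature) * np.linalg.norm(f)))
            if sim > best_sim:
                best_id, best_sim = vid, sim
        if best_sim > SIM_THRESHOLD:
            del self.entries[best_id]         # found again: remove from lost set
            return best_id
        return next_id                        # no match: assign a fresh ID


db = LostVehicleDB()
db.add(7, [1.0, 0.0, 0.0])                    # vehicle 7 left camera 1's view
reacquired = db.match([0.9, 0.1, 0.0], next_id=8)   # similar vehicle in camera 2
```

In a full implementation the stored entry would hold the n-frame feature history plus the predicted position and speed, and candidates far from the predicted position could be pruned before the similarity comparison.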
In the invention, license plate information is optional. If environmental conditions (illumination, shooting angle, etc.) are favorable, the camera can recognize partial license plate information (key characters, plate color, etc.); dynamic weights are assigned according to the confidence of the recognition results over consecutive frames within a single camera's field of view, and this information serves as a cross-check for multi-target vehicle re-tracking. In this way, cross-camera re-tracking of multi-target vehicles is achieved; the schematic result is shown in FIG. 15.