CN112750150B - Vehicle flow statistical method based on vehicle detection and multi-target tracking - Google Patents

Vehicle flow statistical method based on vehicle detection and multi-target tracking

Info

Publication number
CN112750150B
CN112750150B (application CN202110061609.XA)
Authority
CN
China
Prior art keywords
vector
detection
vehicle
list
tracker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110061609.XA
Other languages
Chinese (zh)
Other versions
CN112750150A (en)
Inventor
杜建超
沙洁韵
谢倩楠
韩硕
曹博豪
肖嵩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110061609.XA
Publication of CN112750150A
Application granted
Publication of CN112750150B
Legal status: Active (current)
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30236Traffic on road, railway or crossing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30242Counting objects in image
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a traffic flow statistics method based on vehicle detection and multi-target tracking, which mainly solves the poor real-time performance and low reliability of existing statistical methods. The scheme comprises the following steps: first, an improved lightweight yolov5 detection model locates the vehicles in a video frame, and a Kalman filter tracker predicts each vehicle's position in the next frame; the predictions are then matched against the detections with the Hungarian matching algorithm, determining the tracking number and the two-frame motion trajectory of each successfully matched vehicle; finally, whether a vehicle has passed the counting line is judged from the intersection of its motion trajectory with a preset virtual counting line, and the number of vehicles is counted accordingly. The invention reduces the parameter count and computation of the network model and improves the real-time performance of traffic flow statistics while guaranteeing vehicle detection accuracy, efficiently yielding reliable road traffic flow statistics.

Description

Vehicle flow statistical method based on vehicle detection and multi-target tracking
Technical Field
The invention belongs to the technical field of image processing, relates to computer vision and moving-target tracking technologies, and particularly relates to a traffic flow statistics method based on vehicle detection and multi-target tracking, which can be used in intelligent transportation systems.
Background Art
With socio-economic development, the penetration of vehicles into individual households keeps growing, road congestion becomes increasingly serious, and the frequent traffic accidents that follow have become one of the urban-construction problems to be solved urgently. An intelligent transportation system built on real-time, accurate traffic flow statistics can accurately evaluate the utilization of existing roads, effectively guide traffic scheduling, and reduce the occurrence of congestion and traffic accidents.
Compared with traditional equipment such as induction coils and infrared detectors, video-based traffic flow statistics using image processing technology is cheaper and saves more traffic resources. Its main idea is to use a detection model to identify and locate the vehicles in a video frame and to determine each vehicle's position in the next frame with a tracking algorithm, thereby accurately counting the vehicles that appear in the video.
In the paper "A new algorithm for detecting and tracking moving vehicles based on the combination of inter-frame difference and optical flow" (Computer Applications and Software, 2012(5): 117-120), Wang Shaya proposes detecting the motion region of a moving object with the inter-frame difference method, computing the optical flow at the non-zero positions of the difference image, and then tracking the moving object with the resulting optical flow field. However, the inter-frame difference method is computationally expensive and cannot meet the requirement of real-time detection, and the traffic flow statistics it produces deviate considerably from the actual counts, which hampers engineering application.
The patent application entitled "A traffic flow statistical method based on vehicle detection and tracking" (publication No. CN108615365A) proposes a traffic flow statistics method combining vehicle detection and tracking: first, camera calibration determines the image-to-world coordinate transformation; a cascade classifier based on Haar-like features and the Adaboost algorithm then separates vehicles from the background; a compressive tracking algorithm and a naive Bayes classifier determine the tracked vehicles; and counting is performed by judging the distance from a tracked vehicle to a preset reference line. The detection module of this method uses the traditional Haar-like feature extraction algorithm, which must slide a window over the entire image while continually adjusting the window size to detect vehicles; the algorithmic complexity is too high, so the real-time performance of the traffic flow statistics is poor. Moreover, traditional Haar-like features are shallow features; compared with the deep features automatically learned by today's popular deep learning algorithms, their vehicle detection accuracy is low, making the traffic counts produced by this method unreliable.
Disclosure of Invention
The invention aims to provide a traffic flow counting method based on vehicle detection and multi-target tracking that overcomes the defects of the prior art. It adopts an improved lightweight yolov5 detection model to locate the vehicles in each video frame, determines the motion trajectory of each vehicle by combining a tracking algorithm, and judges from the trajectory whether a vehicle passes the virtual counting line, thereby achieving accurate, real-time, and fast traffic flow counting and improving the real-time performance of the statistics while guaranteeing vehicle detection accuracy. To achieve this purpose, the technical scheme of the invention comprises the following steps:
(1) Inputting a road video recorded by a camera, where each frame of the video is numbered i = 1, 2, ..., n, and n denotes the last frame of the road video;
(2) Emptying the vehicle tracking-number list, zeroing the traffic flow statistic, and initializing the tracker list TL; meanwhile, drawing a virtual counting line at one third of the road section to be counted; taking i = 1;
(3) Acquiring an ith frame of a road video;
(4) Detecting the vehicle in the ith frame to obtain a detection list DL;
(4a) The existing one-stage detection model yolov5 is improved to obtain a lightweight detection model:
firstly, the cross-stage partial (CSP) network modules of the yolov5 backbone are replaced with ghost layers of stride 1, and the stride-1 and stride-2 convolutional layers of the backbone are replaced with ghost layers of stride 1 and stride 2, respectively; finally, the number of CSP modules in the yolov5 detection-layer part is reduced to 1 and the number of convolution kernels is halved, giving the final lightweight detection model;
(4b) Training a lightweight detection model:
(4b1) Collecting road images shot by a camera, and marking the types and positions of vehicles in the images to be used as a training data set;
(4b2) Sending the training data set into a lightweight detection model for feature extraction training to obtain a trained detection model;
(4c) Acquiring a detection list DL:
extracting the characteristics of the current frame by using the trained detection model to obtain a current frame vehicle detection result, and storing the current frame vehicle detection result as a detection list DL;
(5) Predicting the vehicle position in the ith frame to obtain a prediction list PL;
(5a) Counting the trackers in the tracker list, initializing the same number of Kalman filter trackers, and assigning each a corresponding tracking number (index);
(5b) The Kalman filter trackers predict the vehicle positions in the current frame, and all prediction results are stored as the prediction list PL;
(6) Copying a tracker list TL, and recording as ML;
(7) Updating the tracker list TL according to the detection list DL and the prediction list PL;
(7a) Calculating the intersection-over-union of the rectangular boxes represented by each detection vector in the detection list DL and each prediction vector in the prediction list PL, obtaining the IoU matrix IOU;
(7b) Taking the IOU matrix as the weight, each detection vector is matched against each prediction vector with the Hungarian algorithm, yielding three kinds of matching results:
first, a detection vector and a prediction vector match successfully;
second, a detection vector matches no prediction vector;
third, a prediction vector matches no detection vector;
(7c) The tracker list TL is updated in a different way for each matching result:
for the first result: the detection vector is used as the new observation of the Kalman filter to update its optimal estimate, giving the final tracking vector, which updates the tracker in the list whose tracking number matches the prediction vector, yielding the updated tracker list TL';
for the second result: a Kalman filter tracker is created with the detection vector as the initial motion state of the vehicle, added to the tracker list TL, and assigned a tracking number (index), yielding the updated tracker list TL';
for the third result: the tracker with the same number as the prediction vector is deleted from the tracker list TL, yielding the updated tracker list TL';
(8) Judging whether the vehicle passes through the virtual counting line or not, and updating traffic flow statistical information:
(8a) Taking each tracking vector in the copied tracker list ML and the updated tracker list TL', and calculating the center-point coordinates corresponding to each tracking vector;
(8b) Connecting the center points calculated from tracking vectors with the same tracking number in ML and TL', obtaining the motion trajectories of all vehicles in the current frame;
(8c) Selecting the vehicles whose motion trajectory intersects the virtual counting line and judging whether each vehicle's tracking number already exists in the tracking-number list: if it does, the vehicle is considered counted and is not counted again; if it does not, the tracking number is recorded and the traffic flow statistic is increased by one, giving the updated traffic flow statistics;
(9) Judging whether i equals n: if so, executing step (10); otherwise, adding 1 to i and returning to step (3);
(10) Detection is finished, and the updated traffic flow statistics are output as the final statistical result.
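For illustration only, the per-frame loop of steps (3) to (9) can be sketched in Python as follows; the detector, tracker class, matcher, and crossing-test callables passed in (detect, make_tracker, match, crosses) and the helper box_center are illustrative stand-ins for the components described above, not the literal implementation of the invention:

```python
# A structural sketch of steps (3)-(9), under the assumption that a detector,
# a Kalman tracker class, a Hungarian matcher (returning matched pairs plus
# unmatched detections and unmatched tracker numbers), and a crossing test
# are supplied by the caller; all names here are illustrative.
def count_traffic(frames, detect, make_tracker, match, crosses, count_line):
    trackers = {}        # tracker list TL: tracking number -> tracker
    counted = set()      # vehicle tracking-number list
    flow = 0             # traffic flow statistic
    next_id = 0
    for frame in frames:
        DL = detect(frame)                                      # step (4)
        PL = {tid: t.predict() for tid, t in trackers.items()}  # step (5)
        ML = {tid: t.box() for tid, t in trackers.items()}      # step (6)
        pairs, new_dets, gone = match(DL, PL)                   # step (7)
        for det_i, tid in pairs:       # matched: correct the tracker
            trackers[tid].update(DL[det_i])
        for det_i in new_dets:         # unmatched detection: new vehicle
            trackers[next_id] = make_tracker(DL[det_i])
            next_id += 1
        for tid in gone:               # unmatched prediction: vehicle left
            trackers.pop(tid)
        for tid, t in trackers.items():                         # step (8)
            if tid in ML and tid not in counted:
                a, b = box_center(ML[tid]), box_center(t.box())
                if crosses(a, b, *count_line):  # trajectory meets count line
                    counted.add(tid)
                    flow += 1
    return flow

def box_center(box):
    """Center of a box [x1, y1, x2, y2], as in step (8a)."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)
```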
Compared with the prior art, the invention has the following advantages:
First, because the vehicle detection part adopts the yolov5 detection model, which is more accurate than other target detection algorithms, the vehicle detection accuracy is further improved over traditional detection methods, achieving accurate traffic flow counting.
Second, the yolov5 detection model is improved with a lighter network structure; while keeping high detection accuracy, the model's parameters and computational complexity are greatly reduced, shortening the vehicle detection time and improving the real-time performance of traffic flow statistics.
Third, vehicle detection and multi-target tracking are fused, and the detection results are matched against the tracking predictions to confirm the motion trajectory of every vehicle in the video; this avoids the missed and repeated counts that may occur during traffic flow statistics and guarantees the reliability of the counting result.
Drawings
FIG. 1 is a flow chart of an implementation of the method of the present invention;
FIG. 2 is a diagram of the improved lightweight yolov5 network architecture used in the method of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIG. 1, the invention provides a traffic flow statistics method based on vehicle detection and multi-target tracking, comprising the following steps:
Step 1, inputting a road video recorded by a camera, where each frame of the video is numbered i = 1, 2, ..., n, and n denotes the last frame of the road video;
step 2, initialization:
(2a) Initializing the tracker list TL with Kalman filter trackers; initializing the vehicle tracking-number list as empty; initializing the traffic flow statistic to zero;
(2b) Drawing a virtual counting line at one third of the road;
(2c) Taking i =1, and entering step 3;
step 3, acquiring the ith frame of the road video;
step 4, detecting the vehicle in the ith frame to obtain a detection list DL;
(4a) The existing one-stage detection model yolov5 is improved to obtain a lightweight detection model:
firstly, the cross-stage partial (CSP) network modules of the yolov5 backbone are replaced with ghost layers of stride 1, and the stride-1 and stride-2 convolutional layers of the backbone are replaced with ghost layers of stride 1 and stride 2, respectively; finally, the number of CSP modules in the yolov5 detection-layer part is reduced to 1 and the number of convolution kernels is halved, giving the final lightweight detection model. While keeping the high detection accuracy of yolov5, this improvement greatly reduces the model's parameter count and computational complexity, speeds up vehicle detection, and improves the real-time performance of traffic flow statistics. A ghost layer of stride 1 applies a nonlinear operation to the input image to obtain primary feature maps, then re-extracts linear features from the primary maps with a channel-separable convolution to obtain new feature maps of the same dimensions as the primary ones; adding the primary and new feature maps gives the complete output feature map. A ghost layer of stride 2 additionally down-samples the output feature map.
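For illustration, one ghost module of the kind described above might be sketched in PyTorch as follows. The sketch follows this description literally, summing the primary and new feature maps (the original GhostNet design concatenates them instead); the kernel sizes and the SiLU activation are illustrative assumptions not taken from the patent. Per the layer-by-layer description below, a stride-1 ghost layer in the backbone stacks two such modules with a residual connection.

```python
# A minimal sketch of one ghost module as described above, assuming element-wise
# addition of the primary and cheap feature maps; kernel sizes and the SiLU
# activation are illustrative.
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Primary path: ordinary convolution + normalization + activation.
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(),
        )
        # Cheap path: channel-separable (depthwise) convolution that re-extracts
        # linear features at the same dimensions as the primary maps.
        self.cheap = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1, groups=out_ch, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Stride-2 variant: an extra stride-2 depthwise convolution down-samples
        # the complete output feature map once.
        self.down = (
            nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1,
                      groups=out_ch, bias=False)
            if stride == 2 else nn.Identity()
        )

    def forward(self, x):
        primary = self.primary(x)
        out = primary + self.cheap(primary)  # add primary and new feature maps
        return self.down(out)
```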
The overall framework of the resulting lightweight detection model is divided into two parts: a backbone part and a detection-layer part.
Layer 0 of the backbone is a pixel-separation layer that samples the pixels of the input image at alternating horizontal and vertical coordinates, down-sampling the image once. Layer 1 is a stride-1 convolutional layer that applies an ordinary convolution to the previous layer's output, followed by normalization and activation. Layer 2 is a stride-1 ghost layer containing two ghost modules: the first ghost module applies a set of ordinary convolution, normalization, and activation operations to the previous layer's output to obtain a set of primary feature maps, then re-extracts linear features from them with a cheap channel-separable convolution to obtain new feature maps of the same dimensions; adding the primary and new feature maps yields the complete feature map fed to the second ghost module. The second ghost module performs the same operations, and its output is residually connected to the input of layer 2 to give the final output feature map. Layer 3 is a stride-2 ghost layer with the same structure as the stride-1 ghost layer, except that a stride-2 depthwise-separable convolution is added after the first ghost module to down-sample the feature map once. Layers 4 to 7 alternate stride-1 and stride-2 ghost layers. Layer 8 is a spatial pyramid pooling structure: the input feature map is max-pooled at three scales, the pooled outputs are concatenated with the layer's input feature map, and a stride-1 convolutional layer then performs feature extraction, normalization, and activation. Layer 9, the last layer of the backbone, is a stride-1 ghost layer for feature extraction.
The detection-layer part spans layers 10 to 24. Layer 10 is a stride-1 convolutional layer; after its convolution, normalization, and activation, the output feature map is up-sampled once and concatenated with the output of layer 6. The concatenation is fed into the layer-13 CSP module for feature extraction. Layers 14 to 17 repeat the operations of layers 10 to 13. Layer 18 is a stride-2 convolutional layer that down-samples the input feature map once; the result is concatenated with the output of layer 14, after which the layer-20 CSP module performs feature extraction. Layers 21 to 23 repeat the operations of layers 18 to 20. Finally, the output feature maps of layers 17, 20, and 23 are fused in the detection-layer part to regress the target class and position.
(4b) Training a lightweight detection model:
(4b1) Collecting road images shot by cameras, specifically images of urban and suburban roads captured by cameras mounted at elevated positions, and labeling the classes and positions of the vehicles in the images to serve as the training data set; the vehicles in the images include common types such as trucks, cars, and buses.
(4b2) Sending the training data set into a lightweight detection model for feature extraction training to obtain a trained detection model;
(4c) Acquiring a detection list DL:
The features of the current frame are extracted with the trained detection model to obtain the current-frame vehicle detection results, which are stored as the detection list DL. Specifically, features are extracted from the current frame and each candidate is judged to be vehicle or background, finally yielding for each vehicle a vector [Dx1, Dy1, Dx2, Dy2] describing its position information: the coordinates of the upper-left and lower-right corners of the rectangular box surrounding the vehicle, i.e., the detection result.
Step 5, predicting the vehicle position in the ith frame to obtain a prediction list PL;
(5a) Counting the trackers in the tracker list, initializing the same number of Kalman filter trackers, and assigning each a corresponding tracking number (index);
(5b) The Kalman filter trackers predict the vehicle positions in the current frame, each represented as a vector [Px1, Py1, Px2, Py2] giving the upper-left and lower-right corner coordinates of the predicted rectangular box around the vehicle, i.e., the prediction result; all prediction results are stored as the prediction list PL;
For each vehicle in the current video frame, the Kalman filter tracker predicts its position in the next frame via the state prediction equation. The Kalman filter works on the principle that the current vehicle state, i.e., position and velocity, follows a Gaussian distribution and can be expressed as the state vector

$\hat{x}_k = \begin{bmatrix} p_k \\ v_k \end{bmatrix}$

where the current position is denoted $p_k$ and the velocity $v_k$, and the position and velocity of the previous frame are denoted $p_{k-1}$ and $v_{k-1}$, respectively. The conversion relationship between the two is

$p_k = p_{k-1} + \Delta t \, v_{k-1}, \qquad v_k = v_{k-1},$

i.e.

$\hat{x}_k = F_k \hat{x}_{k-1}, \qquad F_k = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix},$

where $F_k$ is the state transition matrix. The uncertainty and correlation of the motion system can be described by a covariance matrix; the covariance matrix at the current time is predicted from the covariance matrix at the previous time by

$P_k = F_k P_{k-1} F_k^{\mathrm{T}}.$

On this basis, external controls on the motion state of the vehicle, such as acceleration and deceleration, are added: $B_k$, called the state control matrix, indicates how acceleration and deceleration change the state of the vehicle, and $u_k$, the state control vector, indicates the magnitude and direction of the control. Assuming the system state error $w_k$ caused by external uncertain factors obeys the Gaussian distribution $w_k \sim N(0, Q_k)$, the prediction equations of the Kalman filter for the vehicle state are obtained:

$\hat{x}_k = F_k \hat{x}_{k-1} + B_k u_k, \qquad P_k = F_k P_{k-1} F_k^{\mathrm{T}} + Q_k.$
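As a minimal numerical illustration of this prediction step, for a single coordinate with a constant-velocity model; the values of $\Delta t$, $Q$, and the initial covariance below are illustrative assumptions:

```python
# Kalman prediction step for one coordinate (constant-velocity model); the
# tracker applies the same form to the bounding-box state. dt, Q, and the
# initial covariance are illustrative.
import numpy as np

dt = 1.0                                # one frame between predictions
F = np.array([[1.0, dt],
              [0.0, 1.0]])              # state transition matrix F_k
Q = np.eye(2) * 0.01                    # process noise covariance Q_k

def predict(x, P, B=None, u=None):
    """Predict state mean x and covariance P one frame ahead."""
    x_pred = F @ x
    if B is not None and u is not None:
        x_pred = x_pred + B @ u         # optional control term B_k u_k
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

x = np.array([100.0, 2.5])              # position p_k and velocity v_k
P = np.eye(2)                           # initial state covariance
x, P = predict(x, P)                    # x is now [102.5, 2.5]
```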
step 6, copying a tracker list TL, and recording the tracker list TL as ML;
step 7, updating the tracker list TL according to the detection list DL and the prediction list PL;
(7a) Calculating the intersection-over-union of the rectangular boxes represented by each detection vector in the detection list DL and each prediction vector in the prediction list PL, obtaining the IoU matrix IOU;
(7b) Taking the IOU matrix as the weight, each detection vector is matched against each prediction vector with the Hungarian algorithm (a sketch of this matching step is given after step (7c) below), yielding three kinds of matching results:
first, a detection vector and a prediction vector match successfully;
second, a detection vector matches no prediction vector, indicating a vehicle newly appearing in the picture;
third, a prediction vector matches no detection vector, indicating a vehicle previously in the monitored picture that has left by the current frame;
(7c) The tracker list TL is updated in a different way for each matching result:
for the first result: the detection vector is used as the new observation of the Kalman filter to update its optimal estimate, giving the final tracking vector, which updates the tracker in the list whose tracking number matches the prediction vector, yielding the updated tracker list TL';
for the second result: a Kalman filter tracker is created with the detection vector as the initial motion state of the vehicle, added to the tracker list TL, and assigned a tracking number (index), yielding the updated tracker list TL';
for the third result: the tracker with the same number as the prediction vector is deleted from the tracker list TL, yielding the updated tracker list TL';
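For illustration, steps (7a) and (7b) can be sketched with SciPy's Hungarian solver (scipy.optimize.linear_sum_assignment); the 0.3 IoU gate and the helper names are illustrative assumptions:

```python
# A minimal sketch of the IoU matrix and Hungarian matching in steps (7a)-(7b);
# boxes are [x1, y1, x2, y2] and the 0.3 IoU threshold is an illustrative choice.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two boxes [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match(DL, PL, iou_min=0.3):
    """Return (matched det/pred index pairs, unmatched dets, unmatched preds)."""
    if not DL or not PL:
        return [], list(range(len(DL))), list(range(len(PL)))
    IOU = np.array([[iou(d, p) for p in PL] for d in DL])   # step (7a)
    rows, cols = linear_sum_assignment(-IOU)   # Hungarian: maximize total IoU
    pairs = [(r, c) for r, c in zip(rows, cols) if IOU[r, c] >= iou_min]
    matched_d = {r for r, _ in pairs}
    matched_p = {c for _, c in pairs}
    new_dets = [i for i in range(len(DL)) if i not in matched_d]    # result 2
    gone_preds = [j for j in range(len(PL)) if j not in matched_p]  # result 3
    return pairs, new_dets, gone_preds
```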
step 8, judging whether the vehicle passes through the virtual counting line, and updating traffic flow statistical information:
(8a) Taking each tracking vector in the copied tracker list ML and the updated tracker list TL', and calculating the center-point coordinates corresponding to each tracking vector; the center point is the center of the rectangular box, calculated from the upper-left and lower-right corner coordinates of the rectangular box represented by the tracking vector.
(8b) Connecting the center points calculated from tracking vectors with the same tracking number in ML and TL', obtaining the motion trajectories of all vehicles in the current frame;
(8c) Selecting the vehicles whose motion trajectory intersects the virtual counting line and judging whether each vehicle's tracking number already exists in the tracking-number list: if it does, the vehicle is considered counted and is not counted again; if it does not, the tracking number is recorded and the traffic flow statistic is increased by one, giving the updated traffic flow statistics;
In this step, vehicles whose motion trajectory intersects the virtual counting line are selected, which requires judging whether the trajectory and the counting line intersect. Specifically: let the motion trajectory of the vehicle be segment AB and the virtual counting line be segment CD. Taking endpoint C of the counting line as the vector origin, the vectors CA, CB, and CD must satisfy (CA × CB) · (CB × CD) ≥ 0; likewise, taking endpoint D of the counting line as the vector origin, if the vectors DA, DC, and DB satisfy (DA × DC) · (DC × DB) ≥ 0, segments AB and CD are judged to intersect, i.e., the motion trajectory of the vehicle intersects the virtual counting line.
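For illustration, the crossing test can be sketched as below; it uses the standard two-sided cross-product (straddle) criterion for segment intersection, the usual computational form of the vector condition described above, and treats touching endpoints as crossing. The coordinates in the example are illustrative:

```python
# A minimal sketch of the trajectory/counting-line crossing test, using the
# standard straddle form of the 2-D cross-product criterion for segment
# intersection.
def cross(o, a, b):
    """Scalar 2-D cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def crosses_count_line(A, B, C, D):
    """True if trajectory segment AB intersects counting line CD."""
    # A and B must lie on opposite sides of line CD ...
    straddle_cd = cross(C, D, A) * cross(C, D, B) <= 0
    # ... and C and D must lie on opposite sides of line AB.
    straddle_ab = cross(A, B, C) * cross(A, B, D) <= 0
    return straddle_cd and straddle_ab

# Example: a trajectory moving downward across a horizontal counting line.
print(crosses_count_line((5, 2), (5, 8), (0, 5), (10, 5)))  # True
```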
Step 9, judging whether i equals n: if so, executing step 10; otherwise, adding 1 to i and returning to step 3;
Step 10, detection is finished, and the updated traffic flow statistics are output as the final statistical result.
The effect of the present invention will be further described with reference to simulation experiments.
1. Simulation conditions:
The simulation experiments were run in a hardware environment of an Nvidia RTX 2080 Ti graphics card with 11 GB of video memory, and a software environment of PyTorch 1.7, Python 3.8, and CUDA 11.0.
2. Simulation content:
firstly, collecting road image data sets including cars, trucks and buses for labeling and sorting, and dividing the data sets into a training set and a testing set. An original yolov5 detection model and an improved lightweight detection model are respectively used for training and testing a data set, and the detection effects of the two models are compared as shown in table 1.
Two sections of road videos are collected, the traffic flow in the videos is respectively counted by using the method provided by the invention, and the accuracy and the real-time performance of the counting result are compared. The real-time performance is represented by a frame rate FPS, and a higher frame rate indicates higher real-time performance of traffic flow statistics. The statistical results are shown in table 2.
3. Simulation results:
TABLE 1. Comparison of the original yolov5 model and the lightweight detection model

Model                       | Accuracy | Inference speed | Parameters | GFLOPs
Original detection model    | 0.930    | 4.4 ms          | 47,375,432 | 116
Lightweight detection model | 0.920    | 2.6 ms          | 4,543,896  | 10.5
The simulation results show that the lightweight detection model greatly reduces the parameter count and computational complexity of the model while maintaining high detection accuracy: the parameter count is reduced by about 90%, the floating-point operations (GFLOPs) fall to roughly one tenth of the original, the inference speed improves by 64%, and the accuracy drops by only one percentage point, still remaining above 90%.
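These ratios follow directly from Table 1: 4,543,896 / 47,375,432 ≈ 0.096, i.e., about 90% fewer parameters; and 10.5 / 116 ≈ 0.091, i.e., roughly one tenth of the floating-point operations.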
TABLE 2. Traffic flow statistics comparison between the original detection model and the lightweight detection model
[Table 2 is provided as an image in the original publication.]
As can be seen from Table 2, the lightweight detection model of the invention raises the frame rate of traffic flow statistics, achieving near-real-time counting with accurate results that match reality; the original detection model runs at 26 FPS, lower than the lightweight model, and misses detections on the second road video. The improved vehicle detection model therefore improves both the accuracy and the real-time performance of traffic flow statistics while greatly reducing the model's parameters and computational complexity, which has definite engineering value for porting and deploying the algorithm.
The simulation analysis verifies the correctness and effectiveness of the proposed method.
Parts of the invention not described in detail belong to the common general knowledge of those skilled in the art.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (6)

1. A traffic flow statistics method based on vehicle detection and multi-target tracking, characterized by comprising the following steps:
(1) Inputting a road video recorded by a camera, where each frame of the video is numbered i = 1, 2, ..., n, and n denotes the last frame of the road video;
(2) Emptying the vehicle tracking-number list, zeroing the traffic flow statistic, and initializing the tracker list TL; meanwhile, drawing a virtual counting line at one third of the road section to be counted; taking i = 1;
(3) Acquiring an ith frame of a road video;
(4) Detecting the vehicle in the ith frame to obtain a detection list DL;
(4a) The existing one-stage detection model yolov5 is improved to obtain a lightweight detection model:
firstly, the cross-stage partial (CSP) network modules of the yolov5 backbone are replaced with ghost layers of stride 1, and the stride-1 and stride-2 convolutional layers of the backbone are replaced with ghost layers of stride 1 and stride 2, respectively; finally, the number of CSP modules in the yolov5 detection-layer part is reduced to 1 and the number of convolution kernels is halved, giving the final lightweight detection model;
(4b) Training a lightweight detection model:
(4b1) Collecting road images shot by a camera, and marking the types and positions of vehicles in the images to be used as a training data set;
(4b2) Sending the training data set into a lightweight detection model for feature extraction training to obtain a trained detection model;
(4c) Acquiring a detection list DL:
extracting the characteristics of the current frame by using the trained detection model to obtain a current frame vehicle detection result, and storing the current frame vehicle detection result as a detection list DL;
(5) Predicting the vehicle position in the ith frame to obtain a prediction list PL;
(5a) Counting the trackers in the tracker list, initializing the same number of Kalman filter trackers, and assigning each a corresponding tracking number (index);
(5b) The Kalman filter trackers predict the vehicle positions in the current frame, and all prediction results are stored as the prediction list PL;
(6) Copying a tracker list TL, and recording as ML;
(7) Updating the tracker list TL according to the detection list DL and the prediction list PL;
(7a) Calculating the intersection-over-union of the rectangular boxes represented by each detection vector in the detection list DL and each prediction vector in the prediction list PL, obtaining the IoU matrix IOU;
(7b) Taking the IOU matrix as the weight, each detection vector is matched against each prediction vector with the Hungarian algorithm, yielding three kinds of matching results:
first, a detection vector and a prediction vector match successfully;
second, a detection vector matches no prediction vector, indicating a vehicle newly appearing in the picture;
third, a prediction vector matches no detection vector, indicating a vehicle previously in the monitored picture that has left by the current frame;
(7c) The tracker list TL is updated in a different way for each matching result:
for the first result: the detection vector is used as the new observation of the Kalman filter to update its optimal estimate, giving the final tracking vector, which updates the tracker in the list whose tracking number matches the prediction vector, yielding the updated tracker list TL';
for the second result: a Kalman filter tracker is created with the detection vector as the initial motion state of the vehicle, added to the tracker list TL, and assigned a tracking number (index), yielding the updated tracker list TL';
for the third result: the tracker with the same number as the prediction vector is deleted from the tracker list TL, yielding the updated tracker list TL';
(8) Judging whether the vehicle passes through the virtual counting line or not, and updating traffic flow statistical information:
(8a) Taking each tracking vector in the copied tracker list ML and the updated tracker list TL', and calculating the center-point coordinates corresponding to each tracking vector;
(8b) Connecting the center points calculated from tracking vectors with the same tracking number in ML and TL', obtaining the motion trajectories of all vehicles in the current frame;
(8c) Selecting the vehicles whose motion trajectory intersects the virtual counting line and judging whether each vehicle's tracking number already exists in the tracking-number list: if it does, the vehicle is considered counted and is not counted again; if it does not, the tracking number is recorded and the traffic flow statistic is increased by one, giving the updated traffic flow statistics;
(9) Judging whether i equals n: if so, executing step (10); otherwise, adding 1 to i and returning to step (3);
(10) Detection is finished, and the updated traffic flow statistics are output as the final statistical result.
2. The method of claim 1, wherein: in step (4a), the ghost layer of stride 1 applies a nonlinear operation to the input image to obtain primary feature maps, then re-extracts linear features from the primary maps with a channel-separable convolution to obtain new feature maps of the same dimensions as the primary ones; the primary and new feature maps are added to give the complete output feature map; and the ghost layer of stride 2 down-samples the output feature map.
3. The method of claim 1, wherein: the road images in step (4b1) include images of urban roads and suburban roads, and the vehicles in the images include at least trucks, cars, and buses.
4. The method of claim 1, wherein: in step (5), the Kalman filter tracker predicts, via the state prediction equation, the next-frame position of each vehicle in the current video frame.
5. The method of claim 1, wherein: the center-point coordinates in step (8a) are the coordinates of the center of the rectangular box, calculated from the upper-left and lower-right corner coordinates of the rectangular box represented by the tracking vector.
6. The method of claim 1, wherein: in step (8c), the intersection of the motion trajectory with the virtual counting line is determined as follows: the motion trajectory of the vehicle is segment AB and the virtual counting line is segment CD; taking endpoint C of the counting line as the vector origin, the vectors CA, CB, and CD must satisfy (CA × CB) · (CB × CD) ≥ 0; likewise, taking endpoint D of the counting line as the vector origin, if the vectors DA, DC, and DB satisfy (DA × DC) · (DC × DB) ≥ 0, segments AB and CD are judged to intersect, i.e., the motion trajectory of the vehicle intersects the virtual counting line.
CN202110061609.XA 2021-01-18 2021-01-18 Vehicle flow statistical method based on vehicle detection and multi-target tracking Active CN112750150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110061609.XA CN112750150B (en) 2021-01-18 2021-01-18 Vehicle flow statistical method based on vehicle detection and multi-target tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110061609.XA CN112750150B (en) 2021-01-18 2021-01-18 Vehicle flow statistical method based on vehicle detection and multi-target tracking

Publications (2)

Publication Number | Publication Date
CN112750150A (en) | 2021-05-04
CN112750150B (en) | 2023-04-07

Family

ID=75652323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110061609.XA Active CN112750150B (en) 2021-01-18 2021-01-18 Vehicle flow statistical method based on vehicle detection and multi-target tracking

Country Status (1)

Country Link
CN (1) CN112750150B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113257003A (en) * 2021-05-12 2021-08-13 上海天壤智能科技有限公司 Traffic lane-level traffic flow counting system, method, device and medium thereof
CN113192336B (en) * 2021-05-28 2022-05-06 三峡大学 Road congestion condition detection method taking robust vehicle target detection as core
CN113674328B (en) * 2021-07-14 2023-08-25 南京邮电大学 Multi-target vehicle tracking method
CN113660462B (en) * 2021-08-09 2023-12-29 园测信息科技股份有限公司 Surrounding ring type moving vehicle video tracking method based on fusion multi-source data analysis
CN113673395A (en) * 2021-08-10 2021-11-19 深圳市捷顺科技实业股份有限公司 Vehicle track processing method and device
CN113763425A (en) * 2021-08-30 2021-12-07 青岛海信网络科技股份有限公司 Road area calibration method and electronic equipment
CN113988111A (en) * 2021-12-03 2022-01-28 深圳佑驾创新科技有限公司 Statistical method for pedestrian flow of public place and computer readable storage medium
CN114387310A (en) * 2022-01-17 2022-04-22 北京联合大学 Urban trunk road traffic flow statistical method based on deep learning
CN114973169A (en) * 2022-08-02 2022-08-30 山东建筑大学 Vehicle classification counting method and system based on multi-target detection and tracking
CN116580066B (en) * 2023-07-04 2023-10-03 广州英码信息科技有限公司 Pedestrian target tracking method under low frame rate scene and readable storage medium
CN117576926B (en) * 2024-01-09 2024-03-12 厦门星纵物联科技有限公司 Method, device and storage medium for detecting vehicle violations
CN117974725A (en) * 2024-03-28 2024-05-03 厦门大学 Fish tracking method based on Kalman filter

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2019101142A4 (en) * 2019-09-30 2019-10-31 Dong, Qirui MR A pedestrian detection method with lightweight backbone based on yolov3 network
CN111640101A (en) * 2020-05-29 2020-09-08 苏州大学 Ghost convolution characteristic fusion neural network-based real-time traffic flow detection system and method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018200522A1 (en) * 2017-04-24 2018-11-01 Mobileye Vision Technologies Ltd. Systems and methods for compression of lane data
US20200379471A1 (en) * 2019-06-03 2020-12-03 Byton North America Corporation Traffic blocking detection

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2019101142A4 (en) * 2019-09-30 2019-10-31 Dong, Qirui MR A pedestrian detection method with lightweight backbone based on yolov3 network
CN111640101A (en) * 2020-05-29 2020-09-08 苏州大学 Ghost convolution characteristic fusion neural network-based real-time traffic flow detection system and method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Learning Deep Ship Detector in SAR Images From Scratch; Z. Deng et al.; IEEE Transactions on Geoscience and Remote Sensing; 2018-12-31; pp. 4021-4039 *
A traffic flow statistics method using YOLO recognition and Mean shift tracking; Liu Lei et al.; Manufacturing Automation (制造业自动化); 2020, no. 02; pp. 16-20 *
Multi-lane traffic flow statistics and vehicle tracking method based on YOLOv3; Wang Hui et al.; Foreign Electronic Measurement Technology (国外电子测量技术); 2020, no. 02; pp. 42-46 *
Multi-vehicle tracking method for expressway tunnels based on YOLOv3; Fan Yangyang et al.; Journal of Guilin University of Electronic Technology (桂林电子科技大学学报); 2020-10-25; vol. 40, no. 5; pp. 434-438 *

Also Published As

Publication number Publication date
CN112750150A (en) 2021-05-04

Similar Documents

Publication Publication Date Title
CN112750150B (en) Vehicle flow statistical method based on vehicle detection and multi-target tracking
CN111598030B (en) Method and system for detecting and segmenting vehicle in aerial image
EP4152204A1 (en) Lane line detection method, and related apparatus
CN111429484B (en) Multi-target vehicle track real-time construction method based on traffic monitoring video
US20180124319A1 (en) Method and apparatus for real-time traffic information provision
CN111340855A (en) Road moving target detection method based on track prediction
CN111554105B (en) Intelligent traffic identification and statistics method for complex traffic intersection
CN103235933A (en) Vehicle abnormal behavior detection method based on Hidden Markov Model
CN115223130B (en) Multi-task panoramic driving perception method and system based on improved YOLOv5
CN114170580A (en) Highway-oriented abnormal event detection method
CN114879696A (en) Track matching method, device, equipment and medium
JP4420512B2 (en) Moving object motion classification method and apparatus, and image recognition apparatus
CN111524350B (en) Method, system, terminal device and medium for detecting abnormal driving condition of vehicle and road cooperation
CN109740609A (en) A kind of gauge detection method and device
CN116434159A (en) Traffic flow statistics method based on improved YOLO V7 and Deep-Sort
CN114926984B (en) Real-time traffic conflict collection and road safety evaluation method
Naik et al. Implementation of YOLOv4 algorithm for multiple object detection in image and video dataset using deep learning and artificial intelligence for urban traffic video surveillance application
CN114495050A (en) Multitask integrated detection method for automatic driving forward vision detection
CN116434150B (en) Multi-target detection tracking method, system and storage medium for congestion scene
JP4583493B2 (en) Moving object recognition method and apparatus
CN110889347B (en) Density traffic flow counting method and system based on space-time counting characteristics
CN115100565B (en) Multi-target tracking method based on spatial correlation and optical flow registration
Vellaidurai et al. A novel oyolov5 model for vehicle detection and classification in adverse weather conditions
CN117949942B (en) Target tracking method and system based on fusion of radar data and video data
Kanaujia et al. Motion Based Real-Time Siamese Multiple Object Tracker Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant