CN113392817A - Vehicle density estimation method and device based on multi-column convolutional neural network

Vehicle density estimation method and device based on multi-column convolutional neural network

Info

Publication number
CN113392817A
CN113392817A (application CN202110935837.5A)
Authority
CN
China
Prior art keywords
vehicle
neural network
picture
convolutional neural
density map
Prior art date
Legal status
Pending
Application number
CN202110935837.5A
Other languages
Chinese (zh)
Inventor
曾琼
耿微
唐聃
王亚强
咬登国
Current Assignee
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date
Filing date
Publication date
Application filed by Chengdu University of Information Technology filed Critical Chengdu University of Information Technology
Priority to CN202110935837.5A
Publication of CN113392817A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a vehicle density estimation method and device based on a multi-column convolutional neural network, wherein the method comprises the following steps: extracting vehicle features from the vehicle picture through multiple columns of convolutional neural networks with convolution kernels of different sizes, to obtain multiple different vehicle feature matrices; generating a multi-column convolutional neural network by utilizing the multiple different vehicle feature matrices; generating a real density map by using the label file of a pre-stored training vehicle picture, and training the multi-column convolutional neural network with the real density map to obtain a trained multi-column convolutional neural network; estimating a predicted vehicle picture by using the trained multi-column convolutional neural network to generate an estimated density map; and performing vehicle density estimation on the predicted vehicle picture according to the estimated density map.

Description

Vehicle density estimation method and device based on multi-column convolutional neural network
Technical Field
The invention relates to the technical field of autonomous driving, in particular to a vehicle density estimation method and device based on a multi-column convolutional neural network.
Background
With the development of society and rising living standards, the automobile has become people's main means of daily travel: it brings convenience and has come to represent identity and status. However, as the number of automobiles grows rapidly, so do the problems it causes, such as traffic jams, paralysis of traffic hubs, exhaust pollution and traffic accidents. These incidents, caused by the growing number of cars and unreasonable urban transportation, inflict irreparable damage on people's lives and property. Such problems cannot be solved fundamentally by large-scale expansion of urban road construction alone, nor by limiting the number of vehicles alone; the traffic capacity of urban roads must be improved on the existing infrastructure. For this reason, more and more cameras are installed on urban roads to keep such events as infrequent as possible. Yet although surveillance cameras are now widespread, manual monitoring is so time-consuming and labor-intensive that 24-hour real-time monitoring cannot be guaranteed, so abnormal events caused by vehicle aggregation cannot be warned of effectively. The main effective way to prevent and reduce such events is therefore to develop an intelligent monitoring software and hardware platform that monitors urban roads intelligently, realizing intelligent detection and early warning of anomalies. Detecting the real-time variation of traffic conditions and estimating vehicle density are fundamental tasks of intelligent traffic systems.
With the development of deep learning, accurate predictions can be learned automatically from large amounts of surveillance video data, and on this basis many researchers have begun to apply deep-learning-based methods to the vehicle counting problem. However, existing methods struggle to meet the requirements when facing complex scenes, camera perspective distortion, diverse vehicle distributions and the limited computing power of edge devices. Solving these problems, improving counting accuracy and applying it in real public scenes is therefore a subject worthy of intensive research.
Traffic state detection and vehicle density estimation are critical to intelligent transportation systems because they play a key role in dynamic route guidance, event detection, short-term travel time prediction, and various effectiveness measurements. Vehicle density estimation has therefore received a lot of attention. The following technologies are mainly used to solve the problem:
First, the fixed observer method: a fixed observer placed at the roadside (e.g., pneumatic tubes, inductive loop detectors, cameras, microwave sensors, radar) counts the number of vehicles passing it in a given time interval and, after estimating their average speed, estimates the traffic density.
Second, the moving observer method: essentially, it is implemented by a vehicle making multiple trips over a road segment, both in the traffic direction of interest and in the opposite direction.
Although the fixed observer method and its countless variants produce relatively accurate results for vehicle density estimation and traffic state detection under conditions of constant vehicle speed and constant traffic flow, they are expensive to install and maintain, cannot produce real-time results anytime and anywhere, and are therefore difficult to deploy widely.
To obtain results comparable to the fixed observer approach, the moving observer must make a large number of trips in both directions. Unfortunately, all of these trips take time, during which the density and flow of vehicles may change. Moreover, this approach ignores the fact that today's vehicles are equipped with a complete set of on-board computing, sensing and communication devices.
Disclosure of Invention
The scheme provided by the embodiments of the invention addresses the defects of existing vehicle density estimation methods: strong dependence on the environment, poor universality, and reliance on constant traffic flow and constant vehicle speed.
The vehicle density estimation method based on the multi-column convolutional neural network comprises the following steps:
extracting vehicle features from the vehicle picture through multiple columns of convolutional neural networks with convolution kernels of different sizes, to obtain multiple different vehicle feature matrices;
generating a multi-column convolutional neural network by utilizing the multiple different vehicle feature matrices;
generating a real density map by using the label file of a pre-stored training vehicle picture, and training the multi-column convolutional neural network with the real density map to obtain a trained multi-column convolutional neural network;
estimating a predicted vehicle picture by using the trained multi-column convolutional neural network to generate an estimated density map;
and performing vehicle density estimation on the predicted vehicle picture according to the estimated density map.
The device for estimating the vehicle density based on the multi-column convolutional neural network comprises:
a feature extraction module, used for extracting vehicle features from the vehicle picture through multiple columns of convolutional neural networks with convolution kernels of different sizes, to obtain multiple different vehicle feature matrices;
a generating module, used for generating a multi-column convolutional neural network by utilizing the multiple different vehicle feature matrices, and for generating a real density map by using the label file of a pre-stored training vehicle picture;
a training module, used for training the multi-column convolutional neural network with the real density map to obtain a trained multi-column convolutional neural network;
and an estimation module, used for estimating a predicted vehicle picture by using the trained multi-column convolutional neural network, generating an estimated density map, and performing vehicle density estimation on the predicted vehicle picture according to the estimated density map.
According to the scheme provided by the embodiments of the invention, the vehicle density map of a surveillance video is estimated with a convolutional neural network. This yields not only the total number of vehicles in the image; the density map also visually reflects the vehicle distribution in the image. By deploying the trained network model on edge devices, the vehicle number, vehicle distribution and vehicle density can be counted in real time.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a method for estimating vehicle density based on a multi-column convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an apparatus for estimating vehicle density based on a multi-column convolutional neural network according to an embodiment of the present invention;
FIG. 3 is a technical roadmap for vehicle density estimation based on multi-column convolutional neural networks provided by an embodiment of the present invention;
fig. 4 is a schematic diagram of an original picture, a real density map and an estimated density map according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, and it should be understood that the preferred embodiments described below are only for the purpose of illustrating and explaining the present invention, and are not to be construed as limiting the present invention.
Fig. 1 is a flowchart of a method for estimating vehicle density based on a multi-column convolutional neural network according to an embodiment of the present invention. As shown in fig. 1, the method includes:
Step S101: extracting vehicle features from the vehicle picture through multiple columns of convolutional neural networks with convolution kernels of different sizes, to obtain multiple different vehicle feature matrices;
Step S102: generating a multi-column convolutional neural network by utilizing the multiple different vehicle feature matrices; this includes fusing the multiple different vehicle feature matrices to obtain a fused vehicle feature matrix, and generating the multi-column convolutional neural network from the fused vehicle feature matrix.
Step S103: generating a real density map by using the label file of a pre-stored training vehicle picture, and training the multi-column convolutional neural network with the real density map to obtain a trained multi-column convolutional neural network;
Wherein the label file of each pre-stored training vehicle picture is obtained as follows: the collected traffic video is divided into frames to obtain a plurality of training vehicle pictures; the position of each vehicle in each training vehicle picture is annotated to obtain a vehicle group image containing a plurality of vehicle labels, which is used as the label file of that training vehicle picture; and the label file is saved. Specifically, the vehicle group image containing the plurality of vehicle labels is:
D(w) = Σ_{i=1}^{M} δ(w − w_i)
where w_i represents the pixel position of a vehicle in the training vehicle picture, δ(w − w_i) represents the impulse function at the vehicle position, M represents the total number of vehicles in each training vehicle picture, and D(w) denotes a vehicle group image with M vehicle labels.
Wherein generating the real density map by using the pre-stored label files of the training vehicle pictures comprises: generating the real density map according to the label file of the training vehicle picture and a Gaussian function. Specifically, the real density map is:
F(w) = D(w) * G_σ(w)
where D(w) denotes the vehicle group image with M vehicle labels, G_σ denotes the Gaussian kernel, and F(w) denotes the real density map.
Step S104: estimating a predicted vehicle picture by using the trained multi-column convolutional neural network to generate an estimated density map;
Step S105: performing vehicle density estimation on the predicted vehicle picture according to the estimated density map.
The invention also includes: and estimating the number of vehicles in the predicted vehicle picture according to the estimated density map.
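For the training in step S103, the network's estimated density map is compared against the real density map. The patent does not state an explicit loss function; a minimal pixel-wise squared-error sketch, a standard choice for this kind of density-map regression, is:

```python
import numpy as np

def density_map_loss(estimated, real):
    """Pixel-wise squared-error loss between estimated and real density
    maps, averaged over the batch. The patent does not state its exact
    loss function; this is the common choice for multi-column
    density-estimation networks."""
    estimated = np.asarray(estimated, dtype=np.float64)
    real = np.asarray(real, dtype=np.float64)
    batch = estimated.shape[0]
    return ((estimated - real) ** 2).sum() / (2.0 * batch)
```

In a full training loop this scalar would be minimized with the optimizer described later in the specification.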
Fig. 2 is a schematic diagram of an apparatus for estimating vehicle density based on a multi-column convolutional neural network according to an embodiment of the present invention. As shown in fig. 2, it includes: the feature extraction module 201, configured to extract vehicle features from the vehicle picture through multiple columns of convolutional neural networks with convolution kernels of different sizes, to obtain multiple different vehicle feature matrices; the generating module 202, configured to generate a multi-column convolutional neural network by using the multiple different vehicle feature matrices, and to generate a real density map by using the label file of a pre-stored training vehicle picture; the training module 203, configured to train the multi-column convolutional neural network with the real density map to obtain a trained multi-column convolutional neural network; and the estimation module 204, configured to estimate a predicted vehicle picture with the trained multi-column convolutional neural network, generate an estimated density map, and perform vehicle density estimation on the predicted vehicle picture according to the estimated density map.
Wherein the estimation module 204 is further configured to estimate the number of vehicles in the predicted vehicle picture according to the estimated density map.
The generating module 202 is specifically configured to perform fusion processing on the multiple different vehicle feature matrices to obtain a fused vehicle feature matrix, and generate a multi-column convolutional neural network by using the fused vehicle feature matrix.
Fig. 3 is a technical route diagram of vehicle density estimation based on a multi-column convolutional neural network according to an embodiment of the present invention, as shown in fig. 3: the method mainly comprises the following steps:
(1) collecting a traffic video;
(2) preprocessing a video, and making a data set comprising a training set and a prediction set;
(3) generating a real density map according to the label file;
(4) training the untrained network model by using the real density maps;
(5) estimating prediction pictures by using the trained network model to generate estimated density maps;
(6) calculating the total number of vehicles according to the estimated density map.
The vehicle counting network mainly comprises three columns of convolutional neural networks, whose convolution kernels differ in size. The input of the network is a video frame, the output is a vehicle density map, and the total number of vehicles in the video frame is calculated by integrating (summing) the density map.
Compared with the original sensor-based vehicle counting methods and with currently popular detection networks, the direct correspondence between the road image and the vehicle density map reduces the training difficulty of the convolutional neural network and further reduces the vehicle counting error.
The real density map of a training set picture is generated from its label file as follows: first, a matrix of the same size as the original picture is generated and initialized to all zeros; then each position coordinate in the label file is traversed and the corresponding point in the matrix is set to 1, giving a matrix of only 0s and 1s in which a 1 indicates a vehicle at that position; finally, the matrix is convolved with a Gaussian kernel function to generate the real density map.
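The procedure above can be sketched in Python with NumPy (the fixed kernel width sigma = 4.0 and the kernel truncation radius are illustrative choices, not values from the patent):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """2-D Gaussian kernel whose weights sum to 1, so blurring a unit
    impulse preserves the total count."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def real_density_map(shape, coords, sigma=4.0):
    """Build the 0/1 annotation matrix from the label-file coordinates,
    then convolve each vehicle impulse with a normalized Gaussian."""
    h, w = shape
    density = np.zeros((h, w), dtype=np.float64)
    kernel = gaussian_kernel(sigma, radius=int(3 * sigma))
    r = kernel.shape[0] // 2
    for (y, x) in coords:
        # add one Gaussian per labeled vehicle, clipped at the borders
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        density[y0:y1, x0:x1] += kernel[r - (y - y0):r + (y1 - y),
                                        r - (x - x0):r + (x1 - x)]
    return density
```

Because each kernel is normalized, summing the resulting map recovers the vehicle count, which is exactly the integration step used for counting.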
Because the convolutional neural network (CNN) learns to estimate the vehicle density map from the input image using training data samples, the quality of the density maps in the training data largely determines the error of the vehicle counting problem. Most solutions to the vehicle counting problem in images obtain the counting result from a generated density map, because the density map retains spatial information: the vehicle distribution can be seen clearly from it. For example, if a certain area contains many vehicles, abnormal conditions such as traffic jams in that area can be prevented in advance. The generation of the real vehicle density map is therefore crucial to the counting result. The current method for generating real vehicle density maps from training data samples is mainly as follows:
If vehicles exist in a training set picture, their positions and number are represented by marking the position of each vehicle. If a vehicle is marked at position w_i, that point can be represented as δ(w − w_i). Thus, an image containing M vehicles can be represented as:
D(w) = Σ_{i=1}^{M} δ(w − w_i)
where w_i denotes the pixel position of a vehicle in the image, δ(w − w_i) denotes the impulse function at the vehicle position, M is the total number of vehicles in the image, and D(w) denotes a vehicle group image with M vehicle labels. In a real picture each car has a certain size and corresponds to a small area of the picture, so representing a car by a single pixel in the label file is obviously unreasonable. The Gaussian kernel function has the property that points closer to the center receive larger weights and points farther from the center receive smaller ones. Therefore a Gaussian kernel is used to replace the pixel value at the center point with a weighted average of the surrounding pixel values, where the surrounding weights sum to 1; this leaves the total vehicle count of the generated real density map unchanged while realistically reflecting the spatial position of each vehicle.
To convert this into a continuous density function, D(w) is convolved with a Gaussian kernel G_σ, and the vehicle density thus obtained is:
F(w) = D(w) * G_σ(w)
where D(w) denotes the vehicle group image with M vehicle labels and G_σ denotes the Gaussian kernel. However, this vehicle density function holds only on the premise that the vehicle positions w_i are independent of each other. In fact, each w_i corresponds to the ground area occupied by a vehicle in the actual scene, and due to perspective distortion of the camera the vehicles in the picture are not of uniform size: cars near the lens appear larger and occupy more pixels, while cars far from the lens appear smaller and occupy fewer pixels. The pixels occupied by vehicles in different areas of the image thus correspond to areas of different sizes in the actual scene.
Assuming that the vehicles around each vehicle are evenly distributed, the average distance (in the image) between a vehicle and its k nearest neighboring vehicles is a reasonable estimate of the perspective distortion effect on the image. Thus, a different blur radius is set according to each vehicle size in the image, i.e., the kernel parameter σ_i is determined per vehicle. However, it is impossible to accurately obtain the size of each vehicle in the actual scene; in a crowded scene, the size of a vehicle is typically related to the distance between the centers of two adjacent vehicles.
Therefore, to estimate the density around each vehicle w_i, the impulse δ(w − w_i) must be convolved with a Gaussian kernel whose standard deviation σ_i is proportional to the average neighbor distance, which yields the most accurate vehicle density map:
F(w) = Σ_{i=1}^{M} δ(w − w_i) * G_{σ_i}(w), with σ_i = β · d_i
where d_i denotes the average distance from vehicle w_i to its k nearest neighboring cars, G_{σ_i} denotes the Gaussian kernel, σ_i is the standard deviation of the Gaussian kernel, and β is a constant. In the generated real density map, the pixels occupied by w_i thus correspond to the area of the picture occupied by the vehicle in the actual picture.
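The geometry-adaptive width σ_i can be computed directly from the label coordinates. A sketch, assuming β = 0.3 and k = 3 (the patent leaves both constants unspecified; these values are common choices for geometry-adaptive kernels):

```python
import numpy as np

def adaptive_sigmas(coords, k=3, beta=0.3):
    """sigma_i = beta * (average image-plane distance from vehicle i to
    its k nearest neighboring vehicles). beta and k are assumed values;
    the patent does not fix them."""
    pts = np.asarray(coords, dtype=np.float64)
    # pairwise Euclidean distances between all labeled vehicle positions
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    dists.sort(axis=1)               # column 0 is the zero self-distance
    d_bar = dists[:, 1:k + 1].mean(axis=1)
    return beta * d_bar
```

Each impulse δ(w − w_i) is then blurred with a Gaussian of standard deviation σ_i instead of a single fixed σ.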
Therefore, the total number of vehicles in the image can be obtained by summing all pixel values of the vehicle density map, and the density map visually reflects the vehicle distribution in the image. The vehicle density map here includes both the real density map generated from the label files of the training set and the estimated density map predicted by the network. Due to perspective distortion of road surveillance cameras, vehicles in the pictures suffer from scale diversity and inconsistent proportions. The invention therefore addresses the large vehicle counting error caused by perspective distortion.
Due to perspective distortion, surveillance videos typically contain vehicles of different sizes and different types, so convolution kernels of a single size are unlikely to capture vehicle features at different scales. The invention therefore uses convolution kernels of different sizes to learn the density map mapping from the original video images. In the multi-column convolutional neural network of the invention, each column uses kernels of a different size to extract vehicle features at a different scale; for example, larger kernels are more useful for extracting features of larger vehicles.
The multi-column convolutional neural network provided by the invention is trained with the preprocessed data samples.
Table 1. Multi-column convolutional neural network structure for vehicle density map estimation
Column a: conv 9×9, 16 ch → max pool → conv 7×7, 32 ch → max pool → conv 7×7, 16 ch → max pool → conv 7×7, 8 ch
Column b: conv 7×7, 20 ch → max pool → conv 5×5, 40 ch → max pool → conv 5×5, 20 ch → max pool → conv 5×5, 10 ch
Column c: conv 5×5, 24 ch → max pool → conv 3×3, 48 ch → max pool → conv 3×3, 24 ch → max pool → conv 3×3, 12 ch
Fusion: concatenate column outputs → conv 1×1, 1 ch → estimated density map
The structure of the multi-column convolutional neural network for vehicle density map estimation proposed by the invention is shown in Table 1. It contains three single-column convolutional neural networks whose convolution kernels are of different sizes. For simplicity, all columns use the same net structure except for the size and number of convolution kernels. The input picture size is 224 × 224; features at different scales are extracted by the three column networks respectively, the three feature sets are finally fused, and a vehicle density map is then generated. The execution flow is as follows:
1) Column a network:
With a 224 × 224 picture as input, a convolution with kernel size 9 × 9, stride 1 and padding 1 is first applied to the 224 × 224 × 3 input, giving a 218 × 218 × 16 feature matrix. This operation achieves channel expansion, increasing the number of channels of the feature matrix. Next, the 218 × 218 × 16 feature matrix is down-sampled by a max pooling layer to 109 × 109 × 16. It is then passed through a convolution layer with kernel size 7 × 7 to obtain a 105 × 105 × 32 feature matrix, which is down-sampled to 53 × 53 × 32. Next, a convolution layer with kernel size 7 × 7 gives a 49 × 49 × 16 feature matrix, which is down-sampled to 25 × 25 × 16. Finally, a convolution layer with kernel size 7 × 7 yields a 21 × 21 × 8 feature matrix.
2) Column b network:
With a 224 × 224 picture as input, a convolution with kernel size 7 × 7, stride 1 and padding 1 is first applied to the 224 × 224 × 3 input, giving a 220 × 220 × 20 feature matrix. This operation achieves channel expansion, increasing the number of channels of the feature matrix. Next, the 220 × 220 × 20 feature matrix is down-sampled by a max pooling layer to 110 × 110 × 20. It is then passed through a convolution layer with kernel size 5 × 5 to obtain a 108 × 108 × 40 feature matrix, which is down-sampled to 54 × 54 × 40. Next, a convolution layer with kernel size 5 × 5 gives a 52 × 52 × 20 feature matrix, which is down-sampled to 26 × 26 × 20. Finally, a convolution layer with kernel size 5 × 5 yields a 24 × 24 × 10 feature matrix.
3) Column c network:
With a 224 × 224 picture as input, a convolution with kernel size 5 × 5, stride 1 and padding 1 is first applied to the 224 × 224 × 3 input, giving a 222 × 222 × 24 feature matrix. This operation achieves channel expansion, increasing the number of channels of the feature matrix. Next, the 222 × 222 × 24 feature matrix is down-sampled by a max pooling layer to 111 × 111 × 24. It is then passed through a convolution layer with kernel size 3 × 3 to obtain a 111 × 111 × 48 feature matrix, which is down-sampled to 55 × 55 × 48. Next, a convolution layer with kernel size 3 × 3 gives a 55 × 55 × 24 feature matrix, which is down-sampled to 27 × 27 × 24. Finally, a convolution layer with kernel size 3 × 3 yields a 27 × 27 × 12 feature matrix.
4) The 21 × 21 × 8 feature matrix from column a, the 24 × 24 × 10 feature matrix from column b and the 27 × 27 × 12 feature matrix from column c are merged into a 27 × 27 × 30 feature matrix. A convolution layer with kernel size 1 × 1 then reduces this to a 27 × 27 × 1 feature matrix. Finally, this feature matrix is visualized to obtain the estimated vehicle density map.
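Step 4) can be sketched in NumPy. One assumption is made explicit: the patent does not say how the 21 × 21, 24 × 24 and 27 × 27 maps are brought to a common spatial size before concatenation, so nearest-neighbour resizing to 27 × 27 is used here purely for illustration:

```python
import numpy as np

def resize_nearest(feat, size):
    """Nearest-neighbour resize of an (H, W, C) feature map (assumed
    alignment step; the patent does not specify it)."""
    h, w, _ = feat.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return feat[rows][:, cols]

def fuse_columns(fa, fb, fc, w_1x1):
    """Concatenate the three column outputs along the channel axis and
    apply a 1x1 convolution (a per-pixel linear map over channels)."""
    size = max(f.shape[0] for f in (fa, fb, fc))
    merged = np.concatenate(
        [resize_nearest(f, size) for f in (fa, fb, fc)], axis=-1)
    return merged @ w_1x1          # (27, 27, 30) @ (30, 1) -> (27, 27, 1)
```

The 1 × 1 convolution reduces the 8 + 10 + 12 = 30 fused channels to the single channel of the estimated density map.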
In order to improve the training effect and generalization capability of the model, the picture samples are augmented by dividing each original picture into 9 pictures of equal size. 90% of the training set pictures are used as training samples and the remaining 10% for validation. The learning rate is set to 0.00001 and the momentum to 0.9, using the Adam optimizer. Model weights are initialized with Kaiming weight initialization, with mean 0 and standard deviation 0.01.
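The 3 × 3 patch augmentation can be sketched as follows (assumption: any rows or columns left over when the picture size is not divisible by 3 are dropped; the patent does not say how indivisible sizes are handled):

```python
import numpy as np

def nine_patches(img):
    """Split an image into a 3x3 grid of equal-size patches, scanning
    row by row; remainder pixels at the right/bottom edges are dropped."""
    ph, pw = img.shape[0] // 3, img.shape[1] // 3
    return [img[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            for i in range(3) for j in range(3)]
```

The corresponding real density map would be split with the same grid so that each patch keeps a matching ground truth.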
1. Data preprocessing and training
Collecting road traffic monitoring videos;
dividing the collected monitoring video into a series of pictures according to frames to obtain a data set;
then, labeling the vehicles in each picture in MATLAB to obtain a .mat file containing the vehicle coordinates;
each label file contains the coordinates of every vehicle, so that label files are generated for the vehicles of the annotated data set;
The annotated ground-truth values of the training samples in the data set are displayed through code visualization. To generate the real density map from the annotations, the commonly used geometry-adaptive Gaussian kernel with normalization is applied to blur the vehicle position information in each label file, finally producing the real vehicle density map. The vehicle counting network is then trained using the real vehicle density maps.
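Ground-truth generation can be sketched with a fixed-bandwidth Gaussian in place of the geometry-adaptive kernel used by the method (an intentional simplification): each annotated point contributes one normalized 2D Gaussian, so the map integrates to the vehicle count. The coordinates below are hypothetical stand-ins for the .mat annotations.

```python
import numpy as np

def density_map(points, shape, sigma=4.0):
    """Sum one normalized 2D Gaussian per annotated vehicle position.
    Fixed sigma here; the patent uses a geometry-adaptive bandwidth."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape)
    for (x, y) in points:          # (x, y) pixel coordinates from the label file
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        dmap += g / g.sum()        # normalize so each vehicle contributes 1
    return dmap

points = [(30, 40), (80, 25), (60, 90)]    # hypothetical annotations
dmap = density_map(points, (128, 128))
print(dmap.sum())                          # equals the number of vehicles
```

Because each Gaussian is normalized over the grid, summing the map recovers the annotated vehicle count exactly, which is what makes density maps usable for counting.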
2. Vehicle quantity prediction
The pictures of the prediction set are input into the trained vehicle counting network to obtain a predicted vehicle density map, from which the total number of vehicles is calculated, as shown in fig. 4. The first column (a) shows original pictures from the surveillance video; the middle column (b) shows the real density maps, where "GT" denotes the true number of vehicles in the map; the third column (c) shows the estimated density maps generated by the network, where "Count" denotes the estimated number of vehicles in the map.
The Mean Absolute Error (MAE) is the mean of the absolute differences between the target values and the predicted values, so the smaller the difference between target and prediction, the smaller the error and the higher the accuracy. The Mean Squared Error (MSE, computed here as its square root) measures the squared distance between the predicted and true values and reflects the degree of difference between them; a larger MSE indicates a greater difference between prediction and target, so a smaller MSE indicates a more accurate prediction. Both metrics are used to evaluate the effectiveness of the present invention:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|g_i - p_i\right|,\qquad \mathrm{MSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(g_i - p_i\right)^2}$$

where N is the total number of pictures in the test set, g_i is the ground-truth value of the i-th picture, and p_i is the predicted value of the i-th picture.
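The two metrics, as defined above, can be computed directly; the per-image counts below are hypothetical.

```python
import math

def mae(g, p):
    """Mean absolute error between ground-truth and predicted counts."""
    return sum(abs(gi - pi) for gi, pi in zip(g, p)) / len(g)

def mse(g, p):
    """Root of the mean squared error, the form common in counting work."""
    return math.sqrt(sum((gi - pi) ** 2 for gi, pi in zip(g, p)) / len(g))

g = [10, 20, 30, 40]   # hypothetical ground-truth counts per test image
p = [12, 18, 33, 39]   # hypothetical predicted counts
print(mae(g, p), mse(g, p))   # 2.0 and ~2.121
```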
Although the present invention has been described in detail hereinabove, the present invention is not limited thereto, and various modifications can be made by those skilled in the art in light of the principle of the present invention. Thus, modifications made in accordance with the principles of the present invention should be understood to fall within the scope of the present invention.

Claims (10)

1. A method for vehicle density estimation based on a multi-column convolutional neural network, comprising:
extracting vehicle features from a vehicle picture through multiple columns of convolutional neural networks with different filter sizes to obtain a plurality of different vehicle feature matrices;
generating a multi-column convolutional neural network by using the plurality of different vehicle feature matrices;
generating a real density map by using a label file in a prestored training vehicle picture, and training the multi-column convolutional neural network by using the real density map to obtain a trained multi-column convolutional neural network;
estimating a predicted vehicle picture by using the trained multi-column convolutional neural network to generate an estimated density map;
and carrying out vehicle density estimation on the predicted vehicle picture according to the estimated density map.
2. The method of claim 1, further comprising:
and estimating the number of vehicles in the predicted vehicle picture according to the estimated density map.
3. The method of claim 1, wherein the generating a multi-column convolutional neural network using the plurality of different vehicle feature matrices comprises:
fusing the plurality of different vehicle feature matrices to obtain a fused vehicle feature matrix;
and generating a multi-column convolutional neural network by using the fused vehicle feature matrix.
4. The method of claim 1, wherein the label file in the pre-stored training vehicle picture is obtained by:
dividing a collected traffic video into frames to obtain a plurality of training vehicle pictures;
performing annotation processing on the position of each vehicle in each training vehicle picture to obtain a vehicle group image containing a plurality of vehicle labels, and taking the vehicle group image containing the plurality of vehicle labels as a label file of the training vehicle picture;
and saving the label file of the training vehicle picture.
5. The method of claim 4, wherein the vehicle group image containing the plurality of vehicle labels comprises:
$$D(w) = \sum_{i=1}^{M} \delta\left(w - w_i\right)$$

wherein w_i represents the pixel position of the i-th vehicle in the training vehicle picture, δ(w − w_i) represents an impulse function at the vehicle position in the training vehicle picture, M represents the total number of vehicles in each training vehicle picture, and D(w) represents a vehicle group image with M vehicle labels.
6. The method of claim 5, wherein the generating a true density map using the pre-stored label files in the training vehicle picture comprises:
and generating a real density map according to the label file of the training vehicle picture and the Gaussian function.
7. The method of claim 5, wherein the true density map comprises:
$$F(w) = D(w) * G_{\sigma}(w)$$

wherein D(w) represents the vehicle group image with the M vehicle labels, G_σ(w) represents the Gaussian function, and F(w) represents the real density map obtained by convolving D(w) with the Gaussian kernel.
8. An apparatus for vehicle density estimation based on a multi-column convolutional neural network, comprising:
the feature extraction module is used for respectively extracting the vehicle features from a vehicle picture through multiple columns of convolutional neural networks with different filter sizes to obtain a plurality of different vehicle feature matrices;
the generating module is used for generating a multi-column convolutional neural network by using the plurality of different vehicle feature matrices, and for generating a real density map by using a label file in a pre-stored training vehicle picture;
the training module is used for training the multi-column convolutional neural network by using the real density map to obtain a trained multi-column convolutional neural network;
and the estimation module is used for estimating a predicted vehicle picture by using the trained multi-column convolutional neural network, generating an estimated density map, and estimating the vehicle density of the predicted vehicle picture according to the estimated density map.
9. The apparatus of claim 8, wherein the estimation module is further configured to estimate the number of vehicles in the predicted vehicle picture according to the estimated density map.
10. The apparatus according to claim 8, wherein the generating module is specifically configured to fuse the plurality of different vehicle feature matrices to obtain a fused vehicle feature matrix, and to generate a multi-column convolutional neural network by using the fused vehicle feature matrix.
CN202110935837.5A 2021-08-16 2021-08-16 Vehicle density estimation method and device based on multi-row convolutional neural network Pending CN113392817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110935837.5A CN113392817A (en) 2021-08-16 2021-08-16 Vehicle density estimation method and device based on multi-row convolutional neural network

Publications (1)

Publication Number Publication Date
CN113392817A 2021-09-14

Family

ID=77622542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110935837.5A Pending CN113392817A (en) 2021-08-16 2021-08-16 Vehicle density estimation method and device based on multi-row convolutional neural network

Country Status (1)

Country Link
CN (1) CN113392817A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528589A (en) * 2015-12-31 2016-04-27 上海科技大学 Single image crowd counting algorithm based on multi-column convolutional neural network
CN106557814A (en) * 2016-11-15 2017-04-05 成都通甲优博科技有限责任公司 A kind of road vehicle density assessment method and device
US20190130224A1 (en) * 2017-10-27 2019-05-02 Facebook, Inc. Determination of Population Density Using Convoluted Neural Networks
CN109886126A (en) * 2019-01-23 2019-06-14 长安大学 A kind of region traffic density estimation method based on dynamic sampling mechanism and RBF neural
CN110020606A (en) * 2019-03-13 2019-07-16 北京工业大学 A kind of crowd density estimation method based on multiple dimensioned convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YINGYING ZHANG et al.: "Single-Image Crowd Counting via Multi-Column Convolutional Neural Network", 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
LUO Zhiming: "Research on Traffic Density Estimation and Vehicle Detection Methods Based on Convolutional Neural Networks", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550102A (en) * 2022-03-01 2022-05-27 上海中通吉网络技术有限公司 Cargo accumulation detection method, device, equipment and system
CN117608499A (en) * 2024-01-23 2024-02-27 山东华夏高科信息股份有限公司 Intelligent traffic data optimal storage method based on Internet of things
CN117608499B (en) * 2024-01-23 2024-04-05 山东华夏高科信息股份有限公司 Intelligent traffic data optimal storage method based on Internet of things

Similar Documents

Publication Publication Date Title
CN111368687B (en) Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation
CN109147331B (en) Road congestion state detection method based on computer vision
CN112700470B (en) Target detection and track extraction method based on traffic video stream
CN111429484B (en) Multi-target vehicle track real-time construction method based on traffic monitoring video
CN111652097B (en) Image millimeter wave radar fusion target detection method
CN110619279B (en) Road traffic sign instance segmentation method based on tracking
Song et al. Vehicle behavior analysis using target motion trajectories
CN109064495A (en) A kind of bridge floor vehicle space time information acquisition methods based on Faster R-CNN and video technique
Fernández-Caballero et al. Road-traffic monitoring by knowledge-driven static and dynamic image analysis
CN111833598B (en) Automatic traffic incident monitoring method and system for unmanned aerial vehicle on highway
WO2013186662A1 (en) Multi-cue object detection and analysis
CN113392817A (en) Vehicle density estimation method and device based on multi-row convolutional neural network
CN116434159A (en) Traffic flow statistics method based on improved YOLO V7 and Deep-Sort
CN112818935B (en) Multi-lane congestion detection and duration prediction method and system based on deep learning
Tak et al. Development of AI‐Based Vehicle Detection and Tracking System for C‐ITS Application
CN113903008A (en) Ramp exit vehicle violation identification method based on deep learning and trajectory tracking
Lee Neural network approach to identify model of vehicles
Ashraf et al. HVD-net: a hybrid vehicle detection network for vision-based vehicle tracking and speed estimation
Ranjan et al. City scale monitoring of on-street parking violations with streethawk
CN115497303A (en) Expressway vehicle speed detection method and system under complex detection condition
Nalavde et al. Driver assistant services using ubiquitous smartphone
Alsheikhy et al. An Intelligent Adaptive Dynamic Algorithm for a Smart Traffic System.
Pan et al. Identifying Vehicles Dynamically on Freeway CCTV Images through the YOLO Deep Learning Model.
Gregor et al. Design and implementation of a counting and differentiation system for vehicles through video processing
Lei et al. Application of Intelligent Traffic Scene Recognition Based on Computer Vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210914