CN113076898A - Traffic vehicle target detection method, device, equipment and readable storage medium - Google Patents


Info

Publication number
CN113076898A
CN113076898A (application CN202110387355.0A)
Authority
CN
China
Prior art keywords
traffic
detection
clustering
layer
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110387355.0A
Other languages
Chinese (zh)
Other versions
CN113076898B (en
Inventor
陈婷
姚大春
高涛
王松涛
刘占文
李永会
陈友静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changan University
Original Assignee
Changan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changan University filed Critical Changan University
Priority to CN202110387355.0A priority Critical patent/CN113076898B/en
Publication of CN113076898A publication Critical patent/CN113076898A/en
Application granted granted Critical
Publication of CN113076898B publication Critical patent/CN113076898B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/23211 — Non-hierarchical clustering using statistics or function optimisation, with adaptive number of clusters
    • G06N 3/045 — Combinations of neural networks
    • G06V 10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 2201/08 — Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a traffic vehicle target detection method, device, equipment and readable storage medium. Vehicles are labeled in preprocessed traffic images; based on the feature-extraction backbone network, a network layer is added to the shallow part to extract shallow features, improving the processing of detail features. While keeping high detection precision, CSPDarknet26 is adopted for multi-layer residual-network feature extraction, which avoids redundant convolutional layers, improves memory utilization, and speeds up training. Multi-scale detection is then performed on the features extracted by the multi-layer residual network, so rich detail information is captured, the composite residual block (CRB) avoids degradation of small-target detection, and the applicability and accuracy of single-class vehicle detection in heavy-rain-density traffic environments are improved.

Description

Traffic vehicle target detection method, device, equipment and readable storage medium
Technical Field
The invention belongs to the technical field of traffic vehicle detection, and particularly relates to a traffic vehicle target detection method, a traffic vehicle target detection device, traffic vehicle target detection equipment and a readable storage medium.
Background
In recent years the automobile industry has developed rapidly, and the automobile has become an indispensable means of transportation. As the number of motor vehicles grows worldwide, traffic congestion and traffic accidents become ever more serious, causing environmental pollution for countries and losses for families. Vehicle detection technology based on computer vision lets the relevant traffic departments grasp traffic flow in real time, so that traffic-guidance policies can be formulated to alleviate these problems.
Vehicle detection algorithms are mainly divided into traditional detection algorithms and deep-learning-based detection algorithms. A traditional vehicle detection algorithm first selects a region of interest and extracts candidate regions through a sliding window, then manually extracts features from the candidate regions, and finally classifies them with a classifier. For example, some scholars adopt the classical Haar features and detect through a sliding-window search strategy, effectively reducing the false-alarm rate; to obtain fine-grained features, others extract features with the Histogram of Oriented Gradients (HOG). Some researchers have further combined HOG with a Support Vector Machine (SVM) to propose the Deformable Part Model (DPM). However, these image features are essentially hand-crafted, so such target detection suffers from low efficiency, low precision, poor generalization and complex pipelines.
With the rapid development of deep learning in the field of target detection, researchers began replacing the tedious, low-precision traditional techniques with deep learning. According to the relative importance of real-time performance and accuracy, deep-learning target detection algorithms fall into two categories. One is the two-stage target detector represented by R-CNN (Region-based Convolutional Neural Networks), also called the region-based detection method, mainly including R-CNN, Fast R-CNN, R-FCN (Region-based Fully Convolutional Network) and the like. Although the two-stage method has high detection precision, it is too time-consuming to achieve real-time detection. To balance detection speed and precision, the single-stage (one-stage) detection model was proposed. The single-stage detection method, also called the regression-based method, obtains prediction results directly from the image without a region-proposal stage, realizing end-to-end target detection; it mainly includes YOLO (You Only Look Once), SSD (Single Shot Detector), YOLOv2, YOLOv3, YOLOv4 and other methods.
The YOLOv4 target detection algorithm best balances precision and speed. However, for single-class target detection the original network has redundant convolutional layers, so memory utilization is low, and when detecting a small target whose color is similar to the background, the detection effect is poor.
Disclosure of Invention
The invention aims to provide a method, a device and equipment for detecting a traffic vehicle target and a readable storage medium, so as to overcome the defects of the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for detecting a traffic vehicle target comprises the following steps:
s1, preprocessing the traffic image to obtain traffic images of different scenes;
s2, carrying out vehicle labeling on the preprocessed traffic images, carrying out dimension clustering on the labeled traffic images, setting a threshold value for clustering Anchor Box, and selecting a relatively far point as a next initial clustering center;
s3, adding a network layer to a shallow part of the main network for shallow feature extraction based on CSPDarknet26 feature extraction, and extracting features of a deep part of the main network by adopting a multilayer residual error network;
and S4, performing multi-scale detection after performing feature extraction through a multi-layer residual error network, and realizing detection of different traffic vehicle targets in the traffic image.
Further, the traffic data set pictures come from web crawlers and field shooting, and the traffic vehicle data set is divided into a training set, a test set and a validation set; image inversion and symmetry processing are applied in turn to the traffic vehicle data set to expand it.
Further, the IOU is used for target cluster analysis, with 1 − IOU serving as the spatial distance calculation; the clustering distance is:

D(x_j, c_i) = 1 − IOU(x_j, c_i)  (1)

where x_j ∈ X = {x_1, x_2, …, x_n} denotes the ground-truth samples, c_i ∈ {c_1, c_2, …, c_k} denotes the cluster centers, and k is the number of anchor boxes. The clustering objective minimizes the sum of the distances from each sample to its cluster center:

J = min Σ_{i=1}^{k} Σ_{j=1}^{n} [1 − IOU(x_j, c_i)]  (2)
further, the clustering objective is analyzed with the silhouette coefficient method to select the optimal cluster number K; K = 9, and the heights and widths are (13, 11), (23, 17), (31, 28), (41, 20), (51, 33), (63, 51), (102, 61), (166, 116) and (388, 244), respectively.
Furthermore, the first convolutional layer filters the 416 × 416 input image with 32 convolution kernels of size 3 × 3; the output of each convolutional layer is then taken as the input of the next, and a convolution with 64 kernels of size 3 × 3 pixels and stride 2 pixels realizes 2× downsampling, giving a 208 × 208 feature map. Five groups of 2 × Resblock_body are then executed in the network, and after four further downsampling steps, feature maps of sizes 104 × 104, 52 × 52, 26 × 26 and 13 × 13 are obtained.
Furthermore, the size of the input image is adjusted to 416 × 416, and the input image is subjected to 5 times of downsampling to obtain feature maps of 52 × 52, 26 × 26 and 13 × 13 in three different scales for multi-scale detection.
Further, all cars are labeled "Car" using the labelImg annotation software.
A traffic vehicle target detection device, comprising:
a preprocessing module, used to preprocess traffic images to obtain traffic images of different scenes and to label vehicles in the preprocessed traffic images;
a clustering module, used to perform dimension clustering of anchor boxes on the vehicle-labeled traffic images with a set threshold, selecting a relatively distant point as the next initial clustering center;
a feature extraction and detection module, used to add a network layer to the shallow part for shallow feature extraction, extract features of the deep network with a multi-layer residual network, and perform multi-scale detection after the multi-layer residual-network feature extraction, thereby realizing detection of different traffic vehicle targets in the traffic image.
A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of a method of detecting a traffic vehicle object as described above when executing the computer program.
A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the steps of the traffic vehicle target detection method described above.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention relates to a traffic vehicle target detection method, which comprises the steps of preprocessing traffic vehicle images to obtain vehicle images of different scenes; then, carrying out vehicle labeling on the preprocessed traffic vehicle image, carrying out dimension clustering on the traffic vehicle image subjected to vehicle labeling, setting a threshold value to carry out clustering Anchor Box, and selecting a relatively far point as a next initial clustering center to improve the clustering precision; based on CSPDarknet26 feature extraction, a trunk network is added with a network layer at a shallow part for shallow feature extraction, so that the processing capacity of detail features is improved, on the premise of keeping higher detection precision, CSPDarknet26 is adopted for carrying out multi-layer residual error network feature extraction, the redundancy of a convolutional layer can be avoided, the utilization rate of a memory is improved, the training speed is improved, then multi-scale detection is carried out after the multi-layer residual error network is used for carrying out feature extraction, rich detail information is extracted, the problem of small target detection effect degradation is avoided, and the applicability and the accuracy of vehicle single-class target detection of an algorithm in a strong rain density traffic environment are improved.
Furthermore, using 1 − IOU as the spatial distance meets the target-detection requirement when vehicles are sparsely distributed.
Further, after processing by the composite residual block, features are sent to the large-scale detection convolutional layer to detect large targets; multi-scale detection balances well the detection of vehicles at different scales.
The traffic vehicle target detection device can realize vehicle detection quickly and provide better service for the traffic industry.
Drawings
Fig. 1 is a schematic diagram of a network detection process in an embodiment of the present invention.
Fig. 2 is a graph showing a variation in loss value in the embodiment of the present invention.
FIG. 3 is a comparison graph of predictions using a prior art method and a method of the present invention in an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
a method for detecting a traffic vehicle target comprises the following steps:
s1, preprocessing the traffic image to obtain traffic images of different scenes, and expanding a traffic image data set;
the traffic vehicle data set picture is from web crawlers and field shooting, and the traffic vehicle data set is divided into a training set, a testing set and a verification set;
specifically, the traffic vehicle data set is subjected to image inversion, symmetry, data enhancement, cutting and angle transformation in sequence to obtain traffic vehicle images in different scenes, so that the data set is expanded, and the robustness of subsequent network training is improved.
S2, labeling vehicles in the preprocessed traffic images, performing dimension clustering of anchor boxes on the labeled traffic images with a set threshold, and selecting a relatively distant point, i.e. a point far from the previous cluster centers and not in the previous dimension clusters, as the next initial clustering center;
the invention adopts IOU as target clustering analysis, and uses IOU as space Distance (Spatial Distance Calculation) Calculation, thereby reducing the error generated by initial anchor frames with different sizes, and the clustering formula is as follows:
Di(xj)=1-IOU(xj,ci) (1)
xj∈X={x1,x2,...,cndenotes the group Truth sample; c. Ci∈{c1,c2,...,cnRepresents the cluster center; k represents the number of anchor point frames, the clustering objective function represents the minimum value of the sum of the distances from each sample to the clustering center, and the calculation formula is as follows:
Figure BDA0003014533500000061
the Method is adopted to carry out dimension clustering on the Anchor Box, a Contour Coefficient Method (Contour Coefficient Method) is used for analyzing a clustering target to select the optimal clustering number K, and when the K is less than a true value of 3, J (K) is greatly reduced; when K reaches a true value of 3, J (K) is stopped quickly, and the clustering effect is reduced; as K increases, J (K) becomes more and more stable. Thus, the initial optimal anchor box cluster number for the vehicle data set is 9, and the height and width are (13, 11), (23, 17), (31, 28), (41, 20), (51, 33), (63, 51), (102, 61), (166, 116), (388, 244).
Specifically, all cars are labeled "Car" using the labelImg annotation software.
S3, adding a network layer to the shallow part based on a feature extraction backbone network (CSPDarknet26) to perform shallow feature extraction on the traffic images after dimension clustering, and performing feature extraction on the deep part by adopting a multilayer residual error network;
specifically, as shown in table 1; compressing the feature extraction depth, and adding a network layer in the shallow layer part, wherein the shallow layer convolution feature has small receptive field and small background noise and is suitable for extracting features with small resolution; secondly, on the premise of keeping higher detection precision, the deep network is properly cut down; the idea of a multi-layer residual network; specifically, the first convolutional layer filters an input image with 416 × 416 resolution by using 32 convolutional kernels with the size of 3 × 3, then performs convolution operation by using 64 convolutional kernels with the size of 3 × 3 pixels and the step size of 2 pixels by using the output of the previous convolutional layer as input, so as to realize down-sampling by 2 times and obtain a feature map with the resolution of 208 × 208; then, 5 sets of 2 × Resblock _ bodies are added and executed in the network, and after 4 times of downsampling, feature maps with the sizes of 104 × 104, 52 × 52, 26 × 26 and 13 × 13 are obtained respectively.
TABLE 1 CSPDarknet26 network architecture
[Table 1 content provided as an image in the original publication.]
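The downsampling chain described above (416 → 208 → … → 13) can be verified with the standard convolution output-size formula; the sketch below assumes 3 × 3 stride-2 convolutions with padding 1, which reproduces the feature-map sizes reported for the backbone.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard convolution output size: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

size = 416                      # input resolution
trace = [size]
for _ in range(5):              # five stride-2 3x3 downsampling convolutions
    size = conv_out(size, kernel=3, stride=2, padding=1)
    trace.append(size)
print(trace)  # [416, 208, 104, 52, 26, 13]
```

The last three entries are exactly the three detection scales used below.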
And S4, performing multi-scale detection after performing feature extraction through a multi-layer residual error network, and realizing detection of the vehicle target in different traffic scenes in the traffic image.
Specifically, the input image is resized to 416 × 416 and downsampled 5 times, giving feature maps at three scales, 52 × 52, 26 × 26 and 13 × 13, for multi-scale detection. As shown in fig. 1, after the 52 × 52 feature map of the 92-layer network is processed, smaller prior boxes are allocated to detect small targets. Meanwhile, the 52 × 52 feature map at layer 97 is downsampled and fused with the feature map at layer 87, processed by the Composite Residual Block (CRB), and sent to the medium-scale detection convolutional layer, where medium-sized anchor boxes are allocated to detect medium targets. Finally, the 26 × 26 feature map at layer 108 is downsampled by a 3 × 3 convolution, fused by a Concat operation with the 13 × 13 feature map at layer 77, processed by the CRB, and sent to the large-scale detection convolutional layer to detect large targets. Multi-scale detection balances well the accuracy and speed of vehicle detection at different scales.
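The split of the nine clustered anchors across the three detection scales can be illustrated as follows. `assign_scale` is a hypothetical helper showing, under the shared-corner IOU convention, which feature-map scale would handle a given ground-truth box size.

```python
# The nine clustered (height, width) anchors from the dimension-clustering step.
ANCHORS = [(13, 11), (23, 17), (31, 28), (41, 20), (51, 33),
           (63, 51), (102, 61), (166, 116), (388, 244)]
# Smallest anchors go to the 52x52 map, largest to the 13x13 map.
SCALE_OF_ANCHOR = [52] * 3 + [26] * 3 + [13] * 3

def shared_corner_iou(wa, ha, wb, hb):
    """IOU of two boxes assumed to share a top-left corner."""
    inter = min(wa, wb) * min(ha, hb)
    return inter / (wa * ha + wb * hb - inter)

def assign_scale(w, h):
    """Feature-map size (52 / 26 / 13) whose anchor best matches a box."""
    best = max(range(len(ANCHORS)),
               key=lambda i: shared_corner_iou(w, h, *ANCHORS[i]))
    return SCALE_OF_ANCHOR[best]
```

A small vehicle of roughly 15 × 12 pixels would thus be handled by the 52 × 52 branch, while a near-field vehicle of 400 × 250 pixels falls to the 13 × 13 branch.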
The method is implemented with PyTorch in a Linux environment; the operating system is Ubuntu 16.04, the CPU is an Intel Xeon E3-1225 v6, and the GPU is an Nvidia Quadro P4000 with 8 GB of video memory. The batch size is set to 64 and, considering the machine's configuration, the subdivision count is set to 32; the maximum number of iterations is 50,200, the momentum parameter is 0.949, the initial learning rate is 0.001 and the decay coefficient is 0.0005. The step mode is selected to update the learning rate: when the iteration count reaches 40,160 and 45,180, the learning rate is each time reduced to 10% of its current value.
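Assuming the step policy decays multiplicatively (each milestone scales the current learning rate by 0.1, as in the darknet `steps`/`scales` convention; the original wording is ambiguous on this point), the schedule can be sketched as:

```python
def learning_rate(step: int, base_lr: float = 0.001,
                  milestones=(40160, 45180), gamma: float = 0.1) -> float:
    """Step-decay schedule from the reported settings: the learning rate
    is scaled by `gamma` each time a milestone iteration is passed."""
    lr = base_lr
    for m in milestones:
        if step >= m:
            lr *= gamma
    return lr
```

So training runs at 0.001 until iteration 40,160, at 0.0001 until 45,180, and at 0.00001 until the final iteration 50,200.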
In one embodiment of the present invention, a terminal device is provided that includes a processor and a memory, the memory storing a computer program comprising program instructions that the processor executes. The processor may be a Central Processing Unit (CPU) or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing and control core of the terminal, adapted to load and execute one or more instructions to implement the corresponding method flow or function. The processor of this embodiment can be used to run the traffic vehicle target detection method.
Example (b): a traffic vehicle target detection device that can be used to realize the traffic vehicle target detection method of the above embodiment, specifically comprising a preprocessing module, a clustering module and a feature extraction and detection module;
the preprocessing module is used to preprocess the traffic images to obtain clear traffic images and to label vehicles in the preprocessed traffic images;
the clustering module is used to perform dimension clustering on the vehicle-labeled traffic images with a set threshold, selecting a relatively distant point as the next initial clustering center;
the feature extraction and detection module is used to add a network layer to the shallow part for shallow feature extraction, extract features of the deep network with a multi-layer residual network, and perform multi-scale detection after the multi-layer residual-network feature extraction, thereby realizing detection of different traffic vehicle targets in the traffic image.
In still another embodiment of the present invention, the present invention further provides a storage medium, specifically a computer-readable storage medium (Memory), which is a Memory device in the terminal device and is used for storing programs and data. The computer-readable storage medium includes a built-in storage medium in the terminal device, provides a storage space, stores an operating system of the terminal, and may also include an extended storage medium supported by the terminal device. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the memory space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM memory, or may be a Non-volatile memory (Non-volatile memory), such as at least one disk memory. One or more instructions stored in the computer-readable storage medium may be loaded and executed by a processor to implement the corresponding steps of the method for detecting a target of a transportation vehicle in the above-described embodiments.
Since the research content involves multi-target detection, in addition to precision (Precision), recall (Recall), the F1 value (F1-measure) and detection speed as evaluation criteria, the Average Precision (AP) is also used for overall comparison:
Precision = T_P / (T_P + F_P)  (3)

Recall = T_P / (T_P + F_N)  (4)

F1 = 2 × Precision × Recall / (Precision + Recall)  (5)

AP = (1/11) Σ_{r ∈ {0, 0.1, …, 1}} max_{r̃ ≥ r} Precision(r̃)  (6)
wherein: t isPIndicating that the Positive class is correctly predicted as the Positive class (True Positive), FPIndicating that the negative class error is predicted as a Positive example (False Positive), FNIndicating that the positive class was mispredicted as a Negative class (False Negative). AP is the average accuracy, using the standards in VOC2007, setting a set of thresholds, [0, 0.1, 0.2]Then, for each threshold value for which Recall is greater than, the corresponding maximum Precision is obtained, and the AP is the average value of these maximum precisions.
The method is trained with the optimal parameters set above and compared with the processing effect of the existing YOLOv4 algorithm. The training-loss curve of fig. 2 shows that the method of the invention significantly improves training speed: within the first 500 steps the training loss falls almost linearly from about 1900 to about 10, and after 50,200 steps the loss drops to 1.69. The method therefore avoids redundant convolutional layers and wasted memory while achieving a good detection effect, and clearly accelerates training. In addition, the method was compared on the test set with YOLOv3, YOLOv4 and YOLOv4-C (YOLOv4 with CSPDarknet26). As is apparent from fig. 3, YOLOv3 suffers serious missed and false detections; YOLOv4 has low detection precision and also shows false and missed detections; YOLOv4-C detects more targets than YOLOv3 but still misses detections and has lower accuracy and precision. Compared with these algorithms, the method of the invention clearly improves accuracy and precision, detects vehicle targets with high precision, and avoids degradation of small-target detection.
TABLE 2 comparison of test results for different algorithms
[Table 2 content provided as an image in the original publication.]
Table 2 quantitatively compares the method of the invention with YOLOv3, YOLOv4 and YOLOv4-C. Images were scaled to 416 × 416 before testing. The method is verified against the other algorithms on the vehicle data set, with five measured quantities selected for comparison: the precision P, the recall R, the F1 value, the AP value and the speed. Compared with YOLOv3 and YOLOv4, the speed increases to 13.46 f/s and 15.63 f/s respectively, so the improved algorithm clearly improves training speed and memory utilization; compared with YOLOv4-C, the precision improves by about 10%, solving the degradation of small-target detection. Meanwhile, as Table 2 shows, the AP of the method reaches 93.61%, the precision 0.96, the recall 0.91 and the average accuracy 0.857. All indexes of the method are superior to those of the other algorithms, fully demonstrating its high detection precision and detection speed.

Claims (10)

1. A method for detecting a traffic vehicle target is characterized by comprising the following steps:
s1, preprocessing the traffic image to obtain traffic images of different scenes;
s2, marking vehicles on the preprocessed traffic images in different scenes, performing dimension clustering on the marked traffic images, setting a threshold value to perform clustering Anchor Box, and selecting a relatively far point as a next initial clustering center;
s3, based on the feature extraction backbone network, adding a network layer to the shallow part to perform shallow feature extraction on the traffic images after dimension clustering, and performing feature extraction on the deep part by adopting a multilayer residual error network;
and S4, performing multi-scale detection after performing feature extraction through a multi-layer residual error network, and realizing detection of vehicle targets of different traffic scenes in the traffic image.
2. The method of claim 1, wherein the traffic images are obtained by web crawlers and field shooting, and the traffic image set is divided into a training set, a test set and a validation set; image inversion and symmetry processing are applied in turn to the traffic vehicle data set to expand it.
3. The method of claim 1, wherein the IOU is used for target cluster analysis, with 1 − IOU as the spatial distance calculation; the clustering distance is:

D(x_j, c_i) = 1 − IOU(x_j, c_i)  (1)

where x_j ∈ X = {x_1, x_2, …, x_n} denotes the Ground Truth samples, c_i ∈ {c_1, c_2, …, c_k} denotes the cluster centers, and k is the number of anchor boxes; the clustering objective minimizes the sum of the distances from each sample to its cluster center:

J = min Σ_{i=1}^{k} Σ_{j=1}^{n} [1 − IOU(x_j, c_i)]  (2)
4. The method of claim 3, wherein the clustering objects are analyzed by the silhouette coefficient method to select the optimal number of clusters K; K is 9, and the heights and widths are (13, 11), (23, 17), (31, 28), (41, 20), (51, 33), (63, 51), (102, 61), (166, 116) and (388, 244), respectively.
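The clustering of claims 3 and 4 can be sketched in NumPy: k-means over box widths and heights under the distance D = 1 − IOU of formula (1), seeded by picking a relatively distant (low-IOU) box as each next initial center, as in step S2. This is a minimal sketch under those assumptions, not the patent's actual implementation; the function names are illustrative.

```python
import numpy as np

def iou_wh(boxes, centers):
    """IOU between (n, 2) boxes and (m, 2) centers using widths/heights only,
    treating all boxes as co-centred, as is usual for anchor clustering."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0])
             * np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None]
             + (centers[:, 0] * centers[:, 1])[None, :] - inter)
    return inter / union

def seed_centers(boxes, k, rng):
    # Farthest-point seeding: the next initial center is the box
    # most distant (in 1 - IOU) from the centers chosen so far.
    centers = [boxes[rng.integers(len(boxes))]]
    while len(centers) < k:
        d = (1 - iou_wh(boxes, np.array(centers))).min(axis=1)
        centers.append(boxes[np.argmax(d)])
    return np.array(centers)

def kmeans_iou(boxes, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = seed_centers(boxes, k, rng)
    for _ in range(iters):
        # Assign each sample to its nearest center under D = 1 - IOU (formula (1)).
        assign = np.argmin(1 - iou_wh(boxes, centers), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers
```

Run on the labeled boxes' (width, height) pairs, the k = 9 returned centers play the role of the anchor sizes listed in claim 4.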
5. The method of claim 1, wherein the first convolutional layer filters the input image of 416 × 416 resolution with 32 convolution kernels of size 3 × 3; the output of each convolutional layer is then taken as the input of the next, and a convolution with 64 kernels of size 3 × 3 pixels and a stride of 2 pixels performs 2× downsampling, yielding a feature map of 208 × 208 resolution; 5 groups of 2 × Resblock_body are then executed in the network, and after 4 further downsampling steps, feature maps of sizes 104 × 104, 52 × 52, 26 × 26 and 13 × 13 are obtained respectively.
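The resolution arithmetic in claim 5 can be checked directly: every stride-2 convolution halves the spatial size, so the first downsampling takes 416 to 208 and the four further steps produce 104, 52, 26 and 13. A small illustrative helper (the function name is hypothetical):

```python
def feature_map_sizes(input_size=416, downsamples=5, stride=2):
    """Spatial sizes after successive stride-2 downsampling convolutions."""
    sizes = [input_size]
    for _ in range(downsamples):
        sizes.append(sizes[-1] // stride)
    return sizes

print(feature_map_sizes())  # [416, 208, 104, 52, 26, 13]
```

The last three sizes, 52 × 52, 26 × 26 and 13 × 13, are exactly the scales used for multi-scale detection in claim 6.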
6. The method of claim 1, wherein the input image is resized to 416 × 416 and downsampled 5 times to obtain feature maps of different scales, namely 52 × 52, 26 × 26 and 13 × 13, for multi-scale detection.
7. The method of claim 1, wherein all cars are labeled Car using Labeling software.
8. A traffic vehicle object detecting device, comprising:
the preprocessing module is used for preprocessing the traffic images to obtain traffic images of different scenes and marking vehicles on the preprocessed traffic images;
the clustering module is used for carrying out dimension clustering on the traffic images marked by the vehicles, setting a threshold value for clustering Anchor Box, and selecting a relatively far point as a next initial clustering center;
and the feature extraction and detection module is used for adding a network layer to the shallow part for shallow feature extraction, adopting a multilayer residual network in the deep part for feature extraction, and performing multi-scale detection after feature extraction through the multilayer residual network, thereby detecting different traffic vehicle targets in the traffic image.
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202110387355.0A 2021-04-09 2021-04-09 Traffic vehicle target detection method, device, equipment and readable storage medium Active CN113076898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110387355.0A CN113076898B (en) 2021-04-09 2021-04-09 Traffic vehicle target detection method, device, equipment and readable storage medium


Publications (2)

Publication Number Publication Date
CN113076898A true CN113076898A (en) 2021-07-06
CN113076898B CN113076898B (en) 2023-09-15

Family

ID=76617243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110387355.0A Active CN113076898B (en) 2021-04-09 2021-04-09 Traffic vehicle target detection method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113076898B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766098A (en) * 2019-11-07 2020-02-07 中国石油大学(华东) Traffic scene small target detection method based on improved YOLOv3
WO2020140371A1 (en) * 2019-01-04 2020-07-09 平安科技(深圳)有限公司 Deep learning-based vehicle damage identification method and related device
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN111695514A (en) * 2020-06-12 2020-09-22 长安大学 Vehicle detection method in foggy days based on deep learning


Non-Patent Citations (2)

Title
胡臣辰; 陈贤富: "Vehicle detection method based on a YOLO-improved residual network structure", Information Technology and Network Security, no. 09
许小伟; 陈乾坤; 钱枫; 李浩东; 唐志鹏: "Real-time vehicle detection and tracking algorithm based on miniaturized YOLOv3", Journal of Highway and Transportation Research and Development, no. 08


Similar Documents

Publication Publication Date Title
CN109816024B (en) Real-time vehicle logo detection method based on multi-scale feature fusion and DCNN
CN111461083A (en) Rapid vehicle detection method based on deep learning
Cepni et al. Vehicle detection using different deep learning algorithms from image sequence
CN111695514A (en) Vehicle detection method in foggy days based on deep learning
CN111079604A (en) Method for quickly detecting tiny target facing large-scale remote sensing image
CN108960074B (en) Small-size pedestrian target detection method based on deep learning
CN104537359A (en) Vehicle object detection method and device
CN117496384B (en) Unmanned aerial vehicle image object detection method
CN111915583A (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN114049572A (en) Detection method for identifying small target
CN116824543A (en) Automatic driving target detection method based on OD-YOLO
Gao et al. Traffic signal image detection technology based on YOLO
Li et al. Method research on ship detection in remote sensing image based on Yolo algorithm
CN115937736A (en) Small target detection method based on attention and context awareness
Uzar et al. Performance analysis of YOLO versions for automatic vehicle detection from UAV images
CN117593623A (en) Lightweight vehicle detection method based on improved YOLOv8n model
CN114943903B (en) Self-adaptive clustering target detection method for aerial image of unmanned aerial vehicle
CN113076898B (en) Traffic vehicle target detection method, device, equipment and readable storage medium
Wang et al. YOLO-ERF: lightweight object detector for UAV aerial images
CN113947723B (en) High-resolution remote sensing scene target detection method based on size balance FCOS
CN115761667A (en) Unmanned vehicle carried camera target detection method based on improved FCOS algorithm
Zhou et al. Research on Vehicle Tracking Algorithm Based on Deep Learning
Shao et al. Research on yolov5 vehicle object detection algorithm based on attention mechanism
Liao Road Damage Intelligent Detection with Deep Learning Techniques
Zhang et al. Research on traffic target detection method based on improved yolov3

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant