CN113762144A

CN113762144A - Deep learning-based black smoke vehicle detection method

Info

Publication number: CN113762144A
Application number: CN202111035079.8A
Authority: CN
Inventors: 路小波; 袁立
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2021-09-05
Filing date: 2021-09-05
Publication date: 2021-12-07
Anticipated expiration: 2041-09-05
Also published as: CN113762144B

Abstract

The invention discloses a deep learning-based black smoke vehicle detection method, in which model construction comprises the following steps: labeling vehicle targets in images acquired by actual road cameras to construct a vehicle target detection data set; training a YOLOv3 model on the labeled vehicle target data set, detecting vehicle targets in the images with the model, and obtaining images of the tail area of each vehicle target to construct a black smoke classification data set; and training a black smoke classification model using an improved Vision Transformer model that combines distillation and training optimization, to judge whether a vehicle emits black smoke. The method uses computer vision to detect black smoke vehicles in video data automatically and issue warnings without human monitoring, which can effectively improve the accuracy and speed of black smoke vehicle detection and avoid interference from factors such as shadows in the image.

Description

Deep learning-based black smoke vehicle detection method

Technical Field

The invention belongs to the field of computer vision and traffic video detection, and particularly relates to a black smoke vehicle detection method based on deep learning.

Background

The continuous and rapid growth in the total number of fuel-powered vehicles worldwide has brought serious traffic and environmental problems to major cities in many countries, and exhaust emissions from fuel-powered vehicles are one of the main sources of air pollutants. Vehicle exhaust, whose main components are suspended solid particles and toxic and harmful gases, can cause irreversible harm to human health through contact or inhalation, and can also give rise to phenomena such as haze and acid rain that pollute the atmosphere and the soil. Detecting black smoke vehicles on the road and raising alarms is therefore of great significance to human health and the ecological environment. Traditional approaches to detecting black smoke vehicles from outside the vehicle mainly include annual vehicle inspection, on-road observation by traffic police, and manual analysis of road surveillance video. These approaches are not only slow to respond but also consume a large amount of manpower, and black smoke vehicles are missed from time to time. However, as the number of road cameras increases, low-cost, fast, and efficient black smoke vehicle recognition can be achieved with the aid of rapidly developing image processing techniques.

The core of computer vision-based black smoke vehicle detection lies in classifying and identifying black smoke in an image. Current black smoke classification models mainly fall into traditional machine learning methods and deep learning methods. Traditional machine learning methods are mainly divided into discriminative models and generative models. Traditional methods usually combine only a few features to identify smoke, and combining more features makes the computation complex; as a result, recognition performance is usually poor, the algorithms adapt weakly to new conditions, and the application scenarios are very limited.

Deep learning-based black smoke classification methods can extract features automatically and effectively, and thereby address the problems of traditional black smoke classification methods. Recently, the Transformer model has attracted attention in the field of computer vision and has performed remarkably well on a range of specific tasks. The Vision Transformer, a model based entirely on the attention mechanism and free of convolutional neural networks, is representative of this trend in image classification.

Disclosure of Invention

To solve the above problems, the invention discloses a deep learning-based black smoke vehicle detection method. Using computer vision, the method automatically and effectively detects black smoke vehicles, and issues warnings, from video data acquired by cameras in an actual road environment without human monitoring. It can effectively improve the accuracy and speed of black smoke vehicle detection and, to a great extent, avoids interference from factors such as shadows in the image.

To achieve this purpose, the technical solution of the invention is as follows:

A deep learning-based black smoke vehicle detection method comprises the following steps:

step one, labeling vehicle targets in images acquired by actual road cameras to construct a vehicle target detection data set;

step two, training a YOLOv3 model on the labeled vehicle target data set, detecting vehicle targets in the images with the model, and obtaining images of the tail area of each vehicle target to construct a black smoke classification data set;

step three, training a black smoke classification model using an improved Vision Transformer model that combines distillation and training optimization, to judge whether a vehicle emits black smoke;

step four, performing real-time black smoke vehicle detection on video data transmitted from actual road cameras, and issuing a warning if a black smoke vehicle is found.

Further, in step one, the vehicle target detection data set is formed by labeling images obtained by capturing frames from videos shot by actual road cameras, and the vehicle targets comprise 5 types in total: cars, hatchbacks, vans, trucks and buses.

Further, in step two, after training, the YOLOv3 model can identify vehicle targets in the video. An algorithm then takes a square whose side length equals the width of the vehicle target frame, positioned at 70% of the height of the vehicle target frame, and rescales the square while keeping its center point unchanged, yielding the black smoke classification image. The vehicle tail images are manually divided into two classes, smoke and smokeless, to construct the black smoke classification data set.

Further, in step three, distillation is realized by adding a distillation token at the input end, in a manner similar to adding the classification token before the encoder. A trained ResNet-50 network is introduced as the teacher model, so that the output of the distillation token is as close as possible to the teacher model's output for the input picture. The original classification token and the distillation token are both learned through back propagation during training, so that the student model being trained can better optimize and complement its training through interaction with the teacher model. In the training process, the loss function is replaced by the binary cross entropy loss function shown below:

$$L = -\frac{1}{N}\sum_{n=1}^{N}\left[\,y_n\log\sigma(x_n) + (1-y_n)\log\bigl(1-\sigma(x_n)\bigr)\right]$$

where N is the batch size, σ is the sigmoid function, x_n is the output of the model, and y_n is the label value.
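As a minimal numerical sketch of the binary cross entropy described here (not the patent's implementation), the following Python code averages -[y_n·log σ(x_n) + (1 - y_n)·log(1 - σ(x_n))] over a batch of raw model outputs x_n and labels y_n. The function names, the equal weighting `alpha=0.5`, and the use of hard (thresholded) teacher decisions for the distillation term are illustrative assumptions; the text only states that the distillation output should be as close as possible to the teacher's output.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # log σ(x) = min(x, 0) - log(1 + exp(-|x|)) avoids overflow for large |x|.
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def bce_with_logits(logits, labels):
    """Batch mean of -[y*log σ(x) + (1-y)*log(1-σ(x))] on raw outputs x."""
    if not logits or len(logits) != len(labels):
        raise ValueError("logits and labels must be non-empty and equal length")
    total = 0.0
    for x, y in zip(logits, labels):
        # 1 - σ(x) equals σ(-x), so log(1 - σ(x)) = log_sigmoid(-x).
        total += y * log_sigmoid(x) + (1.0 - y) * log_sigmoid(-x)
    return -total / len(logits)


def distilled_loss(cls_logits, dist_logits, labels, teacher_probs, alpha=0.5):
    """Hypothetical combination of the two supervision signals.

    cls_logits:    smoke logit read from the classification token
    dist_logits:   smoke logit read from the distillation token
    teacher_probs: smoke probability predicted by the teacher model
    """
    # Assumption: the teacher's thresholded decision is used as a hard label.
    teacher_labels = [1.0 if p >= 0.5 else 0.0 for p in teacher_probs]
    label_term = bce_with_logits(cls_logits, labels)
    teacher_term = bce_with_logits(dist_logits, teacher_labels)
    return (1.0 - alpha) * label_term + alpha * teacher_term


if __name__ == "__main__":
    # An undecided output (x = 0) on a positive sample costs log 2.
    print(bce_with_logits([0.0], [1.0]))
    print(distilled_loss([2.0, -1.0], [1.5, -0.5], [1.0, 0.0], [0.9, 0.2]))
```

Because only a single smoke logit per token enters the loss, the sketch reflects the idea that the model attends to the smoke category alone when the loss is computed.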

Further, in step four, the actual testing process consists mainly of two stages. First, target detection is performed on the vehicles in each video frame, and the lower part of each detected vehicle target frame is expanded to serve as the core area for the black smoke classifier. Second, the black smoke classifier classifies the core area according to vehicle exhaust characteristics to judge whether smoke is being emitted. Finally, a warning is issued if smoke is detected in a plurality of consecutive frames.

The invention has the beneficial effects that:

1) The invention provides a deep learning-based black smoke vehicle detection method in which the algorithm first performs target detection on the vehicles in each video frame and then selects the tail core area of each vehicle target for classification. This approach can effectively improve the accuracy and speed of black smoke vehicle detection and largely protects the algorithm from interference by factors such as shadows in the image.

2) An additional distillation token is introduced into the classification network model, so that, besides the data labels, a trained ResNet-50 model provides additional guidance during training. This strengthens the supervision of model training and improves the capability of the model while reducing the training cost.

3) The method specifically optimizes the training process of the model. By introducing the binary cross entropy loss function, the model attends only to the value of the smoke category in the output of the classification token when calculating the loss, which effectively reduces the amount of computation during training and makes the model easier to converge.

4) The invention solves the problem in the prior art that black smoke vehicles in road surveillance video cannot be detected, and evidence cannot be preserved, accurately and in real time. It can detect black smoke vehicles in road surveillance video and preserve evidence accurately and in real time, while adapting well to different time periods, road environments and vehicle types.

Drawings

Fig. 1 is a flow chart of the present invention.

Fig. 2 is a flow chart of the detection process.

Fig. 3 is an image obtained by capturing frames from a video shot by an actual road camera.

Fig. 4 is a vehicle target detection result.

Fig. 5 shows the calculation by which the algorithm obtains the tail region image of a vehicle target.

Fig. 6 is a framework diagram of the classification model.

Fig. 7 shows an actual black smoke vehicle detection result on a real road.

Detailed Description

The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention.

Examples

The overall logical framework of the embodiment is shown in fig. 1, and the logical process of black smoke detection on an actual road is shown in fig. 2.

The images used in this embodiment are obtained by capturing frames from videos shot by actual road cameras, as shown in fig. 3.

This embodiment takes the actual road environment images shown in fig. 3 as an example. The images contain vehicles of different types, and in an actual road environment the proportion of black smoke vehicles in the traffic flow is often very small. After the road video image frames are obtained, the different types of vehicle targets are labeled in the images; the vehicle targets comprise 5 types: cars, hatchbacks, vans, trucks and buses. This completes construction of the vehicle target detection data set.

In this embodiment, the labeled vehicle target data set is divided into a training set and a test set and used to train a YOLOv3 model; the vehicle target detection result is shown in fig. 4.

After the trained model detects the vehicle targets in the image, an algorithm obtains the tail region image of each vehicle target in order to construct the black smoke classification data set. As shown in fig. 5, the algorithm takes a square whose side length equals the width of the vehicle target frame, positioned at 70% of the height of the vehicle target frame, and rescales the square while keeping its center point unchanged, yielding the black smoke classification core area image. The vehicle tail images are then manually classified as smoke or smokeless to construct the black smoke classification data set.
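The tail-region computation can be sketched as follows. This is an illustrative reading of the description, not the patent's code: it assumes the vehicle target frame is given as pixel coordinates (x1, y1, x2, y2) with y increasing downward, that the square is centered horizontally on the frame with its center at 70% of the frame height measured from the top, and that the result is clipped to the image. The function name `tail_region` and the `scale` parameter are hypothetical; the text does not give the scale factor.

```python
def tail_region(box, image_size, height_ratio=0.7, scale=1.0):
    """Return (left, top, right, bottom) of the square tail region.

    box:          vehicle target frame as (x1, y1, x2, y2) in pixels
    image_size:   (image_width, image_height), used for clipping
    height_ratio: vertical position of the square center within the frame
    scale:        rescaling of the square about its fixed center (assumed)
    """
    x1, y1, x2, y2 = box
    img_w, img_h = image_size
    width = x2 - x1
    height = y2 - y1

    # Center of the square: middle of the frame, 70% of the way down.
    cx = x1 + width / 2.0
    cy = y1 + height_ratio * height

    # Side length equals the frame width, rescaled about the same center.
    half = scale * width / 2.0

    left = max(0, int(round(cx - half)))
    top = max(0, int(round(cy - half)))
    right = min(img_w, int(round(cx + half)))
    bottom = min(img_h, int(round(cy + half)))
    return left, top, right, bottom


if __name__ == "__main__":
    frame = (100, 50, 300, 250)          # a 200 x 200 vehicle target frame
    print(tail_region(frame, (640, 480)))             # unscaled square
    print(tail_region(frame, (640, 480), scale=1.5))  # enlarged square
```

Placing the square low in the frame and allowing it to extend below the frame is what lets the region cover the exhaust plume behind the vehicle rather than the vehicle body.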

This embodiment classifies black smoke with a Vision Transformer model based on the Transformer architecture and, on that basis, achieves a better training effect by combining distillation with an optimized loss function. The framework of the model is shown in fig. 6, and the specific steps are as follows:

1) Image input processing: to match the sequence input form used by NLP models, an image of shape (H, W, C) is cut into m patches of size (P, P, C), and each patch is then flattened, giving a sequence of m vectors of length P²×C. Because the sequence obtained directly in this way is high-dimensional, the algorithm uses a fully connected layer to reduce and compress its dimension.

2) Adding a classification token: because the model discards the decoder structure of the Transformer, a trainable classification token x₀ is added at the very front of the picture information before it is sent to the encoder. This token is responsible for representing the classification prediction result.

3) Adding position information: position information is added to avoid the confusion that would arise because the attention mechanism of the model cannot itself capture position. Position coding is applied to the sequence to which the classification token has been added, and the position coding vectors are added to the embedded vectors obtained from the image patches to form the input of the subsequent encoder.

4) Encoder: the original encoder structure is adopted, consisting mainly of a multi-head self-attention mechanism and a feedforward network. Here the multi-head self-attention mechanism is composed of 8 self-attention modules. Each self-attention module computes $Z_i = \mathrm{softmax}\left(QK^T/\sqrt{d}\right)V$, where Q, K and V are the Query, Key and Value vectors, obtained by multiplying the encoder input X by the trainable matrices W^Q, W^K and W^V respectively, and d is a normalization factor. The output Z of the multi-head self-attention module is obtained by concatenating the 8 outputs Z_i and passing the result through a fully connected layer. The feedforward network transforms the dimension with two fully connected layers; compared with the original Transformer, the model replaces the ReLU activation function with GELU, which is better suited to Transformer models, to achieve a better effect. Residual connections and layer normalization (LN) are added in the encoder, which speeds up model optimization and avoids the degradation problem.

5) After the output of the last feedforward network passes through an output head comprising a normalization layer and a fully connected layer, the classification token at the front of the sequence is read out; the model's predicted values for the different classes, contained in this classification token, are the output of the whole model.

6) Distillation is realized by adding a distillation token at the input end, in a manner similar to adding the classification token before the encoder. The invention introduces a trained ResNet-50 network as the teacher model, so that the output of the distillation token is as close as possible to the teacher model's output for the input picture. The original classification token and the distillation token are both learned through back propagation during training, so that the student model being trained can better optimize and complement its training through interaction with the teacher model.

7) The model training process is optimized: by introducing the binary cross entropy with logits loss function, the model attends only to the value of the smoke category in the output of the classification token when calculating the loss, which effectively reduces the amount of computation during training and makes the model easier to converge.
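The input processing and attention steps listed above can be sketched in NumPy as follows. This is a shape-level illustration with random weights, not the trained model: the image size, patch size, embedding width, the placement of the distillation token directly after the classification token, and the scaling by the square root of the per-head dimension are all assumptions made for the example, and the feedforward network, residual connections and layer normalization are omitted.

```python
import numpy as np


def patchify(image, patch):
    """Cut an (H, W, C) image into m = (H/P)*(W/P) flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be multiples of P"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)        # (H/P, W/P, P, P, C)
    return grid.reshape(-1, patch * patch * c)  # (m, P*P*C)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, wq, wk, wv, wo, heads=8):
    """Z_i = softmax(Q K^T / sqrt(d)) V per head, then concatenate and project."""
    n, dim = x.shape
    d = dim // heads                            # per-head width (assumed scale)

    def split(t):                               # (n, dim) -> (heads, n, d)
        return t.reshape(n, heads, d).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (heads, n, n)
    z = softmax(scores) @ v                          # (heads, n, d)
    z = z.transpose(1, 0, 2).reshape(n, dim)         # concatenate the heads
    return z @ wo                                    # final fully connected layer


rng = np.random.default_rng(0)
H = W = 32
C, P, DIM = 3, 8, 64                            # illustrative sizes only

image = rng.random((H, W, C))
patches = patchify(image, P)                    # (16, 192): m patches of P*P*C
w_embed = rng.normal(0.0, 0.02, (P * P * C, DIM))
tokens = patches @ w_embed                      # fully connected reduction

cls_token = np.zeros((1, DIM))                  # trainable in a real model
dist_token = np.zeros((1, DIM))                 # trainable in a real model
seq = np.concatenate([cls_token, dist_token, tokens], axis=0)
seq = seq + rng.normal(0.0, 0.02, seq.shape)    # stand-in position encoding

wq, wk, wv, wo = (rng.normal(0.0, 0.02, (DIM, DIM)) for _ in range(4))
out = multi_head_self_attention(seq, wq, wk, wv, wo)
print(patches.shape, out.shape)                 # (16, 192) (18, 64)
```

After the encoder, row 0 of the output corresponds to the classification token and row 1 to the distillation token in this layout; an output head would read the class prediction from those rows.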

After the target detection model and the black smoke classification model have been constructed and trained, real-time black smoke vehicle detection can be performed on video data transmitted from actual road cameras. First, target detection is performed on the vehicles in each video frame, and the lower part of each detected vehicle target frame is expanded to serve as the core area for the black smoke classifier. Second, the black smoke classifier classifies the core area according to vehicle exhaust characteristics, and a warning is issued if smoke is detected in consecutive frames. The actual measurement result is shown in fig. 7, where the rectangular frame marks the detected vehicle target and the square frame marks the detected black smoke emitted by the vehicle.
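The consecutive-frame warning rule can be illustrated with a short Python sketch. It assumes the per-frame classifier decisions for one vehicle are already available as a list of booleans; the function name `warning_frames` and the threshold `min_consecutive=3` are hypothetical, since the text does not state how many consecutive frames are required.

```python
def warning_frames(smoke_flags, min_consecutive=3):
    """Return the frame indices at which a warning would be raised.

    smoke_flags:     per-frame classifier decisions (True = smoke detected)
    min_consecutive: assumed number of consecutive smoke frames required
    """
    warnings = []
    streak = 0
    for index, is_smoke in enumerate(smoke_flags):
        # A smokeless frame resets the run of consecutive detections.
        streak = streak + 1 if is_smoke else 0
        # Warn once per run, at the frame where the threshold is reached.
        if streak == min_consecutive:
            warnings.append(index)
    return warnings


if __name__ == "__main__":
    flags = [False, True, True, False, True, True, True, True, False]
    print(warning_frames(flags))  # [6]
```

Requiring a run of detections rather than a single positive frame is a simple way to suppress isolated misclassifications, at the cost of a short delay before the warning.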

The invention provides a deep learning-based black smoke vehicle detection method which, through staged training of a YOLOv3 target detection model and an improved Vision Transformer black smoke classification model combining distillation and training optimization, achieves real-time and accurate detection of black smoke vehicles in actual road environments. The invention contributes significantly to making black smoke vehicle detection intelligent and automated, and has broad application prospects.

The present invention is not limited to the specific technical solutions described in the above embodiments, and other embodiments of the present invention are possible in addition to the above embodiments. It will be understood by those skilled in the art that various changes, substitutions of equivalents, and alterations can be made without departing from the spirit and scope of the invention.

Claims

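1. A black smoke vehicle detection method based on deep learning, characterized by comprising the following steps: step one, labeling vehicle targets in images acquired by actual road cameras to construct a vehicle target detection data set; step two, training a YOLOv3 model on the labeled vehicle target data set, detecting vehicle targets in the images with the model, and obtaining images of the tail area of each vehicle target to construct a black smoke classification data set; step three, training a black smoke classification model using an improved Vision Transformer model that combines distillation and training optimization, to judge whether a vehicle emits black smoke; and step four, performing real-time black smoke vehicle detection on video data transmitted from actual road cameras, and issuing a warning if a black smoke vehicle is found.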

2. The black smoke vehicle detection method based on deep learning of claim 1, wherein: in step one, the vehicle target detection data set is formed by labeling images obtained by capturing frames from videos shot by actual road cameras, and the vehicle targets comprise 5 types: cars, hatchbacks, vans, trucks and buses.

3. The black smoke vehicle detection method based on deep learning of claim 1, wherein: in step two, after training, the YOLOv3 model can identify vehicle targets in the video; an algorithm takes a square whose side length equals the width of the vehicle target frame, positioned at 70% of the height of the vehicle target frame, and rescales the square while keeping its center point unchanged, yielding the black smoke classification image; and the vehicle tail images are manually divided into two classes, smoke and smokeless, to construct the black smoke classification data set.

4. The black smoke vehicle detection method based on deep learning of claim 1, wherein: in step three, distillation is realized by adding a distillation token at the input end, in a manner similar to adding the classification token before the encoder; a trained ResNet-50 network is introduced as the teacher model, so that the output of the distillation token is as close as possible to the teacher model's output for the input picture; the original classification token and the distillation token are both learned through back propagation during training, so that the student model being trained can better optimize and complement its training through interaction with the teacher model; and in the training process, the loss function is replaced by the binary cross entropy loss function shown below:

$$L = -\frac{1}{N}\sum_{n=1}^{N}\left[\,y_n\log\sigma(x_n) + (1-y_n)\log\bigl(1-\sigma(x_n)\bigr)\right]$$

where N is the batch size, σ is the sigmoid function, x_n is the output of the model, and y_n is the label value.

5. The black smoke vehicle detection method based on deep learning of claim 1, wherein: in step four, the actual testing process consists mainly of two stages: first, target detection is performed on the vehicles in each video frame, and the lower part of each detected vehicle target frame is expanded to serve as the core area for the black smoke classifier; second, the black smoke classifier classifies the core area according to vehicle exhaust characteristics to judge whether smoke is being emitted; and finally, a warning is issued if smoke is detected in a plurality of consecutive frames.