CN117037085A - Vehicle identification and quantity statistics monitoring method based on improved YOLOv5


Info

Publication number
CN117037085A
CN117037085A (application CN202311022687.4A)
Authority
CN
China
Prior art keywords
motor vehicle
model
yolov5
frame
vehicle identification
Prior art date
Legal status
Pending
Application number
CN202311022687.4A
Other languages
Chinese (zh)
Inventor
陈大龙
魏东迎
刘振洋
Current Assignee
Nanjing Howso Technology Co ltd
Original Assignee
Nanjing Howso Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Howso Technology Co ltd filed Critical Nanjing Howso Technology Co ltd
Priority to CN202311022687.4A priority Critical patent/CN117037085A/en
Publication of CN117037085A publication Critical patent/CN117037085A/en
Pending legal-status Critical Current

Classifications

    • G08G1/0175 — Detecting movement of traffic to be counted or controlled; identifying vehicles by photographing vehicles, e.g. when violating traffic rules
    • G08G1/065 — Traffic control systems for road vehicles by counting the vehicles in a section of the road or in a parking area
    • G06N3/045 — Combinations of networks
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/08 — Learning methods
    • G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06V10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 — Image or video recognition or understanding using neural networks
    • G06V20/41 — Higher-level, semantic clustering, classification or understanding of video scenes
    • G06V20/54 — Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
    • G06T2207/20081 — Training; Learning
    • G06T2207/20084 — Artificial neural networks [ANN]
    • Y02T10/40 — Engine management systems


Abstract

The invention discloses a monitoring method for vehicle identification and quantity statistics based on improved YOLOv5, which comprises the following steps: S1: generating a motor vehicle data set, dividing it and preprocessing it; S2: training with the preprocessed data set to obtain a YOLOv5 motor vehicle identification statistical model; S3: performing INT8 quantization and calibration on the YOLOv5 motor vehicle identification statistical model to obtain a quantized engine model; S4: reading data from the image acquisition equipment in the monitoring area to acquire each frame of image data; S5: inputting each frame of image data into the engine model for detection and result analysis; S6: tracking each motor vehicle ID and updating its track state; S7: judging whether each motor vehicle ID passes through the target area and counting. The method reduces the loss caused by INT8 quantization of the model and improves the model's inference speed.

Description

Vehicle identification and quantity statistics monitoring method based on improved YOLOv5
Technical Field
The invention relates to the technical field of visual positioning, in particular to a monitoring method for vehicle identification and quantity statistics based on improved YOLOv5.
Background
With the rapid development of the economy and society, people's demand for transportation, especially long-distance travel, keeps growing. Statistics show that domestic vehicle ownership reached 261.5 million in 2019, an increase of 21.22 million over the previous year. The conflict between growing traffic demand and existing road conditions is increasingly prominent. Reasonable traffic management can effectively reduce the occurrence of traffic congestion; to improve the traffic management level, accurate analysis of the driving behavior of vehicles on the road is required.
Deep-learning object detection algorithms are currently the common means of identifying motor vehicles and counting their number. This approach depends entirely on the accuracy of the detection algorithm, and some identification-and-counting scenarios remain unsolved: such algorithms count a motor vehicle every time it appears in the area, which is obviously unreasonable and causes repeated counting in many scenarios.
Conventional deep-learning object detection algorithms therefore have limitations in motor vehicle identification and quantity statistics tasks. They typically rely on detecting and locating the bounding boxes of motor vehicles in the image and using those boxes for counting. In some scenarios, however, this leads to repeated or missed counts, especially when vehicles are dense, overlapping or fast-moving.
Meanwhile, directly converting an ONNX model into an INT8-precision model with TensorRT causes relatively large loss: model accuracy drops considerably, false detections and missed detections easily occur during motor vehicle identification and counting, and the approach has serious shortcomings in production scenarios. Converting a model from floating-point to INT8 precision is a common optimization that can improve inference performance and reduce storage requirements in some cases. However, converting a model to INT8 with TensorRT or other tools may introduce information loss, because quantization maps the floating-point parameters to integer representations of lower bit precision. The reduced precision may prevent the model from accurately representing certain features and details, causing false and missed detections. Particularly for tasks requiring high-precision identification and quantity statistics, plain INT8 quantization may not be satisfactory.
Chinese patent literature discloses a method for detecting and counting vehicles in expressway surveillance video based on YOLOv3. That method likewise uses a deep-learning object detection algorithm: as can be seen from its flow, YOLOv3 performs detection, target tracking is carried out on the detection results, and a vehicle is counted when it is judged to enter the target area. Tracking is done mainly with a Kalman filter, which fuses the value predicted by a mathematical model with the measured observation to find the optimal estimate and locate the target most accurately in the next frame, thereby reducing the error in counting the number of motor vehicles.
For the above reasons, that method also depends on the accuracy of the object detection algorithm, and some identification-and-counting scenarios remain unresolved: a vehicle is classified and counted whenever it appears in the area, which is obviously unreasonable and leads to repeated detection and counting within a short time in many scenarios.
For the above reasons, that method may also reduce the accuracy of the model, so that motor vehicles cannot be identified accurately. It is therefore necessary to propose a monitoring method for vehicle identification and quantity statistics based on improved YOLOv5 that reduces the loss caused by INT8 quantization of the model and increases the model's inference speed.
Disclosure of Invention
The invention aims to solve the technical problem of providing a monitoring method for vehicle identification and quantity statistics based on improved YOLOv5 that reduces the loss caused by INT8 quantization of the model and improves the model's inference speed.
In order to solve the above technical problem, the invention adopts the following technical scheme. The monitoring method based on improved YOLOv5 vehicle identification and quantity statistics specifically comprises the following steps:
s1: generating and dividing a motor vehicle data set, and preprocessing to obtain a preprocessed data set;
s2: training by adopting the preprocessed data set to obtain a YOLOv5 motor vehicle identification statistical model;
s3: performing INT8 quantization and calibration on the YOLOv5 motor vehicle identification statistical model to obtain a quantized engine model;
s4: reading data of image acquisition equipment in a monitoring area, and acquiring image data of each frame;
s5: inputting each frame of image data into an engine model for detection and result analysis;
s6: tracking the target of the motor vehicle ID and updating the track state;
s7: it is determined whether the motor vehicle ID passes through the target area and statistics are made.
By adopting this technical scheme, motor vehicles are identified and counted by adding a criss-cross attention mechanism and a target tracking algorithm to YOLOv5. The improved YOLOv5 network reduces the model's memory footprint and improves computing performance, further raising the model's detection speed and identification accuracy. A deep-learning object tracking algorithm is combined with the object detection algorithm to check whether a motor vehicle has passed through the target area before it is counted; the combination of the three makes identification and counting more accurate and real-time, and covers richer scenes. INT8 quantization of the model reduces the loss caused by model conversion, while the combined use of deep-learning object detection and target tracking draws on the strengths of both and avoids the repeated or missed counting that occurs when a conventional detection-only algorithm counts motor vehicles; that is, the combination of several techniques solves the problem of motor vehicle identification and quantity statistics in complex scenes.
Preferably, the specific steps of the step S1 are as follows:
s11: collecting videos of motor vehicles, and of different motor vehicle types, shot so as to simulate the deployment scene, and performing frame extraction on the captured videos to generate a motor vehicle data set;
s12: dividing the motor vehicle dataset into a training set and a validation set;
s13: and respectively carrying out data enhancement on the training set and the verification set by adopting a data enhancement mode to obtain an enhanced training set and an enhanced verification set.
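The division in step S12 can be sketched as follows; a minimal helper, assuming the extracted frames are image paths (the 80/20 ratio and all names are illustrative, not from the patent):

```python
import random

def split_dataset(frame_paths, train_ratio=0.8, seed=42):
    """Shuffle the extracted frame paths reproducibly and split them
    into a training set and a validation set."""
    paths = list(frame_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]

frames = [f"frame_{i:04d}.jpg" for i in range(100)]
train_set, val_set = split_dataset(frames)
```

The fixed seed keeps the split reproducible across runs, which matters when the validation set is later reused to build the INT8 calibration set.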
Preferably, the specific steps of the step S2 are as follows:
s21: constructing a YOLOv5 algorithm model and adding a criss-cross (Criss-Cross) attention mechanism to the YOLOv5 network structure, where the Criss-Cross attention aggregation is:

H′_u = Σ_{i=1..T} A_{i,u} · Φ_{i,u} + H_u, with T = H + W − 1;

where H′_u is the output vector at the u-th position; T is the length of the criss-cross sequence, H the feature-map height and W its width; A_{i,u} is the attention weight that weights information from different positions and represents the importance of the i-th criss-cross position to position u; Φ_{i,u} is the feature vector obtained through an affine transformation; and H_u is the input vector at the u-th position;
s22: inputting data into the improved YOLOv5 algorithm model and training to obtain algorithm model weight, thereby obtaining the YOLOv5 motor vehicle identification statistical model.
By adopting this technical scheme, adding the Criss-Cross attention mechanism and tracking motor vehicles in real time with a target tracking algorithm improves identification accuracy and reduces the model's memory footprint.
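The aggregation formula above can be illustrated per position on a toy feature map. A minimal NumPy sketch, assuming dot-product affinities and using the raw features in place of the learned affine projection Φ (so this shows only the criss-cross aggregation pattern, not the full CCNet module):

```python
import numpy as np

def criss_cross_positions(u_row, u_col, height, width):
    """All positions sharing u's row or column: T = H + W - 1 of them."""
    column = [(r, u_col) for r in range(height)]
    row = [(u_row, c) for c in range(width) if c != u_col]
    return column + row

def criss_cross_attend(feat, u_row, u_col):
    """H'_u = sum_i A_{i,u} * Phi_{i,u} + H_u, with softmax-normalised
    dot-product affinities A and, in this sketch, Phi taken as the raw
    features at the criss-cross positions (no learned projection)."""
    positions = criss_cross_positions(u_row, u_col, feat.shape[0], feat.shape[1])
    q = feat[u_row, u_col]                               # query vector at u
    keys = np.stack([feat[r, c] for r, c in positions])  # criss-cross features
    logits = keys @ q                                    # affinity of u with each position
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                             # attention weights A_{i,u}
    return (weights[:, None] * keys).sum(axis=0) + feat[u_row, u_col]
```

Because each position attends only to its own row and column (T = H + W − 1 positions instead of H × W), the memory cost drops sharply, which is the saving the scheme attributes to the mechanism.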
Preferably, the specific steps of the step S3 are as follows:
s31: taking a number of samples out of the validation set generated in step S1 to produce a calibration data set;
s32: writing a calibration data reader and creating an IInt8 entropy (relative-entropy) calibrator;
s33: configuring the parameters required to build the INT8 quantization model, performing INT8 quantization, continuously adjusting the threshold, and computing the relative entropy to obtain the optimal solution;
s34: performing INT8 quantization of the YOLOv5 motor vehicle identification statistical model according to the computed relative entropy, while reading the calibration data set from step S31, collecting a histogram of each layer's activation values, computing the threshold that minimizes the KL divergence with the KL-divergence calibration method, and calibrating the model to obtain the INT8-quantized engine model. Directly converting an ONNX model to an INT8-precision model with TensorRT currently causes large model loss; quantizing the model with the KL-divergence calibration method in this scheme remedies that conversion loss and avoids the information loss that would prevent motor vehicles from being accurately identified. To increase identification speed while keeping the loss of accuracy minimal, the KL-divergence calibration method is used for INT8 quantization and calibration of the trained YOLOv5 model; the quantized model's identification speed can be raised to three times that of the original YOLOv5 model, greatly improving detection real-time performance.
Preferably, the formula of the KL divergence calibration method in step S34 is:
KL(P||Q)=ΣP(x)*log(P(x)/Q(x));
where P is the actual probability distribution, Q is the probability distribution output by the model, KL(P||Q) is the KL divergence measuring the difference between the two distributions P and Q, P(x) and Q(x) are the probabilities that P and Q assign to event x, and Σ denotes summation. The calibration algorithm reduces model loss and improves detection real-time performance; a complex-scene result is obtained by combining several models in joint detection, each detection step reflecting the algorithm's accuracy. This solves the accuracy drop caused by conventional model conversion and improves both the accuracy and the speed of motor vehicle identification.
Preferably, the specific steps of step S5 are: inputting each frame of image data acquired in step S4 into the engine model obtained in step S3 and detecting each frame; if a motor vehicle is detected in the current frame, storing the detection result; the four corner coordinates of each detected vehicle's rectangular frame are stored in an array and the frame's centre point is computed as: c_x = (x_left + x_right) / 2, c_y = (y_left + y_right) / 2, where (x_left, y_left) and (x_right, y_right) are the coordinates of the upper-left and lower-right corners of the rectangular frame and (c_x, c_y) is its centre point.
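The centre-point computation can be sketched as a trivial helper (the name is illustrative):

```python
def box_center(x_left, y_left, x_right, y_right):
    """Centre of a detection rectangle given its upper-left (x_left, y_left)
    and lower-right (x_right, y_right) corners."""
    return (x_left + x_right) / 2, (y_left + y_right) / 2
```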
Preferably, the specific steps of the step S6 are:
s61: detecting the motor vehicles appearing in each frame of image data with a pre-trained multi-target tracking model (DeepSORT), extracting the features of each motor vehicle, and assigning each an ID;
s62: using the detection results of the engine model as the target-frame input of the DeepSORT multi-target tracking model, and taking the resulting track segments as the current frame's tracks;
s63: matching the target frames of the current frame with the tracks by intersection-over-union (IoU), predicting the target-frame state of the next frame from the track state with a Kalman filter, and updating all track states with the Kalman filter's observed and estimated values, thus completing motor vehicle ID tracking.
The DeepSORT multi-target tracking algorithm used here is an improvement on the SORT algorithm, adding cascade matching, judgment of track states and other refinements. During matching, three situations in which a prediction frame, a track and its state can represent a target are considered: targets that continue to appear in the video, newly appearing targets, and old targets that have disappeared. For a continuously appearing target, Kalman filter prediction is performed from the current frame's result, and matching continues in the next frame between the detection result and the prediction. A newly appearing target is handled like the first frame: it is converted directly into track information, retained temporarily, and matched in subsequent frames. For an old target that has disappeared, its track information is still kept temporarily and the track is deleted only after it has been missing a certain number of times.

SORT builds on the Faster R-CNN object detector and uses the Kalman filter and the Hungarian algorithm, greatly increasing the speed of multi-target tracking while reaching state-of-the-art accuracy; it is widely used in practice. Its core consists of two algorithms: Kalman filtering and the Hungarian algorithm. The Kalman filter proceeds in two stages, prediction and update, and models the motion state of a target as an 8-dimensional normally distributed vector. Prediction: as the target moves, the position, velocity and other parameters of the current frame's target frame are predicted from the previous frame's target frame and velocity. Update: the predicted value and the observed value, two normally distributed states, are linearly weighted to obtain the state predicted by the current system. Hungarian algorithm: it solves a bipartite-graph assignment problem; in the main MOT step an IoU cost matrix is computed as the similarity matrix between the previous and current frames, and the Hungarian algorithm solves this matrix to find the true matching between the two frames.
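The predict/update cycle described above can be illustrated in one dimension. This is a hedged sketch with a constant-position motion model and illustrative noise parameters; DeepSORT's actual filter runs an 8-dimensional analogue over box centre, aspect ratio, height and their velocities:

```python
def kalman_1d(x, p, z, q=1e-3, r=1e-1):
    """One predict/update cycle of a 1-D Kalman filter.
    x: state estimate, p: estimate variance, z: new observation,
    q: process noise, r: observation noise."""
    # predict (constant model): state unchanged, uncertainty grows
    p = p + q
    # update: weight prediction and observation linearly via the Kalman gain
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1 - k) * p
    return x, p
```

Repeated cycles with a stable observation drive the estimate toward the observed value while shrinking its variance, which is exactly the "linear weighting of predicted and observed values" the update step describes.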
DeepSORT's main characteristics are: appearance information is added on top of the SORT algorithm, with appearance features (the Deep Association Metric in the title) extracted by a ReID-domain model, reducing the number of ID switches; and the matching mechanism is changed from matching on the IoU cost matrix alone to cascade matching plus IoU matching.
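The IoU cost computation and the track-detection association can be sketched as follows. This is a simplification: a greedy assignment stands in for the Hungarian algorithm proper, and DeepSORT additionally uses appearance features and cascade matching; all names are illustrative:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_tracks(tracks, detections, iou_threshold=0.3):
    """Greedy IoU association of existing tracks with new detections:
    highest-overlap pairs are matched first, each side used at most once."""
    scored = sorted(((iou(t, d), ti, di)
                     for ti, t in enumerate(tracks)
                     for di, d in enumerate(detections)), reverse=True)
    pairs, matched_t, matched_d = [], set(), set()
    for score, ti, di in scored:
        if score < iou_threshold or ti in matched_t or di in matched_d:
            continue
        pairs.append((ti, di))
        matched_t.add(ti)
        matched_d.add(di)
    return pairs
```

Unmatched detections would become new tracks and unmatched tracks would age toward deletion, mirroring the three target situations described above.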
Preferably, the specific steps of the step S7 are as follows:
s71: judging, from the centre-point coordinates of each motor vehicle detected in step S5, whether the vehicle has entered the specified area, drawing the specified area with OpenCV and obtaining the target area from the polygon's coordinate points as parameters;
s72: judging whether the motor vehicle passes through the target area for the first time within a given period; if so, marking the vehicle as counted and incrementing the vehicle count by one, then continuing to track it with the multi-target tracking model (DeepSORT) so that it is not counted again when it passes through the target area once more; if not, not counting the vehicle;
s73: saving the information on the number of motor vehicles passing through the target area to a database, and ending. OpenCV (Open Source Computer Vision Library) is a cross-platform computer vision library.
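Steps S71-S72 amount to a point-in-polygon test plus first-crossing bookkeeping. A minimal pure-Python sketch (cv2.pointPolygonTest provides an equivalent test when OpenCV is available; all names here are illustrative):

```python
def point_in_polygon(pt, polygon):
    """Ray-casting test: is the vehicle centre point inside the target area?"""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

def count_crossing(id_centres, polygon, counted):
    """Count each vehicle ID whose centre enters the area for the first
    time; 'counted' records IDs that must never be counted again."""
    total = 0
    for vid, centre in id_centres:
        if vid not in counted and point_in_polygon(centre, polygon):
            counted.add(vid)
            total += 1
    return total
```

The persistent `counted` set is what prevents the repeated counting that the scheme attributes to detection-only approaches: a tracked ID contributes to the total at most once.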
Preferably, in step S4 the video of the image acquisition equipment is read as a video file or as an RTSP pull stream. After INT8 quantization of the model, the camera's video is read over RTSP (real-time streaming), the motor vehicles in the video are detected continuously, each vehicle's ID is determined by the target tracking algorithm, and during continuous detection it is judged whether each motor vehicle passes through the target area, realizing the count of motor vehicles; the vehicles passing through the target area and the count information are stored in a database, and the algorithm ends.
Preferably, in step S1 the Mosaic mode is used for data enhancement. Enhancing the data set improves the generalization ability of the model.
Compared with the prior art, the invention has the following beneficial effects:
(1) The KL-divergence calibration method greatly reduces the accuracy loss brought by INT8 quantization of the model; solving the accuracy drop after INT8 quantization lets the model identify three times faster than the original without losing accuracy;
(2) Training is performed with the improved network: a criss-cross (Criss-Cross) attention mechanism is added to the original YOLOv5s network structure, saving GPU memory and achieving higher computing performance;
(3) A method of reasonably combining multiple models as required to solve complex problems is provided, while effectively avoiding repeated counting of a motor vehicle within a time period; that is, the improved YOLOv5 + DeepSORT tracking algorithm completes the algorithm development for complex scenes through a multi-model hybrid approach.
Drawings
FIG. 1 is a flow chart of a method for monitoring vehicle identification and quantity statistics based on improved YOLOv5 of the present invention;
FIG. 2 is a flow chart of INT8 quantization and calibration in step S3 of the improved YOLOv5 based vehicle identification and quantity statistics monitoring method of the present invention;
FIG. 3 is a flowchart of steps S3-S7 in the improved YOLOv5 based vehicle identification and quantity statistics monitoring method of the present invention;
FIG. 4 is a block diagram of the Criss-Cross attention mechanism added in step S2 in the improved YOLOv5 based vehicle identification and quantity statistics monitoring method of the present invention;
fig. 5 is a diagram showing the structure of the improved YOLOv5 network in step S2 in the method for monitoring vehicle identification and quantity statistics based on improved YOLOv5 according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments.
Examples: as shown in fig. 1, the method for monitoring vehicle identification and quantity statistics based on improved YOLOv5 specifically includes the following steps:
s1: generating and dividing a motor vehicle data set, and preprocessing to obtain a preprocessed data set;
the specific steps of the step S1 are as follows:
s11: collecting a large number of motor vehicle videos, and videos of different motor vehicle types shot so as to simulate the deployment scene, and performing frame extraction on the captured videos to generate a motor vehicle data set; the data set contains the motor vehicle types to be detected, such as cars, muck trucks and vans;
s12: dividing the motor vehicle dataset into a training set and a validation set;
s13: performing data enhancement on the training set and the validation set respectively to obtain an enhanced training set and an enhanced validation set; in step S1 the Mosaic mode is used for data enhancement, together with operations such as rotation, cropping, and increasing or decreasing image brightness;
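The brightness and flip operations mentioned can be sketched on a nested-list grayscale image; this is illustrative only (a production pipeline would use OpenCV or an augmentation library, and Mosaic itself additionally stitches four images together):

```python
def adjust_brightness(image, delta):
    """Shift every pixel by delta, clamped to the valid [0, 255] range."""
    return [[max(0, min(255, px + delta)) for px in row] for row in image]

def horizontal_flip(image):
    """Mirror every row of the image left-to-right."""
    return [row[::-1] for row in image]
```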
s2: training by adopting the preprocessed data set to obtain a YOLOv5 motor vehicle identification statistical model;
the specific steps of the step S2 are as follows:
s21: constructing a YOLOv5 algorithm model and adding a criss-cross (Criss-Cross) attention mechanism to the YOLOv5 network structure, where the Criss-Cross attention aggregation is:

H′_u = Σ_{i=1..T} A_{i,u} · Φ_{i,u} + H_u, with T = H + W − 1;

where H′_u is the output vector at the u-th position; T is the length of the criss-cross sequence, H the feature-map height and W its width; A_{i,u} is the attention weight that weights information from different positions and represents the importance of the i-th criss-cross position to position u; Φ_{i,u} is the feature vector obtained through an affine transformation; and H_u is the input vector at the u-th position;
s22: inputting data into the improved YOLOv5 algorithm model and training to obtain algorithm model weight, thereby obtaining the YOLOv5 motor vehicle identification statistical model.
S3: carrying out INT8 quantization and calibration on the YOLOv5 motor vehicle identification statistical model to obtain a quantized engine model;
the specific steps of step S3 are as follows:
S31: taking a number of samples from the verification set generated in step S1 to build a calibration data set; in this embodiment, about 500 samples are taken from the verification set of the data set used for training, so that the calibration set has good sample representativeness;
S32: writing a calibration data loader to construct the IInt8 relative-entropy calibrator used during model quantization;
S33: configuring the parameters required for building the INT8 quantization model, carrying out INT8 quantization, continuously adjusting the threshold value, and calculating the relative entropy to obtain an optimal solution;
S34: carrying out INT8 quantization on the YOLOv5 motor vehicle identification statistical model according to the calculated relative entropy, simultaneously reading the calibration data set of step S31, running inference and collecting the histogram of each layer's activation values under the FP32-precision network, calculating the minimum-KL-divergence threshold by the KL divergence calibration method, and carrying out model calibration to obtain the INT8-quantized engine model. At present, converting an ONNX model into an INT8-precision model with TensorRT suffers from large model loss; by quantizing the model with the KL divergence calibration method, the technical scheme alleviates this loss during conversion, and avoids the situation in which information lost during TensorRT INT8 model conversion prevents motor vehicles from being accurately identified;
the formula of the KL divergence calibration method in step S34 is as follows:
KL(P||Q)=ΣP(x)*log(P(x)/Q(x));
wherein P represents the actual probability distribution, Q represents the probability distribution output by the model, KL (P||Q) represents the KL divergence, which is used to measure the difference between the two probability distributions P and Q, P (x) represents the probability of the probability distribution P over event x, Q (x) represents the probability of the probability distribution Q over event x, and Σ represents the summation.
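The relative-entropy calculation and threshold search of steps S33–S34 can be sketched as follows. `best_threshold` is a hypothetical, heavily simplified stand-in for what TensorRT's entropy calibrator does internally over activation histograms, shown only to make the KL formula concrete:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) = sum P(x) * log(P(x) / Q(x)) over events with P(x) > 0."""
    return sum(px * math.log(px / max(qx, eps)) for px, qx in zip(p, q) if px > 0)

def best_threshold(hist, candidates):
    """Pick the saturation threshold (number of histogram bins kept) whose
    clipped and renormalized distribution Q is closest, in KL divergence,
    to the reference activation histogram P."""
    total = sum(hist)
    p = [h / total for h in hist]
    best, best_kl = None, float("inf")
    for t in candidates:
        clipped = hist[:t]               # keep the first t bins (slice copies)
        clipped[-1] += sum(hist[t:])     # fold the clipped tail into the last bin
        s = sum(clipped)
        q = [c / s for c in clipped] + [0.0] * (len(hist) - t)
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best, best_kl = t, kl
    return best, best_kl
```

Keeping every bin reproduces P exactly (KL = 0); a lower threshold trades clipping error against quantization resolution, which is the adjustment described in step S33.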
S4: reading data of image acquisition equipment in a monitoring area, and acquiring image data of each frame;
in step S4, the video of the image acquisition equipment is read as a video file or via an RTSP (Real Time Streaming Protocol) pull stream;
S5: inputting each frame of image data into the engine model for detection and result analysis;
the specific steps of step S5 are as follows: inputting each frame of image data acquired in step S4 into the engine model obtained in step S3, and detecting each frame of image data; if a motor vehicle is detected in the current frame of image data, storing the detection result; the coordinates of the four corners of the rectangular frame of the detected motor vehicle are stored in an array, and the coordinates of the center point of the rectangular frame are calculated as c_x = (x_left + x_right)/2 and c_y = (y_left + y_right)/2, wherein (x_left, y_left) and (x_right, y_right) represent the coordinates of the upper-left and lower-right corners of the rectangular frame, and (c_x, c_y) represents the center point of the rectangular frame;
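The center-point formula above as a one-line helper (the function name is hypothetical):

```python
def rect_center(x_left, y_left, x_right, y_right):
    """Center point of a detection rectangle given its upper-left and
    lower-right corners: c = ((x_left + x_right)/2, (y_left + y_right)/2)."""
    return (x_left + x_right) / 2, (y_left + y_right) / 2
```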
s6: tracking the target of the motor vehicle ID and updating the track state;
the specific steps of step S6 are as follows:
S61: detecting the motor vehicles appearing in each frame of image data with a pre-trained multi-target tracking model (DeepSORT model), extracting the features of each motor vehicle, and assigning each motor vehicle an ID;
S62: using the detection result of the engine model as the target-frame input of the multi-target tracking model (DeepSORT model), and taking the obtained track as the current-frame track;
S63: matching the target frame of the current frame of image data with the track by intersection over union (IOU), predicting the target-frame state of the next frame of image data from the track state by Kalman filtering, and updating all track states with the Kalman-filter observation and estimation values, thereby completing motor vehicle ID tracking.
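The IOU association of step S63 can be sketched as follows. A greedy matcher is used here for brevity, whereas DeepSORT performs Hungarian matching on the cost matrix, so this is an illustration of the association idea rather than the tracker's actual logic:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_by_iou(tracks, detections, thresh=0.3):
    """Greedily associate track boxes with detection boxes; returns a list
    of (track_index, detection_index) pairs whose IOU exceeds thresh."""
    pairs, used = [], set()
    for ti, t in enumerate(tracks):
        best, best_iou = None, thresh
        for di, d in enumerate(detections):
            if di in used:
                continue
            v = iou(t, d)
            if v > best_iou:
                best, best_iou = di, v
        if best is not None:
            pairs.append((ti, best))
            used.add(best)
    return pairs
```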
The Deep SORT multi-target tracking algorithm used in the multi-target tracking model (DeepSORT model) adopted by the invention is an improvement on the SORT algorithm, adding cascade matching and judgment of the track state on the basis of SORT. In the matching process, three situations that the prediction frame, the track and their states can represent are considered: targets that continue to appear in the video, newly appearing targets, and old targets that have disappeared. For a continuously appearing target, Kalman-filter prediction is carried out according to the result of the current frame, and matching continues in the next frame between the detection result and the prediction result. A newly appearing target is handled like the first frame: it is directly converted into track information, which is temporarily retained and matched in subsequent frames. For an old target that has disappeared, its track information is still kept temporarily; only after it has been missing a certain number of times is the track deleted. SORT builds on a Faster R-CNN-based target detector and uses the Kalman filtering algorithm and the Hungarian algorithm, greatly improving the speed of multi-target tracking while reaching SOTA accuracy. The algorithm is widely used in practical applications, and its core consists of two algorithms: Kalman filtering and the Hungarian algorithm. The Kalman filtering algorithm is divided into two processes, prediction and updating, and defines the motion state of the object as an 8-dimensional normally distributed vector. Prediction: as the target moves, the position, speed and other parameters of the current frame's target frame are predicted from the target frame, speed and other parameters of the previous frame.
Updating: the predicted value and the observed value, two normally distributed states, are linearly weighted to obtain the state predicted by the current system. Hungarian algorithm: it solves a bipartite-graph assignment problem; in the main MOT step, an IOU cost matrix is calculated as the similarity matrix between the previous and current frames, and the Hungarian algorithm solves this similarity matrix to obtain the true matching between the two frames.
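The predict/update linear weighting described above, reduced to a one-dimensional Kalman filter. The real tracker maintains an 8-dimensional constant-velocity state; this sketch only shows the weighting mechanics, with a static motion model assumed for brevity:

```python
def kalman_predict(x, p, q=1e-2):
    """Prediction step: propagate the state estimate and grow its variance
    by the process noise q (static motion model assumed here)."""
    return x, p + q

def kalman_update(x, p, z, r=1e-1):
    """Update step: linearly weight the predicted value x and the
    observation z by the Kalman gain k, shrinking the variance."""
    k = p / (p + r)             # Kalman gain: how much to trust the observation
    x_new = x + k * (z - x)     # weighted mix of prediction and observation
    p_new = (1 - k) * p
    return x_new, p_new
```

With a large observation noise r the filter leans on its prediction; with a small r it follows the detections, which is exactly the trade-off the tracker relies on when detections are noisy.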
DeepSORT is mainly characterized in that appearance information is added on the basis of the SORT algorithm: appearance features (the Deep Association Metric in the title) are extracted with a ReID-domain model, reducing the number of ID switches, and the matching mechanism changes the original matching based on the IOU cost matrix into cascade matching plus IOU matching.
S7: judging whether the motor vehicle ID passes through the target area and counting;
the specific steps of step S7 are as follows:
S71: judging whether the motor vehicle enters a specified area according to the center-point coordinates of the motor vehicle detected in step S5; the specified area is drawn with opencv, and the target area is obtained by taking the coordinate points of a polygon as parameters; opencv, in full Open Source Computer Vision Library, is a cross-platform computer vision library;
S72: judging whether the motor vehicle passes through the target area for the first time within a certain time period; if so, marking the motor vehicle as counted and adding one to the counted number of motor vehicles, then continuing to track the motor vehicle with the multi-target tracking model (DeepSORT model) to prevent the vehicle from being counted again when it passes through the target area another time; if not, not counting the vehicle;
S73: saving the number of motor vehicles passing through the target area to a database, and ending.
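The region test and first-pass counting of steps S71–S72 can be sketched without OpenCV, using a ray-casting point-in-polygon test in place of cv2.pointPolygonTest (function names are hypothetical):

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: does point (x, y) lie inside the polygon given as
    a list of (x, y) vertices? Stands in for cv2.pointPolygonTest."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edge crossings of a horizontal ray cast to the right of pt.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_vehicle(vehicle_id, center, region, counted_ids, count):
    """Count a tracked vehicle the first time its center enters the region;
    the counted-ID set prevents double counting on later frames (step S72)."""
    if vehicle_id not in counted_ids and point_in_polygon(center, region):
        counted_ids.add(vehicle_id)
        count += 1
    return count
```

Persisting `count` to the database per step S73 would then be a single insert keyed on the monitoring region and time window.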
The foregoing description of the preferred embodiments is not intended to limit the invention; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (10)

1. The monitoring method for vehicle identification and quantity statistics based on improved YOLOv5 is characterized by comprising the following steps of:
S1: generating and dividing a motor vehicle data set, and preprocessing to obtain a preprocessed data set;
S2: training with the preprocessed data set to obtain a YOLOv5 motor vehicle identification statistical model;
S3: carrying out INT8 quantization and calibration on the YOLOv5 motor vehicle identification statistical model to obtain a quantized engine model;
S4: reading data of image acquisition equipment in a monitoring area, and acquiring each frame of image data;
S5: inputting each frame of image data into the engine model for detection and result analysis;
S6: tracking the target of the motor vehicle ID and updating the track state;
S7: judging whether the motor vehicle ID passes through the target area and making statistics.
2. The method for monitoring the vehicle identification and quantity statistics based on the improved YOLOv5 according to claim 1, wherein the specific steps of step S1 are as follows:
S11: collecting videos of motor vehicles and videos of different types of motor vehicles shot in a simulated deployment scene, and performing frame extraction on the shot videos to generate a motor vehicle data set;
S12: dividing the motor vehicle data set into a training set and a verification set;
S13: performing data enhancement on the training set and the verification set respectively to obtain an enhanced training set and an enhanced verification set.
3. The method for monitoring the vehicle identification and quantity statistics based on the improved YOLOv5 according to claim 2, wherein the specific steps of step S2 are as follows:
S21: building a YOLOv5 algorithm model, and adding a crisscross (Criss-Cross) attention mechanism to the YOLOv5 network structure, wherein the formula of the crisscross attention mechanism is as follows:
H′_u = Σ_{i=1}^{|T|} A_{i,u}·Φ_{i,u} + H_u, with |T| = H + W − 1;
wherein H′_u represents the output vector at the u-th position; T represents the set of positions on the criss-cross path of position u, whose length is H + W − 1, H representing the height and W the width of the feature map; A_{i,u} is the attention weight used to weight information from different positions, representing the importance of the i-th position to the u-th position; Φ_{i,u} represents the feature vector obtained through affine transformation; H_u represents the input vector at the u-th position;
S22: inputting data into the improved YOLOv5 algorithm model and training it to obtain the algorithm model weights, thereby obtaining the YOLOv5 motor vehicle identification statistical model.
4. The method for monitoring the identification and quantity statistics of vehicles based on improved YOLOv5 according to claim 3, wherein the specific steps of step S3 are as follows:
S31: taking a number of samples from the verification set generated in step S1 to build a calibration data set;
S32: writing a calibration data loader to generate an IInt8 relative-entropy calibrator;
S33: configuring the parameters required for building the INT8 quantization model, carrying out INT8 quantization, continuously adjusting the threshold value, and calculating the relative entropy to obtain an optimal solution;
S34: carrying out INT8 quantization on the YOLOv5 motor vehicle identification statistical model according to the calculated relative entropy, simultaneously reading the calibration data set of step S31, collecting the histogram of each layer's activation values, calculating the minimum-KL-divergence threshold by the KL divergence calibration method, and carrying out model calibration to obtain the INT8-quantized engine model.
5. The method for monitoring the identification and quantity statistics of vehicles based on the improved YOLOv5 according to claim 4, wherein the formula of the KL divergence calibration method in step S34 is:
KL(P||Q)=ΣP(x)*log(P(x)/Q(x));
wherein P represents the actual probability distribution, Q represents the probability distribution output by the model, KL (P||Q) represents the KL divergence, which is used to measure the difference between the two probability distributions P and Q, P (x) represents the probability of the probability distribution P over event x, Q (x) represents the probability of the probability distribution Q over event x, and Σ represents the summation.
6. The method for monitoring the vehicle identification and quantity statistics based on the improved YOLOv5 according to claim 4, wherein the specific steps of step S5 are as follows: inputting each frame of image data acquired in step S4 into the engine model obtained in step S3, and detecting each frame of image data; if a motor vehicle is detected in the current frame of image data, storing the detection result; the coordinates of the four corners of the rectangular frame of the detected motor vehicle are stored in an array, and the coordinates of the center point of the rectangular frame are calculated as c_x = (x_left + x_right)/2 and c_y = (y_left + y_right)/2, wherein (x_left, y_left) and (x_right, y_right) represent the coordinates of the upper-left and lower-right corners of the rectangular frame, and (c_x, c_y) represents the center point of the rectangular frame.
7. The method for monitoring the vehicle identification and quantity statistics based on the improved YOLOv5 of claim 6, wherein the specific steps of step S6 are as follows:
S61: detecting the motor vehicles appearing in each frame of image data with a pre-trained multi-target tracking model, extracting the features of each motor vehicle, and assigning each motor vehicle an ID;
S62: using the detection result of the engine model as the target-frame input of the multi-target tracking model, and taking the obtained track as the current-frame track;
S63: matching the target frame of the current frame of image data with the track by intersection over union (IOU), predicting the target-frame state of the next frame of image data from the track state by Kalman filtering, and updating all track states with the Kalman-filter observation and estimation values, thereby completing motor vehicle ID tracking.
8. The method for monitoring the vehicle identification and quantity statistics based on the improved YOLOv5 of claim 7, wherein the specific steps of step S7 are as follows:
S71: judging whether the motor vehicle enters a specified area according to the center-point coordinates of the motor vehicle detected in step S5; the specified area is drawn with opencv, and the target area is obtained by taking the coordinate points of a polygon as parameters;
S72: judging whether the motor vehicle passes through the target area for the first time within a certain time period; if so, marking the motor vehicle as counted and adding one to the counted number of motor vehicles, then continuing to track the motor vehicle with the multi-target tracking model to prevent the vehicle from being counted again when it passes through the target area another time; if not, not counting the vehicle;
S73: saving the number of motor vehicles passing through the target area to a database, and ending.
9. The method for monitoring the vehicle identification and quantity statistics based on the improved YOLOv5 according to claim 8, wherein the video of the image capturing device is read in step S4 in the form of video or RTSP pull stream.
10. The method for monitoring the identification and quantity statistics of vehicles based on improved YOLOv5 according to claim 5, wherein step S1 uses the Mosaic mode for data enhancement.
CN202311022687.4A 2023-08-15 2023-08-15 Vehicle identification and quantity statistics monitoring method based on improved YOLOv5 Pending CN117037085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311022687.4A CN117037085A (en) 2023-08-15 2023-08-15 Vehicle identification and quantity statistics monitoring method based on improved YOLOv5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311022687.4A CN117037085A (en) 2023-08-15 2023-08-15 Vehicle identification and quantity statistics monitoring method based on improved YOLOv5

Publications (1)

Publication Number Publication Date
CN117037085A true CN117037085A (en) 2023-11-10

Family

ID=88633243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311022687.4A Pending CN117037085A (en) 2023-08-15 2023-08-15 Vehicle identification and quantity statistics monitoring method based on improved YOLOv5

Country Status (1)

Country Link
CN (1) CN117037085A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117994987A (en) * 2024-04-07 2024-05-07 东南大学 Traffic parameter extraction method and related device based on target detection technology
CN117994987B (en) * 2024-04-07 2024-06-11 东南大学 Traffic parameter extraction method and related device based on target detection technology

Similar Documents

Publication Publication Date Title
Li et al. Traffic light recognition for complex scene with fusion detections
CN107563372B (en) License plate positioning method based on deep learning SSD frame
CN109558823B (en) Vehicle identification method and system for searching images by images
US7519197B2 (en) Object identification between non-overlapping cameras without direct feature matching
CN111667512A (en) Multi-target vehicle track prediction method based on improved Kalman filtering
CN111914911B (en) Vehicle re-identification method based on improved depth relative distance learning model
CN114627447A (en) Road vehicle tracking method and system based on attention mechanism and multi-target tracking
CN114170580A (en) Highway-oriented abnormal event detection method
CN110781785A (en) Traffic scene pedestrian detection method improved based on fast RCNN algorithm
CN111915583A (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN114898326A (en) Method, system and equipment for detecting reverse running of one-way vehicle based on deep learning
CN117037085A (en) Vehicle identification and quantity statistics monitoring method based on improved YOLOv5
CN116402850A (en) Multi-target tracking method for intelligent driving
CN112434566A (en) Passenger flow statistical method and device, electronic equipment and storage medium
CN113256731A (en) Target detection method and device based on monocular vision
CN113744316A (en) Multi-target tracking method based on deep neural network
CN114049610B (en) Active discovery method for motor vehicle reversing and reverse driving illegal behaviors on expressway
CN115761888A (en) Tower crane operator abnormal behavior detection method based on NL-C3D model
CN114926796A (en) Bend detection method based on novel mixed attention module
CN114842285A (en) Roadside berth number identification method and device
CN112560799B (en) Unmanned aerial vehicle intelligent vehicle target detection method based on adaptive target area search and game and application
CN109063543B (en) Video vehicle weight recognition method, system and device considering local deformation
CN115100249B (en) Intelligent factory monitoring system based on target tracking algorithm
CN110889347A (en) Density traffic flow counting method and system based on space-time counting characteristics
CN115565157A (en) Multi-camera multi-target vehicle tracking method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination