CN113052011A - Road target flow monitoring system based on computer vision - Google Patents

Road target flow monitoring system based on computer vision

Info

Publication number
CN113052011A
CN113052011A
Authority
CN
China
Prior art keywords
target
data
image
detection
computer vision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110246689.6A
Other languages
Chinese (zh)
Inventor
白冰
张静
李潇峥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lover Health Science and Technology Development Co Ltd
Original Assignee
Zhejiang Lover Health Science and Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lover Health Science and Technology Development Co Ltd filed Critical Zhejiang Lover Health Science and Technology Development Co Ltd
Priority to CN202110246689.6A priority Critical patent/CN113052011A/en
Publication of CN113052011A publication Critical patent/CN113052011A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/0104Measuring and analyzing of parameters relative to traffic conditions
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/0104Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125Traffic data processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a road target flow monitoring system based on computer vision. The research subject is image-based target monitoring: a YOLOv4 convolutional neural network model built on the machine vision library OpenCV performs batch preprocessing to delineate detection targets, the loss function of the original YOLOv4 is optimized, each recognized object drawn on a picture is framed by a modular matrix detection system, the objects are classified and counted, and the coordinates of each monitored object are output. Custom thresholds are set and compared against deviations in the acquired data. The captured video, data, and recognition results are stored and imported into a database in real time for big-data collection and analysis, so that target classification becomes more refined. This provides a foundation for visualized management, improves operating efficiency while maintaining recognition accuracy, increases detection speed, and supplies a realistic and reliable data source for subsequent planning.

Description

Road target flow monitoring system based on computer vision
Technical Field
The invention relates to the technical field of image target monitoring, in particular to a road target flow monitoring system based on computer vision.
Background
At present, common domestic traffic flow detection methods include ultrasonic detection, infrared detection, annular induction coil detection, and computer vision detection. Ultrasonic detection has low precision, is easily disturbed by vehicle occlusion and pedestrians, and has a short detection range (generally no more than 12 m). Infrared detection is affected by vehicle heat sources, has weak noise immunity, and low precision. Annular induction coils achieve high detection precision, but must be embedded in the road's civil structure, which damages the pavement, is inconvenient to construct and install, and requires a large number of units.
Disclosure of Invention
In order to overcome the problems, the invention provides a road target flow monitoring system based on computer vision, which comprises the following steps:
step one, data acquisition: the data acquisition includes static picture acquisition and dynamic video acquisition; the state and position of field detection targets are captured by a camera, and sensor devices monitor the environment; the cloud is connected to an intelligent terminal so that researchers can remotely and visually monitor and manage the traffic control system; the image acquisition and intelligent recognition system is presented in a C/S architecture, realizing interaction between the intelligent devices and a background control program;
step two, data analysis: in the data analysis process the filter is user-defined, and background removal, thresholding, erosion, bilateral filtering, median blurring, or morphological dilation is applied according to the collected samples to obtain more accurate results;
and step three, data storage: the acquired video, data, and recognition results are stored and imported into a database in real time for big-data collection and analysis, so that target classification is more detailed; the database is connected to an upper computer, realizing data transmission between the database and the upper computer.
Preferably, in step two, the erosion process eliminates the boundary points of the image on the canvas, shrinking the boundary inward according to the specified parameters so as to eliminate objects that are meaningless relative to the detection target.
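As an illustrative sketch (not the patent's actual code), the erosion step described above can be written for a binary image in pure Python: a foreground pixel survives only if its entire 3 × 3 neighborhood is foreground, so the object's boundary ring is stripped away, analogous to `cv2.erode` with a 3 × 3 square kernel.

```python
def erode(img, iterations=1):
    """Binary erosion with a 3x3 square structuring element.

    A foreground pixel (1) is kept only when all of its 3x3
    neighbours are also foreground; boundary points are removed,
    shrinking the object inward as the text describes.
    Out-of-bounds neighbours count as background.
    """
    h, w = len(img), len(img[0])
    for _ in range(iterations):
        out = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                if all(
                    0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                ):
                    out[y][x] = 1
        img = out
    return img
```

One pass over a 5 × 5 square of foreground pixels leaves only its 3 × 3 core, mirroring the "shrink the boundary inward" behavior.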
Preferably, in step two, the median blurring of the image uses the medianBlur function as a nonlinear filter, taking the median of the current neighborhood as the gray value of the current point.
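A minimal pure-Python sketch of this median filter (an illustration, not OpenCV's optimized `medianBlur`): each output pixel takes the median gray value of its k × k neighborhood, with edge pixels replicated.

```python
import statistics

def median_blur(img, k=3):
    """Median filter: each pixel becomes the median gray value of its
    k x k neighborhood (edges replicated, as cv2.medianBlur does).
    This removes isolated salt-and-pepper noise without smearing edges."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            nb = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            ]
            out[y][x] = statistics.median(nb)
    return out
```

A single bright noise pixel in a flat gray region is replaced by the neighborhood median and disappears.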
Preferably, the object recognition is performed through a YOLO network, and specifically includes the following steps:
step one, information is read from the camera in real time and fed into the YOLO network, and the picture is equally divided into a number of grid cells;
step two, each grid cell predicts whether a target center point falls inside it; if so, target detection is performed. This work must determine both whether the target belongs to a detection class and where it is located, and predicting the target position requires the following parameters: the position of the center point of the target frame, the length and width of the target frame, and the confidence, which is used to judge whether the content in the selected frame belongs to the detection class;
and step three, the confidence is calculated. When sampling is used to estimate population parameters, a probabilistic statement is adopted: the estimate and the population parameter lie within a certain allowable error range, and the corresponding probability is the quantity to be calculated.
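As a hedged sketch of how the confidence in step three might be computed (the helper names `iou` and `confidence` are assumptions, not from the patent), a YOLO-style confidence multiplies the objectness probability by the intersection-over-union between the predicted frame and the ground-truth frame, with boxes given as (center x, center y, width, height) as the parameters above suggest:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def confidence(p_object, pred_box, truth_box):
    """YOLO-style confidence: Pr(object) * IoU(predicted, ground truth)."""
    return p_object * iou(pred_box, truth_box)
```

A perfect box overlap gives confidence equal to the objectness probability; a poor overlap drives it toward zero.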
Preferably, during capture, a single object with a given label in the detection image is first localized; thereafter, all objects with that label in the detection image are detected.
Preferably, within a 60-second window the camera's shooting area is colored according to traffic flow: fewer than 10 vehicles, white; at least 10 and fewer than 20, green; at least 20 and fewer than 30, blue; at least 30 and fewer than 50, yellow; 50 or more, red.
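The color thresholds above map directly onto a small lookup function; a sketch follows (the function name `region_color` is an assumption for illustration):

```python
def region_color(vehicle_count):
    """Map the 60-second vehicle count to the display color of the
    camera's shooting area, per the thresholds in the text."""
    if vehicle_count < 10:
        return "white"
    if vehicle_count < 20:
        return "green"
    if vehicle_count < 30:
        return "blue"
    if vehicle_count < 50:
        return "yellow"
    return "red"
```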
For the normalized exponential function softmax, the multi-class result is expressed in the form of numerical probabilities; to guarantee the non-negativity of these probabilities, softmax maps the model's raw predictions through the exponential function.
The larger the loss function, the smaller the probability the class occupies on the label and the worse the performance, so the softmax function easily runs into numerical stability problems in such details. YOLOv4 replaces the softmax function with multiple independent logistic classifiers, greatly reducing the probability of this problem, and uses them to compute the likelihood that the input belongs to a particular label. When computing the classification loss, binary cross-entropy is applied to each label, which avoids the softmax function and reduces computational complexity.
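To make the contrast concrete, here is a minimal pure-Python sketch (an illustration, not YOLOv4's implementation) of softmax versus independent logistic (sigmoid) classifiers with per-label binary cross-entropy, as described above:

```python
import math

def softmax(logits):
    """Normalized exponential: subtract the max first, the standard
    trick for the numerical stability issue mentioned in the text."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(z):
    """Independent logistic classifier for one label."""
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Binary cross-entropy for one label (y is 0 or 1)."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

logits = [2.0, -1.0, 0.5]
probs = [sigmoid(z) for z in logits]   # per-label probabilities, need not sum to 1
labels = [1, 0, 0]
loss = sum(bce(p, y) for p, y in zip(probs, labels))  # classification loss
```

Unlike softmax, the sigmoid probabilities are independent, so multiple labels can be active at once and each contributes its own binary cross-entropy term.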
Recall increases as the number of predicted labels grows, while Precision fluctuates up and down.
The network structure of YOLO has 5 layers: the input layer, which reads in data from the detected image/video; the convolution layer, which reassembles the local features of the object through a weight matrix into a complete image; the pooling layer, which selects the maximum value of the corresponding pixels in the convolved feature map as the pooled feature, used to reduce feature dimensionality, compress the amount of data and parameters, reduce overfitting, and improve the fault tolerance of the model; the fully-connected layer; and the output layer.
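The pooling layer's max operation can be sketched in a few lines of pure Python (illustrative only): a 2 × 2 window with stride 2 keeps the strongest response in each window, halving each side of the feature map.

```python
def max_pool(feat, size=2):
    """Max pooling with a size x size window and matching stride:
    keeps the maximum value in each window, compressing the feature
    map as described for the pooling layer in the text."""
    h, w = len(feat) // size, len(feat[0]) // size
    return [
        [
            max(
                feat[y * size + dy][x * size + dx]
                for dy in range(size) for dx in range(size)
            )
            for x in range(w)
        ]
        for y in range(h)
    ]
```

A 4 × 4 feature map collapses to 2 × 2, retaining only the largest activation per window.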
Compared with the prior art, the invention has the beneficial effects that:
the method is based on a yolov4 convolutional neural network model of a machine vision library opencv, performs batch preprocessing to draw detection targets, optimizes a loss function of an original yolov4, selects each drawn recognition object on a picture by using a modular matrix detection system frame, classifies and counts the objects, and outputs the coordinates of each monitored object. Setting and comparing self-defined critical values for the deviation of the acquired data; the obtained video, data and identification results are stored and are imported into a database in real time, and big data are collected and analyzed, so that the classification of the targets is more refined, a foundation is provided for imaging management, the operation efficiency is improved while the identification precision is ensured, the detection speed is increased, and a realistic and reliable data source is provided for subsequent planning.
Through big-data analysis and statistics, the neural network can derive control for a proposed driving program, customize parameters that are optimized in real time, and uniformly complete feature extraction and monitoring-target classification in a network environment. Custom thresholds are set and compared against deviations in the acquired data; the captured video, data, and recognition results are stored and imported into a database in real time for big-data collection and analysis, so that target classification becomes more refined, providing a foundation for visualized management, improving operating efficiency while maintaining recognition accuracy, increasing detection speed, and supplying a realistic and reliable data source for subsequent planning.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a data processing flow diagram of the present invention;
FIG. 2 is a confusion matrix diagram;
fig. 3 is a diagram of a YOLO network structure.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to fig. 1 to 3, the present invention provides the following technical solutions:
example one
A road target flow monitoring system based on computer vision comprises the following steps:
step one, data acquisition: the data acquisition includes static picture acquisition and dynamic video acquisition; the state and position of field detection targets are captured by a camera, and sensor devices monitor the environment; the cloud is connected to an intelligent terminal so that researchers can remotely and visually monitor and manage the traffic control system; the image acquisition and intelligent recognition system is presented in a C/S architecture, realizing interaction between the intelligent devices and a background control program;
step two, data analysis: in the data analysis process the filter is user-defined, and background removal, thresholding, erosion, bilateral filtering, median blurring, or morphological dilation is applied according to the collected samples to obtain more accurate results;
and step three, data storage: the acquired video, data, and recognition results are stored and imported into a database in real time for big-data collection and analysis, so that target classification is more detailed; the database is connected to an upper computer, realizing data transmission between the database and the upper computer.
In step two, the purpose of the erosion processing is to eliminate the boundary points of the image on the canvas and shrink the boundary inward according to the specified parameters, thereby eliminating objects that are meaningless relative to the detection target.
In step two, the median blurring of the image uses the medianBlur function as a nonlinear filter, taking the median of the current neighborhood as the gray value of the current point.
The process of identifying the object by the YOLO network is as follows:
1. Information is read from the camera in real time and fed into the YOLO network, and the picture is equally divided into a 7 × 7 grid;
2. each grid cell predicts whether a target center point falls inside it; if so, target detection is performed. This work must determine both whether the target belongs to a detection class and where it is located, and predicting the target position requires the following parameters: the position of the center point of the target frame, the length and width of the target frame, and the confidence, which is used to judge whether the content in the selected frame belongs to the detection class;
3. the confidence is calculated. When sampling is used to estimate population parameters, the conclusion is always uncertain because of sample randomness. A probabilistic statement is therefore adopted: the estimate and the population parameter lie within a certain allowable error range, and the corresponding probability is the quantity to be calculated.
The confidence of the j-th bounding box of the i-th grid cell is C_i^j = Pr(Object) · IOU(pred, truth).
During shooting, a single object with a given label in the detection image is first localized, and then all objects with that label in the detection image are detected.
The camera's shooting area is evaluated over a 60-second window: traffic flow less than 10, the area is white; at least 10 and less than 20, green; at least 20 and less than 30, blue; at least 30 and less than 50, yellow; 50 or more, red.
The network structure of YOLO comprises an input layer, a convolution layer, a pooling layer, a fully-connected layer, and an output layer. The input layer reads data from the detected image/video; the convolution layer reassembles the local features of the object through a weight matrix to obtain a complete image; and the pooling layer selects the maximum value of the corresponding pixels in the convolved feature map as the pooled feature, used for feature dimensionality reduction, compressing the amount of data and parameters, reducing overfitting, and improving the fault tolerance of the model.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a program, and the program may be stored in a computer readable storage medium.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (10)

1. A road target flow monitoring system based on computer vision is characterized by comprising the following steps:
step one, data acquisition: the data acquisition includes static picture acquisition and dynamic video acquisition; the state and position of field detection targets are captured by a camera, and sensor devices monitor the environment; the cloud is connected to an intelligent terminal so that researchers can remotely and visually monitor and manage the traffic control system; the image acquisition and intelligent recognition system is presented in a C/S architecture, realizing interaction between the intelligent devices and a background control program;
step two, data analysis: in the data analysis process the filter is user-defined, and background removal, thresholding, erosion, bilateral filtering, median blurring, or morphological dilation is applied according to the collected samples to obtain more accurate results;
and step three, data storage: the acquired video, data, and recognition results are stored and imported into a database in real time for big-data collection and analysis, so that target classification is more detailed; the database is connected to an upper computer, realizing data transmission between the database and the upper computer.
2. The system of claim 1, wherein in step two the erosion process eliminates boundary points of the image on the canvas and shrinks the boundary inward according to the specified parameters, eliminating objects that are meaningless relative to the detection target.
3. The computer-vision-based road target flow monitoring system of claim 1, wherein in step two the median blurring of the image uses the medianBlur function as a nonlinear filter, taking the median of the current neighborhood as the gray value of the current point.
4. The system for monitoring the road target flow based on the computer vision as claimed in claim 1, wherein the object recognition is performed through a YOLO network, which includes the following steps:
step one, information is read from the camera in real time and fed into the YOLO network, and the picture is equally divided into a number of grid cells;
step two, each grid cell predicts whether a target center point falls inside it; if so, target detection is performed. This work must determine both whether the target belongs to a detection class and where it is located, and predicting the target position requires the following parameters: the position of the center point of the target frame, the length and width of the target frame, and the confidence, which is used to judge whether the content in the selected frame belongs to the detection class;
and step three, the confidence is calculated. When sampling is used to estimate population parameters, a probabilistic statement is adopted: the estimate and the population parameter lie within a certain allowable error range, and the corresponding probability is the quantity to be calculated.
5. The system of claim 4, wherein during the capturing process, a single object with a given tag in the detected image is located, and then all objects with a given tag in the detected image are detected.
6. The computer-vision-based road target flow monitoring system of claim 4, wherein the camera's shooting area is evaluated over a 60-second window: traffic flow less than 10, the area is white; at least 10 and less than 20, green; at least 20 and less than 30, blue; at least 30 and less than 50, yellow; 50 or more, red.
7. The computer vision based road target flow monitoring system of claim 4, wherein the network structure of YOLO comprises an input layer, a convolutional layer, a pooling layer, a fully-connected layer, and an output layer.
8. The system of claim 7, wherein the input layer reads data from the detected image/video.
9. The system of claim 7, wherein the convolutional layer is a complete image obtained by reassembling local features of the object through a weight matrix.
10. The system of claim 7, wherein the pooling layer selects the maximum value of the corresponding pixel points in the convolved feature image as the pooled feature for feature dimension reduction, compresses the number of data and parameters, reduces overfitting, and improves the fault tolerance of the model.
CN202110246689.6A 2021-03-05 2021-03-05 Road target flow monitoring system based on computer vision Pending CN113052011A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110246689.6A CN113052011A (en) 2021-03-05 2021-03-05 Road target flow monitoring system based on computer vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110246689.6A CN113052011A (en) 2021-03-05 2021-03-05 Road target flow monitoring system based on computer vision

Publications (1)

Publication Number Publication Date
CN113052011A true CN113052011A (en) 2021-06-29

Family

ID=76510284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110246689.6A Pending CN113052011A (en) 2021-03-05 2021-03-05 Road target flow monitoring system based on computer vision

Country Status (1)

Country Link
CN (1) CN113052011A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358803A (en) * 2017-08-22 2017-11-17 哈尔滨理工大学 A kind of traffic signal control system and its control method
CN108205891A (en) * 2018-01-02 2018-06-26 霍*** A kind of vehicle monitoring method of monitoring area
CN108537117A (en) * 2018-03-06 2018-09-14 哈尔滨思派科技有限公司 A kind of occupant detection method and system based on deep learning
CN111554105A (en) * 2020-05-29 2020-08-18 浙江科技学院 Intelligent traffic identification and statistics method for complex traffic intersection
CN112101433A (en) * 2020-09-04 2020-12-18 东南大学 Automatic lane-dividing vehicle counting method based on YOLO V4 and DeepsORT

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358803A (en) * 2017-08-22 2017-11-17 哈尔滨理工大学 A kind of traffic signal control system and its control method
CN108205891A (en) * 2018-01-02 2018-06-26 霍*** A kind of vehicle monitoring method of monitoring area
CN108537117A (en) * 2018-03-06 2018-09-14 哈尔滨思派科技有限公司 A kind of occupant detection method and system based on deep learning
CN111554105A (en) * 2020-05-29 2020-08-18 浙江科技学院 Intelligent traffic identification and statistics method for complex traffic intersection
CN112101433A (en) * 2020-09-04 2020-12-18 东南大学 Automatic lane-dividing vehicle counting method based on YOLO V4 and DeepsORT

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZICONG JIANG et al.: "Real-time object detection method for embedded devices", arXiv:2011.04244v2 *
XU Zirui et al.: "Vehicle detection and traffic flow statistics based on YOLOv4" (in Chinese), 《现代信息科技》 (Modern Information Technology) *
HUANG Jian et al.: "Survey of object detection algorithms based on deep convolutional neural networks" (in Chinese), 《计算机工程与应用》 (Computer Engineering and Applications) *

Similar Documents

Publication Publication Date Title
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN109345547B (en) Traffic lane line detection method and device based on deep learning multitask network
CN112380952A (en) Power equipment infrared image real-time detection and identification method based on artificial intelligence
CN112016461B (en) Multi-target behavior recognition method and system
CN111832398B (en) Unmanned aerial vehicle image distribution line pole tower ground wire broken strand image detection method
CN113436184B (en) Power equipment image defect discriminating method and system based on improved twin network
CN111462155B (en) Motion detection method, device, computer equipment and storage medium
CN111223129A (en) Detection method, detection device, monitoring equipment and computer readable storage medium
CN110599458A (en) Underground pipe network detection and evaluation cloud system based on convolutional neural network
CN110826577A (en) High-voltage isolating switch state tracking identification method based on target tracking
CN114155474A (en) Damage identification technology based on video semantic segmentation algorithm
CN114627406A (en) Method, system, equipment and medium for identifying rapid crowd gathering behaviors
CN115311623A (en) Equipment oil leakage detection method and system based on infrared thermal imaging
CN118038021A (en) Transformer substation operation site foreign matter intrusion detection method based on improvement yolov4
CN117475327A (en) Multi-target detection positioning method and system based on remote sensing image in city
CN116205905A (en) Power distribution network construction safety and quality image detection method and system based on mobile terminal
CN116908184A (en) Ground wire crimping detection system and detection method thereof
CN110599460A (en) Underground pipe network detection and evaluation cloud system based on hybrid convolutional neural network
CN113052011A (en) Road target flow monitoring system based on computer vision
CN116168213A (en) People flow data identification method and training method of people flow data identification model
CN116051539A (en) Diagnosis method for heating fault of power transformation equipment
CN114648736A (en) Robust engineering vehicle identification method and system based on target detection
CN114283323A (en) Marine target recognition system based on image deep learning
CN114549628A (en) Power pole inclination detection method, device, equipment and storage medium
CN115100546A (en) Mobile-based small target defect identification method and system for power equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210629