Non-motor vehicle detection method
Technical Field
The invention relates to a detection method, in particular to a non-motor vehicle detection method based on deep learning, and belongs to the technical field of artificial intelligence.
Background
With the rapid development of computer technology in recent years, many related technologies have advanced quickly, and among them target detection has received increasing attention. Target detection is an important branch of image processing and computer vision; its research methods fall mainly into background-modeling-based methods and feature-based detection methods, i.e., detecting and locating a specified target, here the non-motor vehicle, in an actual traffic image. Until now, however, few researchers have studied target detection for non-motorized vehicles such as bicycles. In addition, because a non-motor vehicle target in a road traffic image may change with illumination, viewing angle, and occlusion by the rider, the prior art is difficult to apply directly to non-motor vehicle detection.
In conventional target detection methods, such as HOG (Histogram of Oriented Gradients) proposed by Dalal and Triggs, SIFT (Scale-Invariant Feature Transform) proposed by Lowe, and other machine learning methods, classification and recognition are performed mainly by extracting target features and feeding them into classifiers such as an SVM (Support Vector Machine) or AdaBoost. In practice, the feature extraction process is complex and largely empirical: the features are designed by hand, and suitable features must be chosen for each kind of image, which hinders large-scale application and yields poor generalization.
With the rise of deep learning, deep learning methods were rapidly applied to target detection, with the deep Convolutional Neural Network (CNN) being the most prominent. Unlike traditional feature extraction algorithms that rely on prior knowledge, a CNN exhibits a degree of invariance to translation, deformation, illumination changes, and the like, which effectively mitigates the difficulty that the variable appearance of non-motor vehicles poses to detection; it can build feature descriptions adaptively, driven by the training data, and it offers high flexibility and strong generalization. In 2013, R-CNN pioneered the application of deep learning to target detection by innovatively combining traditional machine learning with deep learning; optimized detection networks such as SPP-Net followed, and the subsequent Fast R-CNN integrated the advantages of R-CNN and SPP-Net, making fast and accurate detection of non-motor vehicle targets in variable road traffic images possible. However, the detection performance of Fast R-CNN depends on the number of object proposals (OPs) extracted from the sample images; extracting a large number of OPs is time-consuming and laborious, increases the burden of model training, and makes overfitting likely during training, placing high demands on the network and the hardware.
In general, the above methods suffer from various shortcomings such as low accuracy, poor robustness, and long recognition times. How to provide a new non-motor vehicle detection method that overcomes these deficiencies of the prior art has therefore become a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, an object of the present invention is to provide a method for detecting a non-motor vehicle based on deep learning, which is as follows.
A non-motor vehicle detection method comprising the steps of:
S1, image preprocessing: acquiring a video image shot by a traffic video camera and standardizing its specification;
and S2, detecting the non-motor vehicle by using a YOLOv3 network on the OpenCV platform and combining the preprocessed video image to obtain a detection result.
Preferably, the image preprocessing of S1 includes the following steps: acquiring the video images shot by a traffic video camera, uniformly resizing them to 416 × 416 pixels, and collecting the resized video images.
Preferably, the non-motor vehicle detection of S2 includes the following steps:
S21, generating bounding boxes by using a YOLOv3 network on an OpenCV platform and outputting them as the predicted detections;
S22, inputting the preprocessed video image into the YOLOv3 network, which decomposes the video image into a picture containing a plurality of grid cells, and then processing the picture cell by cell to predict a plurality of bounding boxes per cell;
S23, after the YOLOv3 network reads a frame from the input video, converting the frame into the input format (a blob) required by the neural network through the blobFromImage function;
S24, passing the blob to the neural network as input and running a forward pass, then post-processing the resulting list of predicted bounding boxes as the network output;
and S25, drawing a bounding box subjected to non-maximum suppression filtering on the input frame, and assigning a class label and a confidence score to the bounding box to finally obtain a detection result.
Preferably, S21 includes the steps of: using bounding boxes generated by the YOLOv3 network as the predicted detection outputs, each prediction box being associated with a confidence score; in the first stage, all bounding boxes with confidence below a threshold are ignored, and non-maximum suppression is applied to the remaining boxes.
Preferably, the even decomposition of the video image in S22 includes the steps of: decomposing the video image evenly into a picture containing 13 × 13 = 169 grid cells, the size of the 169 grid cells varying with the size of the input video image.
Preferably, each grid cell is 32 x 32 pixels in size.
Preferably, between S22 and S23, the method further comprises the following steps:
and S220, starting the video writer to store the detected frame with the output bounding box.
Preferably, converting the frame into the input format blob required by the neural network through the blobFromImage function in S23 includes the following steps: the blobFromImage function scales the pixel values of the image to the range [0, 1] using a scaling factor of 1/255.
Preferably, S24 further includes the step of: post-processing the obtained predicted bounding boxes and filtering out boxes with low confidence.
Preferably, in S25: the detection result is marked with a bounding box, and when a motor vehicle and a non-motor vehicle appear in the same lane, the non-motor vehicle is flagged as having entered the motor vehicle lane, completing the non-motor vehicle detection.
The advantages of the invention are mainly embodied in the following aspects:
According to the non-motor vehicle detection method, the processed image is input into a target detection framework based on OpenCV and YOLOv3 deep learning, and motor vehicles and non-motor vehicles in the image are marked with bounding boxes, so that non-motor vehicles can be detected when they appear in a motor vehicle lane. The method has a short running time and high recognition rate and accuracy, and can effectively detect non-motor vehicles entering the motor vehicle lane.
In addition, the method has very wide application prospect. Researchers can expand and extend on the basis of the technical scheme of the invention, and apply similar technologies to other technical schemes such as a detection recognition alarm system of non-motor vehicles, an intelligent non-motor vehicle recognition tracking system and the like in the same field to realize the popularization and application of the technology.
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings to facilitate understanding of the technical solutions of the invention.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the preprocessing process of the present invention;
FIG. 3 is a diagram showing the detection results of the present invention.
Detailed Description
The invention provides a non-motor vehicle detection technology based on deep learning, which can quickly and accurately detect a non-motor vehicle entering a motor vehicle lane, and belongs to the field of computer vision.
As shown in fig. 1, a non-motor vehicle detection method includes the following steps.
S1, image preprocessing. Because images shot by traffic video cameras differ in size, the video images shot by a traffic video camera are acquired here and their specification standardized: the video images are uniformly resized to 416 × 416 pixels, and the resized video images are collected; the result is shown in fig. 2.
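The resizing in S1 can be sketched in a few lines. This is an illustrative sketch only, not the patented implementation: a real pipeline would typically call `cv2.resize(frame, (416, 416))`, but a plain-NumPy nearest-neighbour version shows the idea without an OpenCV dependency (the function name and the 720p test frame are made up for illustration):

```python
import numpy as np

def resize_to_yolo_input(frame: np.ndarray, size: int = 416) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x C frame to size x size.

    Illustrative only: production code would use cv2.resize, which
    offers proper interpolation (bilinear, area, ...).
    """
    h, w = frame.shape[:2]
    # Map each output pixel back to its nearest source pixel.
    rows = (np.arange(size) * h // size).clip(0, h - 1)
    cols = (np.arange(size) * w // size).clip(0, w - 1)
    return frame[rows[:, None], cols]

# A dummy 720p "camera frame" standardised to the 416 x 416 network input.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
print(resize_to_yolo_input(frame).shape)  # (416, 416, 3)
```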
S2, non-motor vehicle detection. The YOLOv3 framework has been supported natively since OpenCV 3.4.2; a YOLOv3 network on the OpenCV platform is used, in combination with the preprocessed video image, to obtain the non-motor vehicle detection result. The method specifically comprises the following steps:
S21, generating bounding boxes by using a YOLOv3 network on the OpenCV platform and outputting them as the predicted detections, each prediction box being associated with a confidence score. To simplify subsequent processing, in the first stage all bounding boxes with confidence below a threshold are ignored, and non-maximum suppression is applied to the remaining boxes, eliminating redundant overlapping bounding boxes.
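In an OpenCV pipeline the two-stage filtering of S21 is usually done with `cv2.dnn.NMSBoxes`; the minimal NumPy sketch below shows what the two stages compute (the threshold values 0.5 and 0.4 are common defaults assumed here, not taken from the patent):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_and_nms(boxes, scores, conf_thresh=0.5, nms_thresh=0.4):
    """Stage 1: drop low-confidence boxes. Stage 2: greedy NMS."""
    keep_mask = scores >= conf_thresh              # stage 1
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(scores)[::-1]               # highest confidence first
    kept = []
    while order.size:
        i = order[0]
        kept.append(i)
        # Suppress boxes that overlap the current best above nms_thresh.
        order = order[1:][[iou(boxes[i], boxes[j]) < nms_thresh for j in order[1:]]]
    return boxes[kept], scores[kept]

# Two heavily overlapping boxes plus one distinct box: NMS keeps two.
boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept_boxes, kept_scores = filter_and_nms(boxes, scores)
print(len(kept_boxes))  # 2
```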
S22, inputting the preprocessed video image into the YOLOv3 network, which decomposes the video image evenly into a picture containing 13 × 13 = 169 grid cells, the cell size varying with the size of the input video image; in the present embodiment, for a 416 × 416 pixel video image, each grid cell is 32 × 32 pixels. The picture is then processed cell by cell, and a plurality of bounding boxes are predicted per cell.
At this point, the video writer needs to be started to save the detected frames with the output bounding boxes.
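The 13 × 13 decomposition in S22 follows from YOLOv3's stride of 32 at its coarsest detection scale (the full network actually predicts at three scales; only the coarsest is discussed here). A quick check of the arithmetic, plus the mapping from a hypothetical object centre to the grid cell responsible for it:

```python
input_size = 416
stride = 32                                  # YOLOv3's coarsest detection stride
cells_per_side = input_size // stride
print(cells_per_side, cells_per_side ** 2)   # 13 169

# The responsible cell is the one containing the object's centre.
cx, cy = 200, 75                             # hypothetical object centre, in pixels
col, row = cx // stride, cy // stride
print((col, row))                            # (6, 2)
```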
S23, after the YOLOv3 network reads a frame from the input video, the frame is converted into the input format (a blob) required by the neural network through the blobFromImage function; in this process, the blobFromImage function scales the image pixel values to the range [0, 1] using a scaling factor of 1/255.
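The conversion in S23 corresponds to the common call `cv2.dnn.blobFromImage(frame, 1/255, (416, 416), swapRB=True, crop=False)`. What that produces can be approximated in NumPy to make the format explicit — a sketch of the scaling and memory layout only (the BGR-to-RGB channel swap is omitted):

```python
import numpy as np

def to_blob(frame: np.ndarray) -> np.ndarray:
    """Approximate cv2.dnn.blobFromImage: scale pixels by 1/255 and
    reorder H x W x C to the N x C x H x W float32 blob layout."""
    scaled = frame.astype(np.float32) / 255.0   # pixel values into [0, 1]
    chw = np.transpose(scaled, (2, 0, 1))       # channels first
    return chw[np.newaxis, ...]                 # add the batch dimension

frame = np.full((416, 416, 3), 255, dtype=np.uint8)
blob = to_blob(frame)
print(blob.shape, blob.max())  # (1, 3, 416, 416) 1.0
```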
S24, the blob is passed to the neural network as input and a forward pass is run; the resulting list of predicted bounding boxes is post-processed as the network output, filtering out boxes with low confidence.
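The post-processing in S24 walks the raw output rows. Below is a sketch assuming the standard Darknet output layout per row — `[cx, cy, w, h, objectness, class scores…]` with box coordinates normalised to [0, 1] — and COCO's 80 classes; the synthetic rows and the 0.5 threshold are illustrative assumptions, not values from the patent:

```python
import numpy as np

def parse_detections(outputs, conf_thresh=0.5, img_w=416, img_h=416):
    """Turn raw YOLO output rows into (class_id, confidence, left, top, w, h)
    tuples in pixel space, dropping low-confidence rows."""
    results = []
    for row in outputs:
        class_scores = row[5:]
        class_id = int(np.argmax(class_scores))
        confidence = float(class_scores[class_id])
        if confidence < conf_thresh:            # filter low-confidence boxes
            continue
        cx, cy, w, h = row[:4]
        left = int((cx - w / 2) * img_w)        # centre form -> top-left form
        top = int((cy - h / 2) * img_h)
        results.append((class_id, confidence, left, top,
                        int(w * img_w), int(h * img_h)))
    return results

# Two synthetic 85-element rows: one confident detection, one below threshold.
rows = np.array([
    [0.5, 0.5, 0.2, 0.4, 0.9, 0.0, 0.95] + [0.0] * 78,  # class 1, conf 0.95
    [0.1, 0.1, 0.1, 0.1, 0.3, 0.2] + [0.0] * 79,        # conf 0.2, dropped
])
print(parse_detections(rows))
```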
S25, drawing a bounding box subjected to non-maximum suppression filtering on the input frame, and assigning its category label and confidence score, and finally obtaining the detection result, as shown in fig. 3.
The detection result is marked with a bounding box, and when a motor vehicle and a non-motor vehicle appear in the same lane, the non-motor vehicle is flagged as having entered the motor vehicle lane, completing the non-motor vehicle detection.
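The same-lane test above can be reduced to a horizontal-overlap check between a motor vehicle box and a non-motor vehicle box. The lane-as-horizontal-interval model and all coordinates below are simplifying assumptions for illustration; the patent does not specify the lane geometry:

```python
def same_lane(box_a, box_b, min_overlap=0.5):
    """Boxes are (left, top, width, height). Two vehicles are treated as
    sharing a lane when their horizontal extents overlap by at least
    min_overlap of the narrower box -- a simplifying assumption."""
    a1, a2 = box_a[0], box_a[0] + box_a[2]
    b1, b2 = box_b[0], box_b[0] + box_b[2]
    overlap = max(0, min(a2, b2) - max(a1, b1))
    return overlap >= min_overlap * min(box_a[2], box_b[2])

car = (100, 50, 120, 80)   # hypothetical motor vehicle box
bike = (150, 140, 60, 70)  # hypothetical non-motor vehicle box
if same_lane(car, bike):
    print("non-motor vehicle has entered the motor vehicle lane")
```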
To verify the effectiveness of the above method, image data sets of bicycles and electric vehicles (1500 images each) were selected and identified, as shown in Table 1.
TABLE 1 Experimental results

| Name | Total images | Identified | Recognition rate | Run time |
| Bicycle | 1500 | 1421 | 94.73% | 159s |
| Electric vehicle | 1500 | 1397 | 93.13% | 164s |
As can be seen from Table 1, the method of the present invention achieves a high recognition rate and a short running time.
Subsequently, a comparative experiment was carried out between the method of the invention and existing deep learning algorithms: 1800 images of non-motor vehicles entering the motor vehicle lane were selected for recognition, with the results shown in Table 2.
Table 2 Comparative experimental results

| Name | Correctly identified | Accuracy | Run time |
| Method of the invention | 1697 | 94.28% | 193s |
| Fast R-CNN | 1525 | 84.72% | 392s |
| SPP-Net | 1393 | 77.39% | 469s |
| R-CNN | 1424 | 79.11% | 1065s |
As shown in Table 2, the method of the present invention achieves high accuracy and a short running time.
In summary, according to the non-motor vehicle detection method provided by the invention, the processed image is input into a target detection framework based on OpenCV and YOLOv3 deep learning, and the motor vehicles and non-motor vehicles in the image are marked with bounding boxes, so that detection is achieved when a non-motor vehicle appears in a motor vehicle lane. The method has a short running time and high recognition rate and accuracy, and can effectively detect non-motor vehicles entering the motor vehicle lane.
In addition, the method has very wide application prospect. Researchers can expand and extend on the basis of the technical scheme of the invention, and apply similar technologies to other technical schemes such as a detection recognition alarm system of non-motor vehicles, an intelligent non-motor vehicle recognition tracking system and the like in the same field to realize the popularization and application of the technology.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description is set out in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted merely for clarity. Those skilled in the art should take the description as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.