CN111062384B - Vehicle window accurate positioning method based on deep learning

Vehicle window accurate positioning method based on deep learning

Info

Publication number
CN111062384B
CN111062384B (application CN201911089593.2A)
Authority
CN
China
Prior art keywords
convolution
vehicle
frame
picture
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911089593.2A
Other languages
Chinese (zh)
Other versions
CN111062384A (en)
Inventor
韩梦江
楼燚航
白燕
张永祥
陈杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Boyun Vision Beijing Technology Co ltd
Original Assignee
Boyun Vision Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Boyun Vision Beijing Technology Co ltd filed Critical Boyun Vision Beijing Technology Co ltd
Priority to CN201911089593.2A priority Critical patent/CN111062384B/en
Publication of CN111062384A publication Critical patent/CN111062384A/en
Application granted granted Critical
Publication of CN111062384B publication Critical patent/CN111062384B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based method for accurately positioning a vehicle window, comprising the following steps: S1, acquiring a rough positioning frame of the vehicle window in a first stage; S11, selecting a sample group and annotating the corner coordinates of the vehicle window in each picture; S12, saving the pictures and corner coordinates as a data set; S13, inputting the data set into the first-stage deep convolutional network to extract a feature map; S14, inputting the feature map into a BOX regression layer to obtain the rough positioning frame of the vehicle window; S2, acquiring the four accurate corner coordinates of the vehicle window in a second stage; S21, enlarging the rough positioning frame of the vehicle window; S22, cropping the picture inside the enlarged candidate frame; S23, converting the corner coordinates into coordinates relative to the enlarged candidate frame; S24, inputting the cropped picture into the second-stage deep convolutional network to extract a feature map, and converting the feature map into a feature vector; S25, inputting the feature vector into a linear regression layer to obtain the accurate corner coordinates of the vehicle window.

Description

Vehicle window accurate positioning method based on deep learning
Technical Field
The invention relates to the field of image processing, and in particular to a deep-learning-based method for accurately positioning vehicle windows.
Background
In recent years, intelligent transportation systems and intelligent monitoring have developed rapidly, and window recognition plays a significant role in both fields. Electronic police and vehicle checkpoint cameras capture large numbers of high-definition vehicle pictures in real time; applying these pictures effectively and extracting as much information from them as possible, to help relieve the pressure on traffic management, is a current focus of intelligent transportation and intelligent monitoring. Window recognition makes it possible to further analyze driver information, locate seat belts, and improve the accuracy of vehicle type recognition. Moreover, if the window can be positioned accurately, more interference can be excluded and information inside the vehicle can be acquired more precisely.
The goal of window positioning is to automatically identify the vehicle window in a given series of vehicle pictures taken by different cameras, covering vehicles of different colors, orientations, types and sizes.
At present, window positioning generally detects the window using informative cues such as vehicle color, texture and spatial relations. Traditional methods include the following. One roughly positions the window from the texture information of the vehicle in the picture, applying a color-space transformation and then performing texture detection; because it over-relies on the vehicle's color and texture information, the robustness of the algorithm deteriorates, and detection performance drops sharply under different illumination and vehicle colors. Another positions the window with a sliding window, using a previously located position as a reference; the accuracy and precision of the resulting window position are difficult to bring up to practical requirements.
Disclosure of Invention
The invention aims to solve the above problems by providing a deep-learning-based method that accurately positions the vehicle window and outputs the coordinates of its four corner points.
In order to achieve the above object, the technical scheme of the present invention is as follows:
A vehicle window accurate positioning method based on deep learning, comprising the following steps:
s1, acquiring a rough positioning frame of a front window of a vehicle in a first stage;
s11, selecting a vehicle picture as a sample group, and manually calibrating coordinates of four corner points of the upper left, the upper right, the lower left and the lower right of a front window in the vehicle picture;
s12, storing each vehicle picture and a front window corner coordinate picture in the vehicle picture correspondingly to form a data set;
s13, inputting a data set into a first-stage deep convolutional network, wherein the first-stage deep convolutional network is a 23-layer neural network, performing five-time convolution operation on pictures in the data set, regularizing an output characteristic graph in batches after each convolution operation, inputting the regularized output characteristic graph into an activation function, performing maximum value pooling operation after the former four convolution operations, and adding one branch to the deep convolutional network after the five convolution operations; in the two branches, one branch continues to perform five times of convolution operation and one time of full convolution operation, the other branch fuses the feature image before the branch and the feature image obtained after the former branch performs five times of convolution operation in the channel direction, and finally the two branches perform one time of convolution operation and one time of full convolution operation respectively to obtain a fused vehicle picture feature image;
s14, inputting the vehicle picture feature map and corresponding front window corner coordinates into a BOX regression layer, and returning to a rough positioning frame of the front window after optimizing a loss function;
s2, four accurate angular point coordinates of a front window of the vehicle are obtained in a second stage;
s21, expanding the front window rough positioning frame obtained in the first stage by 1.3 times in the width and height directions to obtain an expansion candidate frame;
s22, capturing and expanding the pictures in the candidate frames in the vehicle pictures to form new pictures;
s23, converting coordinates of four corner points of the front window calibrated manually into relative coordinates relative to the expansion candidate frame;
s24, inputting the new picture obtained after interception into a second-stage deep convolution network, extracting a feature map of the new picture through the second-stage deep convolution network, and converting the feature map into a feature vector through a full connection layer of the second-stage deep convolution network;
s25, inputting the feature vector and the transformed relative coordinates into a linear regression layer, and obtaining four accurate corner coordinates of the front window through regression after optimizing a loss function.
Further, in the step S14, the loss function of the BOX regression layer adopts the smooth L1 loss; the calculation formula is as follows:

$$L_{loc}(x,l,g)=\sum_{i\in Pos}^{N}\ \sum_{m\in\{cx,cy,w,h\}} x_{ij}\,\mathrm{smooth}_{L1}\left(l_{i}^{m}-\hat{g}_{j}^{m}\right)$$

wherein $x_{ij}$ is an indicator parameter: when its value is 1, the i-th default box is matched with the j-th ground truth; N is the number of candidate frames; m is a position parameter of the bounding box, where cx represents the x coordinate of the center point of the bounding box, cy represents the y coordinate of the center point, w represents the width of the bounding box, and h represents its height; $l$ is the predicted bounding-box position corresponding to the default box, and $\hat{g}$ is the corresponding ground-truth position parameter value.
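For illustration, a minimal sketch of this loss follows, where smooth_L1(x) is 0.5x^2 for |x| < 1 and |x| - 0.5 otherwise. PyTorch is an assumption (the patent names no framework), and the flat (N, 4) tensor layout with a 0/1 matched-box indicator is illustrative rather than the patent's exact data layout.

```python
import torch

def smooth_l1(x: torch.Tensor) -> torch.Tensor:
    """Element-wise smooth L1: 0.5*x^2 where |x| < 1, |x| - 0.5 elsewhere."""
    ax = x.abs()
    return torch.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def box_regression_loss(pred: torch.Tensor, gt: torch.Tensor,
                        matched: torch.Tensor) -> torch.Tensor:
    """pred, gt: (N, 4) boxes as (cx, cy, w, h); matched: (N,) float 0/1
    indicator marking default boxes matched to a ground truth."""
    per_box = smooth_l1(pred - gt).sum(dim=1)   # sum over cx, cy, w, h
    return (matched * per_box).sum() / matched.sum().clamp(min=1.0)
```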
Further, in the step S24, five convolution operations are performed on the cropped picture; the activation function after each convolution is the parametric rectified linear unit; a pooling layer performing max-pooling follows each of the first four convolutions; and a fully connected layer after the fifth convolution integrates the extracted feature map into a feature vector.
Further, the loss function of the linear regression layer in step S25 adopts an L2 norm loss function; the calculation formula is as follows:

$$J(\theta)=\sum_{i}\sum_{j=1}^{4}\left[\left(\frac{\hat{x}_{ij}-x_{ij}}{w}\right)^{2}+\left(\frac{\hat{y}_{ij}-y_{ij}}{h}\right)^{2}\right]$$

wherein θ is the weights of the second-stage deep convolutional network; i indexes the samples of each batch; j indexes the 4 corner points of the front window in each vehicle picture; $\hat{x}_{ij}$ is the x coordinate of the front window corner to be regressed and $x_{ij}$ is the ground-truth x coordinate of that corner; $\hat{y}_{ij}$ is the y coordinate of the front window corner to be regressed and $y_{ij}$ is the ground-truth y coordinate; and w and h are the width and height of the cropped picture.
Compared with the prior art, the invention has the advantages and positive effects that:
the invention provides a detection method for regression of a rough positioning frame of a vehicle window by using a deep convolution network and further for accurate angular point coordinate regression of the vehicle window.
The invention obtains the accurate coordinates of the window corner points in two stages with two neural networks, greatly improving the precision and accuracy of window positioning. The improved design combining small deep convolutional networks with a regression algorithm effectively improves the computation speed of the neural networks. Moreover, positioning the window by solving the coordinates of its four corner points accurately captures the position information of the trapezoidal window and eliminates the large amount of edge interference produced by a rectangular positioning frame, which also helps to acquire in-vehicle information more accurately.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a diagram of a training model framework in a first stage;
FIG. 2 is a block diagram of a first stage deep convolutional network;
FIG. 3 is a feature diagram of a BOX regression layer;
FIG. 4 is a model framework diagram of a BOX regression layer;
fig. 5 is a diagram of a training model framework for the second stage.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, embodiments of the invention. All other embodiments, modifications, equivalents and improvements obtained by those skilled in the art from this disclosure without inventive effort fall within the scope of the invention.
As shown in figs. 1 to 5, the invention provides a detection method that uses a deep convolutional network to regress a rough positioning frame of the vehicle window and then performs accurate corner-coordinate regression of the window.
The training model framework for the stage that acquires the rough positioning frame of the vehicle window is shown in fig. 1.
In the first stage, two data sets are input in batches when training the network model: one is a picture set containing vehicle pictures of different colors, orientations, types and sizes from real monitoring scenes; the other is the coordinates of the ground-truth window annotation frame for each vehicle picture. The deep convolutional network extracts multi-level, multi-scale features from the input pictures; these features are input into the BOX regression layer, where default boxes are matched against the annotated ground-truth boxes, and the matched default boxes are used to regress the approximate coordinates of the window positioning frame.
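As a sketch of how such paired data might be loaded, assuming PyTorch and a simple JSON index file; the patent specifies only that pictures and their ground-truth coordinates are stored correspondingly, so the file layout and field names here are hypothetical.

```python
import json
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms.functional import to_tensor

class WindowDataset(Dataset):
    """Pairs each vehicle picture with its annotated window coordinates.
    Assumed index format: [{"image": "car.jpg", "corners": [[x, y], ...]}]."""
    def __init__(self, index_file: str):
        with open(index_file) as f:
            self.items = json.load(f)

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, k: int):
        item = self.items[k]
        img = to_tensor(Image.open(item["image"]).convert("RGB"))
        corners = torch.tensor(item["corners"], dtype=torch.float32)
        return img, corners
```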
The deep convolutional network in this stage differs from a common classification network; it is a structure modified from YOLOv3-tiny, shown in fig. 2.
As shown in fig. 2, five convolution operations are performed on the input picture; after each convolution the output feature map is batch-normalized and fed into an activation function, and each of the first four convolutions is followed by a max-pooling operation, so the height and width of the feature map are halved after each of these stages. After the fifth convolution the network adds a branch, and the two branches perform different numbers of convolutions; the structure is split in two so that the network can extract feature information at different scales. One branch continues with five convolution operations and one full convolution operation. The other branch fuses, in the channel direction, the feature map from before the split with the feature map produced by the first branch's five convolutions; before the fusion, the first branch's output is up-sampled so that the two feature maps agree in height and width. Finally each branch performs one convolution operation and one full convolution operation, and the results are input into the BOX regression layer.
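A rough PyTorch sketch of this two-branch layout follows. It is not the exact 23-layer topology: the channel widths, the 320×320 input, the stride-2 convolution used to reach the 10×10 map, and 1×1 convolutions standing in for the "full convolution" heads are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

def conv_bn_act(c_in: int, c_out: int, stride: int = 1) -> nn.Sequential:
    """One stage of the trunk: convolution -> batch normalization -> activation."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class TwoBranchBackbone(nn.Module):
    def __init__(self, boxes_per_cell: int = 3):
        super().__init__()
        trunk, c = [], 3
        for i, c_out in enumerate((16, 32, 64, 128, 256)):  # five convolutions
            trunk.append(conv_bn_act(c, c_out))
            if i < 4:                      # max-pool after the first four only
                trunk.append(nn.MaxPool2d(2, 2))
            c = c_out
        self.trunk = nn.Sequential(*trunk)      # 320x320 input -> 20x20 map
        self.deep = nn.Sequential(              # branch 1: five more convolutions
            conv_bn_act(256, 512, stride=2),    # stride 2 -> 10x10 (assumption)
            conv_bn_act(512, 512), conv_bn_act(512, 512),
            conv_bn_act(512, 256), conv_bn_act(256, 256),
        )
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse = conv_bn_act(256 + 256, 256)  # channel-direction fusion
        out_c = 4 * boxes_per_cell               # 4 offsets per default box
        self.head10 = nn.Conv2d(256, out_c, 1)   # 1x1 head on the 10x10 map
        self.head20 = nn.Conv2d(256, out_c, 1)   # 1x1 head on the fused 20x20 map

    def forward(self, x: torch.Tensor):
        f20 = self.trunk(x)          # shared features before the split
        f10 = self.deep(f20)         # deep branch output
        fused = self.fuse(torch.cat([f20, self.up(f10)], dim=1))
        return self.head10(f10), self.head20(fused)
```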
As shown in fig. 3, the BOX regression layer presets default boxes of different aspect ratios (blue and red dotted lines in the figure) on feature maps of different scales, matches the default boxes to the ground-truth boxes by IoU, regresses the positions of the matched default boxes, and selects the best rough positioning frame of the window by non-maximum suppression. In practice, the network generates two feature maps, 10×10 and 20×20, and sliding windows over both are used to match default boxes to the ground truth.
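The default boxes themselves can be laid out as below; the scales and aspect ratios are assumptions, since the patent states only that boxes of different aspect ratios are preset on the two feature maps.

```python
import itertools
import torch

def default_boxes(grid: int, scale: float,
                  ratios=(1.0, 2.0, 0.5)) -> torch.Tensor:
    """Center-form (cx, cy, w, h) default boxes for a grid x grid feature map,
    normalized to [0, 1]: one box per aspect ratio at every cell center."""
    boxes = []
    for i, j in itertools.product(range(grid), repeat=2):
        cx, cy = (j + 0.5) / grid, (i + 0.5) / grid
        for r in ratios:
            boxes.append([cx, cy, scale * r ** 0.5, scale / r ** 0.5])
    return torch.tensor(boxes)

# one set per feature map, matching the 20x20 and 10x10 grids of fig. 3
priors = torch.cat([default_boxes(20, 0.3), default_boxes(10, 0.6)])
```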
As shown in fig. 4, for the preset default boxes of different aspect ratios, each default box is matched against the input ground truth by comparing the Best Jaccard Overlap computed from the Jaccard coefficient of the default box and the ground truth; if the result exceeds a preset threshold, the match is considered successful and the default box is added to the list to be regressed. The Jaccard coefficient is calculated as

$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}$$

wherein A represents the region covered by the default box and B represents the region covered by the ground-truth box.
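The coefficient is the familiar intersection-over-union; a direct sketch for corner-form boxes follows. The 0.5 threshold is an assumption, as the patent says only "a preset threshold".

```python
import torch

def jaccard(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Jaccard overlap of two (x1, y1, x2, y2) boxes: |A ∩ B| / |A ∪ B|."""
    lt = torch.max(a[:2], b[:2])        # top-left corner of the intersection
    rb = torch.min(a[2:], b[2:])        # bottom-right corner of the intersection
    wh = (rb - lt).clamp(min=0)         # zero if the boxes do not overlap
    inter = wh[0] * wh[1]
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

box_a = torch.tensor([0.20, 0.20, 0.60, 0.50])
box_b = torch.tensor([0.30, 0.25, 0.65, 0.55])
is_match = jaccard(box_a, box_b) > 0.5  # assumed matching threshold
```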
The matched candidate boxes are position-regressed so as to move closer to the ground truth; the regression loss function is the smooth L1 loss. The specific formula of the loss function is as follows:

$$L_{loc}(x,l,g)=\sum_{i\in Pos}^{N}\ \sum_{m\in\{cx,cy,w,h\}} x_{ij}\,\mathrm{smooth}_{L1}\left(l_{i}^{m}-\hat{g}_{j}^{m}\right)$$

wherein $x_{ij}$ is an indicator parameter: when its value is 1, the i-th default box is matched with the j-th ground truth; N represents the number of candidate boxes; m represents a position parameter of the bounding box, where cx represents the x coordinate of the center of the bounding box, cy represents the y coordinate of the center point, w represents the width of the bounding box, and h represents its height; $l$ represents the predicted bounding-box position corresponding to the default box; and $\hat{g}$ is the corresponding ground-truth position parameter value.
At this stage, the loss function is computed in the network's forward pass; during backpropagation the network weights are updated from the gradients over the samples, and the loss function is continuously optimized so that the network can regress the rough positioning frame of the window from an input picture.
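A minimal sketch of this forward/backward loop, reusing the backbone and priors sketched above; `detection_loss` (the matching plus smooth L1 computation) and `loader` are hypothetical stand-ins, and SGD with these hyperparameters is an assumption.

```python
import torch.optim as optim

model = TwoBranchBackbone()                   # first-stage network (sketch above)
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for images, gt_boxes in loader:               # batches of pictures + annotations
    out10, out20 = model(images)              # forward pass
    loss = detection_loss(out10, out20, priors, gt_boxes)
    optimizer.zero_grad()
    loss.backward()                           # gradients by backpropagation
    optimizer.step()                          # update the network weights
```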
The training model framework for the stage that accurately positions the four corner coordinates of the vehicle window is shown in fig. 5.
In the second stage, two data sets are likewise input in batches when training the model: one is the window pictures produced in the previous stage, each positioning frame expanded by a factor of 1.3 and cropped from the original image as the input of this stage; the other is the ground-truth coordinates of the 4 window corner points for each picture. As before, a deep convolutional network extracts multi-level features from the input pictures; the features of the window picture are then input into a corner-coordinate linear regression layer, and the model parameters are trained by continuously reducing the Euclidean distance between the corner coordinates to be regressed and the true corner coordinates.
The deep convolutional network in this stage adopts a structure modified from ONet. The network first performs five convolution operations on the input picture; the activation function after each convolution is the parametric rectified linear unit (PReLU), and each of the first four convolutions is followed by a max-pooling layer, so the height and width of the output feature map are halved each time. A fully connected layer after the final convolution integrates the feature map into a vector, which is input into the corner-coordinate linear regression layer, where the four accurate corner coordinates of the window are regressed.
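A PyTorch sketch of such an ONet-style second stage follows; the 48×48 input size and channel widths are assumptions, while the 256-dimensional vector matches step (9) below.

```python
import torch
import torch.nn as nn

class CornerNet(nn.Module):
    """Five PReLU convolutions with max-pooling after the first four, a fully
    connected layer flattening the features into a 256-d vector, and a linear
    layer regressing the eight corner values (4 points x 2 coordinates)."""
    def __init__(self, in_size: int = 48):
        super().__init__()
        layers, c = [], 3
        for i, c_out in enumerate((32, 64, 64, 128, 128)):
            layers += [nn.Conv2d(c, c_out, 3, padding=1), nn.PReLU(c_out)]
            if i < 4:
                layers.append(nn.MaxPool2d(2, 2))  # halves height and width
            c = c_out
        self.features = nn.Sequential(*layers)
        spatial = in_size // 16                    # after four 2x poolings
        self.fc = nn.Linear(128 * spatial * spatial, 256)
        self.regress = nn.Linear(256, 8)           # linear regression layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.fc(self.features(x).flatten(1))   # 256-d feature vector
        return self.regress(v)                     # normalized corner coordinates
```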
In the linear regression layer, an L2 norm loss function is adopted. Specifically, to obtain more accurate coordinates, the x and y coordinates of both the corner points to be regressed and the true corner points are divided by the width and height of the input picture respectively, converting integer pixel coordinates into floating-point values; this yields increasingly accurate corner coordinates as the loss function is reduced over the iterations. The specific formula of the loss function is as follows:

$$J(\theta)=\sum_{i}\sum_{j=1}^{4}\left[\left(\frac{\hat{x}_{ij}-x_{ij}}{w}\right)^{2}+\left(\frac{\hat{y}_{ij}-y_{ij}}{h}\right)^{2}\right]$$

where θ is the weights of the deep convolutional network, i indexes the samples of each batch, j indexes the 4 window corner points of each picture, $\hat{x}_{ij}$ and $\hat{y}_{ij}$ are the x and y coordinates of the window corner to be regressed, $x_{ij}$ and $y_{ij}$ are the ground-truth corner coordinates, and w and h are the width and height of the input picture.
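This normalize-then-square step is small enough to state directly; a sketch assuming corner tensors of shape (batch, 4, 2) in pixel units of the cropped input:

```python
import torch

def corner_loss(pred: torch.Tensor, gt: torch.Tensor,
                w: float, h: float) -> torch.Tensor:
    """L2-norm corner loss: divide x by the picture width and y by its height
    (integer pixels become floating point), then sum squared differences over
    the batch and the four corners."""
    scale = pred.new_tensor([w, h])   # broadcasts over the trailing (x, y) axis
    diff = (pred - gt) / scale
    return (diff ** 2).sum()
```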
The network estimates the window corner coordinates and computes the loss function in the forward pass; in the backward pass the gradient of the loss is computed and the network weights are continuously updated, so the loss keeps decreasing and the estimated corner coordinates approach the true ones, yielding the accurate coordinates of the window corner points.
The first stage mainly provides a method for obtaining the rough regression frame of the vehicle window, which specifically comprises the following steps:
(1) Selecting a sample group and manually annotating the front window;
(2) Storing each picture together with its window corner coordinates to form a data set;
(3) Dividing the data set into a training set and a testing set;
(4) Extracting multi-level, multi-scale features of the vehicle picture with the deep convolutional network designed for this stage;
(5) Inputting the feature map obtained by the deep convolutional network into the BOX regression layer to regress the rough positioning frame of the vehicle window;
In the second stage, the invention provides a method for acquiring the accurate coordinates of the four corner points of the vehicle window, which specifically comprises the following steps:
(6) Expanding the rough positioning frame of the vehicle window obtained in the first stage by a factor of 1.3 in the width and height directions to obtain a candidate frame (see the sketch after this list);
(7) Cropping the candidate frame region from the original picture to form a new picture, used as the input of this stage;
(8) Transforming the manually annotated coordinates of the four corner points of the front window into coordinates relative to the candidate frame;
(9) Extracting the features of the input picture with the deep convolutional network designed for this stage, and converting the feature map into a 256-dimensional feature vector through the fully connected layer Fc;
(10) Inputting the feature vector obtained by the deep convolutional network into the linear regression layer for window corner coordinates, with the designed L2 norm loss function described above as the optimization target. In the actual test stage, only the feature vector of the input picture needs to be extracted as in the steps above; linear regression then outputs the eight values of the 4 corner coordinates of the vehicle window, giving the accurate position of the window.
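A sketch of steps (6) to (8): expanding the rough frame by 1.3x about its center, cropping, and shifting the annotations into the crop's coordinate frame. A NumPy image with (height, width, channels) layout is assumed; at test time the regressed relative corners map back with the inverse shift.

```python
import numpy as np

def expand_and_crop(img: np.ndarray, box, factor: float = 1.3):
    """Expand a rough (x1, y1, x2, y2) frame by `factor` in width and height
    about its center, clip to the image, and return the crop and its origin."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * factor, (y2 - y1) * factor
    nx1, ny1 = max(int(cx - w / 2), 0), max(int(cy - h / 2), 0)
    nx2 = min(int(cx + w / 2), img.shape[1])
    ny2 = min(int(cy + h / 2), img.shape[0])
    return img[ny1:ny2, nx1:nx2], (nx1, ny1)

def to_relative(corners: np.ndarray, origin) -> np.ndarray:
    """Shift absolute (x, y) corner annotations into the crop's frame."""
    return corners - np.asarray(origin, dtype=float)

# test time: absolute_corners = regressed_relative_corners + origin
```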
In summary, the invention obtains the accurate coordinates of the window corner points in two stages with two neural networks, greatly improving the precision and accuracy of window positioning; combining small deep convolutional networks with regression effectively improves computation speed; and positioning the window by solving the coordinates of its four corner points accurately captures the position of the trapezoidal window, eliminating the large amount of edge interference a rectangular positioning frame would introduce and helping to acquire in-vehicle information more accurately.

Claims (4)

1. A vehicle window accurate positioning method based on deep learning, characterized in that the method comprises the following steps:
s1, acquiring a rough positioning frame of a front window of a vehicle in a first stage;
s11, selecting a vehicle picture as a sample group, and manually calibrating coordinates of four corner points of the upper left, the upper right, the lower left and the lower right of a front window in the vehicle picture;
s12, storing each vehicle picture and a front window corner coordinate picture in the vehicle picture correspondingly to form a data set;
s13, inputting a data set into a first-stage deep convolutional network, wherein the first-stage deep convolutional network is a 23-layer neural network, firstly, performing five-time convolution Conv operation on pictures in an image data set, regularizing an output characteristic graph in batches after each convolution operation, then inputting the regularized output characteristic graph into an activation function Relu, performing maximum value pooling operation after the former four-time convolution Conv1 operation, and adding one branch to the deep convolutional network after the five-time convolution Conv5 operation; in the two branches, one branch continues to perform five times of convolution Conv6 operation and one time of full convolution Conv12 operation, the other branch fuses the feature image before the branch and the feature image obtained by up-sampling after the previous branch performs Conv12 operation in the channel direction and performs convolution Conv13 operation, and finally the two branches respectively perform one time of convolution Conv11 operation and one time of full convolution Conv15 operation to obtain a fused vehicle picture feature image;
s14, inputting the vehicle picture feature map and corresponding front window corner coordinates into a frame regression layer, and returning to a rough positioning frame of the front window after optimizing a loss function;
s2, four accurate angular point coordinates of a front window of the vehicle are obtained in a second stage;
s21, expanding the front window rough positioning frame obtained in the first stage by 1.3 times in the width and height directions to obtain an expansion candidate frame;
s22, capturing and expanding the pictures in the candidate frames in the vehicle pictures to form new pictures;
s23, converting four corner coordinates of a front window of the manual calibration data set into relative coordinates relative to the expansion candidate frame;
s24, inputting the new picture obtained after interception into a second-stage deep convolution network, extracting a feature map of the new picture through the second-stage deep convolution network, and converting the feature map into a feature vector through a full connection layer of the second-stage deep convolution network;
s25, inputting the feature vector and the transformed relative coordinates into a linear regression layer, and obtaining four accurate corner coordinates of the front window through regression after optimizing a loss function.
2. The vehicle window accurate positioning method based on deep learning as claimed in claim 1, wherein: in the step S14, the loss function of the frame regression layer adopts the smooth L1 loss; the calculation formula is as follows:

$$L_{loc}(x,l,g)=\sum_{i\in Pos}^{N}\ \sum_{m\in\{cx,cy,w,h\}} x_{ij}\,\mathrm{smooth}_{L1}\left(l_{i}^{m}-\hat{g}_{j}^{m}\right)$$

wherein $x_{ij}$ is an indicator parameter: when its value is 1, the i-th expanded candidate frame is matched with the j-th labeled frame; N is the number of candidate frames; m is a position parameter of the bounding box, where cx represents the x coordinate of the center point of the bounding box, cy represents the y coordinate of the center point, w represents the width of the bounding box, and h represents its height; $l$ is the predicted bounding-box position corresponding to the expanded candidate frame, and $\hat{g}$ is the position parameter value of the corresponding j-th labeled frame.
3. The vehicle window accurate positioning method based on deep learning as claimed in claim 2, wherein: in the step S24, five convolution operations are performed on the cropped picture; the activation function after each convolution is the parametric rectified linear unit; a pooling layer performing max-pooling follows each of the first four convolutions; and a fully connected layer after the fifth convolution integrates the extracted feature map into a feature vector.
4. The vehicle window accurate positioning method based on deep learning as claimed in claim 3, wherein: the loss function of the linear regression layer in step S25 adopts an L2 norm loss function; the calculation formula is as follows:

$$J(\theta)=\sum_{i}\sum_{j=1}^{4}\left[\left(\frac{\hat{x}_{ij}-x_{ij}}{w}\right)^{2}+\left(\frac{\hat{y}_{ij}-y_{ij}}{h}\right)^{2}\right]$$

wherein θ is the weights of the second-stage deep convolutional network; i indexes the samples of each batch; j indexes the 4 corner points of the front window in each vehicle picture; $\hat{x}_{ij}$ is the x coordinate of the front window corner to be regressed and $x_{ij}$ is the x coordinate of the front window corner of the j-th labeled frame; $\hat{y}_{ij}$ is the y coordinate of the front window corner to be regressed and $y_{ij}$ is the y coordinate of the front window corner of the j-th labeled frame; and w and h are the width and height of the cropped picture.
CN201911089593.2A 2019-11-08 2019-11-08 Vehicle window accurate positioning method based on deep learning Active CN111062384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911089593.2A CN111062384B (en) 2019-11-08 2019-11-08 Vehicle window accurate positioning method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911089593.2A CN111062384B (en) 2019-11-08 2019-11-08 Vehicle window accurate positioning method based on deep learning

Publications (2)

Publication Number Publication Date
CN111062384A CN111062384A (en) 2020-04-24
CN111062384B (en) 2023-09-08

Family

ID=70298546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911089593.2A Active CN111062384B (en) 2019-11-08 2019-11-08 Vehicle window accurate positioning method based on deep learning

Country Status (1)

Country Link
CN (1) CN111062384B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914612B (en) * 2020-05-21 2024-03-01 淮阴工学院 Construction graphic primitive self-adaptive identification method based on improved convolutional neural network
CN112270278A (en) * 2020-11-02 2021-01-26 重庆邮电大学 Key point-based blue top house detection method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107798335A (en) * 2017-08-28 2018-03-13 浙江工业大学 A kind of automobile logo identification method for merging sliding window and Faster R CNN convolutional neural networks
CN108256464A (en) * 2018-01-12 2018-07-06 适普远景遥感信息技术(北京)有限公司 High-resolution remote sensing image urban road extracting method based on deep learning
CN108428248A (en) * 2018-03-14 2018-08-21 苏州科达科技股份有限公司 Vehicle window localization method, system, equipment and storage medium
CN108764244A (en) * 2018-04-02 2018-11-06 华南理工大学 Potential target method for detecting area based on convolutional neural networks and condition random field
CN109740405A (en) * 2018-07-06 2019-05-10 博云视觉(北京)科技有限公司 A kind of non-alignment similar vehicle front window different information detection method
CN109902677A (en) * 2019-01-30 2019-06-18 深圳北斗通信科技有限公司 A kind of vehicle checking method based on deep learning
CN110096962A (en) * 2019-04-04 2019-08-06 苏州千视通视觉科技股份有限公司 Vehicle Detail based on region convolutional network identifies secondary structure method and device
CN110322522A (en) * 2019-07-11 2019-10-11 山东领能电子科技有限公司 A kind of vehicle color identification method based on the interception of target identification region

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111295669A (en) * 2017-06-16 2020-06-16 马克波尔公司 Image processing system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107798335A (en) * 2017-08-28 2018-03-13 浙江工业大学 A kind of automobile logo identification method for merging sliding window and Faster R CNN convolutional neural networks
CN108256464A (en) * 2018-01-12 2018-07-06 适普远景遥感信息技术(北京)有限公司 High-resolution remote sensing image urban road extracting method based on deep learning
CN108428248A (en) * 2018-03-14 2018-08-21 苏州科达科技股份有限公司 Vehicle window localization method, system, equipment and storage medium
CN108764244A (en) * 2018-04-02 2018-11-06 华南理工大学 Potential target method for detecting area based on convolutional neural networks and condition random field
CN109740405A (en) * 2018-07-06 2019-05-10 博云视觉(北京)科技有限公司 A kind of non-alignment similar vehicle front window different information detection method
CN109902677A (en) * 2019-01-30 2019-06-18 深圳北斗通信科技有限公司 A kind of vehicle checking method based on deep learning
CN110096962A (en) * 2019-04-04 2019-08-06 苏州千视通视觉科技股份有限公司 Vehicle Detail based on region convolutional network identifies secondary structure method and device
CN110322522A (en) * 2019-07-11 2019-10-11 山东领能电子科技有限公司 A kind of vehicle color identification method based on the interception of target identification region

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Localization algorithm for the front window in checkpoint images based on multi-feature integration; Qu Baozhu; Cao Guo; Liu Yu; Zhou Licun; Information Technology (Issue 12); full text *

Also Published As

Publication number Publication date
CN111062384A (en) 2020-04-24

Similar Documents

Publication Publication Date Title
CN112270249B (en) Target pose estimation method integrating RGB-D visual characteristics
CN109886312B (en) Bridge vehicle wheel detection method based on multilayer feature fusion neural network model
CN108960211B (en) Multi-target human body posture detection method and system
CN112200143A (en) Road disease detection method based on candidate area network and machine vision
CN111178161A (en) Vehicle tracking method and system based on FCOS
CN111062384B (en) Vehicle window accurate positioning method based on deep learning
CN112819748B (en) Training method and device for strip steel surface defect recognition model
CN106815583A (en) A kind of vehicle at night license plate locating method being combined based on MSER and SWT
CN111461036B (en) Real-time pedestrian detection method using background modeling to enhance data
CN113657409A (en) Vehicle loss detection method, device, electronic device and storage medium
CN116681636B (en) Light infrared and visible light image fusion method based on convolutional neural network
CN112906583A (en) Lane line detection method and device
CN111881743B (en) Facial feature point positioning method based on semantic segmentation
CN111488766A (en) Target detection method and device
CN116051957A (en) Personal protection item detection network based on attention mechanism and multi-scale fusion
CN106203302A (en) Pedestrian detection that view-based access control model and wireless aware combine and statistical method
CN116152068A (en) Splicing method for solar panel images
CN113327271B (en) Decision-level target tracking method and system based on double-optical twin network and storage medium
CN114120359A (en) Method for measuring body size of group-fed pigs based on stacked hourglass network
Zhang et al. A combined approach to single-camera-based lane detection in driverless navigation
CN116385477A (en) Tower image registration method based on image segmentation
CN110472638A (en) A kind of object detection method, device and equipment, storage medium
CN113537397B (en) Target detection and image definition joint learning method based on multi-scale feature fusion
KR101391667B1 (en) A model learning and recognition method for object category recognition robust to scale changes
CN106683128B (en) Sub-pixel registration method for airport runway image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant