
CN110852157A - Deep learning track line detection method based on binarization network

Info

Publication number: CN110852157A
Application number: CN201910940999.0A
Authority: CN
Inventors: 段章领; 洪予晨
Original assignee: Hefei Ho Chi Chi Intelligent Technology Co Ltd
Current assignee: Hefei Ho Chi Chi Intelligent Technology Co Ltd
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2020-02-28

Abstract

The invention discloses a deep learning track line detection method based on a binarization network, comprising the following steps: a sample preparation stage, in which a plurality of road scene videos are first collected by a camera and split into single frames, and the pictures are then calibrated, trained, corrected, rotationally transformed and post-processed to obtain a picture data set P, which is divided into a training set X and a test set Y at a ratio of 4:1; a network establishing stage, in which BiSeNet is established; and a network binarization stage, in which the BiSeNet is converted into a binary network. The method can detect the track line in a picture and places lower demands on time and power consumption during detection, that is, it occupies little memory and trains quickly. After the convolution-layer parameters of the BiSeNet are binarized, the neural network is greatly compressed and accelerated, which helps bring deep learning to embedded devices and gives the method good prospects for use.

Description

Deep learning track line detection method based on binarization network

Technical Field

The invention relates to the field of image/video-based track line detection, and in particular to a deep learning track line detection method based on a binarization network.

Background

With the development of deep learning, artificial intelligence has been widely applied in various fields. In recent years, more and more research teams have worked on the transportation problems of underground locomotives, such as underground obstacle detection, underground track detection and underground pedestrian avoidance. To ensure the safe transportation of underground rail locomotives, reduce casualties among miners and ultimately build intelligent, digital mines, automatically driven locomotives are gradually being introduced on underground rails. On the one hand, underground lighting is weak and visibility is low, the roadways are narrow, the track layout is complex and the operating environment is harsh. On the other hand, some miners do not follow safe production specifications, and a running locomotive can easily collide with mining personnel at work, causing locomotive transportation accidents and irreparable losses to the country and the people.

Track line detection means recognizing the track area in a video or an image by image processing technology and displaying the specific position of the track line, that is, finding the position of the track line in the picture accurately and quickly with a suitable algorithm. In practice, however, underground track detection is easily affected by complex environmental factors such as changes in illumination and shadow, standing water and occlusion by vehicles, so conventional track line recognition is slow and inaccurate across track scenes and cannot deliver results in real time.

Disclosure of Invention

The main purpose of the invention is to provide a deep learning track line detection method based on a binarization network that effectively solves the problems described in the background. The method detects track lines in pictures and places lower demands on time and power consumption during detection, that is, it occupies little memory and trains quickly.

In order to achieve the purpose, the invention adopts the technical scheme that:

A deep learning track line detection method based on a binarization network comprises the following steps:

a. a sample preparation stage, in which a plurality of road scene videos are first collected by a camera and split into single frames, and the pictures are then calibrated, trained, corrected, rotationally transformed and post-processed to obtain a picture data set P, which is divided into a training set X and a test set Y at a ratio of 4:1;

b. a network establishing stage, namely establishing BiSeNet;

c. a network binarization stage, namely converting the BiSeNet into a binary network;

d. a network operation stage, in which track line detection is performed with the binarized conditional generative adversarial network.

Preferably, in step a, video cropping in the sample preparation stage finally yields 2500 pictures with a resolution of 1280 × 720; pictures of one group of highway lanes are selected as the training set, which contains 2000 pictures, and pictures of a different group of highway lanes are selected as the test set, which contains 500 pictures.
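The 4:1 division can be illustrated with a minimal sketch. The file names and the seeded random shuffle are assumptions made for illustration only; the text above instead assigns pictures of different road sections to the two sets.

```python
import random

def split_dataset(items, train_parts=4, test_parts=1, seed=0):
    """Shuffle a copy of `items` and split it train:test = train_parts:test_parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical frame names standing in for the 2500 cropped pictures.
frames = [f"frame_{i:04d}.png" for i in range(2500)]
train_set, test_set = split_dataset(frames)
print(len(train_set), len(test_set))  # 2000 500
```

Splitting by road section rather than by random shuffle, as the text describes, keeps near-identical neighbouring frames from appearing in both sets.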

Preferably, all pictures in the data set are annotated, and the data set is annotated uniformly using AutoCAD; people, vehicles and the surrounding environment are marked in different colors so that they can be distinguished from the environment and the model parameters can be reduced as quickly as possible during training.

Preferably, in step b, the network establishing stage comprises the following steps:

b1, the left branch of the BiSeNet network model structure is a spatial path module, which addresses the loss of spatial information. The spatial path module consists of three layers, each comprising a convolution with a stride of 2 followed by batch normalization and ReLU nonlinear activation. The feature map output by this path is 1/8 the size of the original image, so a larger feature map is available and rich spatial information is retained. The context path module on the right branch uses a pre-trained lightweight Xception model as its backbone network and performs rapid down-sampling to obtain a large receptive field; a global average pooling layer is then added at the tail of the Xception module to provide the largest receptive field, and finally the features of the last two stages are fused through a partial U-shaped structure. The spatial path module encodes rich spatial information and the context path module provides a sufficient receptive field; the two modules meet different requirements and together complete the detection task;
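The 1/8 figure follows from applying three stride-2 convolutions in sequence. A minimal arithmetic sketch is given below; the 3 × 3 kernel and padding of 1 are assumptions, since the text specifies only the stride.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def spatial_path_size(width, height, layers=3):
    """Feature-map size after `layers` stride-2 convolutions."""
    for _ in range(layers):
        width, height = conv_out(width), conv_out(height)
    return width, height

print(spatial_path_size(1280, 720))  # (160, 90), i.e. 1/8 of 1280 x 720
```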

b2, the BiSeNet network includes an attention refinement module, which is used to refine the features of each stage; because the features of the two paths differ in their level of representation, the outputs of the two paths cannot be combined directly. The module consists of a global pooling layer (Global pool), a 1 × 1 convolution layer (conv (1 × 1)), a batch normalization layer (Batch norm), an activation function layer (sigmoid) and a multiplication;
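The layer sequence of the attention refinement module can be sketched with NumPy as follows. This is a minimal illustration, not the patented implementation: the final channel-wise multiplication follows the original BiSeNet design, batch normalization is shown in inference form, and all weights and shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_refinement(feat, w, gamma, beta, mean, var, eps=1e-5):
    """feat: (C, H, W) feature map; w: (C, C) weights of the 1x1 convolution;
    gamma, beta, mean, var: (C,) batch-norm parameters (inference form)."""
    pooled = feat.mean(axis=(1, 2))      # global average pooling -> (C,)
    conv = w @ pooled                    # a 1x1 conv on a 1x1 map is a matrix-vector product
    normed = gamma * (conv - mean) / np.sqrt(var + eps) + beta
    attn = sigmoid(normed)               # per-channel attention in (0, 1)
    return feat * attn[:, None, None]    # re-weight each channel of the input

# Hypothetical shapes and random weights, for illustration only.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 3, 5))
out = attention_refinement(feat, rng.normal(size=(4, 4)),
                           np.ones(4), np.zeros(4), np.zeros(4), np.ones(4))
print(out.shape)  # (4, 3, 5)
```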

b3, the BiSeNet network uses the attention refinement module together with a feature fusion module; the feature fusion module fuses the output features of the two paths for the final prediction. The module consists of a concatenate layer, a convolution layer (Conv), a batch normalization layer (bn), an activation function layer (relu), a global pooling layer (Global pool), a 1 × 1 convolution layer (Conv (1 × 1)), a batch normalization layer (Batch norm) and an activation function layer (sigmoid);
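A minimal NumPy sketch of the feature fusion sequence is shown below. Several points are assumptions made to keep it short: the first convolution is taken as 1 × 1, both batch normalization layers are treated as identity, and the final multiply-and-add follows the original BiSeNet design. All weights and shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_fusion(spatial, context, w_fuse, w_attn):
    """spatial: (Cs, H, W); context: (Cc, H, W);
    w_fuse: (C, Cs + Cc) and w_attn: (C, C) are 1x1 convolution weights."""
    x = np.concatenate([spatial, context], axis=0)         # concatenate along channels
    fused = np.einsum("oc,chw->ohw", w_fuse, x)            # 1x1 convolution
    fused = np.maximum(fused, 0.0)                         # ReLU (batch norm omitted)
    attn = sigmoid(w_attn @ fused.mean(axis=(1, 2)))       # global pool -> 1x1 conv -> sigmoid
    return fused + fused * attn[:, None, None]             # re-weight, then add back

# Hypothetical shapes and random weights, for illustration only.
rng = np.random.default_rng(1)
spatial = rng.normal(size=(3, 4, 6))
context = rng.normal(size=(5, 4, 6))
fused_out = feature_fusion(spatial, context,
                           rng.normal(size=(2, 8)), rng.normal(size=(2, 2)))
print(fused_out.shape)  # (2, 4, 6)
```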

b4, the cross-entropy loss function is shown in formula (1):

$$CE = -\left[\, y \log p + (1 - y) \log (1 - p) \,\right] \qquad (1)$$

where p represents the predicted probability that the sample label is 1, p ∈ [0, 1], and y represents the value of the label;

formula (1) can then be written as:

$$CE = \begin{cases} -\log p, & y = 1 \\ -\log (1 - p), & \text{otherwise} \end{cases} \qquad (2)$$

if p_t is defined as:

$$p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases} \qquad (3)$$

then:

$$loss = -\log(p_t) \qquad (4)$$

To balance the imbalance between the track and the background, an improved cross-entropy loss is introduced, as shown in formula (5):

$$L_{loss} = -\alpha (1 - p_t)^2 \log(p_t) \qquad (5)$$

Experiments show that when α is 0.35, the improved loss function makes the optimizer better suited to the network model, and the model performs better.
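Formula (5) can be evaluated directly, as in the following minimal NumPy sketch. The clipping constant `eps` is an assumption added for numerical safety and is not part of the formula.

```python
import numpy as np

def improved_ce_loss(p, y, alpha=0.35, eps=1e-7):
    """Per-sample loss of formula (5): -alpha * (1 - p_t)^2 * log(p_t)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)                       # formula (3)
    return -alpha * (1.0 - p_t) ** 2 * np.log(p_t)

def plain_ce_loss(p, y):
    """Per-sample cross-entropy of formula (4): -log(p_t)."""
    p_t = np.where(np.asarray(y) == 1, p, 1.0 - np.asarray(p, dtype=float))
    return -np.log(p_t)

p = np.array([0.9, 0.5, 0.1])   # predicted probability of label 1
y = np.array([1, 1, 1])         # all three samples are track pixels
print(improved_ce_loss(p, y))
print(plain_ce_loss(p, y))
```

Compared with formula (4), the factor (1 - p_t)² shrinks the loss of well-classified samples (p_t close to 1) far more than that of hard samples, which is how the loss shifts weight away from the dominant, easy background pixels.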

Preferably, in step c, the network binarization stage comprises the following steps:

c1, the BiSeNet is converted into a binarized BiSeNet, in which the weights and the activation values are constrained to be binarized to +1 or -1, as shown in formula (6):

where a_b is the binary activation value, a_r is the real-valued activation value, w_b is the binary weight, and w_r is the real-valued weight;
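A minimal sketch of binarization to +1 or -1 is given below. It assumes the sign function is used and that zero maps to +1; both are conventional choices and are not confirmed by the text.

```python
import numpy as np

def binarize(x):
    """Map real values to +1 / -1 (zero is mapped to +1 by assumption)."""
    return np.where(np.asarray(x) >= 0, 1.0, -1.0)

a_r = np.array([-0.3, 0.0, 2.0])     # real-valued activations
w_r = np.array([0.7, -0.01, -4.2])   # real-valued weights
a_b, w_b = binarize(a_r), binarize(w_r)
print(a_b, w_b)
```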

c2, with the BiSeNet constrained to have binarized weights, the convolution operation can be approximated by formula (7):

In formula (7), one operator denotes the conventional convolution operation; because the weights in the convolution are binary, the other operator denotes a convolution that involves only addition and subtraction, with no multiplication. The optimal estimates of E and β are obtained by solving the following optimization problem, formula (8):

rearranging and solving formula (8) gives formula (9):

training the binary weight network: a BiSeNet with binarized weights is trained, and the weights are binarized during both forward propagation and backward propagation; the binary weights are calculated according to formula (9), and the forward propagation of the activations and the backward propagation of the gradients are then calculated with the scaled binary weights; the gradient formula is
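The role of E and β can be illustrated with a minimal NumPy sketch. It assumes the standard binary-weight-network formulation (as in XNOR-Net), in which E = sign(W) and β is the mean absolute weight; the per-element gradient rule is likewise an assumption borrowed from that formulation, not a transcription of the formulas referenced above.

```python
import numpy as np

def binarize_weights(w):
    """Return (E, beta) with E in {+1, -1} and beta = mean(|w|), so that w ~ beta * E."""
    e = np.where(w >= 0, 1.0, -1.0)
    beta = np.abs(w).mean()
    return e, beta

def binary_dot(x, e, beta):
    """Dot product with binary weights: only additions and subtractions, then one scaling."""
    return beta * (x[e > 0].sum() - x[e < 0].sum())

def weight_grad(w, grad_scaled, beta):
    """Gradient w.r.t. the real-valued weights, given the gradient w.r.t. beta * E.
    Assumed straight-through rule: d sign(w)/dw is taken as 1 where |w| <= 1."""
    return grad_scaled * (1.0 / w.size + beta * (np.abs(w) <= 1.0))

w = np.array([0.5, -1.5, 0.25, -0.75])   # hypothetical real-valued weights
x = np.array([1.0, 2.0, 3.0, 4.0])       # hypothetical inputs
e, beta = binarize_weights(w)
print(e, beta)                 # [ 1. -1.  1. -1.] 0.75
print(binary_dot(x, e, beta))  # -1.5

def approx_error(b):
    return np.sum((w - b * e) ** 2)
```

During training, the real-valued weights are kept and updated with `weight_grad`; E and β are recomputed from them at every forward pass. The helper `approx_error` can be used to check that β = mean(|w|) gives a smaller squared error than nearby scale values.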

Preferably, in step d, the network operation stage is as follows: the picture to be detected is input into the trained binary BiSeNet network to obtain the corresponding classification result, thereby detecting the track line, wherein the first layer and the last layer of the binary network keep full-precision weights. Analysis of the results shows that the output of the method is fine and smooth and looks realistic, that good results are produced for all scenes, and that track information not annotated in the training pictures can still be detected well, which demonstrates the robustness of the algorithm.

Compared with the prior art, the deep learning track line detection method based on the binarization network has the following beneficial effects:

1. the method is based on the bilateral segmentation network; by adjusting the structure and parameters of the network, the track detection problem is treated as an instance segmentation problem, and by improving the context path module, the attention refinement module and the feature fusion module of the BiSeNet network, a deep-learning-based track detection network is established for end-to-end real-time track detection, which improves the accuracy of the model and speeds up detection;

2. the end-to-end deep learning algorithm provided by the invention produces finer, more realistic results and does not depend on extensive post-processing;

3. the invention compresses the BiSeNet, making it possible to use the deep learning algorithm on embedded terminals and promoting the application of deep learning algorithms to devices such as mobile terminals.

Drawings

FIG. 1 is an overall flowchart of the deep learning track line detection method based on a binarization network according to the present invention;

FIG. 2 is a structural diagram of the BiSeNet network in the deep learning track line detection method based on a binarization network according to the present invention;

FIG. 3 is a structural diagram of the attention refinement module in the deep learning track line detection method based on a binarization network according to the present invention;

FIG. 4 is a structural diagram of the feature fusion module in the deep learning track line detection method based on a binarization network according to the present invention.

Detailed Description

In order to make the technical means, the creation characteristics, the achievement purposes and the effects of the invention easy to understand, the invention is further described with the specific embodiments.


The invention relates to a deep learning track line detection method based on a binarization network, which is used for detecting lanes in traffic pictures;

first, the training and test picture sets for the conditional generative adversarial network are built, with a training-to-test ratio of 4:1. For annotation, AutoCAD is used to mark the data set uniformly; people, vehicles and the surrounding environment are marked in different colors so that they can be distinguished from the environment and the model parameters can be reduced as quickly as possible during training. Video cropping finally yields 8000 pictures with a resolution of 1280 × 720;

the main contribution of the BiSeNet network is a new approach in which capturing the spatial information of the image and enlarging its receptive field are split into two paths, and the image features extracted by the two paths are finally fused by a feature fusion module;

the left branch of the BiSeNet network model structure is a spatial path module, which addresses the loss of spatial information. It consists of three layers, each comprising a convolution with a stride of 2 followed by batch normalization and ReLU nonlinear activation. The feature map output by this path is 1/8 the size of the original image, so a larger feature map is available and rich spatial information is retained. The context path module on the right branch uses a pre-trained lightweight Xception model as its backbone network and performs rapid down-sampling to obtain a large receptive field; a global average pooling layer is then added at the tail of the Xception module to provide the largest receptive field, and finally the features of the last two stages are fused through a partial U-shaped structure. The spatial path module encodes rich spatial information and the context path module provides a sufficient receptive field; the two modules meet different requirements and together complete the detection task;

an Attention Refinement Module (ARM) is designed in the BiSeNet network; it is used to refine the features of each stage. Because the features of the two paths differ in their level of representation, the outputs of the two paths cannot be combined directly. The module consists of a global pooling layer (Global pool), a 1 × 1 convolution layer (conv (1 × 1)), a batch normalization layer (Batch norm), an activation function layer (sigmoid) and a multiplication;

the BiSeNet network uses the Attention Refinement Module (ARM) together with a Feature Fusion Module (FFM); the feature fusion module fuses the output features of the two paths for the final prediction. The module consists of a concatenate layer, a convolution layer (Conv), a batch normalization layer (bn), an activation function layer (relu), a global pooling layer (Global pool), a 1 × 1 convolution layer (Conv (1 × 1)), a batch normalization layer (Batch norm) and an activation function layer (sigmoid);

because lane lines and tracks share similar characteristics, the network model is pre-trained on CULane, a large lane line data set proposed by Pan et al. of the Chinese University of Hong Kong, and is then retrained on a self-built track data set to learn and adjust the parameters;

the cross-entropy loss function is shown in formula (1):

$$CE = -\left[\, y \log p + (1 - y) \log (1 - p) \,\right] \qquad (1)$$

formula (1) can then be written as:

$$CE = \begin{cases} -\log p, & y = 1 \\ -\log (1 - p), & \text{otherwise} \end{cases} \qquad (2)$$

if p_t is defined as:

$$p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases} \qquad (3)$$

then:

$$loss = -\log(p_t) \qquad (4)$$

To balance the imbalance between the track and the background, an improved cross-entropy loss is introduced, as shown in formula (5):

$$L_{loss} = -\alpha (1 - p_t)^2 \log(p_t) \qquad (5)$$

Experiments show that when α is 0.35, the improved loss function makes the optimizer better suited to the network model, and the model performs better;

the BiSeNet is converted into a binarized BiSeNet, in which the weights and the activation values are constrained to be binarized to +1 or -1, as shown in formula (6):

with the BiSeNet constrained to have binarized weights, the convolution operation can be approximated by formula (7):

rearranging and solving formula (8) gives formula (9):
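Formulas (6) to (9) can be written out as follows. This is a reconstruction from the surrounding definitions (binarization to +1 or -1, a scale β and a binary tensor E), assuming the standard binary-weight-network formulation; the symbols n (number of weights in W), I (input) and ⊕ (convolution using only addition and subtraction) are assumptions.

```latex
% (6) binarization of activations and weights
a_b = \operatorname{sign}(a_r), \qquad w_b = \operatorname{sign}(w_r), \qquad
\operatorname{sign}(x) = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases}

% (7) convolution approximated with a binary tensor E and a real scale beta
I * W \approx (I \oplus E)\,\beta

% (8) optimization problem for E and beta
J(E, \beta) = \lVert W - \beta E \rVert^{2}, \qquad
(E^{*}, \beta^{*}) = \arg\min_{E,\,\beta} J(E, \beta)

% expanding J, using E^{\top}E = n because every entry of E is +1 or -1
J(E, \beta) = \beta^{2} n - 2\beta\, W^{\top} E + W^{\top} W

% (9) W^T E is maximal for E = sign(W); setting dJ/d(beta) = 0 then gives beta
E^{*} = \operatorname{sign}(W), \qquad
\beta^{*} = \frac{W^{\top} E^{*}}{n} = \frac{1}{n}\,\lVert W \rVert_{1}
```

Under this reading, β is simply the mean absolute value of the real-valued weights in the filter.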

the picture to be detected is input into the trained binary BiSeNet network to obtain the corresponding classification result, thereby detecting the track line, wherein the first layer and the last layer of the binary network keep full-precision weights. Analysis of the results shows that the output of the method is fine and smooth and looks realistic, that good results are produced for all scenes, and that track information not annotated in the training pictures can still be detected well, which demonstrates the robustness of the algorithm;

for a binary convolutional neural network, binarizing the first and last layers causes a loss of accuracy, whereas the effect of binarizing the other layers is very slight; full weight precision is therefore kept for the first layer and the last layer during binarization;
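The rule of keeping the first and last layers at full precision can be expressed as a small helper. This is a minimal sketch; the layer names are hypothetical.

```python
def binarization_plan(layer_names):
    """Map each layer name to True if it should be binarized.
    The first and last layers keep full-precision weights."""
    last = len(layer_names) - 1
    return {name: 0 < i < last for i, name in enumerate(layer_names)}

# Hypothetical layer names, in network order.
layers = ["stem_conv", "sp_conv2", "sp_conv3", "ffm_conv", "classifier"]
plan = binarization_plan(layers)
print([name for name, flag in plan.items() if flag])
# ['sp_conv2', 'sp_conv3', 'ffm_conv']
```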

in addition, because the vehicle body extends beyond the outer side of the track when the vehicle passes, a pedestrian who avoids only the rails and the space between them is not necessarily safe; clearance for the vehicle body must also be reserved. Therefore, after the track is detected, a monocular distance measurement method is used to measure the safe distance, and the actual coordinates of the corresponding pixel points are calculated by similar-triangle proportions. The known quantities are: the height H of the camera, the distance along the y axis between the camera and the world coordinate point corresponding to the image center, the image coordinates of the lens center point, the image coordinates of the measured pixel point, the actual pixel length xpix, the actual pixel width ypix, and the focal length f of the camera. The calculation in the y-axis direction is the same as in the previous model, and the x-axis coordinate is obtained from the y-axis coordinate by proportion, so that the coordinate in the perpendicular direction can be obtained. From the known safe distance on both sides of the track, the corresponding boundary is derived in reverse and drawn in the picture; if a pedestrian is detected who is not keeping the safe distance, a prompt is issued so that the pedestrian can move away in time, avoiding underground rail traffic accidents;
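One common form of this similar-triangle calculation is sketched below. The geometry is an assumption, since the text does not give the formulas: a pinhole camera at height H above a flat ground plane, with the optical axis hitting the ground at distance `y_center`, and image rows increasing downward. All numeric values are hypothetical.

```python
import math

def pixel_to_ground(u, v, u0, v0, cam_height, y_center, f, xpix, ypix):
    """Ground-plane coordinates (x, y) of image pixel (u, v).
    (u0, v0): image coordinates of the lens center; f: focal length;
    xpix, ypix: physical pixel size (same length unit as f)."""
    dx = (u - u0) * xpix                       # lateral offset on the sensor
    dy = (v - v0) * ypix                       # vertical offset on the sensor
    alpha = math.atan2(cam_height, y_center)   # depression angle of the optical axis
    gamma = math.atan2(dy, f)                  # extra angle of this pixel's ray
    angle = alpha + gamma
    if angle <= 0:
        raise ValueError("pixel lies at or above the horizon")
    y = cam_height / math.tan(angle)           # forward distance on the ground
    # similar triangles: lateral offset scales with ray length outside / inside the camera
    x = dx * math.hypot(cam_height, y) / math.hypot(f, dy)
    return x, y

# Hypothetical camera: 2 m high, optical axis meets the ground 10 m ahead,
# 4 mm focal length, 2 micrometre pixels, 1280 x 720 image.
cam = dict(u0=640, v0=360, cam_height=2.0, y_center=10.0, f=0.004, xpix=2e-6, ypix=2e-6)
x_c, y_c = pixel_to_ground(640, 360, **cam)    # image center
x_p, y_p = pixel_to_ground(740, 460, **cam)    # a pixel below and right of center
print(round(y_c, 3), round(x_c, 3))
```

With ground coordinates for the detected rail pixels, the safe-distance boundary can be obtained by offsetting the rail position laterally and projecting it back into the picture.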

finally, pictures are input into the trained lane line detection model for testing; the experimental results show that the lane detection algorithm of the invention performs well, identifies lane lines quickly, is clearly superior to traditional highway lane recognition methods, and contributes to the development of embedded deep learning.

In use, the method is based on the bilateral segmentation network; by adjusting the structure and parameters of the network, the track detection problem is treated as an instance segmentation problem, and by improving the context path module, the attention refinement module and the feature fusion module of the BiSeNet network, a deep-learning-based track detection network is established for end-to-end real-time track detection, which improves the accuracy of the model and speeds up detection.

The end-to-end deep learning algorithm provided by the invention produces finer, more realistic results and does not depend on extensive post-processing;

the invention compresses the BiSeNet, making it possible to use the deep learning algorithm on embedded terminals and promoting the application of deep learning algorithms to devices such as mobile terminals.

First, a picture data set is obtained. A BiSeNet is then created, comprising a Spatial Path with 3 binarized convolution layers, 3 batch normalization layers and 3 activation layers; an Attention Refinement Module with a global pooling layer, a binarized convolution layer, a batch normalization layer and a sigmoid layer; and a Feature Fusion Module with 2 binarized convolution layers, 2 batch normalization layers, an activation layer, a global pooling layer and a sigmoid layer. The BiSeNet network is established as a binarization network in which the convolution-layer weight parameters and the activation values are binarized, accelerating the neural network.

The foregoing shows and describes the general principles and broad features of the present invention and advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A deep learning track line detection method based on a binarization network is characterized in that: the method comprises the following steps:

b. a network establishing stage, namely establishing BiSeNet;

2. The deep learning track line detection method based on a binarization network as claimed in claim 1, wherein: in step a, video cropping in the sample preparation stage finally yields 2500 pictures with a resolution of 1280 × 720; pictures of one group of highway lanes are selected as the training set, which contains 2000 pictures, and pictures of a different group of highway lanes are selected as the test set, which contains 500 pictures.

3. The deep learning track line detection method based on a binarization network as claimed in claim 2, wherein: all pictures in the data set are annotated, the data set is annotated uniformly using AutoCAD, and people, vehicles and the surrounding environment are marked in different colors.

4. The deep learning track line detection method based on a binarization network as claimed in claim 1, wherein: in step b, the network establishing stage comprises the following steps:

b1, the left branch of the BiSeNet network model structure is a spatial path module consisting of three layers, each comprising a convolution with a stride of 2 followed by batch normalization and ReLU nonlinear activation, and the feature map output by this path is 1/8 the size of the original image; the context path module on the right branch uses a pre-trained lightweight Xception model as its backbone network and performs rapid down-sampling to obtain a large receptive field, a global average pooling layer is then added at the tail of the Xception module to provide the largest receptive field, and finally the features of the last two stages are fused through a partial U-shaped structure; the spatial path module encodes rich spatial information, the context path module provides a sufficient receptive field, and the two modules together complete the detection task;

b2, the BiSeNet network includes an attention refinement module, which is used to refine the features of each stage and consists of a global pooling layer (Global pool), a 1 × 1 convolution layer (conv (1 × 1)), a batch normalization layer (Batch norm), an activation function layer (sigmoid) and a multiplication;

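b4, the cross-entropy loss function is shown in formula (1):

$$CE = -\left[\, y \log p + (1 - y) \log (1 - p) \,\right] \qquad (1)$$

formula (1) can then be written as:

$$CE = \begin{cases} -\log p, & y = 1 \\ -\log (1 - p), & \text{otherwise} \end{cases} \qquad (2)$$

if p_t is defined as:

$$p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases} \qquad (3)$$

then:

$$loss = -\log(p_t) \qquad (4)$$

$$L_{loss} = -\alpha (1 - p_t)^2 \log(p_t) \qquad (5)$$

wherein α is 0.35.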

5. The deep learning track line detection method based on a binarization network as claimed in claim 1, wherein: in step c, the network binarization stage comprises the following steps:

the symbol denotes a conventional convolution operation,

solving the deformation of the formula (8) to obtain:

6. The deep learning track line detection method based on a binarization network as claimed in claim 1, wherein: in step d, the network operation stage is as follows: the picture to be detected is input into the trained binary BiSeNet network to obtain the corresponding classification result, completing the detection of the track line, wherein the first layer and the last layer of the binary network keep full-precision weights.