CN113591756A - Lane line detection method based on heterogeneous information interaction convolutional network - Google Patents

Lane line detection method based on heterogeneous information interaction convolutional network

Info

Publication number: CN113591756A (application granted and published as CN113591756B)
Application number: CN202110904312.5A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 周庆, 周晶
Original and current assignee: Nanjing Aerospace Technology Co., Ltd.
Application filed by Nanjing Aerospace Technology Co., Ltd.
Legal status: Active (granted)

Classifications

    • G06F18/214: Pattern recognition - Analysing - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/241: Pattern recognition - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Neural networks - Architecture, e.g. interconnection topology - Combinations of networks
    • G06N3/084: Neural networks - Learning methods - Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a lane line detection method based on a heterogeneous information interaction convolutional network. The constructed network contains two prediction branches: pixel-level lane line segmentation and lane line block classification. A reverse feature representation space is built from the lane line segmentation prediction and, as a complementary space, is concatenated with the image feature space, which strengthens the network's feature extraction capability, particularly for lane lines that are partially missing due to occlusion and similar factors. Global and local modules are further designed to improve feature extraction, so that the network attends to both global context information and local detail information. Finally, redundant computation is avoided in the network design, improving inference efficiency; the method has significant application value in autonomous driving and driver assistance.

Description

Lane line detection method based on heterogeneous information interaction convolutional network
Technical Field
The invention relates to the technical field of image processing and pattern recognition, in particular to a lane line detection method based on a heterogeneous information interaction convolutional network.
Background
Autonomous driving technology realizes unmanned driving through a computer system and improves the user's driving experience; it is a hot topic of current research at universities and in industry. Within this technology, lane line detection is one of the basic modules: it guides the vehicle in autonomously judging the driving direction, regularizes driving behavior, and helps avoid collisions, thereby enabling a better driving experience.
Lane line detection faces significant challenges in practical applications. First, image quality problems: distortion and blur caused by vehicle shake during driving; illumination changes and shadows cast by buildings and trees; and reduced visibility in fog, rain, and similar conditions. Second, lane line quality problems: lane lines on different road sections differ in clarity, degree of wear, and width. Third, viewpoint changes, occlusion, and similar problems. Single-source features cannot provide a rich representation for the network and thus limit its generalization capability; the invention therefore achieves robust lane line detection by means of heterogeneous information interaction.
Disclosure of Invention
The invention uses the lane line pixel segmentation prediction to construct an inverse-attention feature space, which serves as a complementary space to the image features and is concatenated with them to form a complete feature representation; secondly, through joint learning of global and local features, the network attends not only to global context information but also to local detail information; finally, a lightweight design reduces the computational complexity of the network;
In order to achieve the above purpose, the invention adopts the following technical scheme:
The lane line detection method based on the heterogeneous information interaction convolutional network comprises the following steps:
Step 1: prepare the training data; preprocess each training picture and its corresponding label, specifically as follows:
suppose the training data consist of N pictures and their corresponding labels (the height and width of each picture are 288 and 800, respectively); denote the pictures in the training set by {I_1, I_2, ..., I_N} and the label corresponding to each picture by {l_1, l_2, ..., l_N};
Step 101: read the coordinate points of a lane line in the label and its designated color p, draw a polygon using the ImageDraw.Draw.polygon() function of the PIL package, and fill the drawn polygon with the designated color via the function's fill argument;
Step 102: assign a color p to each lane line in order as yellow (p = 1), green (p = 2), blue (p = 3), red (p = 4) and purple (p = 5); read each lane line in a picture label, record the total number of lane lines as C (C ≤ 5) and the current lane index as k (k ≤ C), set the lane line color to p = k, and perform the operation of step 101 in turn, thereby obtaining a picture in which the lane lines are colored; then convert this picture into a grayscale picture, map the gray values of the corresponding colors to 1, 2, 3, 4 and 5 in order, and record uncolored pixels as 0; finally the labels of all pictures are converted into {mask_1, mask_2, ..., mask_N};
Step 103: generate multi-target segmentation labels using one-hot encoding, with dimensions 288 × 800 × (C + 1); finally the labels are converted into {seg_1, seg_2, ..., seg_N};
Step 104: predefine the row anchors row_anchor, e.g., [121, 131, 141, 150, 160, 170, 180, 189, 199, 209, 219, 228, 238, 248, 258, 267, 277, 287]; divide the height of the label obtained in step 102 into 18 parts according to the predefined row anchors, divide the width of the label equally into 200 parts, and at the same time generate an 18 × 201 × C zero matrix U;
Step 105: using the predefined row anchors of step 104, in the row of the grayscale label generated in step 102 where the i-th row anchor lies, select the block (i, j) containing the largest share of the k-th lane line as that lane line's label for the row, i.e., set U(i, j, k) = 1; if there is no lane line at the m-th row anchor of the grayscale label generated in step 102, set U(m, 201, k) = 1;
Step 106: perform the operation of step 105 in turn on the labels {mask_1, mask_2, ..., mask_N} of step 102 to generate the block labels {block_1, block_2, ..., block_N};
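To make steps 101 to 106 concrete, the following Python sketch shows one way to produce the grayscale mask, the one-hot segmentation label, and the block label. It is an illustrative reading of the steps above, not the patented implementation; the function names make_labels and make_block_label are chosen here, and the index p = k is drawn directly instead of drawing named colors and converting to grayscale, which yields the same mask:

```python
import numpy as np
from PIL import Image, ImageDraw

H, W = 288, 800                                   # picture height and width used in the patent
ROW_ANCHOR = [121, 131, 141, 150, 160, 170, 180, 189, 199, 209,
              219, 228, 238, 248, 258, 267, 277, 287]   # the 18 predefined row anchors
NUM_COLS = 200                                    # the width is split into 200 blocks

def make_labels(lane_polys, C):
    """lane_polys: list of lane-line polygons, each a list of (x, y) points, ordered k = 1..C.
    Returns the grayscale index mask of step 102 and the one-hot label of step 103."""
    img = Image.new("L", (W, H), 0)               # background pixels stay 0
    draw = ImageDraw.Draw(img)
    for k, poly in enumerate(lane_polys, start=1):
        draw.polygon(poly, fill=k)                # steps 101-102: draw and fill lane k with value p = k
    mask = np.array(img, dtype=np.int64)          # 288 x 800, values 0..C
    seg = np.eye(C + 1, dtype=np.float32)[mask]   # step 103: 288 x 800 x (C + 1) one-hot label
    return mask, seg

def make_block_label(mask, C):
    """Steps 104-106: build the 18 x 201 x C block label U from the grayscale mask."""
    U = np.zeros((len(ROW_ANCHOR), NUM_COLS + 1, C), dtype=np.float32)
    col_edges = np.linspace(0, W, NUM_COLS + 1).astype(int)   # 200 equal-width blocks
    for i, r in enumerate(ROW_ANCHOR):
        row = mask[r]                                         # image row at the i-th row anchor
        for k in range(1, C + 1):
            counts = [np.sum(row[col_edges[j]:col_edges[j + 1]] == k) for j in range(NUM_COLS)]
            if max(counts) > 0:
                U[i, int(np.argmax(counts)), k - 1] = 1.0     # block with the largest share of lane k
            else:
                U[i, NUM_COLS, k - 1] = 1.0                   # no lane k at this anchor: "no lane" bin
    return U
```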
Step 2: establishing a lane line detection network for heterogeneous information interaction; the concrete model of the network is as follows:
Convolutional layer 1: convolve the 288 × 800 × 3 input image with 64 7 × 7 convolution kernels with stride 2; after a batch normalization (BN) layer and a ReLU activation function, obtain 144 × 400 × 64 features;
Pooling layer 1: pass the output features of convolutional layer 1 through a 3 × 3 max pooling layer with stride 2 to obtain 72 × 200 × 64 features;
Convolutional layer 2: convolve the output of pooling layer 1 with 64 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 72 × 200 × 64 features;
Convolutional layer 3: convolve the output of convolutional layer 2 with 64 3 × 3 convolution kernels with stride 1; after a BN layer, add the result to the output of pooling layer 1; after a ReLU activation function, obtain 72 × 200 × 64 features;
Convolutional layer 4: convolve the output of convolutional layer 3 with 64 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 72 × 200 × 64 features;
Convolutional layer 5: convolve the output of convolutional layer 4 with 64 3 × 3 convolution kernels with stride 1; after a BN layer, add the result to the output of convolutional layer 3; after a ReLU activation function, obtain 72 × 200 × 64 features;
Convolutional layer 6: convolve the output of convolutional layer 5 with 128 3 × 3 convolution kernels with stride 2; after a BN layer and a ReLU activation function, obtain 36 × 100 × 128 features;
Convolutional layer 7: convolve the output of convolutional layer 6 with 128 3 × 3 convolution kernels with stride 1; after a BN layer, obtain 36 × 100 × 128 features;
Convolutional layer 7_1: convolve the output of convolutional layer 5 with 128 1 × 1 convolution kernels with stride 2; after a BN layer, add the output features of convolutional layer 7; after a ReLU activation function, obtain 36 × 100 × 128 features;
Convolutional layer 8: convolve the output of convolutional layer 7_1 with 128 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 36 × 100 × 128 features;
Convolutional layer 9: convolve the output of convolutional layer 8 with 128 3 × 3 convolution kernels with stride 1; after a BN layer, add the output of convolutional layer 7_1; after a ReLU activation function, obtain 36 × 100 × 128 features;
Convolutional layer 10: convolve the output of convolutional layer 9 with 256 3 × 3 convolution kernels with stride 2; after a BN layer and a ReLU activation function, obtain 18 × 50 × 256 features;
Convolutional layer 11: convolve the output of convolutional layer 10 with 256 3 × 3 convolution kernels with stride 1; after a BN layer, obtain 18 × 50 × 256 features;
Convolutional layer 11_1: convolve the output of convolutional layer 9 with 256 1 × 1 convolution kernels with stride 2; after a BN layer, add the output features of convolutional layer 11; after a ReLU activation function, obtain 18 × 50 × 256 features;
Convolutional layer 12: convolve the output of convolutional layer 11_1 with 256 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 18 × 50 × 256 features;
Convolutional layer 13: convolve the output of convolutional layer 12 with 256 3 × 3 convolution kernels with stride 1; after a BN layer, add the output of convolutional layer 11_1; after a ReLU activation function, obtain 18 × 50 × 256 features;
Convolutional layer 14: convolve the output of convolutional layer 13 with 512 3 × 3 convolution kernels with stride 2; after a BN layer and a ReLU activation function, obtain 9 × 25 × 512 features;
Convolutional layer 15: convolve the output of convolutional layer 14 with 512 3 × 3 convolution kernels with stride 1; after a BN layer, obtain 9 × 25 × 512 features;
Convolutional layer 15_1: convolve the output of convolutional layer 13 with 512 1 × 1 convolution kernels with stride 2; after a BN layer, add the output features of convolutional layer 15; after a ReLU activation function, obtain 9 × 25 × 512 features;
Convolutional layer 16: convolve the output of convolutional layer 15_1 with 512 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 9 × 25 × 512 features;
Convolutional layer 17: convolve the output of convolutional layer 16 with 512 3 × 3 convolution kernels with stride 1; after a BN layer, add the output of convolutional layer 15_1; after a ReLU activation function, obtain 9 × 25 × 512 features;
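Convolutional layers 2 to 17 follow a standard residual pattern: two 3 × 3 convolution-BN stages, a skip connection added before the final ReLU, and a 1 × 1 stride-2 projection (layers 7_1, 11_1, 15_1) whenever the resolution or channel count changes. The following PyTorch sketch illustrates this pattern under the assumption of ordinary ResNet-style basic blocks; it is an illustration, not the exact patented network:

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 conv-BN stages with a skip connection, as in convolutional layers 2-5,
    8-9, 12-13 and 16-17; a 1x1 stride-2 projection (layers 7_1, 11_1, 15_1) is used
    when the spatial size or channel count changes."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.down = None
        if stride != 1 or in_ch != out_ch:
            self.down = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)          # add, then ReLU, as in layers 3, 5, 9, 13, 17

# Backbone stages roughly matching convolutional layers 2-17 (64/128/256/512 channels).
backbone = nn.Sequential(
    BasicBlock(64, 64), BasicBlock(64, 64),
    BasicBlock(64, 128, stride=2), BasicBlock(128, 128),
    BasicBlock(128, 256, stride=2), BasicBlock(256, 256),
    BasicBlock(256, 512, stride=2), BasicBlock(512, 512),
)
```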
Convolutional layer 18: convolve the output of convolutional layer 9 with 128 1 × 1 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 36 × 100 × 128 features;
Convolutional layer 19: convolve the output of convolutional layer 18 with 128 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 36 × 100 × 128 features;
Convolutional layer 20: convolve the output of convolutional layer 19 with 128 1 × 1 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 36 × 100 × 128 features;
Convolutional layer 21: convolve the output of convolutional layer 20 with 128 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 36 × 100 × 128 features;
Convolutional layer 22: convolve the output of convolutional layer 13 with 128 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 18 × 50 × 128 features;
Convolutional layer 23: convolve the output of convolutional layer 22 with 128 1 × 1 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 18 × 50 × 128 features;
Convolutional layer 24: convolve the output of convolutional layer 23 with 128 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 18 × 50 × 128 features;
Upsampling layer 1: upsample the output of convolutional layer 24 by a factor of 2 using bilinear interpolation to obtain 36 × 100 × 128 features;
Convolutional layer 25: convolve the output of convolutional layer 17 with 128 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 9 × 25 × 128 features;
Convolutional layer 26: convolve the output of convolutional layer 25 with 128 1 × 1 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 9 × 25 × 128 features;
Upsampling layer 2: upsample the output of convolutional layer 26 by a factor of 4 using bilinear interpolation to obtain 36 × 100 × 128 features;
Cascade layer 1: concatenate the outputs of upsampling layer 1, upsampling layer 2, and convolutional layer 9 along the channel dimension to obtain 36 × 100 × 384 features;
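Convolutional layers 22 to 26, the two upsampling layers, and cascade layer 1 form the multi-scale (global-local) fusion described above: features from the 18 × 50 and 9 × 25 scales are reduced to 128 channels, upsampled bilinearly, and concatenated with the 36 × 100 features of convolutional layer 9. A hedged PyTorch sketch of this fusion follows; the module and variable names (MultiScaleFusion, branch_mid, branch_low) are illustrative only, and layers 18 to 21, which refine the layer-9 features in parallel, are omitted because, per the text, cascade layer 1 concatenates the layer-9 output itself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(in_ch, out_ch, k):
    # the 1x1 / 3x3 conv-BN-ReLU units used in convolutional layers 18-26
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, 1, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class MultiScaleFusion(nn.Module):
    """Project layer-13 and layer-17 features to 128 channels, upsample them bilinearly
    to the 36x100 resolution of layer 9, and concatenate (cascade layer 1)."""
    def __init__(self):
        super().__init__()
        self.branch_mid = nn.Sequential(conv_bn_relu(256, 128, 3), conv_bn_relu(128, 128, 1),
                                        conv_bn_relu(128, 128, 3))                  # layers 22-24
        self.branch_low = nn.Sequential(conv_bn_relu(512, 128, 3), conv_bn_relu(128, 128, 1))  # layers 25-26

    def forward(self, f9, f13, f17):              # 128x36x100, 256x18x50, 512x9x25 feature maps
        up1 = F.interpolate(self.branch_mid(f13), scale_factor=2, mode="bilinear", align_corners=False)
        up2 = F.interpolate(self.branch_low(f17), scale_factor=4, mode="bilinear", align_corners=False)
        return torch.cat([up1, up2, f9], dim=1)   # cascade layer 1: 384-channel 36x100 features
```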
Convolutional layer 27: convolve the output of cascade layer 1 with 256 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 36 × 100 × 256 features;
Convolutional layer 28: convolve the output of convolutional layer 27 with 128 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 36 × 100 × 128 features;
Convolutional layer 29: convolve the output of convolutional layer 28 with 128 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 36 × 100 × 128 features;
Convolutional layer 30: convolve the output of convolutional layer 29 with (C + 1) convolution kernels of size 1 × 1 with stride 1; after a Sigmoid activation function, obtain 36 × 100 × (C + 1) features;
Convolutional layer 30_1: upsample the output of convolutional layer 30 by a factor of 8 to obtain 288 × 800 × (C + 1) features;
Convolutional layer 30_2: add the negative of the output of convolutional layer 30 to an all-ones tensor of the same dimensions (i.e., compute one minus the segmentation output) to obtain 36 × 100 × (C + 1) features;
Convolutional layer 31: convolve the output of convolutional layer 30_2 with 256 3 × 3 convolution kernels with stride 2; after a BN layer and a ReLU activation function, obtain 18 × 50 × 256 features;
Convolutional layer 32: convolve the output of convolutional layer 31 with 256 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 18 × 50 × 256 features;
Convolutional layer 33: convolve the output of convolutional layer 32 with 256 1 × 1 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 18 × 50 × 256 features;
Upsampling layer 3: upsample the output of convolutional layer 17 by a factor of 2 using bilinear interpolation to obtain 18 × 50 × 512 features;
Cascade layer 2: concatenate the outputs of convolutional layer 33 and upsampling layer 3 along the channel dimension to obtain 18 × 50 × 768 features;
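Convolutional layers 30_2 to 33 and cascade layer 2 are where the heterogeneous information interacts: the segmentation prediction is inverted (one minus the prediction) to form the complementary reverse feature space, processed, and concatenated with the backbone image features. A minimal PyTorch sketch of this step, with illustrative names rather than the exact patented module:

```python
import torch
import torch.nn.functional as F

def reverse_feature_fusion(seg_pred, f17, conv31_33):
    """seg_pred:  N x (C+1) x 36 x 100 Sigmoid output of convolutional layer 30.
    f17:          N x 512 x 9 x 25 backbone features (convolutional layer 17).
    conv31_33:    the stride-2 / 3x3 / 1x1 convolution stack of layers 31-33 (a module)."""
    reverse = torch.ones_like(seg_pred) - seg_pred        # convolutional layer 30_2: 1 - prediction
    rev_feat = conv31_33(reverse)                         # 256-channel 18x50 features from the reverse space
    up17 = F.interpolate(f17, scale_factor=2, mode="bilinear", align_corners=False)  # upsampling layer 3
    return torch.cat([rev_feat, up17], dim=1)             # cascade layer 2: 768-channel 18x50 features
```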
Convolutional layer 34: convolve the output of cascade layer 2 with 256 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 18 × 50 × 256 features;
Convolutional layer 35: convolve the output of convolutional layer 34 with 64 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, upsample by bilinear interpolation to size [18, 201] to obtain 18 × 201 × 64 features;
Convolutional layer 36: convolve the output of convolutional layer 35 with 16 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 18 × 201 × 16 features;
Convolutional layer 37: convolve the output of cascade layer 2 with 128 3 × 3 convolution kernels with stride 1; after a BN layer and a ReLU activation function, obtain 18 × 50 × 128 features;
Convolutional layer 38: convolve the output of convolutional layer 37 with 2 convolution kernels with stride 1; after a ReLU activation function, reshape the output features to obtain 1 × 1 × 1800 features;
Convolutional layer 39: convolve the output of convolutional layer 38 with 57888 1 × 1 convolution kernels with stride 1; after a ReLU activation function, reshape the output features to obtain 18 × 201 × 16 features;
Cascade layer 3: concatenate the outputs of convolutional layer 36 and convolutional layer 39 along the channel dimension to obtain 18 × 201 × 32 features;
Convolutional layer 39: convolve the output of cascade layer 3 with C convolution kernels of size 1 × 1 with stride 1; after a Softmax activation function, obtain 18 × 201 × C features;
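Convolutional layers 37 to 39 replace a fully connected layer with convolutions only: the 18 × 50 × 128 features are squeezed to 2 channels, reshaped to a 1 × 1 × 1800 tensor, and expanded with 57888 1 × 1 kernels (57888 = 18 × 201 × 16), so the whole head stays convolutional as claimed. The sketch below illustrates this trick; the kernel size of convolutional layer 38 is not stated in the text, so a 1 × 1 kernel is assumed, and the channel ordering of the reshapes is likewise an assumption:

```python
import torch.nn as nn

class FullyConvHead(nn.Module):
    """1x1 convolutions on a 1x1x1800 tensor play the role of a fully connected layer
    (sketch of convolutional layers 37-39, illustrative only)."""
    def __init__(self):
        super().__init__()
        self.squeeze = nn.Conv2d(128, 2, kernel_size=1)                   # layer 38: 2 kernels (size assumed 1x1)
        self.fc_as_conv = nn.Conv2d(1800, 18 * 201 * 16, kernel_size=1)   # layer 39: 57888 1x1 kernels
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                       # x: N x 128 x 18 x 50 (output of convolutional layer 37)
        n = x.shape[0]
        y = self.relu(self.squeeze(x))          # N x 2 x 18 x 50
        y = y.reshape(n, 1800, 1, 1)            # reshape to 1 x 1 x 1800 per sample
        y = self.relu(self.fc_as_conv(y))       # N x 57888 x 1 x 1
        return y.reshape(n, 16, 18, 201)        # reshape back to 18 x 201 x 16 features
```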
Step 3: train the network model established in step 2 with the training data of step 1, learn the model parameters with an SGD (stochastic gradient descent) optimization strategy, and save the final trained model, specifically as follows:
Step 301: the network designed by the invention is trained and its parameters are learned in a multitask manner; the initial learning rate of the network is set to γ;
Step 302: denote the output of convolutional layer 30_1 in step 2 by Pre_seg and the output of convolutional layer 39 by Pre_block; the parameters of the network are learned against the labels given in step 1, with the loss function

[the loss function is given as an equation image in the original; it combines a segmentation loss on Pre_seg with a block-classification loss on Pre_block weighted by λ1]

where λ1 is a hyperparameter;
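Because the loss formula itself appears only as an image in the original, the sketch below shows one plausible reading consistent with the surrounding text (a segmentation term on Pre_seg plus a λ1-weighted block-classification term on Pre_block); the choice of binary cross-entropy for both terms is an assumption, not the patented loss:

```python
import torch.nn.functional as F

def total_loss(pre_seg, seg_label, pre_block, block_label, lam1=1.0):
    """pre_seg:     N x (C+1) x 288 x 800 output of convolutional layer 30_1 (in [0, 1]).
    seg_label:      N x (C+1) x 288 x 800 one-hot segmentation label from step 103.
    pre_block:      N x 18 x 201 x C output of convolutional layer 39 (Softmax).
    block_label:    N x 18 x 201 x C block label from step 106.
    lam1 is the hyperparameter written as lambda_1 in the patent."""
    loss_seg = F.binary_cross_entropy(pre_seg, seg_label)        # assumed form of the segmentation term
    loss_block = F.binary_cross_entropy(pre_block, block_label)  # assumed form of the classification term
    return loss_seg + lam1 * loss_block
```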
Step 303: after the network is trained according to steps 301 and 302, save the network parameters;
Step 4: test the deep network model; test an input picture based on the parameters saved in step 303, specifically as follows:
Step 401: initialize parameters: denote the size of the input picture as [given as an equation image in the original], where the two quantities are the height and width of the picture, respectively, and the width scaling factor is [given as an equation image in the original]; generate a vector with stride 1 ranging from 1 to 200, reshape it into a 200 × 1 matrix, and denote it Idx;
Step 402: reshape the output of convolutional layer 39 of step 2 into 201 × 18 × C, take its first 200 rows as Cut_block, multiply Cut_block by Idx, and sum along the rows; record the result as Loc;
Step 403: record the row-wise index of the maximum of the output of convolutional layer 39 of step 2 as Maxind, and set to 0 the entries of Loc at the positions where Maxind equals 200;
Step 404: traverse the elements of the Loc matrix in turn; if the number of non-zero elements in a column of Loc is greater than 2, compute the position in the original image of the k-th lane line at the i-th preset row anchor according to the following formula:

[the position formula is given as an equation image in the original]

where int(·) denotes rounding.
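A NumPy sketch of the test-time decoding in steps 401 to 404 is given below; because the symbols and the final coordinate formula appear only as images in the original, the mapping back to image coordinates is an assumption based on the 800-pixel training width, the 288-pixel training height, and the 200 horizontal blocks:

```python
import numpy as np

ROW_ANCHOR = [121, 131, 141, 150, 160, 170, 180, 189, 199, 209,
              219, 228, 238, 248, 258, 267, 277, 287]

def decode(block_pred, img_w, img_h, C):
    """block_pred: 18 x 201 x C output of convolutional layer 39 for one picture."""
    out = np.transpose(block_pred, (1, 0, 2))          # step 402: reshape to 201 x 18 x C
    idx = np.arange(1, 201).reshape(200, 1, 1)         # step 401: Idx, values 1..200
    loc = np.sum(out[:200] * idx, axis=0)              # 18 x C expected block position (Loc)
    maxind = np.argmax(out, axis=0)                    # step 403: index of the per-anchor maximum
    loc[maxind == 200] = 0                             # index 200 is the "no lane" bin
    lanes = []
    for k in range(C):                                 # step 404: keep lanes seen at > 2 row anchors
        if np.count_nonzero(loc[:, k]) > 2:
            pts = [(int(loc[i, k] * img_w / 200),      # assumed column mapping (blocks -> pixels)
                    int(ROW_ANCHOR[i] * img_h / 288))  # assumed row mapping (288-pixel training height)
                   for i in range(len(ROW_ANCHOR)) if loc[i, k] > 0]
            lanes.append(pts)
    return lanes
```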
Compared with the prior art, the invention has the beneficial effects that:
1. The method uses multi-scale features to achieve pixel-level lane line segmentation, learns a set of prior features from the reversed segmentation output, enables the network to discover heterogeneous complementary features, and combines them with the image features to construct a complete feature representation;
2. Global and local feature learning modules are constructed, so that the network perceives global context information without ignoring local detail information;
3. In the network design, 3 × 3 and 1 × 1 convolutions are stacked, which reduces the computational complexity of the network while enhancing its expressive capability; in addition, the network model uses only convolution operations, avoiding the heavy computational cost of fully connected (linear) layers.
Drawings
FIG. 1 is a framework diagram of a deep network model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments;
Example 1: referring to FIG. 1, the lane line detection method based on the heterogeneous information interaction convolutional network is carried out according to steps 1 to 4 exactly as set out above: the training data are prepared as in step 1 (sub-steps 101 to 106), the heterogeneous-information-interaction detection network is built as in step 2 (convolutional layers 1 to 39 together with the pooling, upsampling, and cascade layers), the network is trained as in step 3 (sub-steps 301 to 303), and the trained model is tested as in step 4 (sub-steps 401 to 404).
The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any equivalent substitution or modification of the technical solutions and inventive concepts of the present invention that a person skilled in the art could readily conceive shall fall within the scope of the present invention.

Claims (5)

1. The lane line detection method based on the heterogeneous information interaction convolutional network, characterized by comprising the following steps:
Step 1: prepare training data;
Step 2: construct a lane line detection network based on heterogeneous information interaction;
Step 3: train the network model established in step 2 with the training data of step 1, learn the model parameters with an SGD (stochastic gradient descent) optimization strategy, and save the final trained model;
Step 4: test the final network model of step 3.
2. The method of claim 1, wherein
Step 1: prepare the training data; preprocess each training picture and its corresponding label, specifically as follows:
suppose the training data consist of N pictures and their corresponding labels (the height and width of each picture are 288 and 800, respectively); denote the pictures in the training set by {I_1, I_2, ..., I_N} and the label corresponding to each picture by {l_1, l_2, ..., l_N};
Step 101: read the coordinate points of a lane line in the label and its designated color p, draw a polygon using the ImageDraw.Draw.polygon() function of the PIL package, and fill the drawn polygon with the designated color via the function's fill argument;
Step 102: assign a color p to each lane line in order as yellow (p = 1), green (p = 2), blue (p = 3), red (p = 4) and purple (p = 5); read each lane line in a picture label, record the total number of lane lines as C (C ≤ 5) and the current lane index as k (k ≤ C), set the lane line color to p = k, and perform the operation of step 101 in turn, thereby obtaining a picture in which the lane lines are colored; then convert this picture into a grayscale picture, map the gray values of the corresponding colors to 1, 2, 3, 4 and 5 in order, and record uncolored pixels as 0; finally the labels of all pictures are converted into {mask_1, mask_2, ..., mask_N};
Step 103: generate multi-target segmentation labels using one-hot encoding, with dimensions 288 × 800 × (C + 1); finally the labels are converted into {seg_1, seg_2, ..., seg_N};
Step 104: predefine the row anchors row_anchor, e.g., [121, 131, 141, 150, 160, 170, 180, 189, 199, 209, 219, 228, 238, 248, 258, 267, 277, 287]; divide the height of the label obtained in step 102 into 18 parts according to the predefined row anchors, divide the width of the label equally into 200 parts, and at the same time generate an 18 × 201 × C zero matrix U;
Step 105: using the predefined row anchors of step 104, in the row of the grayscale label generated in step 102 where the i-th row anchor lies, select the block (i, j) containing the largest share of the k-th lane line as that lane line's label for the row, i.e., set U(i, j, k) = 1; if there is no lane line at the m-th row anchor of the grayscale label generated in step 102, set U(m, 201, k) = 1;
Step 106: perform the operation of step 105 in turn on the labels {mask_1, mask_2, ..., mask_N} of step 102 to generate the block labels {block_1, block_2, ..., block_N}.
3. The method of claim 1, wherein the method comprises the steps of,
step 2: establishing a lane line detection network for heterogeneous information interaction; the concrete model of the network is as follows:
the convolutional layer 1: deconvoluting an image with input of 288 × 800 × 3 using 64 7 × 7 convolution kernels with step size of 2, and obtaining features of 144 × 400 × 64 after normalization (BN) layer and ReLU activation function;
a pooling layer 1: the output features of the convolutional layer 1 are subjected to a maximum pooling layer of 3 × 3 with the step length of 2 to obtain features of 72 × 200 × 64;
and (3) convolutional layer 2: deconvolving the output of the pooling layer 1 with 64 3 × 3 convolution kernels with step size 1, and obtaining a 72 × 200 × 64 feature after normalization (BN) layer and ReLU activation function;
and (3) convolutional layer: deconvoluting the output of the convolutional layer 2 by using 64 convolutional kernels with the step length of 3 × 3, adding the characteristics obtained after normalization (BN) layer to the characteristics output by the pooling layer 1, and obtaining the characteristics of 72 × 200 × 64 after a ReLU activation function;
and (4) convolutional layer: deconvoluting the output of the convolutional layer 3 by using 64 convolutional kernels with the step size of 3 × 3, and obtaining the characteristics of 72 × 200 × 64 after a normalization (BN) layer and a ReLU activation function;
and (5) convolutional layer: deconvolving the output of the convolutional layer 4 by using 64 convolutional kernels with the step length of 3 × 3, adding the characteristics of the output of the convolutional layer 3 to the characteristics obtained after normalization (BN) layer, and obtaining the characteristics of 72 × 200 × 64 after a ReLU activation function;
and (6) a convolutional layer: deconvolving the output of the convolutional layer 5 with 128 convolutional kernels of 3 × 3 with the step size of 2, and obtaining the characteristics of 36 × 100 × 128 after a normalization (BN) layer and a ReLU activation function;
and (3) a convolutional layer 7: deconvoluting the output of the convolutional layer 6 by using 128 convolution kernels of 3 × 3 with the step size of 1, and obtaining a characteristic of 36 × 100 × 128 through the characteristic obtained after normalization (BN) layer;
convolutional layer 7_ 1: deconvolving the output of the convolutional layer 5 by using 128 convolutional kernels with the step length of 1 × 1 of 2, adding the output characteristics of the convolutional layer 7 after passing through a normalization (BN) layer, and obtaining the characteristics of 36 × 100 × 128 after passing through a ReLU activation function;
and (3) convolutional layer 8: deconvolving the output of the convolution layer 7_1 with 128 convolution kernels of 3 × 3 with step size 1, and obtaining the characteristics of 36 × 100 × 128 after normalization (BN) layer and ReLU activation function;
a convolutional layer 9: deconvoluting the output of the convolutional layer 8 by using 128 convolutional kernels with the step length of 3 × 3, performing normalization (BN) layer and adding the characteristics of the convolutional layer 7_1 output, and performing a ReLU activation function to obtain the characteristics of 36 × 100 × 128;
the convolutional layer 10: deconvoluting the output of the convolutional layer 9 by using 256 convolutional kernels with the step size of 3 × 3, and obtaining the characteristics of 18 × 50 × 256 after normalization (BN) layer and ReLU activation function;
the convolutional layer 11: deconvolving the output of the convolution layer 10 with 256 convolution kernels of 3 × 3 with a step size of 1, and obtaining a feature of 18 × 50 × 256 through the feature obtained after normalization (BN) layer;
convolutional layer 11_ 1: deconvolving the output of the convolutional layer 9 by using 256 convolutional kernels with the step length of 1 × 1 of 2, adding the characteristics output by the convolutional layer 11 after passing through a normalization (BN) layer, and obtaining the characteristics of 18 × 50 × 256 after passing through a ReLU activation function;
the convolutional layer 12: deconvoluting the output of the convolutional layer 11_1 by using 256 convolutional kernels with the step size of 3 × 3, and obtaining the characteristics of 18 × 50 × 256 after a normalization (BN) layer and a ReLU activation function;
a convolutional layer 13: deconvoluting the output of the convolutional layer 12 by using 256 convolutional kernels with the step length of 3 × 3, performing normalization (BN) layer and adding the characteristics of the convolutional layer 11_1 output, and performing a ReLU activation function to obtain the characteristics of 18 × 50 × 256;
the convolutional layer 14: deconvoluting the output of the convolutional layer 13 by using 512 convolution kernels with the step length of 3 × 3 and 2, and obtaining the characteristics of 9 × 25 × 512 after a normalization (BN) layer and a ReLU activation function;
a convolution layer 15: deconvolving the output of the convolutional layer 14 with 512 convolutional kernels of 3 × 3 with the step size of 1, and obtaining the characteristics of 9 × 25 × 512 through the characteristics obtained after normalization (BN) layer;
convolutional layer 15_ 1: deconvolving the output of the convolutional layer 13 by using 512 convolutional kernels with the step length of 1 × 1 of 2, adding the characteristics output by the convolutional layer 15 after passing through a normalization (BN) layer, and obtaining the characteristics of 9 × 25 × 512 after passing through a ReLU activation function;
a convolutional layer 16: the output of the convolution layer 15_1 is deconvoluted by using 512 convolution kernels with the step size of 1 and the convolution kernel is deconvoluted by 3 multiplied by 3, and the characteristics of 9 multiplied by 25 multiplied by 512 are obtained after a normalization (BN) layer and a ReLU activation function;
a convolutional layer 17: deconvoluting the output of the convolutional layer 16 by using 512 convolutional kernels with the step length of 3 × 3, performing normalization (BN) layer and adding the characteristics of the output of the convolutional layer 15_1, and performing a ReLU activation function to obtain the characteristics of 9 × 25 × 512;
convolutional layer 18: convolving the output of convolutional layer 9 with 128 1 × 1 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 36 × 100 × 128 features are obtained;
convolutional layer 19: convolving the output of convolutional layer 18 with 128 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 36 × 100 × 128 features are obtained;
convolutional layer 20: convolving the output of convolutional layer 19 with 128 1 × 1 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 36 × 100 × 128 features are obtained;
convolutional layer 21: convolving the output of convolutional layer 20 with 128 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 36 × 100 × 128 features are obtained;
convolutional layer 22: convolving the output of convolutional layer 13 with 128 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 18 × 50 × 128 features are obtained;
convolutional layer 23: convolving the output of convolutional layer 22 with 128 1 × 1 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 18 × 50 × 128 features are obtained;
convolutional layer 24: convolving the output of convolutional layer 23 with 128 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 18 × 50 × 128 features are obtained;
upsampling layer 1: the output of convolutional layer 24 is upsampled by a factor of 2 using bilinear interpolation to obtain 36 × 100 × 128 features;
convolutional layer 25: convolving the output of convolutional layer 17 with 128 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 9 × 25 × 128 features are obtained;
convolutional layer 26: convolving the output of convolutional layer 25 with 128 1 × 1 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 9 × 25 × 128 features are obtained;
upsampling layer 2: the output of convolutional layer 26 is upsampled by a factor of 4 using bilinear interpolation to obtain 36 × 100 × 128 features;
cascade layer 1: the outputs of upsampling layer 1, upsampling layer 2 and convolutional layer 9 are concatenated along the channel dimension to obtain 36 × 100 × 384 features (see the fusion sketch after the layer list);
convolutional layer 27: convolving the output of cascade layer 1 with 256 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 36 × 100 × 256 features are obtained;
convolutional layer 28: convolving the output of convolutional layer 27 with 128 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 36 × 100 × 128 features are obtained;
convolutional layer 29: convolving the output of convolutional layer 28 with 128 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 36 × 100 × 128 features are obtained;
convolutional layer 30: convolving the output of convolutional layer 29 with C + 1 1 × 1 convolution kernels with stride 1; after a Sigmoid activation function, 36 × 100 × (C + 1) features are obtained;
convolutional layer 30_1: the output of convolutional layer 30 is upsampled by a factor of 8 to obtain 288 × 800 × (C + 1) features;
convolutional layer 30_2: the negation of the output of convolutional layer 30 is added to an all-ones tensor of the same dimensions, i.e. 1 minus the output of convolutional layer 30, to obtain 36 × 100 × (C + 1) features;
convolutional layer 31: convolving the output of convolutional layer 30_2 with 256 3 × 3 convolution kernels with stride 2; after a normalization (BN) layer and a ReLU activation function, 18 × 50 × 256 features are obtained;
convolutional layer 32: convolving the output of convolutional layer 31 with 256 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 18 × 50 × 256 features are obtained;
convolutional layer 33: convolving the output of convolutional layer 32 with 256 1 × 1 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 18 × 50 × 256 features are obtained;
upsampling layer 3: the output of convolutional layer 17 is upsampled by a factor of 2 using bilinear interpolation to obtain 18 × 50 × 512 features;
cascade layer 2: the outputs of convolutional layer 33 and upsampling layer 3 are concatenated along the channel dimension to obtain 18 × 50 × 768 features;
convolutional layer 34: convolving the output of cascade layer 2 with 256 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 18 × 50 × 256 features are obtained;
convolutional layer 35: convolving the output of convolutional layer 34 with 64 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, the features are upsampled to a size of [18, 201] by bilinear interpolation, obtaining 18 × 201 × 64 features;
convolutional layer 36: convolving the output of convolutional layer 35 with 16 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 18 × 201 × 16 features are obtained;
convolutional layer 37: convolving the output of cascade layer 2 with 128 3 × 3 convolution kernels with stride 1; after a normalization (BN) layer and a ReLU activation function, 18 × 50 × 128 features are obtained;
convolutional layer 38: convolving the output of convolutional layer 37 with 2 convolution kernels with stride 1; after a ReLU activation function, the dimensions of the output features are reshaped to obtain 1 × 1800 features;
convolutional layer 39: convolving the output of convolutional layer 38 with 57888 1 × 1 convolution kernels with stride 1; after a ReLU activation function, the dimensions of the output features are reshaped to obtain 18 × 201 × 16 features;
cascade layer 3: the outputs of convolutional layer 36 and convolutional layer 39 are concatenated along the channel dimension to obtain 18 × 201 × 32 features;
convolutional layer 39: convolving the output of cascade layer 3 with C 1 × 1 convolution kernels with stride 1; after a Softmax activation function, 18 × 201 × C features are obtained.
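Convolutional layers 8 to 17 repeat one pattern: two 3 × 3 convolutions with batch normalization, a 1 × 1 strided shortcut convolution with batch normalization, element-wise addition and a ReLU, i.e. residual blocks; cascade layer 1 fuses three resolutions by bilinear upsampling and channel concatenation, and convolutional layer 30_2 turns the sigmoid segmentation map into a "1 minus probability" map that feeds the row-anchor branch. The PyTorch-style code below is a minimal sketch of these three building blocks only; the names (ResidualStage, fuse_scales, background_gate), the bias and interpolation settings, and the module grouping are assumptions of this sketch, not taken from the filing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualStage(nn.Module):
    """Sketch of the pattern of convolutional layers 10-13 (and 14-17):
    one strided residual block followed by one identity residual block."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        # cf. convolutional layers 10 and 11: 3x3 stride-2 conv + BN + ReLU, then 3x3 conv + BN
        self.main = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch))
        # cf. convolutional layer 11_1: 1x1 stride-2 shortcut + BN
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch))
        # cf. convolutional layers 12 and 13: identity residual block
        self.ident = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch))

    def forward(self, x):
        y = F.relu(self.main(x) + self.shortcut(x))  # add shortcut, then ReLU
        return F.relu(self.ident(y) + y)             # identity residual, then ReLU

def fuse_scales(feat_36x100, feat_18x50, feat_9x25):
    """Sketch of upsampling layers 1-2 and cascade layer 1: bilinear upsampling of the
    18x50 and 9x25 branches to 36x100, then concatenation along the channel dimension."""
    up2 = F.interpolate(feat_18x50, scale_factor=2, mode="bilinear", align_corners=False)
    up4 = F.interpolate(feat_9x25, scale_factor=4, mode="bilinear", align_corners=False)
    return torch.cat([feat_36x100, up2, up4], dim=1)  # 128 + 128 + 128 = 384 channels

def background_gate(seg_prob):
    """Sketch of convolutional layer 30_2: an all-ones tensor plus the negation of the
    sigmoid segmentation map, i.e. 1 - probability, which is fed to convolutional layer 31."""
    return torch.ones_like(seg_prob) - seg_prob

# Example usage: 36x100x128 -> 18x50x256 -> 9x25x512, matching layers 10-13 and 14-17
stage3 = ResidualStage(128, 256)
stage4 = ResidualStage(256, 512)
```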
4. The method of claim 1, wherein the method comprises the following steps:
step 3: training the network model established in step 2 with the training data from step 1, learning the model parameters with an SGD (stochastic gradient descent) optimization strategy, and saving the final trained model, which specifically comprises the following steps:
step 301: the network designed by the invention learns the network parameters in a multi-task manner, and the initial learning rate of the network is set to γ;
step 302: the output of convolutional layer 30_1 in step 2 is recorded as Pre_seg and the output of convolutional layer 39 is recorded as Pre_block; the parameters in the network are learned from the labels given in step 1, with the loss function
[loss function formula, given as image FDA0003200950310000061 in the original filing]
where λ1 is a hyperparameter (a hedged sketch of a loss of this general form is given after step 303);
step 303: after the network is trained in step 301 and step 302, the network parameters are saved.
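Since the step-302 loss is given only as an image formula in the filing, the snippet below is a hedged sketch of a loss of that general form: a per-pixel term on Pre_seg plus a λ1-weighted term on Pre_block. The choice of binary cross-entropy and cross-entropy, and every tensor shape shown, are assumptions of this sketch rather than the patented formula.

```python
import torch.nn.functional as F

def multitask_loss(pre_seg, seg_label, pre_block, cls_label, lambda1=1.0):
    """Hedged sketch of a step-302-style loss: segmentation term + lambda1 * row-anchor term.
    pre_seg:   (N, C+1, 288, 800) sigmoid probabilities (cf. convolutional layer 30_1)
    seg_label: (N, C+1, 288, 800) per-pixel lane masks
    pre_block: (N, 201, 18, C)    row-anchor scores before Softmax (assumed layout)
    cls_label: (N, 18, C)         index of the correct grid cell, 200 meaning "no lane"
    Both loss terms are assumptions; only the lambda1 weighting is stated in the claim."""
    seg_loss = F.binary_cross_entropy(pre_seg, seg_label)  # assumed per-pixel segmentation term
    cls_loss = F.cross_entropy(pre_block, cls_label)       # assumed row-anchor classification term
    return seg_loss + lambda1 * cls_loss
```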
5. The method of claim 1, wherein the method comprises the following steps:
step 4: testing the deep network model; the input picture is tested based on the parameters saved in step 303, which specifically comprises the following steps:
step 401: initializing parameters: the size of the input picture is recorded as an expression given by image formula FDA0003200950310000062 of the original filing, whose two quantities (image formulas FDA0003200950310000063 and FDA0003200950310000064) are respectively the height and the width of the picture; the width scaling ratio is given by image formula FDA0003200950310000065; a sequence with step 1 over the range 1 to 200 is generated and reshaped into a 200 × 1 matrix, recorded as Idx;
step 402: the output of convolutional layer 39 in step 2 is reshaped to 201 × 18 × C, the first 200 rows of the reshaped output are taken and recorded as Cut_block, Cut_block is multiplied by Idx, and the result is summed over the rows and recorded as Loc;
step 403: the row-wise index of the maximum of the output of convolutional layer 39 in step 2 is recorded as Maxind, and the entries of Loc at the positions where Maxind equals 200 are set to 0;
step 404: the elements of the Loc matrix are traversed in order; if the number of non-zero elements in a column of Loc is greater than 2, the position in the original image of the k-th lane line at the i-th preset row anchor is calculated according to the following formula (a hedged sketch of the whole decoding procedure of steps 401 to 404 is given after step 404):
[position formula, given as image FDA0003200950310000071 in the original filing]
where int(·) denotes rounding.
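Steps 401 to 404 amount to a row-anchor decoding procedure: the 201-way output of convolutional layer 39 gives, for each preset row anchor and each lane, an expected grid location Loc; anchors whose maximum falls in the 201st ("no lane") bin are discarded, and the remaining locations are mapped back to original-image coordinates. The sketch below follows those steps; because the mapping formula itself is an image in the filing, the final scaling by the 800-cell grid width and the 288-pixel anchor height, the row_anchors argument, and the name decode_lanes are assumptions of this sketch.

```python
import numpy as np

def decode_lanes(block_out, img_h, img_w, row_anchors):
    """Hedged sketch of steps 401-404.
    block_out:   (201, 18, C) Softmax output of convolutional layer 39, reshaped as in step 402
    row_anchors: 18 preset row positions on the 288-pixel network input height (assumed)
    Returns, per lane, a list of (x, y) points in original-image coordinates."""
    idx = np.arange(1, 201).reshape(200, 1, 1)        # step 401: Idx, values 1..200
    cut_block = block_out[:200]                       # step 402: first 200 rows as Cut_block
    loc = np.sum(cut_block * idx, axis=0)             # step 402: expected cell, shape (18, C)
    maxind = np.argmax(block_out, axis=0)             # step 403: row-wise argmax
    loc[maxind == 200] = 0                            # step 403: 201st bin means "no lane"
    lanes = []
    for k in range(loc.shape[1]):                     # step 404: traverse the lanes
        if np.count_nonzero(loc[:, k]) > 2:
            lanes.append([(int(loc[i, k] * img_w / 800),        # assumed width scaling
                           int(row_anchors[i] * img_h / 288))   # assumed height scaling
                          for i in range(loc.shape[0]) if loc[i, k] > 0])
    return lanes
```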
CN202110904312.5A 2021-08-06 2021-08-06 Lane line detection method based on heterogeneous information interaction convolution network Active CN113591756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110904312.5A CN113591756B (en) 2021-08-06 2021-08-06 Lane line detection method based on heterogeneous information interaction convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110904312.5A CN113591756B (en) 2021-08-06 2021-08-06 Lane line detection method based on heterogeneous information interaction convolution network

Publications (2)

Publication Number Publication Date
CN113591756A true CN113591756A (en) 2021-11-02
CN113591756B CN113591756B (en) 2024-06-28

Family

ID=78256031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110904312.5A Active CN113591756B (en) 2021-08-06 2021-08-06 Lane line detection method based on heterogeneous information interaction convolution network

Country Status (1)

Country Link
CN (1) CN113591756B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009524A (en) * 2017-12-25 2018-05-08 西北工业大学 A kind of method for detecting lane lines based on full convolutional network
CN109886125A (en) * 2019-01-23 2019-06-14 青岛慧拓智能机器有限公司 A kind of method and Approach for road detection constructing Road Detection model
US20190220746A1 (en) * 2017-08-29 2019-07-18 Boe Technology Group Co., Ltd. Image processing method, image processing device, and training method of neural network
CN110276267A (en) * 2019-05-28 2019-09-24 江苏金海星导航科技有限公司 Method for detecting lane lines based on Spatial-LargeFOV deep learning network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190220746A1 (en) * 2017-08-29 2019-07-18 Boe Technology Group Co., Ltd. Image processing method, image processing device, and training method of neural network
CN108009524A (en) * 2017-12-25 2018-05-08 西北工业大学 A kind of method for detecting lane lines based on full convolutional network
CN109886125A (en) * 2019-01-23 2019-06-14 青岛慧拓智能机器有限公司 A kind of method and Approach for road detection constructing Road Detection model
CN110276267A (en) * 2019-05-28 2019-09-24 江苏金海星导航科技有限公司 Method for detecting lane lines based on Spatial-LargeFOV deep learning network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王帅帅; 刘建国; 纪郭: "Lane line detection based on fully convolutional neural networks" (基于全卷积神经网络的车道线检测), 数字制造科学 (Digital Manufacturing Science), no. 02, 15 June 2020 (2020-06-15) *

Also Published As

Publication number Publication date
CN113591756B (en) 2024-06-28

Similar Documents

Publication Publication Date Title
CN112651973B (en) Semantic segmentation method based on cascade of feature pyramid attention and mixed attention
Zhang et al. Deep gated attention networks for large-scale street-level scene segmentation
CN111275713B (en) Cross-domain semantic segmentation method based on countermeasure self-integration network
CN111612807B (en) Small target image segmentation method based on scale and edge information
CN110276354B (en) High-resolution streetscape picture semantic segmentation training and real-time segmentation method
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN113837938B (en) Super-resolution method for reconstructing potential image based on dynamic vision sensor
CN113657388A (en) Image semantic segmentation method fusing image super-resolution reconstruction
CN112396607A (en) Streetscape image semantic segmentation method for deformable convolution fusion enhancement
CN110378344B (en) Spectral dimension conversion network-based convolutional neural network multispectral image segmentation method
CN111476133B (en) Unmanned driving-oriented foreground and background codec network target extraction method
CN111723812B (en) Real-time semantic segmentation method based on sequence knowledge distillation
WO2022156621A1 (en) Artificial intelligence-based image coloring method and apparatus, electronic device, computer readable storage medium, and computer program product
CN111310766A (en) License plate identification method based on coding and decoding and two-dimensional attention mechanism
CN111382759A (en) Pixel level classification method, device, equipment and storage medium
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN112801029B (en) Attention mechanism-based multitask learning method
CN114445620A (en) Target segmentation method for improving Mask R-CNN
CN115376195B (en) Method for training multi-scale network model and face key point detection method
CN116977631A (en) Streetscape semantic segmentation method based on DeepLabV3+
CN116863437A (en) Lane line detection model training method, device, equipment, medium and vehicle
CN113591756B (en) Lane line detection method based on heterogeneous information interaction convolution network
CN115909378A (en) Document text detection model training method and document text detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant