CN113158790A

CN113158790A - Processing edge lane line detection system based on geometric context coding network model

Info

Publication number: CN113158790A
Application number: CN202110275555.7A
Authority: CN
Inventors: 李跃; 于丽娜; 魏云素; 李招康; 刘行言; 赵振东
Original assignee: Hebei College of Industry and Technology
Current assignee: Hebei College of Industry and Technology
Priority date: 2021-03-15
Filing date: 2021-03-15
Publication date: 2021-07-23

Abstract

The invention relates to a processing edge lane line detection system based on a geometric context coding network model, which comprises a backbone network, a frame regression network and a target detection network, wherein the backbone network is in signal communication with the frame regression network and the target detection network. The backbone network extracts the shared features through 6 convolutional layers, each of which consists of two parts, a convolution operation and a ReLU. The frame regression network is used for lane line positioning, and the target detection network is used for judging lane lines. The detection system uses the geometric context coding network model to carry out geometric context coding on the highest hidden layer of the frame regression network and of the target detection network. The invention longitudinally divides the highest hidden layer into columns, connects the columns obtained after division in a specific direction, and updates them by using a convolution operation and a ReLU, so that the lane line features within a column can be passed from column to column in the specific direction, thereby extracting the geometric context features of the lane line pixels.

Description

Processing edge lane line detection system based on geometric context coding network model

Technical Field

The invention belongs to the technical field of computer application, relates to computer vision of an automatic driving intelligent automobile, and particularly relates to a system for detecting a processing edge lane line based on a geometric context coding network model.

Background

Lane line detection is one of the important components of the vision system of an unmanned vehicle. Its main function is to provide a visual basis for the unmanned control system by detecting the position of the lane lines, in combination with sensor equipment such as radar, so as to ensure that the unmanned vehicle can run safely in its current lane.

Lane line detection algorithms based on traditional methods focus on the design of the lane line feature extraction method. Manually designed feature extraction methods can hardly cover the variety of practical situations, so most such algorithms only consider limited application environments. Lane line detection based on deep learning likewise comprises two parts, lane line detection and lane line fitting; compared with traditional methods, it achieves better robustness and generalization by designing a specific network model and training it on a labeled data set. In recent years, deep learning has developed rapidly in the field of computer vision and has achieved significant results in tasks such as target detection, target recognition and target segmentation. Lane line detection can be viewed as a combination of target detection and target recognition. The feature extraction capability of a convolutional neural network is superior to that of traditional methods, and it can also output the class of each target, which is difficult for traditional methods to achieve.

Current lane line detection methods based on deep learning still face some problems. Existing methods usually adopt a series of convolution operations to extract the geometric features of the lane lines. Although convolutional neural networks (CNNs) have a strong capability of learning features automatically, existing CNN-based detection architectures do not fully exploit the geometric context information in the image. Such context information is important for learning objects with a strong shape prior, such as lane lines, which are thin, long and continuous. It should be noted that the lane lines at the two sides of a road scene are more susceptible to illumination changes, image distortion and the like, which results in blurred lane lines. Meanwhile, these lane lines appear closer together and are difficult to distinguish. Owing to these factors, existing network models have difficulty detecting these lane lines accurately, and missed detections and false detections may even occur, which poses a serious hidden danger to the safe driving of an autonomous vehicle.

Disclosure of Invention

The invention aims to provide a processing edge lane line detection system based on a geometric context coding network model. Through the geometric context coding network model, the system introduces geometric prior knowledge of the lane line into lane line detection: using a convolution operation and nonlinear activation, it applies the principle of the traditional layer-by-layer convolution to a column-by-column convolution, so that the geometric context information of the lane line within a column can be transmitted and coded from column to column in a specific direction, which improves the detection precision for lane lines located at the edge of a road scene.

The technical scheme of the invention is as follows: the processing edge lane line detection system based on the geometric context coding network model comprises a backbone network, a frame regression network and a target detection network, wherein the backbone network is in signal communication with the frame regression network and the target detection network. The backbone network extracts the shared features through 6 convolutional layers, each of which consists of two parts, a convolution operation and a ReLU. The frame regression network is used for lane line positioning, and the target detection network is used for judging lane lines. The detection system uses a geometric context coding network model to carry out geometric context coding on the highest hidden layer of the frame regression network and of the target detection network: each column of the feature map is regarded as a new convolution layer, and a convolution operation and a ReLU are used so that the lane line features within a column can be passed from column to column in a specific direction. The geometric context features are thereby extracted, and the network can better capture the geometric continuity of the lane lines.

The backbone network comprises a No. 1 convolution layer, a No. 2 convolution layer, a No. 3 convolution layer, a No. 4 convolution layer, a No. 5 convolution layer and a No. 6 convolution layer, which are sequentially in signal communication, and the No. 6 convolution layer is in signal communication with the frame regression network and the target detection network. The frame regression network and the target detection network are each provided with a No. 7 convolution layer, a No. 8 convolution layer, a geometric context coding network model and a Tiling layer. In the frame regression network, the No. 7 convolution layer, the No. 8 convolution layer, the geometric context coding network model and the Tiling layer are sequentially in signal communication up to the frame regression output. In the target detection network, the No. 7 convolution layer, the No. 8 convolution layer, the geometric context coding network model and the Tiling layer are sequentially in signal communication up to the target detection output.

The geometric context coding network model comprises a pre-update three-dimensional feature tensor module, a longitudinal segmentation module, a first-stage updating module, a second-stage updating module and an updated three-dimensional feature tensor module. The working process of the geometric context coding network model is as follows:

(1) The pre-update three-dimensional feature tensor module has a size of C × H × W, where C, H and W represent the number of channels, rows and columns, respectively.

(2) The longitudinal segmentation module longitudinally segments the three-dimensional feature tensor according to its column number W to obtain W column slices, and each column slice after segmentation is regarded as a new convolution layer.

(3) The first-stage updating module performs the geometric context transfer from right to left. It sends the first column slice to a convolution kernel of size C × kw × 1 for a convolution operation and a ReLU, where C is the number of channels and kw (kernel width) is the width of the convolution kernel. The result of the convolution operation and the ReLU is summed element-wise with the second column slice to update the second column slice, so that the second column slice obtains the feature information of the first column slice. The updated second column slice is then sent to the next convolution kernel, and the process is repeated until the W-th column slice has been updated, which completes the transfer of the geometric context information in the right-to-left direction. At this point, the W-th column slice contains the information of the preceding W-1 column slices, transmitted layer by layer from right to left.

(4) The second-stage updating module takes the W-th column slice produced by the first-stage updating module as the starting position and performs the convolution operation and ReLU from left to right. It sends the W-th column slice to a convolution kernel of size C × kw × 1 for a convolution operation and a ReLU, and sums the result element-wise with the (W-1)-th column slice to update the (W-1)-th column slice. The process is repeated until the first column slice has been updated.

(5) The updated three-dimensional feature tensor module inputs all W column slices updated in steps (3) and (4) into a concat layer, which concatenates them along the width dimension to obtain the updated three-dimensional feature tensor; this completes all operations of the geometric context coding network model. After the two-stage updating operation from right to left and from left to right, any N-th column slice contains the information transmitted from left to right by the first N-1 column slices and also the information transmitted from right to left by the remaining W-N column slices.
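The five steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names `conv_relu_slice` and `geometric_context_encode`, the zero padding that keeps each slice at height H, and the use of one kernel set per stage are assumptions. Slices are indexed 1 to W as in the text; the first stage runs from slice 1 to slice W and the second stage from slice W back to slice 1.

```python
def conv_relu_slice(col_slice, kernel):
    """Convolve one column slice along its height, then apply ReLU.

    col_slice: [C][H] values of one column slice.
    kernel:    [C_out][C_in][kw] weights (assumed layout), shared by all slices.
    Zero padding (an assumption) keeps the output height equal to H.
    """
    c_in, height = len(col_slice), len(col_slice[0])
    kw = len(kernel[0][0])
    pad = kw // 2
    out = []
    for weights in kernel:                      # one output channel at a time
        channel = []
        for y in range(height):
            acc = 0.0
            for m in range(c_in):
                for n in range(kw):
                    src = y + n - pad
                    if 0 <= src < height:
                        acc += col_slice[m][src] * weights[m][n]
            channel.append(acc if acc > 0.0 else 0.0)   # ReLU
        out.append(channel)
    return out


def add_slices(a, b):
    """Element-wise sum of two [C][H] slices."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def geometric_context_encode(x, kernel_stage1, kernel_stage2):
    """Two-stage column-slice update of a [C][H][W] tensor."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # Step (2): split the tensor into W column slices, each of shape [C][H].
    slices = [[[x[ch][y][col] for y in range(h)] for ch in range(c)]
              for col in range(w)]
    # Step (3): pass information from slice 1 towards slice W.
    for col in range(1, w):
        message = conv_relu_slice(slices[col - 1], kernel_stage1)
        slices[col] = add_slices(slices[col], message)
    # Step (4): pass information from slice W back towards slice 1.
    for col in range(w - 2, -1, -1):
        message = conv_relu_slice(slices[col + 1], kernel_stage2)
        slices[col] = add_slices(slices[col], message)
    # Step (5): concatenate the slices along the width dimension.
    return [[[slices[col][ch][y] for col in range(w)] for y in range(h)]
            for ch in range(c)]
```

With a single channel, a single row and an identity kernel of width 1, the input columns 1, 2, 3 become 1, 3, 6 after the first stage and 10, 9, 6 after the second, which makes the accumulation along the slices easy to follow by hand.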

In the process of spatial geometric context information transfer, a specified three-dimensional tensor K is assumed, and an information transfer formula is as follows:

where: K(i, j, k) is the weight between an element of channel i in the previous slice and an element of channel j in the current slice, with an offset of k columns between these two elements. X(i, j, k) similarly denotes an element of tensor X, where i, j and k refer to the channel, row and column, respectively. f is the nonlinear activation function ReLU. X' represents the updated value, and all slices share one set of convolution kernels.
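For orientation, one recurrence that is consistent with the symbol definitions above can be written as follows. It is a sketch under assumptions rather than the formula of the original filing: it describes the first stage only (slice index k increasing from 1 to W), the summation indices m (channel of the previous slice) and n (offset within the kernel) are introduced here, the offset is taken along the slice, and boundary handling is left open.

```latex
X'_{i,j,k} =
\begin{cases}
X_{i,j,k}, & k = 1 \\[4pt]
X_{i,j,k} + f\!\left( \sum_{m} \sum_{n} X'_{m,\,j+n-1,\,k-1} \, K_{m,i,n} \right), & k = 2, \dots, W
\end{cases}
```

Under the same assumptions, the second stage would use the slice at k+1 in place of k-1 and run from k = W-1 down to k = 1.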

The convolution operation means that a convolution kernel slides over the feature map with a stride of 1; at each position, the convolution kernel array and the array of the currently covered area are multiplied element-wise and summed, which gives the element at the corresponding position in the output array, i.e., the value of the element on the new feature map. The formula of ReLU is ReLU(x) = max(0, x): every x value smaller than zero is mapped to a y value of 0, and every x value greater than zero is mapped to itself. ReLU has low time and space complexity and can effectively alleviate the vanishing gradient problem.
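The sliding-window description above can be illustrated with a minimal single-channel sketch. The function names `conv2d_stride1` and `relu_map` are illustrative, and the absence of padding (so the output shrinks by the kernel size minus one) is an assumption, since the text does not state how borders are handled.

```python
def conv2d_stride1(feature, kernel):
    """Slide `kernel` over `feature` with stride 1; multiply element-wise and sum.

    No kernel flip is applied, matching the multiply-and-sum description.
    No padding is used (an assumption), so the output is smaller than the input.
    """
    fh, fw = len(feature), len(feature[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for top in range(fh - kh + 1):
        row = []
        for left in range(fw - kw + 1):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += feature[top + dy][left + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out


def relu_map(feature):
    """Apply ReLU(x) = max(0, x) to every element of a 2D feature map."""
    return [[max(0.0, v) for v in row] for row in feature]


feature = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0],
           [7.0, 8.0, 9.0]]
diag_sum = conv2d_stride1(feature, [[1.0, 0.0], [0.0, 1.0]])
diag_diff = relu_map(conv2d_stride1(feature, [[1.0, 0.0], [0.0, -1.0]]))
```

Here `diag_sum` adds each element to its lower-right neighbour, while `diag_diff` subtracts that neighbour, producing negative sums that ReLU then maps to zero.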

The Tiling layer gives more weight to "salient" regions and less weight to non-salient regions by copying the contents of the array along a specified dimension. Specifically, when a three-dimensional feature tensor of size C × H × W is input and tile_dim (the tiling dimension) is i, a three-dimensional feature tensor of size (C/i²) × (H × i) × (W × i) is obtained after the Tiling layer.
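The shape rule above can be checked with a short sketch. Only the output size (C/i²) × (H × i) × (W × i) comes from the text; the helper names `tiling_output_shape` and `tile` and, in particular, the order in which channel values are laid out inside each i × i block are assumptions made for illustration.

```python
def tiling_output_shape(c, h, w, tile_dim):
    """Return the (C / i^2, H * i, W * i) output shape for tile_dim = i."""
    block = tile_dim * tile_dim
    if c % block != 0:
        raise ValueError("C must be divisible by tile_dim squared")
    return (c // block, h * tile_dim, w * tile_dim)


def tile(x, tile_dim):
    """Rearrange a [C][H][W] tensor into [C / i^2][H * i][W * i].

    Assumed layout: each group of i*i consecutive channels fills one
    i x i spatial block, row by row.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    oc, oh, ow = tiling_output_shape(c, h, w, tile_dim)
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(oc)]
    for ch in range(c):
        out_ch, rem = divmod(ch, tile_dim * tile_dim)
        dy, dx = divmod(rem, tile_dim)
        for y in range(h):
            for col in range(w):
                out[out_ch][y * tile_dim + dy][col * tile_dim + dx] = x[ch][y][col]
    return out
```

For example, a 64 × 20 × 30 input with `tile_dim` 4 would give a 4 × 80 × 120 output, so spatial resolution is recovered at the cost of channel depth.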

The invention relates to a processing edge lane line detection system based on a geometric context coding network model. A detection system consisting of a backbone network, a frame regression network and a target detection network is used to detect lane lines at the edge positions of the road, and a geometric context coding network model is added at the highest hidden layer of the two branch networks, target detection and frame regression. Prior knowledge is introduced into the lane line detection model through the geometric context coding network model: using a convolution operation and nonlinear activation, the principle of the traditional layer-by-layer convolution is applied to a column-by-column convolution, so that the geometric information of the lane line within a column can be transmitted and coded from column to column in a specific direction. The detection system can thus better extract and use the geometric prior knowledge of the lane lines, which improves the detection precision for lane lines at the edge of a road scene.

Drawings

FIG. 1 is a schematic diagram of a system for processing edge lane line detection based on a geometric context coding network model according to the present invention;

FIG. 2 is a schematic diagram of a geometry context encoding process;

wherein: 1 - convolution operation, 2 - ReLU, 3 - backbone network, 4 - No. 7 convolutional layer, 5 - target detection network, 6 - No. 8 convolutional layer, 7 - geometric context coding network model, 8 - Tiling layer, 9 - No. 1 convolutional layer, 10 - No. 2 convolutional layer, 11 - No. 3 convolutional layer, 12 - No. 4 convolutional layer, 13 - No. 5 convolutional layer, 14 - No. 6 convolutional layer, 15 - frame regression network, 16 - pre-update three-dimensional feature tensor module, 17 - longitudinal segmentation module, 18 - first-stage updating module, 19 - second-stage updating module, 20 - updated three-dimensional feature tensor module, 21 - convolution kernel, 22 - summation operation, 23 - column slice, 24 - target detection output, 25 - frame regression output.

Detailed Description

The present invention will be described in detail with reference to the following examples and drawings. The scope of protection of the invention is not limited to the embodiments, and any modification made by those skilled in the art within the scope defined by the claims also falls within the scope of protection of the invention.

The invention relates to a processing edge lane line detection system based on a geometric context coding network model, which comprises a backbone network 3, a frame regression network 15 and a target detection network 5, as shown in figure 1. The backbone network 3 includes a number 1 convolutional layer 9, a number 2 convolutional layer 10, a number 3 convolutional layer 11, a number 4 convolutional layer 12, a number 5 convolutional layer 13, and a number 6 convolutional layer 14. The frame regression network and the target detection network are respectively provided with a number 7 convolution layer 4, a number 8 convolution layer 6, a geometric context coding network model 7 and a Tiling layer 8. No. 1 convolutional layer, No. 2 convolutional layer, No. 3 convolutional layer, No. 4 convolutional layer, No. 5 convolutional layer and No. 6 convolutional layer are sequentially in signal communication, and No. 6 convolutional layer is in signal communication with No. 7 convolutional layers of the frame regression network and the target detection network. In the frame regression network, the No. 7 convolution layer, the No. 8 convolution layer, the geometric context coding network model and the Tiling layer 8 are sequentially in signal communication with the frame regression output 25. In the target detection network, the number 7 convolutional layer, the number 8 convolutional layer, the geometric context coding network model and the Tiling layer are sequentially in signal communication to a target detection output 24.
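The wiring described above, a shared backbone feeding two branches with the same layer sequence, can be sketched with placeholder layers. Nothing here computes features: each placeholder only records its name, and all identifiers (`make_layer`, `chain`, `run_system` and the layer labels) are hypothetical names chosen for this illustration.

```python
def make_layer(name):
    """Placeholder layer: appends its name to a trace instead of computing features."""
    def layer(trace):
        return trace + [name]
    return layer


def chain(*layers):
    """Connect layers in sequence so that each output feeds the next layer."""
    def run(trace):
        for layer in layers:
            trace = layer(trace)
        return trace
    return run


# Backbone: No. 1 to No. 6 convolutional layers in sequence.
backbone = chain(*[make_layer(f"conv{i}") for i in range(1, 7)])


def make_branch(output_name):
    """Branch: No. 7 conv, No. 8 conv, geometric context coding, Tiling, output."""
    return chain(
        make_layer("conv7"),
        make_layer("conv8"),
        make_layer("geometric_context_coding"),
        make_layer("tiling"),
        make_layer(output_name),
    )


frame_regression_branch = make_branch("frame_regression_output")
target_detection_branch = make_branch("target_detection_output")


def run_system(trace=None):
    """Run the shared backbone once, then both branches on the shared features."""
    shared = backbone(trace or [])
    return frame_regression_branch(shared), target_detection_branch(shared)
```

Calling `run_system()` returns two traces that share the prefix `conv1` to `conv6` and differ only in their final output name, which mirrors how both branches consume the output of the No. 6 convolutional layer.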

The backbone network extracts the shared features through 6 convolutional layers, each of which is composed of two parts, convolution operation 1 and ReLU 2. The frame regression network is used for lane line positioning: using an alternative regression method based on grid-level masks, points on the grid are regressed to the nearest grid cell, and adjacent grid cells are regressed into one target. The target detection network is used to judge lane lines. It takes a 4 × 4 window (4 pixels wide × 4 pixels high, with the pixel as the basic unit) as the central region of a target, slides this window over the entire image, and regresses a grid of size 4 × 4; the result of the target detection output 24 indicates the probability that a given 4 × 4 region of the input image contains a lane line. The detection system uses the geometric context coding network model 7 to carry out geometric context coding on the highest hidden layer of the frame regression network and of the target detection network: each column of the feature map is regarded as a new convolution layer, and by using convolution operation 1 and ReLU 2 the lane line features within a column can be passed from column to column in a specific direction, so that the geometric context features of the lane line are extracted and the network can better capture the geometric prior features of the lane line.
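To make the meaning of the target detection output concrete, the sketch below maps a grid of per-region probabilities back to 4 × 4 pixel regions of the input image. It assumes that the regions are non-overlapping and laid out row by row, and that a fixed threshold of 0.5 decides whether a region contains a lane line; neither assumption is stated in the text, and `cells_to_pixel_boxes` is a hypothetical helper.

```python
CELL_SIZE = 4  # each output cell covers a 4 x 4 pixel region


def cells_to_pixel_boxes(prob_grid, threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) pixel boxes of likely lane-line cells.

    prob_grid[gy][gx] is the predicted probability that the 4 x 4 region at
    grid row gy and grid column gx contains a lane line.
    The threshold is an illustrative assumption.
    """
    boxes = []
    for gy, row in enumerate(prob_grid):
        for gx, prob in enumerate(row):
            if prob >= threshold:
                x_min, y_min = gx * CELL_SIZE, gy * CELL_SIZE
                boxes.append((x_min, y_min, x_min + CELL_SIZE, y_min + CELL_SIZE))
    return boxes


example_boxes = cells_to_pixel_boxes([[0.1, 0.9],
                                      [0.7, 0.2]])
```

In this example the two cells above the threshold map to the pixel regions starting at (4, 0) and (0, 4).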

As shown in fig. 2, the geometric context coding network model includes a pre-update three-dimensional feature tensor module 16, a vertical segmentation module 17, a first-stage update module 18, a second-stage update module 19, and a post-update three-dimensional feature tensor module 20. The working process of the geometric context coding network model is as follows:

(1) The pre-update three-dimensional feature tensor module 16 has a size of C × H × W, where C, H and W represent the number of channels, rows and columns, respectively.

(2) The longitudinal segmentation module 17 longitudinally segments the three-dimensional feature tensor according to its column number W to obtain W column slices 23, and each column slice after segmentation is regarded as a new convolution layer.

(3) The first-stage update module 18 performs the geometric context transfer from right to left. It sends the first column slice 23 to a convolution kernel 21 of size C × kw × 1 for a convolution operation and a ReLU, where C is the number of channels and kw (kernel width) is the width of the convolution kernel. A summation operation 22 is performed on the result of the convolution operation and the ReLU and the corresponding elements of the second column slice to update the second column slice, so that the second column slice obtains the feature information of the first column slice. The updated second column slice is then sent to the next convolution kernel, and the process is repeated until the W-th column slice has been updated, which completes the transfer of the spatial geometric context information in the right-to-left direction. At this point, the W-th column slice contains the information of the preceding W-1 column slices, transmitted layer by layer from right to left.

(4) The second-stage update module 19 takes the W-th column slice produced by the first-stage update module as the starting position and performs the convolution operation and ReLU from left to right. As in the connection and update process of the first stage, it sends the W-th column slice to a convolution kernel 21 of size C × kw × 1 for a convolution operation and a ReLU, and performs a summation operation 22 on the result and the corresponding elements of the (W-1)-th column slice to update the (W-1)-th column slice. The process is repeated until the first column slice 23 has been updated.

(5) The updated three-dimensional feature tensor module 20 inputs all W column slices updated in steps (3) and (4) into a concat layer, which concatenates them along the width dimension to obtain the updated three-dimensional feature tensor; this completes all operations of the geometric context coding network model. After the two-stage updating operation from right to left and from left to right, any N-th column slice contains the information transmitted from left to right by the first N-1 column slices and also the information transmitted from right to left by the remaining W-N column slices.
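The statement that every updated slice ends up carrying information from all other slices can be checked with a small bookkeeping sketch. It tracks only which original slice indices have contributed to each slice, not feature values; `propagate_dependencies` is a hypothetical helper, and slices are numbered 1 to W with the first stage running from slice 1 to slice W and the second stage running back.

```python
def propagate_dependencies(w):
    """Track which original slices (1..w) influence each slice after each stage.

    Returns (after_stage1, after_stage2), two lists of sets indexed by slice - 1.
    """
    deps = [{n} for n in range(1, w + 1)]
    # Stage 1: each slice receives everything its predecessor has accumulated.
    for idx in range(1, w):
        deps[idx] = deps[idx] | deps[idx - 1]
    after_stage1 = [set(d) for d in deps]
    # Stage 2: each slice receives everything its successor has accumulated.
    for idx in range(w - 2, -1, -1):
        deps[idx] = deps[idx] | deps[idx + 1]
    return after_stage1, deps


stage1, stage2 = propagate_dependencies(5)
```

For W = 5, slice 3 depends on slices 1 to 3 after the first stage, slice 5 already depends on all five, and after the second stage every slice depends on all five, i.e., on the N-1 slices before it and the W-N slices after it.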

Convolution operation 1 means that a convolution kernel slides over the feature map with a stride of 1; at each position, the convolution kernel array and the array of the currently covered area are multiplied element-wise and summed, which gives the element at the corresponding position in the output array, i.e., the value of the element on the new feature map. The formula of ReLU is ReLU(x) = max(0, x): every x value smaller than zero is mapped to a y value of 0, and every x value greater than zero is mapped to itself. ReLU has low time and space complexity and can effectively alleviate the vanishing gradient problem.

The processing edge lane line detection system adopts the OverFeat single-step target detection framework, which makes the network model efficient in the joint training of the positioning and detection tasks. To better deal with the poor detection accuracy caused by blurred lane lines on the two sides, the three-dimensional feature tensor is longitudinally segmented by the geometric context coding network model. Borrowing the layer-to-layer connection form of a traditional convolutional network, each segmented column slice is regarded as a new convolution layer and the slices are connected transversely, so that rich geometric context features are extracted. This enables the network to better capture the geometric continuity of the lane lines, improves its use of the context information of lane line pixels, and improves the detection accuracy for lane lines at the edge of the road scene.

The bounding box regression task matches a single box to a particular object, but a slender object such as a lane line cannot be represented by a single box. The invention therefore uses an alternative regression method based on grid-level masks: points on the grid are regressed to the nearest grid cell, and adjacent grid cells are regressed into one target. Lane lines have the prior property of a thin, long shape, and a single lane line has a certain continuity in geometric space. Road scenes differ considerably between areas, and when a scene contains several (four or more) lane lines, the lane lines at the edge positions are more easily blurred, so that existing network models have difficulty detecting the lane lines at the edge of the road accurately. The lane line detection task of the invention is completed cooperatively by the two branch tasks of frame regression and target detection. Because the highest hidden layer of the network model contains rich semantic feature information, geometric context coding is carried out on the highest hidden layer of the frame regression and target detection branches: each column of the feature map is regarded as a new convolution layer, and by using a convolution operation and a ReLU the principle of layer-by-layer convolution is applied to a column-by-column convolution, so that the lane line features within a column can be passed from column to column in a specific direction.
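The idea that adjacent grid cells are merged into one target can be illustrated with a connected-component sketch over a binary grid-level mask. This is one possible reading, not the method of the filing: treating diagonal neighbours as adjacent (8-connectivity) and the helper name `group_adjacent_cells` are assumptions.

```python
from collections import deque


def group_adjacent_cells(mask):
    """Group positive grid cells that touch each other into separate targets.

    mask[r][c] is truthy where a grid cell is predicted to lie on a lane line.
    Cells are treated as adjacent if they touch horizontally, vertically or
    diagonally (8-connectivity, an assumption).
    Returns a list of targets, each a sorted list of (row, col) cells.
    """
    rows, cols = len(mask), len(mask[0])
    seen = set()
    targets = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            queue = deque([(r, c)])
            seen.add((r, c))
            cells = []
            while queue:
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            queue.append((nr, nc))
            targets.append(sorted(cells))
    return targets


lane_mask = [[1, 0, 0, 1],
             [0, 1, 0, 1],
             [0, 0, 0, 1]]
lane_targets = group_adjacent_cells(lane_mask)
```

In this small mask the diagonal pair on the left forms one target and the vertical run on the right forms another, so two thin, continuous structures are recovered without fitting a single box to either.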

Claims

1. A processing edge lane line detection system based on a geometric context coding network model is characterized in that: the detection system comprises a backbone network (3), a frame regression network (15) and a target detection network (5), wherein the backbone network is in signal communication with the frame regression network and the target detection network; the backbone network extracts shared features through 6 convolutional layers, each convolutional layer is composed of two parts of convolution operation (1) and ReLU (2); the frame regression network is used for processing lane line positioning, and the target detection network is used for judging lane lines; the detection system utilizes a geometric context coding network model (7) to carry out geometric context coding on the highest hidden layer of a frame regression network (15) and a target detection network (5), each column piece of a three-dimensional feature map is regarded as a new convolution layer, and convolution operation (1) and ReLU (2) are used, so that lane line features in the same column can carry out information transmission between the columns according to a specific direction, the geometric context features of lane line pixels are extracted, and the network can better capture the priori knowledge of the geometric continuity of the lane lines.

2. The system for processing edge lane line based on geometric context coding network model according to claim 1, wherein: the main network (3) comprises a number 1 convolution layer (9), a number 2 convolution layer (10), a number 3 convolution layer (11), a number 4 convolution layer (12), a number 5 convolution layer (13) and a number 6 convolution layer (14), wherein the number 1 convolution layer, the number 2 convolution layer, the number 3 convolution layer, the number 4 convolution layer, the number 5 convolution layer and the number 6 convolution layer are sequentially in signal communication, and the number 6 convolution layer is in signal communication with the frame regression network (15) and the target detection network (5).

3. The system for processing edge lane line based on geometric context coding network model according to claim 1, wherein: the frame regression network (15) and the target detection network (5) are respectively provided with a number 7 convolution layer (4), a number 8 convolution layer (6), a geometric context coding network model (7) and a Tiling layer (8); in the frame regression network, the No. 7 convolution layer, the No. 8 convolution layer, the geometric context coding network model and the Tiling layer are sequentially in signal communication with a frame regression output (25); in the target detection network, the No. 7 convolution layer, the No. 8 convolution layer, the geometric context coding network model and the Tiling layer are sequentially in signal communication with a target detection output (24).

4. The system for processing edge lane line based on geometric context coding network model according to claim 1, wherein: the geometric context coding network model comprises a three-dimensional feature tensor module before updating (16), a longitudinal segmentation module (17), a first-stage updating module (18), a second-stage updating module (19) and an updated three-dimensional feature tensor obtaining module (20); the working process of the geometric context coding network model is as follows:

(1) the three-dimensional feature tensor module (16) has a size of C x H x W, wherein C, H and W represent the channel, row number and column number, respectively;

(2) a longitudinal segmentation module (17) which longitudinally segments the three-dimensional feature tensor module according to its column number W to obtain W column slices (23), each segmented column slice being regarded as a new convolution layer;

(3) a first stage update module (18) for performing a right-to-left geometric context transfer, sending a first column slice (23) to a convolution kernel (21) of size C × kw × 1 for convolution operations (1) and ReLU (2), where C is the number of channels and kw is the width of the convolution kernel; performing a summation operation (22) of the result of the convolution operation and the ReLU and the corresponding element in the second column slice to update the second column slice, wherein the second column slice obtains the characteristic information from the first column slice; then sending the updated second column slice to the next convolution kernel, repeating the process until the W column slice is updated and the transmission of the space geometric context information from the right to the left direction is finished;

(4) a second stage updating module (19) which takes the W-th column slice of the first stage updating module as a starting position, performs convolution operation and ReLU from left to right, sends the W-th column slice to a convolution kernel (21) with the size of C multiplied by kw multiplied by 1 for convolution operation and ReLU, performs summation operation (22) on the result obtained after the convolution operation and the ReLU and the corresponding element in the W-1-th column slice so as to update the W-1-th slice, and repeats the process until the updating of the first column slice (23) is terminated;

(5) an updated three-dimensional feature tensor module (20) which inputs all W column slices updated in steps (3) and (4) to a concat layer and concatenates them along the width dimension to obtain the updated three-dimensional feature tensor, thereby finishing all operations of the geometric context coding network model.

5. The system for processing edge lane line based on geometric context coding network model of claim 4, wherein: in the geometric context information transfer process, a specified three-dimensional tensor K is assumed, and an information transfer formula is as follows:

wherein: K(i, j, k) is the weight between an element of channel i in the previous slice and an element of channel j in the current slice, with an offset of k columns between these two elements; X(i, j, k) similarly denotes an element of tensor X, where i, j and k refer to the channel, row and column, respectively; f is the nonlinear activation function ReLU; X' represents the updated value, and all slices share one set of convolution kernels.

6. The system for processing edge lane line based on geometric context coding network model of claim 4, wherein: the convolution operation (1) means that a convolution kernel slides on the feature diagram according to the step length of 1, meanwhile, the convolution kernel array and the array of the current covered area are multiplied according to elements and summed, and the elements of the corresponding positions in the output array are calculated to obtain the values of the elements on the new feature diagram.

7. The system for processing edge lane line based on geometric context coding network model of claim 4, wherein: the ReLU (2) is an activation function in deep learning and is used for carrying out nonlinear transformation on a feature map, and the formula is ReLU(x) = max(0, x), namely every x value smaller than zero is mapped to a y value of 0, and every x value larger than zero is mapped to itself; ReLU has low time and space complexity and can effectively alleviate the vanishing gradient problem.