CN116580369B - Lane end-to-end real-time detection method for automatic driving


Info

Publication number
CN116580369B
CN116580369B (application CN202310397021.0A)
Authority
CN
China
Prior art keywords
lane line
lane
network model
aerial view
image
Prior art date
Legal status
Active
Application number
CN202310397021.0A
Other languages
Chinese (zh)
Other versions
CN116580369A (en)
Inventor
李振峰
陈志远
魏哲
徐宁仪
Current Assignee
Beijing Huixi Intelligent Technology Co ltd
Original Assignee
Beijing Huixi Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Huixi Intelligent Technology Co ltd filed Critical Beijing Huixi Intelligent Technology Co ltd
Priority to CN202310397021.0A priority Critical patent/CN116580369B/en
Publication of CN116580369A publication Critical patent/CN116580369A/en
Application granted granted Critical
Publication of CN116580369B publication Critical patent/CN116580369B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/93Lidar systems specially adapted for specific applications for anti-collision purposes
    • G01S17/931Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Electromagnetism (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a lane line end-to-end real-time detection method for automatic driving. 3d point cloud data of all lane line scenes in a lane line database are processed to obtain position truth values, classification truth values and attribute truth values of the lane lines. A detection network model extracts features from the current frame images acquired by multiple cameras and from the 3d point cloud data corresponding to the current frame moment, obtains overall aerial view feature maps under the different camera angles, and fuses them by weighting with the aerial view feature map of the previous frame to obtain the aerial view feature map of the current frame, from which the position, classification and attribute predicted values of the lane lines are obtained. The position predicted values are matched against the position truth values with the Hungarian matching algorithm, losses are computed from the resulting lane line matching pairs, and the parameters of the detection network model are adjusted for the next iteration until training of the detection network model is completed. The trained detection network model is then used to detect lane lines in real time while the vehicle is running.

Description

Lane end-to-end real-time detection method for automatic driving
Technical Field
The invention relates to the technical field of automatic driving, in particular to a lane end-to-end real-time detection method for automatic driving.
Background
Lane lines are an important reference for vehicle driving. For automatic driving or advanced driver-assistance vehicles, acquiring and identifying lane lines in time provides safety guidance for vehicle path planning and driving. While the vehicle is running, road information is captured by the vehicle-mounted cameras, and the lane markings can be extracted from the images by computer image processing.
Algorithm architectures in the prior art have complex and lengthy processing pipelines, for example the ONCE-3DLanes: Building Monocular 3D Lane Detection algorithm published by Fan Yan et al. and the CLRNet: Cross Layer Refinement Network for Lane Detection algorithm published by Tu Zheng et al., which leads to low operating efficiency and difficulty in handling intelligent driving scenes with high real-time requirements. Meanwhile, because such frameworks depend strongly on modeling assumptions, for example the CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution algorithm, they can generally only handle specific line types in specific scenes, such as simple straight lines or scenes with few lanes. Difficult driving scenes, such as special line types and illumination, complex curves, branch roads and intersection turns, cannot be handled, so these algorithms cannot be used in a deployed high-level driver-assistance system.
In addition, the generalization performance of these algorithms is not robust enough. Post-processing is usually performed with hard-coded scene rules, so scenes beyond what the rules specify cannot be handled effectively, and a large number of wrong lane lines may even be output, which seriously affects driving safety.
Disclosure of Invention
The invention provides a lane line end-to-end real-time detection method for automatic driving, which solves the technical problems that existing algorithms strongly depend on a fixed single modeling assumption and that lane lines in complex curve, branch road and intersection turning scenes cannot be modeled and deployed.
The invention can be realized by the following technical scheme:
a lane end-to-end real-time detection method for automatic driving comprises the following steps of
Construction of detection network model and lane line database
The detection network model adopts a network structure which takes a resnet18 as a backbone network and is combined with deformable convolution based on multi-head spatial attention and self-attention;
arranging a laser radar and a plurality of cameras around a vehicle, and collecting 3d data and a plurality of visual images of a plurality of lane line scenes to establish a lane line database;
training the detection network model
3d data of all lane line scenes in the lane line database are respectively subjected to manual labeling and interpolation processing to obtain position truth values, classification truth values and attribute truth values of all lane lines in each lane line scene;
the detection network model extracts features from the current frame visual images acquired by the multiple cameras and from the 3d data corresponding to the current frame moment, and obtains overall aerial view feature maps under the different camera angles; these are fused by weighting with the previous frame aerial view feature map output by the detection network model to obtain the current frame aerial view feature map, from which the position predicted value, classification predicted value and attribute predicted value of each lane line in each lane line scene are obtained; the position predicted values and position truth values are matched with the Hungarian matching algorithm, and, according to the obtained lane line matching pairs, the network losses for the classification, attribute and position predictions against their truth values are computed with a cross entropy classification loss calculation method, so as to adjust the parameters of the detection network model and enter the next iteration until training of the detection network model is completed;
lane line detection
the trained detection network model processes the current frame visual images acquired by the multiple cameras, the 3d data corresponding to the current frame time acquired by the laser radar, and the previous frame aerial view feature map output by the detection network model, so as to complete real-time lane line detection while the vehicle is running.
Further, during training, the backbone network first extracts features from the current frame visual images acquired by the multiple cameras to obtain multi-channel visual image features, and the 3d data corresponding to the current frame moment are projected through the vehicle calibration parameters to obtain image coordinates;
the multi-channel visual image features and the image coordinates obtained by projection are then sampled with the deformable convolution based on multi-head spatial attention and self-attention to obtain overall aerial view feature maps under the different camera view angles; these are fused by weighting with the previous frame aerial view feature map output by the detection network model to obtain the current frame aerial view feature map, and lane line prediction is performed on the current frame aerial view feature map with a regression module and a classification module based on multiple fully connected layers to obtain the position predicted value, classification predicted value and attribute predicted value of each lane line in each lane line scene;
finally, the position predicted values and position truth values are matched with the Hungarian matching algorithm to obtain lane line matching pairs, and the network losses are computed from the lane line matching pairs with a cross entropy classification loss calculation method, so as to adjust the parameters of the detection network model and enter the next iteration until training of the detection network model is completed.
Further, the lane lines are manually labeled on the 3d data of each lane line scene to obtain the attribute parameters of each lane line and the three-dimensional coordinates of each lane line point chain; these are projected through the vehicle calibration parameters to obtain the image coordinates (u, v, depth) of the corresponding cameras, where u and v are the horizontal and vertical coordinates of a pixel in the two-dimensional image and depth is the depth value of the pixel in the real-world 3d data; the curve parameters of each lane line are then obtained by data fitting;
according to the image coordinates obtained by projection, the image vertical axis is sampled at equal intervals to obtain sampling coordinates of the lane lines at different heights, which are normalized to give the normalized sampling coordinates (u', v', depth') used as the position coordinates of the lane lines;
the position truth value, classification truth value and attribute truth value corresponding to each lane line scene are thus obtained, where the position truth value comprises the number of lane lines and the position coordinates they contain, the classification truth value indicates whether lane lines exist in the lane line scene, and the attribute truth value comprises the attribute parameters of the lane lines.
Further, seven cameras are provided, covering respectively the front 120 degrees, the front 30 degrees, the left side, the right side, the left rear, the right rear and the rear of the vehicle;
the laser radar is mounted on the top of the vehicle and covers 120 degrees in front of the vehicle.
A computer readable storage medium storing a computer program for execution by a processor to implement a method as described above.
The beneficial technical effects of the invention are as follows:
1. By early fusion of the multi-view, multi-type sensor data of the visual cameras on the model side, the detection network model gains 360-degree lane line prediction capability, and the prediction accuracy of 3D position coordinates in occluded, long-distance, complex-line-type and similar scenes is greatly improved.
2. By fusing the historical aerial view features in the detection network model, historical observation information is used effectively, which improves the prediction accuracy for the current frame as well as the temporal stability and robustness of the model output.
3. An end-to-end real-time 3D lane line detection algorithm is designed: complex post-processing logic is removed and the neural network directly outputs the 3D coordinates and attributes of the lane lines, which eliminates the complex and redundant post-processing of traditional lane line detection and improves the real-time performance of detection.
Drawings
FIG. 1 is a schematic general flow chart of the present invention.
Detailed Description
The following detailed description of the invention refers to the accompanying drawings and preferred embodiments.
As shown in FIG. 1, the invention provides a lane line end-to-end real-time detection method for automatic driving. Apart from the backbone network, all other parts use lightweight multi-layer convolution or multi-layer fully connected designs, without any complex and inefficient model architecture. By directly predicting the spatial position coordinates of the 3D lane line point chain, the problems and hidden dangers of combining a segmentation network with complex post-processing in the traditional lane line detection task, such as broken lane lines, missed lines and false detections, are avoided. In addition, historical temporal features are added, which improves the temporal stability and robustness of the output lane lines and enables stable temporal prediction.
The method comprises the following steps:
1. Construction of the detection network model and the lane line database
The detection network model adopts a network structure which takes a resnet18 as a backbone network and is combined with deformable convolution based on multi-head spatial attention and self-attention;
arranging a laser radar and a plurality of cameras around a vehicle, and collecting 3d data and a plurality of visual images of a plurality of lane line scenes to establish a lane line database;
seven cameras are provided, covering respectively the front 120 degrees, the front 30 degrees, the left side, the right side, the left rear, the right rear and the rear of the vehicle; the laser radar is mounted on the top of the vehicle and covers 120 degrees in front of the vehicle.
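For illustration only, the following is a minimal PyTorch sketch of the kind of network skeleton described above: a shared resnet18 backbone applied to the seven camera views, a standard multi-head attention module standing in for the deformable convolution based on multi-head spatial attention and self-attention, and fully connected regression, classification and attribute heads. All module names (LaneBEVNet), feature sizes and the use of nn.MultiheadAttention are illustrative assumptions, not the implementation of the invention.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class LaneBEVNet(nn.Module):
    """Illustrative skeleton: shared backbone -> BEV attention -> FC heads."""
    def __init__(self, num_cams=7, embed=256, bev_h=50, bev_w=50,
                 max_line=32, max_pts=36, num_color=3, num_type=3):
        super().__init__()
        trunk = resnet18(weights=None)
        self.backbone = nn.Sequential(*list(trunk.children())[:-2])  # (N, 512, h, w)
        self.reduce = nn.Conv2d(512, embed, kernel_size=1)
        self.bev_query = nn.Parameter(torch.randn(bev_h * bev_w, embed))
        # stand-in for the deformable multi-head spatial / self-attention module
        self.attn = nn.MultiheadAttention(embed, num_heads=8, batch_first=True)
        self.reg_head = nn.Linear(embed, max_line * max_pts * 3)     # (u', v', depth')
        self.cls_head = nn.Linear(embed, max_line * max_pts)         # point existence
        self.color_head = nn.Linear(embed, max_line * num_color)     # lane color
        self.type_head = nn.Linear(embed, max_line * num_type)       # solid / dashed / other
        self.max_line, self.max_pts = max_line, max_pts

    def forward(self, images):                          # images: (B, num_cams, 3, H, W)
        B = images.shape[0]
        feats = self.backbone(images.flatten(0, 1))     # (B*num_cams, 512, h, w)
        feats = self.reduce(feats).flatten(2).transpose(1, 2)   # (B*num_cams, h*w, embed)
        feats = feats.reshape(B, -1, feats.shape[-1])           # (B, num_cams*h*w, embed)
        query = self.bev_query.unsqueeze(0).expand(B, -1, -1)
        bev, _ = self.attn(query, feats, feats)                 # (B, bev_h*bev_w, embed)
        pooled = bev.mean(dim=1)                                # global BEV descriptor
        pos = self.reg_head(pooled).view(B, self.max_line, self.max_pts, 3)
        cls = self.cls_head(pooled).view(B, self.max_line, self.max_pts)
        color = self.color_head(pooled).view(B, self.max_line, -1)
        line_type = self.type_head(pooled).view(B, self.max_line, -1)
        return pos, cls, color, line_type, bev
```

A forward pass with images of shape (batch, 7, 3, H, W) returns the position, classification and attribute prediction tensors together with the bird's-eye-view features, which can then be fused with the previous frame as described below.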
2. Training the detection network model
1. Prediction truth calculation
And 3d data of all lane line scenes in the lane line database are respectively subjected to manual labeling and interpolation processing to obtain position truth values, classification truth values and attribute truth values of all lane lines in each lane line scene.
The lane lines are manually labeled on the 3d data of each lane line scene to obtain the attribute parameters of each lane line and the three-dimensional coordinates of the lane line in point-chain form; the attribute parameters include the color and line type of the lane line, such as white, yellow, red, solid line and dashed line;
the labeled points are projected through the vehicle calibration parameters to obtain the image coordinates (u, v, depth) of the corresponding cameras, where u and v are the horizontal and vertical coordinates of a pixel in the two-dimensional image and depth is the depth value of the pixel in the real-world 3d data; the curve parameters of each lane line are then obtained by data fitting, from which it can be judged whether lane lines exist in the lane line scene and how many there are;
according to the image coordinates obtained by projection, the image vertical axis is sampled at equal intervals, for example every 50 pixels for an image about 4k pixels high, i.e. the set of sampling heights V' = (0, 50, 100, ..., 3750, 3800); this gives the sampling coordinates of the lane lines at the different heights, which are then normalized to the (0, 1) interval to obtain the normalized sampling coordinates (u', v', depth') used as the position coordinates of the points on the lane lines;
this yields the position truth value, classification truth value and attribute truth value corresponding to each lane line scene. The position truth value comprises the number of lane lines and the position coordinates they contain and is defined as a tensor of size (max_line, max_pts, 3), where 3 corresponds to (u', v', depth'); the classification truth value indicates whether lane lines exist in the lane line scene and is defined as a tensor of size (max_line, max_pts); max_line is the maximum number of different lane lines the network can predict, preset to 32, and max_pts is the maximum number of different points of the same lane line the network can predict, preset to 36;
the attribute truth value comprises the attribute parameters of the lane lines and is defined as tensors of size (max_line, num_color) and (max_line, num_type), where num_color is the number of color classes, preset to 3 (yellow, white, other), and num_type is the number of line-type classes, preset to 3 (solid line, dashed line, other).
2. Data processing during detection network model training
The detection network model extracts features from the current frame visual images acquired by the multiple cameras and from the 3d data corresponding to the current frame moment, and obtains overall aerial view feature maps under the different camera angles; these are fused by weighting with the previous frame aerial view feature map output by the detection network model to obtain the current frame aerial view feature map, from which the position predicted value, classification predicted value and attribute predicted value of each lane line in each lane line scene are obtained; the position predicted values and position truth values are matched with the Hungarian matching algorithm, and, according to the obtained lane line matching pairs, the network losses for the classification, attribute and position predictions against their truth values are computed with a cross entropy classification loss calculation method, so as to adjust the parameters of the detection network model and enter the next iteration until training of the detection network model is completed.
During training, the specific procedure is as follows:
and finally, carrying out matching calculation on the position predicted value and the position true value by using a Hungary matching algorithm, obtaining a lane line matching pair, and carrying out network loss calculation by adopting a cross entropy classification loss calculation method according to the lane line matching pair so as to adjust the parameters of the detection network model to enter the next iteration until the detection network model training is completed.
a. Picture preprocessing and image augmentation are performed to improve the generalization ability and robustness of network learning, including but not limited to the following ways (a hedged sketch follows this list):
i. in color space, random disturbances are applied to brightness, contrast and saturation;
ii. in image space, the same random offset and rotation are applied to the picture and to the coordinate truth values.
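A minimal sketch of the two augmentation ways listed above, assuming OpenCV is available; the disturbance ranges and the helper name augment are illustrative. The same affine transform is applied to the picture and to the 2D coordinate truth values so they stay consistent.

```python
import random
import numpy as np
import cv2

def augment(img, lane_uv, max_shift=30, max_rot_deg=3.0):
    """img: HxWx3 uint8 image; lane_uv: (K, 2) array of (u, v) truth coordinates."""
    # i. random disturbance in color space (brightness / contrast shown here)
    alpha = random.uniform(0.8, 1.2)                 # contrast factor
    beta = random.uniform(-20, 20)                   # brightness offset
    img = np.clip(img.astype(np.float32) * alpha + beta, 0, 255).astype(np.uint8)

    # ii. the same random offset and rotation for the picture and coordinate truth
    h, w = img.shape[:2]
    dx = random.uniform(-max_shift, max_shift)
    dy = random.uniform(-max_shift, max_shift)
    angle = random.uniform(-max_rot_deg, max_rot_deg)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)       # 2x3 affine matrix
    M[:, 2] += (dx, dy)
    img = cv2.warpAffine(img, M, (w, h))
    ones = np.ones((lane_uv.shape[0], 1), dtype=np.float32)
    lane_uv = np.hstack([lane_uv, ones]) @ M.T                    # apply the same affine
    return img, lane_uv
```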
b. The backbone network first extracts features from the current frame visual images acquired by the multiple cameras to obtain multi-channel visual image features, and the 3d data corresponding to the current frame moment are projected through the vehicle calibration parameters to obtain image coordinates (a projection sketch follows this step):
i. using resnet18 as a lightweight backbone network, 7 channels of visual image features of size (7, h, w) are obtained, where h and w are the feature map dimensions;
ii. the 3d data corresponding to the current frame time are projected through the vehicle calibration parameters, using the corresponding lidar-to-image conversion matrix, to obtain image coordinates.
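The projection in step b can be illustrated with the following sketch, assuming a 4x4 lidar-to-camera extrinsic matrix and a 3x3 camera intrinsic matrix obtained from the vehicle calibration parameters; the function name lidar_to_image is an assumption.

```python
import numpy as np

def lidar_to_image(points_xyz, T_cam_lidar, K):
    """points_xyz: (N, 3) lidar points; T_cam_lidar: 4x4 extrinsic; K: 3x3 intrinsic.
    Returns (M, 3) image coordinates (u, v, depth) for points in front of the camera."""
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])  # homogeneous
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]          # points in the camera frame
    cam = cam[cam[:, 2] > 1e-3]                     # drop points behind the camera
    uvw = (K @ cam.T).T
    u = uvw[:, 0] / uvw[:, 2]                       # perspective division
    v = uvw[:, 1] / uvw[:, 2]
    return np.stack([u, v, cam[:, 2]], axis=1)      # (u, v, depth)
```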
c. Feature learning is performed using the deformable convolution based on multi-head spatial attention and self-attention (see BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers) to obtain features in a bird's eye view:
i. the multi-channel visual image features and the image coordinates obtained by projection are sampled with the deformable convolution based on multi-head spatial attention and self-attention, and the overall aerial view features of size (7, h', w') under the different camera view angles are learned, where h' and w' are the aerial view feature dimensions;
ii. the overall aerial view features and the previous frame aerial view features output by the detection network model are fused across sensors by weighted fusion to obtain the current aerial view feature map of size (h', w') (a fusion sketch follows this step).
d. A regression module and a classification module based on multiple fully connected layers predict the lane line coordinates in the image and the classification confidence from the features under the overall aerial view, giving the position predicted value, classification predicted value and attribute predicted value of each lane line in each lane line scene, i.e. a position prediction tensor of size (max_line, max_pts, 3) and a classification prediction tensor of size (max_line, max_pts). The attribute prediction is also output by multiple fully connected layers: a lane line type prediction tensor of size (max_line, 3).
e. The Hungarian matching algorithm computes the lane line matching pairs between the predicted lane line positions (u, v, depth) and the position truth coordinates (u', v', depth'), i.e. for each lane line in the predicted values, the matching lane line in the truth values (a matching sketch follows).
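A minimal sketch of the matching in step e, using the Hungarian algorithm as implemented by scipy.optimize.linear_sum_assignment; the mean L2 distance between sampled point chains is an assumed matching cost, not necessarily the one used by the invention.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_lanes(pred_pos, gt_pos, gt_valid):
    """pred_pos, gt_pos: (max_line, max_pts, 3); gt_valid: (max_line,) bool mask."""
    gt_idx = np.where(gt_valid)[0]
    cost = np.zeros((len(gt_idx), pred_pos.shape[0]))
    for row, g in enumerate(gt_idx):
        # assumed cost: mean L2 distance between the sampled point chains
        cost[row] = np.linalg.norm(pred_pos - gt_pos[g], axis=-1).mean(axis=-1)
    rows, cols = linear_sum_assignment(cost)         # Hungarian matching
    return list(zip(gt_idx[rows].tolist(), cols.tolist()))   # (truth lane, predicted lane)
```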
f. Regression and classification losses are calculated for each matched pair:
according to the matching result, the network losses for position, classification and attributes are computed from the truth and predicted values, i.e. the L2 loss of the predicted positions, the cross entropy classification loss, and the classification loss of the lane line attributes are computed respectively;
according to the computed network loss, gradient back-propagation and gradient descent are performed on the network, the detection network weights are updated, and the next iteration is entered until network training is finished (a training-step sketch follows).
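Step f can be illustrated with the following hedged PyTorch sketch: L2 loss on the matched lane positions, cross entropy losses for the point classification and the lane attributes, then gradient back-propagation and a weight update. The loss weights are assumptions, and the attribute truth values are assumed to be stored as per-lane class indices (an argmax over the (max_line, num_color) and (max_line, num_type) tensors defined above).

```python
import torch
import torch.nn.functional as F

def training_step(optimizer, pred, truth, matches, w_pos=1.0, w_cls=1.0, w_attr=0.5):
    """pred / truth: tuples of (position, classification, color, line-type) tensors
    for one sample; matches: list of (truth lane index, predicted lane index) pairs."""
    pos_p, cls_p, color_p, type_p = pred
    pos_t, cls_t, color_t, type_t = truth            # color_t / type_t: per-lane class indices
    g = torch.tensor([m[0] for m in matches], dtype=torch.long)
    p = torch.tensor([m[1] for m in matches], dtype=torch.long)
    loss_pos = F.mse_loss(pos_p[p], pos_t[g])                          # L2 position loss
    loss_cls = F.binary_cross_entropy_with_logits(cls_p[p], cls_t[g])  # point existence
    loss_attr = F.cross_entropy(color_p[p], color_t[g]) + \
                F.cross_entropy(type_p[p], type_t[g])                  # color + line type
    loss = w_pos * loss_pos + w_cls * loss_cls + w_attr * loss_attr
    optimizer.zero_grad()
    loss.backward()                                   # gradient back-propagation
    optimizer.step()                                  # gradient descent / weight update
    return loss.item()
```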
3. Lane line detection
The trained detection network model processes the current frame visual images acquired by the multiple cameras, the 3d data corresponding to the current frame time acquired by the laser radar, and the previous frame aerial view feature map output by the detection network model, so as to complete real-time lane line detection while the vehicle is running.
According to one embodiment of the present invention, there is also provided a computer-readable storage medium.
The computer readable storage medium has a computer program stored thereon which, when executed by a processor, implements the steps of the lane line end-to-end real-time detection method according to any one of the embodiments.
In this specification, each embodiment is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts between the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While particular embodiments of the present invention have been described above, it will be appreciated by those skilled in the art that these are merely illustrative, and that many changes and modifications may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims.

Claims (5)

1. A lane end-to-end real-time detection method for automatic driving is characterized in that: comprises constructing a detection network model and a lane line database
The detection network model adopts a network structure which takes a resnet18 as a backbone network and is combined with deformable convolution based on multi-head spatial attention and self-attention;
arranging a laser radar and a plurality of cameras around a vehicle, and collecting 3d data and a plurality of visual images of a plurality of lane line scenes to establish a lane line database;
training the detection network model
3d data of all lane line scenes in the lane line database are respectively subjected to manual labeling and interpolation processing to obtain position truth values, classification truth values and attribute truth values of all lane lines in each lane line scene;
extracting features of the 3d data corresponding to the current frame visual image and the current frame moment acquired by the multipath cameras by using a detection network model, obtaining an overall aerial view feature image under different camera angles, carrying out weighted fusion on the overall aerial view feature image and the aerial view feature image of the last frame output by the detection network model, obtaining the aerial view feature image of the current frame, further obtaining a position predicted value, a classification predicted value and an attribute predicted value of each lane line in each lane line scene, carrying out matching calculation on the position predicted value and the position true value by using a Hungary matching algorithm, and carrying out network loss calculation on the classification predicted value and the classification true value, the attribute predicted value and the attribute true value and the position true value by using a cross entropy classification loss calculation method respectively according to the obtained lane line matching pairs so as to adjust the parameters of the detection network model to enter the next iteration until the detection network model training is completed;
lane line detection
And (3) performing data processing by using the trained detection network model and using the current frame visual image acquired by the multipath cameras, 3d data corresponding to the current frame time acquired by the laser radar and the last frame aerial view feature image output by the detection network model, so as to finish the real-time detection of the lane line in the vehicle running process.
2. The lane end-to-end real-time detection method for automatic driving according to claim 1, wherein: when training is carried out, firstly, a trunk network is utilized to carry out feature extraction on the current frame visual image acquired by the multi-path cameras, multi-path visual image features are acquired, and 3d data corresponding to the current frame moment are projected through vehicle calibration parameters to obtain image coordinates;
for extracting fusion features, carrying out feature sampling on multi-channel visual image features and image coordinates obtained by projection by utilizing deformable convolution based on multi-head spatial attention and self-attention, obtaining integral aerial view feature images under different camera view angles, carrying out weighted fusion on the integral aerial view feature images and the aerial view feature images of the last frame output by a detection network model to obtain a current frame aerial view feature image, and carrying out lane line prediction on the current frame aerial view feature image by using a regression module and a classification module based on multiple full connection to obtain a position predicted value, a classification predicted value and an attribute predicted value of each lane line in each lane line scene;
and finally, carrying out matching calculation on the position predicted value and the position true value by using a Hungary matching algorithm, obtaining a lane line matching pair, and carrying out network loss calculation by adopting a cross entropy classification loss calculation method according to the lane line matching pair so as to adjust the parameters of the detection network model to enter the next iteration until the detection network model training is completed.
3. The lane end-to-end real-time detection method for automatic driving according to claim 2, wherein: manually marking the lane lines on the 3d data of each lane line scene to obtain attribute parameters of each lane line and three-dimensional coordinates of the point-linked lane lines, and obtaining image coordinates (u, v, depth) of a corresponding multi-path camera through vehicle calibration parameter projection, wherein u and v represent transverse coordinates and longitudinal coordinates of pixel points in a two-dimensional image, depth represents depth values of the pixel points in the 3d data of the real world, and then obtaining curve parameters of each lane line through a data fitting method;
according to the image coordinates obtained by projection, the image longitudinal axis is adopted at equal proportion intervals to obtain sampling coordinates of lane lines at different heights, and normalization processing is carried out to obtain sampling coordinates (u ', v ', depth ') after normalization processing as position coordinates of the lane lines;
and obtaining a position truth value, a classification truth value and an attribute truth value corresponding to each lane line scene, wherein the position truth value comprises the number of lane lines and position coordinates contained in the position truth value, the classification truth value comprises whether the lane lines exist in the lane line scene, and the attribute truth value comprises attribute parameters of the lane lines.
4. The lane end-to-end real-time detection method for automatic driving according to claim 1, wherein: seven paths of cameras are arranged and respectively cover the front 120 degrees, the front 30 degrees, the left side, the right side, the left rear side, the right rear side and the rear side of the vehicle;
the laser radar is arranged on the top of the vehicle and covers 120 degrees of the front direction of the vehicle.
5. A computer-readable storage medium storing a computer program, characterized in that: the computer program being executable by a processor to implement the method of any one of claims 1-4.
CN202310397021.0A 2023-04-14 2023-04-14 Lane end-to-end real-time detection method for automatic driving Active CN116580369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310397021.0A CN116580369B (en) 2023-04-14 2023-04-14 Lane end-to-end real-time detection method for automatic driving

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310397021.0A CN116580369B (en) 2023-04-14 2023-04-14 Lane end-to-end real-time detection method for automatic driving

Publications (2)

Publication Number Publication Date
CN116580369A CN116580369A (en) 2023-08-11
CN116580369B true CN116580369B (en) 2023-12-26

Family

ID=87544378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310397021.0A Active CN116580369B (en) 2023-04-14 2023-04-14 Lane end-to-end real-time detection method for automatic driving

Country Status (1)

Country Link
CN (1) CN116580369B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766878A (en) * 2019-04-11 2019-05-17 深兰人工智能芯片研究院(江苏)有限公司 A kind of method and apparatus of lane detection
CN111476822A (en) * 2020-04-08 2020-07-31 浙江大学 Laser radar target detection and motion tracking method based on scene flow

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10402995B2 (en) * 2017-07-27 2019-09-03 Here Global B.V. Method, apparatus, and system for real-time object detection using a cursor recurrent neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766878A (en) * 2019-04-11 2019-05-17 深兰人工智能芯片研究院(江苏)有限公司 A kind of method and apparatus of lane detection
CN111476822A (en) * 2020-04-08 2020-07-31 浙江大学 Laser radar target detection and motion tracking method based on scene flow

Also Published As

Publication number Publication date
CN116580369A (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN110569704B (en) Multi-strategy self-adaptive lane line detection method based on stereoscopic vision
CN113936139B (en) Scene aerial view reconstruction method and system combining visual depth information and semantic segmentation
CN110175576A (en) A kind of driving vehicle visible detection method of combination laser point cloud data
CN108694386B (en) Lane line detection method based on parallel convolution neural network
CN111428765B (en) Target detection method based on global convolution and local depth convolution fusion
CN103426200B (en) Tree three-dimensional reconstruction method based on unmanned aerial vehicle aerial photo sequence image
CN104392468B (en) Based on the moving target detecting method for improving visual background extraction
CN108830171B (en) Intelligent logistics warehouse guide line visual detection method based on deep learning
CN111563415A (en) Binocular vision-based three-dimensional target detection system and method
CN105608417B (en) Traffic lights detection method and device
CN111832655A (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
CN113673444B (en) Intersection multi-view target detection method and system based on angular point pooling
CN106709901B (en) Simulation mist drawing generating method based on depth priori
KR20200060194A (en) Method of predicting depth values of lines, method of outputting 3d lines and apparatus thereof
CN116258817B (en) Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction
CN104885122A (en) Vehicle periphery monitoring device
CN106446785A (en) Passable road detection method based on binocular vision
CN115187946B (en) Multi-scale intelligent sensing method for fusion of underground obstacle point cloud and image data
CN116434088A (en) Lane line detection and lane auxiliary keeping method based on unmanned aerial vehicle aerial image
CN110232418A (en) Semantic recognition method, terminal and computer readable storage medium
CN115909268A (en) Dynamic obstacle detection method and device
CN114943757A (en) Unmanned aerial vehicle forest exploration system based on monocular depth of field prediction and depth reinforcement learning
Han et al. Fully convolutional neural networks for road detection with multiple cues integration
CN114742996A (en) Image semantic segmentation method and device, electronic equipment and storage medium
CN116580369B (en) Lane end-to-end real-time detection method for automatic driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant