CN117011819A - Lane line detection method, device and equipment based on feature guidance attention - Google Patents

Lane line detection method, device and equipment based on feature guidance attention

Info

Publication number
CN117011819A
CN117011819A (application CN202310992383.4A)
Authority
CN
China
Prior art keywords
feature
lane line
attention
map
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310992383.4A
Other languages
Chinese (zh)
Inventor
刘登峰
郭文静
陈世海
郭虓赫
朱烁
王然
柴志雷
吴秦
陈璟
周浩杰
王宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202310992383.4A priority Critical patent/CN117011819A/en
Publication of CN117011819A publication Critical patent/CN117011819A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of lane line detection, in particular to a lane line detection method, device, equipment and storage medium based on feature-guided attention. The lane line detection method comprises the following steps: using a feature residual neural network as the backbone network of a lane line detection model, extracting lane line features from a picture containing lane lines, and outputting multi-size feature maps; using a balanced feature pyramid network as the cross-scale feature fusion module of the lane line detection model, generating, for each feature size produced by the feature residual neural network, the sizes corresponding to the pyramid levels through up- and down-sampling operations, and fusing feature maps of the same size; and using ROIGather as the detection module of the lane line detection model, iteratively updating preset lane lines, and outputting the finally detected lane lines. By concentrating analysis on key regions, the method improves detection speed while maintaining accuracy.

Description

Lane line detection method, device and equipment based on feature guidance attention
Technical Field
The application relates to the technical field of lane line detection, in particular to a lane line detection method, device and equipment based on feature-guided attention, and a readable storage medium.
Background
Lane lines are important cues that constrain vehicles to travel on roads, and they play a vital role in modern advanced driver-assistance systems and automatic driving systems. The goal of a lane line detection system is to help intelligent vehicles localize better and drive more safely.
In actual lane line detection there are many adverse scenes, such as bad weather, dim or dazzling light, and lane lines occluded by other vehicles. Although a human can easily infer the position of a lane line from context and fill in the occluded portion, it is difficult for a lane line detection system to distinguish the lane line from the surrounding environment without high-level semantics and global context information. The prior art proposes a message-passing mechanism to collect global context information, but this method performs pixel-by-pixel prediction and can hardly meet real-time requirements.
Moreover, existing line-anchor-based lane line detection methods do not reuse local and global features, which easily causes missed detections and false detections. Methods that model the local geometry of lane lines and integrate it into global features easily mistake road markings for lane lines, causing false detections; methods that build a fully connected layer on global features to detect lanes easily lead to inaccurate localization of the predicted lanes.
Disclosure of Invention
Therefore, the technical problem to be solved by the application is that prior-art lane line detection methods are inaccurate in severe scenes and can hardly meet real-time requirements.
In order to solve the technical problems, the application provides a lane line detection method based on feature-guided attention, which comprises the following specific steps:
using a feature residual neural network as the backbone network of a lane line detection model, extracting lane line features from a picture containing lane lines, and outputting multi-size feature maps;
the feature residual neural network comprises basic residual blocks and basic blocks composed of feature-guided attention; the basic blocks composed of feature-guided attention adopt skip connections and comprise several convolution blocks, batch normalization layers, rectified linear unit (ReLU) activation layers and the feature-guided attention; the feature-guided attention is constructed on the basis of a convolutional attention module: the input feature map is processed by the convolutional attention module to output a spatial refinement map, and the input feature map and the spatial refinement map undergo a channel shuffle operation to obtain the final output refined feature map;
using a balanced feature pyramid network as the cross-scale feature fusion module of the lane line detection model, generating, for each feature size produced by the feature residual neural network, the sizes corresponding to the pyramid levels through up- and down-sampling operations, and fusing feature maps of the same size;
and using ROIGather as the detection module of the lane line detection model, inputting the multi-size feature maps fused by the balanced feature pyramid network into the ROIGather detection module, iteratively updating preset lane lines, and outputting the finally detected lane lines.
In one embodiment of the present application, using the feature residual neural network as the backbone network of the lane line detection model to extract lane line features from the picture containing lane lines and output multi-size feature maps comprises:
sequentially passing the lane line image to be detected through a convolution layer, a batch normalization layer, a ReLU activation function, a convolution layer and a batch normalization layer, the output serving as the input feature map of the feature-guided attention;
the feature-guided attention processing the input feature map through the convolutional attention module to output a spatial refinement map, the input feature map and the spatial refinement map undergoing a channel shuffle operation to obtain the final output refined feature map;
and fusing the lane line image to be detected with the refined feature map output by the feature-guided attention through a skip connection, then outputting the multi-size feature maps through a ReLU activation function.
In one embodiment of the present application, the spatial refinement map is used as a guide, and the spatial refinement map and the input feature map are rearranged in an alternating manner through a channel shuffle operation to obtain the final output refined feature map, according to the formula:
F_out = σ(GC_7×7(CS([F_in, F_s])))
wherein F_out is the final output refined feature map, σ denotes the sigmoid operation, CS(·) denotes the channel shuffle operation, GC_7×7(·) denotes a group convolution layer with kernel size 7×7, F_in is the input feature map, and F_s is the spatial refinement map.
In one embodiment of the application, the spatial refinement map F_s is obtained by applying 2D spatial attention to the channel refinement map F_c, with the specific formula:
F_s = F_SRM ⊗ F_c
wherein F_SRM is the spatial attention map and ⊗ denotes element-wise multiplication;
the channel refinement map F_c is obtained by applying 1D channel attention to the input feature map F_in, with the specific formula:
F_c = F_CRM ⊗ F_in
wherein F_CRM is the channel attention map;
the channel attention map F_CRM has the specific formula:
F_CRM = σ(MLP(F^c_avg) + MLP(F^c_max)) = σ(W_1(W_0(F^c_avg)) + W_1(W_0(F^c_max)))
the spatial attention map F_SRM has the specific formula:
F_SRM = σ(Conv_7×7([F^s_avg; F^s_max]))
wherein MLP is a multi-layer perceptron whose hidden layer size is R^{C/r×1×1}, C is the number of channels in the feature map, r is the reduction ratio, W_1 and W_0 are the weights of the multi-layer perceptron, F^c_avg and F^c_max are the channel descriptors obtained by global average pooling and global max pooling over the spatial dimensions, and F^s_avg and F^s_max are the 2D maps obtained by average pooling and max pooling along the channel dimension.
In one embodiment of the present application, the balanced feature pyramid network is used as the cross-scale feature fusion module; for each feature size generated by the feature residual neural network, the sizes corresponding to the pyramid levels are generated through up- and down-sampling operations, the up-sampling operation using dilated (hole) convolution and the down-sampling using convolution kernels with stride.
In one embodiment of the present application, the specific method of inputting the multi-size feature maps fused by the balanced feature pyramid network into the ROIGather detection module, iteratively updating the preset lane lines, and outputting the finally detected lane lines comprises:
after the preset lane lines are assigned to the smallest-size feature map output by the balanced feature pyramid network, each preset lane line having N points; uniformly sampling N_p points from each preset lane line, computing their exact feature values by bilinear interpolation, and zero-padding features beyond the picture range, to obtain the ROI feature X_p ∈ R^{C×N_p} of each preset lane line, wherein C is the number of channels in the feature map;
performing a 9×9 one-dimensional convolution on the ROI features of each preset lane line to collect the features near each pixel of each channel;
further extracting with a fully connected layer to obtain the feature X'_p ∈ R^{C×1} of each preset lane line;
resizing the global feature map X_f ∈ R^{C×H×W} to the same size as the smallest-size feature map output by the balanced feature pyramid network and flattening it to X'_f ∈ R^{C×HW};
establishing the relation between the further-extracted preset lane line feature X'_p and the global feature map X'_f to obtain the attention matrix W between the preset lane line features and the global feature map;
and computing the aggregated feature G through the attention matrix W, adding the aggregated feature G to the preset lane line feature X'_p, assigning the new preset lane lines to the next-level feature map output by the balanced feature pyramid network, and cycling in turn until all feature maps output by the balanced feature pyramid network have been used.
In one embodiment of the application, the attention matrix W between the preset lane line feature X'_p and the global feature map X'_f is:
W = f((X'_p)ᵀ X'_f / √C)
wherein f is the softmax normalization function and C is the number of channels in the feature map;
the aggregated feature is G = W (X'_f)ᵀ.
The application also provides a lane line detection device based on feature-guided attention, which comprises:
the feature extraction module, which uses a feature residual neural network as the backbone network of the lane line detection model to extract lane line features from a picture containing lane lines and output multi-size feature maps; the feature residual neural network comprises basic residual blocks and basic blocks composed of feature-guided attention; the basic blocks composed of feature-guided attention adopt skip connections and comprise several convolution blocks, batch normalization layers, ReLU activation layers and the feature-guided attention; the feature-guided attention is constructed on the basis of a convolutional attention module: the input feature map is processed by the convolutional attention module to output a spatial refinement map, and the input feature map and the spatial refinement map undergo a channel shuffle operation to obtain the final output refined feature map;
the feature fusion module, which uses a balanced feature pyramid network as the cross-scale feature fusion module of the lane line detection model, generates, for each feature size produced by the feature residual neural network, the sizes corresponding to the pyramid levels through up- and down-sampling operations, and fuses feature maps of the same size;
and the lane line detection module, which uses ROIGather as the detection module of the lane line detection model, inputs the multi-size feature maps fused by the balanced feature pyramid network into the ROIGather detection module, iteratively updates the preset lane lines, and outputs the finally detected lane lines.
The application also provides lane line detection equipment based on feature-guided attention, comprising:
the camera acquisition unit is used for acquiring lane line images;
a memory for storing a computer program;
and the processor, which is used for processing the lane line images acquired by the camera acquisition unit, implementing the steps of the above feature-guided-attention-based lane line detection method when executing the computer program, and outputting the finally detected lane lines.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described feature-guided-attention-based lane line detection method.
Compared with the prior art, the technical scheme of the application has the following advantages:
according to the feature-based attention-guiding lane line detection method, the lane is roughly positioned by utilizing the global features, the global features are refined, so that fine local features are obtained, more accurate position and high-precision detection results are obtained, and the detection speed is improved while the precision is ensured by intensively analyzing key areas. The method has higher accuracy in the task of detecting the lane lines, particularly in a severe environment, has higher detection speed compared with a pixel-by-pixel prediction method, and can meet the requirement of instantaneity.
The basic blocks composed of feature-guided attention described in the present application focus more on lane line pixels and on the more important channel information. The feature-guided attention generates the refined feature map in a coarse-to-fine manner, and the input feature map and the spatial refinement map undergo a channel shuffle operation to obtain the final output refined feature map, so that information is fully exchanged between the spatial attention features and the channel attention features and the important information in the features is emphasized.
The balanced feature pyramid network makes full use of global and local features, solves the loss of detail information caused by high-level features gradually fading during the fusion process, achieves balanced feature fusion across levels, and reduces the false detection and missed detection rates of lane line detection.
Drawings
In order that the application may be more readily understood, a more particular description of the application will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings, in which
FIG. 1 is an overall architecture diagram of the feature-guided-attention-based lane line detection method provided by the present application;
FIG. 2 is a structural diagram of the basic block composed of feature-guided attention provided by the present application;
FIG. 3 is a structural diagram of the feature-guided attention provided by the present application;
FIG. 4 is a structural diagram of the balanced feature pyramid network provided by the present application;
FIG. 5 shows visualization results on the CULane and TuSimple datasets provided in this embodiment.
Detailed Description
The present application will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the application and practice it.
Referring to FIG. 1, the present application provides a lane line detection method based on feature-guided attention, whose input is a picture I ∈ R^{3×H×W} containing lane lines. The method specifically comprises the following steps:
and the characteristic residual neural network is used as a backbone network of a lane line detection model (FGANet), and is used for carrying out lane line characteristic extraction on pictures containing lane lines and outputting a multi-size characteristic diagram. Wherein the characteristic residual neural network comprises a basic residual block and a basic block (FGAB) of characteristic directing attention composition.
Referring to FIG. 2, the present application uses the basic block composed of feature-guided attention as the building block of the feature residual neural network: the lane line image to be detected is sequentially passed through a convolution layer, a batch normalization layer, a ReLU activation function, a convolution layer and a batch normalization layer, and the output serves as the input feature map of the feature-guided attention (FGA).
The lane line image to be detected and the refined feature map output by the feature-guided attention are fused through a skip connection, and the multi-size feature maps are output through a ReLU activation function. The skip connection keeps the lane line detection model simple enough, with fewer parameters and faster processing, so that it can be deployed on real-time devices; it also helps the deep learning model mitigate the vanishing-gradient problem and prevents the over-fitting caused by excessive network depth.
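By way of illustration only, the basic block described above may be sketched in PyTorch as follows. This is a minimal rendering under stated assumptions — the 3×3 kernel size, channel-preserving convolutions and identity skip connection are not fixed by the application — and not the authoritative implementation:

```python
import torch.nn as nn

class FGABasicBlock(nn.Module):
    """Sketch of the feature-guided-attention basic block (FGAB):
    conv-BN-ReLU-conv-BN, then FGA, fused with the input via a skip
    connection and passed through a final ReLU."""

    def __init__(self, channels, fga):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),  # assumed 3x3
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.fga = fga                      # feature-guided attention module
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.fga(self.body(x))        # refined feature map from the FGA
        return self.relu(out + x)           # skip connection, then ReLU
```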
The feature-guided attention is optimized on the basis of a convolutional attention module, which consists of channel attention and spatial attention placed in sequence to compute attention weights along the channel and spatial dimensions. Channel attention computes channel-wise vectors using both average pooling and max pooling operations to generate a channel attention map F_CRM ∈ R^{C×1×1}; multiplying this recalibration map element-wise with the input feature map greatly improves the representation capability of the network. Spatial attention applies average pooling and max pooling operations along the channel axis and applies a convolution layer to generate a spatial attention map F_SRM ∈ R^{1×H×W}, which adaptively represents the importance of different regions and is multiplied element-wise with the channel-refined features.
The main purpose of the attention mechanism is to focus more on the road portion of the image while ignoring other objects such as sky, trees, pedestrians and utility poles. However, the spatial attention inside the convolutional attention module only addresses focusing on the road at the image level, while focusing on the road at the feature level is ignored. The channel attention inside the convolutional attention module models channel differences without considering context information; as the feature channels expand, information about other objects is encoded into the feature map, meaning that for each feature channel this other-object information is unevenly distributed along the spatial dimension. Furthermore, another problem with the convolutional attention module is that the spatial attention and channel attention features are computed sequentially, with insufficient information exchange between them.
In order to solve the above problems, the present application proposes feature-guided attention: the input feature map F_in ∈ R^{C×H×W} is processed by the convolutional attention module to output a spatial refinement map, and the input feature map and the spatial refinement map undergo a channel shuffle operation to obtain the final output refined feature map, which has the same dimensions as the input feature map.
The feature-guided attention first applies 1D channel attention to the input feature map F_in ∈ R^{C×H×W} to find the important channels and generate the channel refinement map F_c:
F_c = F_CRM ⊗ F_in
wherein the channel attention map F_CRM has the specific formula:
F_CRM = σ(MLP(F^c_avg) + MLP(F^c_max)) = σ(W_1(W_0(F^c_avg)) + W_1(W_0(F^c_max)))
2D spatial attention is then applied to the channel refinement map F_c to generate the spatial refinement map F_s:
F_s = F_SRM ⊗ F_c
wherein the spatial attention map F_SRM has the specific formula:
F_SRM = σ(Conv_7×7([F^s_avg; F^s_max]))
wherein MLP is a multi-layer perceptron whose hidden layer size is R^{C/r×1×1} in order to reduce the number of parameters and limit the complexity of the lane line detection model, C is the number of channels in the feature map, and r is the reduction ratio. W_1 and W_0 are the weights of the multi-layer perceptron, with W_0 ∈ R^{C/r×C} and W_1 ∈ R^{C×C/r}. F^c_avg and F^c_max are the channel descriptors obtained by global average pooling and global max pooling over the spatial dimensions, and F^s_avg and F^s_max are the 2D maps obtained by average pooling and max pooling along the channel dimension.
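By way of illustration only, the channel and spatial attention described above may be sketched as follows; the reduction ratio r = 16 is an assumption, while the pooling pairs, shared MLP and 7×7 convolution follow the formulas above:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """F_CRM = sigmoid(MLP(F^c_avg) + MLP(F^c_max)), output shape C x 1 x 1."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(           # shared weights W_0 then W_1
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))  # F^c_avg
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))   # F^c_max
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """F_SRM = sigmoid(Conv_7x7([F^s_avg; F^s_max])), output shape 1 x H x W."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, 7, padding=3, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)                  # F^s_avg
        mx, _ = torch.max(x, dim=1, keepdim=True)                 # F^s_max
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class ConvAttention(nn.Module):
    """Channel then spatial refinement: F_c = F_CRM * F_in, F_s = F_SRM * F_c."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, f_in):
        f_c = self.ca(f_in) * f_in          # channel refinement map F_c
        return self.sa(f_c) * f_c           # spatial refinement map F_s
```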
Finally, the spatial refinement map F_s is used as a guide to generate the final feature map: the spatial refinement map F_s and the input feature map F_in are rearranged in an alternating manner through a channel shuffle operation to obtain the final output refined feature map, guaranteeing information interaction. The specific formula is:
F_out = σ(GC_7×7(CS([F_in, F_s])))
wherein F_out is the final output refined feature map, σ denotes the sigmoid operation, CS(·) denotes the channel shuffle operation, GC_7×7 denotes a group convolution layer with kernel size 7×7, F_in is the input feature map, and F_s is the spatial refinement map.
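By way of illustration only, this shuffle-and-group-convolution step may be sketched as follows; the group count of the GC_7×7 layer and the reduction of the shuffled 2C channels back to C are assumptions, and the ConvAttention module is the sketch above:

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    """Interleave the channels of the two concatenated maps (the CS operation)."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

class FeatureGuidedAttention(nn.Module):
    """Sketch of F_out = sigmoid(GC_7x7(CS([F_in, F_s])))."""

    def __init__(self, channels, conv_attention):
        super().__init__()
        self.cam = conv_attention   # produces the spatial refinement map F_s
        # group convolution over the shuffled 2C channels, kernel 7x7;
        # one group per channel pair (F_in_i, F_s_i) is an assumption
        self.gc = nn.Conv2d(2 * channels, channels, 7, padding=3,
                            groups=channels, bias=False)

    def forward(self, f_in):
        f_s = self.cam(f_in)                 # spatial refinement map F_s
        x = torch.cat([f_in, f_s], dim=1)    # [F_in, F_s]
        x = channel_shuffle(x, groups=2)     # CS(.), alternating channels
        return torch.sigmoid(self.gc(x))     # sigma(GC_7x7(.))
```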
Referring to FIG. 4, the balanced feature pyramid network is used as the cross-scale feature fusion module of the lane line detection model: for each feature size generated by the feature residual neural network, the sizes corresponding to the pyramid levels are generated through up- and down-sampling operations, and feature maps of the same size are fused.
One of the difficulties in lane line detection is how to efficiently represent and process multi-scale features. Current deep-learning-based lane line detection models already use a feature pyramid network as the neck module, detecting large objects in low-resolution pyramid feature maps and small objects in high-resolution pyramid feature maps, in order to handle multi-scale objects in an image effectively. The insight that higher-level neurons respond strongly to whole objects while other neurons are more likely to be activated by local textures and patterns suggests the importance of high-level features and the necessity of propagating semantically strong features. However, conventional top-down feature pyramid networks are limited by unidirectional information flow. To address this, existing lane line detection models add an extra bottom-up path aggregation network on top of the feature pyramid network.
In order to solve the problem that a conventional feature pyramid network, when acquiring multi-scale features, gradually fades the useful information in high-level features during the fusion process and loses detail information, the application proposes a balanced feature pyramid network based on the multi-scale feature network as the cross-scale feature fusion module of the lane line detection model, retaining the useful and detail information in high-level features so as to reduce the influence of scale differences on model performance.
The balanced feature pyramid network is used as the cross-scale feature fusion module: for each feature size generated by the feature residual neural network, the sizes corresponding to the pyramid levels are generated through up- and down-sampling operations; the up-sampling operation uses dilated (hole) convolution, and the down-sampling uses convolution kernels with stride. Compared with ordinary convolution, dilated convolution enlarges the receptive field without increasing the number of network parameters.
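By way of illustration only, the two resampling branches may be sketched as follows; the 3×3 kernels, the dilation rate of 2 and the bilinear resize are assumptions, the text fixing only that up-sampling uses dilated convolution and down-sampling uses a convolution kernel with stride:

```python
import torch.nn as nn
import torch.nn.functional as F

class DilatedUpsample(nn.Module):
    """Up-sampling branch: bilinear resize followed by a dilated 3x3
    convolution, enlarging the receptive field at no extra parameter cost
    relative to an ordinary 3x3 convolution."""

    def __init__(self, channels, dilation=2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=dilation,
                              dilation=dilation, bias=False)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)
        return self.conv(x)

class StridedDownsample(nn.Module):
    """Down-sampling branch: 3x3 convolution with stride 2."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, stride=2,
                              padding=1, bias=False)

    def forward(self, x):
        return self.conv(x)
```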
For some extreme cases in which there is no local visual evidence of a lane, it is necessary to look at the features near the pixel, i.e. contextual features, in order to determine whether the current pixel belongs to a lane. Therefore, the application adopts ROIGather as the detection module of the lane line detection model, inputs the multi-size feature maps fused by the balanced feature pyramid network into the ROIGather detection module, iteratively updates the preset lane lines, and outputs the finally detected lane lines. The specific steps are as follows:
After the preset lane lines are assigned to the smallest-size feature map output by the balanced feature pyramid network, each preset lane line has N points. N_p points are uniformly sampled from each preset lane line, their exact feature values are computed by bilinear interpolation, and features beyond the picture range are zero-padded, giving the ROI feature X_p ∈ R^{C×N_p} of each preset lane line, where C is the number of channels in the feature map.
A 9×9 one-dimensional convolution is performed on the ROI features of each preset lane line to collect the features near each pixel of each channel.
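By way of illustration only, the point sampling and 9×9 feature collection may be sketched as follows; the normalized-coordinate convention and tensor layouts are assumptions, while the bilinear interpolation and zero padding for out-of-image points follow the text:

```python
import torch.nn.functional as F

def gather_roi_features(feat, points, conv1d_9x1):
    """Sample lane-prior points bilinearly, then mix nearby features.

    feat:       (B, C, H, W) smallest-size pyramid feature map
    points:     (B, L, Np, 2) sampling locations normalized to [-1, 1];
                out-of-image points are zero-padded by grid_sample
    conv1d_9x1: nn.Conv1d(C, C, kernel_size=9, padding=4)
    returns     (B, L, C, Np) ROI feature X_p per lane prior
    """
    b, c = feat.shape[:2]
    l, n_p = points.shape[1], points.shape[2]
    grid = points.view(b, l * n_p, 1, 2)
    roi = F.grid_sample(feat, grid, mode="bilinear",
                        padding_mode="zeros", align_corners=True)
    roi = roi.view(b, c, l, n_p).permute(0, 2, 1, 3)   # (B, L, C, Np)
    roi = conv1d_9x1(roi.reshape(b * l, c, n_p))       # gather neighbor features
    return roi.view(b, l, c, n_p)
```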
A fully connected layer is then used for further extraction to obtain the feature X'_p ∈ R^{C×1} of each preset lane line.
The global feature map X_f ∈ R^{C×H×W} is resized to the same size as the smallest-size feature map output by the balanced feature pyramid network and flattened to X'_f ∈ R^{C×HW}.
The relation between the further-extracted preset lane line feature X'_p and the global feature map X'_f is established to obtain the attention matrix W between the preset lane line features and the global feature map:
W = f((X'_p)ᵀ X'_f / √C)
wherein f is the softmax normalization function and C is the number of channels in the feature map.
The aggregated feature G = W (X'_f)ᵀ is computed through the attention matrix W; the aggregated feature G is added to the preset lane line feature X'_p, the new preset lane lines are assigned to the next-level feature map output by the balanced feature pyramid network, and this cycle repeats until all feature maps output by the balanced feature pyramid network have been used.
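By way of illustration only, one such update may be sketched as follows; the tensor layouts are assumptions, while the scaled softmax attention and the additive update follow the formulas of this section:

```python
import math
import torch.nn.functional as F

def roi_gather_step(x_p, x_f):
    """One ROIGather update of the lane prior features.

    x_p: (L, C) lane prior features X'_p after the fully connected layer
    x_f: (C, HW) flattened global feature map X'_f
    returns (L, C) updated features X'_p + G
    """
    c = x_p.shape[1]
    w = F.softmax(x_p @ x_f / math.sqrt(c), dim=-1)   # attention W, (L, HW)
    g = w @ x_f.t()                                   # aggregated feature G
    return x_p + g
```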
The loss function of the lane line detection model comprises classification loss and regression loss; the total loss function is defined as:
L_total = w_cls·L_cls + w_xytl·L_xytl + w_LIoU·L_LIoU
wherein L_cls is the focal loss between predictions and labels, L_xytl is the smooth-L1 loss for regressing the start point coordinates, the angle θ and the lane line length, and L_LIoU is the line IoU loss between the predicted lane and the ground truth. The hyperparameters w_cls, w_xytl and w_LIoU are set to 6.0, 0.5 and 2.0, respectively.
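By way of illustration only, the weighted combination, together with one plausible form of the line IoU term, may be sketched as follows; the widening radius and the exact LIoU formulation are assumptions, only the weights 6.0, 0.5 and 2.0 coming from the text:

```python
import torch

def line_iou(pred_x, gt_x, radius=7.5):
    """Line IoU: widen each x-coordinate into [x - radius, x + radius] on its
    image row; overlap may go negative, penalizing lanes that drift apart."""
    ovr = (torch.min(pred_x + radius, gt_x + radius)
           - torch.max(pred_x - radius, gt_x - radius))
    union = (torch.max(pred_x + radius, gt_x + radius)
             - torch.min(pred_x - radius, gt_x - radius))
    return ovr.sum(dim=-1) / union.sum(dim=-1).clamp(min=1e-9)

def total_loss(l_cls, l_xytl, pred_x, gt_x,
               w_cls=6.0, w_xytl=0.5, w_liou=2.0):
    """L_total = w_cls*L_cls + w_xytl*L_xytl + w_LIoU*L_LIoU."""
    l_liou = (1 - line_iou(pred_x, gt_x)).mean()
    return w_cls * l_cls + w_xytl * l_xytl + w_liou * l_liou
```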
In order to broadly evaluate the proposed method, this embodiment performs experiments on two widely used lane detection benchmark datasets, CULane and TuSimple. CULane is one of the widely used large-scale lane detection datasets and one of the most complex, containing nine challenging scenarios such as congestion, night and no lane lines. The TuSimple lane detection benchmark is also one of the most widely used datasets in lane detection; it was collected on highways under steady illumination conditions. Details of the CULane and TuSimple datasets are shown in Table 1.
Dataset    Train    Val.    Test     Road type          Resolution    Scenarios
CULane     88.9K    9.7K    34.7K    Urban & Highway    1640×590      9
TuSimple   3.3K     0.4K    2.8K     Highway            1280×720      1
Table 1: details of the CULane and TuSimple datasets
In this embodiment, the feature residual neural network is used as the backbone network of the lane line detection model, and all input images are resized to 320×800. For optimization, this embodiment uses the AdamW optimizer with an initial learning rate of 1×10^-3 and a cosine-decay learning-rate schedule. This embodiment trains 70 epochs for CULane and 300 epochs for TuSimple; the large difference follows from the large difference in dataset sizes. For data augmentation, this embodiment uses random affine transformations (translation, rotation and scaling) and random horizontal flipping. The network of this embodiment is implemented in PyTorch, and all experiments are run on 2 GPUs. All experimental results were computed on machines with an Intel(R) Xeon(R) Silver 4110 CPU and an RTX 2080 Ti GPU.
For the CULane dataset, mF1 is used as the metric: the intersection over union (IoU) between real and predicted lane lines is calculated, and detection results are then classified into true positives (TP), false positives (FP) and false negatives (FN) according to a preset threshold.
For the TuSimple dataset, three official metrics are adopted: accuracy (Acc), false positives (FP) and false negatives (FN). The evaluation formulas are:
Acc = Σ C_clip / Σ S_clip
wherein C_clip is the number of correctly predicted lane points of an image and S_clip is the number of ground-truth points of the image. A predicted lane is considered correct if more than 85% of its predicted lane points lie within 20 pixels of the ground-truth points.
FP = F_pred / N_pred, FN = M_pred / N_gt
wherein F_pred is the number of incorrectly predicted lane lines, N_pred is the number of predicted lane lines, M_pred is the number of real lane lines that are not correctly predicted, and N_gt is the number of real lane lines.
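By way of illustration only, the three metrics reduce to the following sketch, with variable names mirroring the formulas above:

```python
def tusimple_metrics(c_clip, s_clip, f_pred, n_pred, m_pred, n_gt):
    """Acc = sum(C_clip)/sum(S_clip); FP = F_pred/N_pred; FN = M_pred/N_gt.

    c_clip, s_clip: per-image correctly predicted points and ground-truth points
    f_pred, n_pred: wrongly predicted lanes and total predicted lanes
    m_pred, n_gt:   missed ground-truth lanes and total ground-truth lanes
    """
    acc = sum(c_clip) / sum(s_clip)
    fp = f_pred / n_pred
    fn = m_pred / n_gt
    return acc, fp, fn
```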
Referring to FIG. 5, the first row shows visualization results on the CULane dataset and the second row shows visualization results on the TuSimple dataset; different lane instances are represented by different colors.
The results of the feature-guided-attention-based lane line detection method on the CULane dataset are shown in Table 2.
TABLE 2 results of the application on CULane dataset
Referring to Table 2, the proposed method achieves state-of-the-art results with an mF1 score of 52.88. In addition, the method achieves the best performance in eight of the nine scenarios and is robust to different scenes. For difficult cases such as curves and night, the method has significant advantages. Moreover, compared with the line-anchor-based method LaneATT-S, the method achieves a 2.18% mF1 improvement at similar efficiency. In most cases on the CULane dataset, the ResNet101 version of the method exceeds all previous methods.
The results of the feature-guided-attention-based lane line detection method on the TuSimple dataset are shown in Table 3.
TABLE 3 results of the application on Tusimple dataset
Referring to Table 3, the differences between methods on this dataset are relatively small because the amount of data is small and a single scenario dominates. The method achieves the best F1 score of 97.45 and a state-of-the-art Acc of 96.67.
To verify the effect of the different components of the proposed method, this embodiment performs qualitative and quantitative experiments on the TuSimple dataset. Experiments were conducted with the same training setup and different combinations of modules; Table 4 shows the quantitative results. Gradually adding feature-guided attention and the balanced feature pyramid network to the baseline, feature-guided attention raises F1 from 95.04 to 96.13, which verifies a significant improvement in localization accuracy. Furthermore, the balanced feature pyramid network raises F1 to 96.72. The consistent improvements in F1, Acc, FP and FN verify the effectiveness of the algorithm.
TABLE 4 results of the ablation experiments of the application on a Tusimple dataset
In summary, based on the evaluation results on the two lane detection benchmark datasets CULane and TuSimple, the method of the present application is superior to prior-art lane line detection methods.
The feature-guided-attention-based lane line detection method achieves higher accuracy in lane line detection tasks. The feature-guided attention proposed by the application emphasizes the more useful information encoded in the features, and the balanced feature pyramid network solves the loss of detail information caused by high-level features gradually fading during the fusion process, achieving balanced feature fusion across levels.
The application also provides a lane line detection device based on feature-guided attention, which comprises:
the feature extraction module, which uses a feature residual neural network as the backbone network of the lane line detection model to extract lane line features from a picture containing lane lines and output multi-size feature maps; the feature residual neural network comprises basic residual blocks and basic blocks composed of feature-guided attention; the basic blocks composed of feature-guided attention adopt skip connections and comprise several convolution blocks, batch normalization layers, ReLU activation layers and the feature-guided attention; the feature-guided attention is constructed on the basis of a convolutional attention module: the input feature map is processed by the convolutional attention module to output a spatial refinement map, and the input feature map and the spatial refinement map undergo a channel shuffle operation to obtain the final output refined feature map;
the feature fusion module, which uses a balanced feature pyramid network as the cross-scale feature fusion module of the lane line detection model, generates, for each feature size produced by the feature residual neural network, the sizes corresponding to the pyramid levels through up- and down-sampling operations, and fuses feature maps of the same size;
and the lane line detection module, which uses ROIGather as the detection module of the lane line detection model, inputs the multi-size feature maps fused by the balanced feature pyramid network into the ROIGather detection module, iteratively updates the preset lane lines, and outputs the finally detected lane lines.
The application also provides lane line detection equipment based on feature-guided attention, comprising:
the camera acquisition unit is used for acquiring lane line images;
a memory for storing a computer program;
and the processor, which is used for processing the lane line images acquired by the camera acquisition unit, implementing the steps of the above feature-guided-attention-based lane line detection method when executing the computer program, and outputting the finally detected lane lines.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described feature-guided-attention-based lane line detection method.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present application will be apparent to those of ordinary skill in the art in light of the foregoing description. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present application.

Claims (10)

1. A lane line detection method based on feature-guided attention, characterized by comprising the following specific steps:
using a feature residual neural network as the backbone network of a lane line detection model, extracting lane line features from a picture containing lane lines, and outputting multi-size feature maps;
the feature residual neural network comprising basic residual blocks and basic blocks composed of feature-guided attention; the basic blocks composed of feature-guided attention adopting skip connections and comprising several convolution blocks, batch normalization layers, rectified linear unit (ReLU) activation layers and the feature-guided attention; the feature-guided attention being constructed on the basis of a convolutional attention module, the input feature map being processed by the convolutional attention module to output a spatial refinement map, and the input feature map and the spatial refinement map undergoing a channel shuffle operation to obtain the final output refined feature map;
using a balanced feature pyramid network as the cross-scale feature fusion module of the lane line detection model, generating, for each feature size produced by the feature residual neural network, the sizes corresponding to the pyramid levels through up- and down-sampling operations, and fusing feature maps of the same size;
and using ROIGather as the detection module of the lane line detection model, inputting the multi-size feature maps fused by the balanced feature pyramid network into the ROIGather detection module, iteratively updating preset lane lines, and outputting the finally detected lane lines.
2. The feature-guided-attention-based lane line detection method according to claim 1, wherein using the feature residual neural network as the backbone network of the lane line detection model to extract lane line features from the picture containing lane lines and output multi-size feature maps comprises:
sequentially passing the lane line image to be detected through a convolution layer, a batch normalization layer, a ReLU activation function, a convolution layer and a batch normalization layer, the output serving as the input feature map of the feature-guided attention;
the feature-guided attention processing the input feature map through the convolutional attention module to output a spatial refinement map, the input feature map and the spatial refinement map undergoing a channel shuffle operation to obtain the final output refined feature map;
and fusing the lane line image to be detected with the refined feature map output by the feature-guided attention through a skip connection, then outputting the multi-size feature maps through a ReLU activation function.
3. The feature-guided-attention-based lane line detection method according to claim 2, wherein the spatial refinement map is used as a guide, and the spatial refinement map and the input feature map are rearranged in an alternating manner through a channel shuffle operation to obtain the final output refined feature map, according to the formula:
F_out = σ(GC_7×7(CS([F_in, F_s])))
wherein F_out is the final output refined feature map, σ denotes the sigmoid operation, CS(·) denotes the channel shuffle operation, GC_7×7(·) denotes a group convolution layer with kernel size 7×7, F_in is the input feature map, and F_s is the spatial refinement map.
4. The feature-guided-attention-based lane line detection method according to claim 3, wherein the spatial refinement map F_s is obtained by applying 2D spatial attention to the channel refinement map F_c, with the specific formula:
F_s = F_SRM ⊗ F_c
wherein F_SRM is the spatial attention map and ⊗ denotes element-wise multiplication;
the channel refinement map F_c is obtained by applying 1D channel attention to the input feature map F_in, with the specific formula:
F_c = F_CRM ⊗ F_in
wherein F_CRM is the channel attention map;
the channel attention map F_CRM has the specific formula:
F_CRM = σ(MLP(F^c_avg) + MLP(F^c_max)) = σ(W_1(W_0(F^c_avg)) + W_1(W_0(F^c_max)))
the spatial attention map F_SRM has the specific formula:
F_SRM = σ(Conv_7×7([F^s_avg; F^s_max]))
wherein MLP is a multi-layer perceptron whose hidden layer size is R^{C/r×1×1}, C is the number of channels in the feature map, r is the reduction ratio, W_1 and W_0 are the weights of the multi-layer perceptron, F^c_avg and F^c_max are the channel descriptors obtained by global average pooling and global max pooling over the spatial dimensions, and F^s_avg and F^s_max are the 2D maps obtained by average pooling and max pooling along the channel dimension.
5. The feature-guided-attention-based lane line detection method according to claim 1, wherein the balanced feature pyramid network is used as the cross-scale feature fusion module; for each feature size generated by the feature residual neural network, the sizes corresponding to the pyramid levels are generated through up- and down-sampling operations, the up-sampling operation using dilated (hole) convolution and the down-sampling using convolution kernels with stride.
6. The feature-guided-attention-based lane line detection method according to claim 1, wherein the specific method of inputting the multi-size feature maps fused by the balanced feature pyramid network into the ROIGather detection module, iteratively updating the preset lane lines, and outputting the finally detected lane lines comprises:
after the preset lane lines are assigned to the smallest-size feature map output by the balanced feature pyramid network, each preset lane line having N points; uniformly sampling N_p points from each preset lane line, computing their exact feature values by bilinear interpolation, and zero-padding features beyond the picture range, to obtain the ROI feature X_p ∈ R^{C×N_p} of each preset lane line, wherein C is the number of channels in the feature map;
performing a 9×9 one-dimensional convolution on the ROI features of each preset lane line to collect the features near each pixel of each channel;
further extracting with a fully connected layer to obtain the feature X'_p ∈ R^{C×1} of each preset lane line;
resizing the global feature map X_f ∈ R^{C×H×W} to the same size as the smallest-size feature map output by the balanced feature pyramid network and flattening it to X'_f ∈ R^{C×HW};
establishing the relation between the further-extracted preset lane line feature X'_p and the global feature map X'_f to obtain the attention matrix W between the preset lane line features and the global feature map;
and computing the aggregated feature G through the attention matrix W, adding the aggregated feature G to the preset lane line feature X'_p, assigning the new preset lane lines to the next-level feature map output by the balanced feature pyramid network, and cycling in turn until all feature maps output by the balanced feature pyramid network have been used.
7. The feature-guided-attention-based lane line detection method according to claim 6, wherein the attention matrix W between the preset lane line feature X'_p and the global feature map X'_f is:
W = f((X'_p)ᵀ X'_f / √C)
wherein f is the softmax normalization function and C is the number of channels in the feature map;
the aggregated feature is G = W (X'_f)ᵀ.
8. A lane line detection device based on feature-guided attention, characterized by comprising:
the feature extraction module, which uses a feature residual neural network as the backbone network of the lane line detection model to extract lane line features from a picture containing lane lines and output multi-size feature maps; the feature residual neural network comprises basic residual blocks and basic blocks composed of feature-guided attention; the basic blocks composed of feature-guided attention adopt skip connections and comprise several convolution blocks, batch normalization layers, ReLU activation layers and the feature-guided attention; the feature-guided attention is constructed on the basis of a convolutional attention module: the input feature map is processed by the convolutional attention module to output a spatial refinement map, and the input feature map and the spatial refinement map undergo a channel shuffle operation to obtain the final output refined feature map;
the feature fusion module, which uses a balanced feature pyramid network as the cross-scale feature fusion module of the lane line detection model, generates, for each feature size produced by the feature residual neural network, the sizes corresponding to the pyramid levels through up- and down-sampling operations, and fuses feature maps of the same size;
and the lane line detection module, which uses ROIGather as the detection module of the lane line detection model, inputs the multi-size feature maps fused by the balanced feature pyramid network into the ROIGather detection module, iteratively updates the preset lane lines, and outputs the finally detected lane lines.
9. Lane line detection equipment based on feature-guided attention, characterized by comprising:
the camera acquisition unit, which is used for acquiring lane line images;
a memory for storing a computer program;
and a processor for processing the lane line images acquired by the camera acquisition unit, implementing the feature-guided-attention-based lane line detection method according to any one of claims 1 to 7 when executing the computer program, and outputting the finally detected lane lines.
10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the feature-guided-attention-based lane line detection method according to any one of claims 1 to 7.
CN202310992383.4A 2023-08-08 2023-08-08 Lane line detection method, device and equipment based on feature guidance attention Pending CN117011819A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310992383.4A CN117011819A (en) 2023-08-08 2023-08-08 Lane line detection method, device and equipment based on feature guidance attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310992383.4A CN117011819A (en) 2023-08-08 2023-08-08 Lane line detection method, device and equipment based on feature guidance attention

Publications (1)

Publication Number Publication Date
CN117011819A true CN117011819A (en) 2023-11-07

Family

ID=88575814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310992383.4A Pending CN117011819A (en) 2023-08-08 2023-08-08 Lane line detection method, device and equipment based on feature guidance attention

Country Status (1)

Country Link
CN (1) CN117011819A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117789144A (en) * 2023-12-11 2024-03-29 深圳职业技术大学 Cross network lane line detection method and device based on weight fusion


Similar Documents

Publication Publication Date Title
CN107274445B (en) Image depth estimation method and system
CN113468967B (en) Attention mechanism-based lane line detection method, attention mechanism-based lane line detection device, attention mechanism-based lane line detection equipment and attention mechanism-based lane line detection medium
CN109753913B (en) Multi-mode video semantic segmentation method with high calculation efficiency
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN111696110B (en) Scene segmentation method and system
CN108805016B (en) Head and shoulder area detection method and device
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
Sun et al. Adaptive multi-lane detection based on robust instance segmentation for intelligent vehicles
CN116188999B (en) Small target detection method based on visible light and infrared image data fusion
CN117011819A (en) Lane line detection method, device and equipment based on feature guidance attention
CN112613434A (en) Road target detection method, device and storage medium
US11367206B2 (en) Edge-guided ranking loss for monocular depth prediction
CN115272437A (en) Image depth estimation method and device based on global and local features
CN111597913A (en) Lane line picture detection and segmentation method based on semantic segmentation model
Tang et al. HIC-YOLOv5: Improved YOLOv5 for small object detection
Ni et al. Scene-adaptive 3D semantic segmentation based on multi-level boundary-semantic-enhancement for intelligent vehicles
CN114495060A (en) Road traffic marking identification method and device
WO2022120996A1 (en) Visual position recognition method and apparatus, and computer device and readable storage medium
CN116229406B (en) Lane line detection method, system, electronic equipment and storage medium
Wang et al. Cbwloss: constrained bidirectional weighted loss for self-supervised learning of depth and pose
CN110610184B (en) Method, device and equipment for detecting salient targets of images
EP4235492A1 (en) A computer-implemented method, data processing apparatus and computer program for object detection
Yang et al. A novel vision-based framework for real-time lane detection and tracking
CN116091784A (en) Target tracking method, device and storage medium
CN112446292B (en) 2D image salient object detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination