Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a lane line identification method and device based on lightweight edge calculation, a computer device and a storage medium, which can solve the problems of low calculation speed and low efficiency in the prior art.
In a first aspect, the present invention provides a lane line identification method based on lightweight edge calculation, including:
acquiring a last key frame image based on a last acquired global feature image, performing down-sampling processing according to the last key frame image to obtain a plurality of first feature maps with different sampling multiples, and forming a plurality of first feature map layers according to the first feature maps;
acquiring a current non-key frame image based on a currently acquired global feature image, and performing down-sampling processing according to the current non-key frame image to obtain a plurality of second feature maps with different sampling multiples and a plurality of second feature map layers, wherein the second feature map layers comprise feature map layers formed according to the second feature maps and first feature map layers;
and calculating and outputting a grid classification map corresponding to the current global feature image according to the second feature image layer, and predicting a lane line based on the grid classification map.
Preferably, the down-sampling processing is performed according to the last key frame image to obtain a plurality of first feature maps with different sampling multiples, including:
and performing down-sampling processing according to the last key frame image to obtain first feature maps with sampling multiples of 1 time, 2 times, 4 times, 8 times and 16 times.
Preferably, forming a plurality of first feature layers according to the first feature map includes:
performing down-sampling processing on the first feature maps, outputting the last layer of each first feature map as a first target feature map layer for convolution calculation, and obtaining a first reference feature map layer after convolution calculation;
and after the first feature map with the lowest sampling multiple is eliminated, the first reference feature map layers corresponding to other first feature maps are used as first feature map layers.
Preferably, the sampling multiples of the first feature map layer include 4 times, 8 times, 16 times and 32 times.
Preferably, forming a plurality of first feature layers according to the first feature map includes:
sequencing the first feature maps from high to low according to sampling multiples, and performing convolution calculation on the first feature map with the highest sampling multiple to obtain a first feature map layer on the top layer;
and the first feature layer of the top layer is used as a first upper-stage first feature layer, the upper-stage first feature layer is subjected to up-sampling and then added with a lower-stage first feature map of a corresponding level of the upper-stage first feature layer to obtain a first target feature layer, and the first target feature layer is subjected to convolution calculation to obtain the first feature layer of the lower-stage first feature map.
Preferably, the upsampling method is nearest neighbor interpolation.
Preferably, the down-sampling processing is performed according to the current non-key frame image to obtain a plurality of second feature maps with different sampling multiples, including: and performing down-sampling processing according to the current non-key frame image to obtain second feature maps with sampling multiples of 1 time, 2 times, 4 times, 8 times and 16 times.
Preferably, the down-sampling processing is performed according to the current non-key frame image to obtain a plurality of second feature layers with different sampling multiples, and the method includes:
performing down-sampling processing on the second feature maps, outputting the last layer of each second feature map as a second target feature map layer for convolution calculation, and obtaining a second reference feature map layer after the convolution calculation;
and after the second feature map with the lowest sampling multiple is eliminated, taking the second reference feature map layers corresponding to other second feature maps as second feature map layers.
Preferably, the down-sampling processing is performed according to the current non-key frame image to obtain a plurality of second feature layers with different sampling multiples, and the method includes:
sequencing the second feature maps from high to low according to the sampling multiples, and performing convolution calculation on the second feature map with the highest sampling multiple to obtain a top second feature map layer;
the second feature layer of the top layer is used as a first previous second feature layer, the previous second feature layer is subjected to up-sampling and then added with a next second feature map of a corresponding level of the previous second feature layer to obtain a second target feature layer, and the second target feature layer is subjected to convolution calculation to obtain a second feature layer of the next second feature map;
and replacing the second feature map layers corresponding to the second feature maps with sampling multiples of 8 times and 16 times with the first feature map layers corresponding to the first feature maps with sampling multiples of 8 times and 16 times, to form new second feature map layers.
Preferably, the predicting the lane line based on the grid classification map specifically includes predicting the lane line based on the grid classification map and a formula one, where the formula one is:
P_{i,j} = f_{ij}(X), i ∈ [1, C], j ∈ [1, h]
wherein C is the set maximum number of lane lines, h is the number of row grids of the set grid classification map, w is the number of grid cells in each row of the set grid classification map, X is the global feature image, f_{ij} is the classifier for selecting the lane position on the i-th lane line at the j-th row grid, and P_{i,j} is the probability of each of the (w+1) grid cells being selected for the i-th lane line at the j-th row grid.
In a second aspect, the present application provides a lane line identification device based on lightweight edge calculation, including:
a first feature acquisition module: configured to acquire a last key frame image based on a last acquired global feature image, perform down-sampling processing according to the last key frame image to obtain a plurality of first feature maps with different sampling multiples, and form a plurality of first feature map layers according to the first feature maps;
a second feature acquisition module: configured to acquire a current non-key frame image based on a currently acquired global feature image, and perform down-sampling processing according to the current non-key frame image to obtain a plurality of second feature maps with different sampling multiples and a plurality of second feature map layers, wherein the second feature map layers comprise feature map layers formed according to the second feature maps and first feature map layers;
a lane line identification module: configured to calculate and output a grid classification map corresponding to the current global feature image according to the second feature map layers, and to predict a lane line based on the grid classification map.
Preferably, the down-sampling processing is performed according to the last key frame image to obtain a plurality of first feature maps with different sampling multiples, including:
and performing down-sampling processing according to the last key frame image to obtain first feature maps with sampling multiples of 1 time, 2 times, 4 times, 8 times and 16 times.
Preferably, forming a plurality of first feature layers according to the first feature map includes:
performing down-sampling processing on the first feature maps, outputting the last layer of each first feature map as a first target feature map layer for convolution calculation, and obtaining a first reference feature map layer after convolution calculation;
and after the first feature map with the lowest sampling multiple is eliminated, the first reference feature map layers corresponding to other first feature maps are used as first feature map layers.
Preferably, the sampling multiples of the first feature map layer include 4 times, 8 times, 16 times and 32 times.
Preferably, forming a plurality of first feature layers according to the first feature map includes:
sequencing the first feature maps from high to low according to sampling multiples, and performing convolution calculation on the first feature map with the highest sampling multiple to obtain a first feature map layer on the top layer;
and the first feature layer of the top layer is used as a first upper-stage first feature layer, the upper-stage first feature layer is subjected to up-sampling and then added with a lower-stage first feature map of a corresponding level of the upper-stage first feature layer to obtain a first target feature layer, and the first target feature layer is subjected to convolution calculation to obtain the first feature layer of the lower-stage first feature map.
Preferably, the upsampling method is nearest neighbor interpolation.
Preferably, the down-sampling processing is performed according to the current non-key frame image to obtain a plurality of second feature maps with different sampling multiples, including: and performing down-sampling processing according to the current non-key frame image to obtain second feature maps with sampling multiples of 1 time, 2 times, 4 times, 8 times and 16 times.
Preferably, the down-sampling processing is performed according to the current non-key frame image to obtain a plurality of second feature layers with different sampling multiples, and the method includes:
performing down-sampling processing on the second feature maps, outputting the last layer of each second feature map as a second target feature map layer for convolution calculation, and obtaining a second reference feature map layer after the convolution calculation;
and after the second feature map with the lowest sampling multiple is eliminated, taking the second reference feature map layers corresponding to other second feature maps as second feature map layers.
Preferably, the down-sampling processing is performed according to the current non-key frame image to obtain a plurality of second feature layers with different sampling multiples, and the method includes:
sequencing the second feature maps from high to low according to the sampling multiples, and performing convolution calculation on the second feature map with the highest sampling multiple to obtain a top second feature map layer;
the second feature layer of the top layer is used as a first previous second feature layer, the previous second feature layer is subjected to up-sampling and then added with a next second feature map of a corresponding level of the previous second feature layer to obtain a second target feature layer, and the second target feature layer is subjected to convolution calculation to obtain a second feature layer of the next second feature map;
and replacing the second feature map layers corresponding to the second feature maps with sampling multiples of 8 times and 16 times with the first feature map layers corresponding to the first feature maps with sampling multiples of 8 times and 16 times, to form new second feature map layers.
Preferably, the predicting the lane line based on the grid classification map specifically includes predicting the lane line based on the grid classification map and a formula one, where the formula one is:
P_{i,j} = f_{ij}(X), i ∈ [1, C], j ∈ [1, h]
wherein C is the set maximum number of lane lines, h is the number of row grids of the set grid classification map, w is the number of grid cells in each row of the set grid classification map, X is the global feature image, f_{ij} is the classifier for selecting the lane position on the i-th lane line at the j-th row grid, and P_{i,j} is the probability of each of the (w+1) grid cells being selected for the i-th lane line at the j-th row grid.
In a third aspect, the present application provides a computer device comprising a memory and one or more processors, the memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a lane line identification method based on lightweight edge calculation as described in any one of the first aspects of the present application.
In a fourth aspect, the present application provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the lane line identification method based on lightweight edge calculation according to any one of the first aspect of the present application.
Compared with the prior art, the invention has the beneficial effects that:
according to the method, a plurality of first feature maps with different sampling multiples are obtained through down-sampling according to a last key frame image, a plurality of first feature map layers are formed according to the first feature maps, a plurality of second feature maps with different sampling multiples and a plurality of second feature map layers are obtained through down-sampling according to a current non-key frame image, wherein the second feature map layers are formed according to the second feature maps or the first feature map layers, grid classification maps corresponding to a current global feature image are calculated and output according to the second feature map layers, and lane lines are predicted based on the grid classification maps; the feature images of different sizes are obtained by performing down-sampling processing on the feature images of the whole image, and the high-resolution first feature image layer obtained by processing in the last acquired key frame image is converted into the second feature image layer, so that the calculation time and cost are greatly saved, the operation efficiency is improved, the real-time performance is achieved, the high-resolution details are reserved, and the accuracy of lane line identification in motion is improved.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some but not all of the relevant portions of the present application are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The embodiment of the invention provides a lane line identification method and device based on lightweight edge calculation, computer equipment and a storage medium. According to the embodiment of the application, a plurality of first feature maps with different sampling multiples are obtained by down-sampling a last key frame image, and a plurality of first feature map layers are formed from the first feature maps; a plurality of second feature maps with different sampling multiples and a plurality of second feature map layers are obtained by down-sampling a current non-key frame image, wherein the second feature map layers are formed from the second feature maps or the first feature map layers; a grid classification map corresponding to the current global feature image is calculated and output according to the second feature map layers, and lane lines are predicted based on the grid classification map. Feature maps of different sizes are obtained by down-sampling the global feature image, and the first feature map layers obtained by processing the last acquired key frame image are converted into second feature map layers, so that calculation time and cost are greatly reduced and operation efficiency is improved, achieving real-time performance while retaining high-resolution detail and improving the accuracy of lane line identification in motion.
The embodiment of the application provides a lane line identification calculation method on an embedded chip. The device adopted is an NVIDIA Jetson AGX Xavier; this edge device is equipped with an integrated Volta GPU with tensor cores, dual deep learning accelerators and 32 GB of memory, and the Volta GPU delivers up to 32 TeraOPS. In addition, Xavier is the only architecture in the NVIDIA Jetson series that supports both the FP16 and INT8 tensor cores required by TensorRT optimization. TensorRT is NVIDIA's deep learning inference optimizer, providing mixed-precision support, optimal tensor placement, network layer fusion and kernel specialization. A major component of accelerating a model with TensorRT is quantizing the model weights to INT8 or FP16 precision (also known as fixed-point). Since FP16 has a wider precision range than INT8, it achieves better accuracy at the cost of more computation time. Considering that the weights of different network components (network backbone, segmentation module, classification module, etc.) have different ranges, this speed-accuracy trade-off needs to be analyzed specifically according to the properties of each module. The method converts each model component to TensorRT independently and explores the optimal combination of INT8 and FP16 weights, accelerating to the maximum extent while maintaining accuracy.
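The module-by-module INT8/FP16 trade-off described above can be sketched as a small search over precision assignments. The module names, accuracy drops and speed-up factors below are illustrative assumptions, not measurements from the source; a real deployment would profile each TensorRT engine:

```python
from itertools import product

# Hypothetical per-module (accuracy drop %, speed-up factor) for each
# precision; real values would come from profiling each TensorRT engine.
MODULES = ["backbone", "segmentation", "classification"]
PROFILE = {
    ("backbone", "int8"): (0.8, 3.2), ("backbone", "fp16"): (0.1, 1.9),
    ("segmentation", "int8"): (0.5, 2.7), ("segmentation", "fp16"): (0.1, 1.6),
    ("classification", "int8"): (1.4, 2.9), ("classification", "fp16"): (0.2, 1.7),
}

def best_combination(max_total_drop=1.5):
    """Pick the per-module precision mix that maximizes the total speed-up
    while keeping the summed accuracy drop under a fixed budget."""
    best, best_speed = None, 0.0
    for combo in product(["int8", "fp16"], repeat=len(MODULES)):
        drop = sum(PROFILE[(m, p)][0] for m, p in zip(MODULES, combo))
        speed = 1.0
        for m, p in zip(MODULES, combo):
            speed *= PROFILE[(m, p)][1]
        if drop <= max_total_drop and speed > best_speed:
            best, best_speed = dict(zip(MODULES, combo)), speed
    return best, best_speed
```

With the assumed numbers, the search keeps the accuracy-sensitive classification head in FP16 and quantizes the rest to INT8.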
To quantize each module in the model to INT8 precision, a calibration step is required, in which TensorRT collects histograms of each layer's activations, generates several quantized distributions with different clipping thresholds, and compares each quantized distribution to the reference distribution using KL divergence. This step ensures that the model loses as little performance as possible when converted to INT8 precision. Experiments show that, balancing the two considerations of precision and speed, calibrating with 50 or 100 images achieves the best effect.
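A minimal NumPy sketch of the entropy-calibration idea: clip the activation histogram at candidate thresholds, merge it down to 128 quantization bins, and keep the threshold with the lowest KL divergence. This is a simplification of TensorRT's internal procedure, shown for illustration only:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def calibrate_threshold(activations, n_bins=2048, n_quant=128):
    """Simplified entropy calibration: for each candidate clipping
    threshold, clip + re-bucket the histogram to n_quant bins and keep
    the threshold whose quantized distribution has the smallest KL
    divergence from the reference distribution."""
    hist, edges = np.histogram(np.abs(activations), bins=n_bins)
    best_t, best_kl = edges[-1], np.inf
    for i in range(n_quant, n_bins + 1, 128):
        ref = hist[:i].astype(float).copy()
        ref[-1] += hist[i:].sum()          # clip outliers into the last bin
        # quantize: merge i bins down to n_quant, then expand back
        chunks = np.array_split(ref, n_quant)
        q = np.concatenate([np.full(len(c), c.mean()) for c in chunks])
        kl = kl_divergence(ref, q)
        if kl < best_kl:
            best_kl, best_t = kl, edges[i]
    return best_t
```

The returned threshold would then define the INT8 scale for that layer.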
In an embodiment of the present invention, lane lines are represented by a series of horizontal "row grids" (also referred to as "row anchors") located at predefined rows. To represent a location, the first step is gridding: on each row, the positions are divided into a number of cells, so the detection of a lane line can be described as selecting particular cells on the predefined row grids, as shown in fig. 1.
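As a sketch, mapping a lane's points onto the predefined row anchors reduces each row to a single class label in {0, ..., w}, where label w means "no lane in this row". The function below assumes integer pixel coordinates and is illustrative, not the patent's exact procedure:

```python
import numpy as np

def lane_to_row_grid(lane_xy, img_w, row_anchors, n_cells):
    """Map a lane line (list of (x, y) points) onto predefined row anchors:
    for each anchor row, pick the grid cell containing the lane's x there,
    or n_cells (the extra 'no lane' class) if the lane is absent."""
    labels = np.full(len(row_anchors), n_cells, dtype=int)  # default: absent
    ys = {y: x for x, y in lane_xy}
    for j, row in enumerate(row_anchors):
        if row in ys:
            labels[j] = min(int(ys[row] * n_cells / img_w), n_cells - 1)
    return labels
```

For example, a lane passing through x = 100 at row 200 in an 800-pixel-wide image with 100 cells lands in cell 12, while an anchor row the lane never crosses gets the "absent" label 100.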
The following are detailed below.
Fig. 2 shows a lane line identification method based on light-weight edge calculation according to an embodiment of the present disclosure, which may be implemented by a lane line identification device based on light-weight edge calculation, which may be implemented in hardware and/or software and integrated in a computer device.
The following description will be given taking as an example a lane line identification method based on lightweight edge calculation performed by a lane line identification apparatus based on lightweight edge calculation, which includes, with reference to fig. 2, the steps of:
101: the method comprises the steps of obtaining a last key frame image based on a last collected global feature image, conducting down-sampling processing according to the last key frame image to obtain a plurality of first feature maps with different sampling multiples, and forming a plurality of first feature map layers according to the first feature maps.
In the embodiment of the present application, lane line identification is based on the global feature image acquired last time and the global feature image acquired this time, the latter being recorded as the currently acquired global feature image in this embodiment.
Specifically, a feature map layer used for calculating and identifying the lane lines is finally formed by combining and converting the key frame image of the last acquired global feature image and the non-key frame image of the currently acquired global feature image. Compared with deep semantic segmentation methods, the present method selects the lane position within predefined rows of the image using the global features of the image, rather than segmenting each pixel of the lane line based on a local receptive field; this avoids the heavy computation of per-pixel segmentation and effectively improves inference speed. Predicting with global features also gives a larger receptive field than a segmentation formulation, which helps handle the case of no visual clue. The method uses a Structural Loss to exploit prior information such as the smoothness and continuity of lane lines, ensuring their continuity and correctness. In addition, the method further improves the calculation speed of the model on edge devices by combining the key frame feature information of the video with TensorRT quantization acceleration.
In this embodiment, capture of the video of the road ahead may be implemented by a high-definition camera; for the edge computing device of this embodiment, the input video sequence is {Ii}, and the present embodiment rapidly and accurately predicts the position of each lane line yi = N(Ii) in each frame image based on the video input.
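A trivial sketch of how the video sequence {Ii} can be split into key and non-key frames; one key frame every few frames is an assumed policy, since the source does not specify the scheduling rule:

```python
def frame_roles(n_frames, key_interval=5):
    """Assign each frame in the video sequence {Ii} the role of key frame
    or non-key frame; key frames recompute the full feature pyramid, the
    others reuse the last key frame's low-resolution layers. The interval
    of 5 is an assumed value, not taken from the source."""
    return ["key" if i % key_interval == 0 else "non-key"
            for i in range(n_frames)]
```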
First, features of each frame image are extracted by a deep convolutional network, combined with the residual modules of the residual network ResNet, to extract a multi-scale feature map of the input image. Specifically, the previous key frame image is obtained and down-sampled to obtain first feature maps with different sampling multiples, including feature maps at the original image resolution and at 2 to 16 times down-sampling, such as C1-C5 in fig. 4; that is, first feature maps with sampling multiples of 1 time, 2 times, 4 times, 8 times and 16 times are obtained by down-sampling the previous key frame image.
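The 1x/2x/4x/8x/16x pyramid can be sketched with average pooling standing in for the stride-2 convolutions of the ResNet backbone (single-channel NumPy arrays for brevity):

```python
import numpy as np

def downsample2(x):
    """Stride-2 downsampling by 2x2 average pooling (a stand-in for the
    stride-2 convolutions of the real backbone)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0

def build_pyramid(img, levels=5):
    """Return feature maps C1..C5 at 1x, 2x, 4x, 8x and 16x downsampling."""
    maps = [img]
    for _ in range(levels - 1):
        maps.append(downsample2(maps[-1]))
    return maps
```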
The first feature maps are then processed to obtain the first feature map layers. Specifically, the method includes two stages. The first is the bottom-up path: the first feature maps are down-sampled, the last layer of each first feature map is output as a first target feature map layer for convolution calculation, and a first reference feature map layer is obtained after the convolution calculation; after the first feature map with the lowest sampling multiple is eliminated, the first reference feature map layers corresponding to the other first feature maps are used as first feature map layers.
As shown by C1-C5 in fig. 4, each stage of the bottom-up path uses down-sampling with stride 2 (step = 2); the parts of the network producing outputs of the same size are called a stage, and the last-layer feature map of each stage is selected as the corresponding level of the bottom-up path, with element-wise addition after 1×1 convolution. Levels 2-5 participate in prediction (because the semantic features of the first level are low); the output layers (the last ResNet residual block layer) of conv2, conv3, conv4 and conv5, denoted {C2, C3, C4, C5} in fig. 4, are taken as feature layers, whose down-sampling multiples relative to the input picture are {4, 8, 16, 32} respectively.
The second stage is the top-down pathway with lateral (skip) connections: the first feature maps are sorted from the highest sampling multiple to the lowest, and the first feature map with the highest sampling multiple is subjected to convolution calculation to obtain the top-layer first feature map layer;
and the first feature layer of the top layer is used as a first upper-stage first feature layer, the upper-stage first feature layer is subjected to up-sampling and then added with a lower-stage first feature map of a corresponding level of the upper-stage first feature layer to obtain a first target feature layer, and the first target feature layer is subjected to convolution calculation to obtain the first feature layer of the lower-stage first feature map.
Here, the up-sampling method is nearest neighbor interpolation. This process enlarges the small top-layer feature map, by up-sampling, to the same size as the feature map of the previous stage. Up-sampling preserves the semantic information of the feature map to the greatest extent, so fusing it with the spatially rich (high-resolution, localization-friendly) feature map from the bottom-up process yields feature maps with both good spatial information and strong semantics. As shown by layers P3-P7 in fig. 4, the process is specifically: the C5 layer is first convolved by 1×1 to change the number of channels of the feature map (d = 256 in the method), giving the P5 layer; the P5 layer is up-sampled and added (each element at the same position in the feature maps is added directly) to the feature map obtained by 1×1 convolution of the C4 layer, giving P4. Repeating this process yields P3. The final P3, P4 and P5 layer features are obtained as input to the following modules.
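The top-down upsample-and-add pathway can be sketched as follows; the inputs are assumed to already be the 1×1-convolved lateral maps, and channel dimensions are omitted:

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling, as used in the top-down pathway."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def top_down_merge(c_maps):
    """Given laterally-projected maps [C3', C4', C5'] (coarsest last),
    produce [P3, P4, P5] by upsample-and-add, coarse to fine."""
    p = [c_maps[-1]]                      # P5 = conv1x1(C5), assumed done
    for c in reversed(c_maps[:-1]):
        p.insert(0, c + upsample_nearest(p[0]))
    return p
```

With all-ones inputs, P4 accumulates two layers' contributions and P3 three, which makes the element-wise addition easy to check.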
102: the method comprises the steps of obtaining a current non-key frame image based on a currently collected global feature image, and carrying out down-sampling processing according to the current non-key frame image to obtain a plurality of second feature images and second feature layers with different sampling multiples, wherein the second feature layers are formed according to the second feature images or the first feature layers.
In this step, a specific manner of forming the second feature map is the same as the principle of the step shown in step 101, and specifically, the down-sampling processing is performed according to the current non-key frame image to obtain a plurality of second feature maps with different sampling multiples, including: and performing down-sampling processing according to the current non-key frame image to obtain second feature maps with sampling multiples of 1 time, 2 times, 4 times, 8 times and 16 times.
Down-sampling according to the current non-key frame image to obtain a plurality of second feature map layers with different sampling multiples includes: down-sampling the second feature maps, outputting the last layer of each second feature map as a second target feature map layer for convolution calculation, and obtaining a second reference feature map layer after the convolution calculation; after the second feature map with the lowest sampling multiple is eliminated, the second reference feature map layers corresponding to the other second feature maps are used as second feature map layers; and the second feature map layers corresponding to the second feature maps with sampling multiples of 8 times and 16 times are replaced with the first feature map layers corresponding to the first feature maps with sampling multiples of 8 times and 16 times, forming new second feature map layers.
In the other stage, the second feature maps are sorted from the highest sampling multiple to the lowest, and the second feature map with the highest sampling multiple is subjected to convolution calculation to obtain the top-layer second feature map layer; the top-layer second feature map layer is used as the first upper-stage second feature map layer, the upper-stage second feature map layer is up-sampled and then added to the lower-stage second feature map of the corresponding level to obtain a second target feature map layer, and the second target feature map layer is subjected to convolution calculation to obtain the second feature map layer of the lower-stage second feature map; and the second feature map layers corresponding to the second feature maps with sampling multiples of 8 times and 16 times are replaced with the first feature map layers corresponding to the first feature maps with sampling multiples of 8 times and 16 times, forming new second feature map layers.
103: and calculating and outputting a grid classification map corresponding to the current global feature image according to the second feature image layer, and predicting a lane line based on the grid classification map.
In this step, with reference to fig. 1, fig. 2, and fig. 3, predicting the lane line based on the grid classification map specifically includes predicting the lane line based on the grid classification map and a formula one, where the formula one is:
P_{i,j} = f_{ij}(X), i ∈ [1, C], j ∈ [1, h]
wherein C is the set maximum number of lane lines, h is the number of row grids of the set grid classification map, w is the number of grid cells in each row of the set grid classification map, X is the global feature image, f_{ij} is the classifier for selecting the lane position on the i-th lane line at the j-th row grid, and P_{i,j} is the probability of each of the (w+1) grid cells being selected for the i-th lane line at the j-th row grid. P_{i,j} is a vector of (w+1) dimensions. The method predicts the probability distribution over all positions on each row grid based on the global features, so the correct position can be selected according to that distribution. In fig. 1, a is the selected grid cell, and b is a row grid in which no lane line is detected.
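Given classifier outputs f_ij(X) of shape (C, h, w+1), lane prediction reduces to a softmax over the (w+1) cells of each row grid followed by an argmax; a minimal sketch:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict_lanes(logits):
    """logits: (C, h, w+1) classifier outputs f_ij(X). Returns per-lane,
    per-row cell indices; index w means 'no lane in this row'."""
    P = softmax(logits)                   # P_ij over the (w+1) cells
    return P.argmax(axis=-1)
```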
In this embodiment, the grid classification module is used to identify the lane lines and mainly focuses on the internal relationship between lane lines. It is supported by an auxiliary feature aggregation method that combines global context features and local features: an auxiliary segmentation task that models local features using multi-scale features. This auxiliary module is only activated in the model training stage and not in the inference prediction stage, so even though an additional segmentation task is added, it does not affect the inference prediction speed of the method.
In the present application, a partial feature transfer method is adopted, improving the quality of the converted features while ensuring that speed is not affected. For key frames, the application calculates the first feature map layers corresponding to all the first feature maps; for non-key frames, only part of the features are calculated: all the high-resolution features C1, C2, C3 and P3 are computed, while the computation of the low-resolution P4 and P5 is skipped, the features at those resolutions being converted directly from P4 and P5 of the last key frame I_k to obtain approximate features of the non-key frame, namely W4 and W5 in fig. 4, where P6 and P7 are obtained by down-sampling W5. The calculated C3 feature and the converted W4 feature are synthesized into P3, i.e. P3 = C3 + up(W4), where up(·) denotes up-sampling. Finally, the P3 feature is used for the subsequent module calculations. In this way, unlike other methods, the high-resolution details of the generated features are retained: since the high-resolution C3 features are computed rather than converted from the last key frame, no motion-estimation errors occur in them. Although the C1-C3 backbone features are computed for every frame (both key and non-key frames), the most computationally intensive part of the backbone is avoided, because the computational costs of the different stages of the feature pyramid network are highly unbalanced; experiments show that more than 66% of the computation in the ResNet-101 backbone is at C4, and the backbone computation takes more than half of the inference time of the entire ResNet-101. By computing only the lower levels of the feature pyramid and converting the rest, the model of the method can be greatly accelerated and achieves real-time performance.
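The partial feature transfer step (compute the high-resolution C3, reuse the key frame's W4/W5, then P3 = C3 + up(W4) and P6/P7 from W5) can be sketched as below; motion warping is omitted, so W4/W5 are taken as the key frame layers directly:

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def non_key_frame_features(c3, key_frame_p4, key_frame_p5):
    """Partial feature transfer for a non-key frame: the high-resolution
    C3 is computed fresh, while W4/W5 are reused (in the real method,
    motion-warped) from the last key frame. P3 = C3 + up(W4); P6 and P7
    are obtained by down-sampling W5."""
    w4, w5 = key_frame_p4, key_frame_p5   # warping omitted in this sketch
    p3 = c3 + upsample_nearest(w4)
    p6 = w5[::2, ::2]
    p7 = p6[::2, ::2]
    return p3, w4, w5, p6, p7
```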
The model of the present application first encodes the motion information of the two frames into a two-dimensional motion flow, denoted M n→k. The motion flow is then used to warp the features of the previous key frame I k, aligning the non-key frame with the key frame and yielding new (approximate) features for the non-key frame: F n = W(F k, M n→k), where W(·) denotes the warping operation.
Effectively estimating the motion of objects is the basis of accurate and fast feature transformation. Existing methods generally perform flow-guided feature transformation directly, using an off-the-shelf pixel-level optical flow network for motion estimation. For example, FlowNetS performs flow estimation in three stages: first, the original RGB frames are taken as input and their feature maps are computed in batch by a neural network; then, a subset of the features is refined through recursive upsampling and skip-connected feature mapping to produce global features with a high receptive field (large motion amplitude) and fine local features with a low receptive field (small motion amplitude); finally, these features are used to predict the final optical flow map. In the present method, to improve computational efficiency and increase the processing frame rate, the original input frames are not processed with an existing optical flow method; instead, the features already computed by the model's feature backbone network are reused. The backbone generates a set of semantically rich features, and the resulting network is called a motion feature flow network: the feature stack is not computed from the raw RGB input of the first stage, but reuses the features from the ResNet backbone (the C3 layer in fig. 4) with fewer convolutional layers, so that the feature flow computation is faster while at the same time achieving better results. The motion flow of the two frames is computed with the motion feature flow network:
M n→k = N flow(C3 n, C3 k), where N flow denotes the motion feature flow network and C3 n, C3 k are the C3 backbone features of the non-key frame and the key frame, respectively.
Then, features are transferred from the key frame to the non-key frame by inverse warping: each pixel x of the non-key frame I n is mapped to the pixel x + δx of the key frame I k, where δx = M n→k(x).
The feature value is then computed by bilinear interpolation: F n(x) = Σ u θ(u, x + δx) · F k(u), where θ represents the bilinear interpolation weight at the different spatial locations.
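The inverse warping with bilinear interpolation described above can be sketched in NumPy. The (C, H, W) layout, the flow sign convention, and the border clipping below are assumptions of this sketch rather than details fixed by the patent.

```python
import numpy as np

def warp_features(f_key, flow):
    """Inverse-warp key-frame features f_key (C, H, W) using a motion
    flow (2, H, W): each pixel x of the non-key frame samples f_key at
    x + flow(x) via bilinear interpolation."""
    C, H, W = f_key.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sx = xs + flow[0]                     # sampling x-coordinate x + dx
    sy = ys + flow[1]                     # sampling y-coordinate y + dy
    x0 = np.clip(np.floor(sx).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, H - 2)
    wx = np.clip(sx - x0, 0.0, 1.0)       # bilinear weights theta
    wy = np.clip(sy - y0, 0.0, 1.0)
    return ((1 - wy) * (1 - wx) * f_key[:, y0, x0]
            + (1 - wy) * wx * f_key[:, y0, x0 + 1]
            + wy * (1 - wx) * f_key[:, y0 + 1, x0]
            + wy * wx * f_key[:, y0 + 1, x0 + 1])
```

With a zero motion flow the warp reduces to the identity, which makes a convenient sanity check.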
In the specific calculation of this step, a loss function is also computed, comprising a classification loss, a structural loss and a segmentation loss. The total loss adopts the formula L total = L cls + α·L str + β·L seg, where L cls is the classification loss, L str is the structural loss, L seg is the segmentation loss, and α and β are the weight coefficients of the corresponding losses.
In the grid classification module, assuming T i,j,: is the one-hot label of the correct location, and using the standard cross-entropy loss, the training optimization objective (i.e., the loss function) of the grid classification module is: L cls = Σ i=1..C Σ j=1..h L CE(P i,j,:, T i,j,:), where P i,j,: is the (w+1)-dimensional prediction of the ith lane line on the jth row grid.
The difference in computational complexity between the above formula and image-segmentation-based methods is shown in fig. 3. Assuming an image size of H × W, the predefined number of row grids h and the number of grid cells per row w are typically much smaller than the size of the image, i.e., h is much smaller than H and w + 1 is much smaller than W. An image-segmentation-based method needs to perform H × W classifications of dimension C + 1 (C being the number of output class channels), while the present function only needs to solve C × h classifications of dimension w + 1. The computational complexity of the above loss function is therefore C × h × (w + 1), whereas that of the segmentation algorithm is H × W × (C + 1), the former being much smaller than the latter. For example, with the common setting of the CULane data set, the computation of the method is about 1.7 × 10^4, while that of image segmentation is about 1.15 × 10^6; the method is thus much simpler than common segmentation methods, and the calculation speed is significantly improved.
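The two complexity formulas can be compared numerically. The concrete values below (4 lane lines, 18 row grids, 200 grid cells per row, a 288 × 800 input) are illustrative assumptions of this sketch, chosen to land in the same order of magnitude as the figures quoted in the text; the patent does not fix them here.

```python
def grid_cls_complexity(C, h, w):
    """Grid formulation: C lanes x h row grids, each a (w+1)-way classification."""
    return C * h * (w + 1)

def seg_complexity(H, W, C):
    """Segmentation formulation: H x W pixels, each a (C+1)-way classification."""
    return H * W * (C + 1)

# Assumed illustrative setting: 4 lanes, 18 row grids, 200 cells, 288x800 image.
grid_ops = grid_cls_complexity(4, 18, 200)   # 14472, on the order of 10^4
seg_ops = seg_complexity(288, 800, 4)        # 1152000, i.e. 1.15 x 10^6
```

Even with these rough numbers the grid formulation is roughly two orders of magnitude cheaper than per-pixel segmentation.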
The structural loss consists of a similarity loss and a position loss. The similarity loss is derived from the continuity property of lane lines, i.e., lane points in adjacent row grids should be close to each other. In the method, this is realized by constraining the predictions of adjacent rows, and the similarity loss is formulated as: L sim = Σ i=1..C Σ j=1..h−1 ||P i,j,: − P i,j+1,:||_1,
where P i,j,: is the prediction of the ith lane line on the jth row grid and ||·||_1 is the L1 norm. The position loss focuses on the shape of the lane lines. Most lane lines are straight; even curved lanes still exhibit good continuity and smoothness. Since the second-order difference is 0 when a lane line approaches a straight line, the method uses the second-order difference to constrain the shape of the lane. For any lane line index i and row grid index j, the location Loc i,j can be represented as: Loc i,j = Σ k=1..w k · Prob i,j,k, with Prob i,j,: = softmax(P i,j,1:w),
where Prob i,j,k represents the probability of the ith lane line at the kth position of the jth row grid. P i,j,1:w is a w-dimensional vector, and Prob i,j,: represents the probability at each position. Because no background grid cell is included, the calculation range is only 1 to w. The benefit of this localization method is two-fold: first, the expectation function is differentiable; second, a continuous position is recovered from discrete random variables. According to the above formula, the second-order difference constraint loss can be expressed as: L shp = Σ i=1..C Σ j=1..h−2 ||(Loc i,j − Loc i,j+1) − (Loc i,j+1 − Loc i,j+2)||_1,
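The expectation-based localization for a single row grid can be sketched as follows; the function name and the plain-NumPy softmax are assumptions of this illustration. It shows the two properties claimed above: the result is a smooth (differentiable) function of the logits, and it recovers a continuous position from the discrete grid cells.

```python
import numpy as np

def expected_location(row_logits):
    """Loc = sum_k k * Prob_k, with Prob = softmax over the w grid cells.
    k is 1-based, matching the formula above; the background cell is excluded."""
    e = np.exp(row_logits - row_logits.max())   # numerically stable softmax
    prob = e / e.sum()
    k = np.arange(1, row_logits.size + 1)
    return float((k * prob).sum())
```

A sharply peaked prediction returns (approximately) the index of the peak, while an uncertain prediction returns a weighted average between candidate cells.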
where Loc i,j is the predicted position of the ith lane line on the jth row grid.
Finally, the overall structural loss can be expressed as: L str = L sim + λ·L shp, where λ is the weight coefficient balancing the two loss terms in the structural loss.
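The full structural loss, combining the similarity term over adjacent rows with the second-order shape term over expected locations, can be sketched as below. The (C, h, w) logit layout and sum reduction are assumptions of this sketch.

```python
import numpy as np

def structural_loss(P, lam=1.0):
    """L_str = L_sim + lam * L_shp for grid logits P of shape (C, h, w).

    L_sim: L1 distance between the predictions of adjacent row grids.
    L_shp: L1 norm of the second-order difference of expected locations,
           which vanishes for a perfectly straight lane line.
    """
    # similarity loss: adjacent rows should predict similar distributions
    l_sim = np.abs(P[:, :-1, :] - P[:, 1:, :]).sum()
    # expected location per row via softmax over the w grid cells
    e = np.exp(P - P.max(axis=-1, keepdims=True))
    prob = e / e.sum(axis=-1, keepdims=True)
    k = np.arange(1, P.shape[-1] + 1)
    loc = (prob * k).sum(axis=-1)             # shape (C, h)
    # shape loss: second-order difference of the locations
    d = loc[:, :-1] - loc[:, 1:]
    l_shp = np.abs(d[:, :-1] - d[:, 1:]).sum()
    return float(l_sim + lam * l_shp)
```

For identical predictions in every row both terms are exactly zero, which matches the intuition that a straight, consistently predicted lane incurs no structural penalty.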
This embodiment uses cross entropy as the auxiliary segmentation loss; L seg is expressed as: L seg = −(1/N) Σ i y i log(p i), where y i represents the label of pixel i, i ranges over the N = H × W pixels, and p i denotes the predicted value.
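The auxiliary segmentation loss can be sketched as a mean pixel-wise cross entropy; the (N, classes) probability layout and integer-label encoding are assumptions of this illustration.

```python
import numpy as np

def seg_cross_entropy(probs, labels, eps=1e-12):
    """-(1/N) * sum_i log p_i(y_i) over the N = H*W pixels.
    probs: (N, classes) predicted class probabilities per pixel.
    labels: (N,) integer class label y_i per pixel."""
    n = labels.shape[0]
    picked = probs[np.arange(n), labels]   # probability assigned to the true class
    return float(-np.log(picked + eps).mean())
```

Perfect predictions drive the loss to zero, while a uniform prediction over two classes gives log 2 per pixel.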
The present embodiment also provides a lane line recognition apparatus based on lightweight edge calculation, as shown in fig. 5, including:
the first feature acquisition module 501: configured to acquire a last key frame image based on a last acquired global feature image, perform down-sampling processing according to the last key frame image to obtain a plurality of first feature maps with different sampling multiples, and form a plurality of first feature map layers according to the first feature maps;
the second feature acquisition module 502: configured to acquire a current non-key frame image based on a currently acquired global feature image, and perform down-sampling processing according to the current non-key frame image to obtain a plurality of second feature maps and second feature map layers with different sampling multiples, wherein the second feature map layers are formed according to the second feature maps or according to the first feature map layers;
the lane line identification module 503: configured to calculate and output a grid classification map corresponding to the current global feature image according to the second feature map layer, and to predict a lane line based on the grid classification map.
In another aspect, an embodiment of the present invention further provides a computer device, as shown in fig. 6, including a memory 601 and one or more processors 602, the memory 601 being configured to store one or more programs which, when executed by the one or more processors 602, cause the one or more processors to implement the lane line identification method based on lightweight edge calculation according to any embodiment of the present invention. The method includes: acquiring a last key frame image based on a last acquired global feature image, performing down-sampling processing according to the last key frame image to obtain a plurality of first feature maps with different sampling multiples, and forming a plurality of first feature map layers according to the first feature maps; acquiring a current non-key frame image based on a currently acquired global feature image, and performing down-sampling processing according to the current non-key frame image to obtain a plurality of second feature maps and second feature map layers with different sampling multiples, wherein the second feature map layers are formed according to the second feature maps or according to the first feature map layers; and calculating and outputting a grid classification map corresponding to the current global feature image according to the second feature map layer, and predicting a lane line based on the grid classification map.
Embodiments of the present application also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the lane line identification method based on lightweight edge calculation according to any embodiment of the present invention.
Various other modifications and changes may be made by those skilled in the art based on the above-described technical solutions and concepts, and all such modifications and changes should fall within the scope of the claims of the present invention.