CN116343157A

CN116343157A - Deep learning extraction method for road surface cracks

Info

Publication number: CN116343157A
Application number: CN202310411298.4A
Authority: CN
Inventors: 刘如飞; 张轶; 苏辕; 来瑞鑫; 赵帅; 苏占文; 许伟彬
Original assignee: Shandong University of Science and Technology
Current assignee: Shandong University of Science and Technology
Priority date: 2023-04-17
Filing date: 2023-04-17
Publication date: 2023-06-27

Abstract

The invention relates to a deep learning extraction method for road surface cracks, which belongs to the field of road disease detection and comprises the following steps: continuously acquiring road surface images from a real high-speed scene by a vehicle-mounted high-resolution industrial camera; screening the pavement image to construct an HRRC data set; pre-training a fracture-segmented neural network; the precision and recall rate of each round in training are introduced into a PRF framework to carry out self-adaptive adjustment, so that the performance of the model is optimized; dividing the data set into two sub data sets according to the positive and negative sample proportion, and re-transmitting the sub data sets into a model for training; the invention solves the problem of unbalance of positive and negative samples by realizing the mutual balance between the precision and the recall rate in the crack segmentation model, improves the road crack segmentation performance of the convolutional neural network, and can realize the accurate segmentation and extraction of cracks in various expressway scenes. The invention can automatically identify road diseases, greatly improve detection efficiency, reduce artificial subjective influence and is suitable for large-scale long-distance road disease detection.

Description

Deep learning extraction method for road surface cracks

Technical Field

The invention relates to a deep learning extraction method for road surface cracks, and belongs to the technical field of road disease detection.

Background

With the continuous promotion of infrastructure construction in China, highway construction is continuously developed, and China already has the largest highway network in the world. The road network in China has the characteristics of long mileage, different road conditions, wide distribution and the like, is more important for road disease detection, can accurately obtain road diseases, can ensure life and property safety of people, and saves road maintenance cost. At present, pavement disease identification mainly depends on manual field investigation, has low identification efficiency and strong subjectivity, and breaks away from the development targets of the transportation department in the guidance opinion about the construction of novel infrastructure in the field of transportation.

Conventional Digital Image Processing (DIP) methods, such as Canny edge detector, wavelet transform and fracture index, have been developed for fracture segmentation. However, these algorithms focus on only a small fraction of pixels, lacking knowledge of the image content. In the past five years, the method of extracting road cracks through a convolutional neural network has gradually replaced the traditional method, and becomes a mainstream method of scientific research.

The deep learning method can better preserve the geometry of the detected crack pattern, and the prediction result thereof finally becomes more accurate than the threshold method. The existing semantic segmentation algorithm usually needs to manually adjust the proportion of positive and negative samples to optimize a model when solving the imbalance problem, and therefore, the invention provides a stable, universal and self-adaptive solution to solve the extreme imbalance problem.

Disclosure of Invention

Aiming at the defects in the prior art, the invention discloses a road pavement crack deep learning extraction method, and provides a PRF self-adaptive parameter, wherein a bridge is established between precision and recall rate, a sampling method and loss weight are continuously adjusted in the training process, and finally, the difference between the precision and the recall rate is reduced by spontaneously adjusting the balance between the precision and the recall rate, so that a better model is obtained, and the crack detection precision is improved.

The invention adopts the following technical scheme:

a deep learning extraction method for road surface cracks comprises the following steps:

s1: continuously acquiring road surface images from a real high-speed scene by a vehicle-mounted high-resolution industrial camera;

s2: screening the pavement image obtained in the step S1, and selecting pavement images containing cracks to construct a high-resolution pavement crack image dataset, namely an HRRC dataset;

s3: introducing the data set obtained in the step S2 into an HRNet model (namely an initial fracture segmentation model) for pre-training to obtain the accuracy rate and the recall rate of the evaluation index; the precision and recall rate of each round in training are introduced into a PRF framework to carry out self-adaptive adjustment, so that the performance of the model is optimized;

s4: introducing the accuracy and recall of the evaluation index into the evaluation adaptability Parameter (PRF), while improving BCELoss defines a new adaptive loss function DropLoss;

s5: dividing the HRRC data set into two sub data sets, and re-transmitting the sub data sets into a crack segmentation neural network for training according to PRF sampling to obtain a crack segmentation model;

s6: inputting the image to be detected into the crack segmentation model obtained in the step S5 to obtain a detection result.

Preferably, step S1 specifically includes:

the high-resolution industrial camera is fixed on a vehicle, the camera is always aligned with the ground, road surface images are continuously and uninterruptedly collected on a highway at the running speed of 80/km per hour and the frequency of 4 frames/s, and the collected road surface images comprise road shadows, road stains, road materials and the like so as to ensure data diversity.

Preferably, step S2 comprises the following sub-steps:

s21: manually screening the continuous pavement images acquired in the step S11, selecting pavement images containing cracks, and establishing an HRRC data set without labels;

s22: clipping the image of the HRRC dataset in step S21 to 1024 x 512, drawing the Crack outline along the Crack edge on the pixel level using the professional labeling software Labelme, and labeling as a Crack, creating the HRRC dataset comprising the image and the label;

s23: dividing the HRRC data set obtained in the step S22, wherein the ratio of the training set to the verification set to the test set is 8:1:1, wherein the training set and the verification set are used for model pre-training, and the test set is used for model verification.

Preferably, step S3 comprises the following sub-steps:

s31: introducing the HRRC data set obtained in the step S2 into a HRNet model for pre-training, inputting a minimum Batch-Size (the number of samples selected by one-time training) into the HRNet model for training to obtain model output, combining with a marked real label, and combining with a BCELoss Loss function to calculate a Loss value Loss;

BCELoss are defined as shown in formulas (1) and (2):

where i is the pixel index, pi is the probability that the predicted pixel i belongs to the front class, yi is the true probability that the pixel i belongs to the front class, and Li is the BCE loss value generated on the pixel i;

where N is the total number of pixels in the batch;

the HRNet model is composed of four layers, which can be divided into a Stem layer at the beginning of a network, a feature layer extracted in parallel, a stage layer responsible for semantic space information interaction and a final output layer. In the HRNet model, the stem layer acts against the base of one feature extraction, and initially the same residual structure as in resten is used to extract the features of the original feature map.

BCELoss loss function is a loss function for the two-classification problem, and is commonly used to calculate the difference between the neural network output value and the label. The physical meaning of the loss function is: minimizing cross entropy between the probability distribution of the labels and the predicted probability distribution can be understood as a negative log likelihood loss in a classification problem.

S32: during back propagation, corresponding gradient values can be obtained by deriving the BCE Loss, and then model parameters are updated by utilizing optimization algorithms such as gradient descent, so that the model is gradually fitted with training data, and the classification accuracy is improved;

continuously updating the gradient according to a back propagation algorithm of the network, and continuously and iteratively updating the weights of all parts by using a gradient descent method;

the principle of the back propagation algorithm is that the partial derivative of the loss function between the actual output result and the actual result on each weight parameter or bias term is calculated by utilizing a chained derivative rule, then the weight or bias term is reversely updated layer by layer according to an optimization algorithm, a training mode of forward-backward propagation is adopted, and the loss function is converged by continuously adjusting parameters in a model, so that an accurate model structure is constructed.

The back propagation algorithm can be divided into three steps:

(1) Forward propagation: inputting sample data into a network, and transmitting the data from an input layer to an output layer through layer-by-layer calculation to obtain a corresponding actual output result;

(2) Reversely calculating an error term of the layer L neuron i, wherein the error term represents the partial derivative of a loss function of the network on the output value of the neuron;

(3) Calculating gradients of each neuron parameter according to the optimization algorithm, and updating each parameter

The gradient descent method is an iterative algorithm for optimizing an objective function, wherein each step updates the weights according to the gradient of the objective function to the weights (or parameters). In machine learning, gradient descent methods are commonly used to train models such as neural networks.

In the gradient descent method, each iteration updates parameters in the model, including weights and bias terms in the model. Specifically, in the iterative process of the gradient descent method, the partial derivative of the objective function with respect to each parameter is calculated to obtain the gradient of each parameter, and then the value of the parameter is updated in the negative gradient direction so as to reduce the objective function as much as possible.

Thus, the partial weights of the gradient descent method iterations include all weights and bias terms in the model. In each iteration, the gradient descent method updates the values of all parameters according to the gradients of the objective function, so as to gradually optimize the model and improve the performance of the model;

the general formula for gradient descent method calculation is shown as follows:

where alpha is the learning rate, represents a number,

representing the theta of J _j In gradient descent, to achieve simultaneous updating of θ ₀ And theta ₁ ；

S33: repeating the steps S31 and S32 until the loss value is not reduced;

preferably, to objectively evaluate the performance of the network, the accuracy, recall, IOU index, and F1score are used to verify the performance of the semantic segmentation model, defined as follows:

wherein precision represents precision, recovery represents recall, IOU represents IOU index, F1score represents F1score, TP, FP and FN represent the number of true positive, false positive and false negative pixels, respectively;

the prediction probability map (probability map) is an image output by a model, in which the value of each pixel represents the probability that the pixel belongs to a certain class, a pixel having a prediction probability greater than 0.5 is determined as a positive example, and a pixel having a prediction probability equal to or less than 0.5 is determined as a negative example. In the image segmentation task, it is often necessary to convert the predictive probability map into a binary mask (binary mask) to distinguish whether each pixel in the image belongs to a positive example or a negative example.

Preferably, step S4 comprises the following sub-steps:

s41: the average value of the accuracy and recall obtained in the training process of the step S33 is stored in a list L [ R ] _i ,P _i ]Wherein R is _i Represents the ith recall, P _i Representing the ith precision, grouping the values in the list L according to each 3-group, calculating the average precision value and the average recall value of each group, and storing the average precision value and the average recall value into a new list Lg;

s42: the average precision and average recall rate obtained by grouping every 3-wheel groups in the step S41 are combined, the average recall rate is represented by an abscissa with the origin of coordinates as a (0, 0) point, the average precision is represented by an ordinate, a rectangular coordinate system is drawn, and the F point (M) is obtained by taking the average precision and the average recall rate as a coordinate axis _R ，M _P ) And calculating the azimuth angle OF the OF line through plane projection, wherein the azimuth angle calculation OF the OF line is shown as a formula (7):

wherein M is _P Is the average value of the accuracy in the interval, M _R Is the average value of recall rate in the interval;

s43: the OF line azimuth is subjected to cosine transformation to introduce an adaptive parameter RPF, which is defined as an evaluation adaptive parameter, and the specific definition is shown in a formula (8):

wherein cos represents a cosine function, alpha is calculated in S42 to obtain a cosine function interval which is positioned in [0, pi/2 ];

s44: an adaptive loss function is introduced to reduce loss, which is an improvement on BCELoss;

the loss weight and intensity coefficient β, which introduce a balance, are shown in equation (9):

β＝|PRF|,β∈[0,1] (9)

removing the gradient caused by the positive sample when the recall rate exceeds 0.7, otherwise removing the positive sample;

wherein Ci is the loss value of the pixel i after the feature is lost;

in order to improve diversity, a random tensor r is introduced, the size of which is consistent with that of an input picture, the shape of the random tensor r is the same as that of a tensor label of an input neural network (namely a true value), and random numbers uniformly distributed on intervals [0,1] are filled; the calculation method of each Di in drop loss is as shown in (11):

D _i ＝L _i +C _i ·r _i (11)

wherein D is _i Is the drop loss value, L, of the gradient after dropping over pixel i _i Is the BCE loss value generated at pixel i, and r _i A random tensor between 0 and 1;

finally, the expression of DropLoss for the adaptive loss function is shown in (12):

where i is the pixel index, pi is the probability that pixel i is predicted to belong to the front class, yi is the true probability that pixel i belongs to the front class, N is the total number of pixels in the batch, β is the intensity coefficient in equation (9), and ri is the random tensor defined in equation (11).

Preferably, step S5 comprises the following sub-steps:

s51: PRF sampling

Dividing the HRRC data set in the step S22 into two subsets according to the proportion of positive sample pixels of each image, namely the proportion of crack pixels of each image to the whole image, putting the image with the positive sample pixel proportion of more than 0.5% into a data set R, sampling the rest of the put data set P from the two subsets with different probabilities S, wherein the initial value of S is 0.1, and iterating the PRF to generate update, wherein the following formula (13) shows that:

where S is the probability of sampling from the dataset P;

the sampling strategy is as follows: before each sampling step, taking a random number R E [0,1], when R is less than S, the sample is from the data set P; otherwise, the sample is from dataset R;

s52: the data set P and the data set R are subjected to PRF sampling and then are transferred into the HRNet model again for training;

s53: inputting data of a minimum Batch-Size into the HRNet model each time to obtain model output, and calculating a Loss value by utilizing a self-adaptive Loss function Droploss of the step S4 in combination with a marked real label;

s54: continuously updating the gradient according to a back propagation algorithm of the network, and continuously and iteratively updating the weights of all parts by using a gradient descent method;

s55: the accuracy rate and recall rate of each round in the training process are transmitted back to the PRF to estimate the direction and the intensity, wherein the direction is the azimuth angle alpha, and the intensity is the self-adaptive parameter RPF, so that the PRF sampling and DropLoss are updated to enable the model to achieve the best performance;

s56: and stopping training when the loss value of the model is not reduced any more, and obtaining the final fracture segmentation model.

Preferably, step S6 comprises the following sub-steps:

s61: collecting images acquired by an industrial camera;

s62: and (3) carrying out prediction extraction on the image transmitted into the trained crack segmentation model, and outputting an image, namely classifying each pixel of the image to obtain a crack map.

The invention is not exhaustive and can be seen in the prior art.

The beneficial effects of the invention are as follows:

according to the road pavement crack deep learning extraction method, the PRF framework and the self-adaptive loss function are introduced, so that the adjustment precision and the recall rate can be effectively balanced. Spontaneous flow between precision and recall is enabled by the introduction of PRF, similar to temperature repeated conduction. During training, the degree of imbalance is dynamically evaluated (equation 8), the sampling rate is determined (equation 13), and the loss weights of the positive and negative features are adjusted (equation 12). Through the steps, a channel is established between precision and recall rate, balance is kept when the precision and the recall rate flow mutually, and high-precision low-recall rate phenomenon which is easy to occur in small sample identification is avoided. According to the invention, the PRF is used for spontaneously adjusting parameters in the training process, so that the training convergence speed is higher, and the model detection performance is improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application.

FIG. 1 is a flow chart of the method for deep learning and extracting cracks of a road surface.

The specific embodiment is as follows:

in order to better understand the technical solutions in the present specification, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention in conjunction with the drawings in the implementation of the present specification, but not limited thereto, and the present invention is not fully described and is according to the conventional technology in the art.

Example 1

A method for deep learning and extracting cracks of a road surface, as shown in fig. 1, comprises the following steps:

Example 2

The method for deep learning and extracting the road surface cracks is as in embodiment 1, except that the step S1 specifically includes:

Example 3

The method for deep learning and extracting the road surface cracks is as described in embodiment 2, except that the step S2 includes the following sub-steps:

Example 4

A method for deep learning and extracting cracks of a road pavement as in embodiment 3, wherein the step S3 includes the following sub-steps:

BCELoss are defined as shown in formulas (1) and (2):

where N is the total number of pixels in the batch;

The back propagation algorithm can be divided into three steps:

where alpha is the learning rate, represents a number,

S33: repeating the steps S31 and S32 until the loss value is not reduced;

Example 5

A method for deep learning and extracting cracks of a road pavement as in embodiment 4, except that the step S4 includes the following sub-steps:

s42: the average precision and average recall rate obtained by grouping every 3-wheel groups in the step S41 are represented by the abscissa with the origin of coordinates as the (0, 0) pointThe ratio, the ordinate represents the average precision, a rectangular coordinate system is drawn, and the F point (M) is obtained by taking the average precision and the average recall ratio as coordinate axes _R ，M _P ) And calculating the azimuth angle OF the OF line through plane projection, wherein the azimuth angle calculation OF the OF line is shown as a formula (7):

β＝|PRF| ,β∈[0,1] (9)

wherein Ci is the loss value of the pixel i after the feature is lost;

D _i ＝L _i +C _i ·r _i (11)

Example 6

As described in example 5, the difference in the method for deep learning and extracting the road surface crack is that step S5 includes the following sub-steps:

s51: PRF sampling

where S is the probability of sampling from the dataset P;

Example 7

As described in example 6, the difference in the method for deep learning and extracting the road surface crack is that step S6 includes the following sub-steps:

s61: collecting images acquired by an industrial camera;

While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims

1. The deep learning extraction method for the road surface cracks is characterized by comprising the following steps of:

s3: introducing the data set obtained in the step S2 into an HRNet model for pre-training to obtain the accuracy rate and recall rate of the evaluation index;

s4: introducing the precision and recall of the evaluation index into the evaluation adaptability parameters, and simultaneously improving BCELoss to define a new self-adaptive loss function DropLoss;

s5: dividing the HRRC data set into two sub data sets, and transmitting the sub data sets into a fracture splitting neural network for training according to PRF sampling to obtain a fracture splitting model;

2. The method for deep learning and extracting a crack of a road pavement according to claim 1, wherein the step S1 is specifically:

the high-resolution industrial camera is fixed on a vehicle, the camera is always aligned to the ground, road images are continuously and uninterruptedly collected on a highway at the running speed of 80/km per hour and the frequency of 4 frames/s, and the collected road images comprise road shadows, road stains and road materials so as to ensure data diversity.

3. The method for deep learning extraction of a crack in a road pavement according to claim 1, wherein the step S2 comprises the following sub-steps:

s22: clipping the image of the HRRC dataset in step S21 to 1024 x 512 and drawing the Crack outline along the Crack edge on the pixel level using labeling software Labelme, and labeling as a mask, establishing the HRRC dataset comprising the image and the label;

4. The method for deep learning extraction of road surface cracks according to claim 3, wherein the step S3 comprises the following sub-steps:

s31: importing the HRRC data set obtained in the step S2 into a HRNet model for pre-training, inputting a minimum Batch-Size into the HRNet model for training to obtain model output, combining a marked real label and combining a BCELoss loss function to calculate a loss value BCELoss;

BCELoss are defined as shown in formulas (1) and (2):

L _i ＝[y _i ·log(P _i )+(1-y _i )·log(1-P _i )] (1)

where N is the total number of pixels in the batch;

s33: steps S31 and S32 are repeated until the loss value is no longer reduced.

5. The method according to claim 4, wherein for objectively evaluating the performance of a network, the accuracy, recall, IOU index, and F1score are used to verify the performance of a semantic segmentation model, defined as follows:

the predictive probability map is an image output by a model, in which the value of each pixel represents the probability that the pixel belongs to a certain class, a pixel with a predictive probability greater than 0.5 is determined as a positive example, and a pixel with a predictive probability less than or equal to 0.5 is determined as a negative example.

6. The method for deep learning extraction of road surface cracks according to claim 5, wherein step S4 comprises the sub-steps of:

s42: the average precision and average recall rate obtained by grouping every 3-wheel groups in the step S41 are represented by the abscissa, the average recall rate is represented by the ordinate, the average precision is represented by the ordinate, and the drawing is carried out by taking the origin of coordinates as the (0, 0) pointAnd (3) preparing a rectangular coordinate system, and obtaining the F point (M) by taking the average precision and the average recall rate as coordinate axes _R ，M _P ) And calculating the azimuth angle OF the OF line through plane projection, wherein the azimuth angle calculation OF the OF line is shown as a formula (7):

β＝|PRF|,β∈[0,1] (9)

wherein Ci is the loss value of the pixel i after the feature is lost;

in order to improve diversity, a random tensor r is introduced, the size of the tensor is consistent with that of an input picture, the shape of the random tensor r is the same as that of a tensor label of an input neural network, and random numbers uniformly distributed on intervals [0,1] are filled; the calculation method of each Di in drop loss is as shown in (11):

D _i ＝L _i +C _i ·r _i (11)

7. The method for deep learning extraction of road surface cracks according to claim 6, wherein step S5 comprises the sub-steps of:

s51: PRF sampling

where S is the probability of sampling from the dataset P;

8. The method for deep learning extraction of a pavement crack of a road according to claim 7, wherein the step S6 comprises the sub-steps of:

s61: collecting images acquired by an industrial camera;