CN116343157A - Deep learning extraction method for road surface cracks - Google Patents

Deep learning extraction method for road surface cracks Download PDF

Info

Publication number
CN116343157A
CN116343157A CN202310411298.4A CN202310411298A CN116343157A CN 116343157 A CN116343157 A CN 116343157A CN 202310411298 A CN202310411298 A CN 202310411298A CN 116343157 A CN116343157 A CN 116343157A
Authority
CN
China
Prior art keywords
model
pixel
training
road
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310411298.4A
Other languages
Chinese (zh)
Inventor
刘如飞
张轶
苏辕
来瑞鑫
赵帅
苏占文
许伟彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Science and Technology
Original Assignee
Shandong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University of Science and Technology filed Critical Shandong University of Science and Technology
Priority to CN202310411298.4A priority Critical patent/CN116343157A/en
Publication of CN116343157A publication Critical patent/CN116343157A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a deep learning extraction method for road surface cracks, which belongs to the field of road disease detection and comprises the following steps: continuously acquiring road surface images from a real high-speed scene by a vehicle-mounted high-resolution industrial camera; screening the pavement image to construct an HRRC data set; pre-training a fracture-segmented neural network; the precision and recall rate of each round in training are introduced into a PRF framework to carry out self-adaptive adjustment, so that the performance of the model is optimized; dividing the data set into two sub data sets according to the positive and negative sample proportion, and re-transmitting the sub data sets into a model for training; the invention solves the problem of unbalance of positive and negative samples by realizing the mutual balance between the precision and the recall rate in the crack segmentation model, improves the road crack segmentation performance of the convolutional neural network, and can realize the accurate segmentation and extraction of cracks in various expressway scenes. The invention can automatically identify road diseases, greatly improve detection efficiency, reduce artificial subjective influence and is suitable for large-scale long-distance road disease detection.

Description

Deep learning extraction method for road surface cracks
Technical Field
The invention relates to a deep learning extraction method for road surface cracks, and belongs to the technical field of road disease detection.
Background
With the continuous promotion of infrastructure construction in China, highway construction is continuously developed, and China already has the largest highway network in the world. The road network in China has the characteristics of long mileage, different road conditions, wide distribution and the like, is more important for road disease detection, can accurately obtain road diseases, can ensure life and property safety of people, and saves road maintenance cost. At present, pavement disease identification mainly depends on manual field investigation, has low identification efficiency and strong subjectivity, and breaks away from the development targets of the transportation department in the guidance opinion about the construction of novel infrastructure in the field of transportation.
Conventional Digital Image Processing (DIP) methods, such as Canny edge detector, wavelet transform and fracture index, have been developed for fracture segmentation. However, these algorithms focus on only a small fraction of pixels, lacking knowledge of the image content. In the past five years, the method of extracting road cracks through a convolutional neural network has gradually replaced the traditional method, and becomes a mainstream method of scientific research.
The deep learning method can better preserve the geometry of the detected crack pattern, and the prediction result thereof finally becomes more accurate than the threshold method. The existing semantic segmentation algorithm usually needs to manually adjust the proportion of positive and negative samples to optimize a model when solving the imbalance problem, and therefore, the invention provides a stable, universal and self-adaptive solution to solve the extreme imbalance problem.
Disclosure of Invention
Aiming at the defects in the prior art, the invention discloses a road pavement crack deep learning extraction method, and provides a PRF self-adaptive parameter, wherein a bridge is established between precision and recall rate, a sampling method and loss weight are continuously adjusted in the training process, and finally, the difference between the precision and the recall rate is reduced by spontaneously adjusting the balance between the precision and the recall rate, so that a better model is obtained, and the crack detection precision is improved.
The invention adopts the following technical scheme:
a deep learning extraction method for road surface cracks comprises the following steps:
s1: continuously acquiring road surface images from a real high-speed scene by a vehicle-mounted high-resolution industrial camera;
s2: screening the pavement image obtained in the step S1, and selecting pavement images containing cracks to construct a high-resolution pavement crack image dataset, namely an HRRC dataset;
s3: introducing the data set obtained in the step S2 into an HRNet model (namely an initial fracture segmentation model) for pre-training to obtain the accuracy rate and the recall rate of the evaluation index; the precision and recall rate of each round in training are introduced into a PRF framework to carry out self-adaptive adjustment, so that the performance of the model is optimized;
s4: introducing the accuracy and recall of the evaluation index into the evaluation adaptability Parameter (PRF), while improving BCELoss defines a new adaptive loss function DropLoss;
s5: dividing the HRRC data set into two sub data sets, and re-transmitting the sub data sets into a crack segmentation neural network for training according to PRF sampling to obtain a crack segmentation model;
s6: inputting the image to be detected into the crack segmentation model obtained in the step S5 to obtain a detection result.
Preferably, step S1 specifically includes:
the high-resolution industrial camera is fixed on a vehicle, the camera is always aligned with the ground, road surface images are continuously and uninterruptedly collected on a highway at the running speed of 80/km per hour and the frequency of 4 frames/s, and the collected road surface images comprise road shadows, road stains, road materials and the like so as to ensure data diversity.
Preferably, step S2 comprises the following sub-steps:
s21: manually screening the continuous pavement images acquired in the step S11, selecting pavement images containing cracks, and establishing an HRRC data set without labels;
s22: clipping the image of the HRRC dataset in step S21 to 1024 x 512, drawing the Crack outline along the Crack edge on the pixel level using the professional labeling software Labelme, and labeling as a Crack, creating the HRRC dataset comprising the image and the label;
s23: dividing the HRRC data set obtained in the step S22, wherein the ratio of the training set to the verification set to the test set is 8:1:1, wherein the training set and the verification set are used for model pre-training, and the test set is used for model verification.
Preferably, step S3 comprises the following sub-steps:
s31: introducing the HRRC data set obtained in the step S2 into a HRNet model for pre-training, inputting a minimum Batch-Size (the number of samples selected by one-time training) into the HRNet model for training to obtain model output, combining with a marked real label, and combining with a BCELoss Loss function to calculate a Loss value Loss;
BCELoss are defined as shown in formulas (1) and (2):
Figure BDA0004183293360000021
where i is the pixel index, pi is the probability that the predicted pixel i belongs to the front class, yi is the true probability that the pixel i belongs to the front class, and Li is the BCE loss value generated on the pixel i;
Figure BDA0004183293360000022
where N is the total number of pixels in the batch;
the HRNet model is composed of four layers, which can be divided into a Stem layer at the beginning of a network, a feature layer extracted in parallel, a stage layer responsible for semantic space information interaction and a final output layer. In the HRNet model, the stem layer acts against the base of one feature extraction, and initially the same residual structure as in resten is used to extract the features of the original feature map.
BCELoss loss function is a loss function for the two-classification problem, and is commonly used to calculate the difference between the neural network output value and the label. The physical meaning of the loss function is: minimizing cross entropy between the probability distribution of the labels and the predicted probability distribution can be understood as a negative log likelihood loss in a classification problem.
S32: during back propagation, corresponding gradient values can be obtained by deriving the BCE Loss, and then model parameters are updated by utilizing optimization algorithms such as gradient descent, so that the model is gradually fitted with training data, and the classification accuracy is improved;
continuously updating the gradient according to a back propagation algorithm of the network, and continuously and iteratively updating the weights of all parts by using a gradient descent method;
the principle of the back propagation algorithm is that the partial derivative of the loss function between the actual output result and the actual result on each weight parameter or bias term is calculated by utilizing a chained derivative rule, then the weight or bias term is reversely updated layer by layer according to an optimization algorithm, a training mode of forward-backward propagation is adopted, and the loss function is converged by continuously adjusting parameters in a model, so that an accurate model structure is constructed.
The back propagation algorithm can be divided into three steps:
(1) Forward propagation: inputting sample data into a network, and transmitting the data from an input layer to an output layer through layer-by-layer calculation to obtain a corresponding actual output result;
(2) Reversely calculating an error term of the layer L neuron i, wherein the error term represents the partial derivative of a loss function of the network on the output value of the neuron;
(3) Calculating gradients of each neuron parameter according to the optimization algorithm, and updating each parameter
The gradient descent method is an iterative algorithm for optimizing an objective function, wherein each step updates the weights according to the gradient of the objective function to the weights (or parameters). In machine learning, gradient descent methods are commonly used to train models such as neural networks.
In the gradient descent method, each iteration updates parameters in the model, including weights and bias terms in the model. Specifically, in the iterative process of the gradient descent method, the partial derivative of the objective function with respect to each parameter is calculated to obtain the gradient of each parameter, and then the value of the parameter is updated in the negative gradient direction so as to reduce the objective function as much as possible.
Thus, the partial weights of the gradient descent method iterations include all weights and bias terms in the model. In each iteration, the gradient descent method updates the values of all parameters according to the gradients of the objective function, so as to gradually optimize the model and improve the performance of the model;
the general formula for gradient descent method calculation is shown as follows:
Figure BDA0004183293360000041
where alpha is the learning rate, represents a number,
Figure BDA0004183293360000042
representing the theta of J j In gradient descent, to achieve simultaneous updating of θ 0 And theta 1
S33: repeating the steps S31 and S32 until the loss value is not reduced;
preferably, to objectively evaluate the performance of the network, the accuracy, recall, IOU index, and F1score are used to verify the performance of the semantic segmentation model, defined as follows:
Figure BDA0004183293360000043
Figure BDA0004183293360000044
Figure BDA0004183293360000045
Figure BDA0004183293360000046
wherein precision represents precision, recovery represents recall, IOU represents IOU index, F1score represents F1score, TP, FP and FN represent the number of true positive, false positive and false negative pixels, respectively;
the prediction probability map (probability map) is an image output by a model, in which the value of each pixel represents the probability that the pixel belongs to a certain class, a pixel having a prediction probability greater than 0.5 is determined as a positive example, and a pixel having a prediction probability equal to or less than 0.5 is determined as a negative example. In the image segmentation task, it is often necessary to convert the predictive probability map into a binary mask (binary mask) to distinguish whether each pixel in the image belongs to a positive example or a negative example.
Preferably, step S4 comprises the following sub-steps:
s41: the average value of the accuracy and recall obtained in the training process of the step S33 is stored in a list L [ R ] i ,P i ]Wherein R is i Represents the ith recall, P i Representing the ith precision, grouping the values in the list L according to each 3-group, calculating the average precision value and the average recall value of each group, and storing the average precision value and the average recall value into a new list Lg;
s42: the average precision and average recall rate obtained by grouping every 3-wheel groups in the step S41 are combined, the average recall rate is represented by an abscissa with the origin of coordinates as a (0, 0) point, the average precision is represented by an ordinate, a rectangular coordinate system is drawn, and the F point (M) is obtained by taking the average precision and the average recall rate as a coordinate axis R ,M P ) And calculating the azimuth angle OF the OF line through plane projection, wherein the azimuth angle calculation OF the OF line is shown as a formula (7):
Figure BDA0004183293360000047
wherein M is P Is the average value of the accuracy in the interval, M R Is the average value of recall rate in the interval;
s43: the OF line azimuth is subjected to cosine transformation to introduce an adaptive parameter RPF, which is defined as an evaluation adaptive parameter, and the specific definition is shown in a formula (8):
Figure BDA0004183293360000051
wherein cos represents a cosine function, alpha is calculated in S42 to obtain a cosine function interval which is positioned in [0, pi/2 ];
s44: an adaptive loss function is introduced to reduce loss, which is an improvement on BCELoss;
the loss weight and intensity coefficient β, which introduce a balance, are shown in equation (9):
β=|PRF|,β∈[0,1] (9)
removing the gradient caused by the positive sample when the recall rate exceeds 0.7, otherwise removing the positive sample;
Figure BDA0004183293360000052
wherein Ci is the loss value of the pixel i after the feature is lost;
in order to improve diversity, a random tensor r is introduced, the size of which is consistent with that of an input picture, the shape of the random tensor r is the same as that of a tensor label of an input neural network (namely a true value), and random numbers uniformly distributed on intervals [0,1] are filled; the calculation method of each Di in drop loss is as shown in (11):
D i =L i +C i ·r i (11)
wherein D is i Is the drop loss value, L, of the gradient after dropping over pixel i i Is the BCE loss value generated at pixel i, and r i A random tensor between 0 and 1;
finally, the expression of DropLoss for the adaptive loss function is shown in (12):
Figure BDA0004183293360000053
where i is the pixel index, pi is the probability that pixel i is predicted to belong to the front class, yi is the true probability that pixel i belongs to the front class, N is the total number of pixels in the batch, β is the intensity coefficient in equation (9), and ri is the random tensor defined in equation (11).
Preferably, step S5 comprises the following sub-steps:
s51: PRF sampling
Dividing the HRRC data set in the step S22 into two subsets according to the proportion of positive sample pixels of each image, namely the proportion of crack pixels of each image to the whole image, putting the image with the positive sample pixel proportion of more than 0.5% into a data set R, sampling the rest of the put data set P from the two subsets with different probabilities S, wherein the initial value of S is 0.1, and iterating the PRF to generate update, wherein the following formula (13) shows that:
Figure BDA0004183293360000061
where S is the probability of sampling from the dataset P;
the sampling strategy is as follows: before each sampling step, taking a random number R E [0,1], when R is less than S, the sample is from the data set P; otherwise, the sample is from dataset R;
s52: the data set P and the data set R are subjected to PRF sampling and then are transferred into the HRNet model again for training;
s53: inputting data of a minimum Batch-Size into the HRNet model each time to obtain model output, and calculating a Loss value by utilizing a self-adaptive Loss function Droploss of the step S4 in combination with a marked real label;
s54: continuously updating the gradient according to a back propagation algorithm of the network, and continuously and iteratively updating the weights of all parts by using a gradient descent method;
s55: the accuracy rate and recall rate of each round in the training process are transmitted back to the PRF to estimate the direction and the intensity, wherein the direction is the azimuth angle alpha, and the intensity is the self-adaptive parameter RPF, so that the PRF sampling and DropLoss are updated to enable the model to achieve the best performance;
s56: and stopping training when the loss value of the model is not reduced any more, and obtaining the final fracture segmentation model.
Preferably, step S6 comprises the following sub-steps:
s61: collecting images acquired by an industrial camera;
s62: and (3) carrying out prediction extraction on the image transmitted into the trained crack segmentation model, and outputting an image, namely classifying each pixel of the image to obtain a crack map.
The invention is not exhaustive and can be seen in the prior art.
The beneficial effects of the invention are as follows:
according to the road pavement crack deep learning extraction method, the PRF framework and the self-adaptive loss function are introduced, so that the adjustment precision and the recall rate can be effectively balanced. Spontaneous flow between precision and recall is enabled by the introduction of PRF, similar to temperature repeated conduction. During training, the degree of imbalance is dynamically evaluated (equation 8), the sampling rate is determined (equation 13), and the loss weights of the positive and negative features are adjusted (equation 12). Through the steps, a channel is established between precision and recall rate, balance is kept when the precision and the recall rate flow mutually, and high-precision low-recall rate phenomenon which is easy to occur in small sample identification is avoided. According to the invention, the PRF is used for spontaneously adjusting parameters in the training process, so that the training convergence speed is higher, and the model detection performance is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application.
FIG. 1 is a flow chart of the method for deep learning and extracting cracks of a road surface.
The specific embodiment is as follows:
in order to better understand the technical solutions in the present specification, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention in conjunction with the drawings in the implementation of the present specification, but not limited thereto, and the present invention is not fully described and is according to the conventional technology in the art.
Example 1
A method for deep learning and extracting cracks of a road surface, as shown in fig. 1, comprises the following steps:
s1: continuously acquiring road surface images from a real high-speed scene by a vehicle-mounted high-resolution industrial camera;
s2: screening the pavement image obtained in the step S1, and selecting pavement images containing cracks to construct a high-resolution pavement crack image dataset, namely an HRRC dataset;
s3: introducing the data set obtained in the step S2 into an HRNet model (namely an initial fracture segmentation model) for pre-training to obtain the accuracy rate and the recall rate of the evaluation index; the precision and recall rate of each round in training are introduced into a PRF framework to carry out self-adaptive adjustment, so that the performance of the model is optimized;
s4: introducing the accuracy and recall of the evaluation index into the evaluation adaptability Parameter (PRF), while improving BCELoss defines a new adaptive loss function DropLoss;
s5: dividing the HRRC data set into two sub data sets, and re-transmitting the sub data sets into a crack segmentation neural network for training according to PRF sampling to obtain a crack segmentation model;
s6: inputting the image to be detected into the crack segmentation model obtained in the step S5 to obtain a detection result.
Example 2
The method for deep learning and extracting the road surface cracks is as in embodiment 1, except that the step S1 specifically includes:
the high-resolution industrial camera is fixed on a vehicle, the camera is always aligned with the ground, road surface images are continuously and uninterruptedly collected on a highway at the running speed of 80/km per hour and the frequency of 4 frames/s, and the collected road surface images comprise road shadows, road stains, road materials and the like so as to ensure data diversity.
Example 3
The method for deep learning and extracting the road surface cracks is as described in embodiment 2, except that the step S2 includes the following sub-steps:
s21: manually screening the continuous pavement images acquired in the step S11, selecting pavement images containing cracks, and establishing an HRRC data set without labels;
s22: clipping the image of the HRRC dataset in step S21 to 1024 x 512, drawing the Crack outline along the Crack edge on the pixel level using the professional labeling software Labelme, and labeling as a Crack, creating the HRRC dataset comprising the image and the label;
s23: dividing the HRRC data set obtained in the step S22, wherein the ratio of the training set to the verification set to the test set is 8:1:1, wherein the training set and the verification set are used for model pre-training, and the test set is used for model verification.
Example 4
A method for deep learning and extracting cracks of a road pavement as in embodiment 3, wherein the step S3 includes the following sub-steps:
s31: introducing the HRRC data set obtained in the step S2 into a HRNet model for pre-training, inputting a minimum Batch-Size (the number of samples selected by one-time training) into the HRNet model for training to obtain model output, combining with a marked real label, and combining with a BCELoss Loss function to calculate a Loss value Loss;
BCELoss are defined as shown in formulas (1) and (2):
Figure BDA0004183293360000081
where i is the pixel index, pi is the probability that the predicted pixel i belongs to the front class, yi is the true probability that the pixel i belongs to the front class, and Li is the BCE loss value generated on the pixel i;
Figure BDA0004183293360000082
where N is the total number of pixels in the batch;
the HRNet model is composed of four layers, which can be divided into a Stem layer at the beginning of a network, a feature layer extracted in parallel, a stage layer responsible for semantic space information interaction and a final output layer. In the HRNet model, the stem layer acts against the base of one feature extraction, and initially the same residual structure as in resten is used to extract the features of the original feature map.
BCELoss loss function is a loss function for the two-classification problem, and is commonly used to calculate the difference between the neural network output value and the label. The physical meaning of the loss function is: minimizing cross entropy between the probability distribution of the labels and the predicted probability distribution can be understood as a negative log likelihood loss in a classification problem.
S32: during back propagation, corresponding gradient values can be obtained by deriving the BCE Loss, and then model parameters are updated by utilizing optimization algorithms such as gradient descent, so that the model is gradually fitted with training data, and the classification accuracy is improved;
continuously updating the gradient according to a back propagation algorithm of the network, and continuously and iteratively updating the weights of all parts by using a gradient descent method;
the principle of the back propagation algorithm is that the partial derivative of the loss function between the actual output result and the actual result on each weight parameter or bias term is calculated by utilizing a chained derivative rule, then the weight or bias term is reversely updated layer by layer according to an optimization algorithm, a training mode of forward-backward propagation is adopted, and the loss function is converged by continuously adjusting parameters in a model, so that an accurate model structure is constructed.
The back propagation algorithm can be divided into three steps:
(1) Forward propagation: inputting sample data into a network, and transmitting the data from an input layer to an output layer through layer-by-layer calculation to obtain a corresponding actual output result;
(2) Reversely calculating an error term of the layer L neuron i, wherein the error term represents the partial derivative of a loss function of the network on the output value of the neuron;
(3) Calculating gradients of each neuron parameter according to the optimization algorithm, and updating each parameter
The gradient descent method is an iterative algorithm for optimizing an objective function, wherein each step updates the weights according to the gradient of the objective function to the weights (or parameters). In machine learning, gradient descent methods are commonly used to train models such as neural networks.
In the gradient descent method, each iteration updates parameters in the model, including weights and bias terms in the model. Specifically, in the iterative process of the gradient descent method, the partial derivative of the objective function with respect to each parameter is calculated to obtain the gradient of each parameter, and then the value of the parameter is updated in the negative gradient direction so as to reduce the objective function as much as possible.
Thus, the partial weights of the gradient descent method iterations include all weights and bias terms in the model. In each iteration, the gradient descent method updates the values of all parameters according to the gradients of the objective function, so as to gradually optimize the model and improve the performance of the model;
the general formula for gradient descent method calculation is shown as follows:
Figure BDA0004183293360000091
where alpha is the learning rate, represents a number,
Figure BDA0004183293360000092
representing the theta of J j In gradient descent, to achieve simultaneous updating of θ 0 And theta 1
S33: repeating the steps S31 and S32 until the loss value is not reduced;
preferably, to objectively evaluate the performance of the network, the accuracy, recall, IOU index, and F1score are used to verify the performance of the semantic segmentation model, defined as follows:
Figure BDA0004183293360000093
Figure BDA0004183293360000094
Figure BDA0004183293360000095
Figure BDA0004183293360000101
wherein precision represents precision, recovery represents recall, IOU represents IOU index, F1score represents F1score, TP, FP and FN represent the number of true positive, false positive and false negative pixels, respectively;
the prediction probability map (probability map) is an image output by a model, in which the value of each pixel represents the probability that the pixel belongs to a certain class, a pixel having a prediction probability greater than 0.5 is determined as a positive example, and a pixel having a prediction probability equal to or less than 0.5 is determined as a negative example. In the image segmentation task, it is often necessary to convert the predictive probability map into a binary mask (binary mask) to distinguish whether each pixel in the image belongs to a positive example or a negative example.
Example 5
A method for deep learning and extracting cracks of a road pavement as in embodiment 4, except that the step S4 includes the following sub-steps:
s41: the average value of the accuracy and recall obtained in the training process of the step S33 is stored in a list L [ R ] i ,P i ]Wherein R is i Represents the ith recall, P i Representing the ith precision, grouping the values in the list L according to each 3-group, calculating the average precision value and the average recall value of each group, and storing the average precision value and the average recall value into a new list Lg;
s42: the average precision and average recall rate obtained by grouping every 3-wheel groups in the step S41 are represented by the abscissa with the origin of coordinates as the (0, 0) pointThe ratio, the ordinate represents the average precision, a rectangular coordinate system is drawn, and the F point (M) is obtained by taking the average precision and the average recall ratio as coordinate axes R ,M P ) And calculating the azimuth angle OF the OF line through plane projection, wherein the azimuth angle calculation OF the OF line is shown as a formula (7):
Figure BDA0004183293360000102
wherein M is P Is the average value of the accuracy in the interval, M R Is the average value of recall rate in the interval;
s43: the OF line azimuth is subjected to cosine transformation to introduce an adaptive parameter RPF, which is defined as an evaluation adaptive parameter, and the specific definition is shown in a formula (8):
Figure BDA0004183293360000103
wherein cos represents a cosine function, alpha is calculated in S42 to obtain a cosine function interval which is positioned in [0, pi/2 ];
s44: an adaptive loss function is introduced to reduce loss, which is an improvement on BCELoss;
the loss weight and intensity coefficient β, which introduce a balance, are shown in equation (9):
β=|PRF| ,β∈[0,1] (9)
removing the gradient caused by the positive sample when the recall rate exceeds 0.7, otherwise removing the positive sample;
Figure BDA0004183293360000111
wherein Ci is the loss value of the pixel i after the feature is lost;
in order to improve diversity, a random tensor r is introduced, the size of which is consistent with that of an input picture, the shape of the random tensor r is the same as that of a tensor label of an input neural network (namely a true value), and random numbers uniformly distributed on intervals [0,1] are filled; the calculation method of each Di in drop loss is as shown in (11):
D i =L i +C i ·r i (11)
wherein D is i Is the drop loss value, L, of the gradient after dropping over pixel i i Is the BCE loss value generated at pixel i, and r i A random tensor between 0 and 1;
finally, the expression of DropLoss for the adaptive loss function is shown in (12):
Figure BDA0004183293360000112
where i is the pixel index, pi is the probability that pixel i is predicted to belong to the front class, yi is the true probability that pixel i belongs to the front class, N is the total number of pixels in the batch, β is the intensity coefficient in equation (9), and ri is the random tensor defined in equation (11).
Example 6
As described in example 5, the difference in the method for deep learning and extracting the road surface crack is that step S5 includes the following sub-steps:
s51: PRF sampling
Dividing the HRRC data set in the step S22 into two subsets according to the proportion of positive sample pixels of each image, namely the proportion of crack pixels of each image to the whole image, putting the image with the positive sample pixel proportion of more than 0.5% into a data set R, sampling the rest of the put data set P from the two subsets with different probabilities S, wherein the initial value of S is 0.1, and iterating the PRF to generate update, wherein the following formula (13) shows that:
Figure BDA0004183293360000113
where S is the probability of sampling from the dataset P;
the sampling strategy is as follows: before each sampling step, taking a random number R E [0,1], when R is less than S, the sample is from the data set P; otherwise, the sample is from dataset R;
s52: the data set P and the data set R are subjected to PRF sampling and then are transferred into the HRNet model again for training;
s53: inputting data of a minimum Batch-Size into the HRNet model each time to obtain model output, and calculating a Loss value by utilizing a self-adaptive Loss function Droploss of the step S4 in combination with a marked real label;
s54: continuously updating the gradient according to a back propagation algorithm of the network, and continuously and iteratively updating the weights of all parts by using a gradient descent method;
s55: the accuracy rate and recall rate of each round in the training process are transmitted back to the PRF to estimate the direction and the intensity, wherein the direction is the azimuth angle alpha, and the intensity is the self-adaptive parameter RPF, so that the PRF sampling and DropLoss are updated to enable the model to achieve the best performance;
s56: and stopping training when the loss value of the model is not reduced any more, and obtaining the final fracture segmentation model.
Example 7
As described in example 6, the difference in the method for deep learning and extracting the road surface crack is that step S6 includes the following sub-steps:
s61: collecting images acquired by an industrial camera;
s62: and (3) carrying out prediction extraction on the image transmitted into the trained crack segmentation model, and outputting an image, namely classifying each pixel of the image to obtain a crack map.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims (8)

1. The deep learning extraction method for the road surface cracks is characterized by comprising the following steps of:
s1: continuously acquiring road surface images from a real high-speed scene by a vehicle-mounted high-resolution industrial camera;
s2: screening the pavement image obtained in the step S1, and selecting pavement images containing cracks to construct a high-resolution pavement crack image dataset, namely an HRRC dataset;
s3: introducing the data set obtained in the step S2 into an HRNet model for pre-training to obtain the accuracy rate and recall rate of the evaluation index;
s4: introducing the precision and recall of the evaluation index into the evaluation adaptability parameters, and simultaneously improving BCELoss to define a new self-adaptive loss function DropLoss;
s5: dividing the HRRC data set into two sub data sets, and transmitting the sub data sets into a fracture splitting neural network for training according to PRF sampling to obtain a fracture splitting model;
s6: inputting the image to be detected into the crack segmentation model obtained in the step S5 to obtain a detection result.
2. The method for deep learning and extracting a crack of a road pavement according to claim 1, wherein the step S1 is specifically:
the high-resolution industrial camera is fixed on a vehicle, the camera is always aligned to the ground, road images are continuously and uninterruptedly collected on a highway at the running speed of 80/km per hour and the frequency of 4 frames/s, and the collected road images comprise road shadows, road stains and road materials so as to ensure data diversity.
3. The method for deep learning extraction of a crack in a road pavement according to claim 1, wherein the step S2 comprises the following sub-steps:
s21: manually screening the continuous pavement images acquired in the step S11, selecting pavement images containing cracks, and establishing an HRRC data set without labels;
s22: clipping the image of the HRRC dataset in step S21 to 1024 x 512 and drawing the Crack outline along the Crack edge on the pixel level using labeling software Labelme, and labeling as a mask, establishing the HRRC dataset comprising the image and the label;
s23: dividing the HRRC data set obtained in the step S22, wherein the ratio of the training set to the verification set to the test set is 8:1:1, wherein the training set and the verification set are used for model pre-training, and the test set is used for model verification.
4. The method for deep learning extraction of road surface cracks according to claim 3, wherein the step S3 comprises the following sub-steps:
s31: importing the HRRC data set obtained in the step S2 into a HRNet model for pre-training, inputting a minimum Batch-Size into the HRNet model for training to obtain model output, combining a marked real label and combining a BCELoss loss function to calculate a loss value BCELoss;
BCELoss are defined as shown in formulas (1) and (2):
L i =[y i ·log(P i )+(1-y i )·log(1-P i )] (1)
where i is the pixel index, pi is the probability that the predicted pixel i belongs to the front class, yi is the true probability that the pixel i belongs to the front class, and Li is the BCE loss value generated on the pixel i;
Figure FDA0004183293350000021
where N is the total number of pixels in the batch;
s32: during back propagation, corresponding gradient values can be obtained by deriving the BCE Loss, and then model parameters are updated by utilizing optimization algorithms such as gradient descent, so that the model is gradually fitted with training data, and the classification accuracy is improved;
continuously updating the gradient according to a back propagation algorithm of the network, and continuously and iteratively updating the weights of all parts by using a gradient descent method;
s33: steps S31 and S32 are repeated until the loss value is no longer reduced.
5. The method according to claim 4, wherein for objectively evaluating the performance of a network, the accuracy, recall, IOU index, and F1score are used to verify the performance of a semantic segmentation model, defined as follows:
Figure FDA0004183293350000022
Figure FDA0004183293350000023
Figure FDA0004183293350000024
Figure FDA0004183293350000025
wherein precision represents precision, recovery represents recall, IOU represents IOU index, F1score represents F1score, TP, FP and FN represent the number of true positive, false positive and false negative pixels, respectively;
the predictive probability map is an image output by a model, in which the value of each pixel represents the probability that the pixel belongs to a certain class, a pixel with a predictive probability greater than 0.5 is determined as a positive example, and a pixel with a predictive probability less than or equal to 0.5 is determined as a negative example.
6. The method for deep learning extraction of road surface cracks according to claim 5, wherein step S4 comprises the sub-steps of:
s41: the average value of the accuracy and recall obtained in the training process of the step S33 is stored in a list L [ R ] i ,P i ]Wherein R is i Represents the ith recall, P i Representing the ith precision, grouping the values in the list L according to each 3-group, calculating the average precision value and the average recall value of each group, and storing the average precision value and the average recall value into a new list Lg;
s42: the average precision and average recall rate obtained by grouping every 3-wheel groups in the step S41 are represented by the abscissa, the average recall rate is represented by the ordinate, the average precision is represented by the ordinate, and the drawing is carried out by taking the origin of coordinates as the (0, 0) pointAnd (3) preparing a rectangular coordinate system, and obtaining the F point (M) by taking the average precision and the average recall rate as coordinate axes R ,M P ) And calculating the azimuth angle OF the OF line through plane projection, wherein the azimuth angle calculation OF the OF line is shown as a formula (7):
Figure FDA0004183293350000031
wherein M is P Is the average value of the accuracy in the interval, M R Is the average value of recall rate in the interval;
s43: the OF line azimuth is subjected to cosine transformation to introduce an adaptive parameter RPF, which is defined as an evaluation adaptive parameter, and the specific definition is shown in a formula (8):
Figure FDA0004183293350000032
wherein cos represents a cosine function, alpha is calculated in S42 to obtain a cosine function interval which is positioned in [0, pi/2 ];
s44: an adaptive loss function is introduced to reduce loss, which is an improvement on BCELoss;
the loss weight and intensity coefficient β, which introduce a balance, are shown in equation (9):
β=|PRF|,β∈[0,1] (9)
removing the gradient caused by the positive sample when the recall rate exceeds 0.7, otherwise removing the positive sample;
Figure FDA0004183293350000033
wherein Ci is the loss value of the pixel i after the feature is lost;
in order to improve diversity, a random tensor r is introduced, the size of the tensor is consistent with that of an input picture, the shape of the random tensor r is the same as that of a tensor label of an input neural network, and random numbers uniformly distributed on intervals [0,1] are filled; the calculation method of each Di in drop loss is as shown in (11):
D i =L i +C i ·r i (11)
wherein D is i Is the drop loss value, L, of the gradient after dropping over pixel i i Is the BCE loss value generated at pixel i, and r i A random tensor between 0 and 1;
finally, the expression of DropLoss for the adaptive loss function is shown in (12):
Figure FDA0004183293350000034
where i is the pixel index, pi is the probability that pixel i is predicted to belong to the front class, yi is the true probability that pixel i belongs to the front class, N is the total number of pixels in the batch, β is the intensity coefficient in equation (9), and ri is the random tensor defined in equation (11).
7. The method for deep learning extraction of road surface cracks according to claim 6, wherein step S5 comprises the sub-steps of:
s51: PRF sampling
Dividing the HRRC data set in the step S22 into two subsets according to the proportion of positive sample pixels of each image, namely the proportion of crack pixels of each image to the whole image, putting the image with the positive sample pixel proportion of more than 0.5% into a data set R, sampling the rest of the put data set P from the two subsets with different probabilities S, wherein the initial value of S is 0.1, and iterating the PRF to generate update, wherein the following formula (13) shows that:
Figure FDA0004183293350000041
where S is the probability of sampling from the dataset P;
the sampling strategy is as follows: before each sampling step, taking a random number R E [0,1], when R is less than S, the sample is from the data set P; otherwise, the sample is from dataset R;
s52: the data set P and the data set R are subjected to PRF sampling and then are transferred into the HRNet model again for training;
s53: inputting data of a minimum Batch-Size into the HRNet model each time to obtain model output, and calculating a Loss value by utilizing a self-adaptive Loss function Droploss of the step S4 in combination with a marked real label;
s54: continuously updating the gradient according to a back propagation algorithm of the network, and continuously and iteratively updating the weights of all parts by using a gradient descent method;
s55: the accuracy rate and recall rate of each round in the training process are transmitted back to the PRF to estimate the direction and the intensity, wherein the direction is the azimuth angle alpha, and the intensity is the self-adaptive parameter RPF, so that the PRF sampling and DropLoss are updated to enable the model to achieve the best performance;
s56: and stopping training when the loss value of the model is not reduced any more, and obtaining the final fracture segmentation model.
8. The method for deep learning extraction of a pavement crack of a road according to claim 7, wherein the step S6 comprises the sub-steps of:
s61: collecting images acquired by an industrial camera;
s62: and (3) carrying out prediction extraction on the image transmitted into the trained crack segmentation model, and outputting an image, namely classifying each pixel of the image to obtain a crack map.
CN202310411298.4A 2023-04-17 2023-04-17 Deep learning extraction method for road surface cracks Pending CN116343157A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310411298.4A CN116343157A (en) 2023-04-17 2023-04-17 Deep learning extraction method for road surface cracks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310411298.4A CN116343157A (en) 2023-04-17 2023-04-17 Deep learning extraction method for road surface cracks

Publications (1)

Publication Number Publication Date
CN116343157A true CN116343157A (en) 2023-06-27

Family

ID=86882427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310411298.4A Pending CN116343157A (en) 2023-04-17 2023-04-17 Deep learning extraction method for road surface cracks

Country Status (1)

Country Link
CN (1) CN116343157A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117542012A (en) * 2024-01-09 2024-02-09 武汉易为泰汽车技术开发股份有限公司 New energy automobile control method and system based on 5G short-slice private network transmission
CN117975374A (en) * 2024-03-29 2024-05-03 山东天意机械股份有限公司 Intelligent visual monitoring method for double-skin wall automatic production line

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117542012A (en) * 2024-01-09 2024-02-09 武汉易为泰汽车技术开发股份有限公司 New energy automobile control method and system based on 5G short-slice private network transmission
CN117542012B (en) * 2024-01-09 2024-04-12 武汉易为泰汽车技术开发股份有限公司 New energy automobile control method and system based on 5G short-slice private network transmission
CN117975374A (en) * 2024-03-29 2024-05-03 山东天意机械股份有限公司 Intelligent visual monitoring method for double-skin wall automatic production line

Similar Documents

Publication Publication Date Title
CN110163110B (en) Pedestrian re-recognition method based on transfer learning and depth feature fusion
CN110070008B (en) Bridge disease identification method adopting unmanned aerial vehicle image
CN107609525B (en) Remote sensing image target detection method for constructing convolutional neural network based on pruning strategy
CN116343157A (en) Deep learning extraction method for road surface cracks
CN110008854B (en) Unmanned aerial vehicle image highway geological disaster identification method based on pre-training DCNN
CN114092832B (en) High-resolution remote sensing image classification method based on parallel hybrid convolutional network
CN110276264B (en) Crowd density estimation method based on foreground segmentation graph
CN112884791B (en) Method for constructing large-scale remote sensing image semantic segmentation model training sample set
CN112347970B (en) Remote sensing image ground object identification method based on graph convolution neural network
CN111563893A (en) Grading ring defect detection method, device, medium and equipment based on aerial image
CN111860596A (en) Unsupervised pavement crack classification method based on deep learning and model establishment method
CN112613350A (en) High-resolution optical remote sensing image airplane target detection method based on deep neural network
CN108154158B (en) Building image segmentation method for augmented reality application
CN112766334A (en) Cross-domain image classification method based on pseudo label domain adaptation
CN116206185A (en) Lightweight small target detection method based on improved YOLOv7
CN115937626B (en) Automatic generation method of paravirtual data set based on instance segmentation
CN115457044B (en) Pavement crack segmentation method based on class activation mapping
CN111695640A (en) Foundation cloud picture recognition model training method and foundation cloud picture recognition method
CN114998251A (en) Air multi-vision platform ground anomaly detection method based on federal learning
CN115512247A (en) Regional building damage grade assessment method based on image multi-parameter extraction
CN114022368A (en) Pavement disease data enhancement method based on generation of countermeasure network
CN115131747A (en) Knowledge distillation-based power transmission channel engineering vehicle target detection method and system
CN113077438B (en) Cell nucleus region extraction method and imaging method for multi-cell nucleus color image
CN111144462A (en) Unknown individual identification method and device for radar signals
CN109255794B (en) Standard part depth full convolution characteristic edge detection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination