CN112819762B

CN112819762B - Pavement crack detection method based on pseudo-twin dense connection attention mechanism

Info

Publication number: CN112819762B
Application number: CN202110087473.XA
Authority: CN
Inventors: 王彩玲; 陈良全; 蒋国平
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2021-01-22
Filing date: 2021-01-22
Publication date: 2022-10-18
Anticipated expiration: 2041-01-22
Also published as: CN112819762A

Abstract

The invention discloses a pavement crack detection method based on a pseudo-twin dense connection attention mechanism, which comprises the following steps: S1, acquiring a data set; S2, preprocessing the pictures in the training set; S3, constructing a pseudo-twin residual network; S4, designing a loss function for the pseudo-twin residual network, training the network until the loss function converges, and saving the model; and S5, detecting cracks in the pictures of the test set using the model obtained in step S4. By improving the traditional Encoder-Decoder model, the invention can effectively detect cracks against mixed backgrounds, that is, on a mixed data set; and the loss function is optimized so that the method is better suited to pavement crack backgrounds.

Description

Pavement crack detection method based on pseudo-twin dense connection attention mechanism

Technical Field

The invention relates to the field of image segmentation, in particular to a pavement crack detection method based on a pseudo-twin dense connection attention mechanism.

Background

As the total road surface area increases year by year, the manpower and effort required to inspect and maintain roads also increase. Manual inspection of pavement cracks is not only subject to error but also exposes inspectors to danger while they work on the road, so an automatic pavement crack detector is necessary. The purpose of automatic pavement crack detection is to output a detection result from an input pavement picture or video sequence. Although traditional methods can also achieve automatic detection, their efficiency and detection accuracy have always been deficient. Most current research therefore focuses on pavement crack detection algorithms based on deep learning, in which a model learns to segment or predict the regions where cracks are distributed by training on large data sets. In most cases the distribution of pavement cracks is unbalanced: some cracks are fine and others coarse, some have complex textures and others are very simple. The detector to be designed must therefore adapt to the distribution of most pavement cracks.

In road maintenance, an automatic detection device based on deep learning is needed, since it can improve both detection efficiency and detection accuracy. The detection capability of a deep learning model depends on training with large data sets, and a large number of open-source, heterogeneous pavement crack data sets are now available on the open-source code hosting platform GitHub. However, relatively few studies currently address detection models for mixed cracks.

Disclosure of Invention

In view of the above, the present invention provides a pavement crack detection method based on a pseudo-twin dense connection attention mechanism. The method improves the traditional Encoder-Decoder model so that cracks can be detected effectively against mixed backgrounds, that is, on a mixed data set, and it optimizes the loss function so that the method is better suited to pavement crack backgrounds.

In order to achieve the purpose, the invention provides the following technical scheme:

A pavement crack detection method based on a pseudo-twin dense connection attention mechanism, characterized by comprising the following steps:

S1, acquiring a data set, and dividing the data set into a training set and a test set;

S2, preprocessing the pictures in the training set;

S3, constructing a pseudo-twin residual network;

S4, designing a loss function for the pseudo-twin residual network, wherein the loss function is obtained by weighting a focal loss function and an L1 regularization loss, training the pseudo-twin residual network until the loss function converges, and saving the model;

and S5, detecting cracks in the pictures of the test set using the model obtained in step S4.

Further, the data set includes: Crack500, Crack200, CFD, AEL and GAPs384.

Further, in the step S1, the acquiring a data set specifically includes:

searching for open-source databases on the GitHub platform, wherein the search keywords are: version, crack and detection; the project languages of the databases are Python and C++; and the results are sorted by most stars.

Further, the step S2 includes: the pictures are divided into three types (coarse cracks, fine cracks and uniform cracks) according to the distribution size of the cracks in the pictures, and the pictures are resized to a resolution of 480 × 320.

Further, the step S3 includes: the pseudo-twin residual network takes SegNet as its basic framework, specifically: the encoder network and the decoder network have the same structure; the encoder network consists of five convolutional layers connected in sequence, where the size of the first convolutional layer is 3 × 64 with a stride of 2, the size of the second is 3 × 128 with a stride of 2, the size of the third is 3 × 256 with a stride of 2, the size of the fourth is 3 × 512 with a stride of 2, and the size of the fifth is 3 × 512 with a stride of 2; the padding mode of all five convolutional layers is Valid.

Furthermore, an attention mechanism network is added on the basis of the basic frame, the attention mechanism network adopts five convolutional layers, and the attention mechanism network and the encoder network adopt pseudo-twin input to form a pseudo-twin network;

the attention mechanism network is used for generating an attention parameter, and the encoder network is used for extracting crack features;

the attention mechanism network is densely connected with the encoder network, each layer in the attention mechanism network generates an attention parameter, and the attention parameters weight each layer in the encoder network in a densely connected mode;

residual blocks are added to the decoder on the basis of the basic framework, and the residual block of each layer is expressed by formula (1):

[Formula (1): not reproduced in the source text]

in formula (1), the symbols denote, respectively, the residual block of the given layer, the features of that layer in the encoder, and the features of that layer in the decoder.

Further, the expression of the attention parameter is given by formula (2):

[Formula (2): not reproduced in the source text]

in formula (2), σ₂ denotes the sigmoid activation function and P denotes the resampling operation; the remaining term is given by formula (3):

[Formula (3): not reproduced in the source text]

in formula (3), the symbols denote, respectively, the input of the given layer, the hyper-parameters used to perform the 1 × 1 convolution operation, and the attention-parameter feature equation, which is given by formula (4):

[Formula (4): not reproduced in the source text]

in formula (4), the weight and offset symbols denote the weights and offsets of the given layer, BN_{γ,β} denotes batch normalization, and σ₁ denotes the ReLU activation function.

Further, the expression of the loss function is given by formula (5):

[Formula (5): not reproduced in the source text]

in formula (5), T denotes the true segmentation map, P denotes the predicted segmentation map, α = 0.7, L1 denotes the L1 regularization loss, and Tversky denotes the Tversky loss function.

The beneficial effects of the invention are:

by improving the traditional Encoder-Decoder model, the invention can effectively detect cracks against mixed backgrounds, that is, on a mixed data set; and the loss function is optimized so that the method is better suited to pavement crack backgrounds.

Drawings

Fig. 1 is a flowchart of a road surface crack detection method provided in embodiment 1.

Fig. 2 is a schematic structural diagram of the pseudo-twin residual network provided in embodiment 1.

Fig. 3 is a flowchart of attention parameter generation in embodiment 1.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1

Referring to fig. 1 to 3, the present embodiment provides a pavement crack detection method based on a pseudo-twin dense connection attention mechanism, including the following steps:

S1, acquiring a data set, and dividing the data set into a training set and a test set;

specifically, in this embodiment, open-source databases are first searched on GitHub.

The search keywords are set to "vector credit detection"; projects labeled with either of two languages (Python, C++) are selected, and the results are sorted by "most stars".

Five public data sets were finally collected: Crack500, Crack200, CFD, AEL and GAPs384.

S2, preprocessing the pictures in the training set;

specifically, because each data set has its own characteristics, the data sets need to be preprocessed before use. Crack500 and Crack200 are coarse-crack data sets, GAPs384 is a fine-crack data set, and CFD and AEL are uniform-crack data sets. In addition, the picture sizes differ between data sets, so this embodiment resizes all pictures to 480 × 320 pixels using the PIL library in Python.

Finally, this embodiment removes some inaccurately labeled data and merges the remaining pictures in ascending numerical order. The ratio of training data to test data is set to 9:1.
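
The preprocessing described above can be sketched as follows. This is a minimal illustration rather than the embodiment's actual code: the function names `resize_picture` and `split_dataset`, the shuffling with a fixed seed, and the reading of 480 × 320 as width × height are assumptions; the source states only that PIL is used for resizing and that the training-to-test ratio is 9:1.

```python
import random


def resize_picture(src_path, dst_path, size=(480, 320)):
    """Resize one picture with PIL; PIL expects size as (width, height)."""
    from PIL import Image  # imported lazily so split_dataset works without Pillow

    with Image.open(src_path) as img:
        img.resize(size).save(dst_path)


def split_dataset(paths, train_ratio=0.9, seed=0):
    """Shuffle picture paths (an assumption) and split them 9:1 into train and test."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_ratio))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    names = [f"{i:04d}.png" for i in range(100)]  # hypothetical file names
    train, test = split_dataset(names)
    print(len(train), len(test))
```

With 100 pictures, this split yields 90 training pictures and 10 test pictures.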

S3, constructing a pseudo-twin residual network;

specifically, the pseudo-twin residual network takes SegNet as its basic framework: a symmetrical encoder-decoder structure is adopted, in which the encoder and the decoder have the same structure and are symmetrically connected. The encoder uses convolution and pooling operations, while the decoder uses deconvolution and unpooling operations to restore the encoder features layer by layer. The last layer of the decoder has one output channel and yields a binary segmentation map, which is the crack prediction result.

The encoder consists of five convolutional layers connected in sequence: the size of the first convolutional layer is 3 × 64 with a stride of 2, the size of the second is 3 × 128 with a stride of 2, the size of the third is 3 × 256 with a stride of 2, the size of the fourth is 3 × 512 with a stride of 2, and the size of the fifth is 3 × 512 with a stride of 2. The padding mode of all five convolutional layers is Valid.
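
To see how quickly the feature map shrinks in such an encoder, the output size of each layer can be computed with the standard formula for a convolution without padding, out = floor((in − k) / s) + 1. The sketch below assumes a 3 × 3 kernel (the source gives the layer sizes only as 3 x 64, 3 x 128, and so on), ignores any pooling, and takes a 480 × 320 input; the function names are illustrative.

```python
def valid_conv_out(size, kernel=3, stride=2):
    """Output length of a convolution with Valid padding (no padding)."""
    return (size - kernel) // stride + 1


def encoder_shapes(height, width, channels=(64, 128, 256, 512, 512)):
    """Return (height, width, channels) after each of the five encoder layers."""
    shapes = []
    for c in channels:
        height, width = valid_conv_out(height), valid_conv_out(width)
        shapes.append((height, width, c))
    return shapes


if __name__ == "__main__":
    for layer, shape in enumerate(encoder_shapes(480, 320), start=1):
        print(layer, shape)
```

Under these assumptions a 480 × 320 input is reduced to 14 × 9 after the fifth layer, which illustrates why detail information is lost layer by layer.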

In this embodiment, on the basis of the basic framework, an attention mechanism network is added, which specifically includes:

this embodiment adopts a pseudo-twin network strategy. A twin (Siamese) network requires its branches to share weights, whereas the network structure designed in this embodiment does not need to share weights. The pseudo-twin residual network provided by this embodiment has two inputs, both of which receive the same RGB crack picture. The encoder network is the main network and is used to extract crack features; the attention generation network is the auxiliary network and is mainly used to generate attention parameters. The attention parameters of each layer assist not only the current layer but also the feature extraction of all subsequent layers.

With the convolution and pooling operations, the size of the feature map is continuously reduced, the number of channels is continuously increased, and meanwhile, the detail information of the feature map is lost layer by layer.

The attention parameter α is obtained from formulas (1) to (3):

[Formulas (1) to (3): not reproduced in the source text]

In formulas (1) to (3), the symbols denote the input of the given layer and the weights and biases of that layer; σ₂ is the sigmoid activation function; P is a resampling operation that makes the size of the current attention parameter match the feature size of the current encoder layer, since otherwise the weighting operation cannot be carried out; and a further symbol denotes the hyper-parameters used to perform the 1 × 1 convolution operation.

If the magnitude of the features is too large, they enter the saturation region of the activation function early, so batch normalization BN_{γ,β} is applied; σ₁ is the ReLU activation function. The formulas also define the attention-parameter feature equation and the transformed attention-parameter features.

Finally, the attention parameters are obtained through Softmax and max-pooling, where the sigmoid activation function is defined as σ₂(x) = 1 / (1 + e^(−x)).

In the attention mechanism network introduced by this embodiment, the parameters generated by each layer are used not only in the current layer but are also reused by all subsequent layers. The input feature map of the encoder is F ∈ R^(H×W×C), where H × W is the size of the feature map, initially 480 × 320, and C is the number of channels of the feature map, initially the three RGB channels. [The expression for the feature map after each layer is not reproduced in the source text.]

Reusing the attention parameters of each layer also requires convolving and pooling the feature matrix so that it fits the size of the current layer's features. The encoder part thus finally forms a pseudo-twin dense attention mechanism.
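
The dense reuse of attention parameters can be illustrated with a small, framework-free sketch. Everything below is an assumption made for illustration: nearest-neighbor resampling stands in for the resampling operation P, the attention maps that reach a layer are combined by averaging, and the weighting is an element-wise product. The source does not specify these details.

```python
import math


def sigmoid(x):
    """Sigmoid activation: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def resample(grid, out_h, out_w):
    """Nearest-neighbor resampling of a 2-D list to out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def dense_attention_weight(feature, attention_logits):
    """Weight one encoder feature map with attention maps of this and all earlier layers."""
    h, w = len(feature), len(feature[0])
    # Turn every layer's logits into (0, 1) attention values, then fit them to h x w.
    maps = [resample([[sigmoid(v) for v in row] for row in logits], h, w)
            for logits in attention_logits]
    # Assumed combination rule: average the maps, then multiply element-wise.
    return [[feature[r][c] * sum(m[r][c] for m in maps) / len(maps)
             for c in range(w)] for r in range(h)]


if __name__ == "__main__":
    feature = [[1.0, 2.0], [3.0, 4.0]]        # encoder features of layer 2
    earlier = [[0.0] * 4 for _ in range(4)]    # attention logits from layer 1 (4 x 4)
    current = [[0.0, 0.0], [0.0, 0.0]]         # attention logits from layer 2 (2 x 2)
    print(dense_attention_weight(feature, [earlier, current]))
```

The key point the sketch tries to capture is that an attention map produced at an earlier, larger resolution must be resampled before it can weight a later, smaller feature map.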

Through the dual input, the auxiliary channel guides the generation of the main-channel features and the attention parameters are reused, which reduces the loss of detail information that crack features suffer during convolution and pooling. Without this, the crack edges in the generated detection picture appear smooth and lack texture detail.

In addition, this embodiment introduces residual blocks in the decoder; the residual value of each layer is generated according to formula (5):

[Formula (5): not reproduced in the source text]

In formula (5), the symbols denote, respectively, the residual block of the given layer, the features of that layer in the encoder, and the features of that layer in the decoder.

Due to the asymmetry of the network, the encoder layer features need to be convolved and unpooled to match the decoder layer features; finally, the decoder features of one layer are generated from the decoder features of the adjacent layer together with the residual block.

More specifically, SegNet was selected as the basic framework in this embodiment on the basis of experiments, as follows:

for the pseudo-twin residual network to be constructed, three candidate basic frameworks were selected in advance, namely FusionNet, UNet and SegNet; comparative experiments were then designed, and SegNet was finally selected as the basic framework by comparing the resulting data. The specific steps are as follows:

since the selection of the basic framework does not involve the designed content itself, only three sets of comparative experiments were designed on the CFD data set, and the experimental results are shown in Table 1:

Table 1: Dice index of FusionNet, UNet and SegNet on the CFD data set

As can be seen from the table, SegNet as the basic framework is better suited to pavement crack backgrounds. This experiment also shows that the detection accuracy of the detectors in the table is only moderate and that crack detail information is seriously lost, to the point that detected cracks appear broken. The purpose of this embodiment is therefore to preserve the detail information of cracks while improving detection accuracy.

Then, the SegNet network is tested on the three types of data sets, using the MIoU index as the test metric; the experimental results are shown in Table 2:

Table 2: Test results of SegNet on the three types of data sets
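
For reference, the Dice index used in Table 1 and the MIoU index used in Table 2 can be computed from binary masks as sketched below. The flat 0/1 list representation and the two-class (crack and background) averaging for MIoU are assumptions; the source does not state how the indices were implemented.

```python
def confusion(pred, truth):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn


def dice_index(pred, truth):
    """Dice = 2TP / (2TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn, _ = confusion(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mean_iou(pred, truth):
    """Mean of the IoU of the crack class and the IoU of the background class."""
    tp, fp, fn, tn = confusion(pred, truth)
    iou_crack = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    iou_background = tn / (tn + fp + fn) if (tn + fp + fn) else 1.0
    return (iou_crack + iou_background) / 2


if __name__ == "__main__":
    pred = [1, 1, 0, 0]    # hypothetical predicted mask, flattened
    truth = [1, 0, 1, 0]   # hypothetical ground-truth mask, flattened
    print(dice_index(pred, truth), mean_iou(pred, truth))
```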

S4, designing a loss function for the pseudo-twin residual network, wherein the loss function is obtained by weighting a focal loss function and an L1 regularization loss, training the pseudo-twin residual network until the loss function converges, and saving the model;

specifically, the choice of loss function plays a decisive role in the learning of the network. Traditional pavement crack detection networks generally use a cross-entropy loss function, but cross entropy is severely limited on data sets with unbalanced data distributions, and the backgrounds of pavement crack data are generally noisy, so a network trained with cross entropy may detect background as cracks. That is, during network training, background features such as low-contrast shadows are progressively amplified in the convolutional layers, which finally causes the detector to segment the background along with the crack features.

Therefore, in this embodiment, in addition to using the Tversky loss function instead of the cross-entropy loss function, an L1 regularization loss is also used to enhance the robustness of the network. The Tversky loss is defined as a generalization of the Dice coefficient and the Jaccard coefficient:

[Formula (6): not reproduced in the source text]

In formula (6), the Tversky loss reduces to the Dice coefficient when α = β = 0.5 and to the Jaccard coefficient when α = β = 1. This embodiment sets α = 1 − β = 0.7. The overall loss function is a weighting of the Tversky loss and the L1 regularization loss, and is defined as:

[Formula (7): not reproduced in the source text]

In formula (7), T is the true segmentation map, P is the predicted segmentation map, and the loss weighting factor is set to α = 0.7.
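
Because the loss formulas are not reproduced in this text, the following sketch uses the standard definition of the Tversky index, TI = TP / (TP + α·FP + β·FN), which reduces to the Dice coefficient for α = β = 0.5 and to the Jaccard coefficient for α = β = 1, consistent with the description above. The combination in `total_loss` (a weighted sum of 1 − TI and the mean absolute error between T and P) is an assumption about the form of the overall loss, as are the choice of which error type α weights and the small constant `eps`.

```python
def tversky_index(pred, truth, alpha=0.7, beta=0.3, eps=1e-7):
    """Soft Tversky index for flat lists of predicted probabilities and 0/1 labels."""
    tp = sum(p * t for p, t in zip(pred, truth))
    fp = sum(p * (1 - t) for p, t in zip(pred, truth))
    fn = sum((1 - p) * t for p, t in zip(pred, truth))
    # eps avoids division by zero when both masks are empty.
    return (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def l1_loss(pred, truth):
    """Mean absolute difference between prediction and ground truth."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)


def total_loss(pred, truth, weight=0.7):
    """Assumed overall loss: weight * (1 - Tversky index) + (1 - weight) * L1."""
    return weight * (1 - tversky_index(pred, truth)) + (1 - weight) * l1_loss(pred, truth)


if __name__ == "__main__":
    pred = [0.9, 0.2, 0.1, 0.8]   # hypothetical predicted probabilities
    truth = [1, 0, 0, 1]          # hypothetical ground-truth labels
    print(total_loss(pred, truth))
```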

In this example, to verify the performance of the method, a comparative experiment was also performed, and the experimental results are shown in table 3.

Table 3: Experimental results of the compared algorithms

The experiment shows that the method provided by this embodiment is superior to the other detectors in precision, recall and F1-measure, which demonstrates the superiority of the pseudo-twin residual network.
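
The three metrics named here follow their usual definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as the harmonic mean of the two. The sketch below computes them for flattened binary masks; the function name and the convention of returning 0.0 for undefined ratios are illustrative choices, not taken from the source.

```python
def precision_recall_f1(pred, truth):
    """Return (precision, recall, F1) for flat 0/1 lists, treating 1 as crack."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    pred = [1, 1, 1, 0]    # hypothetical predicted mask, flattened
    truth = [1, 1, 0, 1]   # hypothetical ground-truth mask, flattened
    print(precision_recall_f1(pred, truth))
```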

The invention has the advantage of reducing the loss of crack detail information and the broken cracks that occur in the detection results of current crack detectors, thereby improving detection accuracy.

Matters not described in detail in the invention are well known to those skilled in the art.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions that can be obtained by a person skilled in the art through logical analysis, reasoning or limited experiments based on the prior art according to the concepts of the present invention should be within the scope of protection determined by the claims.

Claims

1. A pavement crack detection method based on a pseudo-twin dense connection attention mechanism is characterized by comprising the following steps:

S2, preprocessing the pictures in the training set;

S3, constructing a pseudo-twin residual network;

the pseudo-twin residual network takes SegNet as its basic framework, specifically: a symmetrical encoder network-decoder network structure is adopted, and the encoder network and the decoder network have the same structure; the encoder network consists of five convolutional layers connected in sequence, where the size of the first convolutional layer is 3 × 64 with a stride of 2, the size of the second is 3 × 128 with a stride of 2, the size of the third is 3 × 256 with a stride of 2, the size of the fourth is 3 × 512 with a stride of 2, and the size of the fifth is 3 × 512 with a stride of 2; the padding mode of all five convolutional layers is Valid; an attention mechanism network is added on the basis of the basic framework, wherein the attention mechanism network adopts five convolutional layers, and the attention mechanism network and the encoder network adopt pseudo-twin input to form a pseudo-twin network;

residual blocks are added to the decoder on the basis of the basic framework, and the residual block of each layer is expressed by formula (1):

[Formula (1): not reproduced in the source text]

in formula (1), the symbols denote, respectively, the residual block of the given layer, the features of that layer in the encoder, and the features of that layer in the decoder;

2. The method of claim 1, wherein the data set comprises: Crack500, Crack200, CFD, AEL and GAPs384.

3. The method for detecting the pavement crack based on the pseudo-twin dense connection attention mechanism as claimed in claim 1, wherein in the step S1, the acquiring the data set specifically comprises:

4. A pavement crack detection method based on a pseudo-twin dense-connection attention mechanism according to any one of claims 1-3, wherein the step S2 comprises: the pictures are divided into three types of coarse cracks, fine cracks and uniform cracks according to the distribution size of the cracks in the pictures, and the pictures are processed to be 480 × 320 in resolution.

5. The method for detecting the pavement crack based on the pseudo-twin dense-connection attention mechanism as claimed in any one of claims 1 to 3, wherein the expression of the attention parameter is given by formula (2):

[Formula (2): not reproduced in the source text]

the remaining term in formula (2) is given by formula (3):

[Formula (3): not reproduced in the source text]

in formula (3), the symbols denote, respectively, the input of the given layer and the hyper-parameters used to perform the 1 × 1 convolution operation;

[Formula (4): not reproduced in the source text]

in formula (4), W_i^T and the accompanying offset term denote, respectively, the weights and offsets of the i-th layer.

6. The method for detecting the pavement crack based on the pseudo-twin dense connection attention mechanism as claimed in claim 5, wherein the expression of the loss function is given by formula (5):

[Formula (5): not reproduced in the source text]

in formula (5), T denotes the true segmentation map, P denotes the predicted segmentation map, α = 0.7, L1 denotes the L1 regularization loss, and Tversky denotes the Tversky loss function.