CN112700418A - Crack detection method based on improved coding and decoding network model - Google Patents


Info

Publication number
CN112700418A
CN112700418A (application CN202011633762.7A; granted as CN112700418B)
Authority
CN
China
Prior art keywords
size
crack
feature map
data
feature
Prior art date
Legal status
Granted
Application number
CN202011633762.7A
Other languages
Chinese (zh)
Other versions
CN112700418B (en)
Inventor
徐守坤
杨秋媛
李宁
石林
庄丽华
王雨生
Current Assignee
Changzhou University
Original Assignee
Changzhou University
Priority date
Filing date
Publication date
Application filed by Changzhou University filed Critical Changzhou University
Priority to CN202011633762.7A priority Critical patent/CN112700418B/en
Publication of CN112700418A publication Critical patent/CN112700418A/en
Application granted granted Critical
Publication of CN112700418B publication Critical patent/CN112700418B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G06T 7/0004: Industrial image inspection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4007: Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30108: Industrial image inspection
    • G06T 2207/30132: Masonry; Concrete


Abstract

The invention relates to a crack detection method based on an improved coding and decoding network model. A data set that has undergone data preprocessing is sent to an encoder for feature extraction. The model adopts an encoder-decoder structure whose backbone network is a ResNet34 pre-trained on ImageNet. A cascaded dual-core hole convolution is added in the middle layer of the encoder so as to retain the low-level semantic information and spatial structure information of cracks for fusion in the skip-connection stage, and a multi-core pooling module is introduced in the decoder stage to obtain and fuse crack information of different sizes. The information obtained in this way is more global than that obtained by a single pooling layer, and fine cracks in a picture can be detected effectively. The invention is evaluated on different data sets and compared with other mainstream algorithm models; the results show that the proposed method achieves higher precision and makes up for the shortcomings of traditional methods.

Description

Crack detection method based on improved coding and decoding network model
Technical Field
The invention relates to the technical field of safety-monitoring image processing, in particular to a crack detection method based on an improved coding and decoding network model.
Background
Concrete surface crack detection is an important part of structural health monitoring for concrete buildings. If cracks appear on the surface of a building and continue to extend, the structure can eventually fail, causing serious economic loss and casualties. Manual crack inspection is time-consuming and labor-intensive, subjective judgment affects detection precision, and structures such as high-rise buildings and bridges are difficult to inspect manually. Methods for automatically detecting concrete cracks based on image processing technology have therefore become a hot spot of current research.
With the rapid development of information technology, researchers have proposed applying computer vision and image processing technologies to crack detection. Traditional crack detection methods include Gabor filters, histograms of oriented gradients, local binary patterns, and threshold-based estimation. Although these methods can obtain good detection results, they place high requirements on the quality of the data set and perform poorly on data sets with uneven illumination, complex crack topology, and heavy noise. Given the wide application and excellent performance of deep learning in various fields, researchers have in recent years worked on applying convolutional neural networks to crack detection to overcome the limitations of the traditional methods.
In the crack detection field, the feature extraction stage of a traditional convolutional neural network reduces the resolution of the feature map, so the low-level semantic information and fine-grained spatial structure information of many crack images are lost and some fine cracks are easily missed. Even if image detail is restored through operations such as up-sampling and feature fusion, the network still struggles to accurately extract detail features from crack images with complex topology and an extremely unbalanced foreground-to-background ratio.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to overcome the defects of the prior art, the invention provides a crack detection method based on an improved coding and decoding network model, aiming to solve the problems in the prior art that detection accuracy for apparent concrete cracks is low, tiny cracks are easily lost, and cracks occupy only a small proportion of pixels.
The technical scheme adopted by the invention for solving the technical problems is as follows: a crack detection method based on an improved coding and decoding network model comprises the following steps:
S1, data acquisition: shooting and collecting crack image data on concrete buildings using smart devices;
S2, data marking: label classification is carried out on the collected image data using the labelme software;
S3, data preprocessing: the acquired data set is multiplied by adopting data enhancement methods;
S4, sending the data set subjected to data preprocessing into an encoder for feature extraction: the backbone network in the encoder uses a pre-trained ResNet34. The main idea of ResNet is to add a direct connection channel to the network, i.e. the idea of the Highway Network. ResNet passes the input information directly through to the output, protecting the integrity of the information; the whole network then only needs to learn the difference between input and output, which simplifies the learning objective and its difficulty.
The method is characterized in that a hole (dilated) convolution is added in the middle layer of the encoder. In the encoder stage the convolutional neural network learns picture features automatically; the feature information in a picture divides into low-level and high-level semantic information, shallow layers of the network learn low-level information such as the contours and textures of the picture, and as the number of layers deepens the network learns more abstract, higher-level features. However, the repeated convolution and max-pooling operations of the feature extraction stage reduce the resolution of the feature map while enlarging the receptive field, so that much image detail and spatial information is lost and some tiny cracks are easily missed.
Hole convolution was proposed to enlarge the receptive field without reducing the size of the feature map, allowing the output of each convolution to contain a larger range of information.
Defining k as the convolution kernel size, k' as the receptive field size, and d as the dilation rate, the receptive field is calculated as:
k′=k+(k-1)*(d-1) (1)
Explaining the hole convolution from a one-dimensional viewpoint, define H as the input size, FR as the convolution kernel size, Output as the output feature map size, D as the dilation rate, P as the padding and S as the stride. The output feature map size is calculated as:
Output = ⌊(H + 2P - FR - (FR - 1)*(D - 1)) / S⌋ + 1 (2)
S5, after feature extraction is finished, performing the decoder operation of restoring the feature map to its original size: a multi-core pooling module is added at the front end of the decoder to generate multiple receptive fields that capture information at different scales; the output feature map size after the pooling operation is calculated as:
OutputH = ⌊(H + 2P - FH) / S⌋ + 1, OutputW = ⌊(W + 2P - FW) / S⌋ + 1 (3)
in the above formula, (H, W) is the feature map resolution, (FH, FW) is the pooling kernel size, (OutputH, OutputW) is the output feature map resolution, P is the padding, S is the stride, and ⌊·⌋ denotes rounding down;
S6, detecting the data set of step S3 with the improved network model and generating a training model, monitoring the model with a loss function while it is generated, training the improved coding and decoding network model with the crack data set to generate a detection model, judging the crack pixel area of the picture, and outputting the crack segmentation result.
Specifically, in step S5, the multi-core pooling module acquires context information through the following steps:
S5-1, the multi-size low-dimensional feature maps output under the different pooling kernels are up-sampled by bilinear interpolation so that they match the size of the original feature map;
S5-2, after each feature map, a 1x1 convolution is used to reduce the dimension of the feature map to 1/n of the original feature, where n is the number of channels of the original feature map;
S5-3, the feature maps of the different levels and the original feature map are spliced (a concat operation) into the final output feature map.
Preferably, when the training model is generated, a combined loss function, Bce loss + Dice loss, is selected, where Bce loss is the binary cross-entropy loss function, formulated as:
BL = -(1/N) Σ_i [ y_i·log(ŷ_i) + (1 - y_i)·log(1 - ŷ_i) ] (4)
where y_i is the label value of the i-th pixel point and ŷ_i is the predicted label value of the i-th pixel point; the plain cross-entropy loss converges slowly and may fail to reach the optimum when iterating over a large number of easy samples,
Dice may be understood as the degree of similarity of two contour regions; with A and B denoting the sets of points enclosed by the two contour regions, it is defined as:
DSC(A,B)=2|A∩B|/(|A|+|B|) (5)
The formula of the binary-class Dice loss is:
DL = 1 - 2|A∩B| / (|A| + |B|) (6)
the two are linearly combined to obtain a loss function, and the formula is expressed as follows:
Loss=0.5BL+DL (7)
the invention has the beneficial effects that: the invention provides a cascade dual-core hole convolution for crack detection, which is used in a lower sampling intermediate layer and aims to keep low-layer semantic information with more cracks and spatial structure information which is easy to lose to be fused in a jump connection stage, thereby realizing more detailed image information retention; by introducing the multi-core pooling module at the decoder stage, a plurality of receptive fields are generated to capture information on different scales and are fused, compared with the information obtained by a single pooling layer, the information is more global, and the detection effect of cracks with different sizes can be effectively improved.
Drawings
The invention is further illustrated with reference to the following figures and examples.
Fig. 1 is a schematic diagram of data enhancement in the present invention.
Fig. 2 is a schematic diagram of the ResNet residual structure in the present invention.
FIG. 3 is a schematic diagram of hole convolutions at different dilation rates as described in the present invention.
FIG. 4 is a schematic diagram of the cascaded dual-core hole convolution module according to the present invention.
FIG. 5 is a schematic diagram of the structure of the multi-core pooling module described in the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views illustrating only the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
A crack detection method based on an improved coding and decoding network model comprises: collecting crack image data sets and marking them, multiplying the number of samples through data enhancement to reduce the risk of overfitting, and sending test data into the improved coding and decoding network model for experiments to obtain results. In terms of network improvement: the network training speed is increased by changing the backbone network to a pre-trained ResNet34.
The specific scheme is as follows:
firstly, the data preprocessing aspect is as shown in fig. 1.
S1, data acquisition: selecting equipment such as a smart phone and the like to shoot and collect crack images on concrete buildings such as bridges and pavements;
s2, data marking: after data acquisition, label classification is carried out on image data by using labelme software;
s3, preprocessing data: doubling the data set by using a data enhancement method such as Rotate90 degrees, random GridShuffle, horizon Flip and the like, processing the marked crack data set to obtain a data set with multiple times, solving the over-fitting phenomenon caused by too few data sets and improving the detection precision.
Secondly, in the aspect of network model improvement
S4, the network training speed is increased by changing the backbone network to a pre-trained ResNet34. As shown in fig. 2, the main idea of ResNet is to add a direct connection channel to the network, i.e. the idea of the Highway Network. ResNet passes the input information directly through to the output, protecting the integrity of the information; the whole network then only needs to learn the difference between input and output, which simplifies the learning objective and its difficulty.
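The direct-connection idea can be sketched in a few lines; the vector version below is a simplification of the 2-D convolutional residual block, and the function names are illustrative:

```python
def residual_block(x, f):
    """y = f(x) + x: the branch f only has to learn the difference
    between input and output, while x passes through unchanged."""
    return [a + b for a, b in zip(f(x), x)]

# When the desired mapping is close to the identity, f can stay near zero
# and the block still reproduces its input almost exactly.
identity_like = residual_block([1.0, 2.0, 3.0], lambda v: [0.0] * len(v))
```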
In order to enlarge the receptive field without reducing the size (resolution) of the feature map, so that the output of each convolution contains a larger range of information, the invention adds cascaded dual-core hole convolutions in the middle layer of the encoder stage. Fig. 3 shows hole convolutions with different dilation rates: subfigure (a) is an ordinary convolution with dilation rate 1 and receptive field 3, subfigure (b) is a hole convolution with dilation rate 2 and receptive field 5, and subfigure (c) is a hole convolution with dilation rate 3 and receptive field 7. The receptive field under different dilation rates is calculated as:
k′=k+(k-1)×(d-1) (1)
where k is the convolution kernel size, k' is the receptive field size, and d is the dilation rate size.
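Formula (1) can be checked numerically; the sketch below also composes it over a stack of convolutions under an assumed stride of 1 (the stride-1 composition rule is an illustration, not stated in the text):

```python
def receptive_field(k, d):
    """Formula (1): k' = k + (k - 1) * (d - 1)."""
    return k + (k - 1) * (d - 1)

def stacked_receptive_field(layers):
    """Receptive field of sequentially stacked stride-1 convolutions:
    each layer adds (k' - 1) on top of the field accumulated so far."""
    rf = 1
    for k, d in layers:
        rf += receptive_field(k, d) - 1
    return rf
```

For a 3-tap kernel this reproduces the receptive fields 3, 5 and 7 quoted for Fig. 3, and 7 and 11 for the two branches of Fig. 4.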
Explaining the hole convolution from a one-dimensional viewpoint, define H as the input size, FR as the convolution kernel size, Output as the output feature map size, D as the dilation rate, P as the padding and S as the stride. The output feature map size is calculated as:
Output = ⌊(H + 2P - FR - (FR - 1)×(D - 1)) / S⌋ + 1 (2)
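A minimal numerical check of this output-size relation, using the symbols H, FR, D, P, S defined above; the effective kernel size FR + (FR - 1)(D - 1) is the receptive field of formula (1):

```python
def conv1d_output_size(H, FR, D, P, S):
    """Output size of a 1-D hole convolution:
    floor((H + 2P - FR - (FR - 1)(D - 1)) / S) + 1."""
    eff = FR + (FR - 1) * (D - 1)   # effective (dilated) kernel size
    return (H + 2 * P - eff) // S + 1

# With padding P = (eff - 1) // 2 and stride 1 the feature map size is preserved,
# which is how hole convolution enlarges the receptive field without down-sampling.
```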
FIG. 4 is a schematic diagram of the cascaded dual-core hole convolution, which is composed of two hole convolutions with different dilation rates: the receptive field is 7x7 at dilation rate 3 and 11x11 at dilation rate 5. The enlarged receptive field lets the convolution output contain semantic information over a larger range; a smaller receptive field extracts the features of smaller objects, while a larger receptive field extracts the features of larger objects. The output feature maps of the two hole convolutions are added pixel-wise; adding corresponding values keeps the dimension of the feature map unchanged while each dimension contains more features, i.e. more semantic information. The method uses the cascaded dual-core hole convolution in the middle layer of down-sampling, with the aim of retaining the low-level semantic information of cracks and the easily lost small-crack information for fusion in the skip-connection stage, thereby achieving more accurate segmentation.
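A 1-D sketch of the dual-core idea: two hole convolutions with dilation rates 3 and 5 are applied to the same input and their outputs are added element-wise, so the length of the feature vector is unchanged. The all-ones kernel is an arbitrary illustrative weight, not a trained one:

```python
def hole_conv1d_same(x, kernel, d):
    """1-D hole convolution with zero 'same' padding and stride 1."""
    k, n = len(kernel), len(x)
    pad = (k - 1) * d // 2          # keeps output length == input length (k odd)
    out = []
    for i in range(n):
        s = 0.0
        for j in range(k):
            idx = i - pad + j * d
            if 0 <= idx < n:        # positions outside the input act as zeros
                s += kernel[j] * x[idx]
        out.append(s)
    return out

def cascaded_dual_hole(x):
    """Element-wise sum of two hole convolutions (d=3, field 7; d=5, field 11)."""
    kernel = [1.0, 1.0, 1.0]        # illustrative weights
    a = hole_conv1d_same(x, kernel, 3)
    b = hole_conv1d_same(x, kernel, 5)
    return [u + v for u, v in zip(a, b)]
```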
S5, to address the multi-scale nature of cracks, the invention adopts a multi-core pooling module in the decoder stage to generate multiple receptive fields that capture information at different scales. As shown in fig. 5, the multi-core pooling module fuses features at four different scales, using pooling kernels of 2x2, 3x3, 5x5 and 7x7. The output feature map size after the pooling operation is calculated as:
OutputH = ⌊(H + 2P - FH) / S⌋ + 1, OutputW = ⌊(W + 2P - FW) / S⌋ + 1 (3)
(H, W) is the feature map resolution, (FH, FW) is the pooling kernel size, (OutputH, OutputW) is the output feature map resolution, P is the padding, S is the stride, and ⌊·⌋ denotes rounding down.
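Formula (3) in code form, applied to a couple of common pooling configurations; the padding and stride values here are assumptions chosen for illustration:

```python
def pool_output_size(H, W, FH, FW, P, S):
    """Formula (3): OutputH = floor((H + 2P - FH) / S) + 1, OutputW likewise."""
    return (H + 2 * P - FH) // S + 1, (W + 2 * P - FW) // S + 1

# A 2x2 pool with stride 2 halves a 224x224 map; a 3x3 pool with padding 1
# and stride 1 leaves the resolution unchanged.
```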
The multi-core pooling module acquires context information as follows. First, the multi-size low-dimensional feature maps output under the different pooling kernels are up-sampled by bilinear interpolation so that they match the size of the original feature map. Then, to maintain the weight of the global features, a 1x1 convolution is applied after each feature map to reduce its dimension to 1/n of the original feature, where n is the number of channels of the original feature map. Finally, the feature maps of the different levels and the original feature map are spliced (a concat operation) into the final output feature map.
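The first step, bilinear up-sampling back to the original resolution, can be sketched in pure Python. Align-corners sampling is assumed here, and the 1x1 channel reduction and the concat are omitted:

```python
def bilinear_upsample(src, out_h, out_w):
    """Resize a 2-D map (list of rows) to out_h x out_w by bilinear interpolation."""
    in_h, in_w = len(src), len(src[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        # map the output row back to a fractional source row (align corners)
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            out[i][j] = (src[y0][x0] * (1 - wy) * (1 - wx)
                         + src[y0][x1] * (1 - wy) * wx
                         + src[y1][x0] * wy * (1 - wx)
                         + src[y1][x1] * wy * wx)
    return out
```

With align-corners sampling the four corner values of the source map are reproduced exactly in the enlarged map.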
For the unbalanced samples in crack detection, where the background occupies an extremely large proportion and the object an extremely small one, choosing a suitable loss function is very important; the invention selects the combined loss function Bce loss + Dice loss.
Bce loss, the binary cross-entropy loss function commonly used as a classification loss in semantic segmentation, is formulated as:
BL = -(1/N) Σ_i [ y_i·log(ŷ_i) + (1 - y_i)·log(1 - ŷ_i) ] (4)
where y_i is the label value of the i-th pixel point and ŷ_i is the predicted label value of the i-th pixel point. The plain cross-entropy loss converges slowly and may fail to reach the optimum when iterating over a large number of easy samples.
Because the region to be segmented occupies only a small part of the picture, learning easily falls into a local minimum of the loss function, so the Dice loss increases the weight of the foreground region. Dice may be understood as the degree of similarity of two contour regions; with A and B denoting the sets of points enclosed by the two contour regions, it is defined as
DSC(A,B)=2|A∩B|/(|A|+|B|) (5)
The formula of the binary-class Dice loss is expressed as
DL = 1 - 2|A∩B| / (|A| + |B|) (6)
The loss function herein combines the two linearly, and is formulated as:
Loss=0.5BL+DL (7)
The combined loss function attends simultaneously to pixel-level classification accuracy and to the segmentation of the image foreground, making model training more stable and effectively alleviating the unbalanced distribution of positive and negative samples in crack images, thereby yielding more accurate pixel-level detection results.
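A self-contained sketch of the combined loss on flat per-pixel vectors; the epsilon terms are numerical-stability assumptions not spelled out in the text:

```python
import math

def bce_loss(y, p, eps=1e-7):
    """Binary cross-entropy, formula (4), averaged over N pixels."""
    n = len(y)
    return -sum(yi * math.log(max(pi, eps))
                + (1 - yi) * math.log(max(1 - pi, eps))
                for yi, pi in zip(y, p)) / n

def dice_loss(y, p, eps=1e-7):
    """Binary Dice loss, formula (6): 1 - 2|A∩B| / (|A| + |B|),
    with the soft intersection sum(y_i * p_i)."""
    inter = sum(yi * pi for yi, pi in zip(y, p))
    return 1 - 2 * inter / (sum(y) + sum(p) + eps)

def combined_loss(y, p):
    """Formula (7): Loss = 0.5 * BL + DL."""
    return 0.5 * bce_loss(y, p) + dice_loss(y, p)
```

A perfect prediction drives both terms to (nearly) zero, while the Dice term keeps the loss sensitive to the small foreground region even when background pixels dominate.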
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (3)

1. A crack detection method based on an improved coding and decoding network model, characterized by comprising the following steps:
s1, data acquisition: shooting and collecting crack image data on the concrete building by using intelligent equipment;
s2, data marking: label classification is carried out on the collected image data by using labelme software;
s3, preprocessing data: multiplying the acquired data set by adopting a data enhancement method;
s4, sending the data set subjected to data preprocessing into an encoder for feature extraction: the backbone network in the encoder uses a pre-trained ResNet34, and a hole convolution is added in the middle layer of the encoder,
defining k as the convolution kernel size, k' as the receptive field size, and d as the dilation rate, the receptive field is calculated as:
k′=k+(k-1)*(d-1) (1)
explaining the hole convolution from a one-dimensional viewpoint, define H as the input size, FR as the convolution kernel size, Output as the output feature map size, D as the dilation rate, P as the padding and S as the stride; the output feature map size is calculated as:
Output = ⌊(H + 2P - FR - (FR - 1)*(D - 1)) / S⌋ + 1 (2)
and S5, finishing the feature extraction, and performing the decoder operation of restoring the feature map size to the original size: adding a multi-core pooling module at the front end of the decoder to generate a plurality of receptive fields for capturing information on different scales, wherein a calculation formula of the size of an output characteristic diagram after pooling operation is as follows:
OutputH = ⌊(H + 2P - FH) / S⌋ + 1, OutputW = ⌊(W + 2P - FW) / S⌋ + 1 (3)
in the above formula, (H, W) is the feature map resolution, (FH, FW) is the pooling kernel size, (OutputH, OutputW) is the output feature map resolution, P is the padding, S is the stride, and ⌊·⌋ denotes rounding down;
and S6, detecting the data set in the step S3 by using the improved network model, generating a training model, monitoring the model by using a loss function in the process of generating the model, training the improved coding and decoding network model by using the crack data set to generate a detection model, judging the crack pixel area of the picture, and outputting a crack segmentation result.
2. The crack detection method based on the improved coding and decoding network model as claimed in claim 1, characterized in that: in step S5, the acquiring context information by the multi-core pooling module includes the following steps:
s5-1, performing up-sampling on the multi-size low-dimensional feature graphs output under different pooling kernels through bilinear interpolation to enable the multi-size low-dimensional feature graphs to be the same as the original feature graphs in size;
s5-2, after each feature map, using 1x1 convolution to reduce the dimension of the feature map to 1/n of the original feature, wherein n is the number of channels of the original feature map;
and S5-3, splicing the feature maps of different levels and the original feature map into a final output feature map, wherein the splicing is a concat process.
3. The crack detection method based on the improved coding and decoding network model as claimed in claim 1, characterized in that: when the training model is generated, selecting a combined loss function Bce loss + Dice loss,
bce loss is a two-class cross entropy loss function, and the formula is as follows:
BL = -(1/N) Σ_i [ y_i·log(ŷ_i) + (1 - y_i)·log(1 - ŷ_i) ] (4)
where y_i is the label value of the i-th pixel point and ŷ_i is the predicted label value of the i-th pixel point; the plain cross-entropy loss converges slowly and may fail to reach the optimum when iterating over a large number of easy samples,
dice may be understood as the degree of similarity of two contour regions; with A and B denoting the sets of points enclosed by the two contour regions, it is defined as:
DSC(A,B)=2|A∩B|/(|A|+|B|) (5)
the formula of the binary class Dice loss is as follows:
DL = 1 - 2|A∩B| / (|A| + |B|) (6)
the two are linearly combined to obtain a loss function, and the formula is expressed as follows:
Loss=0.5BL+DL (7)
CN202011633762.7A 2020-12-31 2020-12-31 Crack detection method based on improved coding and decoding network model Active CN112700418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011633762.7A CN112700418B (en) 2020-12-31 2020-12-31 Crack detection method based on improved coding and decoding network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011633762.7A CN112700418B (en) 2020-12-31 2020-12-31 Crack detection method based on improved coding and decoding network model

Publications (2)

Publication Number Publication Date
CN112700418A 2021-04-23
CN112700418B 2024-03-15

Family

ID=75513710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011633762.7A Active CN112700418B (en) 2020-12-31 2020-12-31 Crack detection method based on improved coding and decoding network model

Country Status (1)

Country Link
CN (1) CN112700418B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222926A (en) * 2021-05-06 2021-08-06 西安电子科技大学 Zipper abnormity detection method based on depth support vector data description model
CN113705575A (en) * 2021-10-27 2021-11-26 北京美摄网络科技有限公司 Image segmentation method, device, equipment and storage medium
CN115019068A (en) * 2022-05-26 2022-09-06 杭州电子科技大学 Progressive salient object identification method based on coding and decoding framework
CN117764988A (en) * 2024-02-22 2024-03-26 山东省计算中心(国家超级计算济南中心) Road crack detection method and system based on heteronuclear convolution multi-receptive field network

Citations (3)

Publication number Priority date Publication date Assignee Title
CN111080641A (en) * 2019-12-30 2020-04-28 上海商汤智能科技有限公司 Crack detection method and device, computer equipment and storage medium
CN111222580A (en) * 2020-01-13 2020-06-02 西南科技大学 High-precision crack detection method
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system
CN111080641A (en) * 2019-12-30 2020-04-28 上海商汤智能科技有限公司 Crack detection method and device, computer equipment and storage medium
CN111222580A (en) * 2020-01-13 2020-06-02 西南科技大学 High-precision crack detection method

Non-Patent Citations (1)

Title
Chang Bin, Li Ning, Zheng Zhengwen, Zhang Ping: "Feedforward network upgrading strategy and its adaptive dynamic upgrading prediction model for fracture depth of rock strata under compound perforation", Chinese Journal of Rock Mechanics and Engineering (岩石力学与工程学报), no. 14, 15 February 2005 (2005-02-15) *

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN113222926A (en) * 2021-05-06 2021-08-06 西安电子科技大学 Zipper abnormity detection method based on depth support vector data description model
CN113222926B (en) * 2021-05-06 2023-04-18 西安电子科技大学 Zipper abnormity detection method based on depth support vector data description model
CN113705575A (en) * 2021-10-27 2021-11-26 北京美摄网络科技有限公司 Image segmentation method, device, equipment and storage medium
CN113705575B (en) * 2021-10-27 2022-04-08 北京美摄网络科技有限公司 Image segmentation method, device, equipment and storage medium
CN115019068A (en) * 2022-05-26 2022-09-06 杭州电子科技大学 Progressive salient object identification method based on coding and decoding framework
CN115019068B (en) * 2022-05-26 2024-02-23 杭州电子科技大学 Progressive salient target identification method based on coding and decoding architecture
CN117764988A (en) * 2024-02-22 2024-03-26 山东省计算中心(国家超级计算济南中心) Road crack detection method and system based on heteronuclear convolution multi-receptive field network
CN117764988B (en) * 2024-02-22 2024-04-30 山东省计算中心(国家超级计算济南中心) Road crack detection method and system based on heteronuclear convolution multi-receptive field network

Also Published As

Publication number Publication date
CN112700418B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
CN111047551B (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN108509978B (en) Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN113298818B (en) Remote sensing image building segmentation method based on attention mechanism and multi-scale features
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN109934200B (en) RGB color remote sensing image cloud detection method and system based on improved M-Net
CN112700418A (en) Crack detection method based on improved coding and decoding network model
CN110189255B (en) Face detection method based on two-stage detection
CN112232391B (en) Dam crack detection method based on U-net network and SC-SAM attention mechanism
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
CN113780132B (en) Lane line detection method based on convolutional neural network
CN109886159B (en) Face detection method under non-limited condition
CN111797712A (en) Remote sensing image cloud and cloud shadow detection method based on multi-scale feature fusion network
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN110717886A (en) Pavement pool detection method based on machine vision in complex environment
CN112329771B (en) Deep learning-based building material sample identification method
Wang et al. A global and local feature weighted method for ancient murals inpainting
CN113139544A (en) Saliency target detection method based on multi-scale feature dynamic fusion
CN116977674A (en) Image matching method, related device, storage medium and program product
CN115527096A (en) Small target detection method based on improved YOLOv5
CN112560719B (en) High-resolution image water body extraction method based on multi-scale convolution-multi-core pooling
Ren et al. A lightweight object detection network in low-light conditions based on depthwise separable pyramid network and attention mechanism on embedded platforms
CN112132867B (en) Remote sensing image change detection method and device
CN109284752A (en) A kind of rapid detection method of vehicle
CN113177956A (en) Semantic segmentation method for unmanned aerial vehicle remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant