CN111861978B - Bridge crack instance segmentation method based on Faster R-CNN - Google Patents


Info

Publication number
CN111861978B
Authority
CN
China
Prior art keywords
crack
mask
bridge
cnn
fracture
Prior art date
Legal status
Active
Application number
CN202010473952.0A
Other languages
Chinese (zh)
Other versions
CN111861978A (en)
Inventor
李良福
冯建云
Current Assignee
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date
Filing date
Publication date
Application filed by Shaanxi Normal University
Priority to CN202010473952.0A
Publication of CN111861978A
Application granted
Publication of CN111861978B
Legal status: Active
Anticipated expiration


Classifications

    • G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
    • G06T 7/0004 — Industrial image inspection
    • G06F 18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
    • G06N 3/045 — Neural network architectures; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G06T 7/11 — Region-based segmentation
    • G06T 7/136 — Segmentation involving thresholding
    • G06T 7/194 — Segmentation involving foreground-background separation
    • G06T 2207/10004 — Still image; photographic image
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30108 — Industrial image inspection
    • G06T 2207/30132 — Masonry; concrete
    • Y02P 90/30 — Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image object detection, and particularly relates to a bridge crack instance segmentation method based on Faster R-CNN, comprising the following steps: step one, constructing a bridge crack data set; step two, labeling training samples; step three, constructing a bridge crack instance segmentation model that improves Faster R-CNN; step four, training the instance segmentation model built in step three; step five, testing the instance segmentation model trained in step four; step six, actual detection. Compared with the prior art, the method is more robust: it not only obtains accurate bridge crack classification and localization results, but also generates a high-quality bridge crack segmentation mask for evaluating the damage degree of the bridge and formulating a corresponding maintenance scheme. In addition, the method accurately detects multiple cracks in an image, so that, combined with image stitching, detection efficiency can be improved and the complete crack morphology obtained.

Description

Bridge crack instance segmentation method based on Faster R-CNN
Technical Field
The invention belongs to the technical field of image object detection, and particularly relates to a bridge crack instance segmentation method based on Faster R-CNN.
Background
Bridges are important carriers connecting two points across a large span and play an important role in road transportation in China. However, under long-term exposure to sun and rain and under sustained traffic loads, internal stresses are transferred along the bridge structure to its weak parts, where surface cracks readily initiate and propagate. Surface cracks of different orientations damage the structure to different degrees; cracks extending perpendicular to the load-bearing surface of the structure are particularly liable to cause safety accidents.
Engineering practice and theoretical analysis show that most in-service bridges work with cracks, and the potential hazards cracks cause are far from negligible. Once a concrete bridge develops serious cracks, outside air and harmful media easily penetrate the concrete and react chemically to form carbonates, lowering the alkaline environment around the reinforcing bars; once the passivation film on the bar surface is damaged, the bars rust much more easily. In addition, concrete carbonation aggravates shrinkage cracking, seriously endangering the safe use of the concrete bridge. Cracks are the most common defect in bridge engineering: extremely fine cracks (less than 0.05 mm) generally have little influence on structural performance and can be tolerated; larger cracks continue to grow and propagate under load or external physical and chemical actions, forming through cracks and deep cracks that indirectly or even directly affect the service life and safety of the girder structure; and once the crack width exceeds 0.3 mm, the integrity of the structure is directly compromised, causing concrete carbonation, spalling of the protective layer and corrosion of the reinforcement, and creating a mechanical discontinuity in the bridge that greatly reduces its bearing capacity, in severe cases leading to collapse and endangering normal use of the structure.
It is therefore desirable to discover cracks as soon as they appear and to estimate the associated safety factor, so that maintenance can begin before the risk develops. The traditional detection method based on human vision is costly and inefficient, its accuracy is affected by subjective factors, and it increasingly fails to meet the demands of bridge crack detection. Existing detection methods based on the Faster R-CNN technique only mark cracks in an image with rectangular boxes (see, e.g., application No. 201910526241.2); they cannot directly extract the morphological characteristics of a crack, i.e., they cannot intuitively reveal its damage degree. Moreover, such methods basically detect a single (local) picture at a time, so detection efficiency is low and the detected crack is incomplete.
In view of the above, the present inventors, after extensive experimental research and study, provide a bridge crack instance segmentation method based on Faster R-CNN to solve the above problems.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a bridge crack instance segmentation method based on Faster R-CNN which, by improving the Faster R-CNN model, not only frames the position of a crack on the original image but also generates a high-quality bridge crack segmentation mask following the true form of the crack. In addition, the method accurately detects multiple cracks in an image, so that, combined with image stitching, detection efficiency can be improved and the complete crack morphology obtained.
The technical problems to be solved by the invention are addressed by the following technical scheme. The invention provides a bridge crack instance segmentation method based on Faster R-CNN, comprising the following steps:
step one, constructing a bridge crack data set
1) Normalizing the acquired bridge crack images to a resolution of 256×256;
2) Augmenting the number of normalized bridge crack image samples using geometric transformations, linear transformations and image filtering algorithms;
3) Dividing the augmented bridge crack data into a training set, a test set and a validation set;
step two, labeling training samples
Labeling the training set samples divided in step one;
step three, constructing a bridge crack instance segmentation model that improves Faster R-CNN
1) Adding a crack mask branch;
2) Replacing region-of-interest pooling with region-of-interest alignment;
3) Adding a predicted crack mask intersection-over-union (IoU) branch;
step four, training the instance segmentation model built in step three
Training the bridge crack instance segmentation model of the improved Faster R-CNN built in step three with the training samples labeled in step two;
step five, testing the instance segmentation model trained in step four
After training, testing the trained bridge crack instance segmentation model with the test set samples from step one, the test set samples being used to verify the robustness of the improved Faster R-CNN bridge crack instance segmentation model;
step six, actual detection
Inputting the bridge crack image to be identified into the tested bridge crack instance segmentation model, judging whether the image contains a bridge crack and, if so, framing the position of the crack and generating a bridge crack segmentation mask.
Further, the specific process of step 1) in step three is as follows:
the crack mask branch is a small fully convolutional network applied to each crack region of interest, predicting the bridge crack segmentation mask pixel-to-pixel, with the positive regions selected by the region-of-interest classifier as input to the crack mask branch network;
a corresponding soft mask represented by floating-point numbers is then generated, in parallel with the branches for crack classification and bounding-box regression in the Faster R-CNN network.
Further, the specific process of step 2) in step three is as follows:
first, neither the region of interest nor its spatial bins are quantized, avoiding deviation between the extracted features and the input;
then, bilinear interpolation is used to compute exact values of the input features at four regularly sampled positions in each region-of-interest bin;
finally, the result is aggregated by taking the maximum or the average.
Further, the specific process of step 3) in step three is as follows:
the predicted crack mask IoU branch describes the initial crack segmentation quality using the pixel-level intersection-over-union between the predicted crack mask and its matched ground-truth crack mask; the concatenation of the region-of-interest alignment features and the predicted crack mask serves as input to the predicted crack mask IoU part, whose output is the IoU between the predicted crack mask and the matched ground-truth mask;
for the concatenation, a max pooling layer with kernel size 2 and stride 2 is used so that the predicted crack mask has the same spatial size as the region-of-interest features;
the predicted crack mask IoU branch consists of 4 convolutional layers and 3 fully connected layers; for the 4 convolutional layers, the kernel size and number of filters of all layers are set to 3 and 256, respectively, following the mask branch baseline; for the 3 fully connected layers, the regional convolutional neural network baseline is followed, the outputs of the first two fully connected layers are set to 1024, and the output of the final fully connected layer is set to the number of classes.
Further, the overall loss function of the bridge crack instance segmentation model in step three is:
$L = L_{cls} + L_{box} + L_{mask}$ (1)
where $L_{cls}$ is the loss of the classification branch, $L_{box}$ is the loss of bounding-box regression, and $L_{mask}$ is the loss of the crack mask branch.
Further, the training process in step four specifically comprises:
1) The training samples labeled in step two are fed into the bridge crack instance segmentation model of the improved Faster R-CNN built in step three and are first processed by the backbone network, a residual network 101, which initially extracts crack features;
2) The initially extracted crack features are input into a feature pyramid network for further processing, so that cracks can be represented with enhanced features at multiple scales;
3) The region proposal network, a small neural network, processes the image in a sliding-window fashion, finds regions in which a target may exist and generates target candidate boxes; a softmax function then judges each candidate box as foreground or background, and bounding-box regression corrects the anchor boxes, i.e., further refines the candidate boxes; finally, based on the candidate boxes of the region proposal network, bounding boxes accurately containing cracks are selected and their position and size fine-tuned to obtain the target candidate boxes of the detection result, while backward propagation is completed;
4) Finally, region-of-interest alignment extracts features from each candidate box; the crack classification branch of the network performs crack classification, the bounding-box regression branch generates the crack bounding box, and the crack mask branch generates the crack mask, completing the instance segmentation of the bridge crack.
Further, the data set in step one is divided in the ratio training set : test set : validation set = 10:1:1.
Compared with the prior art, the invention has the following beneficial effects:
1. In the bridge crack instance segmentation method based on Faster R-CNN, because bridge cracks are not simple in form, crack images have complex backgrounds, and noise interference is heavy, traditional digital image processing algorithms and shallow machine learning algorithms cannot detect cracks robustly, whereas the deep visual structural features learned by a deep-learning crack detection method improve detection results. However, an object detection model in deep learning (such as Faster R-CNN) only detects cracks in the original image and marks their positions with rectangular boxes; the true form of the crack cannot be obtained intuitively, so its damage degree cannot be evaluated or handled subsequently. The invention therefore adds a crack mask branch so that a high-quality mask following the true crack form is generated; it replaces region-of-interest pooling with region-of-interest alignment, solving the pooling misalignment problem and accurately preserving exact spatial positions; and it adds a predicted crack mask IoU branch, solving the problem that bridge crack instance segmentation may obtain accurate box-level localization and high classification scores while the corresponding bridge crack mask is inaccurate, so that an accurate mask is finally generated. The method achieves accurate detection of bridge cracks and is more robust than the prior art.
2. After the bridge crack image is detected, the damage degree of the bridge can be evaluated according to the generated high-quality bridge crack segmentation mask, and the extracted cracks are further quantified in information, so that a reliable reference data index can be provided for maintenance and management of the bridge, and defects can be found and processed in time to avoid accidents. The method has important significance for ensuring the safe operation and the prolonged service life of the bridge, and can divide the level of the damage degree of the bridge crack through detection data (such as the area of the bridge crack or the width of the corresponding crack), so that the bridge defect can be found as early as possible, the bridge defect can be repaired and reinforced in time, the bridge repair cost can be saved, and the comprehensive economic benefit of the bridge operation period can be improved.
3. According to the bridge crack example segmentation method based on the fast R-CNN, after multiple experiments prove that the image stitching technology is utilized to stitch a plurality of crack images into a plurality of crack images, the method can still be used for accurately detecting cracks, and the time of use is equivalent to that of a single crack, so that the detection efficiency can be greatly improved; in addition, as the acquisition of the bridge crack image is local, after the image splicing technology is adopted, the more complete crack condition can be seen by detecting the bridge crack image by using the method, and the method is beneficial to complete analysis of the actual condition of the crack.
Drawings
FIG. 1 is a flow chart of the steps of the method of the present invention;
FIG. 2 shows augmented versions of part of the experimental bridge crack images of the present invention;
FIG. 3 shows instance segmentation labeling results for part of the experimental bridge crack images of the present invention;
FIG. 4 is a schematic diagram illustrating the operation of the region-of-interest alignment layer of the present invention;
FIG. 5 shows the different design choices for the PCMIoU input of the present invention;
FIG. 6 is a diagram of the bridge crack instance segmentation model of the present invention;
FIG. 7 is a diagram of the feature pyramid network of the present invention;
FIG. 8 is a simplified diagram of feature anchors of the present invention;
FIG. 9 is a comparison chart for the PCMIoU module of the present invention;
FIG. 10 compares crack detection results using different data sets in accordance with the present invention;
FIG. 11 shows the multi-crack detection results of the present invention;
FIG. 12 shows the multi-crack detection results on a stitched image of the present invention.
Detailed Description
In order to make the technical problems solved, the technical scheme and the beneficial effects of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limiting the invention.
Through extensive experimental demonstration, the inventors have developed a bridge crack instance segmentation method based on Faster R-CNN that solves the problem that traditional digital image processing algorithms and shallow machine learning algorithms cannot detect bridge cracks well; it obtains accurate bridge crack classification and localization results and generates a high-quality bridge crack segmentation mask.
The present invention is described in further detail below with reference to the examples and the accompanying drawings.
Example: as shown in FIG. 1, the invention provides a bridge crack instance segmentation method based on Faster R-CNN, comprising the following steps:
step one, constructing a bridge crack data set
1) First, the 2000 acquired bridge crack images are normalized to a resolution of 256×256;
2) Augmenting the number of normalized bridge crack image samples using geometric transformations, linear transformations and image filtering algorithms, as sketched below;
specifically, to keep the numbers of the various types of bridge crack images in the data set balanced, each type — cracked, net-like, transverse, longitudinal, pit-and-groove, and crack-free bridge images — is processed. Bridge crack images expanded by a series of digital image processing algorithms (including geometric transformations, linear transformations, image filtering and the like) are shown in FIG. 2. Expanding the data set in this way does not affect the instance segmentation of bridge crack images, and the expanded bridge crack image data set finally reaches 24000 crack images;
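As an illustration, the three augmentation families named above can be sketched as follows (a minimal Python/OpenCV sketch; the transform parameters and the file name are illustrative assumptions, since the patent does not disclose exact values):

```python
import cv2

def augment(img):
    """Return augmented variants of one normalized 256x256 crack image."""
    out = []
    # Geometric transformations: flips and a 90-degree rotation.
    out.append(cv2.flip(img, 1))                          # horizontal flip
    out.append(cv2.flip(img, 0))                          # vertical flip
    out.append(cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE))
    # Linear intensity transformation: g(x) = alpha * f(x) + beta.
    out.append(cv2.convertScaleAbs(img, alpha=1.2, beta=10))
    # Image filtering: Gaussian and median filters.
    out.append(cv2.GaussianBlur(img, (3, 3), 0))
    out.append(cv2.medianBlur(img, 3))
    return out

img = cv2.resize(cv2.imread("crack.jpg"), (256, 256))     # normalization step
samples = augment(img)
```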
3) Dividing the augmented bridge crack data into a training set, a test set and a validation set;
this embodiment divides the augmented bridge crack data in the ratio training set : test set : validation set = 10:1:1, i.e., 20000 training samples, 2000 test samples and 2000 validation samples.
Step two, labeling training samples
The training set samples divided in step one are labeled. During labeling, each training image is annotated with the open-source image annotation tool LabelMe; attention is paid to the cracks in the image, each crack is accurately marked, and the annotations are named in sequence Bridgecrack1, Bridgecrack2, Bridgecrack3, and so on. After each image is labeled, a corresponding json file holding the label information is generated. For the image instance segmentation task, however, the labels must be image files in .png/.bmp or similar formats. A single json file can be converted by executing the conversion command, which generates from the json file a folder containing 5 files: img.png (the original image), info.yaml, label.png, label_names.txt and label_viz.png, as shown in FIG. 3. In practice, because the training set is large, the conversion is done in batches by code, and the manually labeled image files are collected from the folders in batches.
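As an illustration, the batch conversion can be done by looping the LabelMe package's single-file command over every annotation (a minimal sketch; the folder name is an assumption):

```python
import glob
import subprocess

# Run "labelme_json_to_dataset xxx.json" for every annotation file; each
# run produces a folder with img.png, label.png, label_viz.png,
# label_names.txt and info.yaml, from which label.png is collected.
for json_file in glob.glob("train_annotations/*.json"):
    subprocess.run(["labelme_json_to_dataset", json_file], check=True)
```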
Step three, constructing a bridge crack instance segmentation model that improves Faster R-CNN
1) Adding a crack mask branch
After the input image passes through the convolutional layers, the Faster R-CNN model comprises two stages: the region proposal network is the first stage and outputs candidate bounding boxes, while the second stage, identical in part to the Fast R-CNN model, uses region-of-interest pooling to extract features from each candidate box and performs classification and bounding-box regression. Because the two stages share the same convolutional neural network, their inputs are the same feature map, which speeds up the network. The bridge crack instance segmentation model keeps the same first stage, namely the region proposal network. In the second stage, however, in addition to the crack label and the crack bounding-box position, a binary mask is output for each crack region of interest; the branch that outputs this crack binary mask is called the crack mask branch, parallel to the crack classification branch and the bounding-box regression branch.
The crack mask branch is a small fully convolutional network applied to each crack region of interest. It predicts the bridge crack segmentation mask pixel-to-pixel, takes the positive regions selected by the region-of-interest classifier as input, and then generates the corresponding soft mask represented by floating-point numbers, in parallel with the existing crack classification and bounding-box regression branches of the Faster R-CNN model.
When training the model, the invention defines a multi-task loss function on each sampled region of interest; the overall loss function of the model is:
$L = L_{cls} + L_{box} + L_{mask}$ (1)
where $L_{cls}$ is the loss of the classification branch, $L_{box}$ is the loss of bounding-box regression, and $L_{mask}$ is the loss of the crack mask branch. The losses of the classification branch and of bounding-box regression are defined as in Faster R-CNN. After the crack mask branch, each region of interest of the image has a $Km^2$-dimensional output; that is, the segmentation mask branch generates K binary masks of resolution m×m for each region of interest, one mask per class, with K denoting the number of classes. When computing the loss, the invention applies a sigmoid to each pixel and defines $L_{mask}$ as the average binary cross-entropy loss. For each region of interest associated with ground-truth class k, $L_{mask}$ is defined only on the k-th mask, since the outputs of the remaining masks cause little loss and are negligible.
The invention's definition of $L_{mask}$ allows the model to generate a binary mask for each crack independently, avoiding competition among cracks of different classes. The invention uses the crack classification branch to determine the class of the output crack mask, so that mask prediction and class prediction do not interfere with each other. This differs markedly from the usual practice of image semantic segmentation with fully convolutional networks, which applies a per-pixel softmax and a multinomial cross-entropy loss; such an operation makes masks of different cracks compete with one another. The invention therefore uses the per-pixel sigmoid with a binary loss in place of the multinomial cross-entropy loss. Verification shows that this treatment obtains high-quality segmentation results.
A mask encodes the spatial layout of its input object. Thus, unlike class labels or box offsets, which are inevitably collapsed into short output vectors by fully connected layers, the spatial structure of a mask can be extracted naturally through the pixel-to-pixel correspondence provided by convolutions. Specifically, the invention predicts an m×m mask for each region of interest with a fully convolutional network. In this way, each layer in the crack mask branch maintains the explicit m×m object spatial layout without collapsing it into a vector representation that loses spatial dimensions. Unlike methods that use fully connected layers for mask prediction, the fully convolutional representation of the invention requires fewer parameters, and the experimental results are also more accurate.
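As an illustration, the mask loss described above can be sketched as follows (a minimal PyTorch sketch, not the patent's own code; the tensor shapes are assumptions consistent with the text — K class masks of resolution m×m per region of interest, with the loss taken only on the ground-truth class):

```python
import torch
import torch.nn.functional as F

def crack_mask_loss(mask_logits, gt_masks, gt_classes):
    """L_mask: per-pixel sigmoid + average binary cross-entropy,
    computed only on the mask of the ground-truth class k.

    mask_logits: (N, K, m, m) raw mask outputs, one per class
    gt_masks:    (N, m, m)    binary ground-truth masks
    gt_classes:  (N,)         ground-truth class index per RoI
    """
    idx = torch.arange(mask_logits.size(0), device=mask_logits.device)
    selected = mask_logits[idx, gt_classes]  # keep only the k-th mask per RoI
    return F.binary_cross_entropy_with_logits(selected, gt_masks.float())
```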
2) Replacing region-of-interest pooling with region-of-interest alignment
The bridge crack instance segmentation model extends the Faster R-CNN model, and correctly constructing the crack mask branch is essential for finally obtaining accurate bridge crack instance segmentation results. However, the Faster R-CNN model was not designed to guarantee pixel-to-pixel alignment between its input and output: it employs region-of-interest pooling (RoIPool), a coarse spatial quantization, for feature extraction. Pixel-to-pixel methods require the region-of-interest features to be well aligned so as to accurately preserve explicit per-pixel spatial correspondence, which is critical for outputting high-quality crack masks. To solve the misalignment of region-of-interest pooling so that the model can output high-quality bridge crack masks, the invention adopts a simple quantization-free layer called region-of-interest alignment (RoIAlign), which accurately preserves exact spatial positions.
Because the region-of-interest pooling layer is not aligned pixel by pixel, it has little effect on the final crack bounding-box candidates but a large effect on the accuracy of the output crack mask. To solve this problem, the invention adopts a region-of-interest alignment layer, whose key point is that no strict quantization is performed, avoiding deviation between the extracted features and the input. The invention does not quantize the region of interest or its spatial bins; unlike region-of-interest pooling, it uses the exact value x/16 rather than the rounded [x/16]. Bilinear interpolation is then used to compute exact values of the input features at four regularly sampled positions in each region-of-interest bin, and the final result is aggregated by maximum or average. The detailed operation is shown in FIG. 4, in which the dashed grid represents the feature map, the solid lines a region of interest (2×2 bins in this embodiment), and the dots the 4 sampling points in each bin. Region-of-interest alignment computes the value of each sampling point by bilinear interpolation from the adjacent grid points of the feature map, without quantizing any coordinates of the region-of-interest bins or sampling points. As long as no quantization is performed, the result is largely insensitive to the exact sampling locations or the number of samples.
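For illustration, this non-quantized pooling is available as `roi_align` in torchvision; the sketch below (shapes and scale are illustrative assumptions) shows how fractional box coordinates are kept un-rounded:

```python
import torch
from torchvision.ops import roi_align

# Feature map and one region of interest with fractional coordinates;
# boxes are given as (batch_index, x1, y1, x2, y2) in image coordinates.
features = torch.randn(1, 256, 64, 64)                   # assumed backbone map
boxes = torch.tensor([[0.0, 10.3, 15.7, 120.9, 200.2]])  # no rounding anywhere
pooled = roi_align(features, boxes, output_size=(7, 7),
                   spatial_scale=1 / 16,   # exact x/16, not the quantized [x/16]
                   sampling_ratio=2)       # 2x2 = 4 regular sample points per bin
# pooled: (1, 256, 7, 7); every value comes from bilinear interpolation, so
# the pixel correspondence between input and output is preserved exactly.
```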
3) Adding a predicted crack mask intersection-over-union (IoU) branch
In instance segmentation studies, the confidence of instance classification is mostly used directly as the quality score of the mask. However, mask quality is usually quantified as the IoU between the instance mask and its ground truth, which generally correlates poorly with the classification score; as a result, an instance segmentation network may achieve accurate box-level localization and high classification scores while the corresponding mask is inaccurate. To solve this problem for bridge cracks, the invention introduces a network that predicts the crack mask IoU, called Predict Crack Mask Intersection-over-Union (PCMIoU), to learn the quality of the predicted crack instance segmentation mask.
Unlike previous approaches that aim at more accurate instance localization or segmentation masks, PCMIoU focuses on scoring the mask. To this end, the network learns a score for each crack mask instead of using its classification score. For ease of distinction, the invention calls the score learned by PCMIoU the crack mask score. Inspired by the Average Precision (AP) metric of instance segmentation, PCMIoU is a network model that directly learns the crack mask IoU, describing the initial crack segmentation quality by the pixel-level IoU between the predicted crack mask and its ground-truth crack mask. Once the predicted crack mask IoU is obtained at test time, the crack mask score is re-evaluated by multiplying the predicted mask IoU with the classification score, so that the crack mask score reflects both the crack semantic class and the completeness of the crack instance mask.
The invention defines $S_{mask}$ as the score of the predicted crack mask. Ideally, $S_{mask}$ equals the pixel-level IoU between the predicted crack mask and its matched ground-truth mask, and should be positive only for the ground-truth class and zero for the other classes, since a mask belongs to only one class. This requires the crack mask score to perform well on two tasks: first, classifying the mask into the correct category; second, regressing the candidate mask IoU for the foreground class. It is difficult to train both tasks with a single objective function, so for simplicity the invention decomposes mask score learning into crack mask classification and mask IoU regression; for all crack classes, the score of each predicted crack mask is expressed by formula (2):
$S_{mask} = S_{cls} \cdot S_{iou}$ (2)
where $S_{cls}$ is the score of classifying the prediction into its class; since this goal is already accomplished in the classification task of the regional convolutional neural network stage, the corresponding classification score is used directly. $S_{iou}$ is the regressed crack mask IoU score.
The goal of the PCMIoU network part is to regress the IoU between the predicted crack mask and its ground-truth crack mask. The invention uses the concatenation of the region-of-interest alignment features and the predicted crack mask as input to the PCMIoU part, whose output is the IoU between the predicted crack mask and the matched ground-truth mask. For the concatenation, a max pooling layer with kernel size 2 and stride 2 is used so that the predicted crack mask has the same spatial size as the region-of-interest features. The invention regresses the crack mask IoU only for the ground-truth class rather than for all classes. The PCMIoU part consists of 4 convolutional layers and 3 fully connected layers: for the 4 convolutional layers, the kernel size and number of filters of all layers are set to 3 and 256, respectively, following the mask branch baseline; for the 3 fully connected layers, the regional convolutional neural network baseline is followed, with the outputs of the first two fully connected layers set to 1024 and the output of the last set to the number of classes.
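A possible PyTorch rendering of this branch is sketched below (a sketch under assumptions, not the patent's code: 14×14 RoIAlign features, a 28×28 predicted mask, and one stride-2 convolution to reach the 7×7 input of the first fully connected layer, consistent with the mask-scoring design this part follows):

```python
import torch
import torch.nn as nn

class PCMIoUHead(nn.Module):
    """Predicted crack mask IoU branch: max-pool the 28x28 mask to 14x14,
    concatenate it with the RoI features, then 4 conv layers (3x3, 256
    filters) and 3 fully connected layers (1024, 1024, num_classes)."""

    def __init__(self, num_classes, in_channels=256):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # kernel 2, stride 2
        layers, ch = [], in_channels + 1                   # RoI feats + 1 mask channel
        for i in range(4):
            stride = 2 if i == 3 else 1                    # shrink 14x14 -> 7x7 once
            layers += [nn.Conv2d(ch, 256, 3, stride=stride, padding=1),
                       nn.ReLU(inplace=True)]
            ch = 256
        self.convs = nn.Sequential(*layers)
        self.fcs = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),                  # one IoU per class
        )

    def forward(self, roi_feats, mask_pred):
        # roi_feats: (N, 256, 14, 14); mask_pred: (N, 1, 28, 28)
        x = torch.cat([roi_feats, self.pool(mask_pred)], dim=1)
        return self.fcs(self.convs(x))
```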
When training the PCMIoU network part, the invention uses the candidates of the region proposal network as training samples. As for the mask branch baseline, a training sample must have an IoU greater than 0.5 between its prediction box and the matched ground-truth box. To generate the regression target of each training sample, the invention first takes the predicted mask of the target class and binarizes it with a threshold of 0.5; the IoU between this binary mask and its matched ground-truth mask is then used as the crack mask IoU target. The invention regresses the crack mask IoU with an $L_2$ loss whose weight is set to 1, integrates this branch into the constructed bridge crack instance segmentation model, and trains the whole network end to end.
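The regression target described above can be sketched as follows (a minimal sketch assuming both masks share the same resolution):

```python
import torch

def mask_iou_target(pred_mask, gt_mask):
    """Binarize the target-class predicted mask at 0.5, then return the
    pixel-level IoU with the matched ground-truth mask."""
    binary = (pred_mask >= 0.5).float()
    inter = (binary * gt_mask).sum()
    union = binary.sum() + gt_mask.sum() - inter
    return inter / union.clamp(min=1.0)

# The head output is then regressed to this target with an L2 loss of
# weight 1, e.g. loss = ((s_iou - target) ** 2).mean()
```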
The design of the PCMIoU network module was chosen from the four options shown in FIG. 5 for fusing the crack mask score map (28×28×C, predicted by the mask branch baseline) with the region-of-interest features.
The four designs are explained as follows:
(a) Target mask concatenated with region-of-interest features: the score map of the target class is taken, max-pooled, and concatenated with the region-of-interest features.
(b) Target mask multiplied by region-of-interest features: the score map of the target class is taken, max-pooled, and multiplied with the region-of-interest features.
(c) All masks concatenated with region-of-interest features: the mask score maps of all C classes are max-pooled and concatenated with the region-of-interest features.
(d) Target mask concatenated with high-resolution region-of-interest features: the score map of the target class is taken and concatenated with 28×28 region-of-interest features.
The invention uses the COCO evaluation metric AP, averaged over IoU thresholds, and also reports AP@0.5 and AP@0.75, where AP@0.5 (or AP@0.75) means that an IoU threshold of 0.5 (or 0.75) is used to decide whether a predicted bounding box or mask is counted as positive in the evaluation. The AP of the invention is evaluated using the crack mask IoU, and the results of the four designs are shown in Table 1 below:
Table 1: Results of different PCMIoU input design choices
The comparison shows that the performance of the PCMIoU network module is robust to the different ways of fusing the crack mask prediction with the region-of-interest features: a performance improvement is observed for every design. The invention takes the target score map concatenated with the region-of-interest features as the default choice, since it achieves the best results.
The finally constructed bridge crack instance segmentation model is shown in FIG. 6. The model comprises two stages: the first stage processes the input image and generates candidate boxes, and the second stage classifies the candidate boxes and produces accurate bounding boxes and high-quality masks.
Step four, training the instance segmentation model built in step three
The method comprises the following steps: 1) The training samples labeled in step two are fed into the improved Faster R-CNN bridge crack instance segmentation model built in step three and are first processed by the backbone network. The backbone is a Residual Network (ResNet) 101, a standard convolutional neural network serving as the feature extractor that initially extracts crack features;
the residual network is chosen because it adds skip connections over the standard feed-forward convolutional network that bypass some layers, each bypass producing a residual block in which the convolutional layers predict the residual of the input tensor. Plain deep feed-forward networks are hard to optimize, but the skip connections of the residual network — shortcuts that add the input to the summed output of the convolutional layers — make forward and backward propagation of information through the network smoother and alleviate the degradation problem of deep neural networks. The residual network adopted by the invention is ResNet-101, where 101 refers to its 101 weighted layers, comprising convolutional layers and fully connected layers but not pooling layers or batch normalization layers. One such residual block is sketched below.
2) The initially extracted crack features are input into a feature pyramid network (FPN) for further processing, so that cracks can be represented with enhanced features at multiple scales;
the introduction of the feature pyramid network by the present invention on the basis of the ResNet-101 network further extends the model of the present invention to enable the enhancement of the ability to represent targets on multiple scales. Compared to the standard feature extraction pyramid, the FPN of the present invention improves feature extraction performance by adding a second pyramid, which has the effect of selecting high-level features from the first pyramid and passing them on to lower layers, as shown in fig. 7, where the features of each level can be combined with the high-level and low-level features, and the FPN introduces additional complexity: the second pyramid in the FPN has a feature map that contains features at each level, rather than a single backbone feature map in the standard backbone (i.e., the highest level in the first pyramid), which level of features is selected is dynamically determined by the size of the target.
3) Region proposal network processing follows: this small neural network processes the image in a sliding-window fashion, finds regions in which a target may exist and generates target candidate boxes; a softmax function then judges each candidate box as foreground or background, and bounding-box regression corrects the anchor boxes, i.e., further refines the candidate boxes. Finally, based on the candidate boxes of the region proposal network, bounding boxes accurately containing cracks are selected and their position and size fine-tuned, as shown in FIG. 8. If several mutually overlapping candidate boxes exist, non-maximum suppression keeps the candidate box with the highest foreground score and discards the rest, yielding the target candidate boxes of the detection result while backward transmission is completed, which concludes the first stage of the model.
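The overlap-handling step at the end of the region proposal network can be sketched with torchvision's non-maximum suppression (the boxes and scores below are illustrative):

```python
import torch
from torchvision.ops import nms

boxes = torch.tensor([[10., 10., 120., 60.],
                      [12., 11., 118., 62.],     # overlaps the first box
                      [200., 40., 250., 200.]])
fg_scores = torch.tensor([0.92, 0.85, 0.70])     # softmax foreground scores
keep = nms(boxes, fg_scores, iou_threshold=0.7)  # keeps highest-scoring boxes
proposals = boxes[keep]                          # -> boxes 0 and 2 survive
```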
4) Finally, region-of-interest alignment extracts features from each candidate box; the crack classification branch of the network performs crack classification, the bounding-box regression branch generates the crack bounding box, and the crack mask branch generates a high-quality crack mask, completing the instance segmentation of the bridge crack.
Since a typical classifier can only handle a fixed input size, while the region-of-interest boxes obtained above have different sizes, this problem must be solved before the regions can be processed further to generate the network outputs. The invention solves it with region-of-interest alignment, which crops a region of the feature map and then resizes the cropped part to the required resolution — an operation similar to the crop-and-resize of image software, differing only in implementation details. Concretely, the feature map is sampled at different points: bilinear interpolation computes exact values of the input features at four regularly sampled positions in each region-of-interest bin, and the final result is aggregated by maximum or average, thereby accurately preserving exact spatial positions. After the region-of-interest alignment layer, the classifier, the bounding-box regressor and the crack mask branch classify the candidate boxes and generate bounding boxes and high-quality crack masks, finally completing the instance segmentation of the bridge crack.
Step five, testing the instance segmentation model trained in the step four
After training, the improved Faster R-CNN bridge crack instance segmentation model is tested with the test set samples from step one; the test set samples are used to verify the robustness of the model;
step six, actual detection
The bridge crack image to be identified is input into the tested bridge crack instance segmentation model, which judges whether the image contains a bridge crack and, if so, frames the position of the crack and generates a segmentation mask marking the true area of the crack shape.
To verify the feasibility of the invention, the inventors devised the following three sets of comparative experiments to verify from different angles:
the first set of experiments are used for verifying whether the network module PCMIoU for predicting the intersection ratio of the crack masks is added to the bridge crack example segmentation model or not, and the experimental result is shown in fig. 9.
To further verify the crack detection effect of the bridge crack instance segmentation model, the invention quantitatively evaluates detection with Precision, Recall, F1-Score and the classification score. The crack precision index Pre and the crack recall index Rec are computed by formulas (3) and (4):
$Pre = \frac{TP}{TP + FP}$ (3)
$Rec = \frac{TP}{TP + FN}$ (4)
where TP, FP and FN denote the numbers of true-positive, false-positive and false-negative crack detections, respectively. The F1 score is the harmonic mean of precision and recall, computed by formula (5):
$F1 = \frac{2 \cdot Pre \cdot Rec}{Pre + Rec}$ (5)
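Formulas (3)–(5) translate directly into code (a trivial sketch; the TP, FP and FN counts are assumed given):

```python
def precision_recall_f1(tp, fp, fn):
    """Implements formulas (3)-(5): precision, recall and F1 score from
    true-positive, false-positive and false-negative detection counts."""
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    return pre, rec, f1
```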
after calculation, the evaluation results are shown in the following table 2:
Table 2: Effect of the PCMIoU network module on the results
The experimental data in Table 2 show that the precision, recall, F1 score and classification score of crack detection are all clearly improved after the PCMIoU network module is added, indicating that adding the PCMIoU module greatly improves the classification, the generated localization boxes and the crack masks of the bridge crack instance segmentation network model.
The second set of experiments tests the influence of the bridge crack data set augmentation method on the accuracy of the bridge crack instance segmentation model of the invention. The detailed procedure is as follows. First, the model built by the invention is trained with the 2000 acquired original bridge crack images, still divided by a fixed ratio before training; in this experiment the data set is divided into 1500 training images, 300 test images and 200 validation images. Second, the model is trained for the same number of iterations with the mixed training samples of 20000 images drawn from the originally selected and the augmented bridge crack images. Third, the models after the two trainings are tested separately; the results, shown in FIG. 10, indicate that the accuracy of instance segmentation after training on the augmented data set is greatly improved, with clearly better crack localization boxes and crack mask quality. It follows that, given a reasonably constructed model, the quality of the final detection largely depends on the number of training samples.
The third set of experiments verifies the detection effect of the bridge crack instance segmentation model on multiple cracks. The results, shown in FIG. 11, demonstrate that the model is also robust to multiple cracks and accurately detects all cracks in an image. Therefore, to improve detection efficiency, image stitching (prior art) is adopted: several images to be detected are combined into one large image, which is then detected by the method. The experimental result, shown in FIG. 12, confirms that crack detection remains accurate, and the time to detect the stitched image is essentially the same as that for a single image. Detection of several bridge crack images can thus be finished in the time needed for one, so combining the bridge crack instance segmentation model with image stitching greatly improves bridge crack detection efficiency. In summary, the invention provides a bridge crack instance segmentation method based on Faster R-CNN that improves the traditional Faster R-CNN technique to achieve accurate extraction and localization of bridge cracks: the position of a crack is accurately framed on the original image, and a high-quality segmentation mask is generated from the true form of the crack, achieving the combined effect of object detection and semantic segmentation. Combined with image stitching, the complete crack condition is obtained; the damage degree of the bridge can be evaluated from the generated bridge crack mask; and the extracted cracks are further quantified, providing reliable reference data for bridge maintenance and management so that defects are found and handled in time, hidden dangers are eliminated, and safe, stable operation of the bridge is ensured.
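For illustration, the stitching step can be as simple as tiling the locally acquired images into one large input (a minimal OpenCV sketch; the file names are hypothetical, and the patent only states that an existing stitching technique is used):

```python
import cv2

tiles = [cv2.imread(p) for p in ("crack_0.jpg", "crack_1.jpg",
                                 "crack_2.jpg", "crack_3.jpg")]
top = cv2.hconcat(tiles[:2])           # all tiles are 256x256 after step one
bottom = cv2.hconcat(tiles[2:])
stitched = cv2.vconcat([top, bottom])
cv2.imwrite("stitched.jpg", stitched)  # one input, one detection pass
```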
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (7)

1. A bridge crack instance segmentation method based on Faster R-CNN, characterized by comprising the following steps:
step one, constructing a bridge crack data set
1) normalizing the acquired bridge crack images to a resolution of 256×256;
2) augmenting the number of normalized bridge crack image samples using geometric transformations, linear transformations and image filtering algorithms;
3) dividing the augmented bridge crack data into a training set, a test set and a validation set;
step two, labeling training samples
labeling the training set samples divided in step one;
step three, constructing a bridge crack instance segmentation model that improves Faster R-CNN
1) adding a crack mask branch;
2) replacing region-of-interest pooling with region-of-interest alignment;
3) adding a predicted crack mask intersection-over-union (IoU) branch;
step four, training the instance segmentation model built in step three
training the bridge crack instance segmentation model of the improved Faster R-CNN built in step three with the training samples labeled in step two;
step five, testing the instance segmentation model trained in step four
after training, testing the trained bridge crack instance segmentation model with the test set samples from step one, the test set samples being used to verify the robustness of the improved Faster R-CNN bridge crack instance segmentation model;
step six, actual detection
inputting the bridge crack image to be identified into the tested bridge crack instance segmentation model, judging whether the image contains a bridge crack and, if so, framing the position of the crack and generating a bridge crack segmentation mask.
2. The bridge crack instance segmentation method based on Faster R-CNN according to claim 1, characterized in that the specific process of step 1) in step three is as follows:
the crack mask branch is a small fully convolutional network applied to each crack region of interest, predicting the bridge crack segmentation mask pixel-to-pixel, with the positive regions selected by the region-of-interest classifier as input to the crack mask branch network;
a corresponding soft mask represented by floating-point numbers is then generated, in parallel with the branches for crack classification and bounding-box regression in the Faster R-CNN model.
3. The bridge crack instance segmentation method based on Faster R-CNN according to claim 1, characterized in that the specific process of step 2) in step three is as follows:
first, neither the region of interest nor its spatial bins are quantized, avoiding deviation between the extracted features and the input;
then, bilinear interpolation is used to compute exact values of the input features at four regularly sampled positions in each region-of-interest bin;
finally, the result is aggregated by taking the maximum or the average.
4. The Faster R-CNN-based bridge crack instance segmentation method according to claim 1, wherein the specific process of step three 3) is as follows:
the predicted crack mask intersection-over-union branch describes the initial crack segmentation quality by the pixel-level intersection-over-union between the predicted crack mask and its corresponding ground-truth crack mask; the concatenation of the features of the region-of-interest alignment layer and the predicted crack mask is used as the input of the predicted crack mask intersection-over-union model part, and the output is the intersection-over-union between the predicted crack mask and the matched ground-truth mask;
in the concatenation, a max pooling layer with a kernel size of 2 and a stride of 2 is used, so that the predicted crack mask has the same spatial size as the region-of-interest features;
the predicted crack mask intersection-over-union branch consists of 4 convolutional layers and 3 fully connected layers; for the 4 convolutional layers, the kernel size and the number of filters of all convolutional layers are set to 3 and 256 respectively, following the mask branch architecture; for the 3 fully connected layers, the architecture of the region-based convolutional neural network is followed, the outputs of the first two fully connected layers are set to 1024, and the output of the final fully connected layer is set to the number of classes.
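A PyTorch sketch of a predicted crack mask intersection-over-union branch with the stated layout (max pooling of kernel size 2 and stride 2, 4 convolutional layers with kernel 3 and 256 filters, 3 fully connected layers with outputs 1024, 1024 and the class count); the stride-2 final convolution and the 14 × 14 / 28 × 28 input sizes are assumptions borrowed from the Mask Scoring R-CNN design:

import torch
import torch.nn as nn

class MaskIoUBranch(nn.Module):
    def __init__(self, in_channels=256, num_classes=2):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)    # 28x28 predicted mask -> 14x14
        layers, channels = [], in_channels + 1               # RoI features + one mask channel
        for i in range(4):
            stride = 2 if i == 3 else 1                      # downsample once, in the last conv
            layers += [nn.Conv2d(channels, 256, 3, stride=stride, padding=1), nn.ReLU()]
            channels = 256
        self.convs = nn.Sequential(*layers)
        self.fcs = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, num_classes),                    # predicted mask IoU per class
        )

    def forward(self, roi_features, pred_mask):
        # roi_features: (N, 256, 14, 14); pred_mask: (N, 1, 28, 28)
        x = torch.cat([roi_features, self.pool(pred_mask)], dim=1)
        return self.fcs(self.convs(x))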
5. The Faster R-CNN-based bridge crack instance segmentation method according to claim 1, wherein in step three, the total loss function of the bridge crack instance segmentation model is:
L = L_cls + L_box + L_mask (1)
wherein L_cls is the loss of the classification branch, L_box is the loss of bounding-box regression, and L_mask is the loss of the crack mask branch.
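A hedged sketch of formula (1), assuming the standard Mask R-CNN loss choices (cross-entropy for classification, smooth L1 for bounding-box regression, binary cross-entropy for the mask), which the claim does not itself fix:

import torch.nn.functional as F

def total_loss(cls_logits, labels, box_preds, box_targets, mask_probs, mask_targets):
    l_cls = F.cross_entropy(cls_logits, labels)                 # L_cls, classification branch
    l_box = F.smooth_l1_loss(box_preds, box_targets)            # L_box, bounding-box regression
    l_mask = F.binary_cross_entropy(mask_probs, mask_targets)   # L_mask, crack mask branch
    return l_cls + l_box + l_mask                               # L = L_cls + L_box + L_mask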
6. The Faster R-CNN-based bridge crack instance segmentation method according to claim 1, wherein the training process in step four specifically comprises the following steps:
1) The training samples labeled in step two are fed into the Faster R-CNN bridge crack instance segmentation model built in step three and are first processed by the backbone network, a 101-layer residual network used to initially extract crack features;
2) The initially extracted crack features are input into a feature pyramid network for further processing, so that cracks can be represented with enhancement across multiple scales;
3) Region proposal network processing: the region proposal network is a small neural network that processes the image in a sliding-window fashion, finds the regions of the image where targets may exist, and generates candidate boxes for the targets; a softmax function then judges whether each candidate box is foreground or background, and bounding-box regression corrects the anchor boxes, that is, further determines the candidate boxes; finally, according to the candidate boxes of the region proposal network, the bounding boxes containing cracks are accurately selected and their positions and sizes are fine-tuned to obtain the target candidate boxes of the detection result, while back-propagation is completed;
4) Finally, region-of-interest alignment extracts features from each candidate box; the crack classification branch of the network performs crack classification, the bounding-box regression branch generates the crack bounding box, and the crack mask branch generates the crack mask, completing the instance segmentation of the bridge crack.
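An end-to-end training-step sketch of the flow just described (backbone, feature pyramid, region proposal network, region-of-interest alignment heads); the model object and its returned loss dictionary are hypothetical stand-ins following the torchvision detection convention, not the patented implementation:

import torch

def train_one_epoch(model, loader, optimizer, device):
    model.train()
    for images, targets in loader:                  # labeled training samples from step two
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)          # backbone -> FPN -> RPN -> heads forward pass
        loss = sum(loss_dict.values())              # classification, box, mask (and RPN) losses
        optimizer.zero_grad()
        loss.backward()                             # back-propagation
        optimizer.step()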
7. The Faster R-CNN-based bridge crack instance segmentation method according to claim 1, wherein the data set in step one is divided in the ratio training set : test set : validation set = 10:1:1.
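A small sketch of the 10:1:1 division, assuming a shuffled list of sample paths; the seed and helper name are illustrative:

import random

def split_dataset(samples, seed=0):
    random.seed(seed)
    random.shuffle(samples)
    n = len(samples)
    n_train, n_test = n * 10 // 12, n // 12          # training : test : validation = 10 : 1 : 1
    train = samples[:n_train]
    test = samples[n_train:n_train + n_test]
    val = samples[n_train + n_test:]
    return train, test, val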
CN202010473952.0A 2020-05-29 2020-05-29 Bridge crack example segmentation method based on Faster R-CNN Active CN111861978B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010473952.0A CN111861978B (en) 2020-05-29 2020-05-29 Bridge crack example segmentation method based on Faster R-CNN

Publications (2)

Publication Number Publication Date
CN111861978A (en) 2020-10-30
CN111861978B (en) 2023-10-31

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018165753A1 (en) * 2017-03-14 2018-09-20 University Of Manitoba Structure defect detection using machine learning algorithms
CN108876780A (en) * 2018-06-26 2018-11-23 陕西师范大学 Bridge Crack image crack detection method under a kind of complex background

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Railway bridge crack classification method based on the improved Faster R-CNN+ZF model; Wang Jiwu; Yu Pengfei; Luo Haibao; Journal of Beijing Jiaotong University (01); full text *
Pavement sealed crack detection method based on improved Faster R-CNN; Sun Zhaoyun; Pei Lili; Li Wei; Hao Xueli; Chen Yao; Journal of South China University of Technology (Natural Science Edition) (02); full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant