CN114494240A - Ballastless track slab crack measurement method based on multi-scale cooperation deep learning - Google Patents


Info

Publication number
CN114494240A
CN114494240A
Authority
CN
China
Prior art keywords
crack
image
deep
network
depth
Prior art date
Legal status
Pending
Application number
CN202210148079.7A
Other languages
Chinese (zh)
Inventor
胡文博
王卫东
邱实
杨怀志
王劲
汪思成
伍定泽
朱星盛
谷永磊
Current Assignee
Central South University
Beijing Shanghai High Speed Railway Co Ltd
Original Assignee
Central South University
Beijing Shanghai High Speed Railway Co Ltd
Priority date
Filing date
Publication date
Application filed by Central South University, Beijing Shanghai High Speed Railway Co Ltd filed Critical Central South University
Priority to CN202210148079.7A priority Critical patent/CN114494240A/en
Publication of CN114494240A publication Critical patent/CN114494240A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/0004: Industrial image inspection
    • G06N 3/045: Combinations of networks
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/60: Analysis of geometric attributes
    • G06T 2207/10024: Color image
    • G06T 2207/20016: Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30132: Masonry; Concrete

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a ballastless track slab crack measurement method based on multi-scale collaborative deep learning, which belongs to the technical field of image processing and specifically comprises the following steps: constructing a crack measurement framework for multi-scale collaborative deep learning; dividing a plurality of sample images into a training set and a verification set; adjusting the hyper-parameters of the deep target detection network and outputting an optimal crack region extraction result for each sample image in the training set; cutting each crack region extraction result along its boundary coordinates to obtain a crack image, and inputting the crack image into the deep semantic segmentation network to adjust the hyper-parameters of the deep semantic segmentation network; obtaining a crack measurement model; and inputting an acquired target image of a target ballastless track slab into the crack measurement model to obtain continuous width values of the cracks in the target image. With this scheme, features at three scales (image, pixel, and width) are analyzed and propagated collaboratively, pixel misjudgment caused by complex backgrounds is reduced, and refined crack width measurements are obtained.

Description

Ballastless track slab crack measurement method based on multi-scale collaborative deep learning
Technical Field
Embodiments of the invention relate to the technical field of image processing, and in particular to a ballastless track slab crack measurement method based on multi-scale collaborative deep learning.
Background
At present, as the operation time of high-speed railways increases, cracks gradually appear and continuously grow on the surface of ballastless track slabs under the influence of climate, environment, service time, and other factors. Cracks not only reduce the strength of the concrete structure but also shorten the service life of the track slab. When a crack widens to a certain severity, fasteners loosen and the track is displaced, causing the entire ballastless track structure to fail and seriously threatening the operational safety of the high-speed railway. Accurately measuring the width of cracks on the surface of ballastless track slabs and evaluating their severity is therefore an important part of routine high-speed railway inspection work and a key basis for maintenance and repair decisions.
The accuracy of crack boundary detection directly determines the reliability of the crack width measurement and hence of the assessed severity. Conventional measurement schemes based on image processing distinguish crack pixels from background pixels by processing shallow features of the inspection image, such as color, grayscale, contour, edge, and frequency. Despite their simple structure and low computational cost, selecting optimal thresholds and seed pixels requires extensive manual intervention and predefined formula design; if the crack characteristics or detection background change significantly, the computational parameters must be adjusted or the algorithm even redesigned. Such methods therefore exhibit high specificity, low generalization, and uncertainty, and are prone to producing fuzzy or discontinuous crack boundaries (missed detections) that invalidate the width measurement.
Although deep learning-based solutions greatly improve detection accuracy and efficiency compared with traditional image processing, most algorithms are only suitable for monotonous, uniform scenes. Their common characteristic is that crack boundaries are detected through pixel-scale analysis alone, which makes it difficult to distinguish crack pixels from background pixels that closely resemble cracks, such as noise and stains. The resulting missed or false detections under complex background conditions cause the crack width measurement to deviate seriously from reality.
In summary, refined measurement of crack width is a key basis for characterizing and maintaining the crack state of high-speed railway ballastless track slabs, but existing machine vision-based solutions are only suitable for monotonous, uniform scenes, and missed or false detections under complex background conditions cause the crack measurement result to deviate from the true value.
Therefore, a ballastless track slab crack measurement method based on multi-scale collaborative deep learning with high adaptability and high measurement accuracy is needed.
Disclosure of Invention
In view of this, embodiments of the invention provide a ballastless track slab crack measurement method based on multi-scale collaborative deep learning, which at least partially solves the problems of poor adaptability and poor measurement accuracy in the prior art.
In a first aspect, an embodiment of the present invention provides a ballastless track slab crack measurement method based on multi-scale collaborative deep learning, comprising:
constructing a crack measurement framework for multi-scale collaborative deep learning, wherein the framework comprises a deep target detection network, a deep semantic segmentation network, and an improved orthogonal projection algorithm;
dividing a plurality of sample images into a training set and a verification set, wherein the sample images are images of ballastless track slabs containing cracks;
training the deep target detection network with the training set and the verification set, adjusting the hyper-parameters of the deep target detection network, and outputting the optimal crack region extraction result for each sample image in the training set;
cutting each crack region extraction result along its boundary coordinates to obtain a crack image, and inputting the crack image into the deep semantic segmentation network to adjust the hyper-parameters of the deep semantic segmentation network;
obtaining a crack measurement model from the deep target detection network with adjusted hyper-parameters, the deep semantic segmentation network with adjusted hyper-parameters, and the improved orthogonal projection algorithm;
and inputting an acquired target image of a target ballastless track slab into the crack measurement model to obtain continuous width values of the cracks in the target image.
According to a specific implementation manner of the embodiment of the invention, the step of inputting the acquired target image of the target ballastless track slab into the crack measurement model to obtain the continuous width values of the cracks in the target image comprises:
inputting the target image into the crack measurement model to obtain the crack region extraction result corresponding to the target image;
cutting the crack region extraction result along its boundary coordinates to obtain a cut image;
and obtaining a crack boundary pixel detection result from the cut image, and calculating the continuous width values with the improved orthogonal projection algorithm.
According to a specific implementation manner of the embodiment of the present invention, the step of obtaining a crack boundary pixel detection result from the cut image and calculating the continuous width values with the improved orthogonal projection algorithm comprises:
inputting the crack boundary pixel detection result into the improved orthogonal projection algorithm and extracting a single-pixel crack skeleton, wherein the direction of the crack skeleton at each skeleton point is the tangential direction derived from the locally adjacent skeleton points, and the skeleton points form the centerline of the crack;
tracing the crack boundary pixel detection result counterclockwise over eight-neighborhoods, matching all boundary pixels to convert the binary boundary into a hierarchical sequence and obtain the complete crack contour;
and casting an orthogonal projection ray from each skeleton point as the normal of the crack skeleton, and taking the Euclidean distance between the two intersection points of each orthogonal projection ray with the two contour lines of the crack as the continuous width value.
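The width step above (tangent from neighboring skeleton points, normal ray, Euclidean distance between boundary intersections) can be sketched in a few lines of NumPy. This is a minimal illustration under the assumption that a binarized crack mask and an ordered list of skeleton points are already available; `skeleton_tangent` and `width_at` are illustrative names, not from the patent.

```python
import numpy as np

def skeleton_tangent(skel_pts, i, k=2):
    """Tangent direction at skeleton point i, estimated from
    locally adjacent skeleton points (a central difference)."""
    lo, hi = max(0, i - k), min(len(skel_pts) - 1, i + k)
    d = skel_pts[hi] - skel_pts[lo]
    return d / np.linalg.norm(d)

def width_at(mask, pt, tangent, step=0.1, max_r=50.0):
    """Cast the orthogonal projection ray (normal to the tangent)
    from skeleton point pt in both directions; the Euclidean
    distance between the two crack-boundary intersections is the
    continuous width value. Points are (row, col) = (y, x)."""
    normal = np.array([-tangent[1], tangent[0]])  # rotate tangent by 90 degrees
    ends = []
    for sign in (1.0, -1.0):
        r = 0.0
        while r < max_r:
            y, x = pt + sign * r * normal
            iy, ix = int(round(y)), int(round(x))
            inside = 0 <= iy < mask.shape[0] and 0 <= ix < mask.shape[1]
            if not inside or mask[iy, ix] == 0:
                break  # left the crack region: boundary reached
            r += step
        ends.append(pt + sign * r * normal)
    return float(np.linalg.norm(ends[0] - ends[1]))
```

With a synthetic horizontal crack five pixels wide, the measured width comes out close to 5 (to within the ray step size), which is the behavior the improved orthogonal projection method relies on.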
According to a specific implementation manner of the embodiment of the invention, the deep target detection network comprises an input end, a feature extraction module and a coordinate prediction module.
According to a specific implementation manner of the embodiment of the invention, the deep semantic segmentation network comprises an encoder and a decoder.
According to a specific implementation manner of the embodiment of the present invention, before the steps of training the deep target detection network with the training set and the verification set, adjusting the hyper-parameters of the deep target detection network, and outputting the optimal crack region extraction result corresponding to each sample image in the training set, the method further comprises:
carrying out region annotation of the cracks in each sample image.
According to a specific implementation manner of the embodiment of the present invention, the step of training the deep target detection network with the training set and the verification set, adjusting the hyper-parameters of the deep target detection network, and outputting the optimal crack region extraction result corresponding to each sample image in the training set comprises:
inputting the whole training set into the deep target detection network to obtain a predicted bounding box for each sample image;
and calculating a loss function from the predicted bounding box and the annotated bounding box of each sample image, adjusting the hyper-parameters of the deep target detection network according to the loss function until the average precision on the verification set reaches a preset condition, fixing the hyper-parameters of the deep target detection network, and outputting the optimal crack region extraction result corresponding to each sample image in the training set.
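The step above compares predicted and annotated bounding boxes. The patent does not spell out the loss, but a standard ingredient of both detection losses and the average-precision check is intersection-over-union between two boxes; the sketch below (not from the patent) shows that computation for axis-aligned boxes given as (x_min, y_min, x_max, y_max).

```python
def box_iou(a, b):
    """Intersection-over-union between two axis-aligned boxes
    (x_min, y_min, x_max, y_max): overlap area divided by union area."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```

Identical boxes give 1.0, disjoint boxes 0.0, and partially overlapping boxes a value in between, which is what makes IoU a usable match criterion when scoring crack region predictions against annotations.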
According to a specific implementation manner of the embodiment of the invention, the step of cutting each crack region extraction result along its boundary coordinates to obtain a crack image and inputting the crack image into the deep semantic segmentation network to adjust the hyper-parameters of the deep semantic segmentation network comprises:
inputting all the crack images into the deep semantic segmentation network to obtain predicted pixels for each sample image;
and calculating the cross-entropy loss from the predicted pixels and the annotated pixels of each sample image, adjusting the hyper-parameters of the deep semantic segmentation network according to the cross-entropy loss until the average precision on the verification set reaches a preset condition, and fixing the hyper-parameters of the deep semantic segmentation network.
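The per-pixel cross-entropy used in the segmentation step above can be written compactly for the binary crack/background case; this is a generic sketch of that loss, not the patent's exact training code.

```python
import numpy as np

def pixel_cross_entropy(pred_probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted crack probabilities
    and annotated pixels (1 = crack, 0 = background). Probabilities are
    clipped away from 0 and 1 to keep the logarithms finite."""
    p = np.clip(pred_probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p)
                          + (1.0 - labels) * np.log(1.0 - p)))
```

A perfect prediction drives the loss toward zero, while an uninformative prediction of 0.5 everywhere gives ln 2 per pixel; tuning the segmentation hyper-parameters amounts to pushing this average down until the validation criterion is met.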
The ballastless track slab crack measurement scheme based on multi-scale collaborative deep learning in the embodiment of the invention comprises: constructing a crack measurement framework for multi-scale collaborative deep learning, wherein the framework comprises a deep target detection network, a deep semantic segmentation network, and an improved orthogonal projection algorithm; dividing a plurality of sample images into a training set and a verification set, wherein the sample images are images of ballastless track slabs containing cracks; training the deep target detection network with the training set and the verification set, adjusting its hyper-parameters, and outputting the optimal crack region extraction result for each sample image in the training set; cutting each crack region extraction result along its boundary coordinates to obtain a crack image, and inputting the crack image into the deep semantic segmentation network to adjust its hyper-parameters; obtaining a crack measurement model from the deep target detection network with adjusted hyper-parameters, the deep semantic segmentation network with adjusted hyper-parameters, and the improved orthogonal projection algorithm; and inputting an acquired target image of a target ballastless track slab into the crack measurement model to obtain continuous width values of the cracks in the target image.
The beneficial effects of the embodiment of the invention are as follows: with this scheme, features at three scales (image, pixel, and width) are analyzed and propagated collaboratively, pixel misjudgment caused by complex backgrounds is reduced, and refined crack width measurements are obtained.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic flow chart of a ballastless track slab crack measurement method based on multi-scale collaborative deep learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a fracture measurement framework for multi-scale collaborative deep learning according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an overall framework of a deep target detection network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a hole convolution module according to an embodiment of the present invention;
fig. 5 is a schematic flow chart of image data acquisition and annotation of a ballastless track slab crack provided by an embodiment of the invention;
fig. 6 is a schematic diagram of a training result of a deep target detection network according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a training result of a deep semantic segmentation network according to an embodiment of the present invention;
FIG. 8 is a block diagram of an improved orthogonal projection-based crack width measurement method according to an embodiment of the invention;
FIG. 9 is a schematic diagram of a crack skeleton extraction method according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating the definition of the crack skeleton direction according to an embodiment of the present invention;
FIG. 11 is a schematic diagram illustrating the calculation of the continuous crack width based on orthogonal projection according to an embodiment of the present invention;
FIG. 12 is a graphical representation of crack continuous width measurements on partial images provided in accordance with an embodiment of the present invention;
fig. 13 is a schematic diagram of crack boundary detection results of different deep learning algorithms on a test set according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of crack boundary detection results of different deep learning algorithms on four test images according to an embodiment of the present invention;
fig. 15 is a schematic diagram comparing crack width measurement results of the multi-scale collaborative algorithm and the conventional orthogonal projection method according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The embodiments of the present invention are described below with reference to specific examples, and other advantages and effects of the present invention will be readily understood by those skilled in the art from the disclosure of this specification. It is to be understood that the described embodiments are merely some, and not all, of the embodiments of the invention. The invention is capable of other and different embodiments, and its several details are capable of modification in various respects without departing from the spirit and scope of the present invention. It should be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the drawings only show the components related to the present invention rather than the number, shape and size of the components in practical implementation, and the type, quantity and proportion of the components in practical implementation can be changed freely, and the layout of the components can be more complicated.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
The embodiment of the invention provides a ballastless track slab crack measurement method based on multi-scale collaborative deep learning, which can be applied to the crack measurement process in railway inspection scenarios.
Referring to fig. 1, a flow diagram of a ballastless track slab crack measurement method based on multi-scale collaborative deep learning is provided in an embodiment of the present invention. As shown in fig. 1, the method mainly comprises the following steps:
s101, constructing a multi-scale cooperative deep learning crack measurement frame, wherein the crack measurement frame comprises a deep target detection network, a deep semantic segmentation network and an improved orthogonal projection algorithm;
optionally, the deep target detection network includes an input end, a feature extraction module, and a coordinate prediction module.
Optionally, the deep semantic segmentation network includes an encoder and a decoder.
In specific implementation, refined measurement of crack width is a key basis for characterizing and maintaining the crack state of high-speed railway ballastless track slabs, but existing machine vision-based solutions are only suitable for monotonous, uniform scenes and easily produce missed or false detections under complex background conditions, causing the crack measurement result to deviate from the true value. Therefore, a crack measurement framework for multi-scale collaborative deep learning that integrates the three scales of image, pixel, and width can be established as shown in fig. 2, comprising crack region extraction based on a deep target detection network, crack boundary detection based on a deep semantic segmentation network, and crack width measurement based on an improved orthogonal projection method. The collaborative analysis of the image and pixel scales and the feature transfer at the width scale improve the adaptability to complex background conditions, forming a refined mapping from the input inspection image to the output continuous crack width measurement and thereby accurately characterizing the crack severity.
For example, a deep target detection network for crack region extraction is first built. The deep target detection network used in the invention consists of an input end, a feature extraction module, and a coordinate prediction module, and packages region classification and coordinate regression into a single network: the confidence of a target in a bounding box and the probability of its class are captured directly from the complete image in one evaluation, realizing real-time, efficient end-to-end detection, as shown in fig. 3.
The multi-scale crack image enters the deep target detection network through the input end and undergoes feature extraction in the form of a three-dimensional tensor, whose dimensions represent height, width, and channel (red, green, and blue). The input end divides the input image into an S × S grid; if the center of a target falls within a grid cell, that grid cell is responsible for detecting the target.
The feature extraction module of the deep target detection network adopts the Darknet-53 structure (layers 0 to 105). The convolution layers use a large number of convolution kernels (filters) of different sizes, such as 1, 3, and 5, which perform convolution with neighborhoods of the input image (the weight parameters take inner products with the data in a local window of the input image matrix), slide from the upper left of the image to the lower right with a certain stride, and output the convolution feature map of the whole input image. Each residual unit contains a 3×3 and a 1×1 convolution kernel, and each convolution involves three operations: convolution, batch normalization (BN), and Leaky ReLU activation. At the end of each residual unit, an element-wise addition is performed between the input and output vectors. The residual unit simplifies the training of deep networks and effectively alleviates the vanishing-gradient problem. Further, layers 75 to 105 are the feature interaction layers of the YOLOv3 network, which obtain features at three strides (32, 16, and 8). At each scale, convolution kernels (3×3 and 1×1) implement local feature interaction between feature maps, and a fully connected layer implements global feature interaction.
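The residual unit described above (operation, batch normalization, Leaky ReLU, then an element-wise addition of input and output) can be illustrated in a toy form. In this sketch the convolutions are replaced by dense matrix multiplications to keep it short; it demonstrates the structure, not Darknet-53 itself.

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    # Leaky ReLU: small negative slope instead of hard zero
    return np.where(x > 0, x, slope * x)

def batch_norm(x, eps=1e-5):
    # Toy per-feature normalization using the batch's own statistics
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def residual_unit(x, w1, w2):
    """Toy residual unit: two (op -> BN -> Leaky ReLU) stages, then an
    element-wise addition of the unit's input and output (the identity
    shortcut that eases training of deep networks)."""
    h = leaky_relu(batch_norm(x @ w1))
    h = leaky_relu(batch_norm(h @ w2))
    return x + h
```

The identity shortcut means that even if the learned transform contributes nothing (all-zero weights), the unit passes its input through unchanged, which is exactly why stacking many such units does not destroy the signal or the gradient.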
The coordinate prediction module of the deep target detection network constructs a multi-scale feature pyramid with three scales of features (strides 32, 16, and 8) from a single-resolution input image. Each layer of the pyramid detects targets of a different scale: a lower-resolution feature map with a larger stride produces a coarse representation of the input image, while a higher-resolution feature map with a smaller stride has finer-grained features. Further, the multi-scale feature pyramid predicts three boxes per grid cell on the three output feature maps of different resolutions. Each prediction box consists of a confidence variable, four coordinate variables (bx, by, bh, bw), and a class variable. These predicted variables are converted into the confidence, class probability, and location of the target to generate the crack region prediction. The coordinates of the prediction box are calculated as in formula 1:
bx = sigma(tx) + cx
by = sigma(ty) + cy
bw = pw * e^(tw)
bh = ph * e^(th)    (formula 1)
wherein cx and cy denote the coordinates of the grid cell to which the prediction box belongs, pw and ph denote the predefined width and height of the prediction box, tx and ty denote the offset coordinates predicted by the deep target detection network, tw and th denote the scale offsets of the network prediction, and sigma denotes the sigmoid function.
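The coordinate conversion of formula 1 is small enough to write out directly. This is a generic sketch of the standard YOLO-style decoding described above; the function name and argument order are illustrative.

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode network offsets into box center and size: the sigmoid
    keeps the predicted center inside its grid cell, and the
    exponential scales the predefined box dimensions pw, ph."""
    sig = lambda t: 1.0 / (1.0 + math.exp(-t))
    bx = sig(tx) + cx
    by = sig(ty) + cy
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh
```

With zero offsets the box sits at the center of its grid cell with exactly the predefined dimensions, and no finite offset can push the center out of the cell, which stabilizes training of the coordinate prediction module.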
A deep semantic segmentation network for crack boundary detection is then built. The deep semantic segmentation network used in the invention is an encoder-decoder architecture for semantic segmentation, as shown in fig. 3. In the encoder module (Encoder), the network takes the crack region extraction result output by the deep target detection network as the input of an Inception-ResNet-v2 backbone and computes deep crack features; atrous convolution modules (Atrous) control the receptive field to extract multi-scale semantic information without changing the feature map size, after which an atrous spatial pyramid pooling module (ASPP) screens and captures the most discriminative crack feature map. Finally, the extracted most discriminative whole-image feature map is input to the decoder module (Decoder). The decoder decodes the input feature map through upsampling, fuses it with the corresponding low-level features from the Inception-ResNet-v2 backbone, and restores it to the spatial dimensions of the input image, thereby achieving fine detection of the boundary pixels in the crack region. The specific architecture of the encoder module is as follows:
An Inception-ResNet-v2 network composed of Inception-ResNet modules and Reduction modules serves as the backbone for crack pixel feature extraction. The Inception module combines convolution layers in parallel, increasing the width and structural nonlinearity of the network while reducing the overall parameter count; compared with the direct serial combination of traditional models (AlexNet and VGGNet), it accelerates computation and mines deeper image features. The ResNet module directly connects the input and output layers of the Inception-ResNet module, learns the difference between them, and adds this difference to the output, which effectively alleviates the vanishing gradient caused by excessive network depth and improves the accuracy of pixel detection. The Reduction module aggregates the effective information in the features extracted from the crack image by halving the spatial dimensions and deepening the channels.
The atrous (dilated) convolution module is a key component of the encoder module: it controls the receptive field without changing the size of the feature map, which is favorable for extracting multi-scale information. The atrous convolution module is shown in FIG. 4, where the sampling rate (dilation rate) controls the size of the receptive field; the larger the rate, the larger the receptive field. The deep semantic segmentation network used here further combines atrous convolution with depthwise separable convolution, whose advantage is that depthwise separable convolution greatly reduces computational complexity while maintaining comparable performance.
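To make the size-preserving behavior of atrous convolution concrete, the sketch below implements a square-kernel atrous convolution in plain NumPy (an illustration, not the network code of the invention): with a 3 × 3 kernel and padding equal to the dilation rate, the output keeps the input size for any rate.

```python
import numpy as np

def atrous_conv2d(image, kernel, rate):
    """Atrous (dilated) convolution of a single-channel image with an odd,
    square kernel. Padding pad = rate * (k - 1) / 2 keeps the feature-map
    size unchanged, so only the receptive field grows with the rate."""
    k = kernel.shape[0]
    pad = rate * (k - 1) // 2
    padded = np.pad(image, pad, mode="constant")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            for u in range(k):          # kernel taps are spaced `rate` pixels apart
                for v in range(k):
                    out[i, j] += kernel[u, v] * padded[i + u * rate, j + v * rate]
    return out
```

With the identity kernel (a single 1 at the center) the output equals the input for every rate, confirming that the dilation rate enlarges the receptive field without altering the feature-map size.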
The encoder uses an Atrous Spatial Pyramid Pooling (ASPP) module to further extract multi-scale information, which it achieves by applying atrous convolution at different sampling rates. The ASPP module mainly consists of a 1×1 convolution layer, three 3×3 atrous convolutions, and a global average pooling layer that produces image-level features; the image-level features are passed through a 1×1 convolution layer and bilinearly interpolated back to the original size, the features of different scales obtained by the convolution layers are concatenated, and the combined features are fed into a final 1×1 convolution layer for fusion to obtain the most discriminative crack feature map.
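A compact PyTorch sketch of such an ASPP head is given below. The branch layout follows the description above (one 1×1 convolution, three 3×3 atrous convolutions, and image-level pooling); the dilation rates (6, 12, 18) and the 256 output channels are assumptions borrowed from the common DeepLabv3+ configuration, not values stated in this document.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Sketch of an Atrous Spatial Pyramid Pooling head: parallel atrous
    branches plus image-level pooling, fused by a final 1x1 convolution."""
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +             # 1x1 conv branch
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r,     # three 3x3 atrous convs
                       bias=False) for r in rates])
        self.pool = nn.Sequential(                                  # image-level features
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        pooled = self.pool(x)
        # bilinearly interpolate the pooled features back to the input size
        feats.append(F.interpolate(pooled, size=(h, w),
                                   mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))                # fuse to one map
```

The module preserves the spatial size of its input, as required for the decoder fusion described above.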
Based on the crack boundary pixel detection results obtained by the two networks, an improved crack width measurement method based on orthogonal projection is then proposed.
S102, dividing a plurality of sample images into a training set and a verification set, wherein the sample images are images of ballastless track slabs containing cracks;
For example, a high-resolution line-scan camera mounted on a high-speed comprehensive track inspection vehicle can be used to scan the surface of the CRTS III ballastless track slab of a certain high-speed railway section to acquire the target data set required for training. Because the high-speed railway track structure is exposed to the natural environment, is affected by the superposition of natural illumination and the camera light source, and is disturbed by complex conditions such as vibration and noise of the inspection vehicle, the acquired raw images contain complex irregular backgrounds such as noise and contamination. The raw images are processed by the following steps before entering the multi-scale collaborative deep learning crack measurement framework for analysis and calculation, as shown in fig. 5:
(1) The pixel size of each acquired raw inspection image is 4096 pixels × 4096 pixels, corresponding to an actual physical area of 235.5 mm × 235.5 mm.
(2) The images were resized to 400 pixels × 400 pixels, giving 3000 images in total.
(3) 500 images containing cracks are selected as the sample data of the method. Each time, 10% of the 500 images are selected as a verification set and the remainder as a training set, yielding 10 <training set, verification set> groups; 30 additional images are selected as the test set.
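The grouping step can be sketched as follows (plain Python; the random seed and the independence of the ten validation draws are assumptions, since the sampling scheme is not specified above):

```python
import random

def make_splits(n_images=500, n_groups=10, val_frac=0.10, seed=0):
    """Build 10 <training set, verification set> groups: each group draws a
    random 10% of the crack images for verification and keeps the remaining
    90% for training."""
    rng = random.Random(seed)
    indices = list(range(n_images))
    n_val = int(n_images * val_frac)
    groups = []
    for _ in range(n_groups):
        rng.shuffle(indices)
        val = sorted(indices[:n_val])       # 50 verification images
        train = sorted(indices[n_val:])     # 450 training images
        groups.append((train, val))
    return groups
```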
S103, training the deep target detection network by the training set and the verification set, adjusting the hyper-parameters of the deep target detection network and outputting the optimal crack region extraction result corresponding to each sample image in the training set;
Optionally, before step S103 (training the deep target detection network with the training set and the verification set, adjusting the hyper-parameters of the deep target detection network, and outputting the optimal crack region extraction result corresponding to each sample image in the training set), the method further includes:
and carrying out region marking on the cracks in each sample image.
Further, in step S103, training the deep target detection network with the training set and the verification set, adjusting a hyper-parameter of the deep target detection network, and outputting an optimal crack region extraction result corresponding to each sample image in the training set, includes:
inputting all the training sets into the depth target detection network to obtain a prediction bounding box corresponding to each sample image;
calculating a loss function from the prediction bounding box and the annotation bounding box of each sample image, adjusting the hyper-parameters of the deep target detection network according to the loss function until the average precision measured on the verification set reaches a preset condition, fixing the hyper-parameters of the deep target detection network, and outputting the optimal crack region extraction result corresponding to each sample image in the training set.
For example, labelme may be used to perform region labeling (bounding boxes) of the cracks in the images of the 10 <training set, verification set> groups and save the annotations in json format, yielding the reference standard for the deep target detection network; the crack boundaries inside the boxes are then further labeled pixel by pixel to obtain the reference standard for the deep semantic segmentation network.
The 10 region-labeled <training set, verification set> image groups and the corresponding json files are then input into the deep target detection network for full training, and the detection results of the crack target regions are output. The method uses a GPU as the computing core (CPU: AMD 2990WX @ 3.0 GHz, RAM: 64 GB, GPU: NVIDIA GeForce RTX 2080Ti) and relies on Facebook's open-source deep learning framework PyTorch 1.2.0. Hyper-parameter tuning experiments are performed on the 10 <training set, verification set> groups, and the optimal hyper-parameter setting is determined from the training-set loss function and the performance evaluation index of the deep target detection network so as to output the optimal crack region extraction results, as shown in table 1:
TABLE 1
[Table 1: optimal hyper-parameter settings of the deep target detection network; rendered as an image in the original document]
The loss function (loss) used by the deep target detection network is defined as the sum of squared errors between the prediction bounding box and the annotation bounding box; the coordinates and confidence of the prediction bounding box are corrected according to the value of the loss function, as shown in formula 2:
loss = Σ_i 𝟙_i^obj [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)² + (c_i − ĉ_i)² + (p_i − p̂_i)² ]   (2)

in the formula: (x̂_i, ŷ_i, ŵ_i, ĥ_i, ĉ_i, p̂_i) represents the coordinates of the prediction box; (x_i, y_i, w_i, h_i, c_i, p_i) represents the coordinates of the annotation box; 𝟙_i^obj indicates whether an object appears in the bounding box (1 if present, 0 if not).
In step S103, the mean of the Average Precision (AP) over all detection classes (MAP) is used to evaluate the performance of the deep target detection network. The MAP calculation steps are as follows:
(1) The first step is to calculate the Intersection over Union (IoU). IoU is defined as the degree of overlap between the prediction region and the annotation region; its mathematical expression is given in formula 3:

IoU = Area(P ∩ G) / Area(P ∪ G)   (3)

in the formula: P is the prediction region and G is the annotation region.
(2) The second step distinguishes positive and negative samples according to an IoU threshold and sorts all predictions by confidence in descending order. The IoU threshold is predefined (typically set to 0.5): a prediction bounding box is a positive sample when its IoU with the annotation bounding box is greater than this threshold, and a negative sample otherwise. In addition, a confidence threshold on the prediction bounding box is used to distinguish positive and negative predictions. When the IoU of a prediction is greater than 0.5 and the predicted class is correct, it is a True Positive (TP); when the IoU is less than 0.5 or the predicted class is wrong, it is a False Positive (FP); and when an annotation bounding box has no overlapping prediction, it is a False Negative (FN), meaning the model failed to detect an object labeled in the manual annotation.
(3) Precision and recall are calculated from TP, FP and FN. Precision is defined as the proportion of correctly detected targets among all detections; recall is defined as the proportion of correctly detected targets among all actual targets, as in formulas (4) and (5). The precision and recall corresponding to different confidence thresholds are then computed to draw a P-R curve, and the MAP value is obtained by integrating the P-R curve, i.e., it represents the area enclosed by the P-R curve and the coordinate axes.

precision = TP / (TP + FP)   (4)

recall = TP / (TP + FN)   (5)
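The TP/FP/FN counting and the precision/recall ratios can be sketched as follows (plain Python; the greedy, confidence-ordered one-to-one matching of predictions to annotations is a common convention and is an assumption here):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Count TP/FP against unmatched annotation boxes, then apply
    precision = TP/(TP+FP) and recall = TP/(TP+FN)."""
    matched = set()
    tp = fp = 0
    for p in pred_boxes:                       # assumed already sorted by confidence
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gt_boxes):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_iou > iou_thresh:
            tp += 1
            matched.add(best_j)
        else:
            fp += 1
    fn = len(gt_boxes) - len(matched)          # labels the model never detected
    return tp / (tp + fp), tp / (tp + fn)
```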
The training result of the deep target detection network in step S103 is shown in fig. 6. As shown in (a), after 1000 epochs of full training on the 450 images, both the training-set loss and the MAP have converged and stabilized, at which point the deep target detection network is well fitted. Further calculating the area under the P-R curve drawn from the 50 verification-set images gives a MAP of 93.36% for the deep target detection network in crack region extraction, as shown in fig. 6 (b).
S104, cutting boundary coordinates of each crack region extraction result to obtain a crack image, and inputting the crack image into the deep semantic segmentation network to adjust the hyper-parameters of the deep semantic segmentation network;
on the basis of the above embodiment, the step of performing boundary coordinate clipping on each crack region extraction result to obtain a crack image and inputting the crack image into the deep semantic segmentation network to adjust the hyper-parameters of the deep semantic segmentation network in step S104 includes:
inputting all the crack images into the depth semantic segmentation network to obtain a prediction pixel corresponding to each sample image;
and calculating the cross-entropy loss from the prediction pixels and the annotation pixels of each sample image, adjusting the hyper-parameters of the deep semantic segmentation network according to the cross-entropy loss until the average precision measured on the verification set reaches a preset condition, and then fixing the hyper-parameters of the deep semantic segmentation network.
In a specific implementation, after the boundary coordinates of each crack region extraction result are cropped to obtain crack images, all crack images are used as the input of the deep semantic segmentation network. The encoder module of the deep semantic segmentation network extracts a feature map rich in semantic information from the input crack region; the feature map is then input into the decoder module, which restores a spatial dimension consistent with the output of step S103 and outputs a pixel-scale crack boundary detection result. The optimal hyper-parameter settings of the deep semantic segmentation network on the 450 training-set images are shown in table 2; the training effect and the crack boundary detection performance of the network are evaluated with cross-entropy loss and MIoU, respectively.
TABLE 2
[Table 2: optimal hyper-parameter settings of the deep semantic segmentation network; rendered as an image in the original document]
The error between the output prediction pixels and the annotation pixels is then defined with a cross-entropy loss to evaluate the training effect of the deep semantic segmentation network; its mathematical expression is given in formula (6):

Loss = −Σ [ y · log(ŷ) + (1 − y) · log(1 − ŷ) ]   (6)

in the formula: Loss is the total loss of DeepLabv3+; y is the label value; ŷ is the predicted value.
Meanwhile, the detection performance of the deep semantic segmentation network (e.g., DeepLabv3+) can be evaluated with MIoU, which in the present invention takes the mean of the IoU values of the different object classes and measures the overlap between the predicted crack pixels of each class and the annotated crack pixels; its mathematical expression is given in formula (7):

MIoU = (1/N) Σ_{i=1}^{N} IoU_i   (7)

in the formula: N is the number of detection object classes.
The training result of the deep semantic segmentation network in step S104 is shown in fig. 7, where (a) shows that after 500 epochs of full training on the 450 images, the training-set loss and MIoU have converged and stabilized, at which point the deep semantic segmentation network is well fitted. Further averaging IoU over the 50 verification-set images gives an MIoU of 82.99% for the DeepLabv3+ network, as shown in fig. 7 (b).
S105, obtaining a crack measurement model according to the depth target detection network for adjusting the hyper-parameters, the depth semantic segmentation network for adjusting the hyper-parameters and the improved orthogonal projection algorithm;
after the hyper-parameters of the depth target detection network and the depth semantic segmentation network are adjusted, a crack measurement model is obtained by combining the improved orthogonal projection algorithm, and then the crack measurement model can be used for directly measuring the collected crack image and outputting the continuous width value of the crack in the image.
And S106, inputting the acquired target image corresponding to the target ballastless track slab into the crack measurement model to obtain the continuous width value of the crack in the target image.
On the basis of the foregoing embodiment, step S106 of inputting the acquired target image corresponding to the target ballastless track slab into the crack measurement model to obtain the continuous width value of the crack in the target image includes:
inputting the target image into the crack measurement model to obtain a crack region extraction result corresponding to the target image;
performing boundary coordinate cutting on the crack region extraction result to obtain a cut image;
and obtaining a crack boundary pixel detection result according to the cutting image, and calculating the continuous width value according to the improved orthogonal projection algorithm.
Further, the step of obtaining a crack boundary pixel detection result according to the cut image and calculating the continuous width value according to the improved orthogonal projection algorithm includes:
inputting the crack boundary pixel detection result into the improved orthogonal projection algorithm, and extracting a crack skeleton of a single pixel, wherein the direction of the crack skeleton is the tangential direction of each skeleton point derived from local adjacent skeleton points, and the center of the crack skeleton is a skeleton point;
tracking and matching the crack boundary pixel detection result counterclockwise over eight-neighborhoods, and converting the binary boundary into a hierarchical sequence by matching all boundary pixels to obtain the complete crack outline;
and extracting orthogonal projection rays from each skeleton point pixel to serve as a normal of the crack skeleton, and taking the Euclidean distance between two intersection points of two contour lines of each orthogonal projection ray and the crack as the continuous width value.
In a specific implementation, an improved crack width measurement method based on orthogonal projection is proposed from the crack boundary pixel detection results obtained by integrating the two networks. The general framework of the improved orthogonal-projection-based crack measurement method is shown in fig. 8 and specifically includes the following steps:
The first step is to define crack blocks and extract crack contours from the crack boundary pixel detection result of the deep semantic segmentation network. The crack block is the basic unit of quantitative crack analysis and is defined as the set of pixels representing a single crack; it well represents the natural contour characteristics of real cracks. However, since cracks exhibit complex non-planar, multi-branching patterns that intersect one another, interconnected cracks must be segmented into individual crack blocks before geometric analysis. The invention therefore redefines the crack block as a non-intersecting set of crack pixels with elongated features. According to this definition, each crack block has two characteristics: it has two end points but no intersections, and it exhibits an elongated layout; these two points are the key features distinguishing cracks from spalling, chipping, or other defects of the ballastless track slab. Based on the defined crack block, the edge pixels of the crack block are tracked and matched counterclockwise over eight-neighborhoods, and the binary edge is converted into a hierarchical sequence by matching all edge pixels, thereby obtaining the complete crack contour.
Secondly, the crack direction is calculated from the obtained crack skeleton information. The crack skeleton is a very important geometric characteristic of the crack: it locates the basic structure of the crack within each crack block and is an important component of intelligent crack analysis. The skeleton of a crack block produced by the skeleton extraction algorithm is the central white curve, and the extracted skeleton is pruned, as shown in fig. 9, where (a) is the skeleton of the crack block and (b) is the skeleton after pruning.
Thirdly, the direction of the crack skeleton is determined and the skeleton normal is computed. The direction of the crack skeleton is defined as the tangential direction at each skeleton point, derived from its locally adjacent skeleton points, with the skeleton point as the center, as shown in figure 10. Note that the defined crack directions differ, and the crack skeleton is determined from the whole crack block. Skeleton points are generally of two kinds: non-endpoints and endpoints. For each non-endpoint, a 5 × 5 block matrix centered on the skeleton point is constructed on the skeleton map, and the skeleton direction is determined from the overlapping region of the matrix and the skeleton (lower-left colored block in fig. 10). For an endpoint at the end of the skeleton, a 5 × 5 block matrix centered on the endpoint is likewise constructed, but an extension block toward the inner skeleton points must also be constructed (upper-right extension block in fig. 10). The matrix is aligned with each point on the skeleton and shifted by the same number of pixel units horizontally and vertically, after which the direction of the crack skeleton is determined; the pixel distance of the matrix movement can be adjusted by user input. A smaller matrix is required to capture local changes in crack block direction, so the present invention uses a 5 × 5 block matrix. The orientation θ of the matrix is determined by extracting the connected components within the block matrix and calculating the second moments of each pixel matrix, as indicated by the oval arrows in fig. 10. Since the normal at each skeleton point is perpendicular to its orientation, the normal direction at each skeleton point is θ + π/2.
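The second-moment orientation computation for one block matrix can be sketched as follows (NumPy; angles are measured in array (row, column) coordinates, and the 5 × 5 block size follows the description above):

```python
import numpy as np

def block_orientation(block):
    """Orientation θ (radians) of the skeleton pixels inside a block matrix,
    from the second central moments of their coordinates. The normal
    direction at the skeleton point is then θ + π/2."""
    ys, xs = np.nonzero(block)
    x = xs - xs.mean()                     # centered coordinates
    y = ys - ys.mean()
    mu20, mu02, mu11 = (x * x).sum(), (y * y).sum(), (x * y).sum()
    # principal-axis angle from the 2x2 second-moment matrix
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
```

For a horizontal run of skeleton pixels the orientation is 0; for a main-diagonal run it is π/4 (in array coordinates).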
Fourthly, expanding the skeleton normal to the whole crack outline to further calculate the continuous width of the whole crack. The identified fracture skeleton normal portion between the two fracture profiles is defined as the fracture width at that point, as shown by the curve in fig. 11 (a). Wherein, the thinner curve represents the extracted crack skeleton, the thicker curve represents the orthogonal projection ray derived from each skeleton point pixel, and each orthogonal projection ray and the crack profile have two intersection points, namely the dots in fig. 11. Thus, the Euclidean distance between two intersection points can be used to determine the continuous width of a certain point in the crack.
Specifically, the Euclidean distance between the two intersection points is defined as the continuous width at a point inside the crack, as shown in formula 8. The crack continuous width values of some of the output images are shown in fig. 12.
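The width at a skeleton point thus reduces to a Euclidean distance between the two ray-contour intersection points; a minimal sketch, with an assumed pixel-to-millimetre scale taken from the acquisition parameters above (235.5 mm over 4096 pixels; a different factor would apply at the resized 400-pixel resolution):

```python
import numpy as np

def crack_width(p1, p2, mm_per_pixel=235.5 / 4096):
    """Continuous width at a skeleton point: the Euclidean distance between
    the two intersections of the orthogonal projection ray with the crack
    contour, converted from pixels to millimetres (scale is an assumption
    based on the stated 4096-pixel / 235.5-mm acquisition geometry)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    return float(np.linalg.norm(p1 - p2)) * mm_per_pixel
```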
Crack severity characterization indices are then computed from the distribution of crack continuous widths output by the above steps, including the minimum width, maximum width, average width (formula 9), median width (formula 10), and standard deviation (formula 11), so as to comprehensively and systematically characterize crack severity.
a_ij = ‖x_i − x_j‖₂ , x_i ∈ C, x_j ∈ R   (8)

w̄ = (1/|N|) Σ_{a_ij ∈ N} a_ij   (9)

w̃ = median{ a_ij : a_ij ∈ N }   (10)

σ = √( (1/|N|) Σ_{a_ij ∈ N} (a_ij − w̄)² )   (11)

in the formula: a_ij is the continuous width at a point inside the crack; x_i and x_j are points of C and R respectively; C is the set of crack contour points; R is the set of orthogonal projection ray points; N is the set of crack continuous width measurement points; w̄ is the average crack width; w̃ is the median crack width; σ is the standard deviation.
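The severity indices of formulas 9-11, together with the minimum and maximum widths, can be computed from the set of continuous width values in a few lines (NumPy sketch):

```python
import numpy as np

def severity_indices(widths):
    """Crack-severity characterization indices from the distribution of
    continuous width values measured along the skeleton."""
    w = np.asarray(widths, dtype=float)
    return {
        "min": float(w.min()),
        "max": float(w.max()),
        "mean": float(w.mean()),          # formula 9
        "median": float(np.median(w)),    # formula 10
        "std": float(w.std()),            # formula 11
    }
```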
According to the ballastless track slab crack measurement method based on multi-scale collaborative deep learning, the collaborative analysis and transfer of features across the three scales of image, pixel, and width reduces pixel misjudgments caused by complex backgrounds and yields refined crack width measurements.
The solution is described below with reference to a specific embodiment: the output of the multi-scale collaborative deep learning crack measurement framework constructed by the invention is compared with conventional solutions to evaluate its crack measurement performance, including comparison with single-pixel-scale deep learning algorithms and with the traditional orthogonal projection method, as follows:
(1) In the first step, the 30 test-set images are processed with the multi-scale collaborative deep learning crack measurement framework and with the U-Net and DeepLabv3+ networks, respectively, to obtain the corresponding crack boundary detection results, and the performance of the solutions is compared in three respects: MIoU, computational cost, and detection results.
① MIoU and computational cost of the solutions. FIG. 13 shows the crack boundary detection results of the deep semantic segmentation network of the present invention and of the two existing pixel-scale networks on the 30 test-set images. The computational cost (single-image processing time) of the U-Net network is the lowest, but its MIoU on the test set is only 68.8%; the DeepLabv3+ network reaches an MIoU of 77.59%, but at a higher computational cost. The total computational cost of the multi-scale collaboration algorithm in image-pixel scale collaborative analysis consists of two parts: crack region extraction (yellow region) and crack boundary detection (green region). Compared with the single-pixel-scale DeepLabv3+ network, the total computational cost of the multi-scale collaboration algorithm is slightly higher (single-image processing time is 0.03 s longer), but its MIoU is improved by nearly 7%, and the image-scale pre-extraction of crack regions also makes the algorithm's cost for pixel-scale crack boundary detection lower than that of DeepLabv3+.
② Crack boundary detection results of the solutions. Fig. 14 further compares the crack boundary detection results of the present invention and the two existing pixel-scale deep semantic segmentation networks on four test images. The U-Net network detects crack details poorly and tends to produce discontinuous crack boundaries, and both U-Net and DeepLabv3+ tend to misjudge background pixels highly similar to cracks, such as noise and contamination, as crack pixels (indicated by yellow circles in fig. 13). Compared with U-Net and DeepLabv3+, the multi-scale collaboration algorithm effectively eliminates noise and contamination interference outside the target region when extracting crack regions at the image scale, and obtains a more refined detection result by performing pixel-scale boundary segmentation on the coordinate-cropped crack regions.
(2) In the second step, the 30 test-set images are processed with the multi-scale collaborative deep learning crack measurement framework built by the invention and with the traditional orthogonal projection method, respectively, to obtain the corresponding crack continuous width measurement results, and the crack severity characterization indices output by the two solutions are compared to evaluate their detection performance. Fig. 15 compares the crack severity quantification indices computed by the present invention and by the traditional orthogonal projection method on four test images (400 pixels × 400 pixels), including the minimum width, maximum width, average width, median width, and standard deviation. In fig. 15, the histogram in (a) shows the results of the traditional orthogonal projection method, and the dot-line graph shows the results of the multi-scale collaboration algorithm. The maximum width values of the first and second test images computed by the traditional orthogonal projection method are more than 8 times the results of the multi-scale collaboration algorithm and far exceed their own average width measurements, showing that the traditional orthogonal projection method easily misjudges complex background highly similar to the crack, such as noise and contamination, as crack pixels, which deforms the skeleton and causes large deviations in the maximum width measurement.
Furthermore, the multi-scale collaboration algorithm can compute the minimum crack width of all four test images, whereas the minimum crack width computed by the traditional orthogonal projection method is approximately 0, because the Canny operator it uses in the crack pixel contour extraction stage tends to produce discontinuous crack boundaries, making the true minimum width difficult to obtain. In addition, the difference between the average and median crack widths computed by the multi-scale collaboration algorithm is below 0.1 mm on all four test images, while for the traditional orthogonal projection method it reaches 0.3 mm. Fig. 15 (b) further compares the standard deviations of the crack continuous widths measured by the two methods, each of which measured more than 350 continuous width values along the crack skeleton line in each of the four test images: the standard deviation of the multi-scale collaboration algorithm stays below 2, while that of the traditional orthogonal projection method reaches 18 on the second test image, showing that the multi-scale collaboration algorithm has superior measurement stability and reliability. The comparison shows that the multi-scale collaboration algorithm achieves a more realistic and stable characterization of crack severity under the interference of complex background conditions such as noise and contamination.
The comparison and analysis result shows that the deep learning algorithm based on multi-scale cooperation can overcome the defect that the measurement result deviates from the true value due to easy omission or false detection of the existing machine vision scheme under the condition of a complex background, has high crack width measurement precision and low calculation cost, and has excellent adaptability and robustness to the complex background with noise, dirt and the like similar to the crack height.
The beneficial effects of the invention can be seen to include:
1. the invention provides a deep learning algorithm based on multi-scale cooperation, and effectively solves the problem that the boundary detection of the surface crack of the ballastless track plate of the high-speed railway by the existing single-pixel-scale deep semantic segmentation network is easily interfered by complex environmental conditions, so that the detection result is seriously deviated from the actual condition.
2. The deep learning algorithm based on multi-scale cooperation can remove most of pixel misjudgments caused by noise and contamination, and shows good adaptability to complex background conditions highly similar to cracks.
3. The characterization indices of ballastless track slab surface crack severity calculated by the multi-scale collaborative deep learning algorithm are closer to the actual situation, and their fluctuation (standard deviation) is far lower than that of the traditional orthogonal projection method (by up to 90%); the crack width measurement performance of the algorithm thus features high precision, good stability, and good reliability.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A ballastless track slab crack measurement method based on multi-scale collaborative deep learning is characterized by comprising the following steps:
constructing a crack measurement frame of multi-scale collaborative deep learning, wherein the crack measurement frame comprises a depth target detection network, a depth semantic segmentation network and an improved orthogonal projection algorithm;
dividing a plurality of sample images into a training set and a verification set, wherein the sample images are corresponding images of a ballastless track slab containing cracks;
training the deep target detection network by the training set and the verification set, adjusting the hyper-parameters of the deep target detection network and outputting the optimal crack region extraction result corresponding to each sample image in the training set;
cutting boundary coordinates of each crack region extraction result to obtain a crack image, and inputting the crack image into the deep semantic segmentation network to adjust the hyper-parameters of the deep semantic segmentation network;
obtaining a crack measurement model according to the depth target detection network for adjusting the hyper-parameters, the depth semantic segmentation network for adjusting the hyper-parameters and the improved orthogonal projection algorithm;
and inputting the acquired target image corresponding to the target ballastless track slab into the crack detection model to obtain the continuous width value of the crack in the target image.
2. The method according to claim 1, wherein the step of inputting the acquired target image corresponding to the target ballastless track slab into the crack measurement model to obtain the continuous width value of the crack in the target image comprises:
inputting the target image into the crack measurement model to obtain a crack region extraction result corresponding to the target image;
cropping the crack region extraction result along its boundary coordinates to obtain a cropped image;
and obtaining a crack boundary pixel detection result from the cropped image, and calculating the continuous width value according to the improved orthogonal projection algorithm.
3. The method of claim 2, wherein the step of obtaining a crack boundary pixel detection result from the cropped image and calculating the continuous width value according to the improved orthogonal projection algorithm comprises:
inputting the crack boundary pixel detection result into the improved orthogonal projection algorithm and extracting a single-pixel-wide crack skeleton, wherein the direction of the crack skeleton at each skeleton point is the tangential direction derived from locally adjacent skeleton points, and each skeleton point lies at the center of the crack;
tracing the crack boundary pixel detection result counterclockwise through the eight-neighborhood of each boundary pixel, and converting the binary boundary into a hierarchical sequence by matching all boundary pixels, thereby obtaining a complete crack contour;
and casting an orthogonal projection ray from each skeleton point pixel as a normal of the crack skeleton, and taking the Euclidean distance between the two intersection points of each orthogonal projection ray with the two contour lines of the crack as the continuous width value.
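A minimal sketch of the width measurement described above, under the simplifying assumption that the normal ray is marched across a binary crack mask rather than intersected analytically with the traced contour; `width_at` and its half-pixel stepping are illustrative choices, not the patented algorithm:

```python
import math

def width_at(mask, y, x, tangent):
    """From skeleton point (y, x), march along the normal to `tangent` in
    both directions until the crack mask is exited, and return the Euclidean
    distance between the two boundary crossings (illustrative sketch)."""
    ty, tx = tangent
    norm = math.hypot(ty, tx)
    ny, nx = -tx / norm, ty / norm   # unit normal: tangent rotated 90 degrees
    ends = []
    for sign in (+1, -1):
        t = 0.0
        while True:
            py = int(round(y + sign * t * ny))
            px = int(round(x + sign * t * nx))
            inside = 0 <= py < len(mask) and 0 <= px < len(mask[0])
            if not inside or not mask[py][px]:
                break                 # left the crack (or the image)
            t += 0.5                  # half-pixel stepping along the normal
        ends.append((y + sign * t * ny, x + sign * t * nx))
    (y0, x0), (y1, x1) = ends
    return math.hypot(y1 - y0, x1 - x0)
```

For a horizontal crack band three pixels thick, a skeleton point on the centerline with tangent `(0, 1)` yields a width of 3.0, as expected.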
4. The method of claim 1, wherein the deep target detection network comprises an input module, a feature extraction module and a coordinate prediction module.
5. The method of claim 1, wherein the deep semantic segmentation network comprises an encoder and a decoder.
6. The method of claim 1, wherein before the step of training the deep target detection network with the training set and the validation set, adjusting the hyper-parameters of the deep target detection network, and outputting the optimal crack region extraction result corresponding to each sample image in the training set, the method further comprises:
and annotating the crack regions in each sample image.
7. The method of claim 6, wherein the step of training the deep target detection network with the training set and the validation set, adjusting the hyper-parameters of the deep target detection network, and outputting the optimal crack region extraction result corresponding to each sample image in the training set comprises:
inputting the entire training set into the deep target detection network to obtain a predicted bounding box corresponding to each sample image;
and calculating a loss function from the predicted bounding box and the annotated bounding box of each sample image, adjusting the hyper-parameters of the deep target detection network according to the loss function until the average precision on the validation set reaches a preset condition, fixing the hyper-parameters of the deep target detection network, and outputting the optimal crack region extraction result corresponding to each sample image in the training set.
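The loss in claim 7 compares predicted and annotated bounding boxes; a standard ingredient of both such losses and the validation average precision is the intersection-over-union (IoU) of two boxes, sketched here (the `(x0, y0, x1, y1)` box format is an assumption, not taken from the patent):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x0, y0, x1, y1):
    the overlap area divided by the union area, in [0, 1]."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))   # overlap width
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))   # overlap height
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union else 0.0
```

A prediction is typically counted as correct when its IoU with the annotated box exceeds a threshold (e.g. 0.5), and the average precision over the validation set is computed from those matches.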
8. The method according to claim 6, wherein the step of cropping each crack region extraction result along its boundary coordinates to obtain a crack image and inputting the crack image into the deep semantic segmentation network to adjust the hyper-parameters of the deep semantic segmentation network comprises:
inputting all the crack images into the deep semantic segmentation network to obtain predicted pixels corresponding to each sample image;
and calculating a cross-entropy loss from the predicted pixels and the annotated pixels of each sample image, adjusting the hyper-parameters of the deep semantic segmentation network according to the cross-entropy loss until the average precision on the validation set reaches a preset condition, and fixing the hyper-parameters of the deep semantic segmentation network.
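The cross-entropy loss of claim 8, computed per pixel between the segmentation network's predicted crack probabilities and the annotated mask, can be sketched as follows (`pixel_cross_entropy` is an illustrative name; real training would use a framework's built-in loss function):

```python
import math

def pixel_cross_entropy(pred_probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted crack probabilities
    (in [0, 1]) and annotated crack/background labels (0 or 1),
    averaged over all pixels."""
    total, n = 0.0, 0
    for p_row, y_row in zip(pred_probs, labels):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)   # clamp for numerical safety
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            n += 1
    return total / n
```

Minimizing this quantity over the training crack images, while monitoring accuracy on the validation set, is the hyper-parameter adjustment loop the claim describes.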
CN202210148079.7A 2022-02-17 2022-02-17 Ballastless track slab crack measurement method based on multi-scale cooperation deep learning Pending CN114494240A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210148079.7A CN114494240A (en) 2022-02-17 2022-02-17 Ballastless track slab crack measurement method based on multi-scale cooperation deep learning


Publications (1)

Publication Number Publication Date
CN114494240A true CN114494240A (en) 2022-05-13

Family

ID=81481916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210148079.7A Pending CN114494240A (en) 2022-02-17 2022-02-17 Ballastless track slab crack measurement method based on multi-scale cooperation deep learning

Country Status (1)

Country Link
CN (1) CN114494240A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115131295A (en) * 2022-06-06 2022-09-30 清华大学 Construction method, system and device of engineering rock mass fracture network
CN115131295B (en) * 2022-06-06 2023-08-29 清华大学 Construction method, system and device of engineering rock mass fracture network

Similar Documents

Publication Publication Date Title
US11551341B2 (en) Method and device for automatically drawing structural cracks and precisely measuring widths thereof
CN115082467B (en) Building material welding surface defect detection method based on computer vision
CN111402226A (en) Surface defect detection method based on cascade convolution neural network
CN102704215B (en) Automatic cutting method of embroidery cloth based on combination of DST file parsing and machine vision
JP2021504816A (en) Bone age evaluation and height prediction model, its system and its prediction method
CN104077577A (en) Trademark detection method based on convolutional neural network
CN104268505A (en) Automatic cloth defect point detection and recognition device and method based on machine vision
CN102855485B (en) The automatic testing method of one grow wheat heading
CN109376740A (en) A kind of water gauge reading detection method based on video
CN112149543B (en) Building dust recognition system and method based on computer vision
CN113240626A (en) Neural network-based method for detecting and classifying concave-convex flaws of glass cover plate
CN111539330B (en) Transformer substation digital display instrument identification method based on double-SVM multi-classifier
CN114820625B (en) Automobile top block defect detection method
CN104573707A (en) Vehicle license plate Chinese character recognition method based on multi-feature fusion
CN114723709A (en) Tunnel disease detection method and device and electronic equipment
CN114549446A (en) Cylinder sleeve defect mark detection method based on deep learning
CN115147380A (en) Small transparent plastic product defect detection method based on YOLOv5
CN115457556A (en) Reading method for disc pointer type instrument of nuclear power plant
CN114494240A (en) Ballastless track slab crack measurement method based on multi-scale cooperation deep learning
CN104598906B (en) Vehicle outline detection method and its device
CN105787955A (en) Sparse segmentation method and device of strip steel defect
CN116596921B (en) Method and system for sorting incinerator slag
CN113084193A (en) In-situ quality comprehensive evaluation method for selective laser melting technology
CN115830302B (en) Multi-scale feature extraction fusion power distribution network equipment positioning identification method
CN115760720A (en) Crack online detection method and system based on mobile device and super-resolution reconstruction segmentation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination