CN115908276A - Bridge apparent damage binocular vision intelligent detection method and system integrating deep learning - Google Patents


Info

Publication number
CN115908276A
Authority
CN
China
Prior art keywords
bridge
damage
model
apparent
image
Prior art date
Legal status
Pending
Application number
CN202211327687.0A
Other languages
Chinese (zh)
Inventor
张治成
刘金童
张鹤
沈芷菁
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202211327687.0A priority Critical patent/CN115908276A/en
Publication of CN115908276A publication Critical patent/CN115908276A/en
Pending legal-status Critical Current


Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 — Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 — Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a binocular vision intelligent detection method and system for bridge apparent damage that integrates deep learning. The method comprises: constructing training data sets for each bridge part, each bridge member and each damage type; constructing a part recognition model, a member recognition model and a damage recognition model based on an improved VGG16 model to obtain a three-level pre-classification model; constructing a pixel-level detection model for each damage category based on an improved encoder-decoder FCN model; training the multi-level pre-classification model and the pixel-level detection models with the corresponding training data sets, and performing online image detection to obtain the damage number, form profile and inclination trend of the pre-classified bridge apparent image; and performing three-dimensional reconstruction and quantitative measurement of the damage form contour based on a central projection model built on binocular stereo vision to obtain the damage size. The invention outputs comprehensive apparent damage information such as position, category and size, providing an objective basis for evaluating the apparent state of in-service bridges.

Description

Bridge apparent damage binocular vision intelligent detection method and system integrating deep learning
Technical Field
The invention relates to the field of bridge structure apparent information detection, and in particular to a binocular vision intelligent detection method and system for bridge apparent damage that integrates deep learning.
Background
In recent years, bridge detection methods based on machine vision and assisted by unmanned aerial vehicles have gradually emerged, and a small number of studies have combined deep learning methods to detect apparent bridge damage efficiently and accurately. However, existing bridge detection systems have the following problems in apparent information detection: (1) apparent damage information is extracted incompletely or left unintegrated, so manual inspection results must still be combined before the output can be applied to bridge apparent state evaluation; (2) the specific damage characteristics of different members at different bridge parts are not considered, so similar damage at different bridge positions interferes with each other and damage recognition accuracy drops; (3) the fact that damage severity carries different influence weights at different bridge positions is not considered, so measurement results cannot directly serve as a reference for quantitative damage evaluation; (4) the training approach of existing damage recognition or detection models is single-track, limiting their range of application.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a binocular vision intelligent detection method and system for bridge apparent damage by integrating deep learning, and the specific technical scheme is as follows:
a binocular vision intelligent detection method for apparent damage of a bridge fused with deep learning comprises the following steps:
Step one: acquiring apparent images of a bridge through a binocular shooting system, marking for each image the bridge position, the specific bridge member where the damage is located and the damage type, classifying the images according to bridge position, and respectively constructing a training data set for each bridge position; then further classifying the images of each bridge part according to the specific bridge member where the damage is located, and respectively constructing a training data set for each bridge member of each bridge part; finally, classifying the images of each bridge member according to damage type, and constructing a training data set for each damage type of each bridge member of each bridge part;
step two: constructing an improved VGG16 model, namely modifying the number of neurons of three full-connection layers of the VGG16 model into 1024, 512 and the number of recognition categories, and then training the improved VGG16 model by using the training data set of each bridge part respectively to obtain a part recognition model corresponding to each bridge part; then, training a part recognition model of a corresponding bridge part by using the training data set of each bridge member to obtain a member recognition model of the corresponding member; finally, training the member identification model of the corresponding member by using the training data set of each damage type of each bridge member of each bridge part to obtain the damage identification model of each damage type of the corresponding bridge member; thereby obtaining a three-level pre-classification model;
step three: constructing an improved encoder-decoder FCN model, namely, adopting an encoder network based on VGG19 as a feature extraction network, and overlapping the maximum pooling index of an encoder to the corresponding decoder transpose convolution output with the same resolution ratio for fusion, thereby detecting the damage detail features which are easy to ignore; the encoder network generates a characteristic diagram according to an input bridge apparent image, then the characteristic diagram is input into a decoder network to generate a dense prediction diagram, the dense prediction diagram is processed by a Softmax layer to obtain the class probability of each pixel, and the damaged pixel level detection is completed; training an improved encoder-decoder FCN model by using the bridge apparent image of each damage category as a training set to obtain a pixel-level detection model corresponding to each damage category;
step four: inputting the bridge appearance image to be predicted into the part recognition model, recognizing the bridge part corresponding to the image, then inputting the bridge appearance image to be predicted into the component recognition model corresponding to the bridge part, and recognizing the component corresponding to the image; finally, inputting the bridge apparent image to be predicted into the damage identification model of the corresponding component, and outputting the damage type of the bridge apparent image to be predicted; and inputting the bridge apparent image to be predicted into a pixel-level detection model corresponding to the damage category, and outputting the damage number, the form profile and the inclination trend of the bridge apparent image to be predicted.
Step five: constructing a central projection model based on binocular stereo vision, and adopting a binocular stereo vision model with crossed optical axes as a damage positioning model; extracting and searching matching point pairs between the bridge apparent image pair to be measured through a SIFT algorithm and a nearest neighbor algorithm, selecting three point pairs from the matching point pairs through a random algorithm, and inputting the three point pairs into the damage positioning model to obtain a space damage plane; and the space damage plane and the corresponding pinhole camera model form a central projection model, then three-dimensional reconstruction is carried out on the damage form contour output by the bridge apparent image to be measured corresponding to the central projection model in the fourth step, the damage quantitative measurement is completed, and the damage size of the bridge apparent image to be measured is output.
Further, when a training data set is constructed based on the bridge appearance image, firstly, manual classification and calibration are carried out on a small amount of acquired data to form a small training data set, then, an improved VGG16 model is quickly trained by using the small training data set, the obtained three-level pre-classification model is used for predicting the unclassified bridge appearance image, manual check is completed according to the classification result, the image after the manual check is supplemented into the small training data set, and semi-automatic expansion of the original data set is completed.
Furthermore, there are five pixel-level detection models: a pitted-surface detection model, a pothole detection model, a crack detection model, an exposed-rib detection model and a concrete-peeling detection model.
Furthermore, when the improved VGG16 model is trained, a ReLU function is adopted as an activation function in the convolution process, and the weights of partial convolution layers close to the input end of the model are kept unchanged, namely, the convolution layers are frozen, so that the effect of retaining partial learned characteristics is achieved, and parameter overfitting is prevented; and keeping the weights of partial convolutional layers close to the output end to be updated continuously, namely unfreezing the convolutional layers, so that the model has stronger learning capability and learns new characteristics continuously.
Furthermore, the improved VGG16 model has 5 convolution blocks in total; the three convolution blocks close to the input end of the model are frozen, and the two close to the output end are unfrozen; the convolution blocks are separated by max-pooling layers.
Further, after the semi-automatic expansion of the original data set is completed, data enhancement is performed on the existing bridge appearance image, and a training data set is constructed by using the enhanced image.
A binocular stereoscopic vision intelligent detection system for bridge apparent damage fused with a deep learning algorithm, used for realizing the binocular vision intelligent detection method for bridge apparent damage, the system comprising:
the bridge image acquisition module is used for acquiring an apparent image of the bridge by a binocular shooting system and uploading the image in real time;
the multi-level pre-classification module comprises a part identification model, a component identification model and a damage identification model, and is used for identifying the shot part of the bridge apparent image, the component where the damage is located and the damage type step by step;
the damage pixel level detection module is used for detecting the number, the form profile and the inclination trend of the specific types of damage;
and the classifier is used for classifying and associating the bridge apparent image according to the output result of the previous-stage model in the training and verification processes of the part identification model, the member identification model, the damage identification model and the pixel-level detection model, and calling the corresponding next-stage model to finish the transmission of the bridge apparent image among the models at all stages.
The damage measuring module consists of a damage positioning model based on binocular vision and a central projection model and is used for reconstructing damage form contours and quantitatively measuring damage sizes;
and the database is used for storing the bridge apparent information characteristic images.
Further, the system also comprises a data enhancer which is used for carrying out transformation enhancement operation on the bridge apparent image and expanding the number of the bridge apparent images.
The invention has the following beneficial effects:
(1) The invention constructs a multi-level pre-classification model based on a multi-level classification idea, comprehensively considers the damage characteristics of different components at different bridge positions, avoids the mutual interference of similar damages at different positions and effectively improves the damage identification effect.
(2) The damage size is measured quantitatively by applying a binocular stereo vision technology, the influence of the damage severity on a specific component of a specific bridge part is further considered, and the condition rating of the specific bridge component is determined conveniently according to a corresponding bridge inspection standard. According to the method, the bridge concrete surface is positioned by using binocular stereo vision, and the damaged form outline is projected on the bridge concrete surface in a central projection mode to complete damaged form outline reconstruction.
(3) The method has the advantages that the intelligent detection of the apparent damage of the bridge is realized, the high precision and the high efficiency are considered, the comprehensive apparent damage information such as the position, the category and the size is output, the digital twinning of the damage information of the bridge is realized, and the objective basis is provided for the evaluation of the apparent state of the service bridge.
(4) And an extended training data mechanism is provided, so that the parameters and the performance of the multi-level pre-classification model and the damage detection model can be continuously updated and improved in the later application process, and the model precision and the generalization capability are improved.
Drawings
FIG. 1 is a flow chart of a bridge apparent damage binocular stereo vision intelligent detection method fused with a deep learning algorithm;
FIG. 2 is a general framework diagram of a VGG 16-based multi-level pre-classification model;
FIG. 3 is a diagram of a modified encoder-decoder FCN model network architecture;
FIG. 4 is a graph of comparison of accuracy at different epochs;
FIG. 5 is a diagram of a central projection model based on binocular stereo vision;
FIG. 6 is an exemplary graph of pixel level detection and measurement of common crack damage.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and preferred embodiments so that its objects and effects become more apparent. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
As shown in FIG. 1, the binocular vision intelligent detection method for bridge apparent damage by fusing deep learning comprises the following steps:
s1: constructing a bridge apparent information detection data set;
Each bridge image acquired by the binocular shooting system carried on the unmanned aerial vehicle is manually classified according to damage characteristics, yielding a small preliminary data set. An improved VGG16 model is quickly trained with this data set; after adjusting the hyper-parameters and network architecture, a reasonably good model based on the small data set is obtained. The preliminary model is then used to classify the unclassified bridge apparent images, the classification results are checked manually, and the newly added data are supplemented into the existing data set to obtain a semi-automatically expanded original data set.
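The train–predict–check–merge cycle just described can be sketched in a few lines of Python; everything here (the function names, the `human_check` callback and the model interface) is an illustrative assumption rather than part of the original disclosure:

```python
# Minimal sketch of the semi-automatic data-set expansion loop described above.
# Function names and the human_check callback are illustrative assumptions.
def expand_dataset(labeled, unlabeled, model, human_check):
    """Grow the labeled set by model prediction plus manual verification."""
    model.fit(labeled)                         # quick training on the small hand-labeled set
    for image in unlabeled:
        suggested = model.predict(image)       # three-level pre-classification suggestion
        label = human_check(image, suggested)  # operator confirms or corrects the label
        labeled.append((image, label))         # supplement the checked image
    return labeled                             # semi-automatically expanded data set
```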
The bridge damage includes but is not limited to pitted surface, potholes, exposed ribs, cracks and peeling, and the calibrated encoder-decoder FCN damage segmentation data set also includes undamaged images. The parts and members of the bridge and the damage list corresponding to each member are shown in Table 1. The bridge apparent damage detection data set constructed in step S1 is randomly divided at a ratio of 5:1, the former part serving as the training set to train the models and the latter as the test set to evaluate model quality. Finally, a training data set for each bridge part, a training data set for each bridge member of each bridge part, and a training data set for each damage type of each bridge member of each bridge part are obtained.
TABLE 1 categorical catalogues of bridge appearance information
[Table 1 appears only as an image in the original publication; its contents are not reproduced here.]
More importantly, the method performs data enhancement on the training set, including angle rotation, translation, shear transformation, scaling and horizontal flipping of the images, which reduces overfitting of the recognition results.
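These five operations map directly onto standard Keras preprocessing utilities; a minimal sketch follows, in which the parameter values and directory layout are illustrative assumptions, not taken from the patent:

```python
# Sketch: data augmentation with the transformations named above
# (rotation, translation, shear, zoom, horizontal flip). Values are illustrative.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,        # angle rotation
    width_shift_range=0.1,    # horizontal translation
    height_shift_range=0.1,   # vertical translation
    shear_range=0.1,          # shear transformation
    zoom_range=0.1,           # scaling
    horizontal_flip=True,     # horizontal flip
)

# flow_from_directory expects one sub-directory per class label
train_iter = augmenter.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=128, class_mode="categorical")
```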
S2: constructing a three-stage pre-classification model based on the improved VGG16 model;
As shown in FIG. 2, the convolutional neural network model is improved on the basis of the original VGG16 model and is built with a "TensorFlow + Python" framework. The model consists of 5 convolution blocks followed by three fully connected layers, with specific convolution layers separated by pooling layers; the configuration of each structural layer of the convolutional neural network is listed in Table 2.
TABLE 2VGG neural network architecture conditions
[Table 2 appears only as an image in the original publication; its contents are not reproduced here.]
Note: n denotes the number of classes of the classification network.
The VGG has five blocks, each formed by stacking 2-3 convolution layers followed by one pooling layer. The convolution kernels used in the convolution layers are 3 × 3. The neuron convolution calculation of a convolution layer is:

$$y = \sigma\Bigl(\sum_{i=1}^{M}\sum_{j=1}^{N}\omega_{ij}X_{ij} + b\Bigr)$$

where X is the two-dimensional input over the region (M, N), ω_{ij} is the convolution kernel, b is an additive bias on the output feature, and σ is the activation function, for which this model adopts the ReLU function.
The multi-level pre-classification model has three levels: the first level is the part recognition model for bridge parts, the second the member recognition model for bridge members, and the third the damage recognition model for each damage category. To improve model efficiency, the first-level classification network is trained and optimized, and the second- and third-level networks are fine-tuned from it. The rationale is that the number of classes grows from the first level to the third, so the classification difficulty increases step by step, while the objects classified at all three levels are bridge apparent information, so the overall classification tasks are similar. With other test conditions equal, fine-tuning in this way yields a measurable improvement in model performance.
To improve model accuracy, two strategies of unfreezing the top convolution and pooling layers of the pre-trained VGG were investigated. Compared with the original VGG result, unfreezing convolution block 5 raises the validation accuracy of the model markedly from 49% to 69.4%, and unfreezing convolution blocks 4 and 5 raises it to 95.5%.
With the neuron numbers kept at their original settings, the training and validation accuracy with blocks 4 and 5 unfrozen is about 0.25 higher than with block 5 unfrozen alone. Changing the neuron numbers of the two dense layers makes only a slight difference in model performance; with 1024 + 256 neurons in the dense layers, validation-set accuracy peaks at 96%. The results show that on the bridge apparent defect classification task the validation accuracy of the improved VGG is about 45% higher than that of the original VGG; detailed data are given in Table 3.
Therefore, the neuron numbers of the first two dense layers of the model are reduced from 4096 to 1024 and 256, which also speeds up model training.
TABLE 3 comparison of original and improved VGG Performance
[Table 3 appears only as an image in the original publication; its contents are not reproduced here.]
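A minimal Keras sketch of the improved VGG16 as described above — convolution blocks 1-3 frozen, blocks 4-5 unfrozen, and the dense head replaced with 1024 / 256 / n-class layers. ImageNet initialization and the 224 × 224 input size are assumptions:

```python
# Sketch: improved VGG16 — freeze conv blocks 1-3, unfreeze blocks 4-5,
# replace the dense head with 1024 / 256 / n_classes neurons.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_improved_vgg16(n_classes: int) -> tf.keras.Model:
    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    for layer in base.layers:
        # layers of blocks 4 and 5 are named "block4_*" / "block5_*" in Keras
        layer.trainable = layer.name.startswith(("block4", "block5"))
    x = layers.Flatten()(base.output)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dense(256, activation="relu")(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(base.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```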
To improve training accuracy, part of the convolution layers at the bottom (input end) of the model are frozen, retaining part of the learned features and preventing overfitting caused by an excessive number of parameters; meanwhile, the top convolution layers are unfrozen, giving the model stronger learning capacity so that it keeps learning new features.
Epoch denotes the number of passes over the training data set. Its value is chosen by jointly considering the data set size, GPU performance and model behavior. Moderately increasing the Epoch improves accuracy and makes the training process more stable; an Epoch that is too large, however, may overfit an already optimized model or waste training effort. Analysis of the training results shows that, because the first-level classification has only three classes, it converges quickly as a whole, and an Epoch of 100 already gives high accuracy. The second-level member classification models have more classes: 11 (bridge deck system), 7 (substructure) and 5 (superstructure). At Epoch = 100 these models oscillate strongly and the accuracy hovers unstably around 0.8, indicating that the networks are still under-fitted. With Epoch = 150 for the bridge deck system and superstructure and Epoch = 200 for the substructure, the learning curves are more stable and the stabilized accuracy improves by about 10%. Table 4 gives the hyper-parameter settings of the improved VGG16 model, and FIG. 4 compares the accuracy at different Epochs.
TABLE 4 Hyper-parameter settings of the improved VGG16 model

Hyper-parameter          Value
Epoch                    30–200
Initial learning rate    1×10⁻⁴ ~ 4×10⁻⁶
Decay rate               0.8 / 0.9
Waiting step size        5 (epochs)
Batch size               128 / 256
Taking the bridge apparent information images and the corresponding three-level classification information as the input and output data of the model, the number of training iterations is set to 100-200, the parameters of the training result are saved, and the recognition result is output in the format "part - member - damage category".
Bridge apparent images of different categories are selected from the validation set to verify the effectiveness of the convolutional neural network model, and the model's recognition results are compared with the actual classification. The recognition accuracy must exceed 80%; otherwise the parameters and hyper-parameters of the model are adjusted until the error meets the requirement.
After the model training is passed, inputting the bridge apparent information image to be detected into the bridge apparent damage three-level pre-classification model with updated parameters, and determining the final damage type.
S3: constructing a pixel-level detection model corresponding to each impairment category based on the improved encoder-decoder FCN model;
In this embodiment, five kinds of damage — exposed ribs, cracks, potholes, peeling and honeycomb pitted surface — are selected as the research objects for which deeper detection information is needed, considering that they are common, strongly affect the service state of the bridge, and carry large weights in its service-state evaluation. The damage classification information obtained in the preceding multi-level pre-classification work is used to extract and separate the pictures containing these five common damage types from the large number of original input pictures as the working basis.
Conventional VGG19 designs use only the top-level feature map for prediction, so the shallow feature maps containing more detailed information are ignored, which is unfavorable for detecting small targets. Given the wide application of encoder-decoder CNNs in semantic segmentation, a new encoder-decoder FCN is built for damage pixel-level detection, using a pre-trained VGG19 classification network as the encoder together with transposed convolution layers. FIG. 3 shows the block diagram of the encoder-decoder FCN. In the encoder network, the last softmax layer of VGG19 is removed, the 3 fully connected layers of VGG19 are replaced with convolution layers, and the corresponding output is converted from class counts into damage-related feature maps. The feature map is expanded back to the original input size by 5 stages of transposed-convolution upsampling, and a skip-connection structure fuses the outputs of the first 4 transposed convolutions with the max-pooling outputs of the encoder at the corresponding sizes, so that the network captures the information needed to recognize small targets such as cracks and exposed ribs. Since the pre-classification already determines the damage types contained in an image, the model only has to perform the pixel-level detection task for the five common damage types. Table 5 lists the layers of the encoder-decoder FCN structure.
TABLE 5 Coder-decoder FCN architectural status
[Table 5 appears only as an image in the original publication; its contents are not reproduced here.]
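A condensed Keras sketch of the encoder-decoder FCN described above: VGG19 convolution layers as the encoder, fully connected layers replaced by convolutions, and five transposed-convolution upsampling stages with skip connections from the encoder's pooling outputs (the 1×1 convolutions on the skips, the filter counts and the 512 × 512 input size are assumptions; the patent's Table 5, available only as an image, would be authoritative):

```python
# Sketch: encoder-decoder FCN with a pre-trained VGG19 encoder and skip connections.
# Filter counts, kernel sizes and the input resolution are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

def build_fcn(n_classes: int = 2) -> tf.keras.Model:
    vgg = VGG19(weights="imagenet", include_top=False, input_shape=(512, 512, 3))
    # pooling outputs used for the skip connections
    skips = [vgg.get_layer(f"block{i}_pool").output for i in range(1, 6)]

    # replace VGG19's fully connected layers with convolutions
    x = layers.Conv2D(1024, 3, padding="same", activation="relu")(skips[4])
    x = layers.Conv2D(1024, 1, activation="relu")(x)

    init = tf.keras.initializers.TruncatedNormal(mean=0.0, stddev=0.01)
    filters = [512, 256, 128, 64, n_classes]
    for f, skip in zip(filters, [skips[3], skips[2], skips[1], skips[0], None]):
        x = layers.Conv2DTranspose(f, 4, strides=2, padding="same",
                                   kernel_initializer=init,
                                   bias_initializer="zeros")(x)
        if skip is not None:                      # fuse with same-resolution pooling output
            skip = layers.Conv2D(f, 1, padding="same")(skip)
            x = layers.Add()([x, skip])

    out = layers.Softmax()(x)                     # per-pixel class probabilities
    return models.Model(vgg.input, out)
```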
A parameter transfer-learning strategy is adopted: the convolution-layer parameters of the FCN encoder are initialized with the weights of the pre-trained VGG19, saving training time and improving learning efficiency. In addition, the filter weights of the transposed convolution layers in the decoder are initialized with a truncated normal distribution with mean 0 and standard deviation 0.01, and their biases are initialized to a constant zero vector. With the bridge apparent information images and the corresponding pixel-level labels as training data, each FCN model is trained for 20 epochs with a batch size of 2 (owing to GPU memory constraints).
The bridge apparent image to be detected is input into the corresponding damage pixel-level detection model according to the damage recognition result of the multi-level pre-classification, and the damage number, form profile and inclination trend are returned. The damage form contour is the set of image-point pixel coordinates $\{(u_i, v_i) \mid i = 1, \dots\}$.
The encoder-decoder architecture increases model depth and therefore computational cost, but has little overall impact on damage detection speed. The data fed into the encoder-decoder FCN are the five damage types obtained through the VGG classification, which filters out other damage and undamaged images; this input pre-screening reduces the workload of the encoder-decoder FCN and improves pixel-detection efficiency on the one hand, and reduces interference from other damage on the other.
To measure the prediction bias, cross entropy is used as the loss function to evaluate the similarity between the target values (i.e., the labeled values) and the predicted values:

$$J(\theta) = -\frac{1}{N}\sum_{i=1}^{N} a_i \log p_i$$

where J(θ) is the loss function, a_i is the target value of the i-th pixel, p_i is the predicted value of the i-th pixel, and N is the total number of pixels in the picture.
The training optimizer is Adam, an optimization method that combines the advantages of the AdaGrad and RMSProp algorithms:

$$g_t = \nabla_\theta J(\theta_{t-1})$$

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t$$

$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$$

$$\theta_t = \theta_{t-1} - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\varepsilon}$$

where β₁ and β₂ are the exponential decay rates of the first- and second-moment estimates, set to 0.9 and 0.999 respectively; t is the iteration index; m_t and v_t are exponential moving averages of the first and second moments of the gradient; m̂_t and v̂_t are the bias-corrected first- and second-order momenta; θ_{t-1} and θ_t are the weights before and after the iteration; ε = 10⁻⁸ is a small default value that keeps the optimization numerically stable; and η is the learning rate.
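For concreteness, one Adam update step implementing exactly these formulas can be written in NumPy (in practice the framework's built-in optimizer is used):

```python
# Sketch: one Adam parameter update implementing the formulas above.
import numpy as np

def adam_step(theta, g, m, v, t, eta=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return updated (theta, m, v) for gradient g at iteration t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * g            # first-moment moving average
    v = beta2 * v + (1 - beta2) * g**2         # second-moment moving average
    m_hat = m / (1 - beta1**t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2**t)                 # bias-corrected second moment
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```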
Since the learning rate is the key hyper-parameter that determines whether and how fast the training process converges, three groups of experiments with initial values of 1e-3, 1e-4 and 1e-5 were carried out to determine an appropriate starting value. Within each training run, the learning rate is annealed with an exponential step-decay function so that a good local optimum is reached quickly:

$$\eta_t = \eta_0 \cdot r_d^{\left\lfloor t / t_d \right\rfloor}$$

where η₀ is the initial learning rate, r_d is the decay rate, t is the iteration step, t_d is the decay step, and ⌊·⌋ is the floor operation, which returns the largest integer less than or equal to its input.
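The schedule corresponds to the following small helper (the decay-step symbol t_d follows the reconstruction above):

```python
# Sketch: exponential step-decay learning-rate schedule.
import math

def step_decay(eta0: float, r_d: float, t: int, t_d: int) -> float:
    """eta_t = eta0 * r_d ** floor(t / t_d)."""
    return eta0 * r_d ** math.floor(t / t_d)

# e.g. step_decay(1e-4, 0.9, t=12, t_d=5) == 1e-4 * 0.9**2
```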
In the art, models are usually evaluated with three indexes: IoU (Intersection over Union), precision and recall. The IoU is the ratio of the overlap area between the ground-truth region and the predicted region to the area of their union:

$$IoU = \frac{A_{gt} \cap A_{pred}}{A_{gt} \cup A_{pred}}$$
the accuracy rate refers to the proportion of positive samples contained in all samples predicted to be positive samples, and is expressed by the following formula:
Figure BDA00039125047800001011
recall refers to the proportion of all positive samples that are correctly identified as positive samples, and is formulated as:
Figure BDA0003912504780000111
the F1 score is also an index for measuring accuracy, and takes into account both accuracy and recall, and is expressed by the following formula:
Figure BDA0003912504780000112
A detection that is not damage of class A but is identified as class A counts as a false positive (FP); damage of class A that is not correctly identified counts as a false negative (FN); a correctly identified detection counts as a true positive (TP). Because the semantic segmentation task is more demanding than classification, the criterion for correct identification is stricter: a detection is considered correct only when the detected region matches the defect type of the real region and the IoU of the two regions is not less than 0.7.
After the precision and recall values are obtained, the P-R curve is drawn with precision and recall as the vertical and horizontal coordinates; the area enclosed by the curve and the coordinate axes is the AP (average precision) value of a given defect. Averaging the APs over all categories yields the mAP:

$$mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i$$
where n is the total number of classes and AP_i is the AP value of the i-th category. The method uses mAP as the index of the model's overall recognition accuracy. The recognition accuracy must reach at least 80%; otherwise the parameters and hyper-parameters of the model are adjusted, or the network layers are fine-tuned by freezing and unfreezing part of the convolution layers, until the error meets the requirement.
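A NumPy sketch computing these indexes from a pair of binary ground-truth and prediction masks, applying the ≥ 0.7 IoU acceptance criterion stated above (assumes both masks are non-empty):

```python
# Sketch: IoU, precision, recall and F1 from binary segmentation masks.
import numpy as np

def segmentation_metrics(gt: np.ndarray, pred: np.ndarray) -> dict:
    gt, pred = gt.astype(bool), pred.astype(bool)
    tp = np.logical_and(gt, pred).sum()       # true positives
    fp = np.logical_and(~gt, pred).sum()      # false positives
    fn = np.logical_and(gt, ~pred).sum()      # false negatives
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"iou": iou, "precision": precision, "recall": recall,
            "f1": f1, "correct": iou >= 0.7}  # acceptance criterion above
```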
S4: inputting the bridge apparent image to be predicted into a part recognition model, recognizing a bridge part corresponding to the image, then inputting the bridge apparent image to be predicted into a component recognition model corresponding to the bridge part, and recognizing a component corresponding to the image; finally, inputting the bridge apparent image to be predicted into a damage identification model of the corresponding member, and outputting the damage type of the bridge apparent image to be predicted; and inputting the bridge apparent image to be predicted into a pixel-level detection model corresponding to the damage category, and outputting the damage number, the form profile and the inclination trend of the bridge apparent image to be predicted by the model.
S5: reconstructing the damage form contour output in the step S4 based on the central projection model of the binocular stereo vision and quantitatively measuring the damage size;
and taking the bridge apparent image pair and the damage form contour output in the step S4 as objects, and naming the camera corresponding to the central projection model as a main camera (C1) and the other camera as a positioning camera (C2) so as to facilitate the following description. Firstly, determining the space position of the damage determined by a damage positioning model based on binocular stereo vision. As shown in fig. 5, the specific process of lesion localization is:
method for extracting stereoscopic image pair I = { I) by applying SIFT feature extraction algorithm 1 ,I 2 The characteristic points of two images in the Chinese character are respectively F 1 ={(p 1,i ,f 1,i ) I =1 … M } and F 2 ={(p 2,j ,f 2,j ) I j =1 … N }, where f 1,i And f 2,j Are respectively an image I 1 At the ith feature point position p 1,i And image I 2 At jth landmark position p 2,j The local feature descriptor of (1).
On the basis of the accurately determined feature points, a nearest-neighbor search algorithm finds, for each reference feature point (on image I₁), the target feature point (on image I₂) with the shortest Euclidean distance between descriptors; the two corresponding feature points form a pair of matching points, and the matching result is a set of feature point pairs {(p₁, p₂) | p₁ ∈ I₁, p₂ ∈ I₂}.
Three non-collinear feature points p₁, p₂ and p₃ on the structural plane in the reference image are selected by a random algorithm and, together with their homologous points p₁′, p₂′ and p₃′ on the other image, form three point pairs; the three non-collinear space points P1, P2 and P3 corresponding to these point pairs are calculated through the binocular stereo vision model, completing the determination of the spatial position of the damage.
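With OpenCV, the SIFT extraction, nearest-neighbor descriptor matching and random selection of three non-collinear point pairs can be sketched as follows (the collinearity tolerance is an assumed value):

```python
# Sketch: SIFT feature matching between the stereo pair and random
# selection of three non-collinear matched points.
import cv2
import numpy as np

def match_and_sample(img1, img2, rng=np.random.default_rng()):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    # nearest neighbour on descriptor Euclidean (L2) distance
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    pairs = np.array([(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches])
    while True:                                   # redraw until non-collinear
        p = pairs[rng.choice(len(pairs), 3, replace=False)]
        a, b, c = p[:, 0]                         # the three reference-image points
        v1, v2 = b - a, c - a
        if abs(v1[0] * v2[1] - v1[1] * v2[0]) > 1e-3:  # non-zero area => non-collinear
            return p                              # shape (3, 2, 2): three point pairs
```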
Taking the space point P1 as an example, the transformation between the main-camera (world) coordinates (X_{P1}, Y_{P1}, Z_{P1}) of P1 and its image point p₁ = (u_{P1}, v_{P1}) can be expressed as:

$$Z_{P1}\begin{bmatrix} u_{P1} \\ v_{P1} \\ 1 \end{bmatrix} = A_1 \begin{bmatrix} I_3 & O_{3\times 1} \end{bmatrix} \begin{bmatrix} X_{P1} \\ Y_{P1} \\ Z_{P1} \\ 1 \end{bmatrix}, \qquad A_1 = \begin{bmatrix} f_l/k_l & \gamma_1 & u_{0l} \\ 0 & f_l/l_l & v_{0l} \\ 0 & 0 & 1 \end{bmatrix}$$

where A₁ is the intrinsic parameter matrix of the main camera; (u_{0l}, v_{0l}) are the pixel coordinates of the C1 principal point; f_l is the main-camera focal length; k_l and l_l are the physical lengths of one pixel along the u_l and v_l axes of the image coordinate system; γ₁ is the tilt coefficient between the u_l and v_l axes, typically 0; I₃ is the 3 × 3 identity matrix; and O_{3×1} is a 3 × 1 zero vector.
At the same time, P1 = (X_{P1}, Y_{P1}, Z_{P1}) is also related to its image point p₁′ on the positioning camera through a homography matrix H:

$$Z'_{P1}\begin{bmatrix} u'_{P1} \\ v'_{P1} \\ 1 \end{bmatrix} = H\begin{bmatrix} X_{P1} \\ Y_{P1} \\ Z_{P1} \\ 1 \end{bmatrix}, \qquad H = A_2\begin{bmatrix} R & t \end{bmatrix}$$

where A₂ is the intrinsic parameter matrix of the positioning camera, R and t are its rotation matrix and translation vector relative to the main camera, and R⁻¹ denotes the inverse of the rotation matrix R. The space point P1 is obtained by solving the two projection formulas above simultaneously.
calibrating a binocular shooting device to obtain internal parameters and relative poses of the two cameras, and adopting a calibrated focal distance f of the main camera l As the distance from the optical center of the pinhole model to the imaging plane, and considering the u on the imaging plane l Shaft and v l Tilt factor of the axis, with main point of calibration
Figure BDA0003912504780000132
The position is converted from image pixel coordinate to physical coordinate, rather than being simply fixed at the center of the image, so as to be closer to the imaging processAnd reconstruction errors are reduced. If the point pair p is determined 1 And p1', p 2 And p2' and p 3 And P3', the space points P1, P2 and P3 of the damage in the C1 measurement coordinate system can be respectively calculated, and the positioning of the plane where the damage is positioned is completed.
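Given the calibrated intrinsics and relative pose, each matched point pair can be lifted to a space point by standard linear triangulation, for example with OpenCV (a sketch under the [I | 0] / [R | t] convention used above):

```python
# Sketch: triangulate matched point pairs into space points
# in the C1 measurement coordinate system.
import cv2
import numpy as np

def triangulate_pairs(A1, A2, R, t, pts1, pts2):
    """pts1, pts2: (2, N) pixel coordinates in C1 and C2."""
    P1m = A1 @ np.hstack([np.eye(3), np.zeros((3, 1))])   # main camera: A1 [I | 0]
    P2m = A2 @ np.hstack([R, t.reshape(3, 1)])            # positioning camera: A2 [R | t]
    X_h = cv2.triangulatePoints(P1m, P2m, pts1.astype(float), pts2.astype(float))
    return (X_h[:3] / X_h[3]).T                           # (N, 3) Euclidean points
```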
The shape contour of the damage is then reconstructed with the central projection model. Unlike traditional three-dimensional reconstruction methods that generate a dense point cloud from dense matching results, the invention simulates the imaging process of the C1 camera with a pinhole model, takes the optical center of the C1 camera as the projection origin, and projects the image damage form contour output in step S4 onto the spatial plane of the damage determined by P1, P2 and P3 to reconstruct the real damage form contour. Reconstructing the damage information by central projection effectively avoids the problem of pairing the contour pixels of the damage form between the two bridge apparent images, reduces the computation cost and improves the reconstruction accuracy.
Before this, the pixel points of the damage shape contour on the image and the image bearing plane are unified into the same reference frame: the pixel coordinates (u_i, v_i) are converted into C1 camera coordinates (x_i, y_i, z_i). The transformation between the camera image coordinate system and the camera coordinate system is expressed as:

$$x_i = (u_i - u_{0l})\,k_l, \qquad y_i = (v_i - v_{0l})\,l_l, \qquad z_i = f_l$$
and after coordinate conversion, performing central projection calculation. Calculating the intersection point of the straight line from the center projection model optical center to the damage form contour image point after coordinate transformation and the photographic support surface to complete the damage real form contour { (X) i ,Y i ,Z i ) I =1 … }.
Finally, the characteristic sizes of the damage form are measured quantitatively on the basis of the reconstructed space points of the damage form contour. For distance-type sizes, take a crack as an example: knowing the real shape profile of the crack, such as its skeleton and edges, the crack width can be calculated from the distance between the two edge points closest to a skeleton point, and the crack length can be calculated by accumulating the Euclidean distances between adjacent skeleton points, completing the quantitative measurement of the crack size. For area-type sizes, take a hole as an example: the centroid of the closed contour is computed from the real shape profile of the hole; the centroid and two adjacent reconstruction points on the contour form a triangle, and the actual area of the hole is approximated by accumulating the areas of these triangles.
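Both measurement rules translate into a few lines of NumPy; skeleton and edge extraction are assumed to have been done upstream (e.g., by morphological thinning of the detected mask):

```python
# Sketch: crack length from skeleton points, crack width at a skeleton point,
# and hole area via a triangle fan around the contour centroid.
import numpy as np

def crack_length(skeleton: np.ndarray) -> float:
    """skeleton: (N, 3) ordered reconstructed skeleton points."""
    return float(np.linalg.norm(np.diff(skeleton, axis=0), axis=1).sum())

def crack_width(sk_pt: np.ndarray, edges: np.ndarray) -> float:
    """Distance between the two edge points nearest to one skeleton point."""
    d = np.linalg.norm(edges - sk_pt, axis=1)
    e1, e2 = edges[np.argsort(d)[:2]]
    return float(np.linalg.norm(e1 - e2))

def hole_area(contour: np.ndarray) -> float:
    """contour: (N, 3) ordered points of a closed contour; triangle-fan area."""
    c = contour.mean(axis=0)                         # centroid
    a, b = contour, np.roll(contour, -1, axis=0)     # adjacent point pairs
    return float(0.5 * np.linalg.norm(np.cross(a - c, b - c), axis=1).sum())
```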
Finally, S6: the multi-level pre-classification result and the pixel-level detection result output in step S4 are integrated with the damage size obtained in step S5, and comprehensive apparent damage detection information including position, type and size is output.
The invention discloses a binocular stereoscopic vision intelligent detection system for bridge apparent damage fused with a deep learning algorithm, used for realizing the intelligent detection method for bridge apparent image damage; the system comprises:
the bridge image acquisition module is used for acquiring an apparent image of the bridge by a binocular shooting system and uploading the image in real time;
the multi-level pre-classification module, which comprises the part recognition model, the member recognition model and the damage recognition model, and is used for identifying, step by step, the photographed part of the bridge apparent image, the member where the damage is located, and the damage type;
the damage pixel level detection module is used for detecting the number, the form profile and the inclination trend of the damage of a specific category;
and the classifier is used for classifying and associating the bridge apparent image according to the output result of the previous-stage model in the training and verification processes of the part identification model, the member identification model, the damage identification model and the damage pixel-level detection model, calling the corresponding next-stage model and finishing the transmission of the bridge apparent image among the models at all stages.
The damage measuring module consists of a damage positioning model based on binocular vision and a central projection model and is used for reconstructing damage form contours and quantitatively measuring damage sizes;
and the database is used for storing the apparent information characteristic images of the bridge.
And the data enhancer is used for carrying out transformation enhancement operation on the bridge apparent image and expanding the number of the bridge apparent images.
Examples
In this embodiment, 7200 unclassified bridge apparent images containing damage, collected on site by an unmanned aerial vehicle carrying the binocular shooting system, are selected; 1200 images are randomly extracted and manually classified to obtain a small preliminary data set. The VGG16 model is quickly trained with this preliminary data set, and after adjusting its neuron numbers and hyper-parameters, the structural optimization of the model is completed, yielding a reasonably good model based on the small data set.
The optimized VGG16 model is then used to classify the remaining 6000 unclassified images; the results are checked manually and the newly added data are appended to the existing data set, completing the establishment of the original data set. From the original data set, the photographed position, member type and damage category information of each bridge apparent image can be obtained, and the classified data files together constitute the initial bridge apparent information detection data set.
The completed initial bridge apparent information detection data set is randomly divided in proportion into a training set of 6000 images and a validation set of 1200 images. The improved VGG16 model is trained with the training set and its performance evaluated on the validation set. After training, the recognition accuracy of each model at every level reaches more than 90% with a low level of overfitting, yielding a usable multi-level pre-classification model. This model is used to process 14842 bridge apparent images containing common damage types, giving 3352 exposed-rib images, 5916 crack images, 1558 pothole images, 2740 peeling images and 1276 honeycomb pitted-surface images; all images keep their original size of 2048 × 1536 pixels without compression. These images are then labeled at the pixel level, and damage pixel-level detection databases for exposed ribs, cracks, potholes, peeling and honeycomb pitted surface are constructed respectively.
The improved encoder-decoder FCN models are trained with the respective damage pixel-level detection databases for 500 iterations, and the training results are saved to obtain each damage pixel-level detection model for bridge apparent images. Taking the crack pixel-level detection model as an example, the training results are detailed in Table 6: the FCN model with an initial learning rate of 1e-4 achieves the highest precision, recall and F1 score, at 83.10%, 85.74% and 84.14% respectively. This FCN model is taken as the default crack damage pixel-level detection model to extract crack number information, form profile and inclination trend from the images.
TABLE 6 Training results of the crack pixel-level detection model

Learning rate (×10⁻⁴)   Precision (%)   Recall (%)   F1-score (%)
0.1                     80.48           80.67        80.47
1                       83.10           85.74        84.14
10                      79.53           79.84        78.43
The effectiveness of the model is verified with the remaining bridge apparent images; effective detection of damage must exceed 80%, otherwise the parameters and hyper-parameters of the model are adjusted until the accuracy meets the requirement. The hyper-parameter values set in this example are detailed in Table 7. After training, the damage detection accuracy averages 89% on the training sets and the defect recognition accuracy averages 81% on the validation sets, meeting the requirements.
TABLE 7 Example model hyper-parameter settings

Hyper-parameter    Value
Learning rate      1×10⁻⁴
Epoch              50
Batch size         2
Steps per epoch    500
Validation steps   25
In the damage pixel-level detection task, the damage detection results predicted by the model are displayed and compared with the corresponding pixel-level labeled images. When a bridge apparent image contains two or more damage types, the types are represented by segmentation regions of different colors. Taking cracks as an example, the pixel-level detection results and the corresponding size results for cracks of different forms are shown in FIG. 6. By judging whether each single pixel belongs to a crack or to the background, information such as the number of cracks, the shape profile and the inclination trend in the image is obtained. The model performs well both for large-area damage with irregular shapes that may even enclose background within the target (such as pitted surfaces) and for small-size damage (such as cracks and exposed ribs); for example, the cross cracks and complex-background cracks in FIG. 6, where the damage form or background is complex, are still detected well.
According to the bridge inspection standard, the maximum width limit for structural cracks of a main beam is 0.2 mm, and the limit for concrete cracks of a main pier is 0.3 mm. To meet the measurement requirements, crack sizes must be quantified at the 0.1 mm scale. In this embodiment the error of the crack measurement results is stably controlled within 0.1 mm, with a maximum error of 0.093 mm, satisfying the measurement accuracy requirement.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the invention and is not intended to limit it. Although the invention has been described in detail with reference to the foregoing examples, those skilled in the art may still modify the described embodiments or substitute equivalents for some of their features. All modifications and equivalents that come within the spirit and principle of the invention are intended to be included within its scope.

Claims (8)

1. A binocular vision intelligent detection method for apparent damage of a bridge fused with deep learning is characterized by comprising the following steps:
Step one: acquiring apparent images of a bridge through a binocular shooting system, marking for each image the bridge position, the specific bridge member where the damage is located and the damage type, classifying the images according to bridge position, and respectively constructing a training data set for each bridge position; then further classifying the images of each bridge part according to the specific bridge member where the damage is located, and respectively constructing a training data set for each bridge member of each bridge part; finally, classifying the images of each bridge member according to damage type, and constructing a training data set for each damage type of each bridge member of each bridge part;
step two: constructing an improved VGG16 model, namely modifying the number of neurons of three full-connection layers of the VGG16 model into 1024, 512 and the number of recognition categories, and then training the improved VGG16 model by using the training data set of each bridge part respectively to obtain a part recognition model corresponding to each bridge part; then, training a part recognition model of a corresponding bridge part by using the training data set of each bridge member to obtain a member recognition model of the corresponding member; finally, training the member identification model of the corresponding member by using the training data set of each damage type of each bridge member of each bridge part to obtain the damage identification model of each damage type of the corresponding bridge member; thereby obtaining a three-level pre-classification model;
step three: constructing an improved encoder-decoder FCN model, namely, adopting an encoder network based on VGG19 as a feature extraction network, and overlapping the maximum pooling index of an encoder to the corresponding decoder transpose convolution output with the same resolution ratio for fusion, thereby detecting the damage detail features which are easy to ignore; the encoder network generates a characteristic diagram according to an input bridge apparent image, then the characteristic diagram is input into a decoder network to generate a dense prediction diagram, the dense prediction diagram is processed by a Softmax layer to obtain the class probability of each pixel, and the damaged pixel level detection is completed; training an improved encoder-decoder FCN model by using the bridge apparent image of each damage category as a training set to obtain a pixel-level detection model corresponding to each damage category;
step four: inputting the bridge appearance image to be predicted into the part recognition model, recognizing the bridge part corresponding to the image, then inputting the bridge appearance image to be predicted into the component recognition model corresponding to the bridge part, and recognizing the component corresponding to the image; finally, inputting the bridge apparent image to be predicted into a damage identification model of the corresponding member, and outputting the damage type of the bridge apparent image to be predicted; and inputting the bridge apparent image to be predicted into a pixel-level detection model corresponding to the damage category, and outputting the damage number, the form profile and the inclination trend of the bridge apparent image to be predicted.
Step five: constructing a central projection model based on binocular stereo vision, and adopting a binocular stereo vision model with crossed optical axes as a damage positioning model; extracting and searching matching point pairs between the pair of the bridge apparent images to be measured through an SIFT algorithm and a nearest neighbor algorithm, and selecting three point pairs from the matching point pairs through a random algorithm to be input into the damage positioning model to obtain a space damage plane; and the space damage plane and the corresponding pinhole camera model form a central projection model, then three-dimensional reconstruction is carried out on the damage form contour output by the bridge apparent image to be measured corresponding to the central projection model in the fourth step, the damage quantitative measurement is completed, and the damage size of the bridge apparent image to be measured is output.
2. The binocular vision intelligent detection method for the apparent damage of the bridge fused with the deep learning of claim 1, wherein when a training data set is constructed based on an apparent image of the bridge, firstly, a small amount of acquired data is manually classified and calibrated to form a small training data set, then, an improved VGG16 model is quickly trained by using the small training data set, the obtained three-level pre-classification model is used for predicting the apparent image of the bridge which is not classified, manual check is completed according to the classification result, and the image which is manually checked is supplemented into the small training data set, so that the semi-automatic expansion of the original data set is completed.
3. The binocular vision intelligent detection method for the apparent damage of the bridge fused with the deep learning of claim 1, wherein the pixel-level detection models are five types, namely a pitted surface detection model, a pothole detection model, a crack detection model, an exposed rib detection model and a concrete peeling detection model.
4. The binocular vision intelligent detection method for apparent damage of a bridge fused with deep learning according to claim 1, wherein during training of the improved VGG16 model, a ReLU function is adopted as an activation function in a convolution process, and weights of partial convolution layers close to the input end of the model are kept unchanged, namely, the convolution layers are frozen, so that the effect of retaining partial learned features is achieved, and overfitting of parameters is prevented; and keeping the weights of partial convolutional layers close to the output end to be updated continuously, namely unfreezing the convolutional layers, so that the model has stronger learning capability and learns new characteristics continuously.
5. The binocular vision intelligent detection method for bridge apparent damage fused with deep learning of claim 4, wherein the improved VGG16 model has 5 convolution blocks in total, the three convolution blocks close to the input end of the model are frozen, and the two convolution blocks close to the output end are unfrozen; the convolution blocks are separated by max-pooling layers.
6. The binocular vision intelligent detection method for bridge apparent damage fused with deep learning according to claim 2, wherein, after the semi-automatic expansion of the original data set is completed, data enhancement is performed on the existing bridge apparent images and the training data set is constructed from the enhanced images.
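The claim does not enumerate the enhancement transformations; one plausible pipeline, sketched with torchvision, is:

```python
from torchvision import transforms

# Hypothetical geometric and photometric perturbations that multiply the
# effective number of bridge apparent images; the exact set and parameters
# are assumptions, not claim language.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
```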
7. A binocular stereoscopic vision intelligent detection system for bridge apparent damage fused with a deep learning algorithm, characterized in that the system implements the binocular vision intelligent detection method for bridge apparent damage of any one of claims 1 to 6, the system comprising:
a bridge image acquisition module, which acquires bridge apparent images with a binocular shooting system and uploads them in real time;
a multi-level pre-classification module, comprising the part identification model, the component identification model and the damage identification model, which identifies step by step the bridge part shown in the image, the component where the damage is located and the damage category;
a damage pixel-level detection module, which detects the number, form contour and inclination trend of each specific damage category;
a classifier which, during training and verification of the part identification model, the component identification model, the damage identification model and the pixel-level detection models, sorts and associates the bridge apparent images according to the output of the previous-level model, invokes the corresponding next-level model, and passes the bridge apparent images between the models at all levels;
a damage measurement module, consisting of the binocular-vision damage positioning model and the central projection model, which reconstructs the damage form contour and quantitatively measures the damage size (a minimal geometric sketch follows the list); and
a database, which stores the bridge apparent information feature images.
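For the damage measurement module, a minimal geometric sketch with NumPy, assuming the three matched point pairs have already been triangulated into the left-camera frame and that K denotes the camera intrinsic matrix (both assumptions, not claim language):

```python
import numpy as np

def fit_damage_plane(p1, p2, p3):
    """Spatial damage plane n . x = d through three triangulated points
    (3-vectors in the left-camera frame)."""
    n = np.cross(p2 - p1, p3 - p1)
    n = n / np.linalg.norm(n)
    return n, float(n @ p1)

def back_project(pixel, K, n, d):
    """Central projection: intersect the viewing ray of one contour pixel
    with the damage plane, recovering its metric 3-D position."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    t = d / (n @ ray)          # ray scale at the plane
    return t * ray             # 3-D point on the damage plane
```

Back-projecting every pixel of the detected form contour onto the fitted plane yields metric 3-D contour points, from which lengths, widths and areas of the damage can be measured.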
8. The binocular stereoscopic vision intelligent detection system for bridge apparent damage fused with a deep learning algorithm according to claim 7, further comprising a data enhancer, which performs transformation-based enhancement on the bridge apparent images to expand their number.
CN202211327687.0A 2022-10-27 2022-10-27 Bridge apparent damage binocular vision intelligent detection method and system integrating deep learning Pending CN115908276A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211327687.0A CN115908276A (en) 2022-10-27 2022-10-27 Bridge apparent damage binocular vision intelligent detection method and system integrating deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211327687.0A CN115908276A (en) 2022-10-27 2022-10-27 Bridge apparent damage binocular vision intelligent detection method and system integrating deep learning

Publications (1)

Publication Number Publication Date
CN115908276A true CN115908276A (en) 2023-04-04

Family

ID=86485315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211327687.0A Pending CN115908276A (en) 2022-10-27 2022-10-27 Bridge apparent damage binocular vision intelligent detection method and system integrating deep learning

Country Status (1)

Country Link
CN (1) CN115908276A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523914A (en) * 2023-07-03 2023-08-01 智慧眼科技股份有限公司 Aneurysm classification recognition device, method, equipment and storage medium
CN116523914B (en) * 2023-07-03 2023-09-19 智慧眼科技股份有限公司 Aneurysm classification recognition device, method, equipment and storage medium
CN117032276A (en) * 2023-07-04 2023-11-10 长沙理工大学 Bridge detection method and system based on binocular vision and inertial navigation fusion unmanned aerial vehicle

Similar Documents

Publication Publication Date Title
CN109816024B (en) Real-time vehicle logo detection method based on multi-scale feature fusion and DCNN
CN115908276A (en) Bridge apparent damage binocular vision intelligent detection method and system integrating deep learning
CN110942000A (en) Unmanned vehicle target detection method based on deep learning
CN111242041A (en) Laser radar three-dimensional target rapid detection method based on pseudo-image technology
CN106023257A (en) Target tracking method based on rotor UAV platform
CN114663346A (en) Strip steel surface defect detection method based on improved YOLOv5 network
CN113920107A (en) Insulator damage detection method based on improved yolov5 algorithm
CN111860106B (en) Unsupervised bridge crack identification method
CN115713488A (en) Bridge apparent disease pixel level identification method and system based on instance segmentation
CN116485717B (en) Concrete dam surface crack detection method based on pixel-level deep learning
CN110245587B (en) Optical remote sensing image target detection method based on Bayesian transfer learning
CN115457044B (en) Pavement crack segmentation method based on class activation mapping
CN111860596A (en) Unsupervised pavement crack classification method based on deep learning and model establishment method
CN116883393B (en) Metal surface defect detection method based on anchor frame-free target detection algorithm
CN115049821A (en) Three-dimensional environment target detection method based on multi-sensor fusion
CN112749675A (en) Potato disease identification method based on convolutional neural network
CN115937518A (en) Pavement disease identification method and system based on multi-source image fusion
CN113610905A (en) Deep learning remote sensing image registration method based on subimage matching and application
CN113313107A (en) Intelligent detection and identification method for multiple types of diseases on cable surface of cable-stayed bridge
CN116342536A (en) Aluminum strip surface defect detection method, system and equipment based on lightweight model
CN116823885A (en) End-to-end single target tracking method based on pyramid pooling attention mechanism
CN117274702B (en) Automatic classification method and system for cracks of mobile phone tempered glass film based on machine vision
Yin et al. Road Damage Detection and Classification based on Multi-level Feature Pyramids.
CN117152601A (en) Underwater target detection method and system based on dynamic perception area routing
CN109063543B (en) Video vehicle weight recognition method, system and device considering local deformation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination