CN117197040A - Automatic pavement damage detection method, system, medium and equipment - Google Patents


Info

Publication number
CN117197040A
CN117197040A
Authority
CN
China
Prior art keywords
processing
image
dimensional reconstruction
map
pavement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310937457.4A
Other languages
Chinese (zh)
Inventor
申富林
钟子林
郭咏辉
黎剑华
董勤喜
宋绮婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Railway Polytechnic
Original Assignee
Guangzhou Railway Polytechnic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Railway Polytechnic filed Critical Guangzhou Railway Polytechnic
Priority to CN202310937457.4A priority Critical patent/CN117197040A/en
Publication of CN117197040A publication Critical patent/CN117197040A/en
Pending legal-status Critical Current


Landscapes

  • Image Processing (AREA)

Abstract

The application belongs to the field of image recognition, and particularly relates to an automatic pavement damage detection method, system, medium and equipment. The method comprises the following steps: step 1, acquiring images of the road surface to be detected through shooting equipment arranged on a detection vehicle; step 2, processing the road surface images through SfM to obtain a three-dimensional reconstruction map; step 3, processing the three-dimensional reconstruction map with a deep learning model to obtain a pavement damage detection result. The beneficial effects of the application are as follows: complete information about the road surface can be extracted by the shooting equipment, SfM-based processing enables fast construction of the three-dimensional map, and the deep learning model enables accurate batch recognition of pavement damage. The scheme is simple in structure and efficient in processing.

Description

Automatic pavement damage detection method, system, medium and equipment
Technical Field
The application belongs to the field of image recognition, and particularly relates to an automatic pavement damage detection method, an automatic pavement damage detection system, an automatic pavement damage detection medium and automatic pavement damage detection equipment.
Background
With the continuous development of the highway transportation industry, higher demands are placed on the quality and maintenance of highway pavement. The traditional manual detection method suffers from low efficiency, interference with traffic, heavy consumption of time and labor, and inaccuracy, and cannot meet the needs of highway development.
Disclosure of Invention
The application aims to provide an automatic pavement damage detection method, system, medium and equipment.
The technical scheme for solving the technical problems is as follows: an automatic pavement damage detection method comprises the following steps:
step 1, acquiring a road surface image of a road surface to be detected through shooting equipment arranged on a detection vehicle;
step 2, processing the pavement image through SfM to obtain a three-dimensional reconstruction map;
and step 3, processing the three-dimensional reconstruction map based on a deep learning model to obtain a pavement damage detection result.
The beneficial effects of the application are as follows: complete information about the road surface can be extracted by the shooting equipment, SfM-based processing enables fast construction of the three-dimensional map, and the deep learning model enables accurate batch recognition of pavement damage. The scheme is simple in structure and efficient in processing.
On the basis of the technical scheme, the application can be improved as follows.
Further, the photographing apparatus provided on the detection vehicle specifically includes:
the photographing apparatus includes: at least two cameras are arranged at fixed intervals and angles.
Further, the specific process of processing the pavement image through SfM to obtain a three-dimensional reconstruction map comprises the following steps:
The pavement image captured at each moment is processed based on SfM to obtain the three-dimensional reconstruction image corresponding to that moment, until the three-dimensional reconstruction images for all moments are obtained.
Further, the specific process of processing the three-dimensional reconstruction map based on the deep learning model to obtain the pavement damage detection result is as follows:
Feature processing is performed on the three-dimensional reconstruction map by a transposed convolution layer in the deep learning model to obtain a first processing result. The first processing result is then fed back as input to the transposed convolution layer, and the feature processing is repeated at least n times to obtain a detection result map corresponding to the three-dimensional reconstruction map. The detection result map, which contains feature areas and the damage probability corresponding to each feature area, is taken as the detection result.
The other technical scheme for solving the technical problems is as follows: an automatic pavement damage detection system, comprising:
the setting module is used for: collecting road surface images of the road surface to be detected through shooting equipment arranged on the detection vehicle;
the reconstruction module is used for: processing the pavement image through SfM to obtain a three-dimensional reconstruction map;
the detection module is used for: and processing the three-dimensional reconstruction map based on the deep learning model to obtain a pavement damage detection result.
The beneficial effects of the application are as follows: complete information about the road surface can be extracted by the shooting equipment, SfM-based processing enables fast construction of the three-dimensional map, and the deep learning model enables accurate batch recognition of pavement damage. The scheme is simple in structure and efficient in processing.
Further, the photographing apparatus provided on the detection vehicle specifically includes:
the photographing apparatus includes: at least two cameras are arranged at fixed intervals and angles.
Further, the specific process of processing the pavement image through SfM to obtain a three-dimensional reconstruction map comprises the following steps:
The pavement image captured at each moment is processed based on SfM to obtain the three-dimensional reconstruction image corresponding to that moment, until the three-dimensional reconstruction images for all moments are obtained.
Further, the specific process of processing the three-dimensional reconstruction map based on the deep learning model to obtain the pavement damage detection result is as follows:
Feature processing is performed on the three-dimensional reconstruction map by a transposed convolution layer in the deep learning model to obtain a first processing result. The first processing result is then fed back as input to the transposed convolution layer, and the feature processing is repeated at least n times to obtain a detection result map corresponding to the three-dimensional reconstruction map. The detection result map, which contains feature areas and the damage probability corresponding to each feature area, is taken as the detection result.
The other technical scheme for solving the technical problems is as follows: a storage medium having instructions stored therein which, when read by a computer, cause the computer to perform the method of any of the preceding claims.
The beneficial effects of the application are as follows: complete information about the road surface can be extracted by the shooting equipment, SfM-based processing enables fast construction of the three-dimensional map, and the deep learning model enables accurate batch recognition of pavement damage. The scheme is simple in structure and efficient in processing.
The other technical scheme for solving the technical problems is as follows: an electronic device includes the storage medium and a processor executing instructions within the storage medium.
The beneficial effects of the application are as follows: complete information about the road surface can be extracted by the shooting equipment, SfM-based processing enables fast construction of the three-dimensional map, and the deep learning model enables accurate batch recognition of pavement damage. The scheme is simple in structure and efficient in processing.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of an automatic pavement damage detection method according to the present application;
FIG. 2 is a block diagram of an embodiment of an automatic pavement damage detection system according to the present application;
FIG. 3 is a schematic flow chart of three-dimensional reconstruction using SfM according to an embodiment of the automatic pavement damage detection method of the present application;
FIG. 4 is a schematic diagram of the basic principle of three-dimensional reconstruction provided by an embodiment of the automatic pavement damage detection method of the present application;
FIG. 5 is a schematic flow chart of three-dimensional reconstruction using SfM according to an embodiment of the automatic pavement damage detection method of the present application;
FIG. 6 is a first schematic diagram of a U-Net-based architecture provided by an embodiment of a method for automatically detecting road surface damage according to the present application;
FIG. 7 is a second schematic diagram of a U-Net-based architecture provided by an embodiment of an automatic pavement damage detection method of the present application;
FIG. 8 is a schematic diagram of a detection vehicle according to an embodiment of the present application;
FIG. 9 is a schematic view of a first photographing angle according to an embodiment of the present application;
fig. 10 is a schematic view of a second photographing angle according to an embodiment of the present application.
Detailed Description
The principles and features of the present application are described below with examples given for the purpose of illustration only and are not intended to limit the scope of the application.
As shown in fig. 1, an automatic pavement damage detection method includes:
step 1, acquiring a road surface image of a road surface to be detected through shooting equipment arranged on a detection vehicle;
step 2, processing the pavement image through SfM to obtain a three-dimensional reconstruction map;
and step 3, processing the three-dimensional reconstruction map based on a deep learning model to obtain a pavement damage detection result.
In some possible embodiments, complete information about the pavement can be extracted by the shooting equipment, SfM-based processing enables fast construction of the three-dimensional map, and the deep learning model enables accurate batch recognition of pavement damage. The scheme is simple in structure and efficient in processing.
S1, acquiring a road surface image of a road surface to be detected through shooting equipment arranged on a detection vehicle;
1) The detection vehicle is: any vehicle that can travel normally on the road surface;
2) The photographing apparatus includes: at least two cameras;
3) The road surface to be detected is: any road surface on which the vehicle can travel normally.
S11, arranging shooting equipment on a detection vehicle according to fixed requirements;
wherein, the fixing requirement can be: the vertical distance between the cameras and the road surface is 0.8 m, and the horizontal distance between the cameras is 0.6 m. With this arrangement, the images can cover about 2 m² of road surface, and features as small as 0.27 mm per pixel can be accurately observed on the image.
The fixing requirements can also be adaptively adjusted according to the width of the detection vehicle, the width of the road surface, or the performance of the cameras.
S12, start the detection vehicle so that the shooting equipment works along a preset driving path, shooting and sampling at a fixed frequency;
the driving path is determined by the actual road conditions of the detected road section;
the fixed frequency is determined by the performance of the shooting equipment combined with the driving speed of the detection vehicle.
S2, processing the pavement image through SfM to obtain a three-dimensional reconstruction map;
sfm: the motion restoration structure (Structure from motion) is a technique for deriving camera parameters and performing three-dimensional reconstruction by analyzing a sequence of images.
And S3, processing the three-dimensional reconstruction map based on the deep learning model to obtain a pavement damage detection result.
The deep learning model is a model which is built in advance and used for identifying a damaged area;
the detection result is a damage region corresponding to the three-dimensional reconstruction map and the damage probability of the region.
Preferably, in any of the above embodiments, the photographing apparatus provided on the detection vehicle is specifically:
the photographing apparatus includes: at least two cameras are arranged at fixed intervals and angles.
Preferably, in any of the foregoing embodiments, the specific process of processing the pavement image through SfM to obtain a three-dimensional reconstruction map includes:
The pavement image captured at each moment is processed based on SfM to obtain the three-dimensional reconstruction image corresponding to that moment, until the three-dimensional reconstruction images for all moments are obtained.
Preferably, in any of the foregoing embodiments, the specific process of processing the three-dimensional reconstruction map based on the deep learning model to obtain the pavement damage detection result includes:
Feature processing is performed on the three-dimensional reconstruction map by a transposed convolution layer in the deep learning model to obtain a first processing result. The first processing result is then fed back as input to the transposed convolution layer, and the feature processing is repeated at least n times to obtain a detection result map corresponding to the three-dimensional reconstruction map. The detection result map, which contains feature areas and the damage probability corresponding to each feature area, is taken as the detection result.
As shown in fig. 2, an automatic pavement damage detection system includes:
the setting module 100 is configured to: collecting road surface images of the road surface to be detected through shooting equipment arranged on the detection vehicle;
the reconstruction module 200 is configured to: process the pavement image through SfM to obtain a three-dimensional reconstruction map;
the detection module 300 is used for: and processing the three-dimensional reconstruction map based on the deep learning model to obtain a pavement damage detection result.
In some possible embodiments, complete information about the pavement can be extracted by the shooting equipment, SfM-based processing enables fast construction of the three-dimensional map, and the deep learning model enables accurate batch recognition of pavement damage. The scheme is simple in structure and efficient in processing.
Preferably, in any of the above embodiments, the photographing apparatus provided on the detection vehicle is specifically:
the photographing apparatus includes: at least two cameras are arranged at fixed intervals and angles.
Preferably, in any of the foregoing embodiments, the specific process of processing the pavement image through SfM to obtain a three-dimensional reconstruction map includes:
The pavement image captured at each moment is processed based on SfM to obtain the three-dimensional reconstruction image corresponding to that moment, until the three-dimensional reconstruction images for all moments are obtained.
Preferably, in any of the foregoing embodiments, the specific process of processing the three-dimensional reconstruction map based on the deep learning model to obtain the pavement damage detection result includes:
Feature processing is performed on the three-dimensional reconstruction map by a transposed convolution layer in the deep learning model to obtain a first processing result. The first processing result is then fed back as input to the transposed convolution layer, and the feature processing is repeated at least n times to obtain a detection result map corresponding to the three-dimensional reconstruction map. The detection result map, which contains feature areas and the damage probability corresponding to each feature area, is taken as the detection result.
The other technical scheme for solving the technical problems is as follows: a storage medium having instructions stored therein which, when read by a computer, cause the computer to perform the method of any of the preceding claims.
In some possible embodiments, complete information about the pavement can be extracted by the shooting equipment, SfM-based processing enables fast construction of the three-dimensional map, and the deep learning model enables accurate batch recognition of pavement damage. The scheme is simple in structure and efficient in processing.
The other technical scheme for solving the technical problems is as follows: an electronic device includes the storage medium and a processor executing instructions within the storage medium.
In some possible embodiments, complete information about the pavement can be extracted by the shooting equipment, SfM-based processing enables fast construction of the three-dimensional map, and the deep learning model enables accurate batch recognition of pavement damage. The scheme is simple in structure and efficient in processing.
1. Stereoscopic vision based on Structure-from-Motion photogrammetry
Structure from Motion (SfM) is an innovative 3D reconstruction technique that can construct 3D scenes from 2D digital images taken from different locations. Inspired by the way human vision extracts three-dimensional information from moving objects, SfM recovers the depth information of an image through the parallax between corresponding points in different images. The most promising feature of SfM is that it requires neither camera position information nor laser equipment: only a series of two-dimensional images is needed to complete the three-dimensional reconstruction, which means the method is hardly affected by vibration and strong light. The main steps of three-dimensional reconstruction using SfM are shown in fig. 3. Among these technical details, feature matching and three-dimensional estimation are the key steps that most strongly affect the quality of the three-dimensional model.
1) Feature extraction and matching
SfM-based three-dimensional scene reconstruction relies on correspondence search to extract and match points of interest between multiple views. The purpose of the correspondence search is to identify corresponding points across the input images I = {I_i | i = 1, ..., N}. For each image I_i, a feature set F_i = {(p_j, f_j) | j = 1, ..., M} is identified, where f_j is the local feature descriptor at location p_j. These local features are stable under changes in position, orientation, and scale, so that feature points can be matched uniquely across multiple images. A variety of algorithms are available for feature extraction, such as Speeded-Up Robust Features (SURF), Features from Accelerated Segment Test (FAST), and the Scale-Invariant Feature Transform (SIFT), which are widely used in SfM for their robustness and efficiency.
After generating the feature sets for all images, feature matching can be performed based on the local feature descriptors (p_j, f_j)_i of different images. To improve computation speed and matching accuracy, various efficient algorithms, such as nearest-neighbour search, have been proposed. The output of feature matching is a set of feature point pairs {(p1, p2) | p1 ∈ I_i, p2 ∈ I_(i+1)} between two images.
2) Three-dimensional reconstruction
SfM recovers the three-dimensional information of the feature points through the geometric relationship between image and space. Fig. 4 illustrates the basic principle of three-dimensional reconstruction. To obtain the spatial coordinates of points on the image, the relative positions of the cameras are estimated first. The positional relationship between different cameras can be described by epipolar geometry and the essential matrix E. Assume the first camera coordinate system O1-xyz coincides with the world coordinate system, where O1 is the optical center and p1 = [x1, y1]^T is the image coordinate of the target point P on the first image. The second camera coordinate system is O2-xyz, where O2 is the optical center and p2 = [x2, y2]^T is the image coordinate of the target point P on the second image. The feature point pair (p1, p2) can be identified by feature extraction and matching. Meanwhile, the two images share the same camera parameters, with focal lengths f_x, f_y and optical center coordinates [c_x, c_y]^T on the image.
where K is the camera intrinsic matrix, which can be obtained by camera calibration; [X, Y, Z]^T is the spatial coordinate of the target point P in the O1-xyz camera coordinate system, and [X', Y', Z']^T is the spatial coordinate of the target point P in the O2-xyz camera coordinate system; d1 and d2 are the perpendicular distances from the target point to the optical centers of the cameras.
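The patent's equations (1)-(2) are not legible in this text; given the definitions above, they presumably take the standard pinhole projection form (a reconstruction, not the original rendering):

```latex
% Pinhole projections of the target point P in the two camera frames
% (reconstructed from the surrounding definitions; not the patent's own rendering)
d_1 \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = K \begin{bmatrix} X \\ Y \\ Z \end{bmatrix},
\qquad
d_2 \begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = K \begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
```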
After the coordinate transformation relationships in equations (1)-(2) are obtained for each camera system, the mutual positional relationship between the two camera coordinate systems can be estimated by the following formula, where the rotation matrix R is 3 × 3 and the translation vector T is 3 × 1:
Formula (5) can be further simplified by noting that the 3 × 1 vector [T]× K^-1 [x2, y2, 1]^T = T × (K^-1 [x2, y2, 1]^T) is perpendicular to both T and K^-1 [x2, y2, 1]^T, where [T]× is a 3 × 3 antisymmetric matrix. Equation (5) can then be converted into:
In formula (6), [T]× R is the essential matrix E, which can be solved from a number of known feature point pairs selected by the random sample consensus (RANSAC) algorithm. The rotation matrix R and the translation vector T are then extracted from the essential matrix using a Singular Value Decomposition (SVD) algorithm. Once the relationship between the cameras is determined, triangulation can be used to calculate the spatial coordinates of the target point. The spatial coordinates [X, Y, Z]^T can be calculated by the SVD algorithm from the following formula:
where [[x2, y2, 1]^T]× is the 3 × 3 antisymmetric matrix of [x2, y2, 1]^T, whose product with any vector is perpendicular to [x2, y2, 1]^T.
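The triangulation step just described can be sketched in NumPy: a linear (DLT) system built from the two projection matrices is solved by SVD, its solution being the right singular vector with the smallest singular value. This is an illustrative sketch, not the patent's implementation; the projection-matrix forms P1 = K[I|0] and P2 = K[R|T] are the standard construction assumed here.

```python
import numpy as np

def triangulate(P1, P2, p1, p2):
    """Linear triangulation (DLT) of one point pair via SVD.

    P1, P2: (3, 4) projection matrices K[R|t] of the two cameras.
    p1, p2: (2,) pixel coordinates of the same target point.
    Returns the (3,) spatial coordinates [X, Y, Z].
    """
    A = np.vstack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    # Homogeneous least-squares solution: right singular vector
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

With a synthetic point and two cameras 0.6 m apart (the rig's baseline), the recovered coordinates match the point used to generate the pixels.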
2. Pavement damage segmentation based on deep learning
1) Network architecture
The network architecture developed herein is a U-Net-based architecture with a depthwise separable convolutional encoder, as shown in figs. 5 and 6. The original U-Net architecture was designed for capillary and cell segmentation, allowing the use of small datasets. This is consistent with pavement damage segmentation (i.e., crack and pothole segmentation), since both involve small and complex objects. Thus, this network structure is also suitable for pavement damage segmentation with smaller training datasets. To reduce the computational effort while maintaining segmentation performance, depthwise separable convolutions are introduced into the network.
The input of the proposed U-Net-based convolutional network is a three-channel image, which may be a color image, a depth image, or a color-depth overlay image. A depth image is converted into a three-channel image by channel replication. The network output is a single-channel image. In the encoder section, features of the input image are first extracted by a standard convolution block comprising a convolution layer with a kernel size of 3 × 3, a Batch Normalization (BN) layer, and a Rectified Linear Unit (ReLU) activation layer. A series of depthwise separable convolutions then completes the remaining downsampling. After four convolution operations (stride 2), the feature map size is reduced to one sixteenth of the input image.
The decoder portion of the architecture is based primarily on transposed convolution layers with a kernel size of 2 × 2 and a stride of 2; after each transposed convolution, a concatenation operation connects the corresponding encoder block to the decoder block. In each upsampling step the feature map doubles in size while the number of maps is halved. After four transposed convolutions, the network returns a feature map of the same size as the input image. The last layer of the network is a softmax layer, which returns a probability map giving the road damage probability of each pixel.
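The size bookkeeping implied above (four stride-2 convolutions shrink the map to one sixteenth; four 2 × 2 stride-2 transposed convolutions restore it) can be checked in a few lines. The kernel sizes are from the text; the padding of 1 for the 3 × 3 convolutions and the 256-pixel example input are assumptions.

```python
def down(size, k=3, s=2, p=1):
    # Output size of a stride-2 convolution (3x3 kernel, padding 1 assumed).
    return (size + 2 * p - k) // s + 1

def up(size, k=2, s=2):
    # Output size of a 2x2, stride-2 transposed convolution.
    return (size - 1) * s + k

size = 256  # example input resolution
for _ in range(4):
    size = down(size)
print(size)  # 16: one sixteenth of the input, as stated above
for _ in range(4):
    size = up(size)
print(size)  # 256: the decoder restores the input size
```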
2) Depth separable convolution
Depthwise separable convolution was proposed as a form of factorized convolution and first appeared in the Xception architecture. It splits a standard convolution into two steps, a depthwise convolution and a pointwise convolution, greatly reducing the number of model parameters. As shown in fig. 7, assume the input feature map has M channels and the output feature map has N channels. For one convolution operation, the number of parameters required by a standard convolution with kernel size k × k is:
P_conv = M × N × k × k (9)
In the depthwise convolution, the feature maps of the M channels are convolved by M separate convolution kernels. Thus, the number of parameters required for the depthwise convolution is:
P_Dconv = M × k × k (10)
Based on the feature maps generated by the depthwise convolution, the feature maps of the N output channels are generated by applying 1 × 1 convolutions to the M feature maps. The number of parameters in the pointwise convolution operation is:
P_Pconv = M × N × 1 × 1 (11)
Thus, the total number of parameters of the depthwise separable convolution is:
P_DSconv = M × k × k + M × N × 1 × 1 (12)
Compared with the standard convolution, the model parameters of the depthwise separable convolution are therefore reduced by the ratio P_DSconv / P_conv = 1/N + 1/k².
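The parameter counts of equations (9)-(12) can be checked numerically; the channel and kernel sizes below are arbitrary examples, not values from the patent.

```python
def conv_params(M, N, k):
    # Standard convolution: N filters, each k x k x M (eq. 9).
    return M * N * k * k

def dsconv_params(M, N, k):
    # Depthwise (M kernels of k x k) plus pointwise (M x N of 1 x 1), eqs. (10)-(12).
    return M * k * k + M * N * 1 * 1

M, N, k = 64, 128, 3  # example sizes
ratio = dsconv_params(M, N, k) / conv_params(M, N, k)
print(round(ratio, 4))  # 0.1189, which equals 1/N + 1/k**2
```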
3) Model training method
The cross-entropy loss function and the Dice coefficient loss function are widely used in the training of deep learning networks. In the initial training phase, the cross-entropy loss function is used to measure the performance of the pixel-level classifier, whose output probabilities lie between 0 and 1, since it has proven effective for various classification tasks. The cross-entropy loss L_CE is defined as follows:
where K is the number of pixels; y_i ∈ {0, 1} is the ground truth of pixel i; and ŷ_i ∈ [0, 1] is the prediction for pixel i.
However, for the segmentation of road cracks, the number of non-crack pixels is much greater than the number of crack pixels. In view of this, the Dice coefficient loss function is used to evaluate the similarity between the ground truth image and the predicted image. The Dice coefficient loss L_DC is determined by the following formula:
where s is a constant that prevents the denominator from being zero; in this work, s = 1.
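The loss formulas themselves are not legible in this text, so the following NumPy sketch assumes the standard binary forms of cross-entropy and Dice loss (with s = 1 as stated above); it is an illustration, not the patent's code.

```python
import numpy as np

def cross_entropy_loss(y, y_hat, eps=1e-7):
    # Mean pixel-wise binary cross-entropy L_CE over K pixels.
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

def dice_loss(y, y_hat, s=1.0):
    # Dice coefficient loss L_DC; s keeps the denominator non-zero.
    inter = np.sum(y * y_hat)
    return float(1 - (2 * inter + s) / (np.sum(y) + np.sum(y_hat) + s))
```

A perfect prediction gives both losses near zero, while a fully wrong prediction drives the Dice loss toward one.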
The purpose of model training is to adjust the weights of the layers in the structure to achieve the best segmentation effect. After the weights are initialized, they are optimized with the Adam optimization algorithm to minimize the loss value. The Adam algorithm is widely used in model training owing to its outstanding computational efficiency and robustness. Meanwhile, the learning rate is not fixed but changed dynamically throughout the training process: specifically, if the validation loss does not drop over several iterations, the learning rate is halved.
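The halve-on-plateau schedule described above can be sketched as follows; the patience value is an assumed example, since the patent only says "several iterations".

```python
class HalveOnPlateau:
    """Halve the learning rate when the validation loss fails to
    improve for `patience` consecutive evaluations."""

    def __init__(self, lr, patience=3):
        self.lr = lr
        self.patience = patience
        self.best = float("inf")
        self.bad = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss   # improvement: remember it and reset
            self.bad = 0
        else:
            self.bad += 1
            if self.bad >= self.patience:
                self.lr /= 2       # plateau reached: halve the rate
                self.bad = 0
        return self.lr
```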
3. Data acquisition and processing
1) Vehicle-mounted photographing system
To improve the efficiency of stereoscopic imaging, a vehicle-mounted photographing system is used for road surface image acquisition (fig. 8). A hatchback equipped with several GoPro HERO8 cameras can take pictures at an operating speed of 3-15 km/h. The GoPro HERO8, with a resolution of 12 megapixels, offers good stability and light weight. Three GoPro cameras are fixed on a horizontal bracket at the rear of the vehicle, which can easily be removed and installed on another vehicle. The three cameras are controlled by a smart remote, which can trigger them to shoot images simultaneously.
The image acquisition process has a great influence on the quality of the three-dimensional reconstruction of the road surface and must be designed in terms of shooting height, spacing, and angle. To balance the field of view against the Ground Sampling Distance (GSD), the vertical distance between the cameras and the road surface is set to 0.8 m, and the horizontal distance between the cameras to 0.6 m. With this arrangement, the images can cover about 2 m² of road surface, and features as small as 0.27 mm per pixel can be accurately observed on the image. According to previous studies, the overlap between images should be between 70% and 80% to ensure a high degree of detail. Furthermore, different capture angles record different surface micro-features, since raised surface texture occludes the surrounding surface. It is therefore worthwhile to compare the three-dimensional reconstruction effect at different shooting angles.
For the field investigation, two different camera angle schemes were adopted to compare the quality of the three-dimensional point cloud construction. In the first scheme, all camera axes are perpendicular to the surface, as shown in fig. 9. In this case, each camera can capture the surface texture in all directions; however, owing to the small parallax between one lens and another, the features captured on the images may not be distinctive. In the second scheme, the center camera remains perpendicular to the ground while the side cameras are tilted toward the center at a fixed angle of 30°, as shown in fig. 10. Each camera can then better capture the surface details in its own direction; the disadvantage of this strategy is that many road surface features in other directions may be missed.
2) Establishment and processing of point cloud model
A key technique in three-dimensional reconstruction using SfM is the accurate estimation of camera coordinates. From the spatial relationship between the cameras, the spatial position of each valid pixel in the image can be accurately estimated with SfM. The point cloud model generated by SfM is then converted into an orthographic image through calibration and formatting. The specific steps of point cloud construction and processing are as follows:
(1) Camera calibration
The camera matrix K must be obtained first in the three-dimensional reconstruction process. The relationship between a spatial point and its image point in the camera coordinate system is described by the camera intrinsic matrix, so the purpose of camera calibration is to obtain the internal physical characteristics of the camera. At the same time, lens distortion, both radial and tangential, must be removed through calibration. The calibration parameters are computed automatically from a set of chessboard images using MATLAB or Python-OpenCV.
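For illustration only, the radial and tangential distortion mentioned above follows the standard Brown–Conrady model; the sketch below applies it to a normalized image point. The coefficient values and function name are assumptions — in practice the coefficients come out of the chessboard calibration itself.

```python
# Minimal sketch of the radial + tangential (Brown-Conrady) lens
# distortion model that camera calibration estimates and removes.
# Coefficients k1, k2 (radial) and p1, p2 (tangential) are normally
# estimated from chessboard images, not chosen by hand.

def distort(x: float, y: float, k1: float, k2: float,
            p1: float, p2: float) -> tuple:
    """Apply radial and tangential distortion to a normalized
    image point (x, y) and return the distorted point (xd, yd)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd
```

Undistortion inverts this mapping, which is what the calibration toolboxes named above perform internally once K and the coefficients are known.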
(2) Point cloud reconstruction
Once road surface photogrammetry and camera calibration are complete, the road surface images are divided into groups of 20-30 images each. A Python script is then written to call the PhotoScan API and batch-process the image groups automatically. The workflow defined in the script consists of an image import module, an image grouping module, an image alignment module, a dense point cloud generation module and a point cloud file export module. To reduce the computational load, the image groups are reconstructed in batches.
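A minimal sketch of the grouping step is shown below. Only the grouping helper is concrete; the PhotoScan (now Metashape) workflow calls are indicated as comments because their exact names and signatures vary by product version, so they should be read as hypothetical.

```python
# Sketch of the batch-grouping step described above. The PhotoScan
# workflow calls are shown as comments only (hypothetical,
# version-dependent API); the grouping helper itself is concrete.

def group_images(paths: list, group_size: int = 25) -> list:
    """Split an ordered image list into groups of roughly 20-30
    images for batch reconstruction."""
    return [paths[i:i + group_size]
            for i in range(0, len(paths), group_size)]

# For each group, the per-group workflow would roughly be:
#   chunk = doc.addChunk()
#   chunk.addPhotos(group)                     # image import
#   chunk.matchPhotos(); chunk.alignCameras()  # image alignment
#   chunk.buildDenseCloud()                    # dense point cloud
#   chunk.exportPoints("group.ply")            # point cloud export
```

Processing each group independently keeps per-run memory bounded, which is the computational-load reduction referred to above.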
(3) Point cloud calibration
Because the images are taken from different angles, the surface of the point cloud model generated by SfM is not parallel to the road surface, so an orthographic image of the road surface cannot be obtained directly from the original point cloud model. There are several ways to find the plane of the original point cloud and calibrate it parallel to the road surface. A common method is to fit a plane to the points and compute the perpendicular distance between each original point and the fitted plane. However, this projection-based approach corrects only the vertical offset of the point cloud and ignores its horizontal offset.
To overcome this problem, the following steps are proposed to estimate the normal vector of the point cloud plane and calibrate the plane by rotation. First, the three eigenvectors of the point cloud data are computed with a principal component analysis (PCA) algorithm; these can be viewed intuitively as the three principal directions of the point cloud model. Among them, the eigenvector with the smallest eigenvalue is the normal to the point cloud plane. The original coordinates of each point are then transformed by the rotation that aligns this normal with the vertical, generating the calibrated point cloud; geometrically, a horizontal point cloud surface is obtained by rotating the original cloud. In addition, the coordinates of the original point cloud are relative coordinates and can be converted to absolute coordinates through a scale factor, estimated as the ratio of an actual distance to the corresponding point cloud distance.
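A pure-Python sketch of the plane-estimation and scaling steps follows. For brevity it replaces PCA with an equivalent least-squares plane fit z = a·x + b·y + c (for a near-planar cloud both yield the same plane normal); the function names are illustrative and the solver uses Cramer's rule on the 3×3 normal equations.

```python
# Sketch: estimate the plane normal of a near-planar point cloud by
# least-squares plane fitting (a simplification of the PCA step
# described above), plus the relative-to-absolute scale factor.

def fit_plane_normal(pts):
    """Fit z = a*x + b*y + c to points (x, y, z) and return the
    (unnormalized) plane normal (-a, -b, 1)."""
    n = len(pts)
    sx = sum(p[0] for p in pts); sy = sum(p[1] for p in pts)
    sz = sum(p[2] for p in pts)
    sxx = sum(p[0] * p[0] for p in pts)
    syy = sum(p[1] * p[1] for p in pts)
    sxy = sum(p[0] * p[1] for p in pts)
    sxz = sum(p[0] * p[2] for p in pts)
    syz = sum(p[1] * p[2] for p in pts)
    # Normal equations A [a, b, c]^T = rhs, solved by Cramer's rule.
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(A)
    sol = []
    for j in range(3):
        M = [row[:] for row in A]
        for i in range(3):
            M[i][j] = rhs[i]
        sol.append(det3(M) / d)
    a, b, _ = sol
    return (-a, -b, 1.0)

def scale_factor(actual_dist, cloud_dist):
    """Relative-to-absolute scale, as the ratio described above."""
    return actual_dist / cloud_dist
```

Once the normal is known, the rotation aligning it with the vertical axis can be built from it and applied to every point, which is the calibration-by-rotation step in the text.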
3) Deep learning oriented orthophoto generation
After the calibrated point cloud model is obtained, the orthographic images used for deep-learning-based road surface damage detection can be generated. Three types of orthographic image were produced in this study: color orthographic images, depth orthographic images, and color-depth overlapped orthographic images.
Unlike point clouds produced by laser scanning, those generated by stereoscopic vision are unordered and randomly distributed over the object surface. The calibrated point cloud is therefore divided spatially and rasterized into an orthographic image, in which each pixel represents the point cloud information within a certain spatial range. For a color orthographic image, each pixel is the average RGB value of the point cloud in the corresponding region; similarly, each pixel of the depth orthographic image is the average height of the point cloud within that region. Note that, because they are generated from the same regions, the two orthographic images are exactly aligned, so a superimposed image can be produced by an image superposition operation.
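The rasterization just described amounts to binning points into a regular grid and averaging per cell. The sketch below shows the depth variant with illustrative cell size and grid extents (the function name and defaults are assumptions, not the patented implementation).

```python
# Sketch of rasterizing an unordered point cloud into a depth
# orthographic image: each cell stores the mean height of the points
# falling inside it. Cell size and grid extents are illustrative.

def depth_ortho(points, cell=0.5, nx=2, ny=2):
    """points: iterable of (x, y, z). Returns an ny x nx grid of
    mean z values (None where a cell received no points)."""
    sums = [[0.0] * nx for _ in range(ny)]
    counts = [[0] * nx for _ in range(ny)]
    for x, y, z in points:
        i, j = int(y // cell), int(x // cell)
        if 0 <= i < ny and 0 <= j < nx:
            sums[i][j] += z
            counts[i][j] += 1
    return [[sums[i][j] / counts[i][j] if counts[i][j] else None
             for j in range(nx)] for i in range(ny)]
```

The color orthographic image is produced the same way, averaging the R, G and B values of each cell's points instead of their heights; using one grid for both guarantees the pixel alignment noted above.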
The color and depth images are overlapped as follows: first, the three-channel color image is split into three single-channel images; next, the pixel values of each single-channel image are averaged with the corresponding pixel values of the depth image; finally, the three new single-channel images are recombined into the overlapped image.
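These three steps can be sketched on plain per-pixel lists, with no image library assumed; the function name is illustrative.

```python
# Sketch of the color-depth overlap described above, operating on
# nested lists rather than real image arrays.

def overlay(color_rgb, depth):
    """color_rgb: H x W list of (r, g, b) tuples; depth: H x W list
    of depth values. Each output channel is the mean of that color
    channel and the depth value at the same pixel."""
    out = []
    for row_c, row_d in zip(color_rgb, depth):
        out.append([((r + d) / 2.0, (g + d) / 2.0, (b + d) / 2.0)
                    for (r, g, b), d in zip(row_c, row_d)])
    return out
```

Averaging channel-wise keeps the result a three-channel image, so the overlapped image can be fed to the same deep learning model input as an ordinary color image.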
The reader will appreciate that in the description of this specification, a description of terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the method embodiments described above are merely illustrative, e.g., the division of steps is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple steps may be combined or integrated into another step, or some features may be omitted or not performed.
The above-described method, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The present application is not limited to the above embodiments, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the present application, and these modifications and substitutions are intended to be included in the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (10)

1. An automatic pavement damage detection method is characterized by comprising the following steps:
step 1, acquiring a road surface image of a road surface to be detected through shooting equipment arranged on a detection vehicle;
step 2, processing the pavement image through SfM to obtain a three-dimensional reconstruction map;
and step 3, processing the three-dimensional reconstruction map based on a deep learning model to obtain a pavement damage detection result.
2. The automatic road surface breakage detection method according to claim 1, wherein the photographing device provided on the detection vehicle is specifically:
the photographing apparatus includes: at least two cameras are arranged at fixed intervals and angles.
3. The automatic pavement damage detection method according to claim 1, wherein the specific process of processing the pavement image through SfM to obtain a three-dimensional reconstruction map comprises the following steps:
and processing the pavement image at any moment based on sfm to obtain a three-dimensional reconstruction image corresponding to the pavement image at the moment until the three-dimensional reconstruction image at all moments is obtained.
4. The automatic pavement damage detection method according to claim 1, wherein the specific process of processing the three-dimensional reconstruction map based on the deep learning model to obtain the pavement damage detection result comprises the following steps:
and performing feature processing on the three-dimensional reconstruction map through a transposed convolution layer in a deep learning model to obtain a first processing result, repeatedly performing feature processing for at least n times by taking the first processing result as input of the transposed convolution layer to obtain a detection result map corresponding to the three-dimensional reconstruction map, and taking the detection result map as a detection result, wherein the detection result map comprises feature areas and damage probability corresponding to each feature area.
5. An automatic pavement damage detection system, comprising:
the setting module is used for: collecting road surface images of the road surface to be detected through shooting equipment arranged on the detection vehicle;
the reconstruction module is used for: processing the pavement image through SfM to obtain a three-dimensional reconstruction map;
the detection module is used for: and processing the three-dimensional reconstruction map based on the deep learning model to obtain a pavement damage detection result.
6. The automatic road surface breakage detection system according to claim 5, wherein the photographing device provided on the detection vehicle is specifically:
the photographing apparatus includes: at least two cameras are arranged at fixed intervals and angles.
7. The automatic pavement damage detection system according to claim 5, wherein the specific process of processing the pavement image through SfM to obtain a three-dimensional reconstruction map comprises:
and processing the pavement image at any moment based on sfm to obtain a three-dimensional reconstruction image corresponding to the pavement image at the moment until the three-dimensional reconstruction image at all moments is obtained.
8. The automatic pavement damage detection system according to claim 5, wherein the specific process of processing the three-dimensional reconstruction map based on the deep learning model to obtain the pavement damage detection result comprises the following steps:
and performing feature processing on the three-dimensional reconstruction map through a transposed convolution layer in a deep learning model to obtain a first processing result, repeatedly performing feature processing for at least n times by taking the first processing result as input of the transposed convolution layer to obtain a detection result map corresponding to the three-dimensional reconstruction map, and taking the detection result map as a detection result, wherein the detection result map comprises feature areas and damage probability corresponding to each feature area.
9. A storage medium having stored therein instructions which, when read by a computer, cause the computer to perform the method of any of claims 1 to 4.
10. An electronic device, comprising the storage medium of claim 9 and a processor that executes the instructions in the storage medium.
CN202310937457.4A 2023-07-28 2023-07-28 Automatic pavement damage detection method, system, medium and equipment Pending CN117197040A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310937457.4A CN117197040A (en) 2023-07-28 2023-07-28 Automatic pavement damage detection method, system, medium and equipment


Publications (1)

Publication Number Publication Date
CN117197040A true CN117197040A (en) 2023-12-08

Family

ID=89002434


Country Status (1)

CN: CN117197040A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination