CN116452574A - Gap detection method, system and storage medium based on improved YOLOv7

Info

Publication number: CN116452574A
Application number: CN202310487074.1A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 蒋庆, 叶冠廷, 晋强, 李赛, 陶金泰
Assignee (current and original): Hefei University of Technology
Legal status: Pending
Prior art keywords: network, data, output, module, layer
Classifications

    • G06T 7/0004 — Image analysis; inspection of images, e.g. flaw detection; industrial image inspection
    • G06N 3/045 — Neural network architectures; combinations of networks
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/08 — Neural networks; learning methods
    • G06V 10/764 — Image or video recognition using machine-learning classification, e.g. of video objects
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Image or video recognition using neural networks
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30132 — Industrial image inspection; masonry; concrete
    • G06V 2201/07 — Target detection
    • Y02T 10/40 — Engine management systems


Abstract

The invention relates to the technical fields of civil structural engineering and computer science, and in particular to a gap detection method, system and storage medium based on improved YOLOv7. In the gap detection method based on the improved YOLOv7 model, a self-developed module is embedded into the existing YOLOv7 model, and a road gap detection model is then obtained through machine-learning training. The invention uses this road gap detection model to detect road gaps, greatly improving gap recognition accuracy and efficiency. During data processing, the self-developed Aatten module splits the data along the channel dimension and then performs element-wise summation of the four equal-sized slices, realizing multi-level processing of the feature data, enlarging the receptive field of the image and better extracting the feature information of the original image.

Description

Gap detection method, system and storage medium based on improved YOLOv7
Technical Field
The invention relates to the technical fields of civil structural engineering and computer science, and in particular to a gap detection method, system and storage medium based on improved YOLOv7.
Background
Concrete is one of the most common materials in bridges, tunnels, dams and other infrastructure. Under the effects of temperature, overload, corrosion and insufficient periodic maintenance, various types of defects develop and may result in loss of infrastructure functionality and safety. Traditional road crack detection relies mainly on human vision, but this approach has the following drawbacks: 1) explicitly extracting key features depends heavily on expertise and skill; 2) the workload in engineering practice is excessive, efficiency is low and feasibility is lacking; 3) manual inspection can hardly cover the entire area: limited by manpower and time, it is difficult to comprehensively inspect a large road area, so defects may be missed; 4) no real-time monitoring: a traditional inspection covers only a single point in time and cannot track the state change of a road crack in real time.
Therefore, there is an urgent need for a reliable and effective structural health monitoring method for road crack inspection that is long-range, non-contact, high-precision and efficient.
Disclosure of Invention
To overcome the lack of an efficient and reliable road crack detection method in the prior art, the invention provides a gap detection method based on improved YOLOv7, realizing long-range, non-contact, high-precision and efficient road gap detection.
A gap detection method based on improved YOLOv7, the YOLOv7 model comprises a backbone network, a neck network and a head network; the backbone network comprises a first CBS-1 network, a first CBS-2 network, a second CBS-1 network, a second CBS-2 network, a first ELAN network, a first sub-module, a second sub-module and a third sub-module which are sequentially connected; the first sub-module, the second sub-module and the third sub-module have the same structure and are composed of an MP-1 network and an ELAN network which are sequentially connected; the header network comprises a first output network, a second output network and a third output network; the first output network, the second output network and the third output network have the same structure and are composed of REP networks and CBM networks which are sequentially connected;
the gap detection method based on the improved YOLOv7 comprises the following steps:
s1, performing structural optimization on a YOLOv7 model to obtain a YOLOv7 optimization model;
s2, constructing a road gap image marked with a gap recognition result as a marking sample, and training a YOLOv7 optimization model by combining the marking sample to obtain a trained YOLOv7 optimization model as a road gap detection model;
s3, inputting the slit image to be identified into a road slit detection model, and outputting an identification result corresponding to the slit image to be identified by the road slit detection model;
In S1, the YOLOv7 optimization model includes optimization of the first sub-module and the second sub-module of the backbone network; the optimized first and second sub-modules have the same structure, each consisting of an MP-1 network, an Aatten module and an ELAN network connected in sequence;
the Aatten module comprises an input layer, a first CBS network, a first MP network, a second MP network, a third MP network, a fifth Concat splicing network, a Sigmoid activation layer, a Split segmentation network, a second CBS network and an output layer which are sequentially connected;
the input of the Aatten module is the input of the input layer of the Aatten module, and the output of the Aatten module is the output of the output layer of the Aatten module;
let 2 of corresponding size P0Q 0 n0 The layer characteristic data is denoted as data P0 x Q0 x 2 n0
The input layer of the Aatten module is used for inputting data P1Q 1 2 m Conversion to data 32 x 2 8 P1 x Q1 is the image size in the data, m is any positive integer; first CThe BS network is used for outputting data 32×32×2 from the input layer 8 The data is generated by 32 x 2 after the convolution output of the dimension reduction 7 Output data of the first CBS network is 32 x 2 7 The multi-dimensional linear transformation is sequentially carried out through the first MP network, the second MP network and the third MP network, and the fifth Concat splicing network is used for carrying out dimensional splicing on the output data of the first CBS network, the output data of the first MP network, the output data of the second MP network and the output data of the third MP network to obtain data 32X 2 9 The method comprises the steps of carrying out a first treatment on the surface of the The Sigmoid activation layer is used for outputting data 32 x 2 to the fifth Concat splicing network 9 Activating to obtain the weight of each dimension; the input of the Split network is output data 32 x 2 of the fifth Concat splicing network 9 And the weighted convolution data of the Sigmoid activation layer output is 32 x 2 9 The Split dividing network performs dimension division on the input convolution data to obtain 4 pieces of divided data 32 x 2 7 The input of the second CBS network is 4 pieces of Split data 32 x 2 output by the Split network 7 Data 32×32×2 after element level addition 7 The second CBS network convolves the input data and outputs the data 32 x 2 8 The method comprises the steps of carrying out a first treatment on the surface of the The output layer outputs 32 x 2 of the second CBS network 8 And data output from input layer 32 x 2 8 Element level summation is carried out and then the element level summation is converted into data P1Q 1 m And outputting the result.
Preferably, the YOLOv7 optimization model in S1 further includes optimization of a third sub-module of the backbone network, and the optimized third sub-module adds a Myswin module at the output end;
the Myswin module comprises an input layer, a third CBS network, a Swin-T network, a sixth Concat splicing network, a fourth CBS network and an output layer; the input layer, the Swin-T network, the sixth Concat splicing network, the fourth CBS network and the output layer are sequentially connected, the input of the third CBS network is connected with the output of the input layer, and the output of the third CBS network is connected with the input of the sixth Concat splicing network;
The input layer converts input data P2×Q2×2^m into data 32×32×2^8, where P2×Q2 is the image size in the data; the Swin-T network converts the output data 32×32×2^8 of the input layer into data 32×32×2^7, and the third CBS network converts the output data 32×32×2^8 of the input layer into data 32×32×2^7; the sixth Concat splicing network concatenates, along the channel dimension, the output data 32×32×2^7 of the Swin-T network and the output data 32×32×2^7 of the third CBS network to obtain data 32×32×2^8; the fourth CBS network convolves the output data 32×32×2^8 of the sixth Concat splicing network and outputs data 32×32×2^7; the output layer of the Myswin module converts the output data 32×32×2^7 of the fourth CBS network into data P2×Q2×2^m with the same data structure as the input data of the Myswin module.
Preferably, the YOLOv7 optimization model in S1 further includes optimization of a head network, and the optimized head network adds a FEEM module at an input end of each output network;
the FEEM module comprises an input layer, a first branch, a second branch, a third branch, a BN standardization layer, a fourth Silu activation layer and an output layer; the input layer inputs data P3Q 32 m Conversion to data 32 x 2 7 The method comprises the steps of carrying out a first treatment on the surface of the P3Q 3 is the image size in the data;
The first branch, the second branch and the third branch all comprise a CBS network, a ConV expansion reel layer and a Silu activation layer which are sequentially connected, wherein the CBS network outputs 32 x 2 data to the input layer 7 Convolving and outputting data 32 x 2 7 The method comprises the steps of carrying out a first treatment on the surface of the The ConV expansion rolling machine layer outputs 32 x 2 data to the CBS network in the same branch 7 Puffing and outputting data 32 x 2 7 The method comprises the steps of carrying out a first treatment on the surface of the The input data of the Silu activation layer is the output data of the ConV expansion reel layer in the same branch 32 x 2 7 And output data of input layer 32 x 2 7 Data 32×32×2 after element level addition 7 The Silu active layer outputs data 32 x 2 7 As the output of the branch; the expansion coefficient of the ConV expansion rolling machine layer in the first branch, the expansion coefficient of the ConV expansion rolling machine layer in the second branch and the expansion coefficient of the ConV expansion rolling machine layer in the third branch are different;
the input of the BN standardization layer is output data 32 x 2 of the first branch 7 Second oneOutput data of branch 32 x 2 7 And output data of the third branch 32 x 2 7 Data 32×32×2 after element level addition 7 The BN standardization layer carries out standardization treatment on input data and then outputs data 32 x 2 7 The method comprises the steps of carrying out a first treatment on the surface of the Data output by the fourth Silu activation layer to the BN normalization layer is 32 x 2 7 Output data 32 x 2 after activation 7 The method comprises the steps of carrying out a first treatment on the surface of the The output layer outputs the data 32 x 2 of the fourth Silu activation layer 7 Data P3Q 3 x 2 converted into the same data structure as the input data of the FEEM module m
Preferably, the labeling sample is a road gap image labeled with a road gap class, the recognition result output by the road gap detection model is the road gap class, and the road gap class comprises a transverse gap, a longitudinal gap and a fatigue gap.
Preferably, S2 comprises the following substeps:
s21, obtaining road gap categories, and obtaining a plurality of road gap images under each gap category as labeling samples, wherein the road gap images under the same gap category come from a plurality of different road surfaces; performing image processing on the road gap image in at least part of the marked samples to obtain a processed road gap image with a known gap class as a derivative sample; collecting a labeling sample and a derivative sample as a learning sample set; the image processing includes: one or more of image size conversion, image rotation, changing exposure and noise processing;
s22, enabling the YOLOv7 optimization model to perform machine learning on the learning sample set so as to train parameters of the YOLOv7 optimization model;
S23, when training of the YOLOv7 optimization model reaches a set iteration termination condition, fixing the YOLOv7 optimization model as a road gap detection model.
Preferably, the iteration termination condition in S23 is: the iteration number of the YOLOv7 optimization model reaches a set value; alternatively, the loss function of the YOLOv7 optimization model reaches the set point.
The invention also provides a gap detection system based on the improved YOLOv7, which is used for executing the gap detection method based on the improved YOLOv 7; the system comprises: the system comprises an image acquisition module, a gap identification module and an early warning module; the gap recognition module is respectively connected with the image acquisition module and the early warning module;
the image acquisition module is used for acquiring a slit image to be identified, wherein the slit image comprises slits;
the gap recognition module is used for acquiring a gap image to be recognized from the image acquisition module, inputting the gap image to be recognized into the road gap detection model, and outputting a recognition result of the gap image to be recognized by the road gap detection model;
the early warning module acquires the recognition result output by the road gap detection model, and executes the set early warning action according to the recognition result.
Preferably, the early warning action includes: recording the identification result and sending the identification result to at least one of the set contacts.
The invention also provides a gap detection system based on the improved YOLOv7, which comprises a memory and a processor, wherein the memory is stored with a computer program, the processor is connected with the memory, and the processor is used for executing the computer program to realize the gap detection method based on the improved YOLOv 7.
The invention also proposes a storage medium storing a computer program for implementing the improved YOLOv 7-based gap detection method when executed.
The invention has the advantages that:
(1) In the gap detection method based on the improved YOLOv7 model, a self-developed module is embedded into the existing YOLOv7 model, and a road gap detection model is then obtained through machine-learning training. The invention uses this road gap detection model to detect road gaps, greatly improving gap recognition accuracy and efficiency.
(2) With the added custom Aatten module, data is split along the channel dimension during processing and the four equal-sized slices are then summed element-wise, realizing multi-level processing of the feature data and further enlarging the receptive field of the image, so that the feature information of the original image is better extracted.
(3) With the added custom Myswin module, self-attention computation is performed by a Swin-T network (Swin Transformer Block), and the self-attention result is then concatenated with the convolution data from before the computation. The Myswin module can capture relations between inputs at different levels, helps the neural network extract key information more accurately, and raises the abstraction level of the input, thereby improving the performance and accuracy of the YOLOv7 optimization model.
(4) With the added custom FEEM module, data is processed by multi-branch dilated convolution; enlarging the feature area in the image raises the sensitivity of the model and increases the receptive field, so that the YOLOv7 optimization model identifies targets in the image better and is more adaptable when fitting data and extracting complex patterns.
(5) According to the invention, through the image processing of the road gap image in the labeling sample, the number of the learning samples is enriched, so that the trained model can adapt to real-time detection under different environments, the model is helped to learn the characteristics of road gaps with different sizes fully, the accuracy and stability of the road gap detection model are improved further, the predictable range of the road gap detection model is enlarged, and the accuracy of road gap detection using the road gap detection model is improved.
(6) According to the gap detection system based on the improved YOLOv7 model, the gap detection result of the road can be obtained in real time by adopting the gap detection method based on the improved YOLOv7 model, and then the road is pre-warned in real time by the pre-warning module, so that the road safety monitoring is greatly improved, and the safety risk caused by road damage is reduced.
(7) The gap detection system and the storage medium based on the improved YOLOv7 model provide a carrier for the gap detection method based on the improved YOLOv7 model, and are convenient to popularize and apply.
Drawings
FIG. 1 is a network topology of a conventional YOLOv7 model;
FIG. 2 is a network topology of a first YOLOv7 optimization model;
FIG. 3 is a network topology of a second YOLOv7 optimization model;
FIG. 4 is a network topology of a third YOLOv7 optimization model;
FIG. 5 (a) shows the precision of YOLOv7 and YOLOv7-Aatten on different gap categories;
FIG. 5 (b) shows the recall of YOLOv7 and YOLOv7-Aatten on different gap categories;
FIG. 5 (c) shows the mean average precision (mAP) of YOLOv7 and YOLOv7-Aatten on different gap categories;
FIG. 5 (d) compares YOLOv7 and YOLOv7-Aatten on different evaluation indicators;
FIG. 6 (a) shows the precision of YOLOv7 and YOLOv7-AM on different gap categories;
FIG. 6 (b) shows the recall of YOLOv7 and YOLOv7-AM on different gap categories;
FIG. 6 (c) compares YOLOv7 and YOLOv7-AM on different evaluation indicators;
FIG. 7 (a) shows the precision of YOLOv7 and YOLOv7-AMF on different gap categories;
FIG. 7 (b) shows the recall of YOLOv7 and YOLOv7-AMF on different gap categories;
FIG. 7 (c) compares YOLOv7 and YOLOv7-AMF on different evaluation indicators.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
For convenience of description, in this embodiment feature data of size P0×Q0 with 2^n0 channels is denoted data P0×Q0×2^n0; that is, P0×Q0×2^n0 represents an image of size P0×Q0 with features in 2^n0 channels. n0 is any positive integer.
Traditional YOLOv7 model
As shown in fig. 1, the conventional YOLOv7 model includes: a backbone network, a neck network and a head network.
The backbone network comprises a first CBS-1 network, a first CBS-2 network, a second CBS-1 network, a second CBS-2 network, a first ELAN network, a first sub-module, a second sub-module and a third sub-module which are sequentially connected.
The first sub-module, the second sub-module and the third sub-module have the same structure and are composed of an MP-1 network and an ELAN network which are sequentially connected; the input of the MP-1 network in the first sub-module is connected with the output of the first ELAN network, the output of the ELAN network in the first sub-module is connected with the input of the MP-1 network in the second sub-module, and the output of the ELAN network in the second sub-module is connected with the input of the MP-1 network in the third sub-module.
The neck network comprises an SPPCSPC network, a first CBS-3 network, a first UPsample network, a first Concat splice network, a first ELAN-W network, a second CBS-3 network, a second UPsample network, a second Concat splice network, a second ELAN-W network, a first MP-2 network, a third Concat splice network, a third ELAN-W network, a second MP-2 network, a fourth Concat splice network and a fourth ELAN-W network which are sequentially connected; the neck network further includes a third CBS-3 network and a fourth CBS-3 network.
The input end of the third CBS-3 network is connected with the output end of the second sub-module, and the output end of the third CBS-3 network is connected with the input end of the first Concat splicing network; the first Concat splicing network is used for splicing the output of the second sub-module and the output of the first UPsample network.
The input end of the fourth CBS-3 network is connected with the output end of the first submodule, and the output end of the fourth CBS-3 network is connected with the input end of the second Concat splicing network; the second Concat splicing network is used for splicing the output of the first sub-module and the output of the second UPsample network.
The header network comprises a first output network, a second output network and a third output network; the first output network, the second output network and the third output network have the same structure and are composed of REP networks and CBM networks which are sequentially connected, wherein the input of the REP networks is used as the input of the output networks, the output of the REP networks is connected with the input of the CBM networks, and the output of the CBM networks is used as the output of the output networks.
An input of the first output network is connected to an output of the second ELAN-W network, an input of the second output network is connected to an output of the third ELAN-W network, and an input of the third output network is connected to an output of the fourth ELAN-W network.
YOLOv7 receives an input image of size P×Q. The first CBS-1 network extracts image features to obtain data P×Q×2^n, where n is any positive integer; the first CBS-2 network converts data P×Q×2^n into data (P/2)×(Q/2)×2^(n+1); the second CBS-1 network convolves data (P/2)×(Q/2)×2^(n+1), and the convolved result is converted by the second CBS-2 network into data (P/4)×(Q/4)×2^(n+2); data (P/4)×(Q/4)×2^(n+2) is converted into data (P/4)×(Q/4)×2^(n+3) by the first ELAN network.
Data (P/4)×(Q/4)×2^(n+3) is converted by the MP-1 network in the first sub-module into data (P/8)×(Q/8)×2^(n+3), and data (P/8)×(Q/8)×2^(n+3) is converted by the ELAN network in the first sub-module into data (P/8)×(Q/8)×2^(n+4).
Data (P/8)×(Q/8)×2^(n+4) is converted by the MP-1 network in the second sub-module into data (P/16)×(Q/16)×2^(n+4), and data (P/16)×(Q/16)×2^(n+4) is converted by the ELAN network in the second sub-module into data (P/16)×(Q/16)×2^(n+5).
Data (P/16)×(Q/16)×2^(n+5) is converted by the MP-1 network in the third sub-module into data (P/32)×(Q/32)×2^(n+5); after processing by the ELAN network in the third sub-module, the ELAN network outputs data (P/32)×(Q/32)×2^(n+5) as the output of the third sub-module.
The data (P/32)×(Q/32)×2^(n+5) output by the third sub-module undergoes feature fusion in the neck network.
The second ELAN-W network of the neck network outputs data (P/8)×(Q/8)×2^(n+2); this output serves as the input of the first output network, and after sequential processing by the REP network and the CBM network of the first output network, the first output network outputs data (P/8)×(Q/8)×255.
The third ELAN-W network of the neck network outputs data (P/16)×(Q/16)×2^(n+3); this output serves as the input of the second output network, and after sequential processing by the REP network and the CBM network of the second output network, the second output network outputs data (P/16)×(Q/16)×255.
The fourth ELAN-W network of the neck network outputs data (P/32)×(Q/32)×2^(n+4); this output serves as the input of the third output network, and after sequential processing by the REP network and the CBM network of the third output network, the third output network outputs data (P/32)×(Q/32)×255.
The YOLOv7 model performs detection from the data (P/8)×(Q/8)×255, (P/16)×(Q/16)×255 and (P/32)×(Q/32)×255 output by the three output networks, and outputs the detection result.
The YOLOv7 model is an existing model structure in the field; its backbone network, neck network and head network are all well defined in the YOLOv7 model and belong to common technical knowledge in the field. The backbone network and the head network are briefly described here because the subsequent model improvements target them. The CBS network, MP-1 network, MP-2 network, ELAN network, neck network, REP network and CBM network are all well defined in the YOLOv7 model. In FIG. 1, CBS networks with different convolution kernel sizes at different positions in the YOLOv7 model are distinguished as CBS-1, CBS-2 and CBS-3 networks; a person skilled in the art can directly construct a YOLOv7 model from the positions of these networks and the input/output data structures, which is not repeated here.
The first YOLOv7 optimization model, denoted YOLOv7-Aatten
As shown in FIG. 2, the first optimization model of Yolov7 is based on Yolov7, and an Aatten module is added into a first sub-module and a second sub-module of a backbone network.
The Aatten module comprises an input layer, a first CBS network, a first MP network, a second MP network, a third MP network, a fifth Concat splicing network, a Sigmoid activation layer, a Split splitting network, a second CBS network and an output layer which are sequentially connected.
In the first YOLOv7 optimization model, the improved first submodule and the improved second submodule have the same structure. In the first sub-module and the second sub-module, the MP-1 network, the Aatten module and the ELAN network are sequentially connected, the input of the Aatten module is the output of the MP-1 network, and the output of the Aatten module is the input of the ELAN network.
Specifically, the input of the Aatten module is the input of the input layer of the Aatten module, and the output of the Aatten module is the output of the output layer of the Aatten module.
The input layer of the Aatten module converts input data P1×Q1×2^m into data 32×32×2^8, where m is any positive integer. The first CBS network applies a dimension-reducing convolution to the output data 32×32×2^8 of the input layer and outputs data 32×32×2^7. The output data 32×32×2^7 of the first CBS network then passes in turn through the first MP network, the second MP network and the third MP network for multi-level transformation. The fifth Concat splicing network concatenates, along the channel dimension, the output data of the first CBS network, the first MP network, the second MP network and the third MP network to obtain data 32×32×2^9. The Sigmoid activation layer activates the output data 32×32×2^9 of the fifth Concat splicing network to obtain a weight for each channel. The input of the Split segmentation network is the convolution data 32×32×2^9 obtained by weighting the output data 32×32×2^9 of the fifth Concat splicing network with the weights output by the Sigmoid activation layer; the Split segmentation network divides this convolution data along the channel dimension into 4 slices of data 32×32×2^7. The input of the second CBS network is the data 32×32×2^7 obtained by element-wise addition of the 4 slices 32×32×2^7 output by the Split segmentation network; the second CBS network convolves its input and outputs data 32×32×2^8. The output layer performs element-wise summation of the output data 32×32×2^8 of the second CBS network and the output data 32×32×2^8 of the input layer, converts the result into data P1×Q1×2^m, and outputs it.
In this embodiment, the input of the Aatten module in the first sub-module is the data (P/8)×(Q/8)×2^(n+3) output by the MP-1 network in the first sub-module, and the output of the Aatten module in the second sub-module is data (P/16)×(Q/16)×2^(n+4). In this embodiment the structure of the input data of the Aatten module is the same as the structure of its output data, because the improvement is made without changing the original YOLOv7 network.
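The data flow above can be summarized in a short PyTorch sketch. This is a minimal illustration under stated assumptions rather than the patented implementation: the pooling kernel size and stride, the CBS kernel sizes, and the omission of the input/output structure-conversion layers are all assumptions; only the channel arithmetic (2^8 → 2^7 → concat to 2^9 → four 2^7 slices → 2^8 with a residual add) follows the description.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU, the structure described for the CBS network."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Aatten(nn.Module):
    """Sketch of the Aatten module: CBS -> 3 chained MP layers, concat of the
    four stages, sigmoid channel weighting, split into 4 slices, element-wise
    sum, CBS, then a residual add with the module input."""
    def __init__(self, channels=256):                # 256 = 2^8 working channels
        super().__init__()
        c = channels // 2                            # 128 = 2^7
        self.cbs1 = CBS(channels, c)                 # dimension-reducing conv
        # assumed: stride-1 max pooling so the 32x32 spatial size is kept
        self.mp1 = nn.MaxPool2d(3, stride=1, padding=1)
        self.mp2 = nn.MaxPool2d(3, stride=1, padding=1)
        self.mp3 = nn.MaxPool2d(3, stride=1, padding=1)
        self.cbs2 = CBS(c, channels)
    def forward(self, x):                            # x: (B, 256, 32, 32)
        y0 = self.cbs1(x)                            # (B, 128, 32, 32)
        y1 = self.mp1(y0)                            # sequential MP stages
        y2 = self.mp2(y1)
        y3 = self.mp3(y2)
        cat = torch.cat([y0, y1, y2, y3], dim=1)     # (B, 512, 32, 32) = 2^9
        w = torch.sigmoid(cat)                       # per-channel weights
        s1, s2, s3, s4 = torch.chunk(cat * w, 4, dim=1)  # 4 x (B, 128, 32, 32)
        fused = s1 + s2 + s3 + s4                    # element-wise sum of slices
        return self.cbs2(fused) + x                  # residual add, (B, 256, 32, 32)
```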
The second YOLOv7 optimization model, denoted YOLOv7-AM
As shown in fig. 3, the second YOLOv7 optimization model is further added with a Myswin module at the output end of the third sub-module in the backbone network based on the first YOLOv7 optimization model.
The Myswin module comprises an input layer, a third CBS network, a Swin-T network, a sixth Concat splice network, a fourth CBS network and an output layer. The input layer, the Swin-T network, the sixth Concat splicing network, the fourth CBS network and the output layer are sequentially connected, the input of the third CBS network is connected with the output of the input layer, and the output of the third CBS network is connected with the input of the sixth Concat splicing network.
The input layer converts input data P2×Q2×2^m into data 32×32×2^8. The Swin-T network converts the output data 32×32×2^8 of the input layer into data 32×32×2^7, and the third CBS network converts the output data 32×32×2^8 of the input layer into data 32×32×2^7. The sixth Concat splicing network concatenates, along the channel dimension, the output data 32×32×2^7 of the Swin-T network and the output data 32×32×2^7 of the third CBS network to obtain data 32×32×2^8. The fourth CBS network convolves the output data 32×32×2^8 of the sixth Concat splicing network and outputs data 32×32×2^7. The output layer of the Myswin module converts the output data 32×32×2^7 of the fourth CBS network into data P2×Q2×2^m with the same data structure as the input data of the Myswin module.
In this embodiment, a Myswin module is added between the third sub-module and the SPPCSPC network of the neck network; the input of the Myswin module is the output data (P/32)×(Q/32)×2^(n+5) of the third sub-module. In this embodiment the improvement is made without changing the original YOLOv7 network, so the structure of the input data of the Myswin module is the same as the structure of its output data.
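A minimal PyTorch sketch of this data flow follows, reusing the CBS class from the Aatten sketch above. The Swin-T block is stood in by a placeholder (a real Swin Transformer block with windowed self-attention would be substituted), and the input/output structure-conversion layers are again omitted; these are assumptions of the sketch, not the patent's implementation.

```python
import torch
import torch.nn as nn

class SwinTStub(nn.Module):
    """Placeholder for a Swin Transformer block (Swin-T). A real implementation
    would perform windowed self-attention; here a 1x1 conv stands in so the
    sketch runs end to end, mapping 2^8 -> 2^7 channels as described."""
    def __init__(self, c_in=256, c_out=128):
        super().__init__()
        self.proj = nn.Conv2d(c_in, c_out, 1)
    def forward(self, x):
        return self.proj(x)

class Myswin(nn.Module):
    """Sketch of the Myswin module: parallel Swin-T and CBS paths over the same
    input, channel-wise concat, then a CBS convolution."""
    def __init__(self, channels=256):
        super().__init__()
        c = channels // 2
        self.swin = SwinTStub(channels, c)   # 256 -> 128
        self.cbs3 = CBS(channels, c)         # 256 -> 128 (CBS from the Aatten sketch)
        self.cbs4 = CBS(channels, c)         # after concat: 256 -> 128
    def forward(self, x):                    # x: (B, 256, 32, 32)
        a = self.swin(x)                     # (B, 128, 32, 32)
        b = self.cbs3(x)                     # (B, 128, 32, 32)
        cat = torch.cat([a, b], dim=1)       # (B, 256, 32, 32) = 2^8
        return self.cbs4(cat)                # (B, 128, 32, 32) = 2^7
```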
The third YOLOv7 optimization model, denoted YOLOv7-AMF
As shown in fig. 4, the third YOLOv7 optimization model is further based on the second YOLOv7 optimization model, and an FEEM module is added to the input end of each output network in the header network.
The FEEM module includes an input layer, a first branch, a second branch, a third branch, a BN normalization layer, a fourth SiLU activation layer, and an output layer.
The first branch comprises a fifth CBS network, a ConV_1 dilated convolution layer and a first SiLU activation layer connected in sequence. The second branch comprises a sixth CBS network, a ConV_2 dilated convolution layer and a second SiLU activation layer connected in sequence. The third branch comprises a seventh CBS network, a ConV_3 dilated convolution layer and a third SiLU activation layer connected in sequence.
The input layer is connected to the input ends of the first, second and third branches respectively; the input of the BN normalization layer is the element-wise sum of the outputs of the first, second and third branches. The BN normalization layer, the fourth SiLU activation layer and the output layer are connected in sequence.
The input layer converts input data P3×Q3×2^m into data 32×32×2^7.
The input of the first branch is the input of the fifth CBS network, and the output of the first branch is the output of the first SiLU activation layer. The fifth CBS network convolves the output data 32×32×2^7 of the input layer and outputs data 32×32×2^7; the ConV_1 dilated convolution layer applies dilated convolution to the output data 32×32×2^7 of the fifth CBS network and outputs data 32×32×2^7; the input data of the first SiLU activation layer is the data 32×32×2^7 obtained by element-wise addition of the output data 32×32×2^7 of the ConV_1 dilated convolution layer and the output data 32×32×2^7 of the input layer, and the first SiLU activation layer outputs data 32×32×2^7.
The input of the second branch is the input of the sixth CBS network, and the output of the second branch is the output of the second SiLU activation layer. The sixth CBS network convolves the output data 32×32×2^7 of the input layer and outputs data 32×32×2^7; the ConV_2 dilated convolution layer applies dilated convolution to the output data 32×32×2^7 of the sixth CBS network and outputs data 32×32×2^7; the input data of the second SiLU activation layer is the data 32×32×2^7 obtained by element-wise addition of the output data 32×32×2^7 of the ConV_2 dilated convolution layer and the output data 32×32×2^7 of the input layer, and the second SiLU activation layer outputs data 32×32×2^7.
The input of the third branch is the input of the seventh CBS network, and the output of the third branch is the output of the third SiLU activation layer. The seventh CBS network convolves the output data 32×32×2^7 of the input layer and outputs data 32×32×2^7; the ConV_3 dilated convolution layer applies dilated convolution to the output data 32×32×2^7 of the seventh CBS network and outputs data 32×32×2^7; the input data of the third SiLU activation layer is the data 32×32×2^7 obtained by element-wise addition of the output data 32×32×2^7 of the ConV_3 dilated convolution layer and the output data 32×32×2^7 of the input layer, and the third SiLU activation layer outputs data 32×32×2^7.
The input of the BN normalization layer is the data 32×32×2^7 obtained by element-wise addition of the output data 32×32×2^7 of the first branch, the output data 32×32×2^7 of the second branch and the output data 32×32×2^7 of the third branch; the BN normalization layer normalizes its input and outputs data 32×32×2^7. The fourth SiLU activation layer activates the output data 32×32×2^7 of the BN normalization layer and outputs data 32×32×2^7. The output layer converts the output data 32×32×2^7 of the fourth SiLU activation layer into data P3×Q3×2^m with the same data structure as the input data of the FEEM module.
In the present embodiment, the ConV_1, ConV_2 and ConV_3 dilated convolution layers all have the conventional ConV dilated-convolution structure and differ only in dilation coefficient: the dilation coefficient of ConV_1 is 1, that of ConV_2 is 2, and that of ConV_3 is 3.
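Under these dimension conventions, the FEEM module can be sketched in PyTorch as below. This is a minimal sketch, not the patented implementation: the 3×3 kernel sizes and paddings are assumptions chosen so that the 32×32 spatial size is preserved, and the CBS class from the Aatten sketch is reused.

```python
import torch
import torch.nn as nn

class FEEMBranch(nn.Module):
    """One FEEM branch: CBS -> dilated conv -> SiLU, with the branch input
    added element-wise before the activation."""
    def __init__(self, channels=128, dilation=1):
        super().__init__()
        self.cbs = CBS(channels, channels, k=3)      # CBS from the Aatten sketch
        # 3x3 dilated conv; padding = dilation keeps the 32x32 spatial size
        self.dconv = nn.Conv2d(channels, channels, 3,
                               padding=dilation, dilation=dilation)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.dconv(self.cbs(x)) + x)

class FEEM(nn.Module):
    """Sketch of the FEEM module: three parallel branches with dilation
    coefficients 1, 2 and 3, element-wise summed, then BN and SiLU."""
    def __init__(self, channels=128):                # 128 = 2^7 working channels
        super().__init__()
        self.b1 = FEEMBranch(channels, dilation=1)
        self.b2 = FEEMBranch(channels, dilation=2)
        self.b3 = FEEMBranch(channels, dilation=3)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()
    def forward(self, x):                            # x: (B, 128, 32, 32)
        y = self.b1(x) + self.b2(x) + self.b3(x)     # element-wise branch sum
        return self.act(self.bn(y))                  # (B, 128, 32, 32)
```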
In this embodiment, the FEEM module added to the input end of the first output network is referred to as a first FEEM module, the FEEM module added to the input end of the second output network is referred to as a second FEEM module, and the FEEM module added to the input end of the third output network is referred to as a third FEEM module.
The input of the first FEEM module is the output data 80×80×2^7 of the second ELAN-W network in the neck network, and the output of the first FEEM module is the input of the REP network in the first output network.
The input of the second FEEM module is the output data 40×40×2^8 of the third ELAN-W network in the neck network, and the output of the second FEEM module is the input of the REP network in the second output network.
The input of the third FEEM module is the output data 20×20×2^9 of the fourth ELAN-W network in the neck network, and the output of the third FEEM module is the input of the REP network in the third output network.
In this embodiment, the structure of the input data of the FEEM module is the same as the structure of the output data of the FEEM module, since the improvement is made on the basis of not changing the original network of YOLOv 7.
It is noted that the input and output layers of the Aatten module, the Myswin module and the FEEM module are all data-structure processing networks used to convert the input data structure so that the output data structure conforms to the settings above.
Owing to the characteristics of the modules, in this embodiment the input and output layers of the Aatten module adopt nonlinear conversion networks, the input and output layers of the Myswin module adopt nonlinear conversion networks, and the input and output layers of the FEEM module adopt linear conversion networks.
The CBS network is a convolutional network composed of a Conv convolution layer, a BN normalization layer and a SiLU activation layer connected in sequence; a CBS network determines the data-structure change of its output relative to its input according to the convolution kernel size. The CBS network is described in detail in the conventional YOLOv7 model. In this embodiment only the input and output data structures of the CBS networks are specified; a person skilled in the art can select CBS networks with different convolution kernel sizes according to the data structure requirements, and such adaptive selections made in combination with the definitions of the present invention fall within the protection scope of the invention.
MP network, sigmoid activating layer, split dividing network, swin-T network, silu activating layer are all existing neural network, and will not be described here.
The embodiment also provides a gap detection system based on the improved YOLOv7, which comprises an image acquisition module, a gap identification module and an early warning module; the gap recognition module is respectively connected with the image acquisition module and the early warning module.
When the system is applied to road detection, an image acquisition module acquires a road surface image of a road and transmits the road surface image containing gaps to a gap recognition module as a gap image to be recognized;
the gap recognition module is used for inputting the gap image to be recognized into the road gap detection model, and outputting a recognition result of the gap image to be recognized, namely judging whether the road gap is a longitudinal gap, a transverse gap or a fatigue gap.
The early warning module acquires the recognition result output by the road gap detection model and executes the set early warning action according to the recognition result. Specifically, the early warning module can record the recognition result so as to track and monitor road surface damage; or the early warning module sends the recognition result to the set contacts so that road maintenance personnel can learn of road damage in real time and handle it promptly.
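A minimal sketch of such an early warning action follows, assuming a CSV log file and a local SMTP relay; the file name, addresses and mail transport are illustrative assumptions, not part of the patent.

```python
import datetime
import smtplib
from email.message import EmailMessage

CONTACTS = ["maintenance@example.com"]  # assumed contact list

def early_warning(image_path, result, log_path="gap_log.csv"):
    """Record the recognition result and notify the configured contacts."""
    # 1) record the result for long-term tracking of road surface damage
    with open(log_path, "a") as log:
        log.write(f"{datetime.datetime.now().isoformat()},{image_path},{result}\n")
    # 2) send the result to the configured contacts
    msg = EmailMessage()
    msg["Subject"] = f"Road gap detected: {result}"
    msg["From"] = "monitor@example.com"          # assumed sender address
    msg["To"] = ", ".join(CONTACTS)
    msg.set_content(f"Image {image_path} was classified as: {result}")
    with smtplib.SMTP("localhost") as server:    # assumed local mail relay
        server.send_message(msg)
```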
The conventional YOLOv7 model and the three modified YOLOv7 models provided above were validated as follows in connection with specific examples.
For convenience of description, the conventional YOLOv7 model is denoted below as the YOLOv7 model, and the three YOLOv7 optimization models are denoted YOLOv7-Aatten, YOLOv7-AM and YOLOv7-AMF.
The road gap categories defined in this embodiment include: transverse gaps, longitudinal gaps and fatigue gaps. In this embodiment, the models are evaluated using precision, recall and mean average precision (mAP).
In this embodiment, the dataset is first constructed.
When constructing the dataset, labeled samples under different gap categories are first obtained; a labeled sample is a road gap image with a known gap category. Then size scaling, exposure adjustment, noise addition and the like are applied to the road gap images in the labeled samples, so that more derived samples are obtained from the labeled samples through image processing; the gap category of a derived sample is unchanged. The dataset is the collection of labeled samples and derived samples. In this embodiment, the size of the road gap images in the labeled samples is 1000×1000 or 3000×3000.
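The derivation of extra samples from one labeled image can be sketched as follows; the parameter ranges are assumptions, since the embodiment names only the operation types (size scaling, image rotation, exposure adjustment, noise processing, cf. S21).

```python
import random
import numpy as np
import cv2

def derive_samples(image, n=4):
    """Generate derived samples from one labeled road-gap image; the gap
    category label is unchanged by these transforms."""
    derived = []
    for _ in range(n):
        img = image.copy()
        # size scaling (assumed range 0.5x - 1.5x)
        s = random.uniform(0.5, 1.5)
        img = cv2.resize(img, None, fx=s, fy=s)
        # rotation by a random multiple of 90 degrees
        img = np.rot90(img, k=random.randint(0, 3))
        # exposure adjustment (assumed gain 0.6 - 1.4)
        img = np.clip(img * random.uniform(0.6, 1.4), 0, 255).astype(np.uint8)
        # additive Gaussian noise (assumed sigma 5)
        noise = np.random.normal(0, 5, img.shape)
        img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
        derived.append(img)
    return derived
```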
In this embodiment, models YOLOv7, YOLOv7-Aatten, YOLOv7-AM and YOLOv7-AMF are each trained and tested on the dataset to obtain the precision, recall and mean average precision (mAP) of each model; the specific steps are shown in SA1-SA3.
SA1, dividing a data set into a training set, a verification set and a test set;
SA2, training the model by combining the training set and the verification set until the model converges;
SA3, testing the converged model from SA2 on data from the test set to obtain the precision, recall and mean average precision (mAP) of the model.
SA2 comprises the following sub-steps:
SA21, extracting training samples from the training set, and carrying out parameter training on the model by combining the training samples;
SA22, extracting verification samples from the verification set, and calculating model loss by combining the verification samples; gradient updating is carried out on model parameters according to the loss;
SA23, looping steps SA21 and SA22 until the model converges.
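These sub-steps correspond to a standard supervised training loop, sketched below in PyTorch. The optimizer, learning rate and stopping values are generic assumptions, `criterion` stands for YOLOv7's composite detection loss (not restated here), and the sketch updates parameters on training batches while monitoring the validation loss for convergence.

```python
import torch

def train(model, train_loader, val_loader, criterion,
          max_epochs=300, target_loss=None):
    """SA21-SA23: alternate parameter training on training samples and
    loss evaluation on validation samples until convergence."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937)
    for epoch in range(max_epochs):                  # iteration cap (SA23)
        model.train()
        for images, targets in train_loader:         # SA21: parameter training
            opt.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            opt.step()
        model.eval()
        val_loss = 0.0                               # SA22: validation loss
        with torch.no_grad():
            for images, targets in val_loader:
                val_loss += criterion(model(images), targets).item()
        val_loss /= len(val_loader)
        # termination condition: loss reaches the set point (SA23)
        if target_loss is not None and val_loss <= target_loss:
            break
    return model
```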
In this embodiment, let P=Q=640; P and Q mainly define the data structure of the network inside the model and do not constrain the input image size of the model. As is known to those skilled in the art, the model input layer can process images so that they conform to the defined dimensions. For example, in this embodiment the dataset includes 1000×1000 and 3000×3000 road gap images, each of which is cut into 640×640 images before entering the model.
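For illustration, a simple sliding-window crop that tiles a larger road image into 640×640 model inputs; the patent does not specify the cropping strategy, so the tiling and zero-padding here are assumptions.

```python
import numpy as np

def tile_image(image, size=640):
    """Cut a large road image (e.g. 1000x1000 or 3000x3000) into
    size x size tiles for the model, zero-padding the border tiles."""
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h, size):
        for x in range(0, w, size):
            tile = image[y:y + size, x:x + size]
            if tile.shape[0] < size or tile.shape[1] < size:
                pad = np.zeros((size, size) + image.shape[2:], image.dtype)
                pad[:tile.shape[0], :tile.shape[1]] = tile
                tile = pad
            tiles.append(tile)
    return tiles
```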
In the embodiment, the same data set is adopted to train and test different models, so that performance comparison of the models in the same data space is ensured, and accuracy of mutual comparison of the models is ensured.
As shown in FIG. 5 (a), model YOLOv7-Aatten achieves better precision than model YOLOv7 in both the longitudinal gap and fatigue gap categories; as shown in FIG. 5 (b) and FIG. 5 (c), the recall and mean average precision of model YOLOv7-Aatten are superior to those of model YOLOv7 in all of the longitudinal, transverse and fatigue gap categories. As shown in FIG. 5 (d), model YOLOv7-Aatten is superior to model YOLOv7 in overall performance.
It can be seen from FIG. 5 (a), FIG. 5 (b) and FIG. 5 (c) that model YOLOv7-Aatten performs especially well on longitudinal gaps: in longitudinal gap recognition it improves precision by 12% over model YOLOv7, and improves recall and average precision by 20%.
As shown in FIG. 6 (a), model YOLOv7-AM is superior to model YOLOv7 in precision in both the longitudinal gap and fatigue gap categories; as shown in FIG. 6 (b), model YOLOv7-AM is superior to model YOLOv7 in recall in both the longitudinal and transverse gap categories. As shown in FIG. 6 (c), model YOLOv7-AM is superior to model YOLOv7 in overall performance.
As can be seen from FIG. 6 (a) and FIG. 6 (b), model YOLOv7-AM performs especially well on longitudinal gaps, improving precision by 17% and recall by 16% relative to model YOLOv7.
As shown in FIG. 7 (a) and FIG. 7 (b), model YOLOv7-AMF is superior to model YOLOv7 in precision and recall in the longitudinal, transverse and fatigue gap categories.
As shown in FIG. 7 (c), model YOLOv7-AMF is superior to model YOLOv7 in overall performance; in particular, over all the test data, model YOLOv7-AMF improves on model YOLOv7 by 10%.
As can be seen from FIG. 7 (a) and FIG. 7 (b), model YOLOv7-AMF performs especially well on longitudinal gaps and fatigue gaps. In longitudinal gap recognition, it improves precision by 14% and recall by 16% relative to model YOLOv7; in fatigue gap recognition, it improves precision by 15% and recall by 17% relative to model YOLOv7.
The above embodiments are merely preferred embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A gap detection method based on improved YOLOv7, the YOLOv7 model comprises a backbone network, a neck network and a head network; the backbone network comprises a first CBS-1 network, a first CBS-2 network, a second CBS-1 network, a second CBS-2 network, a first ELAN network, a first sub-module, a second sub-module and a third sub-module which are sequentially connected; the first sub-module, the second sub-module and the third sub-module have the same structure and are composed of an MP-1 network and an ELAN network which are sequentially connected; the header network comprises a first output network, a second output network and a third output network; the first output network, the second output network and the third output network have the same structure and are composed of REP networks and CBM networks which are sequentially connected;
The gap detection method based on the improved YOLOv7 is characterized by comprising the following steps of:
s1, performing structural optimization on a YOLOv7 model to obtain a YOLOv7 optimization model;
s2, constructing a road gap image marked with a gap recognition result as a marking sample, and training a YOLOv7 optimization model by combining the marking sample to obtain a trained YOLOv7 optimization model as a road gap detection model;
s3, inputting the slit image to be identified into a road slit detection model, and outputting an identification result corresponding to the slit image to be identified by the road slit detection model;
in S1, a YOLOv7 optimization model comprises optimization of a first sub-module and a second sub-module of a backbone network, and the optimized first sub-module and second sub-module have the same structure and respectively comprise an MP-1 network, an Aatten module and an ELAN network which are sequentially connected;
the Aatten module comprises an input layer, a first CBS network, a first MP network, a second MP network, a third MP network, a fifth Concat splicing network, a Sigmoid activation layer, a Split segmentation network, a second CBS network and an output layer which are sequentially connected;
the input of the Aatten module is the input of the input layer of the Aatten module, and the output of the Aatten module is the output of the output layer of the Aatten module;
for feature data of size P0×Q0 with 2^n0 channels, the notation data P0×Q0×2^n0 is used;
the input layer of the Aatten module converts input data P1×Q1×2^m into data 32×32×2^8, where P1×Q1 is the image size in the data and m is any positive integer; the first CBS network applies a dimension-reducing convolution to the output data 32×32×2^8 of the input layer and outputs data 32×32×2^7; the output data 32×32×2^7 of the first CBS network then passes in turn through the first MP network, the second MP network and the third MP network for multi-level transformation; the fifth Concat splicing network concatenates, along the channel dimension, the output data of the first CBS network, the first MP network, the second MP network and the third MP network to obtain data 32×32×2^9; the Sigmoid activation layer activates the output data 32×32×2^9 of the fifth Concat splicing network to obtain a weight for each channel; the input of the Split segmentation network is the convolution data 32×32×2^9 obtained by weighting the output data 32×32×2^9 of the fifth Concat splicing network with the weights output by the Sigmoid activation layer; the Split segmentation network divides this convolution data along the channel dimension into 4 slices of data 32×32×2^7; the input of the second CBS network is the data 32×32×2^7 obtained by element-wise addition of the 4 slices 32×32×2^7 output by the Split segmentation network; the second CBS network convolves its input and outputs data 32×32×2^8; the output layer performs element-wise summation of the output data 32×32×2^8 of the second CBS network and the output data 32×32×2^8 of the input layer, converts the result into data P1×Q1×2^m, and outputs it.
2. The improved YOLOv7-based gap detection method of claim 1, wherein the YOLOv7 optimization model in S1 further comprises optimization of the third sub-module of the backbone network, a Myswin module being added at the output end of the optimized third sub-module;
the Myswin module comprises an input layer, a third CBS network, a Swin-T network, a sixth Concat splicing network, a fourth CBS network and an output layer; the input layer, the Swin-T network, the sixth Concat splicing network, the fourth CBS network and the output layer are sequentially connected, the input of the third CBS network is connected with the output of the input layer, and the output of the third CBS network is connected with the input of the sixth Concat splicing network;
the input layer converts the input data P2*Q2*2^m into data 32*32*2^8, wherein P2*Q2 is the image size in the data; the Swin-T network converts the output data 32*32*2^8 of the input layer into data 32*32*2^7; the third CBS network converts the output data 32*32*2^8 of the input layer into data 32*32*2^7; the sixth Concat splicing network performs dimension splicing on the output data 32*32*2^7 of the Swin-T network and the output data 32*32*2^7 of the third CBS network to obtain data 32*32*2^8; the fourth CBS network performs convolution processing on the output data 32*32*2^8 of the sixth Concat splicing network and outputs data 32*32*2^7; the output layer of the Myswin module converts the output data 32*32*2^7 of the fourth CBS network into data P2*Q2*2^m with the same data structure as the input data of the Myswin module.
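A matching sketch of the Myswin module of claim 2 follows, reusing the torch imports and the CBS helper from the Aatten sketch above. The claim's Swin-T network is stood in for by plain multi-head self-attention over the 32*32 feature tokens; a faithful implementation would use windowed Swin Transformer blocks, so that branch is only a placeholder carrying the claimed channel counts (2^8 in, 2^7 out).

```python
# Reuses torch / nn / F and the CBS block defined in the Aatten sketch above.
class Myswin(nn.Module):
    """Sketch of the Myswin module; the Swin-T branch is a stand-in."""
    def __init__(self, c_in):                        # c_in = 2**m
        super().__init__()
        self.proj_in = nn.Conv2d(c_in, 256, 1)       # assumed "input layer": -> 32*32*2^8
        self.attn = nn.MultiheadAttention(256, num_heads=8, batch_first=True)
        self.attn_proj = nn.Linear(256, 128)         # Swin-T branch output: 2^8 -> 2^7
        self.cbs3 = CBS(256, 128)                    # third CBS: 2^8 -> 2^7
        self.cbs4 = CBS(256, 128)                    # fourth CBS: 2^8 -> 2^7
        self.proj_out = nn.Conv2d(128, c_in, 1)      # assumed "output layer"
    def forward(self, x):
        y = self.proj_in(F.adaptive_avg_pool2d(x, 32))   # 32*32*2^8 (resize is an assumption)
        b, _, h, w = y.shape
        tokens = y.flatten(2).transpose(1, 2)            # (b, 1024, 256) token sequence
        t, _ = self.attn(tokens, tokens, tokens)         # attention stand-in for Swin-T
        swin = self.attn_proj(t).transpose(1, 2).reshape(b, 128, h, w)  # 32*32*2^7
        cat = torch.cat([swin, self.cbs3(y)], dim=1)     # sixth Concat: 32*32*2^8
        out = self.proj_out(self.cbs4(cat))              # fourth CBS then back to 2^m channels
        return F.interpolate(out, size=x.shape[2:])      # restore P2*Q2 (assumption)
```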
3. The improved YOLOv7-based gap detection method of claim 2, wherein the YOLOv7 optimization model in S1 further comprises optimization of the head network, a FEEM module being added at the input end of each output network of the optimized head network;
the FEEM module comprises an input layer, a first branch, a second branch, a third branch, a BN normalization layer, a fourth SiLU activation layer and an output layer; the input layer converts the input data P3*Q3*2^m into data 32*32*2^7, wherein P3*Q3 is the image size in the data;
the first branch, the second branch and the third branch each comprise a CBS network, a ConV dilated convolution layer and a SiLU activation layer which are sequentially connected; the CBS network convolves the output data 32*32*2^7 of the input layer and outputs data 32*32*2^7; the ConV dilated convolution layer performs dilated convolution on the output data 32*32*2^7 of the CBS network in the same branch and outputs data 32*32*2^7; the input data of the SiLU activation layer is the data 32*32*2^7 obtained by element-wise addition of the output data 32*32*2^7 of the ConV dilated convolution layer in the same branch and the output data 32*32*2^7 of the input layer; the SiLU activation layer outputs data 32*32*2^7 as the output of the branch; the dilation rates of the ConV dilated convolution layers in the first branch, the second branch and the third branch are different from one another;
the input of the BN normalization layer is the data 32*32*2^7 obtained by element-wise addition of the output data 32*32*2^7 of the first branch, the output data 32*32*2^7 of the second branch and the output data 32*32*2^7 of the third branch; the BN normalization layer normalizes the input data and outputs data 32*32*2^7; the fourth SiLU activation layer activates the output data 32*32*2^7 of the BN normalization layer and outputs data 32*32*2^7; the output layer converts the output data 32*32*2^7 of the fourth SiLU activation layer into data P3*Q3*2^m with the same data structure as the input data of the FEEM module.
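The FEEM module of claim 3 can be sketched the same way. The claim only requires the three dilation rates to differ, so the rates 1, 2 and 3 below are an illustrative choice; the CBS helper and the input/output resizing are again the assumptions made in the Aatten sketch.

```python
# Reuses torch / nn / F and the CBS block defined in the Aatten sketch above.
class FEEMBranch(nn.Module):
    """One FEEM branch: CBS -> dilated conv; SiLU over (branch + input-layer skip)."""
    def __init__(self, c, dilation):
        super().__init__()
        self.cbs = CBS(c, c, k=3)
        self.dil = nn.Conv2d(c, c, 3, padding=dilation, dilation=dilation)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.dil(self.cbs(x)) + x)   # element-wise add of skip, then SiLU

class FEEM(nn.Module):
    """Sketch of the FEEM module of claim 3 with illustrative dilation rates 1/2/3."""
    def __init__(self, c_in):                        # c_in = 2**m
        super().__init__()
        self.proj_in = nn.Conv2d(c_in, 128, 1)       # assumed "input layer": -> 32*32*2^7
        self.branches = nn.ModuleList(FEEMBranch(128, d) for d in (1, 2, 3))
        self.bn = nn.BatchNorm2d(128)                # BN normalization layer
        self.act = nn.SiLU()                         # fourth SiLU activation layer
        self.proj_out = nn.Conv2d(128, c_in, 1)      # assumed "output layer"
    def forward(self, x):
        y = self.proj_in(F.adaptive_avg_pool2d(x, 32))   # resize is an assumption
        s = sum(branch(y) for branch in self.branches)   # element-wise sum of the 3 branches
        out = self.proj_out(self.act(self.bn(s)))
        return F.interpolate(out, size=x.shape[2:])      # restore P3*Q3 (assumption)
```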
4. The improved YOLOv7-based gap detection method of claim 1, wherein the labeled sample is a road gap image labeled with a road gap class, the recognition result output by the road gap detection model is the road gap class, and the road gap classes comprise transverse gaps, longitudinal gaps and fatigue gaps.
5. The improved YOLOv7-based gap detection method of claim 1, wherein S2 comprises the following sub-steps:
S21, obtaining road gap classes, and obtaining a plurality of road gap images under each gap class as labeled samples, wherein the road gap images under the same gap class come from a plurality of different road surfaces; performing image processing on the road gap images in at least part of the labeled samples to obtain processed road gap images with known gap classes as derivative samples; collecting the labeled samples and the derivative samples as a learning sample set; the image processing comprises one or more of image size conversion, image rotation, exposure change and noise processing;
S22, enabling the YOLOv7 optimization model to perform machine learning on the learning sample set so as to train the parameters of the YOLOv7 optimization model;
S23, when the training of the YOLOv7 optimization model reaches a set iteration termination condition, fixing the YOLOv7 optimization model as the road gap detection model.
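As an illustration of the image processing named in S21, a minimal sketch using OpenCV and NumPy is given below. The target size, rotation angle, exposure factor and noise level are arbitrary choices; for box-level annotations, the geometric transforms would also have to be applied to the label coordinates.

```python
import cv2  # OpenCV; an assumed dependency for this sketch
import numpy as np

def derive_samples(img: np.ndarray) -> list:
    """Produce derivative samples per S21: size conversion, rotation,
    exposure change and noise processing. The gap class label carries
    over unchanged; any box coordinates would need matching transforms."""
    resized = cv2.resize(img, (640, 640))                       # image size conversion
    rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)          # image rotation
    exposed = np.clip(img.astype(np.float32) * 1.4,
                      0, 255).astype(np.uint8)                  # exposure change
    noisy = np.clip(img + np.random.normal(0.0, 8.0, img.shape),
                    0, 255).astype(np.uint8)                    # additive Gaussian noise
    return [resized, rotated, exposed, noisy]
```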
6. The improved YOLOv7-based gap detection method of claim 5, wherein the iteration termination condition in S23 is: the number of iterations of the YOLOv7 optimization model reaches a set value; or, the loss function of the YOLOv7 optimization model reaches a set value.
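A schematic training loop showing how the two termination conditions of claim 6 can be checked is sketched below; compute_loss is a stand-in for the YOLOv7 loss, and max_iters and loss_target are arbitrary set values, none of which the claims specify.

```python
def train_until_done(model, loader, optimizer, compute_loss,
                     max_iters=50000, loss_target=0.05):
    """Schematic S22/S23 loop: stop when either the iteration count or the
    loss value reaches its set point (claim 6). compute_loss is hypothetical."""
    iters = 0
    while True:
        for images, targets in loader:
            loss = compute_loss(model(images), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            iters += 1
            if iters >= max_iters or loss.item() <= loss_target:
                return model  # fixed as the road gap detection model (S23)
```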
7. A gap detection system based on improved YOLOv7 for performing the improved YOLOv7-based gap detection method of any one of claims 1-6, characterized in that the system comprises an image acquisition module, a gap recognition module and an early warning module, the gap recognition module being connected with the image acquisition module and the early warning module respectively;
the image acquisition module is used for acquiring a gap image to be identified, wherein the gap image contains gaps;
the gap recognition module is used for acquiring the gap image to be identified from the image acquisition module and inputting it into the road gap detection model, and the road gap detection model outputs the recognition result of the gap image to be identified;
the early warning module acquires the recognition result output by the road gap detection model and executes a set early warning action according to the recognition result.
8. The improved YOLOv7-based gap detection system of claim 7, wherein the early warning action comprises: recording the recognition result, and sending the recognition result to at least one set contact.
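A compact sketch of the claim 7/8 system wiring is given below; capture_image and notify are injected stand-ins for the camera interface and the messaging channel, neither of which the claims specify.

```python
class GapDetectionSystem:
    """Sketch of the claim 7/8 pipeline: acquisition -> recognition -> early warning."""
    def __init__(self, model, contacts, capture_image, notify):
        self.model = model                  # trained road gap detection model
        self.contacts = contacts            # set contacts for early warning
        self.capture_image = capture_image  # hypothetical camera interface
        self.notify = notify                # hypothetical messaging channel
        self.records = []
    def run_once(self):
        image = self.capture_image()        # image acquisition module
        result = self.model(image)          # gap recognition module
        self.records.append(result)         # early warning: record the result
        for contact in self.contacts:       # early warning: send to set contacts
            self.notify(contact, result)
        return result
```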
9. A gap detection system based on improved YOLOv7, comprising a memory and a processor, wherein the memory stores a computer program and the processor, connected with the memory, is configured to execute the computer program to implement the improved YOLOv7-based gap detection method of any one of claims 1-6.
10. A storage medium, characterized in that it stores a computer program which, when executed, implements the improved YOLOv7-based gap detection method of any one of claims 1-6.
CN202310487074.1A 2023-04-28 2023-04-28 Gap detection method, system and storage medium based on improved YOLOv7 Pending CN116452574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310487074.1A CN116452574A (en) 2023-04-28 2023-04-28 Gap detection method, system and storage medium based on improved YOLOv7

Publications (1)

Publication Number Publication Date
CN116452574A true CN116452574A (en) 2023-07-18

Family

ID=87132017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310487074.1A Pending CN116452574A (en) 2023-04-28 2023-04-28 Gap detection method, system and storage medium based on improved YOLOv7

Country Status (1)

Country Link
CN (1) CN116452574A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850228A (en) * 2021-10-18 2021-12-28 北方民族大学 Road crack detection method and system based on multi-mode fusion
CN115965602A (en) * 2022-12-29 2023-04-14 河海大学 Abnormal cell detection method based on improved YOLOv7 and Swin-Unet

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUANTING YE et al.: "Autonomous surface crack identification of concrete structures based on the YOLOv7 algorithm", JOURNAL OF BUILDING ENGINEERING, pages 1-15 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116895029A (en) * 2023-09-11 2023-10-17 山东开泰抛丸机械股份有限公司 Aerial image target detection method and aerial image target detection system based on improved YOLO V7
CN116895029B (en) * 2023-09-11 2023-12-19 山东开泰抛丸机械股份有限公司 Aerial image target detection method and aerial image target detection system based on improved YOLO V7

Similar Documents

Publication Publication Date Title
CN111914883B (en) Spindle bearing state evaluation method and device based on deep fusion network
CN101614786B (en) Online intelligent fault diagnosis method of power electronic circuit based on FRFT and IFSVC
CN110147323B (en) Intelligent change checking method and device based on generation countermeasure network
CN110969194B (en) Cable early fault positioning method based on improved convolutional neural network
CN110060368B (en) Mechanical anomaly detection method based on potential feature codes
CN110866448A (en) Flutter signal analysis method based on convolutional neural network and short-time Fourier transform
CN116452574A (en) Gap detection method, system and storage medium based on improved YOLOv7
CN116579616B (en) Risk identification method based on deep learning
CN115082791A (en) Meteorological frontal surface automatic identification method based on depth separable convolutional network
CN116805061B (en) Leakage event judging method based on optical fiber sensing
CN113850330A (en) Power distribution network fault cause detection method based on short-time Fourier transform and convolutional neural network
CN113988210A (en) Method and device for restoring distorted data of structure monitoring sensor network and storage medium
CN117056865B (en) Method and device for diagnosing operation faults of machine pump equipment based on feature fusion
Giglioni et al. Deep autoencoders for unsupervised damage detection with application to the Z24 benchmark bridge
CN114519293A (en) Cable body fault identification method based on hand sample machine learning model
CN114330641A (en) Method for establishing short-term wind speed correction model based on deep learning of complex terrain
CN116909788A (en) Multi-mode fault diagnosis method and system with unchanged task direction and visual angle
Fakhri et al. Road crack detection using gaussian/prewitt filter
CN116610996A (en) Cable state detection method and device and computer equipment
CN116958040A (en) YOLOv7 model optimization and gap detection method, system and storage medium
CN113888380A (en) Method, device, equipment and medium for predicting man overboard trajectory
CN114462127A (en) Structural damage identification method and system based on deep extreme learning machine
CN112070730A (en) Anti-vibration hammer falling detection method based on power transmission line inspection image
Li et al. A Leak Detection Method for Heat Network Pipes Based on YOLOv5 and Automatic Region Growing Algorithm
CN117216485B (en) Objective weighting-based power transmission wave-recording bird damage fault judging method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination