WO2023216721A1 - Intelligent recognition method for time sequence images of concrete dam defects - Google Patents

Intelligent recognition method for time sequence images of concrete dam defects

Info

Publication number
WO2023216721A1
Authority
WO
WIPO (PCT)
Prior art keywords
defect
defects
frame
image
sequence
Prior art date
Application number
PCT/CN2023/082484
Other languages
English (en)
French (fr)
Inventor
马洪琪
周华
毛莺池
迟福东
周晓峰
曹学兴
戚荣志
陈豪
谭彬
聂兵兵
Original Assignee
河海大学
华能澜沧江水电股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 河海大学, 华能澜沧江水电股份有限公司 filed Critical 河海大学
Priority to US18/322,605 priority Critical patent/US20230368371A1/en
Publication of WO2023216721A1 publication Critical patent/WO2023216721A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/344Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/62Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • G06T2207/30132Masonry; Concrete
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Definitions

  • the invention belongs to the technical field of time-series image recognition of concrete dam defects, and in particular relates to a defect recognition method based on temporal difference and a self-attention mechanism.
  • in the field of construction engineering, inspection items or inspection points that do not meet construction quality requirements are defined as defects. Over the long-term operation of hydropower station dams, factors such as material aging and environmental effects cause defects to form to varying degrees. When a defect is minor, corresponding measures can be taken to deal with it in time so that the structure still meets its load-bearing requirements. Once defects are not treated and repaired promptly, they pose a major threat to the safe operation of the dam. Therefore, using automatic inspection equipment to detect and eliminate defects in a timely manner can effectively maintain the structural safety of the dam.
  • the data collected by equipment such as drones and mobile cameras consist of videos.
  • during acquisition and transmission, the video is compressed and encoded to save costs, which prevents the model from processing the video data directly.
  • the video therefore needs to be converted into an image sequence along the time dimension, and the defects in it are located and identified by extracting time-series image features.
  • the actually collected time-series images often contain a large number of background frames unrelated to defects, which makes it difficult to recognize the entire image sequence directly.
  • the model needs to attend to the contextual feature relationships of the image sequence to ensure the completeness of defect extraction and the recognition accuracy.
  • the present invention provides an intelligent identification method of time-series images of concrete dam defects.
  • An intelligent recognition method for time-series images of concrete dam defects uses a two-stream network to extract a feature sequence from time-series images containing dam defects, and adds a self-attention mechanism in the time dimension to obtain global feature relationships; during model training, an objective function based on the distance intersection-over-union (DIoU) is used to match located defects with real defects, and the temporal position relationship of the defects is computed to accelerate model convergence; a loss term based on the tightness-aware intersection-over-union is added to the model loss function to improve accuracy by attending to the completeness of the defect sequence; after defect localization is completed, a convolutional neural network based on 2D temporal difference is used to extract defect features and identify the defect types. Specifically, the method includes the following steps:
  • a convolutional network based on 2D temporal difference performs frame sampling and extracts the visual and displacement information of the defect image frames to identify the defect type.
  • for any frame τ, the optical flow formed by stacking the RGB images of frame t_n and frame t_n+1 is processed by the temporal-stream convolutional network; the horizontal and vertical displacement vectors of frame t_n+1 at point (u, v) can be regarded as two input channels of the convolutional neural network.
  • the optical flows of L consecutive frames are stacked together to form 2L input channels.
  • the input of any frame τ is composed according to the following formula:
  • w and h are the width and height of the input image.
  • the time-series image feature sequence extracted by the two-stream network is recorded, and a boundary evaluation network composed of three convolutional layers computes the probability that each frame is the start or end frame of the defect sequence; the input features of the time-series images are multiplied by and combined with the predicted start and end probabilities at each temporal position to obtain the feature sequence:
  • W_m and W′_m are attention matrices with learnable weights; both are learned through the network and have the same function and dimensions, but their weights differ.
  • A_mqk is the multi-head self-attention weight and Ω_k is the dimension of the image sequence, which yields a defect image feature sequence containing attention weights;
  • a multilayer perceptron takes the attention-weighted defect image feature sequence as input and predicts and outputs the positions of the start and end frames.
  • an objective function based on the distance intersection-over-union is used to match located defects with real defects, model convergence is accelerated by computing the defect position relationship, and a loss term based on the tightness-aware intersection-over-union is added to the loss function to improve defect localization accuracy by attending to the completeness of the defect sequence.
  • the error between the interval located by the model and the real defect interval (the interval error) is calculated as the loss value to optimize the model.
  • the optimal matching is computed by maximizing the objective function.
  • the objective function is as follows:
  • l_1 is the L1 objective function for strictly matching boundaries
  • DIoU is the distance intersection-over-union
  • IoU is the intersection-over-union of the two defect intervals
  • b and b_t represent the center-point coordinates of the located defect interval and the real defect interval, respectively
  • ρ represents the distance between the two points (the center points of the two intervals)
  • c is the length of the smallest time interval that can simultaneously cover the located defect interval and the real defect interval
  • L_bou is the boundary loss, used to measure the deviation between the start and end frames of the located defect interval and those of the real defect interval:
  • L_pre is the interval loss, which uses the tightness-aware intersection-over-union to measure the accuracy and completeness of the defect interval predicted by the model:
  • IoU is the intersection-over-union of the two intervals.
  • the convolutional network based on 2D temporal difference performs frame sampling and extracts the visual and displacement information of the defect image frames to identify the defect type.
  • the specific steps are as follows:
  • in the features of a sampling frame, the sampling frame F_t contributes visual image information and the feature stack H(x_t) contributes local motion information, obtained by extracting the features of the frames before and after the sampling frame with an average pooling layer and stacking them.
  • a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, it implements the above-mentioned intelligent recognition method for time-series images of concrete dam defects.
  • a computer-readable storage medium stores a computer program for executing the above-mentioned intelligent recognition method for time-series images of concrete dam defects.
  • the present invention has the following advantages:
  • Figure 1 is a schematic diagram of time series images of dam defects in a specific embodiment
  • Figure 2 is an overall framework diagram of the dam defect time series image recognition method in a specific embodiment
  • Figure 3 is a schematic diagram of the dual-stream network framework in a specific embodiment
  • Figure 4 is a schematic diagram of the 2D temporal differential convolution network framework in a specific embodiment.
  • Figure 2 shows the overall framework of the defect identification method for dam defect time-series images.
  • the main workflow of the defect identification method for dam defect time-series images is as follows:
  • w and h are the width and height of the input image
  • the time-series image feature sequence extracted by the two-stream network is recorded, and a boundary evaluation network composed of three convolutional layers computes the probability that each frame is the start or end frame of the defect sequence.
  • a convolutional layer is recorded as Conv(c_f, c_k, f), where the parameters c_f, c_k and f are the number of convolution kernels, the number of channels and the activation function, respectively.
  • the structure of the above boundary evaluation network can be summarized as Conv(512,3,Relu) → Conv(512,3,Relu) → Conv(3,1,sigmoid), and the stride of all three convolutional layers is 1.
  • W_m is an attention matrix with learnable weights
  • A_mqk is the multi-head self-attention weight.
  • the network contains 8 self-attention heads and a 2048-dimensional feed-forward neural network.
  • the dropout ratio is set to 0.1 and ReLU is used as the activation function, yielding a defect image feature sequence containing attention weights;
  • the interval error is calculated as the loss value to optimize the model.
  • the optimal matching is computed by maximizing the objective function.
  • the objective function is as follows:
  • l_1 is the L1 objective function for strictly matching boundaries
  • DIoU is the distance intersection-over-union
  • b and b_t represent the center-point coordinates of the located defect interval and the real defect interval respectively, ρ represents the distance between the two points, and c is the length of the smallest time interval that can simultaneously cover the two intervals.
  • L_bou is the boundary loss, used to measure the deviation between the start and end frames of the located defect interval and those of the real defect interval:
  • t_s and t_e are the positions of the start and end frames of the interval containing the defect.
  • L_pre is the interval loss, which uses the tightness-aware intersection-over-union to measure the accuracy and completeness of the defect interval predicted by the model:
  • in the features of a sampling frame, the sampling frame F_t contributes visual image information and the feature stack H(x_t) contributes local motion information, obtained from the n frames before and after the sampling frame by extracting the features of each of those frames with an average pooling layer and stacking them.
  • the network structure is shown in Figure 4.
  • the above steps of the defect recognition method for dam defect time-series images can be implemented with a general-purpose computing device, and they can be centralized on a single computing device or distributed over a network of computing devices.
  • they can be implemented with program code executable by the computing devices, so that they can be stored in a storage device and executed by the computing devices; in some cases, the steps shown or described may be performed in a different order than here, or they may be implemented as separate integrated-circuit modules, or multiple modules or steps among them may be implemented as a single integrated-circuit module.
  • embodiments of the present invention are not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent recognition method for time-series images of concrete dam defects. A two-stream network is used to extract a feature sequence from time-series images containing concrete dam defects, and a self-attention mechanism in the time dimension is added to obtain global contextual feature relationships. During model training, an objective function based on the distance intersection-over-union is used to match located defects with real defects, and the temporal position relationship of the defects is computed to accelerate model convergence; a loss term based on the tightness-aware intersection-over-union is added to the model loss function to improve accuracy by attending to the completeness of the defect sequence. After defect localization is completed, a convolutional neural network based on 2D temporal difference is used to extract defect features and identify the defect types. The invention effectively detects time-series images of concrete dam defects: it can not only locate defect positions in long image sequences but also accurately identify defect types, and it achieves high recognition accuracy and good convergence performance in the recognition task of dam defect time-series images.

Description

Intelligent recognition method for time sequence images of concrete dam defects
Technical Field
The invention belongs to the technical field of time-series image recognition of concrete dam defects, and in particular relates to a defect recognition method based on temporal difference and a self-attention mechanism.
Background Art
In the field of construction engineering, inspection items or inspection points that do not meet construction quality requirements are defined as defects. Over the long-term operation of hydropower station dams, factors such as material aging and environmental effects cause defects to form to varying degrees. When a defect is minor, corresponding measures can be taken to deal with it in time so that the structure still meets its load-bearing requirements. Once defects are not treated and repaired promptly, they pose a major threat to the safe operation of the dam. Therefore, using automatic inspection equipment to detect and eliminate defects in a timely manner can effectively maintain the structural safety of the dam.
In actual inspection scenarios at dam hydropower stations, the data collected by equipment such as drones and mobile cameras consist of videos. During acquisition and transmission, the video is compressed and encoded to save costs, which prevents the model from processing the video data directly; the video needs to be converted into an image sequence along the time dimension, and the defects in it are located and identified by extracting time-series image features. The actually collected time-series images often contain a large number of background frames unrelated to defects, which makes it difficult to recognize the entire image sequence directly.
Summary of the Invention
Purpose of the invention: The above analysis of the prior art shows that, in the task of defect recognition in time-series images, the model needs to attend to the contextual feature relationships of the image sequence to ensure the completeness of defect extraction and the recognition accuracy. In order to quickly recognize the dam defect time-series images collected by automated equipment, the invention provides an intelligent recognition method for time-series images of concrete dam defects.
Technical solution: An intelligent recognition method for time-series images of concrete dam defects, which uses a two-stream network to extract a feature sequence from time-series images containing dam defects and adds a self-attention mechanism in the time dimension to obtain global feature relationships; during model training, an objective function based on the distance intersection-over-union is used to match located defects with real defects, and the temporal position relationship of the defects is computed to accelerate model convergence; a loss term based on the tightness-aware intersection-over-union is added to the model loss function to improve accuracy by attending to the completeness of the defect sequence; after defect localization is completed, a convolutional neural network based on 2D temporal difference is used to extract defect features and identify the defect types. The method specifically includes the following steps:
(1) A defect localization model is designed for the characteristics of time-series images containing dam defects. The model uses a two-stream network and a Transformer network for temporal feature extraction: the two-stream network extracts image features, and the Transformer network adds a self-attention mechanism in the time dimension to the image frames to obtain global feature relationships for defect localization;
(2) During the training of the defect localization model, an objective function based on the distance intersection-over-union is used to match located defects with real defects; model convergence is accelerated by computing the defect position relationship, and a loss term based on the tightness-aware intersection-over-union is added to the loss function to improve defect localization accuracy by attending to the completeness of the defect sequence;
(3) After the defect sequence is located, a convolutional network based on 2D temporal difference performs frame sampling and extracts the visual and displacement information of the defect image frames to identify the defect type.
The specific steps of temporal feature extraction using the two-stream network and the Transformer network are as follows:
(1.1) Input the original time-series images, recorded as X; the sequence contains l image frames, where x_n denotes the n-th frame of the sequence X.
(1.2) Convert the original time-series images into the input of the two-stream network, in which the t_n-th RGB frame of the original sequence X is processed by the spatial-stream convolutional network, and the optical flow formed by stacking the RGB images of frames t_n and t_n+1 is processed by the temporal-stream convolutional network; d^x and d^y respectively denote the horizontal and vertical displacement vectors of frame t_n+1 at point (u, v), and can be regarded as two input channels of the convolutional neural network. To represent the motion of a series of time-series images, the optical flows of L consecutive frames are stacked together to form 2L input channels, and the input of any frame τ is composed according to the following formula:

I_τ(u, v, 2k-1) = d^x_{τ+k-1}(u, v),  I_τ(u, v, 2k) = d^y_{τ+k-1}(u, v),  u ∈ [1, w], v ∈ [1, h], k ∈ [1, L]

where w and h are the width and height of the input image.
(1.3) Record the time-series image feature sequence extracted by the two-stream network, and use three convolutional layers to form a boundary evaluation network that computes the probability that each frame is the start or end frame of the defect sequence; multiply and combine the input features of the time-series images with the predicted start and end probabilities at each temporal position to obtain the feature sequence:
(1.4) After the feature sequence corresponding to the images is obtained, add a positional encoding to each frame to mark its temporal position, and use the Transformer network to compute the global self-attention weight of each frame:
where W_m and W′_m are attention matrices with learnable weights, both learned through the network, identical in function and dimensions but different in values; A_mqk is the multi-head self-attention weight, and Ω_k is the dimension of the image sequence; this yields a defect image feature sequence containing attention weights;
(1.5) Use a multilayer perceptron, taking the attention-weighted defect image feature sequence as input, to predict and output the positions of the start and end frames.
During the training of the defect localization model, an objective function based on the distance intersection-over-union is used to match located defects with real defects, model convergence is accelerated by computing the defect position relationship, and a loss term based on the tightness-aware intersection-over-union is added to the loss function to improve defect localization accuracy by attending to the completeness of the defect sequence; the specific steps are as follows:
(2.1) During model training, the located defects are first matched pairwise with the real defects, and the error between the interval located by the model and the real defect interval (the interval error) is calculated as the loss value to optimize the model. During matching, the optimal matching is computed by maximizing the objective function, which is as follows:
where l_1 is the L1 objective function for strictly matching boundaries and DIoU is the distance intersection-over-union;
(2.2) To speed up model training and ensure that the model can still converge when a located defect and the real defect do not overlap, the objective function based on the distance intersection-over-union is defined as:
DIoU = IoU - ρ²(b, b_t) / c²
where IoU is the intersection-over-union of the two defect intervals, b and b_t respectively represent the center-point coordinates of the located defect interval and the real defect interval, ρ denotes the distance between the two points (the center points of the two intervals), and c is the length of the smallest time interval that can simultaneously cover the located defect interval and the real defect interval.
(2.3) Finally, the loss function of the defect localization task is defined as:
where L_bou is the boundary loss, used to measure the deviation between the start and end frames of the located defect interval and those of the real defect interval:
where t_s and t_e are the positions of the start and end frames of the interval containing the defect, and the predicted start and end positions of the defect interval are denoted accordingly. L_pre is the interval loss, which uses the tightness-aware intersection-over-union to measure the accuracy and completeness of the defect interval predicted by the model, where IoU is the intersection-over-union of the located interval and the real interval.
The specific steps of frame sampling with the convolutional network based on 2D temporal difference, extracting the visual and displacement information of the defect image frames to identify the defect type, are as follows:
(3.1) Split the image sequence containing defects into T non-overlapping segments of equal length, and randomly sample one frame x_t from each segment to form the set X = [x′_1, x′_2, ..., x′_T], which increases training diversity and enables the convolutional network based on 2D temporal difference to learn the variation across different instances of the same defect. Features are extracted from the sampled frames in X by a 2D convolutional neural network, giving the feature set F = [F_1, F_2, ..., F_T];
(3.2) In the motion information represented by a sampling frame, the sampling frame F_t contributes visual image information and the feature stack H(x_t) contributes local motion information, obtained by extracting the features of the frames before and after the sampling frame with an average pooling layer and stacking them.
(3.3) Use a multilayer perceptron and a softmax function to decode the sampled feature image sequence and obtain the defect category.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the intelligent recognition method for time-series images of concrete dam defects described above.
A computer-readable storage medium stores a computer program for executing the intelligent recognition method for time-series images of concrete dam defects described above.
Beneficial effects: Compared with the prior art, the invention has the following advantages:
(1) The Transformer network adds a self-attention mechanism in the time dimension to the image sequence, so that the model can attend to global feature relationships, which improves defect localization accuracy.
(2) Adding the objective function based on the distance intersection-over-union and the loss term based on the tightness-aware intersection-over-union during model training attends to the position information and completeness of defects, which speeds up model convergence and improves localization accuracy.
(3) In the defect recognition stage, the convolutional network based on 2D temporal difference allows a sampled frame to contain both the image features and the displacement information of a defect, which improves recognition accuracy while saving computing resources.
Brief Description of the Drawings
Figure 1 is a schematic diagram of time-series images of dam defects in a specific embodiment;
Figure 2 is an overall framework diagram of the dam defect time-series image recognition method in a specific embodiment;
Figure 3 is a schematic diagram of the two-stream network framework in a specific embodiment;
Figure 4 is a schematic diagram of the 2D temporal-difference convolutional network framework in a specific embodiment.
Detailed Description of the Embodiments
The invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are only intended to illustrate the invention and not to limit its scope; after reading the invention, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the claims appended to this application.
Time-series inspection images of a hydropower station dam project are given; each image sequence may contain four types of defects: cracks, alkaline precipitation, water seepage and concrete spalling, as shown in Figure 1.
Figure 2 shows the overall framework of the defect recognition method for dam defect time-series images; the main workflow of the method is implemented as follows:
(1) To address the problems that dam defect time-series image sequences are long and contain a large number of background frames unrelated to defects, a defect localization model is designed: a two-stream network extracts the time-series image feature sequence, and a self-attention mechanism in the time dimension is added to obtain global feature relationships for defect localization, as shown in Figure 3.
(1.1) Input the original time-series images, recorded as X; the sequence contains l image frames, where x_n denotes the n-th frame of the sequence X;
(1.2) Convert the original image sequence into the input of the two-stream network, in which the t_n-th RGB frame of the sequence X is processed by the spatial-stream convolutional network, and the optical flow formed by stacking frames t_n and t_n+1 is processed by the temporal-stream convolutional network; d^x and d^y respectively denote the horizontal and vertical displacement vectors of frame t_n+1 at point (u, v), and can be regarded as two input channels of the convolutional neural network. To represent the motion of a series of time-series images, the optical flows of L consecutive frames are stacked together to form 2L input channels, and the input of any frame τ is composed according to the following formula:

I_τ(u, v, 2k-1) = d^x_{τ+k-1}(u, v),  I_τ(u, v, 2k) = d^y_{τ+k-1}(u, v),  u ∈ [1, w], v ∈ [1, h], k ∈ [1, L]

where w and h are the width and height of the input image;
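The stacking of L consecutive optical-flow frames into 2L input channels can be sketched as follows (a minimal NumPy sketch; the frame count, image size and displacement values are illustrative placeholders, not values from the patent):

```python
import numpy as np

def stack_optical_flow(flows_x, flows_y):
    """Stack horizontal/vertical displacement fields of L consecutive
    frames into one (2L, h, w) input volume: channel 2k-1 holds d^x of
    frame tau+k-1 and channel 2k holds d^y (1-based channel indexing)."""
    L = len(flows_x)
    h, w = flows_x[0].shape
    volume = np.empty((2 * L, h, w), dtype=np.float32)
    for k in range(L):
        volume[2 * k] = flows_x[k]      # I_tau(u, v, 2k-1)
        volume[2 * k + 1] = flows_y[k]  # I_tau(u, v, 2k)
    return volume

# toy example: L = 5 frames of 4x6 flow fields -> 10 input channels
L, h, w = 5, 4, 6
fx = [np.full((h, w), k, dtype=np.float32) for k in range(L)]
fy = [np.full((h, w), -k, dtype=np.float32) for k in range(L)]
vol = stack_optical_flow(fx, fy)
print(vol.shape)  # (10, 4, 6)
```

The interleaved layout keeps the horizontal and vertical components of each frame adjacent, so a 2D convolution over the channel axis sees both components of the same displacement field together.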
(1.3) Record the time-series image feature sequence extracted by the two-stream network, and use three convolutional layers to form a boundary evaluation network that computes the probability that each frame is the start or end frame of the defect sequence. A convolutional layer is recorded as Conv(c_f, c_k, f), where the parameters c_f, c_k and f are the number of convolution kernels, the number of channels and the activation function, respectively; the structure of the above boundary evaluation network can then be summarized as Conv(512,3,Relu) → Conv(512,3,Relu) → Conv(3,1,sigmoid), with the same stride of 1 for all three convolutional layers. Finally, multiply and combine the input features of the time-series images with the predicted start and end probabilities at each temporal position to obtain the feature sequence:
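The three-layer boundary evaluation network can be sketched in PyTorch as below. This is an assumed reading, not the patent's own code: the middle argument of Conv(512,3,Relu) is interpreted here as the kernel size, the feature sequence is treated as a 1D signal over time, and the input feature dimension of 400 is a placeholder:

```python
import torch
import torch.nn as nn

class BoundaryEvaluation(nn.Module):
    """Per-frame probabilities over time: Conv(512,3,ReLU) ->
    Conv(512,3,ReLU) -> Conv(3,1,sigmoid), all with stride 1.
    The 3 output channels can be read as start / end / inside scores."""
    def __init__(self, in_dim=400):  # in_dim is an assumed placeholder
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_dim, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv1d(512, 3, kernel_size=1, stride=1),
            nn.Sigmoid(),
        )

    def forward(self, feats):   # feats: (batch, in_dim, T)
        return self.net(feats)  # (batch, 3, T), values in (0, 1)

probs = BoundaryEvaluation()(torch.randn(2, 400, 100))
print(probs.shape)  # torch.Size([2, 3, 100])
```

Padding of 1 on the 3-wide kernels keeps the temporal length unchanged, so every input frame receives its own start/end probability.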
(1.4) Add a positional encoding to each frame to mark its temporal position, and use the Transformer network to compute the global self-attention weight of each frame:
where W_m is an attention matrix with learnable weights and A_mqk is the multi-head self-attention weight. The network contains 8 self-attention heads and a 2048-dimensional feed-forward neural network; the dropout ratio is set to 0.1 and ReLU is used as the activation function, yielding a defect image feature sequence containing attention weights;
(1.5) Feed the feature sequence into a 3-layer, 512-dimensional multilayer perceptron, which predicts and outputs the positions of the start and end frames.
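The temporal self-attention stage with 8 heads, a 2048-dimensional feed-forward network, dropout 0.1 and ReLU, followed by the 3-layer 512-dimensional MLP head, can be sketched with standard PyTorch modules. The model width d_model=512, the number of encoder layers and the learnable positional encoding are assumptions, since the text does not fix them:

```python
import torch
import torch.nn as nn

class DefectLocalizer(nn.Module):
    def __init__(self, d_model=512, num_layers=2, max_len=512):
        super().__init__()
        # learnable positional encoding marks each frame's temporal position
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, dim_feedforward=2048,
            dropout=0.1, activation="relu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # 3-layer, 512-dimensional MLP predicting (start, end) per frame
        self.head = nn.Sequential(
            nn.Linear(d_model, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 2))

    def forward(self, feats):              # feats: (batch, T, d_model)
        T = feats.size(1)
        z = self.encoder(feats + self.pos[:, :T])
        return self.head(z)                # (batch, T, 2)

out = DefectLocalizer()(torch.randn(2, 64, 512))
print(out.shape)  # torch.Size([2, 64, 2])
```

Because attention is computed over the full frame axis, every frame's representation can draw on the whole sequence, which is the global feature relationship the method relies on.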
(2) In the training stage of the defect localization model, to address the problem that the model cannot attend to the position and completeness of defects, an objective function based on the distance intersection-over-union is used to match located defects with real defects, the temporal position relationship of the defects is computed to accelerate model convergence, and a loss term based on the tightness-aware intersection-over-union is added to the loss function to improve model accuracy.
(2.1) During model training, the located defects are first matched pairwise with the real defects, and the interval error is calculated as the loss value to optimize the model. During matching, the optimal matching is computed by maximizing the objective function, which is as follows:
where l_1 is the L1 objective function for strictly matching boundaries and DIoU is the distance intersection-over-union.
(2.2) To speed up model training and ensure that the model can still converge when a located defect and the real defect do not overlap, the objective function based on the distance intersection-over-union is defined as:
DIoU = IoU - ρ²(b, b_t) / c²
where b and b_t respectively represent the center-point coordinates of the located defect interval and the real defect interval, ρ denotes the distance between the two points, and c is the length of the smallest time interval that can simultaneously cover the two intervals.
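Under this definition, the DIoU of two temporal intervals can be computed as follows (a sketch; representing each defect interval as a (start, end) pair of frame indices is an assumption):

```python
def interval_diou(p, g):
    """DIoU = IoU - rho^2 / c^2 for 1D intervals p = (ps, pe), g = (gs, ge):
    rho is the distance between the interval centers and c the length of
    the smallest interval covering both, so the score stays informative
    even when the two intervals do not overlap at all."""
    ps, pe = p
    gs, ge = g
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    iou = inter / union if union > 0 else 0.0
    rho = abs((ps + pe) / 2 - (gs + ge) / 2)   # center-point distance
    c = max(pe, ge) - min(ps, gs)              # enclosing interval length
    return iou - (rho ** 2) / (c ** 2)

print(interval_diou((10, 20), (10, 20)))  # 1.0 for a perfect match
print(interval_diou((0, 10), (20, 30)))   # negative but finite when disjoint
```

The second call illustrates why this speeds up convergence: plain IoU is flat at zero for disjoint intervals, while DIoU still decreases as the centers move apart, giving the optimizer a usable gradient direction.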
(2.3) Finally, the loss function of the defect localization task is defined as:
where L_bou is the boundary loss, used to measure the deviation between the start and end frames of the located defect interval and those of the real defect interval:
where t_s and t_e are the positions of the start and end frames of the interval containing the defect. L_pre is the interval loss, which uses the tightness-aware intersection-over-union to measure the accuracy and completeness of the defect interval predicted by the model:
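A sketch of the combined localization loss is given below. The exact forms of L_bou and the tightness-aware IoU term are not spelled out in this text, so the sketch substitutes an L1 penalty on the start/end frames for L_bou and a plain 1 - IoU interval penalty for L_pre; both substitutions are assumptions:

```python
def boundary_loss(pred, gt):
    """L_bou stand-in: L1 deviation of the predicted start/end frames."""
    (ps, pe), (ts, te) = pred, gt
    return abs(ps - ts) + abs(pe - te)

def interval_loss(pred, gt):
    """L_pre stand-in: 1 - IoU of the two intervals (the patent's
    tightness-aware IoU additionally rewards complete coverage)."""
    (ps, pe), (ts, te) = pred, gt
    inter = max(0.0, min(pe, te) - max(ps, ts))
    union = (pe - ps) + (te - ts) - inter
    return 1.0 - (inter / union if union > 0 else 0.0)

def localization_loss(pred, gt, w_bou=1.0, w_pre=1.0):
    # weighted sum of the two terms; the weights are placeholders
    return w_bou * boundary_loss(pred, gt) + w_pre * interval_loss(pred, gt)

print(localization_loss((10, 20), (10, 20)))  # 0.0 for a perfect prediction
print(localization_loss((12, 19), (10, 20)))  # grows with boundary and overlap error
```

The two terms are complementary: the boundary term penalizes each endpoint independently, while the interval term penalizes predictions that cover only part of the true defect sequence.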
(3) After the defect sequence is located, a convolutional network based on 2D temporal difference performs frame sampling and extracts the visual and displacement information of the defect image frames to identify the defect type, which speeds up model recognition while maintaining recognition accuracy, as shown in Figure 4.
(3.1) Split the image sequence containing defects into T non-overlapping segments of equal length, and randomly sample one frame x_t from each segment at a 1/32 sampling rate to form the set X_t = [x′_1, x′_2, ..., x′_T], which increases training diversity and enables the network to learn the variation across different instances of the same defect. Features are extracted from all sampled frames by a 2D convolutional neural network with ResNet50 as the backbone, giving the feature set F = [F_1, F_2, ..., F_T].
(3.2) In the motion information represented by a frame, the sampling frame F_t contributes visual image information and the feature stack H(x_t) contributes local motion information, obtained from the motion information of the n frames before and after the sampling frame by extracting the features of each of those frames with an average pooling layer and stacking them; the network structure is shown in Figure 4.
(3.3) Use a 3-layer, 512-dimensional multilayer perceptron and a softmax function to decode the sampled feature image sequence and obtain the defect category.
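The segment sampling and local temporal difference of step (3) can be sketched in NumPy as follows; using raw frame differences instead of pooled CNN features, and the choice of n = 2 neighbouring frames, are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_segments(seq_len, T):
    """Split [0, seq_len) into T equal non-overlapping segments and
    randomly draw one frame index from each (one frame per segment)."""
    bounds = np.linspace(0, seq_len, T + 1, dtype=int)
    return np.array([rng.integers(lo, hi)
                     for lo, hi in zip(bounds[:-1], bounds[1:])])

def temporal_difference(frames, idx, n=2):
    """For the sampled frame idx, stack its differences to the n
    neighbouring frames on each side (clipped at the sequence ends)
    as local motion information."""
    lo, hi = max(0, idx - n), min(len(frames) - 1, idx + n)
    neighbours = [frames[j] for j in range(lo, hi + 1) if j != idx]
    return np.stack([f - frames[idx] for f in neighbours])

frames = [np.full((8, 8), float(t)) for t in range(96)]  # toy 96-frame clip
idx = sample_segments(len(frames), T=8)
print(len(idx))                                          # 8 sampled frames
print(temporal_difference(frames, int(idx[3])).shape)    # (4, 8, 8) away from ends
```

Sampling one frame per segment keeps the cost of the recognition stage independent of the clip length, while the stacked differences let a single 2D frame carry short-term displacement information.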
Obviously, those skilled in the art should understand that the steps of the above defect recognition method for dam defect time-series images in the embodiments of the invention can be implemented with a general-purpose computing device; they can be centralized on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented with program code executable by the computing devices, so that they can be stored in a storage device and executed by the computing devices; in some cases, the steps shown or described may be performed in a different order than here, or they may be implemented as separate integrated-circuit modules, or multiple modules or steps among them may be implemented as a single integrated-circuit module. Thus, the embodiments of the invention are not limited to any specific combination of hardware and software.

Claims (8)

  1. An intelligent recognition method for time-series images of concrete dam defects, characterized by comprising the following steps:
    (1) designing a defect localization model for the characteristics of time-series images containing dam defects, the defect localization model using a two-stream network and a Transformer network for temporal feature extraction, the two-stream network extracting image features, and the Transformer network adding a self-attention mechanism in the time dimension to the image frames to obtain global feature relationships for defect localization;
    (2) during the training of the defect localization model, using an objective function based on the distance intersection-over-union to match located defects with real defects, accelerating model convergence by computing the defect position relationship, and adding a loss term based on the tightness-aware intersection-over-union to the loss function to improve defect localization accuracy by attending to the completeness of the defect sequence;
    (3) after locating the defect sequence, performing frame sampling with a convolutional network based on 2D temporal difference, and extracting the visual and displacement information of the defect image frames to identify the defect type.
  2. The intelligent recognition method for time-series images of concrete dam defects according to claim 1, characterized in that
    the specific steps of temporal feature extraction using the two-stream network and the Transformer network are as follows:
    (1.1) inputting the original time-series images, recorded as X, the sequence containing l image frames, where x_n denotes the n-th frame of the sequence X;
    (1.2) converting the original time-series images into the input of the two-stream network, wherein the t_n-th RGB frame of the original sequence X is processed by the spatial-stream convolutional network, the optical flow formed by stacking the RGB images of frames t_n and t_n+1 is processed by the temporal-stream convolutional network, and the horizontal and vertical displacement vectors of frame t_n+1 at point (u, v) are regarded as two input channels of the convolutional neural network;
    (1.3) recording the time-series image feature sequence extracted by the two-stream network, using three convolutional layers to form a boundary evaluation network that computes the probability that each frame is the start or end frame of the defect sequence, and multiplying and combining the input features of the time-series images with the predicted start and end probabilities at each temporal position;
    (1.4) adding a positional encoding to each frame to mark its temporal position, and using the Transformer network to compute the global self-attention weight of each frame;
    (1.5) using a multilayer perceptron to make predictions over the image sequence containing defects and output the positions of the start and end frames.
  3. The intelligent recognition method for time-series images of concrete dam defects according to claim 1, characterized in that
    during the training of the defect localization model, the objective function based on the distance intersection-over-union is used to match located defects with real defects, model convergence is accelerated by computing the defect position relationship, and the loss term based on the tightness-aware intersection-over-union is added to the loss function to improve defect localization accuracy by attending to the completeness of the defect sequence, with the following specific steps:
    (2.1) during model training, first matching the located defects pairwise with the real defects, and calculating the interval error as the loss value to optimize the model; during matching, computing the optimal matching by maximizing the objective function, which is as follows:
    where l_1 is the L1 objective function for strictly matching boundaries, and DIoU is the distance intersection-over-union;
    (2.2) defining the objective function based on the distance intersection-over-union as:
    where IoU is the intersection-over-union of the two defect intervals, b and b_t respectively represent the center-point coordinates of the located defect interval and the real defect interval, ρ denotes the distance between the two points, and c is the length of the smallest time interval that can simultaneously cover the located defect interval and the real defect interval;
    (2.3) finally defining the loss function of the defect localization task as:
    where L_bou is the boundary loss, used to measure the deviation between the start and end frames of the located defect interval and those of the real defect interval, and L_pre is the interval loss, used to measure the accuracy and completeness of the defect interval predicted by the model.
  4. The intelligent recognition method for time-series images of concrete dam defects according to claim 1, characterized in that
    the specific steps of performing frame sampling with the convolutional network based on 2D temporal difference and extracting the visual and displacement information of the defect image frames to identify the defect type are as follows:
    (3.1) splitting the extracted defect sequence into several non-overlapping segments of equal length, and randomly sampling one frame from each segment to form a set of sampled frames;
    (3.2) taking each sampled frame as the center, extracting and stacking several frames before and after it, fusing them with the current frame through a residual connection, and capturing short-term displacement features so that a single sampled frame can perceive local changes;
    (3.3) using a multilayer perceptron and a softmax function to decode the sampled feature image sequence and obtain the defect category.
  5. The intelligent recognition method for time-series images of concrete dam defects according to claim 4, characterized in that
    features are extracted from all sampled frames by a 2D convolutional neural network, giving the feature set F = [F_1, F_2, ..., F_T]; in the motion information represented by a sampled frame, the sampled frame F_t contributes visual image information and the feature stack H(x_t) contributes local motion information.
  6. The intelligent recognition method for time-series images of concrete dam defects according to claim 2, characterized in that
    to represent the motion of a series of time-series images, the optical flows of L consecutive frames are stacked together to form 2L input channels, and the input of any frame τ is composed according to the following formula:

    I_τ(u, v, 2k-1) = d^x_{τ+k-1}(u, v),  I_τ(u, v, 2k) = d^y_{τ+k-1}(u, v),  u ∈ [1, w], v ∈ [1, h], k ∈ [1, L]

    where w and h are the width and height of the input image.
  7. A computer device, characterized in that:
    the computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, it implements the intelligent recognition method for time-series images of concrete dam defects according to any one of claims 1-6.
  8. A computer-readable storage medium, characterized in that:
    the computer-readable storage medium stores a computer program for executing the intelligent recognition method for time-series images of concrete dam defects according to any one of claims 1-6.
PCT/CN2023/082484 2022-05-11 2023-03-20 Intelligent recognition method for time sequence images of concrete dam defects WO2023216721A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/322,605 US20230368371A1 (en) 2022-05-11 2023-05-24 Intelligent recognition method for time sequence image of concrete dam defect

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210515193.9A CN114913150B (zh) 2022-05-11 2022-05-11 Intelligent recognition method for time sequence images of concrete dam defects
CN202210515193.9 2022-05-11

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/322,605 Continuation US20230368371A1 (en) 2022-05-11 2023-05-24 Intelligent recognition method for time sequence image of concrete dam defect

Publications (1)

Publication Number Publication Date
WO2023216721A1 true WO2023216721A1 (zh) 2023-11-16

Family

ID=82766049

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/082484 WO2023216721A1 (zh) 2022-05-11 2023-03-20 Intelligent recognition method for time sequence images of concrete dam defects

Country Status (2)

Country Link
CN (1) CN114913150B (zh)
WO (1) WO2023216721A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117250208A * 2023-11-20 2023-12-19 青岛天仁微纳科技有限责任公司 Machine-vision-based precise detection system and method for nanoimprint wafer defects
CN117544544A * 2023-12-13 2024-02-09 广州思林杰科技股份有限公司 Multi-port PoE test equipment
CN117910517A * 2024-01-25 2024-04-19 河海大学 Method and system for identifying hidden hollowing defects in dams based on physics-informed neural networks

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913150B (zh) 2022-05-11 2023-08-22 河海大学 Intelligent recognition method for time sequence images of concrete dam defects
CN115457006B (zh) 2022-09-23 2023-08-22 华能澜沧江水电股份有限公司 UAV inspection defect classification method and device based on similarity-consistency self-distillation
CN115994891B (zh) 2022-11-22 2023-06-30 河海大学 Dynamic detection method for surface defects of concrete dams by unmanned vehicles based on the wolf pack algorithm
CN116385794B (zh) 2023-04-11 2024-04-05 河海大学 Robot inspection defect classification method and device based on attention-flow-transfer mutual distillation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921201A * 2018-06-12 2018-11-30 河海大学 Dam defect recognition and classification method based on feature combination and CNN
US20200272823A1 * 2017-11-14 2020-08-27 Google Llc Weakly-Supervised Action Localization by Sparse Temporal Pooling Network
CN112926396A * 2021-01-28 2021-06-08 杭州电子科技大学 Action recognition method based on two-stream convolutional attention
CN113239822A * 2020-12-28 2021-08-10 武汉纺织大学 Dangerous behavior detection method and system based on a spatiotemporal two-stream convolutional neural network
CN113283298A * 2021-04-26 2021-08-20 西安交通大学 Real-time behavior recognition method based on a temporal attention mechanism and a two-stream network
CN114913150A * 2022-05-11 2022-08-16 河海大学 Intelligent recognition method for time sequence images of concrete dam defects

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989933B (zh) * 2021-10-29 2024-04-16 国网江苏省电力有限公司苏州供电分公司 Online behavior recognition model training and detection method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200272823A1 * 2017-11-14 2020-08-27 Google Llc Weakly-Supervised Action Localization by Sparse Temporal Pooling Network
CN108921201A * 2018-06-12 2018-11-30 河海大学 Dam defect recognition and classification method based on feature combination and CNN
CN113239822A * 2020-12-28 2021-08-10 武汉纺织大学 Dangerous behavior detection method and system based on a spatiotemporal two-stream convolutional neural network
CN112926396A * 2021-01-28 2021-06-08 杭州电子科技大学 Action recognition method based on two-stream convolutional attention
CN113283298A * 2021-04-26 2021-08-20 西安交通大学 Real-time behavior recognition method based on a temporal attention mechanism and a two-stream network
CN114913150A * 2022-05-11 2022-08-16 河海大学 Intelligent recognition method for time sequence images of concrete dam defects

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117250208A * 2023-11-20 2023-12-19 青岛天仁微纳科技有限责任公司 Machine-vision-based precise detection system and method for nanoimprint wafer defects
CN117250208B * 2023-11-20 2024-02-06 青岛天仁微纳科技有限责任公司 Machine-vision-based precise detection system and method for nanoimprint wafer defects
CN117544544A * 2023-12-13 2024-02-09 广州思林杰科技股份有限公司 Multi-port PoE test equipment
CN117910517A * 2024-01-25 2024-04-19 河海大学 Method and system for identifying hidden hollowing defects in dams based on physics-informed neural networks

Also Published As

Publication number Publication date
CN114913150A (zh) 2022-08-16
CN114913150B (zh) 2023-08-22

Similar Documents

Publication Publication Date Title
WO2023216721A1 (zh) Intelligent recognition method for time sequence images of concrete dam defects
CN114998673B (zh) Dam defect time-series image description method based on a local self-attention mechanism
US20230368371A1 (en) Intelligent recognition method for time sequence image of concrete dam defect
CN111431986A (zh) Industrial intelligent quality inspection system based on 5G and AI cloud-edge collaboration
CN108830185B (zh) Behavior recognition and localization method based on multi-task joint learning
CN110633738B (zh) Fast classification method for industrial part images
CN115223009A (zh) Small-object detection method and device based on improved YOLOv5
CN112614130A (zh) UAV transmission-line insulator fault detection method based on 5G transmission and YOLOv3
CN113902792A (zh) Building height detection method and system based on an improved RetinaNet network, and electronic device
CN115830407A (zh) Cable line fault discrimination algorithm based on a YOLOv4 object detection model
CN117788402A (zh) Industrial product defect detection method based on the highly real-time lightweight LIDD-Net network
CN116402769A (zh) High-precision intelligent textile defect detection method covering both large and small targets
CN109657682B (zh) Electric energy meter reading recognition method based on a deep neural network and multi-threshold soft segmentation
Chourasia et al. Safety helmet detection: a comparative analysis using YOLOv4, YOLOv5, and YOLOv7
CN117237611A (zh) Steel surface defect detection method based on improved YOLOv7
CN114937153B (zh) Neural-network-based visual feature processing system and method for weak-texture environments
Wen et al. Underwater target detection based on modified YOLOv5
Wen et al. Detecting the surface defects of the magnetic-tile based on improved YOLACT++
CN110751174A (zh) Dial detection method and system based on a multi-task cascaded convolutional network
Pang et al. An Efficient Network for Obstacle Detection in Rail Transit Based on Multi-Task Learning
Li et al. An Improved YOLO-v4 Algorithm for Recognition and Detection of Underwater Small Targets
CN116229381B (zh) Ship-face recognition method for sand-mining vessels on rivers and lakes
Shanbin et al. Electrical cabinet wiring detection method based on improved yolov5 and pp-ocrv3
Li et al. Building Recognition of Aerial Images Based on Improved Unet Network
Wang et al. Pointer-Type Meter Recognition Algorithm in Complex Substation Scenarios

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23802497

Country of ref document: EP

Kind code of ref document: A1