CN113239886A - Method and device for describing underground pipeline leakage based on cross-language image change description

Method and device for describing underground pipeline leakage based on cross-language image change description

Info

Publication number
CN113239886A
Authority
CN
China
Prior art keywords
image
module
cross
attention
convolutional
Prior art date
Legal status
Granted
Application number
CN202110626949.2A
Other languages
Chinese (zh)
Other versions
CN113239886B (en)
Inventor
胡迪
刘玉洁
罗辉
段章领
卫星
赵冲
赵明
陆阳
李航
帅竞贤
Current Assignee
Hefei University of Technology
Intelligent Manufacturing Institute of Hefei University Technology
Original Assignee
Hefei University of Technology
Intelligent Manufacturing Institute of Hefei University Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology, Intelligent Manufacturing Institute of Hefei University Technology
Priority to CN202110626949.2A
Publication of CN113239886A
Application granted
Publication of CN113239886B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, by performing operations on regions, e.g. growing, shrinking or watersheds


Abstract

The invention discloses a method and device for describing underground pipeline leakage based on cross-language image change description. The method comprises the following steps: acquiring underground pipeline scene images and preprocessing the images to obtain a training set and a test set; constructing a cross-language image change description model based on a dual dynamic attention mechanism; training the model on the training set; and testing on the test set with the trained model to obtain the image description results. The invention has the advantage that the downhole pipeline leakage description is accurate.

Description

Method and device for describing underground pipeline leakage based on cross-language image change description
Technical Field
The invention relates to the field of underground pipeline change description, in particular to an underground pipeline leakage description method and device based on cross-language image change description.
Background
A mine is the generic term for the roadways, chambers, equipment, and surface buildings and structures that form the production system of an underground coal mine. Most of China's key coal mines are large and medium-sized mines, while most locally state-run coal mines are medium and small mines. With the rapid development of the national economy, China's demand for energy keeps increasing. Coal will serve as China's pillar energy source for a long time to come; the quantity demanded grows year by year, and the coal industry is trending toward large-scale operation. As a basic energy industry, coal has seen its investment scale grow with the current trend toward larger coal mine groups after decades of development, the average investment for a 300-kiloton/a mine being about 6-7 million yuan. Extensive, profit-driven extraction has gradually been replaced by large-scale mechanized production, which has given rise to various underground safety protection systems; the detection of underground pipeline leakage is one of these preventive measures.
Pipeline transportation is the fifth major mode of transportation after railway, highway, aviation and water transport, and it has unique advantages in transporting fluids such as oil and natural gas. However, as pipelines age, leaks occur frequently because of construction defects, corrosion and human damage, posing a great threat to people's lives, property and living environment. Pipeline leaks in underground scenes are concealed and hard to discover and handle in time; they consume a great deal of the maintenance and inspection personnel's time and energy, with very little effect.
There are many methods for detecting leakage in fluid transport pipelines, and many ways of classifying them. According to domestic and foreign literature from the last decade, the generally accepted classification methods mainly include: hardware-based and software-based methods, classification by measurement medium, classification by the location of the detection device, classification by detection object, classification based on signal processing, and so on.
The introduction of deep learning network models has pushed the field of computer vision further forward. A deep learning model learns adaptively from images and is an end-to-end detection method. With the advent of the big data era, the data sets used to train deep learning network models have been continuously enriched and improved, which has also promoted the development of deep-learning-based computer vision. Among these advances, change captioning, a cross-domain task between computer vision and natural language processing, has developed considerably. The main task in this field is to annotate images: the annotated images are processed as groups of two, the two images are compared in temporal order, and descriptive text conforming to the image content is generated; the main objects in the images are identified, and the change relations between the objects are also considered. Describing the underground mine pipeline scene through a change captioning model can assist patrol personnel in monitoring the downhole pipeline state in real time and provide timely early warning.
Chinese patent No. CN107013812B discloses a three-field-coupling pipeline leakage monitoring method comprising the following steps: constructing a pipeline three-field coupling sensing system, monitoring a no-load state simulation of the detected pipeline with the system, monitoring a normal working condition simulation of the detected pipeline with the system, simulating a leakage event with the system, modeling and learning a pipeline monitoring neural network, and monitoring pipeline leakage. That invention aims to provide a method for alarming, locating and judging the size of a pipeline leak which, by acquiring three parameters around the pipeline and establishing the interrelation between the detection parameters, can effectively reduce false alarms, avoid missed alarms, accurately locate the leakage point, provide the leakage size through a neural network algorithm, and provide a reliable basis for formulating a maintenance scheme. However, it describes the pipeline with data acquired by sensors; sensor data in downhole pipelines are unstable, and if a sensor fails or is damaged, the pipeline leakage description becomes inaccurate.
Disclosure of Invention
The invention aims to solve the technical problem that the underground pipeline leakage description method in the prior art is not accurate enough.
The invention solves the technical problems through the following technical means: a downhole pipeline leakage description method based on cross-language image change description, the method comprising:
step a: acquiring an underground pipeline scene image, and preprocessing the image to obtain a training set and a test set;
step b: constructing a cross-language image change description model based on a dual dynamic attention mechanism;
step c: training a cross-language image change description model based on a dual dynamic attention mechanism on a training set;
step d: and testing the test set by using a trained cross-language image change description model based on a dual dynamic attention mechanism to obtain an image description result.
The method collects downhole pipeline scene images instead of relying on sensor detection, which keeps the collected data reliable. It then constructs a cross-language image change description model based on a dual dynamic attention mechanism, trains the model, and finally uses the trained model to describe the pipeline leakage state, thereby ensuring an accurate description of downhole pipeline leakage.
Further, the step a comprises:
step a 1: installing a camera at the front end of the underground pipeline to acquire daily state video stream data of the underground pipeline;
step a 2: extracting key frames in video stream data according to a preset time interval and storing the key frames as an underground pipeline scene image;
step a 3: cutting all the underground pipeline scene images to 512 x 512 to obtain an image data set; dividing the image data set into a plurality of groups of two images each, where one image in each group is the pipeline non-leakage state image of the previous frame, and the other is the subsequent frame, either a changed image with leakage or an image without leakage change but with changes in other factors; annotating the images with the official COCO pycocotools package to obtain an annotated data set; and splitting the annotated data set into a training set and a test set at a ratio of 3:1.
Further, the step b comprises:
the cross-language image change description model based on the dual dynamic attention mechanism comprises an encoder, an RNN (recurrent neural network) embedded with a spatial attention mechanism, and a dynamic attention module and a labeling module based on a dynamic speaking mechanism, wherein both the dynamic attention module and the labeling module are recurrent models based on LSTM (long short-term memory). The training set or test set is input to the encoder, and the encoder is connected to the RNN network embedded with the spatial attention mechanism, which outputs the spatial attention result, namely the image positions that need attention. The RNN network embedded with the spatial attention mechanism is connected to the dynamic attention module, the dynamic attention module is connected to the labeling module, and the labeling module outputs and distributes the current word; the current word carries the time of attending to the images, namely when each image begins to be attended to.
Still further, the step b further comprises:

extracting the input image group features $(X_{bef}, X_{aft})$ using one ResNet-101 network as the encoder;

inputting the input image group features $(X_{bef}, X_{aft})$ into the RNN network embedded with the dual attention mechanism, differencing the encoded input image group features $(X_{bef}, X_{aft})$ by the formula $X_{aft} - X_{bef}$ to obtain the difference feature $X_{diff}$, and concatenating the obtained difference feature $X_{diff}$ with each of the input image group features $(X_{bef}, X_{aft})$ to obtain two different spatial attention image groups $A_{bef}$ and $A_{aft}$;

the LSTM decoder in the dynamic attention module takes the previous hidden state $h^{(c)}_{t-1}$ of the labeling module and $l_{bef}$, $l_{diff}$, $l_{aft}$ as input and predicts the attention weights $\alpha_t$; the attention weights $\alpha_t$ cumulatively sum the visual features to obtain the dynamically attended feature $l^{(t)}_{dyn}$; the dynamically attended feature $l^{(t)}_{dyn}$ and the previous word $x_{t-1}$ are input to the LSTM decoder of the labeling module to generate the current word distribution, which distributes the current word.
Further, the ResNet-101 network includes, connected in sequence, 1 conv1 convolutional layer, 3 conv2_x convolutional blocks, 4 conv3_x convolutional blocks, 23 conv4_x convolutional blocks, 3 conv5_x convolutional blocks and 1 fully connected layer. The conv1 layer is a 7 × 7 convolutional layer with a stride of 2; each conv2_x block consists of a 1 × 1 convolution with 64 filters, a 3 × 3 convolution with 64 filters and a 1 × 1 convolution with 256 filters; each conv3_x block consists of a 1 × 1 convolution with 128 filters, a 3 × 3 convolution with 128 filters and a 1 × 1 convolution with 512 filters; each conv4_x block consists of a 1 × 1 convolution with 256 filters, a 3 × 3 convolution with 256 filters and a 1 × 1 convolution with 1024 filters; and each conv5_x block consists of a 1 × 1 convolution with 512 filters, a 3 × 3 convolution with 512 filters and a 1 × 1 convolution with 2048 filters.
Still further, the step c includes:
initializing the training parameters;

inputting the input image group features $(X_{bef}, X_{aft})$ into the ResNet-101 network of the cross-language image change description model based on the dual dynamic attention mechanism, continuously updating the learning rate of the ResNet-101 network, the weight coefficients of the dynamic attention module and the weight coefficients of the labeling module, and stopping training when the loss function value is minimal, to obtain the trained cross-language image change description model based on the dual dynamic attention mechanism.
Further, the initialized training parameters include the initial learning rate, the maximum iteration number, the update gradient, the weight coefficients of the dynamic attention module and the weight coefficients of the labeling module, where the learning rate is updated by the formula

$$learningrate = lr_0 \times \left(1 - \frac{iter}{max\_iter}\right)^{power}$$

where $iter$ is the current iteration number, $max\_iter$ is the maximum iteration number, $power$ is the update gradient, $lr_0$ is the initial learning rate, and $learningrate$ is the current learning rate.
Further, the loss function is formulated as

$$L(\theta) = L_{XE} + \lambda_1 L_1 + \lambda_{ent} L_{ent}$$

$$L_{XE} = -\sum_t \log p_\theta(\omega_t \mid \omega_1, \ldots, \omega_{t-1})$$

$$L_{ent} = -\sum_t \sum_i \alpha_{t,i} \log \alpha_{t,i}$$

$$L_1 = \lVert W_c \rVert + \lVert W_{d2} \rVert$$

where $L_{XE}$ denotes the value obtained by minimizing the cross-entropy loss as the training target, $L_1$ denotes the regularization value, $L_{ent}$ denotes the entropy loss over the attention weights, $\lambda_1$ denotes a preset first hyperparameter, $\lambda_{ent}$ denotes a preset second hyperparameter, $p_\theta$ denotes the probability value, $W_c$ denotes the weight coefficients of the labeling module, $W_{d2}$ denotes the weight coefficients of the dynamic attention module, $\omega_t$ denotes the word output by the labeling module at time step $t$, and $\alpha_t$ denotes the attention weights of the dynamic attention module.
The invention also provides a device for describing the leakage of the underground pipeline based on cross-language image change description, which comprises:
the image preprocessing module is used for acquiring an image of a scene of the underground pipeline and preprocessing the image to obtain a training set and a test set;
the model construction module is used for constructing a cross-language image change description model based on a dual dynamic attention mechanism;
the model training module is used for training the cross-language image change description model based on the dual dynamic attention mechanism on a training set;
and the test module is used for testing the test set by utilizing the trained cross-language image change description model based on the dual dynamic attention mechanism to obtain an image description result.
Further, the image preprocessing module is further configured to:
step a 1: installing a camera at the front end of the underground pipeline to acquire daily state video stream data of the underground pipeline;
step a 2: extracting key frames in video stream data according to a preset time interval and storing the key frames as an underground pipeline scene image;
step a 3: cutting all the underground pipeline scene images to 512 x 512 to obtain an image data set; dividing the image data set into a plurality of groups of two images each, where one image in each group is the pipeline non-leakage state image of the previous frame, and the other is the subsequent frame, either a changed image with leakage or an image without leakage change but with changes in other factors; annotating the images with the official COCO pycocotools package to obtain an annotated data set; and splitting the annotated data set into a training set and a test set at a ratio of 3:1.
Further, the model building module is further configured to:
the cross-language image change description model based on the dual dynamic attention mechanism comprises an encoder, an RNN (recurrent neural network) embedded with a spatial attention mechanism, and a dynamic attention module and a labeling module based on a dynamic speaking mechanism, wherein both the dynamic attention module and the labeling module are recurrent models based on LSTM (long short-term memory). The training set or test set is input to the encoder, and the encoder is connected to the RNN network embedded with the spatial attention mechanism, which outputs the spatial attention result, namely the image positions that need attention. The RNN network embedded with the spatial attention mechanism is connected to the dynamic attention module, the dynamic attention module is connected to the labeling module, and the labeling module outputs and distributes the current word; the current word carries the time of attending to the images, namely when each image begins to be attended to.
Still further, the model building module is further configured to:

extract the input image group features $(X_{bef}, X_{aft})$ using one ResNet-101 network as the encoder;

input the input image group features $(X_{bef}, X_{aft})$ into the RNN network embedded with the dual attention mechanism, difference the encoded input image group features $(X_{bef}, X_{aft})$ by the formula $X_{aft} - X_{bef}$ to obtain the difference feature $X_{diff}$, and concatenate the obtained difference feature $X_{diff}$ with each of the input image group features $(X_{bef}, X_{aft})$ to obtain two different spatial attention image groups $A_{bef}$ and $A_{aft}$;

wherein the LSTM decoder in the dynamic attention module takes the previous hidden state $h^{(c)}_{t-1}$ of the labeling module and $l_{bef}$, $l_{diff}$, $l_{aft}$ as input and predicts the attention weights $\alpha_t$; the attention weights $\alpha_t$ cumulatively sum the visual features to obtain the dynamically attended feature $l^{(t)}_{dyn}$; and the dynamically attended feature $l^{(t)}_{dyn}$ and the previous word $x_{t-1}$ are input to the LSTM decoder of the labeling module to generate the current word distribution, which distributes the current word.
Further, the ResNet-101 network includes, connected in sequence, 1 conv1 convolutional layer, 3 conv2_x convolutional blocks, 4 conv3_x convolutional blocks, 23 conv4_x convolutional blocks, 3 conv5_x convolutional blocks and 1 fully connected layer. The conv1 layer is a 7 × 7 convolutional layer with a stride of 2; each conv2_x block consists of a 1 × 1 convolution with 64 filters, a 3 × 3 convolution with 64 filters and a 1 × 1 convolution with 256 filters; each conv3_x block consists of a 1 × 1 convolution with 128 filters, a 3 × 3 convolution with 128 filters and a 1 × 1 convolution with 512 filters; each conv4_x block consists of a 1 × 1 convolution with 256 filters, a 3 × 3 convolution with 256 filters and a 1 × 1 convolution with 1024 filters; and each conv5_x block consists of a 1 × 1 convolution with 512 filters, a 3 × 3 convolution with 512 filters and a 1 × 1 convolution with 2048 filters.
Still further, the model training module is further configured to:
initializing the training parameters;

inputting the input image group features $(X_{bef}, X_{aft})$ into the ResNet-101 network of the cross-language image change description model based on the dual dynamic attention mechanism, continuously updating the learning rate of the ResNet-101 network, the weight coefficients of the dynamic attention module and the weight coefficients of the labeling module, and stopping training when the loss function value is minimal, to obtain the trained cross-language image change description model based on the dual dynamic attention mechanism.
Further, the initialized training parameters include the initial learning rate, the maximum iteration number, the update gradient, the weight coefficients of the dynamic attention module and the weight coefficients of the labeling module, where the learning rate is updated by the formula

$$learningrate = lr_0 \times \left(1 - \frac{iter}{max\_iter}\right)^{power}$$

where $iter$ is the current iteration number, $max\_iter$ is the maximum iteration number, $power$ is the update gradient, $lr_0$ is the initial learning rate, and $learningrate$ is the current learning rate.
Further, the loss function is formulated as

$$L(\theta) = L_{XE} + \lambda_1 L_1 + \lambda_{ent} L_{ent}$$

$$L_{XE} = -\sum_t \log p_\theta(\omega_t \mid \omega_1, \ldots, \omega_{t-1})$$

$$L_{ent} = -\sum_t \sum_i \alpha_{t,i} \log \alpha_{t,i}$$

$$L_1 = \lVert W_c \rVert + \lVert W_{d2} \rVert$$

where $L_{XE}$ denotes the value obtained by minimizing the cross-entropy loss as the training target, $L_1$ denotes the regularization value, $L_{ent}$ denotes the entropy loss over the attention weights, $\lambda_1$ denotes a preset first hyperparameter, $\lambda_{ent}$ denotes a preset second hyperparameter, $p_\theta$ denotes the probability value, $W_c$ denotes the weight coefficients of the labeling module, $W_{d2}$ denotes the weight coefficients of the dynamic attention module, $\omega_t$ denotes the word output by the labeling module at time step $t$, and $\alpha_t$ denotes the attention weights of the dynamic attention module.
The invention has the advantages that:
(1) The method collects downhole pipeline scene images instead of relying on sensor detection, which keeps the collected data reliable. It then constructs a cross-language image change description model based on a dual dynamic attention mechanism, trains the model, and finally uses the trained model to describe the pipeline leakage state, thereby ensuring an accurate description of downhole pipeline leakage.
(2) The cross-language image change description model based on the dual dynamic attention mechanism is trained on a training set of annotated downhole pipeline state images. During training, the RNN network embedded with the spatial attention mechanism yields the spatial attention result, namely the image positions that need attention; the dynamic attention module and the labeling module output and distribute the current word, which carries the time of attending to the images, namely when each image begins to be attended to. The whole model finally generates a Chinese description of the target scene, the downhole pipeline state no longer needs to be checked by manual observation, and the description effect is good.
(3) The invention overcomes the problems of traditional downhole pipeline leakage state detection, such as the large amount of manual inspection required, the misjudgment of visual observation caused by the complex environment, and the inability of traditional monitoring equipment (e.g. sensor detection) to provide effective state information; it improves the accuracy of downhole pipeline leakage state detection and is better suited to complex industrial scenes.
Drawings
FIG. 1 is a flow chart of a method for describing a downhole tubular leak based on cross-language image change description, according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of ResNet-101 architecture in a cross-language image change description-based downhole tubing leak description method disclosed in an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a downhole pipeline state image acquisition process in the downhole pipeline leakage description method based on cross-language image change description disclosed in the embodiment of the present invention;
FIG. 4 is a flow chart illustrating the preprocessing of the downhole tubing state image in the cross-language image change description-based downhole tubing leak description method disclosed in the embodiments of the present invention;
FIG. 5 is a flowchart illustrating a processing of a preprocessed data set in a cross-language image change description-based downhole tubular leak description method according to an embodiment of the present invention;
FIG. 6 is a flowchart of model training in a cross-language image change description-based downhole tubing leak description method disclosed in an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a cross-language image change description model based on a dual dynamic attention mechanism in the cross-language image change description-based downhole tubular leakage description method disclosed in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1 and 2, a method for describing a downhole pipeline leakage based on cross-language image change description, the method comprising:
step a: acquiring an underground pipeline scene image, and preprocessing the image to obtain a training set and a test set; as shown in fig. 3, the specific process is as follows:
s11, mounting cameras at positions with the vertical distance h from the side face of the underground pipeline, wherein the focal length of the cameras is f, and the cameras can be mounted at multiple angles to achieve multi-directional observation of the underground pipeline;
s12, setting camera parameters, wherein the camera is set to adopt higher resolution to capture more characteristics of images because the industrial field environment is more complex and has great interference on the images acquired by the camera; setting a camera frame rate, and adopting a higher camera frame rate when the underground pipeline leaks to enable the acquired image to be clearer; and adjusting parameters such as saturation and contrast of the camera according to the underground light characteristics so as to achieve optimal shooting of underground pipeline state acquisition.
And S13, acquiring the underground pipeline state image from the video frame, setting a fixed time interval, extracting the key frame according to the specified time interval and converting the key frame into the image. The downhole tubular state image is a data source for a training set and a testing set.
As shown in fig. 4, the process of preprocessing the downhole pipe state image is as follows:
and S21, primarily screening the images, removing unqualified images such as excessive blur, excessive occlusion, excessive exposure, insufficient exposure and the like, and processing the images with the size resolution of 512 multiplied by 512.
S22, labeling the qualified images: the image data are annotated with the official COCO pycocotools package. The labeling rules follow the Amazon Mechanical Turk standard. The annotation data are stored in json format, and each image comprises the following label fields:

(1) info: includes the creation time of the data set, the download address, the version number, and so on;

(2) license: the data set usage terms;

(3) images: includes the filename, height and width of the picture, and the id of the caption corresponding to the picture;

(4) annotation: includes the id of the image, the id of the corresponding caption, and the 3 sentences corresponding to each picture.
And S23, splitting the labeled data set into a training set and a test set according to a certain proportion.
As shown in FIG. 5, S31, according to the Amazon Mechanical Turk standard, each image annotation description is checked manually, and descriptions that do not meet the standard are eliminated.

S32, according to the Amazon Mechanical Turk standard, the eliminated descriptions are supplemented; a sketch of this data-preparation pipeline is given below.
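As a concrete illustration of S13 and S21-S23, the following Python sketch extracts key frames from a camera stream, rescales them to 512 x 512, pairs consecutive frames, and performs the 3:1 split. The video path, frame interval, file names and pairing of consecutive frames are assumptions made for illustration, not specifics fixed by the patent.

```python
import json
import random
from pathlib import Path

import cv2


def extract_key_frames(video_path, out_dir, interval_s=60):
    """Save one 512x512 key frame every `interval_s` seconds (cf. S13/S21)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(fps * interval_s))
    idx, saved = 0, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frame = cv2.resize(frame, (512, 512))
            path = out_dir / f"frame_{idx:08d}.png"
            cv2.imwrite(str(path), frame)
            saved.append(str(path))
        idx += 1
    cap.release()
    return saved


def build_pairs_and_split(frames, ratio=3):
    """Pair consecutive key frames as (before, after) and split 3:1 (cf. S23)."""
    pairs = [{"before": a, "after": b} for a, b in zip(frames, frames[1:])]
    random.shuffle(pairs)
    cut = len(pairs) * ratio // (ratio + 1)
    return pairs[:cut], pairs[cut:]


if __name__ == "__main__":
    frames = extract_key_frames("pipeline_cam.mp4", "frames", interval_s=60)
    train, test = build_pairs_and_split(frames)
    with open("split.json", "w") as f:
        json.dump({"train": train, "test": test}, f, indent=2)
```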
Step b: constructing a cross-language image change description model based on a dual dynamic attention mechanism. Specifically: an Encoder network and a Decoder network are selected first, and the hyperparameters for training the network are set. Alternative Encoder network types include LeNet, AlexNet, VGGNet-16, VGGNet-19, ResNet-50, ResNet-101, ResNet-152, GoogleNet, and so on. Starting from the VGG networks, the number of layers in neural networks has grown deeper and deeper; deeper networks can extract more features, but the vanishing gradient problem harms training. ResNet introduces a residual network structure, through which the vanishing gradient problem can be effectively alleviated. Alternative Decoder networks include RNN, LSTM, GRU, etc. Longer input sequences generally require a deeper network to capture long-term dependencies, but like general deep networks, RNNs are difficult to optimize, suffering from vanishing and exploding gradients. Under vanishing gradients, the interaction gradient decreases exponentially, so long-term dependency signals become very weak and are easily swamped by short-term fluctuations. The LSTM implements information retention and selection (forget gate and input gate) through its gate structure, allowing input information to propagate over long spans. The GRU simplifies the LSTM by merging the input gate and forget gate into an update gate (which decides what part of the hidden state to retain or discard); among the many LSTM variants, its performance and robustness are comparable to RNN and LSTM on many tasks. A single-layer LSTM is selected here, with hidden_size set to 512.
Setting the hyperparameters for training the neural network, including: the optimization method (SGD, AdaGrad, RMSProp, Adam), the initial learning rate, the weight decay rate, and so on.
In summary, the cross-language image change description model based on the dual dynamic attention mechanism constructed by the invention comprises an encoder, an RNN (recurrent neural network) embedded with a spatial attention mechanism, and a dynamic attention module and a labeling module based on a dynamic speaking mechanism; both the dynamic attention module and the labeling module are recurrent models based on LSTM. The training set or test set is input to the encoder, and the encoder is connected to the RNN network embedded with the spatial attention mechanism, which outputs the spatial attention result, namely the image positions that need attention. The RNN network embedded with the spatial attention mechanism is connected to the dynamic attention module, the dynamic attention module is connected to the labeling module, and the labeling module outputs and distributes the current word; the current word carries the time of attending to the images, namely when each image begins to be attended to.
The working process of the cross-language image change description model based on the dual dynamic attention mechanism is as follows. First, one ResNet-101 network is adopted as the encoder to extract the features of the input image group $(X_{bef}, X_{aft})$. The ResNet-101 network includes, connected in sequence, 1 conv1 convolutional layer, 3 conv2_x convolutional blocks, 4 conv3_x convolutional blocks, 23 conv4_x convolutional blocks, 3 conv5_x convolutional blocks and 1 fully connected layer. The conv1 layer is a 7 × 7 convolutional layer with a stride of 2; each conv2_x block consists of a 1 × 1 convolution with 64 filters, a 3 × 3 convolution with 64 filters and a 1 × 1 convolution with 256 filters; each conv3_x block consists of a 1 × 1 convolution with 128 filters, a 3 × 3 convolution with 128 filters and a 1 × 1 convolution with 512 filters; each conv4_x block consists of a 1 × 1 convolution with 256 filters, a 3 × 3 convolution with 256 filters and a 1 × 1 convolution with 1024 filters; and each conv5_x block consists of a 1 × 1 convolution with 512 filters, a 3 × 3 convolution with 512 filters and a 1 × 1 convolution with 2048 filters.
Then, the input image group features $(X_{bef}, X_{aft})$ are fed into the RNN network embedded with the dual attention mechanism. The encoded input image group features $(X_{bef}, X_{aft})$ are differenced by the formula $X_{aft} - X_{bef}$ to obtain the difference feature $X_{diff}$; the obtained difference feature $X_{diff}$ is concatenated with each of the input image group features $(X_{bef}, X_{aft})$ to obtain two different spatial attention image groups $A_{bef}$ and $A_{aft}$. The specific formulas are as follows:

$$X_{diff} = X_{aft} - X_{bef} \quad (1)$$

$$X'_{bef} = [X_{bef};\, X_{diff}]; \qquad X'_{aft} = [X_{aft};\, X_{diff}] \quad (2)$$

$$a_{bef} = \sigma(\mathrm{conv}_2(\mathrm{ReLU}(\mathrm{conv}_1(X'_{bef})))) \quad (3)$$

$$a_{aft} = \sigma(\mathrm{conv}_2(\mathrm{ReLU}(\mathrm{conv}_1(X'_{aft})))) \quad (4)$$

$$l_{bef} = \sum_{H,W} a_{bef} \odot X_{bef} \quad (5)$$

$$l_{aft} = \sum_{H,W} a_{aft} \odot X_{aft} \quad (6)$$
The above dual attention mechanism allows the system to process different images depending on the type of change and the amount of viewpoint movement, which is critical to detection. In order to describe a pipeline leak correctly, the model needs to locate and match the changing object in both images; attending to the pipeline state in only one image may lead to misjudging the leak and hurt the accuracy of the result. In a pipeline leak, the most obvious state change is a property change (e.g. color) that does not involve object displacement; a single attention may not suffice to correctly locate the changed object under viewpoint movement, while dual attention adapts well to this environment.

Finally, to successfully describe a change, the model should learn not only where to look in each image (spatial attention, predicted by the dual attention), but also when to look at each image (semantic attention). In effect, the model is expected to exhibit dynamic reasoning, through which it can learn when to focus on the "before" feature $l_{bef}$, the "after" feature $l_{aft}$, or the "difference" feature $l_{diff} = l_{aft} - l_{bef}$, and generate a word sequence accordingly, i.e. the final output Chinese description.
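Equations (1)-(6) translate almost directly into a small PyTorch module; a sketch follows. The channel widths and the single-channel attention maps are assumptions consistent with, but not fixed by, the formulas above.

```python
import torch
import torch.nn as nn


class DualAttention(nn.Module):
    """Spatial dual attention following Eqs. (1)-(6)."""

    def __init__(self, channels=2048, hidden=512):
        super().__init__()
        # conv1/conv2 of Eqs. (3)-(4); the input is [X; X_diff], hence 2*channels.
        self.conv1 = nn.Conv2d(2 * channels, hidden, kernel_size=1)
        self.conv2 = nn.Conv2d(hidden, 1, kernel_size=1)

    def _attend(self, x, x_diff):
        a = torch.sigmoid(self.conv2(torch.relu(self.conv1(torch.cat([x, x_diff], 1)))))
        return (a * x).sum(dim=(2, 3))  # Eqs. (5)-(6): sum over H and W

    def forward(self, x_bef, x_aft):
        x_diff = x_aft - x_bef              # Eq. (1)
        l_bef = self._attend(x_bef, x_diff)  # Eqs. (2), (3), (5)
        l_aft = self._attend(x_aft, x_diff)  # Eqs. (2), (4), (6)
        l_diff = l_aft - l_bef               # the "difference" feature
        return l_bef, l_diff, l_aft
```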
Therefore, a dynamic attention module and a labeling module based on a dynamic speaking mechanism are designed. The LSTM decoder in the dynamic attention module takes the previous hidden state $h^{(c)}_{t-1}$ of the labeling module and $l_{bef}$, $l_{diff}$, $l_{aft}$ as input and predicts the attention weights $\alpha_t$; the attention weights $\alpha_t$ cumulatively sum the visual features to obtain the dynamically attended feature $l^{(t)}_{dyn}$; the dynamically attended feature $l^{(t)}_{dyn}$ and the previous word $x_{t-1}$ are input to the LSTM decoder of the labeling module to generate the current word distribution, which distributes the current word. The specific formulas are as follows:

$$l^{(t)}_{dyn} = \sum_{i} \alpha_{t,i}\, l_i \quad (7)$$

$$u_t = [l_{bef};\, l_{diff};\, l_{aft};\, h^{(c)}_{t-1}] \quad (8)$$

$$v_t = \mathrm{ReLU}(W_{d1} u_t + b_{d1}) \quad (9)$$

$$h^{(d)}_t = \mathrm{LSTM}^{(d)}(v_t,\, h^{(d)}_{t-1}) \quad (10)$$

$$\alpha_t = \mathrm{softmax}(W_{d2} h^{(d)}_t + b_{d2}) \quad (11)$$

where $l_i$ ranges over the visual features $l_{bef}$, $l_{diff}$, $l_{aft}$ at time $t$, $h^{(d)}_t$ and $h^{(c)}_t$ are the LSTM outputs of the dynamic attention module and the labeling module at decoder time step $t$, respectively, and $W_{d1}$, $b_{d1}$, $W_{d2}$, $b_{d2}$ are learnable parameters. The dynamically attended feature $l^{(t)}_{dyn}$ is obtained from equation (7) using the attention weights predicted by equation (11).
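A sketch of the dynamic attention module of equations (7)-(11) follows. Since equations (8)-(10) are reconstructed from the surrounding definitions, the exact placement of the learnable parameters W_d1, b_d1 as an input projection is an assumption.

```python
import torch
import torch.nn as nn


class DynamicAttention(nn.Module):
    """Dynamic attention over l_bef, l_diff, l_aft, following Eqs. (7)-(11)."""

    def __init__(self, feat=2048, hidden=512):
        super().__init__()
        self.fc_in = nn.Linear(3 * feat + hidden, hidden)  # W_d1, b_d1 (assumed role)
        self.lstm = nn.LSTMCell(hidden, hidden)            # LSTM^(d)
        self.fc_out = nn.Linear(hidden, 3)                 # W_d2, b_d2

    def forward(self, l_bef, l_diff, l_aft, h_c_prev, state):
        u = torch.cat([l_bef, l_diff, l_aft, h_c_prev], dim=1)  # Eq. (8)
        h_d, c_d = self.lstm(torch.relu(self.fc_in(u)), state)  # Eqs. (9)-(10)
        alpha = torch.softmax(self.fc_out(h_d), dim=1)          # Eq. (11)
        l_all = torch.stack([l_bef, l_diff, l_aft], dim=1)      # (N, 3, feat)
        l_dyn = (alpha.unsqueeze(-1) * l_all).sum(dim=1)        # Eq. (7)
        return l_dyn, alpha, (h_d, c_d)
```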
Finally, $l^{(t)}_{dyn}$ and the previous word $x_{t-1}$ are input to the LSTM decoder of the labeling module, and the next word is distributed:

$$x_{t-1} = E\, \omega_{t-1} \quad (12)$$

$$c_t = [l^{(t)}_{dyn};\, x_{t-1}] \quad (13)$$

$$h^{(c)}_t = \mathrm{LSTM}^{(c)}(c_t,\, h^{(c)}_{t-1}) \quad (14)$$

$$\omega_t \sim \mathrm{softmax}(W_c h^{(c)}_t + b_c) \quad (15)$$

where $\omega_{t-1}$ is the one-hot encoding of the previous word and $E$ is the embedding layer; $x_{t-1}$ is the encoding value of the previous word at the embedding layer; and $c_t$ is the concatenation of $l^{(t)}_{dyn}$ and the encoded previous word $x_{t-1}$, which is then input to the LSTM decoder of the labeling module to begin generating the next word distribution. The two decoders predict each word in parallel and keep interacting with each other.
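The labeling-module step of equations (12)-(15) might be sketched as follows; the vocabulary size and embedding width are placeholders, not values given in the patent.

```python
import torch
import torch.nn as nn


class LabelingModule(nn.Module):
    """Caption ("labeling") LSTM decoder following Eqs. (12)-(15)."""

    def __init__(self, vocab_size=5000, embed=300, feat=2048, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)   # E of Eq. (12)
        self.lstm = nn.LSTMCell(embed + feat, hidden)  # LSTM^(c) of Eq. (14)
        self.fc_word = nn.Linear(hidden, vocab_size)   # W_c, b_c of Eq. (15)

    def step(self, word_prev, l_dyn, state):
        x_prev = self.embed(word_prev)           # Eq. (12): embed the previous word
        c_t = torch.cat([l_dyn, x_prev], dim=1)  # Eq. (13): concatenate
        h_c, c_c = self.lstm(c_t, state)         # Eq. (14)
        logits = self.fc_word(h_c)               # Eq. (15): word distribution
        return logits, (h_c, c_c)
```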
The inputs $h_t$ and $\hat{z}_t$ at each time step are computed following the baseline model. Using $T_{s,t}: \mathbb{R}^s \rightarrow \mathbb{R}^t$ to denote an affine transformation with learned parameters:

$$\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix} T_{D+m+n,\,4n} \begin{pmatrix} E y_{t-1} \\ h_{t-1} \\ \hat{z}_t \end{pmatrix}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$

$$h_t = o_t \odot \tanh(c_t)$$

where $i_t$, $f_t$, $c_t$, $o_t$, $h_t$ are the input, forget, memory, output and hidden states of the LSTM, respectively. The vector $\hat{z}_t \in \mathbb{R}^D$ is an image vector that captures the visual information associated with a particular input location, as described below. $E \in \mathbb{R}^{m \times K}$ is an embedding matrix. Let $m$ and $n$ denote the embedding dimension and the LSTM dimension, respectively, and let $\sigma$ and $\odot$ denote the logistic sigmoid activation and element-wise multiplication, respectively.
Step c: training a cross-language image change description model based on a dual dynamic attention mechanism on a training set; the specific process is as follows:
initializing the training parameters;

inputting the input image group features $(X_{bef}, X_{aft})$ into the ResNet-101 network of the cross-language image change description model based on the dual dynamic attention mechanism, continuously updating the learning rate of the ResNet-101 network, the weight coefficients of the dynamic attention module and the weight coefficients of the labeling module, and stopping training when the loss function value is minimal, to obtain the trained cross-language image change description model based on the dual dynamic attention mechanism.
The initialized training parameters include the initial learning rate, the maximum iteration number, the update gradient, the weight coefficients of the dynamic attention module and the weight coefficients of the labeling module, and the learning rate is updated by the formula

$$learningrate = lr_0 \times \left(1 - \frac{iter}{max\_iter}\right)^{power}$$

where $iter$ is the current iteration number, $max\_iter$ is the maximum iteration number, $power$ is the update gradient, $lr_0$ is the initial learning rate, and $learningrate$ is the current learning rate. In this example, the training batch size is batchsize = 4, and the maximum number of iterations is set to 30000. Momentum is 0.9, and the initial learning rate is set to 0.001. The learning rate is adjusted with the inv strategy during model training.
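A small helper reflecting the update formula above under the example settings (initial learning rate 0.001, max_iter 30000); the value of power is not given in the text, so the 0.9 below is a placeholder.

```python
def poly_lr(base_lr, it, max_iter=30000, power=0.9):
    """Learning-rate decay reconstructed from the variables listed above."""
    return base_lr * (1.0 - it / max_iter) ** power


# Example settings from this embodiment: base_lr = 0.001, max_iter = 30000.
for it in (0, 15000, 29999):
    print(it, poly_lr(0.001, it))
```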
As shown in fig. 6, the ResNet-101 network weights are initialized. The weights of all layers except the last layer of the network are initialized without bias, i.e. the bias is 0 and the weights follow a Gaussian distribution with σ = 0.01. The weight parameters of the last layer of the network take the unbalanced sample distribution into account, and during weight initialization the bias is set by the formula

$$b = -\log\left(\frac{1-\pi}{\pi}\right)$$

where $\pi$ is a hyperparameter, set to 0.01 in this example; this changed model initialization strategy ensures that the model does not deflect toward the more numerous negative samples.
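A sketch of the described last-layer initialization. The Gaussian σ = 0.01 for the weights is stated above, while the closed form b = -log((1 - π)/π) for the bias is an assumption reconstructed from the role of π, since the formula image is not reproduced in the source text.

```python
import math

import torch


def init_last_layer(fc, pi=0.01):
    """Initialize the final layer so the model does not deflect to negative samples."""
    torch.nn.init.normal_(fc.weight, mean=0.0, std=0.01)  # Gaussian, sigma = 0.01
    # Assumed closed form for the prior-probability bias, reconstructed from pi.
    torch.nn.init.constant_(fc.bias, -math.log((1.0 - pi) / pi))
```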
The model stops training when the optimal solution is found using the following loss function:

$$L(\theta) = L_{XE} + \lambda_1 L_1 + \lambda_{ent} L_{ent}$$

$$L_{XE} = -\sum_t \log p_\theta(\omega_t \mid \omega_1, \ldots, \omega_{t-1})$$

$$L_{ent} = -\sum_t \sum_i \alpha_{t,i} \log \alpha_{t,i}$$

$$L_1 = \lVert W_c \rVert + \lVert W_{d2} \rVert$$

where $L_{XE}$ denotes the value obtained by minimizing the cross-entropy loss as the training target, $L_1$ denotes the regularization value, $L_{ent}$ denotes the entropy loss over the attention weights, $\lambda_1$ denotes a preset first hyperparameter, and $\lambda_{ent}$ denotes a preset second hyperparameter. $p_\theta$ denotes the probability value at the initial time. $W_c$, $b_c$ and $W_{d2}$, $b_{d2}$ are all given initial values. The process enters the dual attention module, substitutes the initial values of $W_{d2}$, $b_{d2}$ into formula (11) to obtain the initial $\alpha_t$, and obtains the initial $L_{ent}$ from the initial $\alpha_t$; it then enters the dynamic speaking mechanism, substitutes the initial values of $W_c$, $b_c$ into formula (15) to obtain the initial $\omega_t$, and obtains the initial $L_{XE}$ from the initial $\omega_t$. The initial $L_1$ is then calculated from the initial $W_c$ and $W_{d2}$, and the initial loss value is obtained from the initial $L_{XE}$, $L_{ent}$ and $L_1$. $W_c$ and $W_{d2}$ are then updated separately by back propagation, a loss value being obtained in each update; updating stops when the loss function finds the optimal solution, the parameters are fixed, and substituting them into formulas (11) and (15) yields the finally trained model.
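A hedged sketch of one evaluation of L(θ) = L_XE + λ1·L1 + λent·Lent as reconstructed above; the λ values, the norm used for ‖W‖, and the sign convention of the entropy term are assumptions not fixed by the source text.

```python
import torch
import torch.nn.functional as F


def total_loss(logits, targets, alphas, W_c, W_d2, lam1=1e-4, lam_ent=1e-4):
    """One loss evaluation; lam1/lam_ent are placeholder hyperparameters."""
    l_xe = F.cross_entropy(logits, targets)                 # L_XE over word targets
    entropy = -(alphas * (alphas + 1e-8).log()).sum(dim=1)  # entropy of alpha_t
    l_ent = entropy.mean()                                  # sign convention assumed
    l_1 = W_c.norm() + W_d2.norm()                          # L1 = ||W_c|| + ||W_d2||
    return l_xe + lam1 * l_1 + lam_ent * l_ent
```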
Step d: and testing the test set by using a trained cross-language image change description model based on a dual dynamic attention mechanism to obtain an image description result. FIG. 7 is a schematic diagram of a cross-language image change description model architecture according to the present invention.
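For step d, a greedy decoding loop tying the sketched modules together might look as follows; the BOS/EOS token ids, hidden size and maximum caption length are hypothetical choices, not values from the patent.

```python
import torch


@torch.no_grad()
def describe_pair(enc, dual, dyn, labeler, img_bef, img_aft,
                  bos=1, eos=2, max_len=20, hid=512):
    """Greedy decoding sketch using the encoder and attention modules above."""
    x_bef, x_aft = enc(img_bef, img_aft)
    l_bef, l_diff, l_aft = dual(x_bef, x_aft)
    n = img_bef.size(0)
    h_d = (torch.zeros(n, hid), torch.zeros(n, hid))  # dynamic attention state
    h_c = (torch.zeros(n, hid), torch.zeros(n, hid))  # labeling module state
    word = torch.full((n,), bos, dtype=torch.long)
    out = []
    for _ in range(max_len):
        l_dyn, _alpha, h_d = dyn(l_bef, l_diff, l_aft, h_c[0], h_d)
        logits, h_c = labeler.step(word, l_dyn, h_c)
        word = logits.argmax(dim=1)
        out.append(word)
        if (word == eos).all():
            break
    return torch.stack(out, dim=1)  # token ids, to be mapped to Chinese words
```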
Through the above technical scheme, the method collects downhole pipeline scene images instead of relying on sensor detection, which keeps the collected data reliable; it constructs and trains a cross-language image change description model based on a dual dynamic attention mechanism, and finally uses the trained model to describe the pipeline leakage state, ensuring an accurate description of downhole pipeline leakage.
Example 2
The invention also provides a device for describing the leakage of the underground pipeline based on cross-language image change description, which comprises:
the image preprocessing module is used for acquiring an image of a scene of the underground pipeline and preprocessing the image to obtain a training set and a test set;
the model construction module is used for constructing a cross-language image change description model based on a dual dynamic attention mechanism;
the model training module is used for training the cross-language image change description model based on the dual dynamic attention mechanism on a training set;
and the test module is used for testing the test set by utilizing the trained cross-language image change description model based on the dual dynamic attention mechanism to obtain an image description result.
Specifically, the image preprocessing module is further configured to:
step a 1: installing a camera at the front end of the underground pipeline to acquire daily state video stream data of the underground pipeline;
step a 2: extracting key frames in video stream data according to a preset time interval and storing the key frames as an underground pipeline scene image;
step a 3: cutting all the underground pipeline scene images to 512 x 512 to obtain an image data set; dividing the image data set into a plurality of groups of two images each, where one image in each group is the pipeline non-leakage state image of the previous frame, and the other is the subsequent frame, either a changed image with leakage or an image without leakage change but with changes in other factors; annotating the images with the official COCO pycocotools package to obtain an annotated data set; and splitting the annotated data set into a training set and a test set at a ratio of 3:1.
Specifically, the model building module is further configured to:
the cross-language image change description model based on the dual dynamic attention mechanism comprises an encoder, an RNN (recurrent neural network) embedded with a spatial attention mechanism, and a dynamic attention module and a labeling module based on a dynamic speaking mechanism, wherein both the dynamic attention module and the labeling module are recurrent models based on LSTM (long short-term memory). The training set or test set is input to the encoder, and the encoder is connected to the RNN network embedded with the spatial attention mechanism, which outputs the spatial attention result, namely the image positions that need attention. The RNN network embedded with the spatial attention mechanism is connected to the dynamic attention module, the dynamic attention module is connected to the labeling module, and the labeling module outputs and distributes the current word; the current word carries the time of attending to the images, namely when each image begins to be attended to.
More specifically, the model building module is further configured to:

extract the input image group features $(X_{bef}, X_{aft})$ using one ResNet-101 network as the encoder;

input the input image group features $(X_{bef}, X_{aft})$ into the RNN network embedded with the dual attention mechanism, difference the encoded input image group features $(X_{bef}, X_{aft})$ by the formula $X_{aft} - X_{bef}$ to obtain the difference feature $X_{diff}$, and concatenate the obtained difference feature $X_{diff}$ with each of the input image group features $(X_{bef}, X_{aft})$ to obtain two different spatial attention image groups $A_{bef}$ and $A_{aft}$;

wherein the LSTM decoder in the dynamic attention module takes the previous hidden state $h^{(c)}_{t-1}$ of the labeling module and $l_{bef}$, $l_{diff}$, $l_{aft}$ as input and predicts the attention weights $\alpha_t$; the attention weights $\alpha_t$ cumulatively sum the visual features to obtain the dynamically attended feature $l^{(t)}_{dyn}$; and the dynamically attended feature $l^{(t)}_{dyn}$ and the previous word $x_{t-1}$ are input to the LSTM decoder of the labeling module to generate the current word distribution, which distributes the current word.
More specifically, the ResNet-101 network includes, connected in sequence, 1 conv1 convolutional layer, 3 conv2_x convolutional blocks, 4 conv3_x convolutional blocks, 23 conv4_x convolutional blocks, 3 conv5_x convolutional blocks and 1 fully connected layer. The conv1 layer is a 7 × 7 convolutional layer with a stride of 2; each conv2_x block consists of a 1 × 1 convolution with 64 filters, a 3 × 3 convolution with 64 filters and a 1 × 1 convolution with 256 filters; each conv3_x block consists of a 1 × 1 convolution with 128 filters, a 3 × 3 convolution with 128 filters and a 1 × 1 convolution with 512 filters; each conv4_x block consists of a 1 × 1 convolution with 256 filters, a 3 × 3 convolution with 256 filters and a 1 × 1 convolution with 1024 filters; and each conv5_x block consists of a 1 × 1 convolution with 512 filters, a 3 × 3 convolution with 512 filters and a 1 × 1 convolution with 2048 filters.
More specifically, the model training module is further configured to:
initializing the training parameters;

inputting the input image group features $(X_{bef}, X_{aft})$ into the ResNet-101 network of the cross-language image change description model based on the dual dynamic attention mechanism, continuously updating the learning rate of the ResNet-101 network, the weight coefficients of the dynamic attention module and the weight coefficients of the labeling module, and stopping training when the loss function value is minimal, to obtain the trained cross-language image change description model based on the dual dynamic attention mechanism.
More specifically, the initialized training parameters include the initial learning rate, the maximum iteration number, the update gradient, the weight coefficients of the dynamic attention module and the weight coefficients of the labeling module, and the learning rate is updated by the formula

$$learningrate = lr_0 \times \left(1 - \frac{iter}{max\_iter}\right)^{power}$$

where $iter$ is the current iteration number, $max\_iter$ is the maximum iteration number, $power$ is the update gradient, $lr_0$ is the initial learning rate, and $learningrate$ is the current learning rate.
More specifically, the loss function is formulated as

$$L(\theta) = L_{XE} + \lambda_1 L_1 + \lambda_{ent} L_{ent}$$

$$L_{XE} = -\sum_t \log p_\theta(\omega_t \mid \omega_1, \ldots, \omega_{t-1})$$

$$L_{ent} = -\sum_t \sum_i \alpha_{t,i} \log \alpha_{t,i}$$

$$L_1 = \lVert W_c \rVert + \lVert W_{d2} \rVert$$

where $L_{XE}$ denotes the value obtained by minimizing the cross-entropy loss as the training target, $L_1$ denotes the regularization value, $L_{ent}$ denotes the entropy loss over the attention weights, $\lambda_1$ denotes a preset first hyperparameter, $\lambda_{ent}$ denotes a preset second hyperparameter, $p_\theta$ denotes the probability value, $W_c$ denotes the weight coefficients of the labeling module, $W_{d2}$ denotes the weight coefficients of the dynamic attention module, $\omega_t$ denotes the word output by the labeling module at time step $t$, and $\alpha_t$ denotes the attention weights of the dynamic attention module.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A downhole pipeline leakage description method based on cross-language image change description is characterized by comprising the following steps:
step a: acquiring an underground pipeline scene image, and preprocessing the image to obtain a training set and a test set;
step b: constructing a cross-language image change description model based on a dual dynamic attention mechanism;
step c: training a cross-language image change description model based on a dual dynamic attention mechanism on a training set;
step d: and testing the test set by using a trained cross-language image change description model based on a dual dynamic attention mechanism to obtain an image description result.
2. The method for describing the leakage of the underground pipeline based on the cross-language image change description according to claim 1, wherein the step a comprises the following steps:
step a 1: installing a camera at the front end of the underground pipeline to acquire daily state video stream data of the underground pipeline;
step a 2: extracting key frames in video stream data according to a preset time interval and storing the key frames as an underground pipeline scene image;
step a 3: cutting all the underground pipeline scene images to 512 x 512 to obtain an image data set; dividing the image data set into a plurality of groups of two images each, where one image in each group is the pipeline non-leakage state image of the previous frame, and the other is the subsequent frame, either a changed image with leakage or an image without leakage change but with changes in other factors; annotating the images with the official COCO pycocotools package to obtain an annotated data set; and splitting the annotated data set into a training set and a test set at a ratio of 3:1.
3. The method for describing the leakage of the underground pipeline based on the cross-language image change description according to claim 1, wherein the step b comprises the following steps:
the cross-language image change description model based on the dual dynamic attention mechanism comprises an encoder, an RNN (recurrent neural network) embedded with a spatial attention mechanism, and a dynamic attention module and a labeling module based on a dynamic speaking mechanism, wherein both the dynamic attention module and the labeling module are recurrent models based on LSTM (long short-term memory); the training set or test set is input to the encoder, and the encoder is connected to the RNN network embedded with the spatial attention mechanism, which outputs the spatial attention result, namely the image positions that need attention; the RNN network embedded with the spatial attention mechanism is connected to the dynamic attention module, the dynamic attention module is connected to the labeling module, and the labeling module outputs and distributes the current word, the current word carrying the time of attending to the images, namely when each image begins to be attended to.
4. The method for describing the leakage of the underground pipeline based on the cross-language image change description according to claim 3, wherein the step b further comprises:
extraction of input image set features (X) using 1 ResNet-101 network as encoderbef,Xaft);
Input image set features (X)bef,Xaft) Inputting the image into an RNN network embedded with a double attention mechanism, and performing image feature (X) on the coded input image groupbef,Xaft) By the formula Xaft-XbefDifference is made to obtain difference characteristic Xdiff(ii) a The obtained difference characteristic XdiffRespectively with input image group characteristics (X)bef,Xaft) Connecting to obtain two different space attention image groups AbefAnd Aaft
The LSTM decoder in the dynamic attention module will tag the previous hidden state of the module
Figure FDA0003101751830000021
And lbef、ldiff、laftAs an input, predicting attention weights
Figure FDA0003101751830000022
Attention is paid to the weight
Figure FDA0003101751830000023
Cumulatively summing visual features to obtain dynamic engagement features
Figure FDA0003101751830000024
Dynamic engagement feature
Figure FDA0003101751830000025
And the previous word xt-1Inputting the word into LSTM decoder of labeling module to generate current word distribution for distributing current word.
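Below is a hedged PyTorch sketch of one decoding step of this claim: the dynamic-attention LSTM predicts α_t over the three localized features, and the labeling LSTM consumes the attended feature plus the previous word. Hidden sizes, the word embedding, and the linear heads are assumptions, not claimed values.

```python
import torch
import torch.nn as nn

class DualDynamicDecoder(nn.Module):
    def __init__(self, vocab_size, feat_dim=1024, hid_dim=512, emb_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.att_lstm = nn.LSTMCell(hid_dim + 3 * feat_dim, hid_dim)
        self.att_head = nn.Linear(hid_dim, 3)      # weights over bef/diff/aft
        self.spk_lstm = nn.LSTMCell(feat_dim + emb_dim, hid_dim)
        self.word_head = nn.Linear(hid_dim, vocab_size)

    def step(self, l_bef, l_diff, l_aft, prev_word, att_state, spk_state):
        # Previous labeling-module hidden state + localized features -> alpha_t.
        x = torch.cat([spk_state[0], l_bef, l_diff, l_aft], dim=1)
        att_state = self.att_lstm(x, att_state)
        alpha = torch.softmax(self.att_head(att_state[0]), dim=-1)  # (B, 3)
        feats = torch.stack([l_bef, l_diff, l_aft], dim=1)          # (B, 3, C)
        l_dyn = (alpha.unsqueeze(-1) * feats).sum(1)  # dynamically attended feature
        # Labeling module: attended feature + previous word -> word distribution.
        spk_in = torch.cat([l_dyn, self.embed(prev_word)], dim=1)
        spk_state = self.spk_lstm(spk_in, spk_state)
        return self.word_head(spk_state[0]), alpha, att_state, spk_state
```

At t = 0 both LSTM states can be zero-initialized and prev_word set to a start-of-sentence token.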
5. The method for describing the leakage of the underground pipeline based on the cross-language image change description according to claim 4, wherein the ResNet-101 network comprises, connected in sequence, 1 conv1 convolutional layer, 3 conv2_x convolutional blocks, 4 conv3_x convolutional blocks, 23 conv4_x convolutional blocks, 3 conv5_x convolutional blocks, and 1 fully connected layer; the conv1 layer is a 7 × 7 convolution with 64 kernels and a stride of 2; each conv2_x block consists of a 1 × 1 convolution with 64 kernels, a 3 × 3 convolution with 64 kernels, and a 1 × 1 convolution with 256 kernels; each conv3_x block consists of a 1 × 1 convolution with 128 kernels, a 3 × 3 convolution with 128 kernels, and a 1 × 1 convolution with 512 kernels; each conv4_x block consists of a 1 × 1 convolution with 256 kernels, a 3 × 3 convolution with 256 kernels, and a 1 × 1 convolution with 1024 kernels; and each conv5_x block consists of a 1 × 1 convolution with 512 kernels, a 3 × 3 convolution with 512 kernels, and a 1 × 1 convolution with 2048 kernels.
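Since this layout matches the standard ResNet-101, an implementation can simply reuse a library backbone and drop the classification head; a sketch assuming torchvision:

```python
import torch.nn as nn
from torchvision.models import resnet101

backbone = resnet101(weights=None)  # conv1, conv2_x..conv5_x (3/4/23/3 blocks), fc
encoder = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool and fc
# A 512 x 512 input yields a (2048, 16, 16) feature map from conv5_x.
```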
6. The method for describing the leakage of the underground pipeline based on the cross-language image change description according to claim 4, wherein the step c comprises the following steps:
initializing the training parameters;
inputting the image group features (X_bef, X_aft) into the ResNet-101 network of the cross-language image change description model based on the dual dynamic attention mechanism; continuously updating the learning rate of the ResNet-101 network, the weight coefficients of the dynamic attention module, and the weight coefficients of the labeling module; and stopping training when the loss function value reaches its minimum, to obtain the trained cross-language image change description model based on the dual dynamic attention mechanism.
7. The method for describing the leakage of the underground pipeline based on the cross-language image change description according to claim 6, wherein the initialized training parameters comprise an initial learning rate, a maximum iteration number, an update gradient, initial weight coefficients of the dynamic attention module, and initial weight coefficients of the labeling module, and the learning rate is updated by the formula
learning_rate = base_lr × (1 − iter / max_iter)^power
wherein iter is the current iteration number, max_iter is the maximum iteration number, power is the update gradient, base_lr is the initial learning rate, and learning_rate is the current learning rate.
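This update rule is the standard "poly" decay. A sketch using PyTorch's LambdaLR, with the base learning rate, iteration budget, and power chosen here purely for illustration:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

base_lr, max_iter, power = 1e-4, 50000, 0.9     # illustrative values
params = [torch.nn.Parameter(torch.zeros(1))]   # stand-in for model parameters
optimizer = torch.optim.SGD(params, lr=base_lr)
# learning_rate = base_lr * (1 - iter / max_iter) ** power
scheduler = LambdaLR(optimizer, lambda it: (1 - it / max_iter) ** power)

for it in range(max_iter):
    # ... forward pass and loss.backward() on the captioning loss, then:
    optimizer.step()
    scheduler.step()                            # apply the poly decay per iteration
```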
8. The method for describing the leakage of the underground pipeline based on the cross-language image change description according to claim 6, wherein the loss function is formulated as
L(θ) = L_XE + λ_1 L_1 + λ_ent L_ent
L_XE = −Σ_t log p_θ(ω_t | ω_1, …, ω_{t−1})
L_ent = −Σ_t α_t log α_t
L_1 = ||W_c|| + ||W_d2||
wherein L_XE represents the cross-entropy loss over the training target words, L_1 represents the regularization term, L_ent represents the entropy term over the attention weights, λ_1 represents a preset first hyperparameter, λ_ent represents a preset second hyperparameter, p_θ represents the probability predicted by the model, W_c represents the weight coefficients of the labeling module, W_d2 represents the weight coefficients of the dynamic attention module, ω_t represents the word output by the labeling module at time step t, and α_t represents the attention weight of the dynamic attention module.
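A sketch of this objective in PyTorch follows. The exact forms of L_XE and L_ent in the patent are given as formula images, so the word-level cross entropy, the L1 reading of ||·||, and the sign convention of the entropy term here are assumptions:

```python
import torch
import torch.nn.functional as F

def total_loss(word_logits, targets, alpha, W_c, W_d2,
               lambda1=1e-4, lambda_ent=1e-3):   # hyperparameter values assumed
    # L_XE: cross entropy of the predicted word distribution p_theta
    l_xe = F.cross_entropy(word_logits, targets)
    # L_1 = ||W_c|| + ||W_d2||, read here as the L1 norm of the module weights
    l1 = W_c.abs().sum() + W_d2.abs().sum()
    # L_ent: entropy of the attention weights alpha_t (sign convention assumed)
    l_ent = -(alpha * (alpha + 1e-8).log()).sum(dim=-1).mean()
    return l_xe + lambda1 * l1 + lambda_ent * l_ent
```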
9. An underground pipeline leakage description apparatus based on cross-language image change description, characterized in that the apparatus comprises:
the image preprocessing module is used for acquiring an image of a scene of the underground pipeline and preprocessing the image to obtain a training set and a test set;
the model construction module is used for constructing a cross-language image change description model based on a dual dynamic attention mechanism;
the model training module is used for training the cross-language image change description model based on the dual dynamic attention mechanism on a training set;
and the test module is used for testing the test set by utilizing the trained cross-language image change description model based on the dual dynamic attention mechanism to obtain an image description result.
10. The apparatus for describing the leakage of the underground pipeline based on the cross-language image change description according to claim 9, wherein the image preprocessing module is further configured to:
step a 1: installing a camera at the front end of the underground pipeline to acquire daily state video stream data of the underground pipeline;
step a 2: extracting key frames in video stream data according to a preset time interval and storing the key frames as an underground pipeline scene image;
step a 3: cropping all the underground pipeline scene images to 512 × 512 to obtain an image data set; dividing the image data set into a plurality of groups of two images each, wherein one image in each group is a leak-free pipeline state image from a previous frame, and the other is a subsequent frame that either shows a leakage change or shows no leakage change but changes in other factors; labeling the images with the official COCO pycocotools package to obtain a labeled data set; and dividing the labeled data set into a training set and a test set at a ratio of 3:1.
CN202110626949.2A 2021-06-04 2021-06-04 Underground pipeline leakage description method and device based on cross-language image change description Active CN113239886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110626949.2A CN113239886B (en) 2021-06-04 2021-06-04 Underground pipeline leakage description method and device based on cross-language image change description

Publications (2)

Publication Number Publication Date
CN113239886A true CN113239886A (en) 2021-08-10
CN113239886B CN113239886B (en) 2024-03-19

Family

ID=77136997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110626949.2A Active CN113239886B (en) 2021-06-04 2021-06-04 Underground pipeline leakage description method and device based on cross-language image change description

Country Status (1)

Country Link
CN (1) CN113239886B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067103A (en) * 2021-11-23 2022-02-18 南京工业大学 Intelligent pipeline third party damage identification method based on YOLOv3
CN114577410A (en) * 2022-03-04 2022-06-03 浙江蓝能燃气设备有限公司 Automatic leakage detection system for helium leakage of bottle group container and application method

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170330153A1 (en) * 2014-05-13 2017-11-16 Monster Worldwide, Inc. Search Extraction Matching, Draw Attention-Fit Modality, Application Morphing, and Informed Apply Apparatuses, Methods and Systems
US20190005069A1 (en) * 2017-06-28 2019-01-03 Google Inc. Image Retrieval with Deep Local Feature Descriptors and Attention-Based Keypoint Descriptors
US20200034948A1 (en) * 2018-07-27 2020-01-30 Washington University Ml-based methods for pseudo-ct and hr mr image estimation
WO2020028382A1 (en) * 2018-07-30 2020-02-06 Memorial Sloan Kettering Cancer Center Multi-modal, multi-resolution deep learning neural networks for segmentation, outcomes prediction and longitudinal response monitoring to immunotherapy and radiotherapy
CN111160467A (en) * 2019-05-31 2020-05-15 北京理工大学 Image description method based on conditional random field and internal semantic attention
CN111368846A (en) * 2020-03-19 2020-07-03 中国人民解放军国防科技大学 Road ponding identification method based on boundary semantic segmentation
CN111832501A (en) * 2020-07-20 2020-10-27 中国人民解放军战略支援部队航天工程大学 Remote sensing image text intelligent description method for satellite on-orbit application
WO2020222985A1 (en) * 2019-04-30 2020-11-05 The Trustees Of Dartmouth College System and method for attention-based classification of high-resolution microscopy images
CN111914710A (en) * 2020-07-24 2020-11-10 合肥工业大学 Method and system for describing scenes of railway locomotive depot
WO2020244287A1 (en) * 2019-06-03 2020-12-10 中国矿业大学 Method for generating image semantic description
WO2020248471A1 (en) * 2019-06-14 2020-12-17 华南理工大学 Aggregation cross-entropy loss function-based sequence recognition method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
姚义; 王诗珂; 陈希豪; 林宇翩: "Research on structured image annotation based on deep learning", Computer Knowledge and Technology (电脑知识与技术), no. 33.
牛斌; 李金泽; 房超; 马利; 徐和然; 纪兴海: "An image captioning method based on an attention mechanism and multiple modalities", Journal of Liaoning University (Natural Science Edition) (辽宁大学学报(自然科学版)), no. 01.
赵小虎; 尹良飞; 赵成龙: "An image semantic description algorithm based on global-local features and an adaptive attention mechanism", Journal of Zhejiang University (Engineering Science) (浙江大学学报(工学版)), no. 01.
韦人予; 蒙祖强: "An image captioning model based on adaptive correction of attention features", Journal of Computer Applications (计算机应用), no. 1.

Also Published As

Publication number Publication date
CN113239886B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
Kumar et al. Deep learning–based automated detection of sewer defects in CCTV videos
Kopbayev et al. Gas leakage detection using spatial and temporal neural network model
CN111325323B (en) Automatic power transmission and transformation scene description generation method integrating global information and local information
CN107944412A (en) Transmission line of electricity automatic recognition system and method based on multilayer convolutional neural networks
EP3699579B1 (en) Inspection method and inspection device and computer-readable medium
CN113239886B (en) Underground pipeline leakage description method and device based on cross-language image change description
CN109376736A (en) A kind of small video target detection method based on depth convolutional neural networks
CN116824335A (en) YOLOv5 improved algorithm-based fire disaster early warning method and system
CN117523177A (en) Gas pipeline monitoring system and method based on artificial intelligent hybrid big model
Qiu et al. A lightweight yolov4-edam model for accurate and real-time detection of foreign objects suspended on power lines
CN117808739A (en) Method and device for detecting pipeline defects
CN113780111A (en) Pipeline connector based on optimized YOLOv3 algorithm and defect accurate identification method
CN116467485B (en) Video image retrieval construction system and method thereof
CN113111184A (en) Event detection method based on explicit event structure knowledge enhancement and terminal equipment
Li et al. An integrated underwater structural multi-defects automatic identification and quantification framework for hydraulic tunnel via machine vision and deep learning
CN115935241A (en) Real-time positioning method and device for pipe cleaner with multi-parameter mutual fusion
CN113516179B (en) Method and system for identifying water leakage performance of underground infrastructure
Zhang et al. Combining Self‐Supervised Learning and Yolo v4 Network for Construction Vehicle Detection
CN113283382B (en) Method and device for describing leakage scene of underground pipeline
CN112712497B (en) Cast iron pipeline joint stability detection method based on local descriptor
Jia et al. Sample generation of semi‐automatic pavement crack labelling and robustness in detection of pavement diseases
CN115147684A (en) Target striking effect evaluation method based on deep learning
CN114140879A (en) Behavior identification method and device based on multi-head cascade attention network and time convolution network
Jiang et al. Lightweight object detection network for multi‐damage recognition of concrete bridges in complex environments
Wang et al. An improved Image Description Method Using Recurrent Neural Network with Gated Recurrent Unit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant