CN115965578A

CN115965578A - Binocular stereo matching detection method and device based on channel attention mechanism

Info

Publication number: CN115965578A
Application number: CN202211400259.6A
Authority: CN
Inventors: 徐波; 华栋; 刘嘉; 林谋
Original assignee: Guangdong Junhua Energy Technology Co ltd; Super High Voltage Branch Of State Grid Jiangxi Electric Power Co ltd; State Grid Corp of China SGCC
Current assignee: Guangdong Junhua Energy Technology Co ltd; Super High Voltage Branch Of State Grid Jiangxi Electric Power Co ltd; State Grid Corp of China SGCC
Priority date: 2022-11-09
Filing date: 2022-11-09
Publication date: 2023-04-14

Abstract

The invention provides a binocular stereo matching detection method and device based on a channel attention mechanism. The method relies on existing electric power text data and on substation image video of live working shot by a binocular camera carried by an operator. The substation image video is first preprocessed; a substation power knowledge graph is then constructed from the power text data and the substation image video data; a real-time binocular stereo matching model based on an attention mechanism is then constructed to perform binocular stereo matching on the live-work image video in the substation power knowledge graph; finally, the depth distance is calculated according to the binocular ranging process to realize near-electricity safety distance detection. Both feature extraction and binocular stereo matching are performed with the attention mechanism fused in.

Description

Binocular stereo matching detection method and device based on channel attention mechanism

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a binocular stereo matching detection method and device based on a channel attention mechanism.

Background

With the development of binocular stereo matching in computer vision, the technology is gradually being applied to State Grid power business to detect near-electricity safety distances at fine granularity. Electric shock accidents occur from time to time, and "protection against electric shock" is one of the most typical and common safety control requirements for preventing personal injury. When an operator approaches equipment of any voltage class, the surrounding equipment is energized; to protect the operator and to improve the grid's ability to prevent safety risks, the red-line range of the safety distance must be calculated from the environment in which the operator and any hand-held tools and materials are located. At present there are two main types of method for judging the distance to live equipment: the first uses the electric-field coupling between a high-voltage electric field and a sensor, and the second uses an ultrasonic range finder to measure the distance to the live equipment directly. Both methods have limitations, however, because current substation scenes contain complex elements and place high demands on real-time information about live equipment and operators.

To address the complex data types and large data volumes of power transformation service scenarios, which make traditional database schemes difficult to apply, a technique for constructing a multi-modal data knowledge graph based on multi-modal shared fusion is studied. Building on the resulting power knowledge graph, and to address the low efficiency and limited intelligence of manual substation inspection, a multi-stage binocular stereo matching detection technique based on a channel attention mechanism is studied. The method relies on a multi-modal database of substation power images and preprocesses them to obtain high-quality binocular image video of live working. Feature extraction and binocular stereo matching are performed with a fused attention mechanism to obtain multi-scale image video features and to improve, over multiple stages, the matching accuracy for objects of different scales. Near-electricity safety distance detection is then achieved simply by calculating the depth distance according to the binocular ranging process.

Disclosure of Invention

To remedy the defects of the prior art, the invention aims to provide a binocular stereo matching method and device based on a channel attention mechanism. The invention studies a multi-modal power image database technique, acquires and preprocesses binocular video monitoring of live working, constructs a power image knowledge graph, and performs binocular stereo matching with a fused attention mechanism. Finally, the depth distance is calculated according to the binocular ranging process to realize near-electricity safety distance detection.

The invention discloses a binocular stereo matching detection method based on a channel attention mechanism, which comprises the following steps:

S1, performing image preprocessing on the substation image video, based on existing electric power text data and substation image video of live working shot by a binocular camera carried by an operator;

S2, constructing a substation power knowledge graph from the power text data and the substation image video data;

S3, constructing a real-time binocular stereo matching model based on an attention mechanism and, based on the substation power knowledge graph, using the attention mechanism to perform binocular stereo matching on the live-work image video in the substation power knowledge graph; the real-time binocular stereo matching model based on the attention mechanism comprises an attention module and a parallax optimization module;

the attention module relies on the video image data set in the substation power knowledge graph, enhances the image features with a channel attention and spatial attention module, extracts the features of the image frames with residual modules, and outputs a feature vector for each pixel of the down-sampled image to obtain a feature map;

the parallax optimization module takes the obtained attention feature map and continues to compute with two-dimensional convolution blocks, which reduces the number of network parameters during training and makes the real-time binocular stereo matching model based on the attention mechanism lighter; a three-step sampling scheme that gradually enlarges the disparity map is selected: two feature maps are first input for a convolution operation, and the prediction obtained through the residual block is combined with the coarse-grained disparity prediction to finally obtain a high-resolution disparity map; this enlarges the receptive field, captures context information from multi-scale images, and improves the inference speed of the model, so that the real-time requirement is met while a high-quality disparity map is obtained;

S4, detecting the near-electricity safety distance: recovering the three-dimensional geometric information of the substation operators based on the obtained high-quality disparity map, calculating the depth information of the operators and the transformer equipment in the image by triangulation to form a three-dimensional point cloud, and finally calculating the distance between the operators and the transformer equipment.

Further, the image preprocessing comprises image rectification, image denoising and intelligent image video analysis.

Further, the image rectification comprises: decomposing the power monitoring video into a plurality of image frames; using a rectification algorithm based on contour extraction for image frames with obvious edges; and using a rectification algorithm based on Hough line detection for image frames whose edges are not obvious but whose content is regularly arranged.
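As an illustration only, the Hough-line branch of the rectification can be sketched as a vote over line parameters that yields a dominant line angle, from which a skew correction could be derived. The sketch assumes the edge pixels of a frame have already been extracted as (x, y) coordinates; the function names, the one-degree angle step and the pure-Python accumulator are assumptions made for the example and are not specified by the text.

```python
import math
from collections import Counter

def dominant_line_angle(edge_points, angle_step_deg=1, rho_step=1.0):
    """Vote in (theta, rho) space and return the normal angle theta, in degrees,
    of the line supported by the most edge points."""
    votes = Counter()
    for theta_deg in range(0, 180, angle_step_deg):
        theta = math.radians(theta_deg)
        c, s = math.cos(theta), math.sin(theta)
        for x, y in edge_points:
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta).
            rho = x * c + y * s
            votes[(theta_deg, round(rho / rho_step))] += 1
    (theta_deg, _), _ = votes.most_common(1)[0]
    return theta_deg

def skew_from_horizontal(theta_deg):
    """A horizontal line has a normal angle of 90 degrees; return the deviation."""
    return theta_deg - 90

# Hypothetical edge pixels lying on the diagonal y = x.
diagonal = [(i, i) for i in range(50)]
skew = skew_from_horizontal(dominant_line_angle(diagonal))
```

Rotating the frame by the negative of the returned skew would bring the dominant line back to horizontal; the rotation itself is left out of the sketch.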

Further, the image denoising comprises: applying Gaussian filtering to address unclear image frames and noise in the substation image video.
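A minimal sketch of Gaussian filtering on a single grayscale frame is given below for illustration. The kernel size, the sigma value, the border handling by clamping and the function names are assumptions; the text only states that a Gaussian filtering method is used.

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    """Return a normalized size x size Gaussian kernel as nested lists."""
    half = size // 2
    raw = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            for dx in range(-half, half + 1)]
           for dy in range(-half, half + 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]

def gaussian_blur(image, size=3, sigma=1.0):
    """Blur a grayscale image given as a list of rows.
    Pixels outside the image are replaced by the nearest border pixel."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(size):
                for kx in range(size):
                    yy = min(max(y + ky - half, 0), h - 1)
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += kernel[ky][kx] * image[yy][xx]
            out[y][x] = acc
    return out
```

A larger sigma smooths more strongly but also blurs edges, so in practice it would be tuned against the later matching stage.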

Further, the intelligent image video analysis comprises: modeling foreground and background at the pixel level from the decomposed image frames; while the foreground object does not change, building a background model from N consecutive frames using the probability density distribution of the RGB pixel values, and thereby extracting the image of the operator in motion; and, based on the obtained foreground operator image, identifying specific objects by a pattern recognition technique based on shape and color features, and issuing an early warning of a violation if the distance between the operator and the power equipment exceeds a set threshold.

Further, a substation power knowledge graph is constructed as follows:

S21, first performing knowledge extraction on text information of the substation, such as operator information, operation site safety requirements and transformer equipment operating parameters, and linking the cross-modal power data;

performing natural language processing, including word segmentation, part-of-speech tagging and syntactic analysis, on the input power text data and substation image video data; adopting a learning-based approach in which entity relations are extracted automatically from the sequence-labeled power text data with a long short-term memory (LSTM) network; merging entities with the same meaning with a k-means clustering algorithm to achieve entity disambiguation; and finally storing the data as triples;

labeling the substation image video data, and linking each entity to the corresponding substation image video data in the relational database through the serial numbers and names in the relational database, so as to fuse knowledge across modalities and form a power knowledge base;

S22, having experts review the power knowledge base after knowledge fusion, and revising and improving the power knowledge base according to the experts' review opinions;

S23, visualizing the improved substation power knowledge base with the Neo4j graph database to form a complete substation power knowledge graph.
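The entity-merging part of step S21 can be illustrated with a small k-means sketch. It assumes each entity mention has already been mapped to a numeric embedding; the two-dimensional embeddings, the mention strings, the deterministic seeding with the first k points and the function names below are all hypothetical and serve only to show how mentions with the same meaning could collapse to one canonical entity.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def merge_entities(mentions, embeddings, k):
    """Map every mention to the first mention seen in its cluster."""
    labels = kmeans(embeddings, k)
    canonical = {}
    for mention, lab in zip(mentions, labels):
        canonical.setdefault(lab, mention)
    return {m: canonical[lab] for m, lab in zip(mentions, labels)}

# Toy mentions with made-up embeddings.
mentions = ["circuit breaker", "disconnector", "breaker", "isolating switch"]
embeddings = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
merged = merge_entities(mentions, embeddings, k=2)
```

In this toy run "breaker" is merged into "circuit breaker" and "isolating switch" into "disconnector"; with real data the number of clusters and the embedding model would have to be chosen separately.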

Further preferably, the attention module first compresses the spatial dimensions of the input feature map by max pooling and average pooling, then down-samples the image frame using two 3 × 3 convolution kernels while keeping the number of channels at 32, and captures local cross-channel correlations to obtain the channel attention feature map F1; the channel attention feature map F1 is multiplied by the feature map to obtain a first fused feature map; for spatial attention, max pooling and average pooling are applied along the channel axis of the first fused feature map, the result is convolved, and an activation function is applied to obtain the spatial attention feature map F2; the spatial attention feature map F2 is then multiplied by the first fused feature map to obtain the final attention feature map.

The invention also provides a binocular stereo matching detection device based on the channel attention mechanism, which comprises a binocular camera for image video acquisition, an image preprocessing module, a substation power knowledge graph module, a binocular stereo matching module, a near-electricity safety distance calculation module and an alarm module;

the transformer substation power knowledge graph module constructs a transformer substation power knowledge graph according to the power text data and the transformer substation image video data;

a real-time binocular stereo matching model based on an attention mechanism is built into the binocular stereo matching module, which uses the attention mechanism to perform binocular stereo matching on the live-work image video in the substation power knowledge graph;

the near-electricity safety distance calculation module recovers three-dimensional geometric information of the operating personnel of the transformer substation based on the obtained high-quality disparity map, depth information of the operating personnel and the transformer equipment in the image is calculated by utilizing triangulation to form three-dimensional point cloud, and distance calculation of the operating personnel and the transformer equipment is finally achieved;

and the alarm module informs the operator of the calculation result of the near-electricity safety distance calculation module.

The image preprocessing module comprises an image correction module, an image denoising module and an intelligent image video analysis module.

The invention has the following beneficial effects: relying on a multi-modal database of substation power images, the images are preprocessed to obtain high-quality binocular image video of live working; a power image knowledge graph is constructed; feature extraction and binocular stereo matching are performed with a fused attention mechanism; and the depth distance is finally calculated according to the binocular ranging process to realize near-electricity safety distance detection.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 is a schematic diagram of a power image preprocessing process;

FIG. 3 shows the process of constructing the substation power knowledge graph;

FIG. 4 shows the knowledge contained in the substation power knowledge graph;

FIG. 5 is a binocular stereo matching network structure based on an attention mechanism;

FIG. 6 is a schematic diagram of a near-electric safety distance detection process based on attention mechanism;

FIG. 7 is a flow chart of real-time stereo matching.

Detailed description of the invention

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

As shown in fig. 1, the embodiment of the invention discloses a binocular stereo matching detection method based on a channel attention mechanism, which comprises the following steps:

Step S1: based on existing electric power text data and substation image video of live working shot by a binocular camera carried by an operator, the substation image video is preprocessed. The preprocessing comprises image rectification, image denoising and intelligent image video analysis, and its flow is shown in FIG. 2.

(1) Image rectification. The power monitoring video is decomposed into a plurality of image frames; a rectification algorithm based on contour extraction is then used for image frames with obvious edges, and a rectification algorithm based on Hough line detection is used for image frames whose edges are not obvious but whose content is regularly arranged.

(2) Image denoising. To address unclear image frames and noise in the substation image video, Gaussian filtering is used to improve the clarity of the image frames, improve the compression ratio of the substation image video and ensure its quality.

(3) Intelligent image video analysis. For the substation image video, the movement of objects in the picture is detected by foreground extraction, different behaviors such as electric wires, left-behind articles, perimeter and the like are distinguished by pattern recognition, and the objects to be monitored are modeled specifically, so that particular objects in the substation image video can be detected and used in related applications. First, foreground and background are modeled at the pixel level from the decomposed image frames: while the foreground object does not change, a background model is built from N consecutive frames using the probability density distribution of the RGB pixel values, and the image of the operator in motion is then extracted. Then, based on the obtained foreground operator image, a pattern recognition technique based on shape and color features is used to identify specific objects such as electric wires and safety helmets, and an early warning of a violation is issued if the distance between the operator and the power equipment exceeds a set threshold.
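The background-modeling part of item (3) can be sketched as follows. The sketch assumes one Gaussian per pixel and per RGB channel, fitted over N background frames, and flags a pixel as foreground when any channel deviates by more than k standard deviations. The single-Gaussian form, the values of k and min_sigma, and the function names are assumptions; the text only refers to the probability density distribution of the RGB pixels.

```python
import math

def fit_background(frames):
    """Per-pixel, per-channel mean and standard deviation over N frames.
    Each frame is a list of rows of (R, G, B) tuples."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    model = []
    for y in range(h):
        row = []
        for x in range(w):
            stats = []
            for ch in range(3):
                samples = [f[y][x][ch] for f in frames]
                mean = sum(samples) / n
                var = sum((s - mean) ** 2 for s in samples) / n
                stats.append((mean, math.sqrt(var)))
            row.append(stats)
        model.append(row)
    return model

def foreground_mask(model, frame, k=3.0, min_sigma=2.0):
    """Mark a pixel 1 (foreground) when any channel lies more than
    k standard deviations from the background mean, else 0."""
    mask = []
    for y, row in enumerate(frame):
        mask_row = []
        for x, pixel in enumerate(row):
            is_fg = any(abs(pixel[ch] - mean) > k * max(sigma, min_sigma)
                        for ch, (mean, sigma) in enumerate(model[y][x]))
            mask_row.append(1 if is_fg else 0)
        mask.append(mask_row)
    return mask
```

The min_sigma floor keeps perfectly static pixels from being flagged by sensor noise alone; the connected foreground region would then be passed to the shape- and color-based recognition step.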

Step S2: a substation power knowledge graph is constructed from the power text data and the substation image video data. The specific process is shown in FIG. 3:

S21, knowledge extraction is first performed on text information of the substation, such as operator information, operation site safety requirements and transformer equipment operating parameters, and the cross-modal power data are linked.

Natural language processing, including word segmentation, part-of-speech tagging and syntactic analysis, is performed on the input power text data and substation image video data. A learning-based approach is adopted: entity relations are extracted automatically from the sequence-labeled power text data with a long short-term memory (LSTM) network, entities with the same meaning are merged with a k-means clustering algorithm to achieve entity disambiguation, and the data are finally stored as triples, such as (power equipment, includes, overhead lines), (overhead lines, includes, towers and accessories), (towers and accessories, includes, insulators).

Cross-modal data linking is then performed on the basis of the entity relation data of the power domain. The substation image video data are first labeled, and each entity is linked to the corresponding substation image video data in the relational database through the serial numbers and names in the relational database, which fuses knowledge across modalities and forms a power knowledge base.

S22, experts review the power knowledge base after knowledge fusion, and the knowledge base is revised and improved according to the experts' review opinions.

S23, the improved substation power knowledge base is visualized with the Neo4j graph database to form a complete substation power knowledge graph.

The substation power knowledge graph is shown in FIG. 4. Circles represent substation knowledge point entities, and the relation between entities is shown as "inclusion" in the figure. The graph contains basic operator information, operation site safety requirements, transformer equipment operating parameters, a three-dimensional map of the substation, operator qualifications, knowledge of transformer equipment components, and the picture and video data of other modalities corresponding to each part. The relations between the constructed knowledge points are not limited to the "inclusion" relation.
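To make the triple storage and the cross-modal links of step S2 concrete, the sketch below turns triples and entity-to-media links into Cypher MERGE statements that could be loaded into Neo4j. It assumes "includes" as the relation, in line with the "inclusion" relation described for FIG. 4. The node labels Entity and Media, the relationship type HAS_MEDIA, the media record numbers and the helper names are hypothetical, and the script only builds statement strings without connecting to a database.

```python
import re

triples = [
    ("power equipment", "includes", "overhead lines"),
    ("overhead lines", "includes", "towers and accessories"),
    ("towers and accessories", "includes", "insulators"),
]

# Hypothetical cross-modal links: entity name -> media record numbers.
media_links = {"insulators": ["IMG-0001"], "overhead lines": ["VID-0007"]}

def rel_type(relation):
    """Turn a relation phrase into an upper-case relationship type."""
    return re.sub(r"[^A-Za-z0-9]+", "_", relation).strip("_").upper()

def quote(text):
    """Quote a string literal for Cypher, escaping backslashes and quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

def triple_to_cypher(head, relation, tail):
    """MERGE both entity nodes and the relationship between them."""
    return (f"MERGE (h:Entity {{name: {quote(head)}}}) "
            f"MERGE (t:Entity {{name: {quote(tail)}}}) "
            f"MERGE (h)-[:{rel_type(relation)}]->(t)")

def media_to_cypher(entity, media_id):
    """MERGE an entity node, a media node and the link between them."""
    return (f"MERGE (e:Entity {{name: {quote(entity)}}}) "
            f"MERGE (m:Media {{id: {quote(media_id)}}}) "
            f"MERGE (e)-[:HAS_MEDIA]->(m)")

statements = [triple_to_cypher(*t) for t in triples]
statements += [media_to_cypher(e, m) for e, ms in media_links.items() for m in ms]
```

MERGE is used rather than CREATE so that re-running the load does not duplicate nodes; in production code, query parameters would normally be preferred over building literals into the statement text.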

Step S3: a real-time binocular stereo matching model based on an attention mechanism is constructed and, based on the substation power knowledge graph, the attention mechanism is used to perform binocular stereo matching on the live-work image video in the substation power knowledge graph, which increases the matching accuracy in weakly textured regions.

The attention-mechanism-based real-time binocular stereo matching model of this embodiment consists of two main modules: an attention module and a parallax optimization module.

The network structure of the attention module is shown in FIG. 5. Working on the video image data set in the substation power knowledge graph, the module enhances the image features with channel attention and spatial attention, extracts the features of the image frames with 5 residual modules, and outputs a 32-dimensional feature vector at each pixel position of the down-sampled image to obtain a feature map. Unlike traditional end-to-end feature extraction, the network with channel and spatial attention first compresses the spatial dimensions of the input feature map by max pooling and average pooling, then down-samples the image frame using two 3 × 3 convolution kernels while keeping the number of channels at 32, and captures local cross-channel correlations to obtain the channel attention feature map F1. F1 is multiplied by the feature map to obtain a first fused feature map. For spatial attention, max pooling and average pooling are applied along the channel axis of the first fused feature map, the result is convolved, and an activation function is applied to obtain the spatial attention feature map F2. F2 is then multiplied by the first fused feature map to obtain the final attention feature map. With channel and spatial attention, image features such as edges, contours and contrast become clearer. The attention module establishes correlations between different channels and between different positions within the same channel, so that the real-time binocular stereo matching model based on the attention mechanism assigns more attention weight to challenging regions and preserves more detail.
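The order of operations in the attention module (channel attention F1, first fusion, spatial attention F2, final fusion) can be sketched on a tiny feature map. This is a simplified stand-in, not the trained network: the 3 × 3 convolutions and 32 channels of the embodiment are replaced by fixed, hypothetical weights (a three-tap mix across neighbouring channels and a 1 × 1 mix of the pooled maps), a sigmoid is assumed as the activation function, and all function names are invented for the example.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_attention(fmap, mix=(0.25, 0.5, 0.25)):
    """One weight per channel from global max and average pooling.
    `mix` is a hypothetical 1-D kernel blending neighbouring channels."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(map(max, ch)) for ch in fmap]
    pooled = [a + m for a, m in zip(avg, mx)]
    c = len(pooled)
    weights = []
    for i in range(c):
        acc = 0.0
        for k, w in enumerate(mix):
            j = i + k - len(mix) // 2
            if 0 <= j < c:
                acc += w * pooled[j]
        weights.append(sigmoid(acc))
    return weights  # plays the role of F1

def spatial_attention(fmap, w_max=1.0, w_avg=1.0, bias=0.0):
    """One weight per position from max and average pooling along the
    channel axis; a 1x1 mix stands in for the convolution."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sigmoid(w_max * max(ch[y][x] for ch in fmap)
                     + w_avg * sum(ch[y][x] for ch in fmap) / c + bias)
             for x in range(w)] for y in range(h)]  # plays the role of F2

def attend(fmap):
    """Channel attention first, then spatial attention, both multiplicative."""
    f1 = channel_attention(fmap)
    fused = [[[v * f1[i] for v in row] for row in ch]
             for i, ch in enumerate(fmap)]
    f2 = spatial_attention(fused)
    return [[[v * f2[y][x] for x, v in enumerate(row)]
             for y, row in enumerate(ch)] for ch in fused]
```

Because both attention maps lie between 0 and 1, the module can only rescale features; in a trained network the learned weights decide which channels and positions keep most of their magnitude.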

The network structure of the parallax optimization module is shown in FIG. 6. Using the obtained attention feature map, the module continues to compute with two-dimensional convolution blocks, which reduces the number of network parameters during training and makes the model lighter. A three-step sampling scheme that gradually enlarges the disparity map is selected: two feature maps are first input for a convolution operation, and the prediction obtained through the residual block is combined with the coarse-grained disparity prediction to finally obtain a high-resolution disparity map. This enlarges the receptive field, captures context information from multi-scale images, and improves the inference speed of the real-time binocular stereo matching model based on the attention mechanism, so that the real-time requirement is met while a high-quality disparity map is obtained.
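The coarse-to-fine idea behind the parallax optimization module can be illustrated as follows: a coarse disparity map is enlarged step by step, and at each step a residual correction is added. In this sketch the residual maps are given as plain inputs, whereas in the described model they would come from the residual blocks applied to the feature maps. The factor-of-two nearest-neighbour upsampling, the clamping at zero and the function names are assumptions.

```python
def upsample2x(disparity):
    """Nearest-neighbour 2x upsampling. Disparity values are doubled as well,
    because pixel offsets scale with the image width."""
    out = []
    for row in disparity:
        wide = [2.0 * d for d in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def refine(coarse, residual):
    """Combine an upsampled coarse disparity with a residual correction
    of the same (upsampled) size; negative results are clamped to zero."""
    up = upsample2x(coarse)
    return [[max(0.0, u + r) for u, r in zip(up_row, res_row)]
            for up_row, res_row in zip(up, residual)]

def multi_stage(coarsest, residuals):
    """Apply one refine step per residual map (three maps give three steps)."""
    disparity = coarsest
    for residual in residuals:
        disparity = refine(disparity, residual)
    return disparity
```

Working this way keeps the expensive prediction at low resolution and lets each later stage correct only small errors, which is one common reason such schemes are fast.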

Step S4: near-electricity safety distance detection. Based on the obtained high-quality disparity map, the three-dimensional geometric information of the substation operators is recovered; the depth information of the operators and the transformer equipment in the image is calculated by triangulation to form a three-dimensional point cloud, and the distance between the operators and the transformer equipment is finally calculated. The real-time stereo matching process is shown in FIG. 7.
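The triangulation in step S4 can be sketched with the standard relation for a rectified stereo pair, Z = f · B / d, where f is the focal length in pixels, B the baseline and d the disparity. The camera parameters in the example (focal length, baseline, principal point) are made-up values, and the function names are hypothetical; the sketch back-projects pixels to 3-D points and takes the smallest point-to-point distance between two point sets.

```python
import math

def pixel_to_point(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Back-project a left-image pixel to camera coordinates in metres
    for a rectified pair, using Z = f * B / d."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline_m / disparity
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)

def min_distance(points_a, points_b):
    """Smallest Euclidean distance between two 3-D point sets."""
    return min(math.dist(a, b) for a in points_a for b in points_b)

# Made-up camera: f = 1000 px, baseline = 0.1 m, principal point (320, 240).
worker = [pixel_to_point(320, 240, 50, 1000, 0.1, 320, 240)]
equipment = [pixel_to_point(820, 240, 50, 1000, 0.1, 320, 240)]
distance_m = min_distance(worker, equipment)
```

With these numbers both points lie 2 m from the camera and 1 m apart. Since depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant objects than for near ones.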

The embodiment also provides a binocular stereo matching detection device based on a channel attention mechanism, which comprises a binocular camera for image video acquisition, an image preprocessing module, a substation power knowledge graph module, a binocular stereo matching module, a near-electricity safety distance calculation module and an alarm module;

the near-electricity safety distance calculation module recovers the three-dimensional geometric information of the substation operators based on the obtained high-quality disparity map, calculates the depth information of the operators and the transformer equipment in the image by triangulation to form a three-dimensional point cloud, and finally calculates the distance between the operators and the transformer equipment.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

Claims

1. A binocular stereo matching detection method based on a channel attention mechanism is characterized by comprising the following steps:

S3, constructing a real-time binocular stereo matching model based on an attention mechanism and, based on the substation power knowledge graph, using the attention mechanism to perform binocular stereo matching on the live-work image video in the substation power knowledge graph; the real-time binocular stereo matching model based on the attention mechanism comprises an attention module and a parallax optimization module;

S4, detecting the near-electricity safety distance: recovering the three-dimensional geometric information of the substation operators based on the obtained high-quality disparity map, calculating the depth information of the operators and the transformer equipment in the image by triangulation to form a three-dimensional point cloud, and finally calculating the distance between the operators and the transformer equipment.

2. The binocular stereo matching detection method based on the channel attention mechanism as claimed in claim 1, wherein the image preprocessing comprises image rectification, image denoising and intelligent image video analysis.

3. The binocular stereo matching detection method based on the channel attention mechanism according to claim 2, wherein the image rectification comprises: decomposing the power monitoring video into a plurality of image frames; using a rectification algorithm based on contour extraction for image frames with obvious edges; and using a rectification algorithm based on Hough line detection for image frames whose edges are not obvious but whose content is regularly arranged.

4. The binocular stereo matching detection method based on the channel attention mechanism as claimed in claim 2, wherein the image denoising comprises: applying Gaussian filtering to address unclear image frames and noise in the substation image video.

5. The binocular stereo matching detection method based on the channel attention mechanism as claimed in claim 2, wherein the intelligent image video analysis comprises: modeling foreground and background at the pixel level from the decomposed image frames; while the foreground object does not change, building a background model from N consecutive frames using the probability density distribution of the RGB pixel values, and thereby extracting the image of the operator in motion; and, based on the obtained foreground operator image, identifying specific objects by a pattern recognition technique based on shape and color features, and issuing an early warning of a violation if the distance between the operator and the power equipment exceeds a set threshold.

6. The binocular stereo matching detection method based on the channel attention mechanism as claimed in claim 1, wherein a substation power knowledge graph is constructed as follows:

S23, visualizing the improved substation power knowledge base with the Neo4j graph database to form a complete substation power knowledge graph.

7. The binocular stereo matching detection method based on the channel attention mechanism as claimed in claim 1, wherein the attention module first compresses the spatial dimensions of the input feature map by max pooling and average pooling, then down-samples the image frame using two 3 × 3 convolution kernels while keeping the number of channels at 32, and captures local cross-channel correlations to obtain the channel attention feature map F1; the channel attention feature map F1 is multiplied by the feature map to obtain a first fused feature map; for spatial attention, max pooling and average pooling are applied along the channel axis of the first fused feature map, the result is convolved, and an activation function is applied to obtain the spatial attention feature map F2; the spatial attention feature map F2 is then multiplied by the first fused feature map to obtain the final attention feature map.

8. A binocular stereo matching detection device based on a channel attention mechanism, characterized by comprising a binocular camera for image video acquisition, an image preprocessing module, a substation power knowledge graph module, a binocular stereo matching module, a near-electricity safety distance calculation module and an alarm module;

a real-time binocular stereo matching model based on an attention mechanism is built into the binocular stereo matching module, which uses the attention mechanism to perform binocular stereo matching on the live-work image video in the substation power knowledge graph;

9. The binocular stereo matching detection device based on the channel attention mechanism of claim 8, wherein the image preprocessing module comprises an image rectification module, an image denoising module and an intelligent image video analysis module.