CN116912774A - Infrared image target identification method, electronic device and storage medium for power transmission and transformation equipment based on edge computing


Info

Publication number
CN116912774A
CN116912774A (application CN202310926824.0A)
Authority
CN
China
Prior art keywords
power transmission
infrared image
transformation equipment
module
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310926824.0A
Other languages
Chinese (zh)
Inventor
陆剑峰
张可
金炜
王剑
黄文礼
侯仕杰
姜文东
刘爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Nanrui Jiyuan Power Grid Technology Co ltd
State Grid Zhejiang Electric Power Co Ltd
Electric Power Research Institute of State Grid Zhejiang Electric Power Co Ltd
State Grid Electric Power Research Institute
Original Assignee
Anhui Nanrui Jiyuan Power Grid Technology Co ltd
State Grid Zhejiang Electric Power Co Ltd
Electric Power Research Institute of State Grid Zhejiang Electric Power Co Ltd
State Grid Electric Power Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Nanrui Jiyuan Power Grid Technology Co ltd, State Grid Zhejiang Electric Power Co Ltd, Electric Power Research Institute of State Grid Zhejiang Electric Power Co Ltd, State Grid Electric Power Research Institute filed Critical Anhui Nanrui Jiyuan Power Grid Technology Co ltd
Priority to CN202310926824.0A
Publication of CN116912774A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/94Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The application provides an infrared image target identification method for power transmission and transformation equipment based on edge computing, together with an electronic device and a storage medium. The method acquires an infrared image of the power transmission and transformation equipment to be identified and preprocesses and scales it into a first image of fixed size; identifies the first image with a predetermined target identification network model, which is based on the YOLOv7 network framework, deployed in an edge computing node, and comprises a Backbone network and a detection Head; and acquires the identification result of the first image, namely the target prediction boxes generated by the model. Compared with the prior art, the improved CA module adds a residual structure, which improves information retention while enriching the feature representation of the corresponding object, thereby providing more effective features for the Head's predictions; in addition, a CoT module is added to each of the three detectors to learn the contextual information of the input features and improve the final classification accuracy.

Description

Infrared image target identification method, electronic device and storage medium for power transmission and transformation equipment based on edge computing
Technical Field
The application relates to the technical field of target identification, and in particular to an infrared image target identification method for power transmission and transformation equipment based on edge computing, an electronic device, and a storage medium.
Background
For the identification of power transmission and transformation equipment, it is difficult for traditional algorithms to design a general model that extracts the features of different devices, and recognition accuracy for complex objects is greatly constrained by differences in shooting distance and angle of the infrared images, illumination, and complex background interference. In addition, traditional algorithms consume a significant amount of time, which hinders rapid and accurate completion of subsequent analyses of the equipment.
Current deep-learning methods mainly identify and detect power transmission and transformation equipment with models such as the YOLO series or R-CNN (Region-based Convolutional Neural Networks). Although R-CNN models achieve higher precision, they split the network into two stages with cumbersome steps; moreover, the many candidate regions generated when processing a picture introduce redundancy, lengthen completion time, and increase the difficulty of extracting candidate boxes in complex scenes, so such methods cannot satisfy some real-time scenarios well.
Disclosure of Invention
In view of the above, an object of the embodiments of the present application is to provide a method that builds an identification model by adding an improved CA module and a CoT module to the YOLOv7 network, improving the network's representation capability and the classification accuracy for power transmission and transformation equipment. A model trained with this network can complete the infrared image target recognition task for power transmission and transformation equipment well, so that the state of the equipment is monitored in real time and both the operational reliability of the equipment and the utilization of resources are improved.
The first aspect of the application provides an infrared image target identification method for power transmission and transformation equipment based on edge computing, comprising the following steps:
acquiring an infrared image of the power transmission and transformation equipment to be identified, and preprocessing and scaling it to obtain a first image of fixed size;
identifying the first image according to a target identification network model deployed in an edge computing node; the predetermined target identification network model is based on the YOLOv7 network framework and comprises a Backbone network and a detection Head; the Backbone contains two CA modules, each located after a convolution module Conv; a CoT module is added to each of the three detection modules in the Head, and the CoT module fuses the feature maps of the static and dynamic context representations as its output;
acquiring an identification result of the first image, the identification result being the target prediction boxes generated by the predetermined target identification network model.
Preferably, the CA module is configured to perform one-dimensional horizontal global pooling and one-dimensional vertical global pooling on the input feature information, and then, through channel concatenation, two-dimensional convolution, normalization, and activation, map the result into (0, 1) so that the Re-weight module can assign weights to the residual-connected information.
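The residual CA block described above can be sketched in PyTorch as follows. This is a minimal reconstruction from the text: the channel-reduction ratio, the SiLU activation, and the exact placement of the residual addition are assumptions, not the patent's definitive implementation.

```python
import torch
import torch.nn as nn

class ResidualCA(nn.Module):
    """Coordinate Attention with an added residual connection (sketch)."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)  # reduction ratio is an assumption
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # 1-D horizontal global pooling
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # 1-D vertical global pooling
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.SiLU()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        xh = self.pool_h(x)                      # (n, c, h, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)  # (n, c, w, 1)
        y = torch.cat([xh, xw], dim=2)           # concatenate the two pooled maps
        y = self.act(self.bn(self.conv1(y)))     # conv + normalization + activation
        yh, yw = torch.split(y, [h, w], dim=2)
        yw = yw.permute(0, 1, 3, 2)
        ah = torch.sigmoid(self.conv_h(yh))      # weights mapped into (0, 1)
        aw = torch.sigmoid(self.conv_w(yw))
        return x + x * ah * aw                   # re-weight plus residual connection
```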
Preferably, the CoT module is configured to map the static context representation of the input data X to K1, generate a new feature map K2 according to the contextualized attention matrix A, and fuse the static context K1 and the dynamic context K2 through an attention mechanism to produce the CoT module's output.
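A simplified sketch of such a CoT block is given below. The grouped 3×3 key embedding and the final fusion step are simplifying assumptions (the fusion here is a plain sum, whereas the patent fuses K1 and K2 through an attention mechanism):

```python
import torch
import torch.nn as nn

class CoT(nn.Module):
    """Simplified Contextual Transformer block (sketch)."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.key_embed = nn.Sequential(  # static context K1 from local 3x3 neighborhood
            nn.Conv2d(channels, channels, kernel_size,
                      padding=kernel_size // 2, groups=4, bias=False),
            nn.BatchNorm2d(channels), nn.SiLU())
        self.value_embed = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels))
        self.attn = nn.Sequential(  # contextualized attention A from [K1, query]
            nn.Conv2d(2 * channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels), nn.SiLU(),
            nn.Conv2d(channels, channels, 1))

    def forward(self, x):
        k1 = self.key_embed(x)                    # static context K1
        v = self.value_embed(x)
        a = self.attn(torch.cat([k1, x], dim=1))  # the query is x itself
        w = torch.softmax(a.flatten(2), dim=-1).view_as(a)
        k2 = w * v                                # dynamic context K2
        return k1 + k2                            # simplified fusion of K1 and K2
```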
Preferably, the predetermined target identification network model further comprises YOLOv7's own modules: ELAN, ELAN-H, MPConv, SPPCSPC, and RepConv. ELAN consists of multiple convolution layers; the spatial size of the input and output features remains unchanged, the number of channels changes after the first two convolutions, and the last convolution outputs the required number of channels. ELAN-H is likewise composed of multiple convolutions, but additionally takes the outputs of the two preceding convolutions into account at ELAN's final Concat.
the number of channels of input and output of the MPConv module is equal, but the size of the output size is half of the input size, the size of the upper half is halved through MaxPool, the number of channels is halved through convolution at the lower half, the size is halved through convolution with a step length of 2, and the upper and lower parts are combined through a cat to obtain output;
the network structure of SPPCSPC is mainly composed of a convolution layer and MaxPool. The output layer channel of the whole SPPCSPC layer is out_c, and a hidden layer channel hidden_c=int (2×e×out_c) is calculated in training, which is used for expanding information quantity, and generally taking e=0.5, then hidden_c=out_c.
Preferably, RepConv uses different structures in training and inference via a model re-parameterization technique. During training, RepConv consists of a 3×3 convolution branch and a 1×1 convolution branch; if the input and output channel counts and sizes are identical, a branch containing only a BN layer is added, and the outputs of the three branches are summed. During inference, to improve efficiency, the branch parameters are re-parameterized into the main branch, and the output of the single 3×3 main-branch convolution is taken.
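The re-parameterization step can be illustrated as follows. This is a RepVGG-style sketch showing only how the 1×1 and identity branches fold into the 3×3 kernel; BN folding is omitted for brevity, so it is not the full RepConv fusion.

```python
import torch
import torch.nn.functional as F

def reparameterize(w3, w1, channels):
    """Fold a 1x1 branch and an identity branch into one 3x3 kernel."""
    # Pad the 1x1 kernel to 3x3 so the two branches can be summed.
    w1_padded = F.pad(w1, [1, 1, 1, 1])
    # The identity branch equals a 3x3 kernel with 1 at the center
    # of each channel's own position.
    w_id = torch.zeros_like(w3)
    for c in range(channels):
        w_id[c, c, 1, 1] = 1.0
    return w3 + w1_padded + w_id

w3 = torch.randn(8, 8, 3, 3)   # 3x3 branch weights
w1 = torch.randn(8, 8, 1, 1)   # 1x1 branch weights
w = reparameterize(w3, w1, 8)

x = torch.randn(1, 8, 16, 16)
y_multi = F.conv2d(x, w3, padding=1) + F.conv2d(x, w1) + x  # three branches
y_fused = F.conv2d(x, w, padding=1)                         # one branch
assert torch.allclose(y_multi, y_fused, atol=1e-4)          # identical output
```

The equivalence holds by linearity of convolution, which is why inference can keep only the single 3×3 branch.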
Preferably, the method further comprises a training process for the predetermined target identification network model. The training process comprises: generating a training set and a test set from a plurality of infrared images of power transmission and transformation equipment; and setting precision, recall, and IoU as performance indicators, training and optimizing the model against a preset identification accuracy and the IoU between prediction boxes and ground-truth boxes, so that the prediction boxes closely overlap the labeled boxes of the real equipment.
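For reference, the IoU metric used above between a prediction box and a labeled box, with boxes given as (x1, y1, x2, y2), is:

```python
# Intersection over Union between two axis-aligned boxes (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```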
Further, a second aspect of the present application provides an electronic device comprising one or more processors and a memory storing one or more computer programs, wherein the computer programs are configured to be executed by the one or more processors and comprise steps for performing the edge-computing-based infrared image target identification method for power transmission and transformation equipment according to the first aspect.
Further, a third aspect of the present application provides a storage medium storing a computer program; the program is loaded and executed by a processor to implement the steps of the edge-computing-based infrared image target identification method for power transmission and transformation equipment according to the first aspect.
In the scheme of the application, an infrared image of the power transmission and transformation equipment to be identified is acquired, preprocessed, and scaled into a first image of fixed size; the first image is identified with a predetermined target identification network model, which is based on the YOLOv7 network framework and comprises a Backbone network and a detection Head; the Backbone contains two CA modules, each located after a convolution module Conv; a CoT module is added to each of the three detection modules in the Head and fuses the feature maps of the static and dynamic context representations as output; and the identification result of the first image, namely the target prediction boxes generated by the model, is obtained. Compared with the prior art, the proposed improved YOLOv7 infrared image target recognition algorithm for power transmission and transformation equipment, CACoT-YOLOv7, adds an improved CA module to the Backbone of YOLOv7; compared with the original CA module, a residual structure is added, which improves information retention while enriching the feature representation of the corresponding object, thereby providing more effective features for the Head's predictions. In addition, a CoT module is added to each of the three detectors to learn the contextual information of the input features and improve the final classification accuracy, realizing high-precision target identification of power transmission and transformation equipment such as insulators, conductors, fittings, and transformer bushings, and improving resource utilization and the automation and intelligence level of the substation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the overall structure of a target recognition network model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a CA module according to an embodiment of the application;
FIG. 3 is a schematic diagram of the structure of a CoT module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of performance metrics disclosed in an embodiment of the present application;
FIG. 5 is a schematic representation of a predicted image of a target recognition network model disclosed in an embodiment of the present application;
FIG. 6 is a schematic diagram of an edge computing architecture according to an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It should be noted that: references herein to "a plurality" means two or more.
The implementation details of the technical scheme of the embodiment of the application are described in detail below:
noun interpretation of the present embodiment:
target detection model (YouOnly Look Oncev, YOLOv 7)
Power transmission and transformation equipment
Infrared Image (IR)
Target Detection (Object Detection)
Deep Learning (DL)
China's rapid development has driven continuous growth in electricity demand, while power equipment now covers every region of the country, raising the probability of equipment failure. With accidents caused by the power system occurring repeatedly, ever more importance is attached to the safety and degree of automation of China's power equipment. Power transmission and transformation equipment is an important link in the power system, and its failures mainly manifest as mechanical damage, temperature rise, and local electric-field changes. Traditional equipment monitoring requires experienced manual work, is costly, is prone to large deviations, and increases the difficulty of intelligent power inspection; non-contact equipment monitoring can effectively solve these problems. Non-contact monitoring mainly analyzes equipment condition through infrared images, has a wide temperature-measurement range and high accuracy, and is widely applied to the monitoring of power transmission and transformation equipment.
With the continuous development of infrared diagnosis technology, infrared techniques are applied very widely to temperature-related problems of power transmission and transformation equipment, and a device's heating problems can be found intuitively from its infrared image. Meanwhile, as target detection technology matures, combining processed infrared images of the equipment with a target detection algorithm enables rapid fault diagnosis. Deep-learning-based target detection is now applied in many image recognition fields; applying it to the detection of power transmission and transformation equipment can greatly improve recognition precision and speed and support subsequent analyses of the equipment, ensuring safe and stable operation and further raising the safety factor of substation work. Therefore, identifying power transmission and transformation equipment in substations with deep-learning-based target detection has obvious advantages: a detection model trained on infrared image data can automatically identify large numbers of substation infrared images, reducing cost and effectively securing the substation's safety, social, and economic benefits.
Existing detection algorithms for power transmission and transformation equipment fall into two classes: traditional methods and deep-learning-based methods. Traditional target detection algorithms mainly include Cascade + HOG, DPM + Haar, and the like, and many detection algorithms are improvements on these. In general, an algorithm first extracts the edge information of the various devices, i.e., performs feature extraction, and the extracted features are then classified into different devices by algorithms such as the Support Vector Machine (SVM) and Adaptive Boosting. Deep-learning-based methods mainly adopt YOLO-series models or R-CNN algorithms. R-CNN generates many small regions on the image with a segmentation algorithm and merges them according to color and structural characteristics to obtain candidate boxes; after unifying the picture sizes, target detection and classification are performed by a convolutional neural network and fully connected layers. YOLO-series algorithms typically divide the input picture into a 7×7 grid, predict each grid cell with two bounding boxes, and finally filter the bounding-box predictions with non-maximum suppression to obtain the final prediction boxes. Either way, a deep-learning method is generally trained on infrared images annotated with the various equipment information, or first pre-trained on other datasets and then fine-tuned by transfer learning; after training, a model is obtained that can identify and locate power transmission and transformation equipment more accurately and rapidly.
For the identification of power transmission and transformation equipment, it is difficult for traditional algorithms to design a general model that extracts the features of different devices, and recognition accuracy for complex objects is greatly constrained by differences in shooting distance and angle of the infrared images, illumination, and complex background interference. In addition, traditional algorithms consume a significant amount of time, which hinders rapid and accurate completion of subsequent analyses of the equipment. Current deep-learning methods mainly identify and detect power transmission and transformation equipment through YOLO-series or R-CNN models; although R-CNN achieves higher precision, it splits the network into two stages with cumbersome steps, and the many candidate regions generated when processing pictures introduce redundancy, lengthen completion time, and increase the difficulty of extracting candidate boxes in complex scenes, so it cannot satisfy some real-time scenarios well. The YOLO series completes detection and classification directly with a single network, without splitting it into two stages: relative to YOLOv1, YOLOv2 adds an anchor-box regression mechanism and uses the new Darknet-19 network structure; relative to YOLOv2, YOLOv3 adds residual networks and performs target detection with multi-scale feature fusion. YOLOv4 does not differ greatly from YOLOv3 in essence, mainly introducing CSP blocks (CSP modules) to improve accuracy. YOLOv5, built on the PyTorch framework, is faster and its configuration files are more user friendly. YOLOv7 uses an efficient aggregation network combined with re-parameterization to accelerate the network while guaranteeing model performance, exceeding previously developed detectors in both speed and accuracy across the range of 5 FPS to 160 FPS.
In addition, some work combines the YOLO series with attention mechanisms, including the Squeeze-and-Excitation block (SE), the Bottleneck Attention Module (BAM), and the Convolutional Block Attention Module (CBAM), to improve the recognition capability of the network. However, the SE mechanism only considers encoding inter-channel information and ignores the importance of positional information, which is critical for capturing target structure in visual tasks. The BAM and CBAM mechanisms attempt to exploit positional information by reducing the channel dimension of the input tensor and then computing spatial attention with convolution; however, convolution can only capture local relationships and cannot model long-range dependencies. The Coordinate Attention (CA) mechanism embeds positional information into channel attention, enabling mobile networks to attend to larger areas while avoiding significant computational overhead and improving classification accuracy. The Contextual Transformer (CoT) is a self-attention mechanism that, compared with the self-attention used in the Transformer, can exploit the contextual information between input keys to guide the learning of a dynamic attention matrix, thereby enhancing visual representation capability.
However, in practical applications, as infrared images of power transmission and transformation equipment are input continuously, the computing demands placed on the server keep growing. Edge computing technology arose to remedy the shortcomings of the traditional data-processing mode, such as high latency and insufficient real-time data-analysis capability. Edge computing integrates core network, computing, storage, and application capabilities at the network edge close to the object or data source, providing edge intelligent services nearby. Simply put, edge computing analyzes the data collected from terminals directly on a local device or network close to where the data are generated, without transmitting them to a cloud data-processing center.
In order to realize high-precision target recognition of power transmission and transformation equipment such as insulators, conductors, fittings, and transformer bushings, this embodiment proposes an infrared image target recognition algorithm for power transmission and transformation equipment based on an improved YOLOv7, called CACoT-YOLOv7. On the YOLOv7 network framework, an improved CA module and a CoT module are added: the improved CA module improves the representation capability of the Backbone part, and the CoT module uses the features provided by the Backbone to complete self-attention learning during Head prediction. Combining the two attention mechanisms with YOLOv7 improves the network's classification and identification capability and enables accurate localization and identification of the various power transmission and transformation devices in the infrared image. In addition, deploying the trained model into edge computing nodes effectively reduces deployment cost and improves information security.
Fig. 1 is a schematic diagram of the overall structure of the object recognition network model according to the present embodiment.
Specifically, in terms of overall structure, apart from the Input, the network is mainly split into two parts: Backbone and Head. The Backbone extracts features, and the Head makes predictions. The Conv module in the Backbone consists of two groups of Conv+BN+SiLU, which are identical in every parameter except the stride of the Conv layer. Conv+BN+SiLU denotes a convolution layer, a normalization layer, and a SiLU activation layer.
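The Conv block just described maps directly to PyTorch; the kernel size and channel numbers in this sketch are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Conv2d + BatchNorm + SiLU, the Conv block described in the text."""
    def __init__(self, in_c, out_c, k=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_c, out_c, k, stride, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_c)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Two groups in a row, identical except for the stride of the Conv layer.
stem = nn.Sequential(Conv(3, 32, stride=1), Conv(32, 32, stride=2))
```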
In terms of flow, the picture is first resized to a fixed size and input into the Backbone network; the Head then continues to operate on three feature maps of different sizes from the Backbone; finally, after passing through the RepConv modules, the prediction boxes are produced as the final output. Here, size refers to the width and height of a feature map, and resizing refers to adjusting the input image size.
The Backbone part is provided with two improved CA modules, positioned behind the two Conv modules. Since the CA modules improve the characterization capability of the feature maps, and the feature maps generated by the two Conv modules have different scales and depths, the two CA modules act on the feature maps of the input image at different scales, improving the effectiveness of the spatial structure in the subsequent network. A CoT module is added to each of the three detection modules in the Head part; the CoT module takes the fusion of static and dynamic context representations as output, so that the context information of the feature maps at different scales can be utilized more fully, improving classification precision.
A first aspect of the present embodiment provides a method for identifying an infrared image target of a power transmission and transformation device based on edge calculation, where the method includes:
s1, acquiring an infrared image of power transmission and transformation equipment to be identified, and performing preprocessing and scaling to obtain a first image with a fixed size;
specifically, by adapting the picture resize to a fixed size of the recognition model, such as 640×480.
S2, identifying the first image according to a target identification network model deployed in the edge computing node.
Specifically, in this embodiment, the predetermined target recognition network model is based on the network framework of YOLOv7 and includes a Backbone network and a detection Head. The Backbone comprises two CA modules, positioned behind two convolution modules Conv. A CoT module is added to each of the three detection modules in the Head part; the CoT module fuses the static and dynamic context representations of the feature maps as output. CA stands for Coordinate Attention, and CoT stands for Contextual Transformer.
The model generated by the method is deployed in an edge computing node. Edge computing means that data acquired from terminals is analyzed directly in local equipment or in the network close to where the data is generated; here, the analysis is the model's prediction of the various power transmission and transformation devices in the input infrared image. Deploying the trained model into edge computing nodes realizes low-delay, low-cost and easily extensible local identification of power transmission and transformation equipment.
(a) CA module
Preferably, the CA module is configured to perform one-dimensional horizontal global pooling and one-dimensional vertical global pooling on the input feature information, and then map the result into (0, 1) via channel number addition, two-dimensional convolution, normalization and activation operations, so that the Re-weight module can assign weights to the Residual connection information.
In this embodiment, one-dimensional horizontal global pooling and one-dimensional vertical global pooling are operations on the input, and Residual connection information is only used for final weight allocation. The purpose of the channel number addition, two-dimensional convolution, normalization and activation operations is to accomplish feature screening through the attention mechanism. The Sigmoid layer completes the final mapping.
Turning to the structural details of the network, the improved CA module is denoted CA in Fig. 1. The structure of the CA module is shown in Fig. 2.
As shown in Fig. 2, X Avg Pool and Y Avg Pool denote one-dimensional horizontal and vertical global pooling, respectively. Two pooling layers are used because global pooling, as commonly used in channel attention, encodes spatial information globally but compresses it into a single channel descriptor, making it difficult to preserve location information, which is critical for identifying power transmission and transformation equipment. To encourage the attention block to capture long-range spatial interactions with accurate location information, the global pooling is decomposed into a pair of one-dimensional feature encoding operations, namely X Avg Pool and Y Avg Pool.
Specifically, for input X, pooling kernels of size (H, 1) and (1, W) are used to encode each channel along the horizontal and vertical coordinates, respectively. Thus, the output of the c-th channel at height h can be expressed as:

z_c^h(h) = (1/W) Σ_{0≤i<W} x_c(h, i) (1)

Likewise, the output of the c-th channel at width w can be written as:

z_c^w(w) = (1/H) Σ_{0≤j<H} x_c(j, w) (2)

where z_c^h and z_c^w represent the outputs of the c-th channel, and x_c represents the c-th channel of the input X. The two transforms aggregate features along the two spatial directions respectively, generating a pair of direction-aware feature maps.
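The two directional pooling operations reduce a (C, H, W) feature map to a per-row and a per-column descriptor. A minimal NumPy sketch of this step (the function name is assumed for illustration):

```python
import numpy as np

def coordinate_pool(x):
    """Directional global pooling used by the CA module.

    x: feature map of shape (C, H, W).
    Returns z_h of shape (C, H), the mean over the width axis
    (output of channel c at each height h), and z_w of shape (C, W),
    the mean over the height axis (output of channel c at each width w)."""
    z_h = x.mean(axis=2)  # X Avg Pool: kernel spanning the full width
    z_w = x.mean(axis=1)  # Y Avg Pool: kernel spanning the full height
    return z_h, z_w
```

Unlike a single global average pool, the pair of outputs keeps the row and column indices, so location information survives into the attention weights.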
Concat is the addition of channel numbers, Conv2d is a two-dimensional convolution, BatchNorm+Non-linear is normalization plus an activation function, and Sigmoid maps the output into (0, 1) so that the Re-weight module can assign weights to the Residual connection information. Compared with the original CA module, the improved CA module adds two residual connections, which avoids information loss in the Concat and BatchNorm layers. The Add operation is a simple pixel-wise superposition: it increases the amount of information carried under the image features without increasing the dimensionality of the description, which clearly benefits the final classification.
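The final re-weighting step can be sketched as follows. This simplified version omits the Concat, Conv2d and BatchNorm stack and shows only how the Sigmoid-mapped attention vectors, broadcast along the two directions, rescale the Residual (identity) input; all names are illustrative:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def ca_reweight(x, a_h, a_w):
    """Re-weight step of the improved CA module (simplified sketch).

    x:   input feature map carried by the residual path, shape (C, H, W)
    a_h: per-row attention logits,    shape (C, H)
    a_w: per-column attention logits, shape (C, W)"""
    g_h = sigmoid(a_h)[:, :, None]   # gates in (0, 1), broadcast over width
    g_w = sigmoid(a_w)[:, None, :]   # gates in (0, 1), broadcast over height
    return x * g_h * g_w             # Residual information weighted per location
```

With zero logits both gates are 0.5, so every location is scaled by 0.25; trained logits instead emphasize the rows and columns where targets sit.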
(b) CoT module
In this embodiment, the CoT module is configured to map the static context representation of the input data X to K1 and to generate a new feature map K2 according to the contextualized attention matrix A; the static context K1 and the dynamic context K2 are then fused by the attention mechanism to generate the output of the CoT module.
The added CoT module is denoted as CoT in fig. 1. The construction of the CoT module is shown in fig. 3.
As shown in Fig. 3, * represents a local matrix multiplication operation. In terms of flow, let the two-dimensional feature map of the input be X; the keys (K), queries (Q) and values (V) are defined as:
K=X (3)
Q=X (4)
V=XW_v (5)
where W_v is an embedding matrix, and Key, Query and Value are the elements of the keys, queries and values, respectively. As shown in the flow of Fig. 3, KeyMap maps the representation of each key to K1 by a k×k group convolution; K1 is the static context representation of the input X. After that, K1 and the Query undergo a Concat operation to realize addition of the channel numbers. The attention matrix is then computed as:
A=[K1,Q]W_θW_δ (6)
where K1 is the key map, W_θ is a 1×1 convolution with a ReLU activation function, W_δ is a 1×1 convolution without an activation function, and A is the output attention matrix. In other words, for each head of multi-head attention, the local attention matrix at each spatial location of A is learned from the Query features and the contextualized Key features, rather than from isolated Query–Key pairs. This enhances self-attention learning under the guidance of the mined static context K1. Thereafter, a new feature map K2 is generated based on the contextualized attention matrix A:
K2=V*A (7)
It can be seen that the feature map K2 captures the dynamic feature interactions between inputs, so K2 is called the dynamic context representation of the input. Finally, the output of the CoT module is generated by fusing the static context K1 and the dynamic context K2 through the attention mechanism.
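A much-simplified NumPy sketch of this flow is given below. The k×k group convolution producing K1 is approximated by a local mean along the flattened spatial axis, the two 1×1 convolutions of eq. (6) become plain matrix multiplications, the local matrix multiplication of eq. (7) is reduced to a per-channel softmax weighting, and the final fusion is shown as a simple sum. All function and parameter names are illustrative, not part of the embodiment:

```python
import numpy as np

def softmax(t, axis=-1):
    e = np.exp(t - t.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cot_block(x, w_v, w_theta, w_delta, k=3):
    """Simplified CoT sketch over a flattened feature map.

    x: (N, C) features (N spatial positions); w_v: (C, C) embedding W_v;
    w_theta: (2C, C) and w_delta: (C, C) stand in for the two 1x1
    convolutions of eq. (6)."""
    v = x @ w_v                                   # eq. (5): V = X W_v
    # static context K1: k x k group conv approximated by a local mean
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    k1 = np.stack([xp[i:i + k].mean(axis=0) for i in range(len(x))])
    # eq. (6): A = [K1, Q] W_theta W_delta, with ReLU after W_theta
    a = np.maximum(np.concatenate([k1, x], axis=1) @ w_theta, 0) @ w_delta
    # eq. (7): dynamic context K2 = V * A, here a per-channel weighting
    k2 = v * softmax(a, axis=0)
    return k1 + k2   # fusion of static and dynamic context (simplified)
```

The real module uses grouped convolutions and multi-head local attention; this sketch only mirrors the data flow K, Q, V → K1 → A → K2 → fused output.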
(c) YOLOv7 self-contained module
In this embodiment, the YOLOv7 self-contained modules include ELAN, ELAN-H, MPConv, SPPCSPC and RepConv. ELAN is composed of multiple convolution layers; the size of the input and output features remains unchanged, the number of channels changes after the first two convolutions, and the output channel count is set as needed by the last convolution. It uses expansion, random grouping and cardinality merging to continuously enhance the learning ability of the network without disrupting the original gradient path.
In terms of architecture, ELAN-H only changes the architecture of the computation block, while the architecture of the transition layer is completely unchanged; its strategy is to use group convolution to expand the channels and cardinality of the computation block. In addition, for a given computation layer, it applies the same set of parameters across all computation blocks and channels. Beyond preserving the traditional ELAN design, ELAN-H can also guide different groups of computation blocks to learn more diverse features. ELAN-H is likewise made up of multiple convolution layers and operates essentially the same as ELAN, except that the final Concat also takes into account the results of the previous two convolutions; Concat is the addition of channel numbers. The size of its input and output features remains unchanged; the number of channels is halved after the first two convolutions, and the final channel count is twice the input channel count. Its function is similar to that of ELAN.
The MPConv module has equal input and output channel counts, but the output size is half the input size. In the upper branch, the size is halved by MaxPool; in the lower branch, the channel count is first halved by a convolution, then the size is halved by a convolution with a stride of 2; the two branches are then merged by a Concat to obtain the output. MaxPool is a max pooling operation.
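The upper branch of MPConv can be illustrated with a plain NumPy max-pooling routine that halves the spatial size while keeping the channel count (the helper name is an assumption):

```python
import numpy as np

def maxpool2x2(x):
    """2x2 max pooling with stride 2 -- the upper branch of MPConv.

    x: feature map of shape (C, H, W) with even H and W.
    Returns shape (C, H//2, W//2): each output value is the maximum
    of one non-overlapping 2x2 window."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))
```

The lower branch achieves the same size halving with a stride-2 convolution, so the Concat of the two branches restores the original channel count at half the resolution.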
The network structure of SPPCSPC is mainly composed of convolution layers and MaxPool. Given the output channel count out_c of the whole SPPCSPC layer, a hidden layer channel count hidden_c = int(2×e×out_c) is calculated during training to expand the amount of information; generally e = 0.5 is taken, so hidden_c = out_c.
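The hidden-width rule above can be checked directly (the helper name is illustrative):

```python
def sppcspc_hidden_channels(out_c, e=0.5):
    """Hidden channel count of the SPPCSPC layer during training:
    hidden_c = int(2 * e * out_c). With the usual e = 0.5 the hidden
    width equals the output width."""
    return int(2 * e * out_c)
```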
RepConv uses different structures in training and inference, employing model re-parameterization techniques. Re-parameterization can be regarded as an integration technique and can be divided into module-level and model-level integration. There are two common practices for model-level re-parameterization to obtain the final inference model: one is to train multiple identical models with different training data and then average the weights of the trained models; the other is to take a weighted average of the model weights at different iterations. Module-level re-parameterization splits a module into multiple identical or different branches during training and integrates the branches into one fully equivalent module during inference. Specifically, in this network structure, the training-time module consists of a 3×3 convolution branch and a 1×1 convolution branch; if the input and output channel counts and sizes are consistent, a BN-only branch is added, and the three branches are summed to produce the output. At inference time, to improve efficiency, the branch parameters are re-parameterized into the main branch, and only the 3×3 main branch convolution output is used.
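The module-level re-parameterization of RepConv can be demonstrated on a single-channel toy case: padding the 1×1 kernel into the centre of the 3×3 kernel and adding an identity (delta) kernel yields one fused 3×3 convolution whose output equals the sum of the three training-time branches. BN folding is omitted for brevity, and all names are illustrative:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 'same' convolution (cross-correlation),
    zero-padded so the spatial size is preserved."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

def fuse_repconv(k3, k1):
    """Fuse the 3x3 branch, the 1x1 branch and the identity branch
    into one equivalent 3x3 kernel (BN scaling omitted)."""
    fused = k3.copy()
    fused[1, 1] += k1[0, 0]   # 1x1 kernel padded to the 3x3 centre
    fused[1, 1] += 1.0        # identity branch == centred delta kernel
    return fused
```

Because convolution is linear in the kernel, the fused kernel reproduces the branch sum exactly, which is why inference can keep only the 3×3 main branch.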
S3, acquiring a recognition result of the first image; and the identification result is a target prediction frame generated by a predetermined target identification network model.
Preferably, the method further comprises a training process for the predetermined target recognition network model. The training process comprises: generating a training set and a test set from a plurality of infrared images of power transmission and transformation equipment; selecting accuracy, recall and mAP as performance indexes; and training and optimizing the model based on a preset identification accuracy and the IOU between the prediction boxes and the real boxes, so that the prediction boxes closely coincide with the annotation boxes of the actual equipment.
Here, the indexes are calculated on the trained model. Accuracy (precision) refers to the proportion of true positives among all samples judged positive. Recall refers to the proportion of all true positives that are judged positive. IOU is the intersection over union, representing the degree to which the real and predicted boxes overlap. The number of training rounds is generally chosen by watching the convergence of the model; training can end when the indexes no longer change significantly.
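The IOU between two axis-aligned boxes can be computed as follows (a standard formulation; the function name is illustrative):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A prediction box whose IOU with an annotation box exceeds the chosen threshold (0.5 for mAP@0.5) counts as a correct detection.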
In terms of experimental flow, the method mainly comprises three steps: dataset preparation, training, and model testing. Specifically, the dataset is made by annotating power transmission and transformation equipment such as insulators, conducting wires, hardware fittings and transformer bushings in the infrared images, and dividing the annotated dataset proportionally into a training set and a validation set. In the training step, the dataset is fed into the CACoT-YOLOv7 network with the target classes and model parameters set; the best-performing model after training is used for testing. The final test uses unlabeled pictures, and the accuracy of the model is obtained by comparing the drawn predicted boxes with the actual boxes.
Our experiment used 5821 infrared images, with 5182 pictures as the training set and 639 pictures as the test set. In training, the epoch count was set to 170; the performance indexes are shown in Fig. 4, with a precision between 86% and 87%. mAP@0.5 is the mean AP over all classes at an IOU (Intersection over Union) threshold of 0.5, where IOU is the ratio of the intersection to the union of the predicted and true boxes; mAP@0.5 eventually stabilizes around 0.77. AP denotes average precision, and mAP denotes the AP averaged over all classes.
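Precision and recall follow directly from the true-positive, false-positive and false-negative counts at a given IOU threshold. A minimal sketch (the counts used below are illustrative, not the experiment's):

```python
def precision_recall(tp, fp, fn):
    """Precision: share of detections that are true positives.
    Recall: share of ground-truth objects that were detected.
    A detection counts as a true positive when its IOU with a
    ground-truth box exceeds the threshold (0.5 for mAP@0.5)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

mAP@0.5 is then obtained by sweeping the confidence threshold to trace a precision–recall curve per class, averaging the area under it, and averaging over classes.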
The predicted images of the final model are shown in Fig. 5. It can be seen that the model trained with the CACoT-YOLOv7 network effectively identifies various power transmission and transformation devices, and the prediction boxes it generates coincide closely with the annotation boxes of the actual equipment.
Wherein the actual application of the model is deployed into the edge computing nodes. Edge computing is a distributed open platform that provides edge intelligence services nearby by fusing network, computing, storage, application core capabilities on the network edge side near the object or data source. The edge computing architecture is shown in fig. 6, in which the terminal node is a device for capturing an infrared image, and mainly completes the functions of collecting and uploading the original data. The edge computing node realizes basic service response by reasonably deploying and allocating computing and storage capacities of the network edge side nodes. The network node is responsible for uploading the useful data processed by the edge computing node to the cloud computing node for analysis and processing. The cloud computing node permanently stores the reported data of the edge computing layer in the cloud computing node, and meanwhile, the analysis task which cannot be processed by the edge computing node and the processing task of the comprehensive global information still need to be completed in the cloud computing node. In addition, the cloud computing node can dynamically adjust the deployment strategy and algorithm of the edge computing layer according to the network resource distribution.
According to this embodiment, CACoT-YOLOv7-based infrared image target identification of power transmission and transformation equipment can accurately identify various power transmission and transformation devices. Compared with traditional methods and existing deep-learning network models, adding the improved CA module and the CoT module to the YOLOv7 network improves the representation capability of the network and the classification precision for power transmission and transformation equipment. The model trained on this network completes the infrared image target recognition task well, enabling real-time monitoring of the state of power transmission and transformation equipment and improving its operating reliability and resource utilization, which is of great significance for reducing maintenance cost.
Further, a second aspect of the present embodiment provides an electronic device including: one or more processors, memory for storing one or more computer programs; wherein the computer program is configured to be executed by the one or more processors, the program comprising method steps for performing the edge calculation based power transmission and transformation device infrared image target identification method as described in the first aspect.
Further, a third aspect of the present embodiment provides a storage medium storing a computer program; the program is loaded and executed by a processor to implement the method steps of the infrared image target recognition method of the power transmission and transformation equipment based on edge calculation according to the first aspect.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The elements described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a grid device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing description of the embodiments is provided to illustrate the general principles of the application and is not intended to limit the application to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (8)

1. An infrared image target identification method of power transmission and transformation equipment based on edge calculation is characterized by comprising the following steps:
acquiring an infrared image of power transmission and transformation equipment to be identified, and performing preprocessing and scaling to obtain a first image with a fixed size;
identifying the first image according to a target identification network model deployed in an edge computing node; the predetermined target identification network model is based on the network framework of YOLOv7 and comprises a Backbone network and a detection Head; the Backbone comprises two CA modules, which are positioned behind two convolution modules Conv; a CoT module is added to each of the three detection modules in the detection Head part, and the CoT module fuses the feature maps of the static and dynamic context representations as output;
acquiring a recognition result of the first image; and the identification result is a target prediction frame generated by a predetermined target identification network model.
2. The method for recognizing infrared image targets of power transmission and transformation equipment based on edge calculation according to claim 1, wherein the CA module is configured to perform one-dimensional horizontal global pooling and one-dimensional vertical global pooling on input feature information, and then map the input between (0, 1) based on channel number addition, two-dimensional convolution, normalization and activation operations to facilitate weight distribution of Residual connection information by Re-weight module.
3. The method for infrared image target recognition of a power transmission and transformation device based on edge calculation according to claim 2, wherein the CoT module is configured to map the static context representation of the input data X to K1 and to generate a new feature map K2 according to the contextualized attention matrix A; the static context K1 and the dynamic context K2 are fused by the attention mechanism to generate the output of the CoT module.
4. The infrared image target recognition method of power transmission and transformation equipment based on edge calculation according to claim 3, wherein the predetermined target recognition network model further comprises YOLOv7 self-contained modules; the YOLOv7 self-contained modules comprise ELAN, ELAN-H, MPConv, SPPCSPC and RepConv; the ELAN is composed of a plurality of convolution layers, the size of the input and output features remains unchanged, the number of channels changes after two convolutions, and the channels are output as needed after the last convolution; ELAN-H is also made up of multiple convolution layers, and its operation differs in that the final Concat also takes into account the results of the previous two convolutions;
the input and output channel counts of the MPConv module are equal, but the output size is half the input size; in the upper branch the size is halved by MaxPool, and in the lower branch the channel count is first halved by a convolution and the size is then halved by a convolution with a stride of 2; the two branches are merged by a Concat to obtain the output;
the network structure of SPPCSPC mainly comprises a convolution layer and MaxPool; the output layer channel of the whole SPPCSPC layer is out_c, a hidden layer channel hidden_c=int (2×e×out_c) is calculated in training, which is used for expanding information quantity, and taking e=0.5, then hidden_c=out_c.
5. The infrared image target recognition method of the power transmission and transformation equipment based on edge calculation according to claim 4, wherein the RepConv uses different structures during training and inference and uses a model re-parameterization technique; during training, the RepConv consists of a 3×3 convolution branch and a 1×1 convolution branch, and if the input and output channel counts and sizes are consistent, a BN-only branch is added, and the three branches are summed to produce the output; during inference, to improve efficiency, the branch parameters are re-parameterized into the main branch, and the 3×3 main branch convolution output is taken.
6. The method for identifying the infrared image targets of the power transmission and transformation equipment based on edge calculation according to claim 5, wherein the method further comprises a training process for the predetermined target recognition network model; the training process comprises: generating a training set and a test set from a plurality of infrared images of power transmission and transformation equipment; selecting accuracy, recall and IOU as performance indexes; and training and optimizing the model based on a preset identification accuracy and the IOU between the prediction boxes and the real boxes, so that the prediction boxes closely coincide with the annotation boxes of the actual equipment.
7. An electronic device, the electronic device comprising: one or more processors, memory for storing one or more computer programs; characterized in that the computer program is configured to be executed by the one or more processors, the program comprising method steps for performing the edge calculation based infrared image object recognition method of a power transmission and transformation device as claimed in any one of claims 1-6.
8. A storage medium storing a computer program; the program is loaded and executed by a processor to implement the method steps of the edge calculation based infrared image object recognition method for power transmission and transformation equipment as claimed in any one of claims 1 to 6.
CN202310926824.0A 2023-07-25 2023-07-25 Infrared image target identification method, electronic device and storage medium of power transmission and transformation equipment based on edge calculation Pending CN116912774A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310926824.0A CN116912774A (en) 2023-07-25 2023-07-25 Infrared image target identification method, electronic device and storage medium of power transmission and transformation equipment based on edge calculation

Publications (1)

Publication Number Publication Date
CN116912774A true CN116912774A (en) 2023-10-20

Family

ID=88364570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310926824.0A Pending CN116912774A (en) 2023-07-25 2023-07-25 Infrared image target identification method, electronic device and storage medium of power transmission and transformation equipment based on edge calculation

Country Status (1)

Country Link
CN (1) CN116912774A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117854113A (en) * 2024-02-02 2024-04-09 广州天竞智能科技有限公司 Wearing detection method, device and equipment of safety protection articles and storage medium
CN117854113B (en) * 2024-02-02 2024-05-24 广州天竞智能科技有限公司 Wearing detection method, device and equipment of safety protection articles and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination