CN115565232A - Power distribution room switch cabinet face part identification method based on improved YOLOv5 algorithm - Google Patents


Info

Publication number
CN115565232A
CN115565232A (application CN202211299619.8A)
Authority
CN
China
Prior art keywords
feature map
feature
algorithm
switch cabinet
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211299619.8A
Other languages
Chinese (zh)
Inventor
陈泽涛
陈申宇
苏崇文
刘秦铭
王增煜
陈志健
芮庆涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd filed Critical Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202211299619.8A
Publication of CN115565232A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for identifying cabinet face parts of power distribution room switch cabinets based on an improved YOLOv5 algorithm, which comprises the following steps: acquiring switch cabinet images through cameras in a power distribution room to obtain a switch cabinet image data set; performing data enhancement on the switch cabinet image data set to obtain an enhanced data set, and dividing the enhanced data set into a training set and a test set; constructing an improved YOLOv5 network model by first improving the network structure and adding a small target detection layer, then replacing the FPN structure in the Neck network with an AF-FPN structure, and finally using the SIOU loss function as the loss function of the bounding box; iteratively training the improved YOLOv5 network model on the training set until convergence, and storing the optimal network weights; and loading the optimal network weights into the improved YOLOv5 network model, testing with the test set, and outputting the cabinet face part identification result. Using the improved YOLOv5 algorithm, the cabinet face parts are accurately identified and the performance is evaluated under conditions of complex backgrounds and small targets, with a good detection and identification effect.

Description

Power distribution room switch cabinet surface part identification method based on improved YOLOv5 algorithm
Technical Field
The invention belongs to the technical field of digital image processing, and particularly relates to a distribution room switch cabinet face part identification method based on an improved YOLOv5 algorithm.
Background
With the growing demand of production and daily life for electric power, the requirements on the quality of the power supply provided by power supply departments (stability, continuity and accompanying services) are ever higher. The power distribution room is the most important power supply node in the power grid; such nodes are numerous and widely distributed across regions. To ensure normal operation of the power grid, inspection personnel need to perform daily inspection and maintenance on the equipment. However, because field operation environments and personnel are complex, with multiple kinds of cross operation and many cooperating parties, problems arise such as scattered operation points, high labor cost of field supervision, low operation efficiency, difficult management of the operation field, inadequate operation precautions, and failure to alarm and handle abnormal conditions in time.
Safe production is the first element of the survival and development of an enterprise. Although the electric power safety work regulations strictly stipulate the safety requirements to be followed when relevant personnel enter a work site, the site work environment is complex and contains many unpredictable dangers, bringing large potential safety hazards. Although operation sites can currently be video-monitored, the large amount of video and picture data this produces brings new problems: the captured video and picture information still needs to be checked manually, the backgrounds of different distribution room images are often complex and of varying sizes, and workers checking them easily suffer visual fatigue, leading to missed and false detections and reduced efficiency. Automatically detecting cabinet face pictures with image processing technology is therefore a research trend.
With the development of image processing technology, deep learning has made good progress in the field of target detection. Current classical target detection algorithms are mainly divided into single-stage and two-stage methods: single-stage methods include YOLO, SSD and RetinaNet, while two-stage methods include R-CNN, Fast R-CNN and Mask R-CNN. The YOLO algorithm is the fastest and highly accurate, making it suitable for detecting cabinet face parts such as disconnecting link switches and indicator lamps on the cabinet faces of switch cabinets in power distribution rooms. However, because different distribution room images have complex backgrounds and occlusion, and the cabinet face parts are small, the recognition effect is not good and further research is still needed.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provide a power distribution room switch cabinet face part identification method based on an improved YOLOv5 algorithm. The invention collects switch cabinet images and uses the improved YOLOv5 algorithm to accurately identify the cabinet face parts and evaluate performance under conditions of complex backgrounds and small targets, achieving good detection and identification effects and providing technical support for the intelligent construction of power distribution rooms.
In order to achieve the purpose, the invention adopts the following technical scheme:
a distribution room switch cabinet face part identification method based on an improved YOLOv5 algorithm comprises the following steps:
acquiring a switch cabinet image through a camera in a power distribution room to obtain a switch cabinet image data set;
carrying out data enhancement on the image data set of the switch cabinet to obtain an enhanced data set, and dividing the enhanced data set into a training set and a test set;
constructing an improved YOLOv5 network model, comprising: firstly, improving the network structure and adding a small target detection layer; then using an AF-FPN structure to replace the FPN structure in the Neck network; and finally using the SIOU loss function as the loss function of the bounding box;
iteratively training the improved YOLOv5 network model on the training set until convergence, and storing the optimal network weight;
and loading the optimal network weights into the improved YOLOv5 network model, testing with the test set, and outputting the cabinet face part identification result.
As a preferred technical scheme, a Mosaic-9 data enhancement method is adopted for the data enhancement of the switch cabinet image data set to obtain the enhanced data set, specifically:
adjusting all switch cabinet images in the switch cabinet image data set to a uniform size;
randomly taking n pictures from the switch cabinet image data set, randomly cropping, scaling and arranging them, and splicing them into one picture; repeating this Batch-size times to obtain Batch-size mosaic-enhanced pictures; and storing the enhanced pictures into the switch cabinet image data set to obtain the enhanced data set.
As a preferred technical solution, the improved YOLOv5 network model includes an input layer, a Backbone network, a Neck network, and a prediction layer;
the AF-FPN structure is additionally provided with an adaptive attention module AAM and a feature enhancement module FEM on the basis of the FPN structure;
the adaptive attention module AAM is used for reducing the loss of the context information in the feature channel and the high-level feature map;
the feature enhancement module FEM is used for enhancing the representation of the feature pyramid and improving the reasoning speed.
As a preferred technical solution, the iterative training of the improved YOLOv5 network model on the training set specifically includes:
inputting the training set into the Backbone network through the input layer, and extracting the feature mappings of the training set;
acquiring a feature map of a training set in a Neck network according to the feature mapping of the training set;
predicting in a prediction layer based on a feature map of a training set, calculating loss, and updating model parameters;
repeating the training until the loss function converges or the maximum number of iterations is reached, and storing the optimal network weights.
As a preferred technical solution, the feature mappings of the training set are represented as {C1, C2, C3, C4, C5};
the method comprises the following steps that the neutral network adopts top-down sampling operation and low-level feature mapping fusion to obtain a feature map of a training set, and specifically comprises the following steps:
generating a feature map M5 from the feature map C5 through an adaptive attention module AAM, fusing the feature map M5 with the feature map C5, and inputting the fused feature map into a feature enhancement module FPM for feature enhancement to obtain a feature map P5;
generating a feature map M4 from the feature map P5 through downsampling operation, fusing the feature map M4 with the feature map C4, and inputting the feature map into a feature enhancement module FPM for feature enhancement to obtain a feature map P4;
and generating a feature map M3 from the feature map P4 through downsampling operation, fusing the feature map M3 with the feature map C3, and inputting the fused feature map into a feature enhancement module FPM for feature enhancement to obtain the feature map P3.
As a preferred technical solution, the generating the feature mapping M5 by the feature mapping C5 through the adaptive attention module AAM specifically includes:
for the input feature mapping C5, firstly, obtaining 3 semantic features with different scales through a self-adaptive pooling layer;
carrying out convolution operation on the semantic features with 3 different scales by using 1 x 1 convolution to obtain the same channel dimension;
performing up-sampling operation on 3 semantic features with different scales by using a bilinear interpolation method, and merging channels through a Concat layer to obtain a feature map;
sequentially passing the feature map through a 1 × 1 convolutional layer, a ReLU activation layer, a 3 × 3 convolutional layer and a sigmoid activation layer to generate the spatial weight of the feature map;
and after a Hadamard product operation between the generated spatial weight and the feature map, separating the result into 3 new context feature representations and performing a matrix sum operation with the input feature mapping C5 to obtain the feature mapping M5.
As a preferred technical solution, the feature enhancement module FEM uses dilated convolution to adaptively learn different receptive fields in each feature map, and includes a multi-branch convolution layer and a branch pooling layer;
the multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution, and comprises dilated convolutions, BN layers and ReLU activation layers; the dilated convolutions in the multi-branch convolution layer have the same kernel size but different dilation rates; the receptive field formula of dilated convolution is:
r_1 = d × (k - 1) + 1
r_n = d × (k - 1) + r_{n-1}
where k is the convolution kernel size, d is the dilation rate, n is the index of the dilated convolution, and r_n is the receptive field after the n-th dilated convolution;
the branch pooling layer fuses the receptive field information from the different branches of the multi-branch convolution layer by an averaging operation, improving multi-scale precision prediction; its expression is:
y_p = (1/B) × Σ_{i=1}^{B} y_i
where y_p is the output of the branch pooling layer, B is the number of branches of the multi-branch convolution layer, and y_i is the output of the i-th branch convolution layer.
As a preferred technical scheme, the prediction layer performs prediction through a bottom-up downsampling path fused with high-level feature maps, namely downsampling the feature map P3, fusing it with the feature map P4 and downsampling again, then fusing with the feature map P5 to obtain the algorithm detection boxes of the cabinet face parts;
comparing the algorithm detection box of a cabinet face part with the actual labeling box to obtain the intersection-over-union ratio, and constructing the SIOU loss function;
the intersection-over-union formula is expressed as:
IoU = |B ∩ B^GT| / |B ∪ B^GT|
where B is the algorithm detection box of the cabinet face part and B^GT is the actual labeling box of the cabinet face part;
the SIOU loss function consists of four parts, namely an Angle cost Angle function, a Distance cost function, a Shape cost function and an IoU cost function;
the Angle cost Angle function is defined as:
Figure BDA0003903982370000042
wherein the content of the first and second substances,
Figure BDA0003903982370000043
Figure BDA0003903982370000044
wherein the content of the first and second substances,
Figure BDA0003903982370000045
representing the distance of the actual label box on the x-axis,
Figure BDA0003903982370000046
to represent the distance of the algorithm detection box on the x-axis,
Figure BDA0003903982370000047
to actually label the distance of the box on the y-axis,
Figure BDA0003903982370000048
detecting the distance of the box on the y axis for the algorithm;
the Distance cost function is defined as:
Figure BDA0003903982370000049
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA00039039823700000410
γ=2-Λ,c w the difference value of the actual marking box and the algorithm detection box in the x-axis direction is obtained;
the Shape cost function is defined as:
Figure BDA00039039823700000411
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA00039039823700000412
w represents the width of the algorithm detection box, H represents the height of the algorithm detection box, W Gt Indicates the actual width of the label box, H Gt Represents the actual height of the labeled box, theta is equal to 2,6]Representing a degree of attention;
the IoU cos function is defined as:
Figure BDA00039039823700000413
the SIOU loss function is expressed as:
L=W box L box +W cls L cls
wherein, W box Representing the algorithm detection box weight, W cls Represents the classification loss weight, L cls Indicates focal length; the algorithmic detection box weights and classification loss weights are calculated using genetic algorithms on different datasets.
As a preferred technical solution, the method further comprises: after obtaining the cabinet face part identification result, evaluating the performance of the network model using the precision, recall and mean average precision (mAP) indexes;
the precision calculation formula is: Precision = TP / (TP + FP) = TP / (all detections);
the recall calculation formula is: Recall = TP / (TP + FN) = TP / (all ground truths);
the mean average precision calculation formula is:
mAP = (1/N) × Σ_{c=1}^{N} AP_c, with AP = Σ_n (r_{n+1} - r_n) × ρ_interp(r_{n+1}) and ρ_interp(r_{n+1}) = max_{r̃ ≥ r_{n+1}} ρ(r̃)
where TP are the regions correctly predicted by the algorithm detection boxes, FP are the regions wrongly predicted by the algorithm detection boxes, and FN are the regions of actual labeling boxes not predicted by any algorithm detection box; all detections are the prediction regions of the algorithm detection boxes, all ground truths are the actual regions of the actual labeling boxes, N is the number of classes, r is the recall, ρ(r) is the precision at recall r, and ρ_interp(r_{n+1}) is the highest precision ρ(r̃) over all recalls r̃ greater than or equal to r_{n+1}.
As a preferred technical scheme, the cabinet face parts comprise operation state warning boards, switch disconnecting links, grounding disconnecting links and indicator lights;
the pooling coefficient of the adaptive pooling layer in the adaptive attention module AAM is [0.1, 0.5], and the channel dimension is 256;
the kernel size of the dilated convolutions of the multi-branch convolution layer in the feature enhancement module FEM is 3 × 3, and the dilation rates d of the different branches are 1, 3 and 5 respectively.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method adopts Mosaic-9 data enhancement, which enriches the data set and improves network training speed; at the same time, random noise data is reasonably added, enhancing the network model's ability to distinguish small target samples and improving the generalization of the model.
2. The YOLOv5 network structure is improved, a small target detection layer is added, a shallow feature map and a deep feature map are spliced and then detected, the small target detection effect is improved, AF-FPN is used for replacing the original FPN in the Neck part, the loss of context information in the feature maps is reduced, the representation of a feature pyramid is enhanced, and the reasoning speed is improved.
3. The prediction box loss is improved: the SIOU Loss replaces the original CIOU Loss as the loss function of the bounding box, improving training speed and inference accuracy compared with the original CIOU Loss.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of an electrical room switch cabinet face part identification method based on an improved YOLOv5 algorithm in an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for enhancing Mosaic-9 data according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an improved YOLOv5 network model in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the AF-FPN structure in the embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an adaptive attention Module AAM according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a feature enhancement module FEM according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the intersection-over-union ratio IoU according to the embodiment of the invention;
FIG. 8 is a schematic diagram of an Angle cost Angle function according to an embodiment of the present invention;
FIG. 9 is a graph illustrating Angle cost as a function of Angle according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a Shape cost function in an embodiment of the invention;
fig. 11 is a schematic diagram of an identification result in the embodiment of the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by a person skilled in the art that the embodiments described herein can be combined with other embodiments.
As shown in fig. 1, the method for identifying a surface part of a switch cabinet of a power distribution room based on an improved YOLOv5 algorithm in the embodiment includes the following steps:
s1, acquiring a switch cabinet image through a camera in a power distribution room to obtain a switch cabinet image data set;
the invention discloses a switch cabinet image data collection system, which is characterized in that a camera is arranged in a power distribution room at present and is used for monitoring the state of the power distribution room and the information of personnel entering and exiting the power distribution room.
In this embodiment, 1258 or more switch cabinet images with different backgrounds and different cabinet face targets are obtained from the cameras, expanding the richness of the data as much as possible.
S2, performing data enhancement on the image data set of the switch cabinet to obtain an enhanced data set, and dividing the enhanced data set into a training set and a test set;
the identification and detection of the cabinet surface components are facilitated, data enhancement needs to be performed on the image data set of the switch cabinet to enrich the data set, and in this embodiment, data enhancement is performed by adopting a Mosaic-9 data enhancement method as shown in fig. 2, specifically:
adjusting the sizes of all switch cabinet images in the switch cabinet image data set to be uniform;
randomly taking n pictures from the switch cabinet image data set, randomly cropping, scaling and arranging them, and splicing them into one picture; repeating this Batch-size times to obtain Batch-size mosaic-enhanced pictures; and storing the enhanced pictures into the switch cabinet image data set to obtain the enhanced data set.
In this embodiment, 9 pictures are randomly selected from the switch cabinet image data set each time for data enhancement, which increases the number of small-sample targets while enriching the data set and speeds up network training; random noise data is also reasonably added so that the network model can distinguish small target samples, further improving the generalization of the model. After the enhanced data set is obtained, it is divided into a training set and a test set at a ratio of 8:2.
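As a rough illustration of the Mosaic-9 stitching and the 8:2 split, here is a minimal pure-Python sketch; it tiles nine toy "images" (2-D lists) into a 3 × 3 grid and omits the random cropping, scaling and arrangement of the real augmentation. The helper names `mosaic9` and `split_dataset` are illustrative, not from the patent:

```python
import random

def mosaic9(images):
    """Tile 9 equally sized toy 'images' (2-D lists of pixel rows) into one
    3x3 mosaic; the real Mosaic-9 also randomly crops, scales and arranges
    the pictures before stitching."""
    assert len(images) == 9
    cell_rows = len(images[0])
    rows = []
    for gy in range(3):                      # grid row of the 3x3 layout
        for y in range(cell_rows):           # pixel row inside a cell
            row = []
            for gx in range(3):              # grid column
                row.extend(images[gy * 3 + gx][y])
            rows.append(row)
    return rows

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle and split the enhanced dataset 8:2 into train/test."""
    samples = samples[:]
    random.Random(seed).shuffle(samples)
    k = int(len(samples) * train_ratio)
    return samples[:k], samples[k:]

# nine 2x2 single-channel toy images filled with their own index
imgs = [[[i, i], [i, i]] for i in range(9)]
mosaic = mosaic9(imgs)                        # a 6x6 mosaic picture
train, test = split_dataset(list(range(10)))  # 8 training, 2 test samples
```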
S3, constructing an improved YOLOv5 network model, which comprises the following steps: firstly, improving the network structure and adding a small target detection layer; then using an AF-FPN structure to replace the FPN structure in the Neck network; and finally using the SIOU loss function as the loss function of the bounding box;
as shown in fig. 3, a YOLOv5 network model is constructed based on the YOLOv5 algorithm and then improved. First, the network structure is improved by adding a small target detection layer, which splices a shallow feature map with a deep feature map before detection; although the additional detection layer reduces inference speed, it improves the detection effect on small targets. Then, an AF-FPN structure is used in the Neck network to replace the original FPN (feature pyramid network) structure. As shown in fig. 4, the AF-FPN structure adds an adaptive attention module (AAM) and a feature enhancement module (FEM) to the traditional feature pyramid network (FPN): the AAM reduces the loss of context information in the feature channels and high-level feature maps, while the FEM enhances the representation of the feature pyramid, improving inference speed and recognition performance.
S4, iteratively training the improved YOLOv5 network model on the training set until convergence, and storing the optimal network weight, wherein the method specifically comprises the following steps:
S41, inputting the training set into the Backbone network through the input layer, and extracting the feature mappings of the training set;
as shown in fig. 4, after the training set undergoes multiple convolutions in the Backbone network, the generated feature mappings are {C1, C2, C3, C4, C5};
S42, acquiring the feature maps of the training set in the Neck network according to the feature mappings of the training set;
the Neck network adopts a top-down upsampling path fused with low-level feature mappings and uses the FEM module to expand the receptive field, obtaining the feature maps of the training set, as shown in fig. 4, specifically:
generating a feature mapping M5 from the feature mapping C5 through the adaptive attention module AAM, fusing it with the feature mapping C5, and inputting the fused result into the feature enhancement module FEM for feature enhancement to obtain the feature map P5;
generating a feature mapping M4 from the feature map P5 through an upsampling operation, fusing it with the feature mapping C4, and inputting the fused result into the feature enhancement module FEM for feature enhancement to obtain the feature map P4;
and generating a feature mapping M3 from the feature map P4 through an upsampling operation, fusing it with the feature mapping C3, and inputting the fused result into the feature enhancement module FEM for feature enhancement to obtain the feature map P3.
The specific principle of the adaptive attention module AAM generating the feature map M5 is shown in fig. 5:
for the input feature mapping C5 of size S = h × w, semantic features of 3 different scales are first obtained through an adaptive pooling layer; the pooling coefficient in this embodiment is [0.1, 0.5], which varies adaptively according to the target sizes of the data set.
Performing a convolution operation on the 3 semantic features of different scales with 1 × 1 convolutions to obtain the same channel dimension, 256 in this embodiment;
performing an upsampling operation on the 3 semantic features of different scales to scale S using bilinear interpolation, and merging the channels through a Concat layer to obtain a feature map;
sequentially passing the feature map through a 1 × 1 convolutional layer, a ReLU activation layer, a 3 × 3 convolutional layer and a sigmoid activation layer to generate the spatial weight of the feature map;
after a Hadamard product operation between the generated spatial weight and the feature map, the result is separated into 3 new context feature representations and summed element-wise with the input feature mapping C5 to obtain the feature mapping M5; the resulting feature mapping M5 has rich multi-scale context information, which alleviates to a certain extent the information loss caused by the reduced number of channels.
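The adaptive pooling and bilinear upsampling used inside the AAM are generic operations. The following 1-D pure-Python sketch (illustrative helper names; the module itself works on 2-D multi-channel feature maps) shows what each of those two steps computes:

```python
import math

def adaptive_avg_pool_1d(x, out_len):
    """Adaptive average pooling: shrink x to out_len bins by averaging;
    AAM uses the 2-D analogue to get pyramid-level semantic features."""
    n = len(x)
    out = []
    for i in range(out_len):
        lo = (i * n) // out_len
        hi = ((i + 1) * n + out_len - 1) // out_len  # ceil((i+1)*n/out_len)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def bilinear_upsample_1d(x, out_len):
    """Linear interpolation back to out_len samples (1-D counterpart of the
    bilinear upsampling applied before the Concat layer)."""
    n = len(x)
    if n == 1:
        return [x[0]] * out_len
    out = []
    for i in range(out_len):
        pos = i * (n - 1) / (out_len - 1)   # fractional source position
        lo = min(int(math.floor(pos)), n - 2)
        t = pos - lo
        out.append(x[lo] * (1 - t) + x[lo + 1] * t)
    return out

feat = [1.0, 3.0, 5.0, 7.0]
pooled = adaptive_avg_pool_1d(feat, 2)   # [2.0, 6.0]
up = bilinear_upsample_1d(pooled, 4)     # back to length 4
print(pooled, up)
```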
Meanwhile, as shown in fig. 6, the feature enhancement module FEM uses dilated convolution to adaptively learn different receptive fields in each feature map, improving the accuracy of multi-scale target detection and identification; it comprises a multi-branch convolution layer and a branch pooling layer;
the multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution, and comprises dilated convolutions, BN layers and ReLU activation layers, where the dilated convolutions have the same kernel size but different dilation rates; specifically, the kernel size of each dilated convolution is 3 × 3, and the dilation rates d of the different branches are 1, 3 and 5 respectively. Dilated convolution supports an exponentially expanded receptive field without loss of resolution: in a dilated convolution the kernel elements are spaced apart, with the spacing determined by the dilation rate. An ordinary convolution has dilation rate 1, meaning no gap between kernel elements; in the broad sense it is a special case of dilated convolution. When d = 2, there is one gap between adjacent kernel elements. Thus, with dilation rate 3, a 3 × 3 dilated kernel covers the same area as a 7 × 7 kernel, and the receptive field of the layer is 7 × 7;
the receptive field of the dilated convolution is given by:
r_1 = d × (k − 1) + 1
r_n = d × (k − 1) + r_{n−1}
wherein k represents the convolution kernel size, d represents the dilation rate, n represents the number of stacked dilated convolutions, and r_n represents the receptive field after the n-th dilated convolution;
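For illustration, the receptive-field recursion above can be checked with a short pure-Python helper (the function name is an illustrative assumption, not part of the patent):

```python
def receptive_field(k: int, d: int, n: int = 1) -> int:
    """Effective receptive field after stacking n dilated convolutions.

    Implements r_1 = d*(k-1) + 1 and r_n = d*(k-1) + r_{n-1}
    from the text, assuming the same kernel size k and dilation
    rate d for every one of the n stacked layers.
    """
    r = d * (k - 1) + 1       # r_1
    for _ in range(n - 1):    # r_2 .. r_n
        r = d * (k - 1) + r
    return r
```

With k = 3 this reproduces the branch receptive fields named in the text: d = 1 gives 3 × 3, d = 3 gives 7 × 7, and d = 5 gives 11 × 11.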
the branch pooling layer fuses the information from the different receptive fields of the multi-branch convolution layer by an averaging operation, which avoids introducing additional parameters while improving multi-scale prediction accuracy; its expression is:
y_p = (1/B) × Σ_{i=1}^{B} y_i
wherein y_p represents the output of the branch pooling layer, B represents the number of branches of the multi-branch convolution layer, and y_i represents the output of the i-th branch convolution layer.
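A minimal sketch of the branch pooling average (pure Python on flattened feature values; the list-of-lists representation is an illustrative assumption):

```python
def branch_pool(branch_outputs):
    """Average the outputs of the B convolution branches element-wise.

    branch_outputs: list of B equal-length lists (flattened feature maps).
    Implements y_p = (1/B) * sum_i y_i with no learnable parameters.
    """
    B = len(branch_outputs)
    return [sum(vals) / B for vals in zip(*branch_outputs)]
```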
S43, making predictions in the prediction layer based on the feature maps of the training set, calculating the loss, and updating the model parameters;
as shown in fig. 4, the prediction layer performs prediction by bottom-up downsampling and fusion with higher-level feature maps: the feature map P3 is downsampled and fused with the feature map P4, the result is downsampled again and fused with the feature map P5, and the algorithm detection boxes of the cabinet face components are obtained;
as shown in fig. 7, the algorithm detection box of the cabinet face component is compared with the actual labeling box to obtain the intersection-over-union, and the SIOU loss function is constructed; the intersection-over-union is calculated as:
IoU = |B ∩ B^GT| / |B ∪ B^GT|
wherein B represents the algorithm detection box of the cabinet face component and B^GT represents the actual labeling box of the cabinet face component;
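For illustration, the intersection-over-union of two axis-aligned boxes can be sketched in pure Python (the corner-coordinate box format (x1, y1, x2, y2) and the function name are assumptions, not from the patent):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2).

    box_a: algorithm detection box B; box_b: actual labeling box B^GT.
    """
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```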
in the invention, the SIOU loss function replaces the original CIOU loss as the loss function of the bounding box, optimizing the network model; the SIOU loss function consists of four parts, namely the Angle cost function, the Distance cost function, the Shape cost function and the IoU cost function;
as shown in FIG. 8, B represents the algorithm detection box of the cabinet face component and B^GT the actual labeling box of the cabinet face component; when the angle α between B and B^GT is smaller than π/4, convergence proceeds toward minimizing α, and otherwise toward minimizing β; the Angle cost function therefore reduces the number of distance-related variables to the greatest extent; the improved YOLOv5 network model in this embodiment first brings the algorithm detection box to the X or Y axis (whichever is closer), and then continues the approach along that axis; if
α ≤ π/4
the convergence process will first minimize α, otherwise minimize β:
β = π/2 − α
The curve of the Angle cost function is shown in FIG. 9, and thus the Angle cost function is defined as:
Λ = 1 − 2 sin²(arcsin(x) − π/4)
wherein,
x = c_h / σ = sin(α)
σ = √((b^gt_cx − b_cx)² + (b^gt_cy − b_cy)²)
c_h = max(b^gt_cy, b_cy) − min(b^gt_cy, b_cy)
wherein b^gt_cx represents the center coordinate of the actual labeling box on the x-axis, b_cx represents the center coordinate of the algorithm detection box on the x-axis, b^gt_cy represents the center coordinate of the actual labeling box on the y-axis, and b_cy represents the center coordinate of the algorithm detection box on the y-axis;
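To make the Angle cost concrete, here is a minimal pure-Python sketch computed from the two box centers (the function name and argument order are illustrative assumptions):

```python
import math

def angle_cost(cx, cy, cx_gt, cy_gt):
    """SIoU Angle cost: Lambda = 1 - 2*sin^2(arcsin(x) - pi/4).

    (cx, cy) and (cx_gt, cy_gt) are the centers of the algorithm
    detection box and the actual labeling box; x = c_h / sigma = sin(alpha).
    """
    sigma = math.hypot(cx_gt - cx, cy_gt - cy)  # center-to-center distance
    if sigma == 0:
        return 0.0                              # coincident centers
    c_h = abs(cy_gt - cy)                       # height gap of the centers
    x = c_h / sigma
    return 1 - 2 * math.sin(math.asin(x) - math.pi / 4) ** 2
```

Note the behavior it encodes: the cost vanishes when the centers are axis-aligned (α = 0 or α = π/2) and peaks at α = π/4.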
in view of the Angle cost function defined above, the Distance cost function is redefined as:
Δ = Σ_{t=x,y} (1 − e^(−γρ_t))
wherein,
ρ_x = ((b^gt_cx − b_cx) / c_w)², ρ_y = ((b^gt_cy − b_cy) / c_h)², γ = 2 − Λ
and c_w and c_h are the width and height of the smallest box enclosing the actual labeling box and the algorithm detection box; when α → 0, the contribution of the Distance cost is greatly reduced; conversely, the closer α is to π/4, the larger the contribution of the Distance cost; since γ gives time priority to the distance value, the influence of the Distance cost grows as the angle increases.
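A minimal sketch of the Distance cost above (pure Python; the Angle cost Λ is passed in pre-computed so that γ = 2 − Λ, and the function name is an illustrative assumption):

```python
import math

def distance_cost(cx, cy, cx_gt, cy_gt, c_w, c_h, lam):
    """SIoU Distance cost: Delta = sum_t (1 - exp(-gamma * rho_t)), t in {x, y}.

    c_w, c_h: width and height of the smallest box enclosing both boxes;
    lam: the Angle cost Lambda, so that gamma = 2 - Lambda.
    """
    gamma = 2 - lam
    rho_x = ((cx_gt - cx) / c_w) ** 2
    rho_y = ((cy_gt - cy) / c_h) ** 2
    return (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))
```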
As shown in FIG. 10, the Shape cost function is defined as:
Ω = Σ_{t=w,h} (1 − e^(−ω_t))^θ
wherein,
ω_w = |W − W^gt| / max(W, W^gt), ω_h = |H − H^gt| / max(H, H^gt)
W represents the width of the algorithm detection box, H the height of the algorithm detection box, W^gt the width of the actual labeling box, H^gt the height of the actual labeling box, and θ ∈ [2,6] the degree of attention; the value of θ defines the Shape cost of each data set and is unique to it; θ is a very important term in this equation, as it controls how much attention is paid to the Shape cost; if θ were set to 1, the shape would be optimized immediately, harming the shape's freedom of movement; therefore a genetic algorithm is used to calculate the value of θ for each data set; θ = 4 in this embodiment.
The IoU cost function is defined by the intersection-over-union given above, and the resulting bounding-box loss is:
L_box = 1 − IoU + (Δ + Ω) / 2
the resulting SIOU loss function is expressed as:
L = W_box × L_box + W_cls × L_cls
wherein W_box represents the weight of the bounding-box loss, W_cls represents the classification loss weight, and L_cls represents the focal loss;
the bounding-box loss weight and the classification loss weight on different data sets are calculated by a genetic algorithm; in this embodiment the bounding-box loss weight W_box = 0.35 and the classification loss weight W_cls = 0.65.
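For illustration, assembling the total loss from its components can be sketched as follows (a pure-Python sketch; passing pre-computed IoU, Δ, Ω and classification-loss values, as well as the function name, are illustrative assumptions):

```python
def siou_total_loss(iou_val, delta, omega, l_cls,
                    w_box=0.35, w_cls=0.65):
    """Total loss L = W_box * L_box + W_cls * L_cls with
    L_box = 1 - IoU + (Delta + Omega) / 2 (SIoU bounding-box loss).

    iou_val, delta, omega: IoU, Distance cost and Shape cost of a box pair;
    l_cls: classification (focal) loss; weights as in this embodiment.
    """
    l_box = 1 - iou_val + (delta + omega) / 2
    return w_box * l_box + w_cls * l_cls
```

A perfectly matched box (IoU = 1, Δ = Ω = 0) with zero classification loss yields a total loss of zero, as expected.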
And S44, repeating the training until the loss function converges or the maximum number of iterations is reached, and storing the optimal network weight.
And S5, loading the optimal network weight into the improved YOLOv5 network model, testing with the test set, and outputting the cabinet face component identification result.
In order to evaluate the recognition result, this embodiment also evaluates the performance of the network model using the precision, recall and mean average precision indexes;
as shown in fig. 7, the intersection-over-union IoU = TP/(TP + FN + FP); IoU is the overlapping area of the actual labeling box and the algorithm detection box divided by the area of their union, and measures whether the algorithm detection box accurately locates the target object; TP is the area correctly predicted by the algorithm detection box, FP is the area wrongly predicted by the algorithm detection box, FN is the area covered by the actual labeling box but missed by the algorithm detection box, and TN is the area covered by neither the actual labeling box nor the algorithm detection box;
wherein the precision is calculated as: Precision = TP/(TP + FP) = TP/all detections;
the recall is calculated as: Recall = TP/(TP + FN) = TP/all ground truths.
The mean average precision is calculated as:
AP = Σ_n (r_{n+1} − r_n) × ρ_interp(r_{n+1}), where ρ_interp(r_{n+1}) = max_{r̃: r̃ ≥ r_{n+1}} ρ(r̃)
wherein all detections are the prediction areas of the algorithm detection boxes, all ground truths are the actual areas of the actual labeling boxes, r represents the recall, ρ(r) is the precision at recall r, and ρ_interp(r_{n+1}) is the highest precision among the precision values ρ(r̃) whose recall r̃ is greater than or equal to r_{n+1}.
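For illustration, the precision/recall formulas and the interpolated average precision can be sketched in pure Python (the function names, and the assumption that recalls are given in ascending order, are illustrative):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls, precisions):
    """Interpolated AP: sum over recall steps of
    (r_{n+1} - r_n) * (highest precision at recall >= r_{n+1}).

    recalls: ascending recall values; precisions: precision at each step.
    """
    ap, prev_r = 0.0, 0.0
    for i, r in enumerate(recalls):
        p_interp = max(precisions[i:])  # best precision at recall >= r
        ap += (r - prev_r) * p_interp
        prev_r = r
    return ap
```

The mAP is then simply the mean of the AP values over all component classes.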
As shown in fig. 11, this embodiment identifies the cabinet face components in an image of a certain switch cabinet; it can be seen that the invention can accurately identify and label cabinet face components such as the operating state warning board, the switch disconnecting link, the grounding disconnecting link and the indicator lights, which is of great significance for the maintenance and intelligent construction of power distribution rooms.
It should be noted that for simplicity and clarity of description, the above-described method embodiments have been presented as a series of interrelated steps, but it should be appreciated by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the invention.
The technical features of the above embodiments can be combined arbitrarily; for brevity, not all possible combinations of the technical features in the above embodiments are described, but any such combination should be considered within the scope of this specification as long as the combined technical features do not contradict one another.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such modifications are intended to be included in the scope of the present invention.

Claims (10)

1. A distribution room switch cabinet face part identification method based on an improved YOLOv5 algorithm is characterized by comprising the following steps:
acquiring a switch cabinet image through a camera in a power distribution room to obtain a switch cabinet image data set;
carrying out data enhancement on the image data set of the switch cabinet to obtain an enhanced data set, and dividing the enhanced data set into a training set and a test set;
constructing an improved YOLOv5 network model, comprising: firstly, improving the network structure by adding a small target detection layer; then using an AF-FPN structure to replace the FPN structure in the Neck network; and finally using the SIOU loss function as the loss function of the bounding box;
iteratively training the improved YOLOv5 network model on the training set until convergence, and storing the optimal network weight;
and loading the optimal network weight into the improved YOLOv5 network model, testing with the test set, and outputting the cabinet face component identification result.
2. The improved YOLOv5 algorithm-based part identification method for the switch cabinet surface of the power distribution room, as claimed in claim 1, is characterized in that a Mosaic-9 data enhancement method is adopted to enhance the data of the switch cabinet image data set, so as to obtain an enhanced data set, specifically:
adjusting all switch cabinet images in the switch cabinet image data set to be uniform in size;
and randomly taking n pictures from the switch cabinet image data set, randomly cropping, scaling and randomly arranging them, and splicing them into one picture; repeating this Batch-size times to obtain Batch-size mosaic data-enhanced pictures, and storing the enhanced pictures into the switch cabinet image data set to obtain the enhanced data set.
3. The method for identifying components of a switch cabinet of a power distribution room based on the improved YOLOv5 algorithm of claim 1, wherein the improved YOLOv5 network model comprises an input layer, a Backbone network, a Neck network and a prediction layer;
the AF-FPN structure is additionally provided with an adaptive attention module AAM and a feature enhancement module FEM on the basis of the FPN structure;
the adaptive attention module AAM is used for reducing the loss of the context information in the feature channel and the high-level feature map;
the feature enhancement module FEM is used for enhancing the representation of the feature pyramid and improving the reasoning speed.
4. The method for identifying components of a switch cabinet face of a power distribution room based on an improved YOLOv5 algorithm as claimed in claim 3, wherein the improved YOLOv5 network model is iteratively trained on a training set, specifically:
inputting the training set into the Backbone network through the input layer, and extracting the feature mappings of the training set;
acquiring a feature map of a training set in a Neck network according to the feature mapping of the training set;
predicting in a prediction layer based on a feature map of a training set, calculating loss, and updating model parameters;
retraining until the loss function converges or the maximum iteration times is reached, and storing the optimal network weight.
5. The method for identifying components of a switchgear cabinet of a power distribution room based on the improved YOLOv5 algorithm as claimed in claim 4, wherein the feature map of the training set is represented as { C1, C2, C3, C4, C5};
the Neck network obtains the feature maps of the training set by top-down upsampling operations fused with lower-level feature mappings, specifically:
generating a feature map M5 from the feature mapping C5 through the adaptive attention module AAM, fusing the feature map M5 with the feature mapping C5, and inputting the fused feature map into the feature enhancement module FEM for feature enhancement to obtain a feature map P5;
generating a feature map M4 from the feature map P5 through an upsampling operation, fusing the feature map M4 with the feature mapping C4, and inputting the result into the feature enhancement module FEM for feature enhancement to obtain a feature map P4;
and generating a feature map M3 from the feature map P4 through an upsampling operation, fusing the feature map M3 with the feature mapping C3, and inputting the result into the feature enhancement module FEM for feature enhancement to obtain a feature map P3.
6. The method for identifying the surface components of the switch cabinet of the power distribution room based on the improved YOLOv5 algorithm as claimed in claim 5, wherein the feature map M5 is generated from the feature mapping C5 by the adaptive attention module AAM, specifically:
for the input feature mapping C5, firstly, obtaining 3 semantic features with different scales through a self-adaptive pooling layer;
carrying out convolution operation on the semantic features with 3 different scales by using 1 x 1 convolution to obtain the same channel dimension;
performing up-sampling operation on 3 semantic features with different scales by using a bilinear interpolation method, and merging channels through a Concat layer to obtain a feature map;
sequentially passing the feature map through a 1 × 1 convolution layer, a ReLU activation layer, a 3 × 3 convolution layer and a sigmoid activation layer to generate the spatial weight of the feature map;
after the generated spatial weight and the feature map are combined by a Hadamard product, the result is separated into 3 new context feature representations, which are summed element-wise with the input feature mapping C5 to obtain the feature map M5.
7. The improved YOLOv5 algorithm-based method for identifying components of a switch cabinet face of a power distribution room, wherein the feature enhancement module FEM uses dilated convolution to adaptively learn different receptive fields in each feature map, and comprises a multi-branch convolution layer and a branch pooling layer;
the multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution, and each branch comprises a dilated convolution, a BN layer and a ReLU activation layer; the dilated convolutions in the multi-branch convolution layer have the same kernel size but different dilation rates; the receptive field of the dilated convolution is given by:
r_1 = d × (k − 1) + 1
r_n = d × (k − 1) + r_{n−1}
wherein k represents the convolution kernel size, d represents the dilation rate, n represents the number of stacked dilated convolutions, and r_n represents the receptive field after the n-th dilated convolution;
the branch pooling layer fuses the information from the different receptive fields of the multi-branch convolution layer by an averaging operation, thereby improving multi-scale prediction accuracy; its expression is:
y_p = (1/B) × Σ_{i=1}^{B} y_i
wherein y_p represents the output of the branch pooling layer, B represents the number of branches of the multi-branch convolution layer, and y_i represents the output of the i-th branch convolution layer.
8. The method for identifying the surface parts of the switch cabinets of the power distribution rooms based on the improved YOLOv5 algorithm as claimed in claim 5, wherein the prediction layer performs prediction by bottom-up downsampling and fusion with higher-level feature maps, namely the feature map P3 is downsampled and fused with the feature map P4, the result is downsampled again and fused with the feature map P5, and the algorithm detection boxes of the surface parts are obtained;
comparing the algorithm detection box of the cabinet face component with the actual labeling box to obtain the intersection-over-union, and constructing the SIOU loss function;
the intersection-over-union is expressed as:
IoU = |B ∩ B^GT| / |B ∪ B^GT|
wherein B represents the algorithm detection box of the cabinet component and B^GT represents the actual labeling box of the cabinet component;
the SIOU loss function consists of four parts, namely the Angle cost function, the Distance cost function, the Shape cost function and the IoU cost function;
the Angle cost function is defined as:
Λ = 1 − 2 sin²(arcsin(x) − π/4)
wherein,
x = c_h / σ = sin(α)
σ = √((b^gt_cx − b_cx)² + (b^gt_cy − b_cy)²)
c_h = max(b^gt_cy, b_cy) − min(b^gt_cy, b_cy)
wherein b^gt_cx represents the center coordinate of the actual labeling box on the x-axis, b_cx represents the center coordinate of the algorithm detection box on the x-axis, b^gt_cy represents the center coordinate of the actual labeling box on the y-axis, and b_cy represents the center coordinate of the algorithm detection box on the y-axis;
the Distance cost function is defined as:
Δ = Σ_{t=x,y} (1 − e^(−γρ_t))
wherein,
ρ_x = ((b^gt_cx − b_cx) / c_w)², ρ_y = ((b^gt_cy − b_cy) / c_h)², γ = 2 − Λ
and c_w and c_h are the width and height of the smallest box enclosing the actual labeling box and the algorithm detection box;
the Shape cost function is defined as:
Ω = Σ_{t=w,h} (1 − e^(−ω_t))^θ
wherein,
ω_w = |W − W^gt| / max(W, W^gt), ω_h = |H − H^gt| / max(H, H^gt)
W represents the width of the algorithm detection box, H represents the height of the algorithm detection box, W^gt represents the width of the actual labeling box, H^gt represents the height of the actual labeling box, and θ ∈ [2,6] represents the degree of attention;
the IoU cost function is defined by the intersection-over-union given above, and the bounding-box loss is:
L_box = 1 − IoU + (Δ + Ω) / 2
the SIOU loss function is expressed as:
L = W_box × L_box + W_cls × L_cls
wherein W_box represents the weight of the bounding-box loss, W_cls represents the classification loss weight, and L_cls represents the focal loss; the bounding-box loss weight and the classification loss weight are calculated on different data sets by a genetic algorithm.
9. The method for identifying components of a switchgear cabinet of a power distribution room based on the improved YOLOv5 algorithm of claim 8, wherein the method further comprises: after the counter part identification result is obtained, evaluating the network model performance by using the indexes of the precision, the recall rate and the average precision rate mean value;
the precision is calculated as: Precision = TP/(TP + FP) = TP/all detections;
the recall is calculated as: Recall = TP/(TP + FN) = TP/all ground truths;
the mean average precision is calculated as:
AP = Σ_n (r_{n+1} − r_n) × ρ_interp(r_{n+1}), where ρ_interp(r_{n+1}) = max_{r̃: r̃ ≥ r_{n+1}} ρ(r̃)
wherein TP is the area correctly predicted by the algorithm detection box, FP is the area wrongly predicted by the algorithm detection box, and FN is the area covered by the actual labeling box but missed by the algorithm detection box; all detections are the prediction areas of the algorithm detection boxes, all ground truths are the actual areas of the actual labeling boxes, r represents the recall, ρ(r) is the precision at recall r, and ρ_interp(r_{n+1}) is the highest precision among the precision values ρ(r̃) whose recall r̃ is greater than or equal to r_{n+1}.
10. The improved YOLOv5 algorithm-based distribution room switch cabinet face component identification method according to claim 8, wherein the cabinet face components comprise an operating state warning board, a switch knife switch, a grounding knife switch and an indicator light;
the pooling coefficients of the adaptive pooling layer in the adaptive attention module AAM are [0.1, 0.5], and the channel dimension is 256;
the kernel size of the dilated convolutions of the multi-branch convolution layer in the feature enhancement module FEM is 3 × 3, and the dilation rates d of the different branches are 1, 3 and 5, respectively.
CN202211299619.8A 2022-10-24 2022-10-24 Power distribution room switch cabinet face part identification method based on improved YOLOv5 algorithm Pending CN115565232A (en)

Publications (1)

Publication Number Publication Date
CN115565232A true CN115565232A (en) 2023-01-03

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116205895A (en) * 2023-03-16 2023-06-02 四川轻化工大学 Transformer oil leakage detection method based on improved YOLOv5
CN116229570A (en) * 2023-02-21 2023-06-06 四川轻化工大学 Aloft work personnel behavior situation identification method based on machine vision
CN116342596A (en) * 2023-05-29 2023-06-27 云南电网有限责任公司 YOLOv5 improved substation equipment nut defect identification detection method
CN116524328A (en) * 2023-06-28 2023-08-01 中国科学院长春光学精密机械与物理研究所 Target identification method and system based on improved lightweight network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination