CN115565232A - Power distribution room switch cabinet face part identification method based on improved YOLOv5 algorithm - Google Patents


Info

Publication number
CN115565232A
CN115565232A (application CN202211299619.8A)
Authority
CN
China
Prior art keywords
feature map
feature
algorithm
switch cabinet
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211299619.8A
Other languages
Chinese (zh)
Inventor
陈泽涛
陈申宇
苏崇文
刘秦铭
王增煜
陈志健
芮庆涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd filed Critical Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202211299619.8A
Publication of CN115565232A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for identifying cabinet face parts of power distribution room switch cabinets based on an improved YOLOv5 algorithm, which comprises the following steps: acquiring switch cabinet images through cameras in a power distribution room to obtain a switch cabinet image data set; performing data enhancement on the switch cabinet image data set to obtain an enhanced data set, and dividing the enhanced data set into a training set and a test set; constructing an improved YOLOv5 network model by first improving the network structure and adding a small target detection layer, then replacing the FPN structure in the Neck network with an AF-FPN structure, and finally using the SIOU loss function as the loss function of the bounding box; iteratively training the improved YOLOv5 network model on the training set until convergence, and storing the optimal network weights; and loading the optimal network weights into the improved YOLOv5 network model, testing with the test set, and outputting the cabinet face part identification result. Using the improved YOLOv5 algorithm, the cabinet face parts are accurately identified and the performance is evaluated under conditions of complex backgrounds and small targets, with a good detection and identification effect.

Description

Power distribution room switch cabinet surface part identification method based on improved YOLOv5 algorithm
Technical Field
The invention belongs to the technical field of digital image processing, and particularly relates to a distribution room switch cabinet face part identification method based on an improved YOLOv5 algorithm.
Background
With the growing demand of production and daily life for electric power, the requirements on the quality of the power supply provided by power supply departments (stability, continuity and accompanying services) are ever higher. The power distribution room is the most important power supply node in the power grid; such nodes are numerous and widely distributed across regions. To ensure normal operation of the power grid, inspection personnel need to perform daily inspection and maintenance on the equipment. However, because field operation environments and personnel are complex, with multiple kinds of cross operation and many cooperating parties, problems arise such as scattered operation points, high labor cost of field supervision, low operation efficiency, difficult management of the operation field, inadequate operation precautions, and failure to alarm and handle abnormal conditions in time.
Safe production is the first element of the survival and development of an enterprise. Although the electric power safety work regulations strictly stipulate the safety requirements to be followed when relevant personnel enter a work site, the site work environment is complex and contains many unpredictable dangers, bringing large potential safety hazards. Although operation sites can currently be video-monitored, the large amount of video and picture data this produces brings new problems: the captured video and picture information still needs to be checked manually, the backgrounds of different distribution room images are often complex and of varying sizes, and workers checking them easily suffer visual fatigue, leading to missed and false detections and reduced efficiency. Automatically detecting cabinet face pictures with image processing technology is therefore a research trend.
With the development of image processing technology, deep learning has made good progress in the field of target detection. Current classical target detection algorithms are mainly divided into single-stage and two-stage methods: single-stage methods include YOLO, SSD and RetinaNet, while two-stage methods include R-CNN, Fast R-CNN and Mask R-CNN. The YOLO algorithm is the fastest and highly accurate, making it suitable for detecting cabinet face parts such as disconnecting link switches and indicator lamps on the cabinet faces of switch cabinets in power distribution rooms. However, because different distribution room images have complex backgrounds and occlusion, and the cabinet face parts are small, the recognition effect is not good and further research is still needed.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provide a power distribution room switch cabinet face part identification method based on an improved YOLOv5 algorithm. The invention collects switch cabinet images and uses the improved YOLOv5 algorithm to accurately identify the cabinet face parts and evaluate performance under conditions of complex backgrounds and small targets, achieving good detection and identification effects and providing technical support for the intelligent construction of power distribution rooms.
In order to achieve the purpose, the invention adopts the following technical scheme:
a distribution room switch cabinet face part identification method based on an improved YOLOv5 algorithm comprises the following steps:
acquiring a switch cabinet image through a camera in a power distribution room to obtain a switch cabinet image data set;
carrying out data enhancement on the image data set of the switch cabinet to obtain an enhanced data set, and dividing the enhanced data set into a training set and a test set;
constructing an improved YOLOv5 network model, comprising: firstly, improving the network structure and adding a small target detection layer; then using an AF-FPN structure to replace the FPN structure in the Neck network; and finally using the SIOU loss function as the loss function of the bounding box;
iteratively training the improved YOLOv5 network model on the training set until convergence, and storing the optimal network weight;
and loading the optimal network weights into the improved YOLOv5 network model, testing with the test set, and outputting the cabinet face part identification result.
As a preferred technical scheme, a Mosaic-9 data enhancement method is adopted for the data enhancement of the switch cabinet image data set to obtain the enhanced data set, specifically:
adjusting all switch cabinet images in the switch cabinet image data set to a uniform size;
randomly taking n pictures from the switch cabinet image data set, randomly cropping, scaling and arranging them, and splicing them into one picture; repeating this Batch-size times to obtain Batch-size mosaic-enhanced pictures; and storing the enhanced pictures into the switch cabinet image data set to obtain the enhanced data set.
As a preferred technical solution, the improved YOLOv5 network model includes an input layer, a Backbone network, a Neck network, and a prediction layer;
the AF-FPN structure is additionally provided with an adaptive attention module AAM and a feature enhancement module FEM on the basis of the FPN structure;
the adaptive attention module AAM is used for reducing the loss of the context information in the feature channel and the high-level feature map;
the feature enhancement module FEM is used for enhancing the representation of the feature pyramid and improving the reasoning speed.
As a preferred technical solution, the iterative training of the improved YOLOv5 network model on the training set specifically includes:
inputting the training set into the Backbone network through the input layer, and extracting the feature mappings of the training set;
acquiring a feature map of a training set in a Neck network according to the feature mapping of the training set;
predicting in a prediction layer based on a feature map of a training set, calculating loss, and updating model parameters;
repeating the training until the loss function converges or the maximum number of iterations is reached, and storing the optimal network weights.
As a preferred technical solution, the feature mappings of the training set are represented as {C1, C2, C3, C4, C5};
the method comprises the following steps that the neutral network adopts top-down sampling operation and low-level feature mapping fusion to obtain a feature map of a training set, and specifically comprises the following steps:
generating a feature map M5 from the feature map C5 through an adaptive attention module AAM, fusing the feature map M5 with the feature map C5, and inputting the fused feature map into a feature enhancement module FPM for feature enhancement to obtain a feature map P5;
generating a feature map M4 from the feature map P5 through downsampling operation, fusing the feature map M4 with the feature map C4, and inputting the feature map into a feature enhancement module FPM for feature enhancement to obtain a feature map P4;
and generating a feature map M3 from the feature map P4 through downsampling operation, fusing the feature map M3 with the feature map C3, and inputting the fused feature map into a feature enhancement module FPM for feature enhancement to obtain the feature map P3.
As a preferred technical solution, the generating the feature mapping M5 by the feature mapping C5 through the adaptive attention module AAM specifically includes:
for the input feature mapping C5, firstly, obtaining 3 semantic features with different scales through a self-adaptive pooling layer;
carrying out convolution operation on the semantic features with 3 different scales by using 1 x 1 convolution to obtain the same channel dimension;
performing up-sampling operation on 3 semantic features with different scales by using a bilinear interpolation method, and merging channels through a Concat layer to obtain a feature map;
sequentially passing the feature map through a 1 × 1 convolutional layer, a ReLU activation layer, a 3 × 3 convolutional layer and a sigmoid activation layer to generate the spatial weight of the feature map;
and after a Hadamard product operation between the generated spatial weight and the feature map, separating the result into 3 new context feature representations and performing a matrix sum operation with the input feature mapping C5 to obtain the feature mapping M5.
As a preferred technical solution, the feature enhancement module FEM uses dilated convolution to adaptively learn different receptive fields in each feature map, and includes a multi-branch convolution layer and a branch pooling layer;
the multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution, and comprises dilated convolutions, BN layers and ReLU activation layers; the dilated convolutions in the multi-branch convolution layer have the same kernel size but different dilation rates; the receptive field formula of dilated convolution is:
r_1 = d × (k - 1) + 1
r_n = d × (k - 1) + r_{n-1}
where k is the convolution kernel size, d is the dilation rate, n is the index of the dilated convolution, and r_n is the receptive field after the n-th dilated convolution;
the branch pooling layer fuses the receptive field information from the different branches of the multi-branch convolution layer by an averaging operation, improving multi-scale precision prediction; its expression is:
y_p = (1/B) × Σ_{i=1}^{B} y_i
where y_p is the output of the branch pooling layer, B is the number of branches of the multi-branch convolution layer, and y_i is the output of the i-th branch convolution layer.
As a preferred technical scheme, the prediction layer performs prediction through a bottom-up downsampling path fused with high-level feature maps, namely downsampling the feature map P3, fusing it with the feature map P4 and downsampling again, then fusing with the feature map P5 to obtain the algorithm detection boxes of the cabinet face parts;
comparing the algorithm detection box of a cabinet face part with the actual labeling box to obtain the intersection-over-union ratio, and constructing the SIOU loss function;
the intersection-over-union formula is expressed as:
IoU = |B ∩ B^GT| / |B ∪ B^GT|
where B is the algorithm detection box of the cabinet face part and B^GT is the actual labeling box of the cabinet face part;
the SIOU loss function consists of four parts, namely an Angle cost Angle function, a Distance cost function, a Shape cost function and an IoU cost function;
the Angle cost Angle function is defined as:
Figure BDA0003903982370000042
wherein the content of the first and second substances,
Figure BDA0003903982370000043
Figure BDA0003903982370000044
wherein the content of the first and second substances,
Figure BDA0003903982370000045
representing the distance of the actual label box on the x-axis,
Figure BDA0003903982370000046
to represent the distance of the algorithm detection box on the x-axis,
Figure BDA0003903982370000047
to actually label the distance of the box on the y-axis,
Figure BDA0003903982370000048
detecting the distance of the box on the y axis for the algorithm;
the Distance cost function is defined as:
Figure BDA0003903982370000049
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA00039039823700000410
γ=2-Λ,c w the difference value of the actual marking box and the algorithm detection box in the x-axis direction is obtained;
the Shape cost function is defined as:
Figure BDA00039039823700000411
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA00039039823700000412
w represents the width of the algorithm detection box, H represents the height of the algorithm detection box, W Gt Indicates the actual width of the label box, H Gt Represents the actual height of the labeled box, theta is equal to 2,6]Representing a degree of attention;
the IoU cos function is defined as:
Figure BDA00039039823700000413
the SIOU loss function is expressed as:
L=W box L box +W cls L cls
wherein, W box Representing the algorithm detection box weight, W cls Represents the classification loss weight, L cls Indicates focal length; the algorithmic detection box weights and classification loss weights are calculated using genetic algorithms on different datasets.
As a preferred technical solution, the method further comprises: after obtaining the cabinet face part identification result, evaluating the performance of the network model using the precision, recall and mean average precision (mAP) indexes;
the precision calculation formula is: Precision = TP / (TP + FP) = TP / (all detections);
the recall calculation formula is: Recall = TP / (TP + FN) = TP / (all ground truths);
the mean average precision calculation formula is:
mAP = (1/N) × Σ_{c=1}^{N} AP_c, with AP = Σ_n (r_{n+1} - r_n) × ρ_interp(r_{n+1}) and ρ_interp(r_{n+1}) = max_{r̃ ≥ r_{n+1}} ρ(r̃)
where TP are the regions correctly predicted by the algorithm detection boxes, FP are the regions wrongly predicted by the algorithm detection boxes, and FN are the regions of actual labeling boxes not predicted by any algorithm detection box; all detections are the prediction regions of the algorithm detection boxes, all ground truths are the actual regions of the actual labeling boxes, N is the number of classes, r is the recall, ρ(r) is the precision at recall r, and ρ_interp(r_{n+1}) is the highest precision ρ(r̃) over all recalls r̃ greater than or equal to r_{n+1}.
As a preferred technical scheme, the cabinet face parts comprise operation state warning boards, switch disconnecting links, grounding disconnecting links and indicator lights;
the pooling coefficient of the adaptive pooling layer in the adaptive attention module AAM is [0.1, 0.5], and the channel dimension is 256;
the kernel size of the dilated convolutions of the multi-branch convolution layer in the feature enhancement module FEM is 3 × 3, and the dilation rates d of the different branches are 1, 3 and 5 respectively.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method adopts Mosaic-9 data enhancement, which enriches the data set and improves network training speed; at the same time, random noise data is reasonably added, enhancing the network model's ability to distinguish small target samples and improving the generalization of the model.
2. The YOLOv5 network structure is improved, a small target detection layer is added, a shallow feature map and a deep feature map are spliced and then detected, the small target detection effect is improved, AF-FPN is used for replacing the original FPN in the Neck part, the loss of context information in the feature maps is reduced, the representation of a feature pyramid is enhanced, and the reasoning speed is improved.
3. The prediction box loss is improved: the SIOU Loss replaces the original CIOU Loss as the loss function of the bounding box, improving training speed and inference accuracy compared with the original CIOU Loss.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of an electrical room switch cabinet face part identification method based on an improved YOLOv5 algorithm in an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for enhancing Mosaic-9 data according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an improved YOLOv5 network model in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the AF-FPN structure in the embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an adaptive attention Module AAM according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a feature enhancement module FEM according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the intersection-over-union ratio IoU according to the embodiment of the invention;
FIG. 8 is a schematic diagram of an Angle cost Angle function according to an embodiment of the present invention;
FIG. 9 is a graph illustrating Angle cost as a function of Angle according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a Shape cost function in an embodiment of the invention;
fig. 11 is a schematic diagram of an identification result in the embodiment of the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by a person skilled in the art that the embodiments described herein can be combined with other embodiments.
As shown in fig. 1, the method for identifying a surface part of a switch cabinet of a power distribution room based on an improved YOLOv5 algorithm in the embodiment includes the following steps:
s1, acquiring a switch cabinet image through a camera in a power distribution room to obtain a switch cabinet image data set;
the invention discloses a switch cabinet image data collection system, which is characterized in that a camera is arranged in a power distribution room at present and is used for monitoring the state of the power distribution room and the information of personnel entering and exiting the power distribution room.
In this embodiment, 1258 or more switch cabinet images with different backgrounds and different cabinet face targets are obtained from the cameras, expanding the richness of the data as much as possible.
S2, performing data enhancement on the image data set of the switch cabinet to obtain an enhanced data set, and dividing the enhanced data set into a training set and a test set;
the identification and detection of the cabinet surface components are facilitated, data enhancement needs to be performed on the image data set of the switch cabinet to enrich the data set, and in this embodiment, data enhancement is performed by adopting a Mosaic-9 data enhancement method as shown in fig. 2, specifically:
adjusting the sizes of all switch cabinet images in the switch cabinet image data set to be uniform;
randomly taking n pictures from the switch cabinet image data set, randomly cropping, scaling and arranging them, and splicing them into one picture; repeating this Batch-size times to obtain Batch-size mosaic-enhanced pictures; and storing the enhanced pictures into the switch cabinet image data set to obtain the enhanced data set.
In this embodiment, 9 pictures are randomly selected from the switch cabinet image data set each time for data enhancement, which increases the number of small-sample targets while enriching the data set and speeds up network training; random noise data is also reasonably added so that the network model can distinguish small target samples, further improving the generalization of the model. After the enhanced data set is obtained, it is divided into a training set and a test set at a ratio of 8:2.
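As a rough illustration of the Mosaic-9 stitching and the 8:2 split, here is a minimal pure-Python sketch; it tiles nine toy "images" (2-D lists) into a 3 × 3 grid and omits the random cropping, scaling and arrangement of the real augmentation. The helper names `mosaic9` and `split_dataset` are illustrative, not from the patent:

```python
import random

def mosaic9(images):
    """Tile 9 equally sized toy 'images' (2-D lists of pixel rows) into one
    3x3 mosaic; the real Mosaic-9 also randomly crops, scales and arranges
    the pictures before stitching."""
    assert len(images) == 9
    cell_rows = len(images[0])
    rows = []
    for gy in range(3):                      # grid row of the 3x3 layout
        for y in range(cell_rows):           # pixel row inside a cell
            row = []
            for gx in range(3):              # grid column
                row.extend(images[gy * 3 + gx][y])
            rows.append(row)
    return rows

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle and split the enhanced dataset 8:2 into train/test."""
    samples = samples[:]
    random.Random(seed).shuffle(samples)
    k = int(len(samples) * train_ratio)
    return samples[:k], samples[k:]

# nine 2x2 single-channel toy images filled with their own index
imgs = [[[i, i], [i, i]] for i in range(9)]
mosaic = mosaic9(imgs)                        # a 6x6 mosaic picture
train, test = split_dataset(list(range(10)))  # 8 training, 2 test samples
```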
S3, constructing an improved YOLOv5 network model, which comprises the following steps: firstly, improving the network structure and adding a small target detection layer; then using an AF-FPN structure to replace the FPN structure in the Neck network; and finally using the SIOU loss function as the loss function of the bounding box;
as shown in fig. 3, a YOLOv5 network model is constructed based on the YOLOv5 algorithm and then improved. First, the network structure is improved by adding a small target detection layer, which splices a shallow feature map with a deep feature map before detection; although the additional detection layer reduces inference speed, it improves the detection effect on small targets. Then, an AF-FPN structure is used in the Neck network to replace the original FPN (feature pyramid network) structure. As shown in fig. 4, the AF-FPN structure adds an adaptive attention module (AAM) and a feature enhancement module (FEM) to the traditional feature pyramid network (FPN): the AAM reduces the loss of context information in the feature channels and high-level feature maps, while the FEM enhances the representation of the feature pyramid, improving inference speed and recognition performance.
S4, iteratively training the improved YOLOv5 network model on the training set until convergence, and storing the optimal network weight, wherein the method specifically comprises the following steps:
S41, inputting the training set into the Backbone network through the input layer, and extracting the feature mappings of the training set;
as shown in fig. 4, after the training set undergoes multiple convolutions in the Backbone network, the generated feature mappings are {C1, C2, C3, C4, C5};
S42, acquiring the feature maps of the training set in the Neck network according to the feature mappings of the training set;
the Neck network adopts a top-down upsampling path fused with low-level feature mappings and uses the FEM module to expand the receptive field, obtaining the feature maps of the training set, as shown in fig. 4, specifically:
generating a feature mapping M5 from the feature mapping C5 through the adaptive attention module AAM, fusing it with the feature mapping C5, and inputting the fused result into the feature enhancement module FEM for feature enhancement to obtain the feature map P5;
generating a feature mapping M4 from the feature map P5 through an upsampling operation, fusing it with the feature mapping C4, and inputting the fused result into the feature enhancement module FEM for feature enhancement to obtain the feature map P4;
and generating a feature mapping M3 from the feature map P4 through an upsampling operation, fusing it with the feature mapping C3, and inputting the fused result into the feature enhancement module FEM for feature enhancement to obtain the feature map P3.
The specific principle of the adaptive attention module AAM generating the feature map M5 is shown in fig. 5:
for the input feature mapping C5 of size S = h × w, semantic features of 3 different scales are first obtained through an adaptive pooling layer; the pooling coefficient in this embodiment is [0.1, 0.5], which varies adaptively according to the target sizes of the data set.
Performing a convolution operation on the 3 semantic features of different scales with 1 × 1 convolutions to obtain the same channel dimension, 256 in this embodiment;
performing an upsampling operation on the 3 semantic features of different scales to scale S using bilinear interpolation, and merging the channels through a Concat layer to obtain a feature map;
sequentially passing the feature map through a 1 × 1 convolutional layer, a ReLU activation layer, a 3 × 3 convolutional layer and a sigmoid activation layer to generate the spatial weight of the feature map;
after a Hadamard product operation between the generated spatial weight and the feature map, the result is separated into 3 new context feature representations and summed element-wise with the input feature mapping C5 to obtain the feature mapping M5; the resulting feature mapping M5 has rich multi-scale context information, which alleviates to a certain extent the information loss caused by the reduced number of channels.
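The adaptive pooling and bilinear upsampling used inside the AAM are generic operations. The following 1-D pure-Python sketch (illustrative helper names; the module itself works on 2-D multi-channel feature maps) shows what each of those two steps computes:

```python
import math

def adaptive_avg_pool_1d(x, out_len):
    """Adaptive average pooling: shrink x to out_len bins by averaging;
    AAM uses the 2-D analogue to get pyramid-level semantic features."""
    n = len(x)
    out = []
    for i in range(out_len):
        lo = (i * n) // out_len
        hi = ((i + 1) * n + out_len - 1) // out_len  # ceil((i+1)*n/out_len)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def bilinear_upsample_1d(x, out_len):
    """Linear interpolation back to out_len samples (1-D counterpart of the
    bilinear upsampling applied before the Concat layer)."""
    n = len(x)
    if n == 1:
        return [x[0]] * out_len
    out = []
    for i in range(out_len):
        pos = i * (n - 1) / (out_len - 1)   # fractional source position
        lo = min(int(math.floor(pos)), n - 2)
        t = pos - lo
        out.append(x[lo] * (1 - t) + x[lo + 1] * t)
    return out

feat = [1.0, 3.0, 5.0, 7.0]
pooled = adaptive_avg_pool_1d(feat, 2)   # [2.0, 6.0]
up = bilinear_upsample_1d(pooled, 4)     # back to length 4
print(pooled, up)
```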
Meanwhile, as shown in fig. 6, the feature enhancement module FEM uses dilated convolution to adaptively learn different receptive fields in each feature map, improving the accuracy of multi-scale target detection and identification; it comprises a multi-branch convolution layer and a branch pooling layer;
the multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution, and comprises dilated convolutions, BN layers and ReLU activation layers, where the dilated convolutions have the same kernel size but different dilation rates; specifically, the kernel size of each dilated convolution is 3 × 3, and the dilation rates d of the different branches are 1, 3 and 5 respectively. Dilated convolution supports an exponentially expanded receptive field without loss of resolution: in a dilated convolution the kernel elements are spaced apart, with the spacing determined by the dilation rate. An ordinary convolution has dilation rate 1, meaning no gap between kernel elements; in the broad sense it is a special case of dilated convolution. When d = 2, there is one gap between adjacent kernel elements. Thus, with dilation rate 3, a 3 × 3 dilated kernel covers the same area as a 7 × 7 kernel, and the receptive field of the layer is 7 × 7;
the receptive field of the dilated convolution is given by:
r_1 = d × (k − 1) + 1
r_n = d × (k − 1) + r_{n−1}
wherein k represents the convolution kernel size, d represents the dilation rate, n represents the number of stacked dilated convolutions, and r_n represents the receptive field after the n-th dilated convolution;
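For illustration, the receptive-field recursion above can be checked with a short pure-Python helper (the function name is an illustrative assumption, not part of the patent):

```python
def receptive_field(k: int, d: int, n: int = 1) -> int:
    """Effective receptive field after stacking n dilated convolutions.

    Implements r_1 = d*(k-1) + 1 and r_n = d*(k-1) + r_{n-1}
    from the text, assuming the same kernel size k and dilation
    rate d for every one of the n stacked layers.
    """
    r = d * (k - 1) + 1       # r_1
    for _ in range(n - 1):    # r_2 .. r_n
        r = d * (k - 1) + r
    return r
```

With k = 3 this reproduces the branch receptive fields named in the text: d = 1 gives 3 × 3, d = 3 gives 7 × 7, and d = 5 gives 11 × 11.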
the branch pooling layer fuses the information from the different receptive fields of the multi-branch convolution layer by an averaging operation, which avoids introducing additional parameters while improving multi-scale prediction accuracy; its expression is:
y_p = (1/B) × Σ_{i=1}^{B} y_i
wherein y_p represents the output of the branch pooling layer, B represents the number of branches of the multi-branch convolution layer, and y_i represents the output of the i-th branch convolution layer.
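A minimal sketch of the branch pooling average (pure Python on flattened feature values; the list-of-lists representation is an illustrative assumption):

```python
def branch_pool(branch_outputs):
    """Average the outputs of the B convolution branches element-wise.

    branch_outputs: list of B equal-length lists (flattened feature maps).
    Implements y_p = (1/B) * sum_i y_i with no learnable parameters.
    """
    B = len(branch_outputs)
    return [sum(vals) / B for vals in zip(*branch_outputs)]
```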
S43, making predictions in the prediction layer based on the feature maps of the training set, calculating the loss, and updating the model parameters;
as shown in fig. 4, the prediction layer performs prediction by bottom-up downsampling and fusion with higher-level feature maps: the feature map P3 is downsampled and fused with the feature map P4, the result is downsampled again and fused with the feature map P5, and the algorithm detection boxes of the cabinet face components are obtained;
as shown in fig. 7, the algorithm detection box of the cabinet face component is compared with the actual labeling box to obtain the intersection-over-union, and the SIOU loss function is constructed; the intersection-over-union is calculated as:
IoU = |B ∩ B^GT| / |B ∪ B^GT|
wherein B represents the algorithm detection box of the cabinet face component and B^GT represents the actual labeling box of the cabinet face component;
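For illustration, the intersection-over-union of two axis-aligned boxes can be sketched in pure Python (the corner-coordinate box format (x1, y1, x2, y2) and the function name are assumptions, not from the patent):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2).

    box_a: algorithm detection box B; box_b: actual labeling box B^GT.
    """
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```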
in the invention, the SIOU loss function replaces the original CIOU loss as the loss function of the bounding box, optimizing the network model; the SIOU loss function consists of four parts, namely the Angle cost function, the Distance cost function, the Shape cost function and the IoU cost function;
as shown in FIG. 8, B represents the algorithm detection box of the cabinet face component and B^GT the actual labeling box of the cabinet face component; when the angle α between B and B^GT is smaller than π/4, convergence proceeds toward minimizing α, and otherwise toward minimizing β; the Angle cost function therefore reduces the number of distance-related variables to the greatest extent; the improved YOLOv5 network model in this embodiment first brings the algorithm detection box to the X or Y axis (whichever is closer), and then continues the approach along that axis; if
α ≤ π/4
the convergence process will first minimize α, otherwise minimize β:
β = π/2 − α
The curve of the Angle cost function is shown in FIG. 9, and thus the Angle cost function is defined as:
Λ = 1 − 2 sin²(arcsin(x) − π/4)
wherein,
x = c_h / σ = sin(α)
σ = √((b^gt_cx − b_cx)² + (b^gt_cy − b_cy)²)
c_h = max(b^gt_cy, b_cy) − min(b^gt_cy, b_cy)
wherein b^gt_cx represents the center coordinate of the actual labeling box on the x-axis, b_cx represents the center coordinate of the algorithm detection box on the x-axis, b^gt_cy represents the center coordinate of the actual labeling box on the y-axis, and b_cy represents the center coordinate of the algorithm detection box on the y-axis;
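To make the Angle cost concrete, here is a minimal pure-Python sketch computed from the two box centers (the function name and argument order are illustrative assumptions):

```python
import math

def angle_cost(cx, cy, cx_gt, cy_gt):
    """SIoU Angle cost: Lambda = 1 - 2*sin^2(arcsin(x) - pi/4).

    (cx, cy) and (cx_gt, cy_gt) are the centers of the algorithm
    detection box and the actual labeling box; x = c_h / sigma = sin(alpha).
    """
    sigma = math.hypot(cx_gt - cx, cy_gt - cy)  # center-to-center distance
    if sigma == 0:
        return 0.0                              # coincident centers
    c_h = abs(cy_gt - cy)                       # height gap of the centers
    x = c_h / sigma
    return 1 - 2 * math.sin(math.asin(x) - math.pi / 4) ** 2
```

Note the behavior it encodes: the cost vanishes when the centers are axis-aligned (α = 0 or α = π/2) and peaks at α = π/4.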
in view of the Angle cost function defined above, the Distance cost function is redefined as:
Δ = Σ_{t=x,y} (1 − e^(−γρ_t))
wherein,
ρ_x = ((b^gt_cx − b_cx) / c_w)², ρ_y = ((b^gt_cy − b_cy) / c_h)², γ = 2 − Λ
and c_w and c_h are the width and height of the smallest box enclosing the actual labeling box and the algorithm detection box; when α → 0, the contribution of the Distance cost is greatly reduced; conversely, the closer α is to π/4, the larger the contribution of the Distance cost; since γ gives time priority to the distance value, the influence of the Distance cost grows as the angle increases.
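A minimal sketch of the Distance cost above (pure Python; the Angle cost Λ is passed in pre-computed so that γ = 2 − Λ, and the function name is an illustrative assumption):

```python
import math

def distance_cost(cx, cy, cx_gt, cy_gt, c_w, c_h, lam):
    """SIoU Distance cost: Delta = sum_t (1 - exp(-gamma * rho_t)), t in {x, y}.

    c_w, c_h: width and height of the smallest box enclosing both boxes;
    lam: the Angle cost Lambda, so that gamma = 2 - Lambda.
    """
    gamma = 2 - lam
    rho_x = ((cx_gt - cx) / c_w) ** 2
    rho_y = ((cy_gt - cy) / c_h) ** 2
    return (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))
```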
As shown in FIG. 10, the Shape cost function is defined as:
Ω = Σ_{t=w,h} (1 − e^(−ω_t))^θ
wherein,
ω_w = |W − W^gt| / max(W, W^gt), ω_h = |H − H^gt| / max(H, H^gt)
W represents the width of the algorithm detection box, H the height of the algorithm detection box, W^gt the width of the actual labeling box, H^gt the height of the actual labeling box, and θ ∈ [2,6] the degree of attention; the value of θ defines the Shape cost of each data set and is unique to it; θ is a very important term in this equation, as it controls how much attention is paid to the Shape cost; if θ were set to 1, the shape would be optimized immediately, harming the shape's freedom of movement; therefore a genetic algorithm is used to calculate the value of θ for each data set; θ = 4 in this embodiment.
The IoU cost function is defined by the intersection-over-union given above, and the resulting bounding-box loss is:
L_box = 1 − IoU + (Δ + Ω) / 2
the resulting SIOU loss function is expressed as:
L = W_box × L_box + W_cls × L_cls
wherein W_box represents the weight of the bounding-box loss, W_cls represents the classification loss weight, and L_cls represents the focal loss;
the bounding-box loss weight and the classification loss weight on different data sets are calculated by a genetic algorithm; in this embodiment the bounding-box loss weight W_box = 0.35 and the classification loss weight W_cls = 0.65.
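For illustration, assembling the total loss from its components can be sketched as follows (a pure-Python sketch; passing pre-computed IoU, Δ, Ω and classification-loss values, as well as the function name, are illustrative assumptions):

```python
def siou_total_loss(iou_val, delta, omega, l_cls,
                    w_box=0.35, w_cls=0.65):
    """Total loss L = W_box * L_box + W_cls * L_cls with
    L_box = 1 - IoU + (Delta + Omega) / 2 (SIoU bounding-box loss).

    iou_val, delta, omega: IoU, Distance cost and Shape cost of a box pair;
    l_cls: classification (focal) loss; weights as in this embodiment.
    """
    l_box = 1 - iou_val + (delta + omega) / 2
    return w_box * l_box + w_cls * l_cls
```

A perfectly matched box (IoU = 1, Δ = Ω = 0) with zero classification loss yields a total loss of zero, as expected.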
And S44, repeating the training until the loss function converges or the maximum number of iterations is reached, and storing the optimal network weight.
And S5, loading the optimal network weight into the improved YOLOv5 network model, testing with the test set, and outputting the cabinet face component identification result.
In order to evaluate the recognition result, this embodiment also evaluates the performance of the network model using the precision, recall and mean average precision indexes;
as shown in fig. 7, the intersection-over-union IoU = TP/(TP + FN + FP); IoU is the overlapping area of the actual labeling box and the algorithm detection box divided by the area of their union, and measures whether the algorithm detection box accurately locates the target object; TP is the area correctly predicted by the algorithm detection box, FP is the area wrongly predicted by the algorithm detection box, FN is the area covered by the actual labeling box but missed by the algorithm detection box, and TN is the area covered by neither the actual labeling box nor the algorithm detection box;
wherein the precision is calculated as: Precision = TP/(TP + FP) = TP/all detections;
the recall is calculated as: Recall = TP/(TP + FN) = TP/all ground truths.
The mean average precision is calculated as:
AP = Σ_n (r_{n+1} − r_n) × ρ_interp(r_{n+1}), where ρ_interp(r_{n+1}) = max_{r̃: r̃ ≥ r_{n+1}} ρ(r̃)
wherein all detections are the prediction areas of the algorithm detection boxes, all ground truths are the actual areas of the actual labeling boxes, r represents the recall, ρ(r) is the precision at recall r, and ρ_interp(r_{n+1}) is the highest precision among the precision values ρ(r̃) whose recall r̃ is greater than or equal to r_{n+1}.
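For illustration, the precision/recall formulas and the interpolated average precision can be sketched in pure Python (the function names, and the assumption that recalls are given in ascending order, are illustrative):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls, precisions):
    """Interpolated AP: sum over recall steps of
    (r_{n+1} - r_n) * (highest precision at recall >= r_{n+1}).

    recalls: ascending recall values; precisions: precision at each step.
    """
    ap, prev_r = 0.0, 0.0
    for i, r in enumerate(recalls):
        p_interp = max(precisions[i:])  # best precision at recall >= r
        ap += (r - prev_r) * p_interp
        prev_r = r
    return ap
```

The mAP is then simply the mean of the AP values over all component classes.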
As shown in fig. 11, this embodiment identifies the cabinet face components in an image of a certain switch cabinet; it can be seen that the invention can accurately identify and label cabinet face components such as the operating state warning board, the switch disconnecting link, the grounding disconnecting link and the indicator lights, which is of great significance for the maintenance and intelligent construction of power distribution rooms.
It should be noted that for simplicity and clarity of description, the above-described method embodiments have been presented as a series of interrelated steps, but it should be appreciated by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the invention.
The technical features of the above embodiments can be combined arbitrarily; for brevity, not all possible combinations of the technical features in the above embodiments are described, but any such combination should be considered within the scope of this specification as long as the combined technical features do not contradict one another.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such modifications are intended to be included in the scope of the present invention.

Claims (10)

1. A distribution room switch cabinet face part identification method based on an improved YOLOv5 algorithm is characterized by comprising the following steps:
acquiring a switch cabinet image through a camera in a power distribution room to obtain a switch cabinet image data set;
carrying out data enhancement on the image data set of the switch cabinet to obtain an enhanced data set, and dividing the enhanced data set into a training set and a test set;
constructing an improved YOLOv5 network model, comprising: firstly, improving the network structure by adding a small target detection layer; then using an AF-FPN structure to replace the FPN structure in the Neck network; and finally using the SIOU loss function as the loss function of the bounding box;
iteratively training the improved YOLOv5 network model on the training set until convergence, and storing the optimal network weight;
and loading the optimal network weight into the improved YOLOv5 network model, testing with the test set, and outputting the cabinet face component identification result.
2. The improved YOLOv5 algorithm-based part identification method for the switch cabinet surface of the power distribution room, as claimed in claim 1, is characterized in that a Mosaic-9 data enhancement method is adopted to enhance the data of the switch cabinet image data set, so as to obtain an enhanced data set, specifically:
adjusting all switch cabinet images in the switch cabinet image data set to be uniform in size;
and randomly taking n pictures from the switch cabinet image data set, randomly cropping, scaling and randomly arranging them, and splicing them into one picture; repeating this Batch-size times to obtain Batch-size mosaic data-enhanced pictures, and storing the enhanced pictures into the switch cabinet image data set to obtain the enhanced data set.
3. The method for identifying components of a switch cabinet of a power distribution room based on the improved YOLOv5 algorithm of claim 1, wherein the improved YOLOv5 network model comprises an input layer, a Backbone network, a Neck network and a prediction layer;
the AF-FPN structure is additionally provided with an adaptive attention module AAM and a feature enhancement module FEM on the basis of the FPN structure;
the adaptive attention module AAM is used for reducing the loss of the context information in the feature channel and the high-level feature map;
the feature enhancement module FEM is used for enhancing the representation of the feature pyramid and improving the reasoning speed.
4. The method for identifying components of a switch cabinet face of a power distribution room based on an improved YOLOv5 algorithm as claimed in claim 3, wherein the improved YOLOv5 network model is iteratively trained on a training set, specifically:
inputting the training set into the Backbone network through the input layer, and extracting the feature mappings of the training set;
acquiring a feature map of a training set in a Neck network according to the feature mapping of the training set;
predicting in a prediction layer based on a feature map of a training set, calculating loss, and updating model parameters;
retraining until the loss function converges or the maximum iteration times is reached, and storing the optimal network weight.
5. The method for identifying components of a switchgear cabinet of a power distribution room based on the improved YOLOv5 algorithm as claimed in claim 4, wherein the feature map of the training set is represented as { C1, C2, C3, C4, C5};
the Neck network obtains the feature maps of the training set by top-down upsampling operations fused with lower-level feature mappings, specifically:
generating a feature map M5 from the feature mapping C5 through the adaptive attention module AAM, fusing the feature map M5 with the feature mapping C5, and inputting the fused feature map into the feature enhancement module FEM for feature enhancement to obtain a feature map P5;
generating a feature map M4 from the feature map P5 through an upsampling operation, fusing the feature map M4 with the feature mapping C4, and inputting the result into the feature enhancement module FEM for feature enhancement to obtain a feature map P4;
and generating a feature map M3 from the feature map P4 through an upsampling operation, fusing the feature map M3 with the feature mapping C3, and inputting the result into the feature enhancement module FEM for feature enhancement to obtain a feature map P3.
6. The method for identifying the surface components of the switch cabinet of the power distribution room based on the improved YOLOv5 algorithm as claimed in claim 5, wherein the feature map M5 is generated from the feature mapping C5 by the adaptive attention module AAM, specifically:
for the input feature mapping C5, firstly, obtaining 3 semantic features with different scales through a self-adaptive pooling layer;
carrying out convolution operation on the semantic features with 3 different scales by using 1 x 1 convolution to obtain the same channel dimension;
performing up-sampling operation on 3 semantic features with different scales by using a bilinear interpolation method, and merging channels through a Concat layer to obtain a feature map;
sequentially passing the feature map through a 1 × 1 convolution layer, a ReLU activation layer, a 3 × 3 convolution layer and a sigmoid activation layer to generate the spatial weight of the feature map;
after the generated spatial weight and the feature map are combined by a Hadamard product, the result is separated into 3 new context feature representations, which are summed element-wise with the input feature mapping C5 to obtain the feature map M5.
7. The improved YOLOv5 algorithm-based method for identifying components of a switch cabinet face of a power distribution room, wherein the feature enhancement module FEM uses dilated convolution to adaptively learn different receptive fields in each feature map, and comprises a multi-branch convolution layer and a branch pooling layer;
the multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution, and each branch comprises a dilated convolution, a BN layer and a ReLU activation layer; the dilated convolutions in the multi-branch convolution layer have the same kernel size but different dilation rates; the receptive field of the dilated convolution is given by:
r_1 = d × (k − 1) + 1
r_n = d × (k − 1) + r_{n−1}
wherein k represents the convolution kernel size, d represents the dilation rate, n represents the number of stacked dilated convolutions, and r_n represents the receptive field after the n-th dilated convolution;
the branch pooling layer fuses the information from the different receptive fields of the multi-branch convolution layer by an averaging operation, thereby improving multi-scale prediction accuracy; its expression is:
y_p = (1/B) × Σ_{i=1}^{B} y_i
wherein y_p represents the output of the branch pooling layer, B represents the number of branches of the multi-branch convolution layer, and y_i represents the output of the i-th branch convolution layer.
8. The method for identifying the surface parts of the switch cabinets of the power distribution rooms based on the improved YOLOv5 algorithm as claimed in claim 5, wherein the prediction layer performs prediction by bottom-up downsampling and fusion with higher-level feature maps, namely the feature map P3 is downsampled and fused with the feature map P4, the result is downsampled again and fused with the feature map P5, and the algorithm detection boxes of the surface parts are obtained;
comparing the algorithm detection box of the cabinet face component with the actual labeling box to obtain the intersection-over-union, and constructing the SIOU loss function;
the intersection-over-union is expressed as:
IoU = |B ∩ B^GT| / |B ∪ B^GT|
wherein B represents the algorithm detection box of the cabinet component and B^GT represents the actual labeling box of the cabinet component;
the SIOU loss function consists of four parts, namely the Angle cost function, the Distance cost function, the Shape cost function and the IoU cost function;
the Angle cost function is defined as:
Λ = 1 − 2 sin²(arcsin(x) − π/4)
wherein,
x = c_h / σ = sin(α)
σ = √((b^gt_cx − b_cx)² + (b^gt_cy − b_cy)²)
c_h = max(b^gt_cy, b_cy) − min(b^gt_cy, b_cy)
wherein b^gt_cx represents the center coordinate of the actual labeling box on the x-axis, b_cx represents the center coordinate of the algorithm detection box on the x-axis, b^gt_cy represents the center coordinate of the actual labeling box on the y-axis, and b_cy represents the center coordinate of the algorithm detection box on the y-axis;
the Distance cost function is defined as:
Δ = Σ_{t=x,y} (1 − e^(−γρ_t))
wherein,
ρ_x = ((b^gt_cx − b_cx) / c_w)², ρ_y = ((b^gt_cy − b_cy) / c_h)², γ = 2 − Λ
and c_w and c_h are the width and height of the smallest box enclosing the actual labeling box and the algorithm detection box;
the Shape cost function is defined as:
Ω = Σ_{t=w,h} (1 − e^(−ω_t))^θ
wherein,
ω_w = |W − W^gt| / max(W, W^gt), ω_h = |H − H^gt| / max(H, H^gt)
W represents the width of the algorithm detection box, H represents the height of the algorithm detection box, W^gt represents the width of the actual labeling box, H^gt represents the height of the actual labeling box, and θ ∈ [2,6] represents the degree of attention;
the IoU cost function is defined by the intersection-over-union given above, and the bounding-box loss is:
L_box = 1 − IoU + (Δ + Ω) / 2
the SIOU loss function is expressed as:
L = W_box × L_box + W_cls × L_cls
wherein W_box represents the weight of the bounding-box loss, W_cls represents the classification loss weight, and L_cls represents the focal loss; the bounding-box loss weight and the classification loss weight are calculated on different data sets by a genetic algorithm.
9. The method for identifying components of a switchgear cabinet of a power distribution room based on the improved YOLOv5 algorithm of claim 8, wherein the method further comprises: after the counter part identification result is obtained, evaluating the network model performance by using the indexes of the precision, the recall rate and the average precision rate mean value;
the precision is calculated as: Precision = TP/(TP + FP) = TP/all detections;
the recall is calculated as: Recall = TP/(TP + FN) = TP/all ground truths;
the mean average precision is calculated as:
AP = Σ_n (r_{n+1} − r_n) × ρ_interp(r_{n+1}), where ρ_interp(r_{n+1}) = max_{r̃: r̃ ≥ r_{n+1}} ρ(r̃)
wherein TP is the area correctly predicted by the algorithm detection box, FP is the area wrongly predicted by the algorithm detection box, and FN is the area covered by the actual labeling box but missed by the algorithm detection box; all detections are the prediction areas of the algorithm detection boxes, all ground truths are the actual areas of the actual labeling boxes, r represents the recall, ρ(r) is the precision at recall r, and ρ_interp(r_{n+1}) is the highest precision among the precision values ρ(r̃) whose recall r̃ is greater than or equal to r_{n+1}.
10. The improved YOLOv5 algorithm-based distribution room switch cabinet face component identification method according to claim 8, wherein the cabinet face components comprise an operating state warning board, a switch knife switch, a grounding knife switch and an indicator light;
the pooling coefficients of the adaptive pooling layer in the adaptive attention module AAM are [0.1, 0.5], and the channel dimension is 256;
the kernel size of the dilated convolutions of the multi-branch convolution layer in the feature enhancement module FEM is 3 × 3, and the dilation rates d of the different branches are 1, 3 and 5, respectively.
CN202211299619.8A 2022-10-24 2022-10-24 Power distribution room switch cabinet face part identification method based on improved YOLOv5 algorithm Pending CN115565232A (en)

Publications (1)

Publication Number Publication Date
CN115565232A true CN115565232A (en) 2023-01-03

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116205895A (en) * 2023-03-16 2023-06-02 四川轻化工大学 Transformer oil leakage detection method based on improved YOLOv5
CN116229570A (en) * 2023-02-21 2023-06-06 四川轻化工大学 Aloft work personnel behavior situation identification method based on machine vision
CN116342596A (en) * 2023-05-29 2023-06-27 云南电网有限责任公司 YOLOv5 improved substation equipment nut defect identification detection method
CN116524328A (en) * 2023-06-28 2023-08-01 中国科学院长春光学精密机械与物理研究所 Target identification method and system based on improved lightweight network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination