CN113486781B - Electric power inspection method and device based on deep learning model - Google Patents

Electric power inspection method and device based on deep learning model

Info

Publication number
CN113486781B
Authority
CN
China
Prior art keywords
module
feature
feature map
power inspection
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110752672.8A
Other languages
Chinese (zh)
Other versions
CN113486781A (en)
Inventor
巢玉坚
李洋
刘鸿斌
张影
王彦波
袁逸凡
周子纯
戴铁潮
毕善钰
于佳
宋乐乐
汤辉
黄永明
张铖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Southeast University
State Grid Zhejiang Electric Power Co Ltd
Nari Information and Communication Technology Co
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
State Grid Electric Power Research Institute
Original Assignee
State Grid Corp of China SGCC
Southeast University
State Grid Zhejiang Electric Power Co Ltd
Nari Information and Communication Technology Co
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
State Grid Electric Power Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Southeast University, State Grid Zhejiang Electric Power Co Ltd, Nari Information and Communication Technology Co, Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd, State Grid Electric Power Research Institute filed Critical State Grid Corp of China SGCC
Priority to CN202110752672.8A priority Critical patent/CN113486781B/en
Publication of CN113486781A publication Critical patent/CN113486781A/en
Application granted granted Critical
Publication of CN113486781B publication Critical patent/CN113486781B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a power inspection method based on a deep learning model, which comprises the following steps: step S1: performing feature extraction on the power inspection scene image to be detected by using the feature extraction module backbone of the detection model to obtain feature maps; step S2: fusing the feature maps by using the cross-stage fusion module neck of the detection model; step S3: detecting the fused feature maps by using the multi-scale detection head module head of the detection model to obtain the bounding boxes and category results of the target objects in the power inspection image. The feature extraction module backbone uses an internal and external cascading scheme; and/or the cross-stage fusion module neck uses a grouping fusion scheme in which the number of feature map groups is the same as the number of feature map input channels. The application provides a power inspection method based on a deep learning model that improves detection performance while further reducing network size.

Description

Electric power inspection method and device based on deep learning model
Technical Field
The application belongs to the technical field of power inspection equipment identification and anomaly detection, and particularly relates to a power inspection method based on a deep learning model, and a power inspection device based on the deep learning model.
Background
Power inspection aims at detecting the state of power transmission equipment and finding potential safety hazards in time. Compared with traditional manual inspection, automatic inspection offers higher detection efficiency and lower labor cost, so it has gradually become the main means of power inspection, and unmanned aerial vehicle (UAV) inspection in particular is widely applied. UAV inspection needs to perform equipment identification and anomaly detection tasks, where the equipment identification objects include insulators, power transmission lines, vibration dampers, spacers and the like, and common anomalies include bird nests, insulator breakage, icing of power transmission lines and the like.
In power scenarios, automatic inspection is usually based on target detection models built on deep learning. Some current deep-learning-based target detection methods can effectively improve detection performance, but most of them can only detect one kind of equipment or anomaly. Furthermore, these methods are computationally expensive and difficult to deploy on lightweight platforms. To achieve light weight, some small target detection models have also appeared in the field of power inspection, but their detection performance still has considerable room for improvement.
For this reason, there is an urgent need for a small target detection network that can ensure both light weight and detection performance.
Disclosure of Invention
The application aims to overcome the defects in the prior art and provides a power inspection method based on a deep learning model, which addresses the imbalance between detection performance and model size.
In order to solve the technical problems, the application provides the following technical scheme.
In a first aspect, the application provides a power inspection method based on a deep learning model, which comprises the following steps:
performing feature extraction on the power inspection scene image to be detected by using the feature extraction module backbone of the target detection model to obtain feature maps;
fusing the feature maps by using the cross-stage fusion module neck of the target detection model;
detecting the fused feature maps by using the multi-scale detection head module head of the target detection model to obtain the inspection result of the power inspection scene image to be detected;
the feature extraction module backbone uses an internal and external cascading scheme; and/or the cross-stage fusion module neck uses a grouping fusion scheme in which the number of feature map groups is the same as the number of feature map input channels.
Optionally, the feature extraction module backbone comprises a slicing module Focus and 4 cascaded feature processing modules F-Module;
the slicing module Focus performs downsampling on the power inspection scene image to be detected; the feature processing modules F-Module perform feature extraction on the downsampled image to obtain feature maps;
each feature processing module F-Module comprises a convolution kernel and at least one cascaded ShuffleCSP module, and an SPP module is embedded between the convolution kernel of the last feature processing module F-Module and its ShuffleCSP module;
the ShuffleCSP module comprises a residual branch and a main branch, wherein the residual branch combines the original input information with the deep semantic information extracted by the main branch;
and a plurality of ShuffleBlock modules are internally cascaded in the main branch of the ShuffleCSP module.
Optionally, the numbers of externally cascaded ShuffleCSP modules in the 4 feature processing modules F-Module are 1, 2, 1, and the number of internally cascaded ShuffleBlock modules is 2.
Optionally, the convolution kernel in each feature processing module F-Module is a 3×3 convolution kernel with a stride of 2.
Optionally, the last two feature processing modules F-Module use depth separable convolution.
Optionally, the cross-stage fusion module comprises a top-down branch and a bottom-up branch:
in the bottom-up branch, the feature map output by the feature extraction module backbone is taken as input; a feature map with the same scale and channel number as the output of the 3rd feature processing module F-Module is obtained through convolution and upsampling operations, concatenated with the feature map output by the 3rd feature processing module F-Module, and the concatenated feature map is input into the ShuffleCSP5 module for feature extraction to obtain a feature map; the extracted feature map is then passed through convolution and upsampling operations to obtain a feature map with the same scale and channel number as the output of the 2nd feature processing module F-Module, which is concatenated with the feature map output by the 2nd feature processing module F-Module, and the resulting output serves as the input of the top-down branch;
in the top-down branch, the input feature map is passed through the ShuffleCSP6 module to obtain a feature map with 1/2 the channel number, and downsampled by depth separable convolution to obtain a feature map with half the size; this output feature map is fused by GroupFuse with the feature maps output by the 3rd feature processing module F-Module and the ShuffleCSP5 module, and the fused feature map is input into the ShuffleCSP7 module for feature extraction to obtain a feature map with 1/2 the channel number of the fused feature map, which is downsampled by depth separable convolution to obtain an output feature map with half the size; this output feature map is fused with the feature map output by the ShuffleCSP4 module and the feature map obtained by the first upsampling in the bottom-up branch, and the fused feature map is input into the ShuffleCSP8 module for feature extraction to obtain the final output feature map;
the structure of GroupFuse comprises three parts: input, grouping fusion and output; wherein:
the input is the feature maps of the same scale extracted from the feature extraction module backbone and the cross-stage fusion module neck;
grouping fusion groups the channels of each feature map, multiplies the channel features in each group by the corresponding weight, and concatenates the weighted group features to complete the fusion;
the output converts the concatenated multi-channel feature dimension to a specified output dimension using a 1×1 convolution.
The number of feature map groups in GroupFuse is the same as the number of feature map input channels.
Optionally, the multi-scale detection head module head comprises 3 detection heads of different scales, head1, head2 and head3;
3 feature maps of different scales are extracted from the ShuffleCSP6, ShuffleCSP7 and ShuffleCSP8 modules of the cross-stage fusion module neck and used as the inputs of the 3 detection heads respectively;
in each detection head the channel number is changed by convolution, then 3 anchor boxes (anchors) of different sizes are used to perform bounding box localization and category detection on the three feature maps of different scales after the channel number change, and box coordinates and category results are output; finally, the boxes and categories of all scales are screened by non-maximum suppression to obtain the final inspection result.
Optionally, before training the target detection model, the method further includes:
performing data enhancement processing on the training set to expand the training set.
Optionally, the data enhancement processing includes any one or more of: random flipping, random rotation, random scaling, random occlusion, and a generative adversarial network.
In a second aspect, the application further provides a power inspection device based on a deep learning model, comprising:
a feature map extraction module, configured to perform feature extraction on the power inspection scene image to be detected by using the feature extraction module backbone of the target detection model to obtain feature maps;
a feature map fusion module, configured to fuse the feature maps by using the cross-stage fusion module neck of the target detection model;
a feature map detection module, configured to detect the fused feature maps by using the multi-scale detection head module head of the target detection model to obtain the inspection result of the power inspection scene image to be detected;
wherein the feature extraction module backbone uses an internal and external cascading scheme; and/or the cross-stage fusion module neck uses a grouping fusion scheme in which the number of feature map groups is the same as the number of feature map input channels.
Compared with the prior art, the application has the following beneficial effects:
1) The application realizes a small target detection model which adopts an internal and external cascading scheme and/or a grouping fusion mode in the cross-stage feature fusion module, thereby achieving a good trade-off between model weight and detection performance. The final network has only 2.97M parameters, and its mAP can reach up to 41.1% on the power inspection validation data set; compared with previous small target detection models, the network is therefore better suited to mobile terminals and embedded devices and has higher engineering application value.
2) The application determines, through experiments, the internal and external cascading scheme of the feature extraction module and the grouping mode of the cross-stage feature fusion module: the numbers of externally cascaded ShuffleCSP modules are 1, 2, 1, the number of internally cascaded ShuffleBlock modules is 2, and the number of groups in grouping fusion is set to the number of feature map input channels. This achieves higher detection accuracy on the basis of a smaller number of parameters, and provides a line of thought for the design of small target detection models in power inspection engineering.
Drawings
FIG. 1 is a flow chart of a power inspection method based on a deep learning model provided by an embodiment of the application;
FIG. 2 is a schematic diagram of an object detection model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a packet fusion process provided by an embodiment of the present application;
FIG. 4 is a packet number experimental line graph of packet fusion provided by an embodiment of the present application;
FIG. 5 is a flow chart of a training process for a target detection model provided by an embodiment of the present application.
Detailed Description
The application is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and are not intended to limit the scope of the present application.
Example 1
Aiming at the problem of applying small target detection models in the field of power inspection, the application provides a power inspection method based on a deep learning model, which improves detection performance while further reducing network size.
The application discloses a power inspection method based on a deep learning model, the flow of which is shown in FIG. 1, comprising the following steps:
acquiring an electric power inspection scene image to be detected;
performing feature extraction on the power inspection scene image to be detected by using the feature extraction module backbone of the target detection model to obtain feature maps;
fusing the feature maps by using the cross-stage fusion module neck of the target detection model;
detecting the fused feature maps by using the multi-scale detection head module head of the target detection model to obtain the inspection result of the power inspection scene image to be detected;
the feature extraction module backbone uses an internal and external cascading scheme; and/or the cross-stage fusion module neck uses a grouping fusion scheme in which the number of feature map groups is the same as the number of feature map input channels.
For the final inspection result, if the inspection purpose is equipment identification, the inspection result comprises the target object bounding boxes and equipment categories; if the purpose is anomaly detection, the inspection result comprises the abnormal region boxes and anomaly categories.
The application realizes a small target detection model which adopts an internal and external cascading scheme and/or a grouping fusion mode in the cross-stage feature fusion module, thereby achieving a good trade-off between model weight and detection performance. Compared with previous small target detection models, the method is better suited to mobile terminals and embedded devices and has higher engineering application value.
Taking equipment identification as an example, the specific training process of the target detection model for equipment identification is shown in FIG. 5 and includes the following steps:
Step S1: collecting a data set and dividing it into a training set and a validation set.
Power inspection scene images are collected as the data set, and the target object bounding boxes and power transmission equipment categories of the data set are labeled.
The target objects include power transmission equipment, and the power transmission equipment categories include connectors, insulators, towers, dampers, lightning arresters, clamps, spacers and the like. The data set is divided into a training set and a validation set at a 9:1 ratio.
Step S2: performing data enhancement on the training set.
Before training, data enhancement processing is performed on the images of the training set to increase the number of images processed in each batch, and the input resolution is 640×640. The purpose of data enhancement is to increase the scene diversity of the power inspection images, thereby expanding the entire data set. The data enhancement methods used in the application include a combination of one or more of the following: random flipping, random rotation, random scaling, Cutout (random occlusion), StyleGAN (a generative adversarial network) and the like. Random flipping, random rotation and random scaling are basic enhancement methods. Cutout randomly masks out part of the information in an image with an all-black rectangular block, so that the target detection model focuses more on the global information of the image rather than local information, improving the model's ability to recognize targets. StyleGAN is a generative adversarial network method for image generation; through precise adjustment and control of the high-level attributes of objects and a coarse-to-fine training process, it produces high-quality images that simulate real power inspection scenes.
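By way of illustration, the following is a minimal PyTorch sketch of the Cutout-style random occlusion described above; the patch size, probability and function name are illustrative assumptions rather than values specified in the application.

```python
import torch

def random_cutout(image: torch.Tensor, patch_size: int = 64, p: float = 0.5) -> torch.Tensor:
    """Randomly erase one rectangular patch of a CHW image tensor with zeros (all black)."""
    if torch.rand(1).item() > p:
        return image
    _, h, w = image.shape
    # Pick the patch centre uniformly at random.
    cy = torch.randint(0, h, (1,)).item()
    cx = torch.randint(0, w, (1,)).item()
    y1, y2 = max(0, cy - patch_size // 2), min(h, cy + patch_size // 2)
    x1, x2 = max(0, cx - patch_size // 2), min(w, cx + patch_size // 2)
    out = image.clone()
    out[:, y1:y2, x1:x2] = 0.0  # all-black rectangular block
    return out
```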
Step S3: and training the target detection model by using a training set.
The target detection model integrally comprises a feature extraction module back bone, a cross-stage fusion module neg and a multi-scale detection head; the training set is used to train the target detection model.
Firstly, extracting features of an image in a training set through a feature extraction module backstbone to obtain a feature image, further fusing the extracted feature image through a cross-stage fusion module neg, and finally, sending the fused feature image into a multi-scale detection head for detection to obtain a frame and a category result of a target object in the image.
The overall structure is shown in FIG. 2. The respective modules are described in detail below.
1) Feature extraction module backbone
The feature extraction module backbone extracts features from the input training set to obtain feature maps, and uses an internal and external cascading scheme to achieve a lightweight model and improved detection performance.
As shown in FIG. 2, the feature extraction module backbone is composed of a slicing module Focus and 4 feature processing modules F-Module. The input image first enters the Focus module for downsampling: the output scale is halved while the channel number becomes 4 times that of the original image. The resulting feature map then passes through the cascaded F-Module1 to F-Module4 in turn for feature extraction; the output scale and channel number are respectively 1/16 and 16 times those of the input feature map.
The slicing module Focus performs downsampling on the input training set image, i.e. the size of the image is halved while the channel number becomes 4 times that of the original image.
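By way of illustration, a minimal sketch of a Focus-style slicing module is given below, following the common YOLOv5-style implementation: the image is sliced into 4 interleaved sub-images and concatenated along the channel dimension, which halves the spatial size and quadruples the channel count as described above. The channel sizes, activation and class name are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice a (B, C, H, W) image into 4 sub-images and concatenate them along channels,
    giving (B, 4C, H/2, W/2), then apply a convolution."""
    def __init__(self, in_channels: int = 3, out_channels: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4 * in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sliced = torch.cat(
            [x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]],
            dim=1,
        )
        return self.conv(sliced)
```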
The first two feature processing modules F-Module1 and F-Module2 are composed of a 3×3 standard convolution with stride 2 and cascaded ShuffleCSP modules. ShuffleCSP is a lightweight feature extraction module designed in the application; each ShuffleCSP module comprises two branches, a residual branch and a main branch, where the residual branch combines the original input information with the deep semantic information extracted by the main branch, reducing the probability of vanishing gradients in backpropagation and the computational cost. ShuffleBlock modules are stacked in the main branch, and the scale and channel number of the output feature map of the ShuffleCSP module are identical to those of the input feature map. In order to reduce the number of parameters, the 3rd feature processing module F-Module3 is composed of a 3×3 depth separable convolution with stride 2 and a cascaded ShuffleCSP3 module, and the last feature processing module F-Module4 is composed of a 3×3 depth separable convolution with stride 2, an SPP module and a cascaded ShuffleCSP4 module.
The F-Modules extract features in a lightweight and effective manner; the numbers of cascaded ShuffleCSP blocks in the 4 F-Modules are N1, N2, N3 and N4 respectively, and these block numbers are controllable.
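The application describes ShuffleCSP only at the block-diagram level, so the following PyTorch sketch is one possible interpretation rather than the actual implementation: the main branch stacks n ShuffleNet-style ShuffleBlock units, the residual branch carries the input forward, and the two are merged so that the output keeps the input scale and channel number. The layer choices (activations, the exact merge convolution) are assumptions.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    """Interleave channel groups so information mixes across the two halves."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class ShuffleBlock(nn.Module):
    """Simplified ShuffleNet-style unit: half the channels pass through unchanged, the
    other half go through 1x1 -> 3x3 depthwise -> 1x1 convolutions, then the halves
    are concatenated and channel-shuffled."""
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.SiLU(),
            nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False), nn.BatchNorm2d(half),
            nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = x.chunk(2, dim=1)
        return channel_shuffle(torch.cat([a, self.branch(b)], dim=1))

class ShuffleCSP(nn.Module):
    """Residual branch carries the input forward; the main branch stacks n ShuffleBlocks;
    the two are merged so that the output shape equals the input shape."""
    def __init__(self, channels: int, n: int = 2):
        super().__init__()
        self.main = nn.Sequential(*[ShuffleBlock(channels) for _ in range(n)])
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1, bias=False), nn.BatchNorm2d(channels), nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([x, self.main(x)], dim=1))
```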
The multi-scale fusion module SPP is embedded in F-Module4; it realizes multi-scale fusion of the feature map through pooling at different scales followed by a concatenation operation, making the extracted feature information richer.
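A minimal sketch of an SPP module in this style is shown below; the pooling kernel sizes (5, 9, 13) follow common YOLO-family practice and are an assumption, since the application does not specify them.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: max-pool the same feature map at several scales,
    concatenate the results with the original, then restore the channel count."""
    def __init__(self, channels: int, pool_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes
        )
        self.conv = nn.Conv2d(channels * (len(pool_sizes) + 1), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(torch.cat([x] + [p(x) for p in self.pools], dim=1))
```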
A characteristic of the application is that in the depth separable convolution operation of the 3rd and 4th feature processing modules F-Module3 and F-Module4, each convolution kernel is responsible for one channel and each channel is convolved by only one convolution kernel, so the number of channels of the generated feature map is the same as the number of channels of the input image. In conventional convolution, each convolution kernel corresponds to all channels, whereas the depth separable convolution learns each channel separately, which improves the quality of the extracted features.
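For clarity, a minimal sketch of such a depth separable convolution is given below: a 3×3 depthwise convolution with groups equal to the channel count (one kernel per channel) followed by a 1×1 pointwise convolution. The stride-2 default mirrors the downsampling use in F-Module3 and F-Module4, while the normalization and activation choices are assumptions.

```python
import torch.nn as nn

class DepthSeparableConv(nn.Module):
    """3x3 depthwise convolution (one kernel per channel, groups=in_channels)
    followed by a 1x1 pointwise convolution that mixes channels."""
    def __init__(self, in_channels: int, out_channels: int, stride: int = 2):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                                   padding=1, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```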
The feature extraction module backbone comprises 4 feature processing modules F-Module1 to F-Module4, each with built-in ShuffleCSP modules; the numbers of cascaded ShuffleCSP blocks in the different F-Modules are N1, N2, N3 and N4 respectively. The initial block numbers are set to (1,3,3,1); by carrying out multiple experiments on the block number settings, N1 to N4 are finally determined as (1, 2, 1) to ensure optimal performance. Each ShuffleCSP module includes two branches, a CSP branch and a Main branch, where the CSP branch combines the original input information with the deep semantic information extracted by the Main branch, thereby reducing the probability of vanishing gradients in backpropagation and the computational cost. n ShuffleBlock modules, i.e. the basic units of the prior-art shuffle network ShuffleNet, are stacked in the Main branch, and the number of cascaded blocks n is adjustable; experiments show that when n = 2 the network obtains the highest performance while remaining lightweight.
2) Cross-stage fusion module neck
The cross-stage fusion module neck fuses the feature maps and uses a grouping fusion scheme in which the number of feature map groups is the same as the number of feature map input channels, thereby reducing the model size and improving detection performance.
As shown in FIG. 2, the cross-stage fusion module neck comprises a bottom-up branch and a top-down branch. The first branch takes the output of the feature extraction module backbone as input; the channel number and scale are changed through a 1×1 convolution and an upsampling operation to obtain a feature map with the same scale and channel number as the output of F-Module3, the two feature maps are concatenated, and the result is input into a ShuffleCSP module (ShuffleCSP5 in the figure) for feature extraction; the channel number and scale are then changed again by a 1×1 convolution and an upsampling operation to obtain a feature map with the same scale and channel number as the output of F-Module2, the two feature maps are concatenated, and the obtained output serves as the input of the top-down branch. In the top-down branch, feature extraction is first performed by a ShuffleCSP module (ShuffleCSP6) to obtain a feature map with 1/2 the channel number, which is downsampled by a 3×3 depth separable convolution to obtain an output feature map with half the size; this output is fused with the output feature maps of F-Module3 and ShuffleCSP5 by GroupFuse, and the fused feature map is input into a ShuffleCSP module (ShuffleCSP7) for feature extraction to obtain a feature map with 1/2 the channel number of the fused feature map, which is downsampled by a 3×3 depth separable convolution to obtain an output feature map with half the size; this output, together with the output of ShuffleCSP4 and the feature map obtained from the first upsampling in the bottom-up branch, is sent into GroupFuse for fusion, and the fused feature map is input into ShuffleCSP8 for the final feature extraction.
As shown in FIG. 3, the structure of GroupFuse is divided into three parts: input, grouping fusion and output. The input refers to feature maps of the same scale extracted from different stages; in the application, three feature maps of the same scale from the feature extraction module backbone and the cross-stage fusion module neck are selected for fusion. Grouping fusion divides the channels of each feature map into groups according to the set group number, learns a weight for each group through model training, multiplies the channel features within each group by the corresponding weight, and finally concatenates the weighted group features of all stages to complete the fusion. The output converts the concatenated multi-channel feature dimension into the specified output dimension using a 1×1 convolution, and the feature map after the 1×1 convolution is output and participates in subsequent network layer operations.
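A minimal PyTorch sketch of this grouping fusion is given below as one possible interpretation: each same-scale input is split into channel groups, each group is scaled by its own learned weight, the weighted groups of all stages are concatenated, and a 1×1 convolution maps the result to the specified output dimension. By default the number of groups equals the number of input channels, matching the setting adopted in the application; the exact weighting form and class interface are assumptions.

```python
import torch
import torch.nn as nn

class GroupFuse(nn.Module):
    """Grouping fusion: each same-scale input feature map is split into channel groups,
    every group is scaled by its own learned weight, the weighted groups of all inputs
    are concatenated, and a 1x1 convolution maps the result to the output dimension."""
    def __init__(self, in_channels_list, out_channels: int, groups: int = None):
        super().__init__()
        # Default: number of groups equals the number of input channels (one weight per channel).
        self.groups = [groups or c for c in in_channels_list]
        self.weights = nn.ParameterList([nn.Parameter(torch.ones(g)) for g in self.groups])
        self.out_conv = nn.Conv2d(sum(in_channels_list), out_channels, kernel_size=1)

    def forward(self, feature_maps):
        weighted = []
        for x, g, w in zip(feature_maps, self.groups, self.weights):
            chunks = x.chunk(g, dim=1)                       # split channels into g groups
            weighted.append(torch.cat([w[i] * c for i, c in enumerate(chunks)], dim=1))
        return self.out_conv(torch.cat(weighted, dim=1))     # concatenate stages and project
```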
3) Multi-scale detection head
The multi-scale detection head module head detects the fused feature maps to obtain the bounding boxes and category results of the target objects in the power inspection image.
As shown in FIG. 2, the multi-scale detection head includes head1, head2 and head3. First, 3 feature maps of different scales are extracted from the ShuffleCSP6, ShuffleCSP7 and ShuffleCSP8 modules of the cross-stage fusion module neck as the inputs of the 3 detection heads, and the channel number of each is changed by a 1×1 convolution. Then, 3 anchor boxes (anchors) of different sizes are used to perform target object bounding box localization and category detection on the three feature maps of different scales after the channel number change, and box coordinates and category results are output. Finally, the boxes of all scales are screened by the non-maximum suppression technique (NMS) to obtain the final inspection result (including the target region box coordinates and the target object category results).
It should be noted that for anomaly detection, the multi-scale detection head uses the 3 anchor boxes of different sizes to perform abnormal region box localization and anomaly category detection on the three feature maps of different scales after the channel number change, outputs box coordinates and category results, and finally screens the boxes of all scales by NMS to obtain the final inspection result (including the abnormal region box coordinates and anomaly category results).
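The following sketch illustrates a single-scale head and the NMS screening step using torchvision's nms operator; the channel layout (num_anchors × (4 box coordinates + 1 objectness + num_classes)) and the score/IoU thresholds follow common YOLO-style practice and are assumptions, not values given in the application.

```python
import torch
import torch.nn as nn
from torchvision.ops import nms

class DetectHead(nn.Module):
    """Single-scale detection head: a 1x1 convolution maps the fused feature map to
    num_anchors * (4 box coords + 1 objectness + num_classes) channels per cell."""
    def __init__(self, in_channels: int, num_classes: int, num_anchors: int = 3):
        super().__init__()
        self.pred = nn.Conv2d(in_channels, num_anchors * (5 + num_classes), kernel_size=1)

    def forward(self, x):
        return self.pred(x)

def screen_boxes(boxes: torch.Tensor, scores: torch.Tensor,
                 iou_thr: float = 0.45, score_thr: float = 0.25) -> torch.Tensor:
    """Keep boxes above the score threshold, then suppress overlaps with NMS.
    `boxes` is (N, 4) in (x1, y1, x2, y2) format, `scores` is (N,)."""
    keep = scores > score_thr
    boxes, scores = boxes[keep], scores[keep]
    return boxes[nms(boxes, scores, iou_thr)]
```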
As shown in FIG. 1, the training set is fed into the target detection model in batches for training; the error between the output of the target detection model and the pre-labeled target detection boxes and power equipment categories is computed; if the result has not converged, the parameters of the target detection model are updated and training is repeated on the training set until the training result converges, giving the final target detection model.
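A schematic of this batch training loop is sketched below; the model, loss function and data loader stand for the components defined elsewhere in this embodiment, and the optimizer settings are assumptions for illustration.

```python
import torch

def train(model, loss_fn, train_loader, epochs: int = 100, lr: float = 0.01, device: str = "cuda"):
    """Feed the training set in batches, compute the error against the labelled boxes
    and categories, and update the model parameters until training converges."""
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):
        total = 0.0
        for images, targets in train_loader:          # targets: labelled boxes and classes
            preds = model(images.to(device))
            loss = loss_fn(preds, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch}: loss {total / max(1, len(train_loader)):.4f}")
```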
Step S4: verifying the target detection model by using the validation set.
Verifying the target detection model refers to performing lightweight improvement experiments on the feature extraction and grouping fusion parts of the target detection model so as to determine the optimal model scheme and improve practicability. This specifically comprises: changing the number of GroupFuse groups in the cross-stage fusion module neck, the numbers of cascaded ShuffleCSP and ShuffleBlock blocks in the feature extraction module backbone, and the convolution method used in the network, and carrying out comparison experiments on the validation set.
The evaluation indexes of the comparison experiments are the mean average precision (mAP), the average precision at an IoU threshold of 0.5 (AP50), the number of parameters (Params) and the number of floating-point operations (FLOPs). mAP and AP50 measure the detection accuracy of the model: AP is the mean precision over the PR curve, mAP is the mean of the APs obtained at IoU thresholds from 0.5 to 0.95 with a step of 0.05, and AP50 is the AP at an IoU threshold of 0.5. The parameter count and FLOPs measure the lightweight performance of the model.
Experiment 1: the feature extraction module backbone uses the initially set (1,3,3,1) combination of external ShuffleCSP blocks, with the number of internal cascaded ShuffleBlock blocks set to 1; the cross-stage fusion module neck uses GroupFuse for multi-stage feature fusion, and experiments are carried out with different group numbers. The experimental results are shown in Table 1.
TABLE 1 GroupFuse group number experiment
Group  mAP(%)  AP50(%)  Params(M)  FLOPs(G)
1 40.1 75.2 6.2 13.6
2 39.5 77.2 6.2 13.6
4 39.9 76.7 6.2 13.6
8 40.0 76.9 6.2 13.6
16 40.2 75.1 6.2 13.6
32 40.4 75.8 6.2 13.6
64 40.5 75.6 6.2 13.6
128(inp) 40.6 76.9 6.3 13.6
As shown in FIG. 4, it is found that when the feature maps of the same stage are divided into several groups and weighted fusion is performed, mAP increases with the number of groups, but the performance gain from increasing the number of groups becomes smaller and smaller and finally tends to saturate. Moreover, the increases in parameter count and computation caused by increasing the number of groups are small and negligible. The experiments show that when the number of groups equals the number of input feature channels, mAP reaches its highest value of 40.6%, and compared with the other group settings the parameter count increases by only 0.1M; therefore the number of groups is set to the number of input feature channels (inp).
Experiment 2: in order to obtain the optimal convolution setting, comparison experiments are carried out using conventional convolution and depth separable convolution in the feature extraction module backbone and the cross-stage fusion module neck. In the experiments, (b1, b2, n) denotes which convolution is used in the feature extraction module backbone and the cross-stage fusion module neck, where b1, b2 and n represent the convolution settings of F-Module1 to F-Module2, F-Module3 to F-Module4 and the cross-stage fusion module neck respectively; each takes the value 0 or 1, with 0 denoting conventional convolution and 1 denoting depth separable convolution. The experimental results are shown in Table 2.
TABLE 2 convolution method experiment 1
(b1,b2,n)  mAP(%)  AP50(%)  Params(M)  FLOPs(G)
(1,1,0) 38.5 73.7 3.9 9.1
(0,0,1) 39.2 76.4 4.6 11.7
(1,1,1) 36.1 71.9 3.0 8.0
The experiments show that replacing the standard convolution in the cross-stage fusion module neck with depth separable convolution gives the best result, with an mAP of 39.2%, exceeding the case where depth separable convolution is used in the feature extraction module backbone; using depth separable convolution in both the feature extraction module backbone and the cross-stage fusion module neck performs worst. A likely reason is that the feature extraction module backbone is mainly responsible for initial feature extraction, and a depth separable convolution with a small number of parameters lowers the quality of the initial features, thereby reducing detection performance. Based on the above experiments, b1 in the feature extraction module backbone is kept at 0, i.e. conventional convolution is used to guarantee the quality of feature extraction, while b2 and n are varied; the experimental results are shown in Table 3.
TABLE 3 convolution method experiment 2
(b1,b2,n)  mAP(%)  AP50(%)  Params(M)  FLOPs(G)
(0,1,0) 40.5 77.5 4.0 11.0
(0,1,1) 39.9 77.0 3.1 9.8
Analysis of the experimental results shows that, to achieve a good trade-off between parameter count and detection performance, depth separable convolution should be used in the last two F-Modules of the feature extraction module backbone and in the cross-stage fusion module neck.
Experiment 3: N1 to N4 ShuffleCSP modules are cascaded in the feature processing modules F-Module of the feature extraction module backbone respectively, and n ShuffleBlock modules are cascaded inside each ShuffleCSP. In order to obtain the optimal cascading scheme, multiple experiments with different cascade block numbers are carried out on the basis of the optimal convolution scheme determined in Experiment 2; the experimental results are shown in Table 4.
Table 4 cascade protocol experiment
The experiments show that with (N1, N2, N3, N4) fixed, detection performance decreases as n increases; detection performance is best when N1 to N4 are set to (1, 2, 1) and the number of internal cascaded ShuffleBlock blocks n is set to 2.
The optimal parameter values of each module in the target detection model are obtained from the above experimental results, yielding the optimized target detection model. Accordingly, as shown in FIG. 5, the optimized target detection model is output in step S5.
The target detection model is then used to detect the power inspection scene image to be detected and obtain its inspection result. For the final inspection result, if the inspection purpose is equipment identification, the inspection result comprises the target object bounding boxes and equipment categories; if the purpose is anomaly detection, the inspection result comprises the abnormal region boxes and anomaly categories.
Example 2
Based on the same inventive concept as the method of Example 1, the power inspection device based on a deep learning model of the present application comprises:
a feature map extraction module, used for performing feature extraction on the power inspection scene image to be detected by using the feature extraction module backbone of the target detection model to obtain feature maps;
a feature map fusion module, used for fusing the feature maps by using the cross-stage fusion module neck of the target detection model;
a feature map detection module, used for detecting the fused feature maps by using the multi-scale detection head module head of the target detection model to obtain the inspection result of the power inspection scene image to be detected;
the feature extraction module backbone uses an internal and external cascading scheme; and/or the cross-stage fusion module neck uses a grouping fusion scheme in which the number of feature map groups is the same as the number of feature map input channels.
For the specific implementation of the modules in the device of the application, reference is made to the processing of the corresponding steps in the method of Example 1.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present application, and it should be noted that it will be apparent to those skilled in the art that modifications and variations can be made without departing from the technical principles of the present application, and these modifications and variations should also be regarded as the scope of the application.

Claims (9)

1. A power inspection method based on a deep learning model, characterized by comprising the following steps:
performing feature extraction on the power inspection scene image to be detected by using the feature extraction module backbone of the target detection model to obtain feature maps;
fusing the feature maps by using the cross-stage fusion module neck of the target detection model;
detecting the fused feature maps by using the multi-scale detection head module head of the target detection model to obtain the inspection result of the power inspection scene image to be detected;
the feature extraction module backbone uses an internal and external cascading scheme; and/or the cross-stage fusion module neck uses a grouping fusion scheme in which the number of feature map groups is the same as the number of feature map input channels;
the feature extraction module backbone comprises a slicing module Focus and 4 cascaded feature processing modules F-Module;
the slicing module Focus performs downsampling on the power inspection scene image to be detected; the feature processing modules F-Module perform feature extraction on the downsampled image to obtain feature maps;
each feature processing module F-Module comprises a convolution kernel and at least one cascaded ShuffleCSP module, and an SPP module is embedded between the convolution kernel of the last feature processing module F-Module and its ShuffleCSP module;
the ShuffleCSP module comprises a residual branch and a main branch, wherein the residual branch combines the original input information with the deep semantic information extracted by the main branch;
and a plurality of ShuffleBlock modules are internally cascaded in the main branch of the ShuffleCSP module.
2. The power inspection method based on a deep learning model as claimed in claim 1, wherein the numbers of externally cascaded ShuffleCSP modules in the 4 feature processing modules F-Module are 1, 2, 1, and the number of internally cascaded ShuffleBlock modules is 2.
3. The power inspection method based on a deep learning model as claimed in claim 1, wherein the convolution kernel in each feature processing module F-Module is a 3×3 convolution kernel with a stride of 2.
4. The power inspection method based on a deep learning model as claimed in claim 1, wherein the last two feature processing modules F-Module use depth separable convolution.
5. The power inspection method based on a deep learning model as claimed in claim 1, wherein the cross-stage fusion module comprises a top-down branch and a bottom-up branch:
in the bottom-up branch, the feature map output by the feature extraction module backbone is taken as input; a feature map with the same scale and channel number as the output of the 3rd feature processing module F-Module is obtained through convolution and upsampling operations, concatenated with the feature map output by the 3rd feature processing module F-Module, and the concatenated feature map is input into the ShuffleCSP5 module for feature extraction to obtain a feature map; the extracted feature map is then passed through convolution and upsampling operations to obtain a feature map with the same scale and channel number as the output of the 2nd feature processing module F-Module, which is concatenated with the feature map output by the 2nd feature processing module F-Module, and the resulting output serves as the input of the top-down branch;
in the top-down branch, the input feature map is passed through the ShuffleCSP6 module to obtain a feature map with 1/2 the channel number, and downsampled by depth separable convolution to obtain a feature map with half the size; this output feature map is fused by GroupFuse with the feature maps output by the 3rd feature processing module F-Module and the ShuffleCSP5 module, and the fused feature map is input into the ShuffleCSP7 module for feature extraction to obtain a feature map with 1/2 the channel number of the fused feature map, which is downsampled by depth separable convolution to obtain an output feature map with half the size; this output feature map is fused with the feature map output by the ShuffleCSP4 module and the feature map obtained by the first upsampling in the bottom-up branch, and the fused feature map is input into the ShuffleCSP8 module for feature extraction to obtain the final output feature map;
the structure of GroupFuse comprises three parts: input, grouping fusion and output; wherein:
the input is the feature maps of the same scale extracted from the feature extraction module backbone and the cross-stage fusion module neck;
grouping fusion groups the channels of each feature map, multiplies the channel features in each group by the corresponding weight, and concatenates the weighted group features to complete the fusion;
the output converts the concatenated multi-channel feature dimension into a specified output dimension using a 1×1 convolution;
the number of feature map groups in GroupFuse is the same as the number of feature map input channels.
6. The power inspection method based on a deep learning model as claimed in claim 5, wherein the multi-scale detection head comprises 3 detection heads of different scales, head1, head2 and head3;
3 feature maps of different scales are extracted from the ShuffleCSP6, ShuffleCSP7 and ShuffleCSP8 modules of the cross-stage fusion module neck and used as the inputs of the 3 detection heads respectively;
in each detection head the channel number is changed by convolution, then 3 anchor boxes (anchors) of different sizes are used to perform bounding box localization and category detection on the three feature maps of different scales after the channel number change, and box coordinates and category results are output; finally, the boxes and categories of all scales are screened by non-maximum suppression to obtain the final inspection result.
7. The power inspection method based on a deep learning model as claimed in claim 1, further comprising, before training the target detection model:
performing data enhancement processing on the training set to expand the training set.
8. The power inspection method based on a deep learning model as claimed in claim 7, wherein the data enhancement processing comprises any one or more of: random flipping, random rotation, random scaling, random occlusion, and a generative adversarial network.
9. A power inspection device based on a deep learning model, characterized by comprising:
a feature map extraction module, used for performing feature extraction on the power inspection scene image to be detected by using the feature extraction module backbone of the target detection model to obtain feature maps;
a feature map fusion module, used for fusing the feature maps by using the cross-stage fusion module neck of the target detection model;
a feature map detection module, used for detecting the fused feature maps by using the multi-scale detection head module head of the target detection model to obtain the inspection result of the power inspection scene image to be detected;
the feature extraction module backbone uses an internal and external cascading scheme; and/or the cross-stage fusion module neck uses a grouping fusion scheme in which the number of feature map groups is the same as the number of feature map input channels;
the feature extraction module backbone comprises a slicing module Focus and 4 cascaded feature processing modules F-Module;
the slicing module Focus performs downsampling on the power inspection scene image to be detected; the feature processing modules F-Module perform feature extraction on the downsampled image to obtain feature maps;
each feature processing module F-Module comprises a convolution kernel and at least one cascaded ShuffleCSP module, and an SPP module is embedded between the convolution kernel of the last feature processing module F-Module and its ShuffleCSP module;
the ShuffleCSP module comprises a residual branch and a main branch, wherein the residual branch combines the original input information with the deep semantic information extracted by the main branch;
and a plurality of ShuffleBlock modules are internally cascaded in the main branch of the ShuffleCSP module.
CN202110752672.8A 2021-07-02 2021-07-02 Electric power inspection method and device based on deep learning model Active CN113486781B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110752672.8A CN113486781B (en) 2021-07-02 2021-07-02 Electric power inspection method and device based on deep learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110752672.8A CN113486781B (en) 2021-07-02 2021-07-02 Electric power inspection method and device based on deep learning model

Publications (2)

Publication Number Publication Date
CN113486781A CN113486781A (en) 2021-10-08
CN113486781B true CN113486781B (en) 2023-10-24

Family

ID=77939738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110752672.8A Active CN113486781B (en) 2021-07-02 2021-07-02 Electric power inspection method and device based on deep learning model

Country Status (1)

Country Link
CN (1) CN113486781B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
WO2021088300A1 (en) * 2019-11-09 2021-05-14 北京工业大学 Rgb-d multi-mode fusion personnel detection method based on asymmetric double-stream network
CN112446388A (en) * 2020-12-05 2021-03-05 天津职业技术师范大学(中国职业培训指导教师进修中心) Multi-category vegetable seedling identification method and system based on lightweight two-stage detection model

Also Published As

Publication number Publication date
CN113486781A (en) 2021-10-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant