CN114359153A - Insulator defect detection method based on improved CenterNet

Info

Publication number: CN114359153A
Application number: CN202111483734.6A
Authority: CN
Inventors: 李利荣; 陈鹏; 张云良; 梅冰; 丁江; 张开; 孙鹏; 赵迪; 刘粤
Original assignee: Hubei University of Technology
Current assignee: Hubei University of Technology
Priority date: 2021-12-07
Filing date: 2021-12-07
Publication date: 2022-04-15
Anticipated expiration: 2041-12-07
Also published as: CN114359153B

Abstract

The invention belongs to the field of computer vision and in particular relates to an insulator defect detection method based on improved CenterNet. To let the detection model adapt to the difference in feature scale between insulators and their defects, the encoding stage extracts features with a feature-grouping and split-attention structure, and an atrous spatial pyramid pooling (ASPP) structure is added to capture multi-scale features of the insulators and their defects. To reduce the loss of feature information in the decoding network, features from different layers of the feature-grouping and split-attention structure are cascaded with an ECA attention module and added to the corresponding deconvolution features that have passed through an SA spatial attention module, forming a dual-attention feature fusion. The width and height of each object are then regressed from the center point of the object in the heatmap. The method learns features directly from data, requires no hand-designed features, and can accurately detect insulator defects in inspection images with complex backgrounds.

Description

Insulator defect detection method based on improved CenterNet

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to an insulator defect detection method based on improved CenterNet.

Background

In high-voltage transmission lines, insulators are key fittings that support the conductors and insulate them from the ground. Insulators are usually exposed to strong electric fields and harsh environments such as rain, snow, typhoons and acid rain. Under long-term exposure they can be damaged or even fall off, which threatens the safe operation of China's power grid. The total length of high-voltage transmission lines in service keeps growing, and manual inspection along the lines is extremely difficult. With the development of science and technology, inspecting transmission lines with unmanned aerial vehicles (UAVs) has become a research hotspot: replacing manual inspection with UAVs improves working efficiency and also greatly reduces the cost and risk of inspection.

Existing methods for detecting insulator defects on transmission lines can be roughly divided into physical methods, traditional vision-based methods and deep-learning-based methods. Physical methods, such as manually operated ultrasonic methods, are easily affected in practice by factors such as sunlight, climate and distance. Traditional vision-based detection algorithms, such as HOG + SVM, mainly use a sliding window to select regions of interest, extract features from each window and then classify the feature samples.

Deep learning is a machine learning method whose concept derives from artificial neural networks. In image recognition, a convolutional neural network first learns feature extraction automatically, and a classifier and a bounding-box regressor then produce the category and position of the target. Compared with conventional image processing techniques, deep learning does not require features to be selected by hand. Common deep-learning-based object detection algorithms are divided by the number of detection stages into two-stage and single-stage algorithms, and by whether fixed anchor boxes are set during detection into anchor-based and anchor-free algorithms. A two-stage algorithm works in two steps: region proposals are first generated by selective search or an RPN, and the targets in the candidate boxes are then classified. Representative algorithms include R-CNN, SPP-Net, Fast R-CNN and R-FCN. A single-stage algorithm regresses and predicts targets directly from fixed anchor boxes; examples include SSD and RetinaNet.

Generally, single-stage algorithms are faster than two-stage algorithms but less accurate. Anchor-free detection algorithms have recently become popular. They predict key points of the bounding box and regress the attributes of the target from those key points, which gives them certain advantages in both speed and accuracy; representative algorithms include CornerNet, CenterNet and FCOS. With the continuous improvement of computer performance, detection methods based on deep learning frameworks are widely applied. However, because of the complex backgrounds of insulators in UAV aerial images, the slender shape of insulators and the variety of defects, defects appear in the images in diverse and complex forms, and common deep learning algorithms identify insulators with low accuracy. The accuracy of existing insulator detection methods therefore still needs to be improved.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides an insulator defect detection method based on improved CenterNet, which can quickly and accurately identify the insulator and the defect position thereof from the power transmission line inspection image.

To solve the above technical problems, the invention adopts the following technical solution: an insulator defect detection method based on improved CenterNet, used for processing aerial insulator images or videos acquired by an unmanned aerial vehicle, comprising the following steps:

step 1, in the encoding stage, the ResNet residual block in the encoding structure is first replaced by a ResNeSt convolution block; an atrous spatial pyramid pooling (ASPP) structure is then added after the feature extraction network to form an encoding structure with multi-scale feature extraction;

step 2, in the decoding stage, three layers of features from the ResNeSt feature extraction network are each cascaded with an ECA module to form three skip branches; the features of the three skip branches are then fused with the features of each upsampling layer that have passed through an SA module, forming a decoding structure with dual-attention feature fusion;

step 3, in the prediction stage, the fused features obtained in step 2 pass through three parallel depthwise separable convolutions to obtain the heatmap prediction, the center point offset and the width and height information respectively, and this information is finally merged to output the detection result;

step 4, in the training stage, the bounding-box regression loss function CIOU is obtained on the basis of the IOU bounding-box loss function by additionally considering the case where the predicted box and the ground-truth box do not overlap, the distance between their center points and the aspect ratio.

In the above insulator defect detection method based on improved CenterNet, step 1 comprises the following specific steps:

step 1.1, dividing the input features into R groups and, using a hyperparameter, dividing each of the R groups into K feature-map groups, wherein each feature-map group passes through a 1 × 1 convolution and a 3 × 3 convolution in series, and the convolved features are concatenated;

step 1.2, feeding the R concatenated features into two branches: the first branch adds the R concatenated features and outputs attention weights after global pooling, two fully connected layers and an activation function; the second branch multiplies the R concatenated features channel-wise by the attention weights from the first branch to realize split attention, adds the R groups of split-attention features, and converts the number of channels through a 1 × 1 convolution;

step 1.3, retaining the shortcut connection of ResNet and adding the input features to the features whose channel number has been converted by the 1 × 1 convolution;

step 1.4, adding an atrous spatial pyramid structure after feature extraction to capture multi-scale features of the insulator and its defects.

In the above insulator defect detection method based on improved CenterNet, step 2 comprises the following specific steps:

step 2.1, the features of Layer2, Layer3 and Layer4 of the feature extraction network in the encoding structure, each cascaded with an efficient channel attention mechanism, are called the first branch, and the features of each deconvolution layer with embedded spatial attention are called the second branch;

step 2.2, performing average pooling on the features input to the first branch to obtain a one-dimensional feature vector, capturing local information of the input features with an adaptively sized convolution kernel, activating the result to obtain the channel attention weights of that layer, multiplying the weights channel by channel with the features originally input to the layer, and outputting the result as the first-branch features;

step 2.3, passing the deconvolution-layer features through parallel global max pooling and global average pooling branches, adding the two results to obtain fused features, applying a 7 × 7 convolution and an activation function to the fused features to obtain the spatial attention weights of that layer, and finally multiplying these weights channel by channel with the original input features to output the second-branch features;

step 2.4, adding the first-branch and second-branch features to complete feature fusion, so that the first-layer features of the prediction network contain both channel features for predicting the category and spatial features for accurate localization.

In the above insulator defect detection method based on improved CenterNet, step 3 is implemented as follows: the detection head contains three ordinary 3 × 3 convolutions, which are replaced by depthwise separable convolutions; when the heatmap prediction branch has 2 channels, that is, 2 classes, calculation shows that the number of parameters of the detection head is reduced by about 40%.

In the above insulator defect detection method based on improved CenterNet, the bounding-box regression loss function CIOU in step 4 is specifically:

$$L_{CIOU} = 1 - IOU + \frac{\rho^{2}(A, B)}{c^{2}} + \alpha v$$

where ρ²(A, B) is the squared Euclidean distance between the center point of the predicted box and the center point of the ground-truth box, c is the diagonal length of the smallest box enclosing both the predicted box and the ground-truth box, α is a weight function, and v is a factor measuring the consistency of the aspect ratios.

Compared with the prior art, the method replaces the feature extraction network of the original CenterNet with ResNeSt50 and connects an atrous spatial pyramid pooling module in series after ResNeSt50, which gives the detection network strong feature extraction capability and improves its sensitivity to the feature scales of insulators and their defects. With dual-attention feature fusion, the features of the prediction layer contain both channel features for predicting the category and spatial features for accurate localization. Replacing the ordinary 3 × 3 convolutions in the detection head with depthwise separable convolutions effectively reduces the number of parameters of the detection head, and would make the detection model lighter if the method were applied to multi-class hardware fitting defect detection tasks on transmission lines. Replacing the IOU loss function that guides bounding-box regression with CIOU effectively eliminates cases in which the network cannot be optimized during training.

Drawings

FIG. 1 is a diagram of the CenterNet network architecture according to one embodiment of the present invention;

FIG. 2 is a diagram of the ResNeSt residual structure according to one embodiment of the present invention;

FIG. 3 is an atrous spatial pyramid pooling structure according to an embodiment of the present invention;

FIG. 4 is an encoding network structure according to one embodiment of the invention;

FIG. 5 is a schematic illustration of dual attention feature fusion according to an embodiment of the present invention;

FIG. 6 is a lightweight detection head according to one embodiment of the present invention;

FIG. 7 is a schematic diagram of a CIOU according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the following embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

The present invention is further illustrated by the following examples, which are not to be construed as limiting the invention.

In this embodiment, the anchor-free detection algorithm CenterNet is selected for detecting insulator defects on transmission lines.

CenterNet is an anchor-free detection algorithm with an encoder-decoder structure. As shown in FIG. 1, it consists of an encoding network, a decoding network and a detection network. The feature extraction network in the encoding structure extracts features from the input image. The decoding network consists of three deconvolution branches in series and produces a 128 × 128 feature map after three upsampling operations. The detection network consists of a heatmap prediction branch, a center point offset prediction branch and a width and height prediction branch. Each branch consists of a 3 × 3 convolution and a 1 × 1 convolution in series; the branches differ only in their number of output channels: the heatmap prediction branch has as many channels as there are detection categories, while the center point offset branch and the width and height branch each have 2 channels. The detection result is output by combining the three branches.
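How the three branch outputs can be combined into boxes is sketched below in NumPy. This is an illustrative sketch, not the patented implementation: the function name `decode_centers`, the 3 × 3 local-maximum peak test, the score threshold of 0.3 and the output stride of 4 (512-pixel input, 128-pixel feature map) are assumptions made for the example.

```python
import numpy as np

def decode_centers(heatmap, offset, size, threshold=0.3, stride=4):
    """Turn head outputs into boxes (class, score, x1, y1, x2, y2).

    heatmap: float array (num_classes, H, W) with scores in [0, 1]
    offset:  float array (2, H, W), sub-cell offset of the centre (x, y)
    size:    float array (2, H, W), box width and height in feature-map cells
    """
    _, height, width = heatmap.shape
    # Pad with -inf so border cells are compared against a full 3x3 window.
    padded = np.pad(heatmap, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    windows = [padded[:, dy:dy + height, dx:dx + width]
               for dy in range(3) for dx in range(3)]
    local_max = np.max(np.stack(windows), axis=0)
    # A cell is kept if it is the maximum of its neighbourhood and scores high enough.
    peaks = (heatmap == local_max) & (heatmap >= threshold)

    boxes = []
    for cls, y, x in zip(*np.nonzero(peaks)):
        cx = float((x + offset[0, y, x]) * stride)  # centre in input pixels
        cy = float((y + offset[1, y, x]) * stride)
        w = float(size[0, y, x] * stride)
        h = float(size[1, y, x] * stride)
        boxes.append((int(cls), float(heatmap[cls, y, x]),
                      cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

With two classes, the first heatmap channel could hold insulator centers and the second defect centers; that class assignment is also only an assumption of this example.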

This embodiment directly processes aerial insulator images or videos acquired by an unmanned aerial vehicle and can identify the position and size of insulators and their defects in transmission line inspection images. The method has four parts: first, improving the encoding network; second, dual-attention feature fusion; third, a lightweight detection head; fourth, optimizing the bounding-box regression loss.

Specifically: (1) The improved encoding network strengthens the feature extraction capability of the original CenterNet and improves the network's sensitivity to insulators and defect features at different scales. (2) Dual-attention feature fusion. To reduce the loss of feature information in the decoding network, the hierarchical features of the feature extraction network, after being cascaded with an efficient channel attention mechanism, are added to the features of each deconvolution layer after a spatial attention mechanism has been embedded, so that the features used for prediction by the detection network contain both channel features for predicting the category and spatial features for accurate localization. (3) The detection head uses depthwise separable convolutions in place of ordinary 3 × 3 convolutions, which effectively reduces the number of parameters and makes the model lighter. (4) CIOU replaces the IOU bounding-box loss function, which resolves the case where the IOU loss provides no gradient during training, so the network can accurately learn the parameters of the ground-truth bounding boxes.

The embodiment is realized by the following technical scheme, and the insulator defect detection method based on the improved CenterNet comprises the following steps:

S1, improving the encoding network. The original CenterNet object detection algorithm has low sensitivity to changes in the feature scale of transmission line insulators and their defects (the original CenterNet structure is shown in FIG. 1). The encoding network of CenterNet is therefore changed to an encoding structure formed by connecting ResNeSt (FIG. 2) and ASPP (FIG. 3) in series.

The specific steps for improving the encoding network are as follows: the feature extraction network ResNet in the original CenterNet is replaced by ResNeSt50, which improves feature extraction capability without increasing the amount of computation; an atrous spatial pyramid pooling (ASPP) module is embedded after the feature extraction network, which improves the network's sensitivity to the scales of insulators and their defects.

S2, dual-attention feature fusion, as shown in FIG. 5. In the original CenterNet object detection algorithm, feature information is lost during upsampling in the decoding network. The features of Layer2, Layer3 and Layer4 of ResNeSt in the encoding network are therefore each cascaded with an ECA (Efficient Channel Attention) module, which laterally outputs features carrying channel attention information. These are added to the features of each deconvolution layer of the decoding network after they have passed through an SA (Spatial Attention) module and carry spatial position attention, forming the dual-attention feature fusion.

The specific steps of dual-attention feature fusion are as follows: as shown in FIG. 5, the features of the 2nd, 3rd and 4th layers of the encoding network, after passing through the efficient channel attention mechanism, form the first branch; the features of each deconvolution layer of the decoding network, after passing through the spatial attention mechanism, form the second branch; the dual-attention feature fusion adds the two branches. The encoder-decoder network outputs a feature map of size 128 × 128 with 64 channels, which the detection network uses to predict results.

S3, lightweight detection head. The detection head of the original CenterNet consists of a heatmap prediction branch, a center point offset prediction branch and an object width and height prediction branch. Each branch consists of a 3 × 3 convolution and a 1 × 1 convolution in series; the difference is that the heatmap prediction branch has as many channels as there are categories, while the center point offset branch and the width and height branch each have 2 channels. Experiments show that the three 3 × 3 convolutions of the detection head have a certain influence on model inference speed, so the 3 × 3 convolutions in the detection head are replaced by depthwise separable convolutions (as shown in FIG. 6) to improve detection speed.

The specific steps for the lightweight detection head are as follows: the detection head contains three ordinary 3 × 3 convolutions. To keep the number of parameters used in prediction small and avoid reducing detection speed, the three 3 × 3 convolutions are replaced by depthwise separable convolutions. When the heatmap prediction branch has 2 channels (that is, 2 categories), a simple calculation shows that the number of parameters of the detection head can be reduced by about 40%.

S4, optimizing the bounding-box regression loss. The original CenterNet regresses bounding boxes with an IOU loss function. When the predicted box does not overlap the ground-truth box, the gradient of the IOU (shown in FIG. 6) is 0, so the network cannot optimize its parameters through backpropagation. The CIOU loss function (shown in FIG. 7) is therefore used to guide bounding-box regression.

The specific steps for optimizing the bounding-box regression loss are as follows: when the predicted box and the ground-truth box do not overlap, the IOU is zero and provides no gradient for bounding-box regression, so CIOU is used to guide the regression instead. On the basis of IOU, CIOU additionally considers the case where the predicted box does not overlap the ground-truth box, the distance between the center points of the two boxes and the aspect ratio.

In a specific implementation, first, to let the detection model adapt to the difference in feature scale between insulators and their defects, the encoding stage extracts features with a feature-grouping and split-attention structure, and an atrous spatial pyramid pooling structure is added to capture multi-scale features of the insulators and their defects. Second, to reduce the loss of feature information in the decoding network, features from different layers of the feature-grouping and split-attention structure are cascaded with an ECA attention module and added to each deconvolution feature that has passed through an SA spatial attention module, forming the dual-attention feature fusion. Finally, the width and height of each object are regressed from the center point of the object in the heatmap. Compared with traditional methods, the method learns features directly from data, requires no hand-designed features, and can accurately detect insulator defects in inspection images with complex backgrounds. The CenterNet network architecture shown in FIG. 1 includes an encoding network, a decoding network and a prediction network.

1) In the encoding stage, the ResNet residual block in the encoding structure is first replaced by a ResNeSt convolution block; an atrous spatial pyramid pooling structure is then added after the feature extraction network to form an encoding structure with multi-scale feature extraction.

2) In the decoding stage, three layers of features from the ResNeSt feature extraction network are each cascaded with an ECA module to form three skip branches; the three skip branches are then fused with the features of each upsampling layer that have passed through the SA module, forming a decoding structure with dual-attention feature fusion.

3) In the prediction stage, the fused features obtained in step 2) pass through three parallel depthwise separable convolutions to obtain the heatmap prediction, the center point offset and the width and height information respectively, and this information is finally merged to output the detection result.

4) In the training stage, the bounding-box regression loss function CIOU is obtained on the basis of the IOU bounding-box loss function by additionally considering the case where the predicted box and the ground-truth box do not overlap, the distance between their center points and the aspect ratio.

As shown in the ResNeSt residual structure of FIG. 2, this embodiment replaces the residual block of ResNet with a ResNeSt convolution block that has feature grouping and split attention. Specifically, the ResNeSt convolution block first divides the input features into R groups and, using a hyperparameter, divides each of the R groups into K feature-map groups; each feature-map group passes through a 1 × 1 convolution and a 3 × 3 convolution in series, and the convolved features are concatenated. The R concatenated features are then fed into two branches. The first branch adds the R concatenated features and outputs attention weights after global pooling, two fully connected layers and an activation function. The second branch multiplies the R concatenated features channel-wise by the attention weights from the first branch to realize split attention, adds the R groups of split-attention features, and converts the number of channels through a 1 × 1 convolution. Finally, so that the network can be made deeper, the shortcut connection of ResNet is retained and the input features are added to the features whose channel number has been converted by the 1 × 1 convolution.
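The two-branch attention step described above can be illustrated with a small NumPy sketch. It is a simplified illustration under stated assumptions, not the block used in the embodiment: the function name `split_attention`, the weight matrices `w1` and `w2`, the ReLU between the two fully connected layers and the softmax taken across the R groups (as in the published ResNeSt design) are assumptions, and the grouped convolutions, the final 1 × 1 convolution and the shortcut connection are omitted.

```python
import numpy as np

def split_attention(groups, w1, w2):
    """Weight R feature groups by learned attention and sum them.

    groups: (R, C, H, W) features of the R groups
    w1:     (C, hidden) weights of the first fully connected layer
    w2:     (hidden, R * C) weights of the second fully connected layer
    """
    radix, channels = groups.shape[:2]
    # First branch: add the groups, then global average pooling -> (C,)
    pooled = groups.sum(axis=0).mean(axis=(1, 2))
    hidden = np.maximum(pooled @ w1, 0.0)                 # FC + ReLU
    logits = (hidden @ w2).reshape(radix, channels)       # FC -> one weight per group and channel
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    # Second branch: channel-wise multiplication, then add the R groups.
    return (weights[:, :, None, None] * groups).sum(axis=0)
```

Because the weights of each channel sum to one across the groups, the output is a per-channel convex combination of the R group features.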

FIG. 3 shows the atrous spatial pyramid pooling (ASPP) structure, which is added after feature extraction. ASPP is the module that captures multi-scale receptive fields in the DeepLabv3 semantic segmentation network. This embodiment targets insulators and their defects, whose features vary greatly in scale in actual detection tasks. The atrous spatial pyramid outputs features with multi-scale receptive fields through five parallel branches: global average pooling, a 1 × 1 convolution, and 3 × 3 convolutions with atrous rates of 6, 12 and 18. The outputs of the five branches are concatenated, and a 1 × 1 convolution then converts the number of channels to produce the output features of the encoding network.
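To see why the three atrous rates give different receptive fields, the effective kernel size of a dilated convolution can be computed as k + (k − 1)(r − 1). The short sketch below applies this standard formula to the convolution branches listed above; the helper names are illustrative, and the padding rule assumes stride 1 and an output of the same spatial size as the input.

```python
def effective_kernel(kernel, rate):
    """Side length of the area covered by a kernel with the given atrous rate."""
    return kernel + (kernel - 1) * (rate - 1)

def same_padding(kernel, rate):
    """Padding that keeps the spatial size unchanged at stride 1."""
    return (effective_kernel(kernel, rate) - 1) // 2

# Convolution branches of the pyramid: (description, kernel size, atrous rate)
ASPP_CONV_BRANCHES = [
    ("1x1 convolution", 1, 1),
    ("3x3 convolution, rate 6", 3, 6),
    ("3x3 convolution, rate 12", 3, 12),
    ("3x3 convolution, rate 18", 3, 18),
]

for name, kernel, rate in ASPP_CONV_BRANCHES:
    print(f"{name}: covers {effective_kernel(kernel, rate)} pixels per side, "
          f"padding {same_padding(kernel, rate)}")
```

The three 3 × 3 branches therefore cover 13, 25 and 37 pixels per side while each still holds only nine weights per channel pair, which is how the module samples several scales at a fixed parameter cost.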

FIG. 4 shows the overall structure of the encoding network, namely the combination of the ResNeSt structure and the atrous spatial pyramid structure.

FIG. 5 is a schematic diagram of dual-attention feature fusion. In this embodiment, an efficient channel attention mechanism and a spatial attention mechanism are embedded in different branches, and the two attention mechanisms are fused as follows. The features of Layer2, Layer3 and Layer4 of the feature extraction network in the encoding structure, each cascaded with the efficient channel attention mechanism, are called the first branch; the features of each deconvolution layer with embedded spatial attention are called the second branch. The fusion of the two branches proceeds as follows. First branch: average pooling is applied to the input features to obtain a one-dimensional feature vector; local information of the input features is captured with an adaptively sized convolution kernel; the result is activated to obtain the channel attention weights of that layer; the weights are multiplied channel by channel with the layer's original features, and the result is output as the first-branch features. Second branch: the deconvolution-layer features pass through parallel global max pooling and global average pooling branches, and the two results are added to obtain fused features; a 7 × 7 convolution and an activation function are applied to the fused features to obtain the spatial attention weights of that layer; these weights are multiplied channel by channel with the original input features to output the second-branch features. In the feature fusion step, the first branch and the second branch are added, so that the first-layer features of the prediction network contain both channel features for predicting the category and spatial features for accurate localization.
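The two branches and their addition can be sketched in NumPy as follows. This is an illustrative sketch under several assumptions rather than the embodiment's code: the function names are hypothetical, the activation is taken to be a sigmoid, the adaptive kernel size follows the rule from the published ECA design (gamma = 2, b = 1), and the max and average pooling of the second branch are taken along the channel axis so that one weight per spatial position results.

```python
import math
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D kernel size: nearest odd number to (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca(x, kernel_weights):
    """First branch. x: (C, H, W); kernel_weights: 1-D kernel of odd length."""
    pooled = x.mean(axis=(1, 2))                   # average pooling -> (C,)
    k = len(kernel_weights)
    padded = np.pad(pooled, k // 2)                # zero padding at both ends
    local = np.array([padded[i:i + k] @ kernel_weights
                      for i in range(len(pooled))])
    return x * sigmoid(local)[:, None, None]       # channel-by-channel weighting

def spatial_attention(x, kernel):
    """Second branch. x: (C, H, W); kernel: (7, 7) convolution weights."""
    fused = x.max(axis=0) + x.mean(axis=0)         # add the two pooled maps -> (H, W)
    k = kernel.shape[0]
    padded = np.pad(fused, k // 2)
    h, w = fused.shape
    conv = np.array([[np.sum(padded[i:i + k, j:j + k] * kernel)
                      for j in range(w)] for i in range(h)])
    return x * sigmoid(conv)[None, :, :]           # same spatial weight for every channel

def dual_attention_fusion(skip, upsampled, eca_weights, sa_kernel):
    """Add the channel-attended skip feature and the spatially attended decoder feature."""
    return eca(skip, eca_weights) + spatial_attention(upsampled, sa_kernel)
```

Both inputs must have the same shape for the addition, which in the network means pairing each skip feature with the deconvolution output of matching resolution and channel count.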

FIG. 6 shows the lightweight detection head. Converting an ordinary 3 × 3 convolution into a depthwise separable convolution effectively reduces the number of parameters. A depthwise separable convolution differs from an ordinary convolution in that the convolution is performed in two steps: the first step is a 3 × 3 convolution whose number of channels equals the number of channels of the input feature map, and the second step is a 1 × 1 convolution that converts the number of channels of the feature map. For example, let the input feature map have size W_in × H_in × C_in, the output feature map have size W_out × H_out × C_out, and the convolution kernel have size D_k × D_k. An ordinary convolution then has D_k × D_k × C_in × C_out parameters, whereas a depthwise separable convolution has D_k × D_k × C_in + 1 × 1 × C_in × C_out parameters.
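The two parameter counts can be compared directly with a few lines of Python. The channel widths used in the example (64 in, 64 out) are an assumption chosen for illustration, and bias terms are ignored.

```python
def conv_params(kernel, c_in, c_out):
    """Weights of an ordinary convolution: D_k * D_k * C_in * C_out."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Weights of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in   # one D_k x D_k filter per input channel
    pointwise = 1 * 1 * c_in * c_out     # 1x1 convolution that mixes channels
    return depthwise + pointwise

ordinary = conv_params(3, 64, 64)                    # 36864
separable = depthwise_separable_params(3, 64, 64)    # 4672
print(ordinary, separable, round(separable / ordinary, 3))
```

The saving for a whole detection head depends on the channel widths of all its layers, so this single-layer example does not attempt to reproduce the head-level figure quoted in the text.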

Optimizing the bounding-box loss function. With the IOU bounding-box regression loss of the original CenterNet, the network cannot optimize and update its weight parameters when the predicted box does not overlap the ground-truth box. On the basis of the IOU loss, this embodiment additionally considers the case where the predicted box and the ground-truth box do not overlap, the distance between their center points, and the aspect ratios of the predicted box and the ground-truth box, as shown in FIG. 7. The bounding-box regression loss function CIOU is given by equation (1):

$$L_{CIOU} = 1 - IOU + \frac{\rho^{2}(A, B)}{c^{2}} + \alpha v \qquad (1)$$
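A plain-Python sketch of the CIOU loss for a single pair of boxes is given below. It follows the commonly published definition, L = 1 − IOU + ρ²/c² + αv with v = (4/π²)(arctan(w_gt/h_gt) − arctan(w/h))² and α = v / ((1 − IOU) + v); these expressions for v and α, the corner box format (x1, y1, x2, y2) and the small constant `eps` are assumptions of the example rather than details stated in the text.

```python
import math

def ciou_loss(pred, target, eps=1e-9):
    """CIOU loss between two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared distance between the two centre points
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes
    c2 = (max(px2, tx2) - min(px1, tx1)) ** 2 \
       + (max(py2, ty2) - min(py1, ty1)) ** 2

    # Aspect-ratio consistency term and its weight
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps))
                              - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / (c2 + eps) + alpha * v
```

For two non-overlapping unit squares, `ciou_loss((0, 0, 1, 1), (2, 0, 3, 1))` is about 1.4 and rises to about 1.615 when the second box moves to `(4, 0, 5, 1)`, so the loss still changes with distance where a plain IOU loss would stay constant.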

The main network training steps are: data preprocessing, hardware environment setup, software environment setup, and setting of hyperparameters and training techniques.

The first step: data preprocessing. First, the power transmission line images captured by unmanned aerial vehicle aerial photography are augmented by operations such as rotation, cropping, scaling and translation, and the images are resized to 512 × 512 pixels; then, all images are annotated with the LabelImg image annotation tool in the PASCAL VOC format; finally, the resulting label files and images are organized into the PASCAL VOC dataset format.
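To make the annotation format concrete, the sketch below builds a reduced PASCAL VOC style XML record with the Python standard library. The file name, class label and box coordinates are hypothetical, and files written by LabelImg contain further fields (for example folder, path, pose, truncated, difficult) that are omitted here.

```python
import xml.etree.ElementTree as ET

IMAGE_SIZE = 512  # image side length after resizing, from the text


def voc_annotation(filename, objects, size=IMAGE_SIZE):
    """Build a PASCAL VOC style XML string.

    objects is a list of (label, (xmin, ymin, xmax, ymax)) tuples in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size_el = ET.SubElement(root, "size")
    for tag, value in (("width", size), ("height", size), ("depth", 3)):
        ET.SubElement(size_el, tag).text = str(value)

    for label, box in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bnd = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            ET.SubElement(bnd, tag).text = str(value)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical image with one insulator and one defect region.
    print(voc_annotation(
        "example_0001.jpg",
        [("insulator", (40, 120, 470, 260)), ("defect", (210, 150, 250, 200))],
    ))
```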

The second step: hardware environment setup. Training is carried out on a workstation equipped with an Intel i7-8700 3.2 GHz CPU, an NVIDIA RTX 2080Ti GPU and 16 GB of memory (RAM). This configuration is given only as an example; in practice, the method can also run on a personal computer with sufficient video memory.

The third step: software environment setup. The software environment consists of Python 3.6, PyCharm Community Edition and the PyTorch 1.2 deep learning framework. Specifically, the graphics card driver, CUDA and the matching cuDNN must first be installed on the Ubuntu system; it is preferable to install Anaconda before installing the other software, for convenient environment management, and PyCharm Community Edition is chosen as the development environment. Then, in a terminal of the Ubuntu system, an environment is created with the command conda create -n name python=3.6 and, once created, activated with conda activate name. PyTorch is then installed in the activated environment. After installation, import torch and torch.cuda.is_available() are entered in the Python console of PyCharm to verify that PyTorch has been installed successfully and that CUDA can be used in the environment to accelerate training.
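The verification at the end of this step can be wrapped in a small script. This is a minimal sketch; the function name and the returned dictionary layout are illustrative, and only torch.__version__ and torch.cuda.is_available() are taken from PyTorch itself.

```python
def check_torch_environment():
    """Report whether PyTorch can be imported and whether CUDA is usable."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return {"torch_installed": False, "cuda_available": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(check_torch_environment())
```

If cuda_available is False while PyTorch is installed, the driver, CUDA and cuDNN versions are a reasonable first thing to check.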

The fourth step: hyperparameter and training strategy settings. To accelerate network convergence, the network is trained in a freeze-then-unfreeze manner. Specifically, the backbone network is first frozen, and the decoding network and the detection network are trained for 50 epochs; the backbone network is then unfrozen, and the backbone network, the decoding network and the prediction network are trained end to end for 200 epochs to obtain the final detection model. Because the aerial images have a high resolution and training consumes a large amount of GPU memory, the batch size may be set to 8; the Adam optimizer is used, and the initial learning rate of the network is set to 1 × 10^-3.
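The freeze-then-unfreeze schedule can be sketched as follows. The sketch assumes that the 200 end-to-end epochs follow the 50 frozen epochs, which the text does not state explicitly. It works on any objects that expose a requires_grad attribute, as PyTorch parameters do, and train_one_epoch is a hypothetical callback supplied by the caller.

```python
FREEZE_EPOCHS = 50       # backbone frozen; decoding and detection networks trained
END_TO_END_EPOCHS = 200  # all parts trained; assumed to follow the frozen stage
BATCH_SIZE = 8           # from the text
INITIAL_LR = 1e-3        # initial learning rate for the Adam optimizer


def set_trainable(parameters, trainable):
    """Toggle requires_grad on parameter-like objects."""
    for p in parameters:
        p.requires_grad = trainable


def backbone_trainable_at(epoch):
    """Return True once the frozen stage is over (epochs are counted from 0)."""
    return epoch >= FREEZE_EPOCHS


def run_schedule(backbone_parameters, train_one_epoch):
    """Run both stages, freezing or unfreezing the backbone before each epoch."""
    backbone_parameters = list(backbone_parameters)
    for epoch in range(FREEZE_EPOCHS + END_TO_END_EPOCHS):
        set_trainable(backbone_parameters, backbone_trainable_at(epoch))
        train_one_epoch(epoch)
```

With PyTorch, the first argument would typically be the parameters of the backbone module, and the callback would run one pass over a data loader built with the batch size above.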

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims

1. An insulator defect detection method based on improved CenterNet, used for processing aerial insulator images or videos acquired by an unmanned aerial vehicle, characterized in that the method comprises the following steps:

2. The improved CenterNet-based insulator defect detection method according to claim 1, characterized in that: the implementation of step 1 comprises the following specific steps:

3. The improved CenterNet-based insulator defect detection method according to claim 1, characterized in that: the implementation of step 2 comprises the following specific steps:

4. The improved CenterNet-based insulator defect detection method according to claim 1, characterized in that: the implementation of step 3 comprises: the detection head contains three ordinary 3 × 3 convolutions, which are replaced by depthwise separable convolutions; when the number of channels in the heatmap prediction branch is 2 and the number of classes is 2, calculation shows that the number of parameters of the detection head is reduced by about 40%.

5. The improved CenterNet-based insulator defect detection method according to claim 1, characterized in that: in step 4, the bounding box regression loss function CIOU specifically comprises: