CN112614121A

CN112614121A - Multi-scale small-target equipment defect identification and monitoring method

Info

Publication number: CN112614121A
Application number: CN202011592556.6A
Authority: CN
Inventors: 封琰; 谭毓卿; 袁源; 张海林; 吴童生; 王兴顺; 李沛然; 樊海峰; 梁珑; 田洪滨; 展毅晟; 芦国云; 郭妍; 谢占兰; 卢涛; 冯小霞; 张青梅; 沈娟; 马雅静; 刘有文
Original assignee: QINGHAI SANXIN RURAL POWER CO Ltd; Hainan Power Supply Co Of State Grid Qinghai Electric Power Co
Current assignee: QINGHAI SANXIN RURAL POWER CO Ltd; Hainan Power Supply Co Of State Grid Qinghai Electric Power Co
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2021-04-06

Abstract

The invention relates to the technical field of machine vision, and in particular to an image-based identification and monitoring method for defects of small-target equipment. The multi-scale small-target equipment defect identification and monitoring method is characterized in that: (1) a single-pass object detector is constructed for multiple categories; (2) small convolution filters are used to predict the class scores and position offsets of a fixed set of default bounding boxes on the feature maps; (3) predictions of different scales are generated from feature maps of different scales and are explicitly separated by aspect ratio. By predicting object regions on the feature maps of different convolution layers, the method outputs discretized multi-scale, multi-aspect-ratio default box coordinates, and uses small convolution kernels to predict the box coordinate offsets of a series of candidate boxes together with the confidence for each category.

Description

Multi-scale small-target equipment defect identification and monitoring method

Technical Field

The invention relates to the technical field of machine vision, in particular to an image identification monitoring method for defects of small target equipment.

Background

In the task of detecting and identifying defective equipment targets, the target may appear at any position in the image, and its size and aspect ratio are not known in advance, which makes detection and identification difficult. Because the position and size of the target are uncertain, classifying regions at all possible positions and sizes in the image would consume a large amount of computing resources. It is therefore necessary to first generate candidate regions (region proposals) that are likely to contain objects.

The convolutional neural network is one type of neural network and one of the most commonly used networks in deep learning; it is widely applied in machine vision, text processing, numerical analysis, and other fields. Deep learning is a major branch of machine learning and has reached, in many fields, a level of performance that earlier machine learning methods could not achieve. The convolutional neural network can therefore be regarded as representative of current mainstream artificial-intelligence detection methods.

Disclosure of Invention

To further improve the detection accuracy for multi-scale small targets in aerial images, the invention provides a ResNet50 variant network structure that enhances convolutional feature extraction for multi-scale small targets. Increasing the network width allows each layer in the network to learn both sparse and non-sparse features and also improves the adaptability of the network to multi-scale small targets. In addition, two consecutive 3 × 3 convolutions are used, which give the same receptive field as a single 5 × 5 convolution while reducing the number of convolution-layer weight parameters.
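The receptive-field and parameter statement above, read here as two stacked 3 × 3 convolutions versus one 5 × 5 convolution, can be checked with a few lines of arithmetic. The following is a minimal sketch; the channel count `C` is a hypothetical value chosen only for illustration, stride 1 is assumed, and bias terms are ignored.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each stride-1 layer widens the field by (k - 1)
    return rf


def conv_weights(kernel, channels_in, channels_out):
    """Weight count of one kernel x kernel convolution layer (bias ignored)."""
    return kernel * kernel * channels_in * channels_out


C = 64  # hypothetical channel count, not taken from the source

two_3x3 = 2 * conv_weights(3, C, C)  # two stacked 3 x 3 layers
one_5x5 = conv_weights(5, C, C)      # one 5 x 5 layer

print(stacked_receptive_field([3, 3]))  # 5
print(stacked_receptive_field([5]))     # 5
print(two_3x3, one_5x5)                 # 73728 102400
```

For equal input and output channel counts the ratio is 18 to 25, so the stacked form uses 28% fewer weights for the same receptive field.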

(1) A single-pass object detector is constructed for multiple categories.

(2) Small convolution filters are used to predict the class scores and position offsets of a fixed set of default bounding boxes on the feature maps.

(3) To achieve high detection accuracy, predictions of different scales are generated from feature maps of different scales, and the predictions are explicitly separated by aspect ratio.

The model adds several feature layers at the end of the base network; these layers predict the offsets to default boxes of different scales and aspect ratios, together with their associated confidences.

(1) Multi-scale feature map detection: convolutional feature layers are added to the end of the truncated base network. The sizes of these layers decrease progressively, which yields predictions for detection at multiple scales, and the convolutional model used for detection differs for each feature layer.
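How the added layers shrink can be illustrated with the standard convolution output-size formula. This is a sketch under stated assumptions: the starting size of 38, the number of layers, and the kernel, stride, and padding values are all hypothetical, since the text does not specify the layer configuration.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_map_sizes(first_size, num_layers):
    """Sizes of successive feature maps, each produced by a stride-2 convolution."""
    sizes = [first_size]
    for _ in range(num_layers - 1):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


# Hypothetical configuration: six detection layers starting from a 38 x 38 map.
sizes = feature_map_sizes(38, 6)
print(sizes)  # [38, 19, 10, 5, 3, 2]
```

The larger maps keep fine spatial detail and so suit small targets, while the smaller maps cover larger targets.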

(2) Convolutional predictors for detection: each added feature layer (or, optionally, an existing feature layer of the base network) can use a set of convolution filters to produce a fixed set of predictions. These predictors sit on top of the SSD network architecture. For a feature layer of size m × n with p channels, a small 3 × 3 × p convolution kernel is applied, which yields either a score for a class or a coordinate offset relative to a default box. An output value is generated at each of the m × n locations where the kernel is applied. The bounding box offset outputs are measured relative to a default box, whose position is in turn fixed relative to the feature map.

(3) Default boxes and aspect ratios: a set of default bounding boxes is associated with each feature map cell of the feature maps at the top of the network. The default boxes tile the feature map in a convolutional manner, so that the position of each box relative to its corresponding cell is fixed. In each feature map cell, we predict the offsets relative to the default box shapes in the cell, and the per-class scores of the instance in each box. Specifically, for each of the k boxes at a given position, we compute c class scores and 4 offsets relative to the original default box. This results in a total of (c + 4)k filters applied at each position in the feature map, producing (c + 4)kmn outputs for an m × n feature map. The default boxes are similar to the anchor boxes used in Faster R-CNN, but are applied to feature maps of several different resolutions. Using different default box shapes in multiple feature maps effectively discretizes the space of possible output box shapes.
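The output count and the fixed placement of default boxes described above can be sketched as follows. The box parameterization (width `scale * sqrt(ar)`, height `scale / sqrt(ar)`, centers at cell midpoints in normalized coordinates) is an assumption borrowed from common SSD practice, and the `scale`, aspect ratios, and class count are hypothetical values.

```python
import math


def filters_per_position(k, c):
    """(c + 4) * k filters: c class scores plus 4 offsets for each of k boxes."""
    return (c + 4) * k


def num_outputs(m, n, k, c):
    """Total outputs for an m x n feature map: (c + 4) * k * m * n."""
    return filters_per_position(k, c) * m * n


def default_boxes(m, n, scale, aspect_ratios):
    """Default boxes (cx, cy, w, h) in normalized coordinates, one set per cell."""
    boxes = []
    for i in range(m):          # rows
        for j in range(n):      # columns
            cx = (j + 0.5) / n  # box center is fixed at the cell center
            cy = (i + 0.5) / m
            for ar in aspect_ratios:
                w = scale * math.sqrt(ar)  # assumed parameterization
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes


# Hypothetical example: 5 x 5 feature map, 3 aspect ratios, 4 classes.
m, n, c = 5, 5, 4
aspect_ratios = [1.0, 2.0, 0.5]
k = len(aspect_ratios)
boxes = default_boxes(m, n, scale=0.4, aspect_ratios=aspect_ratios)

print(len(boxes))                  # 75
print(filters_per_position(k, c))  # 24
print(num_outputs(m, n, k, c))     # 600
```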

(4) Matching strategy: at the beginning, each ground truth box is matched to the default box with the best jaccard overlap, as in MultiBox, which ensures that each ground truth box corresponds to a unique default box. Unlike MultiBox, however, a default box is afterwards also paired with any ground truth box as long as the jaccard overlap between the two is greater than a threshold. The jaccard overlap of two boxes A and B is given by the following formula:

J(A, B) = |A ∩ B| / |A ∪ B|

As the formula shows, the jaccard overlap is the IoU, namely the intersection of the two sets divided by the union of the two sets.
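The jaccard overlap and the two-stage matching described above can be sketched in a few lines. This is a simplified illustration: boxes are assumed to be `(xmin, ymin, xmax, ymax)` tuples, the threshold of 0.5 is a hypothetical value because the text does not state one, and ties between ground truth boxes competing for the same default box are not resolved.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(ground_truths, defaults, threshold=0.5):
    """Return {default_index: ground_truth_index} (threshold is an assumed value)."""
    matches = {}
    if not ground_truths or not defaults:
        return matches
    # Stage 1: every ground truth box gets its best-overlapping default box.
    for g, gt in enumerate(ground_truths):
        best_d = max(range(len(defaults)), key=lambda d: iou(gt, defaults[d]))
        matches[best_d] = g
    # Stage 2: remaining default boxes are paired if the overlap exceeds the threshold.
    for d, box in enumerate(defaults):
        if d in matches:
            continue
        overlaps = [iou(gt, box) for gt in ground_truths]
        best_g = max(range(len(ground_truths)), key=lambda g: overlaps[g])
        if overlaps[best_g] > threshold:
            matches[d] = best_g
    return matches


ground_truths = [(0, 0, 10, 10)]
defaults = [(1, 1, 11, 11), (0, 0, 10, 8), (20, 20, 30, 30)]
matches = match_boxes(ground_truths, defaults)
print(matches)  # {1: 0, 0: 0}
```

In this example the second default box is the best match (overlap 0.8), the first is added by the threshold rule (overlap about 0.68), and the third stays unmatched and would be treated as background.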

(5) Data augmentation: for each training image, one of the following options is chosen at random:

Use the original image, or sample a patch such that the minimum jaccard overlap (IoU) between the patch and the objects is 0.1, 0.3, 0.5, 0.7, or 0.9.

Randomly sample a patch: the size of the sampled patch is [0.1, 1] of the original image size, and its aspect ratio is between 1/2 and 2. The overlapping part of a ground truth box is kept when the center of that box lies within the sampled patch. After these sampling steps, each sampled patch is resized to a fixed size and flipped horizontally with a probability of 0.5.
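The random-patch branch of the augmentation can be sketched as below. Several points are assumptions rather than details from the text: the size range is applied to width and height independently, the aspect ratio is taken as width over height in pixels and read as the range 1/2 to 2, and kept boxes are clipped to the patch. The minimum-overlap check and the final resize are omitted for brevity.

```python
import random


def sample_patch(img_w, img_h, rng):
    """Sample a patch (xmin, ymin, xmax, ymax) with side lengths in [0.1, 1] of
    the image and aspect ratio (width / height) in [1/2, 2]."""
    while True:
        w = rng.uniform(0.1 * img_w, img_w)
        h = rng.uniform(0.1 * img_h, img_h)
        if 0.5 <= w / h <= 2.0:  # reject patches outside the aspect-ratio range
            break
    x0 = rng.uniform(0, img_w - w)
    y0 = rng.uniform(0, img_h - h)
    return (x0, y0, x0 + w, y0 + h)


def keep_boxes(boxes, patch):
    """Keep boxes whose center lies in the patch; clip and shift to patch coordinates."""
    px0, py0, px1, py1 = patch
    kept = []
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if px0 <= cx <= px1 and py0 <= cy <= py1:
            kept.append((max(x0, px0) - px0, max(y0, py0) - py0,
                         min(x1, px1) - px0, min(y1, py1) - py0))
    return kept


def maybe_flip(boxes, patch_w, rng):
    """Flip boxes horizontally with probability 0.5; returns (boxes, flipped)."""
    if rng.random() < 0.5:
        return [(patch_w - x1, y0, patch_w - x0, y1) for x0, y0, x1, y1 in boxes], True
    return boxes, False


rng = random.Random(0)  # fixed seed so the sketch is repeatable
patch = sample_patch(300, 200, rng)
kept = keep_boxes([(10, 10, 30, 30), (90, 90, 110, 110)], (0, 0, 50, 50))
print(kept)  # [(10, 10, 30, 30)]
```

The second box is dropped because its center (100, 100) falls outside the 50 × 50 patch.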

(7) Adding atrous convolution: modifying the network structure changes the size of the receptive field. The atrous (dilated convolution) algorithm is therefore adopted to compensate for this change, which improves the identification accuracy of the model.
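The effect of atrous convolution on the receptive field follows from the effective kernel size k + (k − 1)(d − 1) for kernel size k and dilation rate d. A minimal sketch, with dilation rates chosen only as hypothetical examples:

```python
def effective_kernel(kernel, dilation):
    """Effective extent of a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


# The weight count stays at kernel * kernel; only the spacing between taps grows.
print(effective_kernel(3, 1))  # 3
print(effective_kernel(3, 2))  # 5
print(effective_kernel(3, 6))  # 13
```

A dilated 3 × 3 kernel therefore covers a wider area with the same nine weights per channel pair.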

In the invention, the regression approach of YOLO is combined with the anchor box mechanism of Faster R-CNN to provide a multi-scale target detection and identification method. By predicting object regions on the feature maps of different convolution layers, the method outputs discretized multi-scale, multi-aspect-ratio default box coordinates, and uses small convolution kernels to predict the box coordinate offsets of a series of candidate boxes together with the confidence for each category.

Drawings

FIG. 1 is a flow chart of data processing according to an embodiment of the present invention.

Detailed Description

The technical solution of the present invention is further explained below with reference to the accompanying drawings.

In target detection and identification algorithms based on regional convolutional neural networks, a target is detected by combining candidate regions (region proposals) with classification by a convolutional neural network (CNN): positions where a target may appear in the image, i.e. the candidate regions, are found first, and a convolutional neural network is then used to extract features.

The invention provides a ResNet50 variant network structure that enhances convolutional feature extraction for multi-scale small targets. Increasing the network width allows each layer in the network to learn both sparse and non-sparse features and also improves the adaptability of the network to multi-scale small targets. In addition, two consecutive 3 × 3 convolutions are used, which give the same receptive field as a single 5 × 5 convolution while reducing the number of convolution-layer weight parameters.

(1) A single-pass object detector is constructed for multiple categories. It is faster and more accurate than prior-art methods while ensuring a higher detection rate.

The jaccard overlap is the IoU, i.e., the intersection of the two sets divided by the union of the two sets.

Claims

1. A multi-scale small target equipment defect identification monitoring method is characterized in that:

(1) constructing a single-pass object detector for a plurality of classes;

(2) predicting category scores and position offsets for a fixed set of default bounding boxes on the feature map using a small convolution filter;

(3) generating predictions of different scales from feature maps of different scales, the predictions being explicitly separated by aspect ratio.

2. The method for identifying and monitoring the defects of the multi-scale small target equipment as claimed in claim 1, wherein the method comprises multi-scale feature map detection: convolutional feature layers are added to the end of the truncated base network, the sizes of these layers decrease progressively to obtain predictions for detection at multiple scales, and the convolutional model used for detection differs for each feature layer.

3. The method for identifying and monitoring the defects of the multi-scale small target equipment as claimed in claim 2, wherein the method comprises convolutional predictors for detection: each added feature layer (or, optionally, an existing feature layer of the base network) uses a set of convolution filters to produce a fixed set of predictions; for a feature layer of size m × n with p channels, a 3 × 3 × p convolution kernel is applied to produce either a score for a class or a coordinate offset relative to a default box; an output value is produced at each of the m × n locations where the kernel is applied; the bounding box offset outputs are measured relative to the default box; and the position of the default box is fixed relative to the feature map.

4. The method for identifying and monitoring the defects of the multi-scale small target equipment as claimed in claim 3, wherein the method comprises default boxes and aspect ratios: a set of default bounding boxes is associated with each feature map cell of the feature maps at the top of the network; the default boxes tile the feature map in a convolutional manner such that the position of each box relative to its corresponding cell is fixed; and, in each feature map cell, the offsets relative to the default box shapes in the cell and the per-class scores of the instance in each box are predicted.

5. The method for identifying and monitoring the defects of the multi-scale small target equipment as claimed in claim 4, wherein the matching strategy is as follows: at the beginning, each ground truth box is matched to the default box with the best jaccard overlap, as in MultiBox, so that each ground truth box is ensured to correspond to a unique default box; and, unlike MultiBox, a default box is then also paired with any ground truth box for which the jaccard overlap between the two is larger than a threshold value.