CN110766041A - Deep learning-based pest detection method - Google Patents
- Publication number
- CN110766041A (application CN201910830378.7A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/02—Agriculture; Fishing; Forestry; Mining
Abstract
The invention relates to a deep learning-based pest detection method, mainly applied to detecting pests in a granary. The VGG16 backbone of the SSD model is made lightweight: the dimensions of the corresponding convolution kernels, pooling kernels and feature maps are reduced and the feature layers are modified; in addition, a weighted target for the classification and regression tasks is added to the loss function. The model is trained on labelled granary pest pictures, and the trained model is used to detect pests. The invention learns multi-level features from low to high, improves the convergence rate of model training, balances the number of positive and negative samples, improves training efficiency, and quickly achieves high-precision granary pest detection.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a pest detection method based on deep learning.
Background
In the process of grain storage, insect pests are one of the important causes of grain loss, and the premise of their prevention and control is detecting their occurrence in time. At present, granary pest detection is still mainly performed manually; this detection method has poor timeliness and low accuracy, and can only be carried out by granary pest experts or highly experienced granary managers.
With the development of optical technology, granary pest detection technologies based on optical principles have appeared, such as soft X-ray detection and near-infrared spectroscopy. However, these methods suffer from high equipment cost, complicated instrument calibration, limited detection range, and so on.
SSD is a one-stage detection method. It combines the regression idea of YOLO with the anchor-box mechanism of Faster R-CNN and adds the feature-pyramid detection mode of the RPN network. By performing sliding-window scanning on the feature maps of different convolutional layers, it detects multi-scale targets, can detect small targets in the feature maps output by the convolutional layers, and obtains feature maps with a certain translation and scale invariance. Applying these characteristics to granary pest detection can effectively improve detection precision.
Disclosure of Invention
The invention aims to solve the problems of long detection time and low detection accuracy in traditional granary pest detection methods, to further improve the training speed and detection accuracy of the SSD model, to provide a deep learning-based pest detection method, and to provide the necessary technical support for an intelligent granary pest early-warning system.
The invention is realized by the following technical scheme, which comprises the following steps:
step 1: the method comprises the steps that a photographing device is used for collecting pest pictures, pests in each picture are marked, and a database for detecting pests in a granary is constructed;
step 2: lightweighting the VGG16 network structure of the SSD model;
step 3: adding a weighted target for the classification and regression tasks into the SSD loss function;
step 4: training on the pests of various postures and different sizes in the collected pest pictures, and detecting pests with the trained SSD model.
Further, the step 2 includes the steps of:
step 2.1: modifying the VGG16 network structure to change the multiple convolution and maximum pooling operations of the SSD backbone network VGG16 model to convolution 1_1, convolution 1_2, maximum pooling 1, convolution 2_2, maximum pooling 2, convolution 3_1, convolution 3_3, maximum pooling 3, convolution 4_1, convolution 4_3, maximum pooling 4, convolution 5_1, convolution 5_3, maximum pooling 5;
step 2.2: changing the dimensionality of the convolution kernel and the pooling kernel of the VGG16 to 3 × 3, 2 × 2, 3 × 3, 2 × 2, 3 × 3;
step 2.3: changing the dimension of the VGG16 feature map to 300 × 300, 150 × 150, 75 × 75, 38 × 38, 19 × 19;
step 2.4: and selecting all feature maps corresponding to all convolution layers from the extracted feature maps to be convolved with convolution kernels of 3 multiplied by 3, and then presetting a default frame in each feature map grid on each layer of output feature map to obtain the offset and confidence of each frame relative to the labeling frame.
Further, the step 3 includes the steps of:
step 3.1: in the training stage of the SSD convolutional network, matching the default frames with the pests: a default frame that matches a pest is a positive sample, and one that does not is a negative sample; the negative samples are then sorted by their confidence loss values, the loss value being obtained from the target loss function;
step 3.2: adding a weighted target for the classification and regression tasks into the loss function, wherein the target calculation process is as follows: assuming there are k default boxes, the number of positive samples is n, the number of negative samples is m, and k = m + n; a Label for classification is set. When n > 0, the weighted Pos_target of the positive-sample classification is Pos_target = Label/n; when m > 0 and the positive-to-negative sample ratio is set to 1:p, the weighted Neg_target of the negative-sample classification is Neg_target = (1 - Label)/(m × p); the weighted Tol_target of the whole classification task is Tol_target = Pos_target + Neg_target; the weight coefficient added to the regression task is θ, and the weighted Reg_target of the regression task is Reg_target = Pos_target × θ.
Further, the loss function in step 3.1 above consists of two parts, classification and regression: L(z, c, l, g) = (1/N)[L_conf(z, c) + α · L_loc(z, l, g)], where N is the number of default boxes matched to pests; L_loc represents the location loss, realized by the regression positioning error; L_conf represents the confidence loss, realized by a Softmax multi-class function; z is the position information of the actual pest; c is the confidence of the predicted target; l is the position information of the predicted object frame; g is the position information of the pest; and α is a trade-off parameter for the confidence loss and the location loss, set to 1 by default.
Further, the step 4 includes the following steps:
step 4.1: the output of VGG16 in the improved SSD method is used for preliminary feature extraction of pests, and a basic feature map of a pest image is obtained;
step 4.2: carrying out multi-scale feature extraction on the feature map obtained in the step 4.1 through a plurality of convolution operations and maximum pooling operations of a multi-scale feature detection network in sequence, extracting multi-level feature maps and selecting candidate pest areas with different sizes and different aspect ratios from the feature maps;
step 4.3: selecting all feature maps corresponding to all convolution layers from the feature maps extracted in the step 4.2 to be convolved with convolution kernels of 3 x 3, and then presetting default frames in each feature map grid on each layer of output feature map to obtain the offset and confidence of each frame relative to the labeling frame; wherein the default boxes are a series of fixed-size boxes on each grid of the feature map;
step 4.4: combining the convolution calculation results of the layers in step 4.3, transmitting the combined results to the detection layers, and fusing the detection results of all layers with a non-maximum suppression method to obtain the granary pest image detection result.
Further, the multi-scale feature detection network in the above step 4.2 comprises convolution 4_1, convolution 4_3, max pooling 4, convolution 5_1, convolution 5_3, max pooling 5, convolution 6_1, convolution 6_2, convolution 7_1, convolution 7_2, convolution 8_1, convolution 8_2, convolution 9_1 and convolution 9_2, wherein the dimensions of the convolution and pooling kernels are 3 × 3, 3 × 3, 2 × 2, 3 × 3, 3 × 3, 2 × 2, 1 × 1, 3 × 3, 1 × 1, 3 × 3, 1 × 1, 3 × 3, 1 × 1 and 3 × 3, respectively; the dimensions of the obtained feature maps are 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1, respectively.
Further, the non-maximum suppression method in step 4.4 is an iterative-traversal-elimination process, which includes the following steps:
4.4.1: sorting the scores of all the frames, and selecting the highest score and the corresponding frame;
4.4.2: traversing the rest frames, and deleting the frame if the overlapping area of the frame and the current highest frame is larger than a threshold value;
4.4.3: the above process is repeated with the highest score selected from the unprocessed boxes.
The invention provides a pest detection method based on deep learning, which comprises the steps of firstly carrying out primary feature extraction on a pest image by using a lightweight VGG16 convolutional network, then carrying out multi-scale feature extraction, evaluating pest detection frames with different length-width ratios at each position in a plurality of feature maps output by different convolutional layers, and detecting pests with different postures and sizes in the image. The SSD target detection model is composed of a VGG16 convolutional neural network subjected to lightweight processing and a multi-scale feature detection network, and detection is carried out through feature maps of 6 different convolutional layers. The invention effectively discretizes the shape of the output frame by applying a default frame mechanism on the feature maps of different layers, thereby greatly reducing the number of pest detection frames, accelerating the detection speed, and detecting pests from the feature maps of the different layers, thereby being capable of carrying out multi-scale detection on the input pest image.
The invention carries out lightweight processing on the original SSD model and accelerates the convergence speed of the model. In addition, the loss function is improved, so that the number of positive and negative samples is balanced, and the training efficiency is improved. The SSD convolutional network encapsulates image processing operation in a single network, so that the training and detection time is greatly shortened, the granary pest detection can be better realized, certain robustness is increased, and the SSD convolutional network can be integrated into an early warning system of granary pests.
Drawings
Fig. 1 is a schematic diagram of an original SSD convolutional network.
Fig. 2 is a schematic diagram of an SSD convolutional network employed in the present invention.
Fig. 3 is a flow chart of the comprehensive detection of granary pests in the SSD convolutional network of the present invention.
Fig. 4 is an image of the effect of barn pests detected by the SSD model of the present invention.
Detailed Description
In order to make the object and technical solution of the present invention more clear, the following describes the implementation steps of the present invention in detail with reference to fig. 1-4.
As shown in fig. 1, a schematic diagram of the original SSD convolutional network, and fig. 2, a schematic diagram of the SSD convolutional network used in the present invention, comparison of the two diagrams shows that the feature maps of the 6 different convolutional layers are changed accordingly: the original convolution 7 (FC7) is removed, and convolution 5_3 is used instead.
The invention provides a deep learning-based pest detection method, which comprises the following steps:
step 1: the method comprises the steps that a photographing device is used for collecting pest pictures, pests in each picture are marked, and a database for detecting pests in a granary is constructed;
step 2: lightweighting the VGG16 network structure of the SSD model;
step 3: adding a weighted target for the classification and regression tasks into the SSD loss function;
step 4: training on the pests of various postures and different sizes in the collected pest pictures, and detecting pests with the trained SSD model.
The step 1 specifically comprises: photographing pictures of high-outbreak granary pests, preprocessing the pictures to remove redundant background, marking the position of the pests in each picture with LabelImg, taking the English acronym of each pest as its label, saving the annotations in xml format, and constructing a database of pictures and annotation files.
The step 2 comprises the following steps:
step 2.1: modifying the VGG16 network structure; the multiple convolution and max pooling operations of the original SSD backbone network VGG16 model comprise: convolution 1_1, convolution 1_2, max pooling 1, convolution 2_2, max pooling 2, convolution 3_1, convolution 3_2, convolution 3_3, max pooling 3, convolution 4_1, convolution 4_2, convolution 4_3, max pooling 4, convolution 5_1, convolution 5_2, convolution 5_3, max pooling 5, convolution 6 and convolution 7; these are changed to convolution 1_1, convolution 1_2, max pooling 1, convolution 2_2, max pooling 2, convolution 3_1, convolution 3_3, max pooling 3, convolution 4_1, convolution 4_3, max pooling 4, convolution 5_1, convolution 5_3 and max pooling 5;
step 2.2: modifying the dimensions of the convolution kernels and pooling kernels of the original VGG16, which alternate 3 × 3 convolution kernels and 2 × 2 pooling kernels and end with the 3 × 3 kernel of convolution 6 and the 1 × 1 kernel of convolution 7, to alternating 3 × 3 convolution kernels and 2 × 2 pooling kernels only, i.e. 3 × 3, 2 × 2, 3 × 3, 2 × 2, 3 × 3, 2 × 2, 3 × 3, 2 × 2, 3 × 3, 2 × 2;
step 2.3: modifying the dimensions of the original VGG16 feature maps from the original 300 × 300, 150 × 150, 75 × 75, 38 × 38, 19 × 19, 19 × 19 and 19 × 19 to 300 × 300, 150 × 150, 75 × 75, 38 × 38 and 19 × 19.
Step 2.4: selecting all feature maps corresponding to all convolutional layers from the extracted feature maps to convolve with 3 × 3 convolution kernels, then presetting default frames in each feature map grid on each layer's output feature map to obtain the offset and confidence of each frame relative to the labelling frame; the convolutional layers selected by the original SSD model, shown in fig. 1, are convolution 4_3, convolution 7, convolution 6_2, convolution 7_2, convolution 8_2 and convolution 9_2; these are changed to convolution 4_3, convolution 5_3, convolution 6_2, convolution 7_2, convolution 8_2 and convolution 9_2, shown in fig. 2.
The 6 convolutional layers in step 2.4 are shown in fig. 2 and comprise convolution 4_3, convolution 5_3, convolution 6_2, convolution 7_2, convolution 8_2 and convolution 9_2. Target-frame offsets and per-class scores are predicted on these pyramid-structured feature layers. As can be seen from fig. 3, the feature maps output by different convolutional layers detect pests of different scales in the image: the feature map output by convolution 4_3 detects small-scale pests in the image but performs poorly on large pests, while the feature map output by convolution 9_2 detects larger pests but performs poorly on small ones. Detecting with the feature maps output by several convolutional layers addresses multi-scale pest detection in the image, increases the pest detection resolution of the algorithm, and improves the detection of relatively small pests in the image.
The step 3 comprises the following steps:
step 3.1: in the training stage of the SSD convolutional network, matching the default frames with the pests: a default frame that matches a pest is a positive sample, and one that does not is a negative sample; the negative samples are then sorted by their confidence loss values, the loss value being obtained from the target loss function;
the setting process of the default frame in the step 3.1 is as follows: 4 default frames are preset in each feature map grid on the 6 feature maps in step 2.4, the feature maps of different output layers are provided with default frames of multiple scales, and the same feature map is provided with default frames of different aspect ratios, so that pest detection of various shapes and sizes in the image is realized. The width-to-height ratio of the default frame of the pest is set to 1:1, 2:1, 3:1, 1: 2.
The loss function in step 3.1 consists of two parts, classification and regression: L(z, c, l, g) = (1/N)[L_conf(z, c) + α · L_loc(z, l, g)], where N is the number of default boxes matched to pests; L_loc represents the location loss, realized by the regression positioning error; L_conf represents the confidence loss, realized by a Softmax multi-class function; z is the position information of the actual pest; c is the confidence of the predicted target; l is the position information of the predicted object frame; g represents the position information of the pest; α is a trade-off parameter for the confidence loss and the location loss, set to 1 by default.
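The two-part loss can be sketched numerically as follows. `ssd_loss` and its argument names are hypothetical; the confidence term uses softmax cross-entropy as stated, while the smooth-L1 form of the regression positioning error is an assumption (it is the error commonly paired with SSD-style box regression, but the patent does not spell it out).

```python
import numpy as np

def smooth_l1(x):
    """Assumed form of the regression positioning error."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def ssd_loss(class_logits, class_labels, pred_loc, gt_loc, alpha=1.0):
    """L = (L_conf + alpha * L_loc) / N over the N matched default boxes."""
    n = len(class_labels)                       # N: boxes matched to pests
    # L_conf: softmax cross-entropy over the predicted class scores
    shifted = class_logits - class_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    l_conf = -np.log(probs[np.arange(n), class_labels]).sum()
    # L_loc: regression error between predicted and ground-truth offsets
    l_loc = smooth_l1(pred_loc - gt_loc).sum()
    return (l_conf + alpha * l_loc) / n
```

With perfect predictions both terms approach zero, so the total loss does too, which is what the training loop of the invention drives down.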
Step 3.2: adding a weighted target for the classification and regression tasks into the loss function, wherein the target calculation process is as follows: assuming there are k default boxes, the number of positive samples is n, the number of negative samples is m, and k = m + n; a Label for classification is set. When n > 0, the weighted Pos_target of the positive-sample classification is Pos_target = Label/n; when m > 0 and the positive-to-negative sample ratio is set to 1:p, the weighted Neg_target of the negative-sample classification is Neg_target = (1 - Label)/(m × p); the weighted Tol_target of the whole classification task is Tol_target = Pos_target + Neg_target; the weight coefficient added to the regression task is θ, and the weighted Reg_target of the regression task is Reg_target = Pos_target × θ.
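The weighted-target arithmetic of step 3.2 can be sketched as below. Treating Label as a scalar fixed to 1, and reading the negative-sample expression as (1 - Label)/(m × p), are interpretations of the original notation, not something the patent states explicitly.

```python
def weighted_targets(k, n, p, theta=1.0, label=1.0):
    """Weighted targets of step 3.2, with Label treated as a scalar.

    k default boxes = n positives + m negatives; positive:negative = 1:p.
    """
    m = k - n
    pos = label / n if n > 0 else 0.0                # Pos_target = Label / n
    neg = (1.0 - label) / (m * p) if m > 0 else 0.0  # Neg_target = (1 - Label)/(m x p)
    tol = pos + neg                                  # Tol_target = Pos_target + Neg_target
    reg = pos * theta                                # Reg_target = Pos_target x theta
    return pos, neg, tol, reg

print(weighted_targets(k=100, n=20, p=3))  # (0.05, 0.0, 0.05, 0.05)
```

Dividing by the sample counts is what balances the contribution of the (usually far more numerous) negative samples against the positives.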
Pest detection process referring to fig. 3, step 4 includes the steps of:
step 4.1: the output of VGG16 in the improved SSD method is used for preliminary feature extraction of pests, and a basic feature map of a pest image is obtained;
step 4.2: carrying out multi-scale feature extraction on the feature map obtained in the step 4.1 through a plurality of convolution operations and maximum pooling operations of a multi-scale feature detection network in sequence, extracting multi-level feature maps and selecting candidate pest areas with different sizes and different aspect ratios at each position in the feature maps;
the multi-scale feature detection network in the step 4.2 comprises convolution 4_1, convolution 4_3, maximal pooling 4, convolution 5_1, convolution 5_3, maximal pooling 5, convolution 6_1, convolution 6_2, convolution 7_1, convolution 7_2, convolution 8_1, convolution 8_2, convolution 9_1 and convolution 9_2, wherein the dimensions of a convolution kernel and a pooling kernel are 3 × 3, 2 × 2, 3 × 3, 1 × 1, 3 × 3, 3 × 1 and 3 × 3, respectively; the dimensions of the obtained feature maps were 38 × 38, 19 × 19, 19 × 10, 10 × 10, 5 × 5, 3 × 3, and 1 × 1, respectively.
The convolution operation process in the step 4.2 is as follows: the output of the convolution operation in the i-th hidden layer is denoted x_i = f(w_i · x_{i-1} + b_i), where x_{i-1} is the output of the (i-1)-th hidden layer, x_i is the output of the convolutional layer in the i-th hidden layer, x_0 is the input image of the input layer, w_i is the weight feature matrix of the i-th hidden layer, b_i is the bias of the i-th hidden layer, and f(·) is the ReLU activation function, whose expression is f(x) = max(0, x).
The max pooling operation in the step 4.2 is as follows: with a stride of 2, the maximum value is taken in each 2 × 2 region of the feature map activated by f(·), yielding a feature map whose length and width are halved.
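The convolution and max-pooling operations of the two steps above can be sketched with NumPy. `conv2d` and `max_pool` are illustrative single-channel implementations of x_i = f(w · x_{i-1} + b) and of 2 × 2 stride-2 pooling, not the patent's network code.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                 # f(x) = max(0, x)

def conv2d(x, w, b):
    """'Valid' 3x3 convolution followed by ReLU: f(w * x + b)."""
    h, wd = x.shape[0] - 2, x.shape[1] - 2
    out = np.empty((h, wd))
    for r in range(h):
        for c in range(wd):
            out[r, c] = np.sum(x[r:r + 3, c:c + 3] * w) + b
    return relu(out)

def max_pool(x):
    """2x2 max pooling with stride 2: length and width are halved."""
    h, wd = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * wd].reshape(h, 2, wd, 2).max(axis=(1, 3))

feat = relu(np.arange(-18.0, 18.0).reshape(6, 6))   # a toy activated map
print(max_pool(feat).shape)  # (3, 3)
```

Stacking these two operations in alternation is exactly how the lightweight VGG16 backbone shrinks a 300 × 300 input down through the 150, 75, 38 and 19-cell feature maps.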
Step 4.3: selecting all feature maps corresponding to all convolution layers from the feature maps extracted in the step 4.2 to be convolved with convolution kernels of 3 x 3, and then presetting default frames in each feature map grid on each layer of output feature map to obtain the offset and confidence of each frame relative to the labeling frame; wherein the default boxes are a series of fixed-size boxes on each grid of the feature map;
step 4.4: combining the convolution calculation results of the layers in step 4.3, transmitting the combined results to the detection layers, and fusing the detection results of all layers with non-maximum suppression to obtain the pest image detection result.
The detection process of the detection layer in step 4.4 is as follows: assuming there are k default boxes per grid cell, each default box predicts 2 target-class scores and 4 offsets; if the size of the feature map is m × n, i.e. there are m × n feature map grid cells, the feature map has 6 × m × n × k outputs in total.
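The output count above can be checked with a one-line helper (the function name is ours, not the patent's):

```python
def detection_outputs(m, n, k, num_classes=2, num_offsets=4):
    """Total detection-layer outputs for an m x n feature map with k default
    boxes per grid cell: each box yields num_classes class scores plus
    num_offsets box offsets, i.e. 6 values per box here."""
    return m * n * k * (num_classes + num_offsets)

print(detection_outputs(38, 38, 4))  # 38 * 38 * 4 * 6 = 34656
```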
The non-maximum suppression method in step 4.4 is an iterative-traversal-elimination process:
(1) sorting the scores of all the frames, and selecting the highest score and the corresponding frame;
(2) traversing the rest of frames, and deleting the frame if the overlapping area of the frame and the current highest frame is larger than a threshold (initially set to be 0.005);
(3) the above process is repeated with the highest score selected from the unprocessed boxes.
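The iterate-traverse-eliminate procedure of steps (1)-(3) can be sketched as plain Python. The IoU-style overlap measure and the 0.5 threshold used in the demo are assumptions for illustration (the text initially sets the threshold to 0.005).

```python
def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, drop every remaining box whose overlap
    with it exceeds thresh, then repeat on the unprocessed boxes."""
    def overlap(a, b):
        # intersection-over-union of corner-form boxes (x1, y1, x2, y2)
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    # step (1): sort all frames by score
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # current highest-scoring frame
        keep.append(best)
        # step (2): delete frames overlapping the current highest too much
        order = [i for i in order if overlap(boxes[best], boxes[i]) <= thresh]
        # step (3): the loop repeats on the remaining (unprocessed) frames
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 heavily and is dropped
```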
The lightweight SSD network needs to improve the detection effect through repeated iterative training: in the training process, the loss function value is reduced through multiple times of result optimization, and the pest detection performance of the SSD convolutional network is continuously improved.
The training of the lightweight SSD network comprises the following steps:
(1) inputting the image to a VGG16 convolutional neural network with a modified structure to obtain the preliminary characteristics of the image;
(2) extracting multilayer characteristic graphs and selecting candidate regions with different sizes and different aspect ratios;
(3) calculating the coordinate position offset and the category score of each candidate area;
(4) determining a final area according to the candidate area and the coordinate position offset, estimating a loss function of the candidate area according to the category score, and accumulating to obtain total loss;
(5) and correcting the weight of each layer through a back propagation process by the total loss.
The invention carries out lightweight processing on an original SSD model to accelerate the convergence speed of the model; the loss function is improved, so that the number of positive and negative samples is balanced, and the training efficiency is improved. Fig. 4 shows the pest detection effect of the barn by the present invention. The default frame mechanism is applied to the feature maps of different layers, the shape of the output frame is effectively discretized, so that the number of pest detection frames is reduced, the detection speed is increased, the granary pests are detected through the feature maps of a plurality of different layers, and the input pest image can be subjected to multi-scale detection.
Claims (7)
1. A pest detection method based on deep learning is characterized by comprising the following steps:
step 1: the method comprises the steps that a photographing device is used for collecting pest pictures, pests in each picture are marked, and a database for detecting pests in a granary is constructed;
step 2: lightweighting the VGG16 network structure of the SSD model;
step 3: adding a weighted target for the classification and regression tasks into the SSD loss function;
step 4: training on the pests of various postures and different sizes in the collected pest pictures, and detecting pests with the trained SSD model.
2. The deep learning-based pest detection method according to claim 1, wherein said step 2 comprises the steps of:
step 2.1: modifying the VGG16 network structure to change the multiple convolution and maximum pooling operations of the SSD backbone network VGG16 model to convolution 1_1, convolution 1_2, maximum pooling 1, convolution 2_2, maximum pooling 2, convolution 3_1, convolution 3_3, maximum pooling 3, convolution 4_1, convolution 4_3, maximum pooling 4, convolution 5_1, convolution 5_3, maximum pooling 5;
step 2.2: changing the dimensionality of the convolution kernel and the pooling kernel of the VGG16 to 3 × 3, 2 × 2, 3 × 3, 2 × 2, 3 × 3;
step 2.3: changing the dimension of the VGG16 feature map to 300 × 300, 150 × 150, 75 × 75, 38 × 38, 19 × 19;
step 2.4: and selecting all feature maps corresponding to all convolution layers from the extracted feature maps to be convolved with convolution kernels of 3 multiplied by 3, and then presetting a default frame in each feature map grid on each layer of output feature map to obtain the offset and confidence of each frame relative to the labeling frame.
3. The deep learning-based pest detection method according to claim 1, wherein the step 3 comprises the steps of:
step 3.1: in the training stage of the SSD convolutional network, matching the default frames with the pests: a default frame that matches a pest is a positive sample, and one that does not is a negative sample; the negative samples are then sorted by their confidence loss values, the loss value being obtained from the target loss function;
step 3.2: adding a weighted target for the classification and regression tasks into the loss function, wherein the target calculation process is as follows: assuming there are k default boxes, the number of positive samples is n, the number of negative samples is m, and k = m + n; a Label for classification is set. When n > 0, the weighted Pos_target of the positive-sample classification is Pos_target = Label/n; when m > 0 and the positive-to-negative sample ratio is set to 1:p, the weighted Neg_target of the negative-sample classification is Neg_target = (1 - Label)/(m × p); the weighted Tol_target of the whole classification task is Tol_target = Pos_target + Neg_target; the weight coefficient added to the regression task is θ, and the weighted Reg_target of the regression task is Reg_target = Pos_target × θ.
4. The deep learning-based pest detection method as claimed in claim 3, wherein the loss function in step 3.1 consists of a classification part and a regression part:

L(z, c, l, g) = (1/N)[L_conf(z, c) + α · L_loc(z, l, g)]

wherein N is the number of default boxes matched to pests; L_loc represents the location loss, realized by the regression positioning error; L_conf represents the confidence loss, realized by a Softmax multi-class function; z is the matching information between default boxes and actual pests; c is the confidence of the predicted target; l is the position information of the predicted object box; g is the ground-truth position information of the pest; α is a weighting parameter balancing the confidence loss and the location loss, set to 1 by default.
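The objective in claim 4 can be sketched numerically. The claim does not name the concrete regression loss; standard SSD uses a Smooth L1 location loss, and that choice is assumed here, with L_conf taken as a precomputed scalar.

```python
# Illustrative sketch of L = (1/N) * (L_conf + alpha * L_loc), assuming a
# Smooth L1 regression loss for L_loc (the conventional SSD choice).
def smooth_l1(pred, target):
    diff = abs(pred - target)
    return 0.5 * diff ** 2 if diff < 1.0 else diff - 0.5

def total_loss(l_conf, pred_boxes, gt_boxes, alpha=1.0):
    """N is the number of default boxes matched to pests."""
    n = len(gt_boxes)
    if n == 0:  # no default box matched any pest
        return 0.0
    l_loc = sum(smooth_l1(p, g)
                for pb, gb in zip(pred_boxes, gt_boxes)
                for p, g in zip(pb, gb))
    return (l_conf + alpha * l_loc) / n

print(total_loss(1.0, [(0.5, 0.0)], [(0.0, 0.0)]))  # (1.0 + 0.125) / 1 = 1.125
```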
5. The deep learning-based pest detection method according to claim 1, wherein the step 4 comprises the steps of:
step 4.1: using the output of VGG16 in the improved SSD method for preliminary feature extraction of the pests, obtaining a basic feature map of the pest image;
step 4.2: carrying out multi-scale feature extraction on the feature map obtained in step 4.1 through successive convolution and max-pooling operations of the multi-scale feature detection network, extracting multi-level feature maps, and selecting candidate pest regions of different sizes and aspect ratios from them;
step 4.3: selecting all feature maps corresponding to all convolution layers from the feature maps extracted in step 4.2, convolving them with 3 × 3 convolution kernels, and then presetting default boxes in each grid cell of every output feature map to obtain the offset and confidence of each box relative to the labeled box; wherein the default boxes are a series of fixed-size boxes on each grid cell of the feature map;
step 4.4: combining the convolution results of the layers obtained in step 4.3, transmitting them to the detection layer, and fusing the detection results of all layers with a non-maximum suppression method to obtain the granary pest image detection result.
6. The deep learning-based pest detection method as claimed in claim 5, wherein the multi-scale feature detection network in step 4.2 comprises convolution 4_1, convolution 4_3, max pooling 4, convolution 5_1, convolution 5_3, max pooling 5, convolution 6_1, convolution 6_2, convolution 7_1, convolution 7_2, convolution 8_1, convolution 8_2, convolution 9_1 and convolution 9_2, wherein the dimensions of the convolution kernels and pooling kernels are 3 × 3, 2 × 2, 3 × 3, 1 × 1, 3 × 3, 1 × 1, 3 × 3; the dimensions of the resulting feature maps are 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1, respectively.
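The feature-map sizes listed in claim 6 can be sanity-checked with the standard convolution/pooling output formula. The stride and padding values below are assumptions chosen to reproduce the stated 38 → 19 → 10 → 5 → 3 → 1 chain; the patent does not specify them.

```python
# Standard output-size formula: floor((n + 2*padding - kernel) / stride) + 1.
def out_size(n, kernel, stride, padding):
    return (n + 2 * padding - kernel) // stride + 1

sizes = [38]
# (kernel, stride, padding) per downsampling stage -- assumed, not from the patent
for k, s, p in [(3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 1, 0), (3, 1, 0)]:
    sizes.append(out_size(sizes[-1], k, s, p))
print(sizes)  # [38, 19, 10, 5, 3, 1]
```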
7. The deep learning-based pest detection method as claimed in claim 5, wherein the non-maximum suppression method in step 4.4 is an iterate-traverse-eliminate process comprising the following steps:
4.4.1: sorting all boxes by score and selecting the box with the highest score;
4.4.2: traversing the remaining boxes and deleting any box whose overlap area with the current highest-scoring box exceeds a threshold;
4.4.3: the above process is repeated with the highest score selected from the unprocessed boxes.
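The iterate-traverse-eliminate steps 4.4.1–4.4.3 can be sketched as follows. This is a minimal illustration, not the patented implementation; boxes are assumed to be (x1, y1, x2, y2) corner coordinates and the 0.5 overlap threshold is an assumed value.

```python
# Intersection-over-union of two corner-format boxes.
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.5):
    # step 4.4.1: sort by score, highest first
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # step 4.4.2: delete boxes overlapping the current best above the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
        # step 4.4.3: loop repeats with the highest-scoring remaining box
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 heavily and is eliminated
```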
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910830378.7A CN110766041B (en) | 2019-09-04 | 2019-09-04 | Deep learning-based pest detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110766041A true CN110766041A (en) | 2020-02-07 |
CN110766041B CN110766041B (en) | 2023-04-07 |
Family
ID=69329231
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910830378.7A Active CN110766041B (en) | 2019-09-04 | 2019-09-04 | Deep learning-based pest detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110766041B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111476238A (en) * | 2020-04-29 | 2020-07-31 | 中国科学院合肥物质科学研究院 | Pest image detection method based on regional scale perception technology |
CN111597941A (en) * | 2020-05-08 | 2020-08-28 | 河海大学 | Target detection method for dam defect image |
CN111832642A (en) * | 2020-07-07 | 2020-10-27 | 杭州电子科技大学 | Image identification method based on VGG16 in insect taxonomy |
CN111881970A (en) * | 2020-07-23 | 2020-11-03 | 国网天津市电力公司 | Intelligent outer broken image identification method based on deep learning |
CN112288022A (en) * | 2020-11-02 | 2021-01-29 | 河南工业大学 | SSD algorithm-based characteristic fusion-based grain insect identification method and identification system |
CN112528726A (en) * | 2020-10-14 | 2021-03-19 | 石河子大学 | Aphis gossypii insect pest monitoring method and system based on spectral imaging and deep learning |
CN112598663A (en) * | 2020-12-30 | 2021-04-02 | 河南工业大学 | Grain pest detection method and device based on visual saliency |
WO2021203505A1 (en) * | 2020-04-09 | 2021-10-14 | 丰疆智能软件科技(南京)有限公司 | Method for constructing pest detection model |
CN113516647A (en) * | 2021-07-27 | 2021-10-19 | 山东浪潮科学研究院有限公司 | Method for detecting disease of micro-fungus crops |
CN113762081A (en) * | 2021-08-09 | 2021-12-07 | 江苏大学 | Granary pest detection method based on YOLOv5s |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109191455A (en) * | 2018-09-18 | 2019-01-11 | 西京学院 | A kind of field crop pest and disease disasters detection method based on SSD convolutional network |
Non-Patent Citations (2)
Title |
---|
Zhou Yuncheng et al.: "Classification and Recognition Method for Main Organs of Tomato Based on Deep Convolutional Neural Network", Transactions of the Chinese Society of Agricultural Engineering * |
Peng Hongxing et al.: "General Improved SSD Model for Recognizing Multi-Class Fruit Harvesting Targets in Natural Environments", Transactions of the Chinese Society of Agricultural Engineering * |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| TR01 | Transfer of patent right | Effective date of registration: 2024-02-20. Address after: 1003, Building A, Zhiyun Industrial Park, No. 13 Huaxing Road, Henglang Community, Dalang Street, Longhua District, Shenzhen City, Guangdong Province, 518000, China. Patentee after: Shenzhen Wanzhida Technology Transfer Center Co., Ltd. Address before: No. 301, Xuefu Road, Zhenjiang, Jiangsu, 212013, China. Patentee before: JIANGSU University |