CN113554627A - Wheat head detection method based on computer vision semi-supervised pseudo label learning - Google Patents
- Publication number
- CN113554627A (application CN202110849609.6A)
- Authority
- CN
- China
- Prior art keywords
- feature vector
- head detection
- wheat head
- list
- training
- Prior art date
- Legal status: Granted (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06T7/0012 — Image analysis; inspection of images; biomedical image inspection
- G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/048 — Neural networks; architecture; activation functions
- G06N3/08 — Neural networks; learning methods
- G06T2207/10004 — Image acquisition modality; still image; photographic image
- G06T2207/10024 — Image acquisition modality; color image
- G06T2207/20081 — Special algorithmic details; training; learning
- G06T2207/20084 — Special algorithmic details; artificial neural networks [ANN]
- G06T2207/30004 — Subject of image; biomedical image processing
Abstract
The invention discloses a wheat head detection method based on computer vision semi-supervised pseudo label learning, comprising: training two different wheat head detection models and applying a semi-supervised pseudo label learning strategy. The method recognizes wheat with high robustness and high accuracy across varied scenes and realizes intelligent wheat head identification, reducing labor consumption and improving identification efficiency.
Description
Technical Field
The invention relates to the technical field of artificial intelligence and intelligent agriculture, in particular to a wheat head detection method based on computer vision semi-supervised pseudo label learning.
Background
Wheat is a grain crop grown worldwide. As agricultural modernization advances in China, using modern artificial intelligence technology to improve wheat yield and assist agricultural production management has become an important open problem. Traditional agriculture relies entirely on manual labor: the work is extremely time-consuming and laborious, cannot be sustained over the whole process at high efficiency and without error, and, because the growth condition of crops cannot be monitored in real time, also limits crop yield to a certain extent. Automatically monitoring the growth of wheat with intelligent algorithms can therefore push traditional agriculture toward a new era of modernized planting and intelligent management. In the field of computer vision, although object detection algorithms have improved greatly after years of research by many scholars, real scenes remain highly challenging: wheat varieties differ across regions, their appearance varies widely, and their growth cycles are inconsistent. Building a detection model that maintains high robustness and high accuracy across different scenes is therefore important.
Disclosure of Invention
The invention aims to provide a wheat head detection method based on computer vision semi-supervised pseudo label learning, aiming at the defects of the prior art. The method has high robustness and high accuracy of wheat identification in various scenes, and can realize intelligent wheat head identification so as to reduce labor consumption and improve identification efficiency.
The technical scheme for realizing the purpose of the invention is as follows:
a wheat head detection method based on computer vision semi-supervised pseudo label learning, the method comprising: training two different wheat head detection models and applying a semi-supervised pseudo label learning strategy, wherein,
1) training the different wheat head detection models: a first wheat head detection model and a second wheat head detection model, which differ and are both trained in a supervised manner. The data set used for training is the global wheat head detection data set, compiled mainly by nine research institutions from seven countries: the University of Tokyo, the French national agricultural research institute, the French food, nutrition and environment research institute, the French technical institute Arvalis, ETH Zurich in Switzerland, the University of Saskatchewan in Canada, the University of Queensland in Australia, Nanjing Agricultural University in China, and Rothamsted Research in the UK. The global wheat head detection data set targets a universal solution for wheat head detection and is used to estimate the number and size of wheat heads. Accurate wheat head detection in field images is very challenging: overlapping wheat and external factors such as wind can blur the pictures, and many unpredictable problems make it difficult to identify a single wheat head. In addition, the appearance of wheat varies greatly with maturity, color, genotype, head orientation, and planting density, and different wheat varieties are planted in all regions of the world. To give the wheat detection model better generalization across different detection environments, 3000 images from France, the UK, Switzerland, and Canada in the global wheat head detection data set are selected as the training data set, and 1000 images of wheat heads of different varieties from other regions, namely Australia, Japan, and China, are selected as the test data set. The first wheat head detection model takes yolov5s as its reference model, and its training process is as follows:
1-1) training samples are selected in batches from the training split of the global wheat head detection data set; each batch reads n pictures, where the value of n can be chosen freely;
1-2) scaling the picture to 640 x 640 pixel size;
1-3) data augmentation is applied to all pictures in the batch, including color space transformation, picture rotation, random translation, flipping, Mosaic, and affine transformation;
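To make concrete why the box labels must be transformed together with the pictures, the following is a minimal pure-Python sketch of one augmentation from the list above, a horizontal flip. The function name and the (cx, cy, w, h) pixel box encoding are illustrative assumptions of this sketch, not specifics given by the text.

```python
def hflip_image_and_boxes(image, boxes, width=640):
    """Horizontally flip an image (nested list of pixel rows) and its boxes.

    Boxes are (cx, cy, w, h) in pixels, matching the center/width/height
    bounding-box encoding described in step 1-4). Under a horizontal flip
    only the x center coordinate changes; width and height are preserved.
    """
    flipped = [row[::-1] for row in image]                    # mirror each row
    new_boxes = [(width - cx, cy, w, h) for (cx, cy, w, h) in boxes]
    return flipped, new_boxes
```

The same pattern extends to rotation and affine transforms, where the full box corners must be transformed and re-enclosed.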
1-4) the pictures after data augmentation are input into the yolov5s network model in batches, which predicts the target classes and position information in the pictures, comprising the object class and the center coordinates, width, and height of each predicted bounding box. A loss is then computed by a loss function between the predictions of the first wheat head detection model and the target labels of the input pictures; the computed loss value returns gradients to the network through the back propagation algorithm, iteratively updating the network parameters, so that the model learns to recognize the target objects and continuously fits the distribution of the real data, with the aim of training the best-performing recognition model. The classification loss function of the first wheat head detection model adopts the Focal loss function, as shown in formula (1):

L_cls = -α_t (1 - p̂_t)^γ · log(p̂_t)    (1)

where L_cls represents the classification loss function, and α and γ are hyper-parameters of the loss function: γ adjusts the loss of easy and hard samples so that the loss function focuses more on hard samples, and α balances the imbalance of positive and negative samples; p̂ is the classification prediction of the first wheat head detection model, a value between 0 and 1, with p̂_t = p̂ and α_t = α for positive samples, and p̂_t = 1 - p̂ and α_t = 1 - α otherwise;
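The Focal loss classification function above can be sketched for a single prediction in plain Python. The function name and the default hyper-parameters (α = 0.25, γ = 2) are choices of this sketch; the patent leaves α and γ unspecified.

```python
import math

def focal_loss(p, positive, alpha=0.25, gamma=2.0):
    """Focal loss for one classification prediction p in (0, 1).

    gamma down-weights easy samples so the loss focuses on hard ones;
    alpha balances positive vs. negative samples.
    """
    p_t = p if positive else 1.0 - p            # probability of the true class
    a_t = alpha if positive else 1.0 - alpha    # class-balancing weight
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With γ = 0 and α = 0.5 this reduces to half the ordinary cross-entropy, which is a quick sanity check on the implementation.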
The classification loss function continuously optimizes the prediction of the object class, while the regression loss function continuously optimizes the prediction of the object's position coordinates, fitting the center coordinates, width, and height of the predicted bounding box. The regression task, however, optimizes only positive samples; negative samples contain no target box and are not iteratively optimized by it. The first wheat head detection model adopts the GIoU loss function as its regression loss function, as shown in formula (2):

L_loc = 1 - IoU_AB + (C - (A ∪ B)) / C    (2)

where L_loc represents the regression loss function, A the area of the predicted bounding box, B the area of the real labeled box of the target, C the area of the smallest rectangle that can enclose A and B, and IoU_AB the area intersection-over-union of the predicted bounding box A and the real labeled box; (C - (A ∪ B)) / C is the area of the smallest enclosing rectangle C minus the union of the areas of the predicted box A and the real labeled box B, divided by C;
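Formula (2) can be sketched as follows, assuming corner-encoded boxes (x1, y1, x2, y2); the function name is illustrative. The enclosing-rectangle penalty term is what keeps the loss informative even when the boxes do not overlap at all.

```python
def giou_loss(box_a, box_b):
    """GIoU regression loss of formula (2) for boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection area of A and B
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Area of the smallest rectangle C enclosing both A and B
    c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return 1.0 - iou + (c - union) / c
```

Identical boxes give a loss of 0; for disjoint boxes the IoU term contributes its maximum of 1 and the C-based term still provides a gradient toward overlap.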
The second wheat head detection model takes EfficientDet as its reference model. Conventional multi-scale feature fusion methods, such as the feature pyramid network and the path aggregation network, simply add features together, yet different features have different resolutions and contribute differently to the fused result. EfficientDet therefore introduces a weighted bidirectional feature pyramid network, which senses the importance of different features through learnable weights and repeatedly performs top-down and bottom-up multi-scale feature fusion. The second wheat head detection model comprises a feature extraction sub-network, a weighted bidirectional feature pyramid sub-network, and a classification regression sub-network; the mobile flip bottleneck convolution submodule and the bidirectional feature pyramid submodule that make up the network are defined as follows:
The input of the mobile flip bottleneck convolution submodule is a feature vector with channel dimension C. The feature vector is first expanded by a 1 × 1 convolution layer, then passes sequentially through a batch normalization layer, a swish activation function, a 5 × 5 depthwise separable convolution layer, another batch normalization layer, and another swish activation function. It is then split into two branches: the first branch consists of a global average pooling layer, a 1 × 1 convolution layer, a swish activation function, a 1 × 1 convolution layer, and a sigmoid activation function; the second branch consists of a dimension-reducing 1 × 1 convolution layer, a batch normalization layer, and a dropout function. Finally, the input vector of the module and the output vector of the second branch are joined by a residual connection to form the final output of the module;
The inputs of the bidirectional feature pyramid submodule are the output vectors of feature layers C3, C4, and C5 of the feature extraction network, together with C6 and C7 obtained by pooling the C5 feature vector twice. In the top-down path: the C7 feature vector is upsampled, added to C6, and passed through a 3 × 3 convolution layer to give intermediate feature vector 1; intermediate feature vector 1 is upsampled, added to C5, and passed through a 3 × 3 convolution layer to give intermediate feature vector 2; intermediate feature vector 2 is upsampled, added to C4, and passed through a 3 × 3 convolution layer to give intermediate feature vector 3; intermediate feature vector 3 is upsampled, added to C3, and passed through a 3 × 3 convolution layer to give the P3 feature vector. In the bottom-up path: P3 is downsampled, added to the intermediate feature vector of the C4 level, and passed through a 3 × 3 convolution layer to give P4; P4 is downsampled, added to the intermediate feature vector of the C5 level, and passed through a 3 × 3 convolution layer to give P5; P5 is downsampled, added to the intermediate feature vector of the C6 level, and passed through a 3 × 3 convolution layer to give P6; P6 is downsampled, added to the C7 feature vector, and passed through a 3 × 3 convolution layer to give P7. The feature vectors P3, P4, P5, P6, and P7 are the output vectors of the bidirectional feature pyramid submodule;
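The fusion order described above can be traced with a small data-flow sketch in which each feature map is reduced to a stand-in scalar, and the weighted add, 3 × 3 convolution, and up/downsampling steps are collapsed into a single placeholder `fuse` function. This illustrates only the top-down/bottom-up wiring, not the real computation.

```python
def bifpn_layer(c3, c4, c5, c6, c7, fuse=lambda a, b: (a + b) / 2):
    """Data-flow sketch of one bidirectional feature pyramid submodule.

    Features are stand-in scalars; `fuse` is a placeholder for the
    resample + weighted add + 3x3 conv of the real module.
    """
    # Top-down path: intermediate features at the C6..C4 levels, then P3.
    m6 = fuse(c7, c6)
    m5 = fuse(m6, c5)
    m4 = fuse(m5, c4)
    p3 = fuse(m4, c3)
    # Bottom-up path: each output fuses the previous output with the
    # intermediate feature of its level (C7 has no intermediate).
    p4 = fuse(p3, m4)
    p5 = fuse(p4, m5)
    p6 = fuse(p5, m6)
    p7 = fuse(p6, c7)
    return p3, p4, p5, p6, p7
```

Because each output is reachable from every input along this wiring, information flows in both directions within a single submodule, and stacking 3 such submodules deepens the exchange.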
The feature extraction sub-network comprises a 3 × 3 convolution layer followed by 16 sequentially connected mobile flip bottleneck convolution submodules; the weighted bidirectional feature pyramid sub-network comprises 3 bidirectional feature pyramid submodules; the classification regression sub-network comprises 2 branches, each containing 2 shared 1 × 1 convolution layers;
The training process of the second wheat head detection model resembles that of the first: pictures are continuously read in batches and the model is trained iteratively to learn to recognize the target objects and fit the distribution of the real data. The difference is that the regression loss function of the second wheat head detection model adopts the Smooth L1 loss function, as shown in formula (3):

Smooth_L1(x) = 0.5 x², if |x| < 1; |x| - 0.5, otherwise    (3)

where Smooth_L1 represents the regression loss function and x = ŷ - y is the difference between the model's bounding box prediction ŷ (center coordinates, width, and height) and the real target box coordinates y;
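Formula (3), applied to one coordinate residual, can be sketched minimally; the function name is illustrative:

```python
def smooth_l1(x):
    """Smooth L1 loss of formula (3) for one residual x = prediction - target.

    Quadratic near zero (stable gradients for small errors),
    linear for |x| >= 1 (robust to outlier boxes).
    """
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5
```

The two pieces meet at |x| = 1 with matching value and slope, which is why the loss is "smooth".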
2) the semi-supervised pseudo label learning strategy: first, the first and second wheat head detection models predict on unlabeled data, yielding multiple model prediction boxes; next, a weighted box fusion method merges the higher-confidence prediction boxes into pseudo label data; then a new wheat head detection model is retrained on the original training data set together with the pseudo label data set, and these steps are iterated until the test performance of the wheat head detection model no longer improves. The stopping condition of model pseudo label training is set so that training stops once the average precision of model prediction improves by less than 0.2%. The weighted box fusion method comprises the following steps:
1-2) first create a list B for storing the prediction bounding boxes of the 2 models trained in the supervised manner, and sort the prediction boxes in descending order of their predicted confidence;
2-2) create a list L and a list F. List L is multi-dimensional: each position stores 1 or more prediction boxes, called a cluster. All prediction boxes in each cluster are weighted and fused, and the bounding box obtained by fusion is stored at the corresponding position of list F; that is, list F stores the fused boxes of the 2 supervised models;
3-2) traverse the prediction bounding boxes in list B and perform "clustering" against the fused boxes stored in list F. The clustering rule computes the IoU between the current prediction box and each fused box; if the IoU of the two boxes exceeds a specified threshold, clustering is considered successful. This technical scheme sets the IoU threshold to 0.55;
4-2) in step 3-2), if clustering fails, that is, no fused box exceeds the specified IoU threshold, a new cluster is created for the current prediction box and appended to the ends of list L and list F. If clustering succeeds, that is, a fused box exceeding the specified IoU threshold is found, the current prediction box is added to list L at the index of the fused box it matched in list F, and the fused box at the corresponding position in list F is then updated from all prediction boxes in that cluster;
5-2) in step 4-2), after the current prediction box is added to its cluster, the confidence and coordinates of the fused box in list F are updated using all the bounding boxes in the corresponding cluster of L. Assuming the cluster holds T bounding boxes, the confidence and coordinates of the fused box are computed as shown in formula (4), formula (5), and formula (6):

C = (1/T) · Σ_{i=1..T} C_i    (4)

X = Σ_{i=1..T} C_i · X_i / Σ_{i=1..T} C_i    (5)

Y = Σ_{i=1..T} C_i · Y_i / Σ_{i=1..T} C_i    (6)

where C represents the confidence of a model prediction box and X, Y represent the coordinates of a prediction bounding box. The coordinates of the fused box are obtained by multiplying each prediction box's confidence by its coordinate values, accumulating, and dividing by the sum of confidences, so that bounding boxes with larger confidence are given larger weight;
6-2) after traversing list B, the confidence of each fused box in list F is readjusted, because if the number of boxes in a cluster is too small, only a few models predicted that box, and its confidence must therefore be reduced. The adjustment is shown in formula (7):

C = C · min(T, N) / N    (7)

where T is the number of boxes in the cluster and N is the number of models (here N = 2);
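Steps 1-2) through 6-2) can be sketched end-to-end as follows. The (confidence, x1, y1, x2, y2) box encoding and the function names are assumptions of this sketch, and the min(T, N)/N confidence rescale follows the standard weighted boxes fusion formulation, since formula (7) is not fully reproduced in the text.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def weighted_box_fusion(predictions, n_models=2, iou_thr=0.55):
    """Fuse boxes pooled from several models, following steps 1-2) to 6-2).

    predictions: (confidence, x1, y1, x2, y2) tuples from all models.
    Returns the fused (confidence, x1, y1, x2, y2) boxes.
    """
    # Step 1-2): list B, sorted by descending confidence.
    b = sorted(predictions, key=lambda p: p[0], reverse=True)
    clusters, fused = [], []                       # lists L and F
    for p in b:
        # Step 3-2): match the box against existing fused boxes by IoU.
        match = next((i for i, f in enumerate(fused)
                      if box_iou(p[1:], f[1:]) > iou_thr), None)
        if match is None:
            # Step 4-2): no match above the threshold -> open a new cluster.
            clusters.append([p])
            fused.append(p)
        else:
            # Step 5-2): add to the cluster and refresh the fused box
            # with formulas (4), (5), (6).
            clusters[match].append(p)
            cl = clusters[match]
            s = sum(q[0] for q in cl)
            coords = tuple(sum(q[0] * q[k] for q in cl) / s
                           for k in (1, 2, 3, 4))
            fused[match] = (s / len(cl),) + coords
    # Step 6-2): damp fused boxes supported by few predictions.
    return [(f[0] * min(len(cl), n_models) / n_models,) + f[1:]
            for cl, f in zip(clusters, fused)]
```

Two overlapping detections from the two models collapse into one box whose coordinates lean toward the more confident prediction, while a box seen by only one model keeps its position but loses half its confidence.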
Through this semi-supervised pseudo label learning scheme, the trained model gains the notable advantages of adapting to wheat head detection across regions and varieties and of generalizing well.
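The overall iterate-until-plateau loop of the strategy can be sketched as follows, with `fit`, `predict`, `fuse`, and `evaluate` as placeholders for detector training, inference, weighted box fusion, and average-precision evaluation. Treating the 0.2% threshold as an absolute gain is an interpretation of this sketch, not something the text pins down, and the two detection models are collapsed into one for brevity.

```python
def pseudo_label_training(train, unlabeled, fit, predict, fuse, evaluate,
                          min_gain=0.002):
    """Iterate the semi-supervised pseudo label strategy of part 2).

    min_gain = 0.002 encodes the 0.2% stopping threshold: training
    stops once average precision improves by less than that.
    """
    model = fit(train)
    best = evaluate(model)
    while True:
        # Predict on unlabeled data and fuse the boxes into pseudo labels.
        pseudo = [(x, fuse(predict(model, x))) for x in unlabeled]
        # Retrain a new model on the original + pseudo-labeled data.
        new_model = fit(train + pseudo)
        score = evaluate(new_model)
        if score - best < min_gain:
            return model, best    # improvement below the threshold: stop
        model, best = new_model, score
```

Because each round regenerates the pseudo labels with the latest model, label quality and detector quality improve together until the evaluation plateaus.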
Under a real scene, the wheat head identification is easily influenced by various factors such as illumination change, shielding of crops and other plants, complex background, different varieties and the like, so that the wheat head cannot be identified with high precision and high robustness by the existing wheat head detection method.
The technical scheme utilizes a computer vision semi-supervised pseudo label technology, can be self-adaptive to the wheat head detection of different regions and varieties, is more adaptive to actual detection scenes in real life, relieves the challenges of small target detection, shielding, complex background, light change and detection of different wheat varieties, obviously improves the robustness and the identification precision of the wheat head detector, and is beneficial to promoting the progress of agricultural modernization.
Drawings
FIG. 1 is a schematic diagram of an EfficientNet network in an embodiment;
FIG. 2 is a diagram illustrating a structure of a convolution sub-module of a moving flip bottleneck in an embodiment;
FIG. 3 is a schematic diagram of a bi-directional feature pyramid sub-module in an embodiment;
FIG. 4 is a schematic flow chart of an exemplary method.
Detailed Description
The invention will be further elucidated with reference to the drawings and examples, without however being limited thereto.
Example:
referring to fig. 4, a wheat head detection method based on computer vision semi-supervised pseudo label learning, the method comprising: training different wheat head detection models and semi-supervised pseudo label learning strategies, wherein,
1) the wheat head detection models for training different types are as follows: the method is characterized in that a first wheat head detection model and a second wheat head detection model which are different and based on a supervised training mode are trained, a data set used for training the wheat head detection model is a global wheat head detection data set which is a data set mainly completed by nine research institutes of seven countries, including Tokyo university, French national agricultural science institute, French nutrition and environment institute, France technical research institution Arvalis, Zuis Suzurich Federal Industrial school, Kasan Kachester Technology university, Australian Kunland Lange university, Nanjing agriculture university in China and Hibiscus British institute, the global wheat head detection data set is used for a universal solution of wheat head detection and is used for estimating the number and the size of wheat heads, accurate wheat head detection in a field image is very challenging, overlapping of wheat and external factors such as wind blowing can cause blurred pictures, and many unpredictable problems can make it difficult to identify a single wheat head, besides, the appearance of wheat has great difference due to maturity, color, genotype, head direction and planting density, and meanwhile, different wheat varieties are planted in all regions of the world, in order to make the wheat detection model have better generalization performance under different detection environments, 3000 images of france, british and switzerland canada in european regions in the global wheat head detection data set are selected as training data sets, 1000 images of different varieties of wheat heads in different regions from australia, japan and china in the global wheat head detection data set are selected as test data sets, the first wheat head detection model takes yolov5s as a reference model, and the training process is as follows:
1-1) carrying out batch selection on training samples from a training set part of a global wheat head detection data set, randomly reading n pictures in each batch, randomly selecting the value of n, and selecting n-32 pictures to input a model for training in the example;
1-2) scaling the picture to 640 x 640 pixel size;
1-3) performing data augmentation on all the batch pictures, including color space transformation, picture rotation, random translation, turnover, Mosaic and affine transformation;
1-4) inputting the pictures subjected to data amplification into a yolv 5s network model in batches, predicting to obtain target types and position information in the pictures, wherein the target types and position information comprise target object types and coordinates of center points of predicted boundary frames, width and height, then performing loss calculation on a predicted value of a first wheat head detection model and a target label of the input picture through a loss function, returning gradients to the network through a back propagation algorithm by the calculated loss value, performing iterative update of network parameters, and enabling the model to learn and identify the target object iteratively through a learning mode to continuously fit the distribution of real data, aiming at training to obtain an identification model with the best performance, wherein the classification loss function of the first wheat head detection model adopts a Focal loss function, and is shown in a formula (1):
wherein L isclsRepresenting a classification loss function, alpha and gamma are hyper-parameters of the loss function, gamma is used for adjusting the loss of easy samples and difficult samples, so that the loss function can focus more on the difficult samples, alpha is used for balancing the nonuniformity of positive and negative samples,the classification prediction value of the first wheat head detection model is between 0 and 1,
the classification loss function is used for continuously optimizing the prediction of the target object category, the regression loss function is used for continuously optimizing the prediction of the target object position coordinate, the coordinate of the central point of the predicted boundary box is continuously fitted with the width and the height of the predicted boundary box, however, the regression task is only used for optimizing the positive sample and not used for performing iterative optimization on the negative sample, the target box does not exist in the negative sample, and the first wheat head detection model adopts the GIoU loss function as the regression loss function, as shown in a formula (2):
wherein L islocRepresents the regression loss function, A represents the area of the predicted bounding box, B represents the area of the true labeled box of the target, C represents the area of the smallest rectangle that can enclose A and B, IoUABRepresenting the intersection and comparison of the areas of the predicted boundary frame A and the target real labeling frame, wherein C/(Au B) represents the difference value obtained by subtracting the union of the areas of the predicted boundary frame A and the target real labeling frame from the area of the minimum surrounding rectangle C;
the second wheat head detection model uses EfficientDet as a reference model, as shown in fig. 1, for the conventional multi-scale feature fusion method, such as a feature pyramid network, a path aggregation network and the like, which simply adds features, however, the resolutions of different features are different, and the contribution degrees of the different features to the fused features are also different, so that the EfficientDet introduces a weighted bidirectional feature pyramid network, senses the importance of different features by combining learnable weights, and performs multi-scale feature fusion from top to bottom and from bottom to top for multiple times, the second wheat head detection model includes a feature extraction sub-network, a weighted bidirectional feature pyramid sub-network and a classification regression sub-network, and a mobile flip bottleneck convolution sub-module and a bidirectional feature pyramid sub-module which form a network are respectively shown in fig. 2 and fig. 3 and are defined as follows:
the input of the mobile turnover bottleneck convolution submodule is a characteristic vector with a channel dimension of C, firstly, the characteristic vector is subjected to dimension increasing through a 1 x 1 convolution layer, and sequentially passes through a batch normalization layer, a swish activation function, a 5 x 5 depth separable convolution layer, a batch normalization layer and a swish activation function, then the characteristic vector is divided into two branches, the first branch is provided with a global average pooling layer, a 1 x 1 convolution layer, a swish activation function, a 1 x 1 convolution layer and a sigmoid activation function, the second branch is provided with a 1 x 1 convolution layer for dimension reduction, a batch normalization layer and a dropout function, and finally, the input vector of the module and the output vector of the second branch are subjected to residual error connection to be used as the final output of the module;
the input of the bidirectional feature pyramid submodule is an output vector of a C3, C4 and C5 layer feature layer of a feature extraction network and a C6 and C7 feature vector of a C5 feature vector after twice pooling, firstly, the C7 feature vector is up-sampled, the C6 feature vector is added and passes through a 3 x 3 convolutional layer to obtain an intermediate feature vector 1, the intermediate feature vector 1 is up-sampled, the C5 feature vector is added and passes through a 3 x 3 convolutional layer to obtain an intermediate feature vector 2, the intermediate feature vector 2 is up-sampled, the C4 feature vector is added and passes through a 3 x 3 convolutional layer to obtain an intermediate feature vector 3, the intermediate feature vector 3 is up-sampled, the C3 feature vector is added and passes through a 3 x 3 convolutional layer to obtain a P3 feature vector, the P3 feature vector is down-sampled, the intermediate feature vector of a C4 layer is added and passes through a 3 x 3 convolutional layer to obtain a P4 feature vector, the P4 feature vector is downsampled, added with the middle feature vector of the C5 layer and processed by a 3 x 3 convolutional layer to obtain a P5 feature vector, the P5 feature vector is downsampled, added with the middle feature vector of the C6 layer and processed by a 3 x 3 convolutional layer to obtain a P6 feature vector, the P6 feature vector is downsampled, added with the C7 feature vector and processed by a 3 x 3 convolutional layer to obtain a P7 feature vector, wherein the P3, P4, P5, P6 and P7 feature vectors are output vectors of the bidirectional feature pyramid sub-module;
the feature extraction sub-network comprises a 3 × 3 convolutional layer and 16 sequentially connected mobile inverted bottleneck convolution (MBConv) modules; the weighted bidirectional feature pyramid sub-network comprises 3 bidirectional feature pyramid submodules; the classification regression sub-network comprises 2 branches, each branch containing 2 shared 1 × 1 convolutional layers;
the training process of the second wheat head detection model is similar to that of the first wheat head detection model: pictures are continuously read in batches and the model is trained iteratively, learning to recognize the target object and progressively fitting the distribution of the real data; the difference is that the regression loss function of the second wheat head detection model adopts the Smooth L1 loss function, as shown in formula (3):

Smooth_L1(x) = 0.5·x², if |x| < 1;  |x| − 0.5, otherwise      (3)
where Smooth_L1 represents the regression loss function, ŷ represents the predicted values of the bounding box, namely the center-point coordinates, the width and the height, y represents the annotated coordinates of the real target box, and x = ŷ − y represents the difference between the model's predicted value and the true value;
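A minimal NumPy sketch of the Smooth L1 regression loss of formula (3), applied element-wise to the difference x = ŷ − y:

```python
import numpy as np

def smooth_l1(y_pred, y_true):
    # x is the element-wise difference between the predicted box values
    # (center x, center y, width, height) and the annotated values
    x = np.abs(y_pred - y_true)
    # quadratic for small errors, linear for large ones
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)
```

The quadratic region keeps gradients small near the target while the linear region limits the influence of outlier boxes, which is why it is a common regression loss for detectors.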
2) the semi-supervised pseudo-label learning strategy is as follows: first, the first wheat head detection model and the second wheat head detection model are used to predict on unlabeled data, yielding a number of model prediction boxes; next, a weighted box fusion method is used to fuse the prediction boxes with higher confidence into pseudo-label data; then a new wheat head detection model is retrained on the original training data set together with the pseudo-label data set, and this procedure is iterated until the test performance of the wheat head detection model no longer improves; in this embodiment, the stopping condition for pseudo-label training is that training stops once the improvement in the model's average precision falls below 0.2%; the weighted box fusion method comprises the following steps:
1-2) first, a list B is created to store the prediction bounding boxes of the 2 models trained in the supervised mode, and the prediction boxes are then sorted in descending order of the models' predicted confidence;
2-2) a list L and a list F are created, where the list L is a multi-dimensional list in which each position stores 1 or more prediction boxes, referred to as clusters; all the prediction boxes in each cluster are weighted and fused, and the resulting bounding box is stored in the list F, i.e. the list F stores the fused boxes of the 2 supervised models;
3-2) the prediction bounding boxes in the list B are traversed and "clustered" against the list F that stores the fused boxes; the "clustering" rule is to compute the IoU between the current prediction box and a fused box, and if the IoU of the two boxes is greater than a specified threshold, the "clustering" is considered successful; in this embodiment the IoU threshold is set to 0.55;
4-2) in step 3-2), if the "clustering" is unsuccessful, i.e. no fused box with an IoU above the specified threshold is found, a new cluster is created for the current prediction box and appended to the ends of the list L and the list F; if the "clustering" is successful, i.e. a fused box with an IoU above the specified threshold is found, the current prediction box is added to the list L at the index position of the matched fused box in the list F, and after this addition the fused box at the corresponding position in the list F is updated according to all the prediction boxes in that cluster;
5-2) in step 4-2), after the current prediction box has been added to the corresponding cluster, the confidence and coordinates of the fused box in the list F are updated using all the bounding boxes in each cluster of L; assuming there are T bounding boxes, the confidence and coordinates of the fused box in the list F are calculated as shown in formula (4), formula (5) and formula (6):

C = (1/T) · Σ_{i=1}^{T} C_i      (4)
X = Σ_{i=1}^{T} C_i·X_i / Σ_{i=1}^{T} C_i      (5)
Y = Σ_{i=1}^{T} C_i·Y_i / Σ_{i=1}^{T} C_i      (6)
where C represents the confidence of a model prediction box and X, Y represent the coordinates of a prediction bounding box; the coordinates of the fused box are obtained by multiplying each prediction box's confidence by its coordinate values, accumulating the products, and dividing by the sum of the confidences, so that bounding boxes with larger confidence are assigned larger weights;
6-2) after the list B has been traversed, the confidence of each fused box in the list F is readjusted, because if the number of boxes in a cluster is too small, only a few models predicted that box, and its confidence should therefore be reduced; the adjustment is shown in formula (7), where N is the number of models (N = 2 in this embodiment):

C = C · min(T, N) / N      (7)
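For illustration, steps 1-2) to 6-2) can be sketched as follows; this is a non-authoritative sketch in which function and variable names are invented, boxes are assumed to be in [x1, y1, x2, y2] form, and the final confidence rescaling uses the standard weighted-boxes-fusion form with N models:

```python
import numpy as np

def iou(a, b):
    # intersection over union of two boxes given as [x1, y1, x2, y2]
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def weighted_box_fusion(boxes, scores, n_models=2, iou_thr=0.55):
    order = np.argsort(scores)[::-1]   # step 1-2: sort by descending confidence
    L, F = [], []                      # step 2-2: clusters and fused boxes
    for i in order:
        box, c = boxes[i], scores[i]
        # step 3-2: try to match an existing fused box by IoU
        match = next((j for j, f in enumerate(F)
                      if iou(box, f[1]) > iou_thr), None)
        if match is None:              # step 4-2: no match, start a new cluster
            L.append([(c, box)])
            F.append((c, np.array(box, dtype=float)))
            match = len(F) - 1
        else:
            L[match].append((c, box))
        # step 5-2: re-fuse the cluster with confidence-weighted coordinates
        cs = np.array([m[0] for m in L[match]])
        bs = np.array([m[1] for m in L[match]], dtype=float)
        fused = (cs[:, None] * bs).sum(axis=0) / cs.sum()
        F[match] = (cs.mean(), fused)
    # step 6-2: rescale confidence by how many boxes landed in each cluster
    return [(c * min(len(cl), n_models) / n_models, f)
            for (c, f), cl in zip(F, L)]
```

A box predicted by both models keeps its averaged confidence, while a box found by only one of the two models has its confidence halved, which is exactly the "few models predicted it" penalty of step 6-2).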
the model trained in this embodiment with the semi-supervised pseudo-label learning approach has the notable advantages of adapting to wheat head detection across different regions and varieties and of good generalization.
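The overall iteration of the strategy in step 2) can be outlined as a driver loop; all callables here (training, evaluation, prediction-plus-fusion) are hypothetical placeholders, and only the 0.2% average-precision stopping condition is taken from the text:

```python
def pseudo_label_training(labeled_set, unlabeled_set,
                          predict_and_fuse, train_fn, evaluate_fn,
                          min_gain=0.002):
    # Hypothetical driver for semi-supervised pseudo-label learning:
    # predict on unlabeled data, fuse boxes into pseudo labels, retrain
    # on original + pseudo-labeled data, stop when the average-precision
    # gain drops below min_gain (0.2% in this embodiment).
    model = train_fn(labeled_set)
    best_ap = evaluate_fn(model)
    while True:
        pseudo = [(x, predict_and_fuse(model, x)) for x in unlabeled_set]
        model = train_fn(labeled_set + pseudo)
        ap = evaluate_fn(model)
        if ap - best_ap < min_gain:   # improvement below 0.2%: stop
            return model
        best_ap = ap
```

In practice `predict_and_fuse` would run both supervised detectors and apply the weighted box fusion of steps 1-2) to 6-2) to produce the pseudo labels.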
Claims (1)
1. A wheat head detection method based on computer vision semi-supervised pseudo-label learning, characterized by comprising: training different types of wheat head detection models, and a semi-supervised pseudo-label learning strategy, wherein,
1) training different types of wheat head detection models is as follows: a first wheat head detection model and a second wheat head detection model of different types are trained in a supervised training mode; the data set used for training the wheat head detection models is the global wheat head detection data set, which is used for estimating the number and size of wheat heads; 3000 images from France, the United Kingdom and Switzerland in Europe and from Canada in North America are selected from the global wheat head detection data set as the training data set, and 1000 wheat head images of different varieties from Australia, Japan and China are selected from the global wheat head detection data set as the test data set; the first wheat head detection model takes yolov5s as its reference model, and its training process is as follows:
1-1) training samples are selected in batches from the training set portion of the global wheat head detection data set, with n pictures read at random in each batch, the value of n itself being chosen at random;
1-2) scaling the picture to 640 x 640 pixel size;
1-3) data augmentation is performed on all pictures in the batch, including color space transformation, picture rotation, random translation, flipping, Mosaic and affine transformation;
1-4) the pictures after data augmentation are input in batches into the yolov5s network model, which predicts the target categories and position information in the pictures, including the target object category and the center-point coordinates, width and height of the predicted bounding boxes; the predicted values of the first wheat head detection model and the target labels of the input pictures are then passed to a loss function, the gradient of the computed loss value is propagated back into the network by the back-propagation algorithm, and the network parameters are updated iteratively; the classification loss function of the first wheat head detection model adopts the Focal loss function, as shown in formula (1):

L_cls = −α·y·(1 − ŷ)^γ·log(ŷ) − (1 − α)·(1 − y)·ŷ^γ·log(1 − ŷ)      (1)
where L_cls represents the classification loss function, α and γ are hyper-parameters of the loss function, and ŷ is the classification prediction of the first wheat head detection model, taking a value between 0 and 1,
the first wheat head detection model uses the GIoU loss function as the regression loss function, as shown in formula (2):

L_loc = 1 − IoU_AB + (C − (A ∪ B)) / C      (2)
where L_loc represents the regression loss function, A represents the area of the predicted bounding box, B represents the area of the real annotated box of the target, C represents the area of the smallest rectangle that can enclose A and B, IoU_AB represents the intersection over union of the areas of the predicted bounding box A and the real annotated target box, and (C − (A ∪ B)) / C represents the difference obtained by subtracting the union of the areas of the predicted bounding box A and the real annotated target box from the area of the minimum enclosing rectangle C, divided by C;
the second wheat head detection model takes EfficientDet as a reference model, comprises a feature extraction sub-network, a weighted bidirectional feature pyramid sub-network and a classification regression sub-network, and is defined as follows:
the input of the mobile inverted bottleneck convolution (MBConv) submodule is a feature vector with channel dimension C; the feature vector first has its dimension increased by a 1 x 1 convolutional layer and then passes in sequence through a batch normalization layer, a swish activation function, a 5 x 5 depthwise separable convolutional layer, a batch normalization layer and a swish activation function; it is then split into two branches: the first branch consists of a global average pooling layer, a 1 x 1 convolutional layer, a swish activation function, a 1 x 1 convolutional layer and a sigmoid activation function, while the second branch consists of a dimension-reducing 1 x 1 convolutional layer, a batch normalization layer and a dropout function; finally, the input vector of the module and the output vector of the second branch are combined through a residual connection to form the final output of the module;
the inputs of the bidirectional feature pyramid submodule are the output vectors of the C3, C4 and C5 feature layers of the feature extraction network, together with the C6 and C7 feature vectors obtained by pooling the C5 feature vector twice in succession; first, the C7 feature vector is up-sampled, added to the C6 feature vector and passed through a 3 x 3 convolutional layer to obtain intermediate feature vector 1; intermediate feature vector 1 is up-sampled, added to the C5 feature vector and passed through a 3 x 3 convolutional layer to obtain intermediate feature vector 2; intermediate feature vector 2 is up-sampled, added to the C4 feature vector and passed through a 3 x 3 convolutional layer to obtain intermediate feature vector 3; intermediate feature vector 3 is up-sampled, added to the C3 feature vector and passed through a 3 x 3 convolutional layer to obtain the P3 feature vector; the P3 feature vector is then down-sampled, added to the intermediate feature vector of the C4 layer and passed through a 3 x 3 convolutional layer to obtain the P4 feature vector; the P4 feature vector is down-sampled, added to the intermediate feature vector of the C5 layer and passed through a 3 x 3 convolutional layer to obtain the P5 feature vector; the P5 feature vector is down-sampled, added to the intermediate feature vector of the C6 layer and passed through a 3 x 3 convolutional layer to obtain the P6 feature vector; and the P6 feature vector is down-sampled, added to the C7 feature vector and passed through a 3 x 3 convolutional layer to obtain the P7 feature vector; the P3, P4, P5, P6 and P7 feature vectors are the output vectors of the bidirectional feature pyramid submodule;
the feature extraction sub-network comprises a 3 × 3 convolutional layer and 16 sequentially connected mobile inverted bottleneck convolution (MBConv) modules; the weighted bidirectional feature pyramid sub-network comprises 3 bidirectional feature pyramid submodules; the classification regression sub-network comprises 2 branches, each branch containing 2 shared 1 × 1 convolutional layers;
the training process of the second wheat head detection model is similar to that of the first wheat head detection model, except that the regression loss function of the second wheat head detection model adopts the Smooth L1 loss function, as shown in formula (3):

Smooth_L1(x) = 0.5·x², if |x| < 1;  |x| − 0.5, otherwise      (3)
where Smooth_L1 represents the regression loss function, ŷ represents the predicted values of the bounding box, namely the center-point coordinates, the width and the height, y represents the annotated coordinates of the real target box, and x = ŷ − y represents the difference between the model's predicted value and the true value;
2) the semi-supervised pseudo-label learning strategy is as follows: first, the first wheat head detection model and the second wheat head detection model are used to predict on unlabeled data, yielding a number of model prediction boxes; next, a weighted box fusion method is used to fuse the prediction boxes with higher confidence into pseudo-label data; then a new wheat head detection model is retrained on the original training data set together with the pseudo-label data set, and this procedure is iterated until the test performance of the wheat head detection model no longer improves, the stopping condition for pseudo-label training being that training stops once the improvement in the model's average precision falls below 0.2%; the weighted box fusion method comprises the following steps:
1-2) first, a list B is created to store the prediction bounding boxes of the 2 models trained in the supervised mode, and the prediction boxes are then sorted in descending order of the models' predicted confidence;
2-2) a list L and a list F are created, where the list L is a multi-dimensional list in which each position stores 1 or more prediction boxes, referred to as clusters; all the prediction boxes in each cluster are weighted and fused, and the resulting bounding box is stored in the list F, i.e. the list F stores the fused boxes of the 2 supervised models;
3-2) the prediction bounding boxes in the list B are traversed and "clustered" against the list F that stores the fused boxes; the "clustering" rule is to compute the IoU between the current prediction box and a fused box, and if the IoU of the two boxes is greater than a specified threshold, the "clustering" is considered successful;
4-2) in step 3-2), if the "clustering" is unsuccessful, i.e. no fused box with an IoU above the specified threshold is found, a new cluster is created for the current prediction box and appended to the ends of the list L and the list F; if the "clustering" is successful, i.e. a fused box with an IoU above the specified threshold is found, the current prediction box is added to the list L at the index position of the matched fused box in the list F, and after this addition the fused box at the corresponding position in the list F is updated according to all the prediction boxes in that cluster;
5-2) in step 4-2), after the current prediction box has been added to the corresponding cluster, the confidence and coordinates of the fused box in the list F are updated using all the bounding boxes in each cluster of L; assuming there are T bounding boxes, the confidence and coordinates of the fused box in the list F are calculated as shown in formula (4), formula (5) and formula (6):

C = (1/T) · Σ_{i=1}^{T} C_i      (4)
X = Σ_{i=1}^{T} C_i·X_i / Σ_{i=1}^{T} C_i      (5)
Y = Σ_{i=1}^{T} C_i·Y_i / Σ_{i=1}^{T} C_i      (6)
where C represents the confidence of a model prediction box and X, Y represent the coordinates of a prediction bounding box; the coordinates of the fused box are obtained by multiplying each prediction box's confidence by its coordinate values, accumulating the products, and dividing by the sum of the confidences;
6-2) after the list B has been traversed, the confidence of each fused box in the list F is readjusted according to formula (7), where N is the number of models:

C = C · min(T, N) / N      (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110849609.6A CN113554627B (en) | 2021-07-27 | 2021-07-27 | Wheat head detection method based on computer vision semi-supervised pseudo label learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110849609.6A CN113554627B (en) | 2021-07-27 | 2021-07-27 | Wheat head detection method based on computer vision semi-supervised pseudo label learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113554627A true CN113554627A (en) | 2021-10-26 |
CN113554627B CN113554627B (en) | 2022-04-29 |
Family
ID=78132902
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110849609.6A Active CN113554627B (en) | 2021-07-27 | 2021-07-27 | Wheat head detection method based on computer vision semi-supervised pseudo label learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113554627B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115082757A (en) * | 2022-07-13 | 2022-09-20 | 北京百度网讯科技有限公司 | Pseudo label generation method, target detection model training method and device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451616A (en) * | 2017-08-01 | 2017-12-08 | 西安电子科技大学 | Multi-spectral remote sensing image terrain classification method based on the semi-supervised transfer learning of depth |
CN107644235A (en) * | 2017-10-24 | 2018-01-30 | 广西师范大学 | Image automatic annotation method based on semi-supervised learning |
CN112149733A (en) * | 2020-09-23 | 2020-12-29 | 北京金山云网络技术有限公司 | Model training method, model training device, quality determining method, quality determining device, electronic equipment and storage medium |
CN112232416A (en) * | 2020-10-16 | 2021-01-15 | 浙江大学 | Semi-supervised learning method based on pseudo label weighting |
CN112488006A (en) * | 2020-12-05 | 2021-03-12 | 东南大学 | Target detection algorithm based on wheat image |
CN113128476A (en) * | 2021-05-17 | 2021-07-16 | 广西师范大学 | Low-power consumption real-time helmet detection method based on computer vision target detection |
CN113158865A (en) * | 2021-04-14 | 2021-07-23 | 杭州电子科技大学 | Wheat ear detection method based on EfficientDet |
Non-Patent Citations (2)
Title |
---|
SHAN Chun et al.: "Semi-supervised single-example deep person re-identification method", 《计算机***应用》 * |
MA Lei et al.: "Semi-supervised regression based on support vector machine co-training", Computer Engineering and Applications (《计算机工程与应用》) * |
Also Published As
Publication number | Publication date |
---|---|
CN113554627B (en) | 2022-04-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | A high-precision detection method of hydroponic lettuce seedlings status based on improved Faster RCNN | |
Fu et al. | Fast and accurate detection of kiwifruit in orchard using improved YOLOv3-tiny model | |
Xiong et al. | Visual detection of green mangoes by an unmanned aerial vehicle in orchards based on a deep learning method | |
Chen et al. | Detecting citrus in orchard environment by using improved YOLOv4 | |
CN113076871B (en) | Fish shoal automatic detection method based on target shielding compensation | |
Ruiz-Ruiz et al. | Testing different color spaces based on hue for the environmentally adaptive segmentation algorithm (EASA) | |
Jabir et al. | Accuracy and efficiency comparison of object detection open-source models. | |
CN111340141A (en) | Crop seedling and weed detection method and system based on deep learning | |
CN112364931B (en) | Few-sample target detection method and network system based on meta-feature and weight adjustment | |
CN110222215B (en) | Crop pest detection method based on F-SSD-IV3 | |
Rong et al. | A peduncle detection method of tomato for autonomous harvesting | |
CN109325484A (en) | Flowers image classification method based on background priori conspicuousness | |
CN115393687A (en) | RGB image semi-supervised target detection method based on double pseudo-label optimization learning | |
Lv et al. | A visual identification method for the apple growth forms in the orchard | |
CN111723764A (en) | Improved fast RCNN hydroponic vegetable seedling state detection method | |
CN113554627B (en) | Wheat head detection method based on computer vision semi-supervised pseudo label learning | |
Gao et al. | Recognition and Detection of Greenhouse Tomatoes in Complex Environment. | |
CN114549970B (en) | Night small target fruit detection method and system integrating global fine granularity information | |
Ubbens et al. | Autocount: Unsupervised segmentation and counting of organs in field images | |
Li et al. | MTA-YOLACT: Multitask-aware network on fruit bunch identification for cherry tomato robotic harvesting | |
Li et al. | A novel approach for the 3D localization of branch picking points based on deep learning applied to longan harvesting UAVs | |
Huang et al. | A survey of deep learning-based object detection methods in crop counting | |
Zhang et al. | Multi-class detection of cherry tomatoes using improved Yolov4-tiny model | |
Yu et al. | A-pruning: a lightweight pineapple flower counting network based on filter pruning | |
Hu et al. | Automatic detection of pecan fruits based on Faster RCNN with FPN in orchard |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||