CN113554627A - Wheat head detection method based on computer vision semi-supervised pseudo label learning - Google Patents
- Publication number
- CN113554627A (application CN202110849609.6A)
- Authority
- CN
- China
- Prior art keywords
- feature vector
- head detection
- wheat head
- list
- training
- Prior art date
- Legal status: Granted (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06T7/0012 — Image analysis; inspection of images; biomedical image inspection
- G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/048 — Neural networks; architecture; activation functions
- G06N3/08 — Neural networks; learning methods
- G06T2207/10004 — Image acquisition modality; still image; photographic image
- G06T2207/10024 — Image acquisition modality; color image
- G06T2207/20081 — Special algorithmic details; training; learning
- G06T2207/20084 — Special algorithmic details; artificial neural networks [ANN]
- G06T2207/30004 — Subject of image; biomedical image processing
Abstract
The invention discloses a wheat head detection method based on computer vision semi-supervised pseudo label learning, comprising: training two different wheat head detection models and applying a semi-supervised pseudo label learning strategy. The method recognizes wheat with high robustness and high accuracy across varied scenes and realizes intelligent wheat head identification, reducing labor consumption and improving identification efficiency.
Description
Technical Field
The invention relates to the technical field of artificial intelligence and intelligent agriculture, in particular to a wheat head detection method based on computer vision semi-supervised pseudo label learning.
Background
Wheat is a grain crop grown worldwide. As agricultural modernization advances in China, using modern artificial intelligence technology to improve wheat yield and assist agricultural production management has become an important open problem. Traditional agriculture relies entirely on manual labor: the work is extremely time-consuming and laborious, cannot be sustained over the whole process at high efficiency and without error, and, because the growth condition of crops cannot be monitored in real time, also limits crop yield to a certain extent. Automatically monitoring the growth of wheat with intelligent algorithms can therefore push traditional agriculture toward a new era of modernized planting and intelligent management. In the field of computer vision, although object detection algorithms have improved greatly after years of research by many scholars, real scenes remain highly challenging: wheat varieties differ across regions, their appearance varies widely, and their growth cycles are inconsistent. Building a detection model that maintains high robustness and high accuracy across different scenes is therefore important.
Disclosure of Invention
The invention aims to provide a wheat head detection method based on computer vision semi-supervised pseudo label learning, aiming at the defects of the prior art. The method has high robustness and high accuracy of wheat identification in various scenes, and can realize intelligent wheat head identification so as to reduce labor consumption and improve identification efficiency.
The technical scheme for realizing the purpose of the invention is as follows:
a wheat head detection method based on computer vision semi-supervised pseudo label learning, the method comprising: training two different wheat head detection models and applying a semi-supervised pseudo label learning strategy, wherein,
1) training the different wheat head detection models: a first wheat head detection model and a second wheat head detection model, which differ and are both trained in a supervised manner. The data set used for training is the global wheat head detection data set, compiled mainly by nine research institutions from seven countries: the University of Tokyo, the French national agricultural research institute, the French food, nutrition and environment research institute, the French technical institute Arvalis, ETH Zurich in Switzerland, the University of Saskatchewan in Canada, the University of Queensland in Australia, Nanjing Agricultural University in China, and Rothamsted Research in the UK. The global wheat head detection data set targets a universal solution for wheat head detection and is used to estimate the number and size of wheat heads. Accurate wheat head detection in field images is very challenging: overlapping wheat and external factors such as wind can blur the pictures, and many unpredictable problems make it difficult to identify a single wheat head. In addition, the appearance of wheat varies greatly with maturity, color, genotype, head orientation, and planting density, and different wheat varieties are planted in all regions of the world. To give the wheat detection model better generalization across different detection environments, 3000 images from France, the UK, Switzerland, and Canada in the global wheat head detection data set are selected as the training data set, and 1000 images of wheat heads of different varieties from other regions, namely Australia, Japan, and China, are selected as the test data set. The first wheat head detection model takes yolov5s as its reference model, and its training process is as follows:
1-1) training samples are selected in batches from the training split of the global wheat head detection data set; each batch reads n pictures, where the value of n can be chosen freely;
1-2) scaling the picture to 640 x 640 pixel size;
1-3) data augmentation is applied to all pictures in the batch, including color space transformation, picture rotation, random translation, flipping, Mosaic, and affine transformation;
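To make concrete why the box labels must be transformed together with the pictures, the following is a minimal pure-Python sketch of one augmentation from the list above, a horizontal flip. The function name and the (cx, cy, w, h) pixel box encoding are illustrative assumptions of this sketch, not specifics given by the text.

```python
def hflip_image_and_boxes(image, boxes, width=640):
    """Horizontally flip an image (nested list of pixel rows) and its boxes.

    Boxes are (cx, cy, w, h) in pixels, matching the center/width/height
    bounding-box encoding described in step 1-4). Under a horizontal flip
    only the x center coordinate changes; width and height are preserved.
    """
    flipped = [row[::-1] for row in image]                    # mirror each row
    new_boxes = [(width - cx, cy, w, h) for (cx, cy, w, h) in boxes]
    return flipped, new_boxes
```

The same pattern extends to rotation and affine transforms, where the full box corners must be transformed and re-enclosed.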
1-4) the pictures after data augmentation are input into the yolov5s network model in batches, which predicts the target classes and position information in the pictures, comprising the object class and the center coordinates, width, and height of each predicted bounding box. A loss is then computed by a loss function between the predictions of the first wheat head detection model and the target labels of the input pictures; the computed loss value returns gradients to the network through the back propagation algorithm, iteratively updating the network parameters, so that the model learns to recognize the target objects and continuously fits the distribution of the real data, with the aim of training the best-performing recognition model. The classification loss function of the first wheat head detection model adopts the Focal loss function, as shown in formula (1):

L_cls = -α_t (1 - p̂_t)^γ · log(p̂_t)    (1)

where L_cls represents the classification loss function, and α and γ are hyper-parameters of the loss function: γ adjusts the loss of easy and hard samples so that the loss function focuses more on hard samples, and α balances the imbalance of positive and negative samples; p̂ is the classification prediction of the first wheat head detection model, a value between 0 and 1, with p̂_t = p̂ and α_t = α for positive samples, and p̂_t = 1 - p̂ and α_t = 1 - α otherwise;
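The Focal loss classification function above can be sketched for a single prediction in plain Python. The function name and the default hyper-parameters (α = 0.25, γ = 2) are choices of this sketch; the patent leaves α and γ unspecified.

```python
import math

def focal_loss(p, positive, alpha=0.25, gamma=2.0):
    """Focal loss for one classification prediction p in (0, 1).

    gamma down-weights easy samples so the loss focuses on hard ones;
    alpha balances positive vs. negative samples.
    """
    p_t = p if positive else 1.0 - p            # probability of the true class
    a_t = alpha if positive else 1.0 - alpha    # class-balancing weight
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With γ = 0 and α = 0.5 this reduces to half the ordinary cross-entropy, which is a quick sanity check on the implementation.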
The classification loss function continuously optimizes the prediction of the object class, while the regression loss function continuously optimizes the prediction of the object's position coordinates, fitting the center coordinates, width, and height of the predicted bounding box. The regression task, however, optimizes only positive samples; negative samples contain no target box and are not iteratively optimized by it. The first wheat head detection model adopts the GIoU loss function as its regression loss function, as shown in formula (2):

L_loc = 1 - IoU_AB + (C - (A ∪ B)) / C    (2)

where L_loc represents the regression loss function, A the area of the predicted bounding box, B the area of the real labeled box of the target, C the area of the smallest rectangle that can enclose A and B, and IoU_AB the area intersection-over-union of the predicted bounding box A and the real labeled box; (C - (A ∪ B)) / C is the area of the smallest enclosing rectangle C minus the union of the areas of the predicted box A and the real labeled box B, divided by C;
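Formula (2) can be sketched as follows, assuming corner-encoded boxes (x1, y1, x2, y2); the function name is illustrative. The enclosing-rectangle penalty term is what keeps the loss informative even when the boxes do not overlap at all.

```python
def giou_loss(box_a, box_b):
    """GIoU regression loss of formula (2) for boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection area of A and B
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Area of the smallest rectangle C enclosing both A and B
    c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return 1.0 - iou + (c - union) / c
```

Identical boxes give a loss of 0; for disjoint boxes the IoU term contributes its maximum of 1 and the C-based term still provides a gradient toward overlap.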
The second wheat head detection model takes EfficientDet as its reference model. Conventional multi-scale feature fusion methods, such as the feature pyramid network and the path aggregation network, simply add features together, yet different features have different resolutions and contribute differently to the fused result. EfficientDet therefore introduces a weighted bidirectional feature pyramid network, which senses the importance of different features through learnable weights and repeatedly performs top-down and bottom-up multi-scale feature fusion. The second wheat head detection model comprises a feature extraction sub-network, a weighted bidirectional feature pyramid sub-network, and a classification regression sub-network; the mobile flip bottleneck convolution submodule and the bidirectional feature pyramid submodule that make up the network are defined as follows:
The input of the mobile flip bottleneck convolution submodule is a feature vector with channel dimension C. The feature vector is first expanded by a 1 × 1 convolution layer, then passes sequentially through a batch normalization layer, a swish activation function, a 5 × 5 depthwise separable convolution layer, another batch normalization layer, and another swish activation function. It is then split into two branches: the first branch consists of a global average pooling layer, a 1 × 1 convolution layer, a swish activation function, a 1 × 1 convolution layer, and a sigmoid activation function; the second branch consists of a dimension-reducing 1 × 1 convolution layer, a batch normalization layer, and a dropout function. Finally, the input vector of the module and the output vector of the second branch are joined by a residual connection to form the final output of the module;
The inputs of the bidirectional feature pyramid submodule are the output vectors of feature layers C3, C4, and C5 of the feature extraction network, together with C6 and C7 obtained by pooling the C5 feature vector twice. In the top-down path: the C7 feature vector is upsampled, added to C6, and passed through a 3 × 3 convolution layer to give intermediate feature vector 1; intermediate feature vector 1 is upsampled, added to C5, and passed through a 3 × 3 convolution layer to give intermediate feature vector 2; intermediate feature vector 2 is upsampled, added to C4, and passed through a 3 × 3 convolution layer to give intermediate feature vector 3; intermediate feature vector 3 is upsampled, added to C3, and passed through a 3 × 3 convolution layer to give the P3 feature vector. In the bottom-up path: P3 is downsampled, added to the intermediate feature vector of the C4 level, and passed through a 3 × 3 convolution layer to give P4; P4 is downsampled, added to the intermediate feature vector of the C5 level, and passed through a 3 × 3 convolution layer to give P5; P5 is downsampled, added to the intermediate feature vector of the C6 level, and passed through a 3 × 3 convolution layer to give P6; P6 is downsampled, added to the C7 feature vector, and passed through a 3 × 3 convolution layer to give P7. The feature vectors P3, P4, P5, P6, and P7 are the output vectors of the bidirectional feature pyramid submodule;
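The fusion order described above can be traced with a small data-flow sketch in which each feature map is reduced to a stand-in scalar, and the weighted add, 3 × 3 convolution, and up/downsampling steps are collapsed into a single placeholder `fuse` function. This illustrates only the top-down/bottom-up wiring, not the real computation.

```python
def bifpn_layer(c3, c4, c5, c6, c7, fuse=lambda a, b: (a + b) / 2):
    """Data-flow sketch of one bidirectional feature pyramid submodule.

    Features are stand-in scalars; `fuse` is a placeholder for the
    resample + weighted add + 3x3 conv of the real module.
    """
    # Top-down path: intermediate features at the C6..C4 levels, then P3.
    m6 = fuse(c7, c6)
    m5 = fuse(m6, c5)
    m4 = fuse(m5, c4)
    p3 = fuse(m4, c3)
    # Bottom-up path: each output fuses the previous output with the
    # intermediate feature of its level (C7 has no intermediate).
    p4 = fuse(p3, m4)
    p5 = fuse(p4, m5)
    p6 = fuse(p5, m6)
    p7 = fuse(p6, c7)
    return p3, p4, p5, p6, p7
```

Because each output is reachable from every input along this wiring, information flows in both directions within a single submodule, and stacking 3 such submodules deepens the exchange.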
The feature extraction sub-network comprises a 3 × 3 convolution layer followed by 16 sequentially connected mobile flip bottleneck convolution submodules; the weighted bidirectional feature pyramid sub-network comprises 3 bidirectional feature pyramid submodules; the classification regression sub-network comprises 2 branches, each containing 2 shared 1 × 1 convolution layers;
The training process of the second wheat head detection model resembles that of the first: pictures are continuously read in batches and the model is trained iteratively to learn to recognize the target objects and fit the distribution of the real data. The difference is that the regression loss function of the second wheat head detection model adopts the Smooth L1 loss function, as shown in formula (3):

Smooth_L1(x) = 0.5 x², if |x| < 1; |x| - 0.5, otherwise    (3)

where Smooth_L1 represents the regression loss function and x = ŷ - y is the difference between the model's bounding box prediction ŷ (center coordinates, width, and height) and the real target box coordinates y;
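Formula (3), applied to one coordinate residual, can be sketched minimally; the function name is illustrative:

```python
def smooth_l1(x):
    """Smooth L1 loss of formula (3) for one residual x = prediction - target.

    Quadratic near zero (stable gradients for small errors),
    linear for |x| >= 1 (robust to outlier boxes).
    """
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5
```

The two pieces meet at |x| = 1 with matching value and slope, which is why the loss is "smooth".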
2) the semi-supervised pseudo label learning strategy: first, the first and second wheat head detection models predict on unlabeled data, yielding multiple model prediction boxes; next, a weighted box fusion method merges the higher-confidence prediction boxes into pseudo label data; then a new wheat head detection model is retrained on the original training data set together with the pseudo label data set, and these steps are iterated until the test performance of the wheat head detection model no longer improves. The stopping condition of model pseudo label training is set so that training stops once the average precision of model prediction improves by less than 0.2%. The weighted box fusion method comprises the following steps:
1-2) first create a list B for storing the prediction bounding boxes of the 2 models trained in the supervised manner, and sort the prediction boxes in descending order of their predicted confidence;
2-2) create a list L and a list F. List L is multi-dimensional: each position stores 1 or more prediction boxes, called a cluster. All prediction boxes in each cluster are weighted and fused, and the bounding box obtained by fusion is stored at the corresponding position of list F; that is, list F stores the fused boxes of the 2 supervised models;
3-2) traverse the prediction bounding boxes in list B and perform "clustering" against the fused boxes stored in list F. The clustering rule computes the IoU between the current prediction box and each fused box; if the IoU of the two boxes exceeds a specified threshold, clustering is considered successful. This technical scheme sets the IoU threshold to 0.55;
4-2) in step 3-2), if clustering fails, that is, no fused box exceeds the specified IoU threshold, a new cluster is created for the current prediction box and appended to the ends of list L and list F. If clustering succeeds, that is, a fused box exceeding the specified IoU threshold is found, the current prediction box is added to list L at the index of the fused box it matched in list F, and the fused box at the corresponding position in list F is then updated from all prediction boxes in that cluster;
5-2) in step 4-2), after the current prediction box is added to its cluster, the confidence and coordinates of the fused box in list F are updated using all the bounding boxes in the corresponding cluster of L. Assuming the cluster holds T bounding boxes, the confidence and coordinates of the fused box are computed as shown in formula (4), formula (5), and formula (6):

C = (1/T) · Σ_{i=1..T} C_i    (4)

X = Σ_{i=1..T} C_i · X_i / Σ_{i=1..T} C_i    (5)

Y = Σ_{i=1..T} C_i · Y_i / Σ_{i=1..T} C_i    (6)

where C represents the confidence of a model prediction box and X, Y represent the coordinates of a prediction bounding box. The coordinates of the fused box are obtained by multiplying each prediction box's confidence by its coordinate values, accumulating, and dividing by the sum of confidences, so that bounding boxes with larger confidence are given larger weight;
6-2) after traversing list B, the confidence of each fused box in list F is readjusted, because if the number of boxes in a cluster is too small, only a few models predicted that box, and its confidence must therefore be reduced. The adjustment is shown in formula (7):

C = C · min(T, N) / N    (7)

where T is the number of boxes in the cluster and N is the number of models (here N = 2);
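Steps 1-2) through 6-2) can be sketched end-to-end as follows. The (confidence, x1, y1, x2, y2) box encoding and the function names are assumptions of this sketch, and the min(T, N)/N confidence rescale follows the standard weighted boxes fusion formulation, since formula (7) is not fully reproduced in the text.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def weighted_box_fusion(predictions, n_models=2, iou_thr=0.55):
    """Fuse boxes pooled from several models, following steps 1-2) to 6-2).

    predictions: (confidence, x1, y1, x2, y2) tuples from all models.
    Returns the fused (confidence, x1, y1, x2, y2) boxes.
    """
    # Step 1-2): list B, sorted by descending confidence.
    b = sorted(predictions, key=lambda p: p[0], reverse=True)
    clusters, fused = [], []                       # lists L and F
    for p in b:
        # Step 3-2): match the box against existing fused boxes by IoU.
        match = next((i for i, f in enumerate(fused)
                      if box_iou(p[1:], f[1:]) > iou_thr), None)
        if match is None:
            # Step 4-2): no match above the threshold -> open a new cluster.
            clusters.append([p])
            fused.append(p)
        else:
            # Step 5-2): add to the cluster and refresh the fused box
            # with formulas (4), (5), (6).
            clusters[match].append(p)
            cl = clusters[match]
            s = sum(q[0] for q in cl)
            coords = tuple(sum(q[0] * q[k] for q in cl) / s
                           for k in (1, 2, 3, 4))
            fused[match] = (s / len(cl),) + coords
    # Step 6-2): damp fused boxes supported by few predictions.
    return [(f[0] * min(len(cl), n_models) / n_models,) + f[1:]
            for cl, f in zip(clusters, fused)]
```

Two overlapping detections from the two models collapse into one box whose coordinates lean toward the more confident prediction, while a box seen by only one model keeps its position but loses half its confidence.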
Through this semi-supervised pseudo label learning scheme, the trained model gains the notable advantages of adapting to wheat head detection across regions and varieties and of generalizing well.
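The overall iterate-until-plateau loop of the strategy can be sketched as follows, with `fit`, `predict`, `fuse`, and `evaluate` as placeholders for detector training, inference, weighted box fusion, and average-precision evaluation. Treating the 0.2% threshold as an absolute gain is an interpretation of this sketch, not something the text pins down, and the two detection models are collapsed into one for brevity.

```python
def pseudo_label_training(train, unlabeled, fit, predict, fuse, evaluate,
                          min_gain=0.002):
    """Iterate the semi-supervised pseudo label strategy of part 2).

    min_gain = 0.002 encodes the 0.2% stopping threshold: training
    stops once average precision improves by less than that.
    """
    model = fit(train)
    best = evaluate(model)
    while True:
        # Predict on unlabeled data and fuse the boxes into pseudo labels.
        pseudo = [(x, fuse(predict(model, x))) for x in unlabeled]
        # Retrain a new model on the original + pseudo-labeled data.
        new_model = fit(train + pseudo)
        score = evaluate(new_model)
        if score - best < min_gain:
            return model, best    # improvement below the threshold: stop
        model, best = new_model, score
```

Because each round regenerates the pseudo labels with the latest model, label quality and detector quality improve together until the evaluation plateaus.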
Under a real scene, the wheat head identification is easily influenced by various factors such as illumination change, shielding of crops and other plants, complex background, different varieties and the like, so that the wheat head cannot be identified with high precision and high robustness by the existing wheat head detection method.
The technical scheme utilizes a computer vision semi-supervised pseudo label technology, can be self-adaptive to the wheat head detection of different regions and varieties, is more adaptive to actual detection scenes in real life, relieves the challenges of small target detection, shielding, complex background, light change and detection of different wheat varieties, obviously improves the robustness and the identification precision of the wheat head detector, and is beneficial to promoting the progress of agricultural modernization.
Drawings
FIG. 1 is a schematic diagram of an EfficientNet network in an embodiment;
FIG. 2 is a diagram illustrating a structure of a convolution sub-module of a moving flip bottleneck in an embodiment;
FIG. 3 is a schematic diagram of a bi-directional feature pyramid sub-module in an embodiment;
FIG. 4 is a schematic flow chart of an exemplary method.
Detailed Description
The invention will be further elucidated with reference to the drawings and examples, without however being limited thereto.
Example:
referring to fig. 4, a wheat head detection method based on computer vision semi-supervised pseudo label learning, the method comprising: training different wheat head detection models and semi-supervised pseudo label learning strategies, wherein,
1) the wheat head detection models for training different types are as follows: the method is characterized in that a first wheat head detection model and a second wheat head detection model which are different and based on a supervised training mode are trained, a data set used for training the wheat head detection model is a global wheat head detection data set which is a data set mainly completed by nine research institutes of seven countries, including Tokyo university, French national agricultural science institute, French nutrition and environment institute, France technical research institution Arvalis, Zuis Suzurich Federal Industrial school, Kasan Kachester Technology university, Australian Kunland Lange university, Nanjing agriculture university in China and Hibiscus British institute, the global wheat head detection data set is used for a universal solution of wheat head detection and is used for estimating the number and the size of wheat heads, accurate wheat head detection in a field image is very challenging, overlapping of wheat and external factors such as wind blowing can cause blurred pictures, and many unpredictable problems can make it difficult to identify a single wheat head, besides, the appearance of wheat has great difference due to maturity, color, genotype, head direction and planting density, and meanwhile, different wheat varieties are planted in all regions of the world, in order to make the wheat detection model have better generalization performance under different detection environments, 3000 images of france, british and switzerland canada in european regions in the global wheat head detection data set are selected as training data sets, 1000 images of different varieties of wheat heads in different regions from australia, japan and china in the global wheat head detection data set are selected as test data sets, the first wheat head detection model takes yolov5s as a reference model, and the training process is as follows:
1-1) carrying out batch selection on training samples from a training set part of a global wheat head detection data set, randomly reading n pictures in each batch, randomly selecting the value of n, and selecting n-32 pictures to input a model for training in the example;
1-2) scaling the picture to 640 x 640 pixel size;
1-3) performing data augmentation on all the batch pictures, including color space transformation, picture rotation, random translation, turnover, Mosaic and affine transformation;
1-4) inputting the pictures subjected to data amplification into a yolv 5s network model in batches, predicting to obtain target types and position information in the pictures, wherein the target types and position information comprise target object types and coordinates of center points of predicted boundary frames, width and height, then performing loss calculation on a predicted value of a first wheat head detection model and a target label of the input picture through a loss function, returning gradients to the network through a back propagation algorithm by the calculated loss value, performing iterative update of network parameters, and enabling the model to learn and identify the target object iteratively through a learning mode to continuously fit the distribution of real data, aiming at training to obtain an identification model with the best performance, wherein the classification loss function of the first wheat head detection model adopts a Focal loss function, and is shown in a formula (1):
wherein L isclsRepresenting a classification loss function, alpha and gamma are hyper-parameters of the loss function, gamma is used for adjusting the loss of easy samples and difficult samples, so that the loss function can focus more on the difficult samples, alpha is used for balancing the nonuniformity of positive and negative samples,the classification prediction value of the first wheat head detection model is between 0 and 1,
the classification loss function is used for continuously optimizing the prediction of the target object category, the regression loss function is used for continuously optimizing the prediction of the target object position coordinate, the coordinate of the central point of the predicted boundary box is continuously fitted with the width and the height of the predicted boundary box, however, the regression task is only used for optimizing the positive sample and not used for performing iterative optimization on the negative sample, the target box does not exist in the negative sample, and the first wheat head detection model adopts the GIoU loss function as the regression loss function, as shown in a formula (2):
wherein L islocRepresents the regression loss function, A represents the area of the predicted bounding box, B represents the area of the true labeled box of the target, C represents the area of the smallest rectangle that can enclose A and B, IoUABRepresenting the intersection and comparison of the areas of the predicted boundary frame A and the target real labeling frame, wherein C/(Au B) represents the difference value obtained by subtracting the union of the areas of the predicted boundary frame A and the target real labeling frame from the area of the minimum surrounding rectangle C;
the second wheat head detection model uses EfficientDet as a reference model, as shown in fig. 1, for the conventional multi-scale feature fusion method, such as a feature pyramid network, a path aggregation network and the like, which simply adds features, however, the resolutions of different features are different, and the contribution degrees of the different features to the fused features are also different, so that the EfficientDet introduces a weighted bidirectional feature pyramid network, senses the importance of different features by combining learnable weights, and performs multi-scale feature fusion from top to bottom and from bottom to top for multiple times, the second wheat head detection model includes a feature extraction sub-network, a weighted bidirectional feature pyramid sub-network and a classification regression sub-network, and a mobile flip bottleneck convolution sub-module and a bidirectional feature pyramid sub-module which form a network are respectively shown in fig. 2 and fig. 3 and are defined as follows:
the input of the mobile turnover bottleneck convolution submodule is a characteristic vector with a channel dimension of C, firstly, the characteristic vector is subjected to dimension increasing through a 1 x 1 convolution layer, and sequentially passes through a batch normalization layer, a swish activation function, a 5 x 5 depth separable convolution layer, a batch normalization layer and a swish activation function, then the characteristic vector is divided into two branches, the first branch is provided with a global average pooling layer, a 1 x 1 convolution layer, a swish activation function, a 1 x 1 convolution layer and a sigmoid activation function, the second branch is provided with a 1 x 1 convolution layer for dimension reduction, a batch normalization layer and a dropout function, and finally, the input vector of the module and the output vector of the second branch are subjected to residual error connection to be used as the final output of the module;
the input of the bidirectional feature pyramid submodule is an output vector of a C3, C4 and C5 layer feature layer of a feature extraction network and a C6 and C7 feature vector of a C5 feature vector after twice pooling, firstly, the C7 feature vector is up-sampled, the C6 feature vector is added and passes through a 3 x 3 convolutional layer to obtain an intermediate feature vector 1, the intermediate feature vector 1 is up-sampled, the C5 feature vector is added and passes through a 3 x 3 convolutional layer to obtain an intermediate feature vector 2, the intermediate feature vector 2 is up-sampled, the C4 feature vector is added and passes through a 3 x 3 convolutional layer to obtain an intermediate feature vector 3, the intermediate feature vector 3 is up-sampled, the C3 feature vector is added and passes through a 3 x 3 convolutional layer to obtain a P3 feature vector, the P3 feature vector is down-sampled, the intermediate feature vector of a C4 layer is added and passes through a 3 x 3 convolutional layer to obtain a P4 feature vector, the P4 feature vector is downsampled, added with the middle feature vector of the C5 layer and processed by a 3 x 3 convolutional layer to obtain a P5 feature vector, the P5 feature vector is downsampled, added with the middle feature vector of the C6 layer and processed by a 3 x 3 convolutional layer to obtain a P6 feature vector, the P6 feature vector is downsampled, added with the C7 feature vector and processed by a 3 x 3 convolutional layer to obtain a P7 feature vector, wherein the P3, P4, P5, P6 and P7 feature vectors are output vectors of the bidirectional feature pyramid sub-module;
the feature extraction sub-network comprises a 3 × 3 convolutional layer and 16 sequentially connected mobile inverted bottleneck convolution (MBConv) modules; the weighted bidirectional feature pyramid sub-network comprises 3 bidirectional feature pyramid submodules; the classification regression sub-network comprises 2 branches, each branch containing 2 shared 1 × 1 convolutional layers;
the training process of the second wheat head detection model is similar to that of the first wheat head detection model: pictures are continuously read in batches and the model is trained iteratively, learning to recognize the target object and progressively fitting the distribution of the real data; the difference is that the regression loss function of the second wheat head detection model adopts the Smooth L1 loss function, as shown in formula (3):

Smooth_L1(x) = 0.5·x², if |x| < 1;  |x| − 0.5, otherwise      (3)
where Smooth_L1 represents the regression loss function, ŷ represents the predicted values of the bounding box, namely the center-point coordinates, the width and the height, y represents the annotated coordinates of the real target box, and x = ŷ − y represents the difference between the model's predicted value and the true value;
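A minimal NumPy sketch of the Smooth L1 regression loss of formula (3), applied element-wise to the difference x = ŷ − y:

```python
import numpy as np

def smooth_l1(y_pred, y_true):
    # x is the element-wise difference between the predicted box values
    # (center x, center y, width, height) and the annotated values
    x = np.abs(y_pred - y_true)
    # quadratic for small errors, linear for large ones
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)
```

The quadratic region keeps gradients small near the target while the linear region limits the influence of outlier boxes, which is why it is a common regression loss for detectors.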
2) the semi-supervised pseudo-label learning strategy is as follows: first, the first wheat head detection model and the second wheat head detection model are used to predict on unlabeled data, yielding a number of model prediction boxes; next, a weighted box fusion method is used to fuse the prediction boxes with higher confidence into pseudo-label data; then a new wheat head detection model is retrained on the original training data set together with the pseudo-label data set, and this procedure is iterated until the test performance of the wheat head detection model no longer improves; in this embodiment, the stopping condition for pseudo-label training is that training stops once the improvement in the model's average precision falls below 0.2%; the weighted box fusion method comprises the following steps:
1-2) first, a list B is created to store the prediction bounding boxes of the 2 models trained in the supervised mode, and the prediction boxes are then sorted in descending order of the models' predicted confidence;
2-2) a list L and a list F are created, where the list L is a multi-dimensional list in which each position stores 1 or more prediction boxes, referred to as clusters; all the prediction boxes in each cluster are weighted and fused, and the resulting bounding box is stored in the list F, i.e. the list F stores the fused boxes of the 2 supervised models;
3-2) the prediction bounding boxes in the list B are traversed and "clustered" against the list F that stores the fused boxes; the "clustering" rule is to compute the IoU between the current prediction box and a fused box, and if the IoU of the two boxes is greater than a specified threshold, the "clustering" is considered successful; in this embodiment the IoU threshold is set to 0.55;
4-2) in step 3-2), if the "clustering" is unsuccessful, i.e. no fused box with an IoU above the specified threshold is found, a new cluster is created for the current prediction box and appended to the ends of the list L and the list F; if the "clustering" is successful, i.e. a fused box with an IoU above the specified threshold is found, the current prediction box is added to the list L at the index position of the matched fused box in the list F, and after this addition the fused box at the corresponding position in the list F is updated according to all the prediction boxes in that cluster;
5-2) in step 4-2), after the current prediction box has been added to the corresponding cluster, the confidence and coordinates of the fused box in the list F are updated using all the bounding boxes in each cluster of L; assuming there are T bounding boxes, the confidence and coordinates of the fused box in the list F are calculated as shown in formula (4), formula (5) and formula (6):

C = (1/T) · Σ_{i=1}^{T} C_i      (4)
X = Σ_{i=1}^{T} C_i·X_i / Σ_{i=1}^{T} C_i      (5)
Y = Σ_{i=1}^{T} C_i·Y_i / Σ_{i=1}^{T} C_i      (6)
where C represents the confidence of a model prediction box and X, Y represent the coordinates of a prediction bounding box; the coordinates of the fused box are obtained by multiplying each prediction box's confidence by its coordinate values, accumulating the products, and dividing by the sum of the confidences, so that bounding boxes with larger confidence are assigned larger weights;
6-2) after the list B has been traversed, the confidence of each fused box in the list F is readjusted, because if the number of boxes in a cluster is too small, only a few models predicted that box, and its confidence should therefore be reduced; the adjustment is shown in formula (7), where N is the number of models (N = 2 in this embodiment):

C = C · min(T, N) / N      (7)
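For illustration, steps 1-2) to 6-2) can be sketched as follows; this is a non-authoritative sketch in which function and variable names are invented, boxes are assumed to be in [x1, y1, x2, y2] form, and the final confidence rescaling uses the standard weighted-boxes-fusion form with N models:

```python
import numpy as np

def iou(a, b):
    # intersection over union of two boxes given as [x1, y1, x2, y2]
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def weighted_box_fusion(boxes, scores, n_models=2, iou_thr=0.55):
    order = np.argsort(scores)[::-1]   # step 1-2: sort by descending confidence
    L, F = [], []                      # step 2-2: clusters and fused boxes
    for i in order:
        box, c = boxes[i], scores[i]
        # step 3-2: try to match an existing fused box by IoU
        match = next((j for j, f in enumerate(F)
                      if iou(box, f[1]) > iou_thr), None)
        if match is None:              # step 4-2: no match, start a new cluster
            L.append([(c, box)])
            F.append((c, np.array(box, dtype=float)))
            match = len(F) - 1
        else:
            L[match].append((c, box))
        # step 5-2: re-fuse the cluster with confidence-weighted coordinates
        cs = np.array([m[0] for m in L[match]])
        bs = np.array([m[1] for m in L[match]], dtype=float)
        fused = (cs[:, None] * bs).sum(axis=0) / cs.sum()
        F[match] = (cs.mean(), fused)
    # step 6-2: rescale confidence by how many boxes landed in each cluster
    return [(c * min(len(cl), n_models) / n_models, f)
            for (c, f), cl in zip(F, L)]
```

A box predicted by both models keeps its averaged confidence, while a box found by only one of the two models has its confidence halved, which is exactly the "few models predicted it" penalty of step 6-2).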
the model trained in this embodiment with the semi-supervised pseudo-label learning approach has the notable advantages of adapting to wheat head detection across different regions and varieties and of good generalization.
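The overall iteration of the strategy in step 2) can be outlined as a driver loop; all callables here (training, evaluation, prediction-plus-fusion) are hypothetical placeholders, and only the 0.2% average-precision stopping condition is taken from the text:

```python
def pseudo_label_training(labeled_set, unlabeled_set,
                          predict_and_fuse, train_fn, evaluate_fn,
                          min_gain=0.002):
    # Hypothetical driver for semi-supervised pseudo-label learning:
    # predict on unlabeled data, fuse boxes into pseudo labels, retrain
    # on original + pseudo-labeled data, stop when the average-precision
    # gain drops below min_gain (0.2% in this embodiment).
    model = train_fn(labeled_set)
    best_ap = evaluate_fn(model)
    while True:
        pseudo = [(x, predict_and_fuse(model, x)) for x in unlabeled_set]
        model = train_fn(labeled_set + pseudo)
        ap = evaluate_fn(model)
        if ap - best_ap < min_gain:   # improvement below 0.2%: stop
            return model
        best_ap = ap
```

In practice `predict_and_fuse` would run both supervised detectors and apply the weighted box fusion of steps 1-2) to 6-2) to produce the pseudo labels.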
Claims (1)
1. A wheat head detection method based on computer vision semi-supervised pseudo-label learning, characterized by comprising: training different types of wheat head detection models, and a semi-supervised pseudo-label learning strategy, wherein,
1) training different types of wheat head detection models is as follows: a first wheat head detection model and a second wheat head detection model of different types are trained in a supervised training mode; the data set used for training the wheat head detection models is the global wheat head detection data set, which is used for estimating the number and size of wheat heads; 3000 images from France, the United Kingdom and Switzerland in Europe and from Canada in North America are selected from the global wheat head detection data set as the training data set, and 1000 wheat head images of different varieties from Australia, Japan and China are selected from the global wheat head detection data set as the test data set; the first wheat head detection model takes yolov5s as its reference model, and its training process is as follows:
1-1) training samples are selected in batches from the training set portion of the global wheat head detection data set, with n pictures read at random in each batch, the value of n itself being chosen at random;
1-2) scaling the picture to 640 x 640 pixel size;
1-3) data augmentation is performed on all pictures in the batch, including color space transformation, picture rotation, random translation, flipping, Mosaic and affine transformation;
1-4) the pictures after data augmentation are input in batches into the yolov5s network model, which predicts the target categories and position information in the pictures, including the target object category and the center-point coordinates, width and height of the predicted bounding boxes; the predicted values of the first wheat head detection model and the target labels of the input pictures are then passed to a loss function, the gradient of the computed loss value is propagated back into the network by the back-propagation algorithm, and the network parameters are updated iteratively; the classification loss function of the first wheat head detection model adopts the Focal loss function, as shown in formula (1):

L_cls = −α·y·(1 − ŷ)^γ·log(ŷ) − (1 − α)·(1 − y)·ŷ^γ·log(1 − ŷ)      (1)
where L_cls represents the classification loss function, α and γ are hyper-parameters of the loss function, and ŷ is the classification prediction of the first wheat head detection model, taking a value between 0 and 1,
the first wheat head detection model uses the GIoU loss function as the regression loss function, as shown in formula (2):

L_loc = 1 − IoU_AB + (C − (A ∪ B)) / C      (2)
where L_loc represents the regression loss function, A represents the area of the predicted bounding box, B represents the area of the real annotated box of the target, C represents the area of the smallest rectangle that can enclose A and B, IoU_AB represents the intersection over union of the areas of the predicted bounding box A and the real annotated target box, and (C − (A ∪ B)) / C represents the difference obtained by subtracting the union of the areas of the predicted bounding box A and the real annotated target box from the area of the minimum enclosing rectangle C, divided by C;
the second wheat head detection model takes EfficientDet as a reference model, comprises a feature extraction sub-network, a weighted bidirectional feature pyramid sub-network and a classification regression sub-network, and is defined as follows:
the input of the mobile inverted bottleneck convolution (MBConv) submodule is a feature vector with channel dimension C; the feature vector first has its dimension increased by a 1 x 1 convolutional layer and then passes in sequence through a batch normalization layer, a swish activation function, a 5 x 5 depthwise separable convolutional layer, a batch normalization layer and a swish activation function; it is then split into two branches: the first branch consists of a global average pooling layer, a 1 x 1 convolutional layer, a swish activation function, a 1 x 1 convolutional layer and a sigmoid activation function, while the second branch consists of a dimension-reducing 1 x 1 convolutional layer, a batch normalization layer and a dropout function; finally, the input vector of the module and the output vector of the second branch are combined through a residual connection to form the final output of the module;
the inputs of the bidirectional feature pyramid submodule are the output vectors of the C3, C4 and C5 feature layers of the feature extraction network, together with the C6 and C7 feature vectors obtained by pooling the C5 feature vector twice in succession; first, the C7 feature vector is up-sampled, added to the C6 feature vector and passed through a 3 x 3 convolutional layer to obtain intermediate feature vector 1; intermediate feature vector 1 is up-sampled, added to the C5 feature vector and passed through a 3 x 3 convolutional layer to obtain intermediate feature vector 2; intermediate feature vector 2 is up-sampled, added to the C4 feature vector and passed through a 3 x 3 convolutional layer to obtain intermediate feature vector 3; intermediate feature vector 3 is up-sampled, added to the C3 feature vector and passed through a 3 x 3 convolutional layer to obtain the P3 feature vector; the P3 feature vector is then down-sampled, added to the intermediate feature vector of the C4 layer and passed through a 3 x 3 convolutional layer to obtain the P4 feature vector; the P4 feature vector is down-sampled, added to the intermediate feature vector of the C5 layer and passed through a 3 x 3 convolutional layer to obtain the P5 feature vector; the P5 feature vector is down-sampled, added to the intermediate feature vector of the C6 layer and passed through a 3 x 3 convolutional layer to obtain the P6 feature vector; and the P6 feature vector is down-sampled, added to the C7 feature vector and passed through a 3 x 3 convolutional layer to obtain the P7 feature vector; the P3, P4, P5, P6 and P7 feature vectors are the output vectors of the bidirectional feature pyramid submodule;
the feature extraction sub-network comprises a 3 × 3 convolutional layer and 16 sequentially connected mobile inverted bottleneck convolution (MBConv) modules; the weighted bidirectional feature pyramid sub-network comprises 3 bidirectional feature pyramid submodules; the classification regression sub-network comprises 2 branches, each branch containing 2 shared 1 × 1 convolutional layers;
the training process of the second wheat head detection model is similar to that of the first wheat head detection model, except that the regression loss function of the second wheat head detection model adopts the Smooth L1 loss function, as shown in formula (3):

Smooth_L1(x) = 0.5·x², if |x| < 1;  |x| − 0.5, otherwise      (3)
where Smooth_L1 represents the regression loss function, ŷ represents the predicted values of the bounding box, namely the center-point coordinates, the width and the height, y represents the annotated coordinates of the real target box, and x = ŷ − y represents the difference between the model's predicted value and the true value;
2) the semi-supervised pseudo-label learning strategy is as follows: first, the first wheat head detection model and the second wheat head detection model are used to predict on unlabeled data, yielding a number of model prediction boxes; next, a weighted box fusion method is used to fuse the prediction boxes with higher confidence into pseudo-label data; then a new wheat head detection model is retrained on the original training data set together with the pseudo-label data set, and this procedure is iterated until the test performance of the wheat head detection model no longer improves, the stopping condition for pseudo-label training being that training stops once the improvement in the model's average precision falls below 0.2%; the weighted box fusion method comprises the following steps:
1-2) first, a list B is created to store the prediction bounding boxes of the 2 models trained in the supervised mode, and the prediction boxes are then sorted in descending order of the models' predicted confidence;
2-2) a list L and a list F are created, where the list L is a multi-dimensional list in which each position stores 1 or more prediction boxes, referred to as clusters; all the prediction boxes in each cluster are weighted and fused, and the resulting bounding box is stored in the list F, i.e. the list F stores the fused boxes of the 2 supervised models;
3-2) the prediction bounding boxes in the list B are traversed and "clustered" against the list F that stores the fused boxes; the "clustering" rule is to compute the IoU between the current prediction box and a fused box, and if the IoU of the two boxes is greater than a specified threshold, the "clustering" is considered successful;
4-2) in step 3-2), if the "clustering" is unsuccessful, i.e. no fused box with an IoU above the specified threshold is found, a new cluster is created for the current prediction box and appended to the ends of the list L and the list F; if the "clustering" is successful, i.e. a fused box with an IoU above the specified threshold is found, the current prediction box is added to the list L at the index position of the matched fused box in the list F, and after this addition the fused box at the corresponding position in the list F is updated according to all the prediction boxes in that cluster;
5-2) in step 4-2), after the current prediction box has been added to the corresponding cluster, the confidence and coordinates of the fused box in the list F are updated using all the bounding boxes in each cluster of L; assuming there are T bounding boxes, the confidence and coordinates of the fused box in the list F are calculated as shown in formula (4), formula (5) and formula (6):

C = (1/T) · Σ_{i=1}^{T} C_i      (4)
X = Σ_{i=1}^{T} C_i·X_i / Σ_{i=1}^{T} C_i      (5)
Y = Σ_{i=1}^{T} C_i·Y_i / Σ_{i=1}^{T} C_i      (6)
where C represents the confidence of a model prediction box and X, Y represent the coordinates of a prediction bounding box; the coordinates of the fused box are obtained by multiplying each prediction box's confidence by its coordinate values, accumulating the products, and dividing by the sum of the confidences;
6-2) after the list B has been traversed, the confidence of each fused box in the list F is readjusted according to formula (7), where N is the number of models:

C = C · min(T, N) / N      (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110849609.6A CN113554627B (en) | 2021-07-27 | 2021-07-27 | Wheat head detection method based on computer vision semi-supervised pseudo label learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110849609.6A CN113554627B (en) | 2021-07-27 | 2021-07-27 | Wheat head detection method based on computer vision semi-supervised pseudo label learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113554627A true CN113554627A (en) | 2021-10-26 |
CN113554627B CN113554627B (en) | 2022-04-29 |
Family
ID=78132902
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110849609.6A Active CN113554627B (en) | 2021-07-27 | 2021-07-27 | Wheat head detection method based on computer vision semi-supervised pseudo label learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113554627B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115082757A (en) * | 2022-07-13 | 2022-09-20 | 北京百度网讯科技有限公司 | Pseudo label generation method, target detection model training method and device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451616A (en) * | 2017-08-01 | 2017-12-08 | 西安电子科技大学 | Multi-spectral remote sensing image terrain classification method based on the semi-supervised transfer learning of depth |
CN107644235A (en) * | 2017-10-24 | 2018-01-30 | 广西师范大学 | Image automatic annotation method based on semi-supervised learning |
CN112149733A (en) * | 2020-09-23 | 2020-12-29 | 北京金山云网络技术有限公司 | Model training method, model training device, quality determining method, quality determining device, electronic equipment and storage medium |
CN112232416A (en) * | 2020-10-16 | 2021-01-15 | 浙江大学 | Semi-supervised learning method based on pseudo label weighting |
CN112488006A (en) * | 2020-12-05 | 2021-03-12 | 东南大学 | Target detection algorithm based on wheat image |
CN113128476A (en) * | 2021-05-17 | 2021-07-16 | 广西师范大学 | Low-power consumption real-time helmet detection method based on computer vision target detection |
CN113158865A (en) * | 2021-04-14 | 2021-07-23 | 杭州电子科技大学 | Wheat ear detection method based on EfficientDet |
Non-Patent Citations (2)
Title |
---|
SHAN Chun et al.: "Semi-supervised single-example deep person re-identification method", 《计算机***应用》 * |
MA Lei et al.: "Semi-supervised regression based on support vector machine co-training", Computer Engineering and Applications (《计算机工程与应用》) * |
Also Published As
Publication number | Publication date |
---|---|
CN113554627B (en) | 2022-04-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | A high-precision detection method of hydroponic lettuce seedlings status based on improved Faster RCNN | |
Fu et al. | Fast and accurate detection of kiwifruit in orchard using improved YOLOv3-tiny model | |
Xiong et al. | Visual detection of green mangoes by an unmanned aerial vehicle in orchards based on a deep learning method | |
Chen et al. | Detecting citrus in orchard environment by using improved YOLOv4 | |
CN113076871B (en) | Fish shoal automatic detection method based on target shielding compensation | |
Ruiz-Ruiz et al. | Testing different color spaces based on hue for the environmentally adaptive segmentation algorithm (EASA) | |
Jabir et al. | Accuracy and efficiency comparison of object detection open-source models. | |
CN111340141A (en) | Crop seedling and weed detection method and system based on deep learning | |
CN112364931B (en) | Few-sample target detection method and network system based on meta-feature and weight adjustment | |
CN110222215B (en) | Crop pest detection method based on F-SSD-IV3 | |
Rong et al. | A peduncle detection method of tomato for autonomous harvesting | |
CN109325484A (en) | Flowers image classification method based on background priori conspicuousness | |
CN115393687A (en) | RGB image semi-supervised target detection method based on double pseudo-label optimization learning | |
Lv et al. | A visual identification method for the apple growth forms in the orchard | |
CN111723764A (en) | Improved fast RCNN hydroponic vegetable seedling state detection method | |
CN113554627B (en) | Wheat head detection method based on computer vision semi-supervised pseudo label learning | |
Gao et al. | Recognition and Detection of Greenhouse Tomatoes in Complex Environment. | |
CN114549970B (en) | Night small target fruit detection method and system integrating global fine granularity information | |
Ubbens et al. | Autocount: Unsupervised segmentation and counting of organs in field images | |
Li et al. | MTA-YOLACT: Multitask-aware network on fruit bunch identification for cherry tomato robotic harvesting | |
Li et al. | A novel approach for the 3D localization of branch picking points based on deep learning applied to longan harvesting UAVs | |
Huang et al. | A survey of deep learning-based object detection methods in crop counting | |
Zhang et al. | Multi-class detection of cherry tomatoes using improved Yolov4-tiny model | |
Yu et al. | A-pruning: a lightweight pineapple flower counting network based on filter pruning | |
Hu et al. | Automatic detection of pecan fruits based on Faster RCNN with FPN in orchard |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||