CN113554627A - Wheat head detection method based on computer vision semi-supervised pseudo label learning - Google Patents


Info

Publication number
CN113554627A
CN113554627A
Authority
CN
China
Prior art keywords: feature vector, head detection, wheat head, list, training
Prior art date
Legal status
Granted
Application number
CN202110849609.6A
Other languages
Chinese (zh)
Other versions
CN113554627B (en)
Inventor
钟必能
张子凯
郑耀宗
梁启花
李先贤
Current Assignee
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date
Filing date
Publication date
Application filed by Guangxi Normal University filed Critical Guangxi Normal University
Priority to CN202110849609.6A
Publication of CN113554627A
Application granted
Publication of CN113554627B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 7/0012 Biomedical image inspection (under G06T 7/00 Image analysis)
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10024 Color image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30004 Biomedical image processing


Abstract

The invention discloses a wheat head detection method based on computer-vision semi-supervised pseudo-label learning, characterized by comprising the following steps: training different wheat head detection models, and a semi-supervised pseudo-label learning strategy. The method recognizes wheat with high robustness and high accuracy in various scenes, and realizes intelligent wheat head recognition, thereby reducing labor and improving recognition efficiency.

Description

Wheat head detection method based on computer vision semi-supervised pseudo label learning
Technical Field
The invention relates to the technical field of artificial intelligence and intelligent agriculture, in particular to a wheat head detection method based on computer vision semi-supervised pseudo label learning.
Background
Wheat is a staple grain crop worldwide. With the advance of agricultural modernization in China, using modern artificial intelligence technology to raise wheat yields and assist agricultural production management has become an important open problem. Traditional farming relies entirely on manual labor, an extremely time- and effort-consuming process that cannot be sustained efficiently and without error; moreover, because the growth of crops cannot be monitored in real time, traditional agriculture also limits crop yields to some extent. In the course of agricultural modernization, automatically monitoring the growth of wheat with intelligent algorithms can help move traditional agriculture toward a new era of modernized planting and intelligent management. In the field of computer vision, although object detection algorithms have improved greatly after years of research by many scholars, they still face serious challenges in real scenes, such as regionally different wheat varieties, diverse traits and inconsistent growth cycles, so building a detection model that stays highly robust and accurate across scenes is essential.
Disclosure of Invention
The invention aims to provide a wheat head detection method based on computer-vision semi-supervised pseudo-label learning that addresses the defects of the prior art. The method recognizes wheat with high robustness and high accuracy in various scenes, and realizes intelligent wheat head recognition, thereby reducing labor and improving recognition efficiency.
The technical scheme for realizing the purpose of the invention is as follows:
a wheat head detection method based on computer vision semi-supervised pseudo label learning, the method comprising: training different wheat head detection models and semi-supervised pseudo label learning strategies, wherein,
1) training the different types of wheat head detection models is as follows: a first wheat head detection model and a second wheat head detection model, different from each other and both based on a supervised training mode, are trained. The data set used to train the wheat head detection models is the Global Wheat Head Detection data set, compiled mainly by nine research institutions from seven countries, including the University of Tokyo, the French national agricultural research institute INRAE, the French applied research institute Arvalis, ETH Zürich in Switzerland, the University of Saskatchewan in Canada, the University of Queensland in Australia, Nanjing Agricultural University in China and Rothamsted Research in the United Kingdom. The data set supports a general-purpose solution for wheat head detection and is used to estimate the number and size of wheat heads. Accurate wheat head detection in field images is very challenging: overlapping wheat heads and external factors such as wind can blur the pictures, and many unpredictable problems make individual wheat heads hard to identify; in addition, the appearance of wheat varies greatly with maturity, color, genotype, head orientation and planting density, and different wheat varieties are grown in different regions of the world. To give the wheat detection model better generalization across detection environments, 3,000 images from France, the United Kingdom, Switzerland and Canada in the Global Wheat Head Detection data set are selected as the training set, and 1,000 images of wheat heads of different varieties from other regions, namely Australia, Japan and China, are selected as the test set. The first wheat head detection model takes yolov5s as its reference model, and its training process is as follows:
1-1) training samples are selected in batches from the training portion of the Global Wheat Head Detection data set, with n pictures read at random per batch; the value of n may be chosen freely;
1-2) scaling the picture to 640 x 640 pixel size;
1-3) data augmentation is applied to all pictures in the batch, including color-space transformation, picture rotation, random translation, flipping, Mosaic and affine transformation;
1-4) the augmented pictures are input in batches into the yolov5s network model, which predicts the target categories and position information in the pictures, namely the target object category and the center-point coordinates, width and height of each predicted bounding box; the loss between the first wheat head detection model's predictions and the target labels of the input pictures is then computed by a loss function, and the computed loss value propagates gradients back through the network by the back-propagation algorithm to update the network parameters iteratively, so that the model learns to recognize the target object and progressively fits the distribution of the real data, the aim being to train the best-performing recognition model; the classification loss of the first wheat head detection model adopts the Focal loss function, shown in formula (1):
$$L_{cls} = \begin{cases} -\alpha\,(1-\hat{y})^{\gamma}\log(\hat{y}), & y = 1 \\ -(1-\alpha)\,\hat{y}^{\gamma}\log(1-\hat{y}), & y = 0 \end{cases} \quad (1)$$
where $L_{cls}$ denotes the classification loss function and α and γ are hyper-parameters of the loss: γ rescales the loss of easy and hard samples so that the loss function focuses more on hard samples, and α balances the imbalance between positive and negative samples; $\hat{y}$ is the classification prediction of the first wheat head detection model, taking values between 0 and 1.
The classification loss continuously optimizes the prediction of the target object's category, while the regression loss continuously optimizes the prediction of the target object's position coordinates, fitting the center-point coordinates, width and height of the predicted bounding box. The regression task, however, optimizes only positive samples and performs no iterative optimization on negative samples, since negative samples contain no target box. The first wheat head detection model adopts the GIoU loss as its regression loss function, shown in formula (2):
$$L_{loc} = 1 - IoU_{AB} + \frac{|C| - |A \cup B|}{|C|} \quad (2)$$
where $L_{loc}$ denotes the regression loss function, A the area of the predicted bounding box, B the area of the target's ground-truth box, C the area of the smallest rectangle that can enclose A and B, $IoU_{AB}$ the area intersection-over-union of the predicted box A and the ground-truth box, and $(|C| - |A \cup B|)/|C|$ the area of the minimum enclosing rectangle C minus the union of the areas of A and B, as a fraction of C;
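As a concrete illustration, the two loss functions of formulas (1) and (2) can be sketched in a few lines of Python. This is a minimal reference implementation, not the patent's own code, and the corner-coordinate box format (x1, y1, x2, y2) is an assumption made for simplicity (the model itself predicts center point, width and height):

```python
import numpy as np

def focal_loss(y_hat, y, alpha=0.25, gamma=2.0):
    """Formula (1): gamma down-weights easy samples, alpha balances pos/neg.
    y_hat is the predicted probability in (0, 1); y is the 0/1 label."""
    y_hat = np.clip(y_hat, 1e-7, 1.0 - 1e-7)          # avoid log(0)
    pos = -alpha * (1.0 - y_hat) ** gamma * np.log(y_hat)
    neg = -(1.0 - alpha) * y_hat ** gamma * np.log(1.0 - y_hat)
    return np.where(y == 1, pos, neg)

def giou_loss(a, b):
    """Formula (2): 1 - IoU_AB + (|C| - |A∪B|)/|C|, boxes as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union
    cw = max(a[2], b[2]) - min(a[0], b[0])            # smallest enclosing rectangle C
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c_area = cw * ch
    return 1.0 - iou + (c_area - union) / c_area
```

For identical boxes the GIoU loss is 0 and it grows toward 2 as the boxes separate, which keeps the gradient informative even when predicted and ground-truth boxes do not overlap.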
The second wheat head detection model takes EfficientDet as its reference model. Conventional multi-scale feature fusion methods, such as the feature pyramid network and the path aggregation network, simply add features together, yet different features have different resolutions and contribute differently to the fused feature; EfficientDet therefore introduces a weighted bidirectional feature pyramid network, which senses the importance of different features through learnable weights and performs repeated top-down and bottom-up multi-scale feature fusion. The second wheat head detection model comprises a feature extraction sub-network, a weighted bidirectional feature pyramid sub-network and a classification-regression sub-network; the mobile inverted bottleneck convolution sub-module and the bidirectional feature pyramid sub-module composing the network are defined as follows:
The input of the mobile inverted bottleneck convolution sub-module is a feature vector with channel dimension C. The feature vector is first expanded through a 1 × 1 convolution layer and passed sequentially through a batch normalization layer, a swish activation function, a 5 × 5 depthwise separable convolution layer, another batch normalization layer and another swish activation function; it is then split into two branches: the first branch comprises a global average pooling layer, a 1 × 1 convolution layer, a swish activation function, a 1 × 1 convolution layer and a sigmoid activation function, while the second branch comprises a 1 × 1 convolution layer for dimension reduction, a batch normalization layer and a dropout function; finally, the module's input vector and the output vector of the second branch are joined by a residual connection as the module's final output;
The inputs of the bidirectional feature pyramid sub-module are the output vectors of feature layers C3, C4 and C5 of the feature extraction network, plus the feature vectors C6 and C7 obtained by pooling C5 twice. First, the C7 feature vector is upsampled, added to the C6 feature vector and passed through a 3 × 3 convolution layer to give intermediate feature vector 1; intermediate feature vector 1 is upsampled, added to C5 and passed through a 3 × 3 convolution layer to give intermediate feature vector 2; intermediate feature vector 2 is upsampled, added to C4 and passed through a 3 × 3 convolution layer to give intermediate feature vector 3; intermediate feature vector 3 is upsampled, added to C3 and passed through a 3 × 3 convolution layer to give the P3 feature vector. P3 is then downsampled, added to the intermediate feature vector of the C4 level and passed through a 3 × 3 convolution layer to give P4; P4 is downsampled, added to the intermediate feature vector of the C5 level and passed through a 3 × 3 convolution layer to give P5; P5 is downsampled, added to the intermediate feature vector of the C6 level and passed through a 3 × 3 convolution layer to give P6; P6 is downsampled, added to the C7 feature vector and passed through a 3 × 3 convolution layer to give P7. The feature vectors P3, P4, P5, P6 and P7 are the output vectors of the bidirectional feature pyramid sub-module;
The feature extraction sub-network comprises a 3 × 3 convolution layer and 16 sequentially connected mobile inverted bottleneck convolution sub-modules; the weighted bidirectional feature pyramid sub-network comprises 3 bidirectional feature pyramid sub-modules; the classification-regression sub-network comprises 2 branches, each containing 2 shared 1 × 1 convolution layers;
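The learnable-weight fusion that the weighted bidirectional feature pyramid sub-network applies at each merge point can be sketched as follows. This is the fast normalized fusion form from the EfficientDet paper rather than anything stated in this text, and the function name and NumPy arrays standing in for feature tensors are illustrative:

```python
import numpy as np

def weighted_fusion(features, weights, eps=1e-4):
    """Merge equally shaped feature maps with learnable non-negative weights:
    out = sum_i (w_i / (eps + sum_j w_j)) * F_i."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps weights >= 0
    w = w / (eps + w.sum())                                # normalize without softmax
    return sum(wi * f for wi, f in zip(w, features))
```

Each addition described above (for example, upsampled C7 plus C6) would use such a weighted sum before its 3 × 3 convolution, letting the network learn how much each resolution contributes.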
The training process of the second wheat head detection model is similar to that of the first: pictures are read continuously in batches and the model is trained iteratively to recognize the target object and progressively fit the distribution of the real data; the difference is that the regression loss of the second wheat head detection model adopts the Smooth L1 loss function, shown in formula (3):
$$\mathrm{SmoothL1}(x) = \begin{cases} 0.5\,x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}, \quad x = \hat{y} - y \quad (3)$$
where SmoothL1 denotes the regression loss function, $\hat{y}$ the predicted bounding-box values, namely the center-point coordinates, width and height, y the annotated ground-truth box values, and $x = \hat{y} - y$ the difference between the model's prediction and the ground truth;
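Formula (3) is simple enough to state directly in code; a minimal sketch (scalar inputs for clarity, applied per box coordinate in practice):

```python
def smooth_l1(y_hat, y):
    """Formula (3): quadratic for |x| < 1, linear beyond, with x = y_hat - y.
    The quadratic zone gives small, stable gradients near the target; the
    linear zone limits the influence of outlier boxes."""
    x = abs(y_hat - y)
    return 0.5 * x * x if x < 1.0 else x - 0.5
```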
2) the semi-supervised pseudo-label learning strategy is as follows: the first and second wheat head detection models first predict on unlabeled data to obtain multiple model prediction boxes; a weighted box fusion method then fuses the prediction boxes with higher confidence, and the result is used as pseudo-label data; a new wheat head detection model is then retrained on the original training data set together with the pseudo-label data set, and these steps are iterated until the test performance of the wheat head detection model no longer improves; the stopping condition for pseudo-label training is set to the point at which the model's average precision improves by less than 0.2%. The weighted box fusion method comprises the following steps:
1-2) first, a list B is created to store the prediction bounding boxes of the 2 supervised models, and the prediction boxes are sorted in descending order of the models' predicted confidence;
2-2) a list L and a list F are created; list L is multi-dimensional, each position storing 1 or more prediction boxes and called a cluster; all prediction boxes in each cluster are fused by weighting, and the resulting bounding box is stored in list F, i.e., list F stores the fusion boxes of the 2 supervised models;
3-2) the prediction bounding boxes in list B are traversed and "clustered" against the fusion boxes stored in list F; the clustering rule computes the IoU between the current prediction box and a fusion box, and clustering is deemed successful if their IoU exceeds a specified threshold; this technical scheme sets the IoU threshold to 0.55;
4-2) in step 3-2), if clustering fails, i.e., no fusion box exceeding the specified IoU threshold is found, a new cluster is created for the current prediction box and appended to the ends of list L and list F; if clustering succeeds, i.e., such a fusion box is found, the current prediction box is added to list L at the index of its matching fusion box in list F, and after the addition the fusion box at the corresponding position in list F is updated from all prediction boxes in the cluster;
5-2) in step 4-2), after the current prediction box has been added to its cluster, the confidence and coordinates of the fusion box in list F are updated from all bounding boxes in the corresponding cluster of L; assuming a cluster holds T bounding boxes, the fused confidence and coordinates are computed as in formulas (4), (5) and (6):
$$C = \frac{1}{T}\sum_{i=1}^{T} C_i \quad (4)$$
$$X = \frac{\sum_{i=1}^{T} C_i X_i}{\sum_{i=1}^{T} C_i} \quad (5)$$
$$Y = \frac{\sum_{i=1}^{T} C_i Y_i}{\sum_{i=1}^{T} C_i} \quad (6)$$
where $C_i$ denotes the confidence of the i-th prediction box in the cluster and $X_i$, $Y_i$ its bounding-box coordinates; the fusion box's coordinates are obtained by multiplying each prediction box's confidence by its coordinate values, accumulating, and dividing by the sum of the confidences, so bounding boxes with higher confidence are given greater weight;
6-2) after list B has been traversed, the confidence of each fusion box in list F is readjusted, because a cluster containing too few boxes means only a few models predicted that box, and its confidence should therefore be reduced; the adjustment is given in formula (7):
$$C = C \times \frac{\min(T, N)}{N} \quad (7)$$
where T is the number of boxes in the cluster and N is the number of detection models (here N = 2).
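Steps 1-2) through 6-2) and formulas (4) to (7) can be sketched end to end as follows. This is an illustrative single-class implementation under assumed conventions (boxes as (confidence, x1, y1, x2, y2) tuples, N = 2 models), not the patent's own code:

```python
def weighted_boxes_fusion(boxes, iou_thr=0.55, num_models=2):
    """Fuse overlapping predictions pooled from several models.
    boxes: iterable of (conf, x1, y1, x2, y2) predictions."""
    def iou(a, b):
        iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        ih = max(0.0, min(a[4], b[4]) - max(a[2], b[2]))
        inter = iw * ih
        union = ((a[3] - a[1]) * (a[4] - a[2])
                 + (b[3] - b[1]) * (b[4] - b[2]) - inter)
        return inter / union if union > 0 else 0.0

    B = sorted(boxes, key=lambda b: b[0], reverse=True)  # step 1-2): sort by confidence
    L, F = [], []                                        # step 2-2): clusters / fused boxes
    for box in B:
        for i, fused in enumerate(F):                    # step 3-2): try to "cluster"
            if iou(box, fused) > iou_thr:
                L[i].append(box)                         # step 4-2): join the cluster
                break
        else:                                            # no match: open a new cluster
            L.append([box])
            F.append(box)
            continue
        cluster = L[i]                                   # step 5-2): refresh the fused box
        csum = sum(b[0] for b in cluster)
        F[i] = tuple([csum / len(cluster)]               # formula (4): mean confidence
                     + [sum(b[0] * b[k] for b in cluster) / csum  # formulas (5), (6)
                        for k in range(1, 5)])
    # step 6-2) / formula (7): damp boxes supported by few of the N models
    return [(c * min(len(cl), num_models) / num_models, x1, y1, x2, y2)
            for (c, x1, y1, x2, y2), cl in zip(F, L)]
```

A box predicted by only one of the two models thus keeps its weighted-average coordinates but has its confidence halved, which is what filters weak pseudo-labels before retraining.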
Through this semi-supervised pseudo-label learning approach, the trained model gains the notable advantages of adapting to wheat head detection across regional varieties and of good generalization.
In real scenes, wheat head recognition is easily affected by factors such as illumination changes, occlusion by crops and other plants, complex backgrounds and varietal differences, so existing wheat head detection methods cannot recognize wheat heads with high precision and high robustness.
This technical scheme uses a computer-vision semi-supervised pseudo-label technique, can adapt to wheat head detection across regions and varieties, fits actual detection scenes in real life more closely, alleviates the challenges of small-target detection, occlusion, complex backgrounds, lighting changes and detection of different wheat varieties, markedly improves the robustness and recognition precision of the wheat head detector, and helps advance agricultural modernization.
The method recognizes wheat with high robustness and high accuracy in various scenes, and realizes intelligent wheat head recognition, thereby reducing labor and improving recognition efficiency.
Drawings
FIG. 1 is a schematic diagram of an EfficientNet network in an embodiment;
FIG. 2 is a diagram of the structure of the mobile inverted bottleneck convolution sub-module in an embodiment;
FIG. 3 is a schematic diagram of a bi-directional feature pyramid sub-module in an embodiment;
FIG. 4 is a schematic flow chart of an exemplary method.
Detailed Description
The invention will be further elucidated with reference to the drawings and examples, without however being limited thereto.
Example:
referring to fig. 4, a wheat head detection method based on computer vision semi-supervised pseudo label learning, the method comprising: training different wheat head detection models and semi-supervised pseudo label learning strategies, wherein,
1) training the different types of wheat head detection models is as follows: a first wheat head detection model and a second wheat head detection model, different from each other and both based on a supervised training mode, are trained. The data set used to train the wheat head detection models is the Global Wheat Head Detection data set, compiled mainly by nine research institutions from seven countries, including the University of Tokyo, the French national agricultural research institute INRAE, the French applied research institute Arvalis, ETH Zürich in Switzerland, the University of Saskatchewan in Canada, the University of Queensland in Australia, Nanjing Agricultural University in China and Rothamsted Research in the United Kingdom. The data set supports a general-purpose solution for wheat head detection and is used to estimate the number and size of wheat heads. Accurate wheat head detection in field images is very challenging: overlapping wheat heads and external factors such as wind can blur the pictures, and many unpredictable problems make individual wheat heads hard to identify; in addition, the appearance of wheat varies greatly with maturity, color, genotype, head orientation and planting density, and different wheat varieties are grown in different regions of the world. To give the wheat detection model better generalization across detection environments, 3,000 images from France, the United Kingdom, Switzerland and Canada in the Global Wheat Head Detection data set are selected as the training set, and 1,000 images of wheat heads of different varieties from other regions, namely Australia, Japan and China, are selected as the test set. The first wheat head detection model takes yolov5s as its reference model, and its training process is as follows:
1-1) training samples are selected in batches from the training portion of the Global Wheat Head Detection data set, with n pictures read at random per batch; the value of n may be chosen freely, and in this example n = 32 pictures are input to the model for training;
1-2) scaling the picture to 640 x 640 pixel size;
1-3) data augmentation is applied to all pictures in the batch, including color-space transformation, picture rotation, random translation, flipping, Mosaic and affine transformation;
1-4) the augmented pictures are input in batches into the yolov5s network model, which predicts the target categories and position information in the pictures, namely the target object category and the center-point coordinates, width and height of each predicted bounding box; the loss between the first wheat head detection model's predictions and the target labels of the input pictures is then computed by a loss function, and the computed loss value propagates gradients back through the network by the back-propagation algorithm to update the network parameters iteratively, so that the model learns to recognize the target object and progressively fits the distribution of the real data, the aim being to train the best-performing recognition model; the classification loss of the first wheat head detection model adopts the Focal loss function, shown in formula (1):
$$L_{cls} = \begin{cases} -\alpha\,(1-\hat{y})^{\gamma}\log(\hat{y}), & y = 1 \\ -(1-\alpha)\,\hat{y}^{\gamma}\log(1-\hat{y}), & y = 0 \end{cases} \quad (1)$$
where $L_{cls}$ denotes the classification loss function and α and γ are hyper-parameters of the loss: γ rescales the loss of easy and hard samples so that the loss function focuses more on hard samples, and α balances the imbalance between positive and negative samples; $\hat{y}$ is the classification prediction of the first wheat head detection model, taking values between 0 and 1.
The classification loss continuously optimizes the prediction of the target object's category, while the regression loss continuously optimizes the prediction of the target object's position coordinates, fitting the center-point coordinates, width and height of the predicted bounding box. The regression task, however, optimizes only positive samples and performs no iterative optimization on negative samples, since negative samples contain no target box. The first wheat head detection model adopts the GIoU loss as its regression loss function, shown in formula (2):
$$L_{loc} = 1 - IoU_{AB} + \frac{|C| - |A \cup B|}{|C|} \quad (2)$$
where $L_{loc}$ denotes the regression loss function, A the area of the predicted bounding box, B the area of the target's ground-truth box, C the area of the smallest rectangle that can enclose A and B, $IoU_{AB}$ the area intersection-over-union of the predicted box A and the ground-truth box, and $(|C| - |A \cup B|)/|C|$ the area of the minimum enclosing rectangle C minus the union of the areas of A and B, as a fraction of C;
The second wheat head detection model takes EfficientDet as its reference model, as shown in fig. 1. Conventional multi-scale feature fusion methods, such as the feature pyramid network and the path aggregation network, simply add features together, yet different features have different resolutions and contribute differently to the fused feature; EfficientDet therefore introduces a weighted bidirectional feature pyramid network, which senses the importance of different features through learnable weights and performs repeated top-down and bottom-up multi-scale feature fusion. The second wheat head detection model comprises a feature extraction sub-network, a weighted bidirectional feature pyramid sub-network and a classification-regression sub-network; the mobile inverted bottleneck convolution sub-module and the bidirectional feature pyramid sub-module composing the network are shown in fig. 2 and fig. 3 respectively and are defined as follows:
The input of the mobile inverted bottleneck convolution sub-module is a feature vector with channel dimension C. The feature vector is first expanded through a 1 × 1 convolution layer and passed sequentially through a batch normalization layer, a swish activation function, a 5 × 5 depthwise separable convolution layer, another batch normalization layer and another swish activation function; it is then split into two branches: the first branch comprises a global average pooling layer, a 1 × 1 convolution layer, a swish activation function, a 1 × 1 convolution layer and a sigmoid activation function, while the second branch comprises a 1 × 1 convolution layer for dimension reduction, a batch normalization layer and a dropout function; finally, the module's input vector and the output vector of the second branch are joined by a residual connection as the module's final output;
The inputs of the bidirectional feature pyramid sub-module are the output vectors of the C3, C4 and C5 feature layers of the feature extraction network, together with the C6 and C7 feature vectors obtained by pooling the C5 feature vector twice. First, the C7 feature vector is up-sampled, added to the C6 feature vector and passed through a 3 × 3 convolution layer to obtain intermediate feature vector 1; intermediate feature vector 1 is up-sampled, added to the C5 feature vector and passed through a 3 × 3 convolution layer to obtain intermediate feature vector 2; intermediate feature vector 2 is up-sampled, added to the C4 feature vector and passed through a 3 × 3 convolution layer to obtain intermediate feature vector 3; intermediate feature vector 3 is up-sampled, added to the C3 feature vector and passed through a 3 × 3 convolution layer to obtain the P3 feature vector. The P3 feature vector is then down-sampled, added to the intermediate feature vector of the C4 level and passed through a 3 × 3 convolution layer to obtain the P4 feature vector; the P4 feature vector is down-sampled, added to the intermediate feature vector of the C5 level and passed through a 3 × 3 convolution layer to obtain the P5 feature vector; the P5 feature vector is down-sampled, added to the intermediate feature vector of the C6 level and passed through a 3 × 3 convolution layer to obtain the P6 feature vector; the P6 feature vector is down-sampled, added to the C7 feature vector and passed through a 3 × 3 convolution layer to obtain the P7 feature vector. The P3, P4, P5, P6 and P7 feature vectors are the output vectors of the bidirectional feature pyramid sub-module;
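The fusion order above can be sketched with 1-D stand-ins: nearest-neighbour repetition for up-sampling, stride-2 slicing for down-sampling, and an identity function in place of the 3 × 3 convolutions. These are simplifications for illustration, not the patent's actual layers:

```python
import numpy as np

def upsample(v):    # nearest-neighbour 2x up-sampling stand-in
    return np.repeat(v, 2)

def downsample(v):  # stride-2 pooling stand-in
    return v[::2]

def bifpn(C3, C4, C5, C6, C7, conv=lambda v: v):
    """Fusion order of the bidirectional feature pyramid sub-module:
    a top-down pass producing intermediate vectors 1-3, then a bottom-up
    pass producing P3..P7. `conv` stands in for the 3x3 conv layers."""
    m1 = conv(upsample(C7) + C6)    # intermediate feature vector 1
    m2 = conv(upsample(m1) + C5)    # intermediate feature vector 2
    m3 = conv(upsample(m2) + C4)    # intermediate feature vector 3
    P3 = conv(upsample(m3) + C3)
    P4 = conv(downsample(P3) + m3)  # add C4-level intermediate vector
    P5 = conv(downsample(P4) + m2)  # add C5-level intermediate vector
    P6 = conv(downsample(P5) + m1)  # add C6-level intermediate vector
    P7 = conv(downsample(P6) + C7)
    return P3, P4, P5, P6, P7

# 1-D toy features whose lengths halve at each level: C3 has 16 elements.
feats = [np.ones(16 // 2 ** i) for i in range(5)]   # C3..C7
P3, P4, P5, P6, P7 = bifpn(*feats)
```

The shapes confirm that every addition in both passes pairs vectors of the same resolution, which is exactly why the up- and down-sampling steps are interleaved as described.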
The feature extraction sub-network comprises a 3 × 3 convolution layer and 16 sequentially connected mobile inverted bottleneck convolution (MBConv) sub-modules; the weighted bidirectional feature pyramid sub-network comprises 3 bidirectional feature pyramid sub-modules; the classification regression sub-network comprises 2 branches, each branch containing two shared 1 × 1 convolution layers;
The training process of the second wheat head detection model is similar to that of the first: pictures are read continuously in batches and the model is trained iteratively, learning to identify the target object and progressively fitting the distribution of the real data. The difference is that the regression loss function of the second wheat head detection model is the Smooth L1 loss, as shown in formula (3):
$$\mathrm{Smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise}\end{cases}\qquad(3)$$

wherein $\mathrm{Smooth}_{L1}$ represents the regression loss function, $\hat{y}$ is the predicted value of the bounding box (center point coordinates, width and height), $y$ represents the labeled coordinates of the real target box, and $x=\hat{y}-y$ represents the difference between the predicted value and the true value of the model;
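The piecewise behavior of the Smooth L1 loss can be sketched directly from the definition above (a minimal NumPy version using the standard 0.5 and 1.0 constants):

```python
import numpy as np

def smooth_l1(y_pred, y_true):
    """Smooth L1 regression loss: quadratic for small errors (|x| < 1),
    linear for large errors, which damps the gradient contributed by
    outlier boxes compared with a plain L2 loss."""
    x = np.abs(y_pred - y_true)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

# Small errors are penalized quadratically, large errors only linearly.
loss = smooth_l1(np.array([0.2, 3.0]), np.array([0.0, 0.0]))
```

The linear tail is the reason this loss is less sensitive than L2 to badly mispredicted boxes early in training.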
2) The semi-supervised pseudo label learning strategy is as follows: first, the first and second wheat head detection models predict on the unlabeled data to obtain a number of model prediction boxes; next, a weighted box fusion method fuses the prediction boxes with higher confidence into pseudo label data; then a new wheat head detection model is retrained on the original training data set together with the pseudo label data set. These steps are repeated iteratively until the test performance of the wheat head detection model no longer improves; in this example, pseudo label training stops once the improvement in the model's mean average precision falls below 0.2%. The weighted box fusion method comprises the following steps:
1-2) First create a list B, which stores the prediction bounding boxes of the 2 models trained in the supervised mode, and sort the prediction boxes in descending order of the confidence predicted by the models;
2-2) Create a list L and a list F. The list L is a multi-dimensional list in which each position stores 1 or more prediction boxes; these sub-lists are called clusters. All the prediction boxes in each cluster are fused by weighting, and the bounding box obtained by fusion is stored in the list F; that is, the list F stores the fusion boxes of the 2 supervised models;
3-2) Traverse the prediction bounding boxes in the list B and perform "clustering" against the fusion boxes stored in the list F. The "clustering" rule is to compute the IoU between the current prediction box and each fusion box; if the IoU of the two boxes exceeds a specified threshold, "clustering" is considered successful. In this example the IoU threshold is set to 0.55;
4-2) In step 3-2), if "clustering" is unsuccessful, i.e. no fusion box exceeding the specified IoU threshold is found, a new cluster is created for the current prediction box and appended to the ends of the list L and the list F. If "clustering" succeeds, i.e. a fusion box exceeding the specified IoU threshold is found, the current prediction box is added to the list L at the index of the matched fusion box in the list F; after it is added to the list L, the fusion box at the corresponding position in the list F is updated according to all the prediction boxes in the cluster;
5-2) In step 4-2), after the current prediction box has been added to its cluster, the confidence and coordinates of the corresponding fusion box in the list F are updated from all the bounding boxes in that cluster of L. Assuming the cluster contains T bounding boxes, the confidence and coordinates of the fusion box in the list F are calculated as shown in formula (4), formula (5) and formula (6):
$$C=\frac{1}{T}\sum_{i=1}^{T}C_{i}\qquad(4)$$

$$X=\frac{\sum_{i=1}^{T}C_{i}\cdot X_{i}}{\sum_{i=1}^{T}C_{i}}\qquad(5)$$

$$Y=\frac{\sum_{i=1}^{T}C_{i}\cdot Y_{i}}{\sum_{i=1}^{T}C_{i}}\qquad(6)$$

wherein $C$ represents the confidence of a model prediction box and $X, Y$ represent the coordinates of a prediction bounding box. The coordinates of the fusion box are obtained by multiplying each prediction box's confidence by its coordinate values, summing, and dividing by the sum of the confidences, so that a bounding box with larger confidence is assigned a larger weight;
6-2) After traversing the list B, the confidence of each fusion box in the list F is readjusted, because if a cluster contains too few boxes it means that only a few models predicted that box, and the confidence of the corresponding box must therefore be reduced. The adjustment is shown in formula (7):

$$C=C\cdot\frac{\min(T,N)}{N}\qquad(7)$$

wherein $T$ is the number of boxes in the cluster and $N$ is the number of models (here 2).
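Steps 1-2) to 6-2) together form a weighted box fusion pass. A minimal sketch under stated assumptions: boxes are (confidence, x1, y1, x2, y2) tuples, a single greedy pass is used, and the final rescaling uses the common weighted-boxes-fusion form C · min(T, N)/N with N = 2 models (the exact formula (7) image is not reproduced in the text, so that form is an assumption):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def fuse(cluster):
    """Formulas (4)-(6): mean confidence, confidence-weighted coords."""
    confs = [c for c, *_ in cluster]
    coords = [sum(c * box[k] for c, *box in cluster) / sum(confs)
              for k in range(4)]
    return (sum(confs) / len(confs), *coords)

def weighted_box_fusion(boxes, iou_thr=0.55, num_models=2):
    """Steps 1-2) to 6-2): sort by confidence (list B), greedily match
    each box against the current fusion boxes (lists L and F), re-fuse
    after every addition, then rescale confidence by min(T, N) / N."""
    B = sorted(boxes, key=lambda b: -b[0])
    L, F = [], []                  # clusters and their fusion boxes
    for box in B:
        for i, fbox in enumerate(F):
            if iou(box[1:], fbox[1:]) > iou_thr:   # "clustering" success
                L[i].append(box)
                F[i] = fuse(L[i])
                break
        else:                      # no match: start a new cluster
            L.append([box])
            F.append(box)
    out = []
    for cluster, (c, *xy) in zip(L, F):
        T = len(cluster)
        out.append((c * min(T, num_models) / num_models, *xy))
    return out

# Two overlapping boxes from the two models fuse into one box; a stray
# box keeps its own cluster and is down-weighted by the rescaling step.
fused = weighted_box_fusion([
    (0.9, 0, 0, 10, 10),
    (0.6, 1, 0, 11, 10),
    (0.5, 50, 50, 60, 60),
])
```

In this toy run the two overlapping boxes fuse to confidence 0.75 with coordinates pulled toward the 0.9-confidence box, while the isolated box is halved to 0.25 because only one of the two models supports it.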
The model trained in this embodiment with the semi-supervised pseudo label learning strategy has the notable advantages of adapting to wheat head detection across varieties from different regions and of good generalization.
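The iterative strategy of step 2) — predict, fuse pseudo labels, retrain, and stop once the mAP gain falls below 0.2% — can be sketched with hypothetical training and evaluation callbacks (both stand-ins; the 0.2% threshold is read here as an absolute mAP gain):

```python
def pseudo_label_training(train_fn, eval_fn, min_gain=0.002):
    """Iterate pseudo-label retraining until mean average precision
    improves by less than min_gain (0.2%, the stopping rule above).
    `train_fn` retrains on original + pseudo-labelled data and
    `eval_fn` returns the current test mAP; both are hypothetical
    stand-ins for the models and data of the embodiment."""
    best_map = eval_fn()
    rounds = 0
    while True:
        train_fn()                 # retrain with fresh pseudo labels
        new_map = eval_fn()
        rounds += 1
        if new_map - best_map < min_gain:   # gain below 0.2%: stop
            return max(best_map, new_map), rounds
        best_map = new_map

# Simulated mAP trajectory: training stops on the 0.1% improvement.
scores = iter([0.50, 0.52, 0.53, 0.531])
final_map, rounds = pseudo_label_training(lambda: None,
                                          lambda: next(scores))
```

With the simulated trajectory above, rounds 1 and 2 clear the threshold and round 3 (a 0.001 gain) triggers the stop.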

Claims (1)

1. A wheat head detection method based on computer vision semi-supervised pseudo label learning is characterized by comprising the following steps: training different wheat head detection models and semi-supervised pseudo label learning strategies, wherein,
1) Training different types of wheat head detection models: a first wheat head detection model and a second wheat head detection model of different types are trained in a supervised mode. The data set used to train the wheat head detection models is the global wheat head detection data set, which is used to estimate the number and size of wheat heads. From it, 3000 images from France, Britain and Switzerland in Europe and from Canada in North America are selected as the training data set, and 1000 wheat head images of different varieties from Australia, Japan and China are selected as the test data set. The first wheat head detection model takes yolov5s as its reference model, and its training process is as follows:
1-1) Training samples are selected in batches from the training set part of the global wheat head detection data set; in each batch, n pictures are read at random, the value of n being chosen randomly;
1-2) scaling the picture to 640 x 640 pixel size;
1-3) Data augmentation is performed on all pictures in the batch, including color space transformation, picture rotation, random translation, flipping, Mosaic and affine transformation;
1-4) The pictures after data augmentation are input in batches into the yolov5s network model, which predicts the target categories and position information in the pictures, including the target object category and the center point coordinates, width and height of the predicted bounding boxes. The loss between the predictions of the first wheat head detection model and the target labels of the input pictures is then calculated by the loss function; the gradient of the loss value is propagated back through the network by the back-propagation algorithm, and the network parameters are updated iteratively. The classification loss function of the first wheat head detection model is the Focal loss, as shown in formula (1):
$$L_{cls}=-\alpha(1-\hat{y})^{\gamma}\log(\hat{y})\qquad(1)$$

wherein $L_{cls}$ represents the classification loss function, $\alpha$ and $\gamma$ are hyper-parameters of the loss function, and $\hat{y}$ is the classification prediction value of the first wheat head detection model, between 0 and 1;
the first wheat head detection model uses the GIoU loss function as the regression loss function, as shown in equation (2):
$$L_{loc}=1-IoU_{AB}+\frac{\left|C\setminus(A\cup B)\right|}{\left|C\right|}\qquad(2)$$

wherein $L_{loc}$ represents the regression loss function, $A$ represents the area of the predicted bounding box, $B$ represents the area of the real labeled box of the target, $C$ represents the area of the smallest rectangle that can enclose $A$ and $B$, $IoU_{AB}$ represents the intersection over union of the areas of the predicted bounding box $A$ and the real labeled box of the target, and $C\setminus(A\cup B)$ represents the area of the minimum enclosing rectangle $C$ minus the union of the areas of $A$ and $B$;
the second wheat head detection model takes EfficientDet as a reference model, comprises a feature extraction sub-network, a weighted bidirectional feature pyramid sub-network and a classification regression sub-network, and is defined as follows:
The input of the mobile inverted bottleneck convolution sub-module is a feature vector with channel dimension C. The feature vector is first expanded in dimension by a 1 × 1 convolution layer, then passes sequentially through a batch normalization layer, a swish activation function, a 5 × 5 depthwise separable convolution layer, a batch normalization layer and a swish activation function. The result is then split into two branches: the first branch contains a global average pooling layer, a 1 × 1 convolution layer, a swish activation function, a 1 × 1 convolution layer and a sigmoid activation function; the second branch contains a 1 × 1 convolution layer for dimension reduction, a batch normalization layer and a dropout function. Finally, the input vector of the module and the output vector of the second branch are joined by a residual connection to form the final output of the module;
The inputs of the bidirectional feature pyramid sub-module are the output vectors of the C3, C4 and C5 feature layers of the feature extraction network, together with the C6 and C7 feature vectors obtained by pooling the C5 feature vector twice. First, the C7 feature vector is up-sampled, added to the C6 feature vector and passed through a 3 × 3 convolution layer to obtain intermediate feature vector 1; intermediate feature vector 1 is up-sampled, added to the C5 feature vector and passed through a 3 × 3 convolution layer to obtain intermediate feature vector 2; intermediate feature vector 2 is up-sampled, added to the C4 feature vector and passed through a 3 × 3 convolution layer to obtain intermediate feature vector 3; intermediate feature vector 3 is up-sampled, added to the C3 feature vector and passed through a 3 × 3 convolution layer to obtain the P3 feature vector. The P3 feature vector is then down-sampled, added to the intermediate feature vector of the C4 level and passed through a 3 × 3 convolution layer to obtain the P4 feature vector; the P4 feature vector is down-sampled, added to the intermediate feature vector of the C5 level and passed through a 3 × 3 convolution layer to obtain the P5 feature vector; the P5 feature vector is down-sampled, added to the intermediate feature vector of the C6 level and passed through a 3 × 3 convolution layer to obtain the P6 feature vector; the P6 feature vector is down-sampled, added to the C7 feature vector and passed through a 3 × 3 convolution layer to obtain the P7 feature vector. The P3, P4, P5, P6 and P7 feature vectors are the output vectors of the bidirectional feature pyramid sub-module;
The feature extraction sub-network comprises a 3 × 3 convolution layer and 16 sequentially connected mobile inverted bottleneck convolution (MBConv) sub-modules; the weighted bidirectional feature pyramid sub-network comprises 3 bidirectional feature pyramid sub-modules; the classification regression sub-network comprises 2 branches, each branch containing two shared 1 × 1 convolution layers;
The training process of the second wheat head detection model is similar to that of the first wheat head detection model, except that the regression loss function of the second wheat head detection model is the Smooth L1 loss, as shown in formula (3):
$$\mathrm{Smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise}\end{cases}\qquad(3)$$

wherein $\mathrm{Smooth}_{L1}$ represents the regression loss function, $\hat{y}$ is the predicted value of the bounding box (center point coordinates, width and height), $y$ represents the labeled coordinates of the real target box, and $x=\hat{y}-y$ represents the difference between the predicted value and the true value of the model;
2) The semi-supervised pseudo label learning strategy is as follows: first, the first and second wheat head detection models predict on the unlabeled data to obtain a number of model prediction boxes; next, a weighted box fusion method fuses the prediction boxes with higher confidence into pseudo label data; then a new wheat head detection model is retrained on the original training data set together with the pseudo label data set. These steps are repeated iteratively until the test performance of the wheat head detection model no longer improves, the stopping condition of pseudo label training being that the improvement in the model's mean average precision falls below 0.2%. The weighted box fusion method comprises the following steps:
1-2) First create a list B, which stores the prediction bounding boxes of the 2 models trained in the supervised mode, and sort the prediction boxes in descending order of the confidence predicted by the models;
2-2) Create a list L and a list F. The list L is a multi-dimensional list in which each position stores 1 or more prediction boxes; these sub-lists are called clusters. All the prediction boxes in each cluster are fused by weighting, and the bounding box obtained by fusion is stored in the list F; that is, the list F stores the fusion boxes of the 2 supervised models;
3-2) Traverse the prediction bounding boxes in the list B and perform "clustering" against the fusion boxes stored in the list F; the "clustering" rule is to compute the IoU between the current prediction box and each fusion box, and if the IoU of the two boxes exceeds a specified threshold, "clustering" is considered successful;
4-2) In step 3-2), if "clustering" is unsuccessful, i.e. no fusion box exceeding the specified IoU threshold is found, a new cluster is created for the current prediction box and appended to the ends of the list L and the list F. If "clustering" succeeds, i.e. a fusion box exceeding the specified IoU threshold is found, the current prediction box is added to the list L at the index of the matched fusion box in the list F; after it is added to the list L, the fusion box at the corresponding position in the list F is updated according to all the prediction boxes in the cluster;
5-2) In step 4-2), after the current prediction box has been added to its cluster, the confidence and coordinates of the corresponding fusion box in the list F are updated from all the bounding boxes in that cluster of L. Assuming the cluster contains T bounding boxes, the confidence and coordinates of the fusion box in the list F are calculated as shown in formula (4), formula (5) and formula (6):
$$C=\frac{1}{T}\sum_{i=1}^{T}C_{i}\qquad(4)$$

$$X=\frac{\sum_{i=1}^{T}C_{i}\cdot X_{i}}{\sum_{i=1}^{T}C_{i}}\qquad(5)$$

$$Y=\frac{\sum_{i=1}^{T}C_{i}\cdot Y_{i}}{\sum_{i=1}^{T}C_{i}}\qquad(6)$$

wherein $C$ represents the confidence of a model prediction box and $X, Y$ represent the coordinates of a prediction bounding box; the coordinates of the fusion box are obtained by multiplying each prediction box's confidence by its coordinate values, summing, and dividing by the sum of the confidences;
6-2) After traversing the list B, the confidence of each fusion box in the list F is readjusted; the adjustment is shown in formula (7):

$$C=C\cdot\frac{\min(T,N)}{N}\qquad(7)$$

wherein $T$ is the number of boxes in the cluster and $N$ is the number of models.
CN202110849609.6A 2021-07-27 2021-07-27 Wheat head detection method based on computer vision semi-supervised pseudo label learning Active CN113554627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110849609.6A CN113554627B (en) 2021-07-27 2021-07-27 Wheat head detection method based on computer vision semi-supervised pseudo label learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110849609.6A CN113554627B (en) 2021-07-27 2021-07-27 Wheat head detection method based on computer vision semi-supervised pseudo label learning

Publications (2)

Publication Number Publication Date
CN113554627A true CN113554627A (en) 2021-10-26
CN113554627B CN113554627B (en) 2022-04-29

Family

ID=78132902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110849609.6A Active CN113554627B (en) 2021-07-27 2021-07-27 Wheat head detection method based on computer vision semi-supervised pseudo label learning

Country Status (1)

Country Link
CN (1) CN113554627B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082757A (en) * 2022-07-13 2022-09-20 北京百度网讯科技有限公司 Pseudo label generation method, target detection model training method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451616A (en) * 2017-08-01 2017-12-08 西安电子科技大学 Multi-spectral remote sensing image terrain classification method based on the semi-supervised transfer learning of depth
CN107644235A (en) * 2017-10-24 2018-01-30 广西师范大学 Image automatic annotation method based on semi-supervised learning
CN112149733A (en) * 2020-09-23 2020-12-29 北京金山云网络技术有限公司 Model training method, model training device, quality determining method, quality determining device, electronic equipment and storage medium
CN112232416A (en) * 2020-10-16 2021-01-15 浙江大学 Semi-supervised learning method based on pseudo label weighting
CN112488006A (en) * 2020-12-05 2021-03-12 东南大学 Target detection algorithm based on wheat image
CN113128476A (en) * 2021-05-17 2021-07-16 广西师范大学 Low-power consumption real-time helmet detection method based on computer vision target detection
CN113158865A (en) * 2021-04-14 2021-07-23 杭州电子科技大学 Wheat ear detection method based on EfficientDet

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHAN CHUN ET AL.: "Semi-supervised single-sample deep person re-identification method", COMPUTER SYSTEMS & APPLICATIONS *
MA LEI ET AL.: "Semi-supervised regression based on co-training of support vector machines", COMPUTER ENGINEERING AND APPLICATIONS *

Also Published As

Publication number Publication date
CN113554627B (en) 2022-04-29

Similar Documents

Publication Publication Date Title
Li et al. A high-precision detection method of hydroponic lettuce seedlings status based on improved Faster RCNN
Fu et al. Fast and accurate detection of kiwifruit in orchard using improved YOLOv3-tiny model
Xiong et al. Visual detection of green mangoes by an unmanned aerial vehicle in orchards based on a deep learning method
Chen et al. Detecting citrus in orchard environment by using improved YOLOv4
CN113076871B (en) Fish shoal automatic detection method based on target shielding compensation
Ruiz-Ruiz et al. Testing different color spaces based on hue for the environmentally adaptive segmentation algorithm (EASA)
Jabir et al. Accuracy and efficiency comparison of object detection open-source models.
CN111340141A (en) Crop seedling and weed detection method and system based on deep learning
CN112364931B (en) Few-sample target detection method and network system based on meta-feature and weight adjustment
CN110222215B (en) Crop pest detection method based on F-SSD-IV3
Rong et al. A peduncle detection method of tomato for autonomous harvesting
CN109325484A (en) Flowers image classification method based on background priori conspicuousness
CN115393687A (en) RGB image semi-supervised target detection method based on double pseudo-label optimization learning
Lv et al. A visual identification method for the apple growth forms in the orchard
CN111723764A (en) Improved fast RCNN hydroponic vegetable seedling state detection method
CN113554627B (en) Wheat head detection method based on computer vision semi-supervised pseudo label learning
Gao et al. Recognition and Detection of Greenhouse Tomatoes in Complex Environment.
CN114549970B (en) Night small target fruit detection method and system integrating global fine granularity information
Ubbens et al. Autocount: Unsupervised segmentation and counting of organs in field images
Li et al. MTA-YOLACT: Multitask-aware network on fruit bunch identification for cherry tomato robotic harvesting
Li et al. A novel approach for the 3D localization of branch picking points based on deep learning applied to longan harvesting UAVs
Huang et al. A survey of deep learning-based object detection methods in crop counting
Zhang et al. Multi-class detection of cherry tomatoes using improved Yolov4-tiny model
Yu et al. A-pruning: a lightweight pineapple flower counting network based on filter pruning
Hu et al. Automatic detection of pecan fruits based on Faster RCNN with FPN in orchard

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant