CN117058669A - Deep learning-based litchi fruit identification method - Google Patents

Deep learning-based litchi fruit identification method Download PDF

Info

Publication number
CN117058669A
Authority
CN
China
Prior art keywords
litchi
image
deep learning
identification
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311061583.4A
Other languages
Chinese (zh)
Inventor
彭红星 (Peng Hongxing)
张淇淇 (Zhang Qiqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Agricultural University filed Critical South China Agricultural University
Priority to CN202311061583.4A priority Critical patent/CN117058669A/en
Publication of CN117058669A publication Critical patent/CN117058669A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/68Food, e.g. fruit or vegetables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/422Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation for representing the structure of the pattern or shape of an object therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a litchi fruit identification method based on deep learning, which comprises the following steps: S1, acquiring litchi images as sample images; S2, preprocessing the obtained litchi images; S3, labeling the preprocessed images, constructing a litchi data set and dividing the data set; S4, constructing a litchi identification model based on deep learning; S5, inputting the training set into the litchi identification model constructed in step S4 and training the model; S6, inputting the image to be identified into the litchi identification model trained in step S5 to obtain the litchi identification result for that image. The invention can automatically identify litchis of different maturity levels in natural environments, addressing the poor recognition performance of traditional litchi identification methods. The litchi identification model used in this scheme introduces a FasterNet module and an EMA attention mechanism into the deep learning network YOLOv8, so that important features can be extracted effectively and the network's recognition performance is improved.

Description

Deep learning-based litchi fruit identification method
Technical Field
The invention relates to the technical field of computer vision, in particular to a litchi fruit identification method based on deep learning.
Background
China is a major litchi-growing country, with a planting area of about 8.1 million mu and a 2022 litchi yield of about 2.2297 million tons. With the development of science and technology, more and more industries are shifting from manual to mechanized production, and agriculture is no exception. Introducing artificial intelligence technology into litchi picking makes automatic picking possible, helping farmers manage litchi orchards better, improving litchi quality and yield, reducing loss and waste, increasing farmers' income, and protecting the growing environment and biodiversity of litchi. A litchi picking robot is mainly responsible for identifying the litchi to be picked in the field environment, which is both a key point and a difficulty in developing such robots.
There are still many problems to be solved in existing litchi identification technology. For example, large-scale litchi data sets captured in natural environments are lacking. In addition, existing litchi identification techniques are generally limited to specific scenes and generalize poorly, because in real litchi growing environments fruit overlap, occlusion by branches and leaves, illumination changes, and variation in litchi shape and color caused by inconsistent maturity frequently occur and reduce identification accuracy.
Disclosure of Invention
The invention aims to provide a deep learning-based litchi fruit identification method that addresses the problems that existing litchi fruit identification methods are limited to specific scenes and generalize poorly: in real litchi growing environments, fruit overlap, occlusion by branches and leaves, illumination changes, and inconsistent maturity frequently occur and degrade identification accuracy.
In order to achieve the above purpose, the present invention provides the following technical solutions: the method comprises the following steps:
s1, acquiring a litchi image and taking the litchi image as a sample image;
s2, preprocessing the litchi image obtained in the step S1;
s3, marking the image preprocessed in the step S2, constructing a litchi data set, and dividing the data set into a training set, a verification set and a test set;
s4, constructing a litchi identification model based on deep learning according to the data set in the step S3;
s5, inputting the sample image of the training set in the step S3 into the litchi identification model constructed in the step S4, and training the litchi identification model;
s6, inputting the image to be identified into the litchi identification model trained in the step S5, and obtaining the litchi identification result of the image to be identified.
Preferably, the litchi identification model in step S4 is obtained by training, on the data set in step S3, a network model that uses YOLOv8 as the base network and introduces a FasterNet module and the EMA attention mechanism.
Preferably, the litchi fruits included in the sample image of the litchi dataset in step S3 include immature litchi fruits, semi-mature litchi fruits and completely mature litchi fruits.
Preferably, the preprocessing in step S2 is as follows:
The litchi data set is expanded by applying data enhancement of the geometric transformation class and the color transformation class to the sample images obtained in step S1.
Preferably, the data enhancement of the geometric transformation class includes, but is not limited to, horizontal flipping, and the data enhancement of the color transformation class includes, but is not limited to, random brightness transformation, Gaussian blurring, and Gaussian noise addition.
Preferably, the image labeling process in the step S3 is as follows:
The images preprocessed in step S2 are labeled manually, and the data set is divided into a training set, a verification set and a test set at a ratio of 7:2:1 in that order.
Compared with the prior art, the invention has the beneficial effects that:
1. Compared with the prior art, the invention first builds a litchi data set under natural environments to address the problem of small image data samples, then introduces the FasterNet module and an attention mechanism to improve the YOLOv8 network model, enhancing the feature extraction capability of the feature acquisition module and the fusion effect of the feature fusion module, and ultimately improving the detection precision, generalization ability and robustness of the model; finally, the trained litchi identification model is used to identify litchi, so that litchi at different maturity stages can be accurately identified. The litchi identification method provided by the invention has high accuracy in natural environments and can provide an effective identification method for litchi picking robots.
Drawings
FIG. 1 is an overall flow chart of the deep learning-based litchi fruit identification method of the present invention;
FIG. 2 shows example litchi images from the litchi data set of the deep learning-based litchi fruit identification method of the present invention;
FIG. 3 is a structural diagram of FasterNet in the deep learning-based litchi fruit identification method of the present invention;
FIG. 4 is a structural diagram of the EMA module in the deep learning-based litchi fruit identification method of the present invention;
FIG. 5 shows identification results on the test set in the deep learning-based litchi fruit identification method of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1-5, the present invention provides a technical solution: the method comprises the following steps:
s1, acquiring a litchi image and taking the litchi image as a sample image;
s2, preprocessing the litchi image obtained in the step S1, wherein the preprocessing process is as follows:
The litchi data set is expanded by applying data enhancement of the geometric transformation class and the color transformation class to the sample images obtained in step S1.
Further, data enhancement of the geometric transformation class includes, but is not limited to, horizontal flipping, and data enhancement of the color transformation class includes, but is not limited to, random brightness transformation, Gaussian blurring, and adding Gaussian noise.
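As an illustrative sketch only (the patent does not disclose an implementation), the augmentations named above could be applied with OpenCV and NumPy roughly as follows; the function choices and parameter values are assumptions, not details from the disclosure.

```python
import cv2
import numpy as np

def augment(image):
    """Return augmented copies of a BGR litchi image (illustrative parameter values)."""
    augmented = []
    # Geometric transformation: horizontal flip
    augmented.append(cv2.flip(image, 1))
    # Color transformation: random brightness change
    factor = np.random.uniform(0.6, 1.4)
    augmented.append(np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8))
    # Color transformation: Gaussian blur
    augmented.append(cv2.GaussianBlur(image, (5, 5), sigmaX=1.5))
    # Color transformation: additive Gaussian noise
    noise = np.random.normal(0, 10, image.shape).astype(np.float32)
    augmented.append(np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8))
    return augmented
```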
S3, labeling the images preprocessed in step S2 with the LabelImg labeling software, constructing a litchi data set, and dividing the data set into a training set, a verification set and a test set; the litchi covered by the sample images of the litchi data set includes immature litchi fruits, semi-mature litchi fruits and completely mature litchi fruits, and the image labeling process is as follows:
The images preprocessed in step S2 are labeled manually, and the data set is divided into a training set, a verification set and a test set at a ratio of 7:2:1 in that order.
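A minimal sketch of the 7:2:1 split described above, assuming the labeled images are available as a flat list of file paths (the file layout and the fixed seed are illustrative assumptions):

```python
import random

def split_dataset(image_paths, seed=0):
    """Shuffle and split file paths into train/verification/test sets at a 7:2:1 ratio."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train = int(n * 0.7)
    n_val = int(n * 0.2)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]   # remaining ~10%
    return train, val, test
```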
S4, constructing a deep learning-based litchi identification model from the data set in step S3: the model is based on YOLOv8, with its feature extraction part improved by replacing the backbone network with FasterNet and adding an EMA attention mechanism.
S5, inputting the sample image of the training set in the step S3 into the litchi identification model constructed in the step S4, and training the litchi identification model;
s6, inputting the image to be identified into the litchi identification model trained in the step S5, and obtaining the litchi identification result of the image to be identified.
Further, the YOLOv8 network model is the most recently released model in the YOLO (You Only Look Once) series of object detectors; it uses a single neural network to predict bounding boxes and categories of objects in an image. The YOLOv8 network model consists of several key parts, including a backbone network part (Backbone), a neck network part (Neck), and a detection head part (Head). The backbone extracts feature maps from the input image, and the neck and the detection head predict bounding boxes and categories of objects from those feature maps.
The backbone of YOLOv8 has 10 layers: layers 1, 2, 4, 6 and 8 are CBS modules, layers 3, 5, 7 and 9 are C2f modules, and layer 10 is an SPPF module. The CBS module first convolves the input data to extract its features, then uses a BN layer for normalization, improving the stability and generalization ability of the network, and finally applies a SiLU activation function to transform the convolution output nonlinearly, enhancing the expressive power of the network. The C2f module convolves the input data to extract features, then uses a Split operation to divide the features into two parts: one part is fed into Bottleneck modules to obtain richer gradient-flow information, the other part is concatenated (Concat) with the outputs of the Bottleneck modules, and a final convolution is applied. The C2f module draws on the design of the C3 module and the ELAN concept, so that YOLOv8 obtains richer gradient-flow information while remaining lightweight. The SPPF module convolves the input feature map, applies pooling operations of different scales to the convolved features to extract multi-scale feature information, fuses that information with a Concat operation, and finally applies another convolution.
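As an illustrative aid only (not code from the disclosure), the CBS unit and the Bottleneck used inside C2f, as described above, can be sketched in PyTorch roughly as follows; channel counts and kernel sizes are assumptions consistent with the publicly documented YOLOv8 design.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv2d -> BatchNorm2d -> SiLU, the basic unit described above."""
    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """Two CBS units with an optional residual connection, as used inside the C2f module."""
    def __init__(self, ch, shortcut=True):
        super().__init__()
        self.cv1 = CBS(ch, ch)
        self.cv2 = CBS(ch, ch)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

# quick shape check
x = torch.randn(1, 64, 80, 80)
print(Bottleneck(64)(x).shape)   # torch.Size([1, 64, 80, 80])
```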
The neck of YOLOv8 has 12 layers: layers 11 and 14 are up-sampling modules, layers 12, 15, 18 and 21 are Concat modules, layers 13, 16, 19 and 22 are C2f modules, and layers 17 and 20 are CBS modules. The neck first applies the layer-11 up-sampling operation to the feature map output by the backbone, doubling its height and width. Layer 12 concatenates the feature map output by layer 11 with the output of layer 6, increasing the number of channels of the feature map. Layer 13 convolves the feature map with a C2f module to extract features. The layer-14 up-sampling module again doubles the height and width of the feature map output by layer 13. The layer-15 Concat module concatenates the feature map output by layer 14 with the output of layer 5, fusing feature information of different scales. Layer 16 convolves the feature map with a C2f module to extract features. The layer-17 CBS module convolves the feature map, halving its height and width. The layer-18 Concat module concatenates the feature map output by layer 17 with the output of layer 13, fusing feature information of different scales. Layer 19 convolves the feature map with a C2f module to extract features. The layer-20 CBS module convolves the feature map, halving its height and width. The layer-21 Concat module concatenates the feature map output by layer 20 with the output of layer 10, fusing feature information of different scales. Layer 22 uses a C2f module to convolve the feature map and extract the features fused from information of different scales.
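The recurring pattern in the neck (up-sample, Concat with a backbone feature map, then a C2f convolution) can be illustrated with dummy tensors; the shapes below are placeholders chosen for this sketch, not values taken from the patent.

```python
import torch
import torch.nn as nn

up = nn.Upsample(scale_factor=2, mode="nearest")   # the up-sampling module
p5 = torch.randn(1, 512, 20, 20)   # deep backbone output (placeholder shape)
c4 = torch.randn(1, 512, 40, 40)   # shallower backbone output (placeholder shape)

fused = torch.cat([up(p5), c4], dim=1)   # up-sampling doubles H and W; Concat adds channels
print(fused.shape)   # torch.Size([1, 1024, 40, 40]) -- a C2f module would then convolve this
```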
The YOLOv8 detection head takes the outputs of layers 16, 19 and 22 as inputs to the same decoupled head structure. The decoupled head consists of two branches, each made up of two CBS modules and a convolution module; one branch predicts the bounding box of the target and the other predicts its class.
YOLOv8 uses BCE Loss as the classification loss, and CIoU Loss together with DFL (Distribution Focal Loss) as the regression loss.
The formula of BCE Loss is: Loss = -w[p·log(q) + (1-p)·log(1-q)]
where p and q are the theoretical label and the actual predicted value respectively, and w is a weight; log here denotes the natural logarithm (ln).
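A small numerical check of the formula above, assuming PyTorch is available (the tensor values are made up for illustration): with w = 1 the expression matches the library's binary cross-entropy, confirming that log is the natural logarithm.

```python
import torch
import torch.nn.functional as F

p = torch.tensor([1.0, 0.0, 1.0])   # theoretical labels
q = torch.tensor([0.9, 0.2, 0.6])   # predicted probabilities
manual = -(p * q.log() + (1 - p) * (1 - q).log()).mean()
library = F.binary_cross_entropy(q, p)   # same value, about 0.2798
print(manual.item(), library.item())
```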
The formula of CIoU Loss is:
CIoU Loss = 1 - CIoU
where d_o is the Euclidean distance between the center points of the target box and the prediction box, d_c is the diagonal length of the smallest box enclosing the two boxes, w_gt and h_gt are the width and height of the ground-truth box, and w_p and h_p are the width and height of the prediction box.
The formula of the DFL is:
where y is the theoretical label, y_i and y_{i+1} are the two values adjacent to y, and p_i and p_{i+1} are the probabilities that the predicted bounding-box distribution assigns to them.
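The explicit CIoU and DFL expressions do not survive in the text above. For reference, the commonly published forms that match the variable definitions given here are reproduced below; this is a reconstruction from the standard literature, not the verbatim equations of the original filing.

```latex
\mathrm{CIoU} = \mathrm{IoU} - \frac{d_o^{2}}{d_c^{2}} - \alpha v,\qquad
v = \frac{4}{\pi^{2}}\left(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w_{p}}{h_{p}}\right)^{2},\qquad
\alpha = \frac{v}{(1-\mathrm{IoU}) + v}

\mathrm{CIoU\ Loss} = 1 - \mathrm{CIoU}

\mathrm{DFL}(p_i, p_{i+1}) = -\bigl[(y_{i+1} - y)\log p_i + (y - y_i)\log p_{i+1}\bigr]
```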
In this technical scheme, a method for rapidly identifying litchi using FasterNet and the EMA attention mechanism in YOLOv8 is provided. The invention optimizes the feature extraction part of YOLOv8 and replaces its backbone network with the FasterNet network, improving the detection speed and detection precision of YOLOv8 and thus the accuracy of the model. The EMA attention mechanism is used to enhance the YOLOv8 network, improving the model's feature-discrimination and multi-scale processing abilities and its performance on small targets.
FasterNet is a lightweight deep convolutional network based on the partial convolution module PConv, which extracts spatial features efficiently while reducing redundant computation and memory access. Through a streamlined model design, FasterNet greatly increases running speed while maintaining detection performance.
FIG. 3 is a structural diagram of FasterNet. The FasterNet structure has 8 layers: layer 1 is a PatchEmbed layer, layers 2, 4, 6 and 8 are FasterNet Block stages, and layers 3, 5 and 7 are PatchMerging layers. FasterNet first extracts patch features through the PatchEmbed module and then passes them through multi-stage FasterNet Block modules; FasterNet defines 4 stages in total, each containing several FasterNet Block modules. The first stage contains 1 FasterNet Block module, after which the feature map is halved by the first PatchMerging module. The second stage contains 2 FasterNet Block modules, after which the feature map is halved by the second PatchMerging module. The third stage contains 8 FasterNet Block modules, after which the feature map is halved by the third PatchMerging module. Finally, the fourth stage contains 2 FasterNet Block modules.
The FasterNet Block module applies the PConv convolution to the input data, follows it with a 1×1 convolution layer, normalizes with a BN layer, applies a ReLU activation function for a nonlinear transformation, stacks another 1×1 convolution layer, and finally makes a residual connection between this result and the input of the FasterNet Block module, taking the sum as the module's final output.
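A simplified PyTorch sketch of PConv and the FasterNet Block described above; the channel fraction and expansion ratio follow the public FasterNet paper and are assumptions, not values taken from the disclosure.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: convolve only a fraction of the channels, pass the rest through."""
    def __init__(self, channels, partial_ratio=0.25):
        super().__init__()
        self.conv_ch = int(channels * partial_ratio)
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, 3, 1, 1, bias=False)

    def forward(self, x):
        x1, x2 = x[:, :self.conv_ch], x[:, self.conv_ch:]
        return torch.cat((self.conv(x1), x2), dim=1)

class FasterNetBlock(nn.Module):
    """PConv -> 1x1 conv -> BN -> ReLU -> 1x1 conv, with a residual connection."""
    def __init__(self, channels, expansion=2):
        super().__init__()
        hidden = channels * expansion
        self.pconv = PConv(channels)
        self.conv1 = nn.Conv2d(channels, hidden, 1, bias=False)
        self.bn = nn.BatchNorm2d(hidden)
        self.act = nn.ReLU()
        self.conv2 = nn.Conv2d(hidden, channels, 1, bias=False)

    def forward(self, x):
        y = self.conv2(self.act(self.bn(self.conv1(self.pconv(x)))))
        return x + y

# quick shape check
print(FasterNetBlock(64)(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])
```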
EMA (Efficient Multi-Scale Attention) is an efficient multi-scale attention mechanism that achieves flexible channel-relationship learning by modeling the dependencies between channels. The core idea of EMA is to divide the input feature map X into G groups and apply adaptive average pooling to each group along the height and width directions to obtain context information in those two directions. The pooled results are then concatenated (Concat) along the channel dimension, interactions between channels are learned through a 1×1 convolution, and the result is activated by a Sigmoid to serve as attention weights.
FIG. 4 is a structural diagram of EMA. EMA divides the input features into G sub-features along the channel dimension to learn different semantics, and extracts attention-weight descriptors of the grouped feature map with three parallel branches: the first and second branches are 1×1 branches and the third is a 3×3 branch. The first two branches apply average pooling along the X and Y directions respectively, concatenate the results, and apply a 1×1 convolution; the output of the 1×1 convolution is decomposed into two tensors, each is passed through a Sigmoid and multiplied with the grouped feature map, and the channels are then normalized by a GroupNorm operation. The 3×3 branch stacks only one 3×3 convolution kernel to capture multi-scale features. For the output of the 1×1 branch, EMA encodes global spatial information with two-dimensional global average pooling and fits a linear transformation with the nonlinear Softmax function at that output; the result is multiplied with the corresponding feature matrix to obtain the first spatial attention map. Global spatial information is likewise encoded on the 3×3 branch with two-dimensional global average pooling, a linear transformation is fitted with Softmax, and a second spatial attention map is obtained by matrix multiplication. Finally, the attention weights are passed through a Sigmoid activation and multiplied with the grouped feature map to realize the attention mechanism.
EMA aggregates cross-spatial information along different spatial dimension directions to achieve richer feature aggregation. Through the attention weights in the height and width directions, the dependencies between channels can be modeled, so that the network learns the interactions between different channels. The advantages of EMA are: 1) it is computationally efficient and does not add excessive computation; 2) it can model non-local channel dependencies; 3) it avoids the quadratic complexity of self-attention.
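A simplified PyTorch sketch of the EMA mechanism as described above; the group count, the GroupNorm choice and the exact cross-branch pairing follow the public EMA paper and are assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Simplified Efficient Multi-Scale Attention sketch (channels must be divisible by groups)."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.g = groups
        c = channels // groups
        self.conv1x1 = nn.Conv2d(c, c, 1)
        self.conv3x3 = nn.Conv2d(c, c, 3, padding=1)
        self.gn = nn.GroupNorm(c, c)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # average over the width direction
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # average over the height direction
        self.gap = nn.AdaptiveAvgPool2d(1)              # two-dimensional global average pooling

    def forward(self, x):
        b, ch, h, w = x.shape
        c = ch // self.g
        g = x.reshape(b * self.g, c, h, w)               # split into G sub-features
        # 1x1 branch: directional pooling, shared 1x1 conv, per-direction Sigmoid gating
        x_h = self.pool_h(g)                             # (b*g, c, h, 1)
        x_w = self.pool_w(g).permute(0, 1, 3, 2)         # (b*g, c, w, 1)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(g * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        # 3x3 branch: a single 3x3 convolution captures multi-scale context
        x2 = self.conv3x3(g)
        # cross-spatial aggregation: each branch's pooled descriptor attends to the other branch
        w1 = torch.softmax(self.gap(x1).reshape(b * self.g, 1, c), dim=-1)
        w2 = torch.softmax(self.gap(x2).reshape(b * self.g, 1, c), dim=-1)
        y1 = torch.matmul(w1, x2.reshape(b * self.g, c, h * w))
        y2 = torch.matmul(w2, x1.reshape(b * self.g, c, h * w))
        weights = (y1 + y2).reshape(b * self.g, 1, h, w).sigmoid()
        return (g * weights).reshape(b, ch, h, w)

# quick shape check
print(EMA(64)(torch.randn(2, 64, 40, 40)).shape)   # torch.Size([2, 64, 40, 40])
```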
In the implementation of this embodiment, precision P (Precision), recall R (Recall) and mAP50 (mean Average Precision) are adopted as the evaluation indices of the experiment. P and R respectively measure the proportion of predicted positives that are correct and the proportion of actual positives that are detected. mAP50 is the mean of the AP values over all classes when the IoU (Intersection Over Union) threshold between the predicted box and the ground-truth box is set to 0.5. The higher the mAP50, the more accurate the model; its value ranges from 0 to 1, and smaller values indicate worse detection. The calculation formulas of the evaluation indices are as follows:
where TP is the number of true-positive samples, FP the number of false-positive samples, FN the number of false-negative samples, C the number of categories, N the number of reference thresholds, k a threshold index, P(k) the precision at threshold k, and R(k) the recall at threshold k.
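The metric formulas referenced above do not appear in the extracted text; the standard definitions consistent with the variable list are given below (a reconstruction, not the verbatim equations of the filing). mAP50 is this mAP evaluated at an IoU threshold of 0.5.

```latex
P = \frac{TP}{TP + FP},\qquad
R = \frac{TP}{TP + FN}

AP = \sum_{k=1}^{N} P(k)\,\bigl[R(k) - R(k-1)\bigr],\qquad
mAP = \frac{1}{C}\sum_{c=1}^{C} AP_{c}
```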
This example was implemented in an environment with an AMD Ryzen 7 5800H CPU, an NVIDIA GeForce RTX 3060 Laptop GPU, CUDA 11.7.1, Python 3.8.16, Torch 1.13.1, and Ubuntu 18.04.6 LTS.
After testing on 197 litchi images, the precision for mature, semi-mature and immature litchi fruits is 80.3%, 68.9% and 81.5% respectively, the recall is 83.6%, 67.4% and 71.6% respectively, and the mAP50 values are 87.6%, 72.4% and 72.5% respectively, which verifies the effectiveness of the method provided in this embodiment.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A litchi fruit identification method based on deep learning, characterized by comprising the following steps:
s1, acquiring a litchi image and taking the litchi image as a sample image;
s2, preprocessing the litchi image obtained in the step S1;
s3, marking the image preprocessed in the step S2, constructing a litchi data set, and dividing the data set into a training set, a verification set and a test set;
s4, constructing a litchi identification model based on deep learning according to the data set in the step S3;
s5, inputting the sample image of the training set in the step S3 into the litchi identification model constructed in the step S4, and training the litchi identification model;
s6, inputting the image to be identified into the litchi identification model trained in the step S5, and obtaining the litchi identification result of the image to be identified.
2. The litchi fruit identification method based on deep learning as claimed in claim 1, characterized in that: the litchi identification model in step S4 is obtained by training, on the data set in step S3, a network model that uses YOLOv8 as the base network and introduces a FasterNet module and the EMA attention mechanism.
3. The litchi fruit identification method based on deep learning as claimed in claim 1, characterized in that: the litchi fruits included in the sample images of the litchi data set in step S3 include immature litchi fruits, semi-mature litchi fruits and completely mature litchi fruits.
4. The litchi fruit identification method based on deep learning as claimed in claim 1, characterized in that: the preprocessing process in step S2 is as follows:
the litchi data set is expanded by applying data enhancement of the geometric transformation class and the color transformation class to the sample images obtained in step S1.
5. The litchi fruit identification method based on deep learning as claimed in claim 4, characterized in that: data enhancement of the geometric transformation class includes, but is not limited to, horizontal flipping, and data enhancement of the color transformation class includes, but is not limited to, random brightness transformation, Gaussian blurring, and adding Gaussian noise.
6. The litchi fruit identification method based on deep learning as claimed in claim 1, characterized in that: the image labeling process in step S3 is as follows:
the images preprocessed in step S2 are labeled manually, and the data set is divided into a training set, a verification set and a test set at a ratio of 7:2:1 in that order.
CN202311061583.4A 2023-08-23 2023-08-23 Deep learning-based litchi fruit identification method Pending CN117058669A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311061583.4A CN117058669A (en) 2023-08-23 2023-08-23 Deep learning-based litchi fruit identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311061583.4A CN117058669A (en) 2023-08-23 2023-08-23 Deep learning-based litchi fruit identification method

Publications (1)

Publication Number Publication Date
CN117058669A true CN117058669A (en) 2023-11-14

Family

ID=88664148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311061583.4A Pending CN117058669A (en) 2023-08-23 2023-08-23 Deep learning-based litchi fruit identification method

Country Status (1)

Country Link
CN (1) CN117058669A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117630012A (en) * 2023-11-29 2024-03-01 广东石油化工学院 High-efficiency lightweight litchi fruit anthracnose detection method for complex agricultural scene
CN117630012B (en) * 2023-11-29 2024-05-17 广东石油化工学院 High-efficiency lightweight litchi fruit anthracnose detection method for complex agricultural scene
CN118071751A (en) * 2024-04-22 2024-05-24 成都中科卓尔智能科技集团有限公司 YOLOv 8-based defect detection method

Similar Documents

Publication Publication Date Title
CN108830188A (en) Vehicle checking method based on deep learning
CN111611924B (en) Mushroom identification method based on deep migration learning model
CN114332621B (en) Disease and pest identification method and system based on multi-model feature fusion
CN114821014B (en) Multi-mode and countermeasure learning-based multi-task target detection and identification method and device
Shen et al. Image recognition method based on an improved convolutional neural network to detect impurities in wheat
CN111582337A (en) Strawberry malformation state detection method based on small sample fine-grained image analysis
CN117058669A (en) Deep learning-based litchi fruit identification method
CN112329771B (en) Deep learning-based building material sample identification method
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
Yang et al. Instance segmentation and classification method for plant leaf images based on ISC-MRCNN and APS-DCCNN
CN114140665A (en) Dense small target detection method based on improved YOLOv5
CN114494910B (en) Multi-category identification and classification method for facility agricultural land based on remote sensing image
Sun et al. YOLO-P: An efficient method for pear fast detection in complex orchard picking environment
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
CN111882000A (en) Network structure and method applied to small sample fine-grained learning
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
Wang et al. Apple rapid recognition and processing method based on an improved version of YOLOv5
CN114067171A (en) Image recognition precision improving method and system for overcoming small data training set
CN117372853A (en) Underwater target detection algorithm based on image enhancement and attention mechanism
CN112465821A (en) Multi-scale pest image detection method based on boundary key point perception
CN116977859A (en) Weak supervision target detection method based on multi-scale image cutting and instance difficulty
Liu Interfruit: deep learning network for classifying fruit images
Zhi-Feng et al. Light-YOLOv3: fast method for detecting green mangoes in complex scenes using picking robots
CN114140524A (en) Closed loop detection system and method for multi-scale feature fusion
CN112487909A (en) Fruit variety identification method based on parallel convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination