CN115272791B - YoloV5-based multi-target detection and positioning method for tea leaves - Google Patents

YoloV5-based multi-target detection and positioning method for tea leaves

Info

Publication number
CN115272791B
CN115272791B (application CN202210866833.0A)
Authority
CN
China
Prior art keywords
tea
module
dimensional point
buds
feature
Prior art date
Legal status
Active
Application number
CN202210866833.0A
Other languages
Chinese (zh)
Other versions
CN115272791A (en)
Inventor
朱立学
张智浩
林桂潮
张世昂
陈品岚
官金炫
陈明杰
林深凯
吴天骏
Current Assignee
Zhongkai University of Agriculture and Engineering
Original Assignee
Zhongkai University of Agriculture and Engineering
Priority date
Filing date
Publication date
Application filed by Zhongkai University of Agriculture and Engineering filed Critical Zhongkai University of Agriculture and Engineering
Priority to CN202210866833.0A
Publication of CN115272791A
Application granted
Publication of CN115272791B
Legal status: Active

Classifications

    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/762: Image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 2201/07: Target detection
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a YoloV5-based multi-target detection and positioning method for tea, which specifically comprises the steps of: S01, constructing a tea bud image data set; S02, improving a YoloV5 detection network; S03, obtaining a three-dimensional point cloud of the tea tender shoots, fitting the minimum external cuboid of the tea tender shoots, and obtaining the tea tender shoot picking points. The method can effectively perform multi-target detection and positioning of tea buds, so that the positions of the tea buds are accurately and effectively identified; in combination with a picking tool, intelligent picking of the tea buds is realized, which improves picking efficiency, saves picking time and reduces labor cost.

Description

YoloV5-based multi-target detection and positioning method for tea leaves
Technical Field
The invention relates to the technical field of tea positioning, in particular to a YoloV5-based multi-target detection and positioning method for tea.
Background
Tea processing, also called tea making, is the process of picking fresh leaves from tea trees and then carrying out various processing procedures to prepare semi-finished or finished tea products; picking of tea shoots (or fresh leaves) is one of the important links of tea processing and production. At present, tea buds (or fresh leaves) are picked mainly by hand, but manual picking is inefficient, increases the labor intensity of workers and wastes a great deal of labor cost. Tea-picking machines also exist in the prior art, but most current machines work in a non-selective, "one-cut-for-all" reciprocating cutting mode; because tea trees are influenced by various environmental factors (such as illumination, gravity, temperature and humidity), the growth heights of tea buds are inconsistent, and this reciprocating cutting mode very easily causes missed picking and wrong picking, and can even damage the tea buds. Therefore, how to identify and judge the positions of tea buds, so as to realize accurate and mechanized tea picking while avoiding missed picking, wrong picking and tea bud breakage, is one of the challenges currently faced by intelligent tea picking.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a YoloV5-based multi-target detection and positioning method for tea which can effectively perform multi-target detection and positioning of tea buds, so that the positions of the tea buds are accurately and effectively identified and, in combination with a picking tool, intelligent picking of the tea buds is realized, improving picking efficiency, saving picking time and reducing labor cost.
The aim of the invention is achieved by the following technical scheme:
a YoloV 5-based multi-target detection and positioning method for tea leaves is characterized by comprising the following steps of: the method specifically comprises the following steps:
s01, constructing a tea bud image data set;
s02, constructing a feature map with rich semantic information through a bidirectional feature pyramid network and a channel attention mechanism based on the data set in the step S01, improving a YoloV5 detection network, obtaining a YoloV5 target detection network model, and detecting small-size tea buds;
S03, obtaining a three-dimensional point cloud of the tea based on the training result of the YoloV5 target detection network model in step S02; screening the three-dimensional point cloud of the tea buds out of the three-dimensional point cloud of the tea; and finally, fitting the minimum external cuboid of the tea buds to obtain the accurate positions and picking points of the tea buds.
As a further refinement, step S01 specifically includes: firstly, collecting image data of tea buds by using an RGB-D camera to obtain color images and depth images of the tea buds; then, labeling the color images with a labeling tool and performing a data-set enhancement operation to expand the number of samples, thereby constructing the tea bud image data set; and finally, dividing the data set into a training set, a testing set and a verification set.
Preferably, the labeling tool is a Labelimg labeling tool.
As a further refinement, the YoloV5 detection network comprises a Backbone module, a Neck module and a Head module; the Backbone module comprises a Focus module for slicing the picture, an SPP module, a CBS module, and a CSP module for enhancing the learning performance of the whole convolutional neural network; the Neck module comprises a CBS module and a CSP module; the Head module comprises a Detect layer for detecting targets on feature maps of different scales using grid-based anchors.
Preferably, the YoloV5 detection network adopts the network model with the smallest model file size and the smallest feature-map depth and width.
As a further refinement, step S02 specifically includes:
S21, firstly, preprocessing the images in the training set from step S01 to unify the resolution of all the images in the training set; the preprocessed images are input into the Backbone module to obtain feature maps of different sizes;
S22, inputting the feature maps of different sizes from step S21 into the Neck module, where a bidirectional feature pyramid network (Bi-directional Feature Pyramid Network, BiFPN) replaces the original path aggregation network (Path Aggregation Network, PANet) for multi-feature fusion; the feature maps are then upsampled and downsampled in sequence and spliced through an efficient channel attention mechanism (Efficient Channel Attention, ECA) to generate feature maps of various sizes, which are input into the Detect layer of the Head module;
S23, back propagation is carried out by combining multiple loss functions, updating the gradients in the model and adjusting the weight parameters;
S24, finally, the existing model is verified with the verification set from step S01 to obtain the YoloV5 target detection network model.
As a further refinement, step S03 specifically includes:
S31, firstly, according to the result of the YoloV5 target detection network model in step S02, acquiring the detection-frame coordinates and generating the regions of interest (Region of Interest, ROI) of the color image and the corresponding depth image;
S32, obtaining the corresponding mapped color-image coordinates according to the mapping relation between the pixel coordinates of the depth image and the pixel coordinates of the color image, using the coordinate values, pixel values and recorded distances of the depth image;
S33, obtaining the three-dimensional point cloud of the tea through coordinate fusion of the color image and the depth image, specifically:
x = (u - c_x) · d / f_x,  y = (v - c_y) · d / f_y,  z = d;
wherein (x, y, z) are coordinates in the coordinate system of the three-dimensional point cloud; (u, v) are pixel coordinates in the coordinate system of the color image and (c_x, c_y) is the principal point; d represents the depth value, obtained from the depth image; f_x, f_y represent the camera focal lengths;
S34, because the generated three-dimensional point cloud of the tea comprises the tea buds and their background point cloud, the average value of the tea three-dimensional point cloud is calculated and used as a distance threshold; the background points farther than the distance threshold are then filtered out to obtain a primarily segmented three-dimensional point cloud; a DBSCAN clustering algorithm is then adopted, the neighborhood radius Eps and the minimum number of samples M_p required within the neighborhood are set, the primarily segmented point cloud is clustered into one class, and the three-dimensional point cloud of the tea buds is screened out;
S35, adopting principal component analysis (Principal Component Analysis, PCA) to fit the minimum external cuboid of the tea buds according to their growth posture; then calculating the coordinates of each vertex of the cuboid; and obtaining the coordinates of the center point of the bottom face of the cuboid as the average of its four bottom vertices, taking this point as the picking point of the tea buds.
As a further refinement, step S35 specifically includes:
firstly, the three principal directions of the three-dimensional point cloud of the tea buds, namely the x, y and z directions, are obtained by principal component analysis, and the centroid and covariance are calculated to obtain the covariance matrix; specifically:
P_c = (1/n) · Σ_{i=1}^{n} (x_i, y_i, z_i);
wherein P_c represents the centroid coordinates of the three-dimensional point cloud; n represents the number of points in the three-dimensional point cloud; (x_i, y_i, z_i) represents the three-dimensional coordinates of the i-th point;
C_p = (1/n) · Σ_{i=1}^{n} (p_i - P_c)(p_i - P_c)^T, where p_i = (x_i, y_i, z_i)^T;
wherein C_p represents the covariance matrix of the three-dimensional point cloud;
then, singular value decomposition is carried out on the covariance matrix to obtain the eigenvalues and eigenvectors, with the specific formula:
C_p = U_p · D_p · V_p^T;
wherein U_p represents the eigenvector matrix of C_p C_p^T; D_p represents the diagonal matrix whose non-zero diagonal entries are the square roots of the non-zero eigenvalues of C_p C_p^T; V_p represents the eigenvector matrix of C_p^T C_p;
the direction of the eigenvector corresponding to the largest eigenvalue is the main axis direction of the cuboid;
then, the coordinate points are projected onto the direction vectors; from the position coordinates P_i of each point, the maximum and minimum values of the inner products with the unit vector of each direction are obtained, and a, b and c are taken as the averages of the maximum and minimum values along x, y and z respectively, giving the center point O and the side lengths L of the cuboid and generating the most suitable and compact cuboid for the tea bud;
the specific formulas are as follows:
a = [max_i(P_i · X) + min_i(P_i · X)] / 2,  b = [max_i(P_i · Y) + min_i(P_i · Y)] / 2,  c = [max_i(P_i · Z) + min_i(P_i · Z)] / 2;
O = aX + bY + cZ;
L_x = max_i(P_i · X) - min_i(P_i · X),  L_y = max_i(P_i · Y) - min_i(P_i · Y),  L_z = max_i(P_i · Z) - min_i(P_i · Z);
wherein X, Y and Z are the unit vectors of the coordinate points in the x, y and z directions respectively; L_x, L_y, L_z are the lengths of the cuboid in the x, y and z directions respectively;
then, the four vertices of the cuboid with the smallest coordinates in the y direction are taken as the four vertex coordinates of the bottom face; finally, the coordinate of the center point of the bottom face, namely the picking point, is obtained as the average of these four vertex coordinates.
The invention has the following technical effects:
according to the method, the characteristic diagram with rich semantic information is constructed by adopting the bidirectional characteristic pyramid network and the channel attention mechanism, and the improved YoloV5 target detection network model is constructed, so that more characteristics are fused on the premise of not increasing extra cost, semantic expression and positioning capacity on multiple scales are enhanced, the probability of judging objects and the detection precision of the model are improved, the method is effectively applicable to the identification of tea buds with smaller targets and complex environments, and the problems of misjudgment, unclear identification and even incapability of identification caused by small proportion of the tea buds in the whole image are avoided; the minimum external cuboid fitted with the tea buds and the bottom surface center point of the minimum external cuboid are used as picking points of the tea buds, so that accurate positioning of the tea buds is realized, meanwhile, the automatic picking tool is matched for picking the tea buds, the problems that the tea buds are easy to damage, mispicking and missing picking are easy to occur in mechanical picking are effectively avoided, and the picking efficiency of the tea is effectively improved.
Drawings
Fig. 1 is a schematic diagram of an embodiment of the present invention after labeling a picture with a labeling tool.
Fig. 2 is a multi-scale feature fusion structure diagram based on a bi-directional feature pyramid network structure in an embodiment of the present invention.
Fig. 3 is a flowchart of a multi-target detection positioning method according to an embodiment of the present application.
Detailed Description
The present invention will be described in further detail below by way of examples, but the scope of the subject matter of the present invention should not be construed as limited to the following examples; any modifications and/or alterations made to the present invention on the basis of the above description fall within the scope of the present invention.
Examples:
A YoloV5-based multi-target detection and positioning method for tea leaves, which specifically comprises the following steps:
s01, constructing a tea bud image data set; the method comprises the following steps:
firstly, collecting image data of tea buds by using an RGB-D camera to obtain color images and depth images of the tea buds; then, a labeling tool, such as the Labelimg labeling tool, is used to label the color images (as shown in fig. 1), and a data-set enhancement operation is performed (the enhancement can be realized with conventional technical means, for example spatial transformation, color transformation and the like, which are clearly known and understood by a person skilled in the art) to expand the number of samples, thereby constructing the tea bud image data set; finally, the data set is divided into a training set, a testing set and a verification set, as sketched below.
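For illustration only, the division into training, testing and verification sets can be written as the following minimal Python sketch; the 70/20/10 split ratio, the directory name tea_bud_images, the file extension and the function name split_dataset are assumptions, since the patent does not specify them.

```python
# Minimal sketch of the dataset split described above (split ratios are assumed).
import random
from pathlib import Path

def split_dataset(image_dir: str, train=0.7, val=0.2, seed=0):
    """Shuffle labeled images and divide them into training / verification / testing lists."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train, n_val = int(n * train), int(n * val)
    return (images[:n_train],                  # training set
            images[n_train:n_train + n_val],   # verification set
            images[n_train + n_val:])          # testing set

train_set, val_set, test_set = split_dataset("tea_bud_images")
```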
S02, constructing a feature map with rich semantic information through a bidirectional feature pyramid network and a channel attention mechanism based on the data set in the step S01, improving a YoloV5 detection network, obtaining a YoloV5 target detection network model, and detecting small-size tea buds;
The YoloV5 detection network adopts the network model with the smallest model file size and the smallest feature-map depth and width, and comprises a Backbone module, a Neck module and a Head module; the Backbone module comprises a Focus module for slicing the picture, an SPP module, a CBS module, and a CSP module for enhancing the learning performance of the whole convolutional neural network; the Neck module comprises a CBS module and a CSP module; the Head module comprises a Detect layer for detecting targets on feature maps of different scales using grid-based anchors;
the method specifically comprises the following steps:
s21, firstly, preprocessing the images in the training set in the step S01, and unifying the resolutions of all the images in the training set; inputting the preprocessed image into a Backbone module to obtain feature images with different sizes;
S22, inputting the feature maps of different sizes from step S21 into the Neck module, where a bidirectional feature pyramid network (Bi-directional Feature Pyramid Network, BiFPN) replaces the original path aggregation network (Path Aggregation Network, PANet) for multi-feature fusion; the feature maps are then upsampled and downsampled in sequence and spliced through an efficient channel attention mechanism (Efficient Channel Attention, ECA) to generate feature maps of various sizes, which are input into the Detect layer of the Head module;
In the YoloV5 detection network (i.e. in the original YoloV5 network structure), BiFPN is used for enhanced feature extraction: P5_in is upsampled and BiFPN_Concat-stacked with P4_in to obtain P4_td; P4_td is then upsampled and BiFPN_Concat-stacked with P3_in to obtain P3_out; P3_out is then downsampled and BiFPN_Concat-stacked with P4_td to obtain P4_out; finally, P4_out is downsampled and BiFPN_Concat-stacked with P5_in to obtain P5_out. In this way, efficient bidirectional cross-scale connections are used for feature fusion: the nodes in PANet that contribute little to feature fusion are removed, an additional connection is added between the input node and the output node at the same level, more features are fused without extra cost, and the semantic expression and positioning capability at multiple scales are enhanced, as shown in fig. 2.
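As an illustrative sketch only, the connection pattern described above (P5_in to P4_td, P4_td to P3_out, then back down to P4_out and P5_out) can be written in PyTorch as follows; the learnable fusion weights inside BiFPNConcat, the 1x1 channel-reducing convolutions and the nearest-neighbour upsampling are assumptions about details the text does not fix.

```python
# A minimal PyTorch sketch of the three-level BiFPN-style fusion described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiFPNConcat(nn.Module):
    """Learnable-weighted concatenation of two feature maps (BiFPN_Concat-style)."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))
    def forward(self, a, b):
        w = F.relu(self.w)
        w = w / (w.sum() + 1e-4)                    # normalised fusion weights
        return torch.cat([w[0] * a, w[1] * b], dim=1)

class TeaBiFPN(nn.Module):
    def __init__(self, c3, c4, c5):
        super().__init__()
        self.cat = nn.ModuleList([BiFPNConcat() for _ in range(4)])
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.down = nn.MaxPool2d(2)
        # 1x1 convs keep channel counts consistent after each concatenation (assumed)
        self.red4 = nn.Conv2d(c4 + c5, c4, 1)
        self.red3 = nn.Conv2d(c3 + c4, c3, 1)
        self.red4o = nn.Conv2d(c3 + c4, c4, 1)
        self.red5o = nn.Conv2d(c4 + c5, c5, 1)

    def forward(self, p3_in, p4_in, p5_in):
        p4_td = self.red4(self.cat[0](p4_in, self.up(p5_in)))       # top-down path
        p3_out = self.red3(self.cat[1](p3_in, self.up(p4_td)))
        p4_out = self.red4o(self.cat[2](p4_td, self.down(p3_out)))  # bottom-up path
        p5_out = self.red5o(self.cat[3](p5_in, self.down(p4_out)))
        return p3_out, p4_out, p5_out
```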
Then, an ECA module is added after the 9th layer; this module transforms the input feature map from an [h, w, c] matrix into a [1, 1, c] vector through global average pooling (Global Average Pooling), then computes an adaptive one-dimensional convolution kernel size and uses this kernel size in a one-dimensional convolution to obtain the weight of each channel of the feature map; the normalized weights are then multiplied channel by channel with the original input feature map to generate a weighted feature map.
The attention mechanism uses a one-dimensional convolution after the global average pooling layer, so that the fully connected layer is removed, dimensionality reduction is avoided and cross-channel interaction is effectively captured, which finally improves the probability of correctly judging objects and the detection precision of the model; the specific formula for the kernel size is:
k = | log2(C) / γ + b / γ |_odd;
wherein C represents the channel dimension; k represents the adaptive one-dimensional convolution kernel size, with |·|_odd denoting the nearest odd number; γ and b are set to 2 and 1 respectively;
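A minimal PyTorch sketch of this ECA step is given below, using the stated γ = 2 and b = 1; the module layout is an illustrative reimplementation of the standard ECA block, not the exact layer used in the patented network.

```python
# A minimal sketch of ECA channel attention (gamma = 2, b = 1 as stated above).
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        k = int(abs(math.log2(channels) / gamma + b / gamma))
        k = k if k % 2 else k + 1                      # kernel size must be odd
        self.pool = nn.AdaptiveAvgPool2d(1)            # [h, w, c] -> [1, 1, c]
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.pool(x)                               # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2)) # 1-D conv across channels
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y.expand_as(x)                      # channel-wise re-weighting
```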
S23, back propagation is carried out by combining multiple loss functions (such as the classification loss, localization loss, confidence loss and the like), updating the gradients in the model and adjusting the weight parameters;
and S24, finally, verifying the existing model by adopting the verification set in the step S01 to obtain the YoloV5 target detection network model.
S03, obtaining a three-dimensional point cloud of the tea based on the training result of the YoloV5 target detection network model in step S02; screening the three-dimensional point cloud of the tea buds out of the three-dimensional point cloud of the tea; and finally, fitting the minimum external cuboid of the tea buds to obtain the accurate positions and picking points of the tea buds.
The method comprises the following steps:
S31, firstly, according to the result of the YoloV5 target detection network model in step S02, acquiring the detection-frame coordinates and generating the regions of interest (Region of Interest, ROI) of the color image and the corresponding depth image;
S32, obtaining the corresponding mapped color-image coordinates according to the mapping relation between the pixel coordinates of the depth image and the pixel coordinates of the color image, using the coordinate values, pixel values and recorded distances of the depth image;
S33, obtaining the three-dimensional point cloud of the tea through coordinate fusion of the color image and the depth image, specifically:
x = (u - c_x) · d / f_x,  y = (v - c_y) · d / f_y,  z = d;
wherein (x, y, z) are coordinates in the coordinate system of the three-dimensional point cloud; (u, v) are pixel coordinates in the coordinate system of the color image and (c_x, c_y) is the principal point; d represents the depth value, obtained from the depth image; f_x, f_y represent the camera focal lengths;
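The coordinate fusion of step S33 can be sketched as follows; the principal point (c_x, c_y), the depth scale and the function name roi_to_point_cloud are assumptions introduced for illustration, not values fixed by the patent.

```python
# A minimal NumPy sketch of back-projecting the depth pixels inside a detected
# ROI into a 3-D point cloud via the pinhole model.
import numpy as np

def roi_to_point_cloud(depth_roi, fx, fy, cx, cy, depth_scale=0.001):
    """depth_roi: (H, W) raw depth patch; returns (N, 3) points in metres (scale assumed)."""
    v, u = np.indices(depth_roi.shape)                 # pixel coordinates (row, col)
    d = depth_roi.astype(np.float32) * depth_scale     # depth value from the depth image
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    pts = np.stack([x, y, d], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                          # drop invalid zero-depth pixels
```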
S34, because the generated three-dimensional point cloud of the tea comprises the tea buds and their background point cloud, the average value of the tea three-dimensional point cloud is calculated and used as a distance threshold; the background points farther than the distance threshold are then filtered out to obtain a primarily segmented three-dimensional point cloud; a DBSCAN clustering algorithm is then adopted, the neighborhood radius Eps and the minimum number of samples M_p required within the neighborhood are set, the primarily segmented point cloud is clustered into one class, and the three-dimensional point cloud of the tea buds is screened out;
the DBSCAN clustering algorithm randomly selects a data sample in space and determines whether the number of samples within its neighborhood radius Eps is greater than or equal to the minimum sample number M_p, thereby deciding whether the sample is a core object:
if so, all points in the neighborhood are assigned to the same cluster, and, starting from this cluster, all density-reachable samples are found by breadth-first search and added to the cluster;
if the data sample is a non-core object, it is marked as a noise point and removed;
the formula is specifically as follows:
N_Eps(p) = {q ∈ D | dist(p, q) ≤ Eps};
wherein D represents the point-cloud sample set; p and q represent sample points in the sample set;
for any p ∈ D, if its Eps-neighborhood N_Eps(p) contains at least M_p samples, then p is a core object; if q lies within the Eps-neighborhood of p and p is a core object, then q is directly density-reachable from p;
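A minimal sketch of this segmentation step using scikit-learn's DBSCAN is shown below; the Eps and M_p values, the use of the mean depth as the distance threshold, and keeping the largest cluster as the bud point cloud are assumptions, since the patent does not fix these details numerically.

```python
# A minimal sketch of step S34: coarse background filtering followed by DBSCAN.
import numpy as np
from sklearn.cluster import DBSCAN

def segment_bud_points(points, eps=0.01, min_samples=20):
    """points: (N, 3) tea point cloud; returns the point cloud of the bud cluster."""
    # 1) coarse background removal: keep points nearer than the mean depth
    #    (interpreting the "average value of the point cloud" as the mean depth).
    dist_threshold = points[:, 2].mean()
    fg = points[points[:, 2] < dist_threshold]
    # 2) DBSCAN clustering of the remaining points (Eps, M_p are placeholders)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(fg)
    valid = labels[labels >= 0]
    if valid.size == 0:
        return fg                                   # nothing clustered, fall back
    # 3) keep the largest cluster as the tea-bud point cloud (assumption)
    biggest = np.bincount(valid).argmax()
    return fg[labels == biggest]
```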
S35, adopting principal component analysis (Principal Component Analysis, PCA) to fit the minimum external cuboid of the tea buds according to their growth posture; then calculating the coordinates of each vertex of the cuboid; and obtaining the coordinates of the center point of the bottom face of the cuboid as the average of its four bottom vertices, taking this point as the picking point of the tea buds; specifically:
firstly, the three principal directions of the three-dimensional point cloud of the tea buds, namely the x, y and z directions, are obtained by principal component analysis, and the centroid and covariance are calculated to obtain the covariance matrix; specifically:
P_c = (1/n) · Σ_{i=1}^{n} (x_i, y_i, z_i);
wherein P_c represents the centroid coordinates of the three-dimensional point cloud; n represents the number of points in the three-dimensional point cloud; (x_i, y_i, z_i) represents the three-dimensional coordinates of the i-th point;
C_p = (1/n) · Σ_{i=1}^{n} (p_i - P_c)(p_i - P_c)^T, where p_i = (x_i, y_i, z_i)^T;
wherein C_p represents the covariance matrix of the three-dimensional point cloud;
then, singular value decomposition is carried out on the covariance matrix to obtain the eigenvalues and eigenvectors, with the specific formula:
C_p = U_p · D_p · V_p^T;
wherein U_p represents the eigenvector matrix of C_p C_p^T; D_p represents the diagonal matrix whose non-zero diagonal entries are the square roots of the non-zero eigenvalues of C_p C_p^T; V_p represents the eigenvector matrix of C_p^T C_p;
the direction of the eigenvector corresponding to the largest eigenvalue is the main axis direction of the cuboid;
then, the coordinate points are projected onto the direction vectors; from the position coordinates P_i of each point, the maximum and minimum values of the inner products with the unit vector of each direction are obtained, and a, b and c are taken as the averages of the maximum and minimum values along x, y and z respectively, giving the center point O and the side lengths L of the cuboid and generating the most suitable and compact cuboid for the tea bud;
the specific formulas are as follows:
a = [max_i(P_i · X) + min_i(P_i · X)] / 2,  b = [max_i(P_i · Y) + min_i(P_i · Y)] / 2,  c = [max_i(P_i · Z) + min_i(P_i · Z)] / 2;
O = aX + bY + cZ;
L_x = max_i(P_i · X) - min_i(P_i · X),  L_y = max_i(P_i · Y) - min_i(P_i · Y),  L_z = max_i(P_i · Z) - min_i(P_i · Z);
wherein X, Y and Z are the unit vectors of the coordinate points in the x, y and z directions respectively; L_x, L_y, L_z are the lengths of the cuboid in the x, y and z directions respectively;
then, the four vertices of the cuboid with the smallest coordinates in the y direction are taken as the four vertex coordinates of the bottom face; finally, the coordinate of the center point of the bottom face, namely the picking point, is obtained as the average of these four vertex coordinates.
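Step S35 can be sketched as follows with NumPy; treating the second principal axis as the y direction, the corner ordering and the function name picking_point_from_bud_cloud are assumptions made for illustration only.

```python
# A minimal sketch of step S35: PCA-based minimum bounding cuboid and the
# bottom-face centre used as the picking point.
import numpy as np

def picking_point_from_bud_cloud(points):
    """points: (N, 3) tea-bud point cloud; returns (picking_point, corners)."""
    centroid = points.mean(axis=0)                       # P_c
    cov = np.cov((points - centroid).T)                  # C_p
    _, _, vt = np.linalg.svd(cov)                        # principal directions
    axes = vt                                            # rows are unit vectors X, Y, Z
    proj = (points - centroid) @ axes.T                  # coordinates in the PCA frame
    pmin, pmax = proj.min(axis=0), proj.max(axis=0)
    centre_local = (pmin + pmax) / 2                     # a, b, c
    half = (pmax - pmin) / 2                             # L_x/2, L_y/2, L_z/2
    # eight corners of the oriented cuboid, mapped back to the camera frame
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    corners = centroid + (centre_local + signs * half) @ axes
    # bottom face: the four corners with the smallest coordinate along the y axis
    order = np.argsort((corners - centroid) @ axes[1])
    bottom = corners[order[:4]]
    picking_point = bottom.mean(axis=0)                  # centre of the bottom face
    return picking_point, corners
```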
The present invention is not limited to the preferred embodiments, and any simple modification, equivalent replacement, and improvement made to the above embodiments by those skilled in the art without departing from the technical scope of the present invention, will fall within the scope of the present invention.

Claims (2)

1. A YoloV5-based multi-target detection and positioning method for tea leaves, characterized by specifically comprising the following steps:
s01, constructing a tea bud image data set;
s02, constructing a feature map with rich semantic information through a bidirectional feature pyramid network and a channel attention mechanism based on the data set in the step S01, improving a YoloV5 detection network, obtaining a YoloV5 target detection network model, and detecting small-size tea buds;
s03, obtaining a three-dimensional point cloud of the tea based on a training result of the YoloV5 target detection network model in the step S02; screening out three-dimensional point clouds of tea buds from the three-dimensional point clouds of tea leaves; finally, fitting a minimum external cuboid of the tea buds to obtain the accurate positions and picking points of the tea buds;
the YoloV5 detection network comprises a Backbone module, a Neck module and a Head module; the Backbone module comprises a Focus module for slicing the picture, an SPP module, a CBS module, and a CSP module for enhancing the learning performance of the whole convolutional neural network; the Neck module comprises a CBS module and a CSP module; the Head module comprises a Detect layer for detecting targets on feature maps of different scales using grid-based anchors;
the step S02 specifically includes:
s21, firstly, preprocessing the images in the training set in the step S01, and unifying the resolutions of all the images in the training set; inputting the preprocessed image into a Backbone module to obtain feature images with different sizes;
s22, inputting the feature graphs with different sizes in the step S21 into a Neck module, and adopting a bidirectional feature pyramid network to replace an original path aggregation network in the Neck module to perform multi-feature fusion; then up-sampling and down-sampling the feature images in sequence, and generating feature images with various sizes through the splicing of a channel attention mechanism, and inputting the feature images into a Detect layer of a Head module;
s23, back propagation is carried out by combining with a plurality of loss functions, and the gradient in the model is updated and weight parameters are adjusted;
s24, finally, verifying the existing model by adopting the verification set in the step S01 to obtain a YoloV5 target detection network model;
the step S03 specifically includes:
S31, firstly, according to the result of the YoloV5 target detection network model in step S02, acquiring the detection-frame coordinates and generating the regions of interest of the color image and the corresponding depth image;
S32, obtaining the corresponding mapped color-image coordinates according to the mapping relation between the pixel coordinates of the depth image and the pixel coordinates of the color image, using the coordinate values, pixel values and recorded distances of the depth image;
S33, obtaining the three-dimensional point cloud of the tea through coordinate fusion of the color image and the depth image, specifically:
x = (u - c_x) · d / f_x,  y = (v - c_y) · d / f_y,  z = d;
wherein (x, y, z) are coordinates in the coordinate system of the three-dimensional point cloud; (u, v) are pixel coordinates in the coordinate system of the color image and (c_x, c_y) is the principal point; d represents the depth value, obtained from the depth image; f_x, f_y represent the camera focal lengths;
S34, because the generated three-dimensional point cloud of the tea comprises the tea buds and their background point cloud, the average value of the tea three-dimensional point cloud is calculated and used as a distance threshold; the background points farther than the distance threshold are then filtered out to obtain a primarily segmented three-dimensional point cloud; a DBSCAN clustering algorithm is then adopted, the neighborhood radius Eps and the minimum number of samples M_p required within the neighborhood are set, the primarily segmented point cloud is clustered into one class, and the three-dimensional point cloud of the tea buds is screened out;
S35, fitting the minimum external cuboid of the tea buds according to their growth posture by principal component analysis; then calculating the coordinates of each vertex of the cuboid; obtaining the coordinates of the center point of the bottom face of the cuboid as the average of its four bottom vertices, and taking this point as the picking point of the tea buds;
step S35 specifically includes:
firstly, the three principal directions of the three-dimensional point cloud of the tea buds, namely the x, y and z directions, are obtained by principal component analysis, and the centroid and covariance are calculated to obtain the covariance matrix; specifically:
P_c = (1/n) · Σ_{i=1}^{n} (x_i, y_i, z_i);
wherein P_c represents the centroid coordinates of the three-dimensional point cloud; n represents the number of points in the three-dimensional point cloud; (x_i, y_i, z_i) represents the three-dimensional coordinates of the i-th point;
C_p = (1/n) · Σ_{i=1}^{n} (p_i - P_c)(p_i - P_c)^T, where p_i = (x_i, y_i, z_i)^T;
wherein C_p represents the covariance matrix of the three-dimensional point cloud;
then, singular value decomposition is carried out on the covariance matrix to obtain the eigenvalues and eigenvectors, with the specific formula:
C_p = U_p · D_p · V_p^T;
wherein U_p represents the eigenvector matrix of C_p C_p^T; D_p represents the diagonal matrix whose non-zero diagonal entries are the square roots of the non-zero eigenvalues of C_p C_p^T; V_p represents the eigenvector matrix of C_p^T C_p;
the direction of the eigenvector corresponding to the largest eigenvalue is the main axis direction of the cuboid;
then, the coordinate points are projected onto the direction vectors; from the position coordinates P_i of each point, the maximum and minimum values of the inner products with the unit vector of each direction are obtained, and a, b and c are taken as the averages of the maximum and minimum values along x, y and z respectively, giving the center point O and the side lengths L of the cuboid and generating the most suitable and compact cuboid for the tea bud;
the specific formulas are as follows:
a = [max_i(P_i · X) + min_i(P_i · X)] / 2,  b = [max_i(P_i · Y) + min_i(P_i · Y)] / 2,  c = [max_i(P_i · Z) + min_i(P_i · Z)] / 2;
O = aX + bY + cZ;
L_x = max_i(P_i · X) - min_i(P_i · X),  L_y = max_i(P_i · Y) - min_i(P_i · Y),  L_z = max_i(P_i · Z) - min_i(P_i · Z);
wherein X, Y and Z are the unit vectors of the coordinate points in the x, y and z directions respectively; L_x, L_y, L_z are the lengths of the cuboid in the x, y and z directions respectively;
then, the four vertices of the cuboid with the smallest coordinates in the y direction are taken as the four vertex coordinates of the bottom face; finally, the coordinate of the center point of the bottom face, namely the picking point, is obtained as the average of these four vertex coordinates.
2. The YoloV5-based multi-target detection and positioning method for tea leaves according to claim 1, characterized in that step S01 specifically includes: firstly, collecting image data of tea buds by using an RGB-D camera to obtain color images and depth images of the tea buds; then, labeling the color images with a labeling tool and performing a data-set enhancement operation to expand the number of samples, thereby constructing the tea bud image data set; and finally, dividing the data set into a training set, a testing set and a verification set.
CN202210866833.0A 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves Active CN115272791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210866833.0A CN115272791B (en) 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210866833.0A CN115272791B (en) 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves

Publications (2)

Publication Number Publication Date
CN115272791A CN115272791A (en) 2022-11-01
CN115272791B true CN115272791B (en) 2023-05-26

Family

ID=83768705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210866833.0A Active CN115272791B (en) 2022-07-22 2022-07-22 YoloV5-based multi-target detection and positioning method for tea leaves

Country Status (1)

Country Link
CN (1) CN115272791B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115943809B (en) * 2023-03-09 2023-05-16 四川省农业机械研究设计院 Tea-picking optimization method and system based on quality evaluation
CN116138036B (en) * 2023-03-24 2024-04-02 仲恺农业工程学院 Secondary positioning method for picking young buds of famous tea

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901874A (en) * 2021-09-09 2022-01-07 江苏大学 Tea tender shoot identification and picking point positioning method based on improved R3Det rotating target detection algorithm

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080693A (en) * 2019-11-22 2020-04-28 天津大学 Robot autonomous classification grabbing method based on YOLOv3
CN113223091B (en) * 2021-04-29 2023-01-24 达闼机器人股份有限公司 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN114529799A (en) * 2022-01-06 2022-05-24 浙江工业大学 Aircraft multi-target tracking method based on improved YOLOV5 algorithm
CN114731840B (en) * 2022-04-07 2022-12-27 仲恺农业工程学院 Double-mechanical-arm tea picking robot based on machine vision

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901874A (en) * 2021-09-09 2022-01-07 江苏大学 Tea tender shoot identification and picking point positioning method based on improved R3Det rotating target detection algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"自然环境下葡萄采摘机器人采摘点的自动定位";罗陆锋等;《农业工程学报》;第31卷(第2期);第14-21页 *
Yatao Li et al."In-field tea shoot detection and 3D localization using an RGB-D camera".《Computers and Electronics in Agriculture》.2021,第1-12页. *
张泽坤等."面向物流分拣的多立体摄像头物体操作***".《计算机应用》.2018,第38卷(第8期),第2442-2448页. *

Also Published As

Publication number Publication date
CN115272791A (en) 2022-11-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant