CN115588190A - Mature fruit identification and picking point positioning method and device - Google Patents

Mature fruit identification and picking point positioning method and device

Info

Publication number
CN115588190A
Authority
CN
China
Prior art keywords
fruit
module
target
feature
picking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211385166.0A
Other languages
Chinese (zh)
Inventor
刘唯真
曾淇
袁晓辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202211385166.0A
Publication of CN115588190A
Legal status: Pending

Classifications

    • G06V 20/68: Food, e.g. fruit or vegetables
    • G06N 3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08: Neural networks; learning methods
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 10/40: Extraction of image or video features
    • G06V 10/77: Processing image or video features in feature spaces, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using neural networks


Abstract

The invention provides a mature fruit identification and picking point positioning method and device. The method comprises the following steps: acquiring a two-dimensional image containing a target fruit; acquiring a fully trained target recognition model, recognizing the fruits in the two-dimensional image based on the model, judging the maturity of the fruits, and identifying the picking points of the mature fruits; and constructing a three-dimensional algorithm model and performing coordinate conversion between the three-dimensional algorithm model and the picking points of the mature fruits in the two-dimensional image to obtain the three-dimensional space coordinates of the fruit picking points. Through deep learning, the invention improves the identification of fruit picking points and the accurate determination of the picking point coordinates, thereby greatly improving picking efficiency.

Description

Mature fruit identification and picking point positioning method and device
Technical Field
The invention belongs to the technical field of intelligent picking and machine image processing, and particularly relates to a mature fruit identification and picking point positioning method and device.
Background
The intelligent picking of the fruits can obviously improve the fruit picking efficiency, and the positioning of the picking points of the fruits is the basis for realizing the intelligent picking of the fruits. At present, two methods are mainly used for researching the picking point positioning of the fruits.
One method relies on traditional segmentation algorithms: the fruits are identified with a threshold method or by combining color-information features with edge distances, the picking points of the fruits are then determined, and the two-dimensional picking points are located.
The other method is based on deep learning: a target detection algorithm identifies and locates the fruits, and the two-dimensional coordinates of the picking points are determined by combining a skeleton-extraction method. However, these target detection algorithms basically use horizontal bounding boxes, so the fruit picking points are not easy to identify and locate accurately; in particular, determining the picking points through subsequent skeleton extraction greatly lengthens the task time, and the accuracy is low.
Therefore, the invention provides a method and a device for identifying ripe fruits and positioning picking points, which are used for solving the problem of low precision of identification and positioning of the picking points in the prior art.
Disclosure of Invention
In view of this, in order to solve the problem of low picking point identification and positioning accuracy in the prior art, it is necessary to provide a method and a device for identifying and positioning mature fruits, which can improve the accuracy of identifying and positioning the picking points of the fruits.
In order to achieve the above purpose, the invention provides the following technical scheme:
A mature fruit identification and picking point locating method comprising:
acquiring a two-dimensional image containing a target fruit;
acquiring a fully trained target recognition model, recognizing the fruits in the two-dimensional image based on the fully trained target recognition model, judging the maturity of the fruits, and identifying the picking points of the mature fruits;
and constructing a three-dimensional algorithm model, and performing coordinate conversion between the three-dimensional algorithm model and the picking points of the mature fruits in the two-dimensional image to obtain the three-dimensional space coordinates of the fruit picking points.
In some possible implementations, the target recognition model is an improved Yolox network model, i.e. a network improved on the basis of the Yolox network model;
the improved Yolox network model comprises a feature extraction backbone module, an attention mechanism module, a feature extraction pyramid module and an improved feature detection head module.
In some possible implementations, recognizing the fruits in the two-dimensional image and judging the maturity of the fruits based on the fully trained target recognition model, and identifying the picking points of the mature fruits, includes:
inputting the two-dimensional image containing the target fruit into a feature extraction trunk module for feature extraction to obtain a first target feature map;
determining a second target feature map with attention information based on the first target feature map and the attention mechanism module;
performing feature extraction and feature fusion on the second target feature map through the feature extraction pyramid module to obtain a target feature map;
and identifying the fruit in the two-dimensional image through the characteristic detection head module according to the target characteristic diagram, judging the maturity of the fruit, and determining the picking point of the mature fruit.
In some possible implementations, the feature extraction backbone module includes a Focus structure and a CSPDarknet structure; the CSPDarknet structure comprises: a CBL module, a first CSP1-X module, a second CSP1-X module, a first CSP-res8 module, a second CSP-res8 module, a first Nonlinear mapping module and a second Nonlinear mapping module; inputting the two-dimensional image containing the target fruit picking points into the feature extraction backbone module for feature extraction to obtain a first target feature map includes:
inputting the two-dimensional image with the target fruit picking points into the Focus structure, extracting every other pixel along the row and column directions to form new feature layers, recombining each feature map into 4 feature layers, stacking the 4 feature layers so that the input channels are expanded by 4 times, and outputting the result to obtain a Focus target feature map;
and inputting the Focus target feature map into the CSPDarknet structure, and processing it sequentially through the CBL module, the first CSP1-X module, the second CSP1-X module, the first CSP-res8 module, the second CSP-res8 module, the first Nonlinear mapping module and the second Nonlinear mapping module to obtain the first target feature map.
In some possible implementations, the attention mechanism module comprises a first attention mechanism submodule, a second attention mechanism submodule and a third attention mechanism submodule, which are three identical attention mechanism submodules; each attention mechanism submodule comprises a global average pooling layer, a Concat function layer, a first 1 × 1 convolution layer, a first activation function layer, a second 1 × 1 convolution layer and a second activation function layer which are connected in sequence;
the determining a second target feature map with attention information based on the first target feature map and the attention mechanism module includes:
performing a pooling operation with the global average pooling layer to obtain a width-direction feature map and a height-direction feature map;
merging the width-direction feature map and the height-direction feature map with the Concat function layer to obtain a merged feature map;
reducing the dimension of the merged feature map with the first 1 × 1 convolution layer to obtain a dimension-reduced merged feature map;
determining a first attention weight of the width-direction feature map and the height-direction feature map with the first activation function layer;
restoring the width-direction feature map and the height-direction feature map to their original dimensions with the second 1 × 1 convolution layer;
determining a second attention weight of the restored width-direction and height-direction feature maps with the second activation function layer;
and weighting the first target feature map with the first attention weight and the second attention weight to obtain a second target feature map with attention information.
In some possible implementation manners, the feature extraction pyramid module includes an FPN network composed of three feature layers, where the FPN network includes a first FPN network feature layer, a second FPN network feature layer, and a third FPN network feature layer;
the obtaining of the target feature map by performing feature extraction and feature fusion on the second target feature map through the feature extraction pyramid module includes:
inputting the output of the first attention mechanism submodule into the first FPN network feature layer to obtain a first FPN network feature;
inputting the output of the second attention mechanism submodule into the second FPN network feature layer to obtain a second FPN network feature;
inputting the output of the third attention mechanism submodule into the third FPN network feature layer to obtain a third FPN network feature;
and obtaining the target feature map based on the first FPN network feature, the second FPN network feature and the third FPN network feature.
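The top-down fusion such an FPN performs can be sketched as follows. This is a generic illustration in NumPy under assumed (C, H, W) layouts and nearest-neighbour upsampling, not necessarily the exact YOLOX pyramid wiring:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_fuse(p3, p4, p5):
    # Top-down fusion: each deeper (smaller) map is upsampled and
    # added into the next shallower one; all three maps are returned.
    p4 = p4 + upsample2x(p5)
    p3 = p3 + upsample2x(p4)
    return p3, p4, p5

# Feature layers with the spatial sizes quoted for a 640x640 input
# (channel count 8 is arbitrary for the sketch).
p3, p4, p5 = np.ones((8, 80, 80)), np.ones((8, 40, 40)), np.ones((8, 20, 20))
f3, f4, f5 = fpn_fuse(p3, p4, p5)
print(f3.shape, f4.shape, f5.shape)
```

The deepest map is left untouched, while every shallower map accumulates context from below it; a real network would insert learned convolutions between these additions.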
In some possible implementations, the feature detection head module includes a first detection head, a second detection head, and a third detection head;
identifying the fruits in the two-dimensional image through the characteristic detection head module according to the target characteristic diagram, judging the maturity of the fruits, and simultaneously determining picking points of the mature fruits, wherein the method comprises the following steps:
identifying whether the fruit is mature or not through a first detection head according to characteristic information carried by the target characteristic graph;
according to the characteristic information carried by the target characteristic diagram, the picking points of the mature fruits in the fruits are identified through a second detection head;
and identifying the position of the fruit on the two-dimensional image and the type of the fruit through a third detection head according to the characteristic information carried by the target characteristic map.
In some possible implementations, constructing a three-dimensional algorithm model and performing coordinate conversion between the three-dimensional algorithm model and the picking points of the mature fruits in the two-dimensional image to obtain the three-dimensional space coordinates of the fruit picking points includes:
constructing a three-dimensional algorithm model;
establishing a coordinate system based on the marked picking point image of the mature fruit in the two-dimensional image to determine the two-dimensional coordinates of the fruit picking point on the two-dimensional image;
and determining the three-dimensional coordinates of the fruit picking points according to the three-dimensional algorithm model and the two-dimensional coordinates of the fruit picking points on the two-dimensional image.
In some possible implementation modes, the three-dimensional algorithm model adopts an SGBM algorithm to position fruit picking points;
calculating and determining the three-dimensional space coordinates of the fruit picking points according to the fruit picking points in the two-dimensional image and the three-dimensional algorithm model includes:
matching corresponding coordinate points between the images captured by the binocular camera according to the two-dimensional coordinates of the fruit picking points in the two-dimensional image to obtain the disparity values;
and then determining the three-dimensional space coordinates of the fruit picking points according to the SGBM algorithm.
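The disparity-to-coordinate step can be sketched with the standard rectified pinhole stereo equations. The focal length f, baseline B and principal point (cx, cy) below are illustrative values, and the SGBM matching itself (e.g. OpenCV's StereoSGBM) is assumed to have already produced the disparity d at the picking-point pixel:

```python
def picking_point_3d(u, v, d, f, B, cx, cy):
    # Standard rectified-stereo back-projection:
    # depth from disparity, then pixel -> camera coordinates.
    Z = f * B / d            # depth along the optical axis
    X = (u - cx) * Z / f     # lateral offset of the picking point
    Y = (v - cy) * Z / f     # vertical offset of the picking point
    return X, Y, Z

# Illustrative numbers only: picking point at pixel (700, 420),
# disparity 64 px, f = 800 px, baseline 0.12 m, principal point (640, 360).
print(picking_point_3d(700, 420, 64.0, 800.0, 0.12, 640.0, 360.0))
# -> (0.1125, 0.1125, 1.5): about 1.5 m in front of the camera
```

In practice OpenCV's `reprojectImageTo3D` performs the same computation for a whole disparity map using the calibration matrix Q.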
In another aspect, the present invention also provides a fruit picking point positioning device, comprising:
an image acquisition unit for acquiring a two-dimensional image containing a target fruit;
the fruit and picking point identification unit is used for acquiring a fully trained target recognition model, recognizing the fruits in the two-dimensional image based on the fully trained target recognition model, judging the maturity of the fruits, and identifying the picking points of the mature fruits;
and the picking point coordinate positioning unit is used for constructing a three-dimensional algorithm model and performing coordinate conversion between the three-dimensional algorithm model and the picking points of the mature fruits in the two-dimensional image to obtain the three-dimensional space coordinates of the fruit picking points.
Compared with the prior art, the invention has the following beneficial effects: in the mature fruit identification and picking point positioning method, the fruits and the fruit picking points are identified and located by a target recognition model improved on the basis of the Yolox network model, which raises the precision of fruit identification, maturity judgement and picking point detection; at the same time, the three-dimensional coordinates of the picking points are predicted from their two-dimensional coordinates by a three-dimensional algorithm model, so that the precision of picking point identification and of the picking point coordinates is improved, greatly improving picking efficiency.
Drawings
Fig. 1 is a flowchart of an embodiment of a mature fruit identification and picking point locating method according to the present invention;
FIG. 2 is a schematic diagram of a model structure of an embodiment of a target recognition model according to the present invention;
FIG. 3 is a schematic diagram of a model structure of an embodiment of a feature detection head module according to the invention;
FIG. 4 is a diagram illustrating an embodiment of spatial relationships among a pixel coordinate system, a camera coordinate system, an image coordinate system, and a world coordinate system according to the present invention;
fig. 5 is a schematic structural view of a fruit picking point positioning device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the schematic drawings are not necessarily to scale. The flowcharts used in this disclosure illustrate operations implemented according to some embodiments of the present invention. It should be understood that the operations of the flow diagrams may be performed out of order, and that steps without logical context may be performed in reverse order or concurrently. In addition, one skilled in the art, under the direction of the present disclosure, may add one or more other operations to the flowchart, or may remove one or more operations from the flowchart.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor systems and/or microcontroller systems.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiments of the present invention provide a method for identifying ripe fruits and locating picking points, which is described below.
Fig. 1 is a flowchart of an embodiment of a mature fruit identification and picking point positioning method according to the present invention, which includes the following steps:
s101, acquiring a two-dimensional image containing a target fruit;
s102, acquiring a fully trained target recognition model, recognizing the fruits in the two-dimensional image based on the fully trained target recognition model, judging the maturity of the fruits, and identifying the picking points of the mature fruits;
s103, constructing a three-dimensional algorithm model, and performing coordinate conversion between the three-dimensional algorithm model and the picking points of the mature fruits in the two-dimensional image to obtain the three-dimensional space coordinates of the fruit picking points.
It should be noted that the target recognition model is an improved Yolox network model.
Compared with the prior art, the mature fruit identification and picking point positioning method identifies and locates the fruits and the fruit picking points through a target recognition model improved on the basis of the Yolox network model, which raises the precision of fruit identification, maturity judgement and picking point detection; at the same time, the three-dimensional coordinates of the picking points are predicted from their two-dimensional coordinates by a three-dimensional algorithm model, so that the precision of picking point identification and of the picking point coordinates is improved, greatly improving picking efficiency.
In an embodiment of the present invention, as shown in fig. 2, fig. 2 is a schematic diagram of a model structure of an embodiment of a target recognition model provided in the embodiment of the present invention, where the target recognition model is a Yolox network model improved based on the Yolox network model;
the improved Yolox network model comprises a feature extraction backbone module, an attention mechanism module, a feature extraction pyramid module and an improved feature detection head module.
In a specific embodiment, the feature extraction backbone used by the Yolox network model is CSPDarknet. In the feature extraction pyramid part, the Yolox network model extracts multiple feature layers for target detection, so that it can adapt to detection targets of different sizes. The number of feature layers is 3; they are taken from different positions of the feature extraction backbone module, and all three pass from the backbone through the attention module before being input to the feature extraction pyramid. When the shape of the input to the backbone network is (640, 640, 3), the shapes of the three feature layers are (80, 80, 256), (40, 40, 512) and (20, 20, 1024), respectively. The feature pyramid structure performs feature fusion on the three feature layers and then outputs them respectively to the feature detection head module.
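The relationship between the 640 × 640 input and the three feature-layer sizes follows from downsampling strides of 8, 16 and 32; the strides are an assumption, but they are the usual Yolox values and are consistent with the quoted shapes:

```python
def feature_shapes(input_size, strides=(8, 16, 32)):
    # Spatial size of each feature layer = input size // stride.
    return [(input_size // s, input_size // s) for s in strides]

print(feature_shapes(640))  # [(80, 80), (40, 40), (20, 20)]
```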
It should be noted that the CSPDarknet is a Darknet network of CSP (Cross Stage Partial) structure.
It is further noted that the Darknet network is a lightweight network structure.
In the embodiment of the present invention, recognizing the fruits in the two-dimensional image and judging the maturity of the fruits based on the fully trained target recognition model, and identifying the picking points of the mature fruits, includes:
inputting the two-dimensional image containing the target fruit into the feature extraction backbone module for feature extraction to obtain a first target feature map;
determining a second target feature map with attention information based on the first target feature map and the attention mechanism module;
performing feature extraction and feature fusion on the second target feature map through the feature extraction pyramid module to obtain a target feature map;
and identifying the fruit in the two-dimensional image through the characteristic detection head module according to the target characteristic diagram, judging the maturity of the fruit, and determining the picking point of the mature fruit.
In a specific embodiment, the feature extraction backbone module uses 1 × 1 and 3 × 3 convolutions and a CSPnet network structure. The CSPnet structure splits the stacked residual networks: one part keeps the original structure, while the other forms a residual edge, so that the whole CSP acts as one larger residual structure. Through the skip connections of the residual network, the backbone module alleviates the vanishing-gradient problem in neural network training, which allows the network depth to be increased and thereby improves the prediction accuracy of the neural network.
The feature extraction backbone module also uses a Focus network structure, which obtains four feature layers by sampling pixel points at intervals across the image. Stacking these four feature layers quadruples the number of channels. This structure reduces the number of parameters and thereby improves the computation speed.
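The interval sampling of the Focus structure can be sketched in NumPy as follows (an (H, W, C) layout is assumed for clarity):

```python
import numpy as np

def focus_slice(x):
    # Take every other pixel along rows and columns, producing four
    # (H/2, W/2, C) sub-maps, then stack them along the channel axis:
    # spatial size is halved while the channel count is quadrupled.
    parts = [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]]
    return np.concatenate(parts, axis=-1)

img = np.zeros((640, 640, 3))
print(focus_slice(img).shape)  # (320, 320, 12)
```

No information is discarded: every input pixel appears exactly once in the output, just rearranged into channels.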
The attention mechanism module is used to improve the network's modelling of the relationships among channels while taking more position information into account, thereby enhancing the expressive power of the features learned by the detection network. The module first applies global average pooling to the first target feature map along the two dimensions of width and height, and merges the resulting two directional feature maps with concat. It then reduces the dimension with a 1 × 1 convolution kernel, performs the convolution operation, applies a normalization operation, and feeds the feature map into an activation function. Next, 1 × 1 convolution kernels separately restore the width and height parts to their original dimensions, and a Sigmoid activation function produces the attention weights in the width and height dimensions. Finally, the initial feature map is weighted with these attention weights, so that the information is encoded by attention in both the width and height dimensions, yielding a second target feature map with attention information.
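The pool-then-reweight skeleton of this module can be illustrated with a heavily simplified NumPy sketch; the 1 × 1 convolutions, normalization and intermediate activation of the real module are omitted, leaving only the directional pooling, the Sigmoid weights and the final weighting:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_reweight(x):
    # x: (C, H, W) feature map.
    h_pool = x.mean(axis=2)                  # average over width  -> (C, H)
    w_pool = x.mean(axis=1)                  # average over height -> (C, W)
    h_weight = sigmoid(h_pool)[:, :, None]   # height-direction weights (C, H, 1)
    w_weight = sigmoid(w_pool)[:, None, :]   # width-direction weights  (C, 1, W)
    return x * h_weight * w_weight           # weight the initial feature map

rng = np.random.default_rng(0)
x = rng.random((4, 8, 8))
y = attention_reweight(x)
print(y.shape)
```

Because the weights depend on per-row and per-column statistics, each spatial position is scaled by information about its coordinates, which is what lets the module "consider more position information" than plain channel attention.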
The feature pyramid module takes the output of the attention mechanism module as input, and fuses the second target feature maps carrying the attention information to obtain the target feature map;
and identifying the fruit in the two-dimensional image through the characteristic detection head module according to the target characteristic diagram, judging the maturity of the fruit, and determining the picking point of the mature fruit.
In the embodiment of the invention, the feature extraction trunk module comprises a Focus structure and a CSPDarknet structure; the CSPDarknet structure comprises: the system comprises a CBL module, a first CSP1-X module, a second CSP1-X module, a first CSP-res8 module, a second CSP-res8 module, a first Nonlinear mapping module and a second Nonlinear mapping module; inputting the two-dimensional image containing the target fruit picking point into a feature extraction trunk module for feature extraction to obtain a first target feature map, wherein the first target feature map comprises the following steps:
after a two-dimensional image with target fruit picking points is input into a Focus structure, pixel-separated extraction is carried out on the row direction and the column direction of a first target feature map to form a new feature layer, each feature map is recombined into 4 feature layers, then the 4 feature layers are stacked, and an input channel is expanded to 4 times to be output to obtain a Focus target feature map;
and inputting the Focus target feature map into the CSPDarknet structure, where it is processed sequentially by the CBL module, the first CSP1-X module, the second CSP1-X module, the first CSP-res8 module, the second CSP-res8 module, the first Nonlinear mapping module and the second Nonlinear mapping module to obtain the first target feature map.
It should be noted that the CBL module is composed of a convolutional layer (Conv), a batch normalization layer (BN layer), and the nonlinear activation function Leaky ReLU.
The CSP1_X module consists of a CBL module, X residual blocks (Res units), a convolutional layer Conv, a Concat operation, a batch normalization layer (BN layer), the nonlinear activation function Leaky ReLU and a further CBL module. The CSP1_X module processes its input along two paths: one path passes sequentially through the CBL module, the X residual blocks and the convolutional layer Conv, while the other path passes through a convolutional layer Conv alone. The results of the two paths are joined by Concat and then processed sequentially by the BN layer, the Leaky ReLU activation function and a CBL module, producing the output of the CSP1_X module. A residual block (Res unit) is formed by an add tensor-addition operation between an upper branch consisting of 2 CBL modules and the original input serving as the lower branch.
The first CSP1-X module and the second CSP1-X module are two same CSP1-X modules.
Further, it should be noted that the first Nonlinear mapping module and the second Nonlinear mapping module are two identical Nonlinear mapping modules; the Nonlinear mapping module consists of a convolutional layer Conv and a Nonlinear activation function Leaky relu; and the dimensions of the feature map are adjusted by using Nonlinear mapping, so that the details of feature fusion are improved, and the detection effect on small targets is optimized.
Further, it should be noted that the CSP_Res8 module is formed by Concat tensor concatenation of 8 Res unit modules and a CBM component.
The first CSP-Res8 module and the second CSP-Res8 module are two same CSP _ Res8 modules.
It should be noted that the Focus network structure obtains four feature layers by sampling pixel points at intervals in one image. By stacking these four feature layers, the number of channels is quadrupled. This structure can reduce the amount of computation, thereby improving the calculation speed.
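As an illustrative sketch (not the patented implementation), the interlaced sampling performed by the Focus structure can be reproduced with NumPy: an H × W × C input is sampled at every other pixel along the row and column directions, and the four resulting H/2 × W/2 × C slices are stacked along the channel axis, quadrupling the number of channels.

```python
import numpy as np

def focus_slice(x: np.ndarray) -> np.ndarray:
    """Rearrange an (H, W, C) feature map into (H/2, W/2, 4C) by
    interlaced sampling, as in the Focus structure (H and W even)."""
    a = x[0::2, 0::2, :]   # even rows, even cols
    b = x[1::2, 0::2, :]   # odd rows,  even cols
    c = x[0::2, 1::2, :]   # even rows, odd cols
    d = x[1::2, 1::2, :]   # odd rows,  odd cols
    return np.concatenate([a, b, c, d], axis=-1)

x = np.arange(4 * 4 * 3).reshape(4, 4, 3).astype(np.float32)
y = focus_slice(x)
print(y.shape)  # (2, 2, 12): spatial size halved, channels quadrupled
```

No pixel value is lost: the operation only rearranges values, which is why it reduces resolution without discarding information.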
In the embodiment of the invention, the attention mechanism module comprises a first attention mechanism sub-module, a second attention mechanism sub-module and a third attention mechanism sub-module, which are three identical attention mechanism sub-modules; each attention mechanism sub-module comprises a global average pooling layer, a Concat function layer, a first 1 × 1 convolution layer, a first activation function layer, a second 1 × 1 convolution layer and a second activation function layer which are connected in sequence;
the determining a second target feature map with attention information based on the first target feature map and the attention mechanism module includes:
performing a pooling operation based on the global average pooling layer to obtain a width-direction feature map and a height-direction feature map;
merging the width-direction feature map and the height-direction feature map based on the Concat function layer to obtain a merged feature map;
performing dimension reduction on the merged feature map based on the first 1x1 convolution layer to obtain a dimension-reduced merged feature map;
determining a first attention weight for the width-direction feature map and the height-direction feature map based on the first activation function layer;
restoring the width-direction feature map and the height-direction feature map to their original dimensions based on the second 1x1 convolution layer;
determining a second attention weight for the width-direction feature map and the height-direction feature map restored to the original dimensions based on the second activation function layer;
and performing a weighting operation on the first target feature map according to the first attention weight and the second attention weight to obtain the second target feature map with attention information.
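As a simplified illustration of the width/height attention weighting described above, the following NumPy sketch pools the input along each spatial dimension, turns the pooled responses into (0, 1) attention weights with a sigmoid, and reweights the input. The learned 1 × 1 convolutions and normalization of the actual module are omitted (replaced by identity mappings) here, so this is a sketch of the data flow only, not the patented module.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coord_attention_sketch(x: np.ndarray) -> np.ndarray:
    """x: (C, H, W). Pool along the width and height separately, derive
    per-position attention weights with a sigmoid, and reweight the
    input. Learned 1x1 convolutions are omitted (identity) here."""
    h_feat = x.mean(axis=2)            # (C, H): global average pool over width
    w_feat = x.mean(axis=1)            # (C, W): global average pool over height
    a_h = sigmoid(h_feat)[:, :, None]  # (C, H, 1) height-direction weights
    a_w = sigmoid(w_feat)[:, None, :]  # (C, 1, W) width-direction weights
    return x * a_h * a_w               # broadcast weighting of the input

x = np.random.default_rng(0).normal(size=(4, 8, 8))
y = coord_attention_sketch(x)
print(y.shape)  # (4, 8, 8)
```

Because both weight tensors lie in (0, 1), the output preserves the input's shape while attenuating each position according to its row and column statistics.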
In the embodiment of the present invention, the feature extraction pyramid module includes an FPN network composed of three feature layers, where the FPN network includes a first FPN network feature layer, a second FPN network feature layer, and a third FPN network feature layer;
the obtaining of the target feature map by performing feature extraction and feature fusion on the second target feature map through the feature extraction pyramid module includes:
the output of the first attention mechanism sub-module is input into the first FPN network feature layer to obtain a first FPN network feature;
the output of the second attention mechanism sub-module is input into the second FPN network feature layer to obtain a second FPN network feature;
the output of the third attention mechanism sub-module is input into the third FPN network feature layer to obtain a third FPN network feature;
and the target feature map is obtained based on the first FPN network feature, the second FPN network feature and the third FPN network feature.
It should be noted that the FPN network is a Feature Pyramid Network (FPN).
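The top-down fusion performed by a feature pyramid can be sketched as follows. This is a generic FPN-style illustration, not the patented network: the lateral 1 × 1 convolutions of a real FPN are replaced by identity mappings, and all levels are assumed to share the same channel count.

```python
import numpy as np

def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_fuse(c3, c4, c5):
    """Top-down fusion of three (C, H, W) feature levels (strides 8/16/32).
    Lateral 1x1 convolutions are omitted (identity) in this sketch."""
    p5 = c5
    p4 = c4 + upsample2x(p5)   # fuse the deepest level into the middle one
    p3 = c3 + upsample2x(p4)   # then fuse the result into the shallowest level
    return p3, p4, p5

c3 = np.ones((8, 32, 32)); c4 = np.ones((8, 16, 16)); c5 = np.ones((8, 8, 8))
p3, p4, p5 = fpn_fuse(c3, c4, c5)
print(p3.shape, p4.shape, p5.shape)  # (8, 32, 32) (8, 16, 16) (8, 8, 8)
```

The shallow output p3 thus aggregates information from all three levels, which is what enables the pyramid to detect both small and large fruit.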
In an embodiment of the present invention, the feature detection head module includes a first detection head, a second detection head, and a third detection head;
identifying the fruits in the two-dimensional image through the characteristic detection head module according to the target characteristic diagram, judging the maturity of the fruits, and simultaneously determining picking points of the mature fruits, wherein the method comprises the following steps:
identifying whether the fruit is mature or not through a first detection head according to characteristic information carried by the target characteristic graph;
according to the characteristic information carried by the target characteristic diagram, the picking points of the mature fruits in the fruits are identified through a second detection head;
and identifying the position of the fruit on the two-dimensional image and the type of the fruit through a third detection head according to the characteristic information carried by the target characteristic map.
In an embodiment, as shown in fig. 3, which is a schematic structural diagram of an embodiment of the feature detection head module provided by the embodiment of the present invention, the feature detection head module is divided into three parts: Cls is the first detection head, Kps is the second detection head, and Reg together with Obj forms the third detection head. The target feature map undergoes convolution processing before being passed to the three detection heads.
Cls and Obj share a part of their parameters, and Reg extracts regression parameters for the feature points to complete the prediction of the detection box, i.e., the position of the fruit on the two-dimensional image. The prediction result of Reg has 4 convolution channels: the offsets of the prediction box center relative to the feature point, and the width and height of the prediction box expressed as logarithmic-scale parameters; this part predicts the specific bounding box of the cherry fruit. Obj judges whether a feature point contains an object; its prediction result has 1 convolution channel, representing the probability that each feature point's prediction box contains an object.
Cls predicts the possible object category at the point and is used for judging the ripeness information of the cherry fruit; its prediction result has 3 convolution channels, whose values represent the probabilities of the ripeness categories to which the fruit belongs.
Kps predicts the coordinates of the key points; its prediction result has 6 convolution channels, i.e., the horizontal and vertical coordinate values of three points, comprising the coordinates of the picking point and the detected coordinates of two auxiliary key points.
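To make the channel layout concrete, the following hypothetical NumPy sketch splits a single feature point's 14-channel prediction (4 Reg + 1 Obj + 3 Cls + 6 Kps) into the quantities described above. The variable names and decoding conventions (sigmoid for Obj, softmax for Cls, exponential for the log-scale width/height) follow common YOLO-style heads and are assumptions, not taken from the patent.

```python
import numpy as np

def decode_point(pred: np.ndarray, fx: float, fy: float, stride: float):
    """pred: 14 values for one feature point at grid cell (fx, fy).
    Assumed layout: [dx, dy, log_w, log_h, obj, cls0..cls2, kps x/y * 3]."""
    assert pred.shape == (14,)
    dx, dy, log_w, log_h = pred[0:4]          # Reg: 4 channels
    obj = 1.0 / (1.0 + np.exp(-pred[4]))      # Obj: probability an object is present
    cls_logits = pred[5:8]                    # Cls: 3 maturity categories
    cls_prob = np.exp(cls_logits) / np.exp(cls_logits).sum()
    kps = pred[8:14].reshape(3, 2) * stride   # Kps: picking point + 2 auxiliary points
    cx = (fx + dx) * stride                   # box centre offset from the feature point
    cy = (fy + dy) * stride
    w = np.exp(log_w) * stride                # log-scale width/height decoded
    h = np.exp(log_h) * stride
    box = (float(cx), float(cy), float(w), float(h))
    return box, float(obj), cls_prob, kps

box, obj, cls_prob, kps = decode_point(np.zeros(14), fx=3, fy=2, stride=8.0)
print(box, round(obj, 2))  # (24.0, 16.0, 8.0, 8.0) 0.5
```

With all-zero logits the sketch reduces to the neutral prediction: a stride-sized box centered on the grid cell, 0.5 objectness, and a uniform maturity distribution.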
It should be noted that Conv2D represents a two-dimensional convolution, BN represents Batch-Normalization (Batch-Normalization), and SiLU represents an activation function.
In the embodiment of the present invention, a three-dimensional algorithm model is constructed, and three-dimensional space coordinates of a fruit picking point image are obtained after performing coordinate transformation based on the three-dimensional algorithm model and picking points of ripe fruits in the two-dimensional image, including:
constructing a three-dimensional algorithm model;
establishing a coordinate system based on the marked picking point image of the mature fruit in the two-dimensional image to determine the two-dimensional coordinates of the fruit picking point on the two-dimensional image;
and determining the three-dimensional coordinates of the fruit picking points according to the three-dimensional algorithm model and the two-dimensional coordinates of the fruit picking points on the two-dimensional image.
In the embodiment of the invention, the three-dimensional algorithm model adopts an SGBM algorithm to position fruit picking points;
calculating and determining three-dimensional space coordinates of the fruit picking points according to the fruit picking points in the two-dimensional image and the three-dimensional algorithm model, wherein the three-dimensional space coordinates comprise:
matching the corresponding coordinate points between the left and right images captured by the binocular camera, according to the two-dimensional coordinates of the fruit picking points obtained from the fruit picking point image, to obtain a disparity value;
and then determining the three-dimensional space coordinates of the fruit picking points according to the SGBM algorithm.
In a specific embodiment, after two-dimensional coordinates on the fruit picking point image are obtained, in order to locate three-dimensional space coordinates of the fruit picking point, the two-dimensional coordinates need to be converted into three-dimensional coordinates by using a fruit image acquired by a binocular camera.
In the specific coordinate conversion process, the conversion is completed based on a stereo matching algorithm through the relationships among the pixel, image, camera and world coordinate systems. The specific coordinate relationship is shown in fig. 4, which illustrates the spatial relationship among the pixel coordinate system, camera coordinate system, image coordinate system and world coordinate system provided in the embodiment of the present invention: the pixel coordinate system locates pixel coordinates on a picture, the image coordinate system locates positions within different images, the camera coordinate system locates the photographing position of the camera, and the world coordinate system locates the pixel, image and camera coordinate systems in world space.
The stereo matching algorithm matches corresponding points in the left and right images acquired by the camera to obtain a disparity value. The algorithm first preprocesses the image with a horizontal Sobel operator, whose specific formula is as follows:

Sobel(x, y) = 2[P(x+1, y) - P(x-1, y)] + P(x+1, y-1) - P(x-1, y-1) + P(x+1, y+1) - P(x-1, y+1)
wherein, P represents the original pixel value of the image, x is the horizontal coordinate value of the pixel coordinate, and y is the vertical coordinate value of the pixel coordinate.
Then, the processed pixel values are mapped through the following mapping function to obtain new pixel values; the specific mapping function formula is as follows:

P_NEW = 0, if P ≤ -preFilterCap; P_NEW = P + preFilterCap, if -preFilterCap < P < preFilterCap; P_NEW = 2·preFilterCap, if P ≥ preFilterCap

wherein P is the pixel value after Sobel filtering, P_NEW is the pixel value on the new image, and preFilterCap is a constant truncation parameter.
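These two preprocessing steps, horizontal Sobel filtering followed by preFilterCap truncation, can be sketched in NumPy as follows (a minimal illustration of the standard SGBM preprocessing, not the patented code):

```python
import numpy as np

def horizontal_sobel(P: np.ndarray) -> np.ndarray:
    """Apply the horizontal Sobel operator of the text to the interior
    pixels of a grayscale image P (borders left at zero)."""
    out = np.zeros_like(P, dtype=np.float64)
    out[1:-1, 1:-1] = (
        2 * (P[1:-1, 2:] - P[1:-1, :-2])   # 2[P(x+1,y) - P(x-1,y)]
        + (P[:-2, 2:] - P[:-2, :-2])       # P(x+1,y-1) - P(x-1,y-1)
        + (P[2:, 2:] - P[2:, :-2])         # P(x+1,y+1) - P(x-1,y+1)
    )
    return out

def prefilter_map(P: np.ndarray, cap: float) -> np.ndarray:
    """Truncate Sobel responses into [0, 2*cap] per the mapping function."""
    return np.clip(P + cap, 0.0, 2.0 * cap)

img = np.tile(np.arange(6, dtype=np.float64), (5, 1))   # intensity ramp along x
s = horizontal_sobel(img)
print(s[2, 2], prefilter_map(s, cap=31.0)[2, 2])  # 8.0 39.0
```

On a unit ramp every interior response is 8 (gradient 1 weighted by the kernel sum of 8), and the mapping shifts it into the bounded range used for cost computation.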
Further, the new pixel values are subjected to cost matching, with the costs aggregated along each path direction through the following cost aggregation formula:

L_r(p, d) = C(p, d) + min( L_r(p-r, d), L_r(p-r, d-1) + P_1, L_r(p-r, d+1) + P_1, min_i L_r(p-r, i) + P_2 ) - min_k L_r(p-r, k)

in the formula, L_r is the cost function aggregated along path direction r, p is a pixel point, d is the disparity, C(p, d) represents the current matching cost, and P_1 and P_2 are the smoothing penalties applied when the disparity difference between the pixel and its neighboring pixel is minimum (equal to 1) and maximum (greater than 1), respectively, with P_1 < P_2. The final subtracted term has no meaning in itself; it merely eliminates the effect of the differing path lengths in each direction.
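A one-dimensional NumPy sketch of this path-aggregation recurrence, for a single left-to-right path (illustrative only; a full SGBM implementation aggregates over several directions):

```python
import numpy as np

def aggregate_path(C: np.ndarray, P1: float, P2: float) -> np.ndarray:
    """C: (N, D) matching costs along one scan direction. Returns the
    aggregated costs L_r following the SGBM recurrence, with the
    min_k L_r(p-r, k) term subtracted to bound growth."""
    N, D = C.shape
    L = np.empty_like(C, dtype=np.float64)
    L[0] = C[0]
    for p in range(1, N):
        prev = L[p - 1]
        best_prev = prev.min()
        same = prev                                    # L_r(p-r, d)
        minus = np.roll(prev, 1);  minus[0] = np.inf   # L_r(p-r, d-1)
        plus = np.roll(prev, -1);  plus[-1] = np.inf   # L_r(p-r, d+1)
        L[p] = C[p] + np.minimum.reduce(
            [same, minus + P1, plus + P1, np.full(D, best_prev + P2)]
        ) - best_prev
    return L

C = np.array([[0.0, 5.0], [5.0, 0.0], [0.0, 5.0]])
L = aggregate_path(C, P1=1.0, P2=2.0)
print(L[-1])  # [1. 5.]
```

Note how the smoothing penalties make the aggregated cost prefer disparity sequences that change gradually along the path rather than flipping at every pixel.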
And finally, the total matching cost of each pixel point is determined by summing the path costs, with the specific matching cost formula as follows:

S(p, d) = Σ_r L_r(p, d)

in the formula, p is a pixel point, d is the disparity, and L_r is the cost function aggregated along path direction r.
After the matching cost value of each pixel point is obtained, the three-dimensional coordinates of the fruit picking points can be determined in the matching process through uniqueness detection, sub-pixel interpolation, left-right consistency detection and other operations.
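Once the disparity d of a picking point is known, the standard rectified binocular pinhole model recovers its camera-frame coordinates as Z = f·B/d, X = (u - cx)·Z/f, Y = (v - cy)·Z/f. The formulas and parameter names below are this standard model, assumed here because the patent does not spell out the triangulation step:

```python
import numpy as np

def disparity_to_3d(u, v, d, f, B, cx, cy):
    """Triangulate a pixel (u, v) with disparity d (pixels) into camera
    coordinates, given focal length f (pixels), baseline B (metres) and
    principal point (cx, cy). Standard rectified-stereo model."""
    if d <= 0:
        raise ValueError("disparity must be positive for a valid match")
    Z = f * B / d                # depth along the optical axis
    X = (u - cx) * Z / f         # lateral offset
    Y = (v - cy) * Z / f         # vertical offset
    return np.array([X, Y, Z])

p = disparity_to_3d(u=720, v=540, d=64.0, f=800.0, B=0.12, cx=640.0, cy=480.0)
print(p)
```

With the example parameters this yields a point roughly (0.15, 0.1125, 1.5) m from the camera; larger disparities map to closer fruit.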
On the other hand, as shown in fig. 5, which is a schematic structural view of an embodiment of the fruit picking point positioning device provided by the present invention, an embodiment of the present invention further provides a fruit picking point positioning device 500, including:
an image obtaining unit 501, configured to obtain a two-dimensional image including a target fruit;
a fruit and picking point identification unit 502, configured to obtain a target identification model with complete training, identify a fruit in a two-dimensional image based on the target identification model with complete training, determine the maturity of the fruit, and identify a picking point of a mature fruit to determine the picking point of the mature fruit;
the picking point coordinate positioning unit 503 is configured to construct a three-dimensional algorithm model, and perform coordinate transformation based on the three-dimensional algorithm model and picking points of mature fruits in the two-dimensional image to obtain three-dimensional space coordinates of the fruit picking point image.
According to the mature fruit identification and picking point positioning method described above, identifying and positioning fruits and fruit picking points with the improved target identification model based on the Yolox network model improves the precision of fruit identification, fruit maturity identification and fruit picking point identification. At the same time, predicting the three-dimensional coordinates of the picking points with the three-dimensional algorithm model, based on the two-dimensional coordinates of the fruit picking points, improves the precision of the picking point coordinates and thus greatly improves picking efficiency.
Those skilled in the art will appreciate that all or part of the flow of the methods implementing the above embodiments may be implemented by a computer program that instructs related hardware and is stored in a computer-readable storage medium. The computer-readable storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The method and apparatus for identifying ripe fruit and positioning picking points provided by the present invention have been described in detail above, and specific examples have been applied herein to set forth the principles and embodiments of the present invention; the above description of the embodiments is only intended to help understand the method of the present invention and its core ideas. Meanwhile, for those skilled in the art, there may be variations in the specific embodiments and the application range according to the idea of the present invention. The above is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto; any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed by the present invention shall be covered within the scope of the present invention.

Claims (10)

1. A mature fruit identification and picking point positioning method is characterized by comprising the following steps:
acquiring a two-dimensional image containing a target fruit;
acquiring a target recognition model with complete training, recognizing fruits in the two-dimensional image based on the target recognition model with complete training, judging the maturity of the fruits, and recognizing the picking points of the mature fruits to determine the picking points of the mature fruits;
and constructing a three-dimensional algorithm model, and performing coordinate conversion based on the three-dimensional algorithm model and the picking points of the mature fruits in the two-dimensional image to obtain the three-dimensional space coordinates of the fruit picking point image.
2. The mature fruit identification and picking point positioning method of claim 1, wherein the target identification model is an improved Yolox network model based on the Yolox network model;
the improved Yolox network model comprises a feature extraction backbone module, an attention mechanism module, a feature extraction pyramid module and an improved feature detection head module.
3. The method for identifying ripe fruit and positioning picking points according to claim 2, wherein identifying the fruit in the two-dimensional image and determining the ripeness of the fruit based on the trained target recognition model, and identifying the picking points of ripe fruit to determine the picking points of ripe fruit comprises:
inputting the two-dimensional image containing the target fruit into a feature extraction trunk module for feature extraction to obtain a first target feature map;
determining a second target feature map with attention information based on the first target feature picture and the attention mechanism module;
performing feature extraction and feature fusion on the second target feature map through a feature extraction pyramid module to obtain a target feature map;
and identifying the fruit in the two-dimensional image through the characteristic detection head module according to the target characteristic diagram, judging the maturity of the fruit, and determining the picking point of the mature fruit.
4. The method of claim 3, wherein said feature extraction trunk module comprises a Focus structure and a CSPDarknet structure; the CSPDarknet structure comprises: a CBL module, a first CSP1-X module, a second CSP1-X module, a first CSP-res8 module, a second CSP-res8 module, a first Nonlinear mapping module and a second Nonlinear mapping module; and inputting the two-dimensional image containing the target fruit picking point into the feature extraction trunk module for feature extraction to obtain a first target feature map comprises the following steps:
inputting the two-dimensional image containing target fruit picking points into the Focus structure, sampling values at every other pixel along the row and column directions to form new feature layers, rearranging each feature map into 4 feature layers, stacking the 4 feature layers, expanding the input channels to 4 times their original number, and outputting the Focus target feature map;
the method comprises the steps of inputting a Focus target characteristic diagram into a CSPDarknet structure, and processing the Focus target characteristic diagram sequentially through a CBL module, a first CSP1-X module, a second CSP1-X module, a first CSP-res8 module, a second CSP-res8 module, a first Nonlinear mapping module and a second Nonlinear mapping module to obtain a first target characteristic diagram.
5. The mature fruit identification and picking point locating method according to claim 3, wherein the attention mechanism module comprises a first attention mechanism sub-module, a second attention mechanism sub-module and a third attention mechanism sub-module, which are three identical attention mechanism sub-modules; each attention mechanism sub-module comprises a global average pooling layer, a Concat function layer, a first 1x1 convolution layer, a first activation function layer, a second 1x1 convolution layer and a second activation function layer which are connected in sequence;
the determining a second target feature map with attention information based on the first target feature map and the attention mechanism module includes:
performing a pooling operation based on the global average pooling layer to obtain a width-direction feature map and a height-direction feature map;
merging the width-direction feature map and the height-direction feature map based on the Concat function layer to obtain a merged feature map;
performing dimension reduction on the merged feature map based on the first 1x1 convolution layer to obtain a dimension-reduced merged feature map;
determining a first attention weight for the width-direction feature map and the height-direction feature map based on the first activation function layer;
restoring the width-direction feature map and the height-direction feature map to their original dimensions based on the second 1x1 convolution layer;
determining a second attention weight for the width-direction feature map and the height-direction feature map restored to the original dimensions based on the second activation function layer;
and performing a weighting operation on the first target feature map according to the first attention weight and the second attention weight to obtain the second target feature map with attention information.
6. The mature fruit identification and picking point positioning method of claim 3, wherein the feature extraction pyramid module comprises an FPN network consisting of three feature layers, the FPN network comprising a first FPN network feature layer, a second FPN network feature layer and a third FPN network feature layer;
the feature extraction and feature fusion of the second target feature map by the feature extraction pyramid module to obtain a target feature map comprises the following steps:
the output of the first attention mechanism sub-module is input into the first FPN network feature layer to obtain a first FPN network feature;
the output of the second attention mechanism sub-module is input into the second FPN network feature layer to obtain a second FPN network feature;
the output of the third attention mechanism sub-module is input into the third FPN network feature layer to obtain a third FPN network feature;
and the target feature map is obtained based on the first FPN network feature, the second FPN network feature and the third FPN network feature.
7. The mature fruit identification and picking point positioning method of claim 3, wherein the feature detection head module comprises a first detection head, a second detection head and a third detection head;
identifying the fruits in the two-dimensional image through the characteristic detection head module according to the target characteristic diagram, judging the maturity of the fruits, and simultaneously determining picking points of the mature fruits, wherein the method comprises the following steps:
identifying whether the fruit is mature or not through a first detection head according to characteristic information carried by the target characteristic graph;
according to the characteristic information carried by the target characteristic diagram, the picking points of the mature fruits in the fruits are identified through a second detection head;
and identifying the position of the fruit on the two-dimensional image and the type of the fruit through a third detection head according to the characteristic information carried by the target characteristic map.
8. The method for identifying ripe fruits and positioning picking points according to claim 1, wherein a three-dimensional algorithm model is constructed, and three-dimensional space coordinates of a fruit picking point image are obtained after coordinate transformation is performed on the basis of the three-dimensional algorithm model and picking points of ripe fruits in the two-dimensional image, and the method comprises the following steps:
constructing a three-dimensional algorithm model;
establishing a coordinate system based on the marked picking point image of the mature fruit in the two-dimensional image to determine the two-dimensional coordinates of the fruit picking point on the two-dimensional image;
and determining the three-dimensional coordinates of the fruit picking points according to the three-dimensional algorithm model and the two-dimensional coordinates of the fruit picking points on the two-dimensional image.
9. The mature fruit identification and picking point positioning method as claimed in claim 1, wherein the three-dimensional algorithm model adopts the SGBM algorithm to position fruit picking points;
calculating and determining three-dimensional space coordinates of the fruit picking points according to the fruit picking points in the two-dimensional image and the three-dimensional algorithm model, wherein the three-dimensional space coordinates comprise:
matching the corresponding coordinate points between the left and right images captured by the binocular camera, according to the two-dimensional coordinates of the fruit picking points obtained from the fruit picking point image, to obtain a disparity value;
and then determining the three-dimensional space coordinates of the fruit picking points according to the SGBM algorithm.
10. A fruit picking point positioning device is characterized by comprising:
an image acquisition unit for acquiring a two-dimensional image containing a target fruit;
the fruit and picking point identification unit is used for acquiring a target identification model with complete training, identifying the fruits in the two-dimensional image based on the target identification model with complete training, judging the maturity of the fruits, and identifying the picking points of the mature fruits to determine the picking points of the mature fruits;
and the picking point coordinate positioning unit is used for constructing a three-dimensional algorithm model, and obtaining the three-dimensional space coordinates of the fruit picking point image after carrying out coordinate conversion on the three-dimensional algorithm model and the picking points of the mature fruits in the two-dimensional image.
CN202211385166.0A 2022-11-07 2022-11-07 Mature fruit identification and picking point positioning method and device Pending CN115588190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211385166.0A CN115588190A (en) 2022-11-07 2022-11-07 Mature fruit identification and picking point positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211385166.0A CN115588190A (en) 2022-11-07 2022-11-07 Mature fruit identification and picking point positioning method and device

Publications (1)

Publication Number Publication Date
CN115588190A true CN115588190A (en) 2023-01-10

Family

ID=84781615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211385166.0A Pending CN115588190A (en) 2022-11-07 2022-11-07 Mature fruit identification and picking point positioning method and device

Country Status (1)

Country Link
CN (1) CN115588190A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310806A (en) * 2023-02-28 2023-06-23 北京理工大学珠海学院 Intelligent agriculture integrated management system and method based on image recognition
CN116310806B (en) * 2023-02-28 2023-08-29 北京理工大学珠海学院 Intelligent agriculture integrated management system and method based on image recognition
CN116616045A (en) * 2023-06-07 2023-08-22 山东农业工程学院 Picking method and picking system based on plant growth
CN116616045B (en) * 2023-06-07 2023-11-24 山东农业工程学院 Picking method and picking system based on plant growth
CN116881830A (en) * 2023-07-26 2023-10-13 中国信息通信研究院 Self-adaptive detection method and system based on artificial intelligence
CN117218615A (en) * 2023-08-29 2023-12-12 武汉理工大学 Soybean pod-falling phenotype investigation method
CN117218615B (en) * 2023-08-29 2024-04-12 武汉理工大学 Soybean pod-falling phenotype investigation method
CN117152544A (en) * 2023-10-31 2023-12-01 锐驰激光(深圳)有限公司 Tea-leaf picking method, equipment, storage medium and device
CN117152544B (en) * 2023-10-31 2024-03-15 锐驰激光(深圳)有限公司 Tea-leaf picking method, equipment, storage medium and device

Similar Documents

Publication Publication Date Title
CN115588190A (en) Mature fruit identification and picking point positioning method and device
CN111179324B (en) Object six-degree-of-freedom pose estimation method based on color and depth information fusion
Kamencay et al. Improved Depth Map Estimation from Stereo Images Based on Hybrid Method.
CN112200045B (en) Remote sensing image target detection model establishment method based on context enhancement and application
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN111563418A (en) Asymmetric multi-mode fusion significance detection method based on attention mechanism
CN107329962B (en) Image retrieval database generation method, and method and device for enhancing reality
CN112085072B (en) Cross-modal retrieval method of sketch retrieval three-dimensional model based on space-time characteristic information
CN110705566B (en) Multi-mode fusion significance detection method based on spatial pyramid pool
CN112200056B (en) Face living body detection method and device, electronic equipment and storage medium
CN113850865A (en) Human body posture positioning method and system based on binocular vision and storage medium
CN114220061B (en) Multi-target tracking method based on deep learning
CN112801047B (en) Defect detection method and device, electronic equipment and readable storage medium
CN110991444A (en) Complex scene-oriented license plate recognition method and device
CN113610905B (en) Deep learning remote sensing image registration method based on sub-image matching and application
CN110598715A (en) Image recognition method and device, computer equipment and readable storage medium
CN112200057A (en) Face living body detection method and device, electronic equipment and storage medium
CN111797841A (en) Visual saliency detection method based on depth residual error network
CN108388901B (en) Collaborative significant target detection method based on space-semantic channel
CN111311611A (en) Real-time three-dimensional large-scene multi-object instance segmentation method
CN115937552A (en) Image matching method based on fusion of manual features and depth features
CN114707604A (en) Twin network tracking system and method based on space-time attention mechanism
CN113011359B (en) Method for simultaneously detecting plane structure and generating plane description based on image and application
CN111914809B (en) Target object positioning method, image processing method, device and computer equipment
CN111539434B (en) Infrared weak and small target detection method based on similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination