CN117456368B

CN117456368B - Fruit and vegetable identification picking method, system and device

Info

Publication number: CN117456368B
Application number: CN202311779603.1A
Authority: CN
Inventors: 詹洁; 王硕; 张慧珊; 闫子豪
Original assignee: Anhui University
Current assignee: Anhui University
Priority date: 2023-12-22
Filing date: 2023-12-22
Publication date: 2024-03-08
Anticipated expiration: 2043-12-22
Also published as: CN117456368A

Abstract

The invention relates to the technical field of deep learning, and discloses a fruit and vegetable identification picking method, a system and a device, wherein the fruit and vegetable identification picking method comprises the following steps: collecting a first image and a second image of a picking area; processing the first image and the second image to obtain image structure data; inputting the graph structure data into a recognition model, and outputting coordinate values representing predicted positions of fruits corresponding to the nodes by the recognition model; the picking robot moves to the position of the fruit according to the predicted position of the fruit, and then a third image is acquired by an image acquisition unit on the picking robot; performing example segmentation on the third image through the contour recognition positioning model to determine picking points of fruits, and then controlling a picking robot to cut the picking points to finish picking the fruits; the invention ensures that the robot does not need to traverse the region and the visual angle when picking the fruit individuals, can save a great deal of recognition and action time and improve the picking speed.

Description

Fruit and vegetable identification picking method, system and device

Technical Field

The invention relates to the technical field of deep learning, in particular to a fruit and vegetable identification and picking method, a fruit and vegetable identification and picking system and a fruit and vegetable identification and picking device.

Background

Machine learning and deep learning can enable robots to understand environments, pick crops and monitor crop health conditions, and for the problems of automatic picking in unstructured agricultural environments, mainly focus on fruit detection and picking point positioning, for example, chinese patent application publication No. CN106441098A discloses an identification positioning method for picking fruits and vegetables, which positions picking points based on a binocular vision system and a geometric method, and one obvious problem is poor performance in the case of fruit covering; chen Pengfei in Mask R-CNN-based mango instance segmentation and picking point detection study and implementation, a method for identifying mangoes and determining picking points based on Mask R-CNN instance segmentation performance is provided, and recognition accuracy of fruit covering in an image is improved through deep learning based on a convolutional neural network; however, the general method is based on optical pictures, and only surface features can be identified, so that the method is only suitable for identifying close-range pictures of the operation direction of a robot, the robot generally adopts a region traversing method to avoid omission, and if the robot wants to further find fruits hidden in the robot, the robot needs traversing with a larger range and view angle, and the picking speed is slower.

Disclosure of Invention

The invention provides a fruit and vegetable identification picking method, system and device, which solve the technical problem of slower picking speed in the related technology.

The invention provides a fruit and vegetable identification picking method, which comprises the following steps:

step 101, collecting a first image and a second image of a picking area;

the first image is a SAR image; the second image is an optical image or a SAR image;

step 102, processing the first image and the second image to obtain graph structure data, wherein the graph structure data comprises nodes, and one node is in data connection with one pixel of the first image or the second image; the conditions for establishing edges for two nodes include: the two node data associating two adjacent pixels on the first image; the two node data associating two adjacent pixels on the second image; one of the two nodes is connected with the pixel on the first image, the other is connected with the pixel on the second image, and the positions of the two pixels on the image coordinate systems of the first image and the second image are the same;

step 103, inputting the graph structure data into a recognition model, and outputting coordinate values representing predicted positions of fruits corresponding to the nodes by the recognition model; if the coordinate value of the predicted position of the fruit corresponding to the node is 0, the pixel of the first image representing the node data connection does not correspond to the fruit;

104, moving the picking robot to the position of the fruit according to the predicted position of the fruit, and then acquiring a third image which is a visible light image through an image acquisition unit on the picking robot;

and 105, performing example segmentation on the third image through the contour recognition positioning model to determine picking points of the fruits, and controlling the picking robot to cut the picking points to finish picking the fruits.

Further, the second image is a SAR image, acquired in a different frequency band from the first image, and the frequency of the acquired frequency band of the second image is lower than the acquired frequency band of the first image.

Further, the first image and the second image are superimposed on the same image coordinate system, so that the superimposed two pixels of the first image and the second image are the corresponding two pixels.

Further, the identification model comprises a first hidden layer and a fully connected layer;

the calculation formula of the first hidden layer is as follows:

；

wherein the method comprises the steps ofFor initially hiding features->Node characteristics representing the v-th node, +.>Representing fusion vectors +.>、、/>、/>Representing the first, second, third and fourth intermediate features of the v-th node at the t-th time step, respectively,/->And->Output characteristics of the v-th node at the t-th time step and t-1 time step are respectively represented,/->Representing the output characteristics of the jth node at t-1 time steps,/for>Representing a set of nodes directly connected by edges to a v-th node, the set comprising the v-th node,/the v-th node>Representing dot product->Representing vector concatenation->、/>、/>、/>、/>、/>、/>、/>Respectively represent the first, second, third, fourth, fifth, sixth, seventh and eighth weight parameters, ++>、/>、/>、/>Respectively representing the first, second, third and fourth bias parameters.

Further, the calculation formula of the full connection layer is as follows:

；

wherein the method comprises the steps ofA coordinate vector representing the predicted position of the fruit corresponding to the y-th node, the p-th component of the coordinate vector representing the p-th coordinate value of the predicted position of the fruit corresponding to the y-th node, ">Representing the output characteristics of the y-th node at the S-th time step, the y-th node establishing a data connection with the pixels of the first image,/a>Represents a ninth weight parameter,>representing a fifth bias parameter.

Further, the coordinate value of the predicted position of the fruit corresponding to the node output representing the recognition model is a three-dimensional coordinate in the world coordinate system, one node corresponds to three coordinate values, and the world coordinate system takes a certain point on the ground of the picking area as an origin.

Further, a calculation formula for calculating the loss value of the identification model with the coordinate values in the world coordinate system is as follows:

；

wherein Loss represents a Loss value, A represents the total number of nodes corresponding to which the coordinate value output by the recognition model is not 0, B represents the total number of actual fruits in the picking area,the g coordinate value of the predicted position of the fruit corresponding to the kth node,/for>A g-th coordinate value indicating a real fruit position corresponding to a predicted position of the fruit corresponding to the k-th node; />The g coordinate value of the d-th real fruit, < > is shown>And g coordinate values representing the predicted positions corresponding to the d-th real fruits.

Further, the contour recognition positioning model is Mask R-CNN.

The invention provides a fruit and vegetable identifying and picking system, which is used for executing the fruit and vegetable identifying and picking method, and comprises the following steps:

the first image acquisition module acquires a first image and a second image of the picking area;

the image preprocessing module is used for processing the first image and the second image to obtain image structure data, wherein the image structure data comprises nodes, and one node is in data connection with one pixel of the first image or the second image;

the target recognition module inputs the graph structure data into a recognition model, and the recognition model outputs coordinate values representing predicted positions of fruits corresponding to the nodes; if the coordinate value of the predicted position of the fruit corresponding to the node is 0, the pixel of the first image representing the node data connection does not correspond to the fruit;

the movement control module is used for enabling the picking robot to move to the position where the fruit is located according to the predicted position of the fruit;

the second image acquisition module acquires a third image on the front surface of the picking robot, wherein the third image is a visible light image;

the example segmentation module is used for carrying out example segmentation on the third image through the contour recognition positioning model to determine picking points of fruits;

and the picking control module is used for controlling the picking robot to cut the picking points to finish picking fruits.

The invention provides a fruit and vegetable identifying and picking device, which comprises:

a memory for storing machine executable instructions, an identification model, and a contour identification positioning model;

a processor for executing machine executable instructions stored in the memory, which when executed, is capable of performing the steps of a fruit and vegetable identification picking method as described above

The invention has the beneficial effects that: according to the invention, the fruit image features are obtained by penetrating through the leaf blade based on the micro synthetic aperture radar platform, and the fruit distribution prediction of the whole picking area is positioned by extracting the fused captured fruit image features based on the second image of the captured surface features, so that global guidance is provided for the picking of the robot, the robot does not need to traverse the area and the visual angle when picking the fruit individual, a large amount of recognition and action time is saved, the picking speed is improved, and the robot path planning method can be used for robot path planning.

Drawings

FIG. 1 is a flow chart of a fruit and vegetable identification picking method of the present invention;

fig. 2 is a schematic block diagram of a fruit and vegetable identification picking system according to the present invention.

In the figure: a first image acquisition module 201, an image preprocessing module 202, a target identification module 203, a movement control module 204, a second image acquisition module 205, an instance segmentation module 206 and a picking control module 207.

Detailed Description

The subject matter described herein will now be discussed with reference to example embodiments. It is to be understood that these embodiments are merely discussed so that those skilled in the art may better understand and implement the subject matter described herein and that changes may be made in the function and arrangement of the elements discussed without departing from the scope of the disclosure herein. Various examples may omit, replace, or add various procedures or components as desired. In addition, features described with respect to some examples may be combined in other examples as well.

As shown in fig. 1, the fruit and vegetable identification picking method comprises the following steps:

step 101, collecting a first image and a second image of a picking area;

the first image is an SAR image, and can be acquired by taking a miniature synthetic aperture radar based on an unmanned aerial vehicle as a platform as acquisition equipment; has the characteristics of close distance and high resolution;

the second image is an optical image or a SAR image, and if the second image is a SAR image, the same unmanned aerial vehicle platform as the first image is used, but it is acquired at a different frequency band from the first image, and the frequency of the acquisition frequency band of the second image is lower than that of the first image.

In one embodiment of the invention, the optical image is a laser radar image, and can be acquired by taking a laser radar based on an unmanned plane as a platform as acquisition equipment;

if the SAR image is used as the second image and the first image, the parameters corresponding to the pixels of the second image and the first image are the same, so that the consistency processing is not needed;

if the SAR image is used as the first image and the optical image is used as the second image, the parameters corresponding to the two pixels are different, and an unification process, such as conversion into a depth map, is required, and the depth value under the camera coordinate system is used as the parameter of the pixel.

the first image and the second image are superimposed on the same image coordinate system, so that the superimposed two pixels of the first image and the second image are the corresponding two pixels;

in one embodiment of the invention, the SAR image is taken as a second image and a first image, which are registered to the same space by registration prior to step 102.

In an embodiment of the invention, the second image is an optical image, typically having a resolution greater than the SAR image, and the pixels of the optical image may be subjected to a resolution reduction process.

the recognition model comprises a first hidden layer and a full connection layer;

in one embodiment of the present invention, the calculation formula of the first hidden layer is as follows:

wherein the method comprises the steps ofFor initially hiding features->Node characteristics representing the v-th node, +.>Representing fusion vectors +.>、/>、/>、/>Representing the first, second, third and fourth intermediate features of the v-th node at the t-th time step, respectively,/->Andoutput characteristics of the v-th node at the t-th time step and t-1 time step are respectively represented,/->Representing the output characteristics of the jth node at t-1 time steps,/for>Representing a set of nodes directly connected by edges to a v-th node, the set comprising the v-th node,/the v-th node>Representing dot product->Representing vector concatenation->、/>、/>、/>、/>、/>、/>、/>Respectively represent the first, second, third, fourth, fifth, sixth, seventh and eighth weight parameters, ++>、/>、/>、/>Respectively representing first, second, third and fourth bias parameters;

the nth component of the node characteristic of the nth node represents the nth attribute value of the pixel to which the data of the nth node is connected; the properties of the pixels of the SAR image include phase, amplitude, etc.

S is a super parameter, and the default value is 3.

wherein X represents a node feature matrix, the ith row vector of which represents node features of the ith node,representing an output feature matrix whose ith row vector represents the output feature of the ith node,/>Representing the sum of the adjacency matrix and the identity matrix, +.>Representation->Degree matrix of->Representing a tenth weight parameter, an element value of 1 in the nth row and mth column of the adjacency matrix indicates that there is an edge between the mth and nth nodes, otherwise the element value is 0.

In one embodiment of the present invention, the calculation formula of the full connection layer is as follows:

a coordinate vector representing the predicted position of the fruit corresponding to the y-th node, the p-th component of the coordinate vector representing the p-th coordinate value of the predicted position of the fruit corresponding to the y-th node, ">Representing the output characteristics of the y-th node at the S-th time step, the y-th node establishing a data connection with the pixels of the first image,/a>Represents a ninth weight parameter,>representing a fifth bias parameter.

The foregoingRepresenting sigmoid function->Representing a hyperbolic tangent function.

In one embodiment of the present invention, the coordinate value of the predicted position of the fruit corresponding to the node output representing the recognition model is a three-dimensional coordinate in the world coordinate system, one node corresponds to three coordinate values, and the world coordinate system takes a certain point on the ground of the picking area as the origin.

For the training process, for the output of the recognition model, it is feasible to independently measure the actual position of the fruit, but the cost is high; as a method for reducing the cost, the data for comparison with the output of the recognition model may be derived from the picking position of the robot arm at the time of actual picking as the actual fruit position, and the center point of the holding space of the robot arm holding the fruit at the time as the actual fruit centroid position. The data acquisition is used as the addition of the acquisition fruits, so that the cost is reduced compared with the data acquisition alone, and enough data which can meet the training of the recognition model can be acquired; it should be noted that the manipulator may be controlled automatically or manually or a combination of both.

For the training process, the calculation formula for calculating the loss value by the coordinate values under the world coordinate system is as follows:

wherein A represents the total number of nodes corresponding to which the coordinate value output by the recognition model is not 0, B represents the total number of actual fruits in the picking area,the g coordinate value of the predicted position of the fruit corresponding to the kth node,/for>A g-th coordinate value indicating a real fruit position corresponding to a predicted position of the fruit corresponding to the k-th node; />The g coordinate value of the d-th real fruit, < > is shown>A g-th coordinate value representing a predicted position corresponding to the d-th real fruit;

if the fruit exists in the spherical space with the radius L and the predicted position of the fruit corresponding to the kth node is taken as the center, the actual position of the actual fruit with the centroid closest to the predicted position of the R-th fruit is taken as the actual fruit position corresponding to the predicted position of the fruit corresponding to the kth node, otherwise, the definition is madeIs 0;

if the predicted position exists in the spherical space with the radius L and the true position of the d-th real fruit is taken as the predicted position corresponding to the d-th real fruit, the predicted position closest to the d-th real fruit is taken as the predicted position corresponding to the d-th real fruit, otherwise, the definition is madeIs 0.

The loss calculation method can unify the loss of the real fruits and the loss of the position deviation.

The contour recognition positioning model can adopt Mask R-CNN mentioned in the background art as another common deep learning model YOLO model, and can also be used for achieving the same purpose.

The fruit and vegetable identification picking method in one embodiment of the present invention further includes a step for planning a moving path of a plurality of picking robots, specifically, planning by combining coordinate values of the obtained predicted positions of all fruits with an ant colony algorithm, with the shortest picking time as an optimization target of the planning.

For grape vine, HV polarization is adopted in the W frequency band, so that a more proper penetrating effect can be obtained, and the identification accuracy and recall rate are optimal under the condition that other environments and parameters are the same.

It should be noted that, since the present invention utilizes the difference between the moisture content of the fruit and the moisture content of the leaf and the branch to generate different radar echo reactions, the method in the foregoing embodiment is preferably applied to fruits and vegetables with the moisture content of the fruit being more than 1.5 times of the moisture content of the leaf.

Since the first image adopts the SAR image, the method in the foregoing embodiment is preferably applied to fruits and vegetables having an average diameter of greater than 2.5cm or fruits having a string-like structure similar to grape, in view of resolution.

The aforementioned frequency band refers to an electromagnetic wave frequency band, which is defined by the IEEE (institute of electrical and electronics engineers) division method, and since the frequency of the electromagnetic wave frequency band used is not greater than the W frequency band, in the embodiment of the present invention, there is no cross-overlapping of frequency bands.

In one embodiment of the present invention, there is provided a fruit and vegetable identification picking system, as shown in fig. 2, comprising:

a first image acquisition module 201 that acquires a first image and a second image of a picking area;

an image preprocessing module 202 for processing the first image and the second image to obtain image structure data, wherein the image structure data comprises nodes, and one node is in data connection with one pixel of the first image or the second image;

the target recognition module 203 inputs the graph structure data into a recognition model, and the recognition model outputs coordinate values representing predicted positions of fruits corresponding to the nodes; if the coordinate value of the predicted position of the fruit corresponding to the node is 0, the pixel of the first image representing the node data connection does not correspond to the fruit;

a movement control module 204, wherein the picking robot moves to the position of the fruit according to the predicted position of the fruit;

the second image acquisition module 205 acquires a third image of the front surface of the picking robot, wherein the third image is a visible light image;

the example segmentation module 206 performs example segmentation on the third image through the contour recognition positioning model to determine picking points of fruits;

a picking control module 207 for controlling the picking robot to cut the picking points to complete picking of the fruit.

In one embodiment of the present invention, there is provided a fruit and vegetable identifying and picking device, comprising:

and a processor for executing the machine executable instructions stored in the memory, which when executed, is capable of performing the steps of a fruit and vegetable identification picking method as described above.

The embodiment has been described above with reference to the embodiment, but the embodiment is not limited to the above-described specific implementation, which is only illustrative and not restrictive, and many forms can be made by those of ordinary skill in the art, given the benefit of this disclosure, are within the scope of this embodiment.

Claims

1. The fruit and vegetable identification picking method is characterized by comprising the following steps of:

step 101, collecting a first image and a second image of a picking area;

the calculation formula of the first hidden layer is as follows:

；

wherein the method comprises the steps ofFor initially hiding features->Node characteristics representing the v-th node, +.>Representing fusion vectors +.>、/>、、/>Representing the first, second, third and fourth intermediate features of the v-th node at the t-th time step, respectively,/->And->Respectively represent the firstOutput characteristics of v nodes at t-th time step and t-1 time step,/->Representing the output characteristics of the jth node at t-1 time steps,/for>Representing a set of nodes directly connected by edges to a v-th node, the set comprising the v-th node,/the v-th node>Representing dot product->Representing vector concatenation->、/>、/>、/>、/>、/>、/>、/>Respectively represent the first, second, third, fourth, fifth, sixth, seventh and eighth weight parameters, ++>、/>、/>、/>Respectively representing first, second, third and fourth bias parameters;

the calculation formula of the full connection layer is as follows:

；

wherein the method comprises the steps ofA coordinate vector representing the predicted position of the fruit corresponding to the y-th node, the p-th component of the coordinate vector representing the p-th coordinate value of the predicted position of the fruit corresponding to the y-th node, ">Representing the output characteristics of the y-th node at the S-th time step, the y-th node establishing a data connection with the pixels of the first image,/a>Represents a ninth weight parameter,>representing a fifth bias parameter;

the coordinate value of the predicted position of the fruit corresponding to the node output representing the identification model is a three-dimensional coordinate under a world coordinate system, one node corresponds to three coordinate values, and the world coordinate system takes a certain point on the ground of the picking area as an origin;

the calculation formula for calculating the loss value of the identification model by the coordinate values under the world coordinate system is as follows:

；

wherein Loss represents a Loss value, A represents the total number of nodes corresponding to which the coordinate value output by the recognition model is not 0, B represents the total number of actual fruits in the picking area,the g coordinate value of the predicted position of the fruit corresponding to the kth node,/for>A g-th coordinate value indicating a real fruit position corresponding to a predicted position of the fruit corresponding to the k-th node; />The g coordinate value of the d-th real fruit, < > is shown>A g-th coordinate value representing a predicted position corresponding to the d-th real fruit;

2. The fruit and vegetable identification picking method according to claim 1, wherein the second image is a SAR image, is acquired in a different frequency band from the first image, and the frequency of the acquisition frequency band of the second image is lower than the acquisition frequency band of the first image.

3. A fruit and vegetable identification picking method according to claim 1 wherein the first image and the second image are superimposed on the same image coordinate system such that the two pixels of the first image and the second image that are superimposed are the corresponding two pixels.

4. The fruit and vegetable identification picking method according to claim 1, wherein the contour identification positioning model is Mask R-CNN.

5. A fruit and vegetable identification picking system for performing a fruit and vegetable identification picking method according to any one of claims 1-4, comprising:

6. The utility model provides a fruit vegetables discernment picking device which characterized in that includes:

a processor for executing machine executable instructions stored in a memory, which when executed, is capable of performing the steps of a fruit and vegetable identification picking method as claimed in any one of claims 1 to 4.