CN114049628A

CN114049628A - Apple bounding box identification method and device based on improved SSD deep learning model

Info

Publication number: CN114049628A
Application number: CN202111407156.8A
Authority: CN
Inventors: 陈伟; 朱志宇; 宦震; 刘涛
Original assignee: Zhenjiang Daqo Modern Agriculture Development Co ltd
Current assignee: Zhenjiang Daqo Modern Agriculture Development Co ltd
Priority date: 2021-11-24
Filing date: 2021-11-24
Publication date: 2022-02-15

Abstract

The application discloses an apple bounding box identification method and device based on an improved SSD deep learning model, wherein the method comprises the following steps: for each apple image, acquiring a feature map, a real frame of the feature map and a prior frame set of the feature map; dividing a prior frame into a plurality of tiny prior frames; calculating a first overlapping rate between each tiny prior frame and the real frame of the feature map; acquiring a best matching bounding box of the feature map of each apple image; obtaining a prediction frame of each tiny prior frame; calculating a second overlapping rate between the prediction frame of each tiny prior frame and the real frame of the feature map; acquiring an optimal bounding box of the feature map of each apple image; acquiring an apple image training set, wherein the apple image training set comprises the best matching bounding box and the optimal bounding box of the feature map of each apple image; constructing an improved SSD deep learning model; acquiring a real-time apple image; and acquiring the bounding box and center point coordinates of the apple. When apples are picked, the application recognizes apple bounding boxes efficiently and with high accuracy.

Description

Apple bounding box identification method and device based on improved SSD deep learning model

Technical Field

The application relates to the technical field of bounding box identification, in particular to an apple bounding box identification method and device based on an improved SSD deep learning model.

Background

China is the world's largest producer and consumer of apples. Apples must be picked intensively during the ripening period, and picking is one of the most time-consuming and labor-intensive steps in apple production; at present it is done entirely by hand. In recent years, picking robots based on machine vision have become a research hotspot in agricultural engineering at home and abroad, with the aim of automating fruit picking through intelligent robot technology. A picking robot is a comprehensive system integrating environment perception, motion planning and servo control, and environment perception is the basis and precondition for automatic picking. Detecting apples in images provides picking target information to the control system of the picking robot. Fast and accurate target detection allows the robot to work for long periods, reduces labor cost and improves production efficiency. Research on apple detection methods is therefore important for improving the picking efficiency and success rate of apple picking robots.

In recent years, scholars at home and abroad have combined deep learning with agricultural automation and obtained good results. Accurate and rapid fruit identification has become a key factor in improving the performance of picking robots, and the strong data representation and feature extraction capability of deep learning has driven breakthroughs in target detection. Traditional methods have reached a bottleneck because the picking environment is complex and identifying apples on the branch is demanding, so deep learning is of great significance for solving this problem. However, existing deep learning methods are not yet well suited to apple recognition.

Disclosure of Invention

In order to solve the problems of low recognition efficiency, large error and difficult picking work that arise in the prior art when a robot is used to pick apples, the application discloses an apple bounding box identification method and device based on an improved SSD deep learning model.

The application discloses in a first aspect an apple bounding box identification method based on an improved SSD deep learning model, comprising:

acquiring a plurality of apple images;

acquiring a characteristic diagram of the multiple apple images;

aiming at the feature map of each apple image, acquiring a real frame of the feature map;

acquiring a prior frame set of a feature map of each apple image; the prior box set comprises a plurality of prior boxes of different sizes and aspect ratios;

dividing a plurality of tiny prior frames aiming at one prior frame;

calculating a first overlapping rate of each tiny prior frame and a real frame of the feature map aiming at each tiny prior frame;

acquiring the best matching bounding box of the feature map of each apple image according to the first overlapping rate; the best matching bounding box comprises a plurality of tiny prior boxes;

obtaining a prediction frame of each tiny prior frame; the prediction box comprises a confidence coefficient of the apple target and an offset relative to a real box;

calculating a second overlapping rate of the prediction frame of each tiny prior frame and the real frame of the feature map;

acquiring an optimal boundary frame of the feature map of each apple image according to the second overlapping rate; the optimal bounding box comprises a plurality of forecasting boxes of tiny prior boxes;

acquiring an apple image training set, wherein the apple image training set comprises a best matching bounding box of a feature map of each apple image and an optimal bounding box of the feature map of each apple image;

constructing an improved SSD deep learning model according to the apple image training set;

acquiring an apple real-time image;

and acquiring an apple boundary frame and a central point coordinate of the real-time apple image according to the improved SSD deep learning model.

Optionally, after acquiring the plurality of apple images, the method further includes:

and unifying the sizes of the apple images.

Optionally, before acquiring the plurality of apple images, the method further includes:

according to the illumination angles of the front light, the side light and the back light, carrying out apple initial image acquisition on apple fruits with different pixel sizes;

preprocessing the apple initial image; the preprocessing includes enhancing the contrast of the apple initial image according to an adaptive histogram equalization method.

Optionally, after obtaining the feature maps of the multiple apple images, the method further includes:

classifying the feature maps of the multiple apple images according to different feature information of the apples; the different characteristic information of the apple comprises color characteristic information and size characteristic information of the apple.

Optionally, before acquiring the prior frame set of the feature map of each apple image, the method further includes:

and constructing a matching relation between the prior frame and the real frame of the feature map of each apple image.

The second aspect of the application discloses an apple bounding box recognition device based on an improved SSD deep learning model, which is applied to the apple bounding box recognition method based on the improved SSD deep learning model and comprises:

the image acquisition module is used for acquiring a plurality of apple images;

the characteristic diagram acquisition module is used for acquiring characteristic diagrams of the apple images;

the real frame acquisition module is used for acquiring a real frame of the feature map aiming at the feature map of each apple image;

the prior frame acquisition module is used for acquiring a prior frame set of the feature map of each apple image; the prior box set comprises a plurality of prior boxes of different sizes and aspect ratios;

the micro prior frame acquisition module is used for dividing a plurality of micro prior frames aiming at one prior frame;

the first overlapping rate calculation module is used for calculating a first overlapping rate of each tiny prior frame and a real frame of the feature map aiming at each tiny prior frame;

the optimal matching bounding box obtaining module is used for obtaining an optimal matching bounding box of the feature map of each apple image according to the first overlapping rate; the best matching bounding box comprises a plurality of tiny prior boxes;

the prediction frame acquisition module is used for acquiring the prediction frame of each tiny prior frame; the prediction box comprises a confidence coefficient of the apple target and an offset relative to a real box;

the second overlapping rate obtaining module is used for calculating a second overlapping rate of the prediction frame of each tiny prior frame and the real frame of the feature map;

the optimal boundary frame obtaining module is used for obtaining an optimal boundary frame of the feature map of each apple image according to the second overlapping rate; the optimal bounding box comprises a plurality of forecasting boxes of tiny prior boxes;

the training set acquisition module is used for acquiring an apple image training set; the apple image training set comprises a best matching bounding box of the feature map of each apple image and an optimal bounding box of the feature map of each apple image;

the model construction module is used for constructing an improved SSD deep learning model according to the apple image training set;

the real-time image acquisition module is used for acquiring real-time images of the apples;

and the apple boundary frame acquisition module is used for acquiring the apple boundary frame and the central point coordinate of the real-time apple image according to the improved SSD deep learning model.

Optionally, after the image acquiring module, the apparatus further includes:

and the size unifying module is used for unifying the sizes of the apple images.

Optionally, before the image acquiring module, the apparatus further includes:

the image acquisition module is used for acquiring initial apple images of apple fruits with different pixel sizes according to the illumination angles of the front light, the side light and the back light;

the image preprocessing module is used for preprocessing the initial apple image; the preprocessing includes enhancing the contrast of the apple initial image according to an adaptive histogram equalization method.

Optionally, after the feature map obtaining module, the apparatus further includes:

the characteristic classification module is used for classifying the characteristic graphs of the apple images according to different characteristic information of the apples; the different characteristic information of the apple comprises color characteristic information and size characteristic information of the apple.

Optionally, before the prior frame obtaining module, the apparatus further includes:

and the relationship construction module is used for constructing the matching relationship between the prior frame and the real frame of the feature map of each apple image.

The application discloses an apple bounding box identification method and device based on an improved SSD deep learning model, wherein the method comprises the following steps: aiming at each apple image, acquiring a feature map, a real frame of the feature map and a prior frame set of the feature map; dividing a plurality of tiny prior frames aiming at one prior frame; calculating a first overlapping rate of each tiny prior frame and a real frame of the feature map aiming at each tiny prior frame; acquiring a best matching bounding box of the feature map of each apple image; obtaining a prediction frame of each tiny prior frame; calculating a second overlapping rate of the prediction frame of each tiny prior frame and the real frame of the feature map; acquiring an optimal boundary frame of a feature map of each apple image; acquiring an apple image training set, wherein the apple image training set comprises an optimal matching boundary box of the feature map of each apple image and an optimal boundary box of the feature map of each apple image; constructing an improved SSD deep learning model; acquiring an apple real-time image; and acquiring the coordinates of the boundary frame and the central point of the real-time apple image.

The improved SSD deep learning model constructed by the application improves the efficiency of feature extraction and reduces the information loss caused by image compression. The apple images are collected as original samples from apples with different pixel sizes, under different light conditions and in different fruit states in a natural environment; they cover many apple varieties and a wide range, so that different types of apples in different regions can be rapidly identified and picked. The apple image training set constructed by the application comprises the best matching bounding box and the optimal bounding box of the apples, which improves the quality of the model. When apples are picked, the application recognizes apple bounding boxes efficiently and with high accuracy.

Drawings

In order to more clearly explain the technical solution of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of an apple bounding box identification method based on an improved SSD deep learning model according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of an apple bounding box identification device based on an improved SSD deep learning model according to an embodiment of the present application.

Detailed Description

The first embodiment of the present application discloses an apple bounding box identification method based on an improved SSD deep learning model, which is described with reference to the flowchart in fig. 1, and includes:

and acquiring a plurality of apple images.

And acquiring the characteristic maps of the multiple apple images.

And acquiring a real frame of the feature map aiming at the feature map of each apple image.

And acquiring a prior frame set of the feature map of each apple image. The set of prior boxes includes a plurality of prior boxes of different sizes and aspect ratios.
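The layout of such a prior box set can be sketched as follows. This is a minimal illustration that follows the common SSD convention of placing boxes at the center of each feature map cell; the function name `make_prior_boxes`, the scale value and the aspect ratios are assumptions chosen for illustration and are not taken from this application.

```python
import math


def make_prior_boxes(fmap_size, scale, aspect_ratios):
    """Return prior boxes as (cx, cy, w, h) in normalized [0, 1] image coordinates.

    fmap_size, scale and aspect_ratios are illustrative assumptions.
    """
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Center of the current feature map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Same area for every aspect ratio, different shape.
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes


priors = make_prior_boxes(fmap_size=3, scale=0.3, aspect_ratios=(1.0, 2.0, 0.5))
print(len(priors))  # 3 x 3 cells, 3 aspect ratios per cell
```

Keeping the area constant while varying the aspect ratio is one common design choice; a real model would typically repeat this over several feature maps with different scales.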

And dividing a plurality of tiny prior frames aiming at one prior frame.
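The application does not state how a prior frame is divided. The sketch below assumes a regular grid split, which is only one possible reading; the function name `split_into_tiny_priors` and the `rows` and `cols` parameters are hypothetical.

```python
def split_into_tiny_priors(box, rows, cols):
    """Split a box (x_min, y_min, x_max, y_max) into rows x cols equal sub-boxes.

    The regular grid is an assumption; the application does not specify the split.
    """
    x_min, y_min, x_max, y_max = box
    cell_w = (x_max - x_min) / cols
    cell_h = (y_max - y_min) / rows
    tiny = []
    for r in range(rows):
        for c in range(cols):
            left = x_min + c * cell_w
            top = y_min + r * cell_h
            tiny.append((left, top, left + cell_w, top + cell_h))
    return tiny


tiny_priors = split_into_tiny_priors((0.0, 0.0, 4.0, 2.0), rows=2, cols=2)
```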

For each tiny prior frame, calculating a first overlapping rate of each tiny prior frame and a real frame of the feature map.
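Assuming the overlapping rate is the intersection over union (IOU) of two boxes, as the IOU threshold mentioned in the next step suggests, it can be computed as in the following sketch. Boxes are taken to be (x_min, y_min, x_max, y_max) tuples, which is an assumed representation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # overlap area 1, union area 7
```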

And screening the tiny prior frames according to the first overlapping rate and an IOU threshold to obtain the best matching bounding box of the feature map of each apple image. The best matching bounding box includes a plurality of tiny prior boxes. The best matching bounding box selected from the plurality of prior boxes serves as the positive sample, and the ratio of positive samples to negative samples is kept at 1:3.
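A minimal sketch of this screening step is given below. The 1:3 ratio comes from the text above; the threshold value of 0.5 and the ranking of negatives by confidence loss (hard negative mining, as is usual in SSD training) are assumptions, and `select_samples` is a hypothetical name.

```python
def select_samples(overlaps, conf_losses, iou_threshold=0.5, neg_pos_ratio=3):
    """Pick positive boxes by IOU threshold and keep at most 3 negatives per positive.

    iou_threshold=0.5 and loss-based ranking of negatives are assumptions.
    """
    positives = [i for i, o in enumerate(overlaps) if o >= iou_threshold]
    candidates = [i for i, o in enumerate(overlaps) if o < iou_threshold]
    # Hardest negatives first: highest confidence loss.
    candidates.sort(key=lambda i: conf_losses[i], reverse=True)
    negatives = candidates[: neg_pos_ratio * len(positives)]
    return positives, sorted(negatives)


overlaps = [0.8, 0.1, 0.3, 0.0, 0.2, 0.05]
conf_losses = [0.2, 0.9, 0.4, 0.1, 0.7, 0.3]
pos, neg = select_samples(overlaps, conf_losses)
```

Capping the negatives keeps the large number of background boxes from dominating the loss.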

and acquiring a prediction frame of each tiny prior frame. The prediction box includes a confidence of the apple target and an offset from the real box.
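The application does not give the offset encoding. As an illustration only, the sketch below uses the center-size decoding commonly found in SSD implementations, in which predicted offsets are applied to a prior box given as (cx, cy, w, h); the variance values 0.1 and 0.2 are customary defaults and are assumed here, and the application's own encoding may differ.

```python
import math


def decode_offsets(prior, offsets, variances=(0.1, 0.2)):
    """Apply predicted offsets to a prior box and return corner coordinates.

    prior is (cx, cy, w, h); offsets is (d_cx, d_cy, d_w, d_h).
    The encoding and the variance values are assumptions.
    """
    p_cx, p_cy, p_w, p_h = prior
    d_cx, d_cy, d_w, d_h = offsets
    cx = p_cx + d_cx * variances[0] * p_w
    cy = p_cy + d_cy * variances[0] * p_h
    w = p_w * math.exp(d_w * variances[1])
    h = p_h * math.exp(d_h * variances[1])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Zero offsets reproduce the prior box itself.
box = decode_offsets((0.5, 0.5, 0.2, 0.2), (0.0, 0.0, 0.0, 0.0))
```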

And calculating a second overlapping rate of the prediction frame of each tiny prior frame and the real frame of the feature map.

And acquiring the optimal boundary frame of the feature map of each apple image by using a non-maximum suppression method according to the second overlapping rate. The optimal bounding box comprises a plurality of prediction boxes of tiny prior boxes.
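Non-maximum suppression can be sketched as the usual greedy procedure below, which ranks boxes by a score and drops any box that overlaps an already kept box too strongly. Ranking by confidence score and the threshold of 0.45 are assumptions; the application only states that the method is applied according to the second overlapping rate.

```python
def _iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.45):
    """Return indices of the kept boxes, highest score first.

    iou_threshold=0.45 is an assumed value.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too strongly.
        if all(_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```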

And acquiring an apple image training set, wherein the apple image training set comprises a best matching bounding box of the feature map of each apple image and an optimal bounding box of the feature map of each apple image.

And constructing an improved SSD deep learning model according to the apple image training set. The structure of the improved SSD deep learning model can be divided into three parts: a basic convolutional layer, an auxiliary convolutional layer and a prediction layer. The basic convolutional layer mainly extracts features of low-dimensional information, the auxiliary convolutional layer mainly extracts features of high-dimensional information, and the prediction layer converts the feature maps and outputs the category information and position information of the prediction frame at each position. The RMSProp optimization algorithm is adopted for training. Overfitting of the model is prevented by learning rate decay: the initial learning rate is 4 × 10^-4, the decay coefficient is 0.95, and the number of decay steps is 3 × 10. The batch size is 64, the momentum term is 0.9, the maximum number of iteration steps is 6 × 10^4, and the model is saved after 1 × 10 iterations in the training process. In the model training stage, the data set required for training is generated first; then, using the back propagation algorithm, the model weights are updated over successive data iterations so that the loss function is progressively minimized; finally, the best-fitting SSD_MobileNetV2 recognition model is obtained. In the prediction stage the model is used directly for apple recognition and outputs the bounding box of the target apple and its center point coordinates.
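The learning rate schedule described above can be illustrated with a continuous exponential decay. The initial learning rate and the decay coefficient are the values given in the text; the form of the schedule (continuous rather than staircase) is an assumption, and the default `decay_steps=1000` is only a placeholder, because the number of decay steps is not clearly legible in the source.

```python
def decayed_learning_rate(step, initial_lr=4e-4, decay_rate=0.95, decay_steps=1000):
    """Exponentially decayed learning rate at a given training step.

    decay_steps=1000 is a placeholder; the source value is not legible.
    """
    return initial_lr * decay_rate ** (step / decay_steps)


for step in (0, 1000, 5000):
    print(step, decayed_learning_rate(step))
```

With this form the learning rate is multiplied by the decay coefficient once every `decay_steps` training steps.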

And acquiring real-time apple images.
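Once the model returns a bounding box for a real-time image, the center point coordinates follow directly from the box corners. The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) in pixels; `box_center` is a hypothetical helper name.

```python
def box_center(box):
    """Center point (x, y) of a (x_min, y_min, x_max, y_max) bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


center = box_center((100, 50, 180, 150))
```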

Further, after acquiring the plurality of apple images, the method further includes:

and unifying the sizes of the apple images. The input images are normalized in size, and the size of each image is unified to 300 pixels × 300 pixels.
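The size normalization can be sketched with a dependency-free nearest-neighbor resize. The 300 × 300 target size comes from the text above; the interpolation method is an assumption, and the image is represented as a plain 2-D list for illustration.

```python
def resize_nearest(image, out_h=300, out_w=300):
    """Resize a 2-D list of pixel values by nearest-neighbor sampling.

    Nearest-neighbor interpolation is an assumption; the application
    does not name the interpolation method.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


small = [[0, 1], [2, 3]]
resized = resize_nearest(small)
```

In practice an image library with bilinear or area interpolation would normally be used instead.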

Further, before the acquiring the plurality of apple images, the method further comprises:

according to the illumination angles of the front light, the side light and the back light, the initial image acquisition of the apples is carried out on the apples with different pixel sizes.

And preprocessing the initial apple image. The preprocessing includes enhancing the contrast of the apple initial image according to an adaptive histogram equalization method. Images that are too bright or too dark are processed using contrast-limited adaptive histogram equalization. Adaptive histogram equalization evenly redistributes the gray values within a certain range for each segmented region, thereby indirectly enhancing the contrast of the image; to avoid amplifying local noise in smooth regions during equalization, the contrast is limited to within a certain threshold.
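The contrast limiting idea can be sketched on a single region as follows: the histogram is clipped at a limit, the clipped excess is spread evenly over all gray levels, and the cumulative histogram then maps old gray values to new ones. This is a simplified illustration; a full implementation would also interpolate between neighboring regions, and the function name and `clip_limit` parameter are assumptions.

```python
def clipped_equalize_tile(tile, clip_limit, levels=256):
    """Equalize one region (2-D list of ints in [0, levels)) with a clipped histogram.

    Simplified sketch: no interpolation between neighboring regions.
    """
    hist = [0] * levels
    for row in tile:
        for value in row:
            hist[value] += 1
    # Clip each bin and spread the clipped excess evenly over all bins.
    excess = sum(max(0, count - clip_limit) for count in hist)
    hist = [min(count, clip_limit) + excess / levels for count in hist]
    # The cumulative distribution maps old gray levels to new ones.
    total = sum(hist)
    mapping, running = [], 0.0
    for count in hist:
        running += count
        mapping.append(round((levels - 1) * running / total))
    return [[mapping[value] for value in row] for row in tile]


tile = [[10, 10], [10, 200]]
equalized = clipped_equalize_tile(tile, clip_limit=1)
```

A lower clip limit flattens the mapping and limits how strongly noise in a nearly uniform region can be stretched.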

Further, after obtaining the feature maps of the plurality of apple images, the method further includes:

and classifying the characteristic graphs of the multiple apple images according to different characteristic information of the apples. The different characteristic information of the apple comprises color characteristic information and size characteristic information of the apple.

Further, before the obtaining of the prior frame set of the feature map of each apple image, the method further includes:

and constructing a matching relation between the prior frame and the real frame of the feature map of each apple image. The test samples are labeled; after sample data enhancement and labeling, a target foreground region labeling mode is adopted to improve the ability of the recognition model to identify occluded and overlapping apples. In addition, considering the imaging posture of apples on the branch as seen by the picking robot, video of each fruit area is captured from three angles: looking up, level and looking down.

Numerous experiments with the application have shown the following: compared with the SSD model before improvement, the accuracy, recall rate and average accuracy of the improved SSD model are improved by 1.07%, 2.14% and 5.08% respectively, the number of parameters is reduced to 1/6, and the recognition speed is improved by 2.52 times, reaching 41.34 frames/s; compared with the global labeling mode, the target foreground region labeling mode improves the accuracy, recall rate and average accuracy of the trained model by 1.09%, 3.20% and 3.76% respectively. In particular, when the model is trained entirely on occluded and overlapping fruit samples, the accuracy, recall rate and average accuracy are improved by 6.83%, 6.02% and 10.15% respectively after the method is used; compared with the Faster R-CNN model and the method combining HOG with SVM, the average accuracy of the improved SSD model is 7.54% and 17.38% higher respectively, and the recognition speed is 9 times and 15 times that of the two methods respectively.

The second embodiment of the present application discloses an apple bounding box recognition device based on an improved SSD deep learning model, where the device is applied to the apple bounding box recognition method based on an improved SSD deep learning model. It is described with reference to the device structure diagram in Fig. 2, and includes:

and the image acquisition module is used for acquiring a plurality of apple images.

And the characteristic diagram acquisition module is used for acquiring the characteristic diagrams of the multiple apple images.

And the real frame acquisition module is used for acquiring a real frame of the characteristic diagram aiming at the characteristic diagram of each apple image.

And the prior frame acquisition module is used for acquiring a prior frame set of the feature map of each apple image. The set of prior boxes includes a plurality of prior boxes of different sizes and aspect ratios.

And the micro prior frame acquisition module is used for dividing a plurality of micro prior frames aiming at one prior frame.

And the first overlapping rate calculation module is used for calculating a first overlapping rate of each tiny prior frame and a real frame of the feature map aiming at each tiny prior frame.

And the best matching bounding box acquisition module is used for acquiring a best matching bounding box of the feature map of each apple image according to the first overlapping rate. The best match bounding box includes a plurality of tiny prior boxes.

And the prediction frame acquisition module is used for acquiring the prediction frame of each tiny prior frame. The prediction box includes a confidence of the apple target and an offset from the real box.

And the second overlapping rate acquisition module is used for calculating a second overlapping rate of the prediction frame of each tiny prior frame and the real frame of the feature map.

And the optimal boundary frame acquisition module is used for acquiring the optimal boundary frame of the feature map of each apple image according to the second overlapping rate. The optimal bounding box comprises a plurality of prediction boxes of tiny prior boxes.

And the training set acquisition module is used for acquiring an apple image training set. The apple image training set comprises a best matching bounding box of the feature map of each apple image and an optimal bounding box of the feature map of each apple image.

And the model construction module is used for constructing an improved SSD deep learning model according to the apple image training set.

And the real-time image acquisition module is used for acquiring real-time images of the apples.

Further, after the image obtaining module, the apparatus further includes:

and the size unifying module is used for unifying the sizes of the apple images.

Further, before the image obtaining module, the apparatus further includes:

and the image acquisition module is used for acquiring initial apple images of apples with different pixel sizes according to the illumination angles of the front light, the side light and the back light.

And the image preprocessing module is used for preprocessing the initial apple image. The preprocessing includes enhancing the contrast of the apple initial image according to an adaptive histogram equalization method.

Further, after the feature map obtaining module, the apparatus further includes:

and the characteristic classification module is used for classifying the characteristic graphs of the apple images according to different characteristic information of the apples. The different characteristic information of the apple comprises color characteristic information and size characteristic information of the apple.

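Further, before the prior frame obtaining module, the apparatus further includes:

and the relationship construction module is used for constructing the matching relationship between the prior frame and the real frame of the feature map of each apple image.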

The present application has been described in detail with reference to specific embodiments and illustrative examples, but the description is not intended to limit the application. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the presently disclosed embodiments and implementations thereof without departing from the spirit and scope of the present disclosure, and these fall within the scope of the present disclosure. The protection scope of this application is subject to the appended claims.

Claims

1. An apple bounding box identification method based on an improved SSD deep learning model is characterized by comprising the following steps:

acquiring a plurality of apple images;

acquiring a characteristic diagram of the multiple apple images;

dividing a plurality of tiny prior frames aiming at one prior frame;

acquiring an apple real-time image;

2. The method of claim 1, wherein after the obtaining of the plurality of apple images, the method further comprises:

and unifying the sizes of the apple images.

3. The method of claim 1, wherein before the obtaining the plurality of apple images, the method further comprises:

4. The method of claim 1, wherein after obtaining the feature maps of the plurality of apple images, the method further comprises:

5. The method of claim 1, wherein before the obtaining the prior frame set of the feature map of each apple image, the method further comprises:

6. An apple bounding box identification device based on an improved SSD deep learning model, which is applied to the apple bounding box identification method based on the improved SSD deep learning model in any one of claims 1 to 5, and is characterized by comprising:

the image acquisition module is used for acquiring a plurality of apple images;

7. The apparatus of claim 6, wherein after the image obtaining module, the apparatus further comprises:

8. The apparatus of claim 6, wherein before the image acquisition module, the apparatus further comprises:

9. The apparatus of claim 6, wherein after the feature map obtaining module, the apparatus further comprises:

10. The apparatus of claim 6, wherein before the prior frame acquisition module, the apparatus further comprises: