CN113191237A - Improved YOLOv3-based fruit tree image small target detection method and device - Google Patents


Info

Publication number
CN113191237A
CN113191237A
Authority
CN
China
Prior art keywords: image, small target, target detection, detection, yolov3
Prior art date
Legal status
Pending
Application number
CN202110434149.0A
Other languages
Chinese (zh)
Inventor
毛亮 (Mao Liang)
郭子豪 (Guo Zihao)
陈鹏飞 (Chen Pengfei)
杨晓帆 (Yang Xiaofan)
Current Assignee
Shenzhen Polytechnic
Original Assignee
Shenzhen Polytechnic
Priority date
Filing date
Publication date
Application filed by Shenzhen Polytechnic
Priority to CN202110434149.0A
Publication of CN113191237A
Legal status: Pending

Classifications

    • G06V 20/00 Scenes; Scene-specific elements
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/253 Fusion techniques of extracted features
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V 2201/07 Target detection


Abstract

The invention discloses a method and device for detecting small targets in fruit tree images based on an improved YOLOv3. The method comprises the following steps: preprocessing an original image annotated with the small targets to be detected to obtain training images, and collecting the training images into a training image set; replacing the original transmission layer and part of the down-sampling layers of YOLOv3 with DenseNet and adding a new feature extraction layer to construct an improved YOLOv3 small target detection model; training the small target detection model with the training image set so that it outputs the category and position of the small targets to be detected; and inputting a detection image into the trained small target detection model to obtain the category and position of the small targets in that image. The method fully accounts for the characteristics of small targets in fruit tree images, including occlusion, and improves small target detection accuracy.

Description

Improved YOLOv3-based fruit tree image small target detection method and device
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method and device for detecting small targets in fruit tree images based on an improved YOLOv3.
Background
In recent years, as target detection has been widely applied in agriculture, deep-learning-based target detection methods have gradually replaced traditional sampling or visual inspection for detecting fruit in fruit tree images in order to estimate fruit tree yield. Deep-learning-based target detection methods generally fall into candidate-box-based methods and regression-based methods: common candidate-box-based methods include Fast R-CNN, Faster R-CNN and R-FCN, while common regression-based methods include YOLO and SSD. Compared with candidate-box-based methods, regression-based methods need no candidate-box extraction and detect efficiently, but they impose strong constraints on input image size and achieve low accuracy on small targets such as fruit, which occupy few pixels, have weak texture and edge features, and are often occluded in fruit tree images.
Therefore, currently proposed deep-learning-based target detection methods transfer poorly to small targets in fruit tree images, particularly occluded ones, and their small target detection accuracy is difficult to improve.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a fruit tree image small target detection method and device based on an improved YOLOv3, which fully account for the characteristics of small targets in fruit tree images, including occlusion, and improve small target detection accuracy.
In order to solve the above technical problem, in a first aspect, an embodiment of the present invention provides a method for detecting a small target of a fruit tree image based on improved YOLOv3, including:
preprocessing an original image marked with a small target to be detected to obtain a training image, and collecting the training image in a training image set;
respectively replacing the original transmission layer and part of the down-sampling layers of YOLOv3 with DenseNet, and adding a new feature extraction layer, to construct an improved YOLOv3 small target detection model;
training the small target detection model by using the training image set, so that the small target detection model outputs the category and the position of the small target to be detected;
and inputting the detection image into the trained small target detection model to obtain the category and the position of the small target in the detection image.
Further, before the inputting the detection image into the trained small target detection model and obtaining the category and the position of the small target in the detection image, the method further includes:
and preprocessing the detection image.
Further, the preprocessing comprises any one or more image processing of image cropping, image flipping and image scaling.
Further, the original transmission layer and the partial down-sampling layer of YOLOv3 are respectively replaced by DenseNet, and a new feature extraction layer is added to construct a small target detection model of improved YOLOv3, specifically:
replacing an original transmission layer of YOLOv3 with DenseNet, enabling the original transmission layer to adjust the size of an input image to 512 x 512, replacing a 32 x 32 down-sampling layer and a 16 x 16 down-sampling layer of YOLOv3 with DenseNet, adding a feature extraction layer after a first residual block of YOLOv3, enabling the feature extraction layer to extract a feature map with the size of 128 x 128, and constructing the small target detection model.
Further, the inputting of the detection image into the trained small target detection model to obtain the category and the position of the small target in the detection image specifically includes:
and inputting the detection image into the trained small target detection model, and enabling the trained small target detection model to perform non-maximum suppression operation on the predicted target in the detection image to obtain the type and the position of the small target in the detection image.
In a second aspect, an embodiment of the present invention provides an apparatus for detecting a small target in a fruit tree image based on improved YOLOv3, including:
the image processing module is used for preprocessing an original image marked with a small target to be detected to obtain a training image and collecting the training image in a training image set;
the model construction module is used for replacing the original transmission layer and part of the down-sampling layers of YOLOv3 with DenseNet and adding a new feature extraction layer to construct an improved YOLOv3 small target detection model;
the model training module is used for training the small target detection model by using the training image set so as to enable the small target detection model to output the category and the position of the small target to be detected;
and the target detection module is used for inputting the detection image into the trained small target detection model to obtain the category and the position of the small target in the detection image.
Further, the target detection module is further configured to perform preprocessing on the detection image before the detection image is input into the trained small target detection model to obtain the type and the position of the small target in the detection image.
Further, the preprocessing comprises any one or more image processing of image cropping, image flipping and image scaling.
Further, the original transmission layer and the partial down-sampling layer of YOLOv3 are respectively replaced by DenseNet, and a new feature extraction layer is added to construct a small target detection model of improved YOLOv3, specifically:
replacing an original transmission layer of YOLOv3 with DenseNet, enabling the original transmission layer to adjust the size of an input image to 512 x 512, replacing a 32 x 32 down-sampling layer and a 16 x 16 down-sampling layer of YOLOv3 with DenseNet, adding a feature extraction layer after a first residual block of YOLOv3, enabling the feature extraction layer to extract a feature map with the size of 128 x 128, and constructing the small target detection model.
Further, the inputting of the detection image into the trained small target detection model to obtain the category and the position of the small target in the detection image specifically includes:
and inputting the detection image into the trained small target detection model, and enabling the trained small target detection model to perform non-maximum suppression operation on the predicted target in the detection image to obtain the type and the position of the small target in the detection image.
The embodiment of the invention has the following beneficial effects:
the method comprises the steps of preprocessing an original image marked with a small target to be detected to obtain a training image, collecting the training image in a training image set, respectively replacing an original transmission layer and a partial down-sampling layer of YOLOv3 with DenseNet, adding a new feature extraction layer, constructing an improved YOLOv3 small target detection model, training the small target detection model by using the training image set, enabling the small target detection model to output the type and the position of the small target to be detected, inputting the detection image into the trained small target detection model to obtain the type and the position of the small target in the detection image, and completing small target detection of the detection image. Compared with the prior art, the small target detection model is constructed based on the improved YOLOv3 network, the feature propagation of the image is enhanced, the feature fusion is promoted, one more feature graph with one scale is extracted by utilizing the newly added feature extraction layer, the detection capability of the small target is improved, the characteristics of the small target in the fruit tree image and the shielding condition of the small target can be fully considered, and the small target detection precision is improved.
Drawings
Fig. 1 is a schematic flow chart of a fruit tree image small target detection method based on improved YOLOv3 in a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a prior art YOLOv3 network;
FIG. 3 is a schematic structural diagram of an improved YOLOv3 network according to a first embodiment of the present invention;
FIG. 4 is a data flow diagram of a training small target detection network according to a first embodiment of the present invention;
fig. 5 is a schematic structural diagram of a fruit tree image small target detection device based on the improved YOLOv3 in a second embodiment of the present invention.
Detailed Description
The technical solutions in the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, the step numbers in the text are only for convenience of explanation of the specific embodiments, and do not serve to limit the execution sequence of the steps.
The first embodiment:
as shown in fig. 1, the first embodiment provides a fruit tree image small target detection method based on the improved YOLOv3, which includes steps S1 to S4:
s1, preprocessing the original image marked with the small target to be detected to obtain a training image, and collecting the training image in a training image set;
s2, replacing an original transmission layer and a partial down-sampling layer of the YOLOv3 with DenseNet, adding a new feature extraction layer, and constructing a small target detection model of improved YOLOv 3;
s3, training a small target detection model by using the training image set, and enabling the small target detection model to output the category and the position of the small target to be detected;
and S4, inputting the detection image into the trained small target detection model to obtain the type and position of the small target in the detection image.
Before the detection image is input to the trained small target detection model, the size of the detection image should be made equal to the size of the training image.
As an example, in step S1, an original image is acquired, the small targets to be detected in it are annotated manually or with an image labeling tool, and the annotated original image is preprocessed, for example by any one or more of image cropping, image flipping and image scaling, to obtain a training image. Preprocessing the annotated original image increases both the number of images and their randomness, yielding a more stable small target detection model. To improve small target detection, a high-resolution original image is cropped into a series of block images that follow a fixed naming specification and serve as training images.
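The crop/flip/scale preprocessing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the crop size, output size and random seed are assumptions, and box annotations would need the same geometric transforms applied.

```python
import numpy as np

def preprocess(image, rng, crop=448, out=512):
    """Toy augmentation: random crop, random horizontal flip, then
    nearest-neighbour rescale to the model input size (out x out)."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)    # random crop offsets
    left = rng.integers(0, w - crop + 1)
    image = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:                 # random horizontal flip
        image = image[:, ::-1]
    idx = np.arange(out) * crop // out     # nearest-neighbour resample indices
    return image[idx][:, idx]

rng = np.random.default_rng(0)
img = np.arange(600 * 600).reshape(600, 600)
train_img = preprocess(img, rng)           # always (512, 512)
```

Randomizing crop position and flip direction is what "increases the randomness of the image" in practice: the same original image yields many distinct training samples.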
In step S2, the backbone network of YOLOv3, DarkNet-53, is selected as the basic network architecture. DarkNet-53 consists mainly of 1×1 and 3×3 convolution kernels and contains 53 convolutional layers. During YOLOv3 training, the repeated convolution and down-sampling operations cause feature information of the input image to be lost during forward propagation. Replacing the lower-resolution original transmission layer in the YOLOv3 network with DenseNet strengthens feature propagation and promotes feature fusion, and DenseNet also eases backward propagation of gradients, making the network easier to train. Meanwhile, replacing part of the down-sampling layers in the YOLOv3 network with DenseNet and adding a new feature extraction layer allows a feature map at one additional scale to be extracted, which improves the network's feature extraction capability and small target detection accuracy.
In step S3, a small target detection model is trained using the training image set, the training images in the training image set are input to the small target detection model for training, and during training, the type and position of the small target to be detected are regressed, so that the small target detection model outputs the type and position of the small target to be detected, and the trained small target detection model is obtained.
In step S4, the trained small target detection model is deployed and initialized with the Caffe deep learning framework, and a detection image whose size matches the training images is input into it, so that the trained model performs target detection on the detection image and outputs the category and position of the small targets it contains.
In a preferred embodiment, before inputting the detection image into the trained small target detection model and obtaining the category and the position of the small target in the detection image, the method further includes: and preprocessing the detection image.
Wherein the preprocessing comprises any one or more image processing of image cropping, image turning and image scaling.
In this embodiment, before the detection image is input into the trained small target detection model, any one or more of image cropping, image flipping and image scaling is applied to it, which keeps the detection image the same size as the training images, increases image randomness, and improves small target detection accuracy.
In a preferred embodiment, an original transmission layer and a partial downsampling layer of YOLOv3 are respectively replaced by DenseNet, and a feature extraction layer is added to construct a small target detection model of improved YOLOv3, specifically: replacing an original transmission layer of YOLOv3 with DenseNet, enabling the original transmission layer to adjust the size of an input image to 512 x 512, replacing a 32 x 32 down-sampling layer and a 16 x 16 down-sampling layer of YOLOv3 with DenseNet, adding a new feature extraction layer after a first residual block of YOLOv3, enabling the feature extraction layer to extract a feature map with the size of 128 x 128, and constructing a small target detection model.
Illustratively, a schematic structure diagram of a YOLOv3 network in the prior art is shown in fig. 2, and a schematic structure diagram of an improved YOLOv3 network is shown in fig. 3.
By replacing the lower-resolution original transmission layer in the YOLOv3 network with DenseNet, the transmission layer adjusts the input image size from 256×256 to 512×512, which strengthens feature propagation, promotes feature fusion, and effectively avoids losing feature information of the input image during forward propagation. Meanwhile, the 32×32 and 16×16 down-sampling layers in the YOLOv3 network are replaced with DenseNet, and a new feature extraction layer is added after the first residual block so that it extracts a feature map at the 128×128 scale, in addition to the 64×64, 32×32 and 16×16 feature maps the original YOLOv3 network extracts. The Feature Pyramid Network (FPN) algorithm then fuses the high resolution of the low-level features with the rich semantic information of the high-level features, which improves the network's feature extraction capability and small target detection accuracy.
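The scale bookkeeping here is simple arithmetic: with a 512×512 input, a detection head at stride s sees a (512/s)×(512/s) feature map, so the added head yields the 128×128 map while the three original YOLOv3 heads yield 64×64, 32×32 and 16×16. The stride values below are inferred from the stated map sizes, not quoted from the patent:

```python
INPUT_SIZE = 512
STRIDES = [4, 8, 16, 32]  # added 128x128 head plus the three original YOLOv3 heads

# each head predicts on an (INPUT_SIZE / stride) x (INPUT_SIZE / stride) grid
scales = [INPUT_SIZE // s for s in STRIDES]
```

The extra stride-4 head is what gives the network a grid fine enough to separate densely packed small fruit.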
The introduction of DenseNet mitigates the vanishing-gradient problem of deep networks while strengthening feature propagation. The output of layer l of the network is expressed by the formula:

x_l = H_l(x_{l-1});

for ResNet, the identity from the previous layer's output is added:

x_l = H_l(x_{l-1}) + x_{l-1};

in DenseNet, all preceding layers are concatenated as input:

x_l = H_l([x_0, x_1, ..., x_{l-1}]).

In these formulas, H_l(·) is a nonlinear composite transfer function that may comprise a series of BN (batch normalization), ReLU, pooling and convolution operations; for the DenseNet structure, the transfer function H_l is BN-ReLU-Conv(1×1)-BN-ReLU-Conv(3×3).
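The three connectivity patterns can be contrasted with a toy numpy sketch. The composite function H is stood in for by a random linear map plus ReLU, and the layer widths are illustrative; only the wiring (previous output vs. residual sum vs. concatenation of all earlier outputs) mirrors the formulas above.

```python
import numpy as np

def H(x, w):
    # stand-in for the composite BN-ReLU-Conv transform: linear map + ReLU
    return np.maximum(w @ x, 0.0)

def resnet_layer(x, w):
    # x_l = H_l(x_{l-1}) + x_{l-1}: the identity shortcut preserves width
    return H(x, w) + x

def densenet_layer(xs, w):
    # x_l = H_l([x_0, ..., x_{l-1}]): concatenate every earlier output
    return H(np.concatenate(xs), w)

rng = np.random.default_rng(0)
c = 4                              # channels per layer ("growth rate")
x0 = rng.normal(size=c)

# ResNet: each layer sees only the previous layer's output
r = resnet_layer(x0, rng.normal(size=(c, c)))

# DenseNet: layer l consumes l*c input channels, one c-wide slice per earlier layer
xs = [x0]
for l in range(1, 4):
    xs.append(densenet_layer(xs, rng.normal(size=(c, l * c))))
```

The growing input width of `densenet_layer` is the mechanism behind "all previous layers connected as inputs": every earlier feature map remains directly reachable, which is why gradients propagate more easily.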
In this embodiment, improving YOLOv3 and constructing the small target detection model on the improved YOLOv3 network improves the model's ability to detect small targets and raises small target detection accuracy.
Illustratively, the training process of the small target detection model is shown in fig. 4.
Python utilities are written to preprocess and post-process the training images.
Preprocessing the high-resolution training image: the training image is cropped into a series of block images that follow a fixed naming specification, and the block images are input into the small target detection model as training images. The blocks are produced by traversing the image with a sliding window; to ensure every region can be detected, the sliding window has a configurable crop size and overlap ratio. A picture cropped by the sliding window is named ImageName_Row_Column_height_width.
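A minimal sliding-window cropper following this naming specification might look like the following. The stride rule (crop size times one minus the overlap ratio) and the lack of trailing-edge padding are simplifying assumptions; a production version would also cover the image border.

```python
import numpy as np

def crop_blocks(image, name, size=512, overlap=0.2):
    """Traverse a high-resolution image with an overlapping sliding window
    and name each block ImageName_Row_Column_height_width, where Row and
    Column are the block's top-left offsets in the original image."""
    stride = int(size * (1.0 - overlap))
    h, w = image.shape[:2]
    blocks = {}
    for top in range(0, max(h - size, 0) + 1, stride):
        for left in range(0, max(w - size, 0) + 1, stride):
            block = image[top:top + size, left:left + size]
            key = f"{name}_{top}_{left}_{block.shape[0]}_{block.shape[1]}"
            blocks[key] = block
    return blocks

# a 1024x1024 image with 512x512 blocks and 20% overlap gives a 2x2 grid
blocks = crop_blocks(np.zeros((1024, 1024)), "orchard")
```

Encoding the Row/Column offsets in the file name is what later lets post-processing shift each block's predictions back into the coordinate frame of the uncut image.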
Post-processing the high-resolution block images: the predicted bounding-box coordinates in each block image are offset by the Row and Column values in the block's name to recover the predicted coordinates of the small target in the uncut image. Note, however, that an overlap region is detected twice and therefore yields two bounding-box predictions; non-maximum suppression can be applied to the global matrix of bounding-box predictions to resolve such duplicate detections.
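The coordinate remapping and the de-duplication by non-maximum suppression can be sketched in plain Python. The block names and box values are made up for illustration; boxes are (x1, y1, x2, y2), with the Column offset applied to x and the Row offset applied to y.

```python
def to_global(box, block_name):
    """Shift a block-local box back into the uncut image using the
    Row/Column offsets encoded in the block's name."""
    row, col = (int(v) for v in block_name.split("_")[1:3])
    x1, y1, x2, y2 = box
    return (x1 + col, y1 + row, x2 + col, y2 + row)

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box and
    drop any box overlapping a kept box by more than thresh."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

# the same fruit straddles the overlap of two 512x512 blocks cut at stride 409,
# so it is detected once in each block at different local coordinates
b1 = to_global((10, 419, 50, 459), "orchard_0_0_512_512")
b2 = to_global((10, 10, 50, 50), "orchard_409_0_512_512")
kept = nms([b1, b2], [0.9, 0.8])   # the duplicate is suppressed
```

After remapping, both detections land on the same global box, so NMS keeps only the higher-scoring one, which is exactly the duplicate-detection mitigation the text describes.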
The small target detection model uses the same loss function as the YOLOv3 network, regressing the category and position of the small target simultaneously during training. The loss is the sum of the localization loss, the confidence loss and the classification loss:

Loss = Error_coord + Error_iou + Error_cls;

where Error_coord is the localization loss, Error_iou is the confidence loss, and Error_cls is the classification loss.
In a preferred embodiment, the detection image is input into the trained small target detection model to obtain the category and position of the small target in the detection image, specifically: and inputting the detection image into the trained small target detection model, and enabling the trained small target detection model to perform non-maximum suppression operation on the prediction target in the detection image to obtain the type and the position of the small target in the detection image.
In the embodiment, the trained small target detection model performs non-maximum suppression operation on the predicted target in the detection image, so that the optimal small target can be selected from a plurality of small targets repeatedly marked in the detection image overlapping region, and the small target detection accuracy is improved.
Second embodiment:
as shown in fig. 5, the second embodiment provides a fruit tree image small target detection device based on the improved YOLOv3, including: the image processing module 21, configured to preprocess an original image annotated with the small targets to be detected to obtain training images and collect them into a training image set; the model construction module 22, configured to replace the original transmission layer and part of the down-sampling layers of YOLOv3 with DenseNet and add a new feature extraction layer to construct an improved YOLOv3 small target detection model; the model training module 23, configured to train the small target detection model with the training image set so that it outputs the category and position of the small targets to be detected; and the target detection module 24, configured to input the detection image into the trained small target detection model to obtain the category and position of the small targets in the detection image.
The object detection module 24 should make the size of the detection image and the size of the training image consistent before inputting the detection image into the trained small object detection model.
Illustratively, an original image is acquired by the image processing module 21, the small targets to be detected in it are annotated manually or with an image labeling tool, and the annotated original image is preprocessed, for example by any one or more of image cropping, image flipping and image scaling, to obtain a training image. Preprocessing the annotated original image increases both the number of images and their randomness, yielding a more stable small target detection model. To improve small target detection, a high-resolution original image is cropped into a series of block images that follow a fixed naming specification and serve as training images.
Through the model construction module 22, the backbone network of YOLOv3, DarkNet-53, is selected as the basic network architecture. DarkNet-53 consists mainly of 1×1 and 3×3 convolution kernels and contains 53 convolutional layers. During YOLOv3 training, the repeated convolution and down-sampling operations cause feature information of the input image to be lost during forward propagation. Replacing the lower-resolution original transmission layer in the YOLOv3 network with DenseNet strengthens feature propagation and promotes feature fusion, and DenseNet also eases backward propagation of gradients, making the network easier to train. Meanwhile, replacing part of the down-sampling layers in the YOLOv3 network with DenseNet and adding a feature extraction layer allows a feature map at one additional scale to be extracted, which improves the network's feature extraction capability and small target detection accuracy.
The small target detection model is trained by the training image set through the model training module 23, the training images in the training image set are input into the small target detection model for training, and the type and the position of the small target to be detected are regressed during training, so that the small target detection model outputs the type and the position of the small target to be detected, and the trained small target detection model is obtained.
Through the target detection module 24, the trained small target detection model is deployed and initialized with the Caffe deep learning framework, and a detection image whose size matches the training images is input into it, so that the trained model performs target detection on the detection image and outputs the category and position of the small targets it contains.
In a preferred embodiment, the target detection module 24 is further configured to perform preprocessing on the detection image before inputting the detection image into the trained small target detection model to obtain the type and position of the small target in the detection image.
Wherein the preprocessing comprises any one or more image processing of image cropping, image turning and image scaling.
In this embodiment, the target detection module 24 applies any one or more of image cropping, image flipping and image scaling to the detection image before inputting it into the trained small target detection model, which keeps the detection image the same size as the training images, increases image randomness, and improves small target detection accuracy.
In a preferred embodiment, an original transmission layer and a partial downsampling layer of YOLOv3 are respectively replaced by DenseNet, and a feature extraction layer is added to construct a small target detection model of improved YOLOv3, specifically: replacing an original transmission layer of YOLOv3 with DenseNet, enabling the original transmission layer to adjust the size of an input image to 512 x 512, replacing a 32 x 32 down-sampling layer and a 16 x 16 down-sampling layer of YOLOv3 with DenseNet, adding a new feature extraction layer after a first residual block of YOLOv3, enabling the feature extraction layer to extract a feature map with the size of 128 x 128, and constructing a small target detection model.
Illustratively, by replacing the lower-resolution original transmission layer in the YOLOv3 network with DenseNet, the transmission layer adjusts the input image size from 256×256 to 512×512, which strengthens feature propagation, promotes feature fusion, and effectively avoids losing feature information of the input image during forward propagation. Meanwhile, the 32×32 and 16×16 down-sampling layers in the YOLOv3 network are replaced with DenseNet, and a feature extraction layer is added after the first residual block so that it extracts a feature map at the 128×128 scale, giving four scales in total together with the 64×64, 32×32 and 16×16 feature maps the original YOLOv3 network extracts. The Feature Pyramid Network (FPN) algorithm then fuses the high resolution of the low-level features with the rich semantic information of the high-level features, which improves the network's feature extraction capability and small target detection accuracy.
Wherein, the introduction of DenseNet can alleviate the vanishing-gradient problem of deep networks while enhancing feature propagation. The output of the l-th layer of the YOLOv3 network is expressed by the formula:

x_l = H_l(x_{l-1});

for ResNet, an identity mapping of the previous layer's output is added:

x_l = H_l(x_{l-1}) + x_{l-1};

in DenseNet, all preceding layers are concatenated as the input:

x_l = H_l([x_0, x_1, ..., x_{l-1}]);

in the above formulas, H_l(·) is a non-linear transformation function, a composite operation that may include a series of BN (batch normalization), ReLU, Pooling, and Conv operations; in the DenseNet structure, the transformation function H_l is BN-ReLU-Conv(1 × 1)-BN-ReLU-Conv(3 × 3).
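The difference between the plain layer chain and the dense connectivity above can be sketched in a few lines of Python. This is an illustrative toy, not the patent's network: H here is a stand-in ReLU rather than the actual BN-ReLU-Conv composite, and it only shows how the feature width grows as all earlier outputs are concatenated and reused:

```python
import numpy as np

def H(x):
    # Placeholder nonlinearity; in DenseNet, H_l is the composite
    # BN-ReLU-Conv(1x1)-BN-ReLU-Conv(3x3).
    return np.maximum(x, 0.0)

def chain_forward(x0, num_layers):
    """Plain chain: x_l = H(x_{l-1})."""
    x = x0
    for _ in range(num_layers):
        x = H(x)
    return x

def dense_forward(x0, num_layers):
    """Dense connectivity: x_l = H([x_0, x_1, ..., x_{l-1}])."""
    outputs = [x0]
    for _ in range(num_layers):
        outputs.append(H(np.concatenate(outputs, axis=-1)))
    return outputs[-1]

x0 = np.ones(4)                     # a toy 4-channel feature vector
print(chain_forward(x0, 3).shape)   # (4,)  - width stays fixed
print(dense_forward(x0, 3).shape)   # (16,) - width grows with feature reuse
```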
In this embodiment, the model construction module 22 improves YOLOv3 and builds the small target detection model on the improved YOLOv3 network, which is beneficial to improving the model's ability to detect small targets and its detection accuracy.
Illustratively, the training process of the small target detection model is specifically as follows:
Several Python utilities are written to pre-process and post-process the training images.
Pre-processing the high-resolution training images: each training image is cut into a series of block images following a fixed naming specification, and the block images are input into the small target detection model as training images. The image is traversed block by block with a sliding window; to ensure that every region can be detected, the sliding window has a definable crop size and overlap ratio, and each picture cropped by the sliding window is named according to the specification: ImageName_Row_Column_height_width.
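A minimal sketch of such a sliding-window cropper, assuming NumPy arrays for images and the naming specification above; the function name `crop_blocks` and the default crop size and overlap ratio are illustrative, not taken from the patent:

```python
import numpy as np

def crop_blocks(image, name, crop=512, overlap=0.2):
    """Traverse `image` with an overlapping sliding window and return a
    dict mapping ImageName_Row_Column_height_width -> block array."""
    step = int(crop * (1 - overlap))
    h, w = image.shape[:2]
    rows = list(range(0, max(h - crop, 0) + 1, step))
    cols = list(range(0, max(w - crop, 0) + 1, step))
    # Clamp a final window to each edge so every region is covered.
    if rows[-1] != max(h - crop, 0):
        rows.append(max(h - crop, 0))
    if cols[-1] != max(w - crop, 0):
        cols.append(max(w - crop, 0))
    blocks = {}
    for r in rows:
        for c in cols:
            block = image[r:r + crop, c:c + crop]
            blocks[f"{name}_{r}_{c}_{block.shape[0]}_{block.shape[1]}"] = block
    return blocks
```

For example, a 600 × 600 image cropped to 512 × 512 with 20% overlap yields four blocks whose names encode their offsets, e.g. `img_88_88_512_512`.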
Post-processing the high-resolution block images: the predicted bounding-box coordinates of each block image are offset by the Row and Column values in the image name to obtain the predicted coordinates of the small target in the uncut image. Note, however, that the overlap region is detected twice, producing two sets of bounding-box predictions; non-maximum suppression can therefore be applied to the global matrix of bounding-box predictions to eliminate such duplicate detections.
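The post-processing above can be sketched as follows; `to_global` and `nms` are hypothetical helper names, and the NMS shown is the standard greedy formulation rather than the patent's exact implementation:

```python
import numpy as np

def to_global(block_name, boxes):
    """Shift block-local boxes (N, 4) of (x1, y1, x2, y2) into the
    coordinates of the uncut image, using the Row/Column offsets
    encoded in the name ImageName_Row_Column_height_width."""
    row, col = (int(v) for v in block_name.split("_")[-4:-2])
    return boxes + np.array([col, row, col, row], dtype=float)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over the global matrix of
    bounding-box predictions; removes duplicates from overlap regions."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the kept box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]       # drop duplicates of box i
    return keep
```

For example, two nearly identical boxes predicted on both sides of an overlap region collapse to a single detection after `nms`.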
The small target detection model uses the same loss function as the YOLOv3 network, and the category and position of the small target are regressed simultaneously during training. The loss function Loss is the sum of the localization loss, the confidence loss, and the classification loss, with the expression:

Loss = Error_coord + Error_iou + Error_cls;

in the above formula, Error_coord is the localization loss, Error_iou is the confidence loss, and Error_cls is the classification loss.
In a preferred embodiment, the detection image is input into the trained small target detection model to obtain the category and position of the small target in the detection image, specifically: and inputting the detection image into the trained small target detection model, and enabling the trained small target detection model to perform non-maximum suppression operation on the prediction target in the detection image to obtain the type and the position of the small target in the detection image.
In this embodiment, the target detection module 24 enables the trained small target detection model to perform non-maximum suppression operation on the predicted target in the detection image, so that the optimal small target can be selected from a plurality of small targets repeatedly marked in the detection image overlapping region, which is beneficial to improving the small target detection accuracy.
In summary, the embodiment of the present invention has the following advantages:
the method comprises the steps of preprocessing an original image marked with a small target to be detected to obtain a training image, collecting the training image in a training image set, respectively replacing an original transmission layer and a partial down-sampling layer of YOLOv3 with DenseNet, adding a new feature extraction layer, constructing an improved YOLOv3 small target detection model, training the small target detection model by using the training image set, enabling the small target detection model to output the type and the position of the small target to be detected, inputting the detection image into the trained small target detection model to obtain the type and the position of the small target in the detection image, and completing small target detection of the detection image. Compared with the prior art, the small target detection model is constructed based on the improved YOLOv3 network, the feature propagation of the image is enhanced, the feature fusion is promoted, one more feature graph with one scale is extracted by utilizing the newly added feature extraction layer, the detection capability of the small target is improved, the characteristics of the small target in the fruit tree image and the shielding condition of the small target can be fully considered, and the small target detection precision is improved.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
It will be understood by those skilled in the art that all or part of the processes of the above embodiments may be implemented by hardware related to instructions of a computer program, and the computer program may be stored in a computer readable storage medium, and when executed, may include the processes of the above embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims (10)

1. A fruit tree image small target detection method based on improved YOLOv3 is characterized by comprising the following steps:
preprocessing an original image marked with a small target to be detected to obtain a training image, and collecting the training image in a training image set;
respectively replacing an original transmission layer and a partial down-sampling layer of the YOLOv3 with DenseNet, and newly adding a feature extraction layer to construct a small target detection model of improved YOLOv 3;
training the small target detection model by using the training image set, so that the small target detection model outputs the category and the position of the small target to be detected;
and inputting the detection image into the trained small target detection model to obtain the category and the position of the small target in the detection image.
2. The improved YOLOv 3-based fruit tree image small target detection method according to claim 1, wherein before the inputting the detection image into the trained small target detection model to obtain the category and position of the small target in the detection image, the method further comprises:
and preprocessing the detection image.
3. The improved YOLOv 3-based fruit tree image small target detection method as claimed in claim 1 or 2, wherein the preprocessing includes any one or more of image cropping, image flipping, and image scaling.
4. The method for detecting the small target of the fruit tree image based on the improved YOLOv3 as claimed in claim 1, wherein the original transmission layer and the partial down-sampling layer of YOLOv3 are replaced by DenseNet, and a feature extraction layer is added to construct the small target detection model of the improved YOLOv3, specifically:
replacing an original transmission layer of YOLOv3 with DenseNet, enabling the original transmission layer to adjust the size of an input image to 512 x 512, replacing a 32 x 32 down-sampling layer and a 16 x 16 down-sampling layer of YOLOv3 with DenseNet, adding a feature extraction layer after a first residual block of YOLOv3, enabling the feature extraction layer to extract a feature map with the size of 128 x 128, and constructing the small target detection model.
5. The improved YOLOv 3-based fruit tree image small target detection method as claimed in claim 1, wherein the detection image is input into a trained small target detection model to obtain the type and position of the small target in the detection image, specifically:
and inputting the detection image into the trained small target detection model, and enabling the trained small target detection model to perform non-maximum suppression operation on the predicted target in the detection image to obtain the type and the position of the small target in the detection image.
6. A fruit tree image small target detection device based on improved YOLOv3 is characterized by comprising:
the image processing module is used for preprocessing an original image marked with a small target to be detected to obtain a training image and collecting the training image in a training image set;
the model construction module is used for replacing an original transmission layer and a partial down-sampling layer of the YOLOv3 with DenseNet, adding a new feature extraction layer and constructing a small target detection model of improved YOLOv 3;
the model training module is used for training the small target detection model by using the training image set so as to enable the small target detection model to output the category and the position of the small target to be detected;
and the target detection module is used for inputting the detection image into the trained small target detection model to obtain the category and the position of the small target in the detection image.
7. The improved YOLOv 3-based fruit tree image small target detection device according to claim 6, wherein the target detection module is further configured to pre-process the detection image before inputting the detection image into the trained small target detection model to obtain the type and position of the small target in the detection image.
8. The improved YOLOv 3-based fruit tree image small target detection device as claimed in claim 6 or 7, wherein the preprocessing includes any one or more of image cropping, image flipping, and image scaling.
9. The device for detecting the small target of the fruit tree image based on the improved YOLOv3 as claimed in claim 6, wherein the original transmission layer and the partial down-sampling layer of YOLOv3 are replaced by DenseNet, and a feature extraction layer is added to construct the small target detection model of the improved YOLOv3, specifically:
replacing an original transmission layer of YOLOv3 with DenseNet, enabling the original transmission layer to adjust the size of an input image to 512 x 512, replacing a 32 x 32 down-sampling layer and a 16 x 16 down-sampling layer of YOLOv3 with DenseNet, adding a feature extraction layer after a first residual block of YOLOv3, enabling the feature extraction layer to extract a feature map with the size of 128 x 128, and constructing the small target detection model.
10. The improved YOLOv 3-based fruit tree image small target detection device as claimed in claim 6, wherein the input of the detection image into the trained small target detection model results in the type and position of the small target in the detection image, specifically:
and inputting the detection image into the trained small target detection model, and enabling the trained small target detection model to perform non-maximum suppression operation on the predicted target in the detection image to obtain the type and the position of the small target in the detection image.
CN202110434149.0A 2021-04-21 2021-04-21 Improved YOLOv 3-based fruit tree image small target detection method and device Pending CN113191237A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110434149.0A CN113191237A (en) 2021-04-21 2021-04-21 Improved YOLOv 3-based fruit tree image small target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110434149.0A CN113191237A (en) 2021-04-21 2021-04-21 Improved YOLOv 3-based fruit tree image small target detection method and device

Publications (1)

Publication Number Publication Date
CN113191237A true CN113191237A (en) 2021-07-30

Family

ID=76978089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110434149.0A Pending CN113191237A (en) 2021-04-21 2021-04-21 Improved YOLOv 3-based fruit tree image small target detection method and device

Country Status (1)

Country Link
CN (1) CN113191237A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030366A (en) * 2023-02-21 2023-04-28 中国电建集团山东电力建设第一工程有限公司 Power line inspection detection method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766098A (en) * 2019-11-07 2020-02-07 中国石油大学(华东) Traffic scene small target detection method based on improved YOLOv3
CN110826379A (en) * 2018-08-13 2020-02-21 中国科学院长春光学精密机械与物理研究所 Target detection method based on feature multiplexing and YOLOv3
CN112288700A (en) * 2020-10-23 2021-01-29 西安科锐盛创新科技有限公司 Rail defect detection method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FENG, JIN: "Research and Optimization of a Deep-Learning-Based Detection Method for Light-Trapped Rice Pests", China Master's Theses Full-text Database, Agricultural Science and Technology, no. 2021, pages 3 *
ZHANG, GUANGSHI ET AL.: "Gear Defect Detection Based on Improved YOLOv3 Network", Laser & Optoelectronics Progress, vol. 57, no. 12, pages 3 *
XUE, YUEJU: "Improved YOLOv2 Recognition Method for Immature Mangoes", Transactions of the Chinese Society of Agricultural Engineering, vol. 34, no. 7, pages 1-3 *


Similar Documents

Publication Publication Date Title
CN110310264B (en) DCNN-based large-scale target detection method and device
US10229346B1 (en) Learning method, learning device for detecting object using edge image and testing method, testing device using the same
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN108230329B (en) Semantic segmentation method based on multi-scale convolution neural network
CN111681273B (en) Image segmentation method and device, electronic equipment and readable storage medium
CN110032998B (en) Method, system, device and storage medium for detecting characters of natural scene picture
CN112288008B (en) Mosaic multispectral image disguised target detection method based on deep learning
CN111652217A (en) Text detection method and device, electronic equipment and computer storage medium
CN111091123A (en) Text region detection method and equipment
CN113159120A (en) Contraband detection method based on multi-scale cross-image weak supervision learning
CN112800964A (en) Remote sensing image target detection method and system based on multi-module fusion
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN112800955A (en) Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN112861915A (en) Anchor-frame-free non-cooperative target detection method based on high-level semantic features
CN113591719A (en) Method and device for detecting text with any shape in natural scene and training method
CN114998756A (en) Yolov 5-based remote sensing image detection method and device and storage medium
CN116645592A (en) Crack detection method based on image processing and storage medium
CN115375999A (en) Target detection model, method and device applied to dangerous chemical vehicle detection
CN116486393A (en) Scene text detection method based on image segmentation
Shit et al. An encoder‐decoder based CNN architecture using end to end dehaze and detection network for proper image visualization and detection
CN113191237A (en) Improved YOLOv 3-based fruit tree image small target detection method and device
CN112884755B (en) Method and device for detecting contraband
CN111767919A (en) Target detection method for multi-layer bidirectional feature extraction and fusion
CN114663654B (en) Improved YOLOv4 network model and small target detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination