CN112183448B - Method for dividing pod-removed soybean image based on three-level classification and multi-scale FCN


Info

Publication number
CN112183448B
CN112183448B (application CN202011102031.XA)
Authority
CN
China
Prior art keywords
fcn
pod
image
soybean
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011102031.XA
Other languages
Chinese (zh)
Other versions
CN112183448A (en)
Inventor
王敏娟
王莹
陈昕
杨斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University filed Critical China Agricultural University
Priority to CN202011102031.XA
Publication of CN112183448A
Application granted
Publication of CN112183448B
Legal status: Active

Classifications

    • G06V 20/20: Scenes; Scene-specific elements in augmented reality scenes
    • G06F 18/214: Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Pattern recognition; Classification techniques
    • G06F 18/253: Pattern recognition; Fusion techniques of extracted features
    • G06N 3/045: Neural networks; Combinations of networks
    • G06V 10/267: Image preprocessing; Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/32: Image preprocessing; Normalisation of the pattern dimensions
    • G06V 10/40: Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for segmenting pod-removed soybean images based on three-level classification and a multi-scale fully convolutional network (FCN). Tightly packed pods that are difficult to separate are segmented by taking pod edge pixels into account: a ternary label image synthesized by the three-level classification method is used to train an FCN model with a multi-scale structure. The network extracts features of different scales through different convolution stages, and a high-resolution branch is designed to obtain low-level global information for feature fusion, so that pods of different shapes and sizes are segmented.

Description

Method for dividing pod-removed soybean image based on three-level classification and multi-scale FCN
Technical Field
The invention relates to the field of computer vision in artificial intelligence, and in particular to a method for segmenting pod-removed soybean images based on three-level classification and a multi-scale fully convolutional network (FCN).
Background
Soybean is an important legume grain and one of the most valuable oil crops in the world, used mainly for soybean meal and vegetable oil. Its seeds provide approximately 60% of the world's supply of vegetable protein and are considered a good substitute for animal protein. China is the world's largest soybean importer, accounting for more than 60% of the world's soybean imports. However, owing to shrinking cultivated land, low yields and inadequate policies, the unsustainable and scarce domestic supply of soybeans cannot meet this tremendous demand; for example, Chinese soybean yields reach only 75% of the world average. Under such circumstances, how to maintain and develop domestic soybean production and guarantee the most basic food supply requirements has become a major problem for the soybean industry. The biggest bottleneck restricting soybean production and development is low yield per unit area, and increasing the per-unit yield of soybeans is the most direct and effective way to reverse the current passive situation.
To increase yield, plant breeding programs require phenotyping of large populations, evaluating one or even several useful traits. The usual way to acquire soybean phenotype data is seed examination: the traits of mature plants and seeds are inspected to understand the influence of different varieties or treatment factors on the plants, so that the high-yield factors and seed quality of different varieties or treatments can be analyzed and mastered. Among the many agronomic traits of soybean, the number of pods per plant (PN) has a great influence on yield and receives close attention from breeders. However, quantitative counting of soybean pods is usually a manual task: it is inefficient and time-consuming, and long stretches of repetitive manual counting easily cause fatigue, which in turn affects the accuracy of the counts; the indirect losses are therefore large and the comparability of test data is poor.
In recent years, with the continuous development of computer technology and the spread of image information, machine vision has become a significant part of intelligent computing research, and researchers in soybean phenotyping are gradually using machine vision and related computer technologies to solve problems in plant phenotype detection. However, most current soybean phenotype research targets soybean grain counting, classification and identification of soybean varieties and grades, intelligent diagnosis of nutrient elements, detection of soybean diseases and the like. Meanwhile, deep learning network models have been applied successfully in many fields and are increasingly favored by researchers in agriculture; nevertheless, deep learning research in agriculture started late, and studies of soybean pod segmentation based on deep learning are very few.
In computer vision, image segmentation divides a digital image into multiple regions according to different properties of its pixels. Unlike classification and object detection, it is usually a low-level or pixel-level visual task, since the spatial information of an image is important for segmenting semantically distinct regions. Segmentation aims to extract meaningful information for analysis: image pixels are labeled so that pixels sharing certain features, such as color, intensity or texture, receive the same label. Semantic segmentation is the process of associating each pixel of an image with a class label. Convolutional neural networks (CNNs) have been applied to semantic segmentation in a great diversity of models; among the different CNN-based semantic segmentation models, FCNs have attracted the most attention, and a trend of FCN-based semantic segmentation models has emerged. To preserve the spatial information of the image, FCN-based models eliminate the fully connected layers of conventional CNNs. In some studies, authors have used contextual features and obtained state-of-the-art performance. However, because the FCN is a linear structure, it cannot provide features of different scales, and only local information is used for semantic segmentation; this loosens the global contextual semantics of the image and makes semantic segmentation quite ambiguous, so challenges caused by pod shattering and overlapping fruits still remain.
Disclosure of Invention
In order to solve the above problems, the invention provides a pod-removed soybean image segmentation method based on three-level classification and a multi-scale FCN, which segments pod-removed soybeans quickly and accurately. The method comprises the following steps:
s1, acquiring image data of the pod-removed soybean;
s2, preprocessing a target data set;
s3, constructing and training a multi-scale full convolution neural network;
s4, realizing the image segmentation of the pod-removed soybean by using the trained multi-scale FCN.
Further, acquiring the image data of the pod-removed soybean in step S1 comprises:
s11, randomly selecting soybean plants in different planting areas, performing operations such as pod picking, manual dirt removal and the like, and performing digital imaging in an image acquisition area;
s12, soybean pods (one pod, two pods, three pods and four pods) containing different numbers are randomly tiled on a black light absorption background cloth, so that the soybean pods are prevented from being seriously overlapped, and the camera is positioned right above the soybean pods.
Further, preprocessing the target dataset in step S2 comprises:
s21, preprocessing pictures by using common scale transformation, random clipping, noise adding, rotation transformation and the like to enrich data volume and enhance the robustness of the model;
s22, three types of label information are considered, including: the method comprises the steps of (1) synthesizing three-level classification annotation images by the aid of a background, a pod-removed soybean main body and pod edges;
s23, adjusting the sizes of the enhanced image and the three-level classification marked image to 480 multiplied by 480.
Further, constructing the multi-scale fully convolutional neural network in step S3 comprises:
s31, the input image passes through two 3X 3 conventional convolution layers, a BN layer is arranged behind the first 3X 3 convolution layer and is used for calculating the mean value and the variance of output data of the convolution layers and normalizing, and then the characteristic image output by the second convolution layer passes through UP-
The Conv layer upsamples them to the same size as the input image, resulting in conv_fcn_out, where UP-Conv is composed of bilinear interpolation and one convolution layer, rather than the usual deconvolution, which can lead to checkerboard artifacts;
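A minimal PyTorch sketch of such an UP-Conv layer (bilinear interpolation followed by one convolution, in place of deconvolution) is given below; the kernel size and channel counts are illustrative assumptions.

```python
# Sketch of UP-Conv: bilinear upsampling to the input resolution followed by a
# single convolution, avoiding the checkerboard artifacts of deconvolution.
import torch.nn as nn
import torch.nn.functional as F

class UpConv(nn.Module):
    def __init__(self, in_ch, out_ch, out_size=480):  # 480 per the embodiment
        super().__init__()
        self.out_size = out_size
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, size=(self.out_size, self.out_size),
                          mode="bilinear", align_corners=False)
        return self.conv(x)
```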
s32, inputting the characteristic image into 3 general bottleneck blocks, wherein a maximum pooling layer (Max-pooling) is used before the bottleneck blocks, and meanwhile, a dense convolution kernel is provided, and 3 residual bottleneck blocks are arranged in all general bottleneck blocks, so that memory resources can be saved, the training process is faster, and the method is defined as:
$y = f(x, \{W_i\}) + W_s x$ (1)
$y = f(x, \{W_i\}) + x$ (2)
where x and y denote the input and output of the residual bottleneck, respectively, and $W_i$ and $W_s$ denote weights. The function $f = W_3\sigma_3(W_2\sigma_2(W_1\sigma_1(x)))$ represents the residual mapping to be learned, where σ denotes the ReLU activation function. For a residual bottleneck block, equation (1) represents the first residual bottleneck of the block, where $W_s$ makes the number of channels of x the same as that of f; equation (2) represents the remaining residual bottlenecks. Each residual bottleneck contains three convolution layers (two 1×1 convolution layers and one 3×3 convolution layer), with a BN layer in front of each convolution layer and an expansion factor λ of 1. The feature images output by the 3 general residual bottleneck blocks are then upsampled through the UP-Conv layer to output BB_FCN_out1, BB_FCN_out2 and BB_FCN_out3, respectively;
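A hedged PyTorch sketch of this residual bottleneck follows, assuming the pre-activation ordering implied by $f = W_3\sigma_3(W_2\sigma_2(W_1\sigma_1(x)))$: a BN layer and ReLU precede each convolution, and a 1×1 projection plays the role of $W_s$ in equation (1) when channel counts differ. The channel widths are assumptions.

```python
# Hedged sketch of the residual bottleneck of equations (1)-(2); channel widths
# and the pre-activation ordering are assumptions drawn from the description.
import torch.nn as nn

class ResidualBottleneck(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch, dilation=1):
        super().__init__()
        self.f = nn.Sequential(  # the residual mapping f(x, {W_i})
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, mid_ch, kernel_size=1),          # 1x1 conv (W1)
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3,          # 3x3 conv (W2),
                      padding=dilation, dilation=dilation),   # dilated in S33
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1),         # 1x1 conv (W3)
        )
        # W_s: 1x1 projection so x matches f in channels (equation 1);
        # identity shortcut once the channels already match (equation 2)
        self.shortcut = (nn.Conv2d(in_ch, out_ch, kernel_size=1)
                         if in_ch != out_ch else nn.Identity())

    def forward(self, x):
        return self.f(x) + self.shortcut(x)  # y = f(x, {W_i}) + W_s x
```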
s33, next, using 2 expansion bottleneck blocks, unlike a general bottleneck block, the expansion bottleneck block uses hole convolution (DC). DC is a special convolution operation that expands the kernel by inserting a space, thus expanding the receptive field without adding other parameters, where DC is used to expand the 3 x 3 convolution layer in the residual block by a factor λ=2, the number of residual bottlenecks expanded affects the receptive field of the last two multi-scale features, and thus the pod split. The invention tests different numbers of expansion residual bottlenecks of 6, 8, 10, 12 and the like, wherein the performance of the expansion residual bottlenecks of 10 layers is optimal. It should be noted that all convolution layers are followed by batch normalization and a ReLU activation function. Then, respectively enabling the characteristic images output by the 2 expansion residual bottleneck blocks to pass through an UP-Conv layer and outputting DBB_FCN_out1 and DBB_FCN_out2;
s34, performing feature fusion on the high-resolution branch feature information and Conv_FCN_out, BB_FCN_out1, BB_FCN_out2, BB_FCN_out3, DBB_FCN_out1 and DBB_FCN_out2 sent to the serial layers to obtain different Receptive Field Features (RFFs), finally reducing the size by using 3×3 and 1×1 convolution layers, and performing pixel classification by using a softmax layer to realize pixel level segmentation. To supervise the classification task, the loss function of the network is defined as:
$O(i;\theta) = -\sum_{i \in I} \log\left[p(i, g(i); \theta)\right]$ (3)
where i and θ represent the pixel location in image space I and the network parameters (weights and biases), respectively, and p(i, g(i); θ) represents the predicted probability of assigning pixel i to its ground-truth value g(i) after softmax classification.
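Equation (3) is the pixel-wise negative log-likelihood after softmax classification, which standard libraries express directly as cross-entropy over the three classes; a minimal sketch follows, where the tensor shapes are assumptions.

```python
# Pixel-wise cross-entropy matching equation (3); shapes are illustrative.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()            # log-softmax + NLL over all pixels
logits = torch.randn(1, 3, 480, 480)         # network output: 3 classes per pixel
labels = torch.randint(0, 3, (1, 480, 480))  # ternary ground-truth label map
loss = criterion(logits, labels)
```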
Further, the high-resolution branch (HRB) in step S34 comprises two 1×1 convolution layers. The original image passes through the HRB, and the HRB features are then merged into the concatenation layer; this branch provides low-level global information.
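A hedged sketch of the HRB and of the fusion-and-classification head of S34 follows. The two 1×1 convolutions, the concatenation, and the 3×3 plus 1×1 reduction come from the text above; the intermediate channel width of the head is an assumption.

```python
# Sketch of the high-resolution branch and fusion head; widths marked as
# assumptions are illustrative, not values fixed by the patent.
import torch
import torch.nn as nn

class HighResolutionBranch(nn.Module):
    """Two 1x1 convolutions on the original image; no downsampling, so the
    branch preserves low-level global information (HRB_out)."""
    def __init__(self, width=3):  # 3 kernels, per the detailed embodiment (S31)
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(3, width, kernel_size=1),
            nn.Conv2d(width, width, kernel_size=1),
        )

    def forward(self, img):
        return self.branch(img)

class FusionHead(nn.Module):
    """Concatenates HRB_out with the six upsampled FCN outputs, then reduces
    with a 3x3 and a 1x1 convolution; softmax is applied in the loss."""
    def __init__(self, in_ch, n_classes=3, width=64):  # width is an assumption
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, width, kernel_size=3, padding=1)
        self.classify = nn.Conv2d(width, n_classes, kernel_size=1)

    def forward(self, features):
        x = torch.cat(features, dim=1)  # all maps share the 480x480 resolution
        return self.classify(self.reduce(x))
```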
Further, using the trained multi-scale FCN to segment the pod-removed soybean image in step S4 comprises: the network outputs probability maps of pod objects and pod boundaries; to obtain accurate pod interiors, the pod-edge probability map of the third channel is subtracted from the pod-object probability map of the second channel, and the pod images are finally recovered by morphological dilation.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides an image segmentation method of an ungapped soybean based on three-stage classification and multi-scale FCN, which is characterized in that each pod is precisely separated by a three-stage classification method, wherein edge pixels among tight pods are considered to segment the tight pods which are difficult to separate. It is able to divide the compact pod into a plurality of separate parts rather than being misinterpreted as a whole. On the other hand, the novel multi-scale FCN extracts features with different scales through different convolution stages, and designs a high-resolution branch to acquire low-level global information for feature fusion, wherein the high-resolution branch is different from the linear structure of the FCN, and the multi-scale structure can extract different receptive field features corresponding to multi-size objects, so that the segmentation of different forms and sizes of pods is realized.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a specific structure of the multi-scale FCN model according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a method for segmenting pod-removed soybean images based on three-level classification and a multi-scale FCN; with reference to FIG. 1, the method comprises the following steps:
s1, in a greenhouse with diffuse reflection natural illumination in daytime, acquiring an original pod image by using a camera, randomly tiling the pod-removed soybeans comprising one pod, two pods, three pods and four pods on black light-absorbing background cloth, simultaneously avoiding serious overlapping, and placing the camera right above the tiled pods at a distance of between 30 and 40cm from the pods. The camera used therein was Apple, model iPhone8, aperture: f/1.8, the focal length is set to 4 mm, the flash mode is set to force no flash, and the live mode is closed, finally 167 images are acquired, and the size is 3024 multiplied by 4032;
s2, enriching the data volume by using some data enhancement methods including random horizontal overturn (overturn probability: 0.5) and random rotation (range: 45 DEG to 45 DEG). In order to implement the proposed three-level classification method, three classes of pod bodies and pod edges are considered as label information, and the proposed multi-scale FCN is trained by these three classes of labels as well. In order to obtain the three-level annotation image, the following steps are performed:
s21, binarizing the group trunk image;
s22, performing morphological expansion and corrosion on the binary image by using circular structure filtering with the radius of 5;
s23, obtaining the edge of the pod by subtracting the erosion image from the swelling image;
s24, combining the edge image and the erosion image to obtain three types of annotation images;
Here, the background, pod bodies and pod edges are labeled 0, 1 and 2, respectively (this label synthesis is sketched below). The enhanced images and the corresponding three-level classification annotation images are then resized to 480×480 and divided into training, validation and test sets in the proportion 8:1:1;
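The label synthesis of steps S21 to S24 can be sketched with OpenCV as follows; the radius-5 circular structuring element and the 0/1/2 label values come from the text above, while the function and variable names are illustrative.

```python
# Sketch of the S21-S24 ternary-label synthesis; names are illustrative.
import cv2
import numpy as np

def make_ternary_label(ground_truth):
    """ground_truth: uint8 mask with pods > 0 and background = 0."""
    binary = (ground_truth > 0).astype(np.uint8)                   # S21: binarize
    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11, 11))  # radius-5 disk
    dilated = cv2.dilate(binary, disk)                             # S22: dilation
    eroded = cv2.erode(binary, disk)                               # S22: erosion
    edge = dilated - eroded                                        # S23: pod edge
    label = np.zeros_like(binary)                                  # background = 0
    label[eroded == 1] = 1                                         # pod body = 1
    label[edge == 1] = 2                                           # pod edge = 2
    return label                                                   # S24: combined
```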
s3, inputting the manufactured training data set into the multi-scale FCN to train the model. During training, the initial learning rate is set to 0.08. Random gradient descent (SGD) is used to optimize the loss function. The weight decay is 0.0005 and the momentum is 0.99. As shown in connection with fig. 2, the method comprises the following steps:
s31, an input original image with 480 multiplied by 480 resolution enters a high resolution branch, wherein the branch comprises 21 multiplied by 1 convolution layers, and the convolution layers have 3 convolution kernels to obtain HRB_out, so as to retain more low-level global information and help to locate and construct a segmented object; on the other hand, the backbone network is entered, firstly, a 3×3 convolution layer with 64 convolution kernels is passed, then, the 3×3 convolution layer is entered again through batch normalization and a ReLU activation function, at this time, the obtained characteristic information is UP-sampled to the same size as the input image through UP_Conv to obtain Conv_FCN_out, and the Conv_FCN_out enters a serial layer to perform characteristic fusion, and on the other hand, the general bottleneck block is entered;
s32, 3 general bottleneck blocks are sequentially passed to reduce network parameters and deepen depth, so that training is relatively easy, a Max_pulling layer is used before each bottleneck block, a characteristic image firstly enters the first residual bottleneck block inside the bottleneck block, a 1×1 convolution layer, a 3×3 convolution layer with an expansion factor lambda of 1 and a 1×1 convolution layer with 4×mu convolution kernels are respectively passed, 3 residual bottleneck blocks are sequentially used, corresponding mu is respectively 32, 64 and 128, and BN layers are respectively arranged before convolution. The characteristic image generated by each universal bottleneck block uses an UP_Conv layer to respectively obtain BB_FCN_out1, BB_FCN_out2 and BB_FCN_out3 characteristic information with different receptive fields;
s33, feature images passing through the universal bottleneck blocks enter 2 expansion bottleneck blocks, corresponding mu is 256 and 521 respectively, features are extracted from each expansion bottleneck block through 10 expansion residual bottlenecks in sequence, an expansion factor lambda is 2, and UP-sampling is carried out on the features generated by each expansion bottleneck block by using UP_Conv to respectively obtain DBB_FCN_out1 and DBB_FCN_out2;
s34, feature HRB_out, conv_FCN_out, BB_FCN_out1, BB_FCN_out2, BB_FCN_out3, DBB_FCN_out1 and DBB_FCN_out2 obtained through UP-sampling of an UP_Conv layer are subjected to feature fusion, finally, the fused image size is reduced by using a 3×3 convolution layer and a 1×1 convolution layer respectively, and pixels are classified by using a softmax layer, so that semantic segmentation is realized.
S4, the images are input into the trained multi-scale FCN network to obtain a three-channel probability map whose channels represent the background, the pod interior and the pod edge, respectively. To obtain accurate pod interiors, the pod-edge probability map of the third channel is subtracted from the pod-interior probability map of the second channel; after subtraction, the segmented pod-removed soybeans are obtained by thresholding the probability map at T = 0.8. The circular structuring element radius R is set to 5 pixels, and the pods eroded during preprocessing to create the edge class are restored by dilating the pod objects. Finally, the segmented image is adjusted to the 480×480 size and each pod is marked with a different integer (background 0).
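A hedged sketch of this S4 post-processing (channel subtraction, thresholding at T = 0.8, dilation with a radius-5 disk, and integer labeling of each pod) is given below; the function name and array layout are assumptions.

```python
# Sketch of the S4 post-processing; names and array layout are illustrative.
import cv2
import numpy as np

def postprocess(prob):
    """prob: float array (3, H, W): background, pod interior, pod edge."""
    interior = prob[1] - prob[2]                                   # pod minus edge
    mask = (interior > 0.8).astype(np.uint8)                       # threshold T = 0.8
    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11, 11))  # R = 5 pixels
    mask = cv2.dilate(mask, disk)                                  # undo the erosion
    _, labels = cv2.connectedComponents(mask)                      # integer per pod
    return labels                                                  # background = 0
```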
It is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (3)

1. The method for dividing the pod-removed soybean image based on three-level classification and multi-scale FCN is characterized by comprising the following steps of:
s1, acquiring image data of the pod-removed soybean;
s2, preprocessing a target data set;
s3, constructing and training a multi-scale FCN;
the specific process of the step S3 is as follows:
s31, an input image passes through two 3X 3 conventional convolution layers, a BN layer is arranged behind the first 3X 3 convolution layer and is used for calculating the mean value and the variance of output data of the convolution layers and normalizing, and then the characteristic images output by the second convolution layer are UP-sampled to the same size as the input image through an UP-Conv layer to obtain Conv_FCN_out, wherein the UP-Conv consists of bilinear interpolation and one convolution layer;
s32, inputting the characteristic image into 3 general bottleneck blocks, wherein the general bottleneck blocks have dense convolution kernels and a maximum pooling layer is used before the bottleneck blocks, and 3 residual bottleneck blocks are arranged in all the general bottleneck blocks and are defined as:
$y = f(x, \{W_i\}) + W_s x$ (1)
$y = f(x, \{W_i\}) + x$ (2)
wherein x and y represent the input and output of the residual bottleneck, and $W_i$ and $W_s$ represent weights; the function $f = W_3\sigma_3(W_2\sigma_2(W_1\sigma_1(x)))$ represents the residual mapping to be learned, and σ represents the ReLU activation function; for a residual bottleneck block, equation (1) represents the first residual bottleneck of the block, where $W_s$ makes the number of channels of x the same as that of f, and equation (2) represents the remaining residual bottlenecks; each residual bottleneck has three convolution layers in total, comprising two 1×1 convolution layers and one 3×3 convolution layer, with a BN layer arranged in front of each convolution layer and an expansion factor λ of 1; the feature images output by the 3 general residual bottleneck blocks are then upsampled through an UP-Conv layer to output BB_FCN_out1, BB_FCN_out2 and BB_FCN_out3, respectively;
s33, using 2 expansion bottleneck blocks, wherein the expansion bottleneck blocks use hole convolution DC, DC is used for 3×3 convolution layers in expansion residual blocks, expansion factors lambda=2, all the convolution layers are subjected to batch normalization and ReLU activation functions, and then characteristic images output by the 2 expansion residual bottleneck blocks respectively pass through UP-Conv layers to output DBB_FCN_out1 and DBB_FCN_out2;
s34, feature fusion is performed on feature information of the high resolution branch HRB and conv_fcn_out, bb_fcn_out1, bb_fcn_out2, bb_fcn_out3, dbb_fcn_out1 and dbb_fcn_out2 sent to the tandem layer to obtain different receptive field features RFF, finally, the size is reduced by using 3×3 and 1×1 convolution layers, and pixel classification is performed by using the softmax layer to realize pixel level segmentation, and the loss function of the network is defined as:
$O(i;\theta) = -\sum_{i \in I} \log\left[p(i, g(i); \theta)\right]$ (3)
wherein i and θ represent the pixel location in image space I and the network parameters, respectively, the network parameters comprising weights and biases; p(i, g(i); θ) represents the prediction probability of assigning pixel i to ground-truth value g(i) after softmax classification; the HRB has two 1×1 convolution layers; the original image passes through the HRB and the HRB features are then fused into the concatenation layer, and this branch provides low-level global information;
s4, realizing pod-removed soybean image segmentation by using the trained multi-scale FCN, outputting a probability map of pod objects and pod boundaries by a network, subtracting the pod edge probability map of the third channel from the pod object probability map of the second channel, and finally recovering pod images by ecological expansion.
2. The method for dividing the pod-removed soybean image based on three-level classification and multi-scale FCN according to claim 1, wherein the specific process of step S1 is as follows:
s11, randomly selecting soybean plants in different planting areas, performing pod picking and dirt manually removing operations, and performing digital imaging in an image acquisition area;
s12, soybean pods containing different grain numbers are randomly paved on a black light absorption background cloth, so that the soybean pods are prevented from being seriously overlapped, and a camera is positioned right above the soybean pods; the soybean pods containing different grain numbers comprise one pod, two pods, three pods and four pods.
3. The method for dividing the pod-removed soybean image based on three-level classification and multi-scale FCN according to claim 1, wherein the specific process of step S2 is as follows:
s21, preprocessing the picture by using common scale transformation, random clipping, noise adding and rotation transformation to enrich the data volume and enhance the robustness of the model;
s22, using three types of label information, including: the method comprises the steps of (1) synthesizing three-level classification annotation images by the aid of a background, a pod-removed soybean main body and pod edges;
s23, adjusting the sizes of the enhanced image and the three-level classification marked image to 480 multiplied by 480.
CN202011102031.XA 2020-10-15 2020-10-15 Method for dividing pod-removed soybean image based on three-level classification and multi-scale FCN Active CN112183448B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011102031.XA CN112183448B (en) 2020-10-15 2020-10-15 Method for dividing pod-removed soybean image based on three-level classification and multi-scale FCN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011102031.XA CN112183448B (en) 2020-10-15 2020-10-15 Method for dividing pod-removed soybean image based on three-level classification and multi-scale FCN

Publications (2)

Publication Number Publication Date
CN112183448A CN112183448A (en) 2021-01-05
CN112183448B (en) 2023-05-12

Family

ID=73950370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011102031.XA Active CN112183448B (en) 2020-10-15 2020-10-15 Method for dividing pod-removed soybean image based on three-level classification and multi-scale FCN

Country Status (1)

Country Link
CN (1) CN112183448B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115222717B (en) * 2022-07-29 2023-05-16 四川农业大学 Rapid counting method and device for soybean pods and storage medium
CN115239733B (en) * 2022-09-23 2023-01-03 深圳大学 Crack detection method and apparatus, terminal device and storage medium
CN115393726B (en) * 2022-10-28 2023-01-10 国网思极位置服务有限公司 Line crossing region identification method and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993735A (en) * 2019-03-29 2019-07-09 成都信息工程大学 Image partition method based on concatenated convolutional
CN110210485A (en) * 2019-05-13 2019-09-06 常熟理工学院 The image, semantic dividing method of Fusion Features is instructed based on attention mechanism
CN110288603A (en) * 2019-05-22 2019-09-27 杭州电子科技大学 Semantic segmentation method based on efficient convolutional network and convolution condition random field

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10929977B2 (en) * 2016-08-25 2021-02-23 Intel Corporation Coupled multi-task fully convolutional networks using multi-scale contextual information and hierarchical hyper-features for semantic image segmentation
CN107169974A (en) * 2017-05-26 2017-09-15 中国科学技术大学 It is a kind of based on the image partition method for supervising full convolutional neural networks more
CN109859212B (en) * 2019-01-16 2020-12-04 中国计量大学 Soybean crop row segmentation method based on aerial image of unmanned aerial vehicle
CN110853057B (en) * 2019-11-08 2021-10-29 西安电子科技大学 Aerial image segmentation method based on global and multi-scale full-convolution network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993735A (en) * 2019-03-29 2019-07-09 成都信息工程大学 Image partition method based on concatenated convolutional
CN110210485A (en) * 2019-05-13 2019-09-06 常熟理工学院 The image, semantic dividing method of Fusion Features is instructed based on attention mechanism
CN110288603A (en) * 2019-05-22 2019-09-27 杭州电子科技大学 Semantic segmentation method based on efficient convolutional network and convolution condition random field

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Color Analysis of Soybean Leaves Based on Computer Vision; Junlong Fang et al.; 2010 Second World Congress on Software Engineering; 2011-02-22; full text *
Soybean image segmentation based on H-Dome reconstruction; Sun Xiaoting et al.; Soybean Science; 2013-12-31; full text *

Also Published As

Publication number Publication date
CN112183448A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN112183448B (en) Method for dividing pod-removed soybean image based on three-level classification and multi-scale FCN
CN107016405B (en) A kind of pest image classification method based on classification prediction convolutional neural networks
CN104091179B (en) Intelligent blumeria graminis spore picture identification method
Xue et al. A fast and easy method for predicting agricultural waste compost maturity by image-based deep learning
CN105046276B (en) Hyperspectral image band selection method based on low-rank representation
CN111695466B (en) Semi-supervised polarization SAR terrain classification method based on feature mixup
CN106845497B (en) Corn early-stage image drought identification method based on multi-feature fusion
CN108734719A (en) Background automatic division method before a kind of lepidopterous insects image based on full convolutional neural networks
CN110533583B (en) Self-adaptive image augmentation system based on cervical fluid-based cells
US20210383149A1 (en) Method for identifying individuals of oplegnathus punctatus based on convolutional neural network
Liu et al. Deep learning based research on quality classification of shiitake mushrooms
CN111178177A (en) Cucumber disease identification method based on convolutional neural network
CN113191222B (en) Underwater fish target detection method and device
Magsi et al. Date fruit recognition using feature extraction techniques and deep convolutional neural network
CN111369498A (en) Data enhancement method for evaluating seedling growth potential based on improved generation of confrontation network
Hao et al. Growing period classification of Gynura bicolor DC using GL-CNN
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
CN110363218A (en) A kind of embryo's noninvasively estimating method and device
CN116310548A (en) Method for detecting invasive plant seeds in imported seed products
Xiao et al. Citrus greening disease recognition algorithm based on classification network using TRL-GAN
Jenifa et al. Classification of cotton leaf disease using multi-support vector machine
CN113077438B (en) Cell nucleus region extraction method and imaging method for multi-cell nucleus color image
CN112966698A (en) Freshwater fish image real-time identification method based on lightweight convolutional network
Liu et al. “Is this blueberry ripe?”: a blueberry ripeness detection algorithm for use on picking robots
CN116452872A (en) 2023-07-18 Forest scene tree classification method based on improved DeepLabv3+

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant