CN113128593A - Plant fine-grained identification method based on bilinear convolutional neural network - Google Patents


Info

Publication number: CN113128593A
Application number: CN202110425490.XA
Authority: CN (China)
Prior art keywords: bilinear, grained, feature, features, model
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 业巧林, 范习健, 杨紫颖, 何文妍, 母园
Current Assignee: Nanjing Forestry University (the listed assignees may be inaccurate)
Original Assignee: Nanjing Forestry University
Application filed by Nanjing Forestry University
Priority to CN202110425490.XA
Publication of CN113128593A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a plant fine-grained identification method based on a bilinear convolutional neural network, which comprises the following steps: S1, extracting coarse-grained features with a Relu activation function to obtain coarse-grained features X1; S2, acquiring the overall representation of the bilinear model; S3, performing bilinear fusion on the features extracted by the two feature extraction functions A and B in the bilinear model of step S2 to obtain a matrix b; S4, sum-pooling the matrix b to obtain a matrix ξ, and expanding ξ into a multi-dimensional vector to obtain a feature vector x; S5, performing moment normalization and L2 normalization on the feature vector x to obtain bilinear features. The method is scientific and reasonable in structure and safe and convenient to use; it uses the bilinear model to effectively extract the subtle differences among species, and even among individuals, of high-throughput plant phenotypes, gives full play to the feature extraction advantages of bilinear pooling on fine-grained images, and achieves higher plant fine-grained identification precision.

Description

Plant fine-grained identification method based on bilinear convolutional neural network
Technical Field
The invention relates to the technical field of plant identification algorithm improvement, in particular to a plant fine-grained identification method based on a bilinear convolutional neural network.
Background
In recent years, with the development of deep learning in the field of computer vision, deep Convolutional Neural Networks (CNN) for processing images of two-dimensional natural scenes have become one of the research focuses of common concern at home and abroad, and many agriculture and forestry researchers have applied deep learning technology represented by CNN to plant phenotype research, opening an intelligent era of plant phenotype study. Related research builds classification networks based on deep learning methods on large amounts of high-quality plant image data acquired in real time, providing agricultural producers with reliable plant classification and recognition results for texture, color and other morphological characteristic phenotypes at high efficiency and low cost. Deep learning technology represented by CNN has strong feature extraction and modeling capability on plant phenotype images; its performance is to a great extent superior to that of traditional machine learning methods, and it has become a common algorithm for phenotype big-data analysis. Grinblat et al. extracted vein texture features and identified three leguminous plants of different varieties through a deep learning algorithm, with a recognition rate reaching 96.9% ± 0.2%; Liu et al., using multi-feature fusion and an improved deep belief network, improved leaf classification accuracy to 93.9% for 220 different plants; Nguyen et al. used CNN and transfer learning to construct a universal crop identification system that adapts to the uneven distribution of plants in different regions through a flexible data collection mode, and evaluated the classification and identification effect of CNN frameworks built by researchers such as AlexNet and VGG under different crop organ characteristics.
The above-mentioned methods all have in common that a deep learning model that processes images of natural scenes is directly used for images of plants, but there are the following problems in practical applications:
1. these methods do not specially analyze the plant image, so the unique characteristics of plant images are ignored. External interference factors such as posture, illumination, occlusion and complex backgrounds exist in data acquisition, so different samples belonging to the same class of image differ greatly; moreover, plants undergo growth variation, and the representation differences at different stages of plant growth are even more obvious, forming the fine-grained characteristic of "intra-class difference";
2. the detailed division of biological subclasses or subclasses exists in plants, and the subclasses are similar to each other in a certain biological form, so that the fine-grained identification problem of 'similarity between classes' is caused;
3. most related research at home and abroad is only suitable for coarse-grained identification of plants with large inter-species differences; it can neither effectively solve the fine-grained identification problem caused by inter-species similarity nor meet the requirements of precision agriculture;
therefore, a plant fine-grained identification method based on a bilinear convolutional neural network is urgently needed to solve the problems.
Disclosure of Invention
The invention aims to provide a plant fine-grained identification method based on a bilinear convolutional neural network, so as to solve the problems in the background technology.
In order to solve the technical problems, the invention provides the following technical scheme: a plant fine-grained identification method based on a bilinear convolutional neural network comprises the following steps:
s1, extracting coarse-grained features by using a Relu activation function according to a VGG 16-based bilinear model feature extractor to obtain coarse-grained features X1;
s2, acquiring an integral representation of the bilinear model, pre-training a network of the bilinear model, initializing a basic network, transferring the initialized basic network to a data set for training, and extracting fine-grained features and updating model parameters;
s3, in step S2, bilinear fusion is carried out on the features extracted by the A, B two feature extraction functions in the bilinear model, and an M multiplied by N dimensional matrix b is obtained;
s4, carrying out summation pooling on the matrix b obtained in the step S3 to obtain a matrix xi, and carrying out multi-dimensional vector expansion on xi to obtain a feature vector x;
s5, according to the feature vector x obtained in step S4, performing moment normalization and L2 normalization on the features to obtain bilinear features, and applying the chain derivation rule to the bilinear features to obtain the gradient expression of the loss function with respect to the feature vector;
further, in step S1, globally using a VGG16 model with a convolutional layer of 3 × 3 and a pooling layer of 2 × 2, simplifying the calculation by using a Relu activation function, defining the whole feature extraction function as f (·), and using a pre-network based on a VGG16 bilinear model to obtain a coarse feature X1:
X1 = H_vgg(x, y, {W1, b1, δ_relu})

wherein H_vgg is the convolution layer of VGG16, (x, y) are the input characteristic parameters of the image, W1 is the weight parameter of the network model requiring iterative updating, b1 is the linear bias, and δ_relu is the relu activation function;
further, the input characteristic parameters (x, y) represent the input of the first layer, x is the RGB three-channel value of the image, and y is the numerical encoding of the classification label;
further, the weight parameter W and the linear bias parameter b are updated according to the stochastic gradient descent method, with the gradient obtained by the chain derivation rule; the weight parameter W is updated by the formula:

W_ij^l = W_ij^l - η · δ_i^l · a_j^{l-1}

the linear bias parameter b is updated by the formula:

b_i^l = b_i^l - η · δ_i^l

where η represents the learning rate, δ_i^l represents the gradient obtained according to the chain derivation rule, and a_j^{l-1} denotes the j-th neuron output of layer l-1 after the activation function;
further, in step S2, the bilinear model as a whole is expressed as: B = (f_A, f_B, P, C), wherein f_A and f_B are feature functions that map the image I and the location l to features f_A(l, I) ∈ R^{c×M} and f_B(l, I) ∈ R^{c×N}, P is a pooling function that samples the feature layer, and C is a classification function;
further, in step S3, the feature functions f_A(l, I) ∈ R^{c×M} and f_B(l, I) ∈ R^{c×N} are subjected to bilinear fusion multiplication at the same position to obtain an M×N dimensional matrix b:

b(l, I, f_A, f_B) = f_A(l, I)^T f_B(l, I)
further, in step S4, sum pooling is adopted, the matrices b at all positions are accumulated according to the following formula to obtain a matrix ξ, and ξ is expanded into a multi-dimensional vector to obtain the feature vector x:

ξ(I) = Σ_{l∈L} b(l, I, f_A, f_B)

x = vec(ξ(I));
further, in step S5, moment normalization and L2 normalization are combined to obtain the final feature, on which fine-grained classification is performed; the normalized feature description is denoted z, obtained according to the following formulas:

y = sign(x) · √|x|,  z = y / ‖y‖₂
furthermore, when the bilinear model extracts image features, c = 1 and M, N are the numbers of feature channels; the two features in the bilinear model are expressed respectively as vectors a_l ∈ R^M and b_l ∈ R^N, and accumulating their outer products gives the matrix ξ(I) = Σ_l a_l b_l^T ∈ R^{M×N}. Denoting A = [a_1, ..., a_L] ∈ R^{M×L} and B = [b_1, ..., b_L] ∈ R^{N×L}, ξ is generated by the matrix relation:

ξ(I) = A B^T ∈ R^{M×N}
further, the feature functions f_A and f_B output tensors of L×P dimensions, where the position set L contains M×N position points; each position point yields a P×P matrix after the bilinear transformation, and a P²×1 feature vector is obtained after sum pooling. Back propagation is performed through the chain derivation rule to complete end-to-end training; writing the sum-pooled bilinear feature as x = A^T B, the gradients of the loss function ℓ with respect to the features are expressed as

dℓ/dA = B (dℓ/dx)^T,  dℓ/dB = A (dℓ/dx)

where P is the pooling function.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses the bilinear model to effectively extract the subtle differences among species, and even among individuals, of high-throughput plant phenotypes. Based on transfer learning, it constructs an identification model that can effectively represent the tiny local differences between different categories and capture the shared information within the same category to mine the knowledge structure of plant images; it designs a bilinear local feature descriptor and constructs a plant phenotype fine-grained identification model based on the bilinear CNN, which gives full play to the feature extraction advantages of bilinear pooling on fine-grained images, achieves higher plant fine-grained identification precision, and enhances the robustness of the network model. Compared with previous plant coarse-grained identification methods that are only suitable for large inter-species differences, this fine-grained identification method can effectively solve the fine-grained identification problem caused by similarity among plant species, and realizes effective identification of plant images with fine-grained characteristics.
2. The bilinear model adopted by the invention minimizes the training loss and converges quickly; it solves the problem that traditional coarse-grained deep learning networks are difficult to adapt to plant fine-grained datasets, can effectively improve the identification accuracy and generalization robustness of the model, and can effectively complete the identification of plant phenotype fine-grained datasets.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method for identifying plant fine granularity based on a bilinear convolutional neural network according to the invention;
FIG. 2 is a schematic process diagram of a plant fine-grained identification method based on a bilinear convolutional neural network according to the present invention;
FIG. 3 is a data sample diagram in an embodiment of a plant fine-grained identification method based on a bilinear convolutional neural network according to the present invention;
FIG. 4 is a diagram of a comparison model recognition result in an embodiment of a plant fine-grained recognition method based on a bilinear convolutional neural network according to the present invention;
FIG. 5 is a comparative model loss curve diagram in an embodiment of a plant fine-grained identification method based on a bilinear convolutional neural network.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example (b): please refer to fig. 1-2: the invention provides the technical scheme that: a plant fine-grained identification method based on a bilinear convolutional neural network comprises the following steps: s1, extracting coarse-grained features by using a Relu activation function according to a VGG 16-based bilinear model feature extractor to obtain coarse-grained features X1;
s2, acquiring an integral representation of the bilinear model, pre-training a network of the bilinear model, initializing a basic network, transferring the initialized basic network to a data set for training, and extracting fine-grained features and updating model parameters;
s3, in step S2, bilinear fusion is carried out on the features extracted by the A, B two feature extraction functions in the bilinear model, and an M multiplied by N dimensional matrix b is obtained;
s4, carrying out summation pooling on the matrix b obtained in the step S3 to obtain a matrix xi, and carrying out multi-dimensional vector expansion on xi to obtain a feature vector x;
And S5, performing moment normalization and L2 normalization on the features according to the feature vector x obtained in step S4 to obtain bilinear features, and applying the chain derivation rule to the bilinear features to obtain the gradient expression of the loss function with respect to the feature vector.
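The steps S3 to S5 above can be sketched numerically; in this sketch the VGG16 extractors of steps S1 and S2 are replaced by random stand-in feature maps, so only the shapes and normalizations of the bilinear pipeline are illustrated, not the patented implementation (the function name and sizes are assumptions):

```python
import numpy as np

def bilinear_features(fa, fb):
    """Steps S3-S5 on two feature maps of shape (L, M) and (L, N):
    bilinear fusion, sum pooling, vector expansion, then moment and L2 normalization."""
    xi = fa.T @ fb                        # sum over positions of outer products -> (M, N)
    x = xi.reshape(-1)                    # vec(xi): the feature vector
    y = np.sign(x) * np.sqrt(np.abs(x))   # moment (signed square-root) normalization
    z = y / np.linalg.norm(y)             # L2 normalization
    return z

rng = np.random.default_rng(0)
L_pos, M, N = 49, 8, 8                    # toy sizes: 7x7 positions, 8 channels each
fa = rng.standard_normal((L_pos, M))      # stand-in for f_A(l, I)
fb = rng.standard_normal((L_pos, N))      # stand-in for f_B(l, I)
z = bilinear_features(fa, fb)
print(z.shape)                            # M*N-dimensional bilinear feature
```

The resulting vector has M·N entries and unit L2 norm, matching the description of steps S4 and S5.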
In step S1, globally using a VGG16 model with convolutional layer of 3 × 3 and pooling layer of 2 × 2, simplifying the calculation with the Relu activation function, defining the whole feature extraction function as f (·), and using a pre-network based on VGG16 bilinear model to obtain coarse feature X1:
X1 = H_vgg(x, y, {W1, b1, δ_relu})

wherein H_vgg is the convolution layer of VGG16, (x, y) are the input characteristic parameters of the image, W1 is the weight parameter of the network model requiring iterative updating, b1 is the linear bias, and δ_relu is the relu activation function.
The input characteristic parameters (x, y) represent the input of the first layer, x is the RGB three-channel value of the image, and y is the numerical encoding of the classification label.
The weight parameter W and the linear bias parameter b are updated according to the stochastic gradient descent method, with the gradient obtained by the chain derivation rule; the weight parameter W is updated by the formula:

W_ij^l = W_ij^l - η · δ_i^l · a_j^{l-1}

the linear bias parameter b is updated by the formula:

b_i^l = b_i^l - η · δ_i^l

where η represents the learning rate, δ_i^l represents the gradient obtained according to the chain derivation rule, and a_j^{l-1} denotes the j-th neuron output of layer l-1 after the activation function.
The adopted bilinear model feature extractor is composed of two weakly supervised networks; the front-end networks of the two fine-grained models perform coarse-grained feature extraction based on VGG16. The networks are pre-trained on ImageNet to initialize the two base networks, and are then migrated to the plant dataset of this embodiment for training, so as to extract fine-grained features and update the model parameters. VGG16 is composed of 5 convolutional blocks, 3 fully connected layers and a softmax output layer, with the blocks separated by max-pooling; all activation units of the hidden layers adopt the Relu function. The VGG model is a preferred algorithm for extracting CNN features from images; the VGG16 network model showed that increasing the depth of a convolutional neural network and using small convolution kernels have a great effect on the final classification and recognition performance of the network. In this embodiment VGG16 is fine-tuned by removing the final pooling layer, the two fully connected layers and the softmax layer. The convolution layers used in VGG16 are all 3 × 3 and the pooling layers are all 2 × 2; adopting the relu activation function simplifies calculation while allowing gradient computation and back propagation to be carried out more effectively, avoiding gradient vanishing and gradient explosion. The expression of the relu activation function is

f(x) = max(0, x)

An activation function is the function run on a neuron of an artificial neural network and is responsible for mapping the neuron's input to its output; it introduces nonlinear factors, improves the expressive ability of the neural network, and makes it possible to solve problems that a linear model cannot. For an input vector x coming from the previous layer of the neural network, a neuron using the linear rectification activation function outputs the nonlinear result max(0, w^T x + b) to the next layer of neurons. The whole feature extraction function is defined as f(·), and f_A and f_B each correspond to an f(·).
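The relu function and the neuron output max(0, w^T x + b) described above can be written directly; the weight and input values below are arbitrary illustrations:

```python
import numpy as np

def relu(x):
    """Linear rectification activation: f(x) = max(0, x), element-wise."""
    return np.maximum(0.0, x)

# A neuron with weight vector w and bias b passes max(0, w^T x + b) onward:
w = np.array([0.5, -1.0])
b = 0.25
x = np.array([2.0, 1.0])
out = relu(w @ x + b)

print(relu(np.array([-3.0, 0.0, 2.5])))  # negatives clipped to zero
print(out)
```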
In step S2, the bilinear model as a whole is expressed as: B = (f_A, f_B, P, C), wherein f_A and f_B are feature functions that map the image I and the position l to features f_A(l, I) ∈ R^{c×M} and f_B(l, I) ∈ R^{c×N}, P is a pooling function that samples the feature layer, and C is a classification function; common classifiers are based on logistic regression, support vector machines, naive Bayes, etc.
In step S3, the feature functions f_A(l, I) ∈ R^{c×M} and f_B(l, I) ∈ R^{c×N} are subjected to bilinear fusion multiplication at the same position to obtain an M×N dimensional matrix b:

b(l, I, f_A, f_B) = f_A(l, I)^T f_B(l, I)

f_A and f_B are feature functions that map the image I and the position l into features; the feature dimensions extracted by the two feature functions f_A and f_B are c×M and c×N respectively, so the output b of the pooling function P will be an M×N matrix. Position l is a position in the image, and x_ij represents the feature value in row i, column j of the image.
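With c = 1 the bilinear fusion at one position l is just an outer product; a tiny worked example with arbitrary values:

```python
import numpy as np

# f_A(l, I) is a 1xM row and f_B(l, I) a 1xN row, so f_A^T f_B is an MxN matrix.
fa_l = np.array([[1.0, 2.0]])        # c x M with c = 1, M = 2
fb_l = np.array([[3.0, 4.0, 5.0]])   # c x N with c = 1, N = 3
b_l = fa_l.T @ fb_l                  # bilinear fusion at position l
print(b_l)                           # every pairwise product of channel responses
```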
In step S4, sum pooling is adopted, the matrices b at all positions are accumulated according to the following formula to obtain a matrix ξ, and ξ is expanded into a multi-dimensional vector to obtain the feature vector x:

ξ(I) = Σ_{l∈L} b(l, I, f_A, f_B)

x = vec(ξ(I)).
In step S5, moment normalization and L2 normalization are combined to obtain the final feature, on which fine-grained classification is performed; the normalized feature description is denoted z, obtained according to the following formulas:

y = sign(x) · √|x|,  z = y / ‖y‖₂
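The two normalizations of step S5 combine into a few lines; in this sketch the input x = [4, -9, 0, 1] gives y = [2, -3, 0, 1] before the L2 step:

```python
import numpy as np

def normalize(x):
    """Moment (signed square-root) normalization followed by L2 normalization:
    y = sign(x) * sqrt(|x|), z = y / ||y||_2."""
    y = np.sign(x) * np.sqrt(np.abs(x))
    return y / np.linalg.norm(y)

z = normalize(np.array([4.0, -9.0, 0.0, 1.0]))
print(z)   # proportional to [2, -3, 0, 1], with unit L2 norm
```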
When the bilinear model extracts image features, c = 1 and M, N are the numbers of feature channels; the two features in the bilinear model are expressed respectively as vectors a_l ∈ R^M and b_l ∈ R^N, and accumulating their outer products gives the matrix ξ(I) = Σ_l a_l b_l^T ∈ R^{M×N}. Denoting A = [a_1, ..., a_L] ∈ R^{M×L} and B = [b_1, ..., b_L] ∈ R^{N×L}, ξ is generated by the matrix relation: ξ(I) = A B^T ∈ R^{M×N}.
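That the position-wise accumulation Σ_l a_l b_l^T equals the single matrix product A B^T can be checked numerically (the sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, L = 4, 5, 7
A = rng.standard_normal((M, L))   # columns a_1 ... a_L
B = rng.standard_normal((N, L))   # columns b_1 ... b_L

# Sum of per-position outer products versus the single matrix product:
xi_loop = sum(np.outer(A[:, l], B[:, l]) for l in range(L))
xi_matrix = A @ B.T
print(np.allclose(xi_loop, xi_matrix))  # True
```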
The feature functions f_A and f_B output tensors of L×P dimensions, where the position set L contains M×N position points; each position point yields a P×P matrix after the bilinear transformation, and a P²×1 feature vector is obtained after sum pooling. Back propagation is performed through the chain derivation rule to complete end-to-end training; writing the sum-pooled bilinear feature as x = A^T B, the gradients of the loss function ℓ with respect to the features are expressed as

dℓ/dA = B (dℓ/dx)^T,  dℓ/dB = A (dℓ/dx)

wherein P is a pooling function;
At present, implementations of bilinear CNNs fall into two types: one is Multi-modal Bilinear Pooling (MBP), in which the two features extracted from the same sample come from two different feature extraction functions; the other is Homologous Bilinear Pooling (HBP), also called Second-order Pooling, in which the two features are extracted by the same feature extractor.
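The two variants differ only in whether the second feature map comes from a second extractor; a schematic sketch with the extractors replaced by random stand-in maps (an illustrative assumption):

```python
import numpy as np

def bilinear_pool(fa, fb):
    """Sum-pooled bilinear feature for two (L x P) feature matrices."""
    return (fa.T @ fb).reshape(-1)

rng = np.random.default_rng(2)
feats_a = rng.standard_normal((10, 4))   # output of extractor A (L x P)
feats_b = rng.standard_normal((10, 4))   # output of a different extractor B

mbp = bilinear_pool(feats_a, feats_b)    # multi-modal: two different extractors
hbp = bilinear_pool(feats_a, feats_a)    # homologous: the same extractor twice
print(mbp.shape, hbp.shape)

# In the homologous case the pooled matrix feats_a.T @ feats_a is symmetric:
print(np.allclose(feats_a.T @ feats_a, (feats_a.T @ feats_a).T))  # True
```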
Chain derivation rule: the input is x and the target value is y; w_f is a variable in the function f, with f = f(x, w_f). To change the data so that the objective becomes smaller, the derivative with respect to the variable w_f contained in each mapping is computed; the gradient ∂L/∂w_f can be obtained through

∂L/∂w_f = (∂L/∂f) · (∂f/∂w_f)
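The gradient expressions dℓ/dA = B(dℓ/dx)^T and dℓ/dB = A(dℓ/dx) for x = A^T B can be spot-checked against a finite difference; the toy loss ℓ = Σ G⊙x used here is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)
L_pos, M, N = 6, 3, 4
A = rng.standard_normal((L_pos, M))
B = rng.standard_normal((L_pos, N))
G = rng.standard_normal((M, N))          # plays the role of dL/dx

def loss(A_, B_):
    """Toy loss L = sum(G * (A^T B)), chosen so that dL/dx = G exactly."""
    return float(np.sum(G * (A_.T @ B_)))

dA = B @ G.T                             # analytic dL/dA = B (dL/dx)^T
dB = A @ G                               # analytic dL/dB = A (dL/dx)

eps = 1e-6
A_pert = A.copy()
A_pert[0, 0] += eps                      # probe one entry by finite difference
num = (loss(A_pert, B) - loss(A, B)) / eps
print(abs(num - dA[0, 0]) < 1e-4)        # True
```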
In this example, Arabidopsis thaliana was selected as the experimental object because it has a well-characterized genome sequence and is an important plant type for plant phenotype research. This example uses a published depth-phenotype dataset as the experimental data; it is composed of consecutive top-view images of 4 different accessions of Arabidopsis thaliana, namely SF-2, Cvi, Landsberg (Ler-1) and Columbia (Col-0), and data samples are shown in Fig. 3. The different accessions of Arabidopsis were planted in a substrate with strictly controlled environmental conditions such as soil and light; a fixed camera was installed above the plants to take top-view images at a fixed rate, and the data sequence of each plant was constructed from potted-plant images recorded at 12:00 pm every day, with each plant acquisition sequence designed to contain 22 consecutive top-view images;
In this example, 8/11 of the images of each class in the Arabidopsis dataset (1552 in total) were randomly used as the training set, and the remaining 582 as the test set. On the basis of this partition, a cloud server platform was built with ubuntu 16.04 LTS as the system; the platform carries a dual Intel Core [email protected] x8 processor, 256 G of memory and a 4 x 4T solid-state drive, the graphics card is an NVIDIA Tesla p40 GPU with 96 G of computing cache, and the deep learning framework pytorch3.0 is adopted;
In order to verify the performance advantages of the bilinear CNN model in plant fine-grained identification, several comparison models were trained, including coarse-grained deep networks such as VGG16, ResNet18 and DenseNet161. The parameters involved in the training process include batch_size, learning rate, activation function, optimization function and number of iterations; the training parameters of the bilinear CNN model are set as shown in Table 1 and those of the coarse-grained deep networks as shown in Table 2. The four network models were trained multiple times, and the calculated comparative test results are shown in Table 3:
Training parameter values: batch-size: 32; learning rate: 0.001; activation function: relu; optimization function: SGD; iterations: 55
TABLE 1
Training parameter values: batch-size: 64; learning rate: 1; activation function: relu; optimization function: SGD; iterations: 55
TABLE 2
TABLE 3 (comparative test results of the four network models; the original table is reproduced as images in the patent)
In Table 3, VGG16 denotes the Visual Geometry Group network (16 layers), ResNet18 denotes the Residual Network (18 layers), DenseNet161 denotes the Densely Connected Network (161 layers), and Bilinear-CNN denotes the bilinear CNN network. As can be seen from Table 3, the recognition accuracy on the VGG16 and ResNet18 test sets is low, because coarse-grained networks easily focus only on the significant differences in feature maps and ignore the fine differences among classes, so their recognition effect is poor. The average accuracy on the DenseNet161 test set is 94.85%; this network has a deeper structure and can extract deep-level plant phenotypic features, but does not consider the fine differences among different classes. The present model adopts the bilinear convolutional neural network for fine-grained identification; its accuracy on the test set reaches 97.25%, a higher recognition accuracy than the common coarse-grained networks.
In order to further compare the classification performance of the network models, the average accuracy and the loss-function curve of each model are plotted, as shown in Fig. 4 and Fig. 5 respectively. After 25 update iterations, the comparison shows that the loss functions of all models decrease and converge stably. The loss functions of VGG16, ResNet18 and DenseNet161 fluctuate greatly during the descent, which shows that traditional coarse-grained deep learning networks are difficult to adapt to fine-grained plant datasets and that their recognition robustness needs to be improved. The present model obtains a good result in terms of training loss, with the smallest overall training loss and fast convergence; it improves the identification accuracy and generalization robustness of the model more effectively and can effectively perform plant phenotype fine-grained recognition.
The working principle of the invention is as follows: in order to construct an identification model that can effectively represent the tiny local differences between different categories and capture the shared information of the same category to mine the knowledge structure of plant images, the invention provides a plant fine-grained identification model based on transfer learning. Coarse-grained feature extraction is first performed based on the VGG16 model; then the network of the bilinear model is pre-trained, the basic networks are initialized, and the initialized basic networks are migrated to the dataset for training, so as to extract fine-grained features and update the model parameters.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A plant fine-grained identification method based on a bilinear convolutional neural network, characterized in that the identification method comprises the following steps:
S1, extracting coarse-grained features with the VGG16-based feature extractor of the bilinear model, using the ReLU activation function, to obtain the coarse-grained feature X1;
S2, obtaining the overall representation of the bilinear model, pre-training the network of the bilinear model, initializing the base network, and transferring the initialized base network to the data set for training, so as to extract fine-grained features and update the model parameters;
S3, bilinearly fusing the features extracted by the two feature extraction functions A and B in the bilinear model of step S2 to obtain an M × N dimensional matrix b;
S4, sum-pooling the matrix b obtained in step S3 to obtain a matrix ξ, and expanding ξ into a multidimensional vector to obtain the feature vector x;
S5, performing moment normalization and L2 normalization on the feature vector x obtained in step S4 to obtain the bilinear features, and applying the chain rule to the bilinear features to obtain the gradient expression of the loss function with respect to the feature vector.
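Steps S3–S5 above can be sketched with NumPy. The two feature streams below are random stand-ins for the VGG16 feature maps, and the signed-square-root form of the moment normalization follows standard bilinear-CNN practice (an assumption, since the claim does not spell the formula out):

```python
import numpy as np

# Sketch of steps S3-S5 with random maps standing in for the two VGG16
# feature streams. L locations with M and N channels are assumed sizes.
rng = np.random.default_rng(0)
L, M, N = 49, 512, 512                 # e.g. a 7x7 map with 512 channels
fA = rng.normal(size=(L, M))           # stream A: one feature per location
fB = rng.normal(size=(L, N))           # stream B: one feature per location

# S3 + S4: bilinear fusion at matching locations, accumulated over all
# locations by sum pooling (sum_l a_l b_l^T, computed as one product):
xi = fA.T @ fB                         # M x N matrix

# S4: expand the pooled matrix into a feature vector
x = xi.reshape(-1)                     # length M*N

# S5: moment (signed square-root) normalization, then L2 normalization
y = np.sign(x) * np.sqrt(np.abs(x))
z = y / np.linalg.norm(y)
```

The normalized vector `z` is what would be fed to the classifier.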
2. The plant fine-grained identification method based on the bilinear convolutional neural network as claimed in claim 1, characterized in that: in step S1, a VGG16 model with 3 × 3 convolutional layers and 2 × 2 pooling layers is used throughout, the computation is simplified with the ReLU activation function, the whole feature extraction function is defined as f(·), and the coarse-grained feature X1 is obtained with the VGG16-based pre-network of the bilinear model:

X1 = H_vgg(x, y, {W1, b1, δ_relu})

wherein H_vgg is the convolutional layer of VGG16, (x, y) are the input characteristic parameters of the image, W1 is the weight parameter of the network model requiring iterative updating, b1 is the linear bias, and δ is the ReLU activation function.
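A minimal sketch of the building block in claim 2 (a 3 × 3 convolution plus a linear bias, followed by ReLU) on a single-channel image; padding, strides and the full VGG16 stack are omitted for brevity, and all sizes are illustrative:

```python
import numpy as np

# One VGG16-style step: X1 = relu(conv3x3(img, W1) + b1).
rng = np.random.default_rng(0)

def conv3x3_relu(img, W1, b1):
    """3x3 convolution (stride 1, no padding) with bias and ReLU."""
    H, Wd = img.shape
    out = np.empty((H - 2, Wd - 2))
    for i in range(H - 2):
        for j in range(Wd - 2):
            out[i, j] = np.sum(img[i:i+3, j:j+3] * W1) + b1
    return np.maximum(out, 0.0)        # ReLU: max(0, .)

img = rng.normal(size=(8, 8))          # toy single-channel "image"
W1 = rng.normal(size=(3, 3))           # one 3x3 kernel
X1 = conv3x3_relu(img, W1, b1=0.1)
```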
3. The plant fine-grained identification method based on the bilinear convolutional neural network as claimed in claim 2, characterized in that: the input characteristic parameters (x, y) represent the input of the first layer, x is the RGB three-channel value of the image, and y is the numeralization of the classification label.
4. The plant fine-grained identification method based on the bilinear convolutional neural network as claimed in claim 2, characterized in that: the weight parameter W and the linear bias parameter b are updated by stochastic gradient descent, the gradient being obtained by the chain rule. The weight parameter W is updated according to the formula:

W_ij^l ← W_ij^l − η · δ_i^l · a_j^(l−1)

The linear bias parameter b is updated according to the formula:

b_i^l ← b_i^l − η · δ_i^l

where η is the learning rate, δ is the gradient (error term) obtained by the chain rule, and a_j^(l−1) is the output of the j-th neuron of layer l−1 after the activation function.
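The stochastic-gradient-descent update of claim 4 can be sketched numerically. In this assumed sketch, `delta` plays the role of the error term δ^l from the chain rule and `a_prev` the previous layer's activated output a^(l−1); all sizes are illustrative:

```python
import numpy as np

# One SGD step for a single layer: W_ij -= eta * delta_i * a_j,
# b_i -= eta * delta_i, with eta the learning rate.
rng = np.random.default_rng(0)
eta = 0.01
W = rng.normal(size=(4, 3))            # W^l: 4 neurons fed by 3 inputs
b = np.zeros(4)                        # b^l
delta = rng.normal(size=4)             # delta^l_i from the chain rule
a_prev = rng.normal(size=3)            # a^{l-1}_j, post-activation

W_new = W - eta * np.outer(delta, a_prev)   # outer product gives delta_i * a_j
b_new = b - eta * delta
```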
5. The plant fine-grained identification method based on the bilinear convolutional neural network as claimed in claim 1, characterized in that: in step S2, the bilinear model as a whole is represented as B = (f_A, f_B, P, C), wherein f_A and f_B are feature functions that map the image I and the location l to features, obtaining f_A(l, I) ∈ R^(c×M) and f_B(l, I) ∈ R^(c×N); P is the pooling layer that samples the feature layers; and C is the classification function.
6. The plant fine-grained identification method based on the bilinear convolutional neural network as claimed in claim 1, characterized in that: in step S3, the feature functions f_A(l, I) ∈ R^(c×M) and f_B(l, I) ∈ R^(c×N) are bilinearly fused by multiplication at the same location to obtain the M × N dimensional matrix b:

b(l, I, f_A, f_B) = f_A(l, I)^T · f_B(l, I)
7. The plant fine-grained identification method based on the bilinear convolutional neural network as claimed in claim 1, characterized in that: in step S4, sum pooling is adopted: the matrices b at all locations are accumulated according to the following formula to obtain the matrix ξ, and ξ is expanded into a multidimensional vector to obtain the feature vector x:

ξ(I) = Σ_l b(l, I, f_A, f_B)

x = vec(ξ(I)).
8. The plant fine-grained identification method based on the bilinear convolutional neural network as claimed in claim 1, characterized in that: in step S5, moment normalization and L2 normalization are combined to obtain the final feature, on which fine-grained classification is performed. The normalized feature description is denoted z and is obtained according to:

y = sign(x) · √|x|,    z = y / ‖y‖₂
9. The plant fine-grained identification method based on the bilinear convolutional neural network as claimed in claim 6, characterized in that: in extracting image features with the bilinear model, c = 1, M and N are the channel numbers of the features, and the two features in the bilinear model are expressed as the vectors a_l ∈ R^M and b_l ∈ R^N respectively. Accumulating the matrices gives

ξ(I) = Σ_{l=1..L} a_l · b_l^T

Denoting A = [a_1, ..., a_L] ∈ R^(M×L) and B = [b_1, ..., b_L] ∈ R^(N×L), ξ is generated by the matrix relation ξ(I) = A·B^T ∈ R^(M×N).
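The matrix identity in claim 9 (accumulating the per-location outer products a_l b_l^T equals the single product A Bᵀ) can be verified numerically; the sizes below are illustrative:

```python
import numpy as np

# Check: sum_l outer(a_l, b_l) == A @ B.T, with the vectors a_l and b_l
# as the columns of A in R^{MxL} and B in R^{NxL}.
rng = np.random.default_rng(0)
M, N, Lp = 5, 4, 7
A = rng.normal(size=(M, Lp))           # columns are the vectors a_l
B = rng.normal(size=(N, Lp))           # columns are the vectors b_l

xi_sum = sum(np.outer(A[:, l], B[:, l]) for l in range(Lp))
xi_mat = A @ B.T                       # xi(I) = A B^T in R^{MxN}
```

Computing ξ as one matrix product rather than L outer products is what makes sum pooling cheap in practice.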
10. The plant fine-grained identification method based on the bilinear convolutional neural network as claimed in claim 9, characterized in that: the feature functions f_A and f_B output tensors of M × N dimensions; the location set L comprises M × N location points; each location point yields a P × P dimensional output after the bilinear transformation, and sum pooling gives a P² × 1 feature vector. Back propagation is carried out through the chain rule to complete end-to-end training. The bilinear feature after sum pooling is expressed as x = A^T · B, and the gradients of the loss function ℓ with respect to the two feature maps are

∂ℓ/∂A = B · (∂ℓ/∂x)^T,    ∂ℓ/∂B = A · (∂ℓ/∂x)

where P is the pooling function.
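As a numerical sanity check of the back-propagation step in claim 10, the sketch below assumes a toy loss ℓ = Σ G⊙x with x = AᵀB and the standard bilinear-CNN gradients ∂ℓ/∂A = B(∂ℓ/∂x)ᵀ and ∂ℓ/∂B = A(∂ℓ/∂x) (G, the orientation of A and B, and all sizes are assumptions for illustration); one analytic entry is compared against a finite difference:

```python
import numpy as np

# Finite-difference check of the backward pass for x = A^T B.
# Toy loss: l = sum(G * x), so dl/dx = G.
rng = np.random.default_rng(0)
Lp, M, N = 6, 3, 4
A = rng.normal(size=(Lp, M))           # here rows index the L locations
B = rng.normal(size=(Lp, N))
G = rng.normal(size=(M, N))            # dl/dx for l = sum(G * (A^T B))

dA = B @ G.T                           # analytic dl/dA, shape L x M
dB = A @ G                             # analytic dl/dB, shape L x N

# Finite-difference gradient with respect to one entry of A:
eps = 1e-6
A1 = A.copy()
A1[2, 1] += eps
num = (np.sum(G * (A1.T @ B)) - np.sum(G * (A.T @ B))) / eps
```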
CN202110425490.XA 2021-04-20 2021-04-20 Plant fine-grained identification method based on bilinear convolutional neural network Pending CN113128593A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110425490.XA CN113128593A (en) 2021-04-20 2021-04-20 Plant fine-grained identification method based on bilinear convolutional neural network


Publications (1)

Publication Number Publication Date
CN113128593A true CN113128593A (en) 2021-07-16

Family

ID=76778159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110425490.XA Pending CN113128593A (en) 2021-04-20 2021-04-20 Plant fine-grained identification method based on bilinear convolutional neural network

Country Status (1)

Country Link
CN (1) CN113128593A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875827A (en) * 2018-06-15 2018-11-23 广州深域信息科技有限公司 A kind of method and system of fine granularity image classification
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
CN109685115A (en) * 2018-11-30 2019-04-26 西北大学 A kind of the fine granularity conceptual model and learning method of bilinearity Fusion Features
CN110263863A (en) * 2019-06-24 2019-09-20 南京农业大学 Fine granularity mushroom phenotype recognition methods based on transfer learning Yu bilinearity InceptionResNetV2


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850136A (en) * 2021-08-24 2021-12-28 中国船舶重工集团公司第七0九研究所 Yolov5 and BCNN-based vehicle orientation identification method and system
CN113920313A (en) * 2021-09-29 2022-01-11 北京百度网讯科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN114445660A (en) * 2022-01-21 2022-05-06 华东交通大学 Fine-grained image recognition method
CN114445660B (en) * 2022-01-21 2023-04-07 华东交通大学 Fine-grained image recognition method
CN114722928A (en) * 2022-03-29 2022-07-08 河海大学 Blue-green algae image identification method based on deep learning
CN114722928B (en) * 2022-03-29 2024-04-16 河海大学 Blue algae image recognition method based on deep learning
CN115035389A (en) * 2022-08-10 2022-09-09 华东交通大学 Fine-grained image identification method and device based on reliability evaluation and iterative learning
CN115035389B (en) * 2022-08-10 2022-10-25 华东交通大学 Fine-grained image identification method and device based on reliability evaluation and iterative learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination