CN117036370A

CN117036370A - Plant organ point cloud segmentation method based on attention mechanism and graph convolution

Info

Publication number: CN117036370A
Application number: CN202310704110.5A
Authority: CN
Inventors: 马韫韬; 蔡智博; 朱晋宇; 郭焱; 李保国
Original assignee: China Agricultural University
Current assignee: China Agricultural University
Priority date: 2023-06-14
Filing date: 2023-06-14
Publication date: 2023-11-10

Abstract

A plant organ point cloud segmentation method based on an attention mechanism and graph convolution, belonging to the technical field of three-dimensional point cloud instance segmentation. The method provides TRGCN, a dual-branch parallel instance segmentation network based on a point attention mechanism and spatial graph convolution. The network takes a three-dimensional point cloud directly as input; its two branches focus on local feature extraction and global feature extraction respectively, and the two sets of features are fused by a T-G feature coupling layer. Point cloud data of five plants (tomato, maize, tobacco, sorghum and wheat) are used as research objects. Because the dual-branch parallel architecture captures local and global features of the point cloud at the same time, TRGCN can be used to train a highly robust instance segmentation model, improves the segmentation accuracy of plant point clouds, generalizes well, and can provide good data support for rapid, efficient and accurate plant phenotype analysis.

Description

Plant organ point cloud segmentation method based on attention mechanism and graph convolution

Technical Field

The invention belongs to the technical field of three-dimensional point cloud instance segmentation, and particularly relates to a dual-branch parallel plant organ point cloud segmentation method based on an attention mechanism and spatial graph convolution.

Background

With the spread of LiDAR equipment and the advent of various consumer-grade depth sensors, point cloud data is increasingly used in fields such as robotics, autonomous driving and urban planning. In phenotypic studies, three-dimensional point clouds, as a low-resolution representation of the real world, have become the most direct and effective form of data for studying plant structure and morphology. Many studies have used the three-dimensional structure of plants for organ segmentation, growth monitoring and variety evaluation. Points in a three-dimensional coordinate system are the basic units of a point cloud; they are analogous to pixels in a two-dimensional image but can carry more high-dimensional semantic information. In phenotypic research, the morphological structure of plant organs is an intuitive and important trait that reflects a plant's adaptation to external conditions and its growth status, such as photosynthetic efficiency and water absorption efficiency. Plant organ point cloud segmentation is the process of semantically partitioning a plant into its organs (such as stems, leaves and fruits). It is the basis for deeper understanding of point cloud data, is important for understanding the functional structure of plants, and remains a challenging research direction.

Traditional plant point cloud segmentation algorithms require features to be described manually in advance, and the segmentation process is complicated and tedious. With the arrival of the big data era, such methods struggle to meet the need for rapid and accurate analysis, so the demand for automated segmentation methods is growing. With the rapid improvement of graphics processor performance, deep learning, a leading artificial intelligence technology, has been applied successfully to a wide range of two-dimensional vision problems. However, because point clouds are unordered and spatially complex, applying deep learning to them still faces many challenges. Convolutional neural networks (CNNs), which perform well in visual segmentation tasks, extract features through shared convolution kernels, which improves model efficiency, and their inherent translation invariance allows local features to be captured more precisely. However, CNNs typically have a relatively small receptive field, are relatively weak at capturing global features, and cannot be applied directly to raw point cloud data. Another network structure applied to point cloud data is the graph convolutional network (GCN), which treats each point in the point cloud as a vertex of a graph and extracts local features by performing a convolution-like operation directly on the point cloud. The Transformer, which performs outstandingly in natural language processing, also captures global features well, and its core idea, the attention mechanism, is well suited to processing point cloud data. These deep learning methods have achieved satisfactory segmentation results on many common point cloud data sets, demonstrating the effectiveness of deep learning for point cloud segmentation.

However, the structural complexity of plant point clouds means that a relatively large amount of semantic information must be identified in the organ segmentation task. During point cloud acquisition, occlusion between leaves often causes part of the point cloud to be lost, producing holes and sparse regions. In addition, plant organs are highly similar to one another: different leaf instances often share the same color, morphology, texture and other characteristics, and such highly repetitive features are difficult for a neural network to learn. Finally, plants of different varieties have different geometric characteristics, and even plants of the same variety can show different, sometimes very different, phenotypic characteristics under different growing environments, which places high demands on the generalization capability of the network. In summary, the current segmentation accuracy for plant point clouds does not meet requirements.

Disclosure of Invention

The invention aims to solve the problem of accurate organ segmentation in complex plant point clouds, and provides a dual-branch parallel plant organ point cloud segmentation method based on an attention mechanism and spatial graph convolution.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

A plant organ point cloud segmentation method based on an attention mechanism and graph convolution, the method comprising:

Step one: the feature encoder takes the original point cloud as input, uses a multi-layer perceptron to map the features to a high-dimensional space, and uses a point cloud attention mechanism for preliminary feature extraction. The initial feature data is then input into a TRGCN block; these blocks can be stacked in cascade to deepen the understanding of high-dimensional features. The feature aggregation layer in the TRGCN block extracts neighborhood features while downsampling the point cloud. The data then enters the dual-branch parallel part of the network, which consists of a local feature capture branch built from spatial graph convolution and a global feature learning branch built from a point attention mechanism. Finally, the feature data is input into the T-G feature coupling layer to obtain the target number of points and the corresponding high-dimensional abstract features. By stacking TRGCN blocks, the encoder extracts high-dimensional feature information from the original plant point cloud for the segmentation task;

Step two: the feature decoder stacks three cascaded TRGCN blocks, which receive the outputs of the three TRGCN blocks in the encoder respectively, but the feature aggregation layer is replaced by an interpolation layer. The interpolation layer restores the features of the high-dimensional point set to the low-dimensional point set, and still outputs the grouping result of the K-nearest-neighbor algorithm for use by the two branches of the TRGCN block. For prediction of the segmentation result, the decoder places a separate interpolation layer after the TRGCN blocks, uses a single point attention layer to preserve information integrity, and finally uses a multi-layer perceptron to output the segmentation result of the point cloud;

Step three: network training. All experiments in this study were performed on a standalone server equipped with a 12-core, 20-thread CPU, 64 GB of memory and an Nvidia GeForce RTX 3090 Ti GPU, and the neural network was trained on this server. In the training stage, all plant point cloud segmentation models use the same hyperparameters: the training batch size is 32, the initial learning rate is 0.001, the network is optimized with the Adam method for 100 epochs, the learning rate is halved every 20 epochs, the weight decay is 0.0001, the momentum is 0.9, the K value of the K-nearest-neighbor algorithm is 12, and the feature dimension of the point attention layer is 256.

Further, the first step specifically comprises:

(1) Feature aggregation layer

The feature aggregation proceeds as follows: given x input points with feature-dimensional features, the points are first downsampled by random farthest point sampling, the point cloud is then grouped using the K-nearest-neighbor algorithm, the groups are input into a multi-layer perceptron to aggregate the features of the neighbor points onto the center point, and finally a max pooling operation yields y points with feature'-dimensional features;

The feature aggregation layer uses the K-nearest-neighbor algorithm to sample and group the input point set; it outputs the computed K-nearest-neighbor matrix and shares it with the subsequent parallel branches;
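The aggregation steps above (sampling, K-nearest-neighbor grouping, max pooling) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the patented implementation: "random" sampling is read here as farthest point sampling with a randomly chosen first point, the multi-layer perceptron is omitted so that pooling acts on the raw neighbor features, and the function names `farthest_point_sampling`, `knn_indices` and `aggregate` are hypothetical.

```python
import math
import random


def farthest_point_sampling(points, m, seed=0):
    """Pick m point indices; each new pick is the point farthest from all picks so far.

    Assumption: the first pick is random (one reading of "random" sampling).
    """
    rng = random.Random(seed)
    n = len(points)
    chosen = [rng.randrange(n)]
    # min_dist[i] = distance from point i to its closest already-chosen point
    min_dist = [math.dist(points[i], points[chosen[0]]) for i in range(n)]
    while len(chosen) < m:
        nxt = max(range(n), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i in range(n):
            d = math.dist(points[i], points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


def knn_indices(points, center, k):
    """Indices of the k points nearest to center (the center itself included)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], center))
    return order[:k]


def aggregate(points, feats, m, k):
    """Downsample to m centers and max-pool the features of each center's k neighbors.

    Assumption: the MLP applied before pooling is left out for brevity.
    Returns the center indices, the K-neighbor matrix and the pooled features.
    """
    centers = farthest_point_sampling(points, m)
    neighbors = [knn_indices(points, points[c], k) for c in centers]
    dim = len(feats[0])
    pooled = [[max(feats[j][d] for j in group) for d in range(dim)]
              for group in neighbors]
    return centers, neighbors, pooled
```

Returning `neighbors` alongside the pooled features mirrors the sharing of the K-nearest-neighbor matrix with the later branches, so the grouping is computed only once per block.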

(2) Local feature capture branch

This branch is built on dynamic spatial graph convolution and extracts local features from the input plant point cloud. First, a feature graph G = (V, E) is constructed from the point set V and the neighbor information E, and edge convolution is used to extract features in the input feature space. The feature of a point x_i is extracted as:

f_i = □ h(x_i, x_j)

where x_j is one of the neighbor points of x_i, and □ and h denote an aggregation function and a relational operation, respectively. The relational operation aggregates the features of the neighbor points around a candidate point to obtain the feature information of that point; this relational operation is defined as edge convolution;

Max pooling is used as the aggregation function, as follows:

conv_i = Max(MLP(h(x_i, x_i - x_j)))

The relational operation h is defined as a linear combination of the features of point x_i and the feature difference between x_i and its neighbor point x_j;
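The edge convolution with max aggregation can be illustrated with a small sketch. It assumes that h is realized by concatenating x_i with the difference x_i - x_j, and that the MLP is a single linear layer followed by ReLU; both choices, the function name `edge_conv`, and the `weight` and `bias` parameters are illustrative assumptions rather than details stated in the text.

```python
def edge_conv(feats, neighbors, weight, bias):
    """conv_i = max over neighbors j of relu(W * [x_i, x_i - x_j] + b).

    feats: list of per-point feature lists.
    neighbors: neighbors[i] lists the neighbor indices of point i.
    weight, bias: hypothetical parameters of a one-layer stand-in for the MLP.
    The max is taken per output channel.
    """
    out = []
    for i, group in enumerate(neighbors):
        xi = feats[i]
        messages = []
        for j in group:
            # Edge feature: x_i concatenated with the difference x_i - x_j
            edge = xi + [a - b for a, b in zip(xi, feats[j])]
            msg = [max(0.0, sum(w * e for w, e in zip(row, edge)) + b0)
                   for row, b0 in zip(weight, bias)]
            messages.append(msg)
        # Channel-wise max pooling over all neighbors of point i
        out.append([max(channel) for channel in zip(*messages)])
    return out
```

Because the maximum is taken over the neighbor set, the result does not depend on the order in which neighbors are listed, which suits unordered point clouds.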

(3) Global feature learning branch

Features are extracted using a vector attention mechanism within a local neighborhood, and the calculation formula is as follows:

where x_j is a neighbor point of x_i, X is the point set of each individual plant point cloud, ρ is a regularization function, γ is a mapping function, and β is a relational operation, defined in this study as the difference between the neighborhood point and the point of interest. φ, ψ and α are point-level feature transformations that produce the Q, K and V (query, key and value) of the self-attention mechanism, respectively, and δ is a position encoding function. Based on this attention mechanism, a point attention layer is proposed, and the improved calculation formula is as follows:

(4) T-G feature coupling layer

The above processing yields two feature matrices with identical dimensions and shape: a matrix G with salient local features and a matrix T with complete global features. G and T are concatenated and input into the feature coupling layer to obtain the target feature matrix:

TG = Linear(ReLU(Linear(T, G)))

The T-G feature coupling layer consists of two linear layers and one ReLU activation layer, so that the network can learn the more important information in each of the two matrices and combine them into the target feature matrix.
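As a minimal sketch of the coupling formula, the following code concatenates T and G per point and applies linear, ReLU and linear layers in turn. The weights are passed in explicitly; the names `linear`, `relu` and `tg_coupling`, and all parameter values used with them, are hypothetical and serve only to show the data flow.

```python
def linear(rows, weight, bias):
    """Apply y = W x + b to every row (one row per point)."""
    return [[sum(w * x for w, x in zip(w_row, row)) + b
             for w_row, b in zip(weight, bias)]
            for row in rows]


def relu(rows):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in rows]


def tg_coupling(T, G, w1, b1, w2, b2):
    """TG = Linear(ReLU(Linear(concat(T, G)))).

    T and G must have one row per point and the same number of rows.
    """
    fused = [t + g for t, g in zip(T, G)]  # per-point concatenation [T_i, G_i]
    return linear(relu(linear(fused, w1, b1)), w2, b2)
```

With T and G each of width F, the first weight matrix takes 2F inputs; the output width is set by the second layer and can be chosen to match the target feature dimension.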

Compared with the prior art, the invention has the following beneficial effects: the invention provides a new dual-branch parallel instance segmentation network, TRGCN, based on a point attention mechanism and spatial graph convolution. The network takes a three-dimensional point cloud directly as input; its two branches focus on local and global feature extraction respectively, and the two sets of features are fused by a T-G feature coupling layer. The results show that TRGCN performs well on different plant point clouds, is more accurate than other mainstream point cloud segmentation networks, generalizes well, and can provide good data support for rapid, efficient and accurate plant phenotype analysis.

Drawings

FIG. 1 is a network architecture diagram of a TRGCN of the present invention;

FIG. 2 is a block diagram of the TRGCN block of the present invention;

FIG. 3 is a schematic diagram of a TRGCN block global feature learning layer of the present invention;

FIG. 4 shows the segmentation results for the five plant point clouds according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to the following embodiments, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.

Example 1:

This study proposes a dual-branch parallel network, the Transformer Graph Convolution Network (TRGCN), which is based on a point cloud self-attention mechanism and spatial graph convolution and is designed with an encoder-decoder architecture (FIG. 1).

The feature encoder takes the original point cloud as input, uses a multi-layer perceptron to map the features into a high-dimensional space (32 dimensions by default), and uses a point cloud attention mechanism for preliminary feature extraction. The initial feature data is then input into a TRGCN block (FIG. 2); multiple blocks can be stacked in cascade to progressively deepen the understanding of high-dimensional features. Specifically, the feature aggregation layer in the TRGCN block extracts neighborhood features while downsampling the point cloud. The data then enters the dual-branch parallel part of the network, which consists of a local feature capture branch built from spatial graph convolution and a global feature learning branch built from a point attention mechanism. Finally, the feature data is input into a specially designed T-G feature coupling layer to obtain the target number of points and the corresponding high-dimensional abstract features. By stacking TRGCN blocks, the encoder extracts high-dimensional feature information from the original plant point cloud for the segmentation task.

(1) Feature aggregation layer

The feature aggregation layer in the TRGCN block reduces the cardinality of the input point set and abstracts higher-dimensional feature vectors as multiple blocks are stacked. For example, from the original input to the first TRGCN block, the number of points is reduced from N to N/4 and the feature dimension of the point cloud is increased from F to 2F.

The feature aggregation proceeds as follows: given x input points with feature-dimensional features, the points are first downsampled by random farthest point sampling, the point cloud is then grouped using the K-nearest-neighbor algorithm, the groups are input into a multi-layer perceptron to aggregate the features of the neighbor points onto the center point, and finally a max pooling operation yields y points with feature'-dimensional features (by default y = x/4 and feature' = 2 × feature).

The feature aggregation layer uses the K-nearest-neighbor algorithm to sample and group the input point set. In addition, to save GPU memory during training, the layer outputs the computed K-nearest-neighbor matrix and shares it with the subsequent parallel branches.
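The default ratios (y = x/4, feature' = 2 × feature) determine how the tensor shape evolves through stacked blocks. The short sketch below works through that arithmetic; the input size of 4096 points in the usage note is a hypothetical value chosen for illustration, and `encoder_shapes` is not a name from the source.

```python
def encoder_shapes(num_points, feat_dim, num_blocks=3, point_ratio=4, feat_ratio=2):
    """List (points, feature dimension) before the first block and after each block.

    Each block divides the point count by point_ratio and multiplies the
    feature dimension by feat_ratio, following the stated defaults.
    """
    shapes = [(num_points, feat_dim)]
    for _ in range(num_blocks):
        num_points //= point_ratio
        feat_dim *= feat_ratio
        shapes.append((num_points, feat_dim))
    return shapes
```

For example, `encoder_shapes(4096, 32)` gives `[(4096, 32), (1024, 64), (256, 128), (64, 256)]` for an assumed input of 4096 points with the default 32-dimensional initial features and three blocks.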

(2) Local feature capture branch

This branch is built on dynamic spatial graph convolution and extracts local features from the input plant point cloud. First, a feature graph G = (V, E) is constructed from the point set V and the neighbor information E, and edge convolution is used to extract features in the input feature space. The feature of a point x_i is extracted as:

f_i = □ h(x_i, x_j)

where x_j is one of the neighbor points of x_i, and □ and h denote an aggregation function and a relational operation, respectively. The relational operation aggregates the features of the neighbor points around a candidate point to obtain the feature information of that point; this relational operation is defined as edge convolution. To strengthen the understanding of local features in the point cloud, this study uses max pooling as the aggregation function, as follows:

conv_i = Max(MLP(h(x_i, x_i - x_j)))

The relational operation h is defined as a linear combination of the features of point x_i and the feature difference between x_i and its neighbor point x_j. This choice preserves the mutually influencing features of the local point set while also partly taking global features into account.

(3) Global feature learning branch

As shown in FIG. 3, this branch is built on a point cloud attention mechanism. The attention mechanism is well suited to point cloud data, which can essentially be regarded as word vectors embedded in an attention space. This study uses a vector attention mechanism within a local neighborhood to extract features, and the calculation formula is as follows:

where x_j is one of the K neighbor points of x_i; φ, ψ and α are point-level feature transformations that produce the Q, K and V values of the self-attention mechanism, respectively; δ is a position encoding function; ρ is a regularization function; γ is a mapping function; and β is a relational operation, defined in this study as the difference between the neighborhood point and the point of interest. Based on this attention mechanism, this study proposes a point attention layer, and the improved calculation formula is as follows:

Unlike common attention mechanisms, the position encoding is also added to the α function to strengthen the understanding of features. On top of the point attention layer, the TRGCN encoder builds a residual structure in the global feature learning branch: a linear layer is added before and after the point attention layer, and the final output is connected to the input through a residual connection. This promotes information exchange, accelerates network convergence, and makes it possible to train deep networks.
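The two attention formulas referred to in this subsection are not reproduced in this text. As a hedged sketch only, a form consistent with the symbol definitions given here, and assuming the branch follows the vector self-attention formulation commonly used for point clouds, would be as follows. The neighborhood notation X(i), the symbol ψ for the key transformation, the element-wise product ⊙ and the order of the difference inside γ are all assumptions, not details confirmed by the source.

```latex
% General vector attention over the neighborhood X(i) of point x_i (assumed form)
y_i = \sum_{x_j \in X(i)} \rho\Big(\gamma\big(\beta(\varphi(x_i), \psi(x_j)) + \delta\big)\Big) \odot \alpha(x_j)

% Point attention layer: beta taken as a difference, position encoding delta
% added both inside gamma and to the value term alpha(x_j) (assumed form)
y_i = \sum_{x_j \in X(i)} \rho\Big(\gamma\big(\varphi(x_i) - \psi(x_j) + \delta\big)\Big) \odot \big(\alpha(x_j) + \delta\big)
```

The second expression reflects the two statements made in the text: β is a difference between the neighborhood point and the point of interest, and the position encoding is added to the α term as well as to the attention weights.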

(4) T-G feature coupling layer

The above processing yields two feature matrices with identical dimensions and shape: a matrix G with salient local features and a matrix T with complete global features. G and T are concatenated and input into the feature coupling layer to obtain the target feature matrix:

TG = Linear(ReLU(Linear(T, G)))

In summary, the feature encoder of the TRGCN network can be adapted to different visual tasks by changing the number of stacked TRGCN blocks. Fewer TRGCN blocks can be used for lightweight classification networks, while more cascaded TRGCN blocks can be used for finer-grained tasks such as point cloud segmentation and object recognition.

The feature decoder also stacks three cascaded TRGCN blocks, which receive the outputs of the three TRGCN blocks in the encoder respectively, but the feature aggregation layer is replaced by an interpolation layer. In contrast to the feature aggregation layer, the interpolation layer in the decoder restores the features of the high-dimensional point set to the low-dimensional point set, but it still outputs the grouping result of the K-nearest-neighbor algorithm for use by the two branches of the TRGCN block. For prediction of the instance segmentation result, the decoder places a separate interpolation layer after the TRGCN blocks, uses a single point attention layer to preserve information integrity, and finally uses a multi-layer perceptron to output the segmentation result of the point cloud.
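The text does not state which interpolation scheme the interpolation layer uses. Purely as an illustration of how features on a sparse point set can be propagated back to a denser one, the sketch below uses inverse-distance weighting over the k nearest sparse points; this is an assumed technique, and the function name `interpolate_features` and its parameters are hypothetical.

```python
import math


def interpolate_features(coarse_pts, coarse_feats, dense_pts, k=3, eps=1e-8):
    """Propagate features from a coarse point set to a dense one.

    Assumption: inverse-distance-weighted average over the k nearest coarse
    points. eps avoids division by zero when a dense point coincides with a
    coarse point.
    """
    dim = len(coarse_feats[0])
    out = []
    for p in dense_pts:
        nearest = sorted(range(len(coarse_pts)),
                         key=lambda i: math.dist(p, coarse_pts[i]))[:k]
        weights = [1.0 / (math.dist(p, coarse_pts[i]) + eps) for i in nearest]
        total = sum(weights)
        out.append([sum(w * coarse_feats[i][d] for w, i in zip(weights, nearest)) / total
                    for d in range(dim)])
    return out
```

A dense point lying exactly on a coarse point receives almost exactly that point's feature, while a point midway between two coarse points receives their average.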

Network training. All experiments in this study were performed on a standalone server equipped with a 12-core, 20-thread CPU, 64 GB of memory and an Nvidia GeForce RTX 3090 Ti GPU. In the training stage, the segmentation models for all 5 plant point clouds use the same hyperparameters: the training batch size is 32, the initial learning rate is 0.001, the network is optimized with the Adam method for 100 epochs, the learning rate is halved every 20 epochs, the weight decay is 0.0001, the momentum is 0.9, the K value of the K-nearest-neighbor algorithm is 12, and the feature dimension of the point attention layer is 256.
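The stated learning rate schedule can be written out as a small step-decay function. This sketch assumes the training "cycles" are epochs counted from 0 and that the halving takes effect at epochs 20, 40, 60 and 80; the dictionary keys and the function name `learning_rate` are illustrative, not taken from the source.

```python
# Hyperparameters as stated in the text; the key names are hypothetical.
HYPERPARAMS = {
    "batch_size": 32,
    "initial_lr": 0.001,
    "epochs": 100,
    "lr_step": 20,       # halve the learning rate every 20 epochs
    "lr_factor": 0.5,
    "weight_decay": 0.0001,
    "momentum": 0.9,
    "knn_k": 12,
    "attention_dim": 256,
}


def learning_rate(epoch, base_lr=0.001, step=20, factor=0.5):
    """Step decay: multiply base_lr by factor once per completed block of `step` epochs."""
    return base_lr * factor ** (epoch // step)
```

Under these assumptions the learning rate in the final 20 epochs is 0.001 × 0.5⁴ = 0.0000625.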

Organ instance segmentation tests were performed on the point cloud data of the 5 plants, with a highest mean intersection over union (mIoU) of 86.38% and an average accuracy of 88.58%. To verify the segmentation capability of TRGCN, three mainstream point cloud segmentation networks were selected for comparison. Across the 5 segmentation tasks, TRGCN led the other three methods on 9 metrics and achieved the best accuracy in most tasks. The improvement was especially clear on sorghum leaves, indicating that TRGCN is better suited to monocotyledonous plant point clouds. Because the canopy of dicotyledonous crops is crowded and prone to occlusion, the segmentation results for tobacco and tomato point clouds are not as good as those for monocotyledonous crops, but they are still better than those of the other three segmentation networks. Specific test results are shown in Table 1, and FIG. 4 shows the segmentation results for the five plant point clouds.

The invention also uses the sorghum point cloud to examine the choice of pooling layer in TRGCN and the number of cascaded TRGCN blocks. The results show that the network performs best with max pooling: its average accuracy is about 2 to 4 percentage points higher than with average pooling or sum pooling. With 3 cascaded TRGCN blocks, the network gives the best balance between training time and segmentation quality, obtaining higher segmentation accuracy at the cost of some additional training time. The specific test results are shown in Tables 2 and 3.
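The text does not spell out how the two reported metrics are computed. A common reading, taken here as an assumption, is that the mean intersection over union averages the per-class IoU over the organ classes present, and that accuracy is the fraction of correctly labeled points. The function names `mean_iou` and `accuracy` are hypothetical.

```python
def mean_iou(pred, truth, num_classes):
    """Average of per-class IoU = |pred ∩ truth| / |pred ∪ truth|.

    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)


def accuracy(pred, truth):
    """Fraction of points whose predicted label equals the ground-truth label."""
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(truth)
```

For the labels `pred = [0, 0, 1, 1, 2, 2]` and `truth = [0, 1, 1, 1, 2, 0]`, the per-class IoU values are 1/3, 2/3 and 1/2, so the mean IoU is 0.5, while 4 of the 6 points are labeled correctly.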

Table 1. Comparison of the segmentation accuracy of the TRGCN network of the present invention with other mainstream networks

Table 2. Ablation experiment 1: segmentation results with different pooling layers

Pooling layer	Training time (s)	Mean IoU (%)	Average accuracy (%)
Max pooling	2082	78.9292	84.9198
Average pooling	2085	75.7748	80.6104
Sum pooling	2086	76.3709	82.9647

Table 3. Ablation experiment 2: segmentation results with different numbers of stacked TRGCN blocks

Number of TRGCN blocks	Training time (s)	Mean IoU (%)
2	1728	73.7120
3	2082	78.9292
4	2202	75.5498

Claims

1. A plant organ point cloud segmentation method based on an attention mechanism and graph convolution, characterized in that the method comprises the following steps:

Step one: the feature encoder takes the original point cloud as input, uses a multi-layer perceptron to map the features to a high-dimensional space, and uses a point cloud attention mechanism for preliminary feature extraction; the initial feature data is then input into a TRGCN block, and these blocks can be stacked in cascade to deepen the understanding of high-dimensional features; the feature aggregation layer in the TRGCN block extracts neighborhood features while downsampling the point cloud; the data then enters the dual-branch parallel part of the network, which consists of a local feature capture branch built from spatial graph convolution and a global feature learning branch built from a point attention mechanism; finally, the feature data is input into the T-G feature coupling layer to obtain the target number of points and the corresponding high-dimensional abstract features;

Step three: network training: the neural network is trained on a standalone server equipped with a 12-core, 20-thread CPU, 64 GB of memory and an Nvidia GeForce RTX 3090 Ti GPU; in the training stage, all plant point cloud segmentation models use the same hyperparameters: the training batch size is 32, the initial learning rate is 0.001, the network is optimized with the Adam method for 100 epochs, the learning rate is halved every 20 epochs, the weight decay is 0.0001, the momentum is 0.9, the K value of the K-nearest-neighbor algorithm is 12, and the feature dimension of the point attention layer is 256.

2. The plant organ point cloud segmentation method based on an attention mechanism and graph convolution according to claim 1, characterized in that step one specifically comprises:

(1) Feature aggregation layer

(2) Local feature capture branch

f_i = □ h(x_i, x_j)

conv_i = Max(MLP(h(x_i, x_i - x_j)))

(3) Global feature learning branch

where x_j is a neighbor point of x_i, X is the point set of each individual plant point cloud, ρ is a regularization function, γ is a mapping function, and β is the difference between the neighborhood point and the point of interest; φ, ψ and α are point-level feature transformations that produce the Q, K and V values of the self-attention mechanism, respectively, and δ is a position encoding function; based on this attention mechanism, a point attention layer is provided, and the improved calculation formula is as follows:

(4) T-G feature coupling layer

TG = Linear(ReLU(Linear(T, G)))