CN117671666A - Target identification method based on self-adaptive graph convolution neural network - Google Patents

Target identification method based on self-adaptive graph convolution neural network

Info

Publication number
CN117671666A
Authority
CN
China
Prior art keywords
adaptive
self
graph
features
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311663122.4A
Other languages
Chinese (zh)
Inventor
刘雪莲
卫烘州
王春阳
孙劭禹
席贯
丁跃洋
施春皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Technological University
Original Assignee
Xian Technological University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Technological University filed Critical Xian Technological University
Priority to CN202311663122.4A priority Critical patent/CN117671666A/en
Publication of CN117671666A publication Critical patent/CN117671666A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

A target identification method based on a self-adaptive graph convolution neural network relates to the fields of machine learning, computer vision and artificial intelligence, and addresses the problem in the prior art that applying three-dimensional point cloud data to a Transformer under the standard self-attention mechanism yields suboptimal solutions. The self-adaptive nearest neighbor graph feature interaction module uses an MLP to linearly extract and transform the features and realizes interaction between features by dot product, so that local information of the target can be represented. Finally, the self-attention mechanism Transformer module establishes a nearest neighbor graph that connects each center point with its corresponding local neighborhood features. The method enhances the interaction between local and global features and enables accurate training, learning and identification of the target point cloud.

Description

Target identification method based on self-adaptive graph convolution neural network
Technical Field
The invention relates to the fields of machine learning, computer vision and artificial intelligence, and in particular to a target identification method based on a self-adaptive graph convolution neural network.
Background
Three-dimensional point cloud identification is currently an active research direction in computer vision and has made great progress in recent years. The practical use of 3D point clouds in autonomous driving, computer graphics and 3D scene understanding makes the task increasingly important.
Among the 3D shape representations recently used for deep-learning-based recognition tasks, including meshes, voxels (volumetric grids) and implicit functions, the point cloud remains a viable choice because of its flexibility and its efficiency in computation and modeling. However, using point clouds in downstream tasks presents some special challenges:
1) Unlike voxel-based volumetric representations, which have a fixed grid structure to which 3D convolution can readily be applied, a point cloud is essentially an unordered set of discrete points;
2) Each sampled point carries only three-dimensional coordinate information (x, y, z) and has no rich explicit feature description. To understand the overall shape and the parts of the object represented by the point cloud, "correct" and "informative" features of individual points must be extracted from their relationships to neighboring points (context).
The Transformer is an emerging machine learning model that has developed explosively in natural language processing and computer vision. Unlike convolutional neural networks, which operate on a fixed image grid, the attention mechanism in a Transformer applies positional embeddings to individual tokens, which are themselves unordered. This makes the Transformer a viable representation and computational framework for three-dimensional point cloud recognition. However, under the standard self-attention mechanism, ordinary Transformers cannot effectively capture the local structure of the points in a point cloud: they tend to process information at global scope and ignore local details. Moreover, a point cloud contains a large number of points (e.g., thousands or tens of thousands), and the computational complexity of self-attention grows quadratically with the number of points, so a conventional Transformer handles such large-scale data poorly and provides suboptimal solutions in point cloud classification tasks.
Therefore, the invention provides a target identification method that combines an adaptive graph convolution neural network with a Transformer (Adaptive Graph Feature Convolutional Neural Network and Transformer, AGF-Transformer).
Disclosure of Invention
The invention provides a target identification method based on a self-adaptive graph convolution neural network, which aims to solve problems in the prior art such as the suboptimal solutions that arise when three-dimensional point cloud data are applied to a Transformer under the standard self-attention mechanism.
The target recognition method based on the self-adaptive graph convolution neural network is realized through a constructed self-adaptive network model, which comprises a self-adaptive graph convolution module, a self-adaptive nearest neighbor graph feature interaction module and a self-attention mechanism Transformer module. The specific implementation process is as follows:
Step one: according to the node features and coordinates of the graph structure generated by KNN, the self-adaptive graph convolution module performs adaptive graph convolution to extract the features of the target and lifts the feature dimension, yielding the dimension-lifted point cloud features;
Step two: the self-adaptive nearest neighbor graph feature interaction module constructs a KNN graph for the point cloud features from step one, building the point cloud graph by connecting the boundary points of each point cloud through KNN search, and then linearly extracts and transforms the point cloud graph with an MLP to obtain the graph structure feature information of the target;
Step three: the graph structure feature information of the target obtained in step two is input into the self-attention mechanism Transformer module to obtain the global features of the target point cloud, realizing accurate training, learning and identification of the target point cloud.
The beneficial effects of the invention are as follows. In the target identification method, the self-adaptive graph convolution module first lifts the feature dimension of the target. The self-adaptive nearest neighbor graph feature interaction module then uses an MLP to linearly extract and transform the features and realizes interaction between features by dot product, so that local information of the target can be represented. Finally, the self-attention mechanism Transformer module establishes a nearest neighbor graph that connects each center point with its corresponding local neighborhood features, which enhances the interaction between local and global features and enables accurate training, learning and identification of the target point cloud.
By combining graph modeling with a Transformer, the AGF-Transformer network provided by the method can be trained to obtain both local and global feature information of the target. In addition, the AGF-Transformer network connects each center point with its corresponding local neighborhood features by establishing a nearest neighbor graph, so that the interaction between local and global features is enhanced and accurate training, learning and identification of the target point cloud can be realized.
Drawings
FIG. 1 is a structural diagram of the AGF-Transformer-based target recognition network in the target recognition method based on an adaptive graph convolution neural network according to the present invention;
FIG. 2 is a schematic diagram of a structure of a neighborhood graph constructed using KNN;
FIG. 3 is a schematic diagram of an adaptive graph convolution module;
FIG. 4 is a schematic diagram of an adaptive nearest neighbor graph feature interaction module;
FIG. 5 is a schematic diagram of the adaptive graph convolution acting at point $x_i$;
FIG. 6 is a schematic diagram of the self-attention mechanism Transformer module;
FIG. 7 shows the Loss curve during network training;
FIG. 8 shows the Accuracy curve during network training.
Detailed Description
Embodiment one is described with reference to FIGS. 1 to 8. In the target recognition method based on an adaptive graph convolution neural network, the network comprises three independent modules: an adaptive graph convolution module, an adaptive nearest neighbor graph feature interaction module and a self-attention mechanism Transformer module.
The network structure is shown in FIG. 1. First, two adaptive graph convolution layers use KNN to construct the neighborhood coordinate differences (the node features of the graph structure). The neighborhood coordinate difference can be regarded as a feature reflecting the spatial relationship between nodes, while the corresponding feature difference refers to the relationship between node features computed in the network. At the same time, the feature difference ($\Delta x_i$, $\Delta y_i$, $\Delta z_i$) is lifted from 6 dimensions to 64 dimensions so that it fits the semantic space of the graph structure. Second, neighbor spatial features (Neighbors Spatial Features) are generated in combination with edge convolution (EdgeConv), and the node features (Graph Features) of the KNN-generated graph structure are shared. Finally, the spatial features and the semantic features are convolved by adaptive convolution (AdaptConv), which updates the structural information of each central node in semantic space (Updated Center Features). A global self-attention mechanism Transformer module then acquires more complete global features, and the recognition result is finally obtained through a multi-layer perceptron (MLP).
In this embodiment, the adaptive graph convolution module addresses the target recognition network's need to perceive the spatial geometry and semantic features of the point cloud efficiently. It exploits the unstructured nature of the point cloud: high-dimensional point cloud data can be characterized effectively through a graph structure, which improves the accuracy of point cloud target recognition. The module performs adaptive graph convolution by combining the node features and coordinates of the graph structure to extract the features of the target, and encodes them effectively by combining each point's own coordinates, its neighborhood coordinates and its semantic features.
First, the input point cloud is expressed as $X = \{x_i \mid i = 1, 2, \ldots, N\} \in \mathbb{R}^{N \times 3}$, and the corresponding features are defined as follows:
$F = \{f_i \mid i = 1, 2, \ldots, N\} \in \mathbb{R}^{N \times D}$ (1)
where $D$ is the feature dimension and $x_i$ represents the (x, y, z) coordinates of the i-th point. Then the graph $G$ of the input point cloud is computed:
$G = (V, E)$ (2)
where $V = \{1, \ldots, N\}$ and $E \subseteq V \times V$ represents the set of edges connecting the point cloud boundary points. The point cloud boundary points are connected by the K-nearest-neighbor search (KNN search) algorithm, which constructs the graph of the input point cloud. Converting the point cloud into a graph structure makes feature extraction and analysis more convenient. The use of KNN to connect point cloud boundary points (points at the edge of the target) is shown in FIG. 2.
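The KNN graph construction described above can be sketched in a few lines. This is a minimal brute-force illustration, not the patent's implementation: the function names `knn_graph` and `edge_list` and the four toy points are assumptions introduced here, and a practical system would likely use a spatial index or a GPU routine rather than an O(N²) search.

```python
import math

def knn_graph(points, k):
    """For each point index i, return the indices of its k nearest neighbours.

    points: list of (x, y, z) tuples; distance is Euclidean in coordinate space.
    """
    neighbours = []
    for i, p in enumerate(points):
        # Distance from point i to every other point (the point itself is excluded).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def edge_list(neighbours):
    """Directed edge set E = {(i, j) : j in N(i)} of the graph G = (V, E)."""
    return [(i, j) for i, nbrs in enumerate(neighbours) for j in nbrs]

# Hypothetical toy point cloud with N = 4 points.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
nbrs = knn_graph(points, k=2)
edges = edge_list(nbrs)
```

Each vertex contributes exactly k outgoing edges, so the resulting graph has N·k directed edges regardless of how the points are distributed.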
The specific structure of the adaptive graph convolution module is shown in FIG. 3. First, neighbor spatial features (Neighbors Spatial Features) are generated in combination with edge convolution (EdgeConv), and the node features (Graph Features) of the KNN-generated graph structure are shared. Next, the spatial features and the semantic features are convolved by adaptive convolution (AdaptConv), which updates the structural information of each central node in semantic space (Updated Center Features).
The self-adaptive nearest neighbor graph feature interaction module;
In the constructed adaptive graph convolution module, the input raw point cloud undergoes joint adaptive convolution of coordinates and features, so that the point cloud features are effectively encoded. However, because point clouds are sparse, adaptive graph convolution that combines coordinates and features cannot by itself effectively represent salient similar or homogeneous features in space. The adaptive nearest neighbor graph feature interaction module is therefore designed for the sparsity of point clouds, based on the global idea of vector-matrix dot multiplication. The module has two core ideas: (1) Nearest neighbor sampling is performed in feature space, departing from the conventional practice of running the KNN search in coordinate space, because spatial proximity does not imply feature similarity, whereas neighbors that are close in feature space can represent the same object. (2) Using the idea of a global dot-product operation, the features of the nearest neighbor graph structure are multiplied point-wise so that important features are densified. The structure of the module is shown in FIG. 4, where $f_i$ and $f_j$ are the local feature and the global feature, $f_{ij}$ is the aggregated feature, which is then passed through a multi-layer perceptron (MLP), and $f_{ijg}$ is the final feature obtained by training.
1) Modeling a KNN nearest neighbor graph based on a feature space;
In this module, the KNN neighborhood graph is constructed in feature space rather than in coordinate space. After the neighborhood is constructed through the KNN graph, the output nearest neighbor feature encoding is denoted $\Delta f_{ij}$. A schematic diagram of the adaptive convolution acting at point $x_i$ is shown in FIG. 5.
2) An adaptive graph feature interaction layer;
To adjust the parameters of the convolution kernel dynamically, and thus capture the local geometric or topological characteristics of the input point cloud more accurately, the idea of adaptive graph convolution (AdaptConv) is used: intermediate layers of a multi-layer perceptron are introduced to map the input adaptively to the output feature channels. That is, an adaptive convolution layer is applied to the given D-dimensional input features and generates a new set of M-dimensional features without changing the number of points. Such a dynamic convolution kernel captures local geometric characteristics more accurately than conventional graph convolution.
In the graph convolution operation, point $x_i$ is regarded as the central node, and its neighboring nodes are given by the index set $N(i) = \{j : (i, j) \in E\}$. Because the point cloud is irregular, a single shared convolution kernel is typically applied to all neighboring nodes to obtain the geometric information of the graph. However, this approach is not suitable in every case, especially when the central node lies at a salient location in the graph, such as an edge or a corner. In these cases a fixed convolution kernel may reduce the geometric expressive power of the graph convolution and thereby degrade classification performance. To solve this problem, the module provides an adaptive convolution kernel function that captures the unique relationship between each pair of points, improving the characterization of the local features of the point cloud. For each output channel (of the M-dimensional feature), the adaptive graph convolution layer dynamically uses the point-based features $(f_i, f_j)$ to generate a kernel function of the graph structure:
$\hat{e}_{ijm} = g_m(\Delta f_{ij}), \quad j \in N(i)$ (3)
where $m = 1, 2, \ldots, M$ indexes the M output dimensions, each corresponding to a single filter defined by the adaptive graph convolution layer; $\Delta f_{ij} = [f_i, f_j - f_i]$ is the input feature of the adaptive convolution kernel; $f_i$ and $f_j$ denote the feature vectors of the point and of its local neighbors; $f_j - f_i$ is the difference between the two features; and $g_m(\cdot)$ is a feature mapping function that uses a multi-layer perceptron to map the input feature to another feature space. The adaptive convolution integrates global shape information with local feature differences and introduces a graph structure neighborhood representation based on feature space. In the design of the adaptive graph feature convolution kernel, an intermediate feature channel is added and, with the MLP as the feature mapping function, the required output features are generated adaptively from the input of the graph structure. The method can densely characterize similar features in sparse point cloud data and enhance the salient features of the target, thereby improving target recognition accuracy.
The adaptive convolution kernel is convolved with the corresponding points: the D input channels are convolved with the corresponding filter weights to obtain the M output dimensions, as shown in the following formula:
$h_{ijm} = \sigma \langle \hat{e}_{ijm}, \Delta x_{ij} \rangle$ (4)
where $\Delta x_{ij}$ is $[x_i, x_j - x_i]$, $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors, and $\sigma$ is a nonlinear activation function. As can be seen from the middle part of FIG. 5, the m-th adaptive graph convolution kernel $\hat{e}_{ijm}$ is combined with the spatial relationship $\Delta x_{ij}$ of the corresponding point $x_j$, which means that the size of the convolution kernel must match in the dot product, i.e. the feature mapping described above can be expressed as $g_m : \mathbb{R}^{2D} \to \mathbb{R}^{6}$, where $g_m$ gives the result after feature mapping. In this way the spatial positions in the input space are effectively integrated into each layer, combined with the feature correspondences dynamically extracted by the graph convolution kernel. The $h_{ijm}$ of all channels are stacked to obtain the edge feature between the connected points $(x_i, x_j)$:
$h_{ij} = [h_{ij1}, h_{ij2}, \ldots, h_{ijM}]$ (5)
Finally, the output feature of the central point $x_i$ is defined by applying an aggregation function over all edge features in the neighborhood:
$f_i' = \max_{j \in N(i)} h_{ij}$ (6)
where max is the channel-wise max pooling function. The convolution weights of the adaptive graph convolution are defined as:
$\theta = (g_1, g_2, \ldots, g_M)$ (7)
The adaptive graph convolution employed in this embodiment generates an adaptive graph convolution kernel from the feature differences $(f_j - f_i)$ and then applies this kernel to the point pair $(x_i, x_j)$, thereby characterizing the spatial geometry of the input space. The convolution kernels built on the feature-space graph structure share their parameters, and the adaptive design makes full use of spatial geometric information to lift the dimension of the point cloud features by convolution.
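The adaptive graph convolution described above (a kernel generated from the feature difference, an inner product with the coordinate difference, then channel-wise max pooling over the neighborhood) might be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: each mapping $g_m$ is replaced by a single random linear layer rather than a trained MLP, LeakyReLU stands in for the activation $\sigma$, and the function name `adaptive_graph_conv`, the toy data and the neighbor lists are all hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v

def adaptive_graph_conv(coords, feats, neighbours, weights):
    """coords: N x 3, feats: N x D, neighbours: N lists of neighbour indices.

    weights: M matrices of shape 6 x 2D; matrix m is a linear stand-in for g_m.
    Returns N x M updated features.
    """
    out = []
    for i, nbrs in enumerate(neighbours):
        edge_feats = []
        for j in nbrs:
            # Delta f_ij = [f_i, f_j - f_i] and Delta x_ij = [x_i, x_j - x_i]
            d_f = feats[i] + [b - a for a, b in zip(feats[i], feats[j])]
            d_x = coords[i] + [b - a for a, b in zip(coords[i], coords[j])]
            h_ij = []
            for w_m in weights:
                kernel = [dot(row, d_f) for row in w_m]    # 6-dim kernel g_m(Delta f_ij)
                h_ij.append(leaky_relu(dot(kernel, d_x)))  # sigma(<kernel, Delta x_ij>)
            edge_feats.append(h_ij)
        # Output feature of point i: channel-wise max over all edges in N(i).
        out.append([max(channel) for channel in zip(*edge_feats)])
    return out

random.seed(0)
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
feats = [list(c) for c in coords]          # D = 3: use coordinates as initial features
neighbours = [[1, 2], [0, 3], [0, 3], [1, 2]]
D, M = 3, 8
weights = [[[random.uniform(-1, 1) for _ in range(2 * D)] for _ in range(6)]
           for _ in range(M)]
new_feats = adaptive_graph_conv(coords, feats, neighbours, weights)
```

The number of points is unchanged while the feature width goes from D to M, and because the aggregation is a maximum, the result does not depend on the order in which neighbors are listed.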
In this embodiment, the feature interaction module uses an MLP to linearly extract and transform the features and realizes interaction between features by dot product. The global adaptive nearest neighbor graph feature interaction module uses KNN nearest neighbor search to find the 20 nearest points and construct a KNN graph. The features of these neighboring points are concatenated and then transformed by the linear layer of the MLP. Finally, max pooling selects the most representative feature from the 20 nearest neighbor features to update the feature representation of the graph iteratively. The proposed AGCNN-Transformer network can represent the features of the target quickly and accurately, so that training, learning and accurate identification of the target are realized.
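The feature-space pipeline in this paragraph (KNN search over features, concatenation, a linear layer, then max pooling) can be illustrated with a small sketch. It assumes that the concatenated edge input is $[f_i, f_j - f_i]$ as defined for $\Delta f_{ij}$, uses a fixed hand-written weight matrix instead of a trained layer, and runs with k = 1 on four toy features rather than the 20 neighbors used in the embodiment; the names `feature_knn` and `feature_interaction` are hypothetical.

```python
import math

def feature_knn(feats, k):
    """k nearest neighbours of every point, measured in feature space."""
    nbrs = []
    for i, f in enumerate(feats):
        order = sorted((math.dist(f, g), j) for j, g in enumerate(feats) if j != i)
        nbrs.append([j for _, j in order[:k]])
    return nbrs

def feature_interaction(feats, k, weight):
    """weight: M x 2D matrix of a single linear layer (stand-in for the MLP)."""
    out = []
    for i, nbrs in enumerate(feature_knn(feats, k)):
        rows = []
        for j in nbrs:
            # Concatenate the centre feature with the neighbour difference.
            edge = feats[i] + [b - a for a, b in zip(feats[i], feats[j])]
            rows.append([sum(w * e for w, e in zip(w_row, edge)) for w_row in weight])
        # Max pooling keeps the strongest response per channel over the neighbours.
        out.append([max(col) for col in zip(*rows)])
    return out

# Two tight clusters in feature space; the values are made up for illustration.
feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]]
weight = [[1.0, 0.0, 0.0, 0.0],   # channel 0 copies the first centre feature
          [0.0, 0.0, 1.0, 0.0]]   # channel 1 reads the first difference component
graph = feature_knn(feats, k=1)
updated = feature_interaction(feats, k=1, weight=weight)
```

Because the search runs over features rather than coordinates, two points that are far apart in space but have similar descriptors would still become neighbors, which is the behavior the module relies on.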
The self-attention mechanism Transformer module;
With the adaptive graph convolution module and the adaptive nearest neighbor graph feature convolution module in place, the model can fully represent the graph structure feature information of the target. To further improve recognition accuracy, a global self-attention mechanism Transformer module is designed based on the idea of combining local and global features.
Graph convolution characterizes the local features of the point cloud, but a method that relies only on local features limits the model when capturing the neighborhood features of the target and hinders effective learning of the overall features. To solve this problem, a self-attention mechanism module is introduced. In this module, each point is regarded as a "semantic unit", and point embedding is performed from its coordinates. The attention layer is then implemented with a self-attention Transformer model. The main purpose of point embedding is to ensure that semantically similar points lie closer together in the embedding space. With the self-attention layer, the point cloud obtains a more comprehensive global characterization, so that points with similar features and geometric attributes have stronger feature expression. Used as a means of global feature characterization, the self-attention Transformer captures long-range dependencies in the point cloud and further strengthens the feature characterization capability of the model. To achieve a more comprehensive feature expression, the self-attention module is fused with the point cloud features produced by the dimension lifting of the adaptive graph convolution layers, and non-local self-attention encoding is used to process the multi-level features. The model can thus understand local information and also represent global features more effectively, which improves its overall characterization capability. The network layer structure is shown in FIG. 6 and equation (8).
$(Q, K, V) = F_{in} \cdot (W_q, W_k, W_v)$ (8)
where $Q$, $K$ and $V$ are the query, key and value matrices, respectively; $W_q$, $W_k$ and $W_v$ are shared learnable linear transformations; $F_{in}$ is the input to which these transformations are applied; and $d_a$ and $d_e$ are the dimensions associated with the local and global features, respectively. First, the attention weights for the features extracted by the adaptive graph convolution are calculated as the matrix dot product of the query and key matrices:
$\tilde{A} = Q \cdot K^{T}$ (9)
The calculated weights are then normalized with the Softmax function, which gives the normalized attention weights $A$.
The self-attention output feature $F_{sa}$ is the weighted sum of the value vectors using the corresponding attention weights:
$F_{sa} = A \cdot V$ (12)
where $A$ is the matrix of normalized attention weights and $V$ is the value matrix.
Because the Softmax function and the weighted sum are both permutation-invariant operators, the whole self-attention process is permutation-invariant. Finally, the result is passed through a multi-layer perceptron for dimension transformation, and a residual connection mitigates vanishing and exploding gradients. The feature $z^{(k+1)}$ after the self-attention mechanism is:
$z^{(k+1)} = z^{(k)} + \mathrm{MLP}(A \cdot V)$ (13)
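The self-attention step with its residual update can be sketched with plain nested lists. Several details here are assumptions rather than statements from the text: the scores are scaled by the square root of the query/key width before the softmax (the common scaled dot-product convention), the MLP is reduced to a single linear layer `w_mlp`, and identity matrices are used as weights so that the toy example is easy to follow. All function names are hypothetical.

```python
import math

def matmul(a, b):
    # Plain nested-list matrix product: (n x p) times (p x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(row):
    peak = max(row)                       # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(f_in, w_q, w_k, w_v):
    q, k, v = matmul(f_in, w_q), matmul(f_in, w_k), matmul(f_in, w_v)
    scale = math.sqrt(len(q[0]))          # assumed scaling by sqrt of query/key width
    scores = matmul(q, transpose(k))      # Q K^T
    attn = [softmax([s / scale for s in row]) for row in scores]
    return matmul(attn, v), attn          # F_sa = A V

def attention_block(z, w_q, w_k, w_v, w_mlp):
    f_sa, _ = self_attention(z, w_q, w_k, w_v)
    update = matmul(f_sa, w_mlp)          # linear stand-in for MLP(A V)
    # Residual connection: z_next = z + MLP(A V)
    return [[a + b for a, b in zip(z_row, u_row)] for z_row, u_row in zip(z, update)]

identity = [[1.0, 0.0], [0.0, 1.0]]
z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three points, two feature channels
_, attn = self_attention(z, identity, identity, identity)
z_next = attention_block(z, identity, identity, identity, identity)
```

Reordering the input points reorders the output rows in the same way and leaves each point's feature unchanged, which is the permutation property the paragraph above relies on.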
PointNet and PointNet++ are classical network models in the field of point cloud identification. They use a farthest point sampling strategy to select a representative set of center points from the input point cloud and perform a local search around these center points based on Euclidean distance, within a specific radius or point range. However, because these models use fixed convolution kernels, they focus mainly on learning features at multiple scales and extracting features from local neighborhoods. In contrast, the AGF-Transformer network provided in this embodiment connects each center point with its corresponding local neighborhood features by establishing a nearest neighbor graph, so that the interaction between local and global features is enhanced and accurate training, learning and identification of the target point cloud can be realized.
A second embodiment is described with reference to fig. 7 and 8, where the second embodiment is an example of a target recognition method based on an adaptive graph convolutional neural network according to the first embodiment.
1. A loss function based on cross entropy;
The loss function is a key concept in machine learning and deep learning and measures the difference between the model's prediction and the actual target. The smaller its value, the closer the prediction is to the target. The cross-entropy loss is adopted as the loss function: the true label is regarded as the actual probability distribution, the prediction of the model is regarded as the predicted probability distribution, and the cross entropy quantifies the similarity of the two distributions:
$L = -\sum_{h} y_h \log(p_h)$ (14)
where $h$ is the index of the category, $p_h$ is the probability of category $h$ predicted by the model, and $y_h$ is the true label ($y_h = 1$ if the sample belongs to category $h$, otherwise 0).
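A short numerical sketch of the cross-entropy loss may help. With a one-hot label, the sum in the formula reduces to the negative log-probability of the true class. The helper names below are hypothetical, and averaging the loss over a batch is an assumption, since the text defines the loss only for a single sample.

```python
import math

def softmax(logits):
    peak = max(logits)                    # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """L = -sum_h y_h * log(p_h); with one-hot y this is -log(p_label)."""
    return -math.log(probs[label])

def batch_loss(batch_logits, labels):
    # Assumed reduction: mean of the per-sample losses.
    losses = [cross_entropy(softmax(l), y) for l, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

single = cross_entropy([0.25, 0.25, 0.5], label=2)   # equals log(2)
uniform = batch_loss([[0.0] * 40], [7])               # equals log(40) for 40 classes
```

A model that predicts a uniform distribution over the 40 ModelNet40 categories therefore starts at a loss of log(40), roughly 3.69, which is a useful sanity check at the beginning of training.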
2. An evaluation method;
To verify the performance of the target recognition network of embodiment one, this embodiment performs training and testing on a public point cloud dataset. Evaluation uses two indices, class-average accuracy (mAcc) and overall accuracy (OA). OA is the fraction of all test samples that are classified correctly, and mAcc is the mean of the per-class accuracies.
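The two evaluation indices can be computed as in the sketch below, which uses the standard definitions (OA as correct predictions over all samples, mAcc as the unweighted mean of per-class accuracies). The function names and the six toy labels are made up for illustration.

```python
from collections import defaultdict

def overall_accuracy(y_true, y_pred):
    """OA: correctly classified samples divided by all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def mean_class_accuracy(y_true, y_pred):
    """mAcc: accuracy computed per class, then averaged over the classes."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Imbalanced toy example: four samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
oa = overall_accuracy(y_true, y_pred)        # 4 of 6 correct
macc = mean_class_accuracy(y_true, y_pred)   # mean of 3/4 and 1/2
```

The two indices differ whenever classes are imbalanced: OA weights every sample equally, whereas mAcc weights every class equally, so rare classes count for more in mAcc.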
To ensure fairness, all methods involved in the experiment were run in the same hardware and software environment (PyTorch v2.0.0, RTX 3090 GPU) and were trained and tested on the same data.
3. Comparison experiments on the ModelNet40 dataset;
To verify the effectiveness of the proposed network, training and testing were performed on the ModelNet40 dataset, and the results were compared with algorithms such as 3DShapeNetParts, VoxNet, Subvolume, PointNet, PointNet++, Kd-Net, SpecGCN, SpiderCNN, PointCNN, SO-Net, DGCNN, KPConv, 3D-GCN, PointASNL, and AdaptConv. For the target recognition network of the present method, the neighborhood size k is set to 15, the batch size to 32, the dropout rate to 0.5, and the number of epochs to 450; all modules use LeakyReLU and normalization; the self-attention layer uses an SGD optimizer with momentum 0.9 and an initial learning rate of 0.1. The parameters of the compared methods are selected in the same way as for the present method. The experimental results are shown in Table 1, which compares the classification accuracy of the different methods on the ModelNet40 dataset.
TABLE 1
The loss and accuracy curves of the network training are shown in FIGS. 7 and 8.
It can be seen that on the ModelNet40 dataset the proposed method achieves a class-average accuracy of 90.2% and an overall accuracy of 92.8%, which is superior to the other compared algorithms. The method performs well for two reasons. First, using both local and non-local feature information characterizes the complex higher-order relationships in the data better than methods that do not use such information (e.g., PointNet, MVCNN). Higher-order relationships carry richer information, so modeling them between targets improves recognition performance. Second, a self-attention Transformer layer is used as the global feature learner; self-attention enlarges the receptive field, yielding features that capture long-range dependencies and are more discriminative.
4. Ablation experiments;
The validity of the target recognition network design and of the related hyperparameter settings is verified by ablation experiments.
(1) Effectiveness analysis of the self-adaptive nearest neighbor graph feature interaction module;
To verify the effectiveness of the proposed self-adaptive nearest neighbor graph feature interaction module, this module is ablated from the network structure: the self-adaptive graph convolution module is retained, and the encoded coordinates and features are fed directly into the self-attention Transformer module. The network without the self-adaptive nearest neighbor graph feature interaction module is denoted WithoutAGF. The ablation results are shown in Table 2. It can be seen that the self-adaptive nearest neighbor graph feature interaction module in the AGF-Transformer network plays an important role in improving target recognition accuracy.
TABLE 2
(2) Transformer layer;
To verify the influence of the number of self-attention Transformer modules and of attention heads on recognition performance, ablation experiments were carried out on the Transformer module. The results are shown in Table 3. As can be seen from the table, the recognition accuracy is highest when the number of heads is 1. The core idea of the self-attention Transformer is to compute the degree of association between each input element and all other elements, which means that each point of the point cloud can be associated simultaneously with every point of the global neighborhood. This complements the self-adaptive nearest neighbor graph feature interaction module, which is based on local features: global association lets the model capture dependencies between distant features in the point cloud without being limited to the local neighborhood.
TABLE 3
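The global association described above, combined with the residual update recited in claim 5, can be sketched with a single attention head as follows. This is only an illustrative sketch: scaled dot-product scoring, identity projections (queries, keys and values all equal to the input features) and the pluggable `mlp` argument are assumptions of the example, not details taken from the description.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention_update(z, mlp=lambda v: v):
    """One single-head layer: z_next = z + MLP(A . V), with Q = K = V = z."""
    d = len(z[0])
    # Association score between every point and every other point.
    scores = [[sum(a * b for a, b in zip(zi, zj)) / math.sqrt(d) for zj in z]
              for zi in z]
    A = [softmax(r) for r in scores]  # attention weights, one row per point
    z_next = []
    for i, zi in enumerate(z):
        # Weighted sum of all value vectors for point i.
        av = [sum(A[i][j] * z[j][c] for j in range(len(z))) for c in range(d)]
        z_next.append([x + y for x, y in zip(zi, mlp(av))])
    return A, z_next

# Three points with 2-D features (hypothetical data).
z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A, z_next = self_attention_update(z)
```

Every row of `A` spans all points, which is what allows a feature to depend on points outside its local neighborhood.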
(3) Transformer output channel analysis;
The number of output channels has an important effect on target recognition performance, and class-average accuracy must be weighed against overall accuracy when choosing it. The ablation results for the Transformer output channels are shown in Table 4. It can be seen from the table that the average accuracy is highest (90.2%) when the number of output channels is 32. As the number of output channels increases to 64, 128 and 256, the average accuracy gradually decreases, indicating that higher average and overall accuracy can be achieved with the smaller number of output channels (32).
TABLE 4
(4) KNN value analysis;
To find the optimal value of k for KNN, ablation experiments were performed with k set to 15, 20, 25, 30, 35 and 40. The results are shown in Table 5. As can be seen from Tables 3 to 5, the overall accuracy of the network is highest when the Transformer layer uses 1 attention head, the number of output channels is 32, and k is 15.
TABLE 5
The experiments and analysis show that the self-adaptive nearest neighbor graph feature interaction module and the Transformer module are both critical to the accuracy of point cloud target recognition. The self-adaptive nearest neighbor graph feature interaction module automatically adjusts the feature channels and extracts key features in combination with surrounding information. This mechanism helps the model capture complex point cloud structures, reduces the dimensional variation of the point cloud feature convolution, and avoids over-fitting during dimension lifting. The Transformer extracts both local and global information of the target, so that it can characterize the global structure of the point cloud and make the feature representation of the point cloud more accurate. In conclusion, the ablation experiments verify the importance of each module in the proposed network.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.
The above examples illustrate only a few embodiments of the invention. They are described specifically and in detail, but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (5)

1. A target identification method based on a self-adaptive graph convolution neural network, characterized in that: the method is implemented through a constructed self-adaptive network model, and the self-adaptive network model comprises a self-adaptive graph convolution module, a self-adaptive nearest neighbor graph feature interaction module and a self-attention mechanism Transformer module; the specific implementation process is as follows:
step one, performing self-adaptive graph convolution through the self-adaptive graph convolution module according to the node features and coordinates of the graph structure generated by KNN, so as to extract the features of the target and lift the feature dimension, obtaining dimension-lifted point cloud features;
step two, constructing, through the self-adaptive nearest neighbor graph feature interaction module, a KNN graph for the point cloud features of step one by the KNN method, constructing a point cloud graph by searching the KNN connection boundary points of each point, and linearly extracting and transforming the point cloud graph through an MLP to obtain graph structure feature information of the target;
step three, inputting the graph structure feature information of the target obtained in step two into the self-attention mechanism Transformer module to obtain the global features of the target point cloud, thereby realizing accurate training and target identification of the target point cloud.
2. The target identification method based on a self-adaptive graph convolution neural network according to claim 1, wherein: in step one, the self-adaptive graph convolution module comprises two self-adaptive graph convolution layers and an edge convolution layer;
the node features of the graph structure are generated by KNN through the two self-adaptive graph convolution layers, and at the same time the feature difference of the two self-adaptive graph convolution layers is lifted from 6 dimensions to 64 dimensions, consistent with the semantic space of the graph structure;
neighbor space features are generated through the edge convolution layer, which shares the node features of the graph structure generated by KNN;
self-adaptive convolution is performed on the neighbor space features and the semantic features through a self-adaptive convolution layer, and the structural information of the semantic space of each central node is updated.
3. The target identification method based on a self-adaptive graph convolution neural network according to claim 1, wherein the specific process of step two is as follows:
step 2.1, constructing a KNN neighborhood graph based on the feature space, and outputting nearest neighbor feature codes through the neighborhood graph;
step 2.2, introducing an MLP through a self-adaptive graph feature interaction layer to linearly extract and transform the features, and realizing interaction among the features by dot product.
4. The target identification method based on a self-adaptive graph convolution neural network according to claim 3, wherein: in the second step, an adaptive convolution kernel function is used, and for each output channel the self-adaptive graph convolution layer dynamically generates a kernel function of the graph structure from the point-based features $(f_i, f_j)$:
where $m = 1, 2, \ldots, M$ indexes the $M$ output dimensions, each dimension corresponding to a single filter defined by the self-adaptive graph convolution layer; $\Delta f_{ij} = [f_i, f_j - f_i]$ is the input feature of the adaptive convolution kernel; $f_i$ and $f_j$ are feature vectors in the local neighborhood; $N(i)$ is an index set; and $g_m(\cdot)$ is a feature mapping function that uses a multi-layer perceptron to map the input features to another feature space;
the adaptive convolution kernel is convolved with the corresponding points, and the $M$ output dimensions are obtained by convolving the $D$ input channels with the corresponding filter weights, expressed as follows:
where $\Delta x_{ij}$ is $[x_i, x_j - x_i]$; $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors; and $\sigma$ is a nonlinear activation function;
the $h_{ijm}$ of all channels are stacked together to obtain the edge feature between the connected points $(x_i, x_j)$:
finally, the output feature of the center point $x_i$ is defined by applying an aggregation function to all edge features in its neighborhood.
5. The target identification method based on a self-adaptive graph convolution neural network according to claim 1, wherein: in step three, the feature $z^{(k+1)}$ after the self-attention mechanism is:
$$z^{(k+1)} = z^{(k)} + \mathrm{MLP}(A \cdot V)$$
where $A$ is the vector of attention weights, $V$ is the set of point cloud boundary points, and $k$ is the layer index.
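For illustration only, the adaptive kernel defined in claim 4 can be read as the following sketch, built solely from the quantities named there: a kernel produced by $g_m$ from $\Delta f_{ij}$, an inner product with $\Delta x_{ij}$, an activation $\sigma$, stacking over the M channels, and aggregation over the neighborhood. The constant toy kernels standing in for the multi-layer perceptrons, the LeakyReLU slope of 0.2, the channel-wise max aggregation and all function names are assumptions of the example.

```python
def leaky_relu(v, slope=0.2):
    """Nonlinear activation sigma (the slope value is an assumption)."""
    return v if v >= 0 else slope * v

def concat_diff(a, b):
    """Build [a, b - a], the layout of both delta-f_ij and delta-x_ij."""
    return list(a) + [bj - aj for aj, bj in zip(a, b)]

def adaptive_edge_feature(x_i, x_j, f_i, f_j, kernels):
    """Edge feature h_ij = [h_ij1, ..., h_ijM] for one neighbour j.

    kernels: list of M callables g_m mapping delta-f_ij to a vector with
    the same length as delta-x_ij (toy stand-ins for the MLPs).
    """
    df = concat_diff(f_i, f_j)
    dx = concat_diff(x_i, x_j)
    h = []
    for g_m in kernels:
        e = g_m(df)  # adaptive kernel for channel m
        h.append(leaky_relu(sum(a * b for a, b in zip(e, dx))))
    return h

def aggregate(edge_features):
    """Channel-wise max over all neighbours (assumed aggregation)."""
    return [max(col) for col in zip(*edge_features)]

# Two toy kernels: one constant, one that depends on the feature difference.
kernels = [lambda df: [1.0] * 6, lambda df: [df[1]] * 6]
x_i, f_i = (0.0, 0.0, 0.0), [1.0]
h_a = adaptive_edge_feature(x_i, (1.0, 2.0, 3.0), f_i, [3.0], kernels)
h_b = adaptive_edge_feature(x_i, (-1.0, 0.0, 0.0), f_i, [0.0], kernels)
out = aggregate([h_a, h_b])
```

The second toy kernel changes with the feature pair, which is the property that distinguishes an adaptive kernel from the fixed kernels discussed in the description.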

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311663122.4A CN117671666A (en) 2023-12-06 2023-12-06 Target identification method based on self-adaptive graph convolution neural network

Publications (1)

Publication Number Publication Date
CN117671666A true CN117671666A (en) 2024-03-08

Family

ID=90076551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311663122.4A Pending CN117671666A (en) 2023-12-06 2023-12-06 Target identification method based on self-adaptive graph convolution neural network

Country Status (1)

Country Link
CN (1) CN117671666A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination