CN115346055A - Multi-kernel width graph neural network feature extraction and classification method

Multi-kernel width graph neural network feature extraction and classification method

Info

Publication number
CN115346055A
Authority
CN
China
Prior art keywords
kernel
matrix
feature
graph
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211013914.2A
Other languages
Chinese (zh)
Inventor
王青旺
熊豪
王盼新
沈韬
宋健
汪志锋
刘英莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202211013914.2A
Publication of CN115346055A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a multi-kernel width graph neural network feature extraction and classification method, and belongs to the technical field of artificial neural networks. The method models input data in graph-structure form, generating a data feature matrix and an adjacency matrix; performs a linear transformation on the feature matrix to produce a linear mapping matrix h; forms a feature expansion matrix h' and constructs feature expansion nodes H_k; selects a kernel function k_e from an enhanced base kernel set K_e and uses it to perform enhancement mapping on the feature expansion matrix h', computes graph-kernel attention weights, constructs a spatial-domain graph convolution under the graph-kernel attention mechanism, and computes enhancement matrices h_e, each of which forms an enhancement node Z; places the feature expansion nodes H_k and the enhancement nodes Z in parallel and computes a characterization matrix X by weighted aggregation with a set of learnable weights; and finally feeds the characterization matrix X into a softmax function to compute, for each node in the data, the probability output of each class, yielding the data classification result.

Description

Multi-kernel width graph neural network feature extraction and classification method
Technical Field
The invention relates to a multi-kernel width graph neural network feature extraction and classification method, and belongs to the technical field of artificial neural networks.
Background
Convolutional neural networks (CNN) are highly effective at extracting and classifying features of structured data such as images and natural-language sequences, and are widely applied in image classification, semantic segmentation, natural language processing, and other fields. In other domains, however, data such as molecular structures, social networks, biological protein-protein interaction networks, and recommendation systems carry complex spatial-geometric adjacency relations in addition to their own features, and must be modeled in graph form. Graph learning is an important branch of machine learning; unlike structured data such as images and sequences, a graph has a complex spatial structure in which nodes are interconnected in intricate ways, which makes graph learning a demanding task.
In recent years, graph representation learning has attracted growing research attention. Kipf et al. proposed the graph convolutional network (GCN), which learns and updates node characterizations by aggregating and transforming the information of original or meta-path neighbors, achieving very strong results. Veličković et al. proposed the graph attention network (GAT), which introduces an attention mechanism to learn the attention weight of each neighbor with respect to the target node; when computing a node's characterization, neighbor information is aggregated, transformed, and used to update the node's own characterization according to these attention weights. Zhang et al. proposed the gated attention network (GaAN), which, unlike the conventional multi-head attention mechanism, uses a convolutional sub-network to control the importance of each attention head, improving the accuracy of model feature extraction and classification.
Research has shown that the message-passing mechanism followed by graph neural networks inevitably causes feature over-smoothing in deep models, so that increasing the number of GCN layers does not improve performance and can even make the model perform worse than traditional non-graph neural network models. How to ensure that over-parameterization does not harm model training and fitting is therefore a key scientific problem: solving it would break through the depth limit of GCN models and improve the network's nonlinear fitting and characterization capability for data.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a multi-kernel width graph neural network feature extraction and classification method that addresses problems such as feature over-smoothing in deep graph neural networks, increases the information capacity of the graph neural network model, and strengthens the graph neural network's nonlinear fitting capability for data.
The technical scheme of the invention is as follows: a multi-kernel width graph neural network feature extraction and classification method, characterized in that: first, the input data is modeled in graph-structure form, generating a data feature matrix and an adjacency matrix; each vector v_i in the data feature matrix is linearly transformed to generate a linear mapping vector h_i.
An extended base kernel set K is constructed; a kernel function is drawn from it and used to perform nonlinear feature mapping on the linear mapping matrix, graph-kernel attention weights are computed, and a spatial-domain graph convolution under the graph-kernel attention mechanism is constructed. Using every kernel function in K, feature vectors h_i^{k_m} are computed and stitched into feature expansion vectors h_i^K = [h_i^{k_1} ‖ h_i^{k_2} ‖ … ‖ h_i^{k_m}], from which the feature expansion matrix H_k is constructed.
An enhanced base kernel set K_e is constructed; a kernel function is drawn from it and used to perform enhancement mapping on the feature expansion matrix H_k, graph-kernel attention weights are computed, a spatial-domain graph convolution under the graph-kernel attention mechanism is constructed, and feature enhancement vectors h_i^{e,p} are computed, forming the feature enhancement matrices H_e^p.
The feature expansion matrix H_k and all resulting feature enhancement matrices H_e^1, …, H_e^p are placed in parallel to obtain a characterization matrix set X; a set of learnable weights Ω is designed, and all characterization matrices in the set are weighted and aggregated into a node characterization matrix X̂. Finally, the node characterization matrix is mapped to the class-probability output space through self-attention feature mapping, yielding for each node in the data the probability output of every class in the task and hence the graph data classification result, from which the classification result of the original input data is obtained.
The method comprises the following specific steps:
Step1: model the input data as a graph G = (V, E), concretely represented as a data feature matrix and an adjacency matrix, which are input to the network.
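As a concrete illustration of Step1, the Python sketch below (assumed here, not part of the patent text) builds the two network inputs from a raw feature array; the k-nearest-neighbor rule for creating edges is an illustrative assumption, since the embodiments below instead derive adjacency from citation links or Gaussian-kernel similarity.

```python
import numpy as np

def build_graph(features: np.ndarray, k: int = 5):
    """features: (N, F) array, one row per node. Returns (X, A)."""
    n = features.shape[0]
    # Pairwise squared Euclidean distances between node feature vectors.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    adj = np.zeros((n, n))
    for i in range(n):
        # Link node i to its k nearest neighbors (index 0 is i itself).
        for j in np.argsort(d2[i])[1:k + 1]:
            adj[i, j] = 1.0
    adj = np.maximum(adj, adj.T)  # symmetrize so the graph is undirected
    return features, adj
```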
Step2: construct a learnable linear feature transformation matrix W ∈ R^{F×F} and linearly transform each node characterization vector v_i into another feature space, obtaining the linear mapping vector h_i = W·v_i ∈ R^F.
Step3: design a self-attention mechanism attn: R^F × R^F → R that uses a kernel function for nonlinear feature mapping, learning an attention weight e_ij = attn(h_i, h_j) for each node j in the neighborhood N(i) of the target node, and thus constructing the multi-kernel attention graph feature mapping mechanism.
Step4: normalize the target-node attention weights with the softmax function, α_ij = softmax_j(e_ij) = exp(e_ij) / Σ_{l∈N(i)} exp(e_il); for a selected kernel function k_m, the target-node attention weight is written α_ij^{k_m}.
Step5: construct the spatial-domain graph convolution operation under the graph-kernel attention mechanism, expressed as h_i^{k_m} = σ(Σ_{j∈N(i)} α_ij^{k_m} · h_j), where σ is a nonlinear activation.
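A minimal sketch of Steps 3-5 for one kernel k_m follows. It assumes the attention score between nodes i and j is the raw kernel value k(h_i, h_j) and that the activation σ is tanh; the patent text fixes neither choice, so both are illustrative.

```python
import numpy as np

def kernel_graph_conv(h, adj, kernel):
    """h: (N, F) linear-mapped features; adj: (N, N) 0/1 adjacency;
    kernel: callable k(x, x') -> scalar. Returns the (N, F) output h^{k_m}."""
    out = np.zeros_like(h)
    for i in range(h.shape[0]):
        nbrs = np.nonzero(adj[i])[0]           # neighborhood N(i)
        if nbrs.size == 0:
            continue
        scores = np.array([kernel(h[i], h[j]) for j in nbrs])
        scores -= scores.max()                 # numerical stability
        alpha = np.exp(scores) / np.exp(scores).sum()    # softmax over N(i)
        out[i] = (alpha[:, None] * h[nbrs]).sum(axis=0)  # attention-weighted sum
    return np.tanh(out)                        # assumed activation sigma
```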
Step6: select any number of kernel functions from the candidate kernel function group, set the kernel function parameters according to prior knowledge, and construct a group of extended base kernel sets K = {k_1, k_2, …, k_m}.
Step7: expand the graph attention network's data feature domain in a multi-kernel manner: extract a kernel function k_m from the extended base kernel set K and, using the spatial-domain graph convolution operation of Step5, compute the feature expansion vector h_i^{k_m} under the mapping of k_m.
Step8: repeat Step7 until every kernel function in the extended base kernel set K has been used to compute a feature vector h_i^{k_m}; splice all resulting feature vectors along the feature domain to generate the feature expansion vector h_i^K = [h_i^{k_1} ‖ h_i^{k_2} ‖ … ‖ h_i^{k_m}], and form the feature expansion matrix H_k from all feature expansion vectors.
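A sketch of Steps 6-8, reusing kernel_graph_conv from the sketch above: one convolution per kernel in the extended base kernel set K, with the outputs spliced along the feature dimension (column expansion) into H_k. The kernel choices and parameter values here are illustrative assumptions.

```python
import numpy as np

def expand_features(h, adj, base_kernels):
    """base_kernels: list of kernel callables. Returns H_k of shape
    (N, F * len(base_kernels))."""
    outputs = [kernel_graph_conv(h, adj, k) for k in base_kernels]
    return np.concatenate(outputs, axis=1)   # feature-domain splicing

# An example extended base kernel set K with assumed parameters.
K = [
    lambda x, y: x @ y + 1.0,                   # linear kernel, C = 1
    lambda x, y: np.tanh(0.1 * (x @ y) + 1.0),  # sigmoid kernel, beta = 0.1, C = 1
]
# H_k = expand_features(h, adj, K)
```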
Step9: select any number of kernel functions from the candidate kernel function group, set the kernel function parameters according to prior knowledge, and construct a group of enhanced base kernel sets K_e = {k_1^e, k_2^e, …, k_p^e}.
Step10: extend the network width in a multi-kernel manner: extract a kernel function k_p^e from the enhanced base kernel set K_e and, using the spatial-domain graph convolution operation of Step5, compute the feature enhancement vector h_i^{e,p} under the mapping of k_p^e; the feature enhancement vectors under the current kernel function form the feature enhancement matrix H_e^p.
Step11: repeat Step10 until every kernel function in the enhanced base kernel set K_e has been used and has formed its feature enhancement matrix H_e^p; place the feature expansion matrix H_k in parallel with all computed feature enhancement matrices to form the characterization matrix set X = {H_k, H_e^1, …, H_e^p}.
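Steps 9-11 apply the same graph-kernel convolution to H_k itself. In the sketch below (again reusing kernel_graph_conv), the parallel placement of Step11 is represented simply as a Python list holding H_k and the enhancement matrices.

```python
def enhance_and_collect(H_k, adj, enhanced_kernels):
    """enhanced_kernels: the set K_e. Returns the characterization set X
    as a list [H_k, H_e^1, ..., H_e^p] of same-shape matrices."""
    H_e = [kernel_graph_conv(H_k, adj, k) for k in enhanced_kernels]
    return [H_k] + H_e
```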
Step12: design a set of learnable weights Ω = {μ_1, μ_2, …, μ_q} ∈ R^q satisfying Σ_{t=1}^{q} μ_t = 1, and weight and aggregate the characterization matrix set X with the learnable weights Ω to obtain the node characterization matrix X̂ = Σ_{t=1}^{q} μ_t · X_t.
Step13: using the node characterization matrix X̂ and the self-attention feature mapping, compute the probability output of every class in the task for each node in the data, obtaining the node classification result of the graph data and, from it, the classification result of the original input data.
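A sketch of the Step13 readout. The learnable projection W_out stands in for the self-attention feature mapping, whose exact form the text does not specify; per-node class probabilities come from a row-wise softmax.

```python
import numpy as np

def classify(X_hat, W_out):
    """X_hat: (N, D) node characterization matrix; W_out: (D, n_classes)."""
    logits = X_hat @ W_out
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)     # per-node softmax over classes
    return probs.argmax(axis=1), probs            # predicted labels, probabilities
```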
Further, the candidate kernel function group described in Step6 and Step9 comprises the following kernel functions:
linear kernel k(x, x') = xᵀx' + C; polynomial kernel k(x, x') = (xᵀx' + C)^d; sigmoid kernel k(x, x') = tanh(β·xᵀx' + C); Gaussian kernel k(x, x') = exp(−γ·‖x − x'‖²).
The linear kernel has parameter C; the polynomial kernel has parameters C and d; the sigmoid kernel has parameters β and C; the Gaussian kernel has parameter γ. Here x and x' are feature vectors of the same dimension, and when a base kernel set is constructed, the same kernel formula with different kernel parameters counts as different kernel functions in the set.
Further, the weighted aggregation in Step12 is specifically: let the characterization matrix set be X = {X_1, X_2, …, X_q}; then for any element x_ij^{(t)} of a matrix X_t, the weighted aggregate is computed as x̂_ij = Σ_{t=1}^{q} μ_t · x_ij^{(t)}.
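A sketch of this aggregation. Taking a softmax over raw parameters is one assumed way to keep the learnable weights non-negative and summing to one, consistent with the constraint on Ω above.

```python
import numpy as np

def aggregate(X_set, raw_weights):
    """X_set: list of q same-shape characterization matrices;
    raw_weights: (q,) unconstrained learnable parameters."""
    w = np.exp(raw_weights - raw_weights.max())
    mu = w / w.sum()                               # mu_t >= 0 and sum_t mu_t = 1
    return sum(m * X for m, X in zip(mu, X_set))   # X_hat = sum_t mu_t * X_t
```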
Further, unlike methods that increase network depth, the feature expansion matrix that widens the network is generated by the graph-kernel attention mechanism in Step7, and in Step10 the feature expansion matrix H_k is placed in parallel with all feature enhancement matrices H_e^1, …, H_e^p and weighted-aggregated through a set of learnable weights Ω. The benefit is that multi-kernel learning and width expansion strengthen nonlinear information extraction and mine the nonlinear distribution information in the data, improving the graph neural network model's nonlinear fitting capability while avoiding the feature over-smoothing problem of deep graph neural networks.
Further, Step6 expands the feature domain in the graph-kernel attention manner; the specific expansion is vector splicing, with dimension coordinate 1, i.e., column expansion.
The method strengthens nonlinear information extraction through width expansion and mines the nonlinear distribution information in the data, increasing the graph neural network model's nonlinear fitting capability while avoiding the feature over-smoothing problem of deep graph neural networks.
The invention has the following beneficial effects: it introduces multi-kernel learning and a width (broad) learning system to strengthen the graph neural network's extraction and fitting of nonlinear information. Compared with the prior art, it mainly addresses problems such as feature over-smoothing and insufficient capacity in deep graph neural networks, increases the information capacity of the graph neural network model, and strengthens the graph neural network's nonlinear fitting capability for data.
Drawings
FIG. 1 is a flow chart of the steps of the invention;
FIG. 2 is a diagram of the model network architecture of the invention;
FIG. 3 shows the ground-truth land-cover distribution of the Tobermory harbor dataset used in the embodiment.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
Example 1: as shown in FIG. 1, a multi-kernel width graph neural network feature extraction and classification method: first, the input data is modeled in graph-structure form, generating a data feature matrix and an adjacency matrix; each vector v_i in the data feature matrix is linearly transformed to generate a linear mapping vector h_i.
An extended base kernel set K is constructed; a kernel function is drawn from it and used to perform nonlinear feature mapping on the linear mapping matrix, graph-kernel attention weights are computed, and a spatial-domain graph convolution under the graph-kernel attention mechanism is constructed. Using every kernel function in K, feature vectors h_i^{k_m} are computed and stitched into feature expansion vectors h_i^K = [h_i^{k_1} ‖ h_i^{k_2} ‖ … ‖ h_i^{k_m}], from which the feature expansion matrix H_k is constructed.
An enhanced base kernel set K_e is constructed; a kernel function is drawn from it and used to perform enhancement mapping on the feature expansion matrix H_k, graph-kernel attention weights are computed, a spatial-domain graph convolution under the graph-kernel attention mechanism is constructed, and feature enhancement vectors h_i^{e,p} are computed, forming the feature enhancement matrices H_e^p.
The feature expansion matrix H_k and all resulting feature enhancement matrices H_e^1, …, H_e^p are placed in parallel to obtain a characterization matrix set X; a set of learnable weights Ω is designed, and all characterization matrices in the set are weighted and aggregated into a node characterization matrix X̂. Finally, the node characterization matrix is mapped to the class-probability output space through self-attention feature mapping, yielding for each node in the data the probability output of every class in the task and hence the graph data classification result, from which the classification result of the original input data is obtained.
The method comprises the following specific steps:
Step1: model the input data as a graph G = (V, E), concretely represented as a data feature matrix and an adjacency matrix, which are input to the network.
Step2: construct a learnable linear feature transformation matrix W ∈ R^{F×F} and linearly transform each node characterization vector v_i into another feature space, obtaining the linear mapping vector h_i = W·v_i ∈ R^F.
Step3: design a self-attention mechanism attn: R^F × R^F → R that uses a kernel function for nonlinear feature mapping, learning an attention weight e_ij = attn(h_i, h_j) for each node j in the neighborhood N(i) of the target node, and thus constructing the multi-kernel attention graph feature mapping mechanism.
Step4: normalize the target-node attention weights with the softmax function, α_ij = softmax_j(e_ij) = exp(e_ij) / Σ_{l∈N(i)} exp(e_il); for a selected kernel function k_m, the target-node attention weight is written α_ij^{k_m}.
Step5: construct the spatial-domain graph convolution operation under the graph-kernel attention mechanism, expressed as h_i^{k_m} = σ(Σ_{j∈N(i)} α_ij^{k_m} · h_j), where σ is a nonlinear activation.
Step6: select any number of kernel functions from the candidate kernel function group, set the kernel function parameters according to prior knowledge, and construct a group of extended base kernel sets K = {k_1, k_2, …, k_m}.
Step7: expand the graph attention network's data feature domain in a multi-kernel manner: extract a kernel function k_m from the extended base kernel set K and, using the spatial-domain graph convolution operation of Step5, compute the feature expansion vector h_i^{k_m} under the mapping of k_m.
Step8: repeat Step7 until every kernel function in the extended base kernel set K has been used to compute a feature vector h_i^{k_m}; splice all resulting feature vectors along the feature domain to generate the feature expansion vector h_i^K = [h_i^{k_1} ‖ h_i^{k_2} ‖ … ‖ h_i^{k_m}], and form the feature expansion matrix H_k from all feature expansion vectors.
Step9: select any number of kernel functions from the candidate kernel function group, set the kernel function parameters according to prior knowledge, and construct a group of enhanced base kernel sets K_e = {k_1^e, k_2^e, …, k_p^e}.
Step10: extend the network width in a multi-kernel manner: extract a kernel function k_p^e from the enhanced base kernel set K_e and, using the spatial-domain graph convolution operation of Step5, compute the feature enhancement vector h_i^{e,p} under the mapping of k_p^e; the feature enhancement vectors under the current kernel function form the feature enhancement matrix H_e^p.
Step11: repeat Step10 until every kernel function in the enhanced base kernel set K_e has been used and has formed its feature enhancement matrix H_e^p; place the feature expansion matrix H_k in parallel with all computed feature enhancement matrices to form the characterization matrix set X = {H_k, H_e^1, …, H_e^p}.
Step12: design a set of learnable weights Ω = {μ_1, μ_2, …, μ_q} ∈ R^q satisfying Σ_{t=1}^{q} μ_t = 1, and weight and aggregate the characterization matrix set X with the learnable weights Ω to obtain the node characterization matrix X̂ = Σ_{t=1}^{q} μ_t · X_t.
Step13: using the node characterization matrix X̂ and the self-attention feature mapping, compute the probability output of every class in the task for each node in the data, obtaining the node classification result of the graph data and, from it, the classification result of the original input data.
The feasibility of the invention is demonstrated below through experiments on Example 1.
1. Experimental data
Cora dataset: a citation network for document classification containing 2708 articles and 5429 citation links; it is a classic non-Euclidean node classification dataset in which nodes and edges correspond to documents and citation links, respectively. The articles are divided into 7 classes: case-based, genetic algorithms, neural networks, probabilistic methods, reinforcement learning, rule learning, and theory. For graph composition, the citation links in the data are used as the standard to construct the dataset's adjacency matrix, on which the classification experiment is performed.
Tobermory harbor dataset: the scene is three-band point cloud data collected by an Optech Titan laser radar over a small harbor in Tobermory, Canada, at wavelengths of 1550 nm, 1064 nm, and 532 nm; the visualization of the dataset is shown in FIG. 3. According to the height, material, and semantic information of the land cover, the study area is divided into 9 categories: bare land, grassland, impervious roads, buildings, trees, water, electric wires, cars, and ships. Using a supervoxel segmentation method, the 4,855,839 points of the original point cloud are segmented into 24,458 supervoxel points, from which 10,000 points are extracted by downsampling for graph composition. During composition, a Gaussian kernel operator measures the similarity of the sample point cloud in Hilbert space; this spatial similarity describes the morphological characteristics of the samples and is used to construct the dataset's adjacency matrix, on which the classification experiment is performed.
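A sketch of this composition rule: pairwise Gaussian-kernel similarity of the downsampled samples, binarized into an adjacency matrix. The threshold tau is an assumed detail; the text states only that Gaussian-kernel similarity in Hilbert space is used to build the adjacency matrix.

```python
import numpy as np

def gaussian_adjacency(points: np.ndarray, gamma: float = 0.5, tau: float = 0.5):
    """points: (N, F) sample features. Returns a 0/1 adjacency matrix."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-gamma * d2)        # Gaussian similarity in Hilbert space
    adj = (sim > tau).astype(float)  # keep sufficiently similar pairs as edges
    np.fill_diagonal(adj, 0.0)       # no self-loops
    return adj
```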
2. Experimental content
The experiments validate classification on the two datasets using the method of the invention and 3 existing methods. The method of the invention is tested both in single-kernel form and with multi-kernel width expansion; the single-kernel variants use four common kernels: a polynomial kernel, a linear kernel, a sigmoid kernel, and a Gaussian kernel. The 3 existing methods are the graph convolutional network (GCN), the graph attention network (GAT), and the gated attention network (GaAN). For the Cora dataset, 20 nodes per class form the training set, and 500 and 1000 nodes form the validation and test sets, respectively; the nodes used for training, validation, and testing share the same graph structure. From the Tobermory harbor dataset, 5000 nodes are extracted: 2000 for the training set, 1000 for the validation set, and 2000 for the test set, again sharing the same graph structure. The network model is implemented in PyTorch, and the experiments run on a computer equipped with a 16 GB graphics card. For the Cora dataset, the learning rate is set to 0.005, weight decay to 0.0005, dropout to 0.5, the LeakyReLU activation parameter α to 0.2, and the number of model iterations to 500; for the Tobermory harbor dataset, the learning rate is set to 0.002, weight decay to 0.0005, dropout to 0.6, the LeakyReLU activation parameter α to 0.2, and the number of model iterations to 700. Classification accuracy is used to evaluate the results.
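The per-dataset settings above, collected into a single configuration sketch; the Adam optimizer and the make_optimizer helper are assumptions for illustration, since the text names only the hyperparameter values.

```python
import torch

CONFIGS = {
    "cora":      dict(lr=0.005, weight_decay=0.0005, dropout=0.5, iterations=500),
    "tobermory": dict(lr=0.002, weight_decay=0.0005, dropout=0.6, iterations=700),
}

def make_optimizer(model: torch.nn.Module, dataset: str):
    """Build the optimizer for one dataset from the settings above."""
    cfg = CONFIGS[dataset]
    opt = torch.optim.Adam(model.parameters(), lr=cfg["lr"],
                           weight_decay=cfg["weight_decay"])
    return opt, cfg
```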
Table 1 (provided as an image in the original publication; numerical values not reproduced here): overall classification accuracy of the different methods on the different datasets.
The experimental results show that, on both the Cora dataset and the Tobermory harbor dataset, the method of the invention exceeds the single-kernel variants and the 3 existing methods by 1-2 percentage points, demonstrating that the multi-kernel width graph neural network feature extraction and classification method has strong nonlinear fitting capability and can effectively improve classification accuracy.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (4)

1. A multi-kernel width graph neural network feature extraction and classification method, characterized by comprising the following steps: first, the input data is modeled in graph-structure form, generating a data feature matrix and an adjacency matrix; each vector v_i in the data feature matrix is linearly transformed to generate a linear mapping vector h_i;
an extended base kernel set K is constructed; a kernel function is drawn from it and used to perform nonlinear feature mapping on the linear mapping matrix, graph-kernel attention weights are computed, and a spatial-domain graph convolution under the graph-kernel attention mechanism is constructed; using every kernel function in K, feature vectors h_i^{k_m} are computed and stitched into feature expansion vectors h_i^K = [h_i^{k_1} ‖ h_i^{k_2} ‖ … ‖ h_i^{k_m}], from which the feature expansion matrix H_k is constructed;
an enhanced base kernel set K_e is constructed; a kernel function is drawn from it and used to perform enhancement mapping on the feature expansion matrix H_k, graph-kernel attention weights are computed, a spatial-domain graph convolution under the graph-kernel attention mechanism is constructed, and feature enhancement vectors h_i^{e,p} are computed, forming the feature enhancement matrices H_e^p;
the feature expansion matrix H_k and all resulting feature enhancement matrices H_e^1, …, H_e^p are placed in parallel to obtain a characterization matrix set X; a set of learnable weights Ω is designed, and all characterization matrices in the set are weighted and aggregated into a node characterization matrix X̂; finally, the node characterization matrix is mapped to the class-probability output space through self-attention feature mapping, yielding for each node in the data the probability output of every class in the task and hence the graph data classification result, from which the classification result of the original input data is obtained.
2. The multi-kernel width graph neural network feature extraction and classification method according to claim 1, characterized by comprising the following specific steps:
Step1: model the input data as a graph G = (V, E), concretely represented as a data feature matrix and an adjacency matrix, which are input to the network;
Step2: construct a learnable linear feature transformation matrix W ∈ R^{F×F} and linearly transform each node characterization vector v_i into another feature space, obtaining the linear mapping vector h_i = W·v_i ∈ R^F;
Step3: design a self-attention mechanism attn: R^F × R^F → R that uses a kernel function for nonlinear feature mapping, learning an attention weight e_ij = attn(h_i, h_j) for each node j in the neighborhood N(i) of the target node, and thus constructing the multi-kernel attention graph feature mapping mechanism;
Step4: normalize the target-node attention weights with the softmax function, α_ij = softmax_j(e_ij) = exp(e_ij) / Σ_{l∈N(i)} exp(e_il); for a selected kernel function k_m, the target-node attention weight is written α_ij^{k_m};
Step5: construct the spatial-domain graph convolution operation under the graph-kernel attention mechanism, expressed as h_i^{k_m} = σ(Σ_{j∈N(i)} α_ij^{k_m} · h_j), where σ is a nonlinear activation;
Step6: select any number of kernel functions from the candidate kernel function group, set the kernel function parameters according to prior knowledge, and construct a group of extended base kernel sets K = {k_1, k_2, …, k_m};
Step7: expand the graph attention network's data feature domain in a multi-kernel manner: extract a kernel function k_m from the extended base kernel set K and, using the spatial-domain graph convolution operation of Step5, compute the feature expansion vector h_i^{k_m} under the mapping of k_m;
Step8: repeat Step7 until every kernel function in the extended base kernel set K has been used to compute a feature vector h_i^{k_m}; splice all resulting feature vectors along the feature domain to generate the feature expansion vector h_i^K = [h_i^{k_1} ‖ h_i^{k_2} ‖ … ‖ h_i^{k_m}], and form the feature expansion matrix H_k from all feature expansion vectors;
Step9: select any number of kernel functions from the candidate kernel function group, set the kernel function parameters according to prior knowledge, and construct a group of enhanced base kernel sets K_e = {k_1^e, k_2^e, …, k_p^e};
Step10: extend the network width in a multi-kernel manner: extract a kernel function k_p^e from the enhanced base kernel set K_e and, using the spatial-domain graph convolution operation of Step5, compute the feature enhancement vector h_i^{e,p} under the mapping of k_p^e; the feature enhancement vectors under the current kernel function form the feature enhancement matrix H_e^p;
Step11: repeat Step10 until every kernel function in the enhanced base kernel set K_e has been used and has formed its feature enhancement matrix H_e^p; place the feature expansion matrix H_k in parallel with all computed feature enhancement matrices to form the characterization matrix set X = {H_k, H_e^1, …, H_e^p};
Step12: design a set of learnable weights Ω = {μ_1, μ_2, …, μ_q} ∈ R^q satisfying Σ_{t=1}^{q} μ_t = 1, and weight and aggregate the characterization matrix set X with the learnable weights Ω to obtain the node characterization matrix X̂ = Σ_{t=1}^{q} μ_t · X_t;
Step13: using the node characterization matrix X̂ and the self-attention feature mapping, compute the probability output of every class in the task for each node in the data, obtaining the node classification result of the graph data and, from it, the classification result of the original input data.
3. The multi-kernel width graph neural network feature extraction and classification method according to claim 2, wherein the candidate kernel function group described in Step6 and Step9 comprises the following kernel functions:
linear kernel k(x, x') = xᵀx' + C; polynomial kernel k(x, x') = (xᵀx' + C)^d; sigmoid kernel k(x, x') = tanh(β·xᵀx' + C); Gaussian kernel k(x, x') = exp(−γ·‖x − x'‖²);
the linear kernel has parameter C; the polynomial kernel has parameters C and d; the sigmoid kernel has parameters β and C; the Gaussian kernel has parameter γ;
wherein x and x' are feature vectors of the same dimension, and when a base kernel set is constructed, the same kernel formula with different kernel parameters counts as different kernel functions in the set.
4. The multi-kernel width graph neural network feature extraction and classification method according to claim 2, wherein the weighted aggregation in Step12 is specifically: let the characterization matrix set be X = {X_1, X_2, …, X_q}; then for any element x_ij^{(t)} of a matrix X_t, the weighted aggregate is computed as x̂_ij = Σ_{t=1}^{q} μ_t · x_ij^{(t)}.
CN202211013914.2A 2022-08-23 2022-08-23 Multi-kernel width graph neural network feature extraction and classification method (Pending, CN115346055A)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211013914.2A CN115346055A Multi-kernel width graph neural network feature extraction and classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211013914.2A CN115346055A Multi-kernel width graph neural network feature extraction and classification method

Publications (1)

Publication Number Publication Date
CN115346055A 2022-11-15

Family ID: 83953698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211013914.2A Pending CN115346055A Multi-kernel width graph neural network feature extraction and classification method

Country Status (1)

Country Link
CN (1) CN115346055A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116933141A * 2023-09-15 2023-10-24 Kunming University of Science and Technology Multispectral laser radar point cloud classification method based on multicore graph learning
CN116933141B * 2023-09-15 2023-11-17 Kunming University of Science and Technology Multispectral laser radar point cloud classification method based on multicore graph learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination