CN114821804A - Attention mechanism-based action recognition method for graph convolution neural network - Google Patents

Attention mechanism-based action recognition method for graph convolution neural network Download PDF

Info

Publication number
CN114821804A
CN114821804A
Authority
CN
China
Prior art keywords
graph convolution
matrix
module
attention
neural network
Prior art date
Legal status
Pending
Application number
CN202210547472.3A
Other languages
Chinese (zh)
Inventor
翟晓东
汝乐
凌涛
凌婧
Current Assignee
Jiangsu Austin Photoelectric Technology Co ltd
Original Assignee
Jiangsu Austin Photoelectric Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Austin Photoelectric Technology Co ltd filed Critical Jiangsu Austin Photoelectric Technology Co ltd
Priority to CN202210547472.3A priority Critical patent/CN114821804A/en
Publication of CN114821804A publication Critical patent/CN114821804A/en
Pending legal-status Critical Current


Classifications

    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06T7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an attention-mechanism-based action recognition method for a graph convolutional neural network, comprising the following steps: step 1, obtain video stream data of the human action type to be recognized and obtain human skeleton data through a pose estimation algorithm to serve as a human skeleton data set; step 2, construct a coordination attention module, calculate the coordination features produced by the limbs and trunk during human motion, obtain a center-of-gravity matrix with the coordination features, and add it to the human skeleton data set; step 3, input the human skeleton data set into a two-stream graph convolutional neural network and output the predicted action. An importance attention module is further added to the two-stream graph convolutional neural network. The action recognition model of the invention improves the final classification accuracy and makes the existing two-stream adaptive graph convolution model better suited to the action recognition task.

Description

Attention mechanism-based action recognition method for graph convolution neural network
Technical Field
The invention relates to the technical field of video motion recognition, in particular to a motion recognition method of a graph convolution neural network based on an attention mechanism.
Background
In the field of machine learning, action recognition is a very important task. It appears in many everyday scenarios such as autonomous driving, human-computer interaction, and public safety, so the task has attracted increasing attention. With the explosive development of machine learning and deep learning in recent years, many high-performing action recognition algorithms have emerged, and algorithms based on spatial-temporal graph convolution have achieved excellent performance.
The theory of human balance during motion states that, to avoid falling, the body must continuously adjust its posture so that the position of its center of gravity remains essentially unchanged; athletes in particular maintain balance through actions such as swinging the arms and extending the legs. Ordinary people likewise need balance in their daily behavior, relying on the cooperation of the limbs and trunk to avoid falling. Therefore, while completing a given action, the limbs follow a roughly fixed movement trajectory: in "running", for example, when the left foot steps forward the right arm must swing backward so that the position of the center of gravity is kept unchanged; otherwise the person risks falling.
In addition, the importance of each joint differs across different human actions, and there is often more than one important joint, so existing models cannot focus well on extracting these features. Moreover, because the physical connections of the human body are fixed, the topology used by the graph convolutional neural network is fixed when extracting features, and the network cannot readily notice, from a global perspective, the mutual features among several important joints, which in most actions are not directly connected. For example, in the action of "clapping hands", the nodes of the two hands are not directly connected in the human skeleton graph and are far apart, yet the two hands are the essential components of the action, and the feature changes are concentrated on them.
Although skeleton-based action recognition algorithms have achieved excellent results on public data sets, current algorithms ignore both of these problems.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an action recognition method and apparatus for a graph convolutional neural network based on a dual attention mechanism.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses a method for identifying the action of a graph convolution neural network based on an attention mechanism, which is characterized by comprising the following steps of:
step 1, video stream data of human motion types to be identified are obtained, and human skeleton type data are obtained through a posture estimation algorithm and serve as a human skeleton data set.
Step 2, constructing a coordination attention module, calculating coordination characteristics generated by limbs and trunk in the human body movement process, acquiring a gravity center matrix with the coordination characteristics, and adding the gravity center matrix into a human body skeleton data set;
and 3, inputting the human skeleton data set into a double-flow graph convolution neural network, and outputting a prediction action.
Further, the coordination attention module in step 2 is used to partition the human skeleton, calculate the center-of-gravity matrix and calculate the coordination matrix, so as to obtain the center-of-gravity matrix with coordination features, specifically comprising the following steps:
Step 2.1, divide the human skeleton graph into 5 regions according to the structure of the human body, corresponding respectively to the head, the left arm, the right arm, the left leg and the right leg, to obtain 5 region subgraphs.
Step 2.2, calculating the gravity center point of each area;
the center of gravity point coordinates of each region are calculated using the following formula:
$w_x = \frac{1}{n} \sum_{i=1}^{n} x_i$

where $w_x$ is the abscissa of the region's center of gravity, $x_i$ is the abscissa of each joint point in the region, and $i = 1, 2, \ldots, n$ indexes the $n$ joint points of the region. The $y$ and $z$ coordinates of the region's center of gravity are calculated in the same way:

$w_y = \frac{1}{n} \sum_{i=1}^{n} y_i \qquad w_z = \frac{1}{n} \sum_{i=1}^{n} z_i$
The resulting center of gravity matrix is shown in the form:
$(w_1, w_2, w_3, w_4, w_5)$
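As a concrete illustration of steps 2.1–2.2, the following NumPy sketch computes the region centers of gravity; the 6-joint toy skeleton and its partition into 5 regions are assumptions for illustration only, not the actual skeleton partition of the invention.

```python
import numpy as np

def region_centers_of_gravity(joints, regions):
    """Center of gravity of each skeleton region, per step 2.2.

    joints:  (V, 3) array of joint coordinates (x, y, z).
    regions: list of 5 index lists (head, left arm, right arm,
             left leg, right leg) -- the partition of step 2.1.
    Returns the (5, 3) center-of-gravity matrix (w1, ..., w5).
    """
    # Each w is the mean of the joint coordinates in the region.
    return np.stack([joints[idx].mean(axis=0) for idx in regions])

# Toy example with a hypothetical 6-joint skeleton split into 5 regions.
joints = np.array([[0.0, 1.7, 0.0],   # head
                   [-0.3, 1.2, 0.0],  # left hand
                   [0.3, 1.2, 0.0],   # right hand
                   [-0.1, 0.0, 0.0],  # left foot
                   [0.1, 0.0, 0.0],   # right foot
                   [0.0, 0.9, 0.0]])  # hip (shared by the leg regions)
regions = [[0], [1], [2], [3, 5], [4, 5]]
W = region_centers_of_gravity(joints, regions)
```

Using the region centers of gravity rather than the raw joints keeps the coordination computation of the following steps at a fixed 5-element size regardless of how many joints each region contains.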
Step 2.3, calculate the coordination between every two regions using the covariance matrix.
The covariance matrix calculation formula is as follows:
$\mathrm{cov}(w_i, w_j) = E\bigl[(w_i - \bar{w}_i)(w_j - \bar{w}_j)\bigr], \quad i, j = 1, 2, \ldots, 5$

where cov(·) denotes the covariance, $w_i$ and $w_j$ are the center-of-gravity coordinates of the regions in $(w_1, w_2, w_3, w_4, w_5)$, and $\bar{w}_i$ denotes the average of the center-of-gravity coordinates of each region.
Step 2.4, calculate the coordination matrix according to the above formula; its form is as follows:

$\begin{pmatrix} \mathrm{cov}(w_1, w_1) & \cdots & \mathrm{cov}(w_1, w_5) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(w_5, w_1) & \cdots & \mathrm{cov}(w_5, w_5) \end{pmatrix}$
Step 2.5, from step 2.4, 3 groups of coordination matrices of size 5 × 5 are obtained, denoted $w_x$, $w_y$ and $w_z$ respectively; these 3 groups of matrices can be used to represent the coordination characteristics of the body. $w_x$, $w_y$ and $w_z$ are compressed to the same size as the center-of-gravity matrix by summing column by column; taking the first column as an example:

$w_1' = \mathrm{cov}(w_1, w_1) + \mathrm{cov}(w_2, w_1) + \mathrm{cov}(w_3, w_1) + \mathrm{cov}(w_4, w_1) + \mathrm{cov}(w_5, w_1)$
Step 2.6, add the center-of-gravity matrix and the compressed coordination matrix to obtain the center-of-gravity matrix with coordination features:

$(w_1 + w_1',\ w_2 + w_2',\ w_3 + w_3',\ w_4 + w_4',\ w_5 + w_5')$

where, for $i = 1, 2, \ldots, 5$,

$w_i' = \sum_{k=1}^{5} \mathrm{cov}(w_k, w_i)$
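Steps 2.3–2.6 can be sketched as follows for one coordinate axis. The patent does not state what the covariance is taken over; this sketch assumes it is computed over the frames of the sequence, which matches the idea of coordination between region trajectories.

```python
import numpy as np

def coordination_features(w_seq):
    """Sketch of steps 2.3-2.6 for one coordinate axis (e.g. x).

    w_seq: (T, 5) array -- one coordinate of the 5 region centers of
           gravity over T frames.  Covariance over the frame axis is
           an assumption; the patent only names the covariance of the
           center-of-gravity coordinates.
    Returns the 5-element center-of-gravity vector of the last frame
    with the compressed coordination features added (step 2.6).
    """
    coord = np.cov(w_seq.T, bias=True)   # 5x5 coordination matrix (step 2.4)
    compressed = coord.sum(axis=0)       # column-by-column sum w_i' (step 2.5)
    return w_seq[-1] + compressed        # w_i + w_i' (step 2.6)

rng = np.random.default_rng(0)
w_seq = rng.normal(size=(4, 5))          # 4 frames, 5 regions
out = coordination_features(w_seq)
```

Since the covariance matrix is symmetric, the column sum of step 2.5 equals the corresponding row sum, so either convention gives the same compressed vector.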
Furthermore, owing to the limitation of the human-body topology graph, a graph convolution model has difficulty learning the relationships among the various end nodes, which are often the essential components of an action. In addition, deep graph convolution models are prone to over-smoothing of features, so a deep model is not suitable. The invention introduces an attention mechanism into the model to acquire the importance features of the nodes in the feature map and transfer them to the original feature map. The proposed importance attention module operates directly on the feature map, without using the adjacency matrix when extracting importance features, and can effectively overcome the limitation imposed by the graph convolutional neural network.
The two-stream graph convolutional neural network in step 3 includes the importance attention module; the two-stream graph convolutional neural network with the importance attention module is constructed in the following steps:
Step 3.1, constructing an adaptive graph convolution module, wherein the adaptive graph convolution module comprises a space graph convolution layer convs, a time graph convolution layer convt, an additional random discarding treatment Dropout and a residual connection which are sequentially connected; in addition, a batch standardization layer and an activation function layer are respectively connected behind the space map convolution layer and the time convolution layer;
step 3.2, building an adaptive graph convolution module with an importance attention module, wherein the adaptive graph convolution module comprises a space graph convolution layer convs, an importance attention module, a time graph convolution layer convt, an importance attention module, an additional random discarding processing Dropout and a residual error connection which are sequentially connected; in addition, a batch standardization layer and an activation function layer are respectively connected behind the two importance attention modules;
Step 3.3, construct the J-stream module and the B-stream module.
The J-stream module and the B-stream module have the same structure: each comprises a data BN layer, 9 adaptive graph convolution modules, and 1 adaptive graph convolution module with an importance attention module, with the data BN layer placed in front of the 10 adaptive graph convolution modules to normalize the input data.
Step 3.4, building a double-flow graph convolution neural network with an important attention module
The double-flow graph convolutional neural network with the importance attention module comprises a J flow module, a B flow module, an importance attention module and two SoftMax layers; inputting a human skeleton data set with harmony characteristics into the dual-flow graph convolutional neural network, processing the human skeleton data set to generate joint data and skeleton data, respectively inputting the joint data and the skeleton data into the J flow module and the B flow module, executing a global average pooling layer after the processing to pool the characteristic mapping of different samples into the same size, respectively passing through the importance attention module and then reaching two SoftMax layers to obtain classification scores of the two flows, and then adding the two scores to obtain a fused score and predict an action label.
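The score fusion at the end of step 3.4 can be sketched directly; the 3-class logits below are stand-ins for the J-stream and B-stream network outputs, which the patent does not specify numerically.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_two_streams(j_logits, b_logits):
    """Step 3.4 fusion: the SoftMax classification scores of the
    J (joint) and B (bone) streams are added, and the action label
    is predicted from the fused score."""
    fused = softmax(j_logits) + softmax(b_logits)
    return fused, int(np.argmax(fused))

j_logits = np.array([1.0, 3.0, 0.5])   # hypothetical 3-class J-stream output
b_logits = np.array([0.8, 2.5, 1.0])   # hypothetical 3-class B-stream output
fused, label = fuse_two_streams(j_logits, b_logits)
```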
Further, the importance attention module in step 3 comprises two convolution layers and a Softmax layer; the specific data processing process is as follows:
Step 3.1.1, the feature map A is fed into the two convolution layers to obtain two new feature maps B and C, where $A, B, C \in \mathbb{R}^{(N \times M) \times C \times T \times V}$,
in which $N \times M$ is the product of the batch size and the number of people, C denotes the number of channels, T the number of action frames, and V the number of joint points.
Step 3.1.2, reshape the feature maps B and C into $\mathbb{R}^{C \times D}$,
where $D = (N \times M) \times T \times V$ denotes the number of feature points on each channel.
Step 3.1.3, perform a matrix multiplication between the transpose of feature map B and feature map C, and compute the position attention feature map $S \in \mathbb{R}^{D \times D}$ using the Softmax layer:

$s_{ji} = \dfrac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{D} \exp(B_i \cdot C_j)}$

where $s_{ji}$ denotes the influence of the i-th position on the features of the j-th position in the feature map.
Step 3.1.4, reshape the feature map S, multiply it by a scale coefficient α, and add the result to the input feature map A to obtain the final output feature $M \in \mathbb{R}^{(N \times M) \times C \times T \times V}$. The initial value of α is set to 0, and a larger weight is learned gradually.
The specific calculation is expressed as:

$M_j = \alpha \sum_{i=1}^{D} s_{ji} A_i + A_j$

so the feature M at each position is a weighted sum of the features at all positions plus the original feature. It therefore has a global view and can selectively aggregate context information according to spatial attention.
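Steps 3.1.1–3.1.4 can be sketched with NumPy on an already-flattened feature map. The two 1 × 1 convolutions that produce B and C are stood in for by random projection matrices, since the patent does not give trained weights; with the initial α = 0 the module reduces to the identity, as in the text.

```python
import numpy as np

def importance_attention(A, alpha=0.0, seed=0):
    """Sketch of the importance attention module (steps 3.1.1-3.1.4).

    A: (C, D) feature map, flattened so that D = (N*M)*T*V is the
       number of feature points per channel (step 3.1.2).
    """
    rng = np.random.default_rng(seed)
    C, D = A.shape
    Wb = rng.normal(size=(C, C))          # stand-in for the first conv
    Wc = rng.normal(size=(C, C))          # stand-in for the second conv
    B, Cf = Wb @ A, Wc @ A                # step 3.1.1: feature maps B and C

    logits = B.T @ Cf                     # (D, D): entry [i, j] = B_i . C_j
    S = np.exp(logits - logits.max(axis=0, keepdims=True))
    S = S / S.sum(axis=0, keepdims=True)  # step 3.1.3: s_ji, softmax over i

    # Step 3.1.4: M_j = alpha * sum_i s_ji A_i + A_j (residual).
    return alpha * (A @ S) + A
```

With `alpha=0.0` the output equals the input A exactly, matching the initialization described above; training would then gradually increase α.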
Further, the spatial graph convolution layer convs in step 3.1 learns the topology of the network in an end-to-end manner; it is determined by the adjacency matrix $A_k$ and takes the following form:

$f_{out} = \sum_{k=1}^{K_v} W_k f_{in} (A_k + B_k + C_k)$
where $K_v$ is set to 3, $W_k$ is a weight matrix, and $A_k$ is an N × N adjacency matrix that represents the physical structure of the human body. $B_k$ is also an N × N adjacency matrix, and $C_k$ represents a data-dependent graph that learns a unique graph for each sample. To determine whether a connection exists between two nodes and how strong that connection is, the model uses a normalized Gaussian function to calculate the similarity of the two nodes, shown in the following form:

$C_k = \mathrm{softmax}\bigl(f_{in}^{T} W_{\theta k}^{T} W_{\phi k} f_{in}\bigr)$

where N denotes the number of all nodes. The model uses the dot product to calculate the similarity of two nodes in the embedding space; the resulting $C_k$ is an N × N similarity matrix whose values are normalized to the range 0–1.
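The data-dependent graph $C_k$ can be sketched as follows; the embedding dimensions and random weights are illustrative assumptions standing in for the learned $W_{\theta k}$ and $W_{\phi k}$.

```python
import numpy as np

def softmax(z, axis=0):
    # Numerically stable softmax along the given axis.
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_adjacency(f_in, W_theta, W_phi):
    """Sketch of the data-dependent graph C_k: the two node embeddings
    are compared with a dot product and normalized with softmax (the
    normalized Gaussian of the text), giving an N x N similarity
    matrix with values in [0, 1]."""
    # f_in: (C, N) input features; W_theta, W_phi: (Ce, C) embeddings.
    theta, phi = W_theta @ f_in, W_phi @ f_in   # (Ce, N) each
    return softmax(theta.T @ phi, axis=1)       # C_k: (N, N)

rng = np.random.default_rng(1)
f_in = rng.normal(size=(3, 4))        # C=3 channels, N=4 skeleton nodes
W_theta = rng.normal(size=(2, 3))     # Ce=2 embedding channels (assumed)
W_phi = rng.normal(size=(2, 3))
Ck = adaptive_adjacency(f_in, W_theta, W_phi)
```

Because each row of $C_k$ is a softmax, the similarities from a given node to all others sum to 1, which keeps the learned graph comparable in scale to the normalized physical adjacency $A_k$.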
In a second aspect, an embodiment of the present invention provides an attention-mechanism-based graph convolutional neural network structure, which comprises a human skeleton data set generation module, a coordination feature extraction module, and a two-stream graph convolutional neural network.
The human skeleton data set generation module is used to acquire video stream data of the human action type to be recognized, process the imported video stream data with an existing pose estimation algorithm to obtain human skeleton data and human skeleton graphs, obtain the coordinates and confidence feature of each key node, and generate a human skeleton data set;
the coordination characteristic extraction module is used for calculating coordination characteristics generated during human body movement, dividing a human body skeleton diagram into 5 partitions, combining the definition of the center of gravity and the definition of a covariance matrix, calculating the interrelation between every two centers of gravity on each partition to obtain a group of coordination characteristic matrixes, and endowing the coordination characteristic matrixes to the original characteristic diagram so that the input data of the model has human body coordination characteristics;
the double-flow graph convolution neural network is used for carrying out action recognition on the human skeleton data set containing the harmony characteristics. The double-flow graph convolution neural network further comprises an importance feature extraction module, more key joints in human motion are extracted in the global visual angle range by using an improved space attention mechanism, the module is directly operated on the feature graph, the limitation of topological graph connection can be overcome, and more important joints for motion composition are extracted from the global visual angle.
The invention has the beneficial effects that: on the one hand, inspired by the theory of human motion coordination, the invention designs a module capable of extracting the coordination features produced during human motion; on the other hand, addressing the shortcomings of existing graph convolution models and building on existing attention mechanisms, an improved spatial attention module is provided to learn the more important joint features over the global scope. The action recognition model of the invention improves the final classification accuracy and makes the existing two-stream adaptive graph convolution model better suited to the action recognition task.
Drawings
FIG. 1 is a schematic view of a human skeletal zone strategy according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a module for coordinating attention according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an adaptive graph convolution model according to an embodiment of the invention.
Fig. 4 is a schematic diagram of a motion recognition model of a dual attention mechanism-based graph convolution neural network according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of an importance attention module according to an embodiment of the present invention.
FIG. 6 is a diagram of an adaptive graph convolution model fused with the importance attention module according to an embodiment of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
The action recognition method of the graph convolution neural network based on the attention mechanism comprises the following steps:
step 1, acquiring video stream data of a human body action type to be identified, and processing the imported video stream data by adopting an existing posture estimation algorithm to generate a human body skeleton data set.
Step 2, constructing a coordination attention module, calculating coordination characteristics generated by limbs and trunk in the human body movement process, and adding the coordination characteristics into a human body skeleton data set;
the coordination attention module is a calculation unit and is used for partitioning the human skeleton, calculating a gravity center matrix and calculating a coordination matrix so as to acquire the gravity center matrix with coordination characteristics, and the coordination attention module specifically comprises the following steps:
Step 2.1, divide the human skeleton graph into 5 regions according to the structure of the human body, corresponding respectively to the head, the left arm, the right arm, the left leg and the right leg, to obtain 5 region subgraphs, as shown in fig. 1.
Step 2.2, calculate the center of gravity of each region according to its definition in mathematics and physics. To reduce the model's computational load, the module uses the region centers of gravity to calculate coordination, which requires far less computation than using the joints directly and effectively avoids the problem that the numbers of nodes in the different parts are inconsistent. The coordinates of the center of gravity of each region are calculated using the following formula:
$w_x = \frac{1}{n} \sum_{i=1}^{n} x_i$

where $w_x$ is the abscissa of the region's center of gravity, $x_i$ is the abscissa of each joint point in the region, and $i = 1, 2, \ldots, n$ indexes the $n$ joint points of the region. The $y$ and $z$ coordinates are calculated in the same way:

$w_y = \frac{1}{n} \sum_{i=1}^{n} y_i \qquad w_z = \frac{1}{n} \sum_{i=1}^{n} z_i$
The resulting center of gravity matrix is shown in the form:
$(w_1, w_2, w_3, w_4, w_5)$
Step 2.3, according to its definition, the covariance matrix can be used to calculate the coordination between two regions. The invention rewrites the covariance formula so that $X_i$ and $X_j$ are taken to be $w_i$ and $w_j$ from $(w_1, w_2, w_3, w_4, w_5)$, $i, j = 1, 2, \ldots, 5$, as shown below:

$\mathrm{cov}(w_i, w_j) = E\bigl[(w_i - \bar{w}_i)(w_j - \bar{w}_j)\bigr]$

where cov(·) denotes the covariance, $w_i$ and $w_j$ are the center-of-gravity coordinates of the regions, and $\bar{w}_i$ denotes the average of the center-of-gravity coordinates of each region; the values of i and j may be equal.
Step 2.4, calculate the coordination matrix according to the above formula; its form is as follows:

$\begin{pmatrix} \mathrm{cov}(w_1, w_1) & \cdots & \mathrm{cov}(w_1, w_5) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(w_5, w_1) & \cdots & \mathrm{cov}(w_5, w_5) \end{pmatrix}$
Step 2.5, from step 2.4, 3 groups of coordination matrices of size 5 × 5 are obtained, denoted $w_x$, $w_y$ and $w_z$ respectively; these 3 groups of matrices can be used to represent the coordination characteristics of the body. $w_x$, $w_y$ and $w_z$ are compressed to the same size as the center-of-gravity matrix by summing column by column; taking the first column as an example:

$w_1' = \mathrm{cov}(w_1, w_1) + \mathrm{cov}(w_2, w_1) + \mathrm{cov}(w_3, w_1) + \mathrm{cov}(w_4, w_1) + \mathrm{cov}(w_5, w_1)$
Step 2.6, add the center-of-gravity matrix and the compressed coordination matrix to obtain the center-of-gravity matrix with coordination features:

$(w_1 + w_1',\ w_2 + w_2',\ w_3 + w_3',\ w_4 + w_4',\ w_5 + w_5')$

where, for $i = 1, 2, \ldots, 5$,

$w_i' = \sum_{k=1}^{5} \mathrm{cov}(w_k, w_i)$
and 3, constructing an adaptive graph convolution layer with an important attention mechanism. Fig. 3 is a schematic diagram of an adaptive graph convolution model according to an embodiment of the present invention, fig. 4 is a schematic diagram of a motion recognition model of a graph convolution neural network based on a dual attention mechanism according to an embodiment of the present invention, fig. 5 is a schematic diagram of an importance attention module according to an embodiment of the present invention, and fig. 6 is a schematic diagram of an adaptive graph convolution model with an importance attention module according to an embodiment of the present invention.
Step 3.1, constructing an adaptive graph convolution model with an importance attention module, wherein the adaptive graph convolution model comprises a space graph convolution layer convs, a first importance attention module, a time graph convolution layer convt, a second importance attention module, an additional random discarding treatment Dropout and a residual connection which are sequentially connected; wherein Dropout is set to 0.5; the first importance attention module and the second importance attention module are respectively connected with a batch standardization layer and an activation function layer;
Step 3.1.1, the spatial graph convolution produces a feature map A, $A \in \mathbb{R}^{(N \times M) \times C \times T \times V}$, where $N \times M$ is the product of the batch size and the number of people, C denotes the number of channels, T the number of action frames, and V the number of joint points. The feature map A is then fed into two 1 × 1 convolution layers to obtain two new feature maps B and C of the same shape.
Step 3.1.2, reshape the feature maps B and C into $\mathbb{R}^{C \times D}$, where $D = (N \times M) \times T \times V$ denotes the number of feature points on each channel.
Step 3.1.3, perform a matrix multiplication operation on the transposes of B and C and calculate a position attention feature map S using the Softmax layer, here
Figure BDA0003649990070000093
The calculation formula of the attention feature map is shown in the following form:
Figure BDA0003649990070000094
s ji showing the influence of the ith position on the characteristics of the jth position on the characteristic diagram.
Step 3.1.4, the characteristic diagram S is recombined and multiplied by a scale coefficient alpha, and the obtained product is added with the characteristic diagram A to obtain the final output characteristic M, wherein
Figure BDA0003649990070000095
The initial value of α is set to 0, and can be gradually learned to be largerThe weight of (c). The representation form is as follows:
Figure BDA0003649990070000096
E j it is shown that the feature M for each location is a weighted sum of all location features and the original features. It has a global view and can selectively aggregate context information according to spatial attention.
Step S4, construct the two-stream graph convolutional neural network with the importance attention module, taking the data set obtained in step 1 as the network's input data and the predicted action label as its output data. The two attention modules of step 2 and step 3 are merged into the two-stream graph convolutional neural network, as shown in fig. 4.
Step S4.1, build the spatial graph convolution layer convs, which learns the topology of the network in an end-to-end manner and is determined by the adjacency matrix $A_k$:

$f_{out} = \sum_{k=1}^{K_v} W_k f_{in} (A_k + B_k + C_k)$
in the formula K v Is set to be 3, W k Is a weight matrix, A k Is an N x N contiguous matrix that represents the physical structure of the human body. B is k Is also an N contiguous matrix, but B k The interior value has no specific constraint condition, which means that the graph is completely learned from training data, and based on the data-driven task, the model can completely learn the graph from the target task, considering B k The value in (2) may be any value that can represent not only the physical structure of the human body but also the connection strength between adjacent nodes. C k Can represent a data correlation graph which can learn a unique graph for each sample, and in order to determine whether a connection exists between two adjacent nodes and how the connection strength is, the model uses a normalized Gaussian function to calculate the similarity of the two nodes, as shown in the following form:
Figure BDA0003649990070000101
N in the formula represents the number of all nodes. The model uses dot product to calculate the similarity of two nodes in the embedding space, and the obtained C k The method is an NxN similarity matrix, and the model normalizes values to be 0-1.
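As a sketch of this adaptive spatial graph convolution, the NumPy fragment below computes $f_{out} = \sum_k W_k f_{in}(A_k + B_k + C_k)$ for a single sample. The weight lists `W_theta`, `W_phi` and `W_out` are hypothetical stand-ins for the learned embedding and output convolutions; this illustrates the formula rather than the full trained layer.

```python
import numpy as np

def adaptive_graph_conv(f_in, A_list, B_list, W_theta, W_phi, W_out):
    """f_in: (C_in, N) features per joint. A_k: fixed physical adjacency.
    B_k: freely learned adjacency. C_k: per-sample similarity graph built
    from a SoftMax-normalized dot product in an embedding space."""
    K_v = len(A_list)                              # K_v is set to 3 in the patent
    f_out = 0.0
    for k in range(K_v):
        theta = W_theta[k] @ f_in                  # (C_e, N) embedded nodes
        phi = W_phi[k] @ f_in                      # (C_e, N)
        logits = theta.T @ phi                     # (N, N) node similarities
        Ck = np.exp(logits - logits.max(axis=1, keepdims=True))
        Ck /= Ck.sum(axis=1, keepdims=True)        # rows normalized to [0, 1]
        f_out = f_out + W_out[k] @ f_in @ (A_list[k] + B_list[k] + Ck)
    return f_out
```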
S4.2, building an adaptive graph convolution module; the adaptive graph convolution module comprises a spatial graph convolution layer convs, a temporal graph convolution layer convt, a random dropout operation (Dropout) and a residual connection, connected in sequence, where Dropout is set to 0.5; a batch normalization layer and an activation function layer follow the spatial graph convolution layer and the temporal convolution layer respectively;

S4.3, building an adaptive graph convolution module with an importance attention module; on the basis of the original adaptive graph convolution module, the importance attention module proposed in step S3 is added after the spatial graph convolution and the temporal convolution respectively;
s4.4, stacking adaptive graph convolution modules to build the adaptive graph convolution network; the model comprises 10 layers in total: 9 adaptive graph convolution modules and 1 adaptive graph convolution module with the importance attention module proposed in step 4.3, with module output channels of 64, 128, 256 and 256 respectively. A data BN layer is added at the beginning to normalize the input data; a global average pooling layer is executed after the 10th layer to pool the feature maps of different samples to the same size, and the final model output is sent to a SoftMax classifier to obtain the prediction result;
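The tail of the stacked network (global average pooling followed by the SoftMax classifier) can be sketched as follows; `W_cls` is a hypothetical classifier weight matrix, standing in for the learned fully-connected layer.

```python
import numpy as np

def classify_head(feature_map, W_cls):
    """Global-average-pool a (C, T, V) feature map to a C-vector, then
    apply a linear classifier followed by SoftMax, as in step S4.4."""
    pooled = feature_map.mean(axis=(1, 2))     # (C,) pooled over time and joints
    logits = W_cls @ pooled                    # (num_classes,)
    z = np.exp(logits - logits.max())          # numerically stable SoftMax
    return z / z.sum()                         # class probabilities
```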
s4.5, constructing the action recognition model of the graph convolutional neural network with a dual attention mechanism; the skeleton data with coordination features is divided into joint-stream data and bone-stream data, which are input separately into the model built in step S4.4 to obtain the SoftMax classification scores of the two streams; the two scores are then added to obtain a fused score, and the action label is predicted.
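The two-stream fusion of step S4.5 reduces to an element-wise addition of the two SoftMax score vectors followed by an argmax; a minimal sketch:

```python
import numpy as np

def fuse_two_streams(joint_scores, bone_scores):
    """Add the SoftMax scores of the joint stream and the bone stream,
    then predict the label with the highest fused score (step S4.5)."""
    fused = joint_scores + bone_scores
    return fused, int(np.argmax(fused))
```

For example, joint scores `[0.1, 0.7, 0.2]` and bone scores `[0.2, 0.5, 0.3]` fuse to `[0.3, 1.2, 0.5]`, predicting class 1.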
The invention also provides an attention-based graph convolutional neural network. As shown in fig. 4, the network comprises a human skeleton data set generation module, a coordination attention module and a dual-flow graph convolutional neural network construction module.

The human skeleton data set generation module is used for acquiring video stream data of the human action types to be identified, processing the imported video stream data with an existing pose estimation algorithm to obtain human skeleton data and a human skeleton graph, obtaining the coordinates and confidence features of each key node, and generating a human skeleton data set;

the coordination attention module is used for calculating the coordination features generated during human motion: the human skeleton graph is divided into 5 partitions and, combining the definition of the center of gravity with that of the covariance matrix, the interrelation between every two centers of gravity is calculated over the partitions to obtain a group of coordination feature matrices, which are applied to the original feature map so that the model's input data carries human coordination features;

the dual-flow graph convolutional neural network construction module is used for constructing a dual-flow graph convolutional neural network, in which joint data and bone data serve respectively as the input features of the two streams and the predicted action label is the output data.

The dual-flow graph convolutional neural network comprises an importance attention module, which uses an improved spatial attention mechanism to extract the joints most critical to human motion within a global view. The module operates directly on the feature map, so it can overcome the limitation of topological graph connectivity and extract, from a global perspective, the joints that contribute more to the composition of the motion.
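The coordination attention computation summarized above (per-partition centers of gravity, pairwise covariance, column-wise compression, addition to the center-of-gravity matrix) can be sketched for one coordinate axis as follows. This is a NumPy illustration under assumptions: each partition is given as an array of one joint coordinate over time, and `np.cov` supplies the covariance.

```python
import numpy as np

def coordination_features(regions):
    """Coordination sketch for one coordinate axis (e.g. x).
    regions: list of 5 arrays, each (n_joints, frames), holding one coordinate
    of the joints of a body partition over time."""
    # Per-frame center of gravity of each partition: w = (1/n) * sum of joints
    W = np.stack([r.mean(axis=0) for r in regions])   # (5, frames)
    # 5x5 coordination (covariance) matrix between partition centroids
    coord = np.cov(W)
    # Column-by-column compression: w'_j = sum_i cov(w_i, w_j)
    compressed = coord.sum(axis=0)                    # (5,)
    # Add the compressed coordination matrix to the center-of-gravity matrix
    return W.mean(axis=1) + compressed
```

For motion with constant centroids the covariance vanishes and the output reduces to the plain center-of-gravity vector.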
Through the network of the second embodiment of the invention, the aim of identifying human actions in the video stream is achieved. The network provided by the embodiment of the invention can execute the action recognition method based on the adaptive graph convolutional neural network provided by any embodiment of the invention, and has the corresponding functions and beneficial effects of the executed method.

Claims (5)

1. An action recognition method for a graph convolutional neural network based on an attention mechanism, characterized by comprising the following steps:
step 1, acquiring video stream data of the human action types to be identified, and obtaining human skeleton data through a pose estimation algorithm as a human skeleton data set;
step 2, constructing a coordination attention module, calculating the coordination features generated by the limbs and trunk during human motion, obtaining a center-of-gravity matrix with coordination features, and adding it to the human skeleton data set;
and step 3, inputting the human skeleton data set into a dual-flow graph convolutional neural network and outputting a predicted action.
2. The attention-mechanism-based action recognition method for a graph convolutional neural network according to claim 1, wherein the coordination attention module in step 2 performs partitioning, center-of-gravity matrix calculation and coordination matrix calculation on the human skeleton to obtain a center-of-gravity matrix with coordination features, specifically comprising the following steps:
step 2.1, dividing a human skeleton map into 5 regions according to the structure of a human body, wherein the 5 regions respectively correspond to a head, a left arm, a right arm, a left leg and a right leg to obtain 5 region sub-maps;
step 2.2, calculating the gravity center point of each area;
the center-of-gravity coordinates of each region are calculated using the following formula:

$$w_x = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where $w_x$ is the abscissa of the region's center of gravity, $x_i$ is the abscissa of joint point $i$ in the region, and $i = 1, 2, \ldots, n$ indexes the $n$ joint points in the region; the y and z coordinates of the region's center of gravity are calculated in the same way:

$$w_y = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad w_z = \frac{1}{n}\sum_{i=1}^{n} z_i$$
the resulting center of gravity matrix is shown in the form:
$$(w_1, w_2, w_3, w_4, w_5)$$
step 2.3, calculating the coordination between every two areas according to the covariance matrix;
the covariance calculation formula is as follows:

$$\mathrm{cov}(w_i, w_j) = E\big[(w_i - \bar{w}_i)(w_j - \bar{w}_j)\big]$$

where $w_i$ and $w_j$ ($i, j = 1, 2, \ldots, 5$) are center-of-gravity coordinates taken from $(w_1, w_2, w_3, w_4, w_5)$, $\mathrm{cov}(\cdot)$ denotes the covariance, and $\bar{w}_i$, $\bar{w}_j$ denote the means of the center-of-gravity coordinates of the corresponding regions;
step S2.4, calculating the coordination matrix according to the above formula; its form is as follows:

$$\begin{pmatrix}
\mathrm{cov}(w_1, w_1) & \cdots & \mathrm{cov}(w_1, w_5) \\
\vdots & \ddots & \vdots \\
\mathrm{cov}(w_5, w_1) & \cdots & \mathrm{cov}(w_5, w_5)
\end{pmatrix}$$
step S2.5, 3 groups of 5×5 coordination matrices are obtained according to step 2.4, denoted $w_x$, $w_y$ and $w_z$ respectively; these 3 groups of matrices represent the coordination features of the body. $w_x$, $w_y$ and $w_z$ are compressed to the dimension of the center-of-gravity matrix by adding column by column; taking the first column as an example, the compression is as follows:

$$w_1' = \mathrm{cov}(w_1, w_1) + \mathrm{cov}(w_2, w_1) + \mathrm{cov}(w_3, w_1) + \mathrm{cov}(w_4, w_1) + \mathrm{cov}(w_5, w_1)$$
s2.6, adding the center-of-gravity matrix and the compressed coordination matrix to obtain the center-of-gravity matrix with coordination features

$$(\tilde{w}_1, \tilde{w}_2, \tilde{w}_3, \tilde{w}_4, \tilde{w}_5)$$

wherein

$$\tilde{w}_i = w_i + w_i', \qquad i = 1, 2, \ldots, 5$$

with $w_i'$ the $i$-th element of the compressed coordination matrix.
3. The attention-mechanism-based action recognition method for a graph convolutional neural network according to claim 1, wherein the dual-flow graph convolutional neural network in step 3 comprises an importance attention module, and a dual-flow graph convolutional neural network with the importance attention module is constructed, comprising the following steps:
Step 3.1, constructing an adaptive graph convolution module, wherein the adaptive graph convolution module comprises a space graph convolution layer convs, a time graph convolution layer convt, an additional random discarding treatment Dropout and a residual connection which are sequentially connected; in addition, a batch standardization layer and an activation function layer are respectively connected behind the space map convolution layer and the time convolution layer;
step 3.2, building an adaptive graph convolution module with an importance attention module, wherein the adaptive graph convolution module comprises a space graph convolution layer convs, an importance attention module, a time graph convolution layer convt, an importance attention module, an additional random discarding processing Dropout and a residual error connection which are sequentially connected; in addition, a batch standardization layer and an activation function layer are connected behind the two importance attention modules respectively;
step 3.3, constructing J flow module and B flow module
The J-stream module and the B-stream module have the same structure; each comprises a data BN layer, 9 adaptive graph convolution modules and 1 adaptive graph convolution module with an importance attention module, where the data BN layer is added before the 10 adaptive graph convolution modules to normalize the input data;
step 3.4, building a double-flow graph convolution neural network with an important attention module
The dual-flow graph convolutional neural network with the importance attention module comprises a J-stream module, a B-stream module, an importance attention module and two SoftMax layers. The human skeleton data set with coordination features is input into the network and processed to generate joint data and bone data, which are input into the J-stream module and the B-stream module respectively; after this processing, a global average pooling layer is executed to pool the feature maps of different samples to the same size, and each stream passes through the importance attention module and then a SoftMax layer to obtain its classification score; the two scores are then added to obtain a fused score and predict the action label.
4. The attention-mechanism-based action recognition method for a graph convolutional neural network according to claim 3, wherein the importance attention module comprises two convolution layers and a SoftMax layer, with the following specific data processing procedure:
step 3.1.1, the feature map A is fed into two convolution layers to obtain two new feature maps B and C, where

$$A \in \mathbb{R}^{(N \times M) \times C \times T \times V}$$

in which N×M is the product of batch size and number of persons, C represents the number of channels, T represents the number of action frames, and V represents the number of joint points;
step 3.1.2, the feature maps B and C are reshaped so that

$$B, C \in \mathbb{R}^{C \times D}$$

where $D = N \times M \times T \times V$ denotes the number of feature points on each channel;
step 3.1.3, matrix multiplication is performed on the transpose of feature map B with feature map C, and the position attention feature map S is calculated using the SoftMax layer, where

$$S \in \mathbb{R}^{D \times D}$$

The attention feature map is calculated as follows:

$$s_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{D} \exp(B_i \cdot C_j)}$$
where $s_{ji}$ represents the influence of the $i$-th position on the feature of the $j$-th position in the feature map;
step 3.1.4, the feature map S is reshaped, multiplied by a scale coefficient α, and added to the input feature map A to obtain the final output feature M, where

$$M \in \mathbb{R}^{(N \times M) \times C \times T \times V}$$

The initial value of α is set to 0, and a larger weight can be learned gradually; the specific calculation is expressed as follows:

$$E_j = \alpha \sum_{i=1}^{D} (s_{ji} A_i) + A_j$$

where $E_j$ indicates that the output feature M at each position is a weighted sum of the features at all positions plus the original feature.
5. The method of claim 3, wherein the spatial graph convolution layer convs learns the topology of the graph from the adjacency matrix $A_k$ in an end-to-end manner, obtaining the following form:

$$f_{out} = \sum_{k=1}^{K_v} W_k f_{in} (A_k + B_k + C_k)$$

In the formula, $K_v$ is set to 3, $W_k$ is a weight matrix, and $A_k$ is an N×N adjacency matrix representing the physical structure of the human body; $B_k$ is also an N×N adjacency matrix; $C_k$ represents a data-dependent graph used to learn a unique graph for each sample. To determine whether a connection exists between two nodes and how strong it is, the model uses a normalized Gaussian function to calculate the similarity of the two nodes, as follows:

$$f(v_i, v_j) = \frac{e^{\theta(v_i)^{\mathsf T} \phi(v_j)}}{\sum_{j=1}^{N} e^{\theta(v_i)^{\mathsf T} \phi(v_j)}}$$

In the formula, N represents the number of all nodes. The model uses a dot product to calculate the similarity of two nodes in the embedding space; the resulting $C_k$ is an N×N similarity matrix whose values are normalized to the range 0–1.
CN202210547472.3A 2022-05-18 2022-05-18 Attention mechanism-based action recognition method for graph convolution neural network Pending CN114821804A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210547472.3A CN114821804A (en) 2022-05-18 2022-05-18 Attention mechanism-based action recognition method for graph convolution neural network


Publications (1)

Publication Number Publication Date
CN114821804A true CN114821804A (en) 2022-07-29

Family

ID=82515933


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116189054A (en) * 2023-02-27 2023-05-30 江南大学 Man-machine cooperation method and man-machine cooperation system based on neural network
CN116071809A (en) * 2023-03-22 2023-05-05 鹏城实验室 Face space-time representation generation method based on multi-class representation space-time interaction
CN116091496A (en) * 2023-04-07 2023-05-09 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN
CN116091496B (en) * 2023-04-07 2023-11-24 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination