CN115169521A - Graph neural network interpretation method preserving prediction order and structural dependencies - Google Patents

Graph neural network interpretation method preserving prediction order and structural dependencies

Info

Publication number
CN115169521A
Authority
CN
China
Prior art keywords
graph
original
prediction
neural network
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210884170.5A
Other languages
Chinese (zh)
Inventor
刘群
张优敏
李苑
刘立
王国胤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Posts and Telecommunications
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN202210884170.5A
Publication of CN115169521A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of neural network interpretation, and particularly relates to a graph neural network interpretation method that preserves prediction order and structural dependencies. The method comprises the following steps: acquiring original graph data and inputting it into the graph neural network model to be explained to obtain an original prediction result; masking the original graph data with a mask generator to obtain a masked subgraph; inputting the masked subgraph into the graph neural network model to be explained to obtain a masked prediction result; calculating the total loss from the original prediction result and the masked prediction result; iteratively updating the mask generator parameters according to the total loss until an optimal mask generator is obtained; and masking the original graph data with the optimal mask generator to obtain a masked subgraph, which constitutes the explanation of the original prediction result. The invention can reliably explain graph neural networks and has high practicability.

Description

Graph neural network interpretation method preserving prediction order and structural dependencies
Technical Field
The invention belongs to the technical field of neural network interpretation, and particularly relates to a graph neural network interpretation method that preserves prediction order and structural dependencies.
Background
In real life there is a great deal of graph data, including social networks, knowledge graphs, and protein interaction networks, among others. Graph Neural Networks (GNNs) generalize deep neural networks to graphs, learning a representation of the graph in a continuous embedding space to facilitate downstream tasks. Owing to their excellent capability in representing graph data, GNNs perform strongly in tasks such as link prediction, graph classification, and node classification.
As with deep neural networks in general, a GNN stacks multiple layers and builds its model with nonlinear activation functions. Although this complex architecture gives GNNs their powerful graph-representation capability, it also makes them hard to interpret: GNN models have no explicit objective function that explains their prediction behavior, and their parameters are optimized only in an end-to-end manner. A GNN therefore tends to act as a black box. Because of this opacity, the fairness and reliability of a GNN model cannot be guaranteed, which hinders the wide adoption of GNNs in critical applications.
For this reason, many researchers have begun to study the interpretability of graph neural networks. These methods explain a given instance by determining the subgraph (a subset of the features/structure) that contributes most to the GNN's prediction. Existing perturbation-based graph neural network interpretation methods use a mask generator to produce an explanation in the form of a subset of edges, where the optimization of the mask generator is guided by the difference between the model's outputs when the original graph and the generated subgraph are fed to the model to be interpreted. Such methods hinge on two points: the first is to design an objective function that evaluates the difference between the predictions obtained with the original graph and with the generated graph as input; the second is to design a powerful mask generator that can determine which edges should be masked or preserved. However, existing methods consider neither the order consistency of the prediction results when designing the objective function nor the key property of structural dependency in graph data when designing the mask generator, so the interpretation results may fail to form a complete subgraph.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a graph neural network interpretation method that preserves prediction order and structural dependencies, comprising the following steps:
S1: acquiring original graph data, and inputting the original graph data into the graph neural network model to be explained to obtain an original prediction result; the original graph data is graph classification data for sentiment analysis, where each graph is a sentence, each node in the graph is a word, and the edges are the relations between words;
S2: masking the original graph data with a mask generator to obtain a masked subgraph;
S3: inputting the masked subgraph into the graph neural network model to be explained to obtain a masked prediction result;
S4: calculating the total loss from the original prediction result and the masked prediction result;
S5: optimizing the mask generator according to the total loss, and repeating steps S2-S4 until an optimal mask generator is obtained;
S6: masking the original graph data with the optimal mask generator to obtain a masked subgraph, which is the explanation of the original prediction result.
Preferably, the process of masking the original graph data with the mask generator includes:
S21: converting the nodes of the original graph into edges of a new graph, converting the edges of the original graph into nodes of the new graph, and establishing an edge between two nodes of the new graph when a data flow exists between the corresponding original edges;
S22: inputting the new graph into a graph attention network model to obtain hidden variables;
S23: calculating the importance score of each edge from the hidden variables using a reparameterization trick for discrete distributions, and masking the edges of the original graph according to the importance scores to obtain the masked subgraph.
Preferably, the hidden variables are obtained as:

\Omega = g_{\phi}(\hat{A}, \hat{X})

where Ω denotes the set of hidden variables, g_φ denotes the mask-generator network model, Â denotes the adjacency matrix of the new graph, X̂ denotes the node features of the new graph, and φ denotes the parameters of the mask generator.
Further, the importance score of each edge is calculated as:

m_{pq} = C(\omega_i, \epsilon, \tau_2) = \sigma\big((\log\epsilon - \log(1 - \epsilon) + \omega_i)/\tau_2\big)

where m_pq denotes the importance score of the edge e_i connecting node p and node q, ω_i denotes the i-th hidden variable in the hidden-variable set, ε denotes the first hyperparameter, and τ₂ denotes the second hyperparameter.
Preferably, the process of calculating the total loss comprises:
S41: calculating the inverse sorting matrix of the masked prediction result, and adjusting the order of the original prediction result according to the inverse sorting matrix;
S42: inputting the order-adjusted original prediction result into a Plackett-Luce model to calculate the ranking loss;
S43: calculating the value difference loss between the original prediction result and the masked prediction result;
S44: obtaining the mask matrix from the importance scores of all edges, and calculating the first-order norm of the mask matrix;
S45: calculating the total loss from the ranking loss, the value difference loss, and the first-order norm of the mask matrix.
Further, the inverse sorting matrix is calculated as:

\hat{P}_{\hat{Y}}[i, :] = \mathrm{softmax}\big(((n + 1 - 2i)\hat{Y} - A_{\hat{Y}}\mathbf{1}) / \tau_1\big)

where \hat{P}_{\hat{Y}}[i, :] denotes the i-th row of the inverse sorting matrix, softmax() denotes the normalized exponential function, Ŷ denotes the masked prediction result, A_Ŷ denotes the matrix of absolute differences between pairs of probability values in Ŷ, i denotes the i-th row of the matrix, 1 is an all-ones column vector, τ₁ is a temperature parameter, and n denotes the number of nodes in a graph.
Further, the order of the original prediction result is adjusted as:

\tilde{Y} = \hat{P}_{\hat{Y}} Y

where Ỹ denotes the original prediction result Y after order adjustment, \hat{P}_{\hat{Y}} denotes the inverse sorting matrix, and Y denotes the original prediction result.
Further, the ranking loss is calculated as:

L_{PL} = -\sum_{i=1}^{n} \log \frac{\exp(\tilde{Y}_i)}{\sum_{j=i}^{n} \exp(\tilde{Y}_j)}

where L_PL denotes the ranking loss and Ỹ_i denotes the value at the i-th position of the order-adjusted original prediction Ỹ.
Further, the value difference loss is calculated as:

L_{diff} = \lVert f(A, X, W) - f(M \odot A, X, W) \rVert_1

where L_diff denotes the value difference loss, f(A, X, W) denotes the prediction result obtained when the original graph data is input into the graph neural network model, and f(M ⊙ A, X, W) denotes the prediction result obtained when the masked original graph data is input into the graph neural network model.
Further, the total loss is calculated as:

\mathcal{L} = L_{PL} + L_{diff} + L_M

where 𝓛 denotes the total loss, L_PL denotes the ranking loss, L_diff denotes the value difference loss, and L_M denotes the first-order norm of the mask matrix.
The invention has the following beneficial effects: the mask generator is optimized, and the optimized mask generator is used to process the original graph data to obtain the interpretation of the original graph data. Compared with the prior art, the method considers both the order consistency between the original prediction result and the masked prediction result and the dependency relationships between edges in the original graph data; it can accurately extract the complete input subgraph that plays an important role in the classification result, and its interpretation of the original graph data has high accuracy and good reliability. The method solves the problem that existing perturbation-based graph neural network interpretation methods consider neither the order consistency between the original and masked predictions nor the dependency between edges explicitly, and it has high practicability.
Drawings
FIG. 1 is a flow chart of the graph neural network interpretation method of the present invention that preserves prediction order and structural dependencies;
FIG. 2 is a block diagram of the graph neural network interpretation method of the present invention that preserves prediction order and structural dependencies;
FIG. 3 is a diagram illustrating the original graph data of a sentence in the present invention;
FIG. 4 is a diagram illustrating a process of converting an original graph into a new graph according to the present invention;
FIG. 5 is a diagram illustrating a process of calculating the loss of ordering in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a graph neural network interpretation method that preserves prediction order and structural dependencies; as shown in FIG. 1 and FIG. 2, the method comprises the following steps:
S1: acquiring original graph data, and inputting the original graph data into the graph neural network model to be explained to obtain an original prediction result.
The original graph data is graph classification data for sentiment analysis, where each graph is a sentence, each node in the graph is a word, and the edges are the relations between words. Inputting the original graph data into the graph neural network to be explained yields the original prediction result, i.e., the probability of the current graph being assigned to each class, namely the sentiment of the sentence. For example, FIG. 3 shows the original graph data of the sentence "they're transforming and tracing palin's book to crop expressions of totalitarianism with Marmaduke a protagonist. RESPECT!".
S2: masking the original graph data with the mask generator to obtain the masked subgraph.
S21: converting the nodes of the original graph into edges of the new graph, converting the edges of the original graph into nodes of the new graph, and establishing an edge between two nodes of the new graph when a data flow exists between the corresponding original edges.
As shown in FIG. 4, the leftmost graph is the original graph, and the rightmost graph is the new graph after conversion. First, the edges e₀, e₁, e₂, e₃ of the original graph are transformed into nodes of the new graph. Next, the edges e₂ and e₀ of the original graph share the node v₁ and form a chain structure with a data flow, so an edge v₁ is established between e₂ and e₀ in the new graph. In contrast, the edges e₁ and e₂ of the original graph share the node v₂ but constitute a branch structure without a data flow, so no edge is established between e₁ and e₂ in the new graph.
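The following Python sketch makes this edge-to-node conversion concrete. It is a minimal illustration under our own conventions (the function name edge_to_node_graph and the [2, E] edge_index layout are assumptions, not part of the patent): two converted nodes are linked exactly when the corresponding directed edges form a data-flow chain.

    import torch

    def edge_to_node_graph(edge_index: torch.Tensor) -> torch.Tensor:
        """Nodes of the new graph are the edges of the original directed graph;
        connect i -> j when original edge i ends at the node where edge j starts."""
        num_edges = edge_index.size(1)
        pairs = []
        for i in range(num_edges):
            for j in range(num_edges):
                # head of e_i == tail of e_j: chain structure with a data flow
                if i != j and edge_index[1, i] == edge_index[0, j]:
                    pairs.append((i, j))
        if not pairs:
            return torch.empty(2, 0, dtype=torch.long)
        return torch.tensor(pairs, dtype=torch.long).t()

    # Chain v0 -> v1 -> v2 -> v3 (edges e0, e1, e2): consecutive edges are
    # linked, while edges that merely share a node without a data flow are not.
    print(edge_to_node_graph(torch.tensor([[0, 1, 2], [1, 2, 3]])))
    # tensor([[0, 1],
    #         [1, 2]])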
S22: inputting the new graph into a graph attention network (GAT) model to obtain the hidden variables Ω.
The converted new graph is input into a graph attention network (GAT) model, and the hidden variables Ω are obtained through a ReLU activation function and a Softmax layer. The hidden variables are calculated as:

\Omega = g_{\phi}(\hat{A}, \hat{X})

where Ω denotes the set of hidden variables, g_φ denotes the mask-generator network model, Â denotes the adjacency matrix of the new graph, X̂ denotes the node features of the new graph, and φ denotes the parameters of the mask generator.
S23: calculating the importance score of each edge from the hidden variables using a reparameterization trick for discrete distributions, and masking the edges of the original graph according to the importance scores to obtain the masked subgraph.
The importance score of each edge is calculated as:

m_{pq} = C(\omega_i, \epsilon, \tau_2) = \sigma\big((\log\epsilon - \log(1 - \epsilon) + \omega_i)/\tau_2\big)

where m_pq denotes the importance score of the masked edge e_i connecting node p and node q; ω_i ∈ Ω denotes the i-th hidden variable in the hidden-variable set, i.e., the output value of the mask generator; ε denotes the first hyperparameter and τ₂ the second hyperparameter, with ε ~ Uniform(0, 1) and τ₂ ∈ [0, 2]; and σ denotes the sigmoid function.
From the importance scores of the edges, a mask matrix M is obtained, in which m_pq is the element in the p-th row and q-th column; the edges of the original graph are masked by the mask matrix to obtain the masked subgraph.
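A compact sketch of steps S22-S23 in Python/PyTorch follows. It is a hedged illustration, not the patent's implementation: the class and function names, the hidden dimension, and the default τ₂ = 0.5 (within the stated range [0, 2]) are our assumptions; GATConv comes from the PyTorch Geometric library; and the Softmax layer mentioned in S22 is only implicit in GAT's internal attention.

    import torch
    import torch.nn as nn
    from torch_geometric.nn import GATConv  # assumes PyTorch Geometric is installed

    class MaskGenerator(nn.Module):
        """g_phi: a two-layer GAT over the new graph that outputs one hidden
        variable omega per new-graph node, i.e. one per edge of the original graph."""
        def __init__(self, in_dim: int, hidden_dim: int = 64):
            super().__init__()
            self.gat1 = GATConv(in_dim, hidden_dim)
            self.gat2 = GATConv(hidden_dim, 1)

        def forward(self, x_hat, edge_index_hat):
            # x_hat: node features of the new graph; edge_index_hat: its edges
            h = torch.relu(self.gat1(x_hat, edge_index_hat))  # ReLU, as in S22
            return self.gat2(h, edge_index_hat).squeeze(-1)   # Omega

    def edge_importance(omega: torch.Tensor, tau2: float = 0.5) -> torch.Tensor:
        """Reparameterized importance scores m_pq following the formula above."""
        eps = torch.rand_like(omega).clamp(1e-6, 1 - 1e-6)    # eps ~ Uniform(0, 1)
        return torch.sigmoid((torch.log(eps) - torch.log(1 - eps) + omega) / tau2)

Because the scores stay in (0, 1) and are differentiable with respect to ω, the total loss defined below can be back-propagated through the mask to the generator's parameters.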
S3: inputting the masked subgraph into the graph neural network model to be explained to obtain the masked prediction result.
S4: calculating the total loss from the original prediction result and the masked prediction result.
S41: calculating the inverse sorting matrix of the masked prediction result, and adjusting the order of the original prediction result according to the inverse sorting matrix.
As shown in FIG. 5, a neural sorting algorithm is used to calculate the inverse sorting matrix of the masked prediction result:

\hat{P}_{\hat{Y}}[i, :] = \mathrm{softmax}\big(((n + 1 - 2i)\hat{Y} - A_{\hat{Y}}\mathbf{1}) / \tau_1\big)

where \hat{P}_{\hat{Y}}[i, :] denotes the i-th row of the inverse sorting matrix; softmax() denotes the normalized exponential function; Ŷ denotes the masked prediction result; A_Ŷ denotes the matrix of absolute differences between pairs of probability values in Ŷ; i denotes the i-th row of the matrix; 1 is an all-ones column vector; τ₁ is a temperature parameter; and n denotes the number of nodes in a graph.
When τ₁ → 0⁺, the relaxed matrix converges to the exact permutation matrix:

\hat{P}_{\hat{Y}}[i, j] = \begin{cases} 1, & j = \pi_i \\ 0, & \text{otherwise} \end{cases}

where π denotes the indices that sort the positions of Ŷ in descending order, and π_i is the value at its i-th position.
The order of the original prediction result is adjusted with the inverse sorting matrix as:

\tilde{Y} = \hat{P}_{\hat{Y}} Y

where Ỹ denotes the original prediction result Y after order adjustment, \hat{P}_{\hat{Y}} denotes the inverse sorting matrix, and Y denotes the original prediction result.
S42: inputting the order-adjusted original prediction Ỹ into the Plackett-Luce model to calculate the ranking loss.
The Plackett-Luce model is a probability distribution model that describes distributions over orderings. It divides the computation of the ranking probability into multiple stages: in each stage, the probability that each remaining candidate is ranked first at the current stage is computed, and the best candidate of that stage is selected according to this probability; the process then moves on to the next stage and continues selecting until all stages are finished, with the stages independent of one another.
The ranking loss is calculated as:

L_{PL} = -\sum_{i=1}^{n} \log \frac{\exp(\tilde{Y}_i)}{\sum_{j=i}^{n} \exp(\tilde{Y}_j)}

where L_PL denotes the ranking loss and Ỹ_i denotes the value at the i-th position of the order-adjusted original prediction Ỹ.
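The sketch below computes this loss in Python/PyTorch; the softmax-style parameterization with exp is one common formulation of the Plackett-Luce negative log-likelihood and is our assumption, since the patent does not reproduce the derivation.

    import torch

    def plackett_luce_loss(y_tilde: torch.Tensor) -> torch.Tensor:
        """Negative log-likelihood of the order-adjusted prediction: stage i
        picks position i against all remaining positions j >= i."""
        rev = torch.flip(y_tilde, dims=[0])
        # suffix log-sum-exp: log sum_{j >= i} exp(y_tilde_j) for each i
        suffix_lse = torch.flip(torch.logcumsumexp(rev, dim=0), dims=[0])
        return -(y_tilde - suffix_lse).sum()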
S43: a value difference loss between the original prediction result and the masked prediction result is calculated.
The value difference loss is calculated as:

L_{diff} = \lVert f(A, X, W) - f(M \odot A, X, W) \rVert_1

where L_diff denotes the value difference loss, f(A, X, W) denotes the prediction result obtained when the original graph data is input into the graph neural network model, and f(M ⊙ A, X, W) denotes the prediction result obtained when the masked original graph data is input into the graph neural network model.
S44: obtaining the mask matrix, i.e., the importance score matrix of the edges, from the importance scores of each edge. To use as few important edges as possible in the interpretation, the mask matrix must be as sparse as possible; therefore, the first-order norm of the mask matrix is calculated and used as a sparsity constraint. The first-order norm of the mask matrix is calculated as:
L_M = \lVert M \rVert_1

where L_M denotes the first-order norm of the mask matrix M, and M denotes the mask matrix whose element in the p-th row and q-th column is the importance score m_pq of the corresponding edge.
S45: the total loss is calculated from the ranking loss, the value difference loss, and the first-order norm of the mask matrix.
The ranking loss, the value difference loss, and the first-order norm of the mask matrix are summed to obtain the total loss:

\mathcal{L} = L_{PL} + L_{diff} + L_M

where 𝓛 denotes the total loss, L_PL denotes the ranking loss, L_diff denotes the value difference loss, and L_M denotes the first-order norm of the mask matrix.
The ranking loss adopted by the invention ensures that the order of the original prediction result Y (containing the probabilities of the input graph being predicted as each category) and that of the masked prediction result Ŷ (containing the probabilities of the masked input graph being predicted as each category) remain consistent, while the value difference loss ensures that the probabilities of corresponding categories in the two prediction results are consistent, so that the predicted probability of every category is preserved before and after masking. In the prior art, only the original predicted probability of the correct category is kept consistent with the masked predicted probability, i.e., only one category of the prediction results is constrained while the others are not necessarily consistent. Optimizing the mask generator with the proposed loss function to explain the original graph data therefore ensures that the predicted probability of every category is consistent before and after masking, which more accurately mimics the prediction behavior of the original model and makes the interpretation result more accurate.
S5: optimizing the mask generator according to the total loss, and repeating steps S2 to S4 until the optimal mask generator is obtained.
The number of iterations T is set; steps S2 to S4 are executed repeatedly, the total loss is back-propagated, and the mask generator parameters are updated according to the total loss to optimize the generator; execution stops when the number of iterations reaches T, yielding the optimal mask generator.
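Putting the pieces together, the loop below is a hedged end-to-end sketch of steps S2-S5 in Python/PyTorch; gnn_model (the model f to be explained), mask_apply (which forms M ⊙ A), the data tensors A, X, x_hat, edge_index_hat, the learning rate, and T are placeholders for this illustration, not values from the patent.

    import torch

    mask_gen = MaskGenerator(in_dim=x_hat.size(1))
    optimizer = torch.optim.Adam(mask_gen.parameters(), lr=0.01)  # illustrative lr
    y_orig = gnn_model(A, X).detach().squeeze()    # S1: original prediction Y

    for t in range(T):                             # T: chosen iteration budget
        omega = mask_gen(x_hat, edge_index_hat)    # S22: hidden variables
        m = edge_importance(omega, tau2=0.5)       # S23: edge scores, mask M
        y_hat = gnn_model(mask_apply(A, m), X).squeeze()  # S3: masked prediction
        P = inverse_sort_matrix(y_hat, tau1=1.0)   # S41: inverse sorting matrix
        y_tilde = P @ y_orig                       # S41: order-adjusted Y
        loss = (plackett_luce_loss(y_tilde)        # L_PL: ranking loss
                + (y_orig - y_hat).abs().sum()     # L_diff: value difference
                + m.abs().sum())                   # L_M: mask sparsity
        optimizer.zero_grad()
        loss.backward()                            # back-propagate the total loss
        optimizer.step()                           # update the mask generator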
S6: masking the original graph data with the optimal mask generator to obtain the masked subgraph, i.e., the explanation of the original prediction result.
For example, as shown in FIG. 3, the darker part of the graph, "translating palin's book to crop representations", is the masked subgraph; that is, this part of the content is considered the important reason why the sentence is classified into the negative-sentiment category, and it constitutes the explanation of the classification result of the graph neural network model. For sentiment analysis, obtaining the interpretation results of the sentiment analysis model makes it possible to analyze the performance of the sentiment classification model and to adjust the model parameters so as to obtain more accurate sentiment analysis results.
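As a final hedged sketch of step S6, one way to extract the explanation subgraph from the optimal generator is to score the edges deterministically and keep the highest-scoring ones; the noise-free scoring, the 0.5 threshold (a top-k rule would work equally well), and the variable names are our assumptions.

    import torch

    with torch.no_grad():
        omega = mask_gen(x_hat, edge_index_hat)
        m = torch.sigmoid(omega / 0.5)      # deterministic scores, no Uniform noise
        keep = m > 0.5                      # illustrative threshold
    explanation_edge_index = edge_index[:, keep]   # masked subgraph = explanation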
It should be noted that, as persons skilled in the art will understand, all or part of the processes in the above method embodiments can be implemented by instructing the relevant hardware through a computer program; the program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above-mentioned embodiments further illustrate the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are only preferred embodiments of the present invention and should not be construed as limiting it; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the present invention shall be included in its protection scope.

Claims (10)

1. A graph neural network interpretation method that preserves prediction order and structural dependencies, comprising:
S1: acquiring original graph data, and inputting the original graph data into the graph neural network model to be explained to obtain an original prediction result; the original graph data is graph classification data for sentiment analysis, where each graph is a sentence, each node in the graph is a word, and the edges are the relations between words;
S2: masking the original graph data with a mask generator to obtain a masked subgraph;
S3: inputting the masked subgraph into the graph neural network model to be explained to obtain a masked prediction result;
S4: calculating a total loss from the original prediction result and the masked prediction result;
S5: optimizing the mask generator according to the total loss, and repeating steps S2-S4 until an optimal mask generator is obtained;
S6: masking the original graph data with the optimal mask generator to obtain a masked subgraph, which is the explanation of the original prediction result.
2. The graph neural network interpretation method preserving prediction order and structural dependencies according to claim 1, wherein the process of masking the original graph data with the mask generator comprises:
S21: converting the nodes of the original graph into edges of a new graph, converting the edges of the original graph into nodes of the new graph, and establishing an edge between two nodes of the new graph when a data flow exists between the corresponding original edges;
S22: inputting the new graph into a graph attention network model to obtain hidden variables;
S23: calculating the importance score of each edge from the hidden variables using a reparameterization trick for discrete distributions, and masking the edges of the original graph according to the importance scores to obtain the masked subgraph.
3. The graph neural network interpretation method preserving prediction order and structural dependencies according to claim 2, wherein the hidden variables are obtained as:

\Omega = g_{\phi}(\hat{A}, \hat{X})

where Ω denotes the set of hidden variables, g_φ denotes the mask-generator network model, Â denotes the adjacency matrix of the new graph, X̂ denotes the node features of the new graph, and φ denotes the parameters of the mask generator.
4. The graph neural network interpretation method preserving prediction order and structural dependencies according to claim 2, wherein the importance score of each edge is calculated as:

m_{pq} = C(\omega_i, \epsilon, \tau_2) = \sigma\big((\log\epsilon - \log(1 - \epsilon) + \omega_i)/\tau_2\big)

where m_pq denotes the importance score of the edge e_i connecting node p and node q, ω_i denotes the i-th hidden variable in the hidden-variable set, ε denotes the first hyperparameter, τ₂ denotes the second hyperparameter, and σ denotes the sigmoid function.
5. The graph neural network interpretation method preserving prediction order and structural dependencies according to claim 1, wherein the process of calculating the total loss comprises:
S41: calculating the inverse sorting matrix of the masked prediction result, and adjusting the order of the original prediction result according to the inverse sorting matrix;
S42: inputting the order-adjusted original prediction result into a Plackett-Luce model to calculate the ranking loss;
S43: calculating the value difference loss between the original prediction result and the masked prediction result;
S44: obtaining the mask matrix from the importance scores of all edges, and calculating the first-order norm of the mask matrix;
S45: calculating the total loss from the ranking loss, the value difference loss, and the first-order norm of the mask matrix.
6. The graph neural network interpretation method preserving prediction order and structural dependencies according to claim 5, wherein the inverse sorting matrix is calculated as:

\hat{P}_{\hat{Y}}[i, :] = \mathrm{softmax}\big(((n + 1 - 2i)\hat{Y} - A_{\hat{Y}}\mathbf{1}) / \tau_1\big)

where \hat{P}_{\hat{Y}}[i, :] denotes the i-th row of the inverse sorting matrix, softmax() denotes the normalized exponential function, Ŷ denotes the masked prediction result, A_Ŷ denotes the matrix of absolute differences between pairs of probability values in Ŷ, i denotes the i-th row of the matrix, 1 is an all-ones column vector, τ₁ is a temperature parameter, and n denotes the number of nodes in a graph.
7. The graph neural network interpretation method preserving prediction order and structural dependencies according to claim 5, wherein the order of the original prediction result is adjusted as:

\tilde{Y} = \hat{P}_{\hat{Y}} Y

where Ỹ denotes the original prediction result Y after order adjustment, \hat{P}_{\hat{Y}} denotes the inverse sorting matrix, and Y denotes the original prediction result.
8. The graph neural network interpretation method preserving prediction order and structural dependencies according to claim 5, wherein the ranking loss is calculated as:

L_{PL} = -\sum_{i=1}^{n} \log \frac{\exp(\tilde{Y}_i)}{\sum_{j=i}^{n} \exp(\tilde{Y}_j)}

where L_PL denotes the ranking loss and Ỹ_i denotes the value at the i-th position of the order-adjusted original prediction Ỹ.
9. The graph neural network interpretation method preserving prediction order and structural dependencies according to claim 5, wherein the value difference loss is calculated as:

L_{diff} = \lVert f(A, X, W) - f(M \odot A, X, W) \rVert_1

where L_diff denotes the value difference loss, f(A, X, W) denotes the prediction result obtained when the original graph data is input into the graph neural network model, and f(M ⊙ A, X, W) denotes the prediction result obtained when the masked original graph data is input into the graph neural network model.
10. The graph neural network interpretation method preserving prediction order and structural dependencies according to claim 5, wherein the total loss is calculated as:

\mathcal{L} = L_{PL} + L_{diff} + L_M

where 𝓛 denotes the total loss, L_PL denotes the ranking loss, L_diff denotes the value difference loss, and L_M denotes the first-order norm of the mask matrix.
CN202210884170.5A 2022-07-25 2022-07-25 Graph neural network interpretation method preserving prediction order and structural dependencies Pending CN115169521A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210884170.5A 2022-07-25 2022-07-25 Graph neural network interpretation method preserving prediction order and structural dependencies

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210884170.5A 2022-07-25 2022-07-25 Graph neural network interpretation method preserving prediction order and structural dependencies

Publications (1)

Publication Number Publication Date
CN115169521A 2022-10-11

Family

ID=83497975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210884170.5A Graph neural network interpretation method preserving prediction order and structural dependencies 2022-07-25 2022-07-25

Country Status (1)

Country Link
CN (1) CN115169521A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546612A (en) * 2022-11-30 2022-12-30 中国科学技术大学 Image interpretation method and device combining graph data and graph neural network



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination