CN114896400A - Graph neural network text classification method based on regular constraint - Google Patents
Graph neural network text classification method based on regular constraint Download PDFInfo
- Publication number
- CN114896400A (application CN202210532864.2A)
- Authority
- CN
- China
- Prior art keywords
- edge
- attention
- text
- neural network
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a graph neural network text classification method based on regularization constraints, which belongs to the field of natural language processing and comprises the following steps. Graph construction: build the text graph with the TextING construction method, add semantic edges and syntactic edges, define the different edge types, initialize the edge-type features Ec, and input them into the graph neural network for training. Word interaction based on the graph neural network: a GAT with diverse regularization constraints assigns different attention weights to neighborhood nodes to filter edge-noise information and guides the attention score distributions to reduce overlap. Text representation: aggregate the word-node features into a document representation through max pooling and average pooling, obtain the classification result of the text from the document representation, and define a loss function to constrain the node-feature update process. The invention enriches the syntactic and semantic relevance among words, improves long-distance and non-consecutive word interaction, and strengthens the expressive power of the model.
Description
Technical Field
The invention belongs to the field of natural language processing and relates to a graph neural network text classification method based on regularization constraints.
Background
Text classification is the fundamental technical support for most natural language processing tasks. Against the background of information explosion, the workload of manually managing and classifying textual resources is enormous; text classification with deep learning enables efficient, rapid management of massive text information and improves information retrieval efficiency.
The key to text classification is mining contextual information to obtain an accurate semantic representation. Neural networks represented by TextCNN and TextRNN, although able to mine text semantics quickly and efficiently, lack long-distance and non-consecutive word interactions. Recently, graph neural networks have been proposed to solve this problem: Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs) follow the neighborhood-aggregation paradigm, can model the sequential and syntactic structure of text, and flexibly capture the relationships among words, sentences, and documents. For example, TextGCN builds a corpus-level text graph and uses a GCN to cast text classification as a semi-supervised node classification task; on this basis, Text-Level GNN introduces a message-passing mechanism to reduce TextGCN's memory consumption. However, these transductive-learning methods are computationally inefficient. TextING and HyperGAT instead construct a separate text graph for each document and use a GNN to capture higher-order contextual information of words, so both support effective inductive learning. Later, DADGNN enlarged the receptive field of nodes through a diffusion mechanism and decoupled the GNN propagation process.
The existing text classification methods have the following defects. (1) The edge type is single: words update their semantic representations only from adjacency neighbors, lacking syntactic and semantic information about the text. Moreover, although different edge types carry rich information, most current models do not exploit them, and the missing edge information can strongly affect the overall tendency of a text. (2) The interference of noise from nodes and edges in the graph structure is ignored; in addition, as the number of graph iterations increases, noise information multiplies and classification performance drops sharply.
Disclosure of Invention
In view of the above, the present invention provides a graph neural network text classification method based on regularization constraints, which addresses the insufficient classification performance caused by single edge types and noise interference in graph-based text classification models.
In order to achieve the purpose, the invention provides the following technical scheme:
A graph neural network text classification method based on regularization constraints includes the following steps:
Graph construction: build the text graph with the TextING construction method, add semantic edges and syntactic edges, define the different edge types, initialize the edge-type features Ec, and input them into the graph neural network for training;
Word interaction based on the graph neural network: a GAT with diverse regularization constraints assigns different attention weights to neighborhood nodes to filter edge-noise information and guides the attention score distributions to reduce overlap;
Text representation: aggregate the word-node features into a document representation through max pooling and average pooling, obtain the classification result of the text from the document representation, and define a loss function to constrain the node-feature update process.
Further, the graph construction specifically includes:
S11: add semantic edges on top of the text graph G = (V, E) built from adjacency alone, to capture higher-order correlations between words and topic words. First, a topic model (LDA) mines latent topics T from the text; each topic T_i = (θ_1, …, θ_v) is represented by a probability distribution over words, where v is the vocabulary size. The top-N highest-probability words in a text sample are connected to the corresponding topic T_i, yielding topic-related edges;
S12: model the syntactic relations among words in the text sequence with SpaCy; if a syntactic relation exists between two words, a syntactic edge is built between them;
S13: define the different edge types: adjacency, semantic, adjacency-semantic, syntactic, adjacency-syntactic, semantic-syntactic, and adjacency-semantic-syntactic, numbered edge 1 through edge 7 respectively, and initialize them as seven distinct edge-type features Ec.
Further, the word interaction based on the graph neural network specifically includes: assigning different attention weights to neighborhood nodes via multi-head attention with diverse regularization terms to filter noise information. For each input text, the node features are h = {h_1, h_2, …, h_|V|}, h_i ∈ R^d, where d is the feature dimension of each node; a shared linear transformation W_1 ∈ R^{d×d′} and an attention mechanism are applied to each node, and the attention coefficient e_ij is obtained by the following formula:
where the weight vector a ∈ R^{2d′}, T denotes transposition, and ‖ denotes vector concatenation;
the coefficients are normalized with a Softmax function, giving the attention score α_ij:
the attention scores are linearly combined with the node features to give the final output feature of each node;
a single attention head is expanded into multi-head attention, and the outputs of K attention heads are concatenated as the multi-head attention output:
after merging the edge types, the multi-head attention formula is updated as follows:
diverse regularization terms are applied across the attention heads to encourage the attention score distributions to reduce overlap, so that the heads capture more diverse information, where the diverse regularization term is:
where ‖·‖_2 denotes the L2 norm.
Further, the text representation specifically includes:
After the word-node information is fully updated, average pooling and max pooling are applied to the node features, which are aggregated into the text representation to produce the final prediction:
The pooled features are passed into a Softmax layer to predict the text label:
Finally, the node-feature update process is constrained by minimizing the objective loss function:
where λ is the regularization coefficient.
The beneficial effects of the invention are as follows. In the text-graph construction stage, syntactic and semantic edges are added on top of the adjacency matrix, enriching the syntactic and semantic relevance among words and improving long-distance and non-consecutive word interaction. In addition, the different edge types are modeled and encoded as distinct features that are fed into the GAT when computing inter-node attention coefficients, improving the model's expressive power. After the text graph is constructed, a GAT performs word interaction; since a plain GAT lacks effective control over its different attention heads, a GAT with diverse regularization terms is introduced to assign different attention weights to neighborhood nodes to filter noise information, encouraging the attention score distributions to reduce overlap so that the heads capture more diverse information.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a diagram of the graph neural network text classification method based on regularization constraints;
FIG. 2 shows an example syntactic parsing result.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are provided to illustrate the invention only and are not intended to limit it; to better illustrate the embodiments, some parts of the drawings may be omitted, enlarged, or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
Referring to fig. 1, the present invention provides a graph neural network text classification method based on regular constraint, which includes three parts:
(1) Graph construction: the invention adopts the TextING construction method, using a sliding window to build a text graph G = (V, E) for each individual text, where V is the vertex set representing the words in the text and E is the edge set representing the adjacency relations between words.
Specifically, the invention adds semantic edges on top of the text graph G = (V, E) built from adjacency alone, to capture higher-order correlations between words and topic words. First, a topic model (LDA) mines latent topics T from the text; each topic T_i = (θ_1, …, θ_v) is represented by a probability distribution over words, where v is the vocabulary size, and the top-N highest-probability words in a text sample are connected to the corresponding topic T_i. These topic-related edges enrich the higher-order contextual semantics of the words in each text.
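As a concrete sketch of this step (illustrative only: the patent gives no code, the input format mimics what an LDA implementation such as gensim would yield, and linking a topic's top words pairwise is one interpretation of "connecting the top-N words to the corresponding topic T_i"):

```python
def topic_edges(topic_word_probs, doc_words, top_n=3):
    """Build semantic (topic-related) edges for one document.

    topic_word_probs: {topic_id: {word: probability}} -- assumed format,
        e.g. from an LDA model's per-topic word distribution theta.
    doc_words: the words of one text sample.
    Returns a set of undirected (word, word) semantic edges linking the
    top-N topic words that occur in this document.
    """
    edges = set()
    present = set(doc_words)
    for t_id, dist in topic_word_probs.items():
        # top-N highest-probability words of topic t_id, kept only if present
        top = [w for w, _ in sorted(dist.items(), key=lambda kv: -kv[1])][:top_n]
        top = [w for w in top if w in present]
        # connect the topic's top words pairwise via their shared topic
        for i in range(len(top)):
            for j in range(i + 1, len(top)):
                edges.add(tuple(sorted((top[i], top[j]))))
    return edges

# toy topic distribution echoing the "animal" example in the description
topics = {0: {"animal": 0.4, "tiger": 0.3, "state": 0.1, "name": 0.05}}
doc = ["animal", "name", "tiger", "state"]
sem = topic_edges(topics, doc, top_n=3)
```

In the toy run, "animal", "tiger", and "state" are the topic's top-3 words, so the three pairwise semantic edges among them are created.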
In addition, the invention uses SpaCy to model the syntactic relations among words in the text sequence, including subject-verb (SBV), verb-object (VOB), and indirect-object (IOB) relations; if a syntactic relation exists between two words, a syntactic edge is built between them. The syntactic structure effectively reveals the syntactic dependencies among words in the text, and enriching these dependencies improves long-distance word interaction.
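A minimal sketch of the syntactic-edge step (spaCy itself is not invoked here; the hand-written dependency triples stand in for what a dependency parser such as spaCy's would produce):

```python
def syntax_edges(dep_triples):
    """Build syntactic edges from dependency-parse output.

    dep_triples: (head_word, relation, dependent_word) tuples, as a
    dependency parser would yield; any syntactic relation between two
    words creates one undirected edge between them.
    """
    return {tuple(sorted((h, d))) for h, _rel, d in dep_triples}

# hypothetical parse fragment (illustrative triples, not real spaCy output)
triples = [("know", "nsubj", "filmmakers"),
           ("filmmakers", "det", "the"),
           ("know", "xcomp", "please")]
edges = syntax_edges(triples)
```

Each relation type could also be kept on the edge if downstream modeling needs it; here only the existence of a syntactic link matters, matching step S12.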
Thus there are seven edge types: adjacency, semantic, adjacency-semantic, syntactic, adjacency-syntactic, semantic-syntactic, and adjacency-semantic-syntactic, defined as edge 1 through edge 7 respectively and initialized as seven distinct edge-type features Ec; Ec is trained during the network update process.
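The seven edge types follow the binary combinations of the three base relations, so the type index can be computed as a 3-bit flag. A sketch (the feature dimension and random initialization of Ec are illustrative stand-ins for trainable embeddings):

```python
import random

def edge_type(adjacent, semantic, syntactic):
    """Map the three base relations to the patent's edge types 1-7:
    adjacency=1, semantic=2, adjacency-semantic=3, syntactic=4,
    adjacency-syntactic=5, semantic-syntactic=6, all three=7."""
    t = (1 if adjacent else 0) | (2 if semantic else 0) | (4 if syntactic else 0)
    assert t > 0, "an edge must carry at least one relation"
    return t

# initialize seven edge-type feature vectors Ec (dimension 8 assumed);
# in the actual model these would be trainable parameters
random.seed(0)
Ec = {t: [random.uniform(-0.1, 0.1) for _ in range(8)] for t in range(1, 8)}
```

The bit encoding reproduces the numbering in S13 exactly, e.g. an edge that is both adjacency and syntactic gets type 1 | 4 = 5.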
For example, for the sentence "animal name given tiger louisiana state unity?", the word adjacency relations are shown in Table 1:
TABLE 1
LDA is used for topic-word extraction; "animal" is one of the topic words of the corpus, so semantic edges are established between "animal" and all words in the example. The example is parsed with SpaCy, and the results are shown in FIG. 2.
Therefore, the relationships between words after adding the syntactic and semantic edges on top of the adjacency relations are shown in Table 2:
TABLE 2
(2) GNN-based word interaction. Noise from edges and nodes in the text-graph structure degrades the classification performance of the model. For example, in the sentence "The filmmakers know to please the eye, but it is not always able to ...", TextING constructs an "eye"-"but" adjacency edge, so "eye" updates its feature from its neighbor "but"; yet the correlation between the central node and this neighbor is low, and further aggregation over such noisy edges can damage model performance. The invention therefore uses GAT to assign different attention weights to neighborhood nodes to filter edge-noise information. In practice, however, the features extracted by multiple attention heads turn out to be quite similar. To constrain different heads to capture the features of different representation subspaces, the invention uses a GAT with diverse regularization constraints to guide the attention score distributions to reduce overlap, which both filters noise effectively and improves the model's ability to learn contextual semantic representations of word nodes.
Specifically, the invention filters noise information by using multi-head attention with diverse regularization terms to assign different attention weights to neighborhood nodes. For each input text, the node features are h = {h_1, h_2, …, h_|V|}, h_i ∈ R^d, where d is the feature dimension of each node; a shared linear transformation W_1 ∈ R^{d×d′} and an attention mechanism are applied to each node, and the attention coefficient e_ij is obtained by equation (1):
In equation (1), the weight vector a ∈ R^{2d′}, T denotes transposition, and ‖ denotes vector concatenation.
To make the coefficients comparable across different nodes, they are normalized with the Softmax function, so the attention score α_ij is obtained from equation (2):
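Equations (1) and (2) are not reproduced in this text (they appear as images in the original patent). Assuming the patent follows the standard GAT formulation, a plausible reconstruction consistent with the symbols defined above (W_1, a, transposition, concatenation, Softmax) is:

```latex
e_{ij} = \mathrm{LeakyReLU}\!\left( a^{T} \left[ W_1 h_i \,\|\, W_1 h_j \right] \right) \tag{1}

\alpha_{ij} = \operatorname{softmax}_j\!\left(e_{ij}\right)
            = \frac{\exp\!\left(e_{ij}\right)}{\sum_{k \in \mathcal{N}_i} \exp\!\left(e_{ik}\right)} \tag{2}
```

where N_i denotes the neighborhood of node i.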
the attention score is then linearly combined with the node features as the final output feature for each node. In order to learn richer features and stabilize the learning process of attention, a single attention head is expanded to multi-head attention, and the outputs of K attention heads are spliced together to be used as the output of multi-head attention:
after merging the edge types, the multi-head attention formula (3) is updated to formula (4):
although GAT assigns trainable, fine-grained weights to each node neighbor that can filter noise information to some extent, current research shows: the ability to capture different features simply using multiple heads of attention is difficult to guarantee. In specific practice it will be found that: the features extracted by a plurality of attention heads are relatively consistent. In order to restrict different attention heads from capturing the characteristics of different characterization subspace information, the invention uses various regular terms among the attention heads to encourage the attention score distribution to reduce the overlapping, so that the attention heads capture more different information. The multiple regularization term is shown in equation (5):
in the formula (5), | · non-woven phosphor 2 Representing the L2 norm.
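Equation (5) itself is absent from this text. A common head-diversity penalty consistent with the description and the stated L2-norm notation — again an assumption, not the patent's exact formula — is an orthogonality regularizer over the matrix A that stacks the K per-head attention score vectors:

```latex
\Omega = \left\| A A^{T} - I \right\|_{2}, \qquad
A = \begin{bmatrix} \alpha^{(1)} \\ \vdots \\ \alpha^{(K)} \end{bmatrix} \tag{5}
```

Minimizing Ω pushes the score distributions of different heads toward orthogonality, i.e. toward less overlap.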
(3) Text representation. The word-node features are aggregated into a document representation through max pooling and average pooling; the classification result of the text is obtained from this representation, and a new loss function is defined to constrain the node-feature update process, fully accounting for the diverse regularization constraints on the GAT.
After the word-node information is fully updated, average pooling and max pooling are applied to the node features, which are aggregated into the text representation to produce the final prediction:
The pooled features are passed into a Softmax layer to predict the text label:
finally, the update process of the node features is constrained by minimizing the objective loss function, as shown in equation (8):
in equation (8), λ is a regular term coefficient.
The datasets of this embodiment are: 1) the sentiment classification datasets MR, SST1, and SST2; 2) the Reuters news classification datasets R8 and R52; 3) the TREC question classification dataset. The statistics of the datasets are shown in Table 3. The training set is randomly split 9:1 into training data and validation data for the experiments.
TABLE 3
The evaluation metric adopted in this embodiment is accuracy, computed as in equation (10):
Accuracy = (T_p + T_n) / (T_p + T_n + F_p + F_n)
In equation (10), T_p, F_p, T_n, and F_n denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.
Experimental parameter settings: word vectors are initialized with 300-dimensional GloVe embeddings. In preprocessing, stop words and punctuation in the text are filtered out, and the 10% of words with the lowest TF-IDF values are removed to eliminate the noise influence of common words. Specific parameter settings are shown in Table 4.
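A sketch of the TF-IDF filtering step (illustrative, not the patent's exact code: the smoothed IDF formula and corpus-level TF scoring are assumptions):

```python
import math
from collections import Counter

def tfidf_filter(docs, drop_ratio=0.10):
    """Compute a corpus-level TF-IDF score per vocabulary word and drop
    the lowest `drop_ratio` fraction, removing noisy common words."""
    n_docs = len(docs)
    tf = Counter()  # total term frequency across the corpus
    df = Counter()  # document frequency
    for doc in docs:
        tf.update(doc)
        df.update(set(doc))
    scores = {w: tf[w] * math.log((1 + n_docs) / (1 + df[w])) for w in tf}
    ranked = sorted(scores, key=scores.get)  # lowest TF-IDF first
    dropped = set(ranked[:int(len(ranked) * drop_ratio)])
    return [[w for w in doc if w not in dropped] for doc in docs]

docs = [["the", "cat", "sat"], ["the", "dog", "ran"],
        ["the", "cat", "ran"], ["a", "bird", "flew"]]
clean = tfidf_filter(docs, drop_ratio=0.25)
```

On the toy corpus, the ubiquitous word "the" gets the lowest TF-IDF score and is removed from every document.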
TABLE 4
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.
Claims (4)
1. A graph neural network text classification method based on regularization constraints, characterized by comprising the following steps:
graph construction: build the text graph with the TextING construction method, add semantic edges and syntactic edges, define the different edge types, initialize the edge-type features Ec, and input them into the graph neural network for training;
word interaction based on the graph neural network: a GAT with diverse regularization constraints assigns different attention weights to neighborhood nodes to filter edge-noise information and guides the attention score distributions to reduce overlap;
text representation: aggregate the word-node features into a document representation through max pooling and average pooling, obtain the classification result of the text from the document representation, and define a loss function to constrain the node-feature update process.
2. The graph neural network text classification method based on regularization constraints according to claim 1, characterized in that the graph construction specifically includes:
S11: add semantic edges on top of the text graph G = (V, E) built from adjacency alone, to capture higher-order correlations between words and topic words; first, a topic model (LDA) mines latent topics T from the text, and each topic T_i = (θ_1, …, θ_v) is represented by a probability distribution over words, where v is the vocabulary size; the top-N highest-probability words in the text sample are connected to the corresponding topic T_i, yielding topic-related edges;
S12: model the syntactic relations among words in the text sequence with SpaCy; if a syntactic relation exists between two words, build a syntactic edge between them;
S13: define the different edge types: adjacency, semantic, adjacency-semantic, syntactic, adjacency-syntactic, semantic-syntactic, and adjacency-semantic-syntactic, numbered edge 1 through edge 7 respectively, and initialize them as seven distinct edge-type features Ec.
3. The graph neural network text classification method based on regularization constraints according to claim 1, characterized in that the word interaction based on the graph neural network specifically includes: assigning different attention weights to neighborhood nodes via multi-head attention with diverse regularization terms to filter noise information; for each input text, the node features are h = {h_1, h_2, …, h_|V|}, h_i ∈ R^d, where d is the feature dimension of each node; a shared linear transformation W_1 ∈ R^{d×d′} and an attention mechanism are applied to each node, and the attention coefficient e_ij is obtained by the following formula:
where the weight vector a ∈ R^{2d′}, T denotes transposition, and ‖ denotes vector concatenation;
the coefficients are normalized with a Softmax function, giving the attention score α_ij:
the attention scores are linearly combined with the node features as each node's final output feature;
the single attention head is expanded into multi-head attention, and the outputs of K attention heads are concatenated as the multi-head attention output:
after merging the edge types, the multi-head attention formula is updated as follows:
diverse regularization terms are applied across the attention heads to encourage the attention score distributions to reduce overlap, so that the heads capture more diverse information, where the diverse regularization term is:
where ‖·‖_2 denotes the L2 norm.
4. The graph neural network text classification method based on regularization constraints according to claim 1, characterized in that the text representation specifically includes:
after the word-node information is fully updated, average pooling and max pooling are applied to the node features, which are aggregated into the text representation to produce the final prediction:
the pooled features are passed into a Softmax layer to predict the text label:
finally, the node-feature update process is constrained by minimizing the objective loss function:
where λ is the regularization coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210532864.2A CN114896400B (en) | 2022-05-11 | 2022-05-11 | Graph neural network text classification method based on regular constraint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114896400A true CN114896400A (en) | 2022-08-12 |
CN114896400B CN114896400B (en) | 2024-06-21 |
Family
ID=82723321
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210532864.2A Active CN114896400B (en) | 2022-05-11 | 2022-05-11 | Graph neural network text classification method based on regular constraint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114896400B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA3061717A1 (en) * | 2018-11-16 | 2020-05-16 | Royal Bank Of Canada | System and method for a convolutional neural network for multi-label classification with partial annotations |
CN112269874A (en) * | 2020-10-10 | 2021-01-26 | 北京物资学院 | Text classification method and system |
CN112667818A (en) * | 2021-01-04 | 2021-04-16 | 福州大学 | GCN and multi-granularity attention fused user comment sentiment analysis method and system |
CN112711953A (en) * | 2021-01-19 | 2021-04-27 | 湖南大学 | Text multi-label classification method and system based on attention mechanism and GCN |
CN113254648A (en) * | 2021-06-22 | 2021-08-13 | 暨南大学 | Text emotion analysis method based on multilevel graph pooling |
US20210400059A1 (en) * | 2020-06-22 | 2021-12-23 | Wangsu Science & Technology Co., Ltd. | Network attack detection method, system and device based on graph neural network |
CN114186062A (en) * | 2021-12-13 | 2022-03-15 | 安徽大学 | Text classification method based on graph neural network topic model |
2022-05-11: Priority to CN202210532864.2A; patent CN114896400B (Active)
Non-Patent Citations (3)
Title |
---|
ANKIT PAL et al.: "Multi-label text classification using attention-based graph neural network", arXiv preprint arXiv:2003.11644, 22 March 2020 (2020-03-22), pages 1 - 12 * |
GAN Ling et al.: "Hierarchical affine graph neural network text classification model based on regularization constraints", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), vol. 35, no. 04, 15 August 2023 (2023-08-15), pages 715 - 721 * |
YUAN Ziyong et al.: "Few-shot short text classification method based on heterogeneous graph convolutional networks", Computer Engineering, vol. 47, no. 12, 16 December 2020 (2020-12-16), pages 87 - 94 * |
Also Published As
Publication number | Publication date |
---|---|
CN114896400B (en) | 2024-06-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108573411B (en) | Mixed recommendation method based on deep emotion analysis and multi-source recommendation view fusion of user comments | |
CN108255805B (en) | Public opinion analysis method and device, storage medium and electronic equipment | |
Zhang et al. | Convolutional multi-head self-attention on memory for aspect sentiment classification | |
CN107766324B (en) | Text consistency analysis method based on deep neural network | |
US10740678B2 (en) | Concept hierarchies | |
US10831796B2 (en) | Tone optimization for digital content | |
CN108681557B (en) | Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint | |
CN108038492A (en) | A kind of perceptual term vector and sensibility classification method based on deep learning | |
US20170169355A1 (en) | Ground Truth Improvement Via Machine Learned Similar Passage Detection | |
Nagamanjula et al. | A novel framework based on bi-objective optimization and LAN2FIS for Twitter sentiment analysis | |
CN111274790A (en) | Chapter-level event embedding method and device based on syntactic dependency graph | |
JP6729095B2 (en) | Information processing device and program | |
Lu | Semi-supervised microblog sentiment analysis using social relation and text similarity | |
CN111460158B (en) | Microblog topic public emotion prediction method based on emotion analysis | |
CN113392651A (en) | Training word weight model, and method, device, equipment and medium for extracting core words | |
CN105956158A (en) | Automatic extraction method of network neologism on the basis of mass microblog texts and use information | |
CN114742071A (en) | Chinese cross-language viewpoint object recognition and analysis method based on graph neural network | |
Kathuria et al. | AOH-Senti: aspect-oriented hybrid approach to sentiment analysis of students’ feedback | |
WO2022073341A1 (en) | Disease entity matching method and apparatus based on voice semantics, and computer device | |
CN104484437A (en) | Network brief comment sentiment mining method | |
CN116402166B (en) | Training method and device of prediction model, electronic equipment and storage medium | |
Zhu et al. | Causality extraction model based on two-stage GCN | |
CN111382333A (en) | Case element extraction method in news text sentence based on case correlation joint learning and graph convolution | |
CN114896400A (en) | Graph neural network text classification method based on regular constraint | |
CN115293479A (en) | Public opinion analysis workflow system and method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||