CN116304061A

CN116304061A - Text classification method, device and medium based on hierarchical text graph structure learning

Info

Publication number: CN116304061A
Application number: CN202310551919.9A
Authority: CN
Inventors: 龙军; 王子冬; 杨柳; 陈庭轩; 黄金彩
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2023-05-17
Filing date: 2023-05-17
Publication date: 2023-06-23
Anticipated expiration: 2043-05-17
Also published as: CN116304061B

Abstract

The invention discloses a text classification method, a device and a medium based on hierarchical text graph structure learning, wherein the method comprises the following steps: step S1: preprocessing the training set text according to three linguistic features to obtain three graph structure matrixes; step S2: performing edge level diagram structure learning to obtain three types of edge vectors; step S3: removing redundancy to obtain three text edge vectors; step S4: weighted summation is carried out to obtain a text graph structural representation; step S5: processing by adopting a graph convolution neural network, and generating graph-level text representation through a graph pooling layer; step S6: and carrying out softmax classification, wherein the class with the highest probability is the final classification result. The method has the advantages that three linguistic features are adopted to preprocess the training set text, and the text classification problem is converted into the graph classification problem; according to the invention, through multi-granularity graph structure learning, different graph structures are integrated, so that the semantic loss of the graph structure in the subsequent learning process is prevented.

Description

Text classification method, device and medium based on hierarchical text graph structure learning

Technical Field

The invention relates to the field of natural language processing, in particular to a text classification method, device and medium based on hierarchical text graph structure learning.

Background

The text classification is used as a basic technology in the field of natural language processing and is widely applied to realistic scenes such as knowledge question-answering, emotion analysis and the like. Currently, with the development of deep learning, the graphic neural network has made remarkable progress in text classification. However, how to graphically represent text is a difficulty. Existing methods of representing text using drawings do not take into account the accuracy and integrity of the original text drawing. In the diagram construction stage, due to errors existing in the algorithm, errors are likely to exist in the text diagram constructed by using methods such as entity/relation extraction and the like, so that the edges of the errors in the diagram are sparse or redundant, and the performance of a subsequent text classification task is affected. And limited by the priori knowledge of humans, the predefined graph structure only carries part of the information of the system, which prevents understanding of the underlying mechanism of how edges in the graph affect subsequent tasks, thereby limiting the application of the graph method in text classification.

In view of the foregoing, there is an urgent need for a text classification method, apparatus, and medium based on hierarchical text graph structure learning to solve the problems in the prior art.

Disclosure of Invention

The invention aims to overcome the defects of the existing text classification technology, and provides a text classification method, a text classification device and a text classification medium based on hierarchical text graph structure learning, so that updating and error correction of the text graph structure are realized, and the accuracy and the robustness of text graph classification are improved. The method provided by the invention adopts a local to global view angle to learn the graph structure hierarchically, thereby enriching the structural representation of the text graph, reducing the error introduced by the initial graph structure and modeling the relationship among nodes in fine granularity, and the specific technical scheme is as follows:

the text classification method based on hierarchical text graph structure learning comprises the following steps:

step S1: inputting and preprocessing the training set text to be classified according to three different linguistic characteristics to obtain node sets and edge sets of the three training set text, namely three graph structure matrixes; the three linguistic features are a text co-occurrence diagram, a text grammar diagram and a text semantic diagram respectively;

step S2: adopting a characteristic representation model based on edge level graph structure learning to process the three node sets and the graph structures of the edge sets to obtain three edge vectors;

step S3: removing redundancy of the three types of edge vectors according to the measurement standard of mutual information to obtain three types of text edge vectors;

step S4: carrying out weighted summation on the three text edge vectors to obtain text graph structural representation;

step S5: processing the text graph structural representation obtained in the step S4 and text semantic features corresponding to the text graph structural representation by adopting a graph convolution neural network, and generating graph-level text representation through a graph pooling layer;

step S6: and (5) carrying out softmax classification on the graph-level text representation obtained in the step (S5), and taking the category with the highest probability as a final classification result.

Preferably, in step S1, the text co-occurrence diagram construction mode specifically includes: will be in text

Is->

Expressed as text co-occurrence diagram->

Node->

The edge weight between any two word nodes in the graph adopts the point-to-point information of the word nodes

The edge weight expression for the text co-occurrence graph is represented as follows:

；

wherein,,

edge weight representing text co-occurrence graph, +.>

Representation word node->

He word node->

Is a piece of dot mutual information.

Preferably, in step S1, the text grammar map construction mode specifically includes: extracting text using parsing tools

Is->

Syntax dependency of->

Generating a relation triplet->

Use +.>

As nodes of the text grammar, the dependency relationships are used as edges among the nodes, the edge weights are expressed by using the frequencies of the dependency relationships in the data set, and the edge weight expression of the text grammar is as follows:

；

wherein,,

edge weights representing text grammar map, +.>

Representing the number of times two words have syntactic dependencies in all sentences of the corpus, ++>

Representing the number of times two words are present in the same sentence in all sentences of the corpus.

Preferably, in step S1, the text semantic graph construction method specifically includes: encoding text using BERT model

Arbitrary word +.>

Obtaining a feature vector->

Using cosine similarity to calculate semantic similarity between feature vectors, if the semantic similarity is greater than a set threshold +.>

Then it is indicated that there is a semantic relationship to the word, and the edge weight expression of the text semantic graph is as follows:

；

wherein,,

edge weights representing text semantic graphs, +.>

Representing the number of times two words have semantic relations in all sentences of the corpus, +.>

Preferably, in step S2, the process of learning the graph structure specifically includes: giving confidence to the graph structure matrix, and optimizing the graph structure matrix based on the confidence; using Laplace regularization to restrict the characteristics of the nodes, and using the characteristics as likelihood functions of Bayesian estimation; setting a learning process of a priori function constraint adjacency matrix; combining the likelihood function and the prior function, and restraining the adjacency matrix of the learned graph through a Bayesian estimation framework;

the above optimization and constraint are performed on the three text graphs respectively, and the final loss function expression is as follows:

；

wherein,,

loss function representing graph structure learning at edge level, +.>

Representation->

Constraint function for constraining the adjacency matrix of +.>

Representing the structure of a learned text semantic graph, +.>

Representing a learned text dependency graph structure, +.>

Representing the learned text co-occurrence graph structure.

Preferably, in step S3, the redundancy elimination process is specifically: three text graph structure feature mappings generated in graph structure learning of opposite side levels of a graph convolution neural network are used to obtain mapped feature vectors, mutual information of different nodes in the same text graph is maximized, mutual information of nodes in different text graphs is minimized, estimation is carried out on the three text graphs based on the mutual information, and an optimization objective function is as follows:

；

wherein,,

representing an optimized objective function>

Representation->

And->

Mutual information estimation between +.>

Side vector representing text semantic graph, +.>

Side vector representing text dependency graph, +.>

An edge vector representing a text co-line graph.

Preferably, in step S4, the edge vectors of the three text graphs are weighted and summed to obtain the final optimized graph structure

The expression is as follows:

；

wherein,,

。

preferably, in step S5, the process of generating the graph-level text representation specifically includes: processing the final optimized text graph structure in step S4 using a graph convolution neural network

And its features, update text semantic features +.>

For->

And carrying out global pooling processing to obtain a graph-level text representation, wherein the expression is as follows:

；

wherein,,

representation of the text at the representation level->

For node->

Is characterized by->

Representing global pooling, ->

A set of nodes representing a text graph.

In addition, the invention also discloses a text classification device based on hierarchical text graph structure learning, which comprises:

at least one processor;

and a memory communicatively coupled to the at least one processor;

wherein the memory is used for storing a computer program;

the processor is configured to implement the text classification method as described above when executing the computer program.

In addition, the invention also discloses a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the text classification method when being executed by a processor.

The technical scheme of the invention has the following beneficial effects:

(1) The invention adopts three different methods to extract three linguistic features of the text, describes the characteristics of the text from different aspects, carries out pretreatment on the three linguistic features, converts the text classification problem into the graph classification problem, converts words into nodes in the graph, and converts the relationship between the words into edges in the graph.

(2) The invention adopts the graph structure learning of the edge level and the graph level to learn the graph structures of the three text graphs with different granularities, the edge level and the graph level capture the structural characteristics of the graph from a thin granularity and a thick granularity respectively, and the confidence is given to each edge for learning the characteristic relation among the nodes with fine granularity. For the graph level, the graph structures of multiple sources are adaptively integrated to obtain an optimal combination mode of the graph structures of different sources, and the graph structures of different semantic features are integrated on the basis of removing repeated information through multi-granularity graph structure learning, so that the occurrence of the phenomenon of semantic loss of the graph structures in the subsequent learning process is prevented, and the model performance is further improved.

In addition to the objects, features and advantages described above, the present invention has other objects, features and advantages. The present invention will be described in further detail with reference to the drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:

FIG. 1 is a flow chart of the steps of a text classification method in a preferred embodiment of the invention;

FIG. 2 is a text co-occurrence diagram of a simulation experiment in a preferred embodiment of the present invention;

FIG. 3 is a text semantic graph of a simulation experiment in a preferred embodiment of the present invention;

FIG. 4 is a text grammar diagram of a simulation experiment in a preferred embodiment of the present invention;

fig. 5 is a final text diagram of a simulation experiment in a preferred embodiment of the present invention.

Detailed Description

Embodiments of the invention are described in detail below with reference to the attached drawings, but the invention can be implemented in a number of different ways, which are defined and covered by the claims.

Examples:

referring to fig. 1, the embodiment discloses a text classification method based on hierarchical text graph structure learning, which comprises the following steps:

step S1: inputting and preprocessing the training set text to be classified according to three different linguistic characteristics to obtain a node set and an edge set of the training set text, namely three graph structure matrixes; the three linguistic features are a text co-occurrence diagram, a text grammar diagram and a text semantic diagram respectively;

step S2: processing the three node sets and the graph structures of the edge sets by adopting a feature representation model based on graph structure learning of the edge level to obtain three edge vectors containing semantic information;

step S3: removing redundancy from the three edge vectors according to the measurement standard of mutual information to obtain three independent text edge vectors;

step S4: carrying out weighted summation on the three text edge vectors subjected to redundancy elimination to obtain a text graph structural representation integrating the global relationship;

step S5: processing the text graph structure obtained in the step S4 and the text semantic features corresponding to the text graph structure by adopting a graph convolution neural network, and generating graph-level text representation through a graph pooling layer;

Applying the above method to a text data set

Text classification is carried out, wherein->

Represents the->

Sample number->

Indicate->

Text units->

Representing the corresponding tag. For->

A text co-occurrence map is constructed using three patterning methods>

Text grammar map->

Text semantic graph

Wherein->

Representing text diagram->

Node of->

Adjacency matrix representing text diagram for representing text diagram +.>

Is a topology of (a). From the dataset->

Is selected to be +.>

Text of->

。

Specifically, in step S1, the text co-occurrence diagram construction mode specifically includes: will be in text

Is->

Expressed as text co-occurrence diagram->

Node->

；

wherein,,

edge weight representing text co-occurrence graph, +.>

Representation word node->

He word node->

Is a piece of dot mutual information.

Specifically, in step S1, the text grammar map construction mode specifically includes: make the following stepsExtracting text with parsing tool

Is->

Syntax dependency of->

Generating a relation triplet->

Use +.>

；

wherein,,

edge weights representing text grammar map, +.>

Specifically, in step S1, the text semantic graph construction method specifically includes: processing text using a pre-trained BERT model

Arbitrary word +.>

Obtaining a specialSyndrome vector->

；

wherein,,

edge weights representing text semantic graphs, +.>

Further, text graph structure learning is performed on the three constructed text graphs, including graph structure learning at the edge level and graph structure learning at the graph level. The edge level and the graph level capture the structural characteristics of the graph from a thin granularity and a thick granularity respectively, and confidence is given to each edge to learn the characteristic relation among nodes with the thin granularity. For the graph level, the graph structures of multiple sources are adaptively integrated to obtain an optimal combination mode of the graph structures of different sources. Modeling feature relationships from the fine granularity of edges enables more precise control of the flow of information during learning from the bottom layer, whereas a single source graph structure represents only graph structure data that is described from one perspective, potentially resulting in bias in classification results.

Specifically, in step S2, the edge level diagram structure learning (edge-level graph structure learning) process is specifically as follows:

because errors exist in the composition process, the original graph structure may be wrong and incomplete, firstly, the graph structure at the edge level is learned to endow confidence to the original graph structure matrix, the graph structure matrix is optimized based on the confidence, and the confidence optimization expression is as follows:

；

wherein,,

for confidence matrix, ++>

For the original graph matrix, ">

For linear mapping +.>

Matrix with all elements 1, +.>

Representing the optimized graph structure.

In this embodiment, when defining the confidence matrix, the assumption that the features of adjacent nodes are similar is adopted, and the relationship between nodes is modeled by using the attention of the global graph, and the elements

Is defined as follows:

；

wherein,,

representing node->

And node->

Relation between->

To activate the function.

Element(s)

Is substituted into the confidence optimization expression to obtain the final adjacency matrix +/at each iteration>

。

In order to further enhance the smoothness of the graph nodes, the present embodiment uses laplace regularization to constrain the characteristics of the nodes, and uses the characteristics as a likelihood function of bayesian estimation, and the expression is as follows:

；

wherein,,

features representing nodes of the graph, < >>

Representing a normalized Laplace matrix, +.>

Representing the parameters that are predefined and,

the smaller the difference between the neighboring nodes of the graph, the smaller the difference, indicating that the two nodes are more similar.

In order to further give the learned graph symmetry and simplicity properties, it is necessary to constrain the learning process of the adjacency matrix by an a priori function defined as follows:

；

wherein,,

for the constraint of symmetry on the graph, +.>

Is a constraint on simplicity in the drawing. Symmetry and simplicity for the graph satisfying both constraints, +.>

And->

For manually adjusted superparameters,/->

For learned graph structure->

Transposed matrix of>

Representing the diagram structure->

Is a matrix of the matrix.

And combining the likelihood function and the prior function, and restraining the adjacency matrix of the learned graph through a Bayesian estimation framework, wherein the expression is as follows:

；

wherein,,

representation->

Constraint function for constraining the adjacency matrix of +.>

For manually adjusted superparameters,/->

Expressed as natural constant->

An exponential function of the base.

；

wherein,,

loss function representing learning of edge level graph structure, +.>

Representing the structure of a learned text semantic graph, +.>

Representing a learned text dependency graph structure, +.>

Representing the learned text co-occurrence graph structure.

By learning the graph structures at the edge level for the three text graphs, three optimized graph structures are obtained, each of which contains some unique information. Integration of these three graph structures is required to produce the final graph structure. Since the three text graphs may contain repeated redundant information, the redundant information needs to be removed, so that the independence of the text graphs is improved. The present invention uses mutual information as a measure of text graph independence.

If the correlation between two text graphs is high, the mutual information between them is also large, and vice versa. However, in practical applications, it is difficult to directly calculate the mutual information of the graph, and thus the InfoNCE method is used to estimate the lower bound of the mutual information.

Specifically, in step S3, the redundancy elimination process specifically includes: feature mapping is carried out on three text graph structures generated in edge level graph structure learning by using graph convolution neural network GCN, mapped feature vectors are obtained, and semantic text graphs are used

For example, a mapped feature vector +.>

The other two types of text graphs are mapped in the same manner, and are not repeated here.

After the feature vectors of three texts are obtained, infoNCE is used for limiting the relation between the text graphs, the mutual information of different nodes in the same text graph is maximized, the mutual information of the nodes in different text graphs is minimized, the relation between the text semantic graph and the text dependency graph is taken as an example, and the expression is as follows:

；

wherein,,

representing node->

And node->

Similarity between->

Is the temperature coefficient in InfoNCE.

And estimating the three text graphs based on mutual information, wherein the optimization objective function is as follows:

；

wherein,,

representing an optimized objective function>

Representation->

And->

Mutual information estimation between +.>

Side vector representing text semantic graph, +.>

Side vector representing text dependency graph, +.>

An edge vector representing a text co-line graph.

Specifically, in step S4, the edge vectors of the three text graphs are weighted and summed to obtain a final optimized graph structure

The expression is as follows:

；

wherein,,

。

specifically, in step S5, the process of generating the graph-level text representation specifically includes: processing the final optimized graph structure in step S4 using a graph convolution neural network

And its features->

Update semantic feature of text->

For->

；

wherein,,

representation of the text at the representation level->

For node->

Is characterized by->

Representing a global pooling process.

Specifically, in step S6, the expression of the softmax classification is as follows:

；

wherein,,

representing the final result of the classification,/->

Is a learnable mapping matrix +.>

In the event of a bias that is to be learnable,

representation->

A function.

According to an embodiment of the present invention, step S1 is actually a preprocessing stage of the label text of the training set, in which process the text classification problem is actually converted into the graph classification problem, the words are converted into nodes in the graph, the relationships between the words are converted into edges in the graph, and the three different linguistic features are converted. The steps S2, S3 and S4 are the process of learning the graph structure of different granularity for the input text graph, and integrate the representation of the graph structure under different viewing angles. The above steps have two advantages over conventional methods: (1) When the text graph is generated, various linguistic rules can be utilized to extract graph nodes and edges, so that the accuracy of text classification is effectively improved; (2) Through multi-granularity graph structure learning, the graph structures with different semantic features are integrated on the basis of removing repeated information, so that the occurrence of the phenomenon of graph structure semantic loss in the subsequent learning process is prevented, and the model performance is further improved.

Simulation experiment:

the present example performed a simulation experiment on the public dataset MR. MR is a movie rating dataset that contains user reviews of movies and corresponding categories. These categories are classified into positive and negative evaluations. In the embodiment, a simulation experiment is performed by randomly extracting one comment sample in the MR data set, so as to evaluate whether the method disclosed by the invention achieves the effect of relearning the graph structure of the sample. Three text graphs, namely a text co-occurrence graph shown in fig. 2, a text grammar graph shown in fig. 3 and a text semantic graph shown in fig. 4, are constructed according to the step S1 of the invention for the randomly extracted sentence "Take Care of My Cat offers a refreshingly different slice of Asian cinema". From these three text graphs, the final text graph, i.e., the graph-level text representation, as shown in fig. 5 is obtained and text-classified. The abscissa in fig. 2, 3, 4, 5 represents the unique word in this comment sample, the left ordinate also represents the unique word in this comment sample, and the right ordinate represents the strength of the relationship between the words. Color blocks in the matrix are larger than 0 to indicate that the relation between the corresponding words is positive, and the larger the numerical value is, the stronger the positive relation between the words is, namely the positive effect on the classification result is, the smaller the color blocks are, the smaller the numerical value is, the relation between the corresponding words is negative, and the stronger the negative relation between the words is, namely the stronger the negative effect on the classification result is. Color bars equal to 0 indicate that the relationship between the two words in the comment sample has no effect on the classification result. In summary, relearned FIG. 5 facilitates the method disclosed in this example in evaluating whether this comment sample belongs to a positive or negative rating.

The translation of the randomly extracted comment sample "Take Care of My Cat offers a refreshingly different slice of Asian cinema" is that "care for my cat" provides a completely new bridge for asian movies. The true classification result of this comment sample is a positive evaluation. As can be seen from the finally learned text chart 5, the method disclosed in this embodiment relearns the text chart and discards some erroneous relations between words in the original text chart, such as (wake, care) in the text co-occurrence chart 2 and the text grammar chart 3, and these relations exist in the movie title Take Care of My Cat, and have no positive effect on correctly judging this comment text as a classification result of the forward evaluation. In addition, the learned final text fig. 5 also adds some new relationships, such as (references), which helps the method disclosed in this example to determine this text as a positive evaluation category.

In addition, the embodiment also discloses a text classification device based on hierarchical text graph structure learning, which comprises:

at least one processor;

and a memory communicatively coupled to the at least one processor;

wherein the memory is used for storing a computer program;

In addition, the embodiment also discloses a computer readable storage medium, and the computer readable storage medium stores a computer program, and the computer program realizes the text classification method when being executed by a processor.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. The text classification method based on hierarchical text graph structure learning is characterized by comprising the following steps:

step S5: processing the text graph structural representation obtained in the step S4 and the text semantic features corresponding to the text structural representation by adopting a graph convolution neural network, and generating a graph-level text representation through a graph pooling layer;

2. The text classification method according to claim 1, wherein in step S1, the text co-occurrence graph construction mode specifically includes: will be in text

Is->

Expressed as text co-occurrence diagram->

Node->

The edge weight between any two word nodes in the graph adopts the dot mutual information of the word nodes +.>

；

wherein,,

edge weight representing text co-occurrence graph, +.>

Representation word node->

He word node->

Is a piece of dot mutual information.

3. The text classification method according to claim 1, wherein in step S1, the text grammar map is constructed in a manner that: extracting text using parsing tools

Is->

Syntax dependency of->

Generating a relation triplet->

Use +.>

；

wherein,,

edge weights representing text grammar map, +.>

4. The text classification method according to claim 1, wherein in step S1, the text semantic graph is constructed specifically by: encoding text using BERT model

Arbitrary word +.>

Obtaining a feature vector->

；

wherein,,

edge weights representing text semantic graphs, +.>

5. The text classification method according to claim 1, wherein in step S2, the process of learning the graph structure is specifically: giving confidence to the graph structure matrix, and optimizing the graph structure matrix based on the confidence; using Laplace regularization to restrict the characteristics of the nodes, and using the characteristics as likelihood functions of Bayesian estimation; setting a learning process of a priori function constraint adjacency matrix; combining the likelihood function and the prior function, and restraining the adjacency matrix of the learned graph through a Bayesian estimation framework;

；

wherein,,

loss function representing graph structure learning at edge level, +.>

Representation->

Constraint function for constraining the adjacency matrix of +.>

Representing the structure of a learned text semantic graph, +.>

Representing a learned text dependency graph structure, +.>

Representing the learned text co-occurrence graph structure.

6. The text classification method according to claim 5, wherein in step S3, the redundancy removing process is specifically: three text graph structure feature mappings generated in graph structure learning of opposite side levels of a graph convolution neural network are used to obtain mapped feature vectors, mutual information of different nodes in the same text graph is maximized, mutual information of nodes in different text graphs is minimized, estimation is carried out on the three text graphs based on the mutual information, and an optimization objective function is as follows:

；

wherein,,

representing an optimized objective function>

Representation->

And->

Mutual information estimation between +.>

Side vector representing text semantic graph, +.>

Side vector representing text dependency graph, +.>

An edge vector representing a text co-line graph.

7. The text classification method of claim 6, wherein in step S4, the edge vectors of the three text graphs are weighted and summed to obtain the final optimized graph structure

The expression is as follows:

；

wherein,,

。

8. the text classification method according to claim 7, characterized in that in step S5, a graph-level text is generatedThe process of the present representation is specifically: processing the final optimized text graph structure in step S4 using a graph convolution neural network

And its features, update text semantic features +.>

For->

；

wherein,,

representation of the text at the representation level->

For node->

Is characterized by->

Representing global pooling, ->

A set of nodes representing a text graph.

9. Text classification device based on hierarchical text graph structure study, characterized by comprising:

at least one processor;

and a memory communicatively coupled to the at least one processor;

wherein the memory is used for storing a computer program;

the processor is configured to implement the text classification method according to any of claims 1 to 8 when executing the computer program.

10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the text classification method according to any of claims 1 to 8.