CN117874611A - Text classification method, device, electronic equipment and storage medium - Google Patents

Text classification method, device, electronic equipment and storage medium

Info

Publication number: CN117874611A
Authority: CN (China)
Application number: CN202311865962.9A
Original language: Chinese (zh)
Priority/filing date: 2023-12-29
Publication date: 2024-04-12
Legal status: Pending
Inventors: 张隆基, 任梦星, 刘迎建, 彭菲
Assignee (current and original): Hanwang Technology Co Ltd
Prior art keywords: text, vector, classified, entity, feature
Classification: Information Retrieval, DB Structures and FS Structures Therefor
Abstract

The application discloses a text classification method and device, an electronic device, and a storage medium, and belongs to the field of computer technology. The method comprises the following steps: acquiring a text vector of a text to be classified and the text position information of an entity in the text to be classified; performing context feature extraction on the text vector to obtain a context feature vector; performing nonlinear feature aggregation on the context feature vector to extract a semantic feature vector; extracting, from the context feature vector, the entity feature vector corresponding to the entity in the text to be classified; performing attention calculation on the entity feature vector and the semantic feature vector to obtain an interaction vector; and performing classification mapping based on the interaction vector to obtain the text category matched with the text to be classified. By classifying text in combination with entity information, and through deep interaction between entity and sentence features, the method significantly enhances the effect of entities on text classification and significantly improves classification accuracy.

Description

Text classification method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular, to a text classification method, apparatus, electronic device, and computer readable storage medium.
Background
Text classification in the medical field plays an important role in medical applications. Taking a medical knowledge question-answering application as an example, the aim of medical-field text classification is to assign a question raised by a user to the corresponding medical field, so that the question-answering application can output answers with high accuracy and specificity.
Text classification methods in the prior art adopt a neural network model and classify text based on the context features and the full-scale semantic features of the text, mainly using a graph convolutional neural network (Graph Convolutional Neural Networks, GCN) model to extract textual syntactic information. The GCN model extracts the neighbor-node information of a target node based on the syntactic dependency tree, and semantic features are extracted from the multi-layer node information through multi-layer convolution. To improve the efficiency of GCN information extraction, one prior-art method uses dependency-tree pruning to exclude noise information. However, a fixed pruning standard does not achieve the best effect when extracting features from dynamically changing input text in a specified professional field.
It can be seen that there is still a need for improvements in the text classification methods of the prior art.
Disclosure of Invention
The embodiment of the application provides a text classification method, a text classification device, electronic equipment and a storage medium, which can improve the accuracy of text classification in the professional field.
In a first aspect, an embodiment of the present application provides a text classification method, including:
acquiring a text vector of a text to be classified and text position information of an entity in the text to be classified;
extracting the context characteristics of the text vector to obtain a context characteristic vector;
carrying out nonlinear feature aggregation on the context feature vector, and extracting a semantic feature vector;
extracting entity feature vectors corresponding to the entities in the text to be classified from the context feature vectors;
performing attention calculation on the entity feature vector and the semantic feature vector to obtain an interaction vector;
and carrying out classification mapping based on the interaction vector to obtain the text category matched with the text to be classified.
In a second aspect, an embodiment of the present application provides a text classification apparatus, including:
the text vector and entity information acquisition module is used for acquiring text vectors of texts to be classified and text position information of entities in the texts to be classified;
the context feature vector acquisition module is used for extracting the context feature of the text vector to obtain a context feature vector;
the semantic feature vector extraction module is used for carrying out nonlinear feature aggregation on the context feature vector and extracting a semantic feature vector;
The entity feature vector extraction module is used for extracting entity feature vectors corresponding to the entities in the text to be classified from the context feature vectors;
the interaction vector acquisition module is used for carrying out attention calculation on the entity feature vector and the semantic feature vector to obtain an interaction vector;
and the first classification result acquisition module is used for carrying out classification mapping based on the interaction vector to obtain the text category matched with the text to be classified.
In a third aspect, the embodiment of the application further discloses an electronic device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the text classification method described in the embodiment of the application when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the text classification method disclosed in embodiments of the present application.
According to the text classification method disclosed by the embodiment of the application, the text vector of the text to be classified and the text position information of the entity in the text to be classified are obtained; extracting the context characteristics of the text vector to obtain a context characteristic vector; carrying out nonlinear feature aggregation on the context feature vector, and extracting a semantic feature vector; extracting entity feature vectors corresponding to the entities in the text to be classified from the context feature vectors; performing attention calculation on the entity feature vector and the semantic feature vector to obtain an interaction vector; and carrying out classification mapping based on the interaction vector to obtain the text category matched with the text to be classified. According to the method, text classification is carried out by combining entity information, and the effect of the entity on the text classification is remarkably enhanced and the accuracy of the text classification is remarkably improved through the deep interaction between the entity and sentence characteristics.
The foregoing description is only an overview of the technical solutions of the present application, and may be implemented according to the content of the specification in order to make the technical means of the present application more clearly understood, and in order to make the above-mentioned and other objects, features and advantages of the present application more clearly understood, the following detailed description of the present application will be given.
Drawings
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
FIG. 1 is a flow chart of a text classification method disclosed in an embodiment of the present application;
FIG. 2 is a schematic diagram of a text classification model used in an embodiment of the present application;
FIG. 3 is a schematic diagram of the structural design of the attention mechanism in the text classification model according to the embodiment of the application;
FIG. 4 is a second schematic diagram of a text classification model used in an embodiment of the present application;
FIG. 5 is a schematic diagram of a text classification device according to an embodiment of the present application;
FIG. 6 schematically shows a block diagram of an electronic device for performing a method according to the present application; and
fig. 7 schematically shows a memory unit for holding or carrying program code implementing the method according to the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Referring to fig. 1, a text classification method disclosed in an embodiment of the present application includes: steps 110 to 160.
Step 110, obtaining text vectors of the text to be classified and text position information of entities in the text to be classified.
In some embodiments of the present application, a pre-trained entity recognition model may be used to perform entity recognition on the text to be classified, so as to obtain the text position (that is, the text position information) of each entity included in the text to be classified and, at the same time, the entity class corresponding to each entity. The entity recognition model can be trained by a method in the prior art.
In the embodiment of the present application, a specific manner of obtaining the text vector of the text to be classified is not limited.
Take the text to be classified as the sentence s, "I have chest pain; do I need an electrocardiogram examination?", as an example. The entities "chest pain" and "electrocardiogram" can be identified in the text to be classified, and the text vector of the text to be classified can be expressed as x = [x_1, x_2, …, x_n], where n denotes the length of the text to be classified and x_i denotes the vector of the i-th word in the text to be classified. According to the text position information of an entity in the text to be classified, the vector representation at the position of the corresponding entity can be extracted and used as the vector representation of the entity.
For the specific implementation of extracting the vector representation of an entity from the text vector of the text to be classified according to the entity's text position information, refer to the prior art; it is not repeated in the embodiments of the present application.
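As an illustration, such position-based extraction can be sketched as a slice over the word-vector sequence; the span offsets and the hidden size of 768 below are assumptions, not values fixed by the disclosure:

```python
import torch

def extract_entity_vector(x: torch.Tensor, start: int, end: int) -> torch.Tensor:
    """Slice the word vectors covering an entity span out of the text vector.

    x: (n, d) tensor, one row per word of the text to be classified.
    start/end: assumed half-open word positions of the entity.
    """
    return x[start:end]  # (end - start, d) vector representation of the entity

# Usage: extract "chest pain", assuming it occupies word positions 2..4
x = torch.randn(12, 768)
h_entity = extract_entity_vector(x, 2, 4)
```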
In the embodiment of the application, the text to be classified is classified by a pre-trained text classification model. Optionally, the text classification model uses a BERT model cascaded with a GCN model as its main neural network structure, so as to fully extract the structural and syntactic information of data in the specific field.
As shown in fig. 2, optionally, the text classification model includes: a first feature extraction layer 210, a second feature extraction layer 220, an attention mechanism layer 230, and an output layer 240. The following describes the specific embodiments and performance of each step of the text classification process in conjunction with the structure of the text classification model.
And 120, extracting the context characteristics of the text vector to obtain the context characteristic vector.
Optionally, the text vector is subjected to contextual feature extraction by the first feature extraction layer 210, so as to obtain a contextual feature vector.
The first feature extraction layer 210 may use any BERT-family model, such as the lightweight DistilBERT, ALBERT, or RoBERTa, to perform context feature extraction on the text vector of the text to be classified, extracting a context feature vector as a valuable subset of the sentence feature vectors of the text to be classified. DistilBERT is a lightweight variant of BERT proposed by Hugging Face that maintains high accuracy while reducing the parameter scale. For the model structure of the first feature extraction layer 210, refer to the prior art; it is not described in detail in the embodiments of the present application.
Taking a DistilBERT structure as the implementation of the first feature extraction layer 210 as an example: after the text vector x is input to the first feature extraction layer 210, the layer encodes the input text vector x to obtain the context feature vector. The algorithm of the first feature extraction layer 210 can be expressed as:
h_bert = DistilBERT(w·x),
where w denotes the model weights and h_bert denotes the context feature vector.
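For instance, a minimal encoding pass with the Hugging Face transformers library might look as follows; the checkpoint name is an assumption (a Chinese, medical-domain checkpoint would be used in practice):

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "distilbert-base-multilingual-cased"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

inputs = tokenizer("I have chest pain; do I need an electrocardiogram?", return_tensors="pt")
with torch.no_grad():
    h_bert = encoder(**inputs).last_hidden_state  # (1, n, 768) context feature vectors
```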
And 130, carrying out nonlinear feature aggregation on the context feature vectors, and extracting semantic feature vectors.
Next, based on the contextual feature vector, a semantic feature vector is extracted.
In the embodiment of the present application, the second feature extraction layer 220 performs nonlinear feature aggregation on the context feature vector, to extract a semantic feature vector. The second feature extraction layer 220 may be implemented using a GCN network structure.
Optionally, the performing nonlinear feature aggregation on the context feature vector, extracting a semantic feature vector, includes: substep 1301 to substep 1306.
Sub-step 1301, constructing an edge connecting the nodes according to the adjacency matrix corresponding to the words in the text to be classified by taking the word vectors in the context feature vectors as the nodes, and obtaining a graph structure.
The adjacency matrix corresponding to the words in the text to be classified is established according to the adjacency relations of all the words in the text. Each element of the matrix indicates whether two words are adjacent: the element is 1 if they are adjacent and 0 otherwise. For example, if the i-th word is adjacent to the (i-1)-th and (i+1)-th words, the elements in column i of rows i-1 and i+1 take the value 1.
Optionally, the vector of each word in the context feature vector h_bert serves as a node of the graph, and edges connecting the nodes are constructed according to the adjacency relations of the words recorded in the adjacency matrix, yielding the graph structure corresponding to the text to be classified.
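A chain-style construction of this adjacency matrix can be sketched as below; treating adjacency as the immediate left/right neighbors is an assumption consistent with the example above:

```python
import torch

def chain_adjacency(n: int) -> torch.Tensor:
    """A[i][j] = 1 iff words i and j are adjacent in the text, else 0."""
    A = torch.zeros(n, n)
    idx = torch.arange(n - 1)
    A[idx, idx + 1] = 1.0  # word i is adjacent to word i+1
    A[idx + 1, idx] = 1.0  # and symmetrically
    return A

A = chain_adjacency(12)  # nodes: 12 word vectors; edges: the entries of A
```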
Sub-step 1302, inputting the graph structure to a pre-trained graph convolution neural network, obtaining a vector representation of each of the terms through the graph convolution neural network.
The graph structure serves as the input of the graph convolutional network GCN. The GCN layer then aggregates the information of neighboring nodes (i.e., word vectors) to obtain the vector representation of the target node. Optionally, the algorithm by which the graph convolutional network obtains the vector representation of each word is expressed as:
h_i = c_i Σ_j A_ij h_j^bert,
where c_i denotes the reciprocal of the degree of word i, h_j^bert denotes the vector representation of the j-th word, A_ij denotes the element in row i, column j of the adjacency matrix, and h_i denotes the feature vector of target node i after the aggregation operation, i.e., the vector representation of the word corresponding to the i-th node.
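In code, this degree-normalized neighbor aggregation is one matrix product; the sketch below assumes dense tensors and a random adjacency matrix:

```python
import torch

def aggregate_neighbors(A: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
    """h_i = c_i * sum_j A_ij * h_j, with c_i the reciprocal of node i's degree."""
    c = 1.0 / A.sum(dim=1, keepdim=True).clamp(min=1.0)  # clamp avoids division by zero
    return c * (A @ H)

A = (torch.rand(12, 12) > 0.8).float()          # assumed adjacency matrix
H = aggregate_neighbors(A, torch.randn(12, 768))  # (12, 768) aggregated node vectors
```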
Substep 1303, obtaining a first vector representation of the text to be classified according to the vector representations of the words.
Optionally, the vector representations of the words may be concatenated according to each word's position in the text to be classified, and the concatenated vector is used as the first vector representation of the text to be classified.
In a substep 1304, a tensor outer product operation is performed on the first vector representation to generate a K-order cross feature.
Then, a tensor outer product operation is performed on the vector representations of the words in the text to be classified to generate the K-order cross feature. For example, the tensor outer product may be computed by the following formula:
h^K = h ⊗ h ⊗ ⋯ ⊗ h (K factors),
where h denotes the first vector representation, h^K denotes the K-order cross feature, i.e., the K-order cross text vector, and ⊗ denotes the tensor outer product. Performing the tensor outer product on the first vector representation h is the process of performing it on the vector representations of the words within h. K is an integer greater than or equal to 2; preferably, K takes the value 2.
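The K = 2 case reduces to a plain outer product; a small sketch:

```python
import torch

def k_order_cross(h: torch.Tensor, k: int = 2) -> torch.Tensor:
    """Repeated tensor outer product of h with itself (K factors)."""
    out = h
    for _ in range(k - 1):
        out = torch.tensordot(out, h, dims=0)  # dims=0 computes the outer product
    return out

h = torch.randn(8)
assert torch.allclose(k_order_cross(h, 2), torch.outer(h, h))  # (8, 8) for K = 2
```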
In a sub-step 1305, the K-order cross feature is subjected to linear feature transformation to obtain a second vector representation.
Then, the K-order cross feature h^K is substituted into the feature conversion process of the graph convolutional network GCN as the linear feature combination, giving the output vector h_GCN of the GCN, denoted the "second vector representation". The specific algorithm by which the GCN performs linear feature transformation on the K-order cross feature to obtain the second vector representation can be expressed as:
h_GCN = σ(w_K h^K + b_K);
where w_K denotes the model parameters of the graph convolutional network GCN, b_K denotes the bias, and σ denotes the nonlinear activation function.
In the prior art, syntax-tree-based graph convolutional models combine features linearly in the node feature conversion stage, which is inefficient. Converting the first vector representation into the K-order cross feature before it enters the nonlinear activation function of the GCN makes text feature extraction more effective and allows the correlations and latent associations between features to be fully mined.
And step 1306, performing dimension reduction processing on the second vector representation to obtain the semantic feature vector.
Assume the context feature vector output by the first feature extraction layer 210 is h_bert ∈ R^n; the K-order cross feature obtained after crossing is h^K ∈ R^(n^K), and the feature vector obtained through the feature conversion process of the graph convolutional network GCN is of the same exponential size. The K-order cross feature is thus a high-dimensional feature vector, and matrix operations on it bring computational and parameter complexity of exponential order, greatly increasing the computational load of the model. In order to reduce the computational complexity of the subsequent steps and the parameter count of the downstream model, and to improve the performance of the text classification model, in the embodiment of the application the second vector representation is subjected to dimension reduction to obtain a low-dimensional feature vector.
Taking the e-th dimension feature h_GCN^(e) of the second vector representation h_GCN as an example, h_GCN^(e) can be computed by the following formula:
h_GCN^(e) = sum(W^(e) ⊙ h^K) + b^(e),
where W^(e) denotes the weight matrix corresponding to the e-th dimension, b^(e) denotes the bias corresponding to the e-th dimension, sum(·) denotes summation over all elements, and ⊙ denotes the Hadamard product.
Assuming K = 2 and taking the second-order cross feature as an example of the low-order rank derivation (the activation function and bias are omitted below):
h_GCN^(e) = sum(W^(e) ⊙ h²) = sum(w_1^(e)(w_2^(e))^T ⊙ h h^T) = ((w_1^(e))^T h)·((w_2^(e))^T h),
where n denotes the dimension of the second vector representation, i and j denote dimension indices, h_i and h_j denote the i-th and j-th components of h, and W denotes a weight matrix. From the tensor outer product algorithm it is known that h² = h h^T ∈ R^(n×n), with h ∈ R^n; T denotes transposition. According to the matrix decomposition principle, W^(e) is equivalent to the product of two n-dimensional vectors, i.e., W^(e) = w_1^(e)(w_2^(e))^T with w_1^(e), w_2^(e) ∈ R^n.
Extending this derivation from the e-th dimension feature to the overall second vector representation, h_GCN is computed as:
h_GCN = (W_1 h) ⊙ (W_2 h) ⊙ ⋯ ⊙ (W_K h),
where h_GCN denotes the semantic feature vector obtained through the graph convolutional network GCN, W_1, W_2, …, W_K denote different weight matrices, and h denotes the first vector representation of the text to be classified. Through the low-order rank derivation, the parameter count and computational complexity are reduced from an exponential to a linear level, keeping the complexity of the matrix computation within what the model can bear.
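A PyTorch sketch of this dimension-reduced crossing, with assumed input and output sizes, could be:

```python
import torch
import torch.nn as nn

class LowRankCrossGCN(nn.Module):
    """h_GCN = sigma((W_1 h) ⊙ (W_2 h) ⊙ ... ⊙ (W_K h)): linear cost instead of n^K."""

    def __init__(self, in_dim: int, out_dim: int, k: int = 2):
        super().__init__()
        self.factors = nn.ModuleList(nn.Linear(in_dim, out_dim, bias=False) for _ in range(k))
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        out = self.factors[0](h)
        for proj in self.factors[1:]:
            out = out * proj(h)             # Hadamard product replaces the outer product
        return torch.relu(out + self.bias)  # sigma: the nonlinear activation

h_gcn = LowRankCrossGCN(768, 256, k=2)(torch.randn(768))  # (256,) semantic feature vector
```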
The algorithm of the graph convolutional network GCN comprises two stages: neighbor-node aggregation and node feature conversion. The node feature conversion stage conventionally adopts a weighted-summation linear feature combination, which is inefficient and insufficient for extracting the latent relations between features. The present application constructs cross feature pairs based on feature products and improves the generalization capability of the GCN through nonlinear feature combination. Furthermore, in order to reduce the computational complexity caused by the increased dimension of the feature vector produced by the nonlinear feature combination, feature dimension reduction is additionally performed.
And 140, extracting entity characteristic vectors corresponding to the entities in the text to be classified from the context characteristic vectors.
After the context feature vector h_bert is obtained by performing context feature extraction on the text vector through the first feature extraction layer 210, further, at the second feature extraction layer 220, the entity feature vector of each entity, denoted for example h_target, may be extracted based on the text position information of each entity in the text to be classified. The entity feature vector h_target is a subsequence of the context feature vector h_bert.
The specific method for extracting the entity feature vector corresponding to the entity in the text to be classified from the context feature vector is referred to in the prior art, and is not described herein.
And step 150, performing attention calculation on the entity feature vector and the semantic feature vector to obtain an interaction vector.
Then, attention calculation is performed on the entity feature vector and the semantic feature vector through the attention mechanism layer 230 of the text classification model to obtain the interaction vector.
Optionally, the performing attention calculation on the entity feature vector and the semantic feature vector to obtain an interaction vector includes: substep 1501 to substep 1505.
The following describes a specific embodiment of performing attention computation on the entity feature vector and the semantic feature vector to obtain an interaction vector in combination with a schematic structural design of the attention mechanism layer 230 in fig. 3.
In step 1501, a matrix multiplication operation is performed on the entity feature vector and the semantic feature vector, so as to obtain an interaction matrix.
Optionally, the interaction matrix may be obtained by the following formula:
I = h_GCN (h_target)^T,
where h_GCN denotes the semantic feature vector and (h_target)^T denotes the transpose of the entity feature vector.
Sub-step 1502, according to the interaction matrix, calculating a first matrix of attention coefficients of the text to be classified relative to the entity and a second matrix of attention coefficients of the entity relative to the semantic feature vector, respectively.
The attention coefficients between the semantic features and the entity features can then be calculated along the horizontal and vertical directions respectively, using a softmax function (i.e., a normalized exponential function). The exponent of the element in row i, column j of the interaction matrix, normalized by the sum of the exponents of all elements in row i, is taken as the element in row i, column j of the first matrix, giving the first matrix of attention coefficients of the text to be classified relative to the entity. The exponent of the element in row i, column j, normalized by the sum of the exponents of all elements in column j, is taken as the element in row i, column j of the second matrix, giving the second matrix of attention coefficients of the entity relative to the semantic feature vector.
For example, the following formulas may be used to calculate the first matrix M of attention coefficients of the text to be classified relative to the entity and the second matrix N of attention coefficients of the entity relative to the semantic feature vector:
M_ij = exp(I_ij) / Σ_k exp(I_ik),  N_ij = exp(I_ij) / Σ_k exp(I_kj),
where i and j respectively denote the row and column indices of the interaction matrix, I_ij denotes the element in row i, column j of the interaction matrix, M_ij denotes the element in row i, column j of the first matrix M, and N_ij denotes the element in row i, column j of the second matrix N.
In step 1503, the second matrix is subjected to average pooling processing, so as to obtain a first attention vector of each word in the entity relative to the semantic feature vector.
Then, average pooling is applied to the second matrix N to obtain the first attention vector N̄. The first attention vector represents the attention value of each word in the entity with respect to the semantic feature vector of the whole text to be classified. Optionally, the average pooling of the second matrix N may be performed as follows:
N̄_j = (1/n) Σ_{i=1..n} N_ij,
where n denotes the number of rows of the second matrix.
Sub-step 1504 multiplies the first matrix with the first attention vector to obtain a second attention vector of the semantic feature vector.
Then, the first attention vector N̄ obtained after the average pooling is multiplied with the first matrix M to calculate the attention vector â of the whole semantic feature vector. Optionally, the attention vector â of the whole semantic feature vector is calculated by the formula:
â = M N̄.
sub-step 1505, multiplying said second attention vector with said semantic feature vector and obtaining an interaction vector using a residual mechanism.
Finally, the attention vector of the semantic feature vector (i.e., the second attention vector) is multiplied with the semantic feature vector, and the output vector of the attention mechanism layer for the text feature is obtained by means of a residual mechanism and used as the interaction vector. Optionally, the interaction vector may be calculated by the following formula:
h_att = â ⊙ h_GCN + h_GCN,
where h_GCN denotes the semantic feature vector obtained after the dimension reduction and h_att denotes the interaction vector.
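Sub-steps 1501 to 1505 can be sketched end to end as below; the sentence length n, entity length m, and feature size are assumptions:

```python
import torch

def entity_attention(h_gcn: torch.Tensor, h_target: torch.Tensor) -> torch.Tensor:
    """Entity-sentence interaction: matrix product, two softmaxes, pooling, residual.

    h_gcn:    (n, d) semantic feature vectors of the sentence.
    h_target: (m, d) entity feature vectors (a subsequence of the context features).
    """
    I = h_gcn @ h_target.T                     # (n, m) interaction matrix
    M = torch.softmax(I, dim=1)                # text-vs-entity coefficients (row softmax)
    N = torch.softmax(I, dim=0)                # entity-vs-text coefficients (column softmax)
    n_bar = N.mean(dim=0)                      # (m,) first attention vector (average pooling)
    a_hat = M @ n_bar                          # (n,) second attention vector
    return a_hat.unsqueeze(1) * h_gcn + h_gcn  # residual connection -> interaction vector

h_att = entity_attention(torch.randn(12, 256), torch.randn(2, 256))
```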
As can be seen from the foregoing calculation process, at the attention mechanism layer the feature vectors of the entities are taken from the context feature layer and the semantic feature vectors of the sentence are taken from the graph convolutional network GCN; through a series of matrix products, attention calculations, average pooling, and residual connection, the text classification model can focus pertinently on entity-related information. This interactive learning further improves the understanding of entity semantics and makes the text classification model concentrate more on capturing and exploiting key information.
And 160, performing classification mapping based on the interaction vector to obtain the text category matched with the text to be classified.
In some embodiments of the present application, taking the text classification model structure shown in fig. 2 as an example, the output layer 240 may perform classification mapping based on the interaction vector to obtain the text category matched with the text to be classified. The output layer 240 classifies the interaction vector output by the previous layer by applying a softmax function, so as to accurately judge the text category to be classified. Alternatively, the classification algorithm of the output layer 240 may be expressed as follows:
p(c | h_att) = exp(w_c h_att) / Σ_{i=1..|C|} exp(w_i h_att),
where h_att denotes the interaction vector, c ∈ C denotes the classification label of the text to be processed, C denotes the classification decision space, i indexes the categories, w_c and w_i denote the classification weights of the respective categories, and exp(·) denotes the exponential function.
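As a sketch, the classification mapping amounts to a linear layer plus softmax; pooling the per-position interaction vectors into a single sentence vector is an assumption made for the illustration:

```python
import torch
import torch.nn as nn

num_classes = 5                          # assumed size of the decision space C
classifier = nn.Linear(256, num_classes)

h_att = torch.randn(12, 256)             # interaction vectors from the attention layer
logits = classifier(h_att.mean(dim=0))   # assumed mean pooling over positions
probs = torch.softmax(logits, dim=-1)    # p(c | h_att) over the label space
predicted_category = probs.argmax().item()
```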
In some alternative embodiments, as shown in fig. 4, the text classification model is between the attention mechanism layer 230 and the output layer 240, and further includes: masking layer 250.
Optionally, performing classification mapping based on the interaction vector to obtain the text category matched with the text to be classified includes: masking the vector representations of non-entities in the interaction vector according to the text position information to obtain a mask vector; and performing classification mapping based on the mask vector to obtain the text category matched with the text to be classified.
The masking layer 250 is configured to mask, according to the text position information, a vector representation of a non-entity in the interaction vector to obtain a mask vector; the output layer 240 is configured to perform classification mapping based on the mask vector, so as to obtain a text category matched with the text to be classified.
The implementation principle of the mask layer 250 may be expressed as follows:
h_f = h_att ⊙ mask,
where h_att denotes the interaction vector, mask denotes the mask matrix, which keeps the m character positions of the entity and zeros out the rest, m denotes the number of characters in the entity, and h_f denotes the mask vector.
By masking the interaction vector, interference of invalid features (such as non-entity character vectors) on final output is weakened, final feature representation of the entity is reserved, and the feature processing process of the text classification model can be controlled more accurately. By applying a mask mechanism to the feature vectors obtained through interaction, only output features related to the entities are reserved, so that the final output feature set is ensured to fully reflect the semantic information of the entities, and the robustness and generalization capability of text classification are effectively improved.
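A minimal sketch of the mask layer, assuming the entity span is given as word offsets:

```python
import torch

def mask_non_entity(h_att: torch.Tensor, start: int, end: int) -> torch.Tensor:
    """h_f = h_att ⊙ mask: keep only the m entity positions, zero the rest."""
    mask = torch.zeros(h_att.size(0), 1)
    mask[start:end] = 1.0                 # ones over the entity's m characters
    return h_att * mask

h_f = mask_non_entity(torch.randn(12, 256), 2, 4)  # entity at positions 2..4
```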
Accordingly, the classification algorithm of the output layer 240 may be expressed as follows:
p(c | h_f) = exp(w_c h_f) / Σ_{i=1..|C|} exp(w_i h_f),
where h_f denotes the mask vector, c ∈ C denotes the classification label of the text to be processed, C denotes the classification decision space, and i indexes the categories.
According to the embodiment of the application, through the design of the multi-layer structure of the text classification model, the text classification process can deeply mine the semantic and structural information of the input text, so that the text classification method is better suitable for the complexity and the professionality of the text in professional fields such as medical fields.
In the embodiment of the application, the text to be classified is processed per entity label: the text position information of the entity is the position information of only one entity in the text to be classified, and the classification label of the text to be processed is the classification label of the entity corresponding to that text position information. When a text includes a plurality of entities, a text-to-be-classified instance is generated from the text position information of each entity in the text, yielding a plurality of instances that undergo text classification separately, so that a text classification result is obtained for each instance. Finally, according to specific requirements, the text classification results corresponding to the instances are combined, and an aggregate classification result for the text is output.
Optionally, the text position information is the text position information of a single entity currently determined in the text to be classified, and the obtained text category is the text category matched with the text to be classified based on that currently determined single entity. In this case, after performing classification mapping based on the interaction vector to obtain the text category matched with the text to be classified, the method further comprises: aggregating the text categories matched with the text to be classified that were obtained based on each separately determined single entity, to obtain the aggregate classification result corresponding to the text to be classified. For example, when a text S2 to be classified includes two entities, entity 1 and entity 2, steps 120 to 160 may first be executed with the text vector of S2 and the text position information of entity 1 as inputs, obtaining one text classification result matched with the text to be classified; steps 120 to 160 may then be executed with the text vector of S2 and the text position information of entity 2 as inputs, obtaining another text classification result; finally, the two results are aggregated to obtain the aggregate classification result of the text S2 to be classified.
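A per-entity driver with a simple aggregation policy might be sketched as follows; classify_with_entity is a hypothetical callable wrapping steps 120 to 160, and majority voting stands in for the requirement-specific aggregation:

```python
from collections import Counter
from typing import Callable, List, Tuple

def classify_text(
    text: str,
    entity_spans: List[Tuple[int, int]],
    classify_with_entity: Callable[[str, Tuple[int, int]], str],
) -> str:
    """Run one classification pass per entity, then aggregate the results."""
    results = [classify_with_entity(text, span) for span in entity_spans]
    return Counter(results).most_common(1)[0][0]  # majority vote (assumed policy)
```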
In summary, according to the text classification method disclosed by the embodiment of the application, text vectors of texts to be classified and text position information of entities in the texts to be classified are obtained; extracting the context characteristics of the text vector to obtain a context characteristic vector; carrying out nonlinear feature aggregation on the context feature vector, and extracting a semantic feature vector; extracting entity feature vectors corresponding to the entities in the text to be classified from the context feature vectors; performing attention calculation on the entity feature vector and the semantic feature vector to obtain an interaction vector; and carrying out classification mapping based on the interaction vector to obtain the text category matched with the text to be classified. According to the method, text classification is carried out by combining entity information, and the effect of the entity on the text classification is remarkably enhanced and the accuracy of the text classification is remarkably improved through the deep interaction between the entity and sentence characteristics.
Referring to fig. 5, the embodiment of the application further discloses a text classification device, which includes:
a text vector and entity information obtaining module 510, configured to obtain a text vector of a text to be classified and text position information of an entity in the text to be classified;
The context feature vector obtaining module 520 is configured to perform context feature extraction on the text vector to obtain a context feature vector;
a semantic feature vector extraction module 530, configured to perform nonlinear feature aggregation on the context feature vector, and extract a semantic feature vector;
an entity feature vector extraction module 540, configured to extract an entity feature vector corresponding to the entity in the text to be classified from the context feature vector;
the interaction vector obtaining module 550 is configured to perform attention computation on the entity feature vector and the semantic feature vector to obtain an interaction vector;
and the first classification result obtaining module 560 is configured to perform classification mapping based on the interaction vector to obtain a text category matched with the text to be classified.
Optionally, the semantic feature vector extraction module 530 is further configured to:
using the word vectors in the context feature vector as nodes, and constructing edges connecting the nodes according to the adjacency matrix corresponding to the words in the text to be classified, to obtain a graph structure;
inputting the graph structure into a pre-trained graph convolution neural network, and obtaining vector representations of the words through the graph convolution neural network;
According to the vector representation of each word, obtaining a first vector representation of the text to be classified;
performing tensor outer product operation on the first vector representation to generate K-order cross features;
performing linear feature transformation on the K-order cross features to obtain a second vector representation;
and performing dimension reduction processing on the second vector representation to obtain the semantic feature vector.
Optionally, the interaction vector obtaining module 550 is further configured to:
performing matrix multiplication operation on the entity feature vector and the semantic feature vector to obtain an interaction matrix;
according to the interaction matrix, respectively calculating a first matrix of attention coefficients of the text to be classified relative to the entity and a second matrix of attention coefficients of the entity relative to the semantic feature vector;
carrying out average pooling treatment on the second matrix to obtain a first attention vector of each word in the entity relative to the semantic feature vector;
multiplying the first matrix with the first attention vector to obtain a second attention vector of the semantic feature vector;
multiplying the second attention vector with the semantic feature vector, and obtaining an interaction vector by utilizing a residual mechanism.
Optionally, the first classification result obtaining module 560 is further configured to:
masking the non-entity vector representation in the interaction vector according to the text position information to obtain a masking vector;
and performing classification mapping based on the mask vector to obtain the text category matched with the text to be classified.
Optionally, the text position information is the text position information of a single entity currently determined in the text to be classified, and the obtained text category is the text category matched with the text to be classified based on that currently determined single entity; the device further comprises:
a second classification result acquisition module (not shown in the figure), configured to aggregate the text categories matched with the text to be classified that were obtained based on each separately determined single entity, to obtain the aggregate classification result corresponding to the text to be classified.
The embodiment of the application discloses a text classification device for implementing the text classification method described in the embodiment of the application, and the specific implementation of each module of the device is not repeated, and can refer to the specific implementation of the corresponding step of the embodiment of the method.
According to the text classification device disclosed by the embodiment of the application, the text vector of the text to be classified and the text position information of the entity in the text to be classified are obtained; extracting the context characteristics of the text vector to obtain a context characteristic vector; carrying out nonlinear feature aggregation on the context feature vector, and extracting a semantic feature vector; extracting entity feature vectors corresponding to the entities in the text to be classified from the context feature vectors; performing attention calculation on the entity feature vector and the semantic feature vector to obtain an interaction vector; and carrying out classification mapping based on the interaction vector to obtain the text category matched with the text to be classified. According to the method, text classification is carried out by combining entity information, and the effect of the entity on the text classification is remarkably enhanced and the accuracy of the text classification is remarkably improved through the deep interaction between the entity and sentence characteristics.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other. For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
The text classification method and device provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the application; the above descriptions are provided only to assist in understanding the method and its core idea. Meanwhile, since those skilled in the art may make changes to the specific embodiments and the application scope in accordance with the ideas of the present application, the content of this specification should not be construed as limiting the application.
The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
Various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in an electronic device according to embodiments of the present application may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present application may also be embodied as an apparatus or device program (e.g., computer program and computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present application may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
For example, fig. 6 shows an electronic device in which a method according to the present application may be implemented. The electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, etc. The electronic device conventionally comprises a processor 610 and a memory 620 and a program code 630 stored on said memory 620 and executable on the processor 610, said processor 610 implementing the method described in the above embodiments when said program code 630 is executed. The memory 620 may be a computer program product or a computer readable medium. The memory 620 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 620 has a storage space 6201 for program code 630 of a computer program for performing any of the method steps described above. For example, the memory space 6201 for the program code 630 may include individual computer programs for implementing the various steps in the above methods, respectively. The program code 630 is computer readable code. These computer programs may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. The computer program comprises computer readable code which, when run on an electronic device, causes the electronic device to perform a method according to the above-described embodiments.
The embodiment of the application also discloses a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the steps of the text classification method according to the embodiment of the application.
Such a computer program product may be a computer-readable storage medium, which may have memory segments or memory spaces arranged similarly to the memory 620 in the electronic device shown in fig. 6. The program code may, for example, be stored in the computer-readable storage medium in a suitable form. The computer-readable storage medium is typically a portable or fixed storage unit as described with reference to fig. 7. In general, the storage unit comprises computer-readable code 630', i.e., code readable by a processor, which, when executed by the processor, implements the steps of the method described above.
Reference herein to "one embodiment", "an embodiment", or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Furthermore, instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A method of text classification, the method comprising:
acquiring a text vector of a text to be classified and text position information of an entity in the text to be classified;
extracting the context characteristics of the text vector to obtain a context characteristic vector;
carrying out nonlinear feature aggregation on the context feature vector, and extracting a semantic feature vector;
extracting entity feature vectors corresponding to the entities in the text to be classified from the context feature vectors;
performing attention calculation on the entity feature vector and the semantic feature vector to obtain an interaction vector;
and carrying out classification mapping based on the interaction vector to obtain the text category matched with the text to be classified.
2. The method of claim 1, wherein the non-linear feature aggregation of the contextual feature vectors to extract semantic feature vectors comprises:
using the word vectors in the context feature vector as nodes, and constructing edges connecting the nodes according to the adjacency matrix corresponding to the words in the text to be classified, to obtain a graph structure;
inputting the graph structure into a pre-trained graph convolution neural network, and obtaining vector representations of the words through the graph convolution neural network;
According to the vector representation of each word, obtaining a first vector representation of the text to be classified;
performing tensor outer product operation on the first vector representation to generate K-order cross features;
performing linear feature transformation on the K-order cross features to obtain a second vector representation;
and performing dimension reduction processing on the second vector representation to obtain the semantic feature vector.
3. The method of claim 1, wherein performing attention calculations on the entity feature vector and the semantic feature vector to obtain an interaction vector comprises:
performing matrix multiplication operation on the entity feature vector and the semantic feature vector to obtain an interaction matrix;
according to the interaction matrix, respectively calculating a first matrix of attention coefficients of the text to be classified relative to the entity and a second matrix of attention coefficients of the entity relative to the semantic feature vector;
carrying out average pooling treatment on the second matrix to obtain a first attention vector of each word in the entity relative to the semantic feature vector;
multiplying the first matrix with the first attention vector to obtain a second attention vector of the semantic feature vector;
Multiplying the second attention vector with the semantic feature vector, and obtaining an interaction vector by utilizing a residual mechanism.
4. The method according to claim 1, wherein performing classification mapping based on the interaction vector to obtain the text category matched with the text to be classified comprises:
masking the vector representations of non-entities in the interaction vector according to the text position information to obtain a mask vector;
and performing classification mapping based on the mask vector to obtain the text category matched with the text to be classified.
5. The method according to claim 1, wherein the text position information is the text position information of a single entity currently determined in the text to be classified, and the obtained text category is the text category matched with the text to be classified based on that currently determined single entity; after performing classification mapping based on the interaction vector to obtain the text category matched with the text to be classified, the method further comprises:
aggregating the text categories matched with the text to be classified that were obtained based on each separately determined single entity, to obtain the aggregate classification result corresponding to the text to be classified.
6. A text classification device, the device comprising:
the text vector and entity information acquisition module is used for acquiring text vectors of texts to be classified and text position information of entities in the texts to be classified;
the context feature vector acquisition module is used for extracting the context feature of the text vector to obtain a context feature vector;
the semantic feature vector extraction module is used for carrying out nonlinear feature aggregation on the context feature vector and extracting a semantic feature vector;
the entity feature vector extraction module is used for extracting entity feature vectors corresponding to the entities in the text to be classified from the context feature vectors;
the interaction vector acquisition module is used for carrying out attention calculation on the entity feature vector and the semantic feature vector to obtain an interaction vector;
and the first classification result acquisition module is used for carrying out classification mapping based on the interaction vector to obtain the text category matched with the text to be classified.
7. The apparatus of claim 6, wherein the semantic feature vector extraction module is further to:
using the word vectors in the context feature vector as nodes, and constructing edges connecting the nodes according to the adjacency matrix corresponding to the words in the text to be classified, to obtain a graph structure;
Inputting the graph structure into a pre-trained graph convolution neural network, and obtaining vector representations of the words through the graph convolution neural network;
according to the vector representation of each word, obtaining a first vector representation of the text to be classified;
performing tensor outer product operation on the first vector representation to generate K-order cross features;
performing linear feature transformation on the K-order cross features to obtain a second vector representation;
and performing dimension reduction processing on the second vector representation to obtain the semantic feature vector.
8. The apparatus of claim 6, wherein the first classification result acquisition module is further configured to:
masking the non-entity vector representation in the interaction vector according to the text position information to obtain a masking vector;
and performing classification mapping based on the mask vector to obtain the text category matched with the text to be classified.
9. An electronic device comprising a memory, a processor and program code stored on the memory and executable on the processor, wherein the processor implements the text classification method of any of claims 1 to 5 when the program code is executed by the processor.
10. A computer readable storage medium having stored thereon program code, which when executed by a processor performs the steps of the text classification method of any of claims 1 to 5.
CN202311865962.9A 2023-12-29 2023-12-29 Text classification method, device, electronic equipment and storage medium Pending CN117874611A (en)

Priority Applications (1)

CN202311865962.9A (priority/filing date 2023-12-29): Text classification method, device, electronic equipment and storage medium

Publications (1)

CN117874611A, published 2024-04-12

Family ID: 90596359

Patent Citations (3)

* Cited by examiner, † Cited by third party

CN114444496A * (priority 2021-04-23, published 2022-05-06): Short text entity correlation identification method, system, electronic equipment and storage medium
CN116483995A * (priority 2023-03-17, published 2023-07-25): Text recognition method and device
CN116955616A * (priority 2023-06-29, published 2023-10-27): Text classification method and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party

Jianfei Yu et al., "Entity-Sensitive Attention and Fusion Network for Entity-Level Multimodal Sentiment Classification", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, 2019-12-06, p. 429. *
Ren Mengxing et al., "Research on Methods for Complex Chinese Entity Recognition", Journal of Physics: Conference Series, vol. 1576, 2020-06-30, pp. 1-7. *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination