CN113987188A - Short text classification method and device and electronic equipment - Google Patents

Short text classification method and device and electronic equipment

Info

Publication number
CN113987188A
CN113987188A (application number CN202111326798.5A)
Authority
CN
China
Prior art keywords
short text
vector
keywords
knowledge information
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111326798.5A
Other languages
Chinese (zh)
Other versions
CN113987188B (en)
Inventor
夏书银
唐祚
张勇
付京成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202111326798.5A priority Critical patent/CN113987188B/en
Publication of CN113987188A publication Critical patent/CN113987188A/en
Application granted granted Critical
Publication of CN113987188B publication Critical patent/CN113987188B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a short text classification method, a short text classification device, and electronic equipment, relating to the technical field of data processing. The technical scheme is as follows: determine the knowledge information and keywords of the short text; embed the short text, the knowledge information, and the keywords into a vector space and splice them to obtain vector matrices of the short text, the knowledge information, and the keywords; process the short text vector matrix with a bidirectional memory network layer to obtain semantic information of the short text; perform attention calculation on the semantic information of the short text and the vector matrix of the knowledge information or of the keywords to obtain a vector of the knowledge information or of the keywords; and perform feature extraction on the vector and the vector matrix with a convolutional neural network to obtain the short text classification result. The method solves the problem that prior-art short text classification methods cannot classify text accurately when the contextual semantics of short texts are missing, and improves the accuracy of text classification.

Description

Short text classification method and device and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a short text classification method and apparatus, and an electronic device.
Background
In recent years, with the development of deep learning, models such as the convolutional neural network (CNN) and the recurrent neural network (RNN) have been widely used in text classification and have achieved good results on longer texts. However, traditional deep learning networks face a huge challenge on short texts because the data are sparse and ambiguous. To address data sparsity and ambiguity, current work focuses on acquiring more implicit information from short texts in order to understand them. Text representation models are mainly divided into explicit and implicit representations. Explicit representation creates effective features based on part-of-speech tagging, knowledge bases, and similar resources, and such features are easy for people to understand subjectively; however, explicit representation usually treats each feature independently and ignores the context information of the short text. Implicit representation maps each word into a high-dimensional vector and expresses the text information with a word vector matrix, which makes it convenient for a neural network model to learn the information contained in the text. However, implicit representation may fail to capture entity information in the text. For example, in the short text "Ant will release new products", an implicit representation does not treat "Ant" as an entity but as an ordinary word, even though "Ant" as the name of a sports brand may affect the classification tendency.
Model structures integrating explicit and implicit text representations have been proposed in the past. They still have several disadvantages. First, when conceptualizing text information, corresponding weight information is obtained from a large knowledge base and integrated into the neural model, but this weight information is static and independent of the text information. Second, the keyword information of the text is often ignored, especially in texts where little knowledge information can be acquired, such as in the binary sentiment classification task.
Disclosure of Invention
The invention aims to provide a short text classification method, a short text classification device, and electronic equipment, which solve the problem that prior-art short text classification methods cannot classify text accurately when the contextual semantics of short texts are missing.
The technical purpose of the invention is realized by the following technical scheme:
in a first aspect, the present invention provides a short text classification method, including the following steps:
determining knowledge information and key words of the short text;
embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords;
processing the short text vector matrix by adopting a bidirectional memory network layer to obtain semantic information of the short text;
attention calculation is carried out on semantic information of the short text and a vector matrix of knowledge information or a vector matrix of keywords to obtain a vector of the knowledge information or the keywords;
and performing feature extraction on the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
The method aims to solve the problem that prior-art short text classification methods lack the contextual semantics of short texts. The invention therefore expands the representation range of the short text by determining its knowledge information and keywords. Because existing classification methods embed knowledge information only statically, without attending to the semantic information of the short text context, a context-based self-attention mechanism is provided, and knowledge information is selectively embedded through the context information. In addition, existing classification methods often ignore the influence of insufficient knowledge information, so the method uses a convolutional neural network to extract feature information from the keywords, the knowledge information vector, and the semantic information of the short text, and performs aggregated classification on the features of the knowledge information and the keywords to obtain the final short text classification result. This produces a more detailed classification of the short text at different granularities, improving classification accuracy.
Further, entity recognition is carried out on the short text to obtain an entity set of the short text, and the entity set is recognized to determine knowledge information of the short text.
Further, the short text, the knowledge information and the keywords are input into a neural network model embedding layer, and the word vector model is adopted to pre-train the short text, the knowledge information and the keywords in the embedding layer to obtain vector representation of the short text, the knowledge information and the keywords.
Further, the vector representations of the short text and the knowledge information are spliced in a superior sub-network of the neural network model to obtain a vector matrix of the knowledge information;
the short text and the keywords are pre-trained by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, and the character-level vector representations of the short text and the keywords are spliced in a sub-network at the lower level of a neural network model to obtain a vector matrix of the keywords at the character level.
Further, the short text vector matrix and the character-level short text vector matrix are input to a bidirectional memory network layer for processing, and semantic information of the short text context of the upper and lower sub-networks is respectively obtained.
Furthermore, in the upper-level sub-network, attention calculation is carried out on semantic information and knowledge information of the short text context to obtain a self-attention result of the knowledge information, products of the self-attention result of the knowledge information and the semantic information are calculated, and each product is spliced to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information of the short text context and the keywords to obtain self-attention results of the keywords, products of the self-attention results of the keywords and the semantic information are calculated, and each product is spliced to obtain vectors of the keywords.
Further, the calculation formula of the attention calculation is y_i = softmax(a_1(tanh(a_2[c_i; p] + b_2))); wherein y_i represents the weight of the knowledge information or keywords for the short text, tanh represents the hyperbolic tangent function, softmax represents the normalization of the self-attention result, a_1 represents a weight matrix, a_2 represents a weight vector, b_2 represents a bias vector, p represents an intermediate result, W represents a vector, and c_i represents the i-th knowledge vector in the upper-level sub-network or the i-th keyword vector in the lower-level sub-network.
Further, splicing the vector of the knowledge information and the vector matrix of the short text in the superior sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the superior sub-network to obtain a classification result of the superior sub-network;
splicing the vector of the keyword and the vector matrix of the keyword in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
In a second aspect, the invention provides a short text classification device based on keywords and knowledge information, which is used for realizing the classification method provided in the first aspect, and comprises a determining unit, a splicing unit, a processing unit, a calculating unit and a classifying unit;
the determining unit is used for determining knowledge information and keywords of the short text;
the splicing unit is used for embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords;
the processing unit is used for processing the short text vector matrix by adopting a two-way memory network layer to obtain semantic information of the short text;
the computing unit is used for carrying out attention computing on the semantic information of the short text and the vector matrix of the knowledge information or the vector matrix of the keyword to obtain the vector of the knowledge information or the keyword;
and the classification unit is used for extracting the characteristics of the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the classification method as provided in the first aspect when executing the computer program.
Compared with the prior art, the invention has the following beneficial effects:
the invention firstly conceptualizes the short text to obtain knowledge information and extracts keywords of the short text, provides a concept of an upper and lower two-stage sub-network, and trains the text information and the knowledge information to generate a vector matrix by using a pre-trained word vector model in the upper sub-network. Then, an attention mechanism based on the context of the short text is introduced to measure the importance degree of knowledge information to the short text; and embedding the measured knowledge information and semantic information into a two-dimensional convolution network to capture features and finally classifying. In the lower network, inspired by character level embedding, the text and the keywords are embedded by using the character level embedding to obtain different granularity characteristic information, then the upper sub-network and the lower sub-network are kept consistent in subsequent operation, and finally the classification results of the upper sub-network and the lower sub-network are aggregated and classified to obtain the final text classification result.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network model structure according to an embodiment of the present invention;
FIG. 3 is a block diagram of a frame of an apparatus according to an embodiment of the present invention;
fig. 4 is a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly or indirectly connected to the other element.
It will be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like, as used herein, refer to an orientation or positional relationship indicated in the drawings that is solely for the purpose of facilitating the description and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and is therefore not to be construed as limiting the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Example one
The method provided by this embodiment can be applied to the classification of short texts of fewer than 15 characters and also to short texts of 15-25 characters. It solves the problem that short text classification methods in the prior art lack the contextual semantics of short texts, making the classification of short texts more accurate.
As shown in fig. 1, the first embodiment provides a short text classification method, which includes the following steps:
step S10, determining knowledge information and keywords of the short text.
Specifically, the short text information is conceptualized. This is accomplished using existing general knowledge bases (e.g., Yago, Freebase, and Probase). The Probase knowledge base is used here because the information it contains is more extensive, so more concept information can be mined from the short text. An entity set E of the short text is obtained using the entity-recognition network interface provided by Probase. Then, for each entity in E, concept information is acquired from the knowledge base with the isA relation as the standard. For example, for the short text "Yahoo fixes two flaws in mail system", the entity set E = {Yahoo, mail} is obtained through the entity-recognition network interface of Probase, and conceptualizing via the isA relations of the Yahoo entity then yields the concept set C = {search engine company, application service}.
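For illustration only, the following minimal sketch mimics this conceptualization step with a hypothetical in-memory stand-in for the Probase isA interface; the entity-recognition heuristic, the knowledge-base fragment, and all names in it are illustrative assumptions, not the patented implementation:

```python
# Sketch of short-text conceptualization, assuming a hypothetical in-memory
# stand-in for the Probase isA lookup; the real system calls a network interface.
from typing import Dict, List, Set

# Hypothetical fragment of an isA knowledge base (entity -> concepts).
ISA_KB: Dict[str, List[str]] = {
    "yahoo": ["search engine company", "application service"],
    "mail": ["communication tool"],
}

def recognize_entities(text: str, kb: Dict[str, List[str]]) -> Set[str]:
    # Crude entity recognition: keep tokens that appear in the knowledge base.
    return {tok for tok in text.lower().split() if tok in kb}

def conceptualize(text: str, kb: Dict[str, List[str]]) -> Set[str]:
    # For each recognized entity, collect its isA concepts.
    concepts: Set[str] = set()
    for entity in recognize_entities(text, kb):
        concepts.update(kb[entity])
    return concepts

print(conceptualize("Yahoo fixes two flaws in mail system", ISA_KB))
# e.g. {'search engine company', 'application service', 'communication tool'}
```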
Keywords are extracted from the short text. In this embodiment, the keyword set K of the text is obtained with the YAKE keyword extraction algorithm, an unsupervised keyword extraction algorithm whose features include the capitalization of words, the positions of words, word frequencies, context relationships, and the frequency with which words appear in sentences. For example, for the short text "Yahoo fixes two flaws in mail system", the keyword extraction algorithm yields K = {Yahoo, fixes, flaws}.
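For reference, keyword extraction of this kind can be reproduced with the open-source `yake` package; the n-gram size and the number of returned keywords below are illustrative choices, not the settings of this embodiment:

```python
# Keyword extraction with the open-source `yake` package (pip install yake).
# The n-gram size and top-k below are illustrative, not the patent's settings.
import yake

extractor = yake.KeywordExtractor(lan="en", n=1, top=3)
text = "Yahoo fixes two flaws in mail system"
# extract_keywords returns (keyword, score) pairs; a lower score is more relevant.
keywords = [kw for kw, score in extractor.extract_keywords(text)]
print(keywords)  # e.g. ['Yahoo', 'flaws', 'mail']
```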
And step S20, embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords.
Specifically, as shown in fig. 2, unlike other short text classification methods that use only the knowledge information of the short text to expand the text features, the embodiment of the present application also uses the keyword information of the text, embedding it into the lower-level sub-network for classification; the short text, the knowledge information, and the keywords are input into the embedding layers of the upper-level and lower-level sub-networks of the neural network model for splicing to obtain the vector matrices of the short text.
And step S30, processing the short text vector matrix by using a two-way memory network layer to obtain the semantic information of the short text.
Specifically, the operation of the upper-level sub-network is the same as that of the lower-level sub-network, so the upper-level sub-network is described as an example. The vector matrix W_w = {W_1, W_2, ..., W_n} of the short text words obtained by the input unit is input into an LSTM network to obtain the contextual semantic information of the short text in the upper-level sub-network; similarly, character-level contextual semantic information of the short text can be obtained in the lower-level sub-network.
Step S40, the semantic information of the short text and the vector matrix of the knowledge information or the vector matrix of the keyword are attentively calculated to obtain the vector of the knowledge information or the keyword.
Specifically, the knowledge information and the keywords of the short text can supplement the feature information of the short text and help determine its class label. In the upper-level sub-network, attention calculation is performed on the short text semantic information and the encoded knowledge information; in the lower-level sub-network, attention calculation is performed on the short text semantic information and the encoded keywords. A context-dependent attention mechanism is provided, and the weight of a concept or keyword is calculated according to the semantic information contained in the context of the short text.
And step S50, extracting the features of the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
In particular, a convolutional neural network (CNN) can extract more feature information from short texts. The semantic information of the short text is spliced with the vector of the knowledge information in the upper-level sub-network to serve as the input of the upper-level sub-network, and the semantic information of the short text is spliced with the vector of the keywords in the lower-level sub-network to serve as the input of the lower-level sub-network. The inputs of the two sub-networks are put through convolution, pooling, and classification by the convolutional neural network to obtain the classification results of the upper-level and lower-level sub-networks, and these classification results are aggregated to obtain the final short text classification result.
According to the above technical scheme, the short text classification method expands the representation range of the short text by determining its knowledge information and keywords. Because existing classification methods embed knowledge information only statically, without attending to the semantic information of the short text context, a context-based self-attention mechanism is provided, and knowledge information is selectively embedded through the context information. In addition, existing classification methods often ignore the influence of insufficient knowledge information, so the method uses a convolutional neural network to extract feature information from the keywords, the knowledge information vector, and the semantic information of the short text, and performs aggregated classification on the features of the knowledge information and the keywords to obtain the final short text classification result. This produces a more detailed classification of the short text at different granularities, improving classification accuracy.
A description is given below of a possible implementation manner of each step of the short text classification method provided in the first embodiment of the present application.
On the basis of the first embodiment, in a further embodiment of the present application, entity recognition is performed on the short text to obtain an entity set of the short text, and the entity set is recognized to determine knowledge information of the short text.
Specifically, how to determine the knowledge information of the short text is already described in step S10, and is not described here.
Based on the above embodiment, in a further embodiment of the present application, the short text, the knowledge information, and the keywords are input to the neural network model embedding layer, and the word vector model is used to pre-train the short text, the knowledge information, and the keywords in the embedding layer, so as to obtain vector representations of the short text, the knowledge information, and the keywords.
Specifically, in the embedding layer of the upper-level sub-network, words and concepts are embedded into a high-dimensional vector space. Word vectors pre-trained with the Word2vec model are used to obtain the vector representation of each word. W_w and W_c denote the embedded representations of the words and the knowledge information, respectively. The concrete formulas are as follows:

W_w = w_1 ⊕ w_2 ⊕ ... ⊕ w_m    (1)

W_c = c_1 ⊕ c_2 ⊕ ... ⊕ c_n    (2)

It should be explained that one short text contains a plurality of words; for example, "Zhang San went to an orchard today and planted several fruit trees" contains words such as orchard, fruit tree, and plant. In all the embodiments of this application, ⊕ denotes the splicing operation; m and n denote the maximum numbers of words and of knowledge-information items, respectively; w_i is the vector representation of the i-th word; and c_i is the vector representation of the i-th knowledge information. The vector representation W_w of the short text and the vector representation W_c of the knowledge information are finally obtained through the splicing operation. If the vector length of the text or knowledge information is insufficient, it is padded with 0.
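As a minimal sketch of this embedding-and-splicing step (the vocabulary, vector dimension, and maximum lengths m and n are illustrative assumptions):

```python
# Sketch of the embedding layer: look up pre-trained word vectors, splice them
# row-wise, and zero-pad to a fixed maximum length (illustrative dimensions).
import numpy as np

def embed_and_pad(tokens, vectors, max_len, dim):
    """Build the spliced, zero-padded vector matrix for one token sequence."""
    rows = [vectors.get(tok, np.zeros(dim)) for tok in tokens[:max_len]]
    while len(rows) < max_len:          # pad short sequences with 0 vectors
        rows.append(np.zeros(dim))
    return np.stack(rows)               # shape: (max_len, dim)

dim, m, n = 300, 10, 5                  # assumed dim / max word & concept counts
vectors = {"yahoo": np.random.rand(dim)}  # stand-in for Word2vec vectors
W_w = embed_and_pad(["yahoo", "fixes", "flaws"], vectors, m, dim)   # short text
W_c = embed_and_pad(["search", "engine", "company"], vectors, n, dim)  # concepts
print(W_w.shape, W_c.shape)             # (10, 300) (5, 300)
```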
In the lower-level sub-network, a standard convolutional neural network (CNN) is used to obtain the character-level vector representation e_i of the i-th word and the character-level vector representation k_i of the i-th keyword, where t and v are the maximum numbers of words and keywords, respectively. The vector matrices E_w and E_k of the character-level short text and the keyword set are obtained through the same splicing operation as in the upper-level sub-network:

E_w = e_1 ⊕ e_2 ⊕ ... ⊕ e_t

E_k = k_1 ⊕ k_2 ⊕ ... ⊕ k_v
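A minimal sketch of such a character-level CNN embedder (PyTorch; the alphabet size, channel counts, and kernel width are illustrative assumptions):

```python
# Sketch of character-level word embedding with a standard CNN (PyTorch);
# alphabet size, channel counts, and kernel width are illustrative assumptions.
import torch
import torch.nn as nn

class CharCNNEmbedder(nn.Module):
    def __init__(self, n_chars=128, char_dim=16, out_dim=50, kernel=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, out_dim, kernel_size=kernel, padding=1)

    def forward(self, char_ids):                      # (batch, word_len)
        x = self.char_emb(char_ids).transpose(1, 2)   # (batch, char_dim, word_len)
        x = torch.relu(self.conv(x))                  # (batch, out_dim, word_len)
        return x.max(dim=2).values                    # max-pool over characters

embedder = CharCNNEmbedder()
word = torch.tensor([[ord(c) for c in "yahoo"]])      # naive char ids for one word
print(embedder(word).shape)                           # torch.Size([1, 50])
```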
On the basis of the first embodiment, in a further embodiment of the present application, the vector representations of the short text and the knowledge information are spliced in a superior subnetwork of the neural network model to obtain a vector matrix of the knowledge information;
the short text and the keywords are pre-trained by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, and the character-level vector representations of the short text and the keywords are spliced in a sub-network at the lower level of a neural network model to obtain a vector matrix of the keywords at the character level.
Specifically, how to obtain the vector matrix of the short text, the knowledge information and the keyword is described in the above embodiments, and will not be described here.
On the basis of the first embodiment, in a further embodiment of the present application, the short text vector matrix and the character-level short text vector matrix are both input to the two-way memory network layer for processing, so as to obtain semantic information of the short text contexts of the upper and lower subnetworks, respectively.
Specifically, the upper-level and lower-level sub-networks obtain their vector matrices in the same way, so the upper-level sub-network is described as an example. The word vector matrix W_w = {W_1, W_2, ..., W_n} obtained by the input unit is input into the LSTM network to obtain the contextual semantic information of the text. The forward LSTM reads in normal order (W_1 to W_n), as in formula (4), and the backward LSTM reads in reverse order (W_n to W_1), as in formula (5):

h_t^f = LSTM(w_t, h_{t-1}^f)    (4)

h_t^b = LSTM(w_t, h_{t+1}^b)    (5)

h_t = [h_t^f ; h_t^b]    (6)

wherein h_t represents the neuron output at time t and w_i represents the i-th short text vector. Each forward output h_t^f at time t is combined with the corresponding backward output h_t^b to obtain the final h_t, as in formula (6). H_sup denotes the semantic representation of the upper-level sub-network, i.e., H_sup = {h_1, h_2, ..., h_t}. With the same operations as in the upper-level sub-network, E_w = {E_1, E_2, ..., E_t} is input into the LSTM network in the lower-level sub-network to obtain the semantic representation H_sub of the lower-level sub-network.
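A minimal sketch of this bidirectional memory network layer (PyTorch; the embedding dimension and the number of neurons r are illustrative assumptions):

```python
# Sketch of the bidirectional memory network layer (PyTorch BiLSTM); the
# embedding dimension and hidden size r are illustrative assumptions.
import torch
import torch.nn as nn

dim, r = 300, 128                         # word-vector dim, neurons per direction
bilstm = nn.LSTM(input_size=dim, hidden_size=r,
                 batch_first=True, bidirectional=True)

W_w = torch.randn(1, 10, dim)             # vector matrix of one short text (n=10)
H_sup, _ = bilstm(W_w)                    # forward/backward outputs spliced per step
print(H_sup.shape)                        # torch.Size([1, 10, 256]), h_t = [h_f; h_b]
```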
In the lower-level sub-network, the calculation is the same as in the upper-level sub-network: the Q, K, and V vectors of the lower-level sub-network are all equal to H_sub, and the final E_k is obtained in the same way as in the upper-level sub-network.
On the basis of the first embodiment, in a further embodiment of the present application, in the upper-level sub-network, attention calculation is performed on semantic information and knowledge information of a short text context to obtain a self-attention result of the knowledge information, products of the self-attention result of the knowledge information and the semantic information are calculated, and each product is spliced to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information of the short text context and the keywords to obtain self-attention results of the keywords, products of the self-attention results of the keywords and the semantic information are calculated, and each product is spliced to obtain vectors of the keywords.
Specifically, since the lower-level sub-network performs the same calculation as the upper-level sub-network, the upper-level sub-network is explained as an example. First, the scaled dot-product attention mechanism is used to capture the word-to-word dependencies within a sentence and to learn the internal structure of the sentence. Given a query matrix Q, a key matrix K, and a value matrix V, where Q, K, and V are three matrices of the same value and are all equal to H_sup, 2r denotes the scaling factor and r denotes the number of neurons of the upper-level sub-network:

A = softmax(QK^T / √(2r)) V    (7)

A max-pooling operation is performed on the calculated result A, as in formula (8), so that the maximum value in each dimension represents the word dependencies of the short text:

p = maxpool(A)    (8)

After calculating p, a context-based attention calculation is proposed for computing the importance of the knowledge information to the text in the upper-level sub-network. The concrete formulas are as follows:

y_i = softmax(a_1(tanh(a_2[c_i; p] + b_2)))    (9)

W_c = y_1·c_1 ⊕ y_2·c_2 ⊕ ... ⊕ y_n·c_n    (10)

y_i represents the weight of a concept for the text; a larger y_i indicates that the concept or keyword is more important for the short text. tanh is the hyperbolic tangent function, and the softmax function normalizes the attention result to the range [0, 1]. a_1 represents a weight matrix and a_2 represents a weight vector, where R is a vector space representation and d_r represents a hyper-parameter; b_2 represents a bias vector. Finally, each calculated weight y_i is multiplied by the corresponding c_i, and the products are spliced to obtain the final W_c, as in formula (10).
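A minimal sketch of formulas (7)-(10) (PyTorch; all sizes and parameter names are illustrative assumptions, and the random concept vectors stand in for real Word2vec embeddings):

```python
# Sketch of the context-based attention of equations (7)-(10), assuming
# H_sup from the BiLSTM above; all sizes and parameter names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

r, dim, d_r = 128, 300, 64
H_sup = torch.randn(10, 2 * r)            # semantic representation, n=10 steps
C = torch.randn(5, dim)                   # concept (or keyword) vectors c_i

# (7) scaled dot-product self-attention with Q = K = V = H_sup
A = F.softmax(H_sup @ H_sup.T / (2 * r) ** 0.5, dim=-1) @ H_sup
# (8) max-pooling over the attention result gives the context summary p
p = A.max(dim=0).values                   # shape: (2r,)

a2 = nn.Linear(dim + 2 * r, d_r)          # weight vector a_2 with bias b_2
a1 = nn.Linear(d_r, 1, bias=False)        # weight matrix a_1
# (9) y_i = softmax(a_1 tanh(a_2 [c_i; p] + b_2)) over all concepts
scores = a1(torch.tanh(a2(torch.cat([C, p.expand(5, -1)], dim=1))))
y = F.softmax(scores, dim=0)              # importance weight of each concept
# (10) weight each c_i by y_i and splice to form the knowledge vector W_c
W_c = (y * C).reshape(-1)
print(W_c.shape)                          # torch.Size([1500])
```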
In a further embodiment of the present application, based on the first embodiment, the calculation formula of the attention calculation is y_i = softmax(a_1(tanh(a_2[c_i; p] + b_2))); wherein y_i represents the weight of the knowledge information or keywords for the short text, tanh represents the hyperbolic tangent function, softmax represents the normalization of the self-attention result, a_1 represents a weight matrix, a_2 represents a weight vector, b_2 represents a bias vector, p represents an intermediate result, W represents a vector, and c_i represents the i-th knowledge vector in the upper-level sub-network or the i-th keyword vector in the lower-level sub-network.
Specifically, the previous embodiment has already explained how to perform the attention calculation, and therefore is not described here.
On the basis of the first embodiment, in a further embodiment of the present application, the vector of the knowledge information and the vector matrix of the short text are spliced in the upper-level sub-network, feature extraction is performed on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and the feature vector is classified through a full connection layer of the upper-level sub-network to obtain a classification result of the upper-level sub-network;
splicing the vector of the keyword and the vector matrix of the keyword in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
Specifically, in the upper-level sub-network, the semantic information H_sup of the short text is spliced with W_c as the input W_sup; in the lower-level sub-network, H_sub is spliced with E_k as the input W_sub. The corresponding formulas are as follows:

W_sup = H_sup ⊕ W_c    (11)

W_sub = H_sub ⊕ E_k    (12)

wherein m denotes the word vector dimension, n_c and n_k represent the numbers of concepts and keywords respectively, and R is a vector space representation. A CNN model is then used to perform the convolution, pooling, and classification operations on the upper-level and lower-level sub-networks respectively.

First, convolution kernels with the width fixed to m and different heights h are used to convolve W_sup and W_sub respectively, so as to extract the features of the short text and generate a group of feature vectors v_i. The generated feature vectors [v_1; v_i] are passed through the activation function relu. The concrete formula is as follows:

S_sup = relu(w·v_i + b)    (13)

wherein w is a weight matrix with the same dimensions as v_i and b represents a bias vector. The S_sub of the lower-level sub-network is obtained by the same operation.
In the pooling layer, max pooling is used: the maximum value in a certain area is output as the representative, and a vector T_i of fixed length is extracted from the feature map. This improves the generalization capability of the neural network model and reduces the number of network parameters. After the CNN pooling, a fully connected softmax(·) layer is introduced to classify the upper-level and lower-level branches separately. Finally, combined classification is performed on the classification results of the upper-level and lower-level sub-networks to obtain the final classification result Output, as shown in fig. 2. The concrete formula is as follows:

Output = softmax(O_sup ⊕ O_sub)    (14)

wherein O_sup and O_sub denote the classification results of the upper-level and lower-level sub-networks.
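A minimal sketch of this classification stage (PyTorch; the kernel heights, filter counts, class count, and the way the two branch results are combined are illustrative assumptions, not the patented settings):

```python
# Sketch of the classification stage of formulas (11)-(14): width-m convolutions
# of several heights, max pooling, per-branch classification, combined output.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchClassifier(nn.Module):
    """One sub-network branch: width-m convolutions, max pooling, FC classifier."""
    def __init__(self, m=300, heights=(2, 3, 4), n_filters=100, n_classes=4):
        super().__init__()
        # one Conv2d per height h; each kernel spans the full width m
        self.convs = nn.ModuleList(
            nn.Conv2d(1, n_filters, kernel_size=(h, m)) for h in heights)
        self.fc = nn.Linear(n_filters * len(heights), n_classes)

    def forward(self, W):                           # W: (batch, rows, m)
        x = W.unsqueeze(1)                          # add a channel dimension
        # relu(conv), then max pooling over the remaining length dimension
        feats = [F.relu(conv(x)).squeeze(3).max(dim=2).values
                 for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))     # branch classification result

upper, lower = BranchClassifier(), BranchClassifier()
W_sup = torch.randn(1, 15, 300)                     # spliced upper-branch input (11)
W_sub = torch.randn(1, 15, 300)                     # spliced lower-branch input (12)
# Combined classification: splice the two branch results and let an output
# layer produce the probability of each class, in the spirit of formula (14).
out_layer = nn.Linear(8, 4)
combined = torch.cat([upper(W_sup), lower(W_sub)], dim=1)
output = F.softmax(out_layer(combined), dim=1)
print(output.shape)                                 # torch.Size([1, 4])
```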
by combining the above technical solutions, as shown in fig. 2, fig. 2 is a proposed short text network classification model, and the model is generally composed of an upper level network and a lower level network, and each of the upper level network and the lower level network includes four units. The method comprises the steps that four units are arranged in a superior network, short texts are conceptualized by using an external knowledge base (base), corresponding word vector matrixes are generated by using pre-trained word vectors, and the vector matrixes of the texts are input into an LSTM network to obtain semantic representation of the texts. Thirdly, the semantic representation and the concept word vector matrix are subjected to a dynamic attention mechanism to obtain a knowledge information vector. And finally, connecting the semantic representation with the vector of the knowledge information together to complete classification through a CNN network. In a lower-level network, keywords in a short text are obtained by using a Yake keyword automatic extraction algorithm, corresponding vector representation is generated through character-level features, other units in the lower-level network are consistent with a higher-level sub-network, and knowledge information is replaced by keyword information. And finally, combining the classification results of the upper and lower networks together, and acquiring the probability of each class by using an output layer.
Example two
Based on the same concept, the second embodiment provides a short text classification device, which can be applied to a computer and other electronic equipment to execute the short text classification method described in the first embodiment. As shown in fig. 3, which is a structural block diagram of the short text classification device provided in the second embodiment of the present application, the device comprises a determining unit 110, a splicing unit 120, a processing unit 130, a calculating unit 140, and a classifying unit 150;
the determining unit 110 is configured to determine knowledge information and keywords of a short text;
the splicing unit 120 is configured to embed the short text, the knowledge information, and the keyword into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information, and the keyword;
the processing unit 130 is configured to process the short text vector matrix by using a two-way memory network layer to obtain semantic information of the short text;
the calculating unit 140 is configured to perform attention calculation on the semantic information of the short text and the vector matrix of the knowledge information or the vector matrix of the keyword to obtain a vector of the knowledge information or the keyword;
the classification unit 150 is configured to perform feature extraction on the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
According to the above technical scheme, the short text classification device of the second embodiment of the application expands the representation range of the short text by determining its knowledge information and keywords. Because existing classification methods embed knowledge information only statically, without attending to the semantic information of the short text context, a context-based self-attention mechanism is provided, and knowledge information is selectively embedded through the context information. In addition, existing classification methods often ignore the influence of insufficient knowledge information, so the device uses a convolutional neural network to extract feature information from the keywords, the knowledge information vector, and the semantic information of the short text, and performs aggregated classification on the features of the knowledge information and the keywords to obtain the final short text classification result. This produces a more detailed classification of the short text at different granularities, improving classification accuracy.
Optionally, the determining unit 110 is further configured to perform entity identification on the short text, obtain an entity set of the short text, and identify the entity set to determine knowledge information of the short text.
Optionally, the splicing unit 120 is further configured to input the short text, the knowledge information, and the keyword into the neural network model embedding layer, and pre-train the short text, the knowledge information, and the keyword in the embedding layer by using a word vector model to obtain a vector representation of the short text, the knowledge information, and the keyword.
Optionally, the splicing unit 120 includes a first splicing unit and a second splicing unit, where the first splicing unit is configured to splice the vector representations of the short text and the knowledge information in a superior subnetwork of the neural network model to obtain a vector matrix of the knowledge information;
the second splicing unit is used for pre-training the short text and the keywords by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, splicing the character-level vector representations of the short text and the keywords in a sub-network of a lower level of the neural network model to obtain a vector matrix of the keywords at a character level.
Optionally, the calculating unit 140 is configured to input the short text vector matrix and the vector matrix of the character-level short text into the two-way memory network layer for processing, and obtain semantic information of the short text context of the upper and lower subnetworks, respectively.
Optionally, the calculating unit 140 is further configured to perform attention calculation on the semantic information and the knowledge information of the short text context in the upper-level sub-network to obtain a self-attention result of the knowledge information, calculate a product of the self-attention result of the knowledge information and the semantic information, and splice each product to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information of the short text context and the keywords to obtain self-attention results of the keywords, products of the self-attention results of the keywords and the semantic information are calculated, and each product is spliced to obtain vectors of the keywords.
Optionally, the calculation formula of the attention calculation is y_i = softmax(a_1(tanh(a_2[c_i; p] + b_2))); wherein y_i represents the weight of the knowledge information or keywords for the short text, tanh represents the hyperbolic tangent function, softmax represents the normalization of the self-attention result, a_1 represents a weight matrix, a_2 represents a weight vector, b_2 represents a bias vector, p represents an intermediate result, W represents a vector, and c_i represents the i-th knowledge vector in the upper-level sub-network or the i-th keyword vector in the lower-level sub-network.
Optionally, splicing the vector of the knowledge information and the vector matrix of the short text in the superior sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the superior sub-network to obtain a classification result of the superior sub-network;
splicing the vector of the keyword and the vector matrix of the keyword in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
The feasible implementation manners of each unit of the short text classification apparatus provided in the second embodiment of the present application are all described in the first embodiment of the short text classification method, and therefore will not be described here.
EXAMPLE III
Based on the same concept, as shown in fig. 4, the third embodiment of the present application provides an electronic device, which includes a memory 330, a processor 310, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the classification method provided in the first embodiment when executing the computer program.
Fig. 4 is a schematic entity structure diagram of an electronic device according to a third embodiment of the present invention, and as shown in fig. 4, the electronic device may include: a processor (processor)310, a communication interface (communication interface)320, a memory (memory)330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke a computer program stored on the memory 330 and executable on the processor 310 to perform the text classification methods provided by the various embodiments described above, including, for example: determining knowledge information and key words of the short text; embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords; processing the short text vector matrix by adopting a bidirectional memory network layer to obtain semantic information of the short text; attention calculation is carried out on the semantic information of the short text, the vector matrix of the knowledge information and the vector matrix of the keywords to obtain the knowledge information or the vectors of the keywords of the short text; and performing feature extraction on the vector and semantic information of the short text by using a convolutional neural network to obtain a short text classification result.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A short text classification method is characterized by comprising the following steps:
determining knowledge information and key words of the short text;
embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords;
processing the short text vector matrix by adopting a bidirectional memory network layer to obtain semantic information of the short text;
attention calculation is carried out on semantic information of the short text and a vector matrix of knowledge information or a vector matrix of keywords to obtain a vector of the knowledge information or the keywords;
and performing feature extraction on the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
2. The short text classification method according to claim 1, characterized in that entity recognition is performed on the short text to obtain an entity set of the short text, and the entity set is recognized to determine knowledge information of the short text.
3. The short text classification method according to claim 2, characterized in that the short text, knowledge information and keywords are input to the neural network model embedding layer, and the word vector model is used to pre-train the short text, knowledge information and keywords in the embedding layer to obtain the vector representation of the short text, knowledge information and keywords.
4. The short text classification method according to claim 3, characterized in that the vector representations of the short text and the knowledge information are spliced in a superior subnetwork of the neural network model to obtain a vector matrix of the knowledge information;
the short text and the keywords are pre-trained by adopting a convolutional neural network to obtain character-level vector representations of the short text and the keywords, and the character-level vector representations of the short text and the keywords are spliced in a sub-network at the lower level of a neural network model to obtain a vector matrix of the keywords at the character level.
5. The short text classification method according to claim 4, characterized in that the short text vector matrix and the character-level short text vector matrix are both input to the two-way memory network layer for processing, and semantic information of the short text context of the upper and lower sub-networks is obtained respectively.
6. The short text classification method according to any one of claims 4-5, characterized in that in the upper sub-network, attention calculation is performed on semantic information and knowledge information of a short text context to obtain a self-attention result of the knowledge information, products of the self-attention result of the knowledge information and the semantic information are calculated, and each product is spliced to obtain a vector of the knowledge information;
in a next-level sub-network, attention calculation is carried out on semantic information of the short text context and the keywords to obtain self-attention results of the keywords, products of the self-attention results of the keywords and the semantic information are calculated, and each product is spliced to obtain vectors of the keywords.
7. The short text classification method according to claim 6,
the formula for the attention calculation is yi=softmax(a1(tanh(a2[ci;p]+b2) ); wherein, yiRepresenting the weight of knowledge information or keywords to short text, tanh representing a hyperbolic tangent function, softmax representing the normalization of the self-attention result,
Figure FDA0003347190020000021
a matrix of weights is represented by a matrix of weights,
Figure FDA0003347190020000022
representing a weight vector, b2Representing an offset vector, p an intermediate result, W a vector, ciThe representation represents the i-th knowledge vector in the superior subnetwork and the i-th keyword vector in the inferior subnetwork.
8. The short text classification method according to claim 6,
splicing the vector of the knowledge information and the vector matrix of the short text in a superior subnetwork, extracting the characteristics of the spliced matrix through a two-dimensional convolutional neural network to obtain a characteristic vector, and classifying the characteristic vector through a full-connection layer of the superior subnetwork to obtain a classification result of the superior subnetwork;
splicing the vector of the keyword and the vector matrix of the keyword in the lower sub-network, performing feature extraction on the spliced matrix through a two-dimensional convolutional neural network to obtain a feature vector, and classifying the feature vector through a full connection layer of the lower sub-network to obtain a classification result of the lower sub-network;
and carrying out combined classification on the classification result of the superior sub-network and the classification result of the subordinate sub-network to obtain the classification result of the short text.
9. A short text classification device based on keywords and knowledge information is characterized by comprising a determining unit, a splicing unit, a processing unit, a calculating unit and a classifying unit;
the determining unit is used for determining knowledge information and keywords of the short text;
the splicing unit is used for embedding the short text, the knowledge information and the keywords into a vector space for splicing to obtain a vector matrix of the short text, the knowledge information and the keywords;
the processing unit is used for processing the short text vector matrix by adopting a two-way memory network layer to obtain semantic information of the short text;
the computing unit is used for carrying out attention computing on the semantic information of the short text and the vector matrix of the knowledge information or the vector matrix of the keyword to obtain the vector of the knowledge information or the keyword;
and the classification unit is used for extracting the characteristics of the vector and the vector matrix by using a convolutional neural network to obtain a short text classification result.
10. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the classification method according to any one of claims 1 to 8 when executing the computer program.
CN202111326798.5A 2021-11-10 2021-11-10 Short text classification method and device and electronic equipment Active CN113987188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111326798.5A CN113987188B (en) 2021-11-10 2021-11-10 Short text classification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111326798.5A CN113987188B (en) 2021-11-10 2021-11-10 Short text classification method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113987188A (en) 2022-01-28
CN113987188B (en) 2022-07-08

Family

ID=79747702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111326798.5A Active CN113987188B (en) 2021-11-10 2021-11-10 Short text classification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113987188B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolution neutral network
KR20180112590A (en) * 2017-04-04 2018-10-12 한국전자통신연구원 System and method for generating multimedia knowledge base
CN109710761A (en) * 2018-12-21 2019-05-03 中国标准化研究院 The sentiment analysis method of two-way LSTM model based on attention enhancing
CN110321562A (en) * 2019-06-28 2019-10-11 广州探迹科技有限公司 A kind of short text matching process and device based on BERT
CN111460142A (en) * 2020-03-06 2020-07-28 南京邮电大学 Short text classification method and system based on self-attention convolutional neural network
CN113515632A (en) * 2021-06-30 2021-10-19 西南电子技术研究所(中国电子科技集团公司第十研究所) Text classification method based on graph path knowledge extraction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邵云飞 (Shao Yunfei): "Research on Short Text Classification Methods Fusing Topic Models and Word Vectors", China Excellent Master's Theses Full-text Database *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115048515A (en) * 2022-06-09 2022-09-13 广西力意智能科技有限公司 Document classification method, device, equipment and storage medium
CN115617990A (en) * 2022-09-28 2023-01-17 浙江大学 Electric power equipment defect short text classification method and system based on deep learning algorithm
CN115617990B (en) * 2022-09-28 2023-09-05 浙江大学 Power equipment defect short text classification method and system based on deep learning algorithm

Also Published As

Publication number Publication date
CN113987188B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
Wen et al. Ensemble of deep neural networks with probability-based fusion for facial expression recognition
KR102071582B1 (en) Method and apparatus for classifying a class to which a sentence belongs by using deep neural network
CN111079639B (en) Method, device, equipment and storage medium for constructing garbage image classification model
Shao et al. Feature learning for image classification via multiobjective genetic programming
CN111444344B (en) Entity classification method, entity classification device, computer equipment and storage medium
CN111259144A (en) Multi-model fusion text matching method, device, equipment and storage medium
CN112711953A (en) Text multi-label classification method and system based on attention mechanism and GCN
CN112818861A (en) Emotion classification method and system based on multi-mode context semantic features
CN113987188B (en) Short text classification method and device and electronic equipment
CN111177383B (en) Text entity relation automatic classification method integrating text grammar structure and semantic information
CN112905795A (en) Text intention classification method, device and readable medium
CN110796199A (en) Image processing method and device and electronic medical equipment
CN116186594B (en) Method for realizing intelligent detection of environment change trend based on decision network combined with big data
US20210073628A1 (en) Deep neural network training method and apparatus, and computer device
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN112100377A (en) Text classification method and device, computer equipment and storage medium
CN113806580A (en) Cross-modal Hash retrieval method based on hierarchical semantic structure
CN117649567B (en) Data labeling method, device, computer equipment and storage medium
CN115168579A (en) Text classification method based on multi-head attention mechanism and two-dimensional convolution operation
Li et al. Spatial-temporal dynamic hand gesture recognition via hybrid deep learning model
Ghayoumi et al. Local sensitive hashing (LSH) and convolutional neural networks (CNNs) for object recognition
CN111445545B (en) Text transfer mapping method and device, storage medium and electronic equipment
CN113553326A (en) Spreadsheet data processing method, device, computer equipment and storage medium
CN113743079A (en) Text similarity calculation method and device based on co-occurrence entity interaction graph
CN113536784A (en) Text processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant