CN115168574A - Method and device for classifying problem texts with multi-value chains - Google Patents

Method and device for classifying problem texts with multi-value chains

Info

Publication number
CN115168574A
Authority
CN
China
Prior art keywords
text
training
word
label
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210734541.1A
Other languages
Chinese (zh)
Inventor
覃博
马祖扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210734541.1A
Publication of CN115168574A
Priority to CN202211355585.XA
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of text classification, and in particular to a method and a device for classifying multi-value chain problem texts. The method comprises the following steps: collecting question sentences searched by users on each business search platform within an enterprise and labeling them with text category labels to obtain a multi-value chain problem text data set; preprocessing the problem text data set and dividing the preprocessed data set into a training text set and a test text set; extracting word vectors from the training text set and the test text set, and constructing a fully-connected network graph from the extracted word vectors and the labeled text category labels based on the training text set; training an initial label graph attention model on the fully-connected network graph to obtain a trained label graph attention model; training an initial text semantic extraction model on the word vectors extracted from the training text set to obtain a trained text semantic extraction model; and performing semantic extraction on the test text set with the trained text semantic extraction model and classifying the test text set with the trained label graph attention model. With the method and the device, intelligent classification of multi-value chain problem texts can be realized.

Description

Method and device for classifying problem texts with multi-value chains
Technical Field
The invention relates to the technical field of text classification, and in particular to a method and a device for classifying multi-value chain problem texts.
Background
As the collaborative work and data sharing platforms of the enterprises in a multi-value chain continuously update themselves, more and more enterprises establish their own knowledge bases and store important knowledge and experience concerning enterprise operation, production, sales and the like. Intelligent question-answering systems have been developed accordingly for enterprise staff to search these knowledge bases, and staff tend to query the desired knowledge in natural language. The classification of question sentences is therefore an important link in building a multi-value chain intelligent question-answering system and plays an important role in advancing the question-answering service.
The core problems of text classification are text representation and the classification model. Traditional text representation uses the Bag-of-Words principle to represent words, out of order, in a high-dimensional Vector Space Model, discarding text structure information such as word order and grammar. Traditional machine learning text classification models obtain a classifier using Bayesian theory, based on the assumption that the classes obey a certain probability distribution, but classification accuracy suffers once this assumption does not hold.
Disclosure of Invention
The invention provides a method and a device for classifying multi-value chain problem texts, which are used to realize intelligent classification of multi-value chain problem texts.
The technical scheme is as follows:
in one aspect, a method for classifying texts of a multivalent value chain problem is provided, and the method includes:
s1, collecting question sentences searched by users in each business search platform in an enterprise and labeling text type labels to obtain a data set of a multi-value chain problem text;
s2, preprocessing the problem text data set, and dividing the preprocessed problem text data set into a training text set and a test text set;
s3, extracting word vectors from the training text set and the test text set, and constructing a fully-connected network graph by using the extracted word vectors and labeled text category labels based on the training text set;
s4, training the initial label graph attention model based on the full-connection network graph to obtain a trained label graph attention model;
s5, training the initial text semantic extraction model based on the word vectors extracted from the training text set to obtain a trained text semantic extraction model;
s6, semantic extraction is carried out on the test text set according to the trained text semantic extraction model, and the test text set is classified according to the trained label graph attention model.
Optionally, the preprocessing the question text data set in S2 includes:
performing Chinese word segmentation processing and stop word removal processing on the problem text data set;
the Chinese word segmentation processing and stop word filtering processing comprise the following steps:
segmenting words of the text based on a pre-constructed multivalent value chain text expert dictionary, and determining a word segmentation result by adopting a principle of maximizing the number of large-granularity words; wherein the text is a question text data set;
and filtering the stop words based on a pre-constructed polyvalent value chain stop word dictionary to remove noise words.
Optionally, the performing word vector extraction on the preprocessed training text set and test text set in S3, and constructing a fully-connected network graph by using the extracted word vector and the labeled text category label based on the training text set, includes:
obtaining word vectors of a preprocessed training text set and a preprocessed test text set based on a pretrained BERT word vector extraction model;
based on a training text set, using the extracted word vectors and the labeled text category labels respectively as nodes, and constructing a network graph in which all nodes are fully connected, wherein the nodes represented by text category labels are initialized to random vectors whose dimension is consistent with that of the word vectors, the edge weights between word vector nodes are initialized randomly, and the edge weight between a category node and a word node is initialized to the probability that the word in the training text set appears in that category.
Optionally, in S4, the initial label graph Attention model trained based on the fully-connected network graph is a GNN-Attention model, where the GNN-Attention model includes a GNN submodel and an Attention submodel;
the GNN submodel is used for obtaining the output characteristics of each category label node;
the Attention submodel is used to control Attention weight assignment among nodes in the GNN submodel.
Optionally, the training process of the GNN-Attention model includes:
iteratively updating the feature vectors of the nodes in the GNN-Attention model according to the following formula (2);
h_v^(k) = σ_0( W_k · ( Σ_{u∈N(v)} α_vu · h_u^(k-1) ) / |N(v)| + B_k · h_v^(k-1) )……(2)
wherein σ_0 denotes the sigmoid activation function, W_k represents the parameter matrix at the kth iteration of the GNN, k ∈ [1, K], where K ∈ R+, K > 1 is a preset constant, N(v) represents all neighbor nodes of the node v, |N(v)| represents the size of N(v), h_u^(k-1) represents the feature vector of node u at the (k-1)th iteration, h_v^(k-1) is the feature vector of node v at the (k-1)th iteration, h_v^(k) is the output vector of node v at the kth iteration, B_k is the bias parameter at the kth iteration, k = 1 denotes the first iteration, and all parameters are initialized randomly; α_vu represents the attention weight between node u and node v, defined by the following formula (3);
α_vu = exp(e_vu) / Σ_{w∈N(v)} exp(e_vw)……(3)
wherein e_vu is defined by the following formula (4);
e_vu = a(W_k · h_v^(k-1), W_k · h_u^(k-1))……(4)
wherein a represents a function for calculating the attention weight;
the defined loss function is a class-balanced label similarity loss function, as shown in the following formula (5):
Loss = -(1/m) Σ_{b=1}^{m} (1/n_b) Σ_{i=1}^{n_b} (1/s_{g_b(i)}) log[ exp( (W_c · h^(K)_{z_b(i)}) · (W_l · h^(K)_{z_b(g_b(i))}) ) / Σ_{j=1}^{L} exp( (W_c · h^(K)_{z_b(i)}) · (W_l · h^(K)_{z(j)}) ) ]……(5)
wherein m denotes the number of batches into which the entire training text set is divided, b denotes the index of a batch, n_b denotes the number of training texts in the batch with index b, i denotes the index of a training text within a batch, L denotes the total number of category labels, s_j represents the frequency with which the jth label appears in the entire training text set, g_b(i) denotes the category label corresponding to the training text with index i in the batch with index b, z_b(i) represents the node index in the fully-connected graph corresponding to the training text with index i in the batch with index b, z_b(g_b(i)) represents the node index in the fully-connected graph corresponding to the category label g_b(i) (and z(j) the node index of the jth category label), W_c represents the parameter matrix for the linear transformation of the training text nodes, W_l represents the parameter matrix for the linear transformation of the label nodes, and K is the total number of iterations;
and updating parameters in the GNN-Attention model by adopting a batch gradient descent algorithm according to the class equilibrium label similarity loss function to obtain a trained label graph Attention model.
Optionally, in S5, the initial text semantic extraction model trained based on the word vectors extracted from the training text set is a Bi-LSTM model;
the Bi-LSTM model is trained according to the following formula (6);
Loss = -(1/N) Σ_{i=1}^{N} y_i · log(p_i)……(6)
wherein N represents the number of sentences in the training text set, y_i is the category label vector of the ith sentence, represented by one-hot coding, and p_i is the probability distribution finally output for the ith sentence, defined by the following formula (7);
p_i = Softmax(z)……(7)
wherein Softmax represents the Softmax activation function and z is the vector of dot products between the output vector of the current text after the Bi-LSTM and each label vector, defined by the following formula (8);
z_t = e_i · h_t, t ∈ [0, M)……(8)
wherein z_t is the t-th component of the vector z, e_i is the output vector of the Bi-LSTM model for the ith text, h_t is the representation vector of the t-th category label, and M is the total number of category labels.
In another aspect, the present invention provides an apparatus for classifying a text in a problem with a multivalent value chain, the apparatus comprising:
the data collection module is used for collecting question sentences searched by users in various business search platforms in an enterprise and labeling text type labels to obtain a data set of multi-value chain problem texts;
the data preprocessing module is used for preprocessing the problem text data set and dividing the preprocessed problem text data set into a training text set and a test text set;
the extraction and construction module is used for extracting word vectors from the training text set and the test text set, and constructing a full-connection network graph by using the extracted word vectors and the labeled text category labels based on the training text set;
the training module is used for training the initial label graph attention model based on the full-connection network graph to obtain a trained label graph attention model; training an initial text semantic extraction model based on the word vectors extracted from the training text set to obtain a trained text semantic extraction model;
and the classification module is used for performing semantic extraction on the test text set according to the trained text semantic extraction model and classifying the test text set according to the trained label graph attention model.
Optionally, the data preprocessing module is specifically configured to:
performing Chinese word segmentation processing and stop word removal processing on the problem text data set;
the Chinese word segmentation processing and stop word removal processing comprises the following steps:
segmenting words of the text based on a pre-constructed multivalent value chain text expert dictionary, and determining a word segmentation result by adopting a principle of maximizing the number of large-granularity words; wherein the text is a question text data set;
and filtering stop words based on a pre-constructed multi-value chain stop word dictionary to remove noise words.
Optionally, the extraction building block is specifically configured to:
obtaining word vectors of a preprocessed training text set and a preprocessed test text set based on a pretrained BERT word vector extraction model;
based on a training text set, using the extracted word vectors and the labeled text category labels respectively as nodes, and constructing a network graph in which all nodes are fully connected, wherein the nodes represented by text category labels are initialized to random vectors whose dimension is consistent with that of the word vectors, the edge weights between word vector nodes are initialized randomly, and the edge weight between a category node and a word node is initialized to the probability that the word in the training text set appears in that category.
Optionally, the initial label graph Attention model is a GNN-Attention model, the GNN-Attention model including a GNN submodel and an Attention submodel;
the GNN submodel is used for obtaining the output characteristics of each category label node;
the Attention submodel is used to control Attention weight assignment among nodes in the GNN submodel.
Optionally, the training module is specifically configured to:
iteratively updating the feature vectors of the nodes in the GNN-Attention model according to the following formula (2);
h_v^(k) = σ_0( W_k · ( Σ_{u∈N(v)} α_vu · h_u^(k-1) ) / |N(v)| + B_k · h_v^(k-1) )……(2)
wherein σ_0 denotes the sigmoid activation function, W_k represents the parameter matrix at the kth iteration of the GNN, k ∈ [1, K], where K ∈ R+, K > 1 is a preset constant, N(v) represents all neighbor nodes of the node v, |N(v)| represents the size of N(v), h_u^(k-1) represents the feature vector of node u at the (k-1)th iteration, h_v^(k-1) is the feature vector of node v at the (k-1)th iteration, h_v^(k) is the output vector of node v at the kth iteration, B_k is the bias parameter at the kth iteration, k = 1 denotes the first iteration, and all parameters are initialized randomly; α_vu represents the attention weight between node u and node v, defined by the following formula (3);
α_vu = exp(e_vu) / Σ_{w∈N(v)} exp(e_vw)……(3)
wherein e_vu is defined by the following formula (4);
e_vu = a(W_k · h_v^(k-1), W_k · h_u^(k-1))……(4)
wherein a represents a function for calculating the attention weight;
the defined loss function is a class-balanced label similarity loss function, as shown in the following formula (5):
Loss = -(1/m) Σ_{b=1}^{m} (1/n_b) Σ_{i=1}^{n_b} (1/s_{g_b(i)}) log[ exp( (W_c · h^(K)_{z_b(i)}) · (W_l · h^(K)_{z_b(g_b(i))}) ) / Σ_{j=1}^{L} exp( (W_c · h^(K)_{z_b(i)}) · (W_l · h^(K)_{z(j)}) ) ]……(5)
wherein m denotes the number of batches into which the entire training text set is divided, b denotes the index of a batch, n_b denotes the number of training texts in the batch with index b, i denotes the index of a training text within a batch, L denotes the total number of category labels, s_j represents the frequency with which the jth label appears in the entire training text set, g_b(i) denotes the category label corresponding to the training text with index i in the batch with index b, z_b(i) represents the node index in the fully-connected graph corresponding to the training text with index i in the batch with index b, z_b(g_b(i)) represents the node index in the fully-connected graph corresponding to the category label g_b(i) (and z(j) the node index of the jth category label), W_c represents the parameter matrix for the linear transformation of the training text nodes, W_l represents the parameter matrix for the linear transformation of the label nodes, and K is the total number of iterations;
and updating parameters in the GNN-Attention model by adopting a batch gradient descent algorithm according to the class equilibrium label similarity loss function to obtain a trained label graph Attention model.
Optionally, the initial text semantic extraction model is a Bi-LSTM model, and the training module is specifically configured to:
training the Bi-LSTM model according to the following formula (6);
Loss = -(1/N) Σ_{i=1}^{N} y_i · log(p_i)……(6)
wherein N represents the number of sentences in the training text set, y_i is the category label vector of the ith sentence, represented by one-hot coding, and p_i is the probability distribution finally output for the ith sentence, defined by the following formula (7);
p_i = Softmax(z)……(7)
wherein Softmax represents the Softmax activation function and z is the vector of dot products between the output vector of the current text after the Bi-LSTM and each label vector, defined by the following formula (8);
z_t = e_i · h_t, t ∈ [0, M)……(8)
wherein z_t is the t-th component of the vector z, e_i is the output vector of the Bi-LSTM model for the ith text, h_t is the representation vector of the t-th category label, and M is the total number of category labels.
In another aspect, an electronic device is provided, which includes a processor and a memory, where at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the text classification method for multi-value chain problems.
In another aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the above text classification method for a multiple value chain problem.
The technical scheme provided by the invention has at least the following beneficial effects:
according to the method, firstly, a multi-value chain problem text is collected, wherein a special dictionary in the multi-value chain field is constructed, and through combing special nouns in the multi-value chain field, great help can be provided for the word segmentation accuracy of the text in the multi-value chain field;
and then question texts are classified based on a graph attention network: by training a label graph attention model and a text semantic extraction model, the search questions of enterprise staff in the multi-value chain field are classified, which greatly helps the construction of a business question-answering system inside an enterprise; according to the question classification result, the question-answering system can greatly narrow the search range and improve search efficiency, so that it finally achieves faster responses and more accurate semantic search.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a text classification method for a multivalent value chain problem according to an embodiment of the present invention;
FIG. 2 is a detailed architecture diagram of a multi-value chain problem text classification model based on a graph attention network according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for constructing a text-aided classification decision model of a multi-value chain problem based on a graph attention network according to an embodiment of the present invention;
FIG. 4 is a block diagram of a text classification apparatus for a multi-valence value chain problem according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides a method for classifying multi-value chain problem texts, which can be implemented by a device for classifying multi-value chain problem texts. As shown in fig. 1, which is a flowchart of the multi-value chain problem text classification method, the processing flow of the method may include the following steps:
s1, collecting question sentences searched by users in each business search platform in an enterprise and labeling text type labels to obtain a data set of a multi-value chain problem text;
s2, preprocessing the problem text data set, and dividing the preprocessed problem text data set into a training text set and a test text set;
s3, extracting word vectors from the training text set and the test text set, and constructing a fully-connected network graph by using the extracted word vectors and labeled text category labels based on the training text set;
s4, training the initial label graph attention model based on the full-connection network graph to obtain a trained label graph attention model;
s5, training the initial text semantic extraction model based on the word vectors extracted from the training text set to obtain a trained text semantic extraction model;
s6, semantic extraction is carried out on the test text set according to the trained text semantic extraction model, and the test text set is classified according to the trained label graph attention model.
Optionally, the preprocessing the question text data set in S2 includes:
performing Chinese word segmentation processing and stop word removal processing on the problem text data set;
the Chinese word segmentation processing and stop word filtering processing comprise the following steps:
segmenting words of the text based on a pre-constructed multivalent value chain text expert dictionary, and determining a segmentation result by adopting a principle of maximizing the number of large-granularity words; wherein the text is a question text data set;
and filtering stop words based on a pre-constructed multi-value chain stop word dictionary to remove noise words.
Optionally, the performing word vector extraction on the preprocessed training text set and test text set in S3, and constructing a fully-connected network graph by using the extracted word vector and the labeled text category label based on the training text set, includes:
obtaining word vectors of a preprocessed training text set and a preprocessed test text set based on a pretrained BERT word vector extraction model;
based on a training text set, using the extracted word vectors and the labeled text category labels respectively as nodes, and constructing a network graph in which all nodes are fully connected, wherein the nodes represented by text category labels are initialized to random vectors whose dimension is consistent with that of the word vectors, the edge weights between word vector nodes are initialized randomly, and the edge weight between a category node and a word node is initialized to the probability that the word in the training text set appears in that category.
Optionally, in S4, the initial label graph Attention model trained based on the fully-connected network graph is a GNN-Attention model, where the GNN-Attention model includes a GNN submodel and an Attention submodel;
the GNN submodel is used for obtaining the output characteristics of each category label node;
the Attention submodel is used to control Attention weight assignment among nodes in the GNN submodel.
Optionally, the training process of the GNN-Attention model includes:
iteratively updating the feature vectors of the nodes in the GNN-Attention model according to the following formula (2);
h_v^(k) = σ_0( W_k · ( Σ_{u∈N(v)} α_vu · h_u^(k-1) ) / |N(v)| + B_k · h_v^(k-1) )……(2)
wherein σ_0 denotes the sigmoid activation function, W_k represents the parameter matrix at the kth iteration of the GNN, k ∈ [1, K], where K ∈ R+, K > 1 is a preset constant, N(v) represents all neighbor nodes of the node v, |N(v)| represents the size of N(v), h_u^(k-1) represents the feature vector of node u at the (k-1)th iteration, h_v^(k-1) is the feature vector of node v at the (k-1)th iteration, h_v^(k) is the output vector of node v at the kth iteration, B_k is the bias parameter at the kth iteration, k = 1 denotes the first iteration, and all parameters are initialized randomly; α_vu represents the attention weight between node u and node v, defined by the following formula (3);
α_vu = exp(e_vu) / Σ_{w∈N(v)} exp(e_vw)……(3)
wherein e_vu is defined by the following formula (4);
e_vu = a(W_k · h_v^(k-1), W_k · h_u^(k-1))……(4)
wherein a represents a function for calculating the attention weight;
the defined loss function is a class-balanced label similarity loss function, as shown in the following formula (5):
Loss = -(1/m) Σ_{b=1}^{m} (1/n_b) Σ_{i=1}^{n_b} (1/s_{g_b(i)}) log[ exp( (W_c · h^(K)_{z_b(i)}) · (W_l · h^(K)_{z_b(g_b(i))}) ) / Σ_{j=1}^{L} exp( (W_c · h^(K)_{z_b(i)}) · (W_l · h^(K)_{z(j)}) ) ]……(5)
wherein m denotes the number of batches into which the entire training text set is divided, b denotes the index of a batch, n_b denotes the number of training texts in the batch with index b, i denotes the index of a training text within a batch, L denotes the total number of category labels, s_j represents the frequency with which the jth label appears in the entire training text set, g_b(i) denotes the category label corresponding to the training text with index i in the batch with index b, z_b(i) represents the node index in the fully-connected graph corresponding to the training text with index i in the batch with index b, z_b(g_b(i)) represents the node index in the fully-connected graph corresponding to the category label g_b(i) (and z(j) the node index of the jth category label), W_c represents the parameter matrix for the linear transformation of the training text nodes, W_l represents the parameter matrix for the linear transformation of the label nodes, and K is the total number of iterations;
and updating parameters in the GNN-Attention model by adopting a batch gradient descent algorithm according to the class equilibrium label similarity loss function to obtain a trained label graph Attention model.
Optionally, in S5, the initial text semantic extraction model trained based on the word vectors extracted from the training text set is a Bi-LSTM model;
the Bi-LSTM model is trained according to the following formula (6);
Loss = -(1/N) Σ_{i=1}^{N} y_i · log(p_i)……(6)
wherein N represents the number of sentences in the training text set, y_i is the category label vector of the ith sentence, represented by one-hot coding, and p_i is the probability distribution finally output for the ith sentence, defined by the following formula (7);
p_i = Softmax(z)……(7)
wherein Softmax represents the Softmax activation function and z is the vector of dot products between the output vector of the current text after the Bi-LSTM and each label vector, defined by the following formula (8);
z_t = e_i · h_t, t ∈ [0, M)……(8)
wherein z_t is the t-th component of the vector z, e_i is the output vector of the Bi-LSTM model for the ith text, h_t is the representation vector of the t-th category label, and M is the total number of category labels.
In the embodiment of the invention, multi-value chain problem texts are first collected, and a specialized dictionary for the multi-value chain field is constructed; by sorting out the proper nouns of the multi-value chain field, this dictionary greatly improves the word segmentation accuracy for texts in the multi-value chain field;
then, question text classification based on a graph attention network is carried out: by training a label graph attention model and a text semantic extraction model, the search questions of enterprise staff in the multi-value chain field are classified, which greatly helps the construction of a business question-answering system inside an enterprise; according to the question classification result, the question-answering system can greatly narrow the search range and improve search efficiency, so that it finally achieves faster responses and more accurate semantic search.
The following detailed description of embodiments of the invention:
first, the relevant knowledge in the field of text classification is introduced:
text classification is an important part of text mining, whose main task is to partition a given set of text into a set of one or more categories that are known. At present, the text classification task is applied to a plurality of fields, such as emotion analysis, problem classification, spam detection and the like. The construction of a text classification system requires a training set D = (D) as well as supervised learning 1 ,d 2 ,...,d N ) Where N is the number of sequences, each text is labeled with its category (c) 1 ,c 2 ,...,c N ) The classification model is then learned using the training data to predict the class of the new text x.
The core problems of text classification are text representation and the classification model. Traditional text representation uses the Bag-of-Words principle to represent words, out of order, in a high-dimensional Vector Space Model, discarding text structure information such as word order and grammar. Traditional machine learning text classification models obtain a classifier using Bayesian theory, based on the assumption that the classes obey a certain probability distribution, but classification accuracy suffers once this assumption does not hold. As deep learning has progressed, more and more neural network models are used for text classification. fastText, for example, sums and averages the word and n-gram vectors of the whole document to obtain a document vector, and then performs softmax multi-classification on it. However, the network structure of fastText completely ignores word order information, while the n-gram features it uses show exactly how important local order information is. TextCNN was then proposed, applying the convolutional neural network (CNN) to the text classification task and using kernels of several different sizes to extract key information in a sentence (similar to n-grams with multiple window sizes), so that local relevance can be captured better.
Although TextCNN performs well in many tasks, the biggest problem of CNN is the fixed receptive field of filter_size: on the one hand, longer sequence information cannot be modeled, and on the other hand, tuning the filter_size hyperparameter is cumbersome. The essence of CNN is to do the feature expression of text, whereas recurrent neural networks, which can better express context information, are more commonly used in natural language processing.
Although CNN and RNN are effective in text classification tasks, they are not well interpretable. The Attention mechanism is a common way of modeling long-term memory in natural language processing, and it can intuitively show the contribution of each word to the result. For example, the Hierarchical Attention Network splits a document into sentences and words in the document classification task and uses a very effective Attention mechanism in each layer. These deep learning models can capture semantic and syntactic information in locally continuous word sequences well, but in corpora with non-continuous and long-distance semantics, the co-occurrence of global words may be ignored; a model based on the graph neural network (GNN) can solve this problem well through iterative updating. Therefore, the embodiment of the invention combines GNN and Attention to address feature expression and interpretability, and uses Bi-LSTM, a variant of RNN, to extract text semantics.
The multi-value chain problem text classification scheme provided by the embodiment of the invention applies supervised learning and graph neural network learning theory: the words in the training texts and the nodes representing all class labels form a fully-connected graph on which the feature learning of the class labels is performed; text features are extracted by a trained text semantic extraction model; the label distribution over text classes is obtained from the similarity between the text features and the class label features, and the class with the maximum probability is taken as the prediction, as shown in FIG. 3.
Based on the above idea, as shown in fig. 2, the processing flow of the graph-attention-network-based multi-value chain problem text classification method may include the following steps:
s1, collecting question sentences searched by users in each business search platform in an enterprise and labeling text type labels to obtain a data set of a multi-value chain problem text;
s2, preprocessing the problem text data set, and dividing the preprocessed problem text data set into a training text set and a test text set;
optionally, the preprocessing of the question text data set in this step includes Chinese word segmentation processing and stop word removal processing.
The Chinese word segmentation processing and stop word filtering processing comprise the following steps:
segmenting words of the text based on a pre-constructed multivalent value chain text expert dictionary, and determining a word segmentation result by adopting a principle of maximizing the number of large-granularity words; wherein the text is a question text data set;
In a feasible implementation, because word segmentation in the multi-value chain field must be more precise than open-domain word segmentation, the embodiment of the invention segments multi-value chain problem texts with a word segmentation method based on an expert dictionary: words and phrases are extracted for the dictionary by analyzing word-level or short-text field values in the documents, and the term lexicon accumulated by the enterprises in the multi-value chain is combined with the general open-domain lexicon, thereby constructing a professional word segmentation lexicon for the field.
In a feasible implementation, the user dictionary of the open-source Jieba Chinese word segmentation library is set to the content of the expert dictionary, and the built-in word segmentation algorithm is then used. The Jieba library provides a precise segmentation mode and a full mode, the latter listing all possible segmentations; the embodiment of the invention adopts the precise mode, which yields a more accurate segmentation result for multi-value chain problem texts.
And filtering stop words based on a pre-constructed multi-value chain stop word dictionary to remove noise words.
In a possible implementation, after word segmentation the text has been split into words, but these include a large amount of low-quality data, called stop words, which have little influence on the text semantics; filtering the stop words therefore reduces the data volume and in turn the computation required by subsequent text processing. A general stop word list includes numerals, quantifiers, conjunctions (such as 'of/in/and'), adverbs and so on; these words appear in large numbers in the segmentation result but have little influence on context semantics, so they are removed.
The open-domain Chinese stop word lists used by the embodiment of the invention include the Harbin Institute of Technology stop word lexicon, the Baidu stop word lexicon and the Sichuan University Machine Intelligence Laboratory stop word list; on this basis, a stop word list is constructed with expert knowledge of the multi-value chain field, so that it covers essentially all common open-domain Chinese stop words as well as the stop words of the multi-value chain field.
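For illustration, a minimal preprocessing sketch along the above lines is given below, assuming the multi-value chain expert dictionary and the merged stop word list have been exported to plain-text files (the file names are hypothetical).

```python
import jieba

# Hypothetical file names for the multi-value chain expert dictionary and stop word list.
jieba.load_userdict("mvc_expert_dict.txt")            # one domain term per line
with open("mvc_stopwords.txt", encoding="utf-8") as f:
    stopwords = {line.strip() for line in f if line.strip()}

def preprocess(question: str) -> list[str]:
    """Segment a question with Jieba's precise mode and drop stop words."""
    words = jieba.lcut(question, cut_all=False)        # precise segmentation mode
    return [w for w in words if w.strip() and w not in stopwords]

print(preprocess("市场开发部的业务负责人是谁"))
# expected to keep the domain words, e.g. ['市场开发部', '业务', '负责人', '谁']
```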
S3, extracting word vectors of the training text set and the test text set, and constructing a fully-connected network graph by using the extracted word vectors and the labeled text category labels based on the training text set;
In one possible implementation, after word segmentation and stop word removal the multi-value chain problem text becomes a series of discrete Chinese words that a computer cannot process directly. A mathematical model is therefore constructed to represent each word in a computer-computable numerical form, that is, each natural language word is converted into a numeric vector of a certain dimension by feeding it into the model; this is the word vector extraction process.
Optionally, the word vector extraction performed on the preprocessed training text set and the preprocessed test text set in S3 may include the following steps:
and obtaining word vectors of the preprocessed training text set and the preprocessed test text set based on a pretrained BERT word vector extraction model, and finishing word vector extraction.
Optionally, the training process of the BERT word vector extraction model includes the following steps S31 to S33:
s31, constructing an initial BERT word vector extraction model and initializing its parameters, wherein the input of the initial BERT word vector extraction model is a word sequence after mask processing, expressed as (w_1, w_2, ..., w_n), wherein n is the sequence length, m_1, m_2, ..., m_k are k randomly selected indices with k < n, and the words at these index positions are uniformly replaced by the mask word [MASK]; the output vector at the ith position of the initial BERT word vector extraction model is the probability distribution p_i over the predicted word at the corresponding position; in the input word sequence, the words other than the masked ones are initialized to random vectors, and the masked words are uniformly initialized to zero vectors.
S32, defining a loss function as a multi-classification cross entropy loss function as shown in the following formula (1);
Figure BDA0003715210040000143
and S33, updating parameters in the initial BERT word vector extraction model by adopting a batch random gradient descent method based on the multi-classification cross entropy loss function to obtain the pretrained BERT word vector extraction model.
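As a sketch of the word vector extraction in step S3, the snippet below uses the Hugging Face transformers library with the public bert-base-chinese checkpoint as a stand-in for the pretrained BERT word vector extraction model described above; the 30-dimensional vectors in the worked example further below would additionally require a projection layer, which is omitted here.

```python
import torch
from transformers import BertModel, BertTokenizerFast

# Assumed public checkpoint; the embodiment trains its own masked-language-model BERT.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese").eval()

def word_vectors(words: list[str]) -> torch.Tensor:
    """Return one vector per segmented word by mean-pooling its sub-token embeddings."""
    vectors = []
    with torch.no_grad():
        for w in words:
            enc = tokenizer(w, return_tensors="pt", add_special_tokens=False)
            out = bert(**enc).last_hidden_state        # (1, n_subtokens, 768)
            vectors.append(out.mean(dim=1).squeeze(0))
    return torch.stack(vectors)                         # (n_words, 768)

vecs = word_vectors(["市场开发部", "业务", "负责人", "谁"])
print(vecs.shape)                                        # torch.Size([4, 768])
```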
S4, training the initial label graph attention model based on the fully-connected network graph to obtain a trained label graph attention model;
In a feasible implementation, after the multi-value chain problem text has been preprocessed by word segmentation and stop word removal and digitized by word vectorization, a vector data representation is obtained. The word vectors and the category label vectors form a fully-connected network graph, which models the dependency between the words in the texts and the category labels, and the feature vector representation of each category label is trained through graph convolution and the attention mechanism; this is the key technical point addressed by the multi-value chain problem text classification model in the embodiment of the invention.
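A minimal sketch of this graph construction under the initialization rules of step S3 is given below; the data layout and the reading of the category-word edge weight as the share of a word's training-set occurrences that fall in a given category are assumptions for illustration.

```python
import torch
from collections import Counter, defaultdict

def build_graph(texts, labels, word_vecs, num_labels, dim):
    """texts: list of word lists; labels: category index per text;
    word_vecs: dict mapping word -> tensor of shape (dim,)."""
    vocab = sorted({w for t in texts for w in t})
    word_idx = {w: i for i, w in enumerate(vocab)}
    n_word, n = len(vocab), len(vocab) + num_labels

    # Node features: word nodes take their word vectors, label nodes are random
    # vectors of the same dimension.
    feats = torch.empty(n, dim)
    for w, i in word_idx.items():
        feats[i] = word_vecs[w]
    feats[n_word:] = torch.randn(num_labels, dim)

    # Edge weights: random between word nodes, empirical category-word probability
    # for edges between a label node and a word node.
    adj = torch.rand(n, n)
    word_count, cat_word = Counter(), defaultdict(Counter)
    for t, c in zip(texts, labels):
        for w in t:
            word_count[w] += 1
            cat_word[c][w] += 1
    for c in range(num_labels):
        for w, i in word_idx.items():
            p = cat_word[c][w] / word_count[w]
            adj[n_word + c, i] = adj[i, n_word + c] = p
    return feats, (adj + adj.t()) / 2      # symmetric, fully-connected weight matrix
```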
Optionally, the initial label graph Attention model to be trained is a GNN-Attention model, and the GNN-Attention model includes a GNN submodel and an Attention submodel;
and the GNN submodel is used for obtaining the output characteristics of each category label node.
The Attention submodel is used to control Attention weight assignment among nodes in the GNN submodel.
Optionally, the training process of the GNN-Attention model includes:
iteratively updating the feature vectors of the nodes in the GNN-Attention model according to the following formula (2);
h_v^(k) = σ_0( W_k · ( Σ_{u∈N(v)} α_vu · h_u^(k-1) ) / |N(v)| + B_k · h_v^(k-1) )……(2)
wherein σ_0 denotes the sigmoid activation function, W_k represents the parameter matrix at the kth iteration of the GNN, k ∈ [1, K], where K ∈ R+, K > 1 is a preset constant, N(v) represents all neighbor nodes of the node v, |N(v)| represents the size of N(v), h_u^(k-1) represents the feature vector of node u at the (k-1)th iteration, h_v^(k-1) is the feature vector of node v at the (k-1)th iteration, h_v^(k) is the output vector of node v at the kth iteration, B_k is the bias parameter at the kth iteration, k = 1 denotes the first iteration, and all parameters are initialized randomly; α_vu represents the attention weight between node u and node v, defined by the following formula (3);
α_vu = exp(e_vu) / Σ_{w∈N(v)} exp(e_vw)……(3)
wherein e_vu is defined by the following formula (4);
e_vu = a(W_k · h_v^(k-1), W_k · h_u^(k-1))……(4)
where a represents a function for calculating the attention weight.
Optionally, in the embodiment of the present invention a is an inner product function, defined as a(x, y) = x · y. In the model, W_k is a coefficient matrix whose dimensions are determined by the word vector dimension n, and B_k is the bias coefficient matrix. The output h_v^(k) at the current iteration is obtained by multiplying the features h_v^(k-1) of node v from the previous layer by the bias coefficient matrix B_k, adding the attention-weighted sum of the features of all neighbor nodes of node v multiplied by the coefficient matrix W_k, and finally applying the activation function σ_0. The coefficient matrix W_k maps the input vector to the hidden layer; its ith column represents the transformed vector of the ith term of the weighted-sum result over the neighboring nodes. The activation function σ_0 maps the hidden layer to the output layer to obtain each component of the output vector.
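A sketch of one iteration of the node update in formulas (2) to (4), with the attention score taken as the inner product a(x, y) = x · y described above, is given below; it is an illustrative reading of the update rule, not a reference implementation.

```python
import torch

def gnn_attention_step(feats: torch.Tensor, W_k: torch.Tensor, B_k: torch.Tensor) -> torch.Tensor:
    """One iteration of formula (2) on a fully-connected graph.

    feats: (n, d) node features h^(k-1); W_k, B_k: (d, d) parameter matrices.
    Returns the updated node features h^(k).
    """
    n = feats.size(0)
    proj = feats @ W_k.t()                          # W_k h^(k-1) for every node
    e = proj @ proj.t()                             # e_vu = a(W_k h_v, W_k h_u), formula (4)
    e.fill_diagonal_(float("-inf"))                 # a node is not its own neighbour
    alpha = torch.softmax(e, dim=1)                 # attention weights, formula (3)
    neigh = (alpha @ feats) / (n - 1)               # attention-weighted neighbour sum / |N(v)|
    return torch.sigmoid(neigh @ W_k.t() + feats @ B_k.t())   # formula (2)
```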
The defined loss function is a class-balanced label similarity loss function, as shown in the following formula (5):
Loss = -(1/m) Σ_{b=1}^{m} (1/n_b) Σ_{i=1}^{n_b} (1/s_{g_b(i)}) log[ exp( (W_c · h^(K)_{z_b(i)}) · (W_l · h^(K)_{z_b(g_b(i))}) ) / Σ_{j=1}^{L} exp( (W_c · h^(K)_{z_b(i)}) · (W_l · h^(K)_{z(j)}) ) ]……(5)
wherein m denotes the number of batches into which the entire training text set is divided, b denotes the index of a batch, n_b denotes the number of training texts in the batch with index b, i denotes the index of a training text within a batch, L denotes the total number of category labels, s_j represents the frequency with which the jth label appears in the entire training text set, g_b(i) denotes the category label corresponding to the training text with index i in the batch with index b, z_b(i) represents the node index in the fully-connected graph corresponding to the training text with index i in the batch with index b, z_b(g_b(i)) represents the node index in the fully-connected graph corresponding to the category label g_b(i) (and z(j) the node index of the jth category label), W_c represents the parameter matrix for the linear transformation of the training text nodes, W_l represents the parameter matrix for the linear transformation of the label nodes, and K is the total number of iterations as described above.
And updating parameters in the GNN-Attention model by adopting a batch gradient descent algorithm according to the class equilibrium label similarity loss function to obtain a trained label graph Attention model.
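The sketch below shows one assumed form of the class-balanced label similarity loss: since formula (5) appears only as an image in the source, the loss is written here as a label-frequency-weighted negative log-softmax over the similarities between the transformed text-node and label-node features, which is consistent with the variable definitions but remains an assumption.

```python
import torch
import torch.nn.functional as F

def class_balanced_label_similarity_loss(text_feats, label_feats, targets, label_freq, W_c, W_l):
    """text_feats: (n_b, d) final text-node features of one batch;
    label_feats: (L, d) final label-node features; targets: (n_b,) gold label indices;
    label_freq: (L,) frequency s_j of each label in the training text set."""
    t = text_feats @ W_c.t()                        # W_c-transformed text nodes
    l = label_feats @ W_l.t()                       # W_l-transformed label nodes
    logits = t @ l.t()                              # similarity of each text to every label
    per_text = F.cross_entropy(logits, targets, reduction="none")
    weights = 1.0 / label_freq[targets]             # down-weight frequent labels (class balancing)
    return (weights * per_text).mean()
```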
S5, training the initial text semantic extraction model based on the word vectors extracted from the training text set to obtain a trained text semantic extraction model;
In a feasible implementation, training the text semantic extraction model proceeds as follows: the question text categories of the multi-value chain question texts are labeled; text word segmentation, stop word removal and word vector extraction are performed to obtain the word vector representation of the text; according to the word vectors extracted from a training text, a forward LSTM network extracts text features from left to right and a backward LSTM network extracts text features from right to left, and the two features are then concatenated to form the final text semantic feature; a prediction category is obtained based on the similarity between the text semantic feature and the vector features of the category labels, and the model is iteratively updated through a cross-entropy loss function and a gradient descent algorithm.
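A sketch of such a Bi-LSTM text semantic extraction model is given below; the layer sizes are illustrative assumptions, chosen so that the concatenated forward and backward states give a 50-dimensional semantic feature as in the worked example further below.

```python
import torch
import torch.nn as nn

class TextSemanticExtractor(nn.Module):
    """Bi-LSTM encoder: the forward pass reads left to right, the backward pass right
    to left, and the two final hidden states are concatenated into one semantic feature."""

    def __init__(self, word_dim: int = 768, hidden_dim: int = 25):
        super().__init__()
        self.bilstm = nn.LSTM(word_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, word_vecs: torch.Tensor) -> torch.Tensor:
        # word_vecs: (batch, seq_len, word_dim) word vectors of a segmented question
        _, (h_n, _) = self.bilstm(word_vecs)
        return torch.cat([h_n[0], h_n[1]], dim=-1)   # (batch, 2 * hidden_dim)

encoder = TextSemanticExtractor()
e_i = encoder(torch.randn(1, 4, 768))                # e.g. the four words of one question
print(e_i.shape)                                     # torch.Size([1, 50])
```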
Optionally, the training process of the Bi-LSTM model includes:
training the Bi-LSTM model according to the following formula (6);
Loss = -(1/N) Σ_{i=1}^{N} y_i · log(p_i)……(6)
wherein N represents the number of sentences in the training text set, y_i is the category label vector of the ith sentence, represented by one-hot coding, and p_i is the probability distribution finally output for the ith sentence, defined by the following formula (7);
p_i = Softmax(z)……(7)
wherein Softmax represents the Softmax activation function and z is the vector of dot products between the output vector of the current text after the Bi-LSTM and each label vector, defined by the following formula (8);
z_t = e_i · h_t, t ∈ [0, M)……(8)
wherein z_t is the t-th component of the vector z, e_i is the output vector of the Bi-LSTM model for the ith text, h_t is the representation vector of the t-th category label, and M is the total number of category labels; e_i · h_t is the dot product of the ith text and a category label, which measures the similarity between the text and the category label and is normalized by the Softmax function to obtain the probability that the text belongs to that category label. The Softmax function is defined by the following formula (9);
Softmax(z)_i = exp(z_i) / Σ_{c=1}^{C} exp(z_c)……(9)
wherein z is the input vector, C is the length of the input vector, z_c is the c-th component of the vector z, z_i is the i-th component of the vector z, and the left side of the formula is the i-th component of the output of the Softmax function applied to z. Softmax is the multi-class generalization of the Sigmoid function; it normalizes according to the distribution of the vector, its outputs all lie between 0 and 1 and sum to 1, and they can be regarded as the probabilities of the predicted class distribution.
The class label prediction takes the largest of all p_i, i.e. the prediction result is given by the following formula (10);
o = argmax_{i∈[0,M)}(p_i)……(10)
wherein i is the index of a category label and o is the numerical index of the category label finally predicted.
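The prediction step of formulas (7) to (10) can be sketched as follows: the semantic feature e_i is dotted with each label representation h_t, the scores are normalized with Softmax, and the label with the largest probability is returned.

```python
import torch

def predict(e_i: torch.Tensor, label_vectors: torch.Tensor) -> int:
    """e_i: (d,) text semantic feature; label_vectors: (M, d) trained label features h_t."""
    z = label_vectors @ e_i            # z_t = e_i · h_t for every label, formula (8)
    p = torch.softmax(z, dim=0)        # p = Softmax(z), formulas (7) and (9)
    return int(torch.argmax(p))        # o = argmax_t p_t, formula (10)

# Usage with the worked example's dimensions: a 50-d semantic vector and five label vectors.
o = predict(torch.randn(50), torch.randn(5, 50))
print("predicted label index:", o)
```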
S6, semantic extraction is carried out on the test text set according to the trained text semantic extraction model, and the test text set is classified according to the trained label graph attention model.
In a feasible implementation, based on the word vectors extracted from the test text set, text semantics are extracted by the Bi-LSTM to obtain text vectors; the dot-product similarity between each text vector and the trained label vectors is then computed to obtain the predicted text category probability distribution.
In one possible implementation, the multi-value chain problem text classification method based on training the graph attention mechanism model can be described as follows: first, the problem texts of the multi-value chain are labeled with their categories; text word segmentation, stop word removal and word vector extraction are performed to obtain the word vector representation of each text; a fully-connected network graph is constructed from the word vectors extracted from the training texts and the category label nodes; the label attention network is trained on this fully-connected network graph; for an input test problem text, its text semantic features are first extracted, the similarity between the text semantic features and each category label feature is then calculated to obtain the probability distribution over predicted categories, and the category with the highest probability is taken as the final output.
The implementation is illustrated below with the test text 'Who is the service responsible person of the market development department'. After word segmentation and stop word removal, the text takes the segmented form 'market development department / service / responsible person / who', i.e. four words, which are each passed through the BERT pre-trained word vector extraction model with a word vector dimension of 30; the four word vectors obtained are:
department of market development: [ -3.06069449,8.08047426, -1.04657962,2.54707903, -5.45107929,7.71724758, -7.21749163,7.71894487,7.39433977,2.44432663, -2.92139483, -3.49030684, -15.05859546, -8.33196712, -4.01220638, -5.44485077, -3.03348011,3.26214281, -2.34982418, -0.71703633, -0.29940802,7.9193014, -3.54363642, -1.94711914,4.08936812, -2.04229542, -4.19505067, -7.50068999, -5.51120226,5.35286249]
Service: [ -11.49322276, -9.0087295,2.98726354,6.07464975, -0.21259287, -6.2843385,1.84081827,10.30846377, -2.41484459,6.04847987,0.5687931,2.82225513,7.81969373, -1.63986626,0.47389163,4.96921189,10.62406197, -5.75455476, -3.0132695, -7.46912705, -1.2137597,3.00314305, -10.63480443, -0.82058986, -1.31221196,7.27846531, -9.13755732, -7.62935541, -1.39987815, -0.57207014]
The responsible person: [0.838648, -2.32750228, -2.11157415,12.02372248,1.65800467,0.73202825,2.32446473,7.97169743,8.66769421,3.99108438, -1.1913395,9.52866323, -4.00235065, -6.49928031,2.77208777,7.91763519, -8.94140453,0.69549411, -2.56620284, -0.83404516,0.51637593, -2.18320035,2.72544347,4.41380924,9.34230022,7.80633086,7.66541672, -6.44587512, -11.85742246,0.92828721]
Who: [3.54257652, -2.46677404,6.74304161,3.0546189,8.07932277, -1.55837684, -1.70794653,8.42905947, -0.91862296,5.50648557,6.29667199,3.22428499, -6.24749457, -4.03600091,0.37989248,3.819908,3.12067776, -2.09767777, -4.56015178, -4.10751633, -2.41868704,2.29356171,0.24205031,8.31607237, -6.97760015, -3.66999542, -3.23085068,1.68964771,8.75373907,0.06482891]]
These word vectors are then input into the text semantic extraction model, whose semantic feature vector dimension is 50, and the extracted semantic feature vector is:
[0.52673236,0.3993348,-0.9075469,0.20685045,-0.47519173,-0.25433797,-0.54043339,0.63237959,1.02988192,1.54368237,-0.16636896,-0.22707328,0.71075747,0.27404235,0.40035875,-0.77351184,-1.20920109,0.4209648,-1.47120421,1.83841554,-1.549165,-1.66126562,0.50334785,0.29269983,0.30320881,-0.65911658,1.40018188,0.08858524,0.64016785,-0.12970235,-0.45579809,0.59939497,2.19082701,-1.33034428,0.53647678,2.88246469,0.67153008,-0.03897543,0.86870672,0.94777567,-1.04138594,0.47477545,0.21174536,-1.73312968,1.06127785,-0.62674278,-0.95263472,-0.3574392,-0.45807574,-1.08791478]
According to the label graph attention model trained on the training text set, the category feature vectors of all category labels are obtained; the 5 category labels are denoted 0, 1, 2, 3 and 4. The category feature vector of label l is the feature vector h_l^(K) of the corresponding label node at the last iteration of the training process, where l is the subscript of the category label, K is the maximum number of iterations, and m is the number of batches into which the training text set is divided. The obtained feature vectors of the five labels are respectively as follows:
label 0: [6.22224509,1.25236757,5.76307916, -0.17946888, -3.9552577,9.30892701,13.22358767, -16.73601084,15.23879299,1.0829373, -5.59663971,8.2802847, -8.64210783, -3.84631036,16.92925892, -1.1073331, -5.94752419,9.51731006, -6.3277868,12.03070463, -6.33621219, -18.38287457,6.2027111,3.82932457,16.47385438, -2.71609963,0.54453846,5.77940675, -8.06122818,1.00034761, -4.49338511, -0.79850251,5.53174973,0.58391409, -11.9442521,14.41069912, -6.58643516,8.8108187,5.62632291, -2.76650273,7.72521331, -2.59458137, -5.84662537, -8.38614455, -4.3733154,8.88788102,7.9281024,4.30021329,4.11297809,2.0158162],
label 1: [ -13.52978119,10.18655095, -22.65309115,11.19357032, -10.03651148,3.53468868,17.72594014, -7.40914667,3.75224215, -6.49460953,6.03641369,13.25902568, -2.0260207,10.99278901, -0.94538989, -2.86637046, -1.35209965, -9.31925225, -6.04641304, -9.05954462, -4.19337716,14.57395911,9.17987103,2.00481698, -1.54548618,20.95201259,1.19885046,11.36862312,9.69431927,8.30669821,1.65183686,2.70179518, -26.48967033,5.1492787,9.97703957, -0.25899668,16.80103617,5.54094704,6.67475959,20.23955869,4.4467589, -1.1262427, -9.41496104, -5.75355392, -0.57321567, -15.7887971, -5.76783193, -20.41854168,9.03704669, -7.8332908],
and 2, labeling: [ -15.5523014, -0.05098922, -14.34667976, -2.25306942,8.27374374,12.72074316, -12.71875066, -10.296529, -2.06284104,2.45320212,16.16718339, -6.62264998,16.27301166,13.89477223, -12.65328534,0.27039158,16.53604385, -5.42513727, -9.11123829, -2.47018876,7.82217354, -4.22237532, -17.04346112, -6.69827197,5.73699076,13.35587841,1.2153476, -3.83075819,5.12494883, -9.38151991, -14.57682911,10.56865872, -13.68774591,10.21696476,6.8567849,9.01829401, -0.95757215, -16.13063979, -0.262992,8.68835084,10.6138184, -9.17414989,6.33474456,5.94803427,20.02490159, -3.20275873,11.40157664, -1.19329514,14.87974036,3.87267909],
and (3) labeling: [4.85479927, -1.99093857, -5.56741885,6.20331452,3.62071848,12.37271197,12.47100925,5.55905451, -6.24048162, -0.73280996,6.89689023,7.22896023,17.30554227, -20.74874563,10.3976227, -17.11342545, -4.66771674, -0.05348024, -2.11445769,5.72611576, -12.52517376, -2.6273724,3.61542192, -0.82440065, -8.68445132, -11.81142574,8.62742712,7.07431355,6.59820306,1.6810814, -7.27525461,7.07595003,1.73496954,6.5950062,17.02141687,9.91207892, -1.93083413, -2.874787, -7.68151885,3.13225138,21.4477146,12.68083358,8.85897204,9.11184453,1.76935689,7.08592237,0.60035095, -12.19481081, -4.15659429, -7.84300446],
and (4) label: [10.35834579,4.27755478,8.28834525, -4.32956095,7.56058244,9.48078205,1.71802476,4.50813155,18.53001609,8.5279681, -13.37840118,0.40000298,11.91627408, -1.03129535, -18.31844916,1.5001133,7.10047334,3.00095028,3.72462107, -11.59926491, -5.0415842,10.09667557, -7.99359476, -14.64089078,7.11108861,10.33194604,5.26059789, -11.85866284, -1.02213081,3.93500563, -15.22253635, -5.38307785,11.19184711,16.25874224, -2.78640284,5.35442683,0.55587949,8.84622493,8.15230091, -3.76267201, -8.21651066,2.41864845,11.19213948,5.50713794, -5.88278869,8.13093996,2.22734626,4.69525982, -8.82085003,0.83872203]
The normalized similarity between the text feature vector and each category feature vector is calculated with a similarity function and taken as the probability of predicting that category for the text. The similarity formula is sim(x, y) = 10^4 · (x·y) / (||x|| · ||y||), followed by softmax normalization; the calculation result is as follows:
[8.6843e-01,4.5041e-04,2.1220e-04,1.3028e-01,6.2073e-04]
The class with the highest probability, namely label 0 with probability 0.86843, is taken as the output, which completes the test.
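The similarity and softmax step above can be sketched as follows; semantic_feature stands for the 50-dimensional text feature vector and label_features for the five label feature vectors listed above, and the similarity function follows the formula sim(x, y) = 10^4 · (x·y) / (||x||·||y||) given in the embodiment.

```python
import torch
import torch.nn.functional as F

def similarity(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # sim(x, y) = 10^4 * (x . y) / (||x|| * ||y||), as defined in the embodiment
    return 1e4 * torch.dot(x, y) / (torch.norm(x) * torch.norm(y))

def classify(text_feature: torch.Tensor, label_features: torch.Tensor):
    """Return the predicted label index and the softmax-normalized probability distribution."""
    scores = torch.stack([similarity(text_feature, lf) for lf in label_features])
    probs = F.softmax(scores, dim=0)
    return int(torch.argmax(probs)), probs

# label_features: (5, 50) tensor built from the trained label feature vectors listed above
# predicted, probs = classify(semantic_feature, label_features)
# In the worked example above, label 0 receives the highest probability.
```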
FIG. 4 is a block diagram of an apparatus for building a multi-value chain problem text classification model based on a graph attention network according to an example embodiment. The apparatus 400 comprises: a data collection module 410, a data preprocessing module 420, an extraction and construction module 430, a training module 440 and a classification module 450;
the data collection module 410 is used for collecting question sentences retrieved by users in various business retrieval platforms in an enterprise and labeling text type labels to obtain a data set of multi-value chain problem texts;
a data preprocessing module 420, configured to preprocess the problem text data set, and divide the preprocessed problem text data set into a training text set and a test text set;
an extraction and construction module 430, configured to perform word vector extraction on the training text set and the test text set, and construct a fully-connected network graph by using the extracted word vectors and labeled text category labels based on the training text set;
a training module 440, configured to train an initial label graph attention model based on the fully-connected network graph to obtain a trained label graph attention model; training an initial text semantic extraction model based on the word vectors extracted from the training text set to obtain a trained text semantic extraction model;
and the classification module 450 is configured to perform semantic extraction on the test text set according to the trained text semantic extraction model, and classify the test text set according to the trained label graph attention model.
Optionally, the data preprocessing module is specifically configured to:
performing Chinese word segmentation processing and stop word removal processing on the problem text data set;
the Chinese word segmentation processing and stop word removal processing comprises the following steps:
segmenting words of the text based on a pre-constructed multivalent value chain text expert dictionary, and determining a segmentation result by adopting a principle of maximizing the number of large-granularity words; wherein the text is a question text data set;
and filtering stop words based on a pre-constructed multi-value chain stop word dictionary to remove noise words.
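For illustration, the word segmentation and stop word filtering can be sketched as follows, assuming the jieba segmenter as the dictionary-based word segmentation tool; the file names expert_dict.txt and stopwords.txt stand in for the pre-constructed multi-value chain expert dictionary and stop word dictionary and are hypothetical.

```python
import jieba

# Hypothetical file names for the pre-constructed dictionaries.
jieba.load_userdict("expert_dict.txt")                   # multi-value chain text expert dictionary

with open("stopwords.txt", encoding="utf-8") as f:       # multi-value chain stop word dictionary
    stopwords = {line.strip() for line in f if line.strip()}

def preprocess(question: str) -> list:
    """Segment a question text and filter out stop words (noise words)."""
    # jieba's dictionary-based segmentation tends to keep longer dictionary entries intact,
    # approximating the "maximize the number of large-granularity words" principle described above.
    return [w for w in jieba.lcut(question) if w.strip() and w not in stopwords]
```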
Optionally, the extraction and construction module is specifically configured to:
obtaining word vectors of a preprocessed training text set and a preprocessed test text set based on a pretrained BERT word vector extraction model;
based on a training text set, using extracted word vectors and labeled text category labels as nodes respectively, and constructing a fully-connected network graph of full connection among all the nodes, wherein the nodes represented by the text category labels are initialized to random vectors, the dimensions are consistent with the dimensions of the word vectors, the edge weights among the word vector nodes are initialized randomly, and the edge weights among category-word nodes are initialized to the probability of the word in the training text set appearing in the category.
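A minimal sketch of the fully-connected network graph construction described above, with word vector nodes and category label nodes stacked into one node matrix and a symmetric edge weight matrix; the function name and the word_category_prob input (the probability of each word appearing in texts of each category) are illustrative assumptions.

```python
import numpy as np

def build_fully_connected_graph(word_vectors, word_category_prob, num_labels, seed=0):
    """
    word_vectors:       (num_words, dim) word vector nodes extracted from the training text set
    word_category_prob: (num_labels, num_words), probability of each word appearing in each category
    Returns node features of shape (num_words + num_labels, dim) and a symmetric edge weight matrix.
    """
    rng = np.random.default_rng(seed)
    num_words, dim = word_vectors.shape

    # Label nodes are initialized to random vectors with the same dimension as the word vectors.
    label_nodes = rng.standard_normal((num_labels, dim))
    nodes = np.vstack([word_vectors, label_nodes])

    n = num_words + num_labels
    weights = rng.random((n, n))                  # word-word edge weights: random initialization
    weights = (weights + weights.T) / 2.0         # keep the fully connected graph undirected

    # Category-word edges: probability of the word appearing in that category in the training set.
    for l in range(num_labels):
        weights[num_words + l, :num_words] = word_category_prob[l]
        weights[:num_words, num_words + l] = word_category_prob[l]

    np.fill_diagonal(weights, 0.0)                # no self-loops
    return nodes, weights
```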
Optionally, the initial label graph Attention model is a GNN-Attention model, the GNN-Attention model comprising a GNN submodel and an Attention submodel;
the GNN submodel is used for obtaining the output characteristics of each category label node;
the Attention submodel is used to control Attention weight assignment among nodes in the GNN submodel.
Optionally, the training module is specifically configured to:
iteratively updating the feature vectors of the nodes in the GNN-Attention model according to the following formula (2):

h_v^(k) = σ_0( W_k · Σ_{u ∈ N(v)} α_vu · h_u^(k-1) / |N(v)| + B_k · h_v^(k-1) )      (2)

wherein σ_0 denotes the sigmoid activation function; W_k represents the parameter matrix at the k-th iteration of the GNN, k ∈ [1, K], where K ∈ R+, K > 1 is a preset constant; N(v) represents all neighbor nodes of node v, and |N(v)| represents the size of N(v); h_u^(k-1) represents the feature vector of node u at the (k-1)-th iteration; h_v^(k-1) is the feature vector of node v at the (k-1)-th iteration; h_v^(k) is the output vector of node v at the k-th iteration; B_k is the bias parameter at the k-th iteration; k = 1 denotes the first iteration, and all parameters are initialized randomly; α_vu denotes the attention weight between node u and node v, defined by the following formula (3):

α_vu = exp(e_vu) / Σ_{u' ∈ N(v)} exp(e_vu')      (3)

wherein e_vu is defined by the following formula (4):

e_vu = a(h_v^(k-1), h_u^(k-1))      (4)

wherein a represents the function used to calculate the attention weight;

the defined loss function is a class-balanced label similarity loss function, as shown in the following formula (5):

Loss = −(1/m) · Σ_{b=1}^{m} (1/n_b) · Σ_{i=1}^{n_b} (1/s_{g_b(i)}) · log[ exp(sim(W_c·h^(K)_{z_b(i)}, W_l·h^(K)_{z_b(g_b(i))})) / Σ_{j=1}^{L} exp(sim(W_c·h^(K)_{z_b(i)}, W_l·h^(K)_{z_b(j)})) ]      (5)

wherein m denotes the number of batches into which the entire training text set is divided; b denotes the subscript of a batch; n_b denotes the number of training texts in the batch with subscript b; i denotes the subscript of a training text within a batch; L denotes the total number of category labels; s_j denotes the frequency of occurrence of the j-th label in the entire training text set; g_b(i) denotes the category label corresponding to the training text with subscript i in the batch with subscript b; z_b(i) denotes the node subscript in the fully-connected graph corresponding to the training text with subscript i in the batch with subscript b; z_b(g_b(i)) denotes the node subscript in the fully-connected graph corresponding to the category label g_b(i); W_c denotes the parameter matrix of the linear transformation of the training text nodes; W_l denotes the parameter matrix of the linear transformation of the label nodes; and K is the total number of iterations;
and updating the parameters in the GNN-Attention model with a batch gradient descent algorithm according to the class-balanced label similarity loss function, to obtain the trained label graph Attention model.
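As an illustration of formulas (2) to (4), the following sketch implements one iteration of the attention-weighted node update on a fully connected graph; the additive scoring function used for a(·,·) is an assumption, since the embodiment leaves the form of a open.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GNNAttentionLayer(nn.Module):
    """One iteration k of the GNN-Attention node update (formulas (2)-(4)) on a fully connected graph."""

    def __init__(self, dim: int):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)   # W_k
        self.B = nn.Linear(dim, dim, bias=False)   # B_k (transformation of the node's own feature)
        self.a = nn.Linear(2 * dim, 1)             # assumed additive form of the scoring function a(., .)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (n, dim) node feature vectors at iteration k-1; every node neighbors every other node
        n = h.size(0)
        hv = h.unsqueeze(1).expand(n, n, -1)                    # h_v broadcast over u
        hu = h.unsqueeze(0).expand(n, n, -1)                    # h_u broadcast over v
        e = self.a(torch.cat([hv, hu], dim=-1)).squeeze(-1)     # e_vu, formula (4)
        e = e.masked_fill(torch.eye(n, dtype=torch.bool), float("-inf"))  # exclude v from N(v)
        alpha = F.softmax(e, dim=1)                             # attention weights alpha_vu, formula (3)
        neighbor_term = (alpha @ h) / (n - 1)                   # weighted neighbor average, |N(v)| = n - 1
        return torch.sigmoid(self.W(neighbor_term) + self.B(h))  # formula (2), sigma_0 = sigmoid
```

Stacking K such iterations and reading the label node outputs at the last iteration yields the category feature vectors used in the classification step described above; training then minimizes the class-balanced label similarity loss of formula (5) with batch gradient descent.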
Fig. 5 is a schematic structural diagram of an electronic device 500 according to an embodiment of the present invention. The electronic device 500 may vary considerably in configuration and performance, and may include one or more processors (CPUs) 501 and one or more memories 502, where the memory 502 stores at least one instruction that is loaded and executed by the processor 501 to implement the steps of the multi-value chain problem text classification method described above.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, is also provided that includes instructions executable by a processor in a terminal to perform one of the multivalent value chain problem text classification methods described above. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (10)

1. A method for text classification of a multivalent value chain problem, the method comprising:
S1, collecting question sentences searched by users in each business search platform in an enterprise and labeling text category labels to obtain a data set of multi-value chain problem texts;
S2, preprocessing the problem text data set, and dividing the preprocessed problem text data set into a training text set and a test text set;
S3, extracting word vectors from the training text set and the test text set, and constructing a fully-connected network graph by using the extracted word vectors and labeled text category labels based on the training text set;
S4, training the initial label graph attention model based on the fully-connected network graph to obtain a trained label graph attention model;
S5, training the initial text semantic extraction model based on the word vectors extracted from the training text set to obtain a trained text semantic extraction model;
S6, performing semantic extraction on the test text set according to the trained text semantic extraction model, and classifying the test text set according to the trained label graph attention model.
2. The method of claim 1, wherein the preprocessing the question text data set in S2 comprises:
performing Chinese word segmentation processing and stop word removal processing on the problem text data set;
the Chinese word segmentation processing and stop word filtering processing comprise the following steps:
segmenting words of the text based on a pre-constructed multivalent value chain text expert dictionary, and determining a segmentation result by adopting a principle of maximizing the number of large-granularity words; wherein the text is a question text data set;
and filtering stop words based on a pre-constructed multi-value chain stop word dictionary to remove noise words.
3. The method according to claim 1, wherein performing word vector extraction on the preprocessed training text set and test text set in S3, and constructing a fully-connected network graph based on the training text set by using the extracted word vectors and labeled text category labels, comprises:
obtaining word vectors of a preprocessed training text set and a preprocessed test text set based on a pretrained BERT word vector extraction model;
based on a training text set, using extracted word vectors and labeled text category labels as nodes respectively, and constructing a fully-connected network graph of full connection among all the nodes, wherein the nodes represented by the text category labels are initialized to random vectors, the dimensions are consistent with the dimensions of the word vectors, the edge weights among the word vector nodes are initialized randomly, and the edge weights among category-word nodes are initialized to the probability of the word in the training text set appearing in the category.
4. The method according to claim 1, wherein in S4 the initial label graph Attention model trained based on the fully-connected network graph is a GNN-Attention model, and the GNN-Attention model comprises a GNN submodel and an Attention submodel;
the GNN submodel is used for obtaining the output characteristics of each category label node;
the Attention submodel is used to control Attention weight assignment among nodes in the GNN submodel.
5. The method of claim 4, wherein the training process of the GNN-Attention model comprises:
iteratively updating the feature vectors of the nodes in the GNN-Attention model according to the following formula (2):

h_v^(k) = σ_0( W_k · Σ_{u ∈ N(v)} α_vu · h_u^(k-1) / |N(v)| + B_k · h_v^(k-1) )      (2)

wherein σ_0 denotes the sigmoid activation function; W_k represents the parameter matrix at the k-th iteration of the GNN, k ∈ [1, K], where K ∈ R+, K > 1 is a preset constant; N(v) represents all neighbor nodes of node v, and |N(v)| represents the size of N(v); h_u^(k-1) represents the feature vector of node u at the (k-1)-th iteration; h_v^(k-1) is the feature vector of node v at the (k-1)-th iteration; h_v^(k) is the output vector of node v at the k-th iteration; B_k is the bias parameter at the k-th iteration; k = 1 denotes the first iteration, and all parameters are initialized randomly; α_vu denotes the attention weight between node u and node v, defined by the following formula (3):

α_vu = exp(e_vu) / Σ_{u' ∈ N(v)} exp(e_vu')      (3)

wherein e_vu is defined by the following formula (4):

e_vu = a(h_v^(k-1), h_u^(k-1))      (4)

wherein a represents the function used to calculate the attention weight;

the defined loss function is a class-balanced label similarity loss function, as shown in the following formula (5):

Loss = −(1/m) · Σ_{b=1}^{m} (1/n_b) · Σ_{i=1}^{n_b} (1/s_{g_b(i)}) · log[ exp(sim(W_c·h^(K)_{z_b(i)}, W_l·h^(K)_{z_b(g_b(i))})) / Σ_{j=1}^{L} exp(sim(W_c·h^(K)_{z_b(i)}, W_l·h^(K)_{z_b(j)})) ]      (5)

wherein m denotes the number of batches into which the entire training text set is divided; b denotes the subscript of a batch; n_b denotes the number of training texts in the batch with subscript b; i denotes the subscript of a training text within a batch; L denotes the total number of category labels; s_j denotes the frequency of occurrence of the j-th label in the entire training text set; g_b(i) denotes the category label corresponding to the training text with subscript i in the batch with subscript b; z_b(i) denotes the node subscript in the fully-connected graph corresponding to the training text with subscript i in the batch with subscript b; z_b(g_b(i)) denotes the node subscript in the fully-connected graph corresponding to the category label g_b(i); W_c denotes the parameter matrix of the linear transformation of the training text nodes; W_l denotes the parameter matrix of the linear transformation of the label nodes; and K is the total number of iterations;
and updating the parameters in the GNN-Attention model with a batch gradient descent algorithm according to the class-balanced label similarity loss function to obtain the trained label graph Attention model.
6. An apparatus for classifying a problem text of a multivalent value chain, the apparatus comprising:
the data collection module is used for collecting question sentences searched by users in various business search platforms in an enterprise and labeling text type labels to obtain a data set of multi-value chain problem texts;
the data preprocessing module is used for preprocessing the problem text data set and dividing the preprocessed problem text data set into a training text set and a test text set;
the extraction and construction module is used for extracting word vectors from the training text set and the test text set, and constructing a fully-connected network graph by using the extracted word vectors and the labeled text category labels based on the training text set;
the training module is used for training the initial label graph attention model based on the full-connection network graph to obtain a trained label graph attention model; training the initial text semantic extraction model based on the word vectors extracted from the training text set to obtain a trained text semantic extraction model;
and the classification module is used for performing semantic extraction on the test text set according to the trained text semantic extraction model and classifying the test text set according to the trained label graph attention model.
7. The apparatus of claim 6, wherein the data preprocessing module is specifically configured to:
performing Chinese word segmentation processing and stop word removal processing on the problem text data set;
the Chinese word segmentation processing and stop word removal processing comprises the following steps:
segmenting words of the text based on a pre-constructed multivalent value chain text expert dictionary, and determining a segmentation result by adopting a principle of maximizing the number of large-granularity words; wherein the text is a question text data set;
and filtering stop words based on a pre-constructed multi-value chain stop word dictionary to remove noise words.
8. The apparatus according to claim 6, wherein the extraction and construction module is specifically configured to:
obtaining word vectors of a preprocessed training text set and a preprocessed test text set based on a pretrained BERT word vector extraction model;
based on a training text set, using extracted word vectors and labeled text category labels as nodes respectively, and constructing a fully-connected network graph of full connection among all the nodes, wherein the nodes represented by the text category labels are initialized to random vectors, the dimensionality is consistent with the dimensionality of the word vectors, the edge weight among the word vector nodes is initialized randomly, and the edge weight among category-word nodes is initialized to the probability of the word in the training text set in the category.
9. The apparatus of claim 6, wherein the initial label graph Attention model is a GNN-Attention model, the GNN-Attention model comprising a GNN submodel and an Attention submodel;
the GNN submodel is used for obtaining the output characteristics of each category label node;
the Attention submodel is used to control Attention weight assignment among nodes in the GNN submodel.
10. The apparatus of claim 9, wherein the training module is specifically configured to:
iteratively updating the feature vectors of the nodes in the GNN-Attention model according to the following formula (2):

h_v^(k) = σ_0( W_k · Σ_{u ∈ N(v)} α_vu · h_u^(k-1) / |N(v)| + B_k · h_v^(k-1) )      (2)

wherein σ_0 denotes the sigmoid activation function; W_k represents the parameter matrix at the k-th iteration of the GNN, k ∈ [1, K], where K ∈ R+, K > 1 is a preset constant; N(v) represents all neighbor nodes of node v, and |N(v)| represents the size of N(v); h_u^(k-1) represents the feature vector of node u at the (k-1)-th iteration; h_v^(k-1) is the feature vector of node v at the (k-1)-th iteration; h_v^(k) is the output vector of node v at the k-th iteration; B_k is the bias parameter at the k-th iteration; k = 1 denotes the first iteration, and all parameters are initialized randomly; α_vu denotes the attention weight between node u and node v, defined by the following formula (3):

α_vu = exp(e_vu) / Σ_{u' ∈ N(v)} exp(e_vu')      (3)

wherein e_vu is defined by the following formula (4):

e_vu = a(h_v^(k-1), h_u^(k-1))      (4)

wherein a represents the function used to calculate the attention weight;

the defined loss function is a class-balanced label similarity loss function, as shown in the following formula (5):

Loss = −(1/m) · Σ_{b=1}^{m} (1/n_b) · Σ_{i=1}^{n_b} (1/s_{g_b(i)}) · log[ exp(sim(W_c·h^(K)_{z_b(i)}, W_l·h^(K)_{z_b(g_b(i))})) / Σ_{j=1}^{L} exp(sim(W_c·h^(K)_{z_b(i)}, W_l·h^(K)_{z_b(j)})) ]      (5)

wherein m denotes the number of batches into which the entire training text set is divided; b denotes the subscript of a batch; n_b denotes the number of training texts in the batch with subscript b; i denotes the subscript of a training text within a batch; L denotes the total number of category labels; s_j denotes the frequency of occurrence of the j-th label in the entire training text set; g_b(i) denotes the category label corresponding to the training text with subscript i in the batch with subscript b; z_b(i) denotes the node subscript in the fully-connected graph corresponding to the training text with subscript i in the batch with subscript b; z_b(g_b(i)) denotes the node subscript in the fully-connected graph corresponding to the category label g_b(i); W_c denotes the parameter matrix of the linear transformation of the training text nodes; W_l denotes the parameter matrix of the linear transformation of the label nodes; and K is the total number of iterations;
and updating the parameters in the GNN-Attention model with a batch gradient descent algorithm according to the class-balanced label similarity loss function to obtain the trained label graph Attention model.
CN202210734541.1A 2022-06-27 2022-06-27 Method and device for classifying problem texts with multi-value chains Withdrawn CN115168574A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210734541.1A CN115168574A (en) 2022-06-27 2022-06-27 Method and device for classifying problem texts with multi-value chains
CN202211355585.XA CN115757773A (en) 2022-06-27 2022-11-01 Method and device for classifying problem texts with multi-value chains

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210734541.1A CN115168574A (en) 2022-06-27 2022-06-27 Method and device for classifying problem texts with multi-value chains

Publications (1)

Publication Number Publication Date
CN115168574A true CN115168574A (en) 2022-10-11

Family

ID=83487147

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210734541.1A Withdrawn CN115168574A (en) 2022-06-27 2022-06-27 Method and device for classifying problem texts with multi-value chains
CN202211355585.XA Pending CN115757773A (en) 2022-06-27 2022-11-01 Method and device for classifying problem texts with multi-value chains

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202211355585.XA Pending CN115757773A (en) 2022-06-27 2022-11-01 Method and device for classifying problem texts with multi-value chains

Country Status (1)

Country Link
CN (2) CN115168574A (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117992600B (en) * 2024-04-07 2024-06-11 之江实验室 Service execution method and device, storage medium and electronic equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753540A (en) * 2020-06-24 2020-10-09 云南电网有限责任公司信息中心 Method and system for collecting text data to perform Natural Language Processing (NLP)
CN111753540B (en) * 2020-06-24 2023-04-07 云南电网有限责任公司信息中心 Method and system for collecting text data to perform Natural Language Processing (NLP)
CN116308464A (en) * 2023-05-11 2023-06-23 广州钛动科技股份有限公司 Target client acquisition system and method
CN116308464B (en) * 2023-05-11 2023-09-08 广州市沃钛移动科技有限公司 Target client acquisition system and method
CN118246720A (en) * 2024-05-30 2024-06-25 西昌学院 Agricultural production management method and system applied to intelligent rural service system

Also Published As

Publication number Publication date
CN115757773A (en) 2023-03-07

Similar Documents

Publication Publication Date Title
CN113761936B (en) Multi-task chapter-level event extraction method based on multi-head self-attention mechanism
CN108984526B (en) Document theme vector extraction method based on deep learning
CN113011533B (en) Text classification method, apparatus, computer device and storage medium
CN110609891B (en) Visual dialog generation method based on context awareness graph neural network
CN108628823B (en) Named entity recognition method combining attention mechanism and multi-task collaborative training
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
CN104834747B (en) Short text classification method based on convolutional neural networks
CN107229610B (en) A kind of analysis method and device of affection data
CN111783474B (en) Comment text viewpoint information processing method and device and storage medium
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
WO2019153737A1 (en) Comment assessing method, device, equipment and storage medium
CN115168574A (en) Method and device for classifying problem texts with multi-value chains
CN112364638B (en) Personality identification method based on social text
CN111274829B (en) Sequence labeling method utilizing cross-language information
CN113704416B (en) Word sense disambiguation method and device, electronic equipment and computer-readable storage medium
CN113515632B (en) Text classification method based on graph path knowledge extraction
CN112818118B (en) Reverse translation-based Chinese humor classification model construction method
CN111274790A (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN112667782A (en) Text classification method, device, equipment and storage medium
CN114358201A (en) Text-based emotion classification method and device, computer equipment and storage medium
CN111914556A (en) Emotion guiding method and system based on emotion semantic transfer map
CN113435211A (en) Text implicit emotion analysis method combined with external knowledge
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
CN111753088A (en) Method for processing natural language information
CN112699685A (en) Named entity recognition method based on label-guided word fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221011