CN113987209A - Natural language processing method and device based on knowledge-guided prefix fine tuning, computing equipment and storage medium - Google Patents
- Publication number
- CN113987209A CN113987209A CN202111300021.1A CN202111300021A CN113987209A CN 113987209 A CN113987209 A CN 113987209A CN 202111300021 A CN202111300021 A CN 202111300021A CN 113987209 A CN113987209 A CN 113987209A
- Authority
- CN
- China
- Prior art keywords
- prefix
- training
- language model
- words
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The invention discloses a natural language processing method, apparatus, computing device and storage medium based on knowledge-guided prefix fine-tuning. The method first constructs prefix prompt words related to a downstream task, together with label words related to each task category obtained from a knowledge graph. The embedded vectors of the prefix prompt words are then spliced with the key values and the values of the input text before the self-attention calculation, so that the prefix prompt words and the input text are closely combined for learning; at the same time, all the label words are combined to determine the learning labels. In other words, ontology knowledge related to the task categories is used to guide the fine-tuning of the pre-trained language model, so that the fine-tuned model predicts the downstream task better and its prediction accuracy is improved. When the downstream tasks are emotion analysis and relation extraction tasks, the pre-trained language model obtained by the corresponding method improves the accuracy of emotion analysis and relation extraction respectively.
Description
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a natural language processing method and apparatus based on knowledge-guided prefix fine-tuning, a computing device and a storage medium.
Background
A pre-trained model is a model obtained by training on a large reference dataset; large pre-trained language models such as BERT, GPT and XLNet are obtained by pre-training on massive corpora. Because a pre-trained model has undergone unsupervised learning on a large corpus, knowledge in the corpus has already been transferred into the embeddings of the pre-trained model.
Fine-tuning is the main method for transferring pre-trained model (PTM) knowledge to downstream tasks, and the fine-tuning methods in common use all add a task-specific network structure so as to adapt the model to a particular task. However, such fine-tuning methods have the following drawbacks: (1) low parameter efficiency: each downstream task has its own set of fine-tuning parameters; (2) the pre-training objective differs from the fine-tuning objective, so the generalization ability of the pre-trained model is poor; (3) unlike the parameters learned in the pre-training stage, the newly added network parameters require a large amount of data to learn. These shortcomings lead to poor task performance on emotion analysis tasks, relation extraction tasks and various classification tasks.
Prior patent document CN112100383A discloses a meta-knowledge fine-tuning method and platform for multi-task language models. Based on cross-domain typicality-score learning, the method obtains highly transferable common knowledge, i.e. meta-knowledge, across different datasets of similar tasks, and makes the learning processes of similar tasks on the corresponding domains mutually associated and mutually reinforcing. This improves the fine-tuning effect of similar downstream tasks on datasets from different domains, as well as the parameter-initialization and generalization abilities of a general language model for similar tasks. However, the method does not consider ontology knowledge, and its fine-tuning effect on downstream tasks is poor.
Patent document CN113032559A discloses a language-model fine-tuning method for text classification in low-resource agglutinative languages. It constructs a low-noise fine-tuning dataset through morphological analysis and stem extraction, fine-tunes a cross-lingual pre-trained model on this dataset, and thereby provides a meaningful, easy-to-use feature extractor for downstream text classification, better selecting relevant semantic and syntactic information from the pre-trained language model and using these features for the downstream tasks. However, the method does not consider ontology knowledge, and its fine-tuning effect on downstream tasks is poor.
Disclosure of Invention
In view of the foregoing, an object of the present invention is to provide a natural language processing method, apparatus, computing device and storage medium based on knowledge-guided prefix fine-tuning, in which a pre-trained language model is fine-tuned using prefix prompts and ontology knowledge related to the downstream task, so as to improve the accuracy of the model's predictions on that task.
In a first aspect, an embodiment provides a natural language processing method based on knowledge-guided prefix fine tuning, including the following steps:
constructing an initial prefix cue word according to a downstream task, and mapping the initial prefix cue word through a function into a number of embedded vectors equal to the number of layers of a pre-trained language model, wherein the dimension of each embedded vector is twice that of the corresponding model layer;
linking each task category of the downstream tasks to a knowledge graph, and taking words related to each task category in the knowledge graph as tag words;
converting the pre-trained language model to a masked-token form of the downstream task according to the prefix cue words and the label words, and performing fine-tuning training on the pre-trained language model, which comprises: inputting a training text into the pre-trained language model; at each layer, splitting the embedded vector of the prefix cue word into 2 parts with the same dimension as the corresponding model layer and splicing them with the key value and the value of the training text respectively so as to participate in the self-attention calculation; and, taking the weighted result of all label words corresponding to each task category as the label, simultaneously optimizing the embedded vector of the prefix cue word, the parameters of the pre-trained language model, and the weights of the label words;
when the method is applied, the predicted text and the embedded vectors of the prefix cue words are input into the fine-tuned pre-trained language model, and after calculation the weighted result of the predicted values of all the label words with their corresponding weights is taken as the prediction result.
Preferably, the mapping the initial prefix cue words into the embedded vectors with the same number as the number of layers of the pre-training language model through the function includes:
and initially encoding the initial prefix cue words into initial embedded vectors, and then mapping the initial embedded vectors once by adopting function mapping to obtain the embedded vectors with the same number as the number of layers of the pre-training language model.
Preferably, the mapping the initial prefix cue words into the embedded vectors with the same number as the number of layers of the pre-training language model through the function includes:
and initially encoding the initial prefix cue words into initial embedded vectors, and mapping the initial embedded vectors to each layer of the pre-trained language model through a multi-layer MLP, so as to obtain the embedded vector corresponding to each layer.
Preferably, when the pre-trained language model is subjected to fine-tuning training, the self-attention calculation is performed as:

h_l = softmax( Q_l [P_K^(l); K_l]^T / √d ) [P_V^(l); V_l]

wherein l denotes the layer index, Q_l denotes the query value, K_l denotes the key value, V_l denotes the value, P_K^(l) denotes the part of the prefix cue word's embedded vector that is spliced with the key value, P_V^(l) denotes the part that is spliced with the value, softmax(·) denotes the softmax function, and [ ; ] denotes the splicing operation.
Preferably, the pre-trained language model comprises: BERT, RoBERTa, and GPT-series models.
In one embodiment, the downstream task is an emotion analysis task, and the corresponding initial prefix word is "emotion analysis". Each task category of the emotion analysis task is linked to a financial-domain knowledge graph to find the words related to each category as label words. The pre-trained language model is then converted to a masked-token emotion analysis task according to "emotion analysis" and the label words, and fine-tuning training is performed. Finally, in application, the predicted text and the embedded vectors for "emotion analysis" are input into the fine-tuned pre-trained language model, and after calculation the weighted result of the predicted values of all the label words with their corresponding weights is taken as the emotion analysis prediction result.
In another embodiment, the downstream task is a relation extraction task, and the corresponding initial prefix word is "relation extraction". Each task category of the relation extraction task is linked to a medical-domain knowledge graph to find the words related to each category as label words. The pre-trained language model is then converted to a masked-token relation extraction task according to "relation extraction" and the label words, and fine-tuning training is performed. Finally, in application, the predicted text and the embedded vectors for "relation extraction" are input into the fine-tuned pre-trained language model, and after calculation the weighted result of the predicted values of all the label words with their corresponding weights is taken as the relation extraction result.
In a second aspect, an embodiment provides a natural language processing apparatus based on knowledge-guided prefix fine tuning, including:
the prefix cue word processing module is used for constructing an initial prefix cue word according to a downstream task, and mapping the initial prefix cue word into embedded vectors with the same number as the number of layers of the pre-training language model through a function, wherein the dimension of each embedded vector is 2 times that of the corresponding model layer;
the system comprises a tag word processing module, a knowledge graph and a task processing module, wherein the tag word processing module is used for linking each task category of a downstream task to the knowledge graph and taking words related to each task category in the knowledge graph as tag words;
the fine-tuning module is used for converting the pre-trained language model to a masked-token form of the downstream task according to the prefix cue words and the label words, and performing fine-tuning training on the pre-trained language model, which comprises: inputting a training text into the pre-trained language model; at each layer, splitting the embedded vector of the prefix cue word into 2 parts with the same dimension as the corresponding model layer and splicing them with the key value and the value of the training text respectively so as to participate in the self-attention calculation; and, taking the weighted result of all label words corresponding to each task category as the label, simultaneously optimizing the embedded vector of the prefix cue word, the parameters of the pre-trained language model, and the weights of the label words;
and the application module is used for inputting the predicted text and the embedded vectors of the prefix prompt words into the fine-tuned pre-trained language model, and after calculation taking the weighted result of the predicted values of all the label words with their corresponding weights as the prediction result.
In a third aspect, an embodiment provides a computing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the natural language processing method based on knowledge-guided prefix fine tuning described in the first aspect when executing the computer program.
In a fourth aspect, an embodiment provides a computer storage medium, on which a computer program is stored; when the computer program is executed by a processor, the natural language processing method based on knowledge-guided prefix fine-tuning of the first aspect is implemented.
Compared with the prior art, the invention has the beneficial effects that at least:
according to the technical scheme provided by the embodiment, the prefix prompt words related to the downstream task and the label words related to the task categories obtained from the knowledge map are firstly constructed, then the embedded vectors of the prefix prompt words are spliced with the key values and the value values of the input text, and then self-entry calculation is carried out, so that the prefix prompt words and the input text are closely combined for learning, and meanwhile, all the label words are integrated to determine learning labels, namely, body knowledge related to the task categories is used for guiding fine adjustment of the pre-training language model, so that the prediction effect of the fine-adjusted pre-training language model on the downstream task is better, and the prediction accuracy of the pre-training language model is improved. The downstream tasks are emotion analysis tasks and relation extraction tasks, and the emotion analysis accuracy and the relation extraction accuracy improved by the pre-training language model obtained by the corresponding method are adopted.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a natural language processing method based on knowledge-guided prefix fine-tuning provided by an embodiment;
fig. 2 is a schematic structural diagram of a natural language processing apparatus for guiding prefix fine-tuning based on knowledge according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
To address the inaccuracy of emotion analysis and relation extraction performed with conventionally fine-tuned pre-trained language models, this embodiment provides a way of fine-tuning a pre-trained language model guided by ontology knowledge and prefix prompt words related to the emotion analysis and relation extraction tasks; a pre-trained language model obtained through this fine-tuning can improve the task prediction results.
Fig. 1 is a flowchart of the natural language processing method based on knowledge-guided prefix fine-tuning provided by an embodiment. As shown in fig. 1, the method includes the following steps:
step 1, constructing an initial prefix cue word related to a downstream task, and mapping to obtain an embedded vector.
In an embodiment, the prefix prompt words are phrases closely related to the downstream task, where a phrase is a text composed of at least one word. When the downstream task is emotion analysis of a given text sentence (e.g., "today the stock market is all green, which is bad"), the prefix prompt is "emotion analysis". When the downstream task is relation extraction for a text sentence (e.g., "external irradiation can effectively improve pain symptoms in patients with chronic pancreatitis"), the prefix prompt is "relation extraction". After the prefix prompt words are initialized, they are mapped to obtain a number of embedded vectors equal to the number of layers of the pre-trained language model. So that the embedded vectors can be combined with the key value and the value of each layer respectively, the dimension of each embedded vector must be twice the dimension of the corresponding model layer.
In an embodiment, the initial prefix prompt words may first be encoded into initial embedded vectors, which are then mapped once by a mapping function to obtain a number of embedded vectors equal to the number of layers of the pre-trained language model. For example, if the pre-trained language model has 10 layers and the prefix representation at each layer has size 5 × 768, the mapping function maps directly to an embedded vector of size 5 × 768 × 10 × 2, where 10 indicates that 10 embedded vectors of size 5 × 768 × 2 are obtained, and 2 indicates that each vector's dimension is twice the per-layer size of 5 × 768.
In an embodiment, the initial prefix hint words may also be initially encoded into initial embedded vectors, and the initial embedded vectors are mapped to each layer of the pre-training language model by using multiple layers of MLPs, so as to obtain embedded vectors corresponding to each layer. That is, after multiple times of mapping, the embedded vector corresponding to each layer is obtained, but the dimension of the embedded vector corresponding to each layer is also ensured to be 2 times of the size of each layer.
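The one-shot mapping variant above can be sketched as follows. This is a minimal NumPy illustration assuming the example dimensions from the text (10 layers, 5 prefix tokens, hidden size 768); the random initialization, variable names and the 0.02 scale are illustrative placeholders, not prescribed by the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

num_layers = 10   # layers of the pre-trained model (example from the text)
prefix_len = 5    # number of prefix prompt tokens
hidden = 768      # hidden size of each model layer

# Initial encoding of the prefix prompt words: (prefix_len, hidden)
init_embed = rng.standard_normal((prefix_len, hidden))

# Single function mapping: one linear map producing, for every layer, an
# embedding of twice the layer dimension (a key half and a value half)
W_map = rng.standard_normal((hidden, num_layers * 2 * hidden)) * 0.02
per_layer = (init_embed @ W_map).reshape(prefix_len, num_layers, 2, hidden)

# Split each layer's embedding into the key part and the value part
prefix_k = per_layer[:, :, 0, :]  # spliced with the keys at each layer
prefix_v = per_layer[:, :, 1, :]  # spliced with the values at each layer

print(prefix_k.shape, prefix_v.shape)
```

The multi-layer MLP variant differs only in how `per_layer` is produced (one MLP head per model layer instead of a single linear map); the resulting key/value halves have the same shapes.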
Step 2: link each task category of the downstream task to the knowledge graph, and take the words related to each task category in the knowledge graph as label words.
In an embodiment, the task categories are related to the downstream task. For the emotion analysis task, the task categories include positive emotion, negative emotion and so on. Positive emotion and negative emotion can be linked to financial-domain knowledge resources such as the HowNet sentiment dictionary to obtain words related to each category as label words; for example, words related to positive emotion such as "good review", "excellent" and "good" can be collected into a label-word set for constructing the task's supervised learning labels. For the relation extraction task, the task categories include radiotherapy and so on. Radiotherapy can be linked to medical-domain knowledge graphs such as DiseasKG and Yidu-N7K to obtain text related to radiotherapy, such as "external irradiation can effectively improve the pain symptoms of patients with chronic pancreatitis", from which the label words "external irradiation", "chronic pancreatitis" and "pain symptoms" are extracted.
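A minimal sketch of what such knowledge-graph-derived label-word sets could look like in code; the words below are illustrative placeholders (a real system would query HowNet, DiseasKG or Yidu-N7K), and the uniform initialization of the learnable per-word weights is an assumption:

```python
# Hypothetical label-word sets retrieved from a knowledge graph,
# keyed by task category of the downstream task
label_words = {
    "positive_emotion": ["good", "excellent", "favorable"],
    "negative_emotion": ["bad", "poor", "unfavorable"],
}

# One learnable weight per label word, initialized uniformly; these are
# optimized jointly with the prefix embeddings during fine-tuning
label_weights = {
    category: [1.0 / len(words)] * len(words)
    for category, words in label_words.items()
}

print(label_weights["positive_emotion"])
```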
Step 3: convert the pre-trained language model to a masked-token form of the downstream task according to the prefix prompt words and the label words, and perform fine-tuning training on the pre-trained language model.
In an embodiment, the pre-trained language model may be a BERT, RoBERTa or GPT-series model. These models map the input text to query, key and value vectors, and all contain a self-attention mechanism for performing the self-attention calculation.
In an embodiment, the fine-tuning training process comprises: inputting a training text into the pre-trained language model; at each layer, splitting the embedded vector of the prefix prompt word into 2 parts with the same dimension as the corresponding model layer and splicing them with the key value and the value of the training text respectively so as to participate in the self-attention calculation; and, taking the weighted result of all label words corresponding to each task category as the label, simultaneously optimizing the embedded vector of the prefix prompt word, the parameters of the pre-trained language model, and the weights of the label words.
In the l-th layer of the pre-trained language model, the representation X_l of the input text sequence is first mapped to the query/key/value vectors:

Q_l = X_l W_Q, K_l = X_l W_K, V_l = X_l W_V

wherein W_Q, W_K and W_V are model parameters. The self-attention calculation then becomes:

h_l = softmax( Q_l [P_K^(l); K_l]^T / √d ) [P_V^(l); V_l]

wherein Q_l denotes the query value, K_l denotes the key value, V_l denotes the value, P_K^(l) denotes the part of the prefix prompt word's embedded vector that is spliced with the key value, P_V^(l) denotes the part that is spliced with the value, softmax(·) denotes the softmax function, and [ ; ] denotes the splicing operation.
In an embodiment, a weight is initialized for each label word, and the label words are then combined according to their weights to obtain the training label. For example, suppose the weights of "external irradiation", "chronic pancreatitis" and "pain symptoms" are initialized to 0.2, 0.5 and 0.3 respectively. When the pre-trained language model performs the masked-token prediction task, i.e. predicts the vocabulary at the masked position [MASK] of the input text sequence, the label of this category is treated as the weighted combination 0.2 × "external irradiation" + 0.5 × "chronic pancreatitis" + 0.3 × "pain symptoms". Then, based on the learnable embedded vectors of the prefix prompt words and the weight vector, the parameters of the pre-trained language model are fine-tuned on sample data, and better performance of the pre-trained language model can be obtained.
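The weighted label-word scoring at the [MASK] position can be sketched as follows. The 0.2/0.5/0.3 weights mirror the example above; the toy vocabulary and the model logits are illustrative placeholders (in practice the logits come from the fine-tuned model's masked-token head over its real vocabulary):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy vocabulary; in practice this is the pre-trained model's vocabulary
vocab = ["external irradiation", "chronic pancreatitis", "pain symptoms", "other"]
word_ids = {w: i for i, w in enumerate(vocab)}

# Label words and weights for one task category, as in the example above
category_words = ["external irradiation", "chronic pancreatitis", "pain symptoms"]
weights = [0.2, 0.5, 0.3]

# Hypothetical model logits at the [MASK] position
mask_logits = np.array([2.0, 1.0, 0.5, -1.0])
probs = softmax(mask_logits)

# Category score: weighted combination of the label words' predicted values;
# the category with the highest score is taken as the prediction
score = sum(w * probs[word_ids[t]] for t, w in zip(category_words, weights))
print(round(float(score), 4))
```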
Step 4: in application, input the predicted text and the embedded vectors of the prefix prompt words into the fine-tuned pre-trained language model, and after calculation take the weighted result of the predicted values of all the label words with their corresponding weights as the prediction result.
In the embodiment, for the emotion prediction task, the embedded vectors of the predicted text and the prefix prompt words are input into the fine-tuned pre-training language model, and the prediction values of all the label words and the weighting results of the corresponding weights are used as the prediction results of the predicted text after calculation.
The natural language processing method based on knowledge-guided prefix fine-tuning provided by this embodiment generates the multi-layer embedded vectors of the knowledge prefix prompt words and the label-word set from the downstream task description and an external knowledge base, and converts the downstream task into a masked-token prediction task.
In the natural language processing method based on knowledge-guided prefix fine-tuning provided by the above embodiment, the pre-trained language model is a neural network model dedicated to learning semantic information from a large-scale unlabeled corpus in an unsupervised manner. It is a complex learning model composed of multiple layers of neural networks, and it can capture the semantic information in text more accurately, thereby improving the model's accuracy on downstream tasks.
By adopting knowledge-guided prefix fine-tuning, the natural language processing method provided by the embodiment can remarkably improve the accuracy and efficiency of downstream tasks and meet the requirements of different applications; it is not limited to classification tasks in natural language processing, and is also applicable to text generation tasks.
As shown in fig. 2, the embodiment further provides a fine tuning apparatus 200 for a language model, including:
the prefix cue word processing module 201 is configured to construct an initial prefix cue word according to a downstream task, and map the initial prefix cue word into embedded vectors with the same number as the number of layers of the pre-training language model through a function, wherein the dimension of each embedded vector is 2 times that of the corresponding model layer;
the tag word processing module 202 is configured to link each task category of the downstream task to the knowledge graph, and use a word related to each task category in the knowledge graph as a tag word;
the fine tuning module 203 is configured to convert the pre-training language model into a downstream task of masking the token according to the prefix cue word and the tag word, and perform fine tuning training on the pre-training language model, including: inputting a training text into a pre-training language model, splitting an embedded vector of a prefix cue word into 2 parts with the same dimensionality as that of a corresponding model layer on each layer, splicing a key value and a value corresponding to the training text respectively, participating in self-attribute calculation, and simultaneously optimizing the embedded vector of the prefix cue word, parameters of the pre-training language model and the weight of a label word by taking the weighted result of all label words corresponding to each task category as a label;
and the application module 204 is configured to input the embedded vectors of the predicted text and the prefix prompt word into the fine-tuned pre-training language model, and take the prediction values of all the label words and the weighting results of the corresponding weights as prediction results through calculation.
It should be noted that the division of functional modules described above is only an example; in practical applications, the above functions may be distributed among different functional modules as needed, that is, the internal structure of the terminal or server may be divided into different functional modules to perform all or part of the functions described above. In addition, the natural language processing apparatus provided in the embodiment and the natural language processing method embodiment belong to the same concept; its specific implementation is detailed in the method embodiment and is not described again here.
The embodiments also provide a computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the natural language processing method based on knowledge-guided prefix fine-tuning.
The embodiments also provide a computer storage medium having stored thereon a computer program that, when executed by a processor, implements the natural language processing method based on knowledge-guided prefix fine-tuning.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to the memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory.
The above embodiments are intended to illustrate the technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments and are not intended to limit the present invention; any modifications, additions, equivalent substitutions, and the like made within the scope of the principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A natural language processing method based on knowledge-guided prefix fine tuning is characterized by comprising the following steps:
constructing an initial prefix prompt word according to a downstream task, and mapping the initial prefix prompt word, via a function, into embedded vectors equal in number to the layers of a pre-training language model, wherein the dimension of each embedded vector is 2 times that of the corresponding model layer;
linking each task category of the downstream task to a knowledge graph, and taking the words related to each task category in the knowledge graph as label words;
converting the pre-training language model into a downstream masked-token prediction task according to the prefix prompt words and the label words, and performing fine-tuning training on the pre-training language model, comprising: inputting a training text into the pre-training language model; at each layer, splitting the embedded vector of the prefix prompt word into two parts with the same dimension as the corresponding model layer and concatenating them with the key and the value of the training text respectively, to participate in the self-attention calculation; and, taking the weighted result of all label words corresponding to each task category as the label, simultaneously optimizing the embedded vectors of the prefix prompt words, the parameters of the pre-training language model, and the weights of the label words;
in application, inputting the predicted text and the embedded vectors of the prefix prompt words into the fine-tuned pre-training language model and, after calculation, taking the weighted result of the predicted values of all label words and their corresponding weights as the prediction result.
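The knowledge-guided step above — linking task categories to a knowledge graph and collecting related words as label words — can be sketched as follows. The toy graph `TOY_KG` and the helper name `collect_label_words` are assumptions for illustration only; a real system would query a domain knowledge graph (financial, medical, etc.).

```python
# A toy knowledge graph as an adjacency mapping from a category to related words.
TOY_KG = {
    "positive": ["rise", "profit", "growth"],
    "negative": ["fall", "loss", "risk"],
    "neutral":  ["steady", "unchanged"],
}

def collect_label_words(task_categories, knowledge_graph):
    """Link each task category to the graph and take its related words as that
    category's label words, falling back to the category name itself if the
    category has no graph entry."""
    return {c: knowledge_graph.get(c, [c]) for c in task_categories}
```

The resulting word lists are what the fine-tuning step later scores and weights at the masked position.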
2. The natural language processing method based on knowledge-guided prefix fine-tuning according to claim 1, wherein mapping the initial prefix prompt word via a function into embedded vectors equal in number to the layers of the pre-training language model comprises:
initially encoding the initial prefix prompt word into an initial embedded vector, and then applying a single function mapping to the initial embedded vector to obtain embedded vectors equal in number to the layers of the pre-training language model.
3. The natural language processing method based on knowledge-guided prefix fine-tuning according to claim 1, wherein mapping the initial prefix prompt word via a function into embedded vectors equal in number to the layers of the pre-training language model comprises:
initially encoding the initial prefix prompt word into an initial embedded vector, and mapping the initial embedded vector to each layer of the pre-training language model by means of multi-layer perceptrons (MLPs), to obtain the embedded vector corresponding to each layer.
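A minimal numpy sketch of the per-layer MLP mapping in claim 3, under assumed sizes (number of layers, prefix length, embedding and hidden dimensions); the two-layer tanh MLP and its initialization are illustrative choices, not specified by the claim. Note each layer's output has dimension `2 * EMB_DIM`, matching the claim's requirement that the embedded vector later splits into a key part and a value part.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_LAYERS, PREFIX_LEN, EMB_DIM, HIDDEN = 3, 4, 16, 32  # assumed sizes

def make_mlp(in_dim, hidden, out_dim):
    # one small two-layer MLP per model layer (illustrative parameterization)
    return {"W1": rng.normal(0, 0.02, (in_dim, hidden)),
            "W2": rng.normal(0, 0.02, (hidden, out_dim))}

# one MLP per layer of the pre-training language model; the output dimension
# is 2 * EMB_DIM so each layer's vector can split into key and value halves
mlps = [make_mlp(EMB_DIM, HIDDEN, 2 * EMB_DIM) for _ in range(NUM_LAYERS)]
initial_embedding = rng.normal(0, 0.02, (PREFIX_LEN, EMB_DIM))

# map the single initial embedding to one embedded vector per model layer
layer_vectors = [np.tanh(initial_embedding @ m["W1"]) @ m["W2"] for m in mlps]
```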
4. The natural language processing method based on knowledge-guided prefix fine-tuning according to claim 1, wherein, during fine-tuning training of the pre-training language model, the self-attention calculation is performed as:

$$\mathrm{head}^{l} = \mathrm{softmax}\!\left(\frac{Q^{l}\,[P_{K}^{l};\,K^{l}]^{\top}}{\sqrt{d}}\right)[P_{V}^{l};\,V^{l}]$$

wherein $l$ denotes the layer index, $Q^{l}$ the query, $K^{l}$ the key, and $V^{l}$ the value of that layer; $P_{K}^{l}$ denotes the part of the prefix prompt word's embedded vector split out to correspond to the key, and $P_{V}^{l}$ the part corresponding to the value; $\mathrm{softmax}(\cdot)$ denotes the softmax function, $d$ the dimension of the model layer, and $[\,;\,]$ the concatenation operation.
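The concatenated attention of claim 4 can be sketched directly from the formula; the function name `prefix_attention` and the array shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def prefix_attention(Q, K, V, P_k, P_v):
    """Scaled dot-product attention with the prefix's key/value halves
    prepended to the training text's keys and values.

    Q: [seq, d] queries from the text; K, V: [seq, d] keys/values from the text;
    P_k, P_v: [prefix_len, d] the two halves of this layer's prefix embedding.
    """
    K_cat = np.concatenate([P_k, K], axis=0)  # [P_K^l ; K^l]
    V_cat = np.concatenate([P_v, V], axis=0)  # [P_V^l ; V^l]
    d = Q.shape[-1]
    return softmax(Q @ K_cat.T / np.sqrt(d)) @ V_cat  # [seq, d]
```

Because only the concatenated keys and values change, the text tokens attend to the prefix positions without the prefix ever appearing in the input sequence itself.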
5. The natural language processing method based on knowledge-guided prefix fine-tuning according to claim 1, wherein the pre-training language model comprises the BERT, RoBERTa, and GPT series of models.
6. The natural language processing method based on knowledge-guided prefix fine-tuning according to claim 1, wherein the downstream task is an emotion analysis task and the corresponding initial prefix prompt word is "emotion analysis"; each task category of the emotion analysis task is linked to a financial-domain knowledge graph to find the words related to each task category as label words; the pre-training language model is then converted into a masked-token emotion analysis task according to the emotion analysis prefix and the label words, and fine-tuning training is performed on the pre-training language model; finally, in application, the predicted text and the embedded vectors of the emotion analysis prefix are input into the fine-tuned pre-training language model and, after calculation, the weighted result of the predicted values of all label words and their corresponding weights is taken as the emotion analysis prediction result.
7. The natural language processing method based on knowledge-guided prefix fine-tuning according to claim 1, wherein the downstream task is a relation extraction task and the corresponding initial prefix prompt word is "relation extraction"; each task category of the relation extraction task is linked to a medical-domain knowledge graph to find the words related to each task category as label words; the pre-training language model is then converted into a masked-token relation extraction task according to the relation extraction prefix and the label words, and fine-tuning training is performed on the pre-training language model; finally, in application, the predicted text and the embedded vectors of the relation extraction prefix are input into the fine-tuned pre-training language model and, after calculation, the weighted result of the predicted values of all label words and their corresponding weights is taken as the relation extraction result.
8. A natural language processing apparatus based on knowledge-guided prefix fine-tuning, comprising:
a prefix prompt word processing module, used for constructing an initial prefix prompt word according to a downstream task and mapping the initial prefix prompt word, via a function, into embedded vectors equal in number to the layers of the pre-training language model, wherein the dimension of each embedded vector is 2 times that of the corresponding model layer;
a label word processing module, used for linking each task category of the downstream task to a knowledge graph and taking the words related to each task category in the knowledge graph as label words;
a fine tuning module, used for converting the pre-training language model into a downstream masked-token prediction task according to the prefix prompt words and the label words, and performing fine-tuning training on the pre-training language model, including: inputting a training text into the pre-training language model; at each layer, splitting the embedded vector of the prefix prompt word into two parts with the same dimension as the corresponding model layer and concatenating them with the key and the value of the training text respectively, to participate in the self-attention calculation; and, taking the weighted result of all label words corresponding to each task category as the label, simultaneously optimizing the embedded vectors of the prefix prompt words, the parameters of the pre-training language model, and the weights of the label words;
and an application module, used for inputting the predicted text and the embedded vectors of the prefix prompt words into the fine-tuned pre-training language model and, after calculation, taking the weighted result of the predicted values of all label words and their corresponding weights as the prediction result.
9. A computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the natural language processing method based on knowledge-guided prefix fine-tuning according to any one of claims 1-7.
10. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the natural language processing method based on knowledge-guided prefix fine-tuning according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111300021.1A CN113987209B (en) | 2021-11-04 | 2021-11-04 | Natural language processing method, device, computing equipment and storage medium based on knowledge-guided prefix fine adjustment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111300021.1A CN113987209B (en) | 2021-11-04 | 2021-11-04 | Natural language processing method, device, computing equipment and storage medium based on knowledge-guided prefix fine adjustment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113987209A true CN113987209A (en) | 2022-01-28 |
CN113987209B CN113987209B (en) | 2024-05-24 |
Family
ID=79746414
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111300021.1A Active CN113987209B (en) | 2021-11-04 | 2021-11-04 | Natural language processing method, device, computing equipment and storage medium based on knowledge-guided prefix fine adjustment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113987209B (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114612290A (en) * | 2022-03-11 | 2022-06-10 | 北京百度网讯科技有限公司 | Training method of image editing model and image editing method |
CN114792097A (en) * | 2022-05-14 | 2022-07-26 | 北京百度网讯科技有限公司 | Method and device for determining prompt vector of pre-training model and electronic equipment |
CN114862493A (en) * | 2022-04-07 | 2022-08-05 | 北京中科深智科技有限公司 | Generation model for generating personalized commodity description based on light-weight fine adjustment |
CN114943211A (en) * | 2022-07-25 | 2022-08-26 | 北京澜舟科技有限公司 | Text generation method and system based on prefix and computer readable storage medium |
CN115563283A (en) * | 2022-10-20 | 2023-01-03 | 北京大学 | Text classification method based on prompt learning |
CN115640520A (en) * | 2022-11-07 | 2023-01-24 | 北京百度网讯科技有限公司 | Method, device and storage medium for pre-training cross-language cross-modal model |
CN115906815A (en) * | 2023-03-08 | 2023-04-04 | 北京语言大学 | Error correction method and device for modifying one or more types of wrong sentences |
CN116186200A (en) * | 2023-01-19 | 2023-05-30 | 北京百度网讯科技有限公司 | Model training method, device, electronic equipment and storage medium |
CN116306917A (en) * | 2023-05-17 | 2023-06-23 | 卡奥斯工业智能研究院(青岛)有限公司 | Task processing method, device, equipment and computer storage medium |
CN116737938A (en) * | 2023-07-19 | 2023-09-12 | 人民网股份有限公司 | Fine granularity emotion detection method and device based on fine tuning large model online data network |
CN116861928A (en) * | 2023-07-07 | 2023-10-10 | 北京中关村科金技术有限公司 | Method, device, equipment and medium for generating instruction fine tuning data |
CN116956835A (en) * | 2023-09-15 | 2023-10-27 | 京华信息科技股份有限公司 | Document generation method based on pre-training language model |
CN117194637A (en) * | 2023-09-18 | 2023-12-08 | 深圳市大数据研究院 | Multi-level visual evaluation report generation method and device based on large language model |
CN117216227A (en) * | 2023-10-30 | 2023-12-12 | 广东烟草潮州市有限责任公司 | Tobacco enterprise intelligent information question-answering method based on knowledge graph and large language model |
CN117332419A (en) * | 2023-11-29 | 2024-01-02 | 武汉大学 | Malicious code classification method and device based on pre-training |
CN117474084A (en) * | 2023-12-25 | 2024-01-30 | 淘宝(中国)软件有限公司 | Bidirectional iteration method, equipment and medium for pre-training model and downstream sequence task |
WO2024031891A1 (en) * | 2022-08-10 | 2024-02-15 | 浙江大学 | Fine tuning method and apparatus for knowledge representation-disentangled classification model, and application |
CN117875273A (en) * | 2024-03-13 | 2024-04-12 | 中南大学 | News abstract automatic generation method, device and medium based on large language model |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111444721A (en) * | 2020-05-27 | 2020-07-24 | 南京大学 | Chinese text key information extraction method based on pre-training language model |
CN112100383A (en) * | 2020-11-02 | 2020-12-18 | 之江实验室 | Meta-knowledge fine tuning method and platform for multitask language model |
US20210035556A1 (en) * | 2019-08-02 | 2021-02-04 | Babylon Partners Limited | Fine-tuning language models for supervised learning tasks via dataset preprocessing |
CN112699218A (en) * | 2020-12-30 | 2021-04-23 | 成都数之联科技有限公司 | Model establishing method and system, paragraph label obtaining method and medium |
CN113033182A (en) * | 2021-03-25 | 2021-06-25 | 网易(杭州)网络有限公司 | Text creation auxiliary method and device and server |
CN113468877A (en) * | 2021-07-09 | 2021-10-01 | 浙江大学 | Language model fine-tuning method and device, computing equipment and storage medium |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210035556A1 (en) * | 2019-08-02 | 2021-02-04 | Babylon Partners Limited | Fine-tuning language models for supervised learning tasks via dataset preprocessing |
CN111444721A (en) * | 2020-05-27 | 2020-07-24 | 南京大学 | Chinese text key information extraction method based on pre-training language model |
CN112100383A (en) * | 2020-11-02 | 2020-12-18 | 之江实验室 | Meta-knowledge fine tuning method and platform for multitask language model |
CN112699218A (en) * | 2020-12-30 | 2021-04-23 | 成都数之联科技有限公司 | Model establishing method and system, paragraph label obtaining method and medium |
CN113033182A (en) * | 2021-03-25 | 2021-06-25 | 网易(杭州)网络有限公司 | Text creation auxiliary method and device and server |
CN113468877A (en) * | 2021-07-09 | 2021-10-01 | 浙江大学 | Language model fine-tuning method and device, computing equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Han Chengcheng; Li Lei; Liu Tingting; Gao Ming: "Semantic text similarity computation methods", Journal of East China Normal University (Natural Science Edition), no. 05, 25 September 2020 (2020-09-25) *
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114612290B (en) * | 2022-03-11 | 2023-07-21 | 北京百度网讯科技有限公司 | Training method of image editing model and image editing method |
CN114612290A (en) * | 2022-03-11 | 2022-06-10 | 北京百度网讯科技有限公司 | Training method of image editing model and image editing method |
CN114862493A (en) * | 2022-04-07 | 2022-08-05 | 北京中科深智科技有限公司 | Generation model for generating personalized commodity description based on light-weight fine adjustment |
CN114792097A (en) * | 2022-05-14 | 2022-07-26 | 北京百度网讯科技有限公司 | Method and device for determining prompt vector of pre-training model and electronic equipment |
CN114943211A (en) * | 2022-07-25 | 2022-08-26 | 北京澜舟科技有限公司 | Text generation method and system based on prefix and computer readable storage medium |
WO2024031891A1 (en) * | 2022-08-10 | 2024-02-15 | 浙江大学 | Fine tuning method and apparatus for knowledge representation-disentangled classification model, and application |
CN115563283A (en) * | 2022-10-20 | 2023-01-03 | 北京大学 | Text classification method based on prompt learning |
CN115563283B (en) * | 2022-10-20 | 2023-04-25 | 北京大学 | Text classification method based on prompt learning |
CN115640520A (en) * | 2022-11-07 | 2023-01-24 | 北京百度网讯科技有限公司 | Method, device and storage medium for pre-training cross-language cross-modal model |
CN116186200B (en) * | 2023-01-19 | 2024-02-09 | 北京百度网讯科技有限公司 | Model training method, device, electronic equipment and storage medium |
CN116186200A (en) * | 2023-01-19 | 2023-05-30 | 北京百度网讯科技有限公司 | Model training method, device, electronic equipment and storage medium |
CN115906815A (en) * | 2023-03-08 | 2023-04-04 | 北京语言大学 | Error correction method and device for modifying one or more types of wrong sentences |
CN116306917B (en) * | 2023-05-17 | 2023-09-08 | 卡奥斯工业智能研究院(青岛)有限公司 | Task processing method, device, equipment and computer storage medium |
CN116306917A (en) * | 2023-05-17 | 2023-06-23 | 卡奥斯工业智能研究院(青岛)有限公司 | Task processing method, device, equipment and computer storage medium |
CN116861928A (en) * | 2023-07-07 | 2023-10-10 | 北京中关村科金技术有限公司 | Method, device, equipment and medium for generating instruction fine tuning data |
CN116861928B (en) * | 2023-07-07 | 2023-11-17 | 北京中关村科金技术有限公司 | Method, device, equipment and medium for generating instruction fine tuning data |
CN116737938A (en) * | 2023-07-19 | 2023-09-12 | 人民网股份有限公司 | Fine granularity emotion detection method and device based on fine tuning large model online data network |
CN116956835A (en) * | 2023-09-15 | 2023-10-27 | 京华信息科技股份有限公司 | Document generation method based on pre-training language model |
CN116956835B (en) * | 2023-09-15 | 2024-01-02 | 京华信息科技股份有限公司 | Document generation method based on pre-training language model |
CN117194637A (en) * | 2023-09-18 | 2023-12-08 | 深圳市大数据研究院 | Multi-level visual evaluation report generation method and device based on large language model |
CN117194637B (en) * | 2023-09-18 | 2024-04-30 | 深圳市大数据研究院 | Multi-level visual evaluation report generation method and device based on large language model |
CN117216227A (en) * | 2023-10-30 | 2023-12-12 | 广东烟草潮州市有限责任公司 | Tobacco enterprise intelligent information question-answering method based on knowledge graph and large language model |
CN117216227B (en) * | 2023-10-30 | 2024-04-16 | 广东烟草潮州市有限责任公司 | Tobacco enterprise intelligent information question-answering method based on knowledge graph and large language model |
CN117332419A (en) * | 2023-11-29 | 2024-01-02 | 武汉大学 | Malicious code classification method and device based on pre-training |
CN117332419B (en) * | 2023-11-29 | 2024-02-20 | 武汉大学 | Malicious code classification method and device based on pre-training |
CN117474084A (en) * | 2023-12-25 | 2024-01-30 | 淘宝(中国)软件有限公司 | Bidirectional iteration method, equipment and medium for pre-training model and downstream sequence task |
CN117474084B (en) * | 2023-12-25 | 2024-05-03 | 淘宝(中国)软件有限公司 | Bidirectional iteration method, equipment and medium for pre-training model and downstream sequence task |
CN117875273A (en) * | 2024-03-13 | 2024-04-12 | 中南大学 | News abstract automatic generation method, device and medium based on large language model |
CN117875273B (en) * | 2024-03-13 | 2024-05-28 | 中南大学 | News abstract automatic generation method, device and medium based on large language model |
Also Published As
Publication number | Publication date |
---|---|
CN113987209B (en) | 2024-05-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113987209B (en) | Natural language processing method, device, computing equipment and storage medium based on knowledge-guided prefix fine adjustment | |
CN111738003B (en) | Named entity recognition model training method, named entity recognition method and medium | |
CN109325229B (en) | Method for calculating text similarity by utilizing semantic information | |
CN112905795A (en) | Text intention classification method, device and readable medium | |
CN113468877A (en) | Language model fine-tuning method and device, computing equipment and storage medium | |
CN111062217A (en) | Language information processing method and device, storage medium and electronic equipment | |
CN114676255A (en) | Text processing method, device, equipment, storage medium and computer program product | |
CN112052318A (en) | Semantic recognition method and device, computer equipment and storage medium | |
CN117149984B (en) | Customization training method and device based on large model thinking chain | |
CN116992007B (en) | Limiting question-answering system based on question intention understanding | |
CN110717021A (en) | Input text and related device for obtaining artificial intelligence interview | |
CN115858750A (en) | Power grid technical standard intelligent question-answering method and system based on natural language processing | |
CN112488111B (en) | Indication expression understanding method based on multi-level expression guide attention network | |
CN114239599A (en) | Method, system, equipment and medium for realizing machine reading understanding | |
Seo et al. | Plain template insertion: korean-prompt-based engineering for few-shot learners | |
CN112905750A (en) | Generation method and device of optimization model | |
Yang et al. | Task independent fine tuning for word embeddings | |
CN111813907A (en) | Question and sentence intention identification method in natural language question-answering technology | |
CN116757195A (en) | Implicit emotion recognition method based on prompt learning | |
CN113408267B (en) | Word alignment performance improving method based on pre-training model | |
CN111401069A (en) | Intention recognition method and intention recognition device for conversation text and terminal | |
Alwaneen et al. | Stacked dynamic memory-coattention network for answering why-questions in Arabic | |
Chakkarwar et al. | A Review on BERT and Its Implementation in Various NLP Tasks | |
CN114239555A (en) | Training method of keyword extraction model and related device | |
Khandait et al. | Automatic question generation through word vector synchronization using lamma |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |