CN114896395A - Language model fine-tuning method, text classification method, device and equipment - Google Patents

Language model fine-tuning method, text classification method, device and equipment

Info

Publication number
CN114896395A
CN114896395A (application CN202210443857.5A)
Authority
CN
China
Prior art keywords
word vector
word
language model
training
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210443857.5A
Other languages
Chinese (zh)
Inventor
谭传奇
黄非
黄松芳
张宁豫
李泺秋
陈想
邓淑敏
毕祯
陈华钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202210443857.5A priority Critical patent/CN114896395A/en
Publication of CN114896395A publication Critical patent/CN114896395A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/186Templates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present application provide a language model fine-tuning method, a text classification method, a device, and equipment. The fine-tuning method comprises the following steps: obtaining a first input word vector, the first input word vector comprising: training sample word vectors corresponding to the training samples, initial template word vectors corresponding to the template words, and a first mask; inputting the first input word vector into a pre-trained language model to obtain a first predicted word vector corresponding to the first mask; obtaining a first loss value based on the first predicted word vector, the initial label word vectors corresponding to the preset label words, and the real label word vector of the training sample; and training the pre-trained language model, the initial template word vectors, and the initial label word vectors based on the first loss value to obtain a trained language model, template word vectors, and label word vectors. The embodiments of the present application reduce manual effort while improving the prediction performance of the resulting language model.

Description

Language model fine-tuning method, text classification method, device and equipment
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a language model fine-tuning method, a text classification method, a device, and equipment.
Background
With the continuous development of natural language processing technology, pre-training followed by fine-tuning has become the standard training paradigm for natural language processing models (language models), achieving significant breakthroughs on a large number of benchmark datasets and domain tasks such as language understanding and question answering. However, because the optimization objectives of the pre-training and fine-tuning stages are inconsistent, each different downstream classification task needs to adjust the parameters of the pre-trained language model.
To avoid the above problems, prompt learning for language models was developed. Specifically: after the pre-trained language model is obtained, fine-tuning is performed with given template words (prompt words) and a small number of samples, so that different downstream classification tasks can share a single pre-trained language model with fixed, unchanged parameters. By constructing template words and label words suited to the downstream classification task, this technique converts the downstream classification task into a cloze (fill-in-the-blank) task consistent with the optimization objective of the pre-training stage, thereby exploiting the knowledge learned by the language model during pre-training as much as possible. For example, for a sentiment classification task with the input "this drama reveals nothing", a template "this sentence is _" can be appended to the input and the result fed into the pre-trained language model. Since this has the same form as the model's pre-training input, what the model needs to complete is exactly the cloze task from the pre-training stage: it outputs a filler word, and the input is then classified according to the similarity between the output filler word (usually a sentiment word, such as "good") and the label words (such as "positive" and "negative").
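The cloze conversion described above can be sketched as follows; the template string, label-word sets, and function names are illustrative assumptions, not taken from this application:

```python
# Sketch of prompt-based cloze conversion for sentiment classification.
# The template and label words below are illustrative assumptions.

def build_prompt(text: str, template: str = "This sentence is [MASK].") -> str:
    """Append a cloze template so the task matches the pre-training objective."""
    return f"{text} {template}"

def classify_by_filler(filler_word: str, label_words: dict) -> str:
    """Map the word the model fills in to a label word.
    Here 'similarity' is a toy set lookup; a real system compares embeddings."""
    for label, words in label_words.items():
        if filler_word in words:
            return label
    return "unknown"

prompt = build_prompt("This drama reveals nothing.")
label = classify_by_filler("bad", {"positive": {"good", "great"},
                                   "negative": {"bad", "boring"}})
```

In practice the filler word is scored against the label words in embedding space rather than by exact lookup, as the later embodiments describe.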
Selecting appropriate template words and label words is key to prompt-learning performance. Current prompt-learning schemes usually construct discrete template words and label words manually, but for a large number of different downstream classification tasks it is very difficult to select appropriate template words and label words this way, which in turn may degrade the prediction performance of the language model.
Disclosure of Invention
In view of the above, embodiments of the present application provide a language model fine-tuning method, a text classification method, a device, and equipment to at least partially solve the above problems.
According to a first aspect of embodiments of the present application, there is provided a method for fine tuning a language model, including:
obtaining a first input word vector, the first input word vector comprising: training sample word vectors corresponding to the training samples, initial template word vectors corresponding to the template words and a first mask;
inputting the first input word vector into a pre-training language model to obtain a first predicted word vector corresponding to the first mask;
obtaining a first loss value based on the first predicted word vector, the initial label word vectors corresponding to the preset label words and the real label word vectors of the training samples; and training the pre-training language model, the initial template word vector and the initial label word vector based on the first loss value to obtain a trained language model, template word vector and label word vector.
According to a second aspect of embodiments of the present application, there is provided a text classification method, including:
acquiring a target text to be classified;
obtaining a target word vector, wherein the target word vector comprises: target text word vectors corresponding to the target texts, template word vectors corresponding to the template words and masks;
inputting the target word vector into a pre-trained language model to obtain a predicted word vector corresponding to the mask;
determining category labels of the target text based on the similarity between the predicted word vectors and the label word vectors corresponding to the preset label words;
wherein the language model, the template word vector and the label word vector are obtained by the language model fine tuning method of the first aspect.
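A minimal sketch of the similarity-based label assignment in the classification method above, assuming cosine similarity and toy two-dimensional vectors (the real label word vectors are learned during fine-tuning):

```python
import numpy as np

def classify(predicted_vec, label_vecs):
    """Assign the category label whose label word vector is most similar
    (here: cosine similarity) to the predicted word vector at the mask."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = {label: cos(predicted_vec, vec) for label, vec in label_vecs.items()}
    return max(sims, key=sims.get)

# Toy example: the predicted vector lies closer to the "positive" label vector.
label_vecs = {"positive": np.array([1.0, 0.0]), "negative": np.array([0.0, 1.0])}
predicted = np.array([0.9, 0.1])
```

Other similarity measures (e.g. dot product) fit the same scheme; cosine is only one choice.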
According to a third aspect of the embodiments of the present application, there is provided a language model fine-tuning apparatus, including:
a first obtaining module, configured to obtain a first input word vector, where the first input word vector includes: training sample word vectors corresponding to the training samples, initial template word vectors corresponding to the template words and a first mask;
the first prediction module is used for inputting the first input word vector into a pre-training language model to obtain a first prediction word vector corresponding to the first mask;
the training module is used for obtaining a first loss value based on the first predicted word vector, the initial label word vectors corresponding to the preset label words and the real label word vectors of the training samples; and training the pre-training language model, the initial template word vector and the initial label word vector based on the first loss value to obtain a trained language model, template word vector and label word vector.
According to a fourth aspect of embodiments of the present application, there is provided a text classification apparatus including:
the target text acquisition module is used for acquiring a target text to be classified;
a target word vector obtaining module, configured to obtain a target word vector, where the target word vector includes: target text word vectors corresponding to the target texts, template word vectors corresponding to the template words and masks;
a predicted word vector obtaining module, configured to input the target word vector into a pre-trained language model, so as to obtain a predicted word vector corresponding to the mask;
the category label determining module is used for determining category labels of the target text based on the similarity between the predicted word vectors and the label word vectors corresponding to the preset label words;
wherein the language model, the template word vector and the label word vector are obtained by the language model fine tuning method of the first aspect.
According to a fifth aspect of embodiments of the present application, there is provided an electronic device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus; the memory is used to store at least one executable instruction that causes the processor to perform the operations corresponding to the language model fine-tuning method of the first aspect or the text classification method of the second aspect.
According to a sixth aspect of embodiments of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements a language model fine-tuning method as described in the first aspect, or a text classification method as described in the second aspect.
According to the language model fine-tuning method, text classification method, device, and equipment provided by the embodiments of the present application, the template word vectors corresponding to the template words and the label word vectors corresponding to the label words are treated as differentiable, trainable parameters: while the internal parameters of the language model are fine-tuned, the template word vectors and label word vectors are jointly optimized, so that optimized representations of the template words and label words suited to the downstream classification task are searched automatically in a continuous space. The embodiments of the present application thus reduce manual effort while improving the prediction performance of the resulting language model; that is, after few-shot fine-tuning of the pre-trained language model, the accuracy of its output results is higher.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some of the embodiments described in the present application, and those skilled in the art can obtain other drawings from these drawings.
FIG. 1 is a flowchart illustrating steps of a method for fine tuning a language model according to an embodiment of the present application;
FIG. 2 is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 1;
FIG. 3 is a flowchart illustrating steps of a method for fine tuning a language model according to a second embodiment of the present application;
FIG. 4 is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 3;
FIG. 5 is a flowchart illustrating steps of a method for classifying texts according to a third embodiment of the present application;
FIG. 6 is a block diagram of a language model fine tuning apparatus according to a fourth embodiment of the present application;
fig. 7 is a block diagram of a text classification apparatus according to a fifth embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the technical solutions in the embodiments of the present application, the technical solutions are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the scope of protection of the embodiments of the present application.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
Example one
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a method for fine tuning a language model according to an embodiment of the present application. Specifically, the language model fine-tuning method provided by the embodiment includes the following steps:
step 102, obtaining a first input word vector, wherein the first input word vector comprises: training sample word vectors corresponding to the training samples, initial template word vectors corresponding to the template words, and a first mask.
Specifically, a text for model fine-tuning may be obtained as a training sample and segmented into tokens; the token vector corresponding to each token of the training sample is then determined from the token vectors in the vocabulary of the pre-trained language model, and these token vectors are combined into the training sample word vector corresponding to the training sample.
In the embodiments of the present application, a template word may be any preset token; in particular, it may be any preset token without semantics (i.e., a non-semantic token). The number of template words may be preset according to actual needs or experience and is not limited here.
Each template word corresponds to an initial template word vector. The initial template word vector may be initialized arbitrarily and is continuously optimized during the subsequent model fine-tuning process.
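Assembling the first input word vector of steps 102/302 can be sketched as below; the toy vocabulary, dimensionality, and random initialization stand in for the pre-trained model's embedding table and are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
# Toy embedding table standing in for the pre-trained model's token vectors.
vocab = {w: rng.normal(size=dim)
         for w in ["this", "drama", "reveals", "nothing", "[MASK]"]}

# Initial template word vectors h([T1])..h([T3]): arbitrary values,
# trainable during fine-tuning.
template_vecs = [rng.normal(size=dim) for _ in range(3)]

def first_input_word_vector(tokens):
    """Combine sample token vectors, template vectors, and the mask vector."""
    sample = [vocab[t] for t in tokens]
    return np.stack(sample + template_vecs + [vocab["[MASK]"]])

x = first_input_word_vector(["this", "drama", "reveals", "nothing"])
```

The resulting matrix has one row per position: 4 sample tokens, 3 template slots, and 1 mask.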
And 104, inputting the first input word vector into the pre-training language model to obtain a first predicted word vector corresponding to the first mask.
The embodiments of the present application do not limit the specific structure of the pre-trained language model; for example, it can be a Transformer-based language model such as GPT-2 or T5.
Step 106, obtaining a first loss value based on the first predicted word vector, the initial label word vectors corresponding to the preset label words and the real label word vectors of the training samples; and training the pre-training language model, the initial template word vector and the initial label word vector based on the first loss value to obtain a trained language model, template word vector and label word vector.
In the embodiments of the present application, similar to the template words, a preset label word may also be any preset token, and in particular any preset token without semantics (i.e., a non-semantic token). The number of preset label words may be set according to the total number of categories of the downstream classification task; for example, if the downstream task is a two-class sentiment analysis task, the number of preset label words is 2.
Each preset label word corresponds to an initial label word vector. The initial label word vector may be initialized arbitrarily and is continuously optimized based on the first loss value.
In the embodiments of the present application, when the first loss value is obtained based on the first predicted word vector, the initial label word vectors corresponding to the preset label words, and the real label word vector of the training sample, any classification loss function may be used, for example a negative log-likelihood loss, a cross-entropy loss, an exponential loss, or a squared loss; the specific loss function used to calculate the first loss value is not limited here.
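As one hedged instantiation of the above, the first loss value can be computed as a cross-entropy over similarities between the first predicted word vector and each label word vector; the dot-product scoring and toy vectors below are illustrative assumptions:

```python
import numpy as np

def first_loss(pred_vec, label_vecs, true_idx):
    """Cross-entropy over similarities between the predicted word vector and
    each preset label word vector (one illustrative instantiation)."""
    logits = np.array([np.dot(pred_vec, v) for v in label_vecs])
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[true_idx])

pred = np.array([1.0, 0.0])                      # predicted vector at the mask
labels = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # h([Y1]), h([Y2])
loss_correct = first_loss(pred, labels, 0)       # true label is Y1
loss_wrong = first_loss(pred, labels, 1)         # true label is Y2
```

The loss is lower when the predicted vector is close to the true label word vector, which is the gradient signal that trains the model, template vectors, and label vectors.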
After the first loss value is obtained, the internal parameters of the pre-trained language model, the initial template word vectors, and the initial label word vectors can be adjusted based on the first loss value, yielding the trained language model, template word vectors, and label word vectors, which the actual downstream text classification task can then use.
Referring to fig. 2, fig. 2 is a schematic diagram of a corresponding scenario of this embodiment. The embodiment is described below with a specific scenario example with reference to fig. 2:
referring to fig. 2, a training sample "this drama is not disclosed, and template words T1, T2, T3 and a first mask located after the template words are added to the training sample (in the embodiment of the present application, the number of the template words and the positional relationship between the first mask and each template word are not limited, the first mask may be located at a certain position between the template words, or located after the template words, or located before the template words, and fig. 2 only shows 3 template words, and the first mask is located after the template words, which does not constitute a limitation of the embodiment of the present application); obtaining a first input word vector, the first input word vector comprising: training sample word vectors corresponding to the training samples, initial template word vectors corresponding to the template words, and a first mask, wherein the training sample word vectors may be a combination of the lemma vectors of each lemma included in the training samples, specifically: "e (this) e (field) e (drama) e (what) e (also) e (no) e (disclosure)", where "e (this)" represents the lemma vector corresponding to the lemma, "e (field)" represents the lemma vector corresponding to the lemma, "e (drama)" represents the lemma vector corresponding to the lemma, "e (what)" represents the lemma vector corresponding to the lemma, "e (also)" represents the lemma vector corresponding to the lemma, "e (no)" represents the lemma vector corresponding to the lemma, "e (disclosure)" represents the lemma vector corresponding to the lemma "no," and "e (disclosure)" represents the lemma vector corresponding to the lemma "disclosure"; the initial template word vectors are respectively: a template word vector h ([ T1]) corresponding to the template word T1, a template word vector h ([ T2]) corresponding to the template word T2, and a template word vector h ([ T3]) corresponding to the template word T3. 
In addition, "e ([ first mask ])" in fig. 2 represents a vector representation corresponding to "first mask"; e ([ step ]) represents a vector representation corresponding to the segmentation symbol of the text sentence.
After the first input word vector is obtained, it is input into the pre-trained language model, which produces the first predicted word vector corresponding to the first mask. A first loss value is then obtained based on the first predicted word vector, the initial label word vectors corresponding to the preset label words (the initial label word vector h([Y1]) corresponding to the preset label word Y1 "positive" and h([Y2]) corresponding to Y2 "negative"; the preset label words and their number can be set according to the specific downstream classification task, and fig. 2 takes 2 preset label words as an example only, which does not limit the embodiments of the present application), and the real label word vector of the training sample, h([Y2]). The pre-trained language model, the initial template word vectors h([T1]), h([T2]), h([T3]), and the initial label word vectors h([Y1]), h([Y2]) are then trained based on the first loss value, yielding the trained language model, template word vectors h([T1]), h([T2]), h([T3]), and label word vectors h([Y1]), h([Y2]).
According to the language model fine-tuning method provided by the embodiments of the present application, the template word vectors corresponding to the template words and the label word vectors corresponding to the label words are treated as differentiable, trainable parameters: while the internal parameters of the language model are fine-tuned, the template word vectors and label word vectors are jointly optimized, so that optimized representations of the template words and label words suited to the downstream classification task are searched automatically in a continuous space. The embodiments of the present application thus reduce manual effort while improving the prediction performance of the resulting language model; that is, after few-shot fine-tuning of the pre-trained language model, the accuracy of its output results is higher.
The language model fine tuning method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, PCs, etc.
Example two
Referring to fig. 3, fig. 3 is a flowchart illustrating steps of a method for fine tuning a language model according to a second embodiment of the present application. Specifically, the language model fine-tuning method provided by the embodiment includes the following steps:
step 302, obtaining a first input word vector, where the first input word vector includes: training sample word vectors corresponding to the training samples, initial template word vectors corresponding to the template words and a first mask.
For a specific implementation of this step, reference may be made to relevant contents of step 102 in the first embodiment, and details are not described here.
Step 304, inputting the first input word vector into the pre-training language model to obtain a first predicted word vector corresponding to the first mask.
Step 306, obtaining a first loss value based on the first predicted word vector, the initial label word vectors corresponding to the preset label words, and the real label word vectors of the training samples.
For the specific implementation of step 302 to step 306, reference may be made to the relevant contents in step 102 to step 106 in the first embodiment, which are not described herein again.
Optionally, in some embodiments, a cross-entropy loss function may be used to obtain the first loss value, specifically:
and obtaining a first loss value through a cross entropy loss function based on the first predicted word vector, the initial label word vector corresponding to each preset label word and the real label word vector of the training sample.
For comparison, take the mean squared error (MSE) loss: its partial derivative with respect to the parameters contains the derivative of the sigmoid, which approaches 0 when its argument is very large or very small, so the partial derivative is likely to approach 0. When the partial derivative is small the parameters update slowly, and when it approaches 0 the parameters are barely updated at all; the MSE loss therefore updates parameters slowly. The partial derivative of the cross-entropy loss with respect to the parameters contains no sigmoid derivative, so the parameters update quickly and the efficiency of model training is improved.
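A quick numerical check of the gradient comparison above; the scalar setup (a single sigmoid output unit with target 0) is a simplified illustration of the general claim:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# For a saturated unit (|z| large) the sigmoid derivative, and hence the MSE
# gradient that contains it as a factor, nearly vanishes; the cross-entropy
# gradient (prediction minus target) does not contain this factor.
z = 10.0
target = 0.0
mse_grad = (sigmoid(z) - target) * sigmoid_deriv(z)  # contains sigma'(z)
ce_grad = sigmoid(z) - target                         # no sigma'(z) factor
```

Even with a large prediction error, the MSE gradient is tiny while the cross-entropy gradient stays close to 1, matching the efficiency argument in the text.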
Optionally, in some embodiments, the template words and the preset label words are non-semantic tokens that exist, but are unused, in the vocabulary of the pre-trained language model.
The vocabulary of a pre-trained language model usually includes unused non-semantic tokens. Although these tokens are not used during model prediction, their vectors are still parameters of the model and need to be trained and fine-tuned; that is, in the embodiments of the present application the parameters to be trained include: the parameters of the pre-trained language model (including the word vectors of the non-semantic tokens), the template word vectors, and the label word vectors. Therefore, when the template words and label words are preset, the non-semantic tokens in the vocabulary can be used directly, so that during model fine-tuning the total number of parameters to be trained is reduced: the parameters to be trained then consist only of the parameters of the pre-trained language model. This improves the efficiency of language model fine-tuning.
Step 308, obtaining a second input word vector, wherein the second input word vector comprises: a masked-sample word vector corresponding to the masked sample, the initial template word vectors, and the real label word vector.
The masked sample is obtained by applying masking to the training sample; the masked-sample word vector includes a second mask.
The embodiments of the present application do not limit the number of masked tokens or the specific masking strategy, which can be set according to actual needs.
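One possible masking strategy, sketched under the assumption of simple per-token random replacement (the rate, mask token name, and seed are illustrative, since the embodiments leave the strategy open):

```python
import random

def random_mask(tokens, mask_rate=0.15, mask_token="[second mask]", seed=0):
    """Randomly replace a fraction of tokens with a mask token, remembering
    the true tokens so the second loss can be computed against them."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok          # true token for the masked position
        else:
            masked.append(tok)
    return masked, targets

masked, targets = random_mask(["this", "drama", "reveals", "nothing"],
                              mask_rate=0.5)
```

The returned `targets` mapping supplies the real token vector for each second mask in step 312.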
And 310, inputting the second input word vector into the pre-training language model to obtain a second predicted word vector corresponding to the second mask.
Step 312, obtaining a second loss value based on the second predicted word vector, the token vector of each token in the vocabulary of the pre-trained language model, and the real token vector corresponding to the second mask.
Optionally, in some embodiments, the second loss value may be obtained through a cross-entropy loss function based on the second predicted word vector, the token vector of each token in the vocabulary of the pre-trained language model, and the real token vector corresponding to the second mask.
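The second loss can be sketched as a masked-language-model cross-entropy over the vocabulary; the tiny vocabulary and one-hot token vectors below are illustrative assumptions standing in for the model's real embedding table:

```python
import numpy as np

def second_loss(pred_vec, vocab_vecs, true_token, tokens):
    """Cross-entropy scoring the predicted word vector at the second mask
    against the token vector of every token in the vocabulary."""
    logits = np.array([np.dot(pred_vec, vocab_vecs[t]) for t in tokens])
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[tokens.index(true_token)])

tokens = ["this", "drama", "reveals", "nothing"]
vocab_vecs = {"this":    np.array([1.0, 0.0, 0.0, 0.0]),
              "drama":   np.array([0.0, 1.0, 0.0, 0.0]),
              "reveals": np.array([0.0, 0.0, 1.0, 0.0]),
              "nothing": np.array([0.0, 0.0, 0.0, 1.0])}
pred = np.array([0.1, 2.0, 0.1, 0.1])  # model's vector at the second mask
loss = second_loss(pred, vocab_vecs, "drama", tokens)
```

Because the predicted vector aligns best with the true token's vector, the loss is below the uniform-guess baseline of ln(vocabulary size).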
In the embodiment of the application, the order of obtaining the first loss value and obtaining the second loss value is not limited, and the first loss value may be obtained first, and then the second loss value is obtained; or the second loss value can be obtained first, and then the first loss value is obtained; it is also possible to obtain the first loss value and the second loss value simultaneously.
And step 314, fusing the first loss value and the second loss value to obtain a fused loss value.
In the embodiment of the application, the specific fusion strategy is not limited, and the user-defined setting can be carried out according to actual needs.
Optionally, in some embodiments, the fusion loss value may be obtained by:
acquiring a first weight value corresponding to the first loss value and a second weight value corresponding to the second loss value;
and performing weighted fusion on the first loss value and the second loss value based on the first weight value and the second weight value to obtain a fusion loss value.
Specifically, in the embodiment of the present application, the manner of obtaining the first weight value and the second weight value is not limited. For example, both may be preset constants.
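When both weight values are preset constants, the weighted fusion reduces to a fixed linear combination, sketched below (the 0.5/0.5 default is an arbitrary illustration, not a value specified by the source):

```python
# Weighted fusion of the two loss values with preset constant weights.
def fuse_losses(first_loss, second_loss, first_weight=0.5, second_weight=0.5):
    """Return the weighted fusion of the first and second loss values."""
    return first_weight * first_loss + second_weight * second_loss

fused = fuse_losses(0.8, 1.2)
print(fused)  # → 1.0
```

In practice the two weights can be tuned to trade off the classification objective against the text-content prediction objective.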
Step 316, training the pre-training language model, the initial template word vector and the initial label word vector based on the fusion loss value to obtain the trained language model, the template word vector and the label word vector.
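A hedged PyTorch sketch of this joint update (the toy linear stand-in for the language model, the shapes, the placeholder loss, and the learning rate are all assumptions; the point is only that the template and label word vectors are registered as trainable parameters alongside the model's own):

```python
# Sketch of step 316: one fused-loss update over the model parameters plus
# the template and label word vectors, all declared trainable.
import torch

torch.manual_seed(0)
hidden = 8
model = torch.nn.Linear(hidden, hidden)                     # stand-in for the pre-trained LM
template_vecs = torch.nn.Parameter(torch.randn(3, hidden))  # initial h([T1])..h([T3])
label_vecs = torch.nn.Parameter(torch.randn(2, hidden))     # initial h([Y1]), h([Y2])

optimizer = torch.optim.SGD(
    list(model.parameters()) + [template_vecs, label_vecs], lr=0.1)

# Placeholder fused loss that touches model, template and label parameters.
fused_loss = model(template_vecs).sum() + label_vecs.pow(2).sum()

before = template_vecs.detach().clone()
optimizer.zero_grad()
fused_loss.backward()
optimizer.step()
changed = not torch.equal(before, template_vecs.detach())
print(changed)  # the template word vectors moved alongside the model weights
```

Declaring the template and label vectors as `Parameter`s is what makes them differentiable, trainable quantities rather than fixed prompt tokens.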
Referring to fig. 4, fig. 4 is a schematic diagram of a corresponding scenario in the embodiment of the present application. The embodiment of the present application is described below with a specific scenario example with reference to fig. 4:
the fine-tuning process shown in fig. 4 is a further improvement on the model fine-tuning process shown in fig. 2. Specifically, the left branch is the branch for obtaining the first loss value; for its details, refer to fig. 2, which are not repeated here.
The right branch of fig. 4 is explained in detail below:
Random mask processing is performed on the training sample "nothing is disclosed in the drama" to obtain the mask sample "nothing is disclosed in the drama [second mask]" (in the embodiment of the present application, only one lemma of the training sample is masked by way of example, which places no limitation on the masking strategy), and the mask sample word vector corresponding to the mask sample is then obtained: "e(this) e([second mask]) e(drama) e(what) e(also) e(not) e(disclosure)".
A second input word vector is obtained, where the second input word vector includes: the mask sample word vector, the initial template word vectors h([T1]), h([T2]), h([T3]), and the real tag word vector h([Y]) of the training sample. The obtained second input word vector is then input into the pre-training language model to obtain the second predicted word vector corresponding to the second mask. The word vector of each lemma in the vocabulary is obtained, as shown in fig. 2: e(…), e(drama), ……, e(report). Based on the second predicted word vector, the above lemma vectors, and the real lemma vector corresponding to the second mask, namely e(drama), the second loss value is obtained. At this point, two loss values have been obtained. The first loss value and the second loss value are then fused to obtain a fused loss value, and the pre-trained language model, the initial template word vectors h([T1]), h([T2]), h([T3]) and the initial tag word vectors h([Y1]), h([Y2]) are trained based on the fused loss value, yielding the trained language model, the template word vectors h([T1]), h([T2]), h([T3]) and the tag word vectors h([Y1]), h([Y2]).
In the embodiments of the present application, the template word vectors corresponding to the template words and the label word vectors corresponding to the label words are treated as differentiable, trainable parameters: while the internal parameters of the language model are fine-tuned, the template word vectors and label word vectors are optimized and learned at the same time, so that optimized representations of the template words and label words suited to the downstream classification task are searched for automatically in a continuous space. The embodiments of the present application can therefore reduce manual workload and improve the prediction performance of the finally obtained language model; that is, after few-sample fine-tuning, the pre-trained language model produces output results with higher accuracy.
Meanwhile, in the fine-tuning of the pre-trained language model, the template word vectors and the label word vectors, training is carried out from two different angles at once, each based on its corresponding loss value: fine-tuning based on the first loss value from the angle of classification-objective optimization, and fine-tuning based on the second loss value from the angle of text-content prediction. Compared with the manner of the first embodiment, introducing the second loss value, which is obtained from the angle of text-content prediction accuracy, enables the language model to better understand the context in the text, further improving the performance of the language model.
In addition, using the embodiments of the present application, pre-trained language models were fine-tuned for 15 different types of downstream classification tasks, and the results show that the embodiments of the present application can improve the performance of the language models. Moreover, for a specific downstream classification task in which each class label contains only a small number of training samples (for example, 8 training samples), the performance of the language model trained in the manner of the embodiments of the present application may reach 90% of that of a language model trained with a large number of training samples. That is, the embodiments of the present application are applicable to a wide range of classification tasks and achieve better few-sample performance.
The language model fine tuning method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, PCs, etc.
EXAMPLE III
Referring to fig. 5, fig. 5 is a flowchart illustrating steps of a text classification method according to a third embodiment of the present application. Specifically, the text classification method provided by this embodiment includes the following steps:
step 502, a target text to be classified is obtained.
Step 504, obtaining a target word vector, wherein the target word vector comprises: a target text word vector corresponding to the target text, a template word vector corresponding to the template word, and a mask.
In the embodiment of the present application, the template word vector corresponding to the template word is obtained by training through the method of the first embodiment or the second embodiment.
Step 506, inputting the target word vector into the pre-trained language model to obtain a predicted word vector corresponding to the mask.
The language model in this step is obtained by fine-tuning training by the method of the first embodiment or the second embodiment.
Step 508, determining the category label of the target text based on the similarity between the predicted word vector and the label word vector corresponding to each preset label word.
In this step, the label word vector corresponding to each preset label word is also obtained by training through the method of the first embodiment or the second embodiment.
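A sketch of this determination under the assumption that cosine similarity is used as the similarity measure (the source does not fix a particular measure; the vectors below are illustrative):

```python
# Step 508 sketch: pick the preset label word whose vector is most similar
# (cosine) to the predicted word vector at the mask position.
import numpy as np

def classify(pred, label_word_vecs):
    """Return the index of the most similar label word vector."""
    pred = pred / np.linalg.norm(pred)
    mat = label_word_vecs / np.linalg.norm(label_word_vecs, axis=1, keepdims=True)
    return int(np.argmax(mat @ pred))

pred = np.array([0.9, 0.1, 0.0])              # predicted word vector at the mask
label_word_vecs = np.array([[1.0, 0.0, 0.0],  # h([Y1]), e.g. a "positive" label
                            [0.0, 1.0, 0.0]]) # h([Y2]), e.g. a "negative" label
label_index = classify(pred, label_word_vecs)
print(label_index)  # → 0
```

The returned index maps directly to the category label assigned to the target text.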
In the embodiment of the present application, the adopted language model is obtained by fine-tuning in the following manner: the template word vectors corresponding to the template words and the label word vectors corresponding to the label words are treated as differentiable, trainable parameters, and while the internal parameters of the language model are fine-tuned, the template word vectors and label word vectors are optimized and learned at the same time, so that optimized representations of the template words and label words suited to the downstream classification task are searched for automatically in a continuous space. The language model in the embodiment of the present application therefore has better prediction performance; that is, the accuracy of the model's output results is higher. Accordingly, when text is classified based on a language model obtained in this manner, the category label obtained for the target text is more accurate.
The text classification method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, PCs, etc.
Example four
Referring to fig. 6, fig. 6 is a block diagram illustrating a language model fine-tuning apparatus according to a fourth embodiment of the present application. The language model fine-tuning device provided by the embodiment of the application comprises:
a first obtaining module 602, configured to obtain a first input word vector, where the first input word vector includes: training sample word vectors corresponding to the training samples, initial template word vectors corresponding to the template words and a first mask;
a first prediction module 604, configured to input the first input word vector into the pre-training language model to obtain a first predicted word vector corresponding to the first mask;
the training module 606 is configured to obtain a first loss value based on the first predicted word vector, the initial label word vectors corresponding to the preset label words, and the real label word vectors of the training samples; and training the pre-training language model, the initial template word vector and the initial label word vector based on the first loss value to obtain a trained language model, template word vector and label word vector.
Optionally, in some embodiments, the language model fine tuning apparatus further includes:
a second obtaining module, configured to obtain a second input word vector, where the second input word vector includes: a mask sample word vector corresponding to a mask sample, the initial template word vector and the real label word vector; the mask sample is a sample obtained by performing mask processing on the training sample; the mask sample word vector comprises a second mask;
the second prediction module is used for inputting the second input word vector into the pre-training language model to obtain a second prediction word vector corresponding to the second mask;
a second loss value obtaining module, configured to obtain a second loss value based on the second predicted word vector, the lemma vector of each lemma in the vocabulary corresponding to the pre-training language model, and the real lemma vector corresponding to the second mask;
the fusion module is used for fusing the first loss value and the second loss value to obtain a fusion loss value;
the training module 606, when executing the step of training the pre-trained language model, the initial template word vector, and the initial label word vector based on the first loss value to obtain the trained language model, the template word vector, and the label word vector, is specifically configured to:
and training the pre-training language model, the initial template word vector and the initial label word vector based on the fusion loss value to obtain a trained language model, template word vector and label word vector.
Optionally, in some embodiments, when executing the step of obtaining the first loss value based on the first predicted word vector, the initial tagged word vector corresponding to each preset tagged word, and the real tagged word vector of the training sample, the training module 606 is specifically configured to:
obtaining a first loss value through a cross entropy loss function based on the first predicted word vector, the initial label word vectors corresponding to the preset label words and the real label word vectors of the training samples;
and the second loss value obtaining module is specifically used for obtaining a second loss value through a cross entropy loss function based on the second predicted word vector, the lemma vector of each lemma in the vocabulary corresponding to the pre-training language model and the real lemma vector corresponding to the second mask.
Optionally, in some embodiments, the fusion module is specifically configured to:
acquiring a first weight value corresponding to the first loss value and a second weight value corresponding to the second loss value;
and performing weighted fusion on the first loss value and the second loss value based on the first weight value and the second weight value to obtain a fusion loss value.
Optionally, in some embodiments, the template word and the preset tag word are non-semantic lemmas that exist and are not used in the vocabulary corresponding to the pre-training language model.
The language model fine-tuning device in the embodiment of the present application is used to implement the corresponding language model fine-tuning method in the first or second embodiment of the foregoing method, and has the beneficial effects of the corresponding method embodiment, which are not described herein again. In addition, for the functional implementation of each module in the language model fine-tuning device in the embodiment of the present application, reference may be made to the description of the corresponding part in the foregoing first or second method embodiment, which is not repeated here.
EXAMPLE five
Referring to fig. 7, fig. 7 is a block diagram of a text classification apparatus according to a fifth embodiment of the present application. The text classification device provided by the embodiment of the application comprises:
a target text obtaining module 702, configured to obtain a target text to be classified;
a target word vector obtaining module 704, configured to obtain a target word vector, where the target word vector includes: target text word vectors corresponding to the target texts, template word vectors corresponding to the template words and masks;
a predicted word vector obtaining module 706, configured to input the target word vector into a pre-trained language model, so as to obtain a predicted word vector corresponding to the mask;
a category label determination module 708, configured to determine a category label of the target text based on a similarity between the predicted word vector and a label word vector corresponding to each preset label word;
the language model, the template word vector and the label word vector are obtained by the method of the first embodiment or the second embodiment.
The text classification device in the embodiment of the present application is used to implement the corresponding text classification method in the third embodiment of the foregoing method, and has the beneficial effects of the corresponding method embodiment, which are not described herein again. In addition, the functional implementation of each module in the text classification device in the embodiment of the present application can refer to the description of the corresponding part in the third method embodiment, and is not repeated here.
EXAMPLE six
Referring to fig. 8, a schematic structural diagram of an electronic device according to a sixth embodiment of the present application is shown, and the specific embodiment of the present application does not limit a specific implementation of the electronic device.
As shown in fig. 8, the electronic device may include: a processor (processor)802, a Communications Interface 804, a memory 806, and a communication bus 808.
Wherein:
the processor 802, communication interface 804, and memory 806 communicate with one another via a communication bus 808.
A communication interface 804 for communicating with other electronic devices or servers.
The processor 802, configured to execute the program 810, may specifically execute the language model fine tuning method described above, or related steps in the text classification method embodiment.
In particular, the program 810 may include program code comprising computer operating instructions.
The processor 802 may be a CPU, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. The electronic device may comprise one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 806 stores a program 810. The memory 806 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 810 may be specifically configured to cause the processor 802 to perform the following operations: obtaining a first input word vector, the first input word vector comprising: training sample word vectors corresponding to the training samples, initial template word vectors corresponding to the template words and a first mask; inputting the first input word vector into a pre-training language model to obtain a first predicted word vector corresponding to the first mask; obtaining a first loss value based on the first predicted word vector, the initial label word vectors corresponding to the preset label words and the real label word vectors of the training samples; and training the pre-training language model, the initial template word vector and the initial label word vector based on the first loss value to obtain the trained language model, the trained template word vector and the trained label word vector.
Alternatively, the program 810 may be specifically configured to cause the processor 802 to perform the following operations: acquiring a target text to be classified; obtaining a target word vector, wherein the target word vector comprises: target text word vectors corresponding to the target texts, template word vectors corresponding to the template words and masks; inputting the target word vector into a language model which is trained in advance to obtain a predicted word vector corresponding to the mask; determining category labels of the target text based on the similarity between the predicted word vectors and the label word vectors corresponding to the preset label words; the language model, the template word vector and the label word vector are obtained by the method of the first embodiment or the second embodiment.
For specific implementation of each step in the program 810, reference may be made to the above embodiment of the language model fine tuning method, or corresponding descriptions in corresponding steps and units in the embodiment of the text classification method, which is not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
With the electronic device of this embodiment, the template word vector corresponding to the template word and the label word vector corresponding to the label word are treated as differentiable, trainable parameters, and the template word vector and the label word vector are optimized and learned while the internal parameters of the language model are fine-tuned, so that optimized representations of the template words and label words suited to the downstream classification task are searched for automatically in a continuous space. Therefore, this embodiment can reduce manual workload and improve the prediction performance of the finally obtained language model; that is, after few-sample fine-tuning of the pre-trained language model, the accuracy of its output results is higher.
The embodiment of the present application further provides a computer program product, which includes computer instructions that instruct a computing device to execute operations corresponding to any one of the language model fine-tuning methods in the foregoing method embodiments, or operations corresponding to the text classification method.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.
The above-described methods according to embodiments of the present application may be implemented in hardware or firmware, or as software or computer code storable in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium. The methods described herein may thus be processed by software on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor controller or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the language model fine-tuning methods or the text classification methods described herein. Further, when a general-purpose computer accesses code for implementing the language model fine-tuning methods or text classification methods illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing those methods.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application should be defined by the claims.

Claims (11)

1. A method of language model fine tuning, comprising:
obtaining a first input word vector, the first input word vector comprising: training sample word vectors corresponding to the training samples, initial template word vectors corresponding to the template words and a first mask;
inputting the first input word vector into a pre-training language model to obtain a first predicted word vector corresponding to the first mask;
obtaining a first loss value based on the first predicted word vector, the initial label word vectors corresponding to the preset label words and the real label word vectors of the training samples; and training the pre-training language model, the initial template word vector and the initial label word vector based on the first loss value to obtain a trained language model, template word vector and label word vector.
2. The method of claim 1, wherein the method further comprises:
obtaining a second input word vector, wherein the second input word vector comprises: mask sample word vectors corresponding to mask samples, the initial template word vectors and the real label word vectors; the mask sample is obtained by performing mask processing on the training sample; the mask sample word vector comprises a second mask;
inputting the second input word vector into the pre-training language model to obtain a second predicted word vector corresponding to the second mask;
obtaining a second loss value based on the second predicted word vector, the lemma vector of each lemma in the vocabulary corresponding to the pre-training language model and the real lemma vector corresponding to the second mask;
fusing the first loss value and the second loss value to obtain a fused loss value;
the training of the pre-training language model, the initial template word vector and the initial label word vector based on the first loss value to obtain a trained language model, template word vector and label word vector includes:
and training the pre-training language model, the initial template word vector and the initial label word vector based on the fusion loss value to obtain a trained language model, a template word vector and a label word vector.
3. The method of claim 2, wherein the obtaining a first loss value based on the first predicted word vector, the initial tagged word vector corresponding to each preset tagged word, and the true tagged word vector of the training sample comprises:
obtaining a first loss value through a cross entropy loss function based on the first predicted word vector, the initial label word vectors corresponding to the preset label words and the real label word vectors of the training samples;
obtaining a second loss value based on the second predicted word vector, the lemma vector of each lemma in the vocabulary corresponding to the pre-training language model, and the real lemma vector corresponding to the second mask, including:
and obtaining a second loss value through a cross entropy loss function based on the second predicted word vector, the word element vector of each word element in the vocabulary corresponding to the pre-training language model and the real word element vector corresponding to the second mask.
4. The method of claim 2 or 3, wherein said fusing the first loss value and the second loss value to obtain a fused loss value comprises:
acquiring a first weight value corresponding to the first loss value and a second weight value corresponding to the second loss value;
and performing weighted fusion on the first loss value and the second loss value based on the first weight value and the second weight value to obtain a fusion loss value.
5. The method according to claim 1, wherein the template word and the pre-set tag word are non-semantic lemmas that exist and are not used in a vocabulary corresponding to the pre-training language model.
6. A method of text classification, comprising:
acquiring a target text to be classified;
obtaining a target word vector, wherein the target word vector comprises: target text word vectors corresponding to the target texts, template word vectors corresponding to the template words and masks;
inputting the target word vector into a pre-trained language model to obtain a predicted word vector corresponding to the mask;
determining category labels of the target text based on the similarity between the predicted word vectors and the label word vectors corresponding to the preset label words;
wherein the language model, the template word vector and the tag word vector are obtained by any one of the methods of claims 1-5.
7. A language model fine-tuning apparatus comprising:
a first obtaining module, configured to obtain a first input word vector, where the first input word vector includes: training sample word vectors corresponding to the training samples, initial template word vectors corresponding to the template words and a first mask;
the first prediction module is used for inputting the first input word vector into a pre-training language model to obtain a first prediction word vector corresponding to the first mask;
the training module is used for obtaining a first loss value based on the first predicted word vector, the initial label word vectors corresponding to the preset label words and the real label word vectors of the training samples; and training the pre-training language model, the initial template word vector and the initial label word vector based on the first loss value to obtain a trained language model, template word vector and label word vector.
8. A text classification apparatus comprising:
the target text acquisition module is used for acquiring a target text to be classified;
a target word vector obtaining module, configured to obtain a target word vector, where the target word vector includes: target text word vectors corresponding to the target texts, template word vectors corresponding to the template words and masks;
a predicted word vector obtaining module, configured to input the target word vector into a pre-trained language model, so as to obtain a predicted word vector corresponding to the mask;
the category label determining module is used for determining category labels of the target text based on the similarity between the predicted word vectors and the label word vectors corresponding to the preset label words;
wherein the language model, the template word vector and the tag word vector are obtained by any one of the methods of claims 1-5.
9. An electronic device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the language model fine adjustment method according to any one of claims 1-5 or the operation corresponding to the text classification method according to claim 6.
10. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements a language model fine-tuning method as claimed in any one of claims 1 to 5, or implements a text classification method as claimed in claim 6.
11. A computer program product comprising computer instructions for instructing a computing device to perform operations corresponding to the language model fine-tuning method of any one of claims 1 to 5, or to perform operations corresponding to the text classification method of claim 6.
CN202210443857.5A 2022-04-26 2022-04-26 Language model fine-tuning method, text classification method, device and equipment Pending CN114896395A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210443857.5A CN114896395A (en) 2022-04-26 2022-04-26 Language model fine-tuning method, text classification method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210443857.5A CN114896395A (en) 2022-04-26 2022-04-26 Language model fine-tuning method, text classification method, device and equipment

Publications (1)

Publication Number Publication Date
CN114896395A true CN114896395A (en) 2022-08-12

Family

ID=82719146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210443857.5A Pending CN114896395A (en) 2022-04-26 2022-04-26 Language model fine-tuning method, text classification method, device and equipment

Country Status (1)

Country Link
CN (1) CN114896395A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115510074A (en) * 2022-11-09 2022-12-23 成都了了科技有限公司 Distributed data management and application platform based on table
CN115994225A (en) * 2023-03-20 2023-04-21 北京百分点科技集团股份有限公司 Text classification method and device, storage medium and electronic equipment
CN116431761A (en) * 2023-03-07 2023-07-14 江南大学 Chinese hypernym retrieval method and device
CN117786104A (en) * 2023-11-17 2024-03-29 中信建投证券股份有限公司 Model training method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111444320B (en) Text retrieval method and device, computer equipment and storage medium
CN114896395A (en) Language model fine-tuning method, text classification method, device and equipment
CN112115267B (en) Training method, device, equipment and storage medium of text classification model
CN111738016B (en) Multi-intention recognition method and related equipment
WO2022052505A1 (en) Method and apparatus for extracting sentence main portion on the basis of dependency grammar, and readable storage medium
CN113010683B (en) Entity relationship identification method and system based on improved graph attention network
CN111460833A (en) Text generation method, device and equipment
CN110874528B (en) Text similarity obtaining method and device
CN113326702B (en) Semantic recognition method, semantic recognition device, electronic equipment and storage medium
CN111739520B (en) Speech recognition model training method, speech recognition method and device
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN114528374A (en) Movie comment emotion classification method and device based on graph neural network
CN114860942B (en) Text intention classification method, device, equipment and storage medium
CN114861842B (en) Few-sample target detection method and device and electronic equipment
CN114841164A (en) Entity linking method, device, equipment and storage medium
CN113779988A (en) Method for extracting process knowledge events in communication field
CN114647713A (en) Knowledge graph question-answering method, device and storage medium based on virtual confrontation
CN112464655A (en) Word vector representation method, device and medium combining Chinese characters and pinyin
CN113254649B (en) Training method of sensitive content recognition model, text recognition method and related device
CN116797195A (en) Work order processing method, apparatus, computer device, and computer readable storage medium
CN115017987A (en) Language model fine-tuning method, text classification method, device and equipment
CN110991172A (en) Domain name recommendation method, domain name recommendation model training method and electronic equipment
CN116257632A (en) Unknown target position detection method and device based on graph comparison learning
CN115017312A (en) Language model training method, text processing method, device and equipment
CN111767710B (en) Indonesia emotion classification method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination