CN115455144A - Data enhancement method of completion type space filling type for small sample intention recognition - Google Patents


Info

Publication number
CN115455144A
CN115455144A (application number CN202211071426.7A)
Authority
CN
China
Prior art keywords: prototype, data enhancement, data, learning, samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211071426.7A
Other languages
Chinese (zh)
Inventor
陈洪辉
张鑫
蔡飞
江苗
郑建明
宋城宇
邵太华
郭昱普
王梦如
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202211071426.7A
Publication of CN115455144A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

One or more embodiments of the present application provide a completion-type (cloze) data enhancement method for small-sample intent recognition, which includes: a cloze-style data enhancement task is constructed on the basis of a pre-trained language model and used for intent recognition; an unsupervised learning method keeps the result of data enhancement semantically similar to the original input sentence; then, on top of metric-based classification, a supervised contrastive learning method pulls intent samples of the same category closer together in the embedding space and pushes intent samples of different categories farther apart, where the contrastive learning comprises prototype-level contrastive learning and instance-level contrastive learning. Meaningful data is generated without destroying syntactic structure or adding noise, the limited data is fully utilized, and separable embeddings are achieved. A better distance distribution in the embedding space is obtained, thereby improving the performance of the metric-based classification method.

Description

Completion-type (cloze) data enhancement method for small-sample intent recognition
Technical Field
The invention belongs to the technical field of intent recognition, and particularly relates to a completion-type (cloze) data enhancement method for small-sample intent recognition.
Background
Intent recognition aims at identifying the potential intent behind a user's utterance and is a key component of task-oriented dialog systems. A practical challenge, however, is that the number of intent categories grows faster than data can be manually annotated, so only a small amount of data is available for many new intent categories. This data scarcity causes traditional deep neural networks to overfit on the small training set, which seriously limits practical application. Some researchers have therefore proposed small-sample (few-shot) learning to address the data scarcity problem. One effective approach is text data enhancement, but it often generates noisy or meaningless data.
In real-world applications, new intent classes emerge rapidly and come with only limited labeled data, which makes it difficult to directly optimize existing deep neural networks on them. These networks typically use a pre-trained language model such as BERT or RoBERTa as their backbone to encode text into continuous low-dimensional vectors. Such models have complex multi-layer architectures and therefore a very large number of parameters. If, under the traditional training paradigm, the parameters of a deep neural network are updated directly with a small amount of training data, the model captures only local features, leading to insufficient generalization and overfitting: the model performs well on the training set but poorly on the test set. To address this, Snell et al. proposed a few-shot learning (FSL) strategy that helps a model acquire generalization ability from limited data alone. These researchers cast small-sample intent recognition as a meta-learning problem, which simulates the few-shot scenario through a series of small meta-tasks. The approach is widely applied to small-sample text classification tasks such as relation classification, event detection, and intent detection.
One major challenge is that meta-learning-based few-shot methods still easily overfit to a biased distribution because of the limited training samples. Some researchers have tried to prevent overfitting with data enhancement. One key idea is back-translation: translating the input text into another language and then back into the original. Another common approach uses an external knowledge base to obtain expressions semantically similar to the original sentence; in particular, rather than reordering words within sentences, Dopierre et al. introduced multiple knowledge bases to generate different paraphrases of the original input. However, although back-translation can generate different expressions of the same semantics, it works poorly on short text, where it tends to produce expressions similar or even identical to the original input sentence. As for paraphrase generation, we consider it unsuitable for text enhancement in all domains, because a corresponding external knowledge base cannot always be found. Moreover, previous data enhancement methods, such as that of Liu et al., are inefficient to train and difficult to scale to tasks with a large number of intents.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a completion-type (cloze) data enhancement method for small-sample intent recognition, so as to solve at least one of the above problems in the prior art.
In view of the above, one or more embodiments of the present application provide a completion-type (cloze) data enhancement method for small-sample intent recognition, which includes: constructing, on the basis of a pre-trained language model, a cloze-style data enhancement task for intent recognition; adopting an unsupervised learning method so that the result of the data enhancement is semantically similar to the original input sentence; and then adopting a supervised contrastive learning method on top of metric-based classification so that intent samples of the same category are closer to each other in the embedding space and intent samples of different categories are farther apart, where the contrastive learning comprises prototype-level contrastive learning and instance-level contrastive learning.
Based on the technical scheme of the invention, the following improvements can be made:
optionally, the unsupervised learning method includes: using the pre-trained language model as a feature extractor, covering a set proportion of the input words with mask tokens, and predicting the masked tokens from the semantics of the context of the input sentence. After two special tokens are added that respectively mark the beginning and the end of the sentence in the cloze-style task, the feature extractor encodes the sentence into hidden-layer vector representations. The process is expressed by the following formula:

$$\{h_{[\mathrm{CLS}]}, h_1, \dots, h_{[\mathrm{MASK}]}, \dots, h_{[\mathrm{SEP}]}\} = F(\mathrm{Pat}(T, x)) \tag{1}$$

where $F(\cdot)$ is the feature extractor, $T$ is the cloze template, $[\mathrm{MASK}]$ is the token covering a masked word, $\mathrm{Pat}(T, x)$ is the data enhancement pattern, $x$ is the input sentence, $h$ denotes a hidden-layer vector, and $[\mathrm{CLS}]$ and $[\mathrm{SEP}]$ respectively mark the beginning and the end. The hidden-layer vector $h_{[\mathrm{MASK}]}$ is regarded as the representation of the masked token, i.e., $h_{[\mathrm{MASK}]}$ is viewed as a sentence representation generated from the pattern $\mathrm{Pat}(T, x)$ that is semantically similar to the input sentence $x$. Repeating equation (1) over all input samples yields the corresponding data enhancement results.
Optionally, the pre-trained language model is constrained by a loss function to weaken vectors that do not match the semantics of the input sentence, finally obtaining a suitable data enhancement result.
Optionally, without introducing any external knowledge or labels, the input sentence $x$ is fed into the pre-trained language model to obtain its low-dimensional vector representation:

$$\{h^{x}_{[\mathrm{CLS}]}, h^{x}_1, \dots, h^{x}_{[\mathrm{SEP}]}\} = F(x) \tag{2}$$

where the hidden-layer vector $h^{x}_{[\mathrm{CLS}]}$ represents the whole sentence $x$. The following loss function

$$\mathcal{L}_{\mathrm{un}} = 1 - s\!\left(h_{[\mathrm{MASK}]},\, h^{x}_{[\mathrm{CLS}]}\right) \tag{3}$$

is used to reduce the distance between $h_{[\mathrm{MASK}]}$ and $h^{x}_{[\mathrm{CLS}]}$, where $s(\cdot,\cdot)$ is the similarity function also used for classification below.
Optionally, after the unsupervised cloze-style data enhancement is completed, a metric-based prototype network is used as the classifier to examine the effect of the data enhancement.
Optionally, the average representation of the samples in each category is first calculated and used as the prototype of that category:

$$c_i = \frac{1}{K_i} \sum_{k=1}^{K_i} h_i^k \tag{4}$$

where $c_i$ denotes the prototype representation of category $i$, $K_i$ denotes the number of samples of category $i$ in the support set of the current meta-task $\mathcal{T}$, and $h_i^k$ is the representation of the $k$-th sentence in category $i$. With this prototype representation, the average distance from the samples of a class to their center is the shortest. Likewise, applying the prototype calculation formula (4) to the enhanced representations $h_{[\mathrm{MASK}]}$ yields an enhanced prototype $c'_i$.
Optionally, in order for the final prototype to more fully cover the common features of its class, the prototype of the input samples and the prototype of the data enhancement results are weighted, as follows:

$$\tilde{c}_i = \alpha c_i + (1 - \alpha)\, c'_i \tag{5}$$

where $\alpha$ is a weighting coefficient controlling the respective contributions of the original input data and the enhanced data. Given a score function $s(\cdot,\cdot)$, the prototype network predicts the label of a query instance $x_Q$ from the current meta-task $\mathcal{T}$ by computing a softmax distribution over the similarities between the query embedding vector and the prototypes:

$$p(y = j \mid x_Q) = \frac{\exp\left(s(h_{x_Q}, \tilde{c}_j)\right)}{\sum_{j'} \exp\left(s(h_{x_Q}, \tilde{c}_{j'})\right)} \tag{6}$$

where $y$ is the predicted label, $j$ is the true label, and $\tilde{c}_j$ denotes the final prototype of category $j$ based on the initial and enhanced data; cosine similarity is chosen as $s(\cdot,\cdot)$. Learning proceeds by minimizing the negative log-probability:

$$\mathcal{L}_{\mathrm{cls}} = -\log p(y = j \mid x_Q) \tag{7}$$
Optionally, a contrastive learning loss is introduced at the prototype level to separate the prototypes of different classes as much as possible and make the average representations of different classes distant from each other, expressed by the following formula:

$$\mathcal{L}_{\mathrm{PC}} = -\sum_{i} \log \frac{\exp\left(s(c_i, c_i)\right)}{\sum_{j} \exp\left(s(c_i, c_j)\right)} \tag{8}$$

where $s(\cdot,\cdot)$ is the same similarity measure function as used for classification. Since $s(c_i, c_i)$ is the constant 1, equation (8) reduces to the following form:

$$\mathcal{L}_{\mathrm{PC}} = \sum_{i} \log \frac{\sum_{j} \exp\left(s(c_i, c_j)\right)}{e} \tag{9}$$

where $e$ is a constant and $\mathcal{L}_{\mathrm{PC}}$ is the prototype-level contrastive loss.
Optionally, on the basis of the prototype-level contrastive learning, instance-level contrastive learning is introduced so that instances of the same category come close to each other, expressed by the following formula:

$$\mathcal{L}_{\mathrm{IC}} = -\sum_{i} \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp\left(s(h_i, h_p)\right)}{\sum_{a \neq i} \exp\left(s(h_i, h_a)\right)} \tag{10}$$

where $h_i$ is the vector representation of sample $i$ and $P(i)$ denotes the positive set of sample $i$, which includes the vector representations of the original corpus and the enhanced vector representations belonging to the same category as sample $i$. Minimizing the loss $\mathcal{L}_{\mathrm{IC}}$ increases the similarity between sample vector representations of the same category and decreases the similarity between sample vectors of different categories.
The beneficial effect of the invention is to provide a completion-type (cloze) data enhancement method for small-sample intent recognition, and in particular a data enhancement method that suits short text and requires no knowledge-base participation. We regard the pre-trained language model itself as a knowledge base, because it has been trained on a large text corpus and can therefore perform some simple tasks. A cloze task similar in form to the pre-training task is constructed for data enhancement, so that the knowledge of the pre-trained language model is fully exploited. The hidden-state vector of the "[MASK]" token restored by the model is used as the data enhancement result of the input sentence, rather than a real sentence composed of a sequence of words. Furthermore, so that the data enhancement results are not meaningless noise, an unsupervised learning method keeps them semantically similar to the original input sentence. Thereafter, to make maximal use of the small number of samples in each meta-task, a supervised contrastive learning strategy is adopted, with samples of the same class closer to each other in the embedding space and samples of different classes farther apart. Meaningful data is generated without breaking syntactic structure or adding noise, the limited data is fully utilized, and separable embeddings are achieved. A better distance distribution in the embedding space is obtained, thereby improving the performance of the metric-based classification method.
Drawings
Fig. 1 is an overall model framework diagram of the completion-type (cloze) data enhancement method for small-sample intent recognition according to an embodiment of the present invention.
FIG. 2 illustrates the performance of different templates of the completion-type (cloze) data enhancement method for small-sample intent recognition on the "5-way1-shot" and "5-way5-shot" meta-tasks over the CLINC-150 and BANKING-77 datasets, according to an embodiment of the present invention.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present application shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the present application does not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
Referring to fig. 1 and fig. 2, a completion-type (cloze) data enhancement method for small-sample intent recognition according to one or more embodiments of the present application includes: constructing, on the basis of a pre-trained language model, a cloze-style data enhancement task for intent recognition; using an unsupervised learning method to make the data enhancement result semantically similar to the original input sentence; and then using a supervised contrastive learning method for metric-based classification, so that intent samples of the same category are closer to each other in the embedding space and intent samples of different categories are farther apart, where the contrastive learning includes prototype-level contrastive learning and instance-level contrastive learning.
As an alternative embodiment, the unsupervised learning method includes: using the pre-trained language model as a feature extractor, covering a set proportion of the input words with mask tokens, and predicting the masked tokens from the semantics of the context of the input sentence. After two special tokens are added that respectively mark the beginning and the end of the sentence in the cloze-style task, the feature extractor encodes the sentence into hidden-layer vector representations. The process is expressed by the following formula:

$$\{h_{[\mathrm{CLS}]}, h_1, \dots, h_{[\mathrm{MASK}]}, \dots, h_{[\mathrm{SEP}]}\} = F(\mathrm{Pat}(T, x)) \tag{1}$$

where $F(\cdot)$ is the feature extractor, $T$ is the cloze template, $[\mathrm{MASK}]$ is the token covering a masked word, $\mathrm{Pat}(T, x)$ is the data enhancement pattern, $x$ is the input sentence, $h$ denotes a hidden-layer vector, and $[\mathrm{CLS}]$ and $[\mathrm{SEP}]$ respectively mark the beginning and the end. The hidden-layer vector $h_{[\mathrm{MASK}]}$ is regarded as the representation of the masked token, i.e., $h_{[\mathrm{MASK}]}$ is viewed as a sentence representation generated from the pattern $\mathrm{Pat}(T, x)$ that is semantically similar to the input sentence $x$. Repeating equation (1) over all input samples yields the corresponding data enhancement results.
An auxiliary cloze template $T$ is introduced to construct the data enhancement pattern $\mathrm{Pat}$, as follows:

T = The sentence: '__' means [MASK].
Pat(T, x) = The sentence: 'x' means [MASK].
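The pattern construction above can be sketched in a few lines of Python; the function name `build_pattern` is an illustrative assumption, not part of the claimed method:

```python
def build_pattern(x: str) -> str:
    """Wrap an input sentence x into the cloze pattern Pat(T, x).

    The auxiliary template T from the description is:
        The sentence: '__' means [MASK].
    The blank '__' is filled with the original utterance, and the
    [MASK] slot is left for the language model to reconstruct.
    """
    return f"The sentence: '{x}' means [MASK]."


# Example: build_pattern("book a flight to Boston")
# -> "The sentence: 'book a flight to Boston' means [MASK]."
```

The resulting string is what the feature extractor $F(\cdot)$ encodes in equation (1).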
as an alternative embodiment, the pre-trained language model may not always be able to generate a vector that exactly matches the semantics of the input sentence. Therefore, it is necessary to design a method to constrain the model to attenuate this mismatch and ultimately achieve the proper data enhancement results. Under the condition of not introducing any external knowledge and labels, an unsupervised learning method is designed, and the semantic understanding capability of the model is utilized to enable the model to generate a proper result as far as possible. And constraining the pre-training language model through a loss function to weaken the vector which is not matched with the semantics of the input sentence, and finally obtaining a proper data enhancement result.
Without introducing any external knowledge or labels, the input sentence $x$ is fed into the pre-trained language model to obtain its low-dimensional vector representation:

$$\{h^{x}_{[\mathrm{CLS}]}, h^{x}_1, \dots, h^{x}_{[\mathrm{SEP}]}\} = F(x) \tag{2}$$

where the hidden-layer vector $h^{x}_{[\mathrm{CLS}]}$ represents the whole sentence $x$. The following loss function

$$\mathcal{L}_{\mathrm{un}} = 1 - s\!\left(h_{[\mathrm{MASK}]},\, h^{x}_{[\mathrm{CLS}]}\right) \tag{3}$$

is used to reduce the distance between $h_{[\mathrm{MASK}]}$ and $h^{x}_{[\mathrm{CLS}]}$, where $s(\cdot,\cdot)$ is the similarity function also used for classification below.
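One concrete reading of this alignment loss, assuming cosine distance as the distance measure (the original publication shows the loss only as an image, so the exact form here is an assumption), is:

```python
import numpy as np


def unsupervised_alignment_loss(h_mask: np.ndarray, h_cls: np.ndarray) -> float:
    """Loss pulling the [MASK] vector from Pat(T, x) toward the [CLS]
    vector of the plain sentence x.

    One possible form of Eq. (3): cosine distance 1 - cos(h_mask, h_cls).
    Identical directions give loss 0; orthogonal vectors give loss 1.
    """
    cos = float(h_mask @ h_cls / (np.linalg.norm(h_mask) * np.linalg.norm(h_cls)))
    return 1.0 - cos
```

Minimizing this value over all training sentences pulls each enhanced representation toward the semantics of its source sentence.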
As an alternative embodiment, after the unsupervised cloze-style data enhancement, a metric-based prototype network is used as the classifier to examine the effect of the data enhancement. The prototype network first computes the average representation of the samples in each class as the prototype of that class:

$$c_i = \frac{1}{K_i} \sum_{k=1}^{K_i} h_i^k \tag{4}$$

where $c_i$ denotes the prototype representation of class $i$, $K_i$ denotes the number of samples of class $i$ in the support set of the current meta-task $\mathcal{T}$, and $h_i^k$ is the representation of the $k$-th sentence in class $i$. With this prototype representation, the average distance from the samples of a class to their center is the shortest. Likewise, applying the prototype calculation formula (4) to the enhanced representations $h_{[\mathrm{MASK}]}$ yields an enhanced prototype $c'_i$.
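The prototype computation of formula (4) amounts to a class-wise mean over support embeddings; a minimal sketch, with illustrative names:

```python
import numpy as np


def class_prototype(embeddings: np.ndarray) -> np.ndarray:
    """Eq. (4): the prototype c_i is the mean of the K_i support
    embeddings of class i (rows of `embeddings`).  Applying the same
    formula to the enhanced [MASK] vectors yields c'_i."""
    return embeddings.mean(axis=0)
```

The same function serves for both the original and the enhanced representations; only the input rows differ.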
Optionally, in order for the final prototype to more fully cover the common features of its class, the prototype of the input samples and the prototype of the data enhancement results are weighted, as follows:

$$\tilde{c}_i = \alpha c_i + (1 - \alpha)\, c'_i \tag{5}$$

where $\alpha$ is a weighting coefficient controlling the respective contributions of the original input data and the enhanced data. Given a score function $s(\cdot,\cdot)$, the prototype network predicts the label of a query instance $x_Q$ from the current meta-task $\mathcal{T}$ by computing a softmax distribution over the similarities between the query embedding vector and the prototypes:

$$p(y = j \mid x_Q) = \frac{\exp\left(s(h_{x_Q}, \tilde{c}_j)\right)}{\sum_{j'} \exp\left(s(h_{x_Q}, \tilde{c}_{j'})\right)} \tag{6}$$

where $y$ is the predicted label, $j$ is the true label, and $\tilde{c}_j$ denotes the final prototype of class $j$ based on the initial and enhanced data; cosine similarity is chosen as $s(\cdot,\cdot)$. Learning proceeds by minimizing the negative log-probability:

$$\mathcal{L}_{\mathrm{cls}} = -\log p(y = j \mid x_Q) \tag{7}$$

Since the prototype network predicts labels by measuring the distance between query instances and the prototypes, a proper distance distribution is crucial to improving intent recognition performance.
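The weighted prototype and the softmax prediction can be sketched as follows; the linear blend with coefficient alpha and the function names are assumptions consistent with the description (the original formulas appear only as images):

```python
import numpy as np


def weighted_prototype(c: np.ndarray, c_aug: np.ndarray, alpha: float) -> np.ndarray:
    """Eq. (5): blend the original-data prototype c_i with the
    enhanced-data prototype c'_i using weighting coefficient alpha.
    The convex-combination form is an assumption based on the text."""
    return alpha * c + (1.0 - alpha) * c_aug


def predict(query: np.ndarray, prototypes: np.ndarray) -> int:
    """Eq. (6): softmax over cosine similarity s(., .) between the query
    embedding and each final prototype; the argmax is the predicted label."""
    q = query / np.linalg.norm(query)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = p @ q                                  # s(h_xQ, c~_j) for each class j
    probs = np.exp(sims) / np.exp(sims).sum()     # softmax distribution
    return int(np.argmax(probs))
```

Training would minimize the negative log of the probability assigned to the true class, as in equation (7).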
As an alternative embodiment, considering that the prototype is computed from all samples of the corresponding class in the current meta-task, it can represent the common features of the samples in that class. Moreover, since the prototype network is metric-based, one intuitive way to improve classification accuracy is to increase the distance between prototypes of different classes in the embedding space.

Thus, a contrastive learning loss is introduced at the prototype level to separate the prototypes of different classes as much as possible. Specifically, the goal is to make the similarity between prototype embeddings of different classes as small as possible, which can be expressed as:

$$\mathcal{L}_{\mathrm{PC}} = -\sum_{i} \log \frac{\exp\left(s(c_i, c_i)\right)}{\sum_{j} \exp\left(s(c_i, c_j)\right)} \tag{8}$$

where $s(\cdot,\cdot)$ is the same similarity measure function as above. Since $s(c_i, c_i)$ is the constant 1, equation (8) reduces to the following form:

$$\mathcal{L}_{\mathrm{PC}} = \sum_{i} \log \frac{\sum_{j} \exp\left(s(c_i, c_j)\right)}{e} \tag{9}$$

where $e$ is a constant and $\mathcal{L}_{\mathrm{PC}}$ is the prototype-level contrastive loss. It is expected that prototypes of different classes can thereby be located far from each other. However, contrastive learning performed directly at the prototype level can only push the average representations of different classes apart; it does not guarantee that samples of the same class are close to each other, so the improvement in recognition accuracy is insufficient.
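Using cosine similarity, the simplified prototype-level loss of formula (9) can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np


def prototype_contrastive_loss(prototypes: np.ndarray) -> float:
    """Simplified prototype-level loss in the form of Eq. (9).

    With cosine similarity, s(c_i, c_i) = 1, so the per-class term
        -log( exp(s(c_i, c_i)) / sum_j exp(s(c_i, c_j)) )
    reduces to log( sum_j exp(s(c_i, c_j)) ) - 1, summed over classes i.
    Well-separated prototypes give a smaller value.
    """
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = p @ p.T                       # s(c_i, c_j) for all prototype pairs
    return float(np.sum(np.log(np.exp(sims).sum(axis=1)) - 1.0))
```

For example, two orthogonal prototypes yield a smaller loss than two coincident ones, matching the goal of pushing class centers apart.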
As an alternative embodiment, to further improve intent recognition performance, instance-level contrastive learning is introduced. This strategy not only keeps instances of different classes far from each other but also brings instances of the same class close to each other. The specific formula is:

$$\mathcal{L}_{\mathrm{IC}} = -\sum_{i} \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp\left(s(h_i, h_p)\right)}{\sum_{a \neq i} \exp\left(s(h_i, h_a)\right)} \tag{10}$$

where $h_i$ is the vector representation of sample $i$ and $P(i)$ denotes the positive set of sample $i$, which includes the vector representations of the original corpus and the enhanced vector representations belonging to the same class as sample $i$. Minimizing the loss $\mathcal{L}_{\mathrm{IC}}$ increases the similarity between sample vector representations of the same class and decreases the similarity between sample vectors of different classes.
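A sketch of instance-level contrastive learning in the spirit of formula (10), following the standard supervised-contrastive form as an assumption (the exact normalization in the original publication is shown only as an image):

```python
import numpy as np


def instance_contrastive_loss(embs: np.ndarray, labels) -> float:
    """Supervised instance-level contrastive loss in the spirit of Eq. (10).

    `embs` stacks the original and enhanced vectors row-wise; the positive
    set P(i) of anchor i is every other sample sharing its label, whether
    it came from the original corpus or from data enhancement.
    """
    n = len(embs)
    z = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = z @ z.T                                   # cosine similarities
    loss = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        denom = sum(np.exp(sims[i, j]) for j in range(n) if j != i)
        for j in pos:                                # average over P(i)
            loss += -np.log(np.exp(sims[i, j]) / denom) / len(pos)
    return float(loss / n)
```

Embeddings clustered by class score a lower loss than embeddings where classes are mixed, which is exactly the gradient signal that pulls same-class samples together.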
In order to verify the technical effects of the embodiments, the following experiments were designed:
experimental setup
Two published intent recognition datasets were used to evaluate our model and the baseline models under discussion: CLINC-150 and BANKING-77. CLINC-150 consists of 150 intent categories from 10 areas of daily life, with 150 samples each. In addition, some sentences in the dataset are labeled "out of range" and are regarded as noise from multiple unknown classes. To accurately test the performance of the discussed models, we deleted the samples labeled "out of range" and trained and tested only with the well-labeled samples. BANKING-77 is a single-domain intent recognition dataset containing 13,083 samples across 77 classes in the banking domain. Table 1 provides statistics for CLINC-150 and BANKING-77.
TABLE 1 statistics of data sets CLINC-150 and BANKING-77
[Table 1 is rendered as an image in the original publication; its values are not recoverable from this text extraction.]
Baseline models
The validity of our proposed model was verified by comparison with the following baseline models:
Prototypical Networks (ProtoNet): a metric-based few-shot classification model that measures the similarity of samples by their distance in the embedding space. It takes the label of the prototype closest to the query sample as the prediction for the query's class.
GCN: a graph convolutional network approach to few-shot classification that treats few-shot learning as a supervised information-propagation task and can be trained end-to-end.
Matching Networks (MatchNet): a few-shot classification framework that trains a network to map a small labeled support set and unlabeled instances to their labels, avoiding reliance on fine-tuning to adapt to new classes.
Problem setting
RQ1: can our proposed method be better at the intent recognition task than the baseline model with these competencies?
RQ2: which module of our proposed CDA plays a greater role in boosting recognition accuracy?
RQ3: what are the different templates' effects on model performance?
Model setting
Following common practice in small-sample learning experiments, we discuss meta-tasks with two different sample counts, "5-way1-shot" and "5-way5-shot". For all models under discussion, we apply the same feature extractor (i.e., bert-base-uncased) to encode the input sentences, ensuring fairness in the performance comparison.
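Construction of such an N-way K-shot meta-task can be sketched as follows; the data format (a dict mapping each intent label to its utterances) and all names are illustrative assumptions:

```python
import random


def sample_episode(data, n_way=5, k_shot=1, n_query=1, seed=None):
    """Sample one N-way K-shot meta-task (episode).

    Picks n_way intent classes, then k_shot support utterances and
    n_query query utterances per class, without overlap.  Returns two
    lists of (utterance, label) pairs.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(data), n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(data[c], k_shot + n_query)
        support += [(s, c) for s in picked[:k_shot]]
        query += [(q, c) for q in picked[k_shot:]]
    return support, query
```

A "5-way1-shot" episode thus contains five support utterances (one per class), with separate query utterances used to score the classifier.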
For RQ1, intent recognition capability on the two types of meta-tasks was examined on CLINC-150 and BANKING-77. The overall intent recognition performance of all discussed models is shown in Table 2.
Table 2: Accuracy (%) on the test sets, with 95% confidence intervals, for both types of meta-tasks. The best result in each column is shown in bold; the best baseline result is underlined.
[Table 2 is rendered as an image in the original publication; its values are not recoverable from this text extraction.]
First, we can find that all models perform meta-tasks better with a large number of samples of a single class, regardless of which data set. This is because as the number of samples of a single category increases, the total number of samples available for the model also increases, and the common features obtained from the samples more closely approach the true common features.
We then focused on analyzing the performance of the baseline. We can see that MatchNet achieves the highest accuracy in the "5-way1-shot" meta-task on both datasets, while ProtoNet achieves the best performance in the "5-way5-shot" meta-task on both datasets. The advantage of MatchNet on the task of the 1-shot element can be explained in that the performance of the model can be well improved by the temporary similarity matching calculation. For ProtoNet, it is advantageous in the "5-shot" meta task because it can fuse features of instances in the same class to gain their commonality.
Next, we focus on discussing the performance of our proposed model. Comparing the baseline model to the CDA model, we can see that the CDA-PC and CDA-IC perform almost exclusively over all the baseline models discussed, on data set CLINC-150. However, on the BANKING-77 dataset, CDA-PC performed less well on the "5-way1-shot" and "5-way5-shot" meta-tasks than MatchNet and ProtoNet, respectively. This is because samples of the same class in the CLINC-150 dataset are short sentences, more similar than the samples in BANKING-77. Thus, the result of the data enhancement is very close to the initially input sentence, which helps the models to obtain their common features. Furthermore, since the BANKING-77 dataset is more specialized than CLINC-150, the pre-trained language model has less relevant knowledge than CLINC-150. If the enhanced samples are directly used to calculate the class prototypes, noise is introduced, which weakens the characteristics of the classes themselves and reduces the recognition performance.
To address these shortcomings of CDA-PC, CDA-IC uses an instance-level contrastive learning strategy to improve few-shot intent recognition. Its advantage can be explained as follows: the instance-level contrastive strategy treats both the original data and the corresponding augmented data within a class as positive examples. In this way, each sample interacts with more data, which shortens the distance between original inputs of the same class in the embedding space and keeps the augmented data semantically close to same-class originals.
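The instance-level contrastive strategy described above can be sketched as a supervised contrastive loss over original and augmented embeddings (a simplified pure-Python illustration assuming cosine similarity and a temperature of 0.1; the toy vectors and labels are invented for the example):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def instance_contrastive_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss: for each anchor, positives are all other
    samples (original or augmented) sharing its label; all remaining samples
    act as negatives in the denominator."""
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        denom = sum(math.exp(cosine(embeddings[i], embeddings[k]) / tau)
                    for k in range(n) if k != i)
        for j in positives:
            sim = math.exp(cosine(embeddings[i], embeddings[j]) / tau)
            total += -math.log(sim / denom)
            count += 1
    return total / count

# Originals and their cloze-augmented copies share a class label
embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["a", "a", "b", "b"]
loss = instance_contrastive_loss(embs, labels)
```

Minimizing this loss pulls same-class (original and augmented) embeddings together while pushing different-class embeddings apart, which is the behavior the paragraph above attributes to CDA-IC.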
On the CLINC-150 dataset, CDA-IC improves over the best baseline model by 4.36% on the "5-way 1-shot" meta-task and by 4.91% on the "5-way 5-shot" meta-task. On the BANKING-77 dataset, accuracy improves by 1.69% on the "5-way 1-shot" meta-task and by 1.86% on the "5-way 5-shot" meta-task.
Ablation experiment
To answer the RQ2 question, we analyze the importance of the different modules in our CDA-IC model by removing, in turn, two core components of CDA-IC: the instance-level contrastive learning module and the unsupervised learning module. The results of the ablation experiments are shown in Table 3:
TABLE 3: Ablation experiments for the "5-way 1-shot" and "5-way 5-shot" meta-tasks performed by CDA-IC on the CLINC-150 and BANKING-77 datasets; in each column, the removed module causing the largest performance drop is marked. [Table image not reproduced]
Clearly, removing any part of CDA-IC degrades performance, indicating that both the unsupervised learning module and the instance-level contrastive learning module play significant roles in improving few-shot intent recognition. In particular, on both types of meta-task and regardless of the dataset, removing the instance-level contrastive learning module causes the most severe degradation. For example, on the CLINC-150 dataset, the CDA-IC model without the instance-level contrastive learning module drops by 3.63% and 3.82% on the "5-way 1-shot" and "5-way 5-shot" meta-tasks, respectively. On the BANKING-77 dataset, the corresponding drops are 4.16% and 4.44%.
Furthermore, it is worth noting that each module makes its own unique contribution. Specifically, removing the unsupervised learning module hurts the "5-way 1-shot" meta-task more than the "5-way 5-shot" meta-task, showing that when features are scarce the unsupervised learning module has a more pronounced effect and contributes more to few-shot intent recognition. Conversely, the instance-level contrastive learning module matters more on the "5-way 5-shot" meta-task than on the "5-way 1-shot" meta-task. This can be explained by the fact that, in the 5-shot case, the bottleneck limiting performance is no longer a lack of features but the mining of the commonality within a class and the uniqueness across classes. The instance-level contrastive learning module shortens the distance between same-class samples in the embedding space and increases the distance between vector representations of different classes, i.e., it mines the commonality of the same class and the uniqueness of different classes.
Influence of different templates
To answer the RQ3 question, we design three different templates and apply them to the data enhancement pattern. All discussed templates are shown in Table 4.
Table 4: Three templates for complete gap-filling data enhancement. [Table image not reproduced]
Since our proposed model is based on a pre-trained language model, it requires templates to generate semantically similar data. Because different templates use different words and punctuation, i.e., different tokens, the semantic vectors produced by the pre-trained language model also differ, as shown in FIG. 2.
FIG. 2 shows the performance of different templates in the "5-way1-shot" and "5-way5-shot" meta-tasks on the CLINC-150 and BANKING-77 datasets.
FIG. 2 clearly shows that different templates do cause significant changes in model performance. In particular, on the "5-way 1-shot" meta-task performed on the CLINC-150 dataset, the performance gap between templates approaches 1%. Moreover, the gap can reach 1.3% on the "5-way 5-shot" meta-task performed on the BANKING-77 dataset.
Looking at the overall trend, template length bears no direct relation to the effect of data augmentation. In particular, although template 2 is the shortest, it does not perform worst on the CLINC-150 dataset: its performance on the "5-way 1-shot" meta-task is very close to that of template 3, and better than that of template 1. Notably, template 3 performs best on all tasks on both CLINC-150 and BANKING-77. This can be explained by template 3 giving the clearest semantic guidance for the [MASK] token: when the original input sentence is filled into it, template 3 explicitly states that [MASK] represents the intent of the input sentence, so the generated semantic embedding vector is more directional.
In summary, template design has a significant impact on data enhancement performance. A good template provides appropriate semantic guidance and effectively improves data enhancement.
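The pattern construction Pat(T, x) underlying this template comparison amounts to splicing the input sentence into a template that carries a [MASK] slot; a minimal sketch (the template wordings below are illustrative stand-ins, since the exact phrasings of Table 4 are in an image that is not reproduced here):

```python
# Three hypothetical cloze templates; [MASK] stands for the intent-label slot.
TEMPLATES = {
    1: '"{x}" It was [MASK].',
    2: '"{x}" [MASK].',
    3: 'The intent of "{x}" is [MASK].',  # explicit semantic guidance for [MASK]
}

def pat(template_id: int, x: str) -> str:
    """Build the data-enhancement pattern Pat(T, x) by filling sentence x into template T."""
    return TEMPLATES[template_id].format(x=x)

print(pat(3, "how do I check my balance"))
# prints: The intent of "how do I check my balance" is [MASK].
```

A masked language model then encodes the resulting string, and the hidden vector at the [MASK] position serves as the augmented sentence representation.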
We propose a complete gap-filling data enhancement (CDA) model for few-shot intent recognition. Inspired by the pre-training tasks of language models, we design a template-based unsupervised data enhancement strategy intended to produce meaningful data without breaking syntactic structure or introducing noise. Furthermore, to make full use of the limited data and obtain separable embeddings, we apply contrastive learning between the original data and the enhanced data; each sample can thus interact with samples of all remaining classes, separating the embeddings of different classes in the embedding space. Experiments performed on the CLINC-150 and BANKING-77 datasets demonstrate effectiveness over all discussed baselines. Furthermore, ablation studies indicate that the contrastive module is the most important component of the whole model.
The skilled person will appreciate that the apparatus described above may also comprise only the components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and many other variations of different aspects of one or more embodiments of the present application exist as described above, which are not provided in detail for the sake of brevity.
It is intended that the one or more embodiments of the present application embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (9)

1. A complete gap-filling data enhancement method for small-sample intent recognition, characterized by comprising the following steps: constructing a complete gap-filling data enhancement task based on a pre-trained language model for intent recognition; adopting an unsupervised learning method so that the result of the data enhancement is semantically similar to the original input sentence; and then, based on metric classification, adopting a supervised contrastive learning method so that intent samples of the same class are closer to each other in the embedding space and intent samples of different classes are farther apart, the contrastive learning method comprising prototype-level contrastive learning and instance-level contrastive learning.
2. The method of claim 1, wherein the unsupervised learning method comprises: using the pre-trained language model as a feature extractor, masking a set proportion of the input words with a mark, and predicting the masked mark from the semantics of the context of the input sentence; after adding two special marks, which respectively denote the beginning and the end of a sentence in the complete gap-filling intent recognition task, the feature extractor encodes the sentence into hidden-layer vector representations, the process being expressed by the formula:

$$\{h_{[CLS]}, h_1, \ldots, h_{[MASK]}, \ldots, h_{[SEP]}\} = F(\mathrm{Pat}(T, x)) \tag{1}$$

wherein $F(\cdot)$ is the feature extractor, $T$ is the complete gap-filling template, $[MASK]$ is the mark of the masked word, $\mathrm{Pat}(T, x)$ is the data enhancement pattern, $x$ is the input sentence, $h$ denotes the hidden-layer vectors, and $[CLS]$ and $[SEP]$ denote the beginning and the end, respectively; the hidden-layer vector $h_{[MASK]}$ is taken as the representation of the masked word $[MASK]$ and is regarded as a sentence representation generated from the pattern $\mathrm{Pat}(T, x)$ that is semantically similar to the input sentence $x$; repeating formula (1) over all input samples yields the corresponding data enhancement results.
3. The method of claim 2, wherein the pre-trained language model is constrained by a loss function to suppress vectors that do not match the semantics of the input sentence, finally obtaining an appropriate data enhancement result.
4. The method of claim 3, wherein the input sentence $x$ is fed into the pre-trained language model without introducing any external knowledge or labels, and a low-dimensional vector representation is obtained, expressed as:

$$h^{x}_{[CLS]} = F(x) \tag{2}$$

wherein the hidden-layer vector $h^{x}_{[CLS]}$ is taken as the representation of the whole sentence $x$; the loss function:

$$\mathcal{L}_{u} = 1 - s\big(h_{[MASK]},\, h^{x}_{[CLS]}\big) \tag{3}$$

is used to reduce the distance between $h_{[MASK]}$ and $h^{x}_{[CLS]}$, where $s(\cdot,\cdot)$ is a similarity measure.
5. The method of claim 4, wherein, after the unsupervised gap-filling data enhancement, a metric-based prototype network is used as a classifier to check the effect of the data enhancement.
6. The method of claim 5, wherein an average representation of the samples in the same class is first calculated and used as the prototype of that class:

$$c_i = \frac{1}{K_i} \sum_{k=1}^{K_i} h^{i}_{k} \tag{4}$$

wherein $c_i$ denotes the prototype representation of class $i$, $K_i$ denotes the number of samples of class $i$ in the support set $\mathcal{S}$ under the current meta-task $\mathcal{T}$, and $h^{i}_{k}$ is the representation of the $k$-th sentence in class $i$; with this prototype representation, the average distance from the samples of a class to its center is shortest; likewise, applying formula (4) to the enhanced representations $h'^{i}_{k}$ yields the enhanced prototype $c'_i$.
7. The method of claim 6, wherein the prototype of the input samples and the prototype obtained from the data enhancement are weighted so that the final prototype more fully covers the common features of its class, by the formula:

$$\bar{c}_i = \alpha\, c_i + (1 - \alpha)\, c'_i \tag{5}$$

wherein $\alpha$ is a weighting coefficient controlling the respective contributions of the original input data and the enhanced data; given a score function $s(\cdot,\cdot)$, the prototype network predicts the label of a query instance by computing a softmax distribution of the similarities between the query embedding vector and the prototypes:

$$p(y = j \mid x_q) = \frac{\exp\big(s(h_q, \bar{c}_j)\big)}{\sum_{j'} \exp\big(s(h_q, \bar{c}_{j'})\big)} \tag{6}$$

wherein $y$ is the predicted label, $x_q$ is a query instance in the query set $\mathcal{Q}$ of the current meta-task $\mathcal{T}$, $j$ is the true label, $\bar{c}_j$ denotes the final prototype of class $j$ based on the initial and enhanced data, and cosine similarity is selected as $s(\cdot,\cdot)$; learning proceeds by minimizing the negative log-probability:

$$\mathcal{L}_{p} = -\log p(y = j \mid x_q) \tag{7}$$
8. The method of claim 1, wherein the prototype-level contrastive learning introduces a contrastive-learning-based loss to separate the prototypes of different classes as far as possible, keeping the average representations of different classes apart from one another, expressed by the formula:

$$\mathcal{L}_{pc} = -\log \frac{\exp\big(s(c_i, c_i)\big)}{\sum_{j} \exp\big(s(c_i, c_j)\big)} \tag{8}$$

wherein $s(\cdot,\cdot)$ is the same similarity measure function as above; since $s(c_i, c_i)$ is the constant 1, the formula reduces to the following form:

$$\mathcal{L}_{pc} = -\log \frac{e}{\sum_{j} \exp\big(s(c_i, c_j)\big)} \tag{9}$$

wherein $e$ is a constant and $\mathcal{L}_{pc}$ is the prototype-level contrastive loss.
9. The method of claim 8, wherein instance-level contrastive learning is introduced on top of the prototype-level contrastive learning to bring instances of the same class close to one another, expressed by the formula:

$$\mathcal{L}_{ic} = -\log \frac{\sum_{h^{+}_{i}} \exp\big(s(h_i, h^{+}_{i})\big)}{\sum_{k \neq i} \exp\big(s(h_i, h_k)\big)} \tag{10}$$

wherein $h^{+}_{i}$ denotes the positive examples of $h_i$, comprising the vector representations of the original corpus and the enhanced vector representations belonging to the same class as the original corpus; by minimizing the loss $\mathcal{L}_{ic}$, the similarity between sample vector representations of the same class is increased and the similarity between sample vectors of different classes is reduced.
CN202211071426.7A 2022-09-02 2022-09-02 Data enhancement method of completion type space filling type for small sample intention recognition Pending CN115455144A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211071426.7A CN115455144A (en) 2022-09-02 2022-09-02 Data enhancement method of completion type space filling type for small sample intention recognition


Publications (1)

Publication Number Publication Date
CN115455144A true CN115455144A (en) 2022-12-09

Family

ID=84301651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211071426.7A Pending CN115455144A (en) 2022-09-02 2022-09-02 Data enhancement method of completion type space filling type for small sample intention recognition

Country Status (1)

Country Link
CN (1) CN115455144A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435738A (en) * 2023-12-19 2024-01-23 中国人民解放军国防科技大学 Text multi-intention analysis method and system based on deep learning
CN117435738B (en) * 2023-12-19 2024-04-16 中国人民解放军国防科技大学 Text multi-intention analysis method and system based on deep learning

Similar Documents

Publication Publication Date Title
Meng et al. Aspect based sentiment analysis with feature enhanced attention CNN-BiLSTM
CN111444343B (en) Cross-border national culture text classification method based on knowledge representation
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN111738007B (en) Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network
CN110020438A (en) Enterprise or tissue Chinese entity disambiguation method and device based on recognition sequence
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN110457585B (en) Negative text pushing method, device and system and computer equipment
Mozafari et al. BAS: an answer selection method using BERT language model
CN117236338B (en) Named entity recognition model of dense entity text and training method thereof
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN114676255A (en) Text processing method, device, equipment, storage medium and computer program product
Yu et al. Learning DALTS for cross‐modal retrieval
Wang et al. Information-enhanced hierarchical self-attention network for multiturn dialog generation
CN115455144A (en) Data enhancement method of completion type space filling type for small sample intention recognition
CN114282592A (en) Deep learning-based industry text matching model method and device
CN113901224A (en) Knowledge distillation-based secret-related text recognition model training method, system and device
CN114330483A (en) Data processing method, model training method, device, equipment and storage medium
Wang et al. Deep Semantics Sorting of Voice-Interaction-Enabled Industrial Control System
Long et al. Cross-domain personalized image captioning
Zhang A study on the intelligent translation model for English incorporating neural network migration learning
CN116663539A (en) Chinese entity and relationship joint extraction method and system based on Roberta and pointer network
CN116595189A (en) Zero sample relation triplet extraction method and system based on two stages
CN112613316B (en) Method and system for generating ancient Chinese labeling model
CN114238649A (en) Common sense concept enhanced language model pre-training method
CN111428499B (en) Idiom compression representation method for automatic question-answering system by fusing similar meaning word information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination