CN115455144A - Data enhancement method of completion type space filling type for small sample intention recognition - Google Patents


Info

Publication number
CN115455144A
CN115455144A (application number CN202211071426.7A)
Authority
CN
China
Prior art keywords: prototype, data enhancement, data, learning, samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211071426.7A
Other languages
Chinese (zh)
Inventor
陈洪辉
张鑫
蔡飞
江苗
郑建明
宋城宇
邵太华
郭昱普
王梦如
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202211071426.7A
Publication of CN115455144A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

One or more embodiments of the present application provide a completion-type (cloze) data enhancement method for small-sample intent recognition, which includes: a cloze-style data enhancement task is constructed on the basis of a pre-trained language model and used for intent recognition; an unsupervised learning method keeps the result of data enhancement semantically similar to the original input sentence; then, on top of metric-based classification, a supervised contrastive learning method pulls intent samples of the same category closer together in the embedding space and pushes intent samples of different categories farther apart, where the contrastive learning comprises prototype-level contrastive learning and instance-level contrastive learning. Meaningful data is generated without destroying syntactic structure or adding noise, the limited data is fully utilized, and separable embeddings are achieved. A better distance distribution in the embedding space is obtained, thereby improving the performance of the metric-based classification method.

Description

Completion-type (cloze) data enhancement method for small-sample intent recognition
Technical Field
The invention belongs to the technical field of intent recognition, and particularly relates to a completion-type (cloze) data enhancement method for small-sample intent recognition.
Background
Intent recognition aims at identifying the potential intent behind a user's utterance and is a key component of task-oriented dialog systems. A practical challenge, however, is that the number of intent categories grows faster than data can be manually annotated, so only a small amount of data is available for many new intent categories. This data scarcity causes traditional deep neural networks to overfit on the small training set, which seriously limits practical application. Some researchers have therefore proposed small-sample (few-shot) learning to address the data scarcity problem. One effective approach is text data enhancement, but it often generates noisy or meaningless data.
In real-world applications, new intent classes emerge rapidly and come with only limited labeled data, which makes it difficult to directly optimize existing deep neural networks on them. These networks typically use a pre-trained language model such as BERT or RoBERTa as their backbone to encode text into continuous low-dimensional vectors. Such models have complex multi-layer architectures and therefore a very large number of parameters. If, under the traditional training paradigm, the parameters of a deep neural network are updated directly with a small amount of training data, the model captures only local features, leading to insufficient generalization and overfitting: the model performs well on the training set but poorly on the test set. To address this, Snell et al. proposed a few-shot learning (FSL) strategy that helps a model acquire generalization ability from limited data alone. These researchers cast small-sample intent recognition as a meta-learning problem, which simulates the few-shot scenario through a series of small meta-tasks. The approach is widely applied to small-sample text classification tasks such as relation classification, event detection, and intent detection.
One major challenge is that meta-learning-based few-shot methods still easily overfit to a biased distribution because of the limited training samples. Some researchers have tried to prevent overfitting with data enhancement. One key idea is back-translation: translating the input text into another language and then back into the original. Another common approach uses an external knowledge base to obtain expressions semantically similar to the original sentence; in particular, rather than reordering words within sentences, Dopierre et al. introduced multiple knowledge bases to generate different paraphrases of the original input. However, although back-translation can generate different expressions of the same semantics, it works poorly on short text, where it tends to produce expressions similar or even identical to the original input sentence. As for paraphrase generation, we consider it unsuitable for text enhancement in all domains, because a corresponding external knowledge base cannot always be found. Moreover, previous data enhancement methods, such as that of Liu et al., are inefficient to train and difficult to scale to tasks with a large number of intents.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a completion-type (cloze) data enhancement method for small-sample intent recognition, so as to solve at least one of the above problems in the prior art.
In view of the above, one or more embodiments of the present application provide a completion-type (cloze) data enhancement method for small-sample intent recognition, which includes: constructing, on the basis of a pre-trained language model, a cloze-style data enhancement task for intent recognition; adopting an unsupervised learning method so that the result of the data enhancement is semantically similar to the original input sentence; and then adopting a supervised contrastive learning method on top of metric-based classification so that intent samples of the same category are closer to each other in the embedding space and intent samples of different categories are farther apart, where the contrastive learning comprises prototype-level contrastive learning and instance-level contrastive learning.
Based on the technical scheme of the invention, the following improvements can be made:
optionally, the unsupervised learning method includes: using the pre-trained language model as a feature extractor, covering a set proportion of the input words with mask tokens, and predicting the masked tokens from the semantics of the context of the input sentence. After two special tokens are added that respectively mark the beginning and the end of the sentence in the cloze-style task, the feature extractor encodes the sentence into hidden-layer vector representations. The process is expressed by the following formula:

$$\{h_{[\mathrm{CLS}]}, h_1, \dots, h_{[\mathrm{MASK}]}, \dots, h_{[\mathrm{SEP}]}\} = F(\mathrm{Pat}(T, x)) \tag{1}$$

where $F(\cdot)$ is the feature extractor, $T$ is the cloze template, $[\mathrm{MASK}]$ is the token covering a masked word, $\mathrm{Pat}(T, x)$ is the data enhancement pattern, $x$ is the input sentence, $h$ denotes a hidden-layer vector, and $[\mathrm{CLS}]$ and $[\mathrm{SEP}]$ respectively mark the beginning and the end. The hidden-layer vector $h_{[\mathrm{MASK}]}$ is regarded as the representation of the masked token, i.e., $h_{[\mathrm{MASK}]}$ is viewed as a sentence representation generated from the pattern $\mathrm{Pat}(T, x)$ that is semantically similar to the input sentence $x$. Repeating equation (1) over all input samples yields the corresponding data enhancement results.
Optionally, the pre-trained language model is constrained by a loss function to weaken vectors that do not match the semantics of the input sentence, finally obtaining a suitable data enhancement result.
Optionally, without introducing any external knowledge or labels, the input sentence $x$ is fed into the pre-trained language model to obtain its low-dimensional vector representation:

$$\{h^{x}_{[\mathrm{CLS}]}, h^{x}_1, \dots, h^{x}_{[\mathrm{SEP}]}\} = F(x) \tag{2}$$

where the hidden-layer vector $h^{x}_{[\mathrm{CLS}]}$ represents the whole sentence $x$. The following loss function

$$\mathcal{L}_{\mathrm{un}} = 1 - s\!\left(h_{[\mathrm{MASK}]},\, h^{x}_{[\mathrm{CLS}]}\right) \tag{3}$$

is used to reduce the distance between $h_{[\mathrm{MASK}]}$ and $h^{x}_{[\mathrm{CLS}]}$, where $s(\cdot,\cdot)$ is the similarity function also used for classification below.
Optionally, after the unsupervised cloze-style data enhancement is completed, a metric-based prototype network is used as the classifier to examine the effect of the data enhancement.
Optionally, the average representation of the samples in each category is first calculated and used as the prototype of that category:

$$c_i = \frac{1}{K_i} \sum_{k=1}^{K_i} h_i^k \tag{4}$$

where $c_i$ denotes the prototype representation of category $i$, $K_i$ denotes the number of samples of category $i$ in the support set of the current meta-task $\mathcal{T}$, and $h_i^k$ is the representation of the $k$-th sentence in category $i$. With this prototype representation, the average distance from the samples of a class to their center is the shortest. Likewise, applying the prototype calculation formula (4) to the enhanced representations $h_{[\mathrm{MASK}]}$ yields an enhanced prototype $c'_i$.
Optionally, in order for the final prototype to more fully cover the common features of its class, the prototype of the input samples and the prototype of the data enhancement results are weighted, as follows:

$$\tilde{c}_i = \alpha c_i + (1 - \alpha)\, c'_i \tag{5}$$

where $\alpha$ is a weighting coefficient controlling the respective contributions of the original input data and the enhanced data. Given a score function $s(\cdot,\cdot)$, the prototype network predicts the label of a query instance $x_Q$ from the current meta-task $\mathcal{T}$ by computing a softmax distribution over the similarities between the query embedding vector and the prototypes:

$$p(y = j \mid x_Q) = \frac{\exp\left(s(h_{x_Q}, \tilde{c}_j)\right)}{\sum_{j'} \exp\left(s(h_{x_Q}, \tilde{c}_{j'})\right)} \tag{6}$$

where $y$ is the predicted label, $j$ is the true label, and $\tilde{c}_j$ denotes the final prototype of category $j$ based on the initial and enhanced data; cosine similarity is chosen as $s(\cdot,\cdot)$. Learning proceeds by minimizing the negative log-probability:

$$\mathcal{L}_{\mathrm{cls}} = -\log p(y = j \mid x_Q) \tag{7}$$
Optionally, a contrastive learning loss is introduced at the prototype level to separate the prototypes of different classes as much as possible and make the average representations of different classes distant from each other, expressed by the following formula:

$$\mathcal{L}_{\mathrm{PC}} = -\sum_{i} \log \frac{\exp\left(s(c_i, c_i)\right)}{\sum_{j} \exp\left(s(c_i, c_j)\right)} \tag{8}$$

where $s(\cdot,\cdot)$ is the same similarity measure function as used for classification. Since $s(c_i, c_i)$ is the constant 1, equation (8) reduces to the following form:

$$\mathcal{L}_{\mathrm{PC}} = \sum_{i} \log \frac{\sum_{j} \exp\left(s(c_i, c_j)\right)}{e} \tag{9}$$

where $e$ is a constant and $\mathcal{L}_{\mathrm{PC}}$ is the prototype-level contrastive loss.
Optionally, on the basis of the prototype-level contrastive learning, instance-level contrastive learning is introduced so that instances of the same category come close to each other, expressed by the following formula:

$$\mathcal{L}_{\mathrm{IC}} = -\sum_{i} \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp\left(s(h_i, h_p)\right)}{\sum_{a \neq i} \exp\left(s(h_i, h_a)\right)} \tag{10}$$

where $h_i$ is the vector representation of sample $i$ and $P(i)$ denotes the positive set of sample $i$, which includes the vector representations of the original corpus and the enhanced vector representations belonging to the same category as sample $i$. Minimizing the loss $\mathcal{L}_{\mathrm{IC}}$ increases the similarity between sample vector representations of the same category and decreases the similarity between sample vectors of different categories.
The beneficial effect of the invention is to provide a completion-type (cloze) data enhancement method for small-sample intent recognition, and in particular a data enhancement method that suits short text and requires no knowledge-base participation. We regard the pre-trained language model itself as a knowledge base, because it has been trained on a large text corpus and can therefore perform some simple tasks. A cloze task similar in form to the pre-training task is constructed for data enhancement, so that the knowledge of the pre-trained language model is fully exploited. The hidden-state vector of the "[MASK]" token restored by the model is used as the data enhancement result of the input sentence, rather than a real sentence composed of a sequence of words. Furthermore, so that the data enhancement results are not meaningless noise, an unsupervised learning method keeps them semantically similar to the original input sentence. Thereafter, to make maximal use of the small number of samples in each meta-task, a supervised contrastive learning strategy is adopted, with samples of the same class closer to each other in the embedding space and samples of different classes farther apart. Meaningful data is generated without breaking syntactic structure or adding noise, the limited data is fully utilized, and separable embeddings are achieved. A better distance distribution in the embedding space is obtained, thereby improving the performance of the metric-based classification method.
Drawings
Fig. 1 is an overall model framework diagram of the completion-type (cloze) data enhancement method for small-sample intent recognition according to an embodiment of the present invention.
FIG. 2 illustrates the performance of different templates of the completion-type (cloze) data enhancement method for small-sample intent recognition on the "5-way1-shot" and "5-way5-shot" meta-tasks over the CLINC-150 and BANKING-77 datasets, according to an embodiment of the present invention.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present application shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the present application does not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
Referring to fig. 1 and fig. 2, a completion-type (cloze) data enhancement method for small-sample intent recognition according to one or more embodiments of the present application includes: constructing, on the basis of a pre-trained language model, a cloze-style data enhancement task for intent recognition; using an unsupervised learning method to make the data enhancement result semantically similar to the original input sentence; and then using a supervised contrastive learning method for metric-based classification, so that intent samples of the same category are closer to each other in the embedding space and intent samples of different categories are farther apart, where the contrastive learning includes prototype-level contrastive learning and instance-level contrastive learning.
As an alternative embodiment, the unsupervised learning method includes: using the pre-trained language model as a feature extractor, covering a set proportion of the input words with mask tokens, and predicting the masked tokens from the semantics of the context of the input sentence. After two special tokens are added that respectively mark the beginning and the end of the sentence in the cloze-style task, the feature extractor encodes the sentence into hidden-layer vector representations. The process is expressed by the following formula:

$$\{h_{[\mathrm{CLS}]}, h_1, \dots, h_{[\mathrm{MASK}]}, \dots, h_{[\mathrm{SEP}]}\} = F(\mathrm{Pat}(T, x)) \tag{1}$$

where $F(\cdot)$ is the feature extractor, $T$ is the cloze template, $[\mathrm{MASK}]$ is the token covering a masked word, $\mathrm{Pat}(T, x)$ is the data enhancement pattern, $x$ is the input sentence, $h$ denotes a hidden-layer vector, and $[\mathrm{CLS}]$ and $[\mathrm{SEP}]$ respectively mark the beginning and the end. The hidden-layer vector $h_{[\mathrm{MASK}]}$ is regarded as the representation of the masked token, i.e., $h_{[\mathrm{MASK}]}$ is viewed as a sentence representation generated from the pattern $\mathrm{Pat}(T, x)$ that is semantically similar to the input sentence $x$. Repeating equation (1) over all input samples yields the corresponding data enhancement results.
An auxiliary cloze template $T$ is introduced to construct the data enhancement pattern $\mathrm{Pat}$, as follows:

T = The sentence: '__' means [MASK].
Pat(T, x) = The sentence: 'x' means [MASK].
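The pattern construction above can be sketched in a few lines of Python; the function name `build_pattern` is an illustrative assumption, not part of the claimed method:

```python
def build_pattern(x: str) -> str:
    """Wrap an input sentence x into the cloze pattern Pat(T, x).

    The auxiliary template T from the description is:
        The sentence: '__' means [MASK].
    The blank '__' is filled with the original utterance, and the
    [MASK] slot is left for the language model to reconstruct.
    """
    return f"The sentence: '{x}' means [MASK]."


# Example: build_pattern("book a flight to Boston")
# -> "The sentence: 'book a flight to Boston' means [MASK]."
```

The resulting string is what the feature extractor $F(\cdot)$ encodes in equation (1).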
as an alternative embodiment, the pre-trained language model may not always be able to generate a vector that exactly matches the semantics of the input sentence. Therefore, it is necessary to design a method to constrain the model to attenuate this mismatch and ultimately achieve the proper data enhancement results. Under the condition of not introducing any external knowledge and labels, an unsupervised learning method is designed, and the semantic understanding capability of the model is utilized to enable the model to generate a proper result as far as possible. And constraining the pre-training language model through a loss function to weaken the vector which is not matched with the semantics of the input sentence, and finally obtaining a proper data enhancement result.
Without introducing any external knowledge or labels, the input sentence $x$ is fed into the pre-trained language model to obtain its low-dimensional vector representation:

$$\{h^{x}_{[\mathrm{CLS}]}, h^{x}_1, \dots, h^{x}_{[\mathrm{SEP}]}\} = F(x) \tag{2}$$

where the hidden-layer vector $h^{x}_{[\mathrm{CLS}]}$ represents the whole sentence $x$. The following loss function

$$\mathcal{L}_{\mathrm{un}} = 1 - s\!\left(h_{[\mathrm{MASK}]},\, h^{x}_{[\mathrm{CLS}]}\right) \tag{3}$$

is used to reduce the distance between $h_{[\mathrm{MASK}]}$ and $h^{x}_{[\mathrm{CLS}]}$, where $s(\cdot,\cdot)$ is the similarity function also used for classification below.
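One concrete reading of this alignment loss, assuming cosine distance as the distance measure (the original publication shows the loss only as an image, so the exact form here is an assumption), is:

```python
import numpy as np


def unsupervised_alignment_loss(h_mask: np.ndarray, h_cls: np.ndarray) -> float:
    """Loss pulling the [MASK] vector from Pat(T, x) toward the [CLS]
    vector of the plain sentence x.

    One possible form of Eq. (3): cosine distance 1 - cos(h_mask, h_cls).
    Identical directions give loss 0; orthogonal vectors give loss 1.
    """
    cos = float(h_mask @ h_cls / (np.linalg.norm(h_mask) * np.linalg.norm(h_cls)))
    return 1.0 - cos
```

Minimizing this value over all training sentences pulls each enhanced representation toward the semantics of its source sentence.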
As an alternative embodiment, after the unsupervised cloze-style data enhancement, a metric-based prototype network is used as the classifier to examine the effect of the data enhancement. The prototype network first computes the average representation of the samples in each class as the prototype of that class:

$$c_i = \frac{1}{K_i} \sum_{k=1}^{K_i} h_i^k \tag{4}$$

where $c_i$ denotes the prototype representation of class $i$, $K_i$ denotes the number of samples of class $i$ in the support set of the current meta-task $\mathcal{T}$, and $h_i^k$ is the representation of the $k$-th sentence in class $i$. With this prototype representation, the average distance from the samples of a class to their center is the shortest. Likewise, applying the prototype calculation formula (4) to the enhanced representations $h_{[\mathrm{MASK}]}$ yields an enhanced prototype $c'_i$.
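The prototype computation of formula (4) amounts to a class-wise mean over support embeddings; a minimal sketch, with illustrative names:

```python
import numpy as np


def class_prototype(embeddings: np.ndarray) -> np.ndarray:
    """Eq. (4): the prototype c_i is the mean of the K_i support
    embeddings of class i (rows of `embeddings`).  Applying the same
    formula to the enhanced [MASK] vectors yields c'_i."""
    return embeddings.mean(axis=0)
```

The same function serves for both the original and the enhanced representations; only the input rows differ.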
Optionally, in order for the final prototype to more fully cover the common features of its class, the prototype of the input samples and the prototype of the data enhancement results are weighted, as follows:

$$\tilde{c}_i = \alpha c_i + (1 - \alpha)\, c'_i \tag{5}$$

where $\alpha$ is a weighting coefficient controlling the respective contributions of the original input data and the enhanced data. Given a score function $s(\cdot,\cdot)$, the prototype network predicts the label of a query instance $x_Q$ from the current meta-task $\mathcal{T}$ by computing a softmax distribution over the similarities between the query embedding vector and the prototypes:

$$p(y = j \mid x_Q) = \frac{\exp\left(s(h_{x_Q}, \tilde{c}_j)\right)}{\sum_{j'} \exp\left(s(h_{x_Q}, \tilde{c}_{j'})\right)} \tag{6}$$

where $y$ is the predicted label, $j$ is the true label, and $\tilde{c}_j$ denotes the final prototype of class $j$ based on the initial and enhanced data; cosine similarity is chosen as $s(\cdot,\cdot)$. Learning proceeds by minimizing the negative log-probability:

$$\mathcal{L}_{\mathrm{cls}} = -\log p(y = j \mid x_Q) \tag{7}$$

Since the prototype network predicts labels by measuring the distance between query instances and the prototypes, a proper distance distribution is crucial to improving intent recognition performance.
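The weighted prototype and the softmax prediction can be sketched as follows; the linear blend with coefficient alpha and the function names are assumptions consistent with the description (the original formulas appear only as images):

```python
import numpy as np


def weighted_prototype(c: np.ndarray, c_aug: np.ndarray, alpha: float) -> np.ndarray:
    """Eq. (5): blend the original-data prototype c_i with the
    enhanced-data prototype c'_i using weighting coefficient alpha.
    The convex-combination form is an assumption based on the text."""
    return alpha * c + (1.0 - alpha) * c_aug


def predict(query: np.ndarray, prototypes: np.ndarray) -> int:
    """Eq. (6): softmax over cosine similarity s(., .) between the query
    embedding and each final prototype; the argmax is the predicted label."""
    q = query / np.linalg.norm(query)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = p @ q                                  # s(h_xQ, c~_j) for each class j
    probs = np.exp(sims) / np.exp(sims).sum()     # softmax distribution
    return int(np.argmax(probs))
```

Training would minimize the negative log of the probability assigned to the true class, as in equation (7).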
As an alternative embodiment, considering that the prototype is computed from all samples of the corresponding class in the current meta-task, it can represent the common features of the samples in that class. Moreover, since the prototype network is metric-based, one intuitive way to improve classification accuracy is to increase the distance between prototypes of different classes in the embedding space.

Thus, a contrastive learning loss is introduced at the prototype level to separate the prototypes of different classes as much as possible. Specifically, the goal is to make the similarity between prototype embeddings of different classes as small as possible, which can be expressed as:

$$\mathcal{L}_{\mathrm{PC}} = -\sum_{i} \log \frac{\exp\left(s(c_i, c_i)\right)}{\sum_{j} \exp\left(s(c_i, c_j)\right)} \tag{8}$$

where $s(\cdot,\cdot)$ is the same similarity measure function as above. Since $s(c_i, c_i)$ is the constant 1, equation (8) reduces to the following form:

$$\mathcal{L}_{\mathrm{PC}} = \sum_{i} \log \frac{\sum_{j} \exp\left(s(c_i, c_j)\right)}{e} \tag{9}$$

where $e$ is a constant and $\mathcal{L}_{\mathrm{PC}}$ is the prototype-level contrastive loss. It is expected that prototypes of different classes can thereby be located far from each other. However, contrastive learning performed directly at the prototype level can only push the average representations of different classes apart; it does not guarantee that samples of the same class are close to each other, so the improvement in recognition accuracy is insufficient.
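Using cosine similarity, the simplified prototype-level loss of formula (9) can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np


def prototype_contrastive_loss(prototypes: np.ndarray) -> float:
    """Simplified prototype-level loss in the form of Eq. (9).

    With cosine similarity, s(c_i, c_i) = 1, so the per-class term
        -log( exp(s(c_i, c_i)) / sum_j exp(s(c_i, c_j)) )
    reduces to log( sum_j exp(s(c_i, c_j)) ) - 1, summed over classes i.
    Well-separated prototypes give a smaller value.
    """
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = p @ p.T                       # s(c_i, c_j) for all prototype pairs
    return float(np.sum(np.log(np.exp(sims).sum(axis=1)) - 1.0))
```

For example, two orthogonal prototypes yield a smaller loss than two coincident ones, matching the goal of pushing class centers apart.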
As an alternative embodiment, to further improve intent recognition performance, instance-level contrastive learning is introduced. This strategy not only keeps instances of different classes far from each other but also brings instances of the same class close to each other. The specific formula is:

$$\mathcal{L}_{\mathrm{IC}} = -\sum_{i} \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp\left(s(h_i, h_p)\right)}{\sum_{a \neq i} \exp\left(s(h_i, h_a)\right)} \tag{10}$$

where $h_i$ is the vector representation of sample $i$ and $P(i)$ denotes the positive set of sample $i$, which includes the vector representations of the original corpus and the enhanced vector representations belonging to the same class as sample $i$. Minimizing the loss $\mathcal{L}_{\mathrm{IC}}$ increases the similarity between sample vector representations of the same class and decreases the similarity between sample vectors of different classes.
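A sketch of instance-level contrastive learning in the spirit of formula (10), following the standard supervised-contrastive form as an assumption (the exact normalization in the original publication is shown only as an image):

```python
import numpy as np


def instance_contrastive_loss(embs: np.ndarray, labels) -> float:
    """Supervised instance-level contrastive loss in the spirit of Eq. (10).

    `embs` stacks the original and enhanced vectors row-wise; the positive
    set P(i) of anchor i is every other sample sharing its label, whether
    it came from the original corpus or from data enhancement.
    """
    n = len(embs)
    z = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = z @ z.T                                   # cosine similarities
    loss = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        denom = sum(np.exp(sims[i, j]) for j in range(n) if j != i)
        for j in pos:                                # average over P(i)
            loss += -np.log(np.exp(sims[i, j]) / denom) / len(pos)
    return float(loss / n)
```

Embeddings clustered by class score a lower loss than embeddings where classes are mixed, which is exactly the gradient signal that pulls same-class samples together.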
In order to verify the technical effects of the embodiments, the following experiments were designed:
experimental setup
Two published intent recognition datasets were used to evaluate our model and the baseline models under discussion: CLINC-150 and BANKING-77. CLINC-150 consists of 150 intent categories from 10 areas of daily life, with 150 samples each. In addition, some sentences in the dataset are labeled "out of range" and are regarded as noise from multiple unknown classes. To accurately test the performance of the discussed models, we deleted the samples labeled "out of range" and trained and tested only with the well-labeled samples. BANKING-77 is a single-domain intent recognition dataset containing 13,083 samples across 77 classes in the banking domain. Table 1 provides statistics for CLINC-150 and BANKING-77.
TABLE 1 statistics of data sets CLINC-150 and BANKING-77
[Table 1 is rendered as an image in the original publication; its values are not recoverable from this text extraction.]
Baseline models
The validity of our proposed model was verified by comparison with the following baseline models:
Prototypical Networks (ProtoNet): a metric-based few-shot classification model that measures the similarity of samples by their distance in the embedding space. It takes the label of the prototype closest to the query sample as the prediction for the query's class.
GCN: a graph convolutional network approach to few-shot classification that treats few-shot learning as a supervised information-propagation task and can be trained end-to-end.
Matching Networks (MatchNet): a few-shot classification framework that trains a network to map a small labeled support set and unlabeled instances to their labels, avoiding reliance on fine-tuning to adapt to new classes.
Problem setting
RQ1: can our proposed method be better at the intent recognition task than the baseline model with these competencies?
RQ2: which module of our proposed CDA plays a greater role in boosting recognition accuracy?
RQ3: what are the different templates' effects on model performance?
Model setting
Following common practice in small-sample learning experiments, we discuss meta-tasks with two different sample counts, "5-way1-shot" and "5-way5-shot". For all models under discussion, we apply the same feature extractor (i.e., bert-base-uncased) to encode the input sentences, ensuring fairness in the performance comparison.
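Construction of such an N-way K-shot meta-task can be sketched as follows; the data format (a dict mapping each intent label to its utterances) and all names are illustrative assumptions:

```python
import random


def sample_episode(data, n_way=5, k_shot=1, n_query=1, seed=None):
    """Sample one N-way K-shot meta-task (episode).

    Picks n_way intent classes, then k_shot support utterances and
    n_query query utterances per class, without overlap.  Returns two
    lists of (utterance, label) pairs.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(data), n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(data[c], k_shot + n_query)
        support += [(s, c) for s in picked[:k_shot]]
        query += [(q, c) for q in picked[k_shot:]]
    return support, query
```

A "5-way1-shot" episode thus contains five support utterances (one per class), with separate query utterances used to score the classifier.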
For RQ1, intent recognition capability on the two types of meta-tasks was examined on CLINC-150 and BANKING-77. The overall intent recognition performance of all discussed models is shown in Table 2.
Table 2: Accuracy (%) on the test sets, with 95% confidence intervals, for both types of meta-tasks. The best result in each column is shown in bold; the best baseline result is underlined.
[Table 2 is rendered as an image in the original publication; its values are not recoverable from this text extraction.]
First, we can find that all models perform meta-tasks better with a large number of samples of a single class, regardless of which data set. This is because as the number of samples of a single category increases, the total number of samples available for the model also increases, and the common features obtained from the samples more closely approach the true common features.
We then focused on analyzing the performance of the baseline. We can see that MatchNet achieves the highest accuracy in the "5-way1-shot" meta-task on both datasets, while ProtoNet achieves the best performance in the "5-way5-shot" meta-task on both datasets. The advantage of MatchNet on the task of the 1-shot element can be explained in that the performance of the model can be well improved by the temporary similarity matching calculation. For ProtoNet, it is advantageous in the "5-shot" meta task because it can fuse features of instances in the same class to gain their commonality.
Next, we focus on discussing the performance of our proposed model. Comparing the baseline model to the CDA model, we can see that the CDA-PC and CDA-IC perform almost exclusively over all the baseline models discussed, on data set CLINC-150. However, on the BANKING-77 dataset, CDA-PC performed less well on the "5-way1-shot" and "5-way5-shot" meta-tasks than MatchNet and ProtoNet, respectively. This is because samples of the same class in the CLINC-150 dataset are short sentences, more similar than the samples in BANKING-77. Thus, the result of the data enhancement is very close to the initially input sentence, which helps the models to obtain their common features. Furthermore, since the BANKING-77 dataset is more specialized than CLINC-150, the pre-trained language model has less relevant knowledge than CLINC-150. If the enhanced samples are directly used to calculate the class prototypes, noise is introduced, which weakens the characteristics of the classes themselves and reduces the recognition performance.
To address these shortcomings of CDA-PC, CDA-IC uses an instance-level contrastive learning strategy to improve few-shot intent recognition. Its advantage can be explained as follows: the instance-level contrastive strategy treats both the original data and the corresponding augmented data within a class as positive examples. In this way, each sample interacts with more data, which shortens the distance between original inputs of the same class in the embedding space and keeps the augmented data semantically close to same-class originals.
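The instance-level contrastive strategy described above can be sketched as a supervised contrastive loss over original and augmented embeddings (a simplified pure-Python illustration assuming cosine similarity and a temperature of 0.1; the toy vectors and labels are invented for the example):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def instance_contrastive_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss: for each anchor, positives are all other
    samples (original or augmented) sharing its label; all remaining samples
    act as negatives in the denominator."""
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        denom = sum(math.exp(cosine(embeddings[i], embeddings[k]) / tau)
                    for k in range(n) if k != i)
        for j in positives:
            sim = math.exp(cosine(embeddings[i], embeddings[j]) / tau)
            total += -math.log(sim / denom)
            count += 1
    return total / count

# Originals and their cloze-augmented copies share a class label
embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["a", "a", "b", "b"]
loss = instance_contrastive_loss(embs, labels)
```

Minimizing this loss pulls same-class (original and augmented) embeddings together while pushing different-class embeddings apart, which is the behavior the paragraph above attributes to CDA-IC.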
On the CLINC-150 dataset, CDA-IC improves over the best baseline model by 4.36% on the "5-way 1-shot" meta-task and by 4.91% on the "5-way 5-shot" meta-task. On the BANKING-77 dataset, accuracy improves by 1.69% on the "5-way 1-shot" meta-task and by 1.86% on the "5-way 5-shot" meta-task.
Ablation experiment
To answer the RQ2 question, we analyze the importance of the different modules in our CDA-IC model by removing, in turn, two core components of CDA-IC: the instance-level contrastive learning module and the unsupervised learning module. The results of the ablation experiments are shown in Table 3:
TABLE 3: Ablation experiments for the "5-way 1-shot" and "5-way 5-shot" meta-tasks performed by CDA-IC on the CLINC-150 and BANKING-77 datasets; in each column, the removed module causing the largest performance drop is marked. [Table image not reproduced]
Clearly, removing any part of CDA-IC degrades performance, indicating that both the unsupervised learning module and the instance-level contrastive learning module play significant roles in improving few-shot intent recognition. In particular, on both types of meta-task and regardless of the dataset, removing the instance-level contrastive learning module causes the most severe degradation. For example, on the CLINC-150 dataset, the CDA-IC model without the instance-level contrastive learning module drops by 3.63% and 3.82% on the "5-way 1-shot" and "5-way 5-shot" meta-tasks, respectively. On the BANKING-77 dataset, the corresponding drops are 4.16% and 4.44%.
Furthermore, it is worth noting that each module makes its own unique contribution. Specifically, removing the unsupervised learning module hurts the "5-way 1-shot" meta-task more than the "5-way 5-shot" meta-task, showing that when features are scarce the unsupervised learning module has a more pronounced effect and contributes more to few-shot intent recognition. Conversely, the instance-level contrastive learning module matters more on the "5-way 5-shot" meta-task than on the "5-way 1-shot" meta-task. This can be explained by the fact that, in the 5-shot case, the bottleneck limiting performance is no longer a lack of features but the mining of the commonality within a class and the uniqueness across classes. The instance-level contrastive learning module shortens the distance between same-class samples in the embedding space and increases the distance between vector representations of different classes, i.e., it mines the commonality of the same class and the uniqueness of different classes.
Influence of different templates
To answer the RQ3 question, we design three different templates and apply them to the data enhancement pattern. All discussed templates are shown in Table 4.
Table 4: Three templates for complete gap-filling data enhancement. [Table image not reproduced]
Since our proposed model is based on a pre-trained language model, it requires templates to generate semantically similar data. Because different templates use different words and punctuation, i.e., different tokens, the semantic vectors produced by the pre-trained language model also differ, as shown in FIG. 2.
FIG. 2 shows the performance of different templates in the "5-way1-shot" and "5-way5-shot" meta-tasks on the CLINC-150 and BANKING-77 datasets.
FIG. 2 clearly shows that different templates do cause significant changes in model performance. In particular, on the "5-way 1-shot" meta-task performed on the CLINC-150 dataset, the performance gap between templates approaches 1%. Moreover, the gap can reach 1.3% on the "5-way 5-shot" meta-task performed on the BANKING-77 dataset.
Looking at the overall trend, template length bears no direct relation to the effect of data augmentation. In particular, although template 2 is the shortest, it does not perform worst on the CLINC-150 dataset: its performance on the "5-way 1-shot" meta-task is very close to that of template 3, and better than that of template 1. Notably, template 3 performs best on all tasks on both CLINC-150 and BANKING-77. This can be explained by template 3 giving the clearest semantic guidance for the [MASK] token: when the original input sentence is filled into it, template 3 explicitly states that [MASK] represents the intent of the input sentence, so the generated semantic embedding vector is more directional.
In summary, template design has a significant impact on data enhancement performance. A good template provides appropriate semantic guidance and effectively improves data enhancement.
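The pattern construction Pat(T, x) underlying this template comparison amounts to splicing the input sentence into a template that carries a [MASK] slot; a minimal sketch (the template wordings below are illustrative stand-ins, since the exact phrasings of Table 4 are in an image that is not reproduced here):

```python
# Three hypothetical cloze templates; [MASK] stands for the intent-label slot.
TEMPLATES = {
    1: '"{x}" It was [MASK].',
    2: '"{x}" [MASK].',
    3: 'The intent of "{x}" is [MASK].',  # explicit semantic guidance for [MASK]
}

def pat(template_id: int, x: str) -> str:
    """Build the data-enhancement pattern Pat(T, x) by filling sentence x into template T."""
    return TEMPLATES[template_id].format(x=x)

print(pat(3, "how do I check my balance"))
# prints: The intent of "how do I check my balance" is [MASK].
```

A masked language model then encodes the resulting string, and the hidden vector at the [MASK] position serves as the augmented sentence representation.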
We propose a complete gap-filling data enhancement (CDA) model for few-shot intent recognition. Inspired by the pre-training tasks of language models, we design a template-based unsupervised data enhancement strategy intended to produce meaningful data without breaking syntactic structure or introducing noise. Furthermore, to make full use of the limited data and obtain separable embeddings, we apply contrastive learning between the original data and the enhanced data; each sample can thus interact with samples of all remaining classes, separating the embeddings of different classes in the embedding space. Experiments performed on the CLINC-150 and BANKING-77 datasets demonstrate effectiveness over all discussed baselines. Furthermore, ablation studies indicate that the contrastive module is the most important component of the whole model.
The skilled person will appreciate that the apparatus described above may also comprise only the components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and many other variations of different aspects of one or more embodiments of the present application exist as described above, which are not provided in detail for the sake of brevity.
It is intended that the one or more embodiments of the present application embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (9)

1. A complete gap-filling data enhancement method for small-sample intent recognition, characterized by comprising the following steps: constructing a complete gap-filling data enhancement task based on a pre-trained language model for intent recognition; adopting an unsupervised learning method so that the result of the data enhancement is semantically similar to the original input sentence; and then, based on metric classification, adopting a supervised contrastive learning method so that intent samples of the same class are closer to each other in the embedding space and intent samples of different classes are farther apart, the contrastive learning method comprising prototype-level contrastive learning and instance-level contrastive learning.
2. The method of claim 1, wherein the unsupervised learning method comprises: using the pre-trained language model as a feature extractor, masking a set proportion of the input words with a mark, and predicting the masked mark from the semantics of the context of the input sentence; after adding two special marks, which respectively denote the beginning and the end of a sentence in the complete gap-filling intent recognition task, the feature extractor encodes the sentence into hidden-layer vector representations, the process being expressed by the formula:

$$\{h_{[CLS]}, h_1, \ldots, h_{[MASK]}, \ldots, h_{[SEP]}\} = F(\mathrm{Pat}(T, x)) \tag{1}$$

wherein $F(\cdot)$ is the feature extractor, $T$ is the complete gap-filling template, $[MASK]$ is the mark of the masked word, $\mathrm{Pat}(T, x)$ is the data enhancement pattern, $x$ is the input sentence, $h$ denotes the hidden-layer vectors, and $[CLS]$ and $[SEP]$ denote the beginning and the end, respectively; the hidden-layer vector $h_{[MASK]}$ is taken as the representation of the masked word $[MASK]$ and is regarded as a sentence representation generated from the pattern $\mathrm{Pat}(T, x)$ that is semantically similar to the input sentence $x$; repeating formula (1) over all input samples yields the corresponding data enhancement results.
3. The method of claim 2, wherein the pre-trained language model is constrained by a loss function to suppress vectors that do not match the semantics of the input sentence, finally obtaining an appropriate data enhancement result.
4. The method of claim 3, wherein the input sentence $x$ is fed into the pre-trained language model without introducing any external knowledge or labels, and a low-dimensional vector representation is obtained, expressed as:

$$h^{x}_{[CLS]} = F(x) \tag{2}$$

wherein the hidden-layer vector $h^{x}_{[CLS]}$ is taken as the representation of the whole sentence $x$; the loss function:

$$\mathcal{L}_{u} = 1 - s\big(h_{[MASK]},\, h^{x}_{[CLS]}\big) \tag{3}$$

is used to reduce the distance between $h_{[MASK]}$ and $h^{x}_{[CLS]}$, where $s(\cdot,\cdot)$ is a similarity measure.
5. The method of claim 4, wherein, after the unsupervised gap-filling data enhancement, a metric-based prototype network is used as a classifier to check the effect of the data enhancement.
6. The method of claim 5, wherein an average representation of the samples in the same class is first calculated and used as the prototype of that class:

$$c_i = \frac{1}{K_i} \sum_{k=1}^{K_i} h^{i}_{k} \tag{4}$$

wherein $c_i$ denotes the prototype representation of class $i$, $K_i$ denotes the number of samples of class $i$ in the support set $\mathcal{S}$ under the current meta-task $\mathcal{T}$, and $h^{i}_{k}$ is the representation of the $k$-th sentence in class $i$; with this prototype representation, the average distance from the samples of a class to its center is shortest; likewise, applying formula (4) to the enhanced representations $h'^{i}_{k}$ yields the enhanced prototype $c'_i$.
7. The method of claim 6, wherein the prototype of the input samples and the prototype obtained from the data enhancement are weighted so that the final prototype more fully covers the common features of its class, by the formula:

$$\bar{c}_i = \alpha\, c_i + (1 - \alpha)\, c'_i \tag{5}$$

wherein $\alpha$ is a weighting coefficient controlling the respective contributions of the original input data and the enhanced data; given a score function $s(\cdot,\cdot)$, the prototype network predicts the label of a query instance by computing a softmax distribution of the similarities between the query embedding vector and the prototypes:

$$p(y = j \mid x_q) = \frac{\exp\big(s(h_q, \bar{c}_j)\big)}{\sum_{j'} \exp\big(s(h_q, \bar{c}_{j'})\big)} \tag{6}$$

wherein $y$ is the predicted label, $x_q$ is a query instance in the query set $\mathcal{Q}$ of the current meta-task $\mathcal{T}$, $j$ is the true label, $\bar{c}_j$ denotes the final prototype of class $j$ based on the initial and enhanced data, and cosine similarity is selected as $s(\cdot,\cdot)$; learning proceeds by minimizing the negative log-probability:

$$\mathcal{L}_{p} = -\log p(y = j \mid x_q) \tag{7}$$
8. The method of claim 1, wherein the prototype-level contrastive learning introduces a contrastive-learning-based loss to separate the prototypes of different classes as far as possible, keeping the average representations of different classes apart from one another, expressed by the formula:

$$\mathcal{L}_{pc} = -\log \frac{\exp\big(s(c_i, c_i)\big)}{\sum_{j} \exp\big(s(c_i, c_j)\big)} \tag{8}$$

wherein $s(\cdot,\cdot)$ is the same similarity measure function as above; since $s(c_i, c_i)$ is the constant 1, the formula reduces to the following form:

$$\mathcal{L}_{pc} = -\log \frac{e}{\sum_{j} \exp\big(s(c_i, c_j)\big)} \tag{9}$$

wherein $e$ is a constant and $\mathcal{L}_{pc}$ is the prototype-level contrastive loss.
9. The method of claim 8, wherein instance-level contrastive learning is introduced on top of the prototype-level contrastive learning to bring instances of the same class close to one another, expressed by the formula:

$$\mathcal{L}_{ic} = -\log \frac{\sum_{h^{+}_{i}} \exp\big(s(h_i, h^{+}_{i})\big)}{\sum_{k \neq i} \exp\big(s(h_i, h_k)\big)} \tag{10}$$

wherein $h^{+}_{i}$ denotes the positive examples of $h_i$, comprising the vector representations of the original corpus and the enhanced vector representations belonging to the same class as the original corpus; by minimizing the loss $\mathcal{L}_{ic}$, the similarity between sample vector representations of the same class is increased and the similarity between sample vectors of different classes is reduced.
CN202211071426.7A 2022-09-02 2022-09-02 Data enhancement method of completion type space filling type for small sample intention recognition Pending CN115455144A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211071426.7A CN115455144A (en) 2022-09-02 2022-09-02 Data enhancement method of completion type space filling type for small sample intention recognition


Publications (1)

Publication Number Publication Date
CN115455144A true CN115455144A (en) 2022-12-09

Family

ID=84301651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211071426.7A Pending CN115455144A (en) 2022-09-02 2022-09-02 Data enhancement method of completion type space filling type for small sample intention recognition

Country Status (1)

Country Link
CN (1) CN115455144A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435738A (en) * 2023-12-19 2024-01-23 中国人民解放军国防科技大学 Text multi-intention analysis method and system based on deep learning
CN117435738B (en) * 2023-12-19 2024-04-16 中国人民解放军国防科技大学 Text multi-intention analysis method and system based on deep learning

Similar Documents

Publication Publication Date Title
Meng et al. Aspect based sentiment analysis with feature enhanced attention CNN-BiLSTM
CN111444343B (en) Cross-border national culture text classification method based on knowledge representation
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN111738007B (en) Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network
CN110020438A (en) Enterprise or tissue Chinese entity disambiguation method and device based on recognition sequence
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN110457585B (en) Negative text pushing method, device and system and computer equipment
Mozafari et al. BAS: an answer selection method using BERT language model
CN117236338B (en) Named entity recognition model of dense entity text and training method thereof
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN114676255A (en) Text processing method, device, equipment, storage medium and computer program product
Yu et al. Learning DALTS for cross‐modal retrieval
Wang et al. Information-enhanced hierarchical self-attention network for multiturn dialog generation
CN115455144A (en) Data enhancement method of completion type space filling type for small sample intention recognition
CN114282592A (en) Deep learning-based industry text matching model method and device
CN113901224A (en) Knowledge distillation-based secret-related text recognition model training method, system and device
CN114330483A (en) Data processing method, model training method, device, equipment and storage medium
Wang et al. Deep Semantics Sorting of Voice-Interaction-Enabled Industrial Control System
Long et al. Cross-domain personalized image captioning
Zhang A study on the intelligent translation model for English incorporating neural network migration learning
CN116663539A (en) Chinese entity and relationship joint extraction method and system based on Roberta and pointer network
CN116595189A (en) Zero sample relation triplet extraction method and system based on two stages
CN112613316B (en) Method and system for generating ancient Chinese labeling model
CN114238649A (en) Common sense concept enhanced language model pre-training method
CN111428499B (en) Idiom compression representation method for automatic question-answering system by fusing similar meaning word information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination