CN117114004A - Gated deviation correction-based few-sample two-stage named entity recognition method - Google Patents

Gated deviation correction-based few-sample two-stage named entity recognition method Download PDF

Info

Publication number
CN117114004A
Authority
CN
China
Prior art keywords
entity
span
representing
prototype
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311386316.4A
Other languages
Chinese (zh)
Other versions
CN117114004B (en)
Inventor
吕明翰
王明文
谢文
陈筱
罗文兵
黄琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University filed Critical Jiangxi Normal University
Priority to CN202311386316.4A priority Critical patent/CN117114004B/en
Publication of CN117114004A publication Critical patent/CN117114004A/en
Application granted granted Critical
Publication of CN117114004B publication Critical patent/CN117114004B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes


Abstract

The invention discloses a gated deviation correction-based few-sample two-stage named entity recognition method, which comprises the following steps: first, a label prompt is concatenated with the input text and fed into a span detection model to obtain all candidate entity spans; all entity spans are then fed into a span classification model, which uses a gating module to jointly generate a category prototype from the label prompt and the original prototype, and classifies the entity spans accordingly.

Description

Gated deviation correction-based few-sample two-stage named entity recognition method
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a few-sample two-stage named entity recognition method based on gated deviation correction.
Background
Named entity recognition is a fundamental task in natural language processing and is widely used in question answering, information retrieval, and other language-understanding applications; its goal is to identify entity spans in text and classify them into predefined categories such as person names, locations, organizations, and times. In recent years, deep learning has achieved remarkable success in named entity recognition, especially with pre-trained language models trained in a self-supervised manner; when sufficient annotated data is available, deep-learning-based methods can obtain impressive performance. In practical applications, however, entity categories unseen during training must be identified in new domains, and collecting additional annotated data for these new categories requires considerable time and effort and is therefore costly. Few-sample named entity recognition, which aims to identify entities from only a small amount of labeled data, has consequently attracted great attention from the research community. Researchers have proposed many approaches to this problem; one popular approach is the prototypical network, which is based on a meta-learning framework and metric learning. The model is first trained on a dataset containing a large amount of annotated general-domain data so that it generalizes to new domains; at test time in a new domain, a prototype is generated for each category from the small number of labeled examples of that category, and each query instance is then assigned a category by computing its distance to the prototypes.
However, recent prototypical-network-based algorithms are mainly end-to-end methods that must simultaneously learn a complex structure composed of span boundaries and entity types. When the domain gap is large, it is difficult to capture such complex structural information from only a few labeled examples, so span boundary information is learned inadequately, and spans from the general domain may be recognized in the new domain, i.e., the false-positive problem, which makes it difficult for the model to obtain satisfactory performance. Moreover, most existing prototypical-network algorithms obtain category prototypes merely by averaging the few given labeled examples in each category, which makes it difficult for a prototype to fully represent its category. Although some researchers have proposed optimizing the prototype representation with external information, these methods incorporate the external information only implicitly, constraining the learning of prototype representations through contrastive learning and attention mechanisms; such inadequate and weak implicit constraints are of limited effect when handling outlier samples.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a few-sample two-stage named entity recognition method based on gated deviation correction, which solves the problems described in the background art.
In order to achieve the above purpose, the present invention provides the following technical solution: a gated deviation correction-based few-sample two-stage named entity recognition method, comprising the following steps:
step S1: acquire a public few-sample named entity recognition dataset and construct label information from its entity categories; the few-sample named entity recognition dataset is divided into a training set and a test set, each consisting of a support set and a query set, and the support set and the query set consist of sentence texts and annotated true labels;
step S2: formalize the gated deviation correction-based few-sample two-stage named entity recognition task and construct a named entity recognition model comprising a span detection model and a span classification model; the span detection model is a serial structure formed by a feature encoder and a linear classification layer; the span classification model is a serial structure formed by a feature encoder and a gating module; the gating module consists of label gating and prototype gating;
step S3: concatenate the label information constructed in step S1 after the sentence texts of the support set in the training set of step S1, input the concatenated sentence texts into the feature encoder of the span detection model, and obtain the embedded feature vectors of all characters in the concatenated sentence texts;
step S4: input the embedded feature vectors obtained in step S3 into the linear classification layer of the span detection model, predict all entity spans in a sequence-labeling manner from the obtained embedded feature vectors, compute the span detection loss L_det through a cross-entropy loss function, and use the computed span detection loss L_det to optimize and update the parameters of the span detection model;
step S5: input the concatenated sentence texts of the support set in the training set from step S3 into the feature encoder of the span classification model, and acquire the embedded feature vectors of the entity spans in the sentence texts and of the label information concatenated after the sentence texts;
step S6: average the embedded feature vectors of the entity spans belonging to the same entity category in step S5 to obtain an original prototype representing that entity category, and input the original prototype and the embedded feature vectors of the label information concatenated after the support-set sentence texts into the gating module of the span classification model to correct the original prototype and obtain a corrected category prototype;
step S7: compute the distance between the embedded feature vectors of the entity spans of the query set in the training set and the corrected category prototypes obtained in step S6, assign a corresponding entity category to each query-set entity span according to the computed distance, compute the span classification loss L_cls through cross-entropy, and optimize and update the parameters of the span classification model;
step S8: concatenate label information after the query set in the test set, input it into the span detection model, and predict all entity spans to obtain their embedded feature vectors;
step S9: input the test set with concatenated label information into the feature encoder of the span classification model to obtain the embedded feature vectors of the entity spans predicted by the span detection model in the sentence texts of the query set and of the label information concatenated after the sentence texts; input all original prototypes obtained from the support set in the test set by the method of step S6, together with the embedded feature vectors of all obtained label information, into the gating module of the span classification model to obtain the deviation-corrected category prototypes; compute the distance between each entity span predicted by the span detection model and each category prototype, and assign the entity category of the closest category prototype to obtain the named entity set of the final query set.
Further, the specific process of step S1 includes: converting the constructed label information into a corresponding natural-language character set; the support set of the few-sample named entity recognition dataset denotes the small amount of annotated data used for training, and the query set denotes the data to be predicted.
Further, the specific process of formalizing the gated deviation correction-based few-sample two-stage named entity recognition task in step S2 is as follows:
step S2.1: define a training set D_train = {S_train, Q_train} for training the model, where S_train denotes the support set in the training set, which contains N entity categories with K samples each, and Q_train denotes the query set in the training set, whose entity categories are consistent with those of the support set; the support set and the query set each consist of sentence texts X = {x_1, x_2, …, x_n}, where n is the number of characters and x_i denotes the i-th character in the sentence text;
step S2.2: in the prediction stage, define a test set D_test = {S_test, Q_test} from a new domain, where S_test denotes the support set in the test set and Q_test denotes the query set in the test set; the model is trained on the training set D_train, and the support set S_test in the test set is then used to predict the query set Q_test in the test set;
step S2.3: define the span boundary prediction label set Y = {B, I, O} used by the span detection model; the span detection model assigns a label to each character of the input sentence text and obtains an entity span set S = {s_1, s_2, …} according to the labels, where B denotes the beginning of a multi-character span, I denotes the interior of a multi-character span, O denotes a non-entity span, s_i denotes the i-th entity span in the entity span set, and S denotes the set of entity spans in the sentence text;
step S2.4: define the entity category label set C = {c_1, c_2, …, c_N} used by the span classification model, where c_i denotes an entity category; the span classification model assigns an entity category c ∈ C to each entity span in the entity span set output by the span detection model.
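As a concrete illustration of the N-way K-shot episode structure defined in steps S2.1 and S2.2, the following sketch organizes a support set and a query set; the sentence texts, span offsets, and category names are hypothetical stand-ins, not taken from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """One few-shot episode: a support set with N entity categories,
    K labeled sentences per category, plus a query set to predict."""
    support: dict = field(default_factory=dict)  # category -> list of (text, spans)
    query: list = field(default_factory=list)    # sentence texts to predict

# A toy 2-way 1-shot episode (all contents are illustrative).
ep = Episode(
    support={
        "person":   [("Ada wrote programs", [(0, 3, "person")])],
        "location": [("Paris is calm",      [(0, 5, "location")])],
    },
    query=["Ada visited Paris"],
)

n_way = len(ep.support)                        # N: number of entity categories
k_shot = len(next(iter(ep.support.values())))  # K: samples per category
```

Training samples episodes like `ep` from the general domain; at test time the same structure comes from the new domain.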
Further, step S3 includes: converting the entity category label set corresponding to the sentence text X into a corresponding natural-language character set L = {l_1, l_2, …, l_N} and concatenating it after the sentence text X to obtain the concatenated sentence text X′ = [x_1, …, x_n, l_1, …, l_N]; the feature encoder in the span detection model consists of the pre-trained language model BERT, and the concatenated sentence text X′ is input into the pre-trained language model BERT to obtain the corresponding embedded feature vectors, with the specific calculation shown in the following formula:

[h_1, …, h_n, h_{l_1}, …, h_{l_N}] = BERT([x_1, …, x_n, l_1, …, l_N])

where x_1 denotes the 1st character in the sentence text and x_n the n-th character; h_1 and h_n denote the embedded feature vectors of characters x_1 and x_n obtained through the pre-trained language model BERT; l_1 denotes the 1st natural-language character concatenated after the sentence text and l_N the N-th; h_{l_1} and h_{l_N} denote the embedded feature vectors of the natural-language characters l_1 and l_N obtained through the pre-trained language model BERT.
Further, the specific process of computing the span detection loss in step S4 is as follows:
step S4.1: input the embedded feature vectors of all characters obtained in step S3 into the linear classification layer of the span detection model to compute the probability distribution of each character x_i over the label set Y = {B, I, O}, with the specific calculation shown in the following formula:

p(y_i | x_i) = softmax(W h_i + b)

where p(y_i | x_i) denotes the probability that character x_i takes each label in the label set Y; softmax denotes the normalization function; W denotes the weight matrix of the linear classification layer; b denotes the bias term of the linear classification layer; h_i denotes the embedded feature vector of the i-th character;
step S4.2: input the predicted probability distribution p(y_i | x_i) and the true label y_i* of character x_i into the cross-entropy loss function to compute the span detection loss L_det, with the specific calculation shown in the following formula:

L_det = − Σ_{i=1}^{n} log p(y_i = y_i* | x_i)

where L_det denotes the span detection loss and y_i* denotes the true label of character x_i.
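Steps S4.1 and S4.2 can be sketched in NumPy as follows; the character embeddings, weight matrix, and gold labels are random stand-ins (assumptions) for the BERT outputs and learned parameters, and only the softmax-plus-cross-entropy computation mirrors the description:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def span_detection_loss(H, W, b, true_labels):
    """H: (n, d) character embeddings; W: (d, 3) and b: (3,) linear layer;
    true_labels: (n,) indices into the label set Y = {B, I, O}.
    Returns the summed cross-entropy L_det over the n characters."""
    probs = softmax(H @ W + b)  # p(y_i | x_i), shape (n, 3)
    return -np.log(probs[np.arange(len(true_labels)), true_labels]).sum()

rng = np.random.default_rng(0)
n, d = 5, 8
H = rng.normal(size=(n, d))
W = rng.normal(size=(d, 3))
b = np.zeros(3)
labels = np.array([0, 1, 2, 2, 0])  # e.g. B I O O B
loss = span_detection_loss(H, W, b, labels)
```

In the patent the loss then drives a standard gradient update of the span detection model's parameters.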
Further, the specific process of step S6 is as follows:
step S6.1: compute the representation of an entity span by averaging the embedded feature vectors of all characters in the entity span obtained in step S5, with the specific calculation shown in the following formula:

h_s = (1 / (j − i + 1)) Σ_{k=i}^{j} h_k

where h_s denotes the representation of the entity span s consisting of characters x_i to x_j, and h_k denotes the embedded feature vector of the k-th character in the sentence text;
step S6.2: define the set S_c of all entity spans belonging to entity category c and compute the original prototype μ_c of entity category c, with the specific calculation shown in the following formula:

μ_c = (1 / |S_c|) Σ_{s ∈ S_c} h_s

where μ_c denotes the original prototype of entity category c and |S_c| denotes the number of all entity spans belonging to entity category c;
step S6.3: input the original prototype μ_c of entity category c and the embedded feature vector e_c of the label information of entity category c into the label gating to obtain the retained label information and the replaced label information, with the specific calculations shown in the following formulas:

g_l = σ(W_l [e_c; μ_c] + b_l)
r_l = g_l ⊙ e_c
u_l = (1 − g_l) ⊙ e_c

where e_c denotes the embedded feature vector of the natural-language characters corresponding to entity category c; μ_c denotes the original prototype of entity category c; W_l denotes the weight matrix of the label gating; b_l denotes the bias term of the label gating; σ denotes the sigmoid normalization function; g_l denotes the weight of the label information to be retained; r_l denotes the retained part of the label information; u_l denotes the part of the label information to be replaced;
step S6.4: input the label information to be replaced u_l and the original prototype μ_c into the prototype gating to control how much information of the original prototype is retained, with the specific calculations shown in the following formulas:

g_p = σ(W_p [u_l; μ_c] + b_p)
r_p = g_p ⊙ μ_c

where W_p denotes the weight matrix of the prototype gating and b_p its bias term; σ denotes the sigmoid normalization function; g_p denotes the weight of the original-prototype information to be retained; r_p denotes the retained part of the original prototype;
step S6.5: obtain the corrected category prototype by adding the retained part of the original prototype and the retained part of the label information, with the specific calculation shown in the following formula:

μ′_c = r_p + r_l

where μ′_c denotes the corrected category prototype of entity category c.
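The label-gating and prototype-gating computations of steps S6.3–S6.5 can be sketched in NumPy as follows; the concatenation [·; ·] fed to each gate and the elementwise combination are assumptions consistent with the description, and all weights and embeddings are random stand-ins:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrected_prototype(e_c, mu_c, W_l, b_l, W_p, b_p):
    """e_c: label-information embedding of category c; mu_c: original
    prototype (mean of span embeddings). Returns the corrected prototype."""
    g_l = sigmoid(np.concatenate([e_c, mu_c]) @ W_l + b_l)  # label gate
    r_l = g_l * e_c                # part of the label information to retain
    u_l = (1.0 - g_l) * e_c        # part of the label information to replace
    g_p = sigmoid(np.concatenate([u_l, mu_c]) @ W_p + b_p)  # prototype gate
    r_p = g_p * mu_c               # part of the original prototype to retain
    return r_p + r_l               # corrected category prototype

rng = np.random.default_rng(1)
d = 6
e_c, mu_c = rng.normal(size=d), rng.normal(size=d)
W_l, b_l = rng.normal(size=(2 * d, d)), np.zeros(d)
W_p, b_p = rng.normal(size=(2 * d, d)), np.zeros(d)
proto = corrected_prototype(e_c, mu_c, W_l, b_l, W_p, b_p)
```

Because both gates output values in (0, 1), the corrected prototype interpolates between the global label information and the locally averaged original prototype.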
Further, the specific process of step S7 is as follows:
step S7.1: obtain the probability that an entity span s belongs to entity category c by computing the distance between its representation h_s and the corrected category prototype μ′_c of entity category c, with the specific calculation shown in the following formula:

p(c | s) = exp(−d(h_s, μ′_c)) / Σ_{c′ ∈ C} exp(−d(h_s, μ′_{c′}))

where p(c | s) denotes the probability that the entity span belongs to entity category c, d denotes the distance function, and c′ ranges over the category prototypes of all entity categories in the entity category label set C;
step S7.2: input the probability p(c | s) that each entity span belongs to its entity category and its true label c* into the cross-entropy loss function to compute the span classification loss L_cls, with the specific calculation shown in the following formula:

L_cls = − Σ_{s ∈ S_train} log p(c* | s)

where L_cls denotes the span classification loss and S_train denotes the support set in the training set.
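Step S7.1's distance-based probability can be sketched as a softmax over negative distances to the corrected prototypes; Euclidean distance is an assumption here, since the patent leaves the distance function d abstract:

```python
import numpy as np

def classify_span(h_s, prototypes):
    """h_s: (d,) span embedding; prototypes: (N, d) corrected category
    prototypes. Returns p(c | s) over the N categories (step S7.1)."""
    dist = np.linalg.norm(prototypes - h_s, axis=1)  # d(h_s, mu'_c), assumed Euclidean
    logits = -dist
    e = np.exp(logits - logits.max())                # stable softmax
    return e / e.sum()

h_s = np.array([0.0, 0.0])
protos = np.array([[0.1, 0.0],    # category 0: very close to the span
                   [5.0, 5.0]])   # category 1: far away
p = classify_span(h_s, protos)
```

The closer a corrected prototype is to the span embedding, the larger its probability mass, which is what the cross-entropy loss of step S7.2 then optimizes.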
Further, the specific process of step S8 is as follows:
step S8.1: acquire the embedded feature vectors of the sentence texts of the query set Q_test in the test set with concatenated label information, and input them into the linear classification layer of the span detection model;
step S8.2: the linear classification layer of the span detection model predicts a corresponding span boundary prediction label from the set Y = {B, I, O} for each character of the input sentence text and decodes the labels; the span detection model decodes the span boundary prediction labels according to preset rules to obtain the entity spans.
Further, the specific process of obtaining the entity spans is as follows:
step S8.21: the entity spans are decoded character by character from left to right along the sentence text;
step S8.22: when a "B" in the span boundary prediction label set is identified, identification continues to the right; when an "O" or another "B" is identified, the sentence text from the "B" through the last "I" corresponds to one complete entity span;
step S8.23: the "O" label in the span boundary prediction label set denotes a non-entity span, is an invalid label, and is skipped during decoding.
Further, obtaining the named entity set in the query set of the final test set in step S9 specifically includes:
step S9.1: acquire the embedded feature vectors of the sentence texts of the query set Q_test in the test set with concatenated label information, and compute the corrected category prototypes μ′_c from the support set S_test according to the method of step S6;
step S9.2: compute the distance between the embedded feature vectors of the entity spans in the entity span set acquired in step S8 from the query set Q_test and the category prototypes μ′_c, obtain the probability p(c | s) of each entity category, and assign to each entity span in the entity span set the label of the entity category with the highest probability, with the specific calculation shown in the following formula:

ĉ = argmax_{c ∈ C} p(c | s)

where ĉ denotes the finally predicted entity category of the entity span and argmax denotes the arg-max function.
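Step S9.2's final assignment is an argmax over the category probabilities; a minimal sketch, with illustrative category names:

```python
import numpy as np

def assign_category(probs, categories):
    """Pick the entity category with the highest p(c | s) (step S9.2)."""
    return categories[int(np.argmax(probs))]

cats = ["person", "location", "organization"]  # illustrative label set
pred = assign_category(np.array([0.1, 0.7, 0.2]), cats)
```

Applying this to every span surviving the detection stage yields the named entity set of the final query set.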
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention decomposes the named entity recognition task into a span detection task and a span classification task, and each model performs only one task in each stage; this reduces task complexity, makes the model easier to learn in a few-sample scenario, and improves the model's performance in few-sample scenarios.
(2) The invention adds label information to the sentence text in the span detection stage, reducing the entities predicted by the model that do not belong to the new domain and alleviating the false-positive problem.
(3) The invention introduces a gating module in the span classification stage and explicitly uses the label information to correct the original prototype, so that the prototype contains both the global information of the label information and the local information of the original prototype; this strengthens the prototype's complete representation of the entity category and improves the classification accuracy of the model.
Drawings
FIG. 1 is a structural flow chart of the named entity recognition model of the present invention;
FIG. 2 is a structural flow chart of the gating module in the named entity recognition model of the present invention.
Detailed Description
Referring to FIGS. 1-2, the present invention provides the following technical solution: a gated deviation correction-based few-sample two-stage named entity recognition method, comprising the following steps:
step S1: acquire a public few-sample named entity recognition dataset and construct label information from its entity categories; the few-sample named entity recognition dataset is divided into a training set and a test set, each consisting of a support set and a query set, and the support set and the query set consist of sentence texts and annotated true labels;
the support set of the few-sample named entity recognition dataset denotes the small amount of annotated data used for training, and the query set denotes the data to be predicted; the constructed label information is converted into a corresponding natural-language character set; for example, a label such as "PER" would be converted into the natural-language word "person";
step S2: formalize the gated deviation correction-based few-sample two-stage named entity recognition task and construct a named entity recognition model comprising a span detection model and a span classification model; the span detection model is a serial structure formed by a feature encoder and a linear classification layer; the span classification model is a serial structure formed by a feature encoder and a gating module; the gating module consists of label gating and prototype gating;
step S2.1: define a training set D_train = {S_train, Q_train} for training the model, where S_train denotes the support set in the training set, which contains N entity categories with K samples each, and Q_train denotes the query set in the training set, whose entity categories are consistent with those of the support set; the support set and the query set each consist of sentence texts X = {x_1, x_2, …, x_n}, where n is the number of characters and x_i denotes the i-th character in the sentence text;
step S2.2: in the prediction stage, define a test set D_test = {S_test, Q_test} from a new domain, where S_test denotes the support set in the test set and Q_test denotes the query set in the test set; the model is trained on the training set D_train, and the support set S_test in the test set is then used to predict the query set Q_test in the test set;
step S2.3: define the span boundary prediction label set Y = {B, I, O} used by the span detection model; the span detection model assigns a label to each character of the input sentence text and obtains an entity span set S = {s_1, s_2, …} according to the labels, where B denotes the beginning of a multi-character span, I denotes the interior of a multi-character span, O denotes a non-entity span, s_i denotes the i-th entity span in the entity span set, and S denotes the set of entity spans in the sentence text;
step S2.4: define the entity category label set C = {c_1, c_2, …, c_N} used by the span classification model, where c_i denotes an entity category; the span classification model assigns an entity category c ∈ C to each entity span in the entity span set output by the span detection model;
step S3: concatenate the label information constructed in step S1 after the sentence texts of the support set in the training set, input the concatenated sentence texts into the feature encoder of the span detection model, and obtain the embedded feature vectors of all characters in the concatenated sentence texts;
the entity category label set corresponding to the sentence text X is converted into a corresponding natural-language character set L = {l_1, l_2, …, l_N} and concatenated after the sentence text X to obtain the concatenated sentence text X′ = [x_1, …, x_n, l_1, …, l_N]; the feature encoder in the span detection model consists of the pre-trained language model BERT, and the concatenated sentence text X′ is input into the pre-trained language model BERT to obtain the corresponding embedded feature vectors, with the specific calculation shown in the following formula:

[h_1, …, h_n, h_{l_1}, …, h_{l_N}] = BERT([x_1, …, x_n, l_1, …, l_N])

where x_1 denotes the 1st character in the sentence text and x_n the n-th character; h_1 and h_n denote the embedded feature vectors of characters x_1 and x_n obtained through the pre-trained language model BERT; l_1 denotes the 1st natural-language character concatenated after the sentence text and l_N the N-th; h_{l_1} and h_{l_N} denote the embedded feature vectors of the natural-language characters l_1 and l_N obtained through the pre-trained language model BERT;
step S4: input the embedded feature vectors obtained in step S3 into the linear classification layer of the span detection model, predict all entity spans in a sequence-labeling manner from the obtained embedded feature vectors, compute the span detection loss L_det through a cross-entropy loss function, and use the computed span detection loss L_det to optimize and update the parameters of the span detection model;
the specific process of computing the span detection loss is as follows:
step S4.1: input the embedded feature vectors of all characters obtained in step S3 into the linear classification layer of the span detection model to compute the probability distribution of each character x_i over the label set Y = {B, I, O}, with the specific calculation shown in the following formula:

p(y_i | x_i) = softmax(W h_i + b)

where p(y_i | x_i) denotes the probability that character x_i takes each label in the label set Y; softmax denotes the normalization function; W denotes the weight matrix of the linear classification layer; b denotes the bias term of the linear classification layer; h_i denotes the embedded feature vector of the i-th character;
step S4.2: input the predicted probability distribution p(y_i | x_i) and the true label y_i* of character x_i into the cross-entropy loss function to compute the span detection loss L_det, with the specific calculation shown in the following formula:

L_det = − Σ_{i=1}^{n} log p(y_i = y_i* | x_i)

where L_det denotes the span detection loss and y_i* denotes the true label of character x_i;
step S5: input the concatenated sentence texts of the support set in the training set from step S3 into the feature encoder of the span classification model, and acquire the embedded feature vectors of the entity spans in the sentence texts and of the label information concatenated after the sentence texts; in this step the feature encoder consists of the pre-trained language model BERT, and the natural-language character set corresponding to the label information of the sentence text is concatenated after the sentence text; the specific process is the same as in step S3 and is not repeated here;
step S6: average the embedded feature vectors of the entity spans belonging to the same entity category in step S5 to obtain an original prototype representing that entity category, then input the original prototype and the embedded feature vectors of the label information concatenated after the support-set sentence texts into the gating module of the span classification model to correct the original prototype and obtain a corrected category prototype, with the following specific steps:
step S6.1: calculating the representation of the entity span by averaging all character embedded feature vectors in the entity span obtained in step S5, wherein the specific calculation steps are as follows:
in the method, in the process of the invention,representing entity span->Is indicated by->Representing character->To->Is a collection of (3); />Representing the embedded feature vector of the kth character in the sentence text;
step S6.2: definition of belonging to entity classIs used to calculate the entity class +.>Original prototype +.>The specific calculation steps are shown in the following formula:
in the method, in the process of the invention,representing entity class->Is a primitive prototype of (a); />Representation belonging to entity category->The number of all entity spans;
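Steps S6.1 and S6.2 (span representations by averaging character embeddings, then class prototypes by averaging span representations) can be sketched directly. The embedding dimension and span positions below are illustrative assumptions.

```python
import numpy as np

def span_repr(H, i, j):
    """Mean of character embeddings h_i..h_j (inclusive), as in step S6.1."""
    return H[i:j + 1].mean(axis=0)

def original_prototype(span_vectors):
    """Mean of all span representations of one entity class, as in step S6.2."""
    return np.stack(span_vectors).mean(axis=0)

rng = np.random.default_rng(1)
H = rng.normal(size=(10, 4))                       # 10 characters, dim 4
spans_of_class = [span_repr(H, 0, 2), span_repr(H, 5, 6)]  # two spans of class t
proto = original_prototype(spans_of_class)         # original prototype p_t
```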
step S6.3: the original prototype $p_t$ of entity class $t$ and the embedded feature vector $e_t$ of the label information of entity class $t$ are passed through the label gate to obtain the label information to be retained and the label information to be replaced, as shown in the following formulas:

$$g_l = \sigma(W_l [e_t; p_t] + b_l), \qquad e_t^{keep} = g_l \odot e_t, \qquad e_t^{rep} = (1 - g_l) \odot e_t$$

where $e_t$ denotes the embedded feature vector of the natural-language characters corresponding to entity class $t$; $p_t$ denotes the original prototype of entity class $t$; $W_l$ denotes the weight matrix of the label gate; $b_l$ denotes the bias term of the label gate; $\sigma$ denotes the normalization (sigmoid) function; $g_l$ denotes the weight of the label information to be retained; $e_t^{keep}$ denotes the label information to be retained; $e_t^{rep}$ denotes the label information to be replaced;
step S6.4: the label information to be replaced $e_t^{rep}$ and the original prototype $p_t$ are input to the prototype gate, which controls how much information of the original prototype is retained, as shown in the following formulas:

$$g_p = \sigma(W_p [e_t^{rep}; p_t] + b_p), \qquad p_t^{keep} = g_p \odot p_t$$

where $W_p$ denotes the weight matrix of the prototype gate and $b_p$ denotes the bias term of the prototype gate; $\sigma$ denotes the normalization (sigmoid) function; $g_p$ denotes the weight of the original-prototype information to be retained; $p_t^{keep}$ denotes the retained information of the original prototype;
step S6.5: the corrected class prototype is obtained by adding the retained information of the original prototype to the retained label information, as shown in the following formula:

$$\tilde{p}_t = p_t^{keep} + e_t^{keep}$$

where $\tilde{p}_t$ denotes the corrected class prototype of entity class $t$;
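The gating bias-correction module of steps S6.3 through S6.5 can be sketched as follows. The exact gate parameterization (a sigmoid over a linear map of the concatenated label embedding and prototype) is an assumption reconstructed from the variable descriptions, not the patent's verified equations; dimensions and random parameters are likewise illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrected_prototype(e_t, p_t, W_l, b_l, W_p, b_p):
    """e_t: label-word embedding of class t; p_t: original prototype of class t."""
    g_l = sigmoid(W_l @ np.concatenate([e_t, p_t]) + b_l)  # label gate (step S6.3)
    e_keep = g_l * e_t                  # label information to be retained
    e_rep = (1.0 - g_l) * e_t           # label information to be replaced
    g_p = sigmoid(W_p @ np.concatenate([e_rep, p_t]) + b_p)  # prototype gate (S6.4)
    p_keep = g_p * p_t                  # retained original-prototype information
    return p_keep + e_keep              # bias-corrected class prototype (step S6.5)

rng = np.random.default_rng(2)
d = 4
e_t, p_t = rng.normal(size=d), rng.normal(size=d)
W_l, b_l = rng.normal(size=(d, 2 * d)), np.zeros(d)
W_p, b_p = rng.normal(size=(d, 2 * d)), np.zeros(d)
p_tilde = corrected_prototype(e_t, p_t, W_l, b_l, W_p, b_p)
```

The design intuition is GRU-like: one gate decides how much label semantics to inject, and a second gate decides how much of the (possibly biased) support-set prototype to keep.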
step S7: calculating the distances between the embedded feature vectors of the entity spans of the query set in the training set and the corrected class prototypes obtained in step S6, assigning the corresponding entity class to each entity span of the query set according to the calculated distances, and calculating the span classification loss $\mathcal{L}_{cls}$ through cross entropy to optimize and update the parameters of the span classification model:
step S7.1: the probability that an entity span belongs to entity class $t$ is obtained by calculating the distance between the entity span $s$ and the corrected class prototype $\tilde{p}_t$ of entity class $t$, as shown in the following formula:

$$p(t \mid s) = \frac{\exp(-d(s, \tilde{p}_t))}{\sum_{t' \in C} \exp(-d(s, \tilde{p}_{t'}))}$$

where $p(t \mid s)$ denotes the probability that the entity span belongs to entity class $t$; $d(\cdot, \cdot)$ denotes the distance function; $t'$ ranges over every entity class in the entity class label set $C$, including the classes other than $t$;
step S7.2: the probability $p(t \mid s)$ that an entity span belongs to entity class $t$ and its true label $y_s$ are input into the cross-entropy loss function to calculate the span classification loss $\mathcal{L}_{cls}$, as shown in the following formula:

$$\mathcal{L}_{cls} = -\sum_{s} \log p(y_s \mid s)$$

where $\mathcal{L}_{cls}$ denotes the span classification loss and the corrected class prototypes are computed from the support set $\mathcal{S}_{train}$ in the training set;
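The distance-based span classification and its cross-entropy loss from steps S7.1 and S7.2 can be sketched as below. Euclidean distance is used as an assumption (the patent only names a generic distance function), and all dimensions are illustrative.

```python
import numpy as np

def classify_span(s, prototypes):
    """s: (d,) span vector; prototypes: (|C|, d) corrected class prototypes.
    Returns a probability over classes via softmax over negated distances."""
    dists = np.linalg.norm(prototypes - s, axis=1)  # distance to each prototype
    logits = -dists
    logits -= logits.max()                          # numerical stability
    return np.exp(logits) / np.exp(logits).sum()

def span_classification_loss(spans, gold, prototypes):
    probs = np.stack([classify_span(s, prototypes) for s in spans])
    return probs, -np.log(probs[np.arange(len(gold)), gold]).mean()

rng = np.random.default_rng(3)
prototypes = rng.normal(size=(3, 4))     # 3 entity classes, dim 4
spans = rng.normal(size=(5, 4))          # 5 query-set entity spans
gold = np.array([0, 2, 1, 0, 2])         # true entity classes
probs, loss = span_classification_loss(spans, gold, prototypes)
```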
step S8: the query set in the test set, with the label information spliced, is input into the span detection model, and the embedded feature vectors of all entity spans are obtained by prediction; the specific steps are as follows:
step S8.1: similar to step S3, the embedded feature vectors of the sentence texts of the query set $\mathcal{Q}_{test}$ in the test set with the spliced label information are obtained and input to the linear classification layer of the span detection model;
step S8.2: the linear classification layer of the span detection model predicts, for each character of the input sentence text, a corresponding label from the span-boundary prediction label set $\mathcal{Y}$ and decodes it; the span detection model decodes the predicted span-boundary label sequence according to preset rules to obtain the entity spans;
the specific process for obtaining the entity span is as follows:
step S8.21: decoding proceeds character by character from left to right over the sentence text;
step S8.22: when a "B" in the span-boundary prediction label sequence is identified, recognition continues to the right; when an "O" or another "B" is identified, the sentence text from the "B" through the last consecutive "I" corresponds to one complete entity span;
step S8.23: the "O" mark in the span-boundary prediction label set represents a non-entity span; it is an invalid label and is skipped in the decoding process;
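The decoding rules of steps S8.21 through S8.23 can be sketched as a small function over a B/I/O tag sequence. The function name and the example tag sequence are illustrative, not part of the disclosure.

```python
# Left-to-right decoding of a B/I/O tag sequence into (start, end) entity spans.
def decode_spans(tags):
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i - 1))  # previous span closed by a new "B"
            start = i                         # a new span opens at "B"
        elif tag == "O":
            if start is not None:
                spans.append((start, i - 1))  # span closed by "O"
            start = None                      # "O" positions are skipped
        # tag == "I": the current span simply continues
    if start is not None:
        spans.append((start, len(tags) - 1))  # span reaching the sentence end
    return spans

spans = decode_spans(["O", "B", "I", "I", "O", "B", "B", "I"])  # [(1,3),(5,5),(6,7)]
```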
step S9: inputting the test set with the spliced label information into a feature encoder in a span classification model to obtain entity spans predicted by a span detection model in sentence texts of the query set and embedded feature vectors of the label information spliced in the sentence texts; inputting all original prototypes obtained by the support set in the test set through the method of the step S6 and the embedded feature vectors of all obtained label information into a gating module in a span classification model to obtain a class prototype subjected to deviation correction; calculating the distance between the entity span predicted by the span detection model and each category prototype, and distributing the entity category corresponding to the category prototype closest to the entity span to obtain a named entity set in a final query set;
the named entity set in the final query set is obtained, and the specific process is as follows:
step S9.1: similar to step S5, the embedded feature vectors of the sentence texts of the query set $\mathcal{Q}_{test}$ in the test set with the spliced label information are obtained, and the corrected class prototypes $\tilde{p}_t$ are calculated from the support set $\mathcal{S}_{test}$ in the test set according to the method of step S6;
Step S9.2: collect queries in a test setThe embedded feature vectors and the category prototypes of the entity spans in the entity span set acquired in the step S8>Calculating distance and obtaining probability of entity class +.>By taking the entity class with highest probability +.>The corresponding labels are distributed for the entity spans in the entity span set, and the specific calculation steps are as follows:
in the method, in the process of the invention,representing the entity class of the final prediction of the entity span, argmax represents the maximum function.
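The final argmax assignment of step S9.2 amounts to giving each predicted span the class of its nearest corrected prototype. A hedged sketch, with Euclidean distance and all values assumed for illustration:

```python
import numpy as np

def assign_entity_class(s, prototypes, class_names):
    """Assign the span vector s the class of the nearest corrected prototype,
    which equals the argmax of the distance-softmax probabilities."""
    dists = np.linalg.norm(prototypes - s, axis=1)
    return class_names[int(np.argmin(dists))]  # nearest prototype wins

prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])  # two toy corrected prototypes
names = ["person", "location"]
label = assign_entity_class(np.array([9.0, 9.5]), prototypes, names)
```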
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A few-sample two-stage named entity recognition method based on gated bias correction, characterized by comprising the following steps:
step S1: acquiring a publicly available few-sample named entity recognition dataset, and constructing label information according to the entity classes of the dataset; the dataset is divided into a training set and a test set, each consisting of a support set and a query set, and the support set and the query set consist of sentence texts and annotated true labels;
step S2: formally defining the gated-bias-correction-based few-sample two-stage named entity recognition task; constructing a named entity recognition model comprising a span detection model and a span classification model, the span detection model consisting of a feature encoder and a linear classification layer in a serial structure, and the span classification model consisting of a feature encoder and a gating module in a serial structure; the gating module consists of a label gate and a prototype gate;
step S3: splicing the label information constructed in the step S1 to the sentence text of the support set in the training set in the step S1, inputting the spliced sentence text into a feature encoder of the span detection model, and obtaining embedded feature vectors of all characters in the spliced sentence text;
step S4: inputting the embedded feature vectors obtained in step S3 into the linear classification layer of the span detection model, predicting all entity spans from the obtained embedded feature vectors in a sequence labeling manner, calculating the span detection loss $\mathcal{L}_{det}$ through the cross-entropy loss function, and optimizing and updating the parameters of the span detection model with the calculated span detection loss $\mathcal{L}_{det}$;
step S5: inputting the sentence text of the support set in the training set spliced in the step S3 into a feature encoder in a span classification model, and acquiring the entity span in the sentence text and the embedded feature vector of the tag information spliced in the sentence text;
step S6: averaging the embedded feature vectors of the entity spans belonging to the same entity class in step S5 to obtain an original prototype representing that entity class, and inputting the original prototype and the embedded feature vectors of the label information spliced after the sentence texts of the support set in the training set into the gating module in the span classification model to correct the original prototype and obtain a corrected class prototype;
step S7: calculating the distances between the embedded feature vectors of the entity spans of the query set in the training set and the corrected class prototypes obtained in step S6, assigning the corresponding entity class to each entity span of the query set according to the calculated distances, and calculating the span classification loss $\mathcal{L}_{cls}$ through cross entropy to optimize and update the parameters of the span classification model;
step S8: the query set in the test set is spliced with label information and then input into a span detection model, and all entity spans are predicted to obtain embedded feature vectors;
step S9: inputting the test set with the spliced label information into a feature encoder in a span classification model to obtain entity spans predicted by a span detection model in sentence texts of the query set and embedded feature vectors of the label information spliced in the sentence texts; inputting all original prototypes obtained by the support set in the test set through the method of the step S6 and the embedded feature vectors of all obtained label information into a gating module in a span classification model to obtain a class prototype subjected to deviation correction; and calculating the distance between the entity span predicted by the span detection model and each category prototype, and distributing the entity category corresponding to the category prototype closest to the entity span to obtain a named entity set in the final query set.
2. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 1, characterized in that the specific process of step S1 includes: converting the constructed label information into a corresponding natural-language character set; the support set of the few-sample named entity recognition dataset represents the small amount of annotated data used for training, and the query set represents the data to be predicted.
3. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 2, characterized in that the specific process of formally defining the gated-bias-correction-based few-sample two-stage named entity recognition task in step S2 is as follows:
step S2.1: defining a training set $\mathcal{D}_{train} = \{\mathcal{S}_{train}, \mathcal{Q}_{train}\}$ for training the model, where $\mathcal{S}_{train}$ denotes the support set in the training set, containing N entity classes with K samples each, and $\mathcal{Q}_{train}$ denotes the query set in the training set, whose entity classes are consistent with those of the support set; both the support set and the query set consist of several sentence texts $X = (x_1, x_2, \dots, x_n)$, where n is the number of characters and $x_i$ denotes the i-th character in the sentence text;
step S2.2: in the prediction stage, defining a test set $\mathcal{D}_{test} = \{\mathcal{S}_{test}, \mathcal{Q}_{test}\}$ from a new domain, where $\mathcal{S}_{test}$ denotes the support set in the test set and $\mathcal{Q}_{test}$ denotes the query set in the test set; the model is trained on the training set $\mathcal{D}_{train}$, and the support set $\mathcal{S}_{test}$ in the test set is used to predict the query set $\mathcal{Q}_{test}$ in the test set;
step S2.3: defining the span-boundary prediction label set $\mathcal{Y} = \{B, I, O\}$ used by the span detection model; the span detection model assigns a label to each character in the input sentence text and obtains the entity span set $S = (s_1, s_2, \dots)$ according to these labels, where B denotes the beginning of a multi-character span, I denotes the middle of a multi-character span, O denotes a non-entity span, $s_i$ denotes the i-th entity span in the entity span set, and S denotes the set of entity spans in the sentence text;
step S2.4: defining the entity class label set $C = \{t_1, t_2, \dots, t_N\}$ used by the span classification model, where $t_i$ denotes an entity class; the span classification model assigns an entity class $t \in C$ to each entity span in the entity span set output by the span detection model.
4. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 3, characterized in that step S3 comprises: converting the entity class label set corresponding to the sentence text X into the corresponding natural-language character set $(l_1, l_2, \dots, l_N)$ and splicing it after the sentence text X to obtain the spliced sentence text $\hat{X} = (x_1, \dots, x_n, l_1, \dots, l_N)$; the feature encoder in the span detection model consists of the pre-trained language model BERT, and the spliced sentence text $\hat{X}$ is input into the pre-trained language model BERT to obtain the corresponding embedded feature vectors, as shown in the following formula:

$$(h_1, \dots, h_n, e_1, \dots, e_N) = \mathrm{BERT}(x_1, \dots, x_n, l_1, \dots, l_N)$$

where $x_1$ denotes the 1st character in the sentence text and $x_n$ denotes the n-th character; $h_1$ and $h_n$ denote the embedded feature vectors of $x_1$ and $x_n$ obtained through the pre-trained language model BERT; $l_1$ denotes the 1st natural-language character spliced after the sentence text and $l_N$ denotes the N-th; $e_1$ and $e_N$ denote the embedded feature vectors of $l_1$ and $l_N$ obtained through the pre-trained language model BERT.
5. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 4, characterized in that the specific process of calculating the span detection loss in step S4 is as follows:
step S4.1: the embedded feature vectors of all characters obtained in step S3 are input into the linear classification layer of the span detection model to calculate the probability that character $x_i$ belongs to each label in the label set $\mathcal{Y}$, as shown in the following formula:
$$p(y_i \mid x_i) = \mathrm{softmax}(W h_i + b)$$

where $p(y_i \mid x_i)$ denotes the probability that character $x_i$ takes each label in the tag set $\mathcal{Y}$; softmax denotes the normalization function; $W$ denotes the weight matrix of the linear classification layer; $b$ denotes the bias term of the linear classification layer; $h_i$ denotes the embedded feature vector of the i-th character;
step S4.2: the predicted probability distribution $p(y_i \mid x_i)$ and the true label $y_i$ of character $x_i$ are input to the cross-entropy loss function to calculate the span detection loss $\mathcal{L}_{det}$, as shown in the following formula:

$$\mathcal{L}_{det} = -\sum_{i=1}^{n} \log p(y_i \mid x_i)$$

where $\mathcal{L}_{det}$ denotes the span detection loss and $y_i$ denotes the true label of character $x_i$.
6. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 5, characterized in that the specific process of step S6 is as follows:
step S6.1: the representation of an entity span is calculated by averaging all character-embedded feature vectors in the entity span obtained in step S5, as shown in the following formula:

$$s_{ij} = \frac{1}{j - i + 1} \sum_{k=i}^{j} h_k$$

where $s_{ij}$ denotes the representation of the entity span covering characters $x_i$ to $x_j$; $\{x_i, \dots, x_j\}$ denotes the set of characters from $x_i$ to $x_j$; $h_k$ denotes the embedded feature vector of the k-th character in the sentence text;
step S6.2: defining $S_t$ as the set of entity spans belonging to entity class $t$, the original prototype $p_t$ of entity class $t$ is calculated as the average of all span representations in $S_t$, as shown in the following formula:

$$p_t = \frac{1}{|S_t|} \sum_{s \in S_t} s$$

where $p_t$ denotes the original prototype of entity class $t$ and $|S_t|$ denotes the number of all entity spans belonging to entity class $t$;
step S6.3: the original prototype $p_t$ of entity class $t$ and the embedded feature vector $e_t$ of the label information of entity class $t$ are passed through the label gate to determine the label information to be retained and the label information to be replaced, as shown in the following formulas:

$$g_l = \sigma(W_l [e_t; p_t] + b_l), \qquad e_t^{keep} = g_l \odot e_t, \qquad e_t^{rep} = (1 - g_l) \odot e_t$$

where $e_t$ denotes the embedded feature vector of the natural-language characters corresponding to entity class $t$; $p_t$ denotes the original prototype of entity class $t$; $W_l$ denotes the weight matrix of the label gate; $b_l$ denotes the bias term of the label gate; $\sigma$ denotes the normalization (sigmoid) function; $g_l$ denotes the weight of the label information to be retained; $e_t^{keep}$ denotes the label information to be retained; $e_t^{rep}$ denotes the label information to be replaced;
step S6.4: the label information to be replaced $e_t^{rep}$ and the original prototype $p_t$ are input to the prototype gate, which controls how much information of the original prototype is retained, as shown in the following formulas:

$$g_p = \sigma(W_p [e_t^{rep}; p_t] + b_p), \qquad p_t^{keep} = g_p \odot p_t$$

where $W_p$ denotes the weight matrix of the prototype gate and $b_p$ denotes the bias term of the prototype gate; $\sigma$ denotes the normalization (sigmoid) function; $g_p$ denotes the weight of the original-prototype information to be retained; $p_t^{keep}$ denotes the retained information of the original prototype;
step S6.5: the corrected class prototype is obtained by adding the retained information of the original prototype to the retained label information, as shown in the following formula:

$$\tilde{p}_t = p_t^{keep} + e_t^{keep}$$

where $\tilde{p}_t$ denotes the corrected class prototype of entity class $t$.
7. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 6, characterized in that the specific process of step S7 is as follows:
step S7.1: the probability that an entity span belongs to entity class $t$ is obtained by calculating the distance between the entity span $s$ and the corrected class prototype $\tilde{p}_t$ of entity class $t$, as shown in the following formula:

$$p(t \mid s) = \frac{\exp(-d(s, \tilde{p}_t))}{\sum_{t' \in C} \exp(-d(s, \tilde{p}_{t'}))}$$

where $p(t \mid s)$ denotes the probability that the entity span belongs to entity class $t$; $d(\cdot, \cdot)$ denotes the distance function; $t'$ ranges over every entity class in the entity class label set $C$, including the classes other than $t$;
step S7.2: the probability $p(t \mid s)$ that an entity span belongs to entity class $t$ and its true label $y_s$ are input into the cross-entropy loss function to calculate the span classification loss $\mathcal{L}_{cls}$, as shown in the following formula:

$$\mathcal{L}_{cls} = -\sum_{s} \log p(y_s \mid s)$$

where $\mathcal{L}_{cls}$ denotes the span classification loss and the corrected class prototypes are computed from the support set $\mathcal{S}_{train}$ in the training set.
8. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 7, characterized in that the specific process of step S8 is as follows:
step S8.1: acquiring the embedded feature vectors of the sentence texts of the query set $\mathcal{Q}_{test}$ in the test set with the spliced label information, and inputting them into the linear classification layer of the span detection model;
step S8.2: the linear classification layer of the span detection model predicts, for each character of the input sentence text, a corresponding label from the span-boundary prediction label set $\mathcal{Y}$ and decodes it; the span detection model decodes the predicted span-boundary label sequence according to preset rules to obtain the entity spans.
9. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 8, characterized in that the specific process of obtaining the entity spans is as follows:
step S8.21: decoding proceeds character by character from left to right over the sentence text;
step S8.22: when a "B" in the span-boundary prediction label sequence is identified, recognition continues to the right; when an "O" or another "B" is identified, the sentence text from the "B" through the last consecutive "I" corresponds to one complete entity span;
step S8.23: the "O" mark in the span-boundary prediction label set represents a non-entity span; it is an invalid label and is skipped in the decoding process.
10. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 9, characterized in that the named entity set in the query set of the final test set is obtained in step S9, the specific process being as follows:
step S9.1: similar to step S5, the embedded feature vectors of the sentence texts of the query set $\mathcal{Q}_{test}$ in the test set with the spliced label information are obtained, and the corrected class prototypes $\tilde{p}_t$ are calculated from the support set $\mathcal{S}_{test}$ in the test set according to the method of step S6;
Step S9.2: the distances between the embedded feature vectors of the entity spans in the entity span set acquired in step S8 for the query set $\mathcal{Q}_{test}$ in the test set and the class prototypes $\tilde{p}_t$ are calculated to obtain the probability $p(t \mid s)$ of each entity class, and the label corresponding to the entity class with the highest probability is assigned to each entity span in the entity span set, as shown in the following formula:

$$\hat{t} = \arg\max_{t \in C} p(t \mid s)$$

where $\hat{t}$ denotes the finally predicted entity class of the entity span and argmax denotes the maximum function.
CN202311386316.4A 2023-10-25 2023-10-25 Door control deviation correction-based few-sample two-stage named entity identification method Active CN117114004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311386316.4A CN117114004B (en) 2023-10-25 2023-10-25 Door control deviation correction-based few-sample two-stage named entity identification method


Publications (2)

Publication Number Publication Date
CN117114004A true CN117114004A (en) 2023-11-24
CN117114004B CN117114004B (en) 2024-01-16

Family

ID=88809641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311386316.4A Active CN117114004B (en) 2023-10-25 2023-10-25 Door control deviation correction-based few-sample two-stage named entity identification method

Country Status (1)

Country Link
CN (1) CN117114004B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347785A (en) * 2020-11-18 2021-02-09 湖南国发控股有限公司 Nested entity recognition system based on multitask learning
CN112541355A (en) * 2020-12-11 2021-03-23 华南理工大学 Few-sample named entity identification method and system with entity boundary class decoupling
WO2021068329A1 (en) * 2019-10-10 2021-04-15 平安科技(深圳)有限公司 Chinese named-entity recognition method, device, and computer-readable storage medium
CN114676700A (en) * 2022-03-18 2022-06-28 中国人民解放军国防科技大学 Small sample named entity recognition method based on mixed multi-prototype
CN116151256A (en) * 2023-01-04 2023-05-23 北京工业大学 Small sample named entity recognition method based on multitasking and prompt learning
CN116644755A (en) * 2023-07-27 2023-08-25 中国科学技术大学 Multi-task learning-based few-sample named entity recognition method, device and medium
WO2023178802A1 (en) * 2022-03-22 2023-09-28 平安科技(深圳)有限公司 Named entity recognition method and apparatus, device, and computer readable storage medium


Non-Patent Citations (3)

Title
N. R. GAFUROV; I. A. BESSMERTNY; A. V. PLATONOV; E. A. POLESHCHUK; A. V. VASILIEV: "Named Entity Recognition Through Bidirectional LSTM In Natural Language Texts Obtained Through Audio Interfaces", 2018 IEEE 12TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT) *
ZHANG Xiao; LI Yegang; WANG Dong; SHI Shumin: "Named Entity Recognition Based on ERNIE", Intelligent Computer and Applications, no. 03
TAO Yuan; PENG Yanbing: "Chinese Named Entity Recognition Based on Gated CNN-CRF", Electronic Design Engineering, no. 04

Also Published As

Publication number Publication date
CN117114004B (en) 2024-01-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant