CN117114004A - Gated deviation correction-based few-sample two-stage named entity recognition method - Google Patents

Gated deviation correction-based few-sample two-stage named entity recognition method Download PDF

Info

Publication number
CN117114004A
Authority
CN
China
Prior art keywords
entity
span
representing
prototype
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311386316.4A
Other languages
Chinese (zh)
Other versions
CN117114004B (en)
Inventor
吕明翰
王明文
谢文
陈筱
罗文兵
黄琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University filed Critical Jiangxi Normal University
Priority to CN202311386316.4A priority Critical patent/CN117114004B/en
Publication of CN117114004A publication Critical patent/CN117114004A/en
Application granted granted Critical
Publication of CN117114004B publication Critical patent/CN117114004B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes


Abstract

The invention discloses a gated deviation correction-based few-sample two-stage named entity recognition method, which comprises the following steps: first, a label prompt is concatenated with the input text and fed into a span detection model to obtain all candidate entity spans; all entity spans are then fed into a span classification model, which uses a gating module to jointly generate a category prototype from the label prompt and the original prototype, and classifies the entity spans accordingly.

Description

Gated deviation correction-based few-sample two-stage named entity recognition method
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a few-sample two-stage named entity recognition method based on gated deviation correction.
Background
Named entity recognition is a fundamental task in natural language processing and is widely used in question answering, information retrieval, and other language-understanding applications; its goal is to identify entity spans in text and classify them into predefined categories such as person names, locations, organizations, and times. In recent years, deep learning has achieved remarkable success in named entity recognition, especially with pre-trained language models trained in a self-supervised manner; when sufficient annotated data is available, deep-learning-based methods can obtain impressive performance. In practical applications, however, entity categories unseen during training must be identified in new domains, and collecting additional annotated data for these new categories requires considerable time and effort and is therefore costly. Few-sample named entity recognition, which aims to identify entities from only a small amount of labeled data, has consequently attracted great attention from the research community. Researchers have proposed many approaches to this problem; one popular approach is the prototypical network, which is based on a meta-learning framework and metric learning. The model is first trained on a dataset containing a large amount of annotated general-domain data so that it generalizes to new domains; at test time in a new domain, a prototype is generated for each category from the small number of labeled examples of that category, and each query instance is then assigned a category by computing its distance to the prototypes.
However, recent prototypical-network-based algorithms are mainly end-to-end methods that must simultaneously learn a complex structure composed of span boundaries and entity types. When the domain gap is large, it is difficult to capture such complex structural information from only a few labeled examples, so span boundary information is learned inadequately, and spans from the general domain may be recognized in the new domain, i.e., the false-positive problem, which makes it difficult for the model to obtain satisfactory performance. Moreover, most existing prototypical-network algorithms obtain category prototypes merely by averaging the few given labeled examples in each category, which makes it difficult for a prototype to fully represent its category. Although some researchers have proposed optimizing the prototype representation with external information, these methods incorporate the external information only implicitly, constraining the learning of prototype representations through contrastive learning and attention mechanisms; such inadequate and weak implicit constraints are of limited effect when handling outlier samples.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a few-sample two-stage named entity recognition method based on gated deviation correction, which solves the problems described in the background art.
In order to achieve the above purpose, the present invention provides the following technical solution: a gated deviation correction-based few-sample two-stage named entity recognition method, comprising the following steps:
step S1: acquire a public few-sample named entity recognition dataset and construct label information from its entity categories; the few-sample named entity recognition dataset is divided into a training set and a test set, each consisting of a support set and a query set, and the support set and the query set consist of sentence texts and annotated true labels;
step S2: formalize the gated deviation correction-based few-sample two-stage named entity recognition task and construct a named entity recognition model comprising a span detection model and a span classification model; the span detection model is a serial structure formed by a feature encoder and a linear classification layer; the span classification model is a serial structure formed by a feature encoder and a gating module; the gating module consists of label gating and prototype gating;
step S3: concatenate the label information constructed in step S1 after the sentence texts of the support set in the training set of step S1, input the concatenated sentence texts into the feature encoder of the span detection model, and obtain the embedded feature vectors of all characters in the concatenated sentence texts;
step S4: input the embedded feature vectors obtained in step S3 into the linear classification layer of the span detection model, predict all entity spans in a sequence-labeling manner from the obtained embedded feature vectors, compute the span detection loss L_det through a cross-entropy loss function, and use the computed span detection loss L_det to optimize and update the parameters of the span detection model;
step S5: input the concatenated sentence texts of the support set in the training set from step S3 into the feature encoder of the span classification model, and acquire the embedded feature vectors of the entity spans in the sentence texts and of the label information concatenated after the sentence texts;
step S6: average the embedded feature vectors of the entity spans belonging to the same entity category in step S5 to obtain an original prototype representing that entity category, and input the original prototype and the embedded feature vectors of the label information concatenated after the support-set sentence texts into the gating module of the span classification model to correct the original prototype and obtain a corrected category prototype;
step S7: compute the distance between the embedded feature vectors of the entity spans of the query set in the training set and the corrected category prototypes obtained in step S6, assign a corresponding entity category to each query-set entity span according to the computed distance, compute the span classification loss L_cls through cross-entropy, and optimize and update the parameters of the span classification model;
step S8: concatenate label information after the query set in the test set, input it into the span detection model, and predict all entity spans to obtain their embedded feature vectors;
step S9: input the test set with concatenated label information into the feature encoder of the span classification model to obtain the embedded feature vectors of the entity spans predicted by the span detection model in the sentence texts of the query set and of the label information concatenated after the sentence texts; input all original prototypes obtained from the support set in the test set by the method of step S6, together with the embedded feature vectors of all obtained label information, into the gating module of the span classification model to obtain the deviation-corrected category prototypes; compute the distance between each entity span predicted by the span detection model and each category prototype, and assign the entity category of the closest category prototype to obtain the named entity set of the final query set.
Further, the specific process of step S1 includes: converting the constructed label information into a corresponding natural-language character set; the support set of the few-sample named entity recognition dataset denotes the small amount of annotated data used for training, and the query set denotes the data to be predicted.
Further, the specific process of formalizing the gated deviation correction-based few-sample two-stage named entity recognition task in step S2 is as follows:
step S2.1: define a training set D_train = {S_train, Q_train} for training the model, where S_train denotes the support set in the training set, which contains N entity categories with K samples each, and Q_train denotes the query set in the training set, whose entity categories are consistent with those of the support set; the support set and the query set each consist of sentence texts X = {x_1, x_2, …, x_n}, where n is the number of characters and x_i denotes the i-th character in the sentence text;
step S2.2: in the prediction stage, define a test set D_test = {S_test, Q_test} from a new domain, where S_test denotes the support set in the test set and Q_test denotes the query set in the test set; the model is trained on the training set D_train, and the support set S_test in the test set is then used to predict the query set Q_test in the test set;
step S2.3: define the span boundary prediction label set Y = {B, I, O} used by the span detection model; the span detection model assigns a label to each character of the input sentence text and obtains an entity span set S = {s_1, s_2, …} according to the labels, where B denotes the beginning of a multi-character span, I denotes the interior of a multi-character span, O denotes a non-entity span, s_i denotes the i-th entity span in the entity span set, and S denotes the set of entity spans in the sentence text;
step S2.4: define the entity category label set C = {c_1, c_2, …, c_N} used by the span classification model, where c_i denotes an entity category; the span classification model assigns an entity category c ∈ C to each entity span in the entity span set output by the span detection model.
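As a concrete illustration of the N-way K-shot episode structure defined in steps S2.1 and S2.2, the following sketch organizes a support set and a query set; the sentence texts, span offsets, and category names are hypothetical stand-ins, not taken from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """One few-shot episode: a support set with N entity categories,
    K labeled sentences per category, plus a query set to predict."""
    support: dict = field(default_factory=dict)  # category -> list of (text, spans)
    query: list = field(default_factory=list)    # sentence texts to predict

# A toy 2-way 1-shot episode (all contents are illustrative).
ep = Episode(
    support={
        "person":   [("Ada wrote programs", [(0, 3, "person")])],
        "location": [("Paris is calm",      [(0, 5, "location")])],
    },
    query=["Ada visited Paris"],
)

n_way = len(ep.support)                        # N: number of entity categories
k_shot = len(next(iter(ep.support.values())))  # K: samples per category
```

Training samples episodes like `ep` from the general domain; at test time the same structure comes from the new domain.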
Further, step S3 includes: converting the entity category label set corresponding to the sentence text X into a corresponding natural-language character set L = {l_1, l_2, …, l_N} and concatenating it after the sentence text X to obtain the concatenated sentence text X′ = [x_1, …, x_n, l_1, …, l_N]; the feature encoder in the span detection model consists of the pre-trained language model BERT, and the concatenated sentence text X′ is input into the pre-trained language model BERT to obtain the corresponding embedded feature vectors, with the specific calculation shown in the following formula:

[h_1, …, h_n, h_{l_1}, …, h_{l_N}] = BERT([x_1, …, x_n, l_1, …, l_N])

where x_1 denotes the 1st character in the sentence text and x_n the n-th character; h_1 and h_n denote the embedded feature vectors of characters x_1 and x_n obtained through the pre-trained language model BERT; l_1 denotes the 1st natural-language character concatenated after the sentence text and l_N the N-th; h_{l_1} and h_{l_N} denote the embedded feature vectors of the natural-language characters l_1 and l_N obtained through the pre-trained language model BERT.
Further, the specific process of computing the span detection loss in step S4 is as follows:
step S4.1: input the embedded feature vectors of all characters obtained in step S3 into the linear classification layer of the span detection model to compute the probability distribution of each character x_i over the label set Y = {B, I, O}, with the specific calculation shown in the following formula:

p(y_i | x_i) = softmax(W h_i + b)

where p(y_i | x_i) denotes the probability that character x_i takes each label in the label set Y; softmax denotes the normalization function; W denotes the weight matrix of the linear classification layer; b denotes the bias term of the linear classification layer; h_i denotes the embedded feature vector of the i-th character;
step S4.2: input the predicted probability distribution p(y_i | x_i) and the true label y_i* of character x_i into the cross-entropy loss function to compute the span detection loss L_det, with the specific calculation shown in the following formula:

L_det = − Σ_{i=1}^{n} log p(y_i = y_i* | x_i)

where L_det denotes the span detection loss and y_i* denotes the true label of character x_i.
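Steps S4.1 and S4.2 can be sketched in NumPy as follows; the character embeddings, weight matrix, and gold labels are random stand-ins (assumptions) for the BERT outputs and learned parameters, and only the softmax-plus-cross-entropy computation mirrors the description:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def span_detection_loss(H, W, b, true_labels):
    """H: (n, d) character embeddings; W: (d, 3) and b: (3,) linear layer;
    true_labels: (n,) indices into the label set Y = {B, I, O}.
    Returns the summed cross-entropy L_det over the n characters."""
    probs = softmax(H @ W + b)  # p(y_i | x_i), shape (n, 3)
    return -np.log(probs[np.arange(len(true_labels)), true_labels]).sum()

rng = np.random.default_rng(0)
n, d = 5, 8
H = rng.normal(size=(n, d))
W = rng.normal(size=(d, 3))
b = np.zeros(3)
labels = np.array([0, 1, 2, 2, 0])  # e.g. B I O O B
loss = span_detection_loss(H, W, b, labels)
```

In the patent the loss then drives a standard gradient update of the span detection model's parameters.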
Further, the specific process of step S6 is as follows:
step S6.1: compute the representation of an entity span by averaging the embedded feature vectors of all characters in the entity span obtained in step S5, with the specific calculation shown in the following formula:

h_s = (1 / (j − i + 1)) Σ_{k=i}^{j} h_k

where h_s denotes the representation of the entity span s consisting of characters x_i to x_j, and h_k denotes the embedded feature vector of the k-th character in the sentence text;
step S6.2: define the set S_c of all entity spans belonging to entity category c and compute the original prototype μ_c of entity category c, with the specific calculation shown in the following formula:

μ_c = (1 / |S_c|) Σ_{s ∈ S_c} h_s

where μ_c denotes the original prototype of entity category c and |S_c| denotes the number of all entity spans belonging to entity category c;
step S6.3: input the original prototype μ_c of entity category c and the embedded feature vector e_c of the label information of entity category c into the label gating to obtain the retained label information and the replaced label information, with the specific calculations shown in the following formulas:

g_l = σ(W_l [e_c; μ_c] + b_l)
r_l = g_l ⊙ e_c
u_l = (1 − g_l) ⊙ e_c

where e_c denotes the embedded feature vector of the natural-language characters corresponding to entity category c; μ_c denotes the original prototype of entity category c; W_l denotes the weight matrix of the label gating; b_l denotes the bias term of the label gating; σ denotes the sigmoid normalization function; g_l denotes the weight of the label information to be retained; r_l denotes the retained part of the label information; u_l denotes the part of the label information to be replaced;
step S6.4: input the label information to be replaced u_l and the original prototype μ_c into the prototype gating to control how much information of the original prototype is retained, with the specific calculations shown in the following formulas:

g_p = σ(W_p [u_l; μ_c] + b_p)
r_p = g_p ⊙ μ_c

where W_p denotes the weight matrix of the prototype gating and b_p its bias term; σ denotes the sigmoid normalization function; g_p denotes the weight of the original-prototype information to be retained; r_p denotes the retained part of the original prototype;
step S6.5: obtain the corrected category prototype by adding the retained part of the original prototype and the retained part of the label information, with the specific calculation shown in the following formula:

μ′_c = r_p + r_l

where μ′_c denotes the corrected category prototype of entity category c.
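The label-gating and prototype-gating computations of steps S6.3–S6.5 can be sketched in NumPy as follows; the concatenation [·; ·] fed to each gate and the elementwise combination are assumptions consistent with the description, and all weights and embeddings are random stand-ins:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrected_prototype(e_c, mu_c, W_l, b_l, W_p, b_p):
    """e_c: label-information embedding of category c; mu_c: original
    prototype (mean of span embeddings). Returns the corrected prototype."""
    g_l = sigmoid(np.concatenate([e_c, mu_c]) @ W_l + b_l)  # label gate
    r_l = g_l * e_c                # part of the label information to retain
    u_l = (1.0 - g_l) * e_c        # part of the label information to replace
    g_p = sigmoid(np.concatenate([u_l, mu_c]) @ W_p + b_p)  # prototype gate
    r_p = g_p * mu_c               # part of the original prototype to retain
    return r_p + r_l               # corrected category prototype

rng = np.random.default_rng(1)
d = 6
e_c, mu_c = rng.normal(size=d), rng.normal(size=d)
W_l, b_l = rng.normal(size=(2 * d, d)), np.zeros(d)
W_p, b_p = rng.normal(size=(2 * d, d)), np.zeros(d)
proto = corrected_prototype(e_c, mu_c, W_l, b_l, W_p, b_p)
```

Because both gates output values in (0, 1), the corrected prototype interpolates between the global label information and the locally averaged original prototype.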
Further, the specific process of step S7 is as follows:
step S7.1: obtain the probability that an entity span s belongs to entity category c by computing the distance between its representation h_s and the corrected category prototype μ′_c of entity category c, with the specific calculation shown in the following formula:

p(c | s) = exp(−d(h_s, μ′_c)) / Σ_{c′ ∈ C} exp(−d(h_s, μ′_{c′}))

where p(c | s) denotes the probability that the entity span belongs to entity category c, d denotes the distance function, and c′ ranges over the category prototypes of all entity categories in the entity category label set C;
step S7.2: input the probability p(c | s) that each entity span belongs to its entity category and its true label c* into the cross-entropy loss function to compute the span classification loss L_cls, with the specific calculation shown in the following formula:

L_cls = − Σ_{s ∈ S_train} log p(c* | s)

where L_cls denotes the span classification loss and S_train denotes the support set in the training set.
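Step S7.1's distance-based probability can be sketched as a softmax over negative distances to the corrected prototypes; Euclidean distance is an assumption here, since the patent leaves the distance function d abstract:

```python
import numpy as np

def classify_span(h_s, prototypes):
    """h_s: (d,) span embedding; prototypes: (N, d) corrected category
    prototypes. Returns p(c | s) over the N categories (step S7.1)."""
    dist = np.linalg.norm(prototypes - h_s, axis=1)  # d(h_s, mu'_c), assumed Euclidean
    logits = -dist
    e = np.exp(logits - logits.max())                # stable softmax
    return e / e.sum()

h_s = np.array([0.0, 0.0])
protos = np.array([[0.1, 0.0],    # category 0: very close to the span
                   [5.0, 5.0]])   # category 1: far away
p = classify_span(h_s, protos)
```

The closer a corrected prototype is to the span embedding, the larger its probability mass, which is what the cross-entropy loss of step S7.2 then optimizes.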
Further, the specific process of step S8 is as follows:
step S8.1: acquire the embedded feature vectors of the sentence texts of the query set Q_test in the test set with concatenated label information, and input them into the linear classification layer of the span detection model;
step S8.2: the linear classification layer of the span detection model predicts a corresponding span boundary prediction label from the set Y = {B, I, O} for each character of the input sentence text and decodes the labels; the span detection model decodes the span boundary prediction labels according to preset rules to obtain the entity spans.
Further, the specific process of obtaining the entity spans is as follows:
step S8.21: the entity spans are decoded character by character from left to right along the sentence text;
step S8.22: when a "B" in the span boundary prediction label set is identified, identification continues to the right; when an "O" or another "B" is identified, the sentence text from the "B" through the last "I" corresponds to one complete entity span;
step S8.23: the "O" label in the span boundary prediction label set denotes a non-entity span, is an invalid label, and is skipped during decoding.
Further, obtaining the named entity set in the query set of the final test set in step S9 specifically includes:
step S9.1: acquire the embedded feature vectors of the sentence texts of the query set Q_test in the test set with concatenated label information, and compute the corrected category prototypes μ′_c from the support set S_test according to the method of step S6;
step S9.2: compute the distance between the embedded feature vectors of the entity spans in the entity span set acquired in step S8 from the query set Q_test and the category prototypes μ′_c, obtain the probability p(c | s) of each entity category, and assign to each entity span in the entity span set the label of the entity category with the highest probability, with the specific calculation shown in the following formula:

ĉ = argmax_{c ∈ C} p(c | s)

where ĉ denotes the finally predicted entity category of the entity span and argmax denotes the arg-max function.
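Step S9.2's final assignment is an argmax over the category probabilities; a minimal sketch, with illustrative category names:

```python
import numpy as np

def assign_category(probs, categories):
    """Pick the entity category with the highest p(c | s) (step S9.2)."""
    return categories[int(np.argmax(probs))]

cats = ["person", "location", "organization"]  # illustrative label set
pred = assign_category(np.array([0.1, 0.7, 0.2]), cats)
```

Applying this to every span surviving the detection stage yields the named entity set of the final query set.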
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention decomposes the named entity recognition task into a span detection task and a span classification task, and each model performs only one task in each stage; this reduces task complexity, makes the model easier to learn in a few-sample scenario, and improves the model's performance in few-sample scenarios.
(2) The invention adds label information to the sentence text in the span detection stage, reducing the entities predicted by the model that do not belong to the new domain and alleviating the false-positive problem.
(3) The invention introduces a gating module in the span classification stage and explicitly uses the label information to correct the original prototype, so that the prototype contains both the global information of the label information and the local information of the original prototype; this strengthens the prototype's complete representation of the entity category and improves the classification accuracy of the model.
Drawings
FIG. 1 is a structural flow chart of the named entity recognition model of the present invention;
FIG. 2 is a structural flow chart of the gating module in the named entity recognition model of the present invention.
Detailed Description
Referring to FIGS. 1-2, the present invention provides the following technical solution: a gated deviation correction-based few-sample two-stage named entity recognition method, comprising the following steps:
step S1: acquire a public few-sample named entity recognition dataset and construct label information from its entity categories; the few-sample named entity recognition dataset is divided into a training set and a test set, each consisting of a support set and a query set, and the support set and the query set consist of sentence texts and annotated true labels;
the support set of the few-sample named entity recognition dataset denotes the small amount of annotated data used for training, and the query set denotes the data to be predicted; the constructed label information is converted into a corresponding natural-language character set; for example, a label such as "PER" would be converted into the natural-language word "person";
step S2: formalize the gated deviation correction-based few-sample two-stage named entity recognition task and construct a named entity recognition model comprising a span detection model and a span classification model; the span detection model is a serial structure formed by a feature encoder and a linear classification layer; the span classification model is a serial structure formed by a feature encoder and a gating module; the gating module consists of label gating and prototype gating;
step S2.1: define a training set D_train = {S_train, Q_train} for training the model, where S_train denotes the support set in the training set, which contains N entity categories with K samples each, and Q_train denotes the query set in the training set, whose entity categories are consistent with those of the support set; the support set and the query set each consist of sentence texts X = {x_1, x_2, …, x_n}, where n is the number of characters and x_i denotes the i-th character in the sentence text;
step S2.2: in the prediction stage, define a test set D_test = {S_test, Q_test} from a new domain, where S_test denotes the support set in the test set and Q_test denotes the query set in the test set; the model is trained on the training set D_train, and the support set S_test in the test set is then used to predict the query set Q_test in the test set;
step S2.3: define the span boundary prediction label set Y = {B, I, O} used by the span detection model; the span detection model assigns a label to each character of the input sentence text and obtains an entity span set S = {s_1, s_2, …} according to the labels, where B denotes the beginning of a multi-character span, I denotes the interior of a multi-character span, O denotes a non-entity span, s_i denotes the i-th entity span in the entity span set, and S denotes the set of entity spans in the sentence text;
step S2.4: define the entity category label set C = {c_1, c_2, …, c_N} used by the span classification model, where c_i denotes an entity category; the span classification model assigns an entity category c ∈ C to each entity span in the entity span set output by the span detection model;
step S3: concatenate the label information constructed in step S1 after the sentence texts of the support set in the training set, input the concatenated sentence texts into the feature encoder of the span detection model, and obtain the embedded feature vectors of all characters in the concatenated sentence texts;
the entity category label set corresponding to the sentence text X is converted into a corresponding natural-language character set L = {l_1, l_2, …, l_N} and concatenated after the sentence text X to obtain the concatenated sentence text X′ = [x_1, …, x_n, l_1, …, l_N]; the feature encoder in the span detection model consists of the pre-trained language model BERT, and the concatenated sentence text X′ is input into the pre-trained language model BERT to obtain the corresponding embedded feature vectors, with the specific calculation shown in the following formula:

[h_1, …, h_n, h_{l_1}, …, h_{l_N}] = BERT([x_1, …, x_n, l_1, …, l_N])

where x_1 denotes the 1st character in the sentence text and x_n the n-th character; h_1 and h_n denote the embedded feature vectors of characters x_1 and x_n obtained through the pre-trained language model BERT; l_1 denotes the 1st natural-language character concatenated after the sentence text and l_N the N-th; h_{l_1} and h_{l_N} denote the embedded feature vectors of the natural-language characters l_1 and l_N obtained through the pre-trained language model BERT;
step S4: input the embedded feature vectors obtained in step S3 into the linear classification layer of the span detection model, predict all entity spans in a sequence-labeling manner from the obtained embedded feature vectors, compute the span detection loss L_det through a cross-entropy loss function, and use the computed span detection loss L_det to optimize and update the parameters of the span detection model;
the specific process of computing the span detection loss is as follows:
step S4.1: input the embedded feature vectors of all characters obtained in step S3 into the linear classification layer of the span detection model to compute the probability distribution of each character x_i over the label set Y = {B, I, O}, with the specific calculation shown in the following formula:

p(y_i | x_i) = softmax(W h_i + b)

where p(y_i | x_i) denotes the probability that character x_i takes each label in the label set Y; softmax denotes the normalization function; W denotes the weight matrix of the linear classification layer; b denotes the bias term of the linear classification layer; h_i denotes the embedded feature vector of the i-th character;
step S4.2: input the predicted probability distribution p(y_i | x_i) and the true label y_i* of character x_i into the cross-entropy loss function to compute the span detection loss L_det, with the specific calculation shown in the following formula:

L_det = − Σ_{i=1}^{n} log p(y_i = y_i* | x_i)

where L_det denotes the span detection loss and y_i* denotes the true label of character x_i;
step S5: input the concatenated sentence texts of the support set in the training set from step S3 into the feature encoder of the span classification model, and acquire the embedded feature vectors of the entity spans in the sentence texts and of the label information concatenated after the sentence texts; in this step the feature encoder consists of the pre-trained language model BERT, and the natural-language character set corresponding to the label information of the sentence text is concatenated after the sentence text; the specific process is the same as in step S3 and is not repeated here;
step S6: average the embedded feature vectors of the entity spans belonging to the same entity category in step S5 to obtain an original prototype representing that entity category, then input the original prototype and the embedded feature vectors of the label information concatenated after the support-set sentence texts into the gating module of the span classification model to correct the original prototype and obtain a corrected category prototype, with the following specific steps:
step S6.1: calculating the representation of the entity span by averaging all character embedded feature vectors in the entity span obtained in step S5, wherein the specific calculation steps are as follows:
in the method, in the process of the invention,representing entity span->Is indicated by->Representing character->To->Is a collection of (3); />Representing the embedded feature vector of the kth character in the sentence text;
step S6.2: definition of belonging to entity classIs used to calculate the entity class +.>Original prototype +.>The specific calculation steps are shown in the following formula:
in the method, in the process of the invention,representing entity class->Is a primitive prototype of (a); />Representation belonging to entity category->The number of all entity spans;
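Steps S6.1 and S6.2 (span representations by averaging character embeddings, then class prototypes by averaging span representations) can be sketched directly. The embedding dimension and span positions below are illustrative assumptions.

```python
import numpy as np

def span_repr(H, i, j):
    """Mean of character embeddings h_i..h_j (inclusive), as in step S6.1."""
    return H[i:j + 1].mean(axis=0)

def original_prototype(span_vectors):
    """Mean of all span representations of one entity class, as in step S6.2."""
    return np.stack(span_vectors).mean(axis=0)

rng = np.random.default_rng(1)
H = rng.normal(size=(10, 4))                       # 10 characters, dim 4
spans_of_class = [span_repr(H, 0, 2), span_repr(H, 5, 6)]  # two spans of class t
proto = original_prototype(spans_of_class)         # original prototype p_t
```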
step S6.3: the original prototype $p_t$ of entity class $t$ and the embedded feature vector $e_t$ of the label information of entity class $t$ are passed through the label gate to obtain the label information to be retained and the label information to be replaced, as shown in the following formulas:

$$g_l = \sigma(W_l [e_t; p_t] + b_l), \qquad e_t^{keep} = g_l \odot e_t, \qquad e_t^{rep} = (1 - g_l) \odot e_t$$

where $e_t$ denotes the embedded feature vector of the natural-language characters corresponding to entity class $t$; $p_t$ denotes the original prototype of entity class $t$; $W_l$ denotes the weight matrix of the label gate; $b_l$ denotes the bias term of the label gate; $\sigma$ denotes the normalization (sigmoid) function; $g_l$ denotes the weight of the label information to be retained; $e_t^{keep}$ denotes the label information to be retained; $e_t^{rep}$ denotes the label information to be replaced;
step S6.4: the label information to be replaced $e_t^{rep}$ and the original prototype $p_t$ are input to the prototype gate, which controls how much information of the original prototype is retained, as shown in the following formulas:

$$g_p = \sigma(W_p [e_t^{rep}; p_t] + b_p), \qquad p_t^{keep} = g_p \odot p_t$$

where $W_p$ denotes the weight matrix of the prototype gate and $b_p$ denotes the bias term of the prototype gate; $\sigma$ denotes the normalization (sigmoid) function; $g_p$ denotes the weight of the original-prototype information to be retained; $p_t^{keep}$ denotes the retained information of the original prototype;
step S6.5: the corrected class prototype is obtained by adding the retained information of the original prototype to the retained label information, as shown in the following formula:

$$\tilde{p}_t = p_t^{keep} + e_t^{keep}$$

where $\tilde{p}_t$ denotes the corrected class prototype of entity class $t$;
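The gating bias-correction module of steps S6.3 through S6.5 can be sketched as follows. The exact gate parameterization (a sigmoid over a linear map of the concatenated label embedding and prototype) is an assumption reconstructed from the variable descriptions, not the patent's verified equations; dimensions and random parameters are likewise illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrected_prototype(e_t, p_t, W_l, b_l, W_p, b_p):
    """e_t: label-word embedding of class t; p_t: original prototype of class t."""
    g_l = sigmoid(W_l @ np.concatenate([e_t, p_t]) + b_l)  # label gate (step S6.3)
    e_keep = g_l * e_t                  # label information to be retained
    e_rep = (1.0 - g_l) * e_t           # label information to be replaced
    g_p = sigmoid(W_p @ np.concatenate([e_rep, p_t]) + b_p)  # prototype gate (S6.4)
    p_keep = g_p * p_t                  # retained original-prototype information
    return p_keep + e_keep              # bias-corrected class prototype (step S6.5)

rng = np.random.default_rng(2)
d = 4
e_t, p_t = rng.normal(size=d), rng.normal(size=d)
W_l, b_l = rng.normal(size=(d, 2 * d)), np.zeros(d)
W_p, b_p = rng.normal(size=(d, 2 * d)), np.zeros(d)
p_tilde = corrected_prototype(e_t, p_t, W_l, b_l, W_p, b_p)
```

The design intuition is GRU-like: one gate decides how much label semantics to inject, and a second gate decides how much of the (possibly biased) support-set prototype to keep.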
step S7: calculating the distances between the embedded feature vectors of the entity spans of the query set in the training set and the corrected class prototypes obtained in step S6, assigning the corresponding entity class to each entity span of the query set according to the calculated distances, and calculating the span classification loss $\mathcal{L}_{cls}$ through cross entropy to optimize and update the parameters of the span classification model:
step S7.1: the probability that an entity span belongs to entity class $t$ is obtained by calculating the distance between the entity span $s$ and the corrected class prototype $\tilde{p}_t$ of entity class $t$, as shown in the following formula:

$$p(t \mid s) = \frac{\exp(-d(s, \tilde{p}_t))}{\sum_{t' \in C} \exp(-d(s, \tilde{p}_{t'}))}$$

where $p(t \mid s)$ denotes the probability that the entity span belongs to entity class $t$; $d(\cdot, \cdot)$ denotes the distance function; $t'$ ranges over every entity class in the entity class label set $C$, including the classes other than $t$;
step S7.2: the probability $p(t \mid s)$ that an entity span belongs to entity class $t$ and its true label $y_s$ are input into the cross-entropy loss function to calculate the span classification loss $\mathcal{L}_{cls}$, as shown in the following formula:

$$\mathcal{L}_{cls} = -\sum_{s} \log p(y_s \mid s)$$

where $\mathcal{L}_{cls}$ denotes the span classification loss and the corrected class prototypes are computed from the support set $\mathcal{S}_{train}$ in the training set;
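The distance-based span classification and its cross-entropy loss from steps S7.1 and S7.2 can be sketched as below. Euclidean distance is used as an assumption (the patent only names a generic distance function), and all dimensions are illustrative.

```python
import numpy as np

def classify_span(s, prototypes):
    """s: (d,) span vector; prototypes: (|C|, d) corrected class prototypes.
    Returns a probability over classes via softmax over negated distances."""
    dists = np.linalg.norm(prototypes - s, axis=1)  # distance to each prototype
    logits = -dists
    logits -= logits.max()                          # numerical stability
    return np.exp(logits) / np.exp(logits).sum()

def span_classification_loss(spans, gold, prototypes):
    probs = np.stack([classify_span(s, prototypes) for s in spans])
    return probs, -np.log(probs[np.arange(len(gold)), gold]).mean()

rng = np.random.default_rng(3)
prototypes = rng.normal(size=(3, 4))     # 3 entity classes, dim 4
spans = rng.normal(size=(5, 4))          # 5 query-set entity spans
gold = np.array([0, 2, 1, 0, 2])         # true entity classes
probs, loss = span_classification_loss(spans, gold, prototypes)
```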
step S8: the query set in the test set, with the label information spliced, is input into the span detection model, and the embedded feature vectors of all entity spans are obtained by prediction; the specific steps are as follows:
step S8.1: similar to step S3, the embedded feature vectors of the sentence texts of the query set $\mathcal{Q}_{test}$ in the test set with the spliced label information are obtained and input to the linear classification layer of the span detection model;
step S8.2: the linear classification layer of the span detection model predicts, for each character of the input sentence text, a corresponding label from the span-boundary prediction label set $\mathcal{Y}$ and decodes it; the span detection model decodes the predicted span-boundary label sequence according to preset rules to obtain the entity spans;
the specific process for obtaining the entity span is as follows:
step S8.21: decoding proceeds character by character from left to right over the sentence text;
step S8.22: when a "B" in the span-boundary prediction label sequence is identified, recognition continues to the right; when an "O" or another "B" is identified, the sentence text from the "B" through the last consecutive "I" corresponds to one complete entity span;
step S8.23: the "O" mark in the span-boundary prediction label set represents a non-entity span; it is an invalid label and is skipped in the decoding process;
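The decoding rules of steps S8.21 through S8.23 can be sketched as a small function over a B/I/O tag sequence. The function name and the example tag sequence are illustrative, not part of the disclosure.

```python
# Left-to-right decoding of a B/I/O tag sequence into (start, end) entity spans.
def decode_spans(tags):
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i - 1))  # previous span closed by a new "B"
            start = i                         # a new span opens at "B"
        elif tag == "O":
            if start is not None:
                spans.append((start, i - 1))  # span closed by "O"
            start = None                      # "O" positions are skipped
        # tag == "I": the current span simply continues
    if start is not None:
        spans.append((start, len(tags) - 1))  # span reaching the sentence end
    return spans

spans = decode_spans(["O", "B", "I", "I", "O", "B", "B", "I"])  # [(1,3),(5,5),(6,7)]
```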
step S9: inputting the test set with the spliced label information into a feature encoder in a span classification model to obtain entity spans predicted by a span detection model in sentence texts of the query set and embedded feature vectors of the label information spliced in the sentence texts; inputting all original prototypes obtained by the support set in the test set through the method of the step S6 and the embedded feature vectors of all obtained label information into a gating module in a span classification model to obtain a class prototype subjected to deviation correction; calculating the distance between the entity span predicted by the span detection model and each category prototype, and distributing the entity category corresponding to the category prototype closest to the entity span to obtain a named entity set in a final query set;
the named entity set in the final query set is obtained, and the specific process is as follows:
step S9.1: similar to step S5, the embedded feature vectors of the sentence texts of the query set $\mathcal{Q}_{test}$ in the test set with the spliced label information are obtained, and the corrected class prototypes $\tilde{p}_t$ are calculated from the support set $\mathcal{S}_{test}$ in the test set according to the method of step S6;
Step S9.2: collect queries in a test setThe embedded feature vectors and the category prototypes of the entity spans in the entity span set acquired in the step S8>Calculating distance and obtaining probability of entity class +.>By taking the entity class with highest probability +.>The corresponding labels are distributed for the entity spans in the entity span set, and the specific calculation steps are as follows:
in the method, in the process of the invention,representing the entity class of the final prediction of the entity span, argmax represents the maximum function.
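The final argmax assignment of step S9.2 amounts to giving each predicted span the class of its nearest corrected prototype. A hedged sketch, with Euclidean distance and all values assumed for illustration:

```python
import numpy as np

def assign_entity_class(s, prototypes, class_names):
    """Assign the span vector s the class of the nearest corrected prototype,
    which equals the argmax of the distance-softmax probabilities."""
    dists = np.linalg.norm(prototypes - s, axis=1)
    return class_names[int(np.argmin(dists))]  # nearest prototype wins

prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])  # two toy corrected prototypes
names = ["person", "location"]
label = assign_entity_class(np.array([9.0, 9.5]), prototypes, names)
```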
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A few-sample two-stage named entity recognition method based on gated bias correction, characterized by comprising the following steps:
step S1: acquiring a publicly available few-sample named entity recognition dataset, and constructing label information according to the entity classes of the dataset; the dataset is divided into a training set and a test set, each consisting of a support set and a query set, and the support set and the query set consist of sentence texts and annotated true labels;
step S2: formally defining the gated-bias-correction-based few-sample two-stage named entity recognition task; constructing a named entity recognition model comprising a span detection model and a span classification model, the span detection model consisting of a feature encoder and a linear classification layer in a serial structure, and the span classification model consisting of a feature encoder and a gating module in a serial structure; the gating module consists of a label gate and a prototype gate;
step S3: splicing the label information constructed in the step S1 to the sentence text of the support set in the training set in the step S1, inputting the spliced sentence text into a feature encoder of the span detection model, and obtaining embedded feature vectors of all characters in the spliced sentence text;
step S4: inputting the embedded feature vectors obtained in step S3 into the linear classification layer of the span detection model, predicting all entity spans from the obtained embedded feature vectors in a sequence labeling manner, calculating the span detection loss $\mathcal{L}_{det}$ through the cross-entropy loss function, and optimizing and updating the parameters of the span detection model with the calculated span detection loss $\mathcal{L}_{det}$;
step S5: inputting the sentence text of the support set in the training set spliced in the step S3 into a feature encoder in a span classification model, and acquiring the entity span in the sentence text and the embedded feature vector of the tag information spliced in the sentence text;
step S6: averaging the embedded feature vectors of the entity spans belonging to the same entity class in step S5 to obtain an original prototype representing that entity class, and inputting the original prototype and the embedded feature vectors of the label information spliced after the sentence texts of the support set in the training set into the gating module in the span classification model to correct the original prototype and obtain a corrected class prototype;
step S7: calculating the distances between the embedded feature vectors of the entity spans of the query set in the training set and the corrected class prototypes obtained in step S6, assigning the corresponding entity class to each entity span of the query set according to the calculated distances, and calculating the span classification loss $\mathcal{L}_{cls}$ through cross entropy to optimize and update the parameters of the span classification model;
step S8: the query set in the test set is spliced with label information and then input into a span detection model, and all entity spans are predicted to obtain embedded feature vectors;
step S9: inputting the test set with the spliced label information into a feature encoder in a span classification model to obtain entity spans predicted by a span detection model in sentence texts of the query set and embedded feature vectors of the label information spliced in the sentence texts; inputting all original prototypes obtained by the support set in the test set through the method of the step S6 and the embedded feature vectors of all obtained label information into a gating module in a span classification model to obtain a class prototype subjected to deviation correction; and calculating the distance between the entity span predicted by the span detection model and each category prototype, and distributing the entity category corresponding to the category prototype closest to the entity span to obtain a named entity set in the final query set.
2. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 1, characterized in that the specific process of step S1 includes: converting the constructed label information into a corresponding natural-language character set; the support set of the few-sample named entity recognition dataset represents the small amount of annotated data used for training, and the query set represents the data to be predicted.
3. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 2, characterized in that the specific process of formally defining the gated-bias-correction-based few-sample two-stage named entity recognition task in step S2 is as follows:
step S2.1: defining a training set $\mathcal{D}_{train} = \{\mathcal{S}_{train}, \mathcal{Q}_{train}\}$ for training the model, where $\mathcal{S}_{train}$ denotes the support set in the training set, containing N entity classes with K samples each, and $\mathcal{Q}_{train}$ denotes the query set in the training set, whose entity classes are consistent with those of the support set; both the support set and the query set consist of several sentence texts $X = (x_1, x_2, \dots, x_n)$, where n is the number of characters and $x_i$ denotes the i-th character in the sentence text;
step S2.2: in the prediction stage, defining a test set $\mathcal{D}_{test} = \{\mathcal{S}_{test}, \mathcal{Q}_{test}\}$ from a new domain, where $\mathcal{S}_{test}$ denotes the support set in the test set and $\mathcal{Q}_{test}$ denotes the query set in the test set; the model is trained on the training set $\mathcal{D}_{train}$, and the support set $\mathcal{S}_{test}$ in the test set is used to predict the query set $\mathcal{Q}_{test}$ in the test set;
step S2.3: defining the span-boundary prediction label set $\mathcal{Y} = \{B, I, O\}$ used by the span detection model; the span detection model assigns a label to each character in the input sentence text and obtains the entity span set $S = (s_1, s_2, \dots)$ according to these labels, where B denotes the beginning of a multi-character span, I denotes the middle of a multi-character span, O denotes a non-entity span, $s_i$ denotes the i-th entity span in the entity span set, and S denotes the set of entity spans in the sentence text;
step S2.4: defining the entity class label set $C = \{t_1, t_2, \dots, t_N\}$ used by the span classification model, where $t_i$ denotes an entity class; the span classification model assigns an entity class $t \in C$ to each entity span in the entity span set output by the span detection model.
4. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 3, characterized in that step S3 comprises: converting the entity class label set corresponding to the sentence text X into the corresponding natural-language character set $(l_1, l_2, \dots, l_N)$ and splicing it after the sentence text X to obtain the spliced sentence text $\hat{X} = (x_1, \dots, x_n, l_1, \dots, l_N)$; the feature encoder in the span detection model consists of the pre-trained language model BERT, and the spliced sentence text $\hat{X}$ is input into the pre-trained language model BERT to obtain the corresponding embedded feature vectors, as shown in the following formula:

$$(h_1, \dots, h_n, e_1, \dots, e_N) = \mathrm{BERT}(x_1, \dots, x_n, l_1, \dots, l_N)$$

where $x_1$ denotes the 1st character in the sentence text and $x_n$ denotes the n-th character; $h_1$ and $h_n$ denote the embedded feature vectors of $x_1$ and $x_n$ obtained through the pre-trained language model BERT; $l_1$ denotes the 1st natural-language character spliced after the sentence text and $l_N$ denotes the N-th; $e_1$ and $e_N$ denote the embedded feature vectors of $l_1$ and $l_N$ obtained through the pre-trained language model BERT.
5. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 4, characterized in that the specific process of calculating the span detection loss in step S4 is as follows:
step S4.1: the embedded feature vectors of all characters obtained in step S3 are input into the linear classification layer of the span detection model to calculate the probability that character $x_i$ belongs to each label in the label set $\mathcal{Y}$, as shown in the following formula:
$$p(y_i \mid x_i) = \mathrm{softmax}(W h_i + b)$$

where $p(y_i \mid x_i)$ denotes the probability that character $x_i$ takes each label in the tag set $\mathcal{Y}$; softmax denotes the normalization function; $W$ denotes the weight matrix of the linear classification layer; $b$ denotes the bias term of the linear classification layer; $h_i$ denotes the embedded feature vector of the i-th character;
step S4.2: the predicted probability distribution $p(y_i \mid x_i)$ and the true label $y_i$ of character $x_i$ are input to the cross-entropy loss function to calculate the span detection loss $\mathcal{L}_{det}$, as shown in the following formula:

$$\mathcal{L}_{det} = -\sum_{i=1}^{n} \log p(y_i \mid x_i)$$

where $\mathcal{L}_{det}$ denotes the span detection loss and $y_i$ denotes the true label of character $x_i$.
6. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 5, characterized in that the specific process of step S6 is as follows:
step S6.1: the representation of an entity span is calculated by averaging all character-embedded feature vectors in the entity span obtained in step S5, as shown in the following formula:

$$s_{ij} = \frac{1}{j - i + 1} \sum_{k=i}^{j} h_k$$

where $s_{ij}$ denotes the representation of the entity span covering characters $x_i$ to $x_j$; $\{x_i, \dots, x_j\}$ denotes the set of characters from $x_i$ to $x_j$; $h_k$ denotes the embedded feature vector of the k-th character in the sentence text;
step S6.2: defining $S_t$ as the set of entity spans belonging to entity class $t$, the original prototype $p_t$ of entity class $t$ is calculated as the average of all span representations in $S_t$, as shown in the following formula:

$$p_t = \frac{1}{|S_t|} \sum_{s \in S_t} s$$

where $p_t$ denotes the original prototype of entity class $t$ and $|S_t|$ denotes the number of all entity spans belonging to entity class $t$;
step S6.3: the original prototype $p_t$ of entity class $t$ and the embedded feature vector $e_t$ of the label information of entity class $t$ are passed through the label gate to determine the label information to be retained and the label information to be replaced, as shown in the following formulas:

$$g_l = \sigma(W_l [e_t; p_t] + b_l), \qquad e_t^{keep} = g_l \odot e_t, \qquad e_t^{rep} = (1 - g_l) \odot e_t$$

where $e_t$ denotes the embedded feature vector of the natural-language characters corresponding to entity class $t$; $p_t$ denotes the original prototype of entity class $t$; $W_l$ denotes the weight matrix of the label gate; $b_l$ denotes the bias term of the label gate; $\sigma$ denotes the normalization (sigmoid) function; $g_l$ denotes the weight of the label information to be retained; $e_t^{keep}$ denotes the label information to be retained; $e_t^{rep}$ denotes the label information to be replaced;
step S6.4: the label information to be replaced $e_t^{rep}$ and the original prototype $p_t$ are input to the prototype gate, which controls how much information of the original prototype is retained, as shown in the following formulas:

$$g_p = \sigma(W_p [e_t^{rep}; p_t] + b_p), \qquad p_t^{keep} = g_p \odot p_t$$

where $W_p$ denotes the weight matrix of the prototype gate and $b_p$ denotes the bias term of the prototype gate; $\sigma$ denotes the normalization (sigmoid) function; $g_p$ denotes the weight of the original-prototype information to be retained; $p_t^{keep}$ denotes the retained information of the original prototype;
step S6.5: the corrected class prototype is obtained by adding the retained information of the original prototype to the retained label information, as shown in the following formula:

$$\tilde{p}_t = p_t^{keep} + e_t^{keep}$$

where $\tilde{p}_t$ denotes the corrected class prototype of entity class $t$.
7. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 6, characterized in that the specific process of step S7 is as follows:
step S7.1: the probability that an entity span belongs to entity class $t$ is obtained by calculating the distance between the entity span $s$ and the corrected class prototype $\tilde{p}_t$ of entity class $t$, as shown in the following formula:

$$p(t \mid s) = \frac{\exp(-d(s, \tilde{p}_t))}{\sum_{t' \in C} \exp(-d(s, \tilde{p}_{t'}))}$$

where $p(t \mid s)$ denotes the probability that the entity span belongs to entity class $t$; $d(\cdot, \cdot)$ denotes the distance function; $t'$ ranges over every entity class in the entity class label set $C$, including the classes other than $t$;
step S7.2: the probability $p(t \mid s)$ that an entity span belongs to entity class $t$ and its true label $y_s$ are input into the cross-entropy loss function to calculate the span classification loss $\mathcal{L}_{cls}$, as shown in the following formula:

$$\mathcal{L}_{cls} = -\sum_{s} \log p(y_s \mid s)$$

where $\mathcal{L}_{cls}$ denotes the span classification loss and the corrected class prototypes are computed from the support set $\mathcal{S}_{train}$ in the training set.
8. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 7, characterized in that the specific process of step S8 is as follows:
step S8.1: acquiring the embedded feature vectors of the sentence texts of the query set $\mathcal{Q}_{test}$ in the test set with the spliced label information, and inputting them into the linear classification layer of the span detection model;
step S8.2: the linear classification layer of the span detection model predicts, for each character of the input sentence text, a corresponding label from the span-boundary prediction label set $\mathcal{Y}$ and decodes it; the span detection model decodes the predicted span-boundary label sequence according to preset rules to obtain the entity spans.
9. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 8, characterized in that the specific process of obtaining the entity spans is as follows:
step S8.21: decoding proceeds character by character from left to right over the sentence text;
step S8.22: when a "B" in the span-boundary prediction label sequence is identified, recognition continues to the right; when an "O" or another "B" is identified, the sentence text from the "B" through the last consecutive "I" corresponds to one complete entity span;
step S8.23: the "O" mark in the span-boundary prediction label set represents a non-entity span; it is an invalid label and is skipped in the decoding process.
10. The gated-bias-correction-based few-sample two-stage named entity recognition method according to claim 9, characterized in that the named entity set in the query set of the final test set is obtained in step S9, the specific process being as follows:
step S9.1: similar to step S5, the embedded feature vectors of the sentence texts of the query set $\mathcal{Q}_{test}$ in the test set with the spliced label information are obtained, and the corrected class prototypes $\tilde{p}_t$ are calculated from the support set $\mathcal{S}_{test}$ in the test set according to the method of step S6;
Step S9.2: the distances between the embedded feature vectors of the entity spans in the entity span set acquired in step S8 for the query set $\mathcal{Q}_{test}$ in the test set and the class prototypes $\tilde{p}_t$ are calculated to obtain the probability $p(t \mid s)$ of each entity class, and the label corresponding to the entity class with the highest probability is assigned to each entity span in the entity span set, as shown in the following formula:

$$\hat{t} = \arg\max_{t \in C} p(t \mid s)$$

where $\hat{t}$ denotes the finally predicted entity class of the entity span and argmax denotes the maximum function.
CN202311386316.4A 2023-10-25 2023-10-25 Door control deviation correction-based few-sample two-stage named entity identification method Active CN117114004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311386316.4A CN117114004B (en) 2023-10-25 2023-10-25 Door control deviation correction-based few-sample two-stage named entity identification method


Publications (2)

Publication Number Publication Date
CN117114004A true CN117114004A (en) 2023-11-24
CN117114004B CN117114004B (en) 2024-01-16

Family

ID=88809641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311386316.4A Active CN117114004B (en) 2023-10-25 2023-10-25 Door control deviation correction-based few-sample two-stage named entity identification method

Country Status (1)

Country Link
CN (1) CN117114004B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347785A (en) * 2020-11-18 2021-02-09 湖南国发控股有限公司 Nested entity recognition system based on multitask learning
CN112541355A (en) * 2020-12-11 2021-03-23 华南理工大学 Few-sample named entity identification method and system with entity boundary class decoupling
WO2021068329A1 (en) * 2019-10-10 2021-04-15 平安科技(深圳)有限公司 Chinese named-entity recognition method, device, and computer-readable storage medium
CN114676700A (en) * 2022-03-18 2022-06-28 中国人民解放军国防科技大学 Small sample named entity recognition method based on mixed multi-prototype
CN116151256A (en) * 2023-01-04 2023-05-23 北京工业大学 Small sample named entity recognition method based on multitasking and prompt learning
CN116644755A (en) * 2023-07-27 2023-08-25 中国科学技术大学 Multi-task learning-based few-sample named entity recognition method, device and medium
WO2023178802A1 (en) * 2022-03-22 2023-09-28 平安科技(深圳)有限公司 Named entity recognition method and apparatus, device, and computer readable storage medium


Non-Patent Citations (3)

Title
N. R. GAFUROV; I. A. BESSMERTNY; A. V. PLATONOV; E. A. POLESHCHUK; A. V. VASILIEV: "Named Entity Recognition Through Bidirectional LSTM In Natural Language Texts Obtained Through Audio Interfaces", 2018 IEEE 12TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT) *
ZHANG Xiao; LI Yegang; WANG Dong; SHI Shumin: "Named Entity Recognition Based on ERNIE", Intelligent Computer and Applications, no. 03
TAO Yuan; PENG Yanbing: "Chinese Named Entity Recognition Based on Gated CNN-CRF", Electronic Design Engineering, no. 04

Also Published As

Publication number Publication date
CN117114004B (en) 2024-01-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant