CN114861670A - Entity identification method, device and application for learning unknown label based on known label - Google Patents
- Publication number
- CN114861670A (application number CN202210792170.2A)
- Authority
- CN
- China
- Prior art keywords
- label
- training
- unknown
- class
- training model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
Abstract
The application provides an entity identification method, device, and application for learning unknown labels based on known labels. Predefined class samples carrying known labels are processed by a first training model and a second training model, a binary classifier is obtained through training, and the classifier predicts new classes for undefined class samples carrying unknown labels. New classes for unknown labels are thus inferred and identified from the existing features of the known labels, which reduces the data-labeling workload of the entity identification task, lowers the data-volume requirement on the target database during transfer learning, and relaxes the requirements on deployment-environment hardware and the database. The entity recognition model aggregates entity samples of known labels in vector space to derive entity samples of unknown labels for related tasks.
Description
Technical Field
The present application relates to the field of entity identification, and in particular, to an entity identification method, apparatus, and application for learning an unknown tag based on a known tag.
Background
Named Entity Recognition (NER) is an information-extraction technique that recognizes predefined entity types (person names, organizations, place names, portrait labels, etc.) in text, and is a very important and fundamental problem in natural language processing. The recognition accuracy of an entity recognition model depends on the number of training samples; however, some application scenarios cannot provide a large number of training samples. For example, in health-portrait entity recognition in the medical field, training categories and sample data are often severely insufficient. Moreover, conventional entity identification methods can only identify entities of predefined categories and cannot automatically discover potential new categories, so a trained entity recognition model still requires transfer learning over a large data volume when deployed on a new database, which increases the data-labeling workload.
The prior art CN111563165B provides a sentence classification method based on anchor-word positioning and training-sentence augmentation: the sentences with the worst recognition rate are added to an augmentation set, and anchor words are replaced with near-synonyms to form a new augmented sentence set, improving each cycle's classification performance on entity labels with poor recognition. CN113111180A provides a Chinese medical synonym clustering method based on a deep pre-trained neural network, which requires manual classification of synonyms for its aggregation operation and still incurs a large labor cost.
Disclosure of Invention
The embodiments of the application provide an entity identification method, device, and application for learning unknown labels based on known labels. New classes for unknown labels can be inferred and identified from the existing features of known labels, which reduces the labeling workload of entity identification for new unknown labels and relaxes the requirements on deployment-environment hardware and the database.
In a first aspect, an embodiment of the present application provides an entity identification method for learning an unknown tag based on a known tag, where the method includes: acquiring a first training model capable of identifying a category corresponding to a known label;
initializing the first training model to obtain a second training model;
labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
inputting the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; inputting the predefined class samples into the second training model to obtain a first original vector corresponding to each known label;
determining the aggregation result of two known labels based on the change in distance between their first training vectors and their first original vectors, training a binary classifier based on the aggregation result, and computing the confidence of known labels of the same class, so as to obtain a trained classification model;
inputting the undefined class sample into a first training model to obtain a second training vector corresponding to each unknown label; inputting the undefined class sample into a second training model to obtain a second original vector of each corresponding unknown label;
and determining the aggregation result of two unknown labels based on the change in distance between their second training vectors and their second original vectors, inputting the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence, and generating a new class if the prediction confidence is greater than a set threshold.
In a second aspect, an embodiment of the present application provides an entity identification apparatus for learning an unknown tag based on a known tag, including: the first training model acquisition unit is used for acquiring a first training model capable of identifying the category corresponding to the known label;
the second training model obtaining unit is used for initializing the first training model to obtain a second training model;
the labeling unit is used for labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
a known label training unit, configured to input the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; input the predefined class samples into the second training model to obtain a first original vector corresponding to each known label; determine the aggregation result of two known labels based on the change in distance between their first training vectors and their first original vectors; train a binary classifier based on the aggregation result; and compute the confidence of known labels of the same class to obtain a trained classification model;
an unknown label prediction unit, configured to input the undefined class samples into the first training model to obtain a second training vector corresponding to each unknown label; input the undefined class samples into the second training model to obtain a second original vector corresponding to each unknown label; determine the aggregation result of two unknown labels based on the change in distance between their second training vectors and their second original vectors; input the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence; and generate a new class if the prediction confidence is greater than a set threshold.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the entity identification method for learning an unknown tag based on a known tag.
In a fourth aspect, the present application provides a readable storage medium storing a computer program, the computer program comprising program code for controlling a process to execute the entity identification method for learning an unknown label based on a known label.
The main contributions and innovation points of the invention are as follows:
the entity identification model provided by the embodiment of the application can deduce and identify the type of a new unknown label through the existing characteristics of the existing label, so that the workload of data labeling on an entity identification task is reduced, the requirement on the data volume of a target database can be reduced during transfer learning, and the requirements on deployment environment hardware and the database are reduced. The entity recognition model may have entity samples of known tags aggregated over vector space to derive entity samples of unknown tags for related tasks.
The patent can be used to explore potential new categories in entity recognition; for example, potential new health-portrait labels can be mined from the existing health-portrait labels in the medical field.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of categories of raw network prediction tags;
FIG. 2 is recognition logic of a first training model;
FIG. 3 is a schematic diagram of training and predicting a classifier using predefined class samples and undefined class samples;
FIG. 4 is the identification logic of the joint classification model;
FIG. 5 is a schematic diagram showing the comparison between the joint classification model and the Kmeans algorithm in the present embodiment;
FIG. 6 is a graph of the number of unknown tags versus identification accuracy;
fig. 7 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the methods may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Example one
Before introducing the present solution, a prototype network for entity recognition is briefly described:
as shown in fig. 1, the prototype network creates a prototype for each class to represent, for example, three prototypes c are formed in the prototype network corresponding to three classes. When the input vectors need to be classified, the center of the input vector x in the vector space mapping mean value of the prototype network is calculated to be the prototype c, and the method can reduce overfitting in the learning of few samples and can also be used for finding new classification categories in the learning of no samples.
As shown in fig. 1 (a), the input vector x (the white dot) is embedded into the vector space of the prototype network, and the prototypes c1, c2, and c3, each formed as the mean of several other input vectors, define the prototype regions that determine the category of an input vector; this reduces overfitting in few-shot learning. In fig. 1 (b), the prototypes c1, c2, and c3 each correspond to a single other input vector, so new classes can be discovered in zero-shot learning.
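The nearest-prototype classification described above can be sketched as follows; the 2-D embeddings are toy values standing in for the network's vector space, and the function names are illustrative:

```python
import numpy as np

def build_prototypes(embeddings, labels):
    """Compute one prototype per class as the mean of that class's embedded vectors."""
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(x, prototypes):
    """Assign x to the class whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

# Toy 2-D embeddings for three classes, mimicking fig. 1
emb = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.0, 5.0], [0.1, 5.2]])
lab = np.array([1, 1, 2, 2, 3, 3])
protos = build_prototypes(emb, lab)
print(classify(np.array([4.8, 5.1]), protos))  # 2: nearest to the class-2 prototype
```

Classifying by distance to a class mean, rather than by a learned decision boundary, is what lets the same embedding space be reused for classes never seen in training.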
The scheme improves the entity recognition logic based on the classification logic of the prototype network to realize an entity recognition method that learns unknown labels based on known labels, comprising the following steps: acquiring a first training model capable of identifying the category corresponding to a known label;
initializing the first training model to obtain a second training model;
labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
inputting the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; inputting the predefined class samples into the second training model to obtain a first original vector corresponding to each known label;
determining the aggregation result of two known labels based on the change in distance between their first training vectors and their first original vectors, training a binary classifier based on the aggregation result, and computing the confidence of known labels of the same class, so as to obtain a trained classification model;
inputting the undefined class sample into a first training model to obtain a second training vector corresponding to each unknown label; inputting the undefined class sample into a second training model to obtain a second original vector of each corresponding unknown label;
and determining the aggregation result of two unknown labels based on the change in distance between their second training vectors and their second original vectors, inputting the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence, and generating a new class if the prediction confidence is greater than a set threshold.
In some embodiments, after the new category is obtained, the method further comprises the steps of:
if the unknown labels generate a new class, the predefined class samples are re-labeled with the unknown labels and the new class to obtain defined class samples, and the defined class samples are used to train the second training model to obtain a joint classification model.
In the step of acquiring a first training model capable of identifying the category corresponding to a known label, the first training model is pre-trained on training data annotated with existing labels in an existing database, and can identify the categories of those existing labels.
In a typical application scenario of the present solution, the first training model is pre-trained on a public medical database and can identify the categories of conventional medical entities, including but not limited to: diseases, drugs, symptoms, tissues and organs, surgical operations, examination items, departments, and the like.
The first training model may be a BERT prototype network, with the structure shown in fig. 1. The trained first training model yields a first mapping function, obtained as follows:
a phrase with a known label is word-segmented to obtain an input sample x; the first mapping function A embeds x into the vector space as h = A(x); the prototype of the labeled class y is p_y, the prototype of any known class c is p_c, and d(·,·) is the distance between an embedded sample and a prototype. The loss function L used to train the first mapping function is:

    L = -log( exp(-d(h, p_y)) / Σ_c exp(-d(h, p_c)) )
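As a sketch of this loss, assuming Euclidean distance and the standard prototypical-network softmax over negative distances (function and variable names are illustrative):

```python
import numpy as np

def proto_loss(h, prototypes, y):
    """L = -log( exp(-d(h, p_y)) / sum_c exp(-d(h, p_c)) ) with Euclidean d."""
    classes = sorted(prototypes)
    d = np.array([np.linalg.norm(h - prototypes[c]) for c in classes])
    log_softmax = -d - np.log(np.exp(-d).sum())  # log of softmax over -d
    return -log_softmax[classes.index(y)]

protos = {"disease": np.array([4.0, 4.0]), "organ": np.array([0.0, 0.0])}
loss_near = proto_loss(np.array([0.1, 0.0]), protos, "organ")   # h near its own prototype
loss_far = proto_loss(np.array([0.1, 0.0]), protos, "disease")  # wrong class assignment
print(loss_near < loss_far)  # True
```

Minimizing this loss pulls each sample toward its class prototype and away from the others, which produces the aggregation in vector space that the later steps rely on.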
as shown in fig. 2, the first training model that can identify the class to which the known tag corresponds can identify terms under the corresponding class. For example, after a short sentence of irregular thickening of the whole stomach wall is considered that the leather stomach is possibly input into the first training model, the category of the stomach wall can be identified as an organ, the category of the leather stomach is a disease, and samples in different categories can be aggregated in a vector space.
In the step of initializing the first training model to obtain a second training model, the second training model is an untrained copy of the first training model; correspondingly, the second mapping function of the second training model is the original mapping function.
In the embodiment of the scheme, the first training model and the second training model adopt the same original network and are both selected as BERT networks.
In the step of labeling each training datum to obtain predefined class samples labeled with known labels and undefined class samples labeled with unknown labels, the known labels of the predefined class samples are the same as the labels the first training model was trained to identify, so the first training model can identify the known labels and output their known classes.
In a common entity labeling scheme such as BIOES, the known label is "entity" and the unknown label is "non-entity"; identifying the non-entity content has the advantage of improving the prediction accuracy of the known labels. For example, in the sentence "the whole stomach wall is irregularly thickened, leather stomach is considered possible", "stomach wall" and "leather stomach" are entities and "irregular thickening" is non-entity, and identifying the non-entity content can improve the recognition accuracy of "leather stomach".
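As an illustration only, a BIOES-style view of the (translated) example sentence, with a hypothetical word segmentation; single-token entities carry S- tags:

```python
# BIOES tagging of the example sentence (word-segmented, translated tokens);
# the segmentation and tag names below are illustrative assumptions.
tokens = ["whole", "stomach wall", "irregular", "thickening", ",",
          "leather stomach", "possible"]
tags = ["O", "S-organ", "O", "O", "O", "S-disease", "O"]

# Recover (entity, category) pairs from the tag sequence
entities = [(tok, tag.split("-")[1]) for tok, tag in zip(tokens, tags) if tag != "O"]
print(entities)  # [('stomach wall', 'organ'), ('leather stomach', 'disease')]
```

The "O" tokens here are exactly the "non-entity" content that the unknown-label pipeline later mines for new categories.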
It should be noted that the present scheme is to label the same training data to obtain predefined class samples and undefined class samples, so as to predict and learn unknown labels through known labels.
In the step of inputting the predefined class samples into the first training model to obtain a first training vector corresponding to each known label, and inputting the predefined class samples into the second training model to obtain a first original vector corresponding to each known label: to enable the aggregation judgment, at least two known labels of the same class must appear in the predefined class samples input to the first and second training models. One predefined class sample may contain several known labels of the same class, or the labels may come from several predefined class samples; in the latter case, the known labels of all predefined class samples are collected in the same vector space.
Since the first training model can identify the class of a known label, known labels belonging to the same class can be clustered by class, and the mapping results of the first mapping function aggregate in the vector space around the class prototype. The second training model is in its initial state, so the mapping results of the second mapping function are scattered in the vector space; the aggregation result can therefore be judged from the mapping results of the two models.
In the step of determining the aggregation result of known labels based on the change in distance between the first training vectors and the first original vectors of two known labels: the distance between the two first training vectors is computed as the first distance, the distance between the two first original vectors as the second distance, and the difference between the first distance and the second distance is compared; if the difference is greater than an aggregation threshold, the two known labels are judged to be aggregated.
Illustratively, as shown in fig. 3, the predefined class sample S1 is "the whole stomach wall is irregularly thickened, leather stomach is considered possible", where "stomach wall" is known label 1 and "leather stomach" is known label 2; the predefined class sample S2 is "multiple low-density nodules in the liver, cyst is considered possible", where "liver" is known label 3 and "cyst" is known label 4. The first distance between the first training vector of known label 3 and the first training vector of known label 1 is computed, the second distance between the first original vector of known label 3 and the first original vector of known label 1 is computed, and it follows from the first and second distances that known label 1 and known label 3 are aggregated.
If two known labels are not aggregated, their confidence has no reference value; therefore, the scheme first judges whether the two known labels are aggregated.
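A minimal sketch of the aggregation judgment, assuming aggregation means the trained (first) distance shrank relative to the original (second) distance by more than the threshold; the direction of the difference and the threshold value are assumptions:

```python
import numpy as np

def is_aggregated(train_i, train_j, orig_i, orig_j, agg_threshold=1.0):
    """Judge aggregation of two labels from the first distance (between trained
    vectors) and the second distance (between original vectors). We assume
    aggregation means the trained vectors moved closer together than the
    original ones by more than the aggregation threshold."""
    first_distance = np.linalg.norm(train_i - train_j)
    second_distance = np.linalg.norm(orig_i - orig_j)
    return bool(second_distance - first_distance > agg_threshold)

# Trained vectors close together, original vectors far apart: aggregated
print(is_aggregated(np.array([0.0, 0.0]), np.array([0.1, 0.0]),
                    np.array([3.0, 0.0]), np.array([0.0, 0.0])))  # True
```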
In the step of training the binary classifier based on the aggregation result and computing the confidence of known labels of the same class to obtain the trained classification model: if the aggregation result shows that aggregation occurs, the known labels of the same class are combined pairwise to compute the confidence.
The confidence is computed as follows:

    b_ij = σ( W [h_i ; h_j ; ĥ_i ; ĥ_j] + b )

where b_ij is the confidence that two known labels belong to the same category, h_i and h_j are the first original vectors of the two known labels, ĥ_i and ĥ_j are their first training vectors, W is a weight, b is a bias coefficient, and σ denotes the sigmoid function.
The loss function of the binary classifier is:

    L = -(1/N²) Σ_{i,j} [ y_ij log b_ij + (1 - y_ij) log(1 - b_ij) ]

where N is the number of known labels and y_ij indicates whether known labels i and j belong to the same class (1 if so, 0 otherwise).
Illustratively, as shown in fig. 3, the predefined class sample S1 is "the whole stomach wall is irregularly thickened, leather stomach is considered possible", where "stomach wall" is known label 1 and "leather stomach" is known label 2; the predefined class sample S2 is "multiple low-density nodules in the liver, cyst is considered possible", where "liver" is known label 3 and "cyst" is known label 4; the predefined class sample S3 is "the patient's urinary system is currently normal, with a history of diabetes", where "urinary system" is known label 5 and "diabetes" is known label 6. After binary classification, the confidence between known labels 1 and 2 is 0, between known labels 1 and 5 is 1, between known labels 2 and 3 is 0, between known labels 2 and 6 is 1, between known labels 3 and 4 is 0, between known labels 3 and 5 is 1, between known labels 4 and 5 is 0, between known labels 4 and 6 is 1, and between known labels 5 and 6 is 0.
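A minimal sketch of the pairwise confidence and its loss term; the sigmoid-over-concatenation form of the classifier and all names are assumptions, since the text specifies only the inputs (original and training vectors, weight W, bias b):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pair_confidence(h_i, h_j, t_i, t_j, W, b):
    """Confidence b_ij that labels i and j share a category: sigmoid of a linear
    function of the concatenated original (h) and trained (t) vectors.
    The concatenation-then-linear form is an assumption."""
    features = np.concatenate([h_i, h_j, t_i, t_j])
    return sigmoid(W @ features + b)

def bce_loss(b_ij, y_ij):
    """Binary cross-entropy for one pair; y_ij = 1 if same category, else 0."""
    return -(y_ij * np.log(b_ij) + (1 - y_ij) * np.log(1 - b_ij))

rng = np.random.default_rng(0)
W = rng.normal(size=8)                            # weights for 4 concatenated 2-D vectors
h1, h2 = rng.normal(size=2), rng.normal(size=2)   # original vectors
t1, t2 = rng.normal(size=2), rng.normal(size=2)   # trained vectors
conf = pair_confidence(h1, h2, t1, t2, W, 0.0)
print(0.0 < conf < 1.0)  # True: sigmoid output is a valid confidence
```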
The binary classifier trained in this way is relevant to the aggregation of undefined class samples in the vector space: undefined classes whose features are close to the known labels of the predefined class samples show a certain aggregation trend in the vector space, so the scheme can predict the classes of unknown labels with the trained binary classifier.
In the step of determining the aggregation result of unknown labels based on the change in distance between two second training vectors and the corresponding second original vectors: the distance between the two second training vectors is computed as the third distance, the distance between the two second original vectors as the fourth distance, and the difference between the third distance and the fourth distance is compared; if the difference is greater than the aggregation threshold, the two unknown labels are judged to be aggregated.
Illustratively, as shown in fig. 3, the undefined class sample S1 is "the whole stomach wall is irregularly thickened, leather stomach is considered possible", where "irregular" is unknown label 7 and "thickened" is unknown label 8; the undefined class sample S2 is "multiple low-density nodules in the liver, cyst is considered possible", where "low density" is unknown label 9 and "nodules" is unknown label 10. The third distance between the second training vector of unknown label 7 and the second training vector of unknown label 9 is computed, the fourth distance between the second original vector of unknown label 7 and the second original vector of unknown label 9 is computed, and it follows from the third and fourth distances that unknown label 7 and unknown label 9 are aggregated.
In the step of inputting two unknown labels into the binary classifier to output a prediction confidence based on the aggregation result: if the aggregation result shows that the unknown labels are aggregated, any two unknown labels are input into the binary classifier to output a prediction confidence. If the prediction confidence is greater than the set threshold, the two unknown labels are considered to belong to the same class; if it is smaller than the set threshold, they are not.
Illustratively, two unknown labels x_u and x_v are input, the binary classifier predicts a confidence b_uv, and a threshold γ is set.

When b_uv > γ, x_u and x_v belong to the same category, which is a newly discovered undefined category O; for example, "irregular" and "low density" form O_1, while "thickened" and "nodules" form O_2.

When b_uv < γ, x_u and x_v do not belong to the same category and no new category is created, as with "irregular" and "thickened".
Illustratively, still as shown in fig. 3, the undefined class sample S1 is "the whole stomach wall is irregularly thickened, leather stomach is considered possible", where "irregular" is unknown label 7 and "thickened" is unknown label 8; the undefined class sample S2 is "multiple low-density nodules in the liver, cyst is considered possible", where "low density" is unknown label 9 and "nodules" is unknown label 10; the undefined class sample S3 is "the patient's urinary system is currently normal, with a history of diabetes", where "normal" is unknown label 11. The binary classifier computes the confidence between the unknown labels with the following result: the confidence between unknown labels 7 and 9 is 0.7, between 8 and 10 is 0.8, between 7 and 11 is 0.9, between 9 and 11 is 0.8, between 7 and 8 is 0.1, between 8 and 9 is 0.2, between 9 and 10 is 0.3, and between 8 and 11 is 0.1. With the threshold set to 0.5, unknown labels 7, 9, and 11 are clustered and a new class O1 can be defined; unknown labels 8 and 10 are clustered and a new class O2 can be defined.
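The pairwise confidences above can be grouped into new categories by linking every pair whose confidence exceeds the threshold and taking connected components; the union-find grouping below is one possible implementation, not prescribed by the scheme:

```python
def discover_categories(confidences, labels, threshold=0.5):
    """Group unknown labels into new categories: link every pair whose predicted
    confidence exceeds the threshold, then take connected components (union-find)."""
    parent = {l: l for l in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), conf in confidences.items():
        if conf > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for l in labels:
        groups.setdefault(find(l), set()).add(l)
    # only components with at least two members become new categories
    return [g for g in groups.values() if len(g) > 1]

# pairwise confidences from the example above
conf = {(7, 9): 0.7, (8, 10): 0.8, (7, 11): 0.9, (9, 11): 0.8,
        (7, 8): 0.1, (8, 9): 0.2, (9, 10): 0.3, (8, 11): 0.1}
print(discover_categories(conf, [7, 8, 9, 10, 11]))
```

With the 0.5 threshold this reproduces the two new classes of the example: one component {7, 9, 11} (candidate O1) and one component {8, 10} (candidate O2).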
According to this scheme, the number of new classes and the identification accuracy of the unknown labels can be tuned by adjusting the set threshold; the correspondence between the threshold, the number of unknown-label classes, and the identification accuracy is shown in fig. 6.
After the new class is obtained, a second training model for the unknown label can be further trained and learned based on the new class.
Specifically, in the step of "training the second training model with the defined class samples to obtain the joint classification model", the prototypes of the joint classification model include the prototype P_c of the first training model and the prototype P_O of the second training model, where the prototype of the first training model corresponds to the classes of the existing labels and the prototype of the second training model corresponds to the classes of the unknown labels, so that the class set is expanded.
As shown in the following equation:

P = P_c ∪ P_O

where P is the prototype of the joint classification model, P_c is the prototype of the first training model, and P_O is the prototype of the second training model.
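The expanded prototype set can be sketched as follows. All vectors, class names, and the nearest-prototype classification rule below are illustrative assumptions, not the patent's actual model:

```python
# Hypothetical class prototypes: one representative vector per class.
# P_c comes from the first training model (known-label classes),
# P_O from the second training model (newly discovered classes).
P_c = {"organ":   [0.9, 0.1, 0.0],
       "disease": [0.1, 0.8, 0.2]}
P_O = {"O1":      [0.0, 0.2, 0.9],   # e.g. "low density"
       "O2":      [0.3, 0.0, 0.7]}   # e.g. "nodule"

# P = P_c ∪ P_O : the joint model classifies over the expanded class set.
P = {**P_c, **P_O}

def classify(vec, prototypes):
    """Assign a label embedding to the class of its nearest prototype
    (Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(prototypes, key=lambda c: dist(vec, prototypes[c]))

# A hypothetical embedding close to the O1 prototype:
print(classify([0.05, 0.25, 0.85], P))  # -> "O1"
```

With the merged prototype set, a single classifier covers both the original classes and the new ones, which is the class expansion described above.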
As shown in fig. 4, when "liver multiple low-density nodules, cyst possible" is input to the joint classification model, "liver" is classified as "organ", "cyst" is classified as "disease", "low density" is classified as O1, and "nodule" is classified as O2.
The label clustering results of the joint classification model were compared with the K-means algorithm under the same conditions; the comparison shows that the label clustering results of the joint classification model of this scheme perform better, as shown in fig. 5.
Example two
Based on the same concept, the application also provides an entity identification device for learning unknown labels based on known labels, which comprises:
the first training model acquisition unit is used for acquiring a first training model capable of identifying the category corresponding to the known label;
the second training model obtaining unit is used for initializing the first training model to obtain a second training model;
the labeling unit is used for labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
a known label training unit, configured to input the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; input the predefined class samples into the second training model to obtain a first original vector corresponding to each known label; judge the aggregation result of the known labels based on the distance change between the first training vectors and the first original vectors of two known labels; and train a binary classifier based on the aggregation result and calculate the confidence of known labels of the same class to obtain a trained classification model;
an unknown label prediction unit, configured to input the undefined class samples into the first training model to obtain a second training vector corresponding to each unknown label; input the undefined class samples into the second training model to obtain a second original vector corresponding to each unknown label; judge the aggregation result of the unknown labels based on the distance change between the second training vectors and the second original vectors of two unknown labels; input the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence; and generate a new class if the prediction confidence is greater than a set threshold.
EXAMPLE III
The present embodiment further provides an electronic device, referring to fig. 7, comprising a memory 404 and a processor 402, wherein the memory 404 stores a computer program, and the processor 402 is configured to execute the computer program to perform the steps in any of the embodiments of the entity identification method for learning an unknown tag based on a known tag.
Specifically, the processor 402 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The processor 402 may implement any of the above embodiments of entity identification methods for learning unknown tags based on known tags by reading and executing computer program instructions stored in the memory 404.
Optionally, the electronic apparatus may further include a transmission device 406 and an input/output device 408, where the transmission device 406 is connected to the processor 402, and the input/output device 408 is connected to the processor 402.
The transmission device 406 may be used to receive or transmit data via a network. Specific examples of the network may include wired or wireless networks provided by the communication provider of the electronic device. In one example, the transmission device includes a Network Interface Controller (NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 406 may be a Radio Frequency (RF) module, which communicates with the internet wirelessly.
The input-output device 408 is used to input or output information. In this embodiment, the input information may be a predefined class sample of a known tag, and the like, and the output information may be a class classification result of an unknown tag of an undefined class sample, and the like.
Optionally, in this embodiment, the processor 402 may be configured to execute the following steps by a computer program:
acquiring a first training model capable of identifying a category corresponding to a known label;
initializing the first training model to obtain a second training model;
labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
inputting the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; inputting the predefined class samples into the second training model to obtain a first original vector corresponding to each known label;
judging the aggregation result of the known labels based on the distance change between the first training vectors and the first original vectors of two known labels, training a binary classifier based on the aggregation result, and calculating the confidence of known labels of the same class to obtain a trained classification model;
inputting the undefined class sample into a first training model to obtain a second training vector corresponding to each unknown label; inputting the undefined class sample into a second training model to obtain a second original vector of each corresponding unknown label;
and judging the aggregation result of the unknown labels based on the distance change between the second training vectors and the second original vectors of two unknown labels, inputting the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence, and generating a new class if the prediction confidence is greater than a set threshold.
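The aggregation judgment in the steps above can be sketched as follows. The vectors, the aggregation threshold value, and the direction of the distance comparison (the trained-model distance shrinking relative to the original-model distance indicating aggregation) are assumptions made for illustration:

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def aggregated(train_u, train_v, orig_u, orig_v, agg_threshold=0.3):
    """Judge whether two labels aggregate after training.

    Interpretation (an assumption): training pulls same-class labels
    together, so two labels aggregate when the distance between their
    training vectors has shrunk, relative to the distance between their
    original vectors, by more than the aggregation threshold."""
    first_distance = euclidean(train_u, train_v)   # after training
    second_distance = euclidean(orig_u, orig_v)    # before training
    return (second_distance - first_distance) > agg_threshold

# Hypothetical 2-D vectors: this pair moved much closer after training.
print(aggregated(train_u=[0.1, 0.2], train_v=[0.15, 0.25],
                 orig_u=[0.9, 0.1], orig_v=[0.1, 0.9]))  # -> True
```

Pairs judged as aggregated are then passed to the binary classifier, whose prediction confidence is compared against the set threshold to decide whether a new class is generated.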
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiment and optional implementation manners, and details of this embodiment are not described herein again.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets and/or macros can be stored in any device-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may comprise one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard it should be noted that any block of the logic flow as in the figures may represent a program step, or an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard or floppy disks, and optical media such as, for example, DVDs and data variants thereof, CDs. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above examples are merely illustrative of several embodiments of the present application, and the description is more specific and detailed, but not to be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.
Claims (10)
1. An entity identification method for learning an unknown label based on a known label is characterized by comprising the following steps: acquiring a first training model capable of identifying a category corresponding to a known label;
initializing the first training model to obtain a second training model;
labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
inputting the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; inputting the predefined class samples into the second training model to obtain a first original vector corresponding to each known label;
judging the aggregation result of the known labels based on the distance change between the first training vectors and the first original vectors of two known labels, training a binary classifier based on the aggregation result, and calculating the confidence of known labels of the same class to obtain a trained classification model;
inputting the undefined class sample into a first training model to obtain a second training vector corresponding to each unknown label; inputting the undefined class sample into a second training model to obtain a second original vector of each corresponding unknown label;
and judging the aggregation result of the unknown labels based on the distance change between the second training vectors and the second original vectors of two unknown labels, inputting the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence, and generating a new class if the prediction confidence is greater than a set threshold.
2. The method according to claim 1, wherein if the unknown label generates a new class, the unknown label and the new class are used to label the predefined class sample again to obtain a defined class sample, and the defined class sample is used to train a second training model to obtain a joint classification model.
3. The method as claimed in claim 1, wherein in the step of judging the aggregation result of the known labels based on the distance change between the first training vectors and the first original vectors of two known labels, the distance between the two first training vectors is calculated as a first distance, the distance between the two first original vectors is calculated as a second distance, the difference between the first distance and the second distance is compared, and if the difference is greater than an aggregation setting threshold, it is determined that the two corresponding known labels are aggregated.
4. The entity identification method for learning an unknown label based on a known label according to claim 1, wherein in the step of training a binary classifier based on the aggregation result and calculating the confidence of known labels of the same class to obtain the trained classification model, if the aggregation result shows aggregation, the confidence is calculated for pairwise combinations of the known labels of the same class.
5. The method as claimed in claim 1, wherein in the step of judging the aggregation result of the unknown labels based on the distance change between the second training vectors and the second original vectors, the distance between the two second training vectors is calculated as a third distance, the distance between the two second original vectors is calculated as a fourth distance, the difference between the third distance and the fourth distance is compared, and if the difference is greater than the aggregation setting threshold, it is determined that the two corresponding unknown labels are aggregated.
6. The entity identification method for learning an unknown label based on a known label according to claim 1, wherein in the step of inputting the two unknown labels into the binary classifier to output a prediction confidence based on the aggregation result, if the aggregation result shows that the unknown labels are aggregated, any two unknown labels are selected and input into the binary classifier to output a prediction confidence.
7. The method according to claim 1, wherein in the step of "training the second training model with the defined class samples to obtain the joint classification model", the prototypes of the joint classification model include the prototype of the first training model and the prototype of the second training model, wherein the prototype of the first training model corresponds to the classes of the existing labels and the prototype of the second training model corresponds to the classes of the unknown labels.
8. An entity recognition apparatus for learning an unknown tag based on a known tag, comprising:
the first training model acquisition unit is used for acquiring a first training model capable of identifying the category corresponding to the known label;
the second training model obtaining unit is used for initializing the first training model to obtain a second training model;
the labeling unit is used for labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
a known label training unit, configured to input the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; input the predefined class samples into the second training model to obtain a first original vector corresponding to each known label; judge the aggregation result of the known labels based on the distance change between the first training vectors and the first original vectors of two known labels; and train a binary classifier based on the aggregation result and calculate the confidence of known labels of the same class to obtain a trained classification model;
an unknown label prediction unit, configured to input the undefined class samples into the first training model to obtain a second training vector corresponding to each unknown label; input the undefined class samples into the second training model to obtain a second original vector corresponding to each unknown label; judge the aggregation result of the unknown labels based on the distance change between the second training vectors and the second original vectors of two unknown labels; input the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence; and generate a new class if the prediction confidence is greater than a set threshold.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the entity identification method for learning an unknown label based on a known label according to any one of claims 1 to 7.
10. A readable storage medium having a computer program stored therein, the computer program comprising program code for controlling a process to perform the entity identification method for learning an unknown label based on a known label according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210792170.2A CN114861670A (en) | 2022-07-07 | 2022-07-07 | Entity identification method, device and application for learning unknown label based on known label |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210792170.2A CN114861670A (en) | 2022-07-07 | 2022-07-07 | Entity identification method, device and application for learning unknown label based on known label |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114861670A true CN114861670A (en) | 2022-08-05 |
Family
ID=82626814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210792170.2A Pending CN114861670A (en) | 2022-07-07 | 2022-07-07 | Entity identification method, device and application for learning unknown label based on known label |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114861670A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113468869A (en) * | 2021-07-12 | 2021-10-01 | 北京有竹居网络技术有限公司 | Semantic analysis model generation method, semantic analysis device and semantic analysis equipment |
CN116232443A (en) * | 2023-05-09 | 2023-06-06 | 中国科学技术大学 | Environment WiFi backscattering system and method based on single commercial AP receiver |
CN116433977A (en) * | 2023-04-18 | 2023-07-14 | 国网智能电网研究院有限公司 | Unknown class image classification method, unknown class image classification device, computer equipment and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516718A (en) * | 2019-08-12 | 2019-11-29 | 西北工业大学 | The zero sample learning method based on depth embedded space |
US20200160177A1 (en) * | 2018-11-16 | 2020-05-21 | Royal Bank Of Canada | System and method for a convolutional neural network for multi-label classification with partial annotations |
CN111222340A (en) * | 2020-01-15 | 2020-06-02 | 东华大学 | Breast electronic medical record entity recognition system based on multi-standard active learning |
CA3063580A1 (en) * | 2018-12-17 | 2020-06-17 | 10353744 Canada Ltd. | Classifier training method and apparatus, electronic device and computer readable medium |
WO2020193966A1 (en) * | 2019-03-26 | 2020-10-01 | Benevolentai Technology Limited | Name entity recognition with deep learning |
CN111932130A (en) * | 2020-08-12 | 2020-11-13 | 上海冰鉴信息科技有限公司 | Service type identification method and device |
CN113127605A (en) * | 2021-06-17 | 2021-07-16 | 明品云(北京)数据科技有限公司 | Method and system for establishing target recognition model, electronic equipment and medium |
CN113191148A (en) * | 2021-04-30 | 2021-07-30 | 西安理工大学 | Rail transit entity identification method based on semi-supervised learning and clustering |
CN113259331A (en) * | 2021-04-29 | 2021-08-13 | 上海电力大学 | Unknown abnormal flow online detection method and system based on incremental learning |
CN113255355A (en) * | 2021-06-08 | 2021-08-13 | 北京明略软件***有限公司 | Entity identification method and device in text information, electronic equipment and storage medium |
CN113298253A (en) * | 2021-06-03 | 2021-08-24 | 清华大学 | Model training method, recognition method and device for named entity recognition |
CN113672696A (en) * | 2021-07-08 | 2021-11-19 | 浙江一山智慧医疗研究有限公司 | Intention recognition method, device, computer equipment and computer readable storage medium |
CN114241260A (en) * | 2021-12-14 | 2022-03-25 | 四川大学 | Open set target detection and identification method based on deep neural network |
- 2022-07-07: CN202210792170.2A patent/CN114861670A/en, active Pending
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200160177A1 (en) * | 2018-11-16 | 2020-05-21 | Royal Bank Of Canada | System and method for a convolutional neural network for multi-label classification with partial annotations |
CA3063580A1 (en) * | 2018-12-17 | 2020-06-17 | 10353744 Canada Ltd. | Classifier training method and apparatus, electronic device and computer readable medium |
WO2020193966A1 (en) * | 2019-03-26 | 2020-10-01 | Benevolentai Technology Limited | Name entity recognition with deep learning |
CN110516718A (en) * | 2019-08-12 | 2019-11-29 | 西北工业大学 | The zero sample learning method based on depth embedded space |
CN111222340A (en) * | 2020-01-15 | 2020-06-02 | 东华大学 | Breast electronic medical record entity recognition system based on multi-standard active learning |
CN111932130A (en) * | 2020-08-12 | 2020-11-13 | 上海冰鉴信息科技有限公司 | Service type identification method and device |
CN113259331A (en) * | 2021-04-29 | 2021-08-13 | 上海电力大学 | Unknown abnormal flow online detection method and system based on incremental learning |
CN113191148A (en) * | 2021-04-30 | 2021-07-30 | 西安理工大学 | Rail transit entity identification method based on semi-supervised learning and clustering |
CN113298253A (en) * | 2021-06-03 | 2021-08-24 | 清华大学 | Model training method, recognition method and device for named entity recognition |
CN113255355A (en) * | 2021-06-08 | 2021-08-13 | 北京明略软件***有限公司 | Entity identification method and device in text information, electronic equipment and storage medium |
CN113127605A (en) * | 2021-06-17 | 2021-07-16 | 明品云(北京)数据科技有限公司 | Method and system for establishing target recognition model, electronic equipment and medium |
CN113672696A (en) * | 2021-07-08 | 2021-11-19 | 浙江一山智慧医疗研究有限公司 | Intention recognition method, device, computer equipment and computer readable storage medium |
CN114241260A (en) * | 2021-12-14 | 2022-03-25 | 四川大学 | Open set target detection and identification method based on deep neural network |
Non-Patent Citations (2)
Title |
---|
徐树奎等: "基于层级式Bi-LSTM-CRF模型的军事目标实体识别方法", 《信息化研究》 * |
陈毅松等: "基于支持向量机的渐进直推式分类学习算法", 《软件学报》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113468869A (en) * | 2021-07-12 | 2021-10-01 | 北京有竹居网络技术有限公司 | Semantic analysis model generation method, semantic analysis device and semantic analysis equipment |
CN116433977A (en) * | 2023-04-18 | 2023-07-14 | 国网智能电网研究院有限公司 | Unknown class image classification method, unknown class image classification device, computer equipment and storage medium |
CN116433977B (en) * | 2023-04-18 | 2023-12-05 | 国网智能电网研究院有限公司 | Unknown class image classification method, unknown class image classification device, computer equipment and storage medium |
CN116232443A (en) * | 2023-05-09 | 2023-06-06 | 中国科学技术大学 | Environment WiFi backscattering system and method based on single commercial AP receiver |
CN116232443B (en) * | 2023-05-09 | 2023-08-29 | 中国科学技术大学 | Environment WiFi backscattering system and method based on single commercial AP receiver |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108399228B (en) | Article classification method and device, computer equipment and storage medium | |
CN114861670A (en) | Entity identification method, device and application for learning unknown label based on known label | |
CN109243618B (en) | Medical model construction method, disease label construction method and intelligent device | |
Chong et al. | Simultaneous image classification and annotation | |
CN108536800B (en) | Text classification method, system, computer device and storage medium | |
WO2019169719A1 (en) | Automatic abstract extraction method and apparatus, and computer device and storage medium | |
US8019699B2 (en) | Machine learning system | |
US11698930B2 (en) | Techniques for determining artificial neural network topologies | |
WO2021139262A1 (en) | Document mesh term aggregation method and apparatus, computer device, and readable storage medium | |
CN110930417A (en) | Training method and device of image segmentation model, and image segmentation method and device | |
CN108520041B (en) | Industry classification method and system of text, computer equipment and storage medium | |
US7664328B2 (en) | Joint classification and subtype discovery in tumor diagnosis by gene expression profiling | |
US20230075100A1 (en) | Adversarial autoencoder architecture for methods of graph to sequence models | |
WO2020224106A1 (en) | Text classification method and system based on neural network, and computer device | |
WO2008137368A1 (en) | Web page analysis using multiple graphs | |
CN109948735B (en) | Multi-label classification method, system, device and storage medium | |
CN112507039A (en) | Text understanding method based on external knowledge embedding | |
CN113806582B (en) | Image retrieval method, image retrieval device, electronic equipment and storage medium | |
WO2019123451A1 (en) | System and method for use in training machine learning utilities | |
CN113887580A (en) | Contrast type open set identification method and device considering multi-granularity correlation | |
CN110808095B (en) | Diagnostic result recognition method, model training method, computer equipment and storage medium | |
CN114943017A (en) | Cross-modal retrieval method based on similarity zero sample hash | |
CN114238746A (en) | Cross-modal retrieval method, device, equipment and storage medium | |
CN111783088B (en) | Malicious code family clustering method and device and computer equipment | |
CN114328942A (en) | Relationship extraction method, apparatus, device, storage medium and computer program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20220805 |