CN114861670A - Entity identification method, device and application for learning unknown label based on known label - Google Patents


Info

Publication number
CN114861670A
CN114861670A
Authority
CN
China
Prior art keywords
label
training
unknown
class
training model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210792170.2A
Other languages
Chinese (zh)
Inventor
葛航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Yishan Intelligent Medical Research Co ltd
Original Assignee
Zhejiang Yishan Intelligent Medical Research Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Yishan Intelligent Medical Research Co ltd filed Critical Zhejiang Yishan Intelligent Medical Research Co ltd
Priority to CN202210792170.2A priority Critical patent/CN114861670A/en
Publication of CN114861670A publication Critical patent/CN114861670A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 - Named entity recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/355 - Class or cluster creation or modification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an entity identification method, device, and application for learning unknown labels based on known labels. Predefined class samples carrying known labels are processed by a first training model and a second training model, a binary classifier is obtained through training, and the classifier predicts new categories for undefined class samples carrying unknown labels. New categories of unknown labels are thus deduced and identified from the existing features of known labels, which reduces the workload of data labeling for the entity recognition task, reduces the data-volume requirement on the target database during transfer learning, and lowers the requirements on deployment-environment hardware and the database. The entity recognition model can aggregate entity samples of known labels over the vector space to derive entity samples of unknown labels for related tasks.

Description

Entity identification method, device and application for learning unknown label based on known label
Technical Field
The present application relates to the field of entity identification, and in particular, to an entity identification method, apparatus, and application for learning an unknown tag based on a known tag.
Background
Named Entity Recognition (NER) is an information extraction technique that recognizes predefined entity types (person name, organization, place name, portrait label, etc.) in text, and is a very important and fundamental problem in natural language processing. The recognition accuracy of an entity recognition model depends on the number of training samples; however, in special application scenarios a large quantity of training samples cannot be provided. For example, in health-portrait entity recognition applied in the medical field, training categories and sample data are often seriously insufficient. Moreover, conventional entity identification methods can only identify entities of predefined categories and cannot automatically discover potential new categories, so a trained entity recognition model still requires large-data-volume transfer learning when deployed on a new database, which increases the workload of data labeling.
The prior art CN111563165B provides a sentence classification method based on anchor-word positioning and training-sentence augmentation, which adds the sentences with the worst recognition rate to an augmentation set and replaces anchor words with near-synonyms to form a new augmented sentence set, thereby improving the per-cycle classification performance on entity labels with poor recognition results. CN113111180A provides a Chinese medical synonym clustering method based on a deep pre-trained neural network, which requires a manual synonym-classification aggregation operation and thus still incurs a large labor cost.
Disclosure of Invention
The embodiments of the application provide an entity identification method, device, and application for learning unknown labels based on known labels, in which new unknown labels can be deduced and identified from the existing features of known labels, reducing the labeling workload of the entity recognition task for new unknown labels and lowering the requirements on deployment-environment hardware and the database.
In a first aspect, an embodiment of the present application provides an entity identification method for learning an unknown tag based on a known tag, where the method includes: acquiring a first training model capable of identifying a category corresponding to a known label;
initializing the first training model to obtain a second training model;
labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
inputting the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; inputting the predefined class samples into the second training model to obtain a first original vector corresponding to each known label;
determining the aggregation result of two known labels based on the change in distance between their first training vectors and their first original vectors, training a binary classifier based on the aggregation result, and calculating the confidence of known labels of the same class to obtain a trained classification model;
inputting the undefined class sample into a first training model to obtain a second training vector corresponding to each unknown label; inputting the undefined class sample into a second training model to obtain a second original vector of each corresponding unknown label;
and determining the aggregation result of two unknown labels based on the change in distance between their second training vectors and their second original vectors, inputting the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence, and generating a new category if the prediction confidence is greater than a set threshold.
In a second aspect, an embodiment of the present application provides an entity identification apparatus for learning an unknown tag based on a known tag, including: the first training model acquisition unit is used for acquiring a first training model capable of identifying the category corresponding to the known label;
the second training model obtaining unit is used for initializing the first training model to obtain a second training model;
the labeling unit is used for labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
a known label training unit, configured to input the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; input the predefined class samples into the second training model to obtain a first original vector corresponding to each known label; determine the aggregation result of two known labels based on the change in distance between their first training vectors and their first original vectors; and train a binary classifier based on the aggregation result and calculate the confidence of known labels of the same class to obtain a trained classification model;
an unknown label prediction unit, configured to input the undefined class samples into the first training model to obtain a second training vector corresponding to each unknown label; input the undefined class samples into the second training model to obtain a second original vector corresponding to each unknown label; determine the aggregation result of two unknown labels based on the change in distance between their second training vectors and their second original vectors; input the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence; and generate a new category if the prediction confidence is greater than a set threshold.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the entity identification method for learning an unknown tag based on a known tag.
In a fourth aspect, the present application provides a readable storage medium storing a computer program, the computer program comprising program code for controlling a process to execute the entity identification method for learning an unknown tag based on a known tag.
The main contributions and innovation points of the invention are as follows:
the entity identification model provided by the embodiment of the application can deduce and identify the type of a new unknown label through the existing characteristics of the existing label, so that the workload of data labeling on an entity identification task is reduced, the requirement on the data volume of a target database can be reduced during transfer learning, and the requirements on deployment environment hardware and the database are reduced. The entity recognition model may have entity samples of known tags aggregated over vector space to derive entity samples of unknown tags for related tasks.
This patent can be used to explore potential new categories in entity recognition; for example, potential new health portrait labels can be mined through the existing health portrait labels in the medical field.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of the prototype network predicting label categories;
FIG. 2 is recognition logic of a first training model;
FIG. 3 is a schematic diagram of training and predicting a classifier using predefined class samples and undefined class samples;
FIG. 4 is the identification logic of the joint classification model;
FIG. 5 is a schematic diagram showing the comparison between the joint classification model and the K-means algorithm in the present embodiment;
FIG. 6 is a graph of the number of unknown tags versus identification accuracy;
fig. 7 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the methods may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Example one
Before introducing the present solution, a prototype network for entity recognition is briefly described:
as shown in fig. 1, the prototype network creates a prototype for each class to represent, for example, three prototypes c are formed in the prototype network corresponding to three classes. When the input vectors need to be classified, the center of the input vector x in the vector space mapping mean value of the prototype network is calculated to be the prototype c, and the method can reduce overfitting in the learning of few samples and can also be used for finding new classification categories in the learning of no samples.
As shown in fig. 1 (a), the input vector x is located at the center of the mean value of the vector space mapping of the prototype network at the position of the white dot to form a prototype, and the prototypes c1, c2 and c3 all correspond to a plurality of other different input vectors, and the prototype regions where the input vectors are located are used to determine the corresponding categories of the prototypes, so that overfitting in the low-sample learning can be reduced; in fig. 1 (b), the input vector x is located at the center of the mean value of the vector space mapping of the prototype network at the position of the white dot to form a prototype, and the prototypes c1, c2, and c3 all correspond to other individual input vectors, so that new classes can be found in the no-sample learning.
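The prototype logic above can be sketched as follows (toy 2-D vectors stand in for the network's embeddings; all names and values are illustrative, not the patent's implementation):

```python
import numpy as np

def build_prototypes(embeddings, labels):
    """Each class prototype is the mean of that class's embedded samples."""
    return {c: np.mean([e for e, l in zip(embeddings, labels) if l == c], axis=0)
            for c in set(labels)}

def classify(x, prototypes):
    """Assign x to the class whose prototype is nearest in vector space."""
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

# Toy embeddings: two "organ" samples near the origin, two "disease" samples far away.
embeddings = [np.array([0.0, 0.0]), np.array([0.2, 0.0]),
              np.array([5.0, 5.0]), np.array([5.2, 5.0])]
labels = ["organ", "organ", "disease", "disease"]
prototypes = build_prototypes(embeddings, labels)
print(classify(np.array([0.1, 0.1]), prototypes))  # nearest prototype is "organ"
```

An unseen vector is simply routed to whichever prototype region it falls into, which is what lets the same machinery later host newly discovered classes.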
Based on the classification logic of the prototype network, this scheme improves the entity recognition logic to realize learning of unknown labels from known labels, and provides an entity recognition method for learning unknown labels based on known labels, comprising the following steps: acquiring a first training model capable of identifying the categories corresponding to known labels;
initializing the first training model to obtain a second training model;
labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
inputting the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; inputting the predefined class samples into the second training model to obtain a first original vector corresponding to each known label;
determining the aggregation result of two known labels based on the change in distance between their first training vectors and their first original vectors, training a binary classifier based on the aggregation result, and calculating the confidence of known labels of the same class to obtain a trained classification model;
inputting the undefined class sample into a first training model to obtain a second training vector corresponding to each unknown label; inputting the undefined class sample into a second training model to obtain a second original vector of each corresponding unknown label;
and determining the aggregation result of two unknown labels based on the change in distance between their second training vectors and their second original vectors, inputting the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence, and generating a new category if the prediction confidence is greater than a set threshold.
In some embodiments, after the new category is obtained, the method further comprises the steps of:
if the unknown label generates a new class, the unknown label and the new class are used for re-marking the predefined class sample to obtain a defined class sample, and the defined class sample is used for training a second training model to obtain a combined classification model.
In the step of obtaining the first training model capable of identifying the class corresponding to the known label, the first training model completes pre-training through the training data marked with the existing label in the existing database, and can identify the class of the existing label.
In a typical application scenario of the present solution, the first training model is pre-trained on a public medical database and can identify categories of traditional medical entities, including but not limited to: diseases, drugs, symptoms, tissues and organs, surgical operations, examination items, departments, and the like.
The first training model may be a BERT prototype network whose structure is shown in fig. 1. The trained first training model yields a first mapping function, and the process of training the first training model to obtain the first mapping function is as follows:
A phrase carrying a known label is word-segmented to obtain an input sample x. The first mapping function A embeds the input sample into the vector space, h = A(x); the prototype of category y is p_y, the prototype of a known class c is p_c, and d denotes the distance between the embedded sample and a prototype (the squared Euclidean distance is the usual choice for prototypical networks). The first training model is trained with the loss function L below to obtain the trained first mapping function:

h = A(x)

d(h, p_c) = ||h - p_c||^2

L = -log( exp(-d(h, p_y)) / Σ_c exp(-d(h, p_c)) )
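A numpy sketch of this prototypical training loss, a negative log-softmax over negative distances to each prototype (the squared Euclidean distance is an assumption; this is an illustration, not the patent's verified implementation):

```python
import numpy as np

def proto_loss(h, prototypes, y):
    """L = -log( exp(-d(h, p_y)) / sum_c exp(-d(h, p_c)) ),
    with d the squared Euclidean distance to each class prototype."""
    classes = sorted(prototypes)
    d = np.array([np.sum((h - prototypes[c]) ** 2) for c in classes])
    log_probs = -d - np.log(np.sum(np.exp(-d)))  # log-softmax over negative distances
    return -log_probs[classes.index(y)]

prototypes = {"organ": np.array([0.0, 0.0]), "disease": np.array([5.0, 5.0])}
h = np.array([0.1, 0.0])  # an embedded sample close to the "organ" prototype
print(proto_loss(h, prototypes, "organ"))   # small loss: correct class
print(proto_loss(h, prototypes, "disease")) # large loss: wrong class
```

Minimizing this loss pulls each embedded sample toward its own class prototype and away from the others, which produces the per-category aggregation in vector space that the later steps rely on.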
as shown in fig. 2, the first training model that can identify the class to which the known tag corresponds can identify terms under the corresponding class. For example, after a short sentence of irregular thickening of the whole stomach wall is considered that the leather stomach is possibly input into the first training model, the category of the stomach wall can be identified as an organ, the category of the leather stomach is a disease, and samples in different categories can be aggregated in a vector space.
In the step of initializing the first training model to obtain a second training model, the second training model is the untrained first training model, and correspondingly, the second mapping function of the second training model is the original mapping function.
In the embodiment of this scheme, the first training model and the second training model adopt the same original network, both chosen to be BERT networks.
In the step of labeling each training datum to obtain predefined class samples labeled with known labels and undefined class samples labeled with unknown labels, the known labels of the predefined class samples are the same as the labels the first training model can identify, so the first training model can recognize the known labels and obtain their known categories.
In a common entity labeling method such as the BIOES labeling method, the known label is "entity" and the unknown label is "non-entity"; recognizing the unknown label has the advantage of improving the prediction accuracy of the known label. Illustratively, in the sentence "irregular thickening of the whole stomach wall, leather stomach is considered possible", "stomach wall" and "leather stomach" are entities and "irregular thickening" is a non-entity, and recognizing the non-entity content can improve the recognition accuracy of "leather stomach".
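A small sketch of BIOES tagging as used here (B = begin, I = inside, O = outside, E = end, S = single; the tokens and spans are hypothetical English stand-ins for the Chinese examples):

```python
def bioes_tags(tokens, entity_spans):
    """Tag each token: entities get B/I/E (or S for a single-token entity);
    everything else, the non-entity "unknown label" content, gets O."""
    tags = ["O"] * len(tokens)
    for start, end in entity_spans:  # inclusive token spans
        if start == end:
            tags[start] = "S"
        else:
            tags[start] = "B"
            for k in range(start + 1, end):
                tags[k] = "I"
            tags[end] = "E"
    return tags

# "stomach wall irregular thickening": "stomach wall" is the entity span.
tokens = ["stomach", "wall", "irregular", "thickening"]
print(bioes_tags(tokens, [(0, 1)]))  # -> ['B', 'E', 'O', 'O']
```

The O tokens are exactly the undefined-class material from which the scheme later mines new categories.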
It should be noted that the present scheme is to label the same training data to obtain predefined class samples and undefined class samples, so as to predict and learn unknown labels through known labels.
In the steps of inputting the predefined class samples into the first training model to obtain a first training vector for each known label, and inputting the predefined class samples into the second training model to obtain a first original vector for each known label, at least two known labels of the same class must be input in order to make an aggregation judgment. One predefined class sample may contain several known labels of the same class, or several predefined class samples may be used; in the latter case, the known labels of all the predefined class samples are collected in the same vector space.
Since the first training model can identify the classes of known labels, known labels belonging to the same class can be clustered by class, and the mapping results of the first mapping function in the vector space aggregate around the prototype of the class. The second training model is in its initial state, so the mapping results of the second mapping function in the vector space are scattered; the aggregation result can therefore be judged from the mapping results of the two models.
In the step of determining the aggregation result of the known labels based on the distance change between the first training vectors and the first original vectors of two known labels, the distance between the two first training vectors is calculated as a first distance and the distance between the two first original vectors as a second distance. The two distances are then compared: if the trained vectors have moved closer together, that is, the second distance exceeds the first distance by more than an aggregation threshold, the two corresponding known labels are judged to be aggregated.
Illustratively, as shown in fig. 3, the predefined class sample S1 is "irregular thickening of the whole stomach wall, leather stomach is considered possible", where "stomach wall" is known label 1 and "leather stomach" is known label 2; the predefined class sample S2 is "multiple low-density nodules in the liver", where "liver" is known label 3 and "cyst" is known label 4. The first distance between the first training vector of known label 3 and the first training vector of known label 1 is calculated, the second distance between the first original vector of known label 3 and the first original vector of known label 1 is calculated, and from the first distance and the second distance it can be determined that known label 1 and known label 3 are aggregated.
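The aggregation judgment can be sketched as follows, under the reading that aggregation means the trained vectors have moved closer together than the original vectors by more than the threshold (the threshold value and vectors are illustrative):

```python
import numpy as np

def is_aggregated(train_1, train_2, orig_1, orig_2, agg_threshold=0.5):
    """first distance: between the two training vectors (trained model);
    second distance: between the two original vectors (untrained model).
    Aggregated if training shrank the distance by more than the threshold."""
    first = np.linalg.norm(train_1 - train_2)
    second = np.linalg.norm(orig_1 - orig_2)
    return bool((second - first) > agg_threshold)

# Training pulled the two labels together: originally far apart, now adjacent.
print(is_aggregated(np.array([0.0, 0.0]), np.array([0.1, 0.0]),
                    np.array([0.0, 0.0]), np.array([3.0, 0.0])))  # -> True
```

Only label pairs that pass this check are handed to the confidence computation, since confidence over non-aggregated pairs has no reference value.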
If the two known tags are not aggregated, the confidence of the known tags has no referential meaning, so the scheme needs to judge whether the two known tags are aggregated or not at first.
In the step of training the binary classifier based on the aggregation result and calculating the confidence of known labels of the same class to obtain the trained classification model, if the aggregation result shows that aggregation has occurred, the known labels of the same class are combined pairwise to calculate confidences.
The formula for calculating the confidence is as follows, where b_ij is the confidence that two known labels belong to the same category, h_i and h_j are the first original vectors of the two known labels, h'_i and h'_j are their first training vectors, W is a weight, and b is a bias; the confidence is a sigmoid over a linear combination of the four vectors:

b_ij = σ( W [h_i; h_j; h'_i; h'_j] + b )

The loss function of the binary classifier is a binary cross-entropy over label pairs, where N is the number of known labels and y_ij is 1 when known labels i and j belong to the same category:

L = -(1/N²) Σ_{i,j} [ y_ij log b_ij + (1 - y_ij) log(1 - b_ij) ]
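A sketch of such a pair binary classifier with a binary cross-entropy loss. The sigmoid-over-linear form and the concatenation of the four vectors are assumptions (the exact formula is not recoverable from the source); weights and vectors below are illustrative:

```python
import numpy as np

def pair_confidence(h_i, h_j, t_i, t_j, W, b):
    """b_ij = sigmoid(W . [h_i; h_j; t_i; t_j] + b): confidence that two
    labels (original vectors h, training vectors t) share a category."""
    z = np.concatenate([h_i, h_j, t_i, t_j])
    return 1.0 / (1.0 + np.exp(-(W @ z + b)))

def pair_bce_loss(b_ij, y_ij):
    """Binary cross-entropy on one pair; y_ij is 1 iff the labels share a category."""
    b_ij = np.clip(b_ij, 1e-7, 1.0 - 1e-7)
    return -(y_ij * np.log(b_ij) + (1 - y_ij) * np.log(1.0 - b_ij))

rng = np.random.default_rng(0)
W, b = rng.normal(size=8), 0.0       # four concatenated 2-D vectors -> 8 weights
h_i, h_j = np.zeros(2), np.zeros(2)  # first original vectors
t_i, t_j = np.ones(2), np.ones(2)    # first training vectors
conf = pair_confidence(h_i, h_j, t_i, t_j, W, b)
print(conf, pair_bce_loss(conf, 1))
```

Training W and b against the known same-class/different-class pairs is what later allows the same classifier to score pairs of unknown labels.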
Illustratively, as shown in fig. 3, the predefined class sample S1 is "irregular thickening of the whole stomach wall, leather stomach is considered possible", where "stomach wall" is known label 1 and "leather stomach" is known label 2; the predefined class sample S2 is "multiple low-density nodules in the liver", where "liver" is known label 3 and "cyst" is known label 4; the predefined class sample S3 is "the patient's urinary system is currently normal, with a history of diabetes", where "urinary system" is known label 5 and "diabetes" is known label 6. After the binary classification is computed, the confidence between known label 1 and known label 2 is 0, between known label 1 and known label 5 is 1, between known label 2 and known label 3 is 0, between known label 2 and known label 6 is 1, between known label 3 and known label 4 is 0, between known label 3 and known label 5 is 1, between known label 4 and known label 5 is 0, between known label 4 and known label 6 is 1, and between known label 5 and known label 6 is 0.
The binary classifier trained in this way carries over to the aggregation of undefined class samples in the vector space: undefined classes whose features are close to the known labels of the predefined class samples also show a certain aggregation trend in the vector space, so this scheme can predict the categories of unknown labels with the binary classifier obtained by the above training.
In the step of determining the aggregation result of the unknown labels based on the distance change between two second training vectors and the corresponding second original vectors, the distance between the two second training vectors is calculated as a third distance and the distance between the two second original vectors as a fourth distance. The two distances are then compared: if the trained vectors have moved closer together, that is, the fourth distance exceeds the third distance by more than the aggregation threshold, the two corresponding unknown labels are judged to be aggregated.
Illustratively, as shown in fig. 3, the undefined class sample S1 is "irregular thickening of the whole stomach wall, leather stomach is considered possible", where "irregular" is unknown label 7 and "thickened" is unknown label 8; the undefined class sample S2 is "multiple low-density nodules in the liver", where "low density" is unknown label 9 and "nodules" is unknown label 10. The third distance between the second training vector of unknown label 7 and that of unknown label 9 is calculated, the fourth distance between the second original vector of unknown label 7 and that of unknown label 9 is calculated, and from the third distance and the fourth distance it can be determined that unknown label 7 and unknown label 9 are aggregated.
In the step of inputting two unknown labels into the binary classifier to output a prediction confidence based on the aggregation result, if the aggregation result shows that the unknown labels are aggregated, any two unknown labels are selected and input into the binary classifier to output a prediction confidence. If the prediction confidence is greater than the set threshold, the two unknown labels are considered to belong to the same category; if the prediction confidence is smaller than the set threshold, the two unknown labels are not considered to belong to the same category.
Illustratively, two unknown labels x_u and x_v are input, the above binary classifier predicts a confidence b_uv, and a threshold γ is set;
when b_uv > γ, x_u and x_v belong to the same category, which is a newly discovered undefined category O; for example, "irregular" and "low density" form O_1, and "thickened" and "nodules" form O_2;
when b_uv < γ, x_u and x_v are not grouped into a category and no new category is created from them, as with "irregular" and "thickened".
Illustratively, still as shown in fig. 3, the undefined class sample S1 is "irregular thickening of the whole stomach wall, leather stomach is considered possible", where "irregular" is unknown label 7 and "thickened" is unknown label 8; the undefined class sample S2 is "multiple low-density nodules in the liver", where "low density" is unknown label 9 and "nodules" is unknown label 10; the undefined class sample S3 is "the patient's urinary system is currently normal, with a history of diabetes", where "normal" is unknown label 11. Calculating the confidence between the unknown labels with the binary classifier gives: the confidence between unknown labels 7 and 9 is 0.7, between 8 and 10 is 0.8, between 7 and 11 is 0.9, between 9 and 11 is 0.8, between 7 and 8 is 0.1, between 8 and 9 is 0.2, between 9 and 10 is 0.3, and between 8 and 11 is 0.1. With the threshold set to 0.5, unknown labels 7, 9 and 11 are clustered and can define a new category O1; unknown labels 8 and 10 are clustered and can define a new category O2.
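The grouping of unknown labels by pairwise prediction confidence can be sketched as a union-find over the pairs exceeding the threshold, reproducing the worked example above (a sketch, not the patent's implementation):

```python
def discover_categories(pair_confidences, gamma=0.5):
    """Union all unknown-label pairs whose confidence exceeds gamma,
    then read off the resulting groups as new categories."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for (u, v), b_uv in pair_confidences.items():
        if b_uv > gamma:
            union(u, v)
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted((g for g in groups.values() if len(g) > 1), key=min)

# Confidences from the example: labels 7, 9 and 11 cluster (O1); 8 and 10 cluster (O2).
confs = {(7, 9): 0.7, (8, 10): 0.8, (7, 11): 0.9, (9, 11): 0.8,
         (7, 8): 0.1, (8, 9): 0.2, (9, 10): 0.3, (8, 11): 0.1}
print(discover_categories(confs))
```

Note the grouping is transitive: labels 9 and 11 also score 0.8 here, but even without that pair, 7 would link them through its own high-confidence edges.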
According to the scheme, the number and the identification accuracy of the unknown tags can be adjusted by adjusting the set threshold, and the corresponding relation between the number and the identification accuracy of the unknown tags is shown in fig. 6.
After the new class is obtained, a second training model for the unknown label can be further trained and learned based on the new class.
Specifically, in the step of training the second training model with the defined class samples to obtain the joint classification model, the prototype set of the joint classification model includes the prototypes P_c of the first training model and the prototypes P_O of the second training model; the prototypes of the first training model correspond to the categories of the known labels and the prototypes of the second training model to the categories of the unknown labels, thereby expanding the set of categories.
As shown in the following equation:

P = P<sub>c</sub> ∪ P<sub>O</sub>

where P is the prototype of the joint classification model, P<sub>c</sub> is the prototype of the first training model, and P<sub>O</sub> is the prototype of the second training model.
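The prototype expansion described above can be illustrated with a minimal sketch (the prototype values, the embedding dimension, and the Euclidean nearest-prototype rule are invented assumptions, not taken from the patent): the joint model keeps the known-class prototypes and appends the new-class prototypes, and a token embedding is assigned to its nearest prototype.

```python
import numpy as np

# Hypothetical sketch of the prototype expansion: the joint model's
# prototypes P are the known-class prototypes P_c together with the
# new-class prototypes P_O, and a token embedding is assigned to the
# nearest prototype. All values here are invented for illustration.
P_c = np.array([[1.0, 0.0],    # "organ"
                [0.0, 1.0]])   # "disease"
P_O = np.array([[0.7, 0.7],    # new class O1
                [-1.0, 0.5]])  # new class O2
class_names = ["organ", "disease", "O1", "O2"]

P = np.concatenate([P_c, P_O], axis=0)  # expanded prototype set

def classify(embedding):
    # Nearest-prototype assignment by Euclidean distance.
    distances = np.linalg.norm(P - embedding, axis=1)
    return class_names[int(np.argmin(distances))]

print(classify(np.array([0.9, 0.1])))  # → organ
```

An embedding near a new-class prototype is assigned to that new class, which is how the expanded model recognizes entities the first training model had no label for.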
As shown in fig. 4, when "multiple low-density nodules in the liver; cysts are likely" is input to the joint classification model, "liver" is classified as "organ", "cyst" is classified as "disease", "low density" is classified as O1, and "nodule" is classified as O2.
The label clustering results of the joint classification model were compared with the K-means algorithm under the same conditions; the comparison shows that the label clustering of the joint classification model of this scheme performs better, as shown in fig. 5.
Example two
Based on the same concept, the present application further provides an entity identification apparatus for learning unknown labels based on known labels, comprising:
the first training model acquisition unit is used for acquiring a first training model capable of identifying the category corresponding to the known label;
the second training model obtaining unit is used for initializing the first training model to obtain a second training model;
the labeling unit is used for labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
a known label training unit, configured to input the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; input the predefined class samples into the second training model to obtain a first original vector corresponding to each known label; judge the aggregation result of the known labels based on the change in distance between the first training vectors and the first original vectors of two known labels; train a binary classifier based on the aggregation result; and calculate the confidence between known labels of the same class to obtain a trained classification model;
an unknown label prediction unit, configured to input the undefined class samples into the first training model to obtain a second training vector corresponding to each unknown label; input the undefined class samples into the second training model to obtain a second original vector corresponding to each unknown label; judge the aggregation result of the unknown labels based on the change in distance between the second training vectors and the second original vectors of two unknown labels; input the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence; and generate a new class if the prediction confidence is greater than a set threshold.
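The binary-classifier confidence computation used by these units might be sketched as follows (a hypothetical illustration: the use of logistic regression, the pair-vector layout, the dimensions, and all training data are assumptions, not taken from the patent). The classifier is trained on pairs of label vectors, with positive examples being pairs the aggregation test judged to belong to the same class; its predicted probability serves as the confidence.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch of the binary-classifier confidence step: the
# classifier is trained on concatenated pairs of label vectors, with
# label 1 for pairs the aggregation test judged to be the same class.
# All vectors, dimensions, and the choice of model are assumptions.
rng = np.random.default_rng(0)
same_pairs = rng.normal(0.0, 0.1, size=(20, 8))   # aggregated pairs
diff_pairs = rng.normal(1.0, 0.1, size=(20, 8))   # non-aggregated pairs
X = np.vstack([same_pairs, diff_pairs])
y = np.array([1] * 20 + [0] * 20)

clf = LogisticRegression().fit(X, y)

# Prediction confidence that a new pair of label vectors shares a class:
new_pair = np.zeros((1, 8))  # lies near the "same class" cluster
confidence = clf.predict_proba(new_pair)[0, 1]
print(confidence > 0.5)  # → True
```

The output of `predict_proba` plays the role of the confidence that is then compared against the set threshold to decide whether a new class should be generated.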
Example three
The present embodiment further provides an electronic device, referring to fig. 7, comprising a memory 404 and a processor 402, wherein the memory 404 stores a computer program, and the processor 402 is configured to execute the computer program to perform the steps in any of the embodiments of the entity identification method for learning an unknown tag based on a known tag.
Specifically, the processor 402 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 404 may include, among other things, mass storage for data or instructions. By way of example and not limitation, memory 404 may include a hard disk drive (HDD), a floppy disk drive, a solid-state drive (SSD), flash memory, an optical disk, a magneto-optical disk, tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 404 may include removable or non-removable (or fixed) media, where appropriate. Memory 404 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, memory 404 is non-volatile memory. In particular embodiments, memory 404 includes read-only memory (ROM) and random-access memory (RAM). The ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPM DRAM), extended data out DRAM (EDO DRAM), synchronous DRAM (SDRAM), or the like.
Memory 404 may be used to store or cache various data files for processing and/or communication use, as well as possibly computer program instructions for execution by processor 402.
The processor 402 may implement any of the above embodiments of entity identification methods for learning unknown tags based on known tags by reading and executing computer program instructions stored in the memory 404.
Optionally, the electronic apparatus may further include a transmission device 406 and an input/output device 408, where the transmission device 406 is connected to the processor 402, and the input/output device 408 is connected to the processor 402.
The transmission device 406 may be used to receive or transmit data via a network. Specific examples of the network may include wired or wireless networks provided by the communication provider of the electronic device. In one example, the transmission device includes a Network Interface Controller (NIC) that can connect to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 406 may be a Radio Frequency (RF) module used to communicate with the internet wirelessly.
The input/output device 408 is used to input or output information. In this embodiment, the input information may be predefined class samples with known labels, and the output information may be the classification results of the unknown labels of the undefined class samples.
Optionally, in this embodiment, the processor 402 may be configured to execute the following steps by a computer program:
acquiring a first training model capable of identifying a category corresponding to a known label;
initializing the first training model to obtain a second training model;
labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
inputting the predefined class samples into the first training model to obtain a first training vector corresponding to each known label, and inputting the predefined class samples into the second training model to obtain a first original vector corresponding to each known label;
judging the aggregation result of the known labels based on the change in distance between the first training vectors and the first original vectors of two known labels, training a binary classifier based on the aggregation result, and calculating the confidence between known labels of the same class to obtain a trained classification model;
inputting the undefined class samples into the first training model to obtain a second training vector corresponding to each unknown label, and inputting the undefined class samples into the second training model to obtain a second original vector corresponding to each unknown label;
and judging the aggregation result of the unknown labels based on the change in distance between the second training vectors and the second original vectors of two unknown labels, inputting the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence, and generating a new class if the prediction confidence is greater than a set threshold.
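The distance-change aggregation test in the steps above can be sketched as follows (a minimal illustration; the vectors, the Euclidean metric, and the sign convention of the difference are assumptions, not taken from the patent): two labels are judged to have aggregated when training pulls their vectors closer together than they are under the initialized copy of the model, by more than a set threshold.

```python
import math

# Hypothetical sketch of the distance-change aggregation test described
# above: two labels are judged to have aggregated when the trained model
# pulls their vectors closer together than the initialized (original)
# model by more than a set threshold.
def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def has_aggregated(train_i, train_j, orig_i, orig_j, agg_threshold=0.5):
    trained_distance = euclidean(train_i, train_j)   # after training
    original_distance = euclidean(orig_i, orig_j)    # initialized copy
    return (original_distance - trained_distance) > agg_threshold

# The trained model places the two label vectors much closer together:
print(has_aggregated([0.1, 0.2], [0.15, 0.25],
                     [0.9, 0.1], [0.0, 0.8]))  # → True
```

Only label pairs that pass this test are fed to the binary classifier, whose prediction confidence is then compared against the set threshold to decide whether a new class is generated.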
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiment and optional implementation manners, and details of this embodiment are not described herein again.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets and/or macros can be stored in any device-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may comprise one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard it should be noted that any block of the logic flow as in the figures may represent a program step, or an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard or floppy disks, and optical media such as, for example, DVDs and data variants thereof, CDs. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above examples merely illustrate several embodiments of the present application, and their description is relatively specific and detailed, but this should not be construed as limiting the scope of the application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. An entity identification method for learning an unknown label based on a known label is characterized by comprising the following steps: acquiring a first training model capable of identifying a category corresponding to a known label;
initializing the first training model to obtain a second training model;
labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
inputting the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; inputting the predefined class samples into the second training model to obtain a first original vector corresponding to each known label;
judging the aggregation result of the known labels based on the change in distance between the first training vectors and the first original vectors of two known labels, training a binary classifier based on the aggregation result, and calculating the confidence between known labels of the same class to obtain a trained classification model;
inputting the undefined class sample into a first training model to obtain a second training vector corresponding to each unknown label; inputting the undefined class sample into a second training model to obtain a second original vector of each corresponding unknown label;
and judging the aggregation result of the unknown labels based on the change in distance between the second training vectors and the second original vectors of two unknown labels, inputting the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence, and generating a new class if the prediction confidence is greater than a set threshold.
2. The method according to claim 1, wherein if the unknown labels generate a new class, the predefined class samples are labeled again with the unknown labels and the new class to obtain defined class samples, and the defined class samples are used to train the second training model to obtain a joint classification model.
3. The method as claimed in claim 1, wherein in the step of determining the aggregation result of the known tags based on the distance change between the first training vector and the first original vector of the two known tags, the distance between the two first training vectors is calculated as a first distance, the distance between the two first original vectors is calculated as a second distance, the difference between the first distance and the second distance is compared, and if the difference is greater than an aggregation setting threshold, it is determined that the two corresponding known tags are aggregated.
4. The entity identification method for learning unknown labels based on known labels according to claim 1, wherein in the step of training the binary classifier based on the aggregation result and calculating the confidence between known labels of the same class to obtain the trained classification model, if the aggregation result shows aggregation, the confidence is calculated for pairwise combinations of the known labels of the same class.
5. The method as claimed in claim 1, wherein in the step of determining the aggregation result of the unknown labels based on the change in distance between the second training vectors and the second original vectors, the distance between the two second training vectors is calculated as a third distance, the distance between the two second original vectors is calculated as a fourth distance, the difference between the third distance and the fourth distance is compared, and if the difference is greater than the aggregation setting threshold, it is determined that the two corresponding unknown labels are aggregated.
6. The entity identification method for learning unknown labels based on known labels according to claim 1, wherein in the step of inputting the two unknown labels into the binary classifier to output the prediction confidence based on the aggregation result, if the aggregation result shows that the unknown labels are aggregated, any two aggregated unknown labels are selected and input into the binary classifier to output the prediction confidence.
7. The method according to claim 1, wherein in the step of training the second training model with the defined class samples to obtain the joint classification model, the prototypes of the joint classification model comprise the prototype of the first training model and the prototype of the second training model, wherein the prototype of the first training model corresponds to the classes of the existing labels, and the prototype of the second training model corresponds to the classes of the unknown labels.
8. An entity recognition apparatus for learning an unknown tag based on a known tag, comprising:
the first training model acquisition unit is used for acquiring a first training model capable of identifying the category corresponding to the known label;
the second training model obtaining unit is used for initializing the first training model to obtain a second training model;
the labeling unit is used for labeling each training data to obtain a predefined class sample labeled with a known label and an undefined class sample labeled with an unknown label;
a known label training unit, configured to input the predefined class samples into the first training model to obtain a first training vector corresponding to each known label; input the predefined class samples into the second training model to obtain a first original vector corresponding to each known label; judge the aggregation result of the known labels based on the change in distance between the first training vectors and the first original vectors of two known labels; train a binary classifier based on the aggregation result; and calculate the confidence between known labels of the same class to obtain a trained classification model;
an unknown label prediction unit, configured to input the undefined class samples into the first training model to obtain a second training vector corresponding to each unknown label; input the undefined class samples into the second training model to obtain a second original vector corresponding to each unknown label; judge the aggregation result of the unknown labels based on the change in distance between the second training vectors and the second original vectors of two unknown labels; input the two unknown labels into the binary classifier based on the aggregation result to output a prediction confidence; and generate a new class if the prediction confidence is greater than a set threshold.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the entity identification method for learning unknown labels based on known labels according to any one of claims 1 to 7.
10. A readable storage medium having a computer program stored therein, the computer program comprising program code for controlling a process to execute the process, the process comprising the entity identification method for learning unknown labels based on known labels according to any one of claims 1 to 7.
CN202210792170.2A 2022-07-07 2022-07-07 Entity identification method, device and application for learning unknown label based on known label Pending CN114861670A (en)
