CN112507703B - Medical entity identification method, device, medium and electronic equipment - Google Patents

Info

Publication number
CN112507703B
Authority
CN
China
Prior art keywords
medical
entity
knowledge base
labeling
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011437728.2A
Other languages
Chinese (zh)
Other versions
CN112507703A (en
Inventor
艾杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yidu Cloud Beijing Technology Co Ltd
Original Assignee
Yidu Cloud Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yidu Cloud Beijing Technology Co Ltd filed Critical Yidu Cloud Beijing Technology Co Ltd
Priority to CN202011437728.2A priority Critical patent/CN112507703B/en
Publication of CN112507703A publication Critical patent/CN112507703A/en
Application granted granted Critical
Publication of CN112507703B publication Critical patent/CN112507703B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374Thesaurus

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The present disclosure provides a medical entity identification method, a medical entity identification apparatus, a computer readable medium and an electronic device, and relates to the technical field of medical data processing. The medical entity identification method comprises the following steps: performing entity labeling on medical text data through a medical knowledge base, and dividing the medical text data containing the labels into training samples and test samples; acquiring an entity recognition model through the training samples, and recognizing the test samples through the entity recognition model to acquire a recognition result of the test samples; and determining a medical entity to be updated by combining the recognition result of the test samples and the labels contained in the test samples, and updating the medical knowledge base by using the determined medical entity to be updated. The medical entity identification method in the disclosure can, to a certain extent, overcome the high labor cost caused by manual entity labeling, and thereby improves the efficiency of medical entity identification.

Description

Medical entity identification method, device, medium and electronic equipment
Technical Field
The present disclosure relates to the field of medical data processing technologies, and in particular, to a medical entity identification method, a medical entity identification apparatus, a computer readable medium, and an electronic device.
Background
Clinical test data is of great significance for the progress of medicine, but most of it is unstructured text, and extracting information such as disease progression and adverse reaction symptoms from a large amount of unstructured text requires extensive manual reading. With the development of computer technology, electronic information systems can be used for querying and storing clinical data, which reduces labor cost to a certain extent and improves data processing efficiency. However, identifying valuable information in the text still requires a large amount of reading time, so identification efficiency remains low.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure aims to provide a medical entity identification method, a medical entity identification device, a computer readable medium and an electronic device, which can overcome the problem of high labor cost for identifying medical texts to a certain extent, and further improve the identification efficiency of the medical texts.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a medical entity identification method, comprising:
performing entity labeling on medical text data through a medical knowledge base, and dividing the medical text data containing the labeling into a training sample and a testing sample;
acquiring an entity recognition model through the training sample, and recognizing the test sample through the entity recognition model to acquire a recognition result of the test sample;
and determining a medical entity to be updated by combining the recognition result of the test sample and the label contained in the test sample, and updating the medical knowledge base by using the determined medical entity to be updated.
In an exemplary embodiment of the present disclosure, the entity labeling of the medical text data by the medical knowledge base includes:
identifying a first medical entity in the medical text data through a medical dictionary and a regular expression;
labeling the first medical entity as a target label.
In an exemplary embodiment of the present disclosure, the determining the medical entity to be updated by combining the identification result of the test sample and the label contained in the test sample includes:
extracting a first medical entity corresponding to the target label from the test sample, and extracting a second medical entity in the identification result;
and comparing the first medical entity with the second medical entity to obtain a target entity which is not matched with the first medical entity in the second medical entity, and taking the target entity as the medical entity to be updated.
In an exemplary embodiment of the disclosure, the updating the medical knowledge base with the determined medical entity includes:
adding the medical entity to be updated to the medical dictionary.
In an exemplary embodiment of the present disclosure, after dividing the medical text data containing the labels into training samples and testing samples, the method further includes:
sampling and verifying the entity labels of the training samples to obtain verification results of the entity labels;
and adjusting the entity labels of the training samples according to the verification result.
In an exemplary embodiment of the present disclosure, the method further comprises:
screening a third medical entity labeled as a non-entity from the verification result;
updating the third medical entity into the medical knowledge base.
In an exemplary embodiment of the present disclosure, after the updating the medical knowledge base by using the determined medical entity, the method further includes:
performing entity identification by using the updated medical knowledge base to determine identification accuracy;
and if the accuracy does not meet the preset requirement, updating the updated medical knowledge base again.
According to a second aspect of the present disclosure, there is provided a medical entity identification apparatus comprising a sample labeling module, a sample identification module and an entity updating module, wherein:
The sample labeling module is used for performing entity labeling on the medical text data through the medical knowledge base and dividing the medical text data containing the labels into training samples and test samples.
The sample identification module is used for acquiring an entity recognition model through the training samples, recognizing the test samples through the entity recognition model and acquiring a recognition result of the test samples.
The entity updating module is used for determining a medical entity to be updated by combining the recognition result of the test samples and the labels contained in the test samples, and updating the medical knowledge base by using the determined medical entity to be updated.
In an exemplary embodiment of the present disclosure, the sample labeling module may include a dictionary labeling unit and a label determination unit, wherein:
The dictionary labeling unit is used for identifying the first medical entity in the medical text data through a medical dictionary and a regular expression.
The label determination unit is used for labeling the first medical entity as a target label.
In an exemplary embodiment of the present disclosure, the entity updating module may include a data extraction unit and an entity comparison unit, wherein:
The data extraction unit is used for extracting a first medical entity corresponding to the target label from the test sample and extracting a second medical entity in the recognition result.
The entity comparison unit is used for comparing the first medical entity with the second medical entity to obtain a target entity in the second medical entity that does not match the first medical entity, and taking the target entity as the medical entity to be updated.
In an exemplary embodiment of the disclosure, the entity updating module may be specifically configured to add the medical entity to be updated to the medical dictionary.
In an exemplary embodiment of the present disclosure, the medical entity identification apparatus further comprises a sampling verification module and a label adjustment module, wherein:
The sampling verification module is used for sampling and verifying the entity labels of the training samples to obtain a verification result of the entity labels.
The label adjustment module is used for adjusting the entity labels of the training samples according to the verification result.
In an exemplary embodiment of the present disclosure, the apparatus further includes a label verification module and a verification update module, wherein:
The label verification module is used for screening out a third medical entity labeled as a non-entity from the verification result.
The verification update module is used for updating the third medical entity into the medical knowledge base.
In an exemplary embodiment of the present disclosure, the apparatus further comprises an entity identification module and a medical knowledge base update module, wherein:
The entity identification module is used for performing entity identification by using the updated medical knowledge base and determining the identification accuracy.
The medical knowledge base update module is used for updating the updated medical knowledge base again if the accuracy does not meet the preset requirement.
According to a third aspect of the present disclosure, there is provided an electronic apparatus comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
Exemplary embodiments of the present disclosure may have some or all of the following advantages:
In the medical entity identification method provided by an exemplary embodiment of the disclosure, on one hand, entity labeling is performed on medical text data through a medical knowledge base, and an entity recognition model is then trained on the labeled data. This avoids the need for manual labeling during model training, which reduces labor cost and improves efficiency; it can also reduce manual labeling errors, which helps to improve the accuracy of the model. On the other hand, the entities in the medical text data can be verified through the trained entity recognition model, so that the medical knowledge base is updated by combining the result of model recognition, which can improve the correctness of entity recognition. In addition, a closed loop can be formed between entity labeling and entity recognition, so that the medical knowledge base is updated automatically, labor cost is reduced, and the efficiency of medical entity identification can be improved by using the updated medical knowledge base.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 schematically shows a flow chart of a medical entity identification method according to an embodiment of the present disclosure;
Fig. 2 schematically shows a flow chart of a medical entity identification method according to another embodiment of the present disclosure;
Fig. 3 schematically shows a flow chart of a medical entity identification method according to yet another embodiment of the present disclosure;
Fig. 4 schematically shows a flow chart of a medical entity identification method according to yet another embodiment of the present disclosure;
Fig. 5 schematically shows a block diagram of a medical entity identification apparatus according to an embodiment of the present disclosure;
Fig. 6 schematically shows a system architecture diagram for implementing a medical entity identification method according to one embodiment of the present disclosure;
Fig. 7 shows a schematic structural diagram of a computer system suitable for use with the electronic device to implement embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The technical solution of the embodiments of the present disclosure is explained in detail as follows:
Based on one or more of the problems described above, the present example embodiment provides a medical entity identification method. Referring to Fig. 1, the medical entity identification method may include the following steps:
Step S110: performing entity labeling on the medical text data through a medical knowledge base, and dividing the medical text data containing the labels into training samples and test samples.
Step S120: acquiring an entity recognition model through the training samples, and recognizing the test samples through the entity recognition model to acquire a recognition result of the test samples.
Step S130: determining a medical entity to be updated by combining the recognition result of the test samples and the labels contained in the test samples, and updating the medical knowledge base by using the determined medical entity to be updated.
In the medical entity identification method provided by the exemplary embodiment of the disclosure, on one hand, entity labeling is performed on medical text data through a medical knowledge base, and an entity recognition model is then trained on the labeled data. This avoids the need for manual labeling during model training, which reduces labor cost and improves efficiency; it can also reduce manual labeling errors, which helps to improve the accuracy of the model. On the other hand, the entities in the medical text data can be verified through the trained entity recognition model, so that the medical entity is determined by combining the result of model recognition with the entity labels produced by the medical knowledge base, which can improve the correctness of entity recognition. In addition, a closed loop can be formed between entity labeling and entity recognition, so that the medical knowledge base is updated automatically, labor cost is reduced, and the efficiency of medical entity identification is improved.
The above steps of the present exemplary embodiment will be described in more detail below.
In step S110, entity labeling is performed on the medical text data through the medical knowledge base, and the medical text data containing the label is divided into a training sample and a testing sample.
The medical knowledge base refers to a set of medical professional knowledge and rules, and specifically comprises a medical dictionary and medical rules. The medical dictionary may include medical nouns, such as disease names and symptom names. The medical rules refer to rules summarized in medicine, such as rules governing the occurrence of diseases and symptoms and rules of medication. In addition, the medical knowledge base may also include disease knowledge, drug information, case literature and the like, which is not limited in this embodiment.
The medical text data may include text data recorded during the study and treatment of diseases in patient cases, medical literature, and the like. A large amount of medical text data can be queried through a medical electronic system, and the medical knowledge base is then matched against the medical text data to determine the medical entities contained in it, so that the determined medical entities can be labeled. A medical entity refers to an object in the text that represents a medical concept, such as a disease name, a symptom name, an examination name or a drug name.
In an exemplary embodiment, a first medical entity contained in the medical text data can be identified through the medical dictionary and regular expressions, and the first medical entity is then labeled with a target label. The medical nouns contained in the medical dictionary are medical entities, so matching determines whether a word in the medical text data is contained in the dictionary: if a word in the medical text data matches a word in the dictionary, that word is determined to be a first medical entity in the medical text data and is labeled with the target label. The target label can be set according to actual conditions, for example as a specific character or number. For example, when entity labeling is performed on the medical text data "12 times of radiotherapy, nausea and vomiting occur by 1 degree, vomitus is mostly white mucus, and no obvious bright red or dark red-like change exists.", the first medical entity "nausea and vomiting" can be labeled "E". Non-medical entities in the medical text data may also be labeled; that is, the first medical entity may be labeled with the target label and non-medical entities with another label. For example, with first medical entities labeled "E" and non-medical entities labeled "N", entity labeling of the medical text data "12 radiation treatments with nausea and vomiting, mostly white mucus with no noticeable bright red or dark red-like changes." yields "NNNNNNNNNNEEEENNNNNNNNNNNNNNNNNNNNNNN", where each label corresponds to one character of the medical text data as originally written and "EEEE" represents the first medical entity.
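The dictionary-and-regular-expression labeling described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `label_text`, the sample dictionary and the sample pattern are all hypothetical, and labels are assigned per character using the "E"/"N" scheme from the example.

```python
import re

def label_text(text, dictionary, patterns=()):
    """Return one label per character: 'E' inside a matched entity, 'N' elsewhere.

    `dictionary` is an iterable of entity strings and `patterns` an iterable of
    regular expressions; both are illustrative stand-ins for the medical
    dictionary and the rules of the medical knowledge base.
    """
    labels = ["N"] * len(text)
    # Dictionary matching: mark every occurrence of every dictionary term.
    for term in dictionary:
        if not term:
            continue
        for m in re.finditer(re.escape(term), text):
            labels[m.start():m.end()] = ["E"] * (m.end() - m.start())
    # Regular-expression matching: mark every match of every rule.
    for pattern in patterns:
        for m in re.finditer(pattern, text):
            labels[m.start():m.end()] = ["E"] * (m.end() - m.start())
    return "".join(labels)

# Hypothetical usage on a short English sentence.
print(label_text("nausea and vomiting after radiotherapy", {"nausea", "vomiting"}))
```

Labeling per character keeps the label string the same length as the text, which is what allows the labeled text to be fed to a sequence model in the later steps.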
In other embodiments, the medical text data may also be labeled in other manners. In order to distinguish each first medical entity, each character in the medical text data may be labeled: a non-medical entity may be labeled with a first label, the beginning field of a first medical entity with a second label, the non-beginning fields of a first medical entity with a third label, and so on, so that the beginning and the end of each first medical entity can be determined and the effect of word segmentation is achieved. For example, with the first label "N", the second label "B" and the third label "E", entity labeling of the medical text data "12 times of radiation therapy show nausea and vomiting, and most vomit is white mucus without obvious bright red or dark red-like change." yields "NNNNNNNNNNBEBENNNNNNNNNNNNNNNNNNNNNNN".
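As a rough illustration of the three-label scheme, the sketch below encodes entity spans as "B" (beginning) and "E" (non-beginning) characters and decodes a label string back into entities. The function names and the span representation are assumptions made for the example; the source only specifies the meaning of the three labels.

```python
def encode_spans(text, spans):
    """Label each character: 'N' outside entities, 'B' at an entity start,
    'E' for the remaining characters of an entity.
    `spans` is a list of (start, end) index pairs, end exclusive."""
    labels = ["N"] * len(text)
    for start, end in spans:
        if start < end:
            labels[start] = "B"
            for i in range(start + 1, end):
                labels[i] = "E"
    return "".join(labels)

def decode_entities(text, labels):
    """Recover entity strings from a label sequence; a 'B' always opens a new
    entity, so adjacent entities stay separate."""
    entities, current = [], ""
    for ch, lab in zip(text, labels):
        if lab == "B":
            if current:
                entities.append(current)
            current = ch
        elif lab == "E" and current:
            current += ch
        else:
            if current:
                entities.append(current)
            current = ""
    if current:
        entities.append(current)
    return entities
```

With only "E" and "N", two adjacent entities would merge into a single run of "E" labels; the separate beginning label is what makes the boundary between them recoverable.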
After labeling the medical text data, the medical text data containing the labels may be divided into training samples and test samples. The training samples can be used for training the entity recognition model, and the testing samples can be used for testing and verifying the entity recognition model. For example, the total amount of the medical text data may be counted, and the training samples and the test samples may be divided according to a certain proportion, for example, the training samples are 70% of the total amount, and the test samples may be 30% of the total amount, and for another example, the training samples may be 80% of the total amount, the test samples may be 20% of the total amount, and the like, and other proportions of the division may also be performed according to actual needs, which is not particularly limited in this embodiment.
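A proportional split of this kind might look like the following sketch. The shuffling step and the fixed seed are assumptions added for reproducibility; the source only states that the labeled data are divided by a ratio such as 70/30 or 80/20.

```python
import random

def split_samples(samples, train_ratio=0.7, seed=0):
    """Shuffle the labeled samples and split them into (training, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical usage: 70% training samples, 30% test samples.
train, test = split_samples(range(10), train_ratio=0.7)
```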
In step S120, an entity recognition model is obtained through the training sample, and the test sample is recognized through the entity recognition model, so as to obtain a recognition result of the test sample.
In the present exemplary embodiment, a Hidden Markov Model (HMM) may be trained on the training samples, and the trained model may be used as the entity recognition model; alternatively, other machine learning models may be trained, such as a support vector machine model or a decision tree model, and the embodiment is not limited thereto. After the entity recognition model is obtained through training, the test sample can be input into the entity recognition model, which recognizes the medical entities contained in the test sample and outputs the recognition result. The recognition result may include the entities in the test sample. For example, after the entity recognition model is obtained, inputting the test sample "irradiated 12 times, and nausea and vomiting occur, and the vomit is mostly white mucus, and no obvious bright red or dark red sample change occurs." into the entity recognition model may yield the recognition result "nausea, vomiting, vomit".
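The source names the HMM but does not describe how it is trained or decoded. The sketch below is one common way to do it, given purely as an assumption: transition and emission probabilities are estimated by counting over the labeled training samples with add-one smoothing, and the most likely label sequence is found with the Viterbi algorithm. Characters are the observations and labels are the hidden states; all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_hmm(samples):
    """Estimate HMM counts from (text, labels) pairs of equal length."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    vocab = set()
    for text, labels in samples:
        if not text:
            continue
        start[labels[0]] += 1
        for i, (ch, lab) in enumerate(zip(text, labels)):
            emit[lab][ch] += 1
            vocab.add(ch)
            if i > 0:
                trans[labels[i - 1]][lab] += 1
    # One extra vocabulary slot is reserved for unseen characters.
    return {"states": sorted(emit), "start": start, "trans": trans,
            "emit": emit, "vocab_size": len(vocab) + 1}

def _logp(counter, key, n_outcomes):
    # Add-one smoothed log probability of `key` under `counter`.
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(model, text):
    """Return the most likely label string for `text` under the model."""
    states, v = model["states"], model["vocab_size"]
    if not text or not states:
        return ""
    n = len(states)
    score = {s: _logp(model["start"], s, n) + _logp(model["emit"][s], text[0], v)
             for s in states}
    back = []
    for ch in text[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + _logp(model["trans"][p], s, n))
            new_score[s] = (score[prev] + _logp(model["trans"][prev], s, n)
                            + _logp(model["emit"][s], ch, v))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))
```

The predicted label string can then be converted into the recognized entities by collecting the labeled runs, in the same way the dictionary labels are read.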
Next, in step S130, a medical entity to be updated is determined according to the recognition result of the test sample and the label included in the test sample, and the medical knowledge base is updated by using the determined medical entity to be updated.
In this embodiment, the entities included in the recognition result may be compared with the first medical entities labeled in the test sample to verify whether the recognition of the entity recognition model is correct. If an entity recognized by the model is the same as an entity labeled by using the medical knowledge base, the model recognition can be determined to be correct; if the entities recognized by the model differ from the entities labeled by using the medical knowledge base, the medical entity to be updated may be screened from the differing entities and updated into the medical knowledge base. Specifically, as shown in Fig. 2, the method may include step S210 and step S220, where:
In step S210, a first medical entity corresponding to the target label is extracted from the test sample, and a second medical entity in the recognition result is extracted. After entity labeling is performed on the medical text data, the corresponding first medical entity can be extracted from the test sample according to the target label corresponding to the entity. The entity recognized by the entity recognition model in the recognition result is taken as the second medical entity.
In step S220, the first medical entity and the second medical entity are compared, a target entity in the second medical entity that does not match the first medical entity is obtained, and the target entity is used as the medical entity to be updated. The intersection of the first medical entities and the second medical entities can be determined through comparison, and the entities contained in the intersection can be regarded as correctly recognized. Entities outside the intersection fall into two cases. In the first case, the entity is included in the first medical entities but not in the second medical entities, that is, the word is labeled with the target label in the test sample but is not recognized by the entity recognition model; the entity recognition model can be optimized by using such test samples. In the second case, the entity is not included in the first medical entities but is included in the second medical entities, that is, the word is not labeled in the test sample but is recognized as an entity by the entity recognition model. In this embodiment, the target entity belongs to the second case, i.e., a word that is not labeled by the medical knowledge base but is recognized as an entity by the entity recognition model.
After the target entity is obtained, it can be used as the medical entity to be updated. Since the target entity was not labeled as an entity, that is, the target entity does not exist in the medical knowledge base, it can be added to the medical dictionary, so that the number of entities in the medical dictionary increases and recognition accuracy can improve when the updated medical knowledge base is used for medical entity recognition. In this embodiment, the medical text data is first labeled through the medical knowledge base, and the entity recognition model is then used to recognize the entities contained in the medical text data, so that the entity recognition model and the medical knowledge base verify each other. Manual entity labeling and verification are avoided, and the efficiency of entity acquisition can be greatly improved.
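The comparison in step S220 is essentially a set operation. A minimal sketch, assuming both the labeled (first) and recognized (second) entities are available as collections of strings, with hypothetical key names for the three outcomes:

```python
def compare_entities(first_entities, second_entities):
    """Compare knowledge-base labels (first) with model output (second)."""
    first, second = set(first_entities), set(second_entities)
    return {
        "confirmed": first & second,        # labeled and recognized
        "missed_by_model": first - second,  # labeled but not recognized
        "to_update": second - first,        # recognized but not labeled
    }

# Using the entity names from the example in the text.
result = compare_entities({"nausea", "vomiting"}, {"nausea", "vomiting", "vomit"})
```

Only the "to_update" set feeds the knowledge base update; the "missed_by_model" set points at test samples that can be used to optimize the model instead.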
In other embodiments of the present disclosure, the medical knowledge base may also be updated directly from the second medical entities. Specifically, after the entity recognition model recognizes the second medical entities in the test sample, the second medical entities may be compared with the medical dictionary, the second medical entities not included in the medical dictionary are taken as the entities to be updated, and these entities are then added to the medical dictionary. Alternatively, before the medical entity to be updated is added to the medical dictionary, it may be verified manually to confirm whether it is correct, and only the medical entities to be updated that are verified to be correct are added to the medical dictionary. Compared with processing a large amount of medical text data through manual labeling, the second medical entities are recognized by the entity recognition model and the entities to be updated are screened out through comparison, so that only the entities to be updated need to be verified manually. Reading a large amount of text manually is avoided, and the efficiency of entity acquisition can be improved.
In an exemplary embodiment, after entity labeling is performed on the medical text data by using the medical knowledge base, sampling verification can be performed on the labeled medical text data to obtain a verification result of the entity labeling; the entity labels of the medical text are then modified according to the verification result so as to improve the accuracy of the entity recognition model. Alternatively, sampling verification can be performed directly on the training samples, so that incorrectly labeled words in the training samples are found and the incorrect labels are modified. Illustratively, a certain number of samples can be randomly extracted from the labeled training samples for manual verification to determine whether the entities in each sample are labeled correctly, and incorrect labels can be recorded to obtain the verification result. The incorrect labels are then adjusted according to the verification result: a non-entity labeled as an entity is re-labeled as a non-entity, and an entity labeled as a non-entity is re-labeled as an entity. For example, if entity labeling assigns the label "E" to words belonging to medical named entities and the label "N" to words that are not entities, a word that is not an entity but is labeled "E" can be determined to be a labeling error. After sampling verification is performed on the labeled medical text data, the accuracy of the entity labeling produced by the medical knowledge base can be determined, which makes it convenient to decide whether the medical knowledge base needs to be updated.
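The direct update path, with its optional manual check, can be sketched as below. The `review` callback is a hypothetical stand-in for the manual verification step; the source does not specify how reviewers are consulted.

```python
def update_dictionary(dictionary, second_entities, review=None):
    """Add recognized entities that the dictionary lacks.

    `review`, if given, is called once per candidate and should return True
    for candidates confirmed as real medical entities.
    Returns (updated_dictionary, added_entities).
    """
    candidates = set(second_entities) - set(dictionary)
    if review is not None:
        candidates = {c for c in candidates if review(c)}
    return set(dictionary) | candidates, candidates

# Hypothetical usage: a reviewer rejects one spurious candidate.
updated, added = update_dictionary(
    {"nausea"}, ["nausea", "vomiting", "mostly"], review=lambda e: e != "mostly"
)
```

Only the candidate set reaches the reviewer, which is the source of the labor saving described above: the number of items to check is bounded by the model's novel predictions rather than by the size of the corpus.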
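The sampling verification described above might be organized as in the following sketch. The `verify` callback stands in for the human reviewer and is an assumption, as is measuring accuracy per label character; the source does not fix a particular accuracy definition.

```python
import random

def sample_and_verify(samples, verify, sample_size, seed=0):
    """Spot-check labeled samples and correct the ones that were reviewed.

    `samples` is a list of (text, labels) pairs; `verify(text, labels)` returns
    the corrected label string for one sample.
    Returns (corrected_samples, labeling_accuracy_on_the_checked_samples).
    """
    size = min(sample_size, len(samples))
    indices = random.Random(seed).sample(range(len(samples)), size)
    corrected = list(samples)
    total = wrong = 0
    for i in indices:
        text, labels = samples[i]
        truth = verify(text, labels)
        total += len(labels)
        wrong += sum(a != b for a, b in zip(labels, truth))
        corrected[i] = (text, truth)  # keep the reviewed labels
    accuracy = 1.0 if total == 0 else (total - wrong) / total
    return corrected, accuracy
```

The returned accuracy is the quantity that the later steps compare against a preset requirement when deciding whether the knowledge base needs another update.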
After obtaining the verification result of the training sample, the present embodiment may further include step S310 and step S320, as shown in fig. 3.
In step S310, a third medical entity labeled as a non-entity is screened from the verification result. The third medical entity refers to a word that was not labeled with the target label but was verified to actually be an entity, where the target label is the label corresponding to an entity.
In step S320, the third medical entity is updated into the medical knowledge base. If the third medical entity is not labeled as an entity at the time of entity labeling, it may be determined that the third medical entity is not included in the medical knowledge base, and thus the third medical entity may be added to the medical knowledge base to complete the update of the medical knowledge base.
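Illustratively, steps S310 and S320 might be sketched as follows. The `Verified` record, the "E" and "N" label values, and the set-based knowledge base are assumptions made for illustration; the disclosure does not prescribe a data structure for the verification result.

```python
from typing import Iterable, List, NamedTuple, Set


class Verified(NamedTuple):
    word: str
    labeled: str  # label assigned by the knowledge base: "E" or "N"
    actual: str   # label confirmed during sampling verification


def screen_third_entities(results: Iterable[Verified]) -> Set[str]:
    # Step S310: words labeled as non-entities but verified to be entities.
    return {r.word for r in results if r.labeled == "N" and r.actual == "E"}


def update_knowledge_base(knowledge_base: Set[str], results: List[Verified]) -> Set[str]:
    # Step S320: add the missed entities to the knowledge base in place.
    third = screen_third_entities(results)
    knowledge_base |= third
    return third


# Usage with made-up verification results.
kb = {"fever"}
results = [
    Verified("fever", "E", "E"),
    Verified("gastritis", "N", "E"),  # missed entity: a third medical entity
    Verified("today", "E", "N"),      # false positive: not a third medical entity
]
third = update_knowledge_base(kb, results)
```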
In this embodiment, after the medical knowledge base is updated, a certain number of text samples may be obtained again, entity recognition may be performed on the obtained text samples again by using the updated medical knowledge base, entities included in the text samples may be determined, and the accuracy of the medical knowledge base recognition may be calculated. If the accuracy rate does not meet the requirement, the medical knowledge base can be updated again by using the acquired text sample, and iteration is repeated until the identification accuracy rate of the medical knowledge base meets the requirement. When the accuracy of the medical knowledge base meets the requirement, the medical knowledge base can be directly utilized to carry out entity recognition on the medical text, manual participation is not needed, and the efficiency of entity recognition can be improved.
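Illustratively, the repeated update could be organized as a loop like the one below. This is a sketch under stated assumptions: `fetch_batch` is a hypothetical callable that returns a non-empty list of (word, verified label) pairs, the knowledge base is a set of words, accuracy is computed per word, and the `max_rounds` cap is added here only to guarantee termination. The sketch only adds missed entities; it does not remove words that were wrongly labeled as entities.

```python
from typing import Callable, List, Set, Tuple


def iterate_until_accurate(
    knowledge_base: Set[str],
    fetch_batch: Callable[[], List[Tuple[str, str]]],
    threshold: float,
    max_rounds: int = 10,
) -> Tuple[float, int]:
    """Label fresh batches and update the knowledge base until accurate enough."""
    accuracy = 0.0
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        batch = fetch_batch()  # assumed to be non-empty
        correct = 0
        missed = set()
        for word, actual in batch:
            predicted = "E" if word in knowledge_base else "N"
            if predicted == actual:
                correct += 1
            elif actual == "E":
                missed.add(word)  # entity that the knowledge base lacks
        accuracy = correct / len(batch)
        if accuracy >= threshold:
            break
        knowledge_base |= missed  # update, then try again on a new batch
    return accuracy, rounds


# Usage with made-up data; the same batch is reused to keep the example deterministic.
kb = {"fever"}
batch = [("fever", "E"), ("gastritis", "E"), ("today", "N"), ("angina", "E")]
accuracy, rounds = iterate_until_accurate(kb, lambda: batch, threshold=0.9)
```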
As shown in fig. 4, the present embodiment may further include steps S410 to S470, where:
in step S410, a medical text set is acquired, the medical text set comprising a plurality of medical texts;
in step S420, entity labeling is performed on the medical text set by using the medical knowledge base;
in step S430, sampling verification is performed on the labeled medical text set;
in step S440, the medical knowledge base is updated with the result of the sampling verification;
in step S450, the labeled medical text set is divided into training samples and test samples, and the entity recognition model is trained by using the training samples;
in step S460, the test samples are identified by using the entity recognition model, and an identification result is output;
in step S470, the medical knowledge base is updated by combining the identification result and the entity labeling result; illustratively, segmented words that are not labeled as entities but are identified by the model as entities are added to the medical knowledge base.
In this embodiment, the labeling accuracy can be calculated during sampling verification. If the labeling accuracy reaches a preset threshold, it can be confirmed that the medical knowledge base meets the requirements of data production for entity identification and that the identified entities are highly reliable. If the labeling accuracy is low and does not reach the preset threshold, the medical knowledge base is determined to be insufficiently reliable and needs to be updated; after the update, a batch of medical texts can be acquired again and labeled with the updated medical knowledge base, and the labeling accuracy is determined again. Through multiple rounds of iteration, the labeling accuracy of the medical knowledge base can be brought to the required level, which improves the accuracy of entity identification while greatly reducing labor cost and improving the efficiency of entity identification.
It should be noted that the steps in fig. 4 are a summary of the above specific embodiment, and therefore, steps S410 to S470 are all described in the above specific embodiment, and are not described again here.
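Illustratively, one round of steps S420 and S450 to S470 might be wired together as in the sketch below; the sampling verification of steps S430 and S440 is omitted for brevity. Everything named here is hypothetical: texts are pre-segmented word lists, the knowledge base is a set of words, `train` is a caller-supplied function that returns a predictor, and `toy_train` is a placeholder rule rather than a real entity recognition model.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

Sample = List[Tuple[str, str]]  # [(word, label)], where label is "E" or "N"
Predictor = Callable[[List[str]], List[str]]


def run_round(
    texts: Sequence[List[str]],
    knowledge_base: Set[str],
    train: Callable[[List[Sample]], Predictor],
    test_ratio: float = 0.25,
    seed: int = 0,
) -> Set[str]:
    # S420: label every word by looking it up in the knowledge base.
    labeled = [[(w, "E" if w in knowledge_base else "N") for w in text] for text in texts]
    # S450: split into test and training samples, then train the model.
    rng = random.Random(seed)
    rng.shuffle(labeled)
    n_test = max(1, int(len(labeled) * test_ratio))
    test_samples, train_samples = labeled[:n_test], labeled[n_test:]
    predict = train(train_samples)
    # S460 and S470: collect words the model tags "E" although they are labeled "N".
    to_update = set()
    for sample in test_samples:
        words = [w for w, _ in sample]
        for (word, label), predicted in zip(sample, predict(words)):
            if label == "N" and predicted == "E":
                to_update.add(word)
    knowledge_base |= to_update
    return to_update


def toy_train(train_samples: List[Sample]) -> Predictor:
    # Placeholder "model": tags any word ending in "itis" as an entity.
    return lambda words: ["E" if w.endswith("itis") else "N" for w in words]


# Usage with made-up texts; every text contains the unknown word "gastritis".
texts = [["patient", "has", "gastritis"], ["gastritis", "with", "fever"],
         ["chronic", "gastritis"], ["fever", "and", "gastritis"]]
kb = {"fever"}
added = run_round(texts, kb, toy_train)
```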
Further, in the present exemplary embodiment, a medical entity identification apparatus is also provided, which is configured to execute the medical entity identification method of the present disclosure. The device can be applied to a server or terminal equipment.
Referring to fig. 5, the medical entity identifying apparatus 500 may include: a sample labeling module 510, a sample identification module 520, and an entity update module 530, wherein:
The sample labeling module 510 is configured to perform entity labeling on the medical text data through the medical knowledge base, and to divide the labeled medical text data into a training sample and a test sample.
The sample identification module 520 is configured to obtain an entity identification model through the training sample, identify the test sample through the entity identification model, and obtain an identification result of the test sample.
The entity updating module 530 is configured to determine a medical entity to be updated according to the identification result of the test sample and the labels included in the test sample, and to update the medical knowledge base by using the determined medical entity to be updated.
In an exemplary embodiment of the present disclosure, the sample labeling module 510 may include a dictionary labeling unit and a label determination unit, wherein:
The dictionary labeling unit is configured to identify the first medical entity in the medical text data through a medical dictionary and a regular expression.
The label determination unit is configured to label the first medical entity with the target label.
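Illustratively, identifying first medical entities through a medical dictionary and regular expressions might be sketched as follows. The function name, the span format, the `TARGET_LABEL` value, and the dosage pattern used in the usage example are assumptions made for illustration; the disclosure does not specify which regular expressions are used.

```python
import re
from typing import Iterable, List, Tuple

TARGET_LABEL = "E"  # hypothetical target label, following the "E"/"N" example


def find_first_entities(
    text: str, dictionary: Iterable[str], patterns: Iterable[str]
) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched text, label) for every match, in text order."""
    spans = set()
    for term in dictionary:
        # Dictionary terms are matched literally.
        for m in re.finditer(re.escape(term), text):
            spans.add((m.start(), m.end(), m.group(), TARGET_LABEL))
    for pattern in patterns:
        # Regular expressions cover entities that follow a fixed form.
        for m in re.finditer(pattern, text):
            spans.add((m.start(), m.end(), m.group(), TARGET_LABEL))
    return sorted(spans)


# Usage with a made-up sentence, dictionary, and dosage pattern.
found = find_first_entities(
    "Patient took aspirin 100 mg for fever.",
    dictionary={"aspirin", "fever"},
    patterns=[r"\d+\s?mg"],
)
```

Words outside the returned spans would then receive the non-entity label.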
In an exemplary embodiment of the present disclosure, the entity updating module 530 may include a data extracting unit and an entity comparing unit, wherein:
The data extraction unit is configured to extract the first medical entity corresponding to the target label from the test sample, and to extract the second medical entity from the identification result.
The entity comparison unit is configured to compare the first medical entity with the second medical entity to obtain a target entity in the second medical entity that does not match the first medical entity, and to take the target entity as the medical entity to be updated.
In an exemplary embodiment of the disclosure, the entity update module 530 may be specifically configured to: adding the medical entity to be updated to the medical dictionary.
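Illustratively, the comparison performed by the entity comparison unit can be expressed with set differences, as in the sketch below. The function name and the set representation are assumptions. The second return value reflects the claims of this disclosure, in which entities contained in the first medical entity but not in the second are used to optimize the entity identification model.

```python
from typing import Iterable, Set, Tuple


def compare_entities(
    first: Iterable[str], second: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    first_set, second_set = set(first), set(second)
    # Identified by the model only: medical entities to be updated.
    to_update = second_set - first_set
    # Labeled via the knowledge base only: candidates for optimizing the model.
    missed_by_model = first_set - second_set
    return to_update, missed_by_model


# Usage with made-up entity lists.
to_update, missed_by_model = compare_entities(
    first=["fever", "aspirin"], second=["fever", "gastritis"]
)
```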
In an exemplary embodiment of the present disclosure, the medical entity identifying apparatus 500 further comprises a sampling verification module and a label adjustment module, wherein:
The sampling verification module is configured to perform sampling verification on the entity labels of the training samples to obtain a verification result of the entity labels.
The label adjustment module is configured to adjust the entity labels of the training samples according to the verification result.
In an exemplary embodiment of the present disclosure, the apparatus 500 further includes a label verification module and a verification update module, wherein:
The label verification module is configured to screen out a third medical entity labeled as a non-entity from the verification result.
The verification update module is configured to update the third medical entity to the medical knowledge base.
In an exemplary embodiment of the present disclosure, the apparatus 500 further comprises an entity identification module and a medical knowledge base update module, wherein:
The entity identification module is configured to perform entity identification by using the updated medical knowledge base and to determine the identification accuracy.
The medical knowledge base updating module is configured to update the updated medical knowledge base again if the accuracy does not meet the preset requirement.
For details not disclosed in the apparatus embodiments of the present disclosure, please refer to the embodiments of the medical entity identification method of the present disclosure.
Referring to fig. 6, fig. 6 is a schematic diagram illustrating a system architecture of an exemplary application environment to which a medical entity recognition method and a medical entity recognition apparatus according to an embodiment of the present disclosure may be applied.
As shown in fig. 6, the system architecture 600 may include one or more of terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 serves as a medium for providing communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
The terminal devices 601, 602, 603 may be various electronic devices having a display screen, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 605 may be a server cluster composed of a plurality of servers, or the like.
The medical entity identification method provided by the embodiment of the present disclosure is generally executed by the server 605, and accordingly, the medical entity identification apparatus is generally disposed in the server 605. However, it is easily understood by those skilled in the art that the medical entity identification method provided in the embodiment of the present disclosure may also be executed by the terminal device 601, 602, 603, and accordingly, the medical entity identification apparatus may also be disposed in the terminal device 601, 602, 603, which is not particularly limited in the present exemplary embodiment.
FIG. 7 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement embodiments of the present disclosure.
It should be noted that the computer system 700 of the electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for system operation are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, the processes described below with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by a Central Processing Unit (CPU) 701, performs various functions defined in the methods and apparatus of the present application.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments below. For example, the electronic device may implement the steps shown in fig. 1 to 4, and the like.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (7)

1. A medical entity identification method, comprising:
performing entity marking on medical text data through a medical knowledge base, dividing the medical text data containing the marking into a training sample and a testing sample, wherein the entity marking of the training sample is sampled and verified to obtain a verification result of the entity marking, screening a third medical entity marked as a non-entity from the verification result, and updating the third medical entity to the medical knowledge base;
acquiring an entity recognition model through the training sample, and recognizing the test sample through the entity recognition model to acquire a recognition result of the test sample;
extracting a first medical entity corresponding to the target label from the test sample, extracting a second medical entity in the identification result, and comparing the first medical entity with the second medical entity;
for the compared entities contained in the first medical entity but not contained in the second medical entity, optimizing an entity identification model by using the test samples corresponding to the entities;
for the entity which is obtained by comparison and is not contained in the first medical entity but contained in the second medical entity, as a medical entity to be updated, updating the medical knowledge base by using the medical entity to be updated;
calculating the accuracy of entity labeling of the training sample during sampling verification, if the accuracy does not reach a preset threshold, re-acquiring a batch of medical text data after updating the medical knowledge base, labeling the re-acquired medical text data through the updated medical knowledge base again, determining the accuracy of labeling, and repeating multiple iterations until the accuracy of labeling through the medical knowledge base reaches the preset threshold.
2. The method of claim 1, wherein the entity labeling of medical textual data by a medical knowledge base comprises:
identifying a first medical entity in the medical text data through a medical dictionary and a regular expression;
labeling the first medical entity as the target label.
3. The method of claim 1, wherein the updating the medical knowledge base with the determined medical entity comprises:
adding the medical entity to be updated to a medical dictionary.
4. The method of claim 1, further comprising, after the performing sample validation on the entity label of the training sample to obtain a validation result of the entity label:
and adjusting the entity labels of the training samples according to the verification result.
5. A medical entity recognition apparatus, comprising:
the sample marking module is used for carrying out entity marking on medical text data through a medical knowledge base, dividing the medical text data containing the marking into a training sample and a testing sample, sampling and verifying the entity marking of the training sample to obtain a verification result of the entity marking, screening a third medical entity marked as a non-entity from the verification result, and updating the third medical entity to the medical knowledge base;
the sample identification module is used for acquiring an entity identification model through the training sample, identifying the test sample through the entity identification model and acquiring an identification result of the test sample;
the entity updating module is used for extracting a first medical entity corresponding to the target label from the test sample, extracting a second medical entity in the identification result, and comparing the first medical entity with the second medical entity; for entities which are obtained through comparison and contained in a first medical entity but not contained in a second medical entity, optimizing an entity recognition model by using a test sample corresponding to the entities, and for entities which are obtained through comparison and not contained in the first medical entity but contained in the second medical entity as medical entities to be updated, updating the medical knowledge base by using the medical entities to be updated;
the entity identification module is used for calculating the accuracy of entity labeling of the training sample during sampling verification;
and the medical knowledge base updating module is used for re-acquiring a batch of medical text data after updating the medical knowledge base if the accuracy rate does not reach a preset threshold value, re-labeling the re-acquired medical text data through the updated medical knowledge base, determining the labeling accuracy rate, and repeating multiple iterations until the accuracy rate of labeling through the medical knowledge base reaches the preset threshold value.
6. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1-4.
7. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-4 via execution of the executable instructions.
CN202011437728.2A 2020-12-07 2020-12-07 Medical entity identification method, device, medium and electronic equipment Active CN112507703B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011437728.2A CN112507703B (en) 2020-12-07 2020-12-07 Medical entity identification method, device, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011437728.2A CN112507703B (en) 2020-12-07 2020-12-07 Medical entity identification method, device, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112507703A CN112507703A (en) 2021-03-16
CN112507703B true CN112507703B (en) 2022-11-08

Family

ID=74970669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011437728.2A Active CN112507703B (en) 2020-12-07 2020-12-07 Medical entity identification method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112507703B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020083298A1 (en) * 2018-10-22 2020-04-30 深圳前海达闼云端智能科技有限公司 Medical image identification method and apparatus, storage medium and electronic device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649272B (en) * 2016-12-23 2019-06-25 东北大学 A kind of name entity recognition method based on mixed model
CN106897559B (en) * 2017-02-24 2019-09-17 黑龙江特士信息技术有限公司 A kind of symptom and sign class entity recognition method and device towards multi-data source
CN110321550A (en) * 2019-04-25 2019-10-11 北京科技大学 A kind of name entity recognition method and device towards Chinese medical book document
CN111834014A (en) * 2020-07-17 2020-10-27 北京工业大学 Medical field named entity identification method and system

Also Published As

Publication number Publication date
CN112507703A (en) 2021-03-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant