CN117973393A - Accurate semantic comparison method and system for key medical information in medical text - Google Patents


Info

Publication number
CN117973393A
Authority
CN
China
Prior art keywords
medical
semantic
key
medical information
texts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410363130.5A
Other languages
Chinese (zh)
Other versions
CN117973393B (en)
Inventor
邓立宗
蒋太交
陈禄明
程瑶
杨涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Institute Of Systems Medicine
Original Assignee
Suzhou Institute Of Systems Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Institute Of Systems Medicine
Priority to CN202410363130.5A
Publication of CN117973393A
Application granted
Publication of CN117973393B
Legal status: Active
Anticipated expiration


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an accurate semantic comparison method and system for key medical information in medical text, and relates to the technical field of medical natural language processing. Two medical texts are input, and the medical information they contain is extracted and standardized to obtain a structured, standardized list of semantic structural units. Based on this list, a semantic structural unit similarity distinguishing model is established. The phenotype concepts and attribute sets in the semantic structural units are then compared in turn, and the model combines the two results to determine the similarity category of each complete semantic structural unit, yielding the key medical information comparison result. The invention fills a technical gap in the comparison of key medical information in medical text and can effectively improve the efficiency with which clinical experts compare medical texts.

Description

Accurate semantic comparison method and system for key medical information in medical text
Technical Field
The invention relates to the technical field of medical natural language processing, in particular to a precise semantic comparison method and system for key medical information in medical texts.
Background
Medical text includes natural-language texts such as medical textbooks, clinical guidelines, medical literature, and electronic medical records. Key medical information in a medical text refers to the clinical medical terms it contains, covering clinical concepts such as diseases, symptoms, signs, examinations, procedures, and medications. The problem of accurate semantic comparison of key medical information in medical text is defined as follows: a natural language processing algorithm extracts the key medical information contained in two medical texts and compares it semantically, finally outputting a comparison result for the two texts that lists the clinical medical elements that are identical, partially similar, and completely different. For example, medical text A ("the patient has severe, needle-like pain in the right lower abdomen … laboratory examination: WBC 12.9×10^9/L …") and medical text B ("the patient complains of cramping pain in the left upper abdomen … hemogram: white blood cell count elevated …") both mention two clinical medical concepts, "abdominal pain" and "white blood cell" (WBC), whose comparison results are "identical" and "partially similar", respectively, as shown in FIG. 1. Accurate semantic comparison of key medical information in medical text is an important foundational technology in medical informatics and can be widely applied in intelligent medical products such as similar-patient retrieval and clinical diagnostic decision support.
Current text comparison technology focuses mainly on general-domain text, and research on accurate semantic comparison of key medical information in medical text is very rare. Because the medical concepts in medical text are semantically complex, general text comparison techniques applied directly to medical text comparison have many shortcomings in applicability and accuracy. For example, keyword-based comparison extracts the keywords contained in a text according to a predefined keyword list and outputs the keywords common to both medical texts. However, medical text contains many synonyms, such as two different Chinese expressions that both mean "abdominal pain", or two different names for appendicitis; such pairs are fully synonymous at the semantic level but differ at the character level, so a keyword-based scheme cannot match them. In addition, keyword matching cannot handle semantic comparison between qualitative and quantitative laboratory test terms in medical text. For example, it cannot recognize that "WBC: 12.9×10^9/L" and "elevated white blood cell count" are comparable: given the normal adult reference range for white blood cell count (3.5-9.5×10^9/L), "WBC: 12.9×10^9/L" falls into the "elevated white blood cell count" category.
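The reasoning in the last example can be made explicit with a small sketch that maps a numeric result to a qualitative judgment using the adult reference range quoted above. The function names and the labels "decreased", "normal", and "elevated" are assumptions for illustration; real reference ranges depend on the laboratory and the patient population.

```python
# Adult reference range for white blood cell count (x10^9/L), as quoted in the text.
# Hypothetical constant: real laboratories publish their own ranges.
WBC_REFERENCE_RANGE = (3.5, 9.5)


def classify_quantitative(value: float, low: float, high: float) -> str:
    """Map a numeric lab value to an assumed qualitative abnormality label."""
    if value < low:
        return "decreased"
    if value > high:
        return "elevated"
    return "normal"


def wbc_judgment(value: float) -> str:
    """Judge a WBC count against the adult reference range."""
    low, high = WBC_REFERENCE_RANGE
    return classify_quantitative(value, low, high)


print(wbc_judgment(12.9))  # elevated
print(wbc_judgment(7.0))   # normal
```

With such a mapping, "WBC: 12.9×10^9/L" and "elevated white blood cell count" reduce to the same judgment, which a keyword list alone cannot establish.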
Besides keyword comparison, text comparison tools such as TextDiff and Text Compare compare texts based on token composition and string edit distance. However, these schemes are mainly intended to compute the overall similarity between texts and cannot achieve the goal of accurate semantic comparison of key medical information shown in FIG. 1. Similarly, methods that vectorize text strings with deep learning algorithms can compute similarity at the level of the whole text, but cannot perform accurate semantic comparison of medical texts with medical concepts as the unit.
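To make the limitation concrete, here is a minimal sketch of character-level edit distance (the standard Levenshtein algorithm; the exact scoring used by diff-style tools is not given in the source and may differ). Under this measure, "mild fever" and "no fever", which the document treats as semantically different, look fairly similar, while a numeric WBC result and its qualitative restatement look less similar, even though they mean the same thing.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, lower as more edits are needed."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(normalized_similarity("mild fever", "no fever"))
print(normalized_similarity("WBC 12.9x10^9/L", "white blood cells elevated"))
```

The scores track surface overlap only, which is why the document argues for comparing normalized concepts instead of strings.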
Therefore, an entirely new technical solution for comparing medical texts is needed to solve the problem of accurate semantic comparison of key medical information in medical text.
Disclosure of Invention
To this end, embodiments of the invention provide a method and system for accurate semantic comparison of key medical information in medical text, to solve the problems that the prior art cannot semantically compare qualitative and quantitative laboratory test terms in medical text and cannot perform accurate semantic comparison of medical texts with medical concepts as the unit.
To solve the above problems, an embodiment of the present invention provides an accurate semantic comparison method for key medical information in medical text, the method comprising:
Step S1: inputting two different medical texts, extracting the medical information they contain, and standardizing that information to obtain a structured, standardized list of semantic structural units for the medical texts;
Step S2: establishing a semantic structural unit similarity distinguishing model based on the semantic structural unit list;
Step S3: comparing the phenotype concepts and the attribute sets of the semantic structural units in turn, and, using the semantic structural unit similarity distinguishing model, combining those two results to determine the similarity category of each complete semantic structural unit, thereby obtaining the key medical information comparison result.
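The three steps can be wired together as sketched below. The data class, the function names, and the all-pairs matching strategy are hypothetical; the toy extractor and comparator are simple stand-ins for the BERT-CRF extractor (step S1) and the similarity distinguishing model (steps S2 and S3), not the patented components.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SemanticUnit:
    """One structured, normalized clinical element produced by step S1."""
    phenotype: str                                            # normalized phenotype concept
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. severity, onset site
    unit_type: str = "term"                                   # "term" or "logic"


# Hypothetical interfaces for the steps.
Extractor = Callable[[str], List[SemanticUnit]]               # step S1
UnitComparator = Callable[[SemanticUnit, SemanticUnit], str]  # step S3, backed by the S2 model


def compare_texts(text_a: str, text_b: str, extract: Extractor,
                  compare: UnitComparator) -> List[Tuple[SemanticUnit, SemanticUnit, str]]:
    """Compare every unit of text A with every unit of text B and keep the
    pairs that are not 'dissimilar' (an assumed pairing strategy)."""
    units_a, units_b = extract(text_a), extract(text_b)
    results = []
    for ua in units_a:
        for ub in units_b:
            category = compare(ua, ub)
            if category != "dissimilar":
                results.append((ua, ub, category))
    return results


# --- Toy stand-ins, for illustration only ---
_PATTERN = re.compile(r"(severe|mild) (vomiting|abdominal pain)")


def toy_extract(text: str) -> List[SemanticUnit]:
    return [SemanticUnit(concept, {"severity": severity})
            for severity, concept in _PATTERN.findall(text)]


def toy_compare(a: SemanticUnit, b: SemanticUnit) -> str:
    if a.phenotype != b.phenotype:
        return "dissimilar"
    return "exactly equal" if a.attributes == b.attributes else "partially similar"


results = compare_texts("sudden severe vomiting and severe abdominal pain",
                        "sudden severe vomiting and mild abdominal pain",
                        toy_extract, toy_compare)
for ua, ub, category in results:
    print(ua.phenotype, "->", category)
```

Keeping extraction and comparison behind separate callables mirrors the split between the extraction, model-building, and comparison modules described for the system.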
Preferably, in step S1, two different medical texts are input, medical information contained in the medical texts is extracted, and standardized processing is performed on the medical information, so as to obtain a structured and standardized semantic structure unit list in the medical texts, which specifically includes:
Step S11: inputting medical text in natural language form;
Step S12: constructing a BERT-CRF named entity recognition architecture from Bidirectional Encoder Representations from Transformers (BERT) and a conditional random field (CRF), and using it to recognize the core medical concepts contained in the medical text and the attribute entities associated with them;
Step S13: constructing and training a pre-trained language model, using it to standardize the attribute entities obtained during the structured text extraction of step S12, and linking them to the Unified Medical Language System;
Step S14: and outputting a structured and standardized semantic structure unit list.
Preferably, in step S12, using the BERT-CRF named entity recognition architecture, a method for recognizing core medical concepts and attribute entities associated with the core medical concepts contained in a medical text specifically includes:
first, obtaining an input sequence from the input medical text in natural-language form;
then, encoding the input sequence with BERT to obtain an embedded representation of each word;
then, feeding the embedded representation of each word into the CRF, which predicts labels based on the word's context and the dependencies between labels;
finally, outputting the core medical concepts contained in the medical text and the attribute entities associated with them.
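The CRF label-prediction step can be illustrated with Viterbi decoding for a linear-chain CRF. This is a minimal pure-Python sketch, not the patented model: the emission scores are hypothetical stand-ins for what a linear layer on top of BERT would output, the transition scores stand in for learned CRF parameters, and the label names (O, B-SYM, I-SYM) are assumed. A strongly negative O-to-I-SYM transition and a positive B-SYM-to-I-SYM transition let the CRF keep "abdominal pain" as one entity, even though the token "pain" on its own scores slightly higher as B-SYM.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   labels: Sequence[str]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][j]:   score of label j at token t (from the encoder)
    transitions[i][j]: score of moving from label i to label j (learned by the CRF)
    """
    n_labels = len(labels)
    score = list(emissions[0])  # best path score ending in each label at t = 0
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_labels):
            # Best previous label i for arriving at label j.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return [labels[j] for j in path]


labels = ["O", "B-SYM", "I-SYM"]
transitions = [
    [0.0, 0.0, -10.0],  # from O: O -> I-SYM is effectively forbidden
    [0.0, -1.0, 1.0],   # from B-SYM
    [0.0, -1.0, 1.0],   # from I-SYM
]
tokens = ["mild", "abdominal", "pain"]
emissions = [[2.0, 0.5, 0.0], [0.2, 2.0, 0.5], [0.1, 1.0, 0.9]]  # hypothetical
print(list(zip(tokens, viterbi_decode(emissions, transitions, labels))))
```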
Preferably, the core medical concepts include, but are not limited to, diseases, symptoms, signs, examinations, procedures, and medications, and the attribute entities associated with them include, but are not limited to, presence, severity, urgency, and onset site.
Preferably, in step S13, a pre-training language model is constructed and trained, and the attribute entity obtained in the text structured extraction process in step S12 is standardized by using the pre-training language model and linked to a unified medical language system, which specifically includes:
first, using a Chinese-English bilingually aligned set of medical terms from the Unified Medical Language System as the corpus;
then, training, with a contrastive learning method, a pre-trained language model that maps a given medical term to a standard medical concept in the Unified Medical Language System;
finally, using the pre-trained language model to standardize the entities obtained during the structured text extraction of step S12 and link them to the Unified Medical Language System.
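After contrastive training, linking a mention to the Unified Medical Language System can be treated as a nearest-neighbor search in the embedding space. This sketch assumes cosine similarity and uses hypothetical three-dimensional vectors and placeholder concept IDs (not real UMLS CUIs); in practice the vectors would come from the trained pre-trained language model.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_to_concept(mention_vec: List[float],
                    concept_index: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return (concept_id, similarity) for the nearest standard concept."""
    return max(((cid, cosine(mention_vec, vec)) for cid, vec in concept_index.items()),
               key=lambda pair: pair[1])


# Toy index of standard concepts; IDs and vectors are placeholders.
concept_index = {
    "CONCEPT_ABDOMINAL_PAIN": [0.9, 0.1, 0.0],
    "CONCEPT_FEVER": [0.0, 0.2, 0.95],
}
mention_vec = [0.85, 0.2, 0.05]  # hypothetical embedding of a Chinese mention
print(link_to_concept(mention_vec, concept_index))
```

A production system would typically add a similarity threshold for "no match" and an approximate nearest-neighbor index for millions of concepts; both are outside what the source specifies.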
Preferably, in step S2, the method for establishing a semantic structural unit similarity distinguishing model based on the semantic structural unit list specifically includes:
Step S21: constructing a medical synonym data set by collecting, sorting and translating medical synonym knowledge from a unified medical language system, wherein the medical synonym data set comprises Chinese and English term pairs, term pairs of hierarchical relations and dissimilar term pairs;
Step S22: based on the constructed medical synonym data set, the parameters of the pre-training language model are re-trained and fine-tuned, so that a semantic structure unit similarity distinguishing model for judging whether two semantic structure units are similar is constructed.
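The three kinds of training pairs in step S21 could be generated roughly as follows. This sketch assumes the UMLS synonym knowledge has already been reduced to a mapping from concept ID to names (Chinese and English) plus a list of hierarchically related concept pairs; the concept IDs, toy terms, label strings, and random negative sampling are illustrative assumptions.

```python
import itertools
import random
from typing import Dict, List, Tuple


def build_pair_dataset(synonyms: Dict[str, List[str]],
                       hierarchical_pairs: List[Tuple[str, str]],
                       n_negatives: int,
                       seed: int = 0) -> List[Tuple[str, str, str]]:
    """Return (term_a, term_b, label) triples for fine-tuning a pair classifier.

    synonyms:           concept id -> all names (e.g. Chinese and English) of that concept
    hierarchical_pairs: pairs of concept ids linked by a hierarchical relation
    """
    rng = random.Random(seed)
    pairs = []
    # 1. Synonym pairs: any two names of the same concept (covers Chinese-English pairs).
    for names in synonyms.values():
        for a, b in itertools.combinations(names, 2):
            pairs.append((a, b, "synonym"))
    # 2. Hierarchical pairs: preferred (first) names of the two related concepts.
    for c1, c2 in hierarchical_pairs:
        pairs.append((synonyms[c1][0], synonyms[c2][0], "hierarchical"))
    # 3. Dissimilar pairs: names of two randomly chosen unrelated concepts.
    related = set(hierarchical_pairs) | {(b, a) for a, b in hierarchical_pairs}
    concept_ids = list(synonyms)
    negatives, attempts = 0, 0
    while negatives < n_negatives and attempts < 100 * n_negatives:
        attempts += 1
        c1, c2 = rng.sample(concept_ids, 2)
        if (c1, c2) in related:
            continue
        pairs.append((rng.choice(synonyms[c1]), rng.choice(synonyms[c2]), "dissimilar"))
        negatives += 1
    return pairs


synonyms = {
    "FEVER": ["fever", "发热", "pyrexia"],
    "HYPERTHERMIA": ["hyperthermia", "体温过高"],
    "APPENDICITIS": ["appendicitis", "阑尾炎"],
}
pairs = build_pair_dataset(synonyms, [("FEVER", "HYPERTHERMIA")], n_negatives=2)
for p in pairs:
    print(p)
```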
Preferably, in step S3, comparing the phenotype concept and the attribute set in the semantic structural unit in sequence, and comprehensively judging the similarity category of the complete semantic structural unit according to the results of the phenotype concept and the attribute set based on the semantic structural unit similarity distinguishing model to obtain a key medical information comparison result, which specifically includes:
According to different semantic structure unit types, different strategies are adopted to carry out semantic comparison on the semantic structure units, and the semantic structure unit types are divided into term type semantic structure units and logic type semantic structure units;
For term-type semantic structural units, a pre-trained language model strategy is adopted: the semantic structural unit similarity distinguishing model compares the phenotype concept and then the attribute set to obtain the comparison result. Term-type semantic structural units fall into three similarity categories: "exactly equal", "partially similar", and "dissimilar";
For logic-type semantic structural units, a knowledge-base-driven strategy is adopted: laboratory examinations are first uniformly converted into semantic structural units of the form "examination source-analyte-abnormality judgment", and the similarity of the semantic structural units is then judged directly. Logic-type semantic structural units fall into two similarity categories: "exactly equal" and "dissimilar".
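The term-type comparison can be sketched as two sub-comparisons followed by a combination step. This sketch replaces the learned classifier with exact-match stand-ins and uses an assumed combination rule; the source only states that an "exactly equal" phenotype concept with a "partially similar" attribute set yields a "partially similar" unit, as in its severe versus mild abdominal pain example.

```python
EQUAL, PARTIAL, DISSIMILAR = "exactly equal", "partially similar", "dissimilar"


def compare_phenotypes(a: str, b: str) -> str:
    """Stand-in for the learned classifier: exact match of normalized concepts."""
    return EQUAL if a == b else DISSIMILAR


def compare_attribute_sets(a: dict, b: dict) -> str:
    """Stand-in for the learned classifier on attribute sets."""
    if a == b:
        return EQUAL
    if any(a[k] == b[k] for k in set(a) & set(b)):
        return PARTIAL  # at least one shared attribute agrees
    return DISSIMILAR


def combine(phenotype_cat: str, attribute_cat: str) -> str:
    """Assumed rule for the category of a complete term-type unit."""
    if phenotype_cat == DISSIMILAR:
        return DISSIMILAR
    if phenotype_cat == EQUAL and attribute_cat == EQUAL:
        return EQUAL
    return PARTIAL


def compare_term_units(phen_a: str, attrs_a: dict, phen_b: str, attrs_b: dict) -> str:
    # Phenotype concept first, then the attribute set, as in step S3.
    return combine(compare_phenotypes(phen_a, phen_b),
                   compare_attribute_sets(attrs_a, attrs_b))


severe = {"severity": "severe", "onset site": "abdomen"}
mild = {"severity": "mild", "onset site": "abdomen"}
print(compare_term_units("pain", severe, "pain", mild))  # partially similar
```

Separating the two sub-comparisons from the combination rule makes it easy to swap in the trained classifier for either stand-in without touching the rule.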
An embodiment of the invention also provides an accurate semantic comparison system for key medical information in medical text, which implements the above accurate semantic comparison method and specifically comprises:
The key medical information extraction module is used for inputting two different medical texts, extracting medical information contained in the medical texts, and carrying out standardized processing on the medical information to obtain a structured and standardized semantic structure unit list in the medical texts;
The similarity distinguishing model establishing module is used for establishing a semantic structural unit similarity distinguishing model based on the semantic structural unit list;
The key medical information comparison module is used for sequentially comparing the phenotype concepts and the attribute sets in the semantic structural units, comprehensively judging the similarity types of the complete semantic structural units according to the results of the phenotype concepts and the attribute sets based on the semantic structural unit similarity distinguishing model, and obtaining key medical information comparison results.
An embodiment of the invention also provides an electronic device comprising a processor, a memory, and a bus system, wherein the processor is connected to the memory through the bus system, the memory stores instructions, and the processor executes the instructions stored in the memory to implement the accurate semantic comparison method for key medical information in medical text.
An embodiment of the invention also provides a computer storage medium storing a computer software product, which comprises instructions that cause a computer device to execute the accurate semantic comparison method for key medical information in medical text.
From the above technical scheme, the invention has the following beneficial effects:
(1) The invention uses the Unified Medical Language System to standardize clinical medical elements. Introducing this knowledge greatly improves the accuracy and coverage of semantic comparison: it not only covers a large number of medical synonyms but also supports cross-language Chinese-English medical text comparison, which is difficult to achieve with either conventional keyword-based methods or vectorization-based methods.
(2) The invention uses a hybrid strategy driven by both data and knowledge: injecting medical term knowledge from medical texts into the pre-trained language model effectively improves the model's ability to compare key medical information and overcomes the limitation that a knowledge base alone cannot cover all real-world terms. This is the first application of such a strategy to the comparison of key medical information in medical text.
(3) With this technical solution, the key medical information that can be compared between two input medical texts is output directly. The comparison result is intuitive and highly interpretable, and gives clinicians strong guidance for tasks such as judging patient similarity and comparing patients' medical records. Before the invention, clinicians could only compare key medical information manually by eye, which is time-consuming and labor-intensive. The invention fills a technical gap in the comparison of key medical information in medical text and can effectively improve the efficiency with which clinical experts compare medical texts.
Drawings
To describe the embodiments of the invention or the prior-art solutions more clearly, and to make the features and advantages of the invention easier to understand, the drawings used in the embodiments are described below. The drawings are illustrative and are not to be interpreted as limiting the invention in any way; a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a schematic diagram of the input and output of the medical text comparison technique described in the Background;
FIG. 2 is a flowchart of a method for accurate semantic comparison of key medical information in a medical text provided in an embodiment;
FIG. 3 is a schematic diagram of identifying core medical concepts and attribute entities associated with the core medical concepts contained in a medical text using a BERT-CRF named entity identification architecture in an embodiment;
FIG. 4 is a flowchart of a medical term normalization method in an embodiment;
FIG. 5 is a schematic diagram of a framework of comparison strategies for semantic structural units in medical text in an embodiment;
FIG. 6 is a schematic diagram of a precise semantic comparison technique of key medical information in a medical text according to an embodiment;
FIG. 7 is a block diagram of an accurate semantic comparison system for key medical information in medical text provided in an embodiment.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Because the current technical solution cannot effectively solve the problem of accurate semantic comparison of key medical information in a medical text, as shown in fig. 2, an embodiment of the present invention provides an accurate semantic comparison method for key medical information in a medical text, which includes:
Step S1: inputting two different medical texts, extracting the medical information they contain, and standardizing that information to obtain a structured, standardized list of semantic structural units for the medical texts;
Step S2: establishing a semantic structural unit similarity distinguishing model based on the semantic structural unit list;
Step S3: comparing the phenotype concepts and the attribute sets of the semantic structural units in turn, and, using the semantic structural unit similarity distinguishing model, combining those two results to determine the similarity category of each complete semantic structural unit, thereby obtaining the key medical information comparison result.
Through the above technical solution, the invention provides an accurate semantic comparison method for key medical information in medical text. It builds the comparison technology by integrating the Unified Medical Language System, pre-trained language models, and related technologies, and uses a hybrid strategy driven by both data and knowledge: injecting medical term knowledge from medical texts into the pre-trained language model effectively improves the model's ability to compare key medical information and overcomes the limitation that a knowledge base alone cannot cover all real-world terms. The invention fills a technical gap in the comparison of key medical information in medical text and can effectively improve the efficiency with which clinical experts compare medical texts.
In this embodiment, step S1 first inputs two different medical texts, then extracts the medical information they contain (clinical medical concepts such as diseases, symptoms, and signs) and standardizes it, and finally obtains a structured, standardized list of semantic structural units for the medical texts. It specifically comprises the following steps:
Step S11: inputting medical text in natural-language form.
Step S12: constructing a BERT-CRF named entity recognition architecture from BERT and a CRF, and using it to recognize the core medical concepts contained in the medical text and the attribute entities associated with them.
Specifically, this structuring step builds a model to recognize the core medical concepts in clinical medical text and their associated attribute entities. The core medical concepts include diseases, symptoms, signs, examinations, procedures, medications, and so on, and the associated attribute entities include presence, severity, urgency, onset site, and so on. To this end, the invention first constructs a BERT-CRF named entity recognition architecture (a neural network architecture for named entity recognition) based on Bidirectional Encoder Representations from Transformers (BERT) and a conditional random field (CRF). The architecture combines BERT's ability to produce contextual word embeddings with the CRF's ability to capture dependencies between labels, achieving high performance on named entity recognition tasks.
Further, using the BERT-CRF named entity recognition architecture, the method for recognizing the core medical concepts and the attribute entities associated with the core medical concepts contained in the medical text specifically includes, as shown in fig. 3:
first, obtaining an input sequence from the input medical text in natural-language form;
then, encoding the input sequence with BERT to obtain an embedded representation of each word;
then, feeding the embedded representation of each word into the CRF, which predicts labels based on the word's context and the dependencies between labels;
finally, outputting the core medical concepts contained in the medical text and the attribute entities associated with them.
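Once the CRF has produced a tag per token, the tags still have to be collapsed into entities, and each attribute has to be attached to its core concept to form a semantic structural unit. The source does not specify how attributes are attached, so this sketch assumes a simple nearest-concept heuristic; the BIO tag scheme and the label names (SYMPTOM, SEVERITY, SITE, and so on) are also hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical label set: core concept types; any other type is treated as an attribute.
CORE_TYPES = {"DISEASE", "SYMPTOM", "SIGN", "EXAM", "PROCEDURE", "DRUG"}


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str, int]]:
    """Collapse BIO tags into (entity_text, entity_type, start_index) spans."""
    spans, current, ctype, start = [], [], None, 0
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != ctype):
            if current:  # close the previous entity
                spans.append((" ".join(current), ctype, start))
            current, ctype, start = [tok], tag[2:], i
        elif tag.startswith("I-"):  # continue the current entity
            current.append(tok)
        else:  # "O" closes any open entity
            if current:
                spans.append((" ".join(current), ctype, start))
            current, ctype = [], None
    if current:
        spans.append((" ".join(current), ctype, start))
    return spans


def group_into_units(spans: List[Tuple[str, str, int]]) -> List[Dict[str, object]]:
    """Attach each attribute to the nearest core concept, breaking ties in
    favor of the following concept (an assumed heuristic)."""
    cores = [s for s in spans if s[1] in CORE_TYPES]
    units = [{"phenotype": text, "attributes": {}} for text, _, _ in cores]
    for text, etype, pos in spans:
        if etype in CORE_TYPES or not cores:
            continue
        nearest = min(range(len(cores)),
                      key=lambda i: (abs(cores[i][2] - pos), cores[i][2] < pos))
        units[nearest]["attributes"][etype.lower()] = text
    return units


tokens = ["severe", "vomiting", "and", "mild", "abdominal", "pain"]
tags = ["B-SEVERITY", "B-SYMPTOM", "O", "B-SEVERITY", "B-SITE", "B-SYMPTOM"]
print(group_into_units(bio_to_spans(tokens, tags)))
```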
Step S13: constructing and training a pre-trained language model, using it to standardize the attribute entities obtained during the structured text extraction of step S12, and linking them to the Unified Medical Language System.
Specifically, the invention first uses a Chinese-English bilingually aligned set of medical terms from the Unified Medical Language System (UMLS) as the corpus; then trains, with a contrastive learning method, a pre-trained language model that maps a given medical term to a standard medical concept in UMLS; and finally uses the pre-trained language model to standardize the entities obtained during the structured text extraction of step S12 and link them to UMLS.
UMLS is a large medical terminology system that the U.S. National Library of Medicine has developed continuously for more than 20 years. It covers clinical medicine, basic medicine, pharmacy, biology, healthcare management, and related disciplines, contains about 2 million medical concepts, and includes more than 5 million medical term names.
Contrastive learning aims to learn an encoder that maps similar data (of the same class) to similar encodings while making the encodings of data from different classes as different as possible. Unlike generative learning, contrastive learning does not need to model the fine-grained details of each example; it only needs to learn to distinguish data in an abstract, semantic-level feature space, which simplifies the model and its optimization and tends to give stronger generalization.
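A common way to write a contrastive objective is the InfoNCE loss; the source does not name its exact loss, so this is an assumed instance. For one anchor term (for example, a Chinese mention), one positive (its UMLS synonym), and several negatives, the loss is the negative log softmax probability of the positive over cosine similarities divided by a temperature (0.07 here, an assumed value).

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce_loss(anchor: Sequence[float], positive: Sequence[float],
                  negatives: Sequence[Sequence[float]], temperature: float = 0.07) -> float:
    """Low when the anchor is closer to its positive than to every negative."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]  # = -log softmax probability of the positive


# Anchor close to its positive: small loss. Anchor close to a negative: large loss.
good = info_nce_loss([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0]])
bad = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1]])
print(good, bad)
```

Minimizing this loss over many synonym pairs pulls each term toward its standard concept name and pushes it away from unrelated names, which is the property the normalization step relies on.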
The invention extracts the synonym knowledge recorded in the UMLS knowledge base, translates it into Chinese-English bilingual synonym pairs with a machine translation tool (Baidu Translate), and, applying the idea of contrastive learning, injects this bilingually mapped synonym knowledge into a multilingual pre-trained language model. The resulting model can map Chinese medical terms to the corresponding standard UMLS medical terms; this process is called standardization (normalization). The flow of the medical term normalization method is shown in FIG. 4.
Step S14: and outputting a structured and standardized semantic structure unit list.
Through the medical text structuring and standardizing process flow, a piece of medical text in a natural language form can be converted into a structured and standardized clinical medical semantic structure unit list.
In this embodiment, in step S2, a semantic structure unit similarity distinction model is built based on the semantic structure unit list, which specifically includes:
Step S21: constructing a medical synonym data set by collecting, sorting and translating medical synonym knowledge from a unified medical language system, wherein the medical synonym data set comprises Chinese and English term pairs, term pairs of hierarchical relations and dissimilar term pairs;
Step S22: based on the constructed medical synonym data set, the parameters of the pre-training language model are re-trained and fine-tuned, so that a semantic structure unit similarity distinguishing model for judging whether two semantic structure units are similar is constructed.
Specifically, after obtaining the structured, standardized list of semantic structural units, the invention develops an optimized comparison strategy for semantic structural units. First, by collecting, organizing, and translating medical synonym knowledge from UMLS, a medical synonym dataset of more than 1 million pairs is constructed. This dataset includes Chinese-English term pairs, hierarchically related term pairs (e.g., "fever" and "hyperthermia"), and dissimilar term pairs. The parameters of the pre-trained language model BERT are then further pre-trained and fine-tuned on this synonym knowledge, as shown in FIG. 5, to construct a semantic structural unit similarity distinguishing model (a classifier) that determines whether two semantic structural units are similar.
In this embodiment, in step S3, in the comparison process of the semantic structural units, the phenotype concepts and the attribute sets in the semantic structural units are compared sequentially, and based on the semantic structural unit similarity distinguishing model, the similarity category of the complete semantic structural unit is comprehensively judged according to the results of the phenotype concepts and the attribute sets, so as to obtain the key medical information comparison result.
Specifically, a different semantic comparison strategy is used for each type of semantic structural unit:
(1) For term-type semantic structural units, the pre-trained language model strategy is used directly. Term-type similarity has three categories: "exactly equal", "partially similar" and "dissimilar". For example, "severe abdominal pain" and "mild abdominal pain" are converted into the semantic structural units [phenotype concept: pain, severity: severe, onset site: abdomen] and [phenotype concept: pain, severity: mild, onset site: abdomen]. The classifier (the semantic structural unit similarity distinguishing model) is then applied first to the phenotype concepts [pain, pain] and then to the attribute sets [severe, abdomen] and [mild, abdomen]. Here the phenotype concepts are "exactly equal" and the attribute sets are "partially similar", so the complete semantic structural units are "partially similar".
(2) For logical semantic structural units, the knowledge-base strategy is used directly. Logical similarity has two categories: "exactly equal" and "dissimilar". A knowledge-base approach uniformly converts laboratory examination results into semantic structural units of the form "examination source-analyte-abnormality judgment", where the "analyte" serves as the "phenotype concept". For example, "complete blood count: white blood cells elevated" and "complete blood count: WBC 15×10^9/L" are both converted into [phenotype concept: white blood cell count, examination source: blood, abnormality judgment: elevated]. The similarity of the two semantic structural units is then judged directly from these results. Here the phenotype concepts are "exactly equal" and the attribute sets are also "exactly equal", so the complete semantic structural units are "exactly equal".
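The knowledge-base conversion for laboratory results can be sketched as follows. All knowledge-base contents here are hypothetical, illustrative entries: `PANELS`, `ANALYTE_ALIASES`, `JUDGMENTS` and the white blood cell reference range of 3.5 to 9.5 ×10^9/L. The simple "panel: finding" input format is also an assumption. A real system would take panels, analytes and reference ranges from a curated laboratory knowledge base.

```python
import re
from typing import NamedTuple, Optional


class LabUnit(NamedTuple):
    source: str       # examination source, e.g. "blood"
    analyte: str      # analyte, used as the phenotype concept
    abnormality: str  # abnormality judgment: "elevated", "normal" or "decreased"


# Hypothetical, illustrative knowledge-base entries (not the patent's actual knowledge base).
PANELS = {"complete blood count": "blood"}
ANALYTE_ALIASES = {"wbc": "white blood cell count", "white blood cells": "white blood cell count"}
REFERENCE_RANGES = {"white blood cell count": (3.5, 9.5)}  # x10^9/L, illustrative adult range
JUDGMENTS = {"elevated": "elevated", "high": "elevated", "decreased": "decreased",
             "low": "decreased", "normal": "normal"}
NUMERIC = re.compile(r"(?P<item>[a-z ]+?)\s+(?P<value>\d+(?:\.\d+)?)")


def to_lab_unit(text: str) -> Optional[LabUnit]:
    """Convert 'panel: item value' or 'panel: item judgment' into source-analyte-abnormality."""
    panel, _, finding = (part.strip().lower() for part in text.partition(":"))
    source = PANELS.get(panel)
    if source is None:
        return None
    match = NUMERIC.match(finding)
    if match:  # quantitative result: judge it against the reference range
        analyte = ANALYTE_ALIASES.get(match["item"])
        if analyte is None:
            return None
        low, high = REFERENCE_RANGES[analyte]
        value = float(match["value"])
        judgment = "elevated" if value > high else "decreased" if value < low else "normal"
        return LabUnit(source, analyte, judgment)
    for alias, analyte in ANALYTE_ALIASES.items():  # qualitative result
        if finding.startswith(alias):
            judgment = JUDGMENTS.get(finding[len(alias):].strip())
            return LabUnit(source, analyte, judgment) if judgment else None
    return None


def compare_lab_units(a: Optional[LabUnit], b: Optional[LabUnit]) -> str:
    """Logical units are either "exactly equal" or "dissimilar"."""
    return "exactly equal" if a is not None and a == b else "dissimilar"
```

With these entries, "Complete blood count: WBC 15×10^9/L" and "Complete blood count: white blood cells elevated" both map to `LabUnit("blood", "white blood cell count", "elevated")`. `compare_lab_units` therefore returns "exactly equal" for them.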
Finally, the invention provides a general framework for comparing the semantic structural units of unstructured medical documents, as shown in fig. 5. Its algorithm combines the knowledge-base strategy with the pre-trained language model strategy to achieve fine-grained semantic alignment of the phenotype information in unstructured medical documents.
To further illustrate the technical solution and advantages of the invention, a specific example is described below.
Consider medical text A: "The patient suddenly developed severe vomiting and abdominal pain, with mild fever. Complete blood count: WBC 12.9×10^9/L." and medical text B: "The patient had sudden severe vomiting, mild abdominal pain and no fever. Complete blood count: white blood cells elevated." Applied to these texts, the accurate semantic comparison technique of the invention produces the comparison result shown in fig. 6. The two descriptions of sudden severe vomiting are worded differently, as are "WBC 12.9×10^9/L" and "complete blood count: white blood cells elevated". Semantic structuring and standardization nonetheless show that each pair is identical at the semantic level. Conversely, "mild fever" and "no fever" share the keyword "fever" but are completely different at the semantic level. The invention therefore effectively improves the accuracy of comparing key medical information in medical texts.
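The effect in this example can be illustrated with a small alignment sketch. The units below are hand-encoded from texts A and B for illustration only. They are not output of the invention's extraction step and do not reproduce fig. 6. Phenotype concepts are matched by exact string. Treating a present-versus-absent conflict as "dissimilar" is an assumed rule that mirrors the "mild fever" versus "no fever" case.

```python
from typing import Dict, List, Tuple

Unit = Tuple[str, Dict[str, str]]  # (phenotype concept, attribute name -> value)


def _normalize(attrs: Dict[str, str]) -> Dict[str, str]:
    # Assumption: a finding without an explicit presence attribute is asserted as present.
    return {"presence": "present", **attrs}


def compare_unit(a: Unit, b: Unit) -> str:
    attrs_a, attrs_b = _normalize(a[1]), _normalize(b[1])
    # Assumed rule: a present-versus-absent conflict makes the units dissimilar outright.
    if attrs_a["presence"] != attrs_b["presence"]:
        return "dissimilar"
    return "exactly equal" if attrs_a == attrs_b else "partially similar"


def align(text_a: List[Unit], text_b: List[Unit]) -> Dict[str, str]:
    """Pair the units of two texts by phenotype concept (exact match) and label each pair."""
    index_b = dict(text_b)
    result: Dict[str, str] = {}
    for concept, attrs in text_a:
        if concept in index_b:
            result[concept] = compare_unit((concept, attrs), (concept, index_b[concept]))
        else:
            result[concept] = "only in text A"
    for concept, _ in text_b:
        result.setdefault(concept, "only in text B")
    return result


# Hand-encoded (hypothetical) units for the vomiting and fever findings of texts A and B.
TEXT_A = [("vomiting", {"onset": "sudden", "severity": "severe"}), ("fever", {"severity": "mild"})]
TEXT_B = [("vomiting", {"onset": "sudden", "severity": "severe"}), ("fever", {"presence": "absent"})]
```

Here `align(TEXT_A, TEXT_B)` returns `{"vomiting": "exactly equal", "fever": "dissimilar"}`. The two fever units share a concept but carry opposite presence values.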
Embodiment 2
As shown in fig. 7, the present invention provides a system for accurate semantic comparison of key medical information in medical texts. The system implements the method of the first embodiment and specifically includes:
the key medical information extraction module 10 is used for inputting two different medical texts, extracting medical information contained in the medical texts, and carrying out standardized processing on the medical information to obtain a structured and standardized semantic structure unit list in the medical texts;
the similarity distinguishing model establishing module 20 is configured to establish a similarity distinguishing model of the semantic structural units based on the semantic structural unit list;
the key medical information comparison module 30 is used for sequentially comparing the phenotype concepts and the attribute sets in the semantic structural units, comprehensively judging the similarity types of the complete semantic structural units according to the results of the phenotype concepts and the attribute sets based on the semantic structural unit similarity distinguishing model, and obtaining key medical information comparison results.
The accurate semantic comparison system for key medical information in medical texts implements the foregoing method, so its specific implementation follows the embodiments of that method. The key medical information extraction module 10, the similarity distinguishing model establishing module 20 and the key medical information comparison module 30 implement steps S1, S2 and S3 of the method, respectively. Their details are given in the descriptions of the corresponding embodiments and are not repeated here.
Embodiment 3
An embodiment of the invention also provides an electronic device comprising a processor, a memory and a bus system. The processor is connected to the memory through the bus system, the memory stores instructions, and the processor executes the instructions stored in the memory to implement the accurate semantic comparison method for key medical information in medical texts.
Embodiment 4
An embodiment of the invention also provides a computer storage medium storing a computer software product. The computer software product comprises a plurality of instructions that cause a computer device to execute the accurate semantic comparison method for key medical information in medical texts.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above examples are given only for clarity of illustration and do not limit the embodiments. Those of ordinary skill in the art can make other variations or modifications in different forms on the basis of the foregoing description. It is neither necessary nor possible to list all embodiments exhaustively here. Obvious variations or modifications derived therefrom remain within the scope of protection of the present invention.

Claims (10)

1. An accurate semantic comparison method for key medical information in medical texts, characterized by comprising the following steps:
Step S1: inputting two different medical texts, extracting the medical information they contain, and standardizing the medical information to obtain a structured and standardized list of semantic structural units for each medical text;
Step S2: establishing a semantic structural unit similarity distinguishing model based on the list of semantic structural units;
Step S3: comparing first the phenotype concepts and then the attribute sets of the semantic structural units, and, based on the semantic structural unit similarity distinguishing model, determining the similarity category of the complete semantic structural unit from the results for the phenotype concept and the attribute set to obtain a key medical information comparison result.
2. The accurate semantic comparison method for key medical information in medical texts according to claim 1, wherein step S1 specifically comprises the following steps:
Step S11: inputting medical text in natural-language form;
Step S12: constructing a BERT-CRF named entity recognition architecture based on Bidirectional Encoder Representations from Transformers (BERT) and a conditional random field (CRF), and using it to identify the core medical concepts contained in the medical text and the attribute entities associated with them;
Step S13: constructing and training a pre-trained language model, and using it to standardize the attribute entities obtained during the structured extraction of the medical text in step S12 and to link them to the Unified Medical Language System;
Step S14: outputting the structured and standardized list of semantic structural units.
3. The accurate semantic comparison method for key medical information in medical texts according to claim 2, wherein in step S12, using the BERT-CRF named entity recognition architecture to identify the core medical concepts contained in the medical text and their associated attribute entities specifically comprises:
first, obtaining an input sequence from the input natural-language medical text;
then, encoding the input sequence with BERT to obtain an embedding of each word;
then, inputting the embedding of each word into the conditional random field (CRF), which predicts labels from the contextual information of the word and the dependencies between labels;
and finally, outputting the core medical concepts contained in the medical text and the attribute entities associated with them.
4. The method according to claim 3, wherein the core medical concepts include, but are not limited to, diseases, symptoms, signs, examinations, procedures and medications, and the attribute entities associated with the core medical concepts include, but are not limited to, presence, severity, urgency and onset.
5. The accurate semantic comparison method for key medical information in medical texts according to claim 2, wherein step S13, in which a pre-trained language model is constructed and trained and used to standardize the attribute entities obtained during the structured extraction of the medical text in step S12 and to link them to the Unified Medical Language System, specifically comprises:
first, using a Chinese-English bilingually aligned set of medical terms from the Unified Medical Language System as the corpus;
then, training, by contrastive learning, a pre-trained language model that links a given medical term to a standard medical concept of the Unified Medical Language System;
and finally, using the pre-trained language model to standardize the entities obtained during the structured extraction of the medical text in step S12 and to link them to the Unified Medical Language System.
6. The accurate semantic comparison method for key medical information in medical texts according to claim 1, wherein step S2 of establishing a semantic structural unit similarity distinguishing model based on the list of semantic structural units specifically comprises:
Step S21: constructing a medical synonym dataset by collecting, organizing and translating medical synonym knowledge from the Unified Medical Language System, the dataset comprising Chinese-English term pairs, hierarchically related term pairs and dissimilar term pairs;
Step S22: further pre-training and fine-tuning the parameters of the pre-trained language model on the constructed medical synonym dataset, thereby constructing a semantic structural unit similarity distinguishing model that judges whether two semantic structural units are similar.
7. The accurate semantic comparison method for key medical information in medical texts according to claim 1, wherein step S3 specifically comprises:
comparing the semantic structural units with different strategies according to their type, the types being term-type semantic structural units and logical semantic structural units;
for term-type semantic structural units, adopting the pre-trained language model strategy, in which the semantic structural unit similarity distinguishing model compares first the phenotype concepts and then the attribute sets to obtain the comparison result, the similarity categories of term-type semantic structural units being "exactly equal", "partially similar" and "dissimilar";
for logical semantic structural units, adopting a knowledge-base-driven strategy, in which laboratory examinations are first uniformly converted into semantic structural units of the form "examination source-analyte-abnormality judgment" and the similarity of the semantic structural units is then judged directly, the similarity categories of logical semantic structural units being "exactly equal" and "dissimilar".
8. An accurate semantic comparison system for key medical information in medical texts, characterized by being used to implement the accurate semantic comparison method for key medical information in medical texts according to any one of claims 1 to 7, and specifically comprising:
The key medical information extraction module is used for inputting two different medical texts, extracting medical information contained in the medical texts, and carrying out standardized processing on the medical information to obtain a structured and standardized semantic structure unit list in the medical texts;
the similarity distinguishing model establishing module is used for establishing a similarity distinguishing model of the semantic structural units based on the semantic structural unit list;
The key medical information comparison module is used for sequentially comparing the phenotype concepts and the attribute sets in the semantic structural units, comprehensively judging the similarity types of the complete semantic structural units according to the results of the phenotype concepts and the attribute sets based on the semantic structural unit similarity distinguishing model, and obtaining key medical information comparison results.
9. An electronic device, characterized in that the electronic device comprises a processor, a memory and a bus system, the processor and the memory are connected through the bus system, the memory is used for storing instructions, and the processor is used for executing the instructions stored in the memory so as to realize the accurate semantic comparison method for key medical information in medical texts according to any one of claims 1 to 7.
10. A computer storage medium, wherein the computer storage medium stores a computer software product comprising a plurality of instructions for causing a computer device to perform the accurate semantic comparison method for key medical information in medical texts according to any one of claims 1 to 7.
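Claim 3 leaves the CRF label-prediction step abstract. A linear-chain CRF is typically decoded with the Viterbi algorithm, which selects the tag sequence with the highest sum of per-token emission scores and tag-to-tag transition scores. The sketch below is a minimal pure-Python illustration under stated assumptions. It uses a hypothetical three-tag BIO scheme and hand-written emission scores in place of BERT outputs. It omits training and start/end transition scores, and it is not the patent's implementation.

```python
from typing import List, Sequence

TAGS = ["O", "B-SYM", "I-SYM"]  # hypothetical BIO tag set for a "symptom" entity type


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[str]:
    """Return the highest-scoring tag path of a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (in BERT-CRF, computed from BERT embeddings).
    transitions[i][j]: learned score for moving from tag i to tag j.
    Start/end transition scores are omitted for brevity.
    """
    n_tags = len(TAGS)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for a path that ends in tag j at step t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):  # follow back-pointers to recover the path
        best = pointers[best]
        path.append(best)
    return [TAGS[j] for j in reversed(path)]


# Toy scores for three tokens. The second token's emission alone prefers I-SYM,
# which is invalid immediately after O.
EMISSIONS = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [0.0, 0.0, 2.0]]
NEG_INF = -1e4
TRANSITIONS = [[0.0, 0.0, NEG_INF],  # from O: O -> I-SYM is effectively forbidden
               [0.0, 0.0, 0.0],      # from B-SYM
               [0.0, 0.0, 0.0]]      # from I-SYM
```

Taking the per-token argmax of these scores gives the invalid sequence O, I-SYM, I-SYM. `viterbi_decode(EMISSIONS, TRANSITIONS)` instead returns `["O", "B-SYM", "I-SYM"]`, which illustrates how the CRF's label dependencies correct per-token predictions.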
CN202410363130.5A 2024-03-28 2024-03-28 Accurate semantic comparison method and system for key medical information in medical text Active CN117973393B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410363130.5A CN117973393B (en) 2024-03-28 2024-03-28 Accurate semantic comparison method and system for key medical information in medical text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410363130.5A CN117973393B (en) 2024-03-28 2024-03-28 Accurate semantic comparison method and system for key medical information in medical text

Publications (2)

Publication Number Publication Date
CN117973393A true CN117973393A (en) 2024-05-03
CN117973393B CN117973393B (en) 2024-06-07

Family

ID=90848061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410363130.5A Active CN117973393B (en) 2024-03-28 2024-03-28 Accurate semantic comparison method and system for key medical information in medical text

Country Status (1)

Country Link
CN (1) CN117973393B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581960A (en) * 2020-05-06 2020-08-25 Shanghai Maritime University Method for obtaining semantic similarity of medical texts
CN111737997A (en) * 2020-06-18 2020-10-02 Datagrand Information Technology (Shanghai) Co., Ltd. Text similarity determination method, text similarity determination equipment and storage medium
CN112270965A (en) * 2020-11-16 2021-01-26 Suzhou *** Medical Research Institute Semantic structural processing method for medical text phenotype information
CN113343703A (en) * 2021-08-09 2021-09-03 Beijing Huimei Cloud Technology Co., Ltd. Medical entity classification extraction method and device, electronic equipment and storage medium
CN114707516A (en) * 2022-03-29 2022-07-05 Beijing Institute of Technology Long text semantic similarity calculation method based on contrast learning
US20230034401A1 (en) * 2021-07-16 2023-02-02 Novoic Ltd. Method of evaluating text similarity for diagnosis or monitoring of a health condition
CN116341557A (en) * 2023-05-29 2023-06-27 North China University of Science and Technology Diabetes medical text named entity recognition method
CN116702743A (en) * 2023-07-07 2023-09-05 Ping An Life Insurance Company of China, Ltd. Text similarity detection method and device, electronic equipment and storage medium
CN116776884A (en) * 2023-06-26 2023-09-19 Sun Yat-sen University Data enhancement method and system for medical named entity recognition
JP2024027087A (en) * 2022-08-16 2024-02-29 Zhejiang Lab Standard medical term management system and method based on general model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LUMING CHEN et al.: "TeaBERT: An Efficient Knowledge Infused Cross-Lingual Language Model for Mapping Chinese Medical Entities to the Unified Medical Language System", BIOMEDICAL AND HEALTH INFORMATICS, vol. 27, no. 12, 31 December 2023 (2023-12-31), pages 6029 - 6037 *
Cheng Yao et al.: "Study on the coverage of Chinese standard medical terminology sets in practical applications", Chinese Journal of Health Informatics and Management, vol. 17, no. 5, 31 December 2020 (2020-12-31), pages 601 - 605 *

Also Published As

Publication number Publication date
CN117973393B (en) 2024-06-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant