CN117973393A - Accurate semantic comparison method and system for key medical information in medical text - Google Patents
- Publication number: CN117973393A (application CN202410363130.5A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides an accurate semantic comparison method and system for key medical information in medical texts, and relates to the technical field of medical natural language processing. Key medical information is extracted from two medical texts and normalized into lists of structured semantic structural units, and a semantic structural unit similarity discrimination model is established on this basis. The phenotype concept and the attribute set of each semantic structural unit are then compared in turn, and, based on the similarity discrimination model, the two results are combined into a similarity category for the complete unit, which yields the key medical information comparison result. The invention fills a gap in technology for comparing key medical information in medical texts and can effectively improve the efficiency with which clinical experts compare medical texts.
Description
Technical Field
The invention relates to the technical field of medical natural language processing, and in particular to an accurate semantic comparison method and system for key medical information in medical texts.
Background
Medical texts include natural-language texts such as medical textbooks, clinical guidelines, medical literature, and electronic medical records. Key medical information in a medical text refers to the clinical medical terms it contains, that is, clinical concepts such as diseases, symptoms, signs, examinations, procedures, and medications. The accurate semantic comparison problem for key medical information is defined as follows: a natural language processing algorithm extracts the key medical information contained in two medical texts and compares it semantically, and the output is the key medical information comparison result for the two texts, which classifies clinical medical elements as identical, partially similar, or completely different. For example, medical text A ("the patient has severe, needle-like pain in the right lower abdomen … laboratory examination: WBC 12.9 × 10^9/L …") and medical text B ("the patient complains of cramping pain in the left upper abdomen … complete blood count: white blood cells slightly elevated …") both mention two clinical medical concepts, "abdominal pain" and "white blood cell" (WBC), whose comparison results are "identical" and "partially similar", respectively, as shown in FIG. 1. Accurate semantic comparison of key medical information in medical texts is an important foundational technology in medical informatics and can be widely applied in intelligent medical products such as similar-patient retrieval and clinical diagnostic decision support.
Current text comparison technology focuses mainly on general-domain text, and research on accurate semantic comparison of key medical information in medical texts is very scarce. Because the medical concepts in medical texts are semantically complex, general text comparison techniques have many shortcomings in applicability and accuracy when applied directly to medical text comparison. For example, keyword-based comparison extracts the keywords in a text according to a predefined keyword list and outputs the keywords the two medical texts have in common. However, medical texts contain many synonyms, such as two differently worded expressions for "abdominal pain", or "appendicitis" and a colloquial name for it. Such pairs are fully synonymous at the semantic level but differ at the character level, so keyword matching cannot match them. In addition, keyword matching cannot handle semantic comparison of qualitative and quantitative examination terms in medical texts. For example, "WBC: 12.9 × 10^9/L" and "elevated white blood cell count" are comparable: the adult reference range for the white blood cell count is 3.5-9.5 × 10^9/L, so "WBC: 12.9 × 10^9/L" falls into the category "elevated white blood cell count".
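The synonym problem can be illustrated with a minimal sketch. The synonym table below is a hypothetical toy stand-in whose English terms are invented for illustration; in the described method, this role is played by normalization against the Unified Medical Language System.

```python
# Hypothetical toy synonym table: surface form -> canonical concept name.
# In the described method, UMLS-based normalization plays this role.
SYNONYMS = {
    "bellyache": "abdominal pain",
    "inflamed appendix": "appendicitis",
}

def keyword_overlap(keywords_a, keywords_b):
    """Plain keyword matching: intersect the surface strings as written."""
    return set(keywords_a) & set(keywords_b)

def normalized_overlap(keywords_a, keywords_b, synonyms=SYNONYMS):
    """Map each keyword to its canonical form first, then intersect."""
    norm_a = {synonyms.get(k, k) for k in keywords_a}
    norm_b = {synonyms.get(k, k) for k in keywords_b}
    return norm_a & norm_b

text_a_keywords = ["abdominal pain", "appendicitis"]
text_b_keywords = ["bellyache", "inflamed appendix"]
print(sorted(keyword_overlap(text_a_keywords, text_b_keywords)))     # []
print(sorted(normalized_overlap(text_a_keywords, text_b_keywords)))  # ['abdominal pain', 'appendicitis']
```

Keyword matching finds nothing in common, while the normalized comparison recovers both shared concepts.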
Besides keyword comparison, text comparison tools such as TextDiff and Text Computer compare texts based on token composition and string edit distance. However, their main purpose is to compute the overall similarity between texts, so they cannot achieve the accurate, concept-level comparison of key medical information shown in FIG. 1. Likewise, methods that vectorize text strings with deep learning algorithms compute similarity at the level of the whole text, but cannot achieve accurate semantic comparison of medical texts with medical concepts as the unit of comparison.
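For contrast, a string-based comparison returns only one score per text pair. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in for edit-distance style tools; it is not TextDiff or Text Computer, and the two sample sentences are loose English paraphrases of texts A and B.

```python
from difflib import SequenceMatcher

def overall_similarity(text_a: str, text_b: str) -> float:
    """Return a single character-level similarity ratio in [0, 1] for the whole pair."""
    return SequenceMatcher(None, text_a, text_b).ratio()

text_a = "severe needle-like pain in the right lower abdomen; WBC 12.9 x 10^9/L"
text_b = "cramping pain in the left upper abdomen; white blood cells slightly elevated"
score = overall_similarity(text_a, text_b)
# One scalar for the whole pair: it does not say which concepts match, or how.
print(f"overall similarity: {score:.2f}")
```

The score says nothing about whether "abdominal pain" or "white blood cell" match, which is the concept-level output the method targets.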
A new technical solution for comparing medical texts is therefore needed to solve the problem of accurate semantic comparison of key medical information in medical texts.
Disclosure of Invention
To this end, embodiments of the invention provide an accurate semantic comparison method and system for key medical information in medical texts, to solve two problems of the prior art: it cannot semantically compare qualitative and quantitative examination terms in medical texts, and it cannot achieve accurate semantic comparison of medical texts with medical concepts as the unit of comparison.
To solve the above problems, an embodiment of the invention provides an accurate semantic comparison method for key medical information in medical texts, the method comprising:
Step S1: inputting two different medical texts, extracting the medical information they contain, and normalizing it to obtain a structured, normalized list of semantic structural units for the medical texts;
Step S2: establishing a semantic structural unit similarity discrimination model based on the semantic structural unit lists;
Step S3: comparing the phenotype concept and the attribute set of the semantic structural units in turn, and, based on the semantic structural unit similarity discrimination model, determining the similarity category of each complete semantic structural unit from the phenotype-concept and attribute-set results, to obtain the key medical information comparison result.
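The three steps can be sketched as a small pipeline skeleton. This is a minimal sketch under stated assumptions: `SemanticUnit`, `compare_texts`, and the toy extractor and judge are hypothetical names introduced for illustration, and the all-pairs pairing of units from text A and text B is an assumption, since the source does not specify how units are paired.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class SemanticUnit:
    """A structured, normalized unit: a phenotype (core) concept plus attribute pairs."""
    phenotype: str
    attributes: Tuple[Tuple[str, str], ...] = ()  # e.g. (("severity", "severe"),)

# Hypothetical interfaces for the three steps.
Extractor = Callable[[str], List[SemanticUnit]]       # step S1
Judge = Callable[[SemanticUnit, SemanticUnit], str]   # model built in step S2

def compare_texts(text_a: str, text_b: str, extract: Extractor, judge: Judge):
    """Step S3: judge every unit of text A against every unit of text B (assumed pairing)."""
    units_a, units_b = extract(text_a), extract(text_b)
    return [(ua, ub, judge(ua, ub)) for ua in units_a for ub in units_b]

# Toy stand-ins for illustration only: split on ";" and test exact equality.
def toy_extract(text: str) -> List[SemanticUnit]:
    return [SemanticUnit(part.strip()) for part in text.split(";") if part.strip()]

def toy_judge(a: SemanticUnit, b: SemanticUnit) -> str:
    return "exactly equal" if a == b else "dissimilar"

for ua, ub, label in compare_texts("pain; fever", "pain", toy_extract, toy_judge):
    print(ua.phenotype, "|", ub.phenotype, "->", label)
```

Passing the extractor and judge as functions keeps the skeleton independent of how steps S1 and S2 are actually implemented.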
Preferably, step S1 (inputting two different medical texts, extracting the medical information they contain, and normalizing it to obtain a structured, normalized list of semantic structural units) specifically includes:
Step S11: inputting a medical text in natural-language form;
Step S12: constructing a BERT-CRF named entity recognition architecture from Bidirectional Encoder Representations from Transformers (BERT) and a conditional random field (CRF), and using it to recognize the core medical concepts in the medical text and the attribute entities associated with them;
Step S13: constructing and training a pre-trained language model, and using it to normalize the attribute entities obtained by the structured text extraction in step S12 and link them to the Unified Medical Language System;
Step S14: outputting the structured, normalized list of semantic structural units.
Preferably, in step S12, recognizing the core medical concepts in the medical text and their associated attribute entities with the BERT-CRF named entity recognition architecture specifically includes:
first, obtaining an input sequence from the input natural-language medical text;
then, encoding the input sequence with BERT to obtain an embedding for each token;
then, feeding the embedding of each token into the CRF, which predicts labels based on the token's context and the dependencies between labels;
finally, outputting the core medical concepts in the medical text and their associated attribute entities.
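The last step turns the predicted label sequence into entities. The source does not name a tagging scheme; the sketch below assumes the common BIO scheme, and the tag names (`B-SYMPTOM`, `I-SYMPTOM`, `B-SEVERITY`) and the function `bio_to_spans` are hypothetical.

```python
from typing import List, Tuple

def bio_to_spans(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans.

    An I- tag that does not continue an open span of the same type is treated as O.
    Use sep="" for character-level Chinese input.
    """
    spans: List[Tuple[str, str]] = []
    cur_type, cur_tokens = None, []

    def flush():
        # Close the currently open span, if any.
        if cur_type is not None:
            spans.append((cur_type, sep.join(cur_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            cur_type, cur_tokens = tag[2:], [token]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            cur_tokens.append(token)
        else:
            flush()
            cur_type, cur_tokens = None, []
    flush()
    return spans

tokens = ["severe", "abdominal", "pain", "and", "fever"]
tags = ["B-SEVERITY", "B-SYMPTOM", "I-SYMPTOM", "O", "B-SYMPTOM"]
print(bio_to_spans(tokens, tags))
# [('SEVERITY', 'severe'), ('SYMPTOM', 'abdominal pain'), ('SYMPTOM', 'fever')]
```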
Preferably, the core medical concepts include, but are not limited to, diseases, symptoms, signs, examinations, procedures, and medications, and the attribute entities associated with them include, but are not limited to, presence, severity, urgency, and onset site.
Preferably, in step S13, constructing and training the pre-trained language model and using it to normalize the attribute entities obtained by the structured text extraction in step S12 and link them to the Unified Medical Language System specifically includes:
first, using a set of Chinese-English bilingually aligned medical terms from the Unified Medical Language System as the corpus;
then, training, based on contrastive learning, a pre-trained language model that links a given medical term to a standard medical concept in the Unified Medical Language System;
finally, using the pre-trained language model to normalize the entities obtained by the structured text extraction in step S12 and link them to the Unified Medical Language System.
Preferably, in step S2, establishing the semantic structural unit similarity discrimination model based on the semantic structural unit lists specifically includes:
Step S21: constructing a medical synonym dataset by collecting, organizing, and translating medical synonym knowledge from the Unified Medical Language System, the dataset comprising Chinese-English term pairs, hierarchically related term pairs, and dissimilar term pairs;
Step S22: further training and fine-tuning the parameters of the pre-trained language model on the constructed medical synonym dataset, thereby constructing a semantic structural unit similarity discrimination model that determines whether two semantic structural units are similar.
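A dataset of this kind could be assembled roughly as follows. This is a sketch under assumptions: the mapping of the three pair types to class labels 2, 1, and 0, the random negative sampling, the function name `build_pair_dataset`, and the concept IDs (placeholders, not real UMLS CUIs) are all hypothetical; the source only states which pair types the dataset contains.

```python
import random
from typing import Dict, List, Tuple

# Assumed mapping from pair type to class label (not specified in the source).
LABELS = {"synonym": 2, "hierarchical": 1, "dissimilar": 0}

def build_pair_dataset(
    synonym_pairs: List[Tuple[str, str]],
    hierarchical_pairs: List[Tuple[str, str]],
    term_to_concept: Dict[str, str],
    n_negative: int,
    seed: int = 0,
) -> List[Tuple[str, str, int]]:
    """Label known pairs, then sample dissimilar pairs from terms of different concepts."""
    rng = random.Random(seed)
    data = [(a, b, LABELS["synonym"]) for a, b in synonym_pairs]
    data += [(a, b, LABELS["hierarchical"]) for a, b in hierarchical_pairs]
    related = {frozenset(p) for p in synonym_pairs + hierarchical_pairs}
    terms = sorted(term_to_concept)
    negatives = set()
    attempts = 0
    while len(negatives) < n_negative and attempts < 100 * n_negative:
        attempts += 1
        a, b = rng.sample(terms, 2)
        pair = frozenset((a, b))
        if term_to_concept[a] != term_to_concept[b] and pair not in related:
            negatives.add(pair)
    # Sort so the output order does not depend on set iteration order.
    data += [(*sorted(p), LABELS["dissimilar"]) for p in sorted(negatives, key=sorted)]
    return data

term_to_concept = {  # placeholder concept IDs for illustration
    "发热": "C_FEVER", "fever": "C_FEVER",
    "腹痛": "C_ABD_PAIN", "abdominal pain": "C_ABD_PAIN",
    "hyperthermia": "C_HYPERTHERMIA", "vomiting": "C_VOMITING",
}
synonym_pairs = [("发热", "fever"), ("腹痛", "abdominal pain")]   # Chinese-English pairs
hierarchical_pairs = [("fever", "hyperthermia")]                  # example pair from the source
dataset = build_pair_dataset(synonym_pairs, hierarchical_pairs, term_to_concept, n_negative=3)
for row in dataset:
    print(row)
```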
Preferably, step S3 (comparing the phenotype concept and the attribute set of the semantic structural units in turn and, based on the similarity discrimination model, determining the similarity category of each complete unit to obtain the key medical information comparison result) specifically includes:
applying different semantic comparison strategies according to the type of semantic structural unit, the types being term-type semantic structural units and logic-type semantic structural units;
for term-type semantic structural units, adopting a pre-trained language model strategy: the semantic structural unit similarity discrimination model compares the phenotype concept and then the attribute set to obtain the comparison result, where term-type similarity falls into three categories: "exactly equal", "partially similar", and "dissimilar";
for logic-type semantic structural units, adopting a knowledge-base-driven strategy: laboratory examinations are first uniformly converted into semantic structural units of the form examination source-analyte-abnormality judgment, whose similarity is then judged directly, where logic-type similarity falls into two categories: "exactly equal" and "dissimilar".
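The logic-type strategy can be sketched with the white blood cell example from the background. The reference-range table stands in for the knowledge base and is hypothetical apart from the adult range of 3.5-9.5 × 10^9/L quoted earlier; the names `LogicUnit`, `from_measurement`, and `compare_logic_units` are also hypothetical.

```python
from typing import NamedTuple

class LogicUnit(NamedTuple):
    source: str       # examination source, e.g. "blood"
    analyte: str      # e.g. "white blood cell count"
    abnormality: str  # "high", "low" or "normal"

# Hypothetical knowledge-base entry (unit: 10^9/L), using the adult range quoted in the background.
REFERENCE_RANGES = {("blood", "white blood cell count"): (3.5, 9.5)}

def from_measurement(source: str, analyte: str, value: float) -> LogicUnit:
    """Convert a quantitative result into source-analyte-abnormality form."""
    low, high = REFERENCE_RANGES[(source, analyte)]
    if value > high:
        abnormality = "high"
    elif value < low:
        abnormality = "low"
    else:
        abnormality = "normal"
    return LogicUnit(source, analyte, abnormality)

def compare_logic_units(a: LogicUnit, b: LogicUnit) -> str:
    """Logic-type units have only two similarity categories."""
    return "exactly equal" if a == b else "dissimilar"

measured = from_measurement("blood", "white blood cell count", 12.9)  # "WBC: 12.9 x 10^9/L"
described = LogicUnit("blood", "white blood cell count", "high")       # "elevated white blood cell count"
print(compare_logic_units(measured, described))  # exactly equal
```

Once both mentions share the same three-part form, the comparison reduces to an equality test, which is why only two categories are needed.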
An embodiment of the invention further provides an accurate semantic comparison system for key medical information in medical texts, which implements the above method and specifically comprises:
a key medical information extraction module, configured to input two different medical texts, extract the medical information they contain, and normalize it to obtain a structured, normalized list of semantic structural units;
a similarity discrimination model establishing module, configured to establish the semantic structural unit similarity discrimination model based on the semantic structural unit lists;
a key medical information comparison module, configured to compare the phenotype concept and the attribute set of the semantic structural units in turn and, based on the semantic structural unit similarity discrimination model, determine the similarity category of each complete unit from these results, to obtain the key medical information comparison result.
An embodiment of the invention further provides an electronic device comprising a processor, a memory, and a bus system, the processor being connected to the memory through the bus system; the memory stores instructions, and the processor executes the stored instructions to implement the accurate semantic comparison method for key medical information in medical texts.
An embodiment of the invention further provides a computer storage medium storing a computer software product, the computer software product comprising instructions that cause a computer device to execute the accurate semantic comparison method for key medical information in medical texts.
The above technical solution gives the invention the following beneficial effects:
(1) The invention uses the Unified Medical Language System to normalize clinical medical elements. Introducing this knowledge greatly improves the accuracy and coverage of semantic comparison: it covers a large number of medical synonyms and also supports cross-lingual comparison of Chinese and English medical texts, which is difficult for both conventional keyword-based methods and vectorization-based methods.
(2) The invention uses a hybrid strategy driven by both data and knowledge. Injecting medical term knowledge into the pre-trained language model effectively improves the model's ability to compare key medical information and addresses the problem that a knowledge base alone cannot cover all real-world terms. This is the first application of such a strategy to the comparison of key medical information in medical texts.
(3) Given two input medical texts, the technical solution directly outputs the comparable key medical information between them. The comparison result is intuitive and highly interpretable, and provides strong guidance to clinical professionals for judging patient similarity, comparing patients' medical records, and similar tasks. Before the invention, clinical professionals could only compare key medical information manually by eye, which is time-consuming and labor-intensive. The invention fills a gap in technology for comparing key medical information in medical texts and can effectively improve the efficiency with which clinical experts compare medical texts.
Drawings
To describe the embodiments of the invention or the prior-art solutions more clearly, and to make the features and advantages of the invention easier to understand, the drawings used in the embodiments are briefly introduced below. They are illustrative and should not be construed as limiting the invention in any way; a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a schematic diagram of the input and output of the medical text comparison technique described in the background;
FIG. 2 is a flowchart of the accurate semantic comparison method for key medical information in medical texts provided in an embodiment;
FIG. 3 is a schematic diagram of recognizing the core medical concepts in a medical text and their associated attribute entities with the BERT-CRF named entity recognition architecture in an embodiment;
FIG. 4 is a flowchart of the medical term normalization method in an embodiment;
FIG. 5 is a schematic diagram of the framework of the comparison strategy for semantic structural units in medical texts in an embodiment;
FIG. 6 is a schematic diagram of the accurate semantic comparison technique for key medical information in medical texts in an embodiment;
FIG. 7 is a block diagram of the accurate semantic comparison system for key medical information in medical texts provided in an embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the protection scope of the invention.
Example 1
Because current technical solutions cannot effectively solve the problem of accurate semantic comparison of key medical information in medical texts, an embodiment of the invention provides, as shown in FIG. 2, an accurate semantic comparison method for key medical information in medical texts, comprising:
Step S1: inputting two different medical texts, extracting the medical information they contain, and normalizing it to obtain a structured, normalized list of semantic structural units for the medical texts;
Step S2: establishing a semantic structural unit similarity discrimination model based on the semantic structural unit lists;
Step S3: comparing the phenotype concept and the attribute set of the semantic structural units in turn, and, based on the semantic structural unit similarity discrimination model, determining the similarity category of each complete semantic structural unit from the phenotype-concept and attribute-set results, to obtain the key medical information comparison result.
In this technical solution, the invention establishes an accurate semantic comparison technique for key medical information in medical texts by integrating the Unified Medical Language System, pre-trained language models, and related technologies. It also uses a hybrid strategy driven by both data and knowledge: injecting medical term knowledge into the pre-trained language model effectively improves the model's ability to compare key medical information and addresses the problem that a knowledge base alone cannot cover all real-world terms. The invention fills a gap in technology for comparing key medical information in medical texts and can effectively improve the efficiency with which clinical experts compare medical texts.
In this embodiment, step S1 first inputs two different medical texts, then extracts the medical information they contain (clinical medical concepts such as diseases, symptoms, and signs) and normalizes it, and finally obtains a structured, normalized list of semantic structural units for the medical texts. It specifically includes the following steps:
Step S11: inputting a medical text in natural-language form.
Step S12: constructing a BERT-CRF named entity recognition architecture from BERT and a CRF, and using it to recognize the core medical concepts in the medical text and the attribute entities associated with them.
Specifically, this structuring step builds a model that recognizes the core medical concepts in clinical medical texts and their associated attribute entities. Core medical concepts include diseases, symptoms, signs, examinations, procedures, and medications; the associated attribute entities include attributes such as presence, severity, urgency, and onset site. To address this task, the invention first constructs a BERT-CRF named entity recognition architecture (a neural network architecture for named entity recognition) from Bidirectional Encoder Representations from Transformers (BERT) and a conditional random field (CRF). The architecture combines BERT's contextual token embeddings with the CRF's ability to capture dependencies between labels, and thereby achieves high performance on named entity recognition.
Further, as shown in FIG. 3, recognizing the core medical concepts in the medical text and their associated attribute entities with the BERT-CRF named entity recognition architecture specifically includes:
first, obtaining an input sequence from the input natural-language medical text;
then, encoding the input sequence with BERT to obtain an embedding for each token;
then, feeding the embedding of each token into the CRF, which predicts labels based on the token's context and the dependencies between labels;
finally, outputting the core medical concepts in the medical text and their associated attribute entities.
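The CRF's use of label dependencies can be seen in Viterbi decoding, which finds the highest-scoring label sequence given per-token emission scores and label-to-label transition scores. In the described architecture the emissions would come from BERT and the transitions would be learned; here both are hand-set, hypothetical numbers chosen to show how one transition constraint changes the decoded sequence. The BIO label set is also an assumption.

```python
import numpy as np

def viterbi_decode(emissions, transitions, start, end):
    """Return the highest-scoring label path under a linear-chain CRF.

    emissions:   (T, K) per-token label scores (e.g. a linear layer on BERT outputs)
    transitions: (K, K) score for moving from label i to label j
    start, end:  (K,) scores for starting / ending with each label
    """
    T, K = emissions.shape
    score = start + emissions[0]                      # best score ending in each label at t = 0
    backpointers = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # candidate[i, j]: best path ending in i at t-1, then i -> j, then emit j at t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    score = score + end
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):                     # follow backpointers from the end
        path.append(int(backpointers[t, path[-1]]))
    return path[::-1]

labels = ["O", "B-SYMPTOM", "I-SYMPTOM"]
emissions = np.array([[2.0, 0.0, 0.0],    # "has"
                      [1.0, 0.9, 0.0],    # "abdominal"
                      [0.0, 0.0, 2.0]])   # "pain"
no_constraints = np.zeros((3, 3))
constrained = np.zeros((3, 3))
constrained[0, 2] = -1e4                  # forbid O -> I-SYMPTOM
start = np.array([0.0, 0.0, -1e4])        # forbid starting with I-SYMPTOM
end = np.zeros(3)

print([labels[i] for i in viterbi_decode(emissions, no_constraints, np.zeros(3), end)])
# ['O', 'O', 'I-SYMPTOM']  (an invalid BIO sequence)
print([labels[i] for i in viterbi_decode(emissions, constrained, start, end)])
# ['O', 'B-SYMPTOM', 'I-SYMPTOM']
```

Without transition scores, each token's locally best label wins and yields an invalid sequence; with them, the decoder prefers the slightly lower-scoring but consistent `B-SYMPTOM` for "abdominal".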
Step S13: constructing and training a pre-trained language model, and using it to normalize the attribute entities obtained by the structured text extraction in step S12 and link them to the Unified Medical Language System.
Specifically, the invention first uses a set of Chinese-English bilingually aligned medical terms from the Unified Medical Language System (UMLS) as the corpus; then trains, based on contrastive learning, a pre-trained language model that links a given medical term to a standard medical concept in UMLS; and finally uses this model to normalize the entities obtained by the structured text extraction in step S12 and link them to UMLS.
The Unified Medical Language System (UMLS) is a large medical terminology system that the U.S. National Library of Medicine has developed continuously for more than 20 years. It covers clinical medicine, basic medicine, pharmacy, biology, healthcare management, and related disciplines, and contains about 2 million medical concepts and more than 5 million medical terms.
Contrastive learning aims to learn an encoder that produces similar encodings for data of the same class and encodings that are as different as possible for data of different classes. Unlike generative learning, contrastive learning does not need to model the fine details of each example; it only needs to learn to distinguish data in an abstract, semantic-level feature space, which simplifies the model and its optimization and yields stronger generalization.
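One widely used contrastive objective is the in-batch InfoNCE loss. The source does not state which loss it uses, so the sketch below is an assumed choice, and the temperature of 0.05 is an illustrative value. Each anchor embedding should score higher with its own positive (for example, the English synonym of a Chinese term) than with the other positives in the batch.

```python
import numpy as np

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch contrastive (InfoNCE) loss.

    Row i of `anchors` and row i of `positives` embed two names of the same concept;
    every other row of `positives` serves as a negative for anchor i.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                     # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # cross-entropy with the diagonal as target

emb = np.eye(3)
aligned = info_nce_loss(emb, emb)                 # each anchor matches its own positive
shuffled = info_nce_loss(emb, emb[[1, 2, 0]])     # positives deliberately mismatched
print(aligned < shuffled)  # True
```

Minimizing this loss pulls synonym pairs together and pushes non-synonyms apart, which is the property the normalization step relies on.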
The invention extracts the synonym knowledge recorded in the UMLS knowledge base, translates it with a translation tool (Baidu Translate) into Chinese-English bilingual synonym pairs, and injects this bilingually mapped synonym knowledge base into a multilingual pre-trained language model using contrastive learning. The resulting model can map Chinese medical terms to the corresponding standard UMLS medical terms; this process is called normalization. The flow of the medical term normalization method is shown in FIG. 4.
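After training, linking can be done by nearest-neighbor search: encode the mention and every candidate concept name, then choose the concept with the highest cosine similarity. This is a minimal sketch assuming that setup; the 2-D vectors stand in for encoder outputs, and the concept IDs are placeholders, not real UMLS CUIs.

```python
from typing import List, Tuple

import numpy as np

def link_to_concepts(mention_vecs: np.ndarray, concept_vecs: np.ndarray,
                     concept_ids: List[str]) -> List[Tuple[str, float]]:
    """Link each mention to the concept with the highest cosine similarity."""
    m = mention_vecs / np.linalg.norm(mention_vecs, axis=1, keepdims=True)
    c = concept_vecs / np.linalg.norm(concept_vecs, axis=1, keepdims=True)
    sims = m @ c.T                       # (num_mentions, num_concepts)
    best = sims.argmax(axis=1)
    return [(concept_ids[j], float(sims[i, j])) for i, j in enumerate(best)]

# Toy embeddings standing in for encoder outputs; IDs are placeholders.
concept_ids = ["CONCEPT_FEVER", "CONCEPT_ABDOMINAL_PAIN"]
concept_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
mentions = np.array([[0.9, 0.1],    # e.g. an embedding of "发热"
                     [0.2, 0.8]])   # e.g. an embedding of "腹痛"
result = link_to_concepts(mentions, concept_vecs, concept_ids)
for cid, sim in result:
    print(cid, round(sim, 3))
```

A production system would likely use an approximate nearest-neighbor index over millions of concept names; the exhaustive matrix product here only keeps the idea visible.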
Step S14: outputting the structured, normalized list of semantic structural units.
Through this structuring and normalization flow, a medical text in natural-language form is converted into a structured, normalized list of clinical medical semantic structural units.
In this embodiment, step S2 establishes the semantic structural unit similarity discrimination model based on the semantic structural unit lists, and specifically includes:
Step S21: constructing a medical synonym dataset by collecting, organizing, and translating medical synonym knowledge from the Unified Medical Language System, the dataset comprising Chinese-English term pairs, hierarchically related term pairs, and dissimilar term pairs;
Step S22: further training and fine-tuning the parameters of the pre-trained language model on the constructed medical synonym dataset, thereby constructing a semantic structural unit similarity discrimination model that determines whether two semantic structural units are similar.
Specifically, after the structured and standardized list of semantic structure units is obtained, the invention explores and develops an optimized comparison strategy for semantic structure units. First, medical synonym knowledge is collected, sorted and translated from the Unified Medical Language System (UMLS) to construct a medical synonym dataset of more than 1 million pairs. This dataset includes Chinese-English term pairs, hierarchically related term pairs (e.g., "fever" and "hyperthermia") and dissimilar term pairs. The parameters of the pre-trained language model BERT are then further pre-trained and fine-tuned on this synonym knowledge, as shown in fig. 5, to construct a semantic structure unit similarity distinguishing model (a classifier) that judges whether two semantic structure units are similar.
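The three kinds of training pairs named in step S21 could be assembled roughly as follows. This is a sketch under stated assumptions: the concept IDs such as `C_FEVER`, the `hierarchy` input format, the label strings and the random negative sampling are illustrative and not taken from the description, which does not say how dissimilar pairs are chosen.

```python
import itertools
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str, str]  # (term_a, term_b, label)


def build_term_pairs(
    synonym_groups: Dict[str, List[str]],
    hierarchy: List[Tuple[str, str]],
    negatives_per_concept: int = 1,
    seed: int = 0,
) -> List[Pair]:
    """Build labeled term pairs from concept-grouped synonyms.

    synonym_groups maps a (hypothetical) concept ID to its Chinese and English
    terms; hierarchy lists (broader concept ID, narrower concept ID) relations.
    """
    rng = random.Random(seed)
    pairs: List[Pair] = []
    # Synonym pairs: any two terms of one concept, including Chinese-English pairs.
    for terms in synonym_groups.values():
        for a, b in itertools.combinations(terms, 2):
            pairs.append((a, b, "synonym"))
    # Hierarchical pairs: the preferred (first) term of each related concept.
    for broader, narrower in hierarchy:
        pairs.append((synonym_groups[broader][0], synonym_groups[narrower][0], "hierarchical"))
    # Dissimilar pairs: sample concepts that are neither identical nor hierarchically related.
    related = set(hierarchy) | {(n, b) for b, n in hierarchy}
    concept_ids = list(synonym_groups)
    for cid in concept_ids:
        candidates = [o for o in concept_ids if o != cid and (cid, o) not in related]
        for other in rng.sample(candidates, min(negatives_per_concept, len(candidates))):
            pairs.append((rng.choice(synonym_groups[cid]), rng.choice(synonym_groups[other]), "dissimilar"))
    return pairs


groups = {
    "C_FEVER": ["发热", "fever"],
    "C_HIGH_FEVER": ["高热", "high fever"],
    "C_ABD_PAIN": ["腹痛", "abdominal pain"],
}
pairs = build_term_pairs(groups, [("C_FEVER", "C_HIGH_FEVER")])
```

The resulting (term_a, term_b, label) triples could then be fed to BERT as sentence pairs for the fine-tuning in step S22; how the three labels map onto the classifier's output categories is left open here.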
In this embodiment, in step S3, two semantic structure units are compared by comparing their phenotype concepts and their attribute sets in turn. Based on the semantic structure unit similarity distinguishing model, the similarity category of the complete semantic structure unit is then determined jointly from the phenotype-concept and attribute-set results, yielding the key medical information comparison result.
Specifically, different strategies are used for the semantic comparison depending on the type of semantic structure unit:
(1) For term-type semantic structure units, the pre-trained language model strategy is used directly. The similarity of term-type semantic structure units falls into three categories: "exactly equal", "partially similar" and "dissimilar". For example, "severe abdominal pain" and "mild abdominal pain" are converted into the semantic structure units [phenotypic concept: pain, severity: severe, onset site: abdomen] and [phenotypic concept: pain, severity: mild, onset site: abdomen]. The classifier (the semantic structure unit similarity distinguishing model) is then used directly to compare the phenotype concepts [pain, pain] and the attribute sets [severe, abdomen] and [mild, abdomen] in turn. Here the phenotype concept similarity category is "exactly equal" and the attribute set category is "partially similar", so the similarity category of the complete semantic structure unit is "partially similar".
(2) For logic-type semantic structure units, a knowledge-base strategy is used directly. The similarity of logic-type semantic structure units falls into two categories: "exactly equal" and "dissimilar". A knowledge-base method uniformly converts laboratory examinations into semantic structure units of the form "examination source-analyte-abnormality judgment", where the "analyte" is the "phenotypic concept". For example, "Routine blood test: white blood cells elevated" and "Routine blood test: WBC 15×10^9/L" are both converted into the semantic structure unit [phenotypic concept: white blood cell count, examination source: blood, abnormality judgment: elevated], and the similarity of the semantic structure units is then judged directly from this result. Here the phenotype concept similarity category is "exactly equal" and the attribute set category is also "exactly equal", so the similarity category of the complete semantic structure unit is "exactly equal".
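The combination rule in item (1) can be sketched as follows, assuming the three categories form an ordered scale and the complete unit takes the weaker of the phenotype-concept and attribute-set judgments. That rule, the set-overlap test for attribute sets and the `exact_match` function (a hypothetical stand-in for the fine-tuned BERT classifier) are assumptions chosen to reproduce the worked example, not the patented model itself.

```python
from enum import IntEnum
from typing import Callable, Dict


class Similarity(IntEnum):
    # Ordered so that min() returns the weaker of two judgments.
    DISSIMILAR = 0
    PARTIALLY_SIMILAR = 1
    EXACTLY_EQUAL = 2


def compare_attribute_values(a: Dict[str, str], b: Dict[str, str]) -> Similarity:
    """Toy attribute-set comparison: identical, overlapping, or disjoint value sets."""
    values_a, values_b = set(a.values()), set(b.values())
    if values_a == values_b:
        return Similarity.EXACTLY_EQUAL
    if values_a & values_b:
        return Similarity.PARTIALLY_SIMILAR
    return Similarity.DISSIMILAR


def compare_term_units(
    concept_a: str, attrs_a: Dict[str, str],
    concept_b: str, attrs_b: Dict[str, str],
    concept_classifier: Callable[[str, str], Similarity],
) -> Similarity:
    """Compare phenotype concepts first, then attribute sets, and combine the results."""
    concept_sim = concept_classifier(concept_a, concept_b)
    if concept_sim == Similarity.DISSIMILAR:
        return Similarity.DISSIMILAR  # different concepts: attributes do not matter
    return min(concept_sim, compare_attribute_values(attrs_a, attrs_b))


def exact_match(a: str, b: str) -> Similarity:
    """Hypothetical stand-in for the fine-tuned BERT classifier."""
    return Similarity.EXACTLY_EQUAL if a == b else Similarity.DISSIMILAR


result = compare_term_units(
    "pain", {"severity": "severe", "onset site": "abdomen"},
    "pain", {"severity": "mild", "onset site": "abdomen"},
    exact_match,
)
```

Under this rule the example in item (1) yields `PARTIALLY_SIMILAR`, because the phenotype concepts match exactly while only one of the two attribute values is shared.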
Finally, the invention provides a general framework for comparing the semantic structure units of unstructured medical documents, as shown in fig. 5, and develops an algorithm that effectively combines the knowledge-base strategy with the pre-trained language model strategy, achieving fine-grained semantic alignment of phenotype information in unstructured medical documents.
To further illustrate the technical solution and advantages of the invention, a specific experiment is described below.
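For the knowledge-base branch of this framework, item (2) converts laboratory findings into "examination source-analyte-abnormality judgment" units before comparing them. The sketch below assumes a tiny hypothetical knowledge base with an illustrative white blood cell reference range of 3.5-9.5 x10^9/L; a real knowledge base would hold laboratory-specific ranges, many more analytes and proper unit handling, and none of the names here come from the description.

```python
import re
from typing import Dict, NamedTuple, Optional, Tuple


class LabUnit(NamedTuple):
    source: str       # examination source, e.g. "blood"
    analyte: str      # phenotypic concept, e.g. "white blood cell count"
    abnormality: str  # "elevated", "decreased" or "normal"


# Hypothetical knowledge base: alias -> (source, standard analyte, reference range in 10^9/L).
KNOWLEDGE_BASE: Dict[str, Tuple[str, str, Tuple[float, float]]] = {
    "wbc": ("blood", "white blood cell count", (3.5, 9.5)),
    "white blood cell": ("blood", "white blood cell count", (3.5, 9.5)),
}
QUALITATIVE = {"elevated": "elevated", "high": "elevated",
               "decreased": "decreased", "low": "decreased", "normal": "normal"}
# Matches "15x10^9/L", "12.9×10^9/L" and the extraction-garbled "15x109/L" (after lowercasing).
NUMERIC = re.compile(r"(\d+(?:\.\d+)?)\s*[x×]\s*10\^?9\s*/\s*l")


def normalize_lab(text: str) -> Optional[LabUnit]:
    """Map a free-text lab finding to a source-analyte-abnormality unit."""
    lowered = text.lower()
    for alias, (source, analyte, (low, high)) in KNOWLEDGE_BASE.items():
        if alias not in lowered:
            continue
        match = NUMERIC.search(lowered)
        if match:  # numeric result: judge against the reference range
            value = float(match.group(1))
            status = "elevated" if value > high else "decreased" if value < low else "normal"
            return LabUnit(source, analyte, status)
        for word, status in QUALITATIVE.items():  # qualitative wording
            if re.search(rf"\b{word}\b", lowered):
                return LabUnit(source, analyte, status)
    return None


def compare_lab_units(a: Optional[LabUnit], b: Optional[LabUnit]) -> str:
    # Logic-type units have only two categories.
    return "exactly equal" if a is not None and a == b else "dissimilar"


unit_a = normalize_lab("Routine blood test: WBC 15x10^9/L")
unit_b = normalize_lab("Routine blood test: white blood cells elevated")
```

Because both findings reduce to the same tuple, the comparison is a plain equality check, which is why this branch needs no learned classifier.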
Given medical text A: "The patient suddenly developed severe vomiting and abdominal pain, with mild fever. Routine blood test: WBC 12.9×10^9/L" and medical text B: "The patient had sudden severe vomiting and mild abdominal pain, with no fever. Routine blood test: white blood cells elevated.", the accurate semantic comparison technique for key medical information in medical texts established by the invention produces the comparison result shown in fig. 6. Although the two phrasings of "sudden severe vomiting", and "WBC 12.9×10^9/L" versus "Routine blood test: white blood cells elevated", are not literally identical, semantic structuring and standardization show them to be identical at the semantic level. Conversely, "mild fever" and "no fever" are quite different at the semantic level even though both contain the keyword "fever". The invention therefore effectively improves the accuracy of comparing key medical information in medical texts.
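The "mild fever" versus "no fever" case can be made concrete with a toy comparison. The sketch assumes a simple negation-cue rule (`NEGATION_CUES` is hypothetical) as a stand-in for the presence attribute that the named entity recognition step extracts; it only shows why a keyword match and a presence-aware comparison disagree.

```python
import re
from typing import Dict

NEGATION_CUES = ("no", "without", "denies")  # hypothetical cue list


def keyword_match(text_a: str, text_b: str, keyword: str) -> bool:
    """Naive comparison: both texts merely mention the keyword."""
    return keyword in text_a.lower() and keyword in text_b.lower()


def presence_unit(text: str, concept: str) -> Dict[str, str]:
    """Toy presence extraction: a negation cue up to one word before the concept."""
    cues = "|".join(NEGATION_CUES)
    pattern = rf"\b(?:{cues})\s+(?:\w+\s+)?{re.escape(concept)}\b"
    negated = re.search(pattern, text.lower())
    return {"phenotypic concept": concept, "presence": "absent" if negated else "present"}


def same_presence(text_a: str, text_b: str, concept: str) -> bool:
    return presence_unit(text_a, concept) == presence_unit(text_b, concept)


text_a = "The patient suddenly developed severe vomiting and abdominal pain, with mild fever."
text_b = "The patient had sudden severe vomiting and mild abdominal pain, with no fever."
```

Here `keyword_match(text_a, text_b, "fever")` is true while `same_presence(text_a, text_b, "fever")` is false, mirroring the distinction drawn in the experiment.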
Example 2
As shown in fig. 7, the invention provides an accurate semantic comparison system for key medical information in medical texts, which implements the accurate semantic comparison method for key medical information in medical texts of the first embodiment and specifically includes:
the key medical information extraction module 10 is used for inputting two different medical texts, extracting medical information contained in the medical texts, and carrying out standardized processing on the medical information to obtain a structured and standardized semantic structure unit list in the medical texts;
the similarity distinguishing model establishing module 20 is configured to establish a similarity distinguishing model of the semantic structural units based on the semantic structural unit list;
the key medical information comparison module 30 is used for sequentially comparing the phenotype concepts and the attribute sets in the semantic structural units, comprehensively judging the similarity types of the complete semantic structural units according to the results of the phenotype concepts and the attribute sets based on the semantic structural unit similarity distinguishing model, and obtaining key medical information comparison results.
The accurate semantic comparison system for key medical information in medical texts implements the accurate semantic comparison method described above, so its specific implementation can follow the embodiments of that method. For example, the key medical information extraction module 10, the similarity distinguishing model establishing module 20 and the key medical information comparison module 30 implement steps S1, S2 and S3 of the method respectively. Their specific implementations are described in the corresponding embodiments and, to avoid redundancy, are not repeated here.
Example 3
An embodiment of the invention further provides an electronic device comprising a processor, a memory and a bus system. The processor is connected to the memory through the bus system, the memory stores instructions, and the processor executes the instructions stored in the memory to implement the accurate semantic comparison method for key medical information in medical texts.
Example 4
An embodiment of the invention further provides a computer storage medium storing a computer software product, the computer software product comprising instructions that cause a computer device to execute the accurate semantic comparison method for key medical information in medical texts.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Obviously, the above examples are given for clarity of illustration only and do not limit the embodiments. Other variations or modifications in different forms will be apparent to those of ordinary skill in the art on the basis of the above description. It is neither necessary nor possible to exhaustively list all embodiments here, and obvious variations or modifications derived therefrom remain within the scope of protection of the invention.
Claims (10)
1. An accurate semantic comparison method for key medical information in medical texts, characterized by comprising the following steps:
Step S1: inputting two different medical texts, extracting the medical information contained in the medical texts, and standardizing the medical information to obtain a structured and standardized list of semantic structure units for the medical texts;
Step S2: establishing a semantic structure unit similarity distinguishing model based on the semantic structure unit list;
Step S3: comparing the phenotype concepts and the attribute sets in the semantic structure units in turn, and, based on the semantic structure unit similarity distinguishing model, comprehensively judging the similarity category of each complete semantic structure unit from the results for the phenotype concepts and attribute sets to obtain a key medical information comparison result.
2. The method for precisely semantic comparison of key medical information in medical texts according to claim 1, wherein in step S1, two different medical texts are input, medical information contained in the medical texts is extracted, and standardized processing is performed on the medical information to obtain a structured and standardized semantic structure unit list in the medical texts, and the method specifically comprises the following steps:
Step S11: inputting medical text in natural language form;
Step S12: constructing a BERT-CRF named entity recognition architecture based on Bidirectional Encoder Representations from Transformers (BERT) and a conditional random field (CRF), and using the BERT-CRF named entity recognition architecture to identify the core medical concepts contained in the medical text and the attribute entities associated with the core medical concepts;
Step S13: constructing and training a pre-trained language model, and using the pre-trained language model to standardize the attribute entities obtained during the structured extraction of the medical text in step S12 and link them to the Unified Medical Language System;
Step S14: and outputting a structured and standardized semantic structure unit list.
3. The method for precisely semantic comparison of key medical information in medical texts according to claim 2, wherein in step S12, the BERT-CRF named entity recognition architecture is used, and the method for recognizing the core medical concepts contained in the medical texts and the attribute entities associated with the core medical concepts specifically comprises the following steps:
firstly, obtaining an input sequence according to an input medical text in a natural language form;
then, encoding the input sequence with BERT (Bidirectional Encoder Representations from Transformers) to obtain an embedded representation of each word;
Then inputting the obtained embedded representation of each word into a conditional random field CRF, and carrying out label prediction by the conditional random field CRF according to the context information of the word and the dependency relationship between labels;
and finally outputting the core medical concept contained in the medical text and the attribute entity associated with the core medical concept.
4. The accurate semantic comparison method for key medical information in medical texts according to claim 3, wherein the core medical concepts include but are not limited to diseases, symptoms, signs, examinations, procedures and medications, and the attribute entities associated with the core medical concepts include but are not limited to presence, severity, urgency and onset.
5. The accurate semantic comparison method for key medical information in medical texts according to claim 2, wherein in step S13, a pre-training language model is constructed and trained, and attribute entities obtained in the process of structured extraction of the medical texts in step S12 are standardized and linked to a unified medical language system by using the pre-training language model, and specifically comprising:
firstly, a Chinese-English bilingual aligned medical term set from a unified medical language system is used as a corpus;
Then training a pre-training language model for associating a certain medical term to a standard medical concept of the unified medical language system based on a contrast learning method;
finally, using the pre-trained language model to standardize the entities obtained during the structured extraction of the medical text in step S12 and link them to the Unified Medical Language System.
6. The method for precisely semantic comparison of key medical information in a medical text according to claim 1, wherein in step S2, the method for establishing a semantic structural unit similarity distinguishing model based on a semantic structural unit list specifically comprises:
Step S21: constructing a medical synonym data set by collecting, sorting and translating medical synonym knowledge from a unified medical language system, wherein the medical synonym data set comprises Chinese and English term pairs, term pairs of hierarchical relations and dissimilar term pairs;
Step S22: based on the constructed medical synonym data set, the parameters of the pre-training language model are re-trained and fine-tuned, so that a semantic structure unit similarity distinguishing model for judging whether two semantic structure units are similar is constructed.
7. The accurate semantic comparison method for key medical information in medical texts according to claim 1, wherein step S3, in which the phenotype concepts and attribute sets in the semantic structure units are compared in turn and, based on the semantic structure unit similarity distinguishing model, the similarity category of the complete semantic structure unit is comprehensively judged from the results for the phenotype concepts and attribute sets to obtain the key medical information comparison result, specifically comprises:
According to different semantic structure unit types, different strategies are adopted to carry out semantic comparison on the semantic structure units, and the semantic structure unit types are divided into term type semantic structure units and logic type semantic structure units;
For term-type semantic structure units, a pre-trained language model strategy is adopted: the semantic structure unit similarity distinguishing model compares the phenotype concepts and the attribute sets in turn to obtain the comparison result, where the similarity of term-type semantic structure units falls into three categories: "exactly equal", "partially similar" and "dissimilar";
For logic-type semantic structure units, a knowledge-base-driven strategy is adopted: laboratory examinations are first uniformly converted into semantic structure units of the form "examination source-analyte-abnormality judgment", and the similarity of the semantic structure units is then judged directly, where the similarity of logic-type semantic structure units falls into two categories: "exactly equal" and "dissimilar".
8. An accurate semantic comparison system for key medical information in medical texts, characterized in that it is used to implement the accurate semantic comparison method for key medical information in medical texts according to any one of claims 1 to 7, and specifically comprises:
The key medical information extraction module is used for inputting two different medical texts, extracting medical information contained in the medical texts, and carrying out standardized processing on the medical information to obtain a structured and standardized semantic structure unit list in the medical texts;
the similarity distinguishing model establishing module is used for establishing a similarity distinguishing model of the semantic structural units based on the semantic structural unit list;
The key medical information comparison module is used for sequentially comparing the phenotype concepts and the attribute sets in the semantic structural units, comprehensively judging the similarity types of the complete semantic structural units according to the results of the phenotype concepts and the attribute sets based on the semantic structural unit similarity distinguishing model, and obtaining key medical information comparison results.
9. An electronic device, characterized in that the electronic device comprises a processor, a memory and a bus system, the processor and the memory are connected through the bus system, the memory is used for storing instructions, and the processor is used for executing the instructions stored in the memory so as to realize the accurate semantic comparison method for key medical information in medical texts according to any one of claims 1 to 7.
10. A computer storage medium, characterized in that the computer storage medium stores a computer software product comprising instructions for causing a computer device to perform the accurate semantic comparison method for key medical information in medical texts according to any one of claims 1 to 7.
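Claim 3 has the CRF predict labels from per-word scores together with the dependencies between labels. The sketch below shows standard Viterbi decoding for a linear-chain CRF under that reading: `emissions` stands in for BERT's per-token label scores, `transitions` for learned label-to-label scores, and the BIO label names are hypothetical. Start and end transition scores and CRF training are omitted for brevity.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][j]: score of label j at token t (e.g. a linear layer over BERT outputs).
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(labels)
    score = list(emissions[0])  # best path score ending in each label at t = 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label i for current label j.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[j] for j in path]


labels = ["O", "B-SYM", "I-SYM"]
# Forbid O -> I-SYM, a transition that is invalid in the BIO scheme.
transitions = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
emissions = [[1.0, 0.5, 0.0], [0.0, 0.0, 3.0]]
print(viterbi_decode(emissions, transitions, labels))  # prints ['B-SYM', 'I-SYM']
```

With all transition scores set to zero the same emissions decode to `['O', 'I-SYM']`; the label-dependency term is what steers the output toward a well-formed entity span.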
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410363130.5A CN117973393B (en) | 2024-03-28 | 2024-03-28 | Accurate semantic comparison method and system for key medical information in medical text |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410363130.5A CN117973393B (en) | 2024-03-28 | 2024-03-28 | Accurate semantic comparison method and system for key medical information in medical text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117973393A (en) | 2024-05-03 |
CN117973393B CN117973393B (en) | 2024-06-07 |
Family ID: 90848061
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410363130.5A Active CN117973393B (en) | 2024-03-28 | 2024-03-28 | Accurate semantic comparison method and system for key medical information in medical text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117973393B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111581960A (en) * | 2020-05-06 | 2020-08-25 | Shanghai Maritime University | Method for obtaining semantic similarity of medical texts |
CN111737997A (en) * | 2020-06-18 | 2020-10-02 | Datagrand Information Technology (Shanghai) Co., Ltd. | Text similarity determination method, text similarity determination equipment and storage medium |
CN112270965A (en) * | 2020-11-16 | 2021-01-26 | Suzhou *** Medical Research Institute | Semantic structural processing method for medical text phenotype information |
CN113343703A (en) * | 2021-08-09 | 2021-09-03 | Beijing Huimei Cloud Technology Co., Ltd. | Medical entity classification extraction method and device, electronic equipment and storage medium |
CN114707516A (en) * | 2022-03-29 | 2022-07-05 | Beijing Institute of Technology | Long text semantic similarity calculation method based on contrast learning |
US20230034401A1 (en) * | 2021-07-16 | 2023-02-02 | Novoic Ltd. | Method of evaluating text similarity for diagnosis or monitoring of a health condition |
CN116341557A (en) * | 2023-05-29 | 2023-06-27 | North China University of Science and Technology | Diabetes medical text named entity recognition method |
CN116702743A (en) * | 2023-07-07 | 2023-09-05 | Ping An Life Insurance Company of China, Ltd. | Text similarity detection method and device, electronic equipment and storage medium |
CN116776884A (en) * | 2023-06-26 | 2023-09-19 | Sun Yat-sen University | Data enhancement method and system for medical named entity recognition |
JP2024027087A (en) * | 2022-08-16 | 2024-02-29 | Zhejiang Lab | Standard medical term management system and method based on general model |
2024-03-28: CN202410363130.5A (CN117973393B), Active
Non-Patent Citations (2)
Title |
---|
LUMING CHEN et al.: "TeaBERT: An Efficient Knowledge Infused Cross-Lingual Language Model for Mapping Chinese Medical Entities to the Unified Medical Language System", BIOMEDICAL AND HEALTH INFORMATICS, vol. 27, no. 12, 31 December 2023 (2023-12-31), pages 6029 - 6037 * |
CHENG Yao et al.: "Study on the coverage of Chinese standard medical terminology sets in practical applications" (中文标准医学术语集对实际应用覆盖度研究), Chinese Journal of Health Informatics and Management (中国卫生信息管理), vol. 17, no. 5, 31 December 2020 (2020-12-31), pages 601 - 605 * |
Also Published As
Publication number | Publication date |
---|---|
CN117973393B (en) | 2024-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112487202B (en) | Chinese medical named entity recognition method and device fusing knowledge map and BERT | |
Bharadiya | A comprehensive survey of deep learning techniques natural language processing | |
CN111078875B (en) | Method for extracting question-answer pairs from semi-structured document based on machine learning | |
CN112597774B (en) | Chinese medical named entity recognition method, system, storage medium and equipment | |
Alobaidi et al. | Automated ontology generation framework powered by linked biomedical ontologies for disease-drug domain | |
CN112241457A (en) | Event detection method for event of affair knowledge graph fused with extension features | |
Tyagi et al. | Demystifying the role of natural language processing (NLP) in smart city applications: background, motivation, recent advances, and future research directions | |
Liu et al. | Concept placement using BERT trained by transforming and summarizing biomedical ontology structure | |
CN113901807A (en) | Clinical medicine entity recognition method and clinical test knowledge mining method | |
CN115048447B (en) | Database natural language interface system based on intelligent semantic completion | |
CN113707339B (en) | Method and system for concept alignment and content inter-translation among multi-source heterogeneous databases | |
CN113657105A (en) | Medical entity extraction method, device, equipment and medium based on vocabulary enhancement | |
CN113742493A (en) | Method and device for constructing pathological knowledge map | |
CN111651569B (en) | Knowledge base question-answering method and system in electric power field | |
Adduru et al. | Towards Dataset Creation And Establishing Baselines for Sentence-level Neural Clinical Paraphrase Generation and Simplification. | |
Abdallah et al. | Exploring the state of the art in legal QA systems | |
CN113111660A (en) | Data processing method, device, equipment and storage medium | |
Xiang et al. | A cross-guidance cross-lingual model on generated parallel corpus for classical Chinese machine reading comprehension | |
CN116719840A (en) | Medical information pushing method based on post-medical-record structured processing | |
CN117973393B (en) | Accurate semantic comparison method and system for key medical information in medical text | |
CN116227594A (en) | Construction method of high-credibility knowledge graph of medical industry facing multi-source data | |
Afzal et al. | Multi-class clinical text annotation and classification using bert-based active learning | |
CN113314236A (en) | Intelligent question-answering system for hypertension | |
Singh et al. | Next-LSTM: a novel LSTM-based image captioning technique | |
Madi et al. | Grammar checking and relation extraction in text: approaches, techniques and open challenges |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |