CN111160012A - Medical term recognition method and device and electronic equipment - Google Patents

Info

Publication number
CN111160012A
Authority
CN
China
Prior art keywords
word
words
recognized
medical
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911364148.2A
Other languages
Chinese (zh)
Other versions
CN111160012B (en
Inventor
赵蒙海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jinshida Weining Software Technology Co ltd
Original Assignee
Shanghai Jinshida Weining Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jinshida Weining Software Technology Co ltd filed Critical Shanghai Jinshida Weining Software Technology Co ltd
Priority to CN201911364148.2A priority Critical patent/CN111160012B/en
Publication of CN111160012A publication Critical patent/CN111160012A/en
Application granted granted Critical
Publication of CN111160012B publication Critical patent/CN111160012B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a medical term recognition method, a device and electronic equipment. The method comprises: performing word segmentation on a word to be recognized to obtain feature words; performing word recognition on the feature words to obtain character features of the word to be recognized; performing semantic recognition on the feature words to obtain semantic features of the word to be recognized, wherein the semantic features represent the features of the word to be recognized in the corresponding target medical field; and determining a standard word corresponding to the word to be recognized based on the character features and the semantic features. By combining the character features with the semantic features, near-synonyms and common abbreviations in a medical term set can be matched to their corresponding standard words as far as possible, which improves matching accuracy.

Description

Medical term recognition method and device and electronic equipment
Technical Field
The invention relates to the technical field of medical insurance, in particular to a medical term identification method, a medical term identification device and electronic equipment.
Background
The standardization of clinical medical terms is the basis of medical information sharing and is particularly important for achieving nationwide coordination of medical insurance. Medical terms come from many sources and are written in different ways: the same word can have different meanings at different times and on different occasions, and the same concept can be expressed differently in different systems. Moreover, the term coding systems currently used in the domestic medical field are complicated, and multiple versions of a coding system exist for the same class of medical terms. For example, five versions of disease codes exist: the national standard edition published by the Statistical Information Center of the National Health Commission, the clinical edition published by the medical administration bureau, and three local editions published by Beijing, Shanghai and Guangdong. These factors hinder the sharing of medical information and communication and cooperation in the medical field. Therefore, the standardization of medical terminology is of particular importance.
To promote the sharing of medical information and communication and cooperation in the medical field, and to achieve nationwide coordination of medical insurance, the national medical insurance bureau is vigorously promoting the standardization of medical terms and unifying the term coding system in the medical field. The key to unifying the medical term coding system is how to convert each version of the medical terms into nationally unified medical terms accurately and efficiently. Manual conversion by medical professionals ensures accuracy, but it is inefficient and incurs high labor costs.
To address these problems, the current mainstream approach is to match the medical terms of each version automatically using a word similarity method. Although this approach has low labor costs and high efficiency, its matching results are not accurate.
Therefore, a suitable way of recognizing the various versions of medical terms is needed.
Disclosure of Invention
The embodiment of the invention provides a medical term identification method, a medical term identification device and electronic equipment, and aims to solve the problem that in the prior art, automatic matching can be performed on medical terms of various versions, but the matching result is inaccurate.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, a medical term recognition method is provided, the method comprising:
performing word segmentation on the words to be recognized to obtain feature words;
performing word recognition on the feature words to obtain character features of the words to be recognized;
performing semantic recognition on the feature words to obtain semantic features of the words to be recognized, wherein the semantic features are used for representing the features of the words to be recognized in the corresponding target medical field;
and determining a standard word corresponding to the word to be recognized based on the character features and the semantic features.
In a second aspect, there is provided a medical term recognition apparatus, the apparatus comprising:
the word segmentation module is used for segmenting words to be recognized to obtain characteristic words;
the word recognition module is used for carrying out word recognition on the characteristic words to obtain the character characteristics of the words to be recognized;
the semantic recognition module is used for performing semantic recognition on the feature words to obtain semantic features of the words to be recognized, and the semantic features are used for representing the features of the words to be recognized in the corresponding target medical field;
and the first determining module is used for determining the standard word corresponding to the word to be recognized based on the character features and the semantic features.
In a third aspect, an electronic device is provided, comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method according to the first aspect.
In this embodiment of the invention, the word to be recognized is segmented to obtain feature words; word recognition and semantic recognition are performed on the feature words to obtain the character features and the semantic features of the word to be recognized; and the standard word corresponding to the word to be recognized is determined from the character features and the semantic features. By combining the character features with the semantic features, near-synonyms and common abbreviations in a medical term set can be matched to their corresponding standard words as far as possible, which improves the accuracy of the matching result.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow diagram of a medical term identification method of one embodiment of the present invention;
FIG. 2 is a schematic flow diagram of training a word model according to another embodiment of the present invention;
FIG. 3 is a schematic flow diagram of training a semantic model according to yet another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a medical term recognition device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a flow chart of a medical term recognition method according to an embodiment of the present invention. The method shown in FIG. 1 may be performed by a medical term recognition apparatus and, as shown in FIG. 1, includes:
Step S102, performing word segmentation on the word to be recognized to obtain feature words.
It should be understood that the word to be recognized may be any medical term in any version of the medical term set, where each version of the medical term set is the medical term set of one region.
In step S102, for example, if the word to be recognized is infectious rhinitis, segmenting it yields the feature words "infectious" and "rhinitis". Likewise, if the word to be recognized is cerebral apoplexy, segmenting it yields the feature words "cerebral" and "apoplexy".
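The segmentation in step S102 can be illustrated with a minimal sketch. It uses forward maximum matching against a small lexicon, which is an assumption made purely for illustration; this step of the description does not prescribe a particular segmentation algorithm. The function name `segment`, the lexicon contents, and the use of English stand-in strings written without spaces are all hypothetical.

```python
def segment(term, lexicon):
    """Split `term` into feature words by forward maximum matching.

    At each position the longest lexicon entry is taken; if no entry
    matches, a single character is emitted as its own word.
    """
    words = []
    i = 0
    max_len = max((len(w) for w in lexicon), default=1)
    while i < len(term):
        for size in range(min(max_len, len(term) - i), 0, -1):
            piece = term[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical lexicon of English stand-ins for medical feature words.
LEXICON = {"infectious", "rhinitis", "cerebral", "apoplexy"}

print(segment("infectiousrhinitis", LEXICON))
```

A production system would more likely rely on a dedicated segmenter with a domain dictionary; the sketch only shows how a term is broken into feature words.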
Step S104, performing word recognition on the feature words to obtain the character features of the word to be recognized.
It should be understood that the character feature of the word to be recognized is a high-dimensional vector of the word to be recognized.
In step S104, for example, if the feature words are "infectious" and "rhinitis", word recognition is performed on them to obtain the high-dimensional vector of infectious rhinitis. Likewise, if the feature words are "cerebral" and "apoplexy", word recognition is performed on them to obtain the high-dimensional vector of cerebral apoplexy.
Step S106, performing semantic recognition on the feature words to obtain the semantic features of the word to be recognized, wherein the semantic features represent the features of the word to be recognized in the corresponding target medical field.
It should be understood that the semantic feature of the word to be recognized is a low-dimensional vector of the word to be recognized.
In step S106, for example, if the feature words are "infectious" and "rhinitis", semantic recognition is performed on them to obtain the low-dimensional vector of infectious rhinitis. Likewise, if the feature words are "cerebral" and "apoplexy", semantic recognition is performed on them to obtain the low-dimensional vector of cerebral apoplexy.
Step S108, determining a standard word corresponding to the word to be recognized based on the character features and the semantic features.
In step S108, for example, if the word to be recognized is a variant term for infectious rhinitis, the corresponding standard word for infectious rhinitis is determined according to the high-dimensional vector and the low-dimensional vector of the word to be recognized. Likewise, if the word to be recognized is a variant term for stroke, the corresponding standard word for stroke is determined according to its high-dimensional vector and low-dimensional vector; and if the word to be recognized is a form of hyperthyroidism, such as its abbreviation, the corresponding standard word for hyperthyroidism is determined according to its high-dimensional vector and low-dimensional vector.
In this embodiment of the invention, the word to be recognized is segmented to obtain feature words; word recognition and semantic recognition are performed on the feature words to obtain the character features and the semantic features of the word to be recognized; and the standard word corresponding to the word to be recognized is determined from the character features and the semantic features. By combining the character features with the semantic features, near-synonyms and common abbreviations in a medical term set can be matched to their corresponding standard words as far as possible, which improves the accuracy of the matching result.
Optionally, in some embodiments, determining a standard word corresponding to the word to be recognized based on the character feature and the semantic feature in step S108 may include:
matching character features by adopting a pre-learned word model to obtain a first matching result representing the similarity between the word to be recognized and the target word;
matching semantic features by adopting a pre-learned semantic model to obtain a second matching result representing the similarity of the words to be recognized and the target words in the target medical field;
obtaining a comprehensive matching result between the word to be recognized and the target word based on the first matching result and the second matching result;
and under the condition that the comprehensive matching result meets the threshold value, determining the target word as a standard word.
It should be understood that the first matching result is a value of similarity between the word to be recognized and the target word, the second matching result is a value of similarity between the word to be recognized and the target word in the target medical field, and the composite matching result meeting the threshold value may be that the composite matching result is greater than or equal to the threshold value.
Taking infectious rhinitis as the word to be recognized: infectious rhinitis is input into the pre-learned word model, which matches its high-dimensional vector and outputs a first matching result of 90%. Infectious rhinitis is also input into the pre-learned semantic model, which matches its low-dimensional vector and outputs a second matching result of 98%. A comprehensive matching result is then obtained from the first matching result and the second matching result. If the comprehensive matching result is 94% and the threshold is 92%, the comprehensive matching result exceeds the threshold and the target word is determined to be the standard word. Otherwise, if the comprehensive matching result is below the threshold, the target word may not be the standard word, and infectious rhinitis is matched manually by a medical professional.
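The threshold decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `pick_standard_word`, the idea of choosing the best of several candidate target words, and the second candidate's score are assumptions. Returning `None` stands for handing the term over to manual matching.

```python
def pick_standard_word(composite_scores, threshold):
    """Return the best target word whose composite score meets the threshold.

    `composite_scores` maps candidate target words to comprehensive
    matching results in [0, 1]. None is returned when no candidate
    qualifies, meaning the term should be matched manually.
    """
    if not composite_scores:
        return None
    best = max(composite_scores, key=composite_scores.get)
    return best if composite_scores[best] >= threshold else None


# Hypothetical candidates; 0.94 and 0.92 follow the example in the text.
candidates = {"infectious rhinitis": 0.94, "allergic rhinitis": 0.61}
print(pick_standard_word(candidates, threshold=0.92))
```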
Specifically, obtaining a comprehensive matching result between the word to be recognized and the target word based on the first matching result and the second matching result may include:
and obtaining a comprehensive matching result according to the first matching result, the second matching result and a weight value, wherein the weight value is the weight of the first matching result in the comprehensive matching result or the weight of the second matching result in the comprehensive matching result.
It can be understood that the weight value is selected as the optimal parameter through secondary optimization on historically accumulated coding data.
A comprehensive matching result is obtained according to the first matching result, the second matching result, and the weight value, and may be represented by the following formula 1.
$S = a \cdot S_1 + (1 - a) \cdot S_2$    (Formula 1)
where S denotes the comprehensive matching result; a is the weight value, that is, the weight of the first matching result in the comprehensive matching result; S1 denotes the first matching result; S2 denotes the second matching result; and (1 - a) is the weight of the second matching result in the comprehensive matching result.
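As a worked instance of Formula 1, the sketch below combines the two matching results. The value a = 0.5 is an assumption (the description does not state the weight used), chosen because it reproduces the earlier example in which matching results of 90% and 98% give a comprehensive result of 94%. The range check on `a` is likewise an assumption.

```python
def composite_score(s1, s2, a):
    """Formula 1: S = a * S1 + (1 - a) * S2."""
    if not 0.0 <= a <= 1.0:
        # Assumed constraint: the weight is a proportion between 0 and 1.
        raise ValueError("weight a must lie in [0, 1]")
    return a * s1 + (1.0 - a) * s2


s = composite_score(0.90, 0.98, a=0.5)
print(round(s, 2))  # 0.94
```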
Optionally, in some embodiments, the method shown in fig. 1 further includes:
obtaining first training data, the first training data comprising medical terms for a plurality of regions;
based on the first training data, a word model is obtained.
It should be understood that the first training data includes medical terms of a plurality of regions, for example the national "Disease Classification and Codes (Revised)" version 1.3, the national "Disease Classification and Codes (Revised)" 2011 edition, the 2013 Shanghai health agency "Medical Record Front Page" disease classification and codes (ICD-10, updated), the Beijing "Medical Record Front Page Diagnosis Names and Codes Standard" version V6.01, and so on. The word model is obtained by training on the first training data.
Specifically, based on the first training data, obtaining a word model may include:
performing word segmentation on the medical terms of the first training data, and determining character features of the medical terms;
determining a word model based on the character features of the medical terms.
In some embodiments, as shown in FIG. 2, the specific training process for the word model is as follows:
Step S202, acquiring first training data, and performing word segmentation on the medical terms in the first training data to obtain at least one word. For example, jieba may preferably be used for Chinese word segmentation and combined with a domain medical term lexicon to segment Chinese medical terms accurately.
Step S204, performing frequency calculation on the at least one word to determine its term frequency and its logarithmic inverse document frequency, and obtaining a term frequency-inverse document frequency (TF-IDF) matrix from the product of the term frequency and the inverse document frequency of the at least one word.
Step S206, converting the at least one word into a high-dimensional vector, and multiplying the high-dimensional vector by the TF-IDF matrix to obtain a TF-IDF vector.
Step S208, computing the cosine similarity between TF-IDF vectors, cosine similarity being the vector similarity measure selected through comprehensive analysis and experiments. The cosine similarity represents the character-weighted similarity between medical terms in the first training data and can be denoted S1.
Step S210, determining the word model according to the cosine similarity between the TF-IDF vectors.
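Steps S204 to S208 can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: sparse dictionaries stand in for the explicit matrix multiplication of step S206, the smoothed logarithmic inverse document frequency is one common variant rather than the one fixed by the description, and the function names and sample terms are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(segmented_terms):
    """Build one TF-IDF vector (dict: word -> weight) per segmented term."""
    n = len(segmented_terms)
    # Document frequency: number of terms that contain each word.
    df = Counter(word for words in segmented_terms for word in set(words))
    vectors = []
    for words in segmented_terms:
        counts = Counter(words)
        vectors.append({
            # Term frequency times a smoothed log inverse document frequency.
            w: (c / len(words)) * (math.log((1 + n) / (1 + df[w])) + 1.0)
            for w, c in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical segmented terms standing in for the first training data.
terms = [["infectious", "rhinitis"], ["allergic", "rhinitis"], ["cerebral", "apoplexy"]]
vecs = tfidf_vectors(terms)
s1_shared = cosine(vecs[0], vecs[1])    # terms sharing the word "rhinitis"
s1_disjoint = cosine(vecs[0], vecs[2])  # terms with no word in common
```

Terms that share a feature word receive a positive S1, while terms with no words in common receive zero, which is why the character-level score alone cannot link near-synonyms that share no characters.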
Optionally, in some embodiments, the method shown in fig. 1 further includes:
acquiring second training data, the second training data comprising medical data, the medical data comprising medical terms in the medical field;
identifying, based on the medical data, the context content of the medical terms of the medical data, the context content being used for determining the medical field to which the medical terms belong;
determining a semantic model based on the medical terms of the medical data and the corresponding context content.
It can be understood that the medical data includes medical terms in the medical field, mainly covering diseases, operations, drugs, medical items and consumables.
In some embodiments, as shown in FIG. 3, the specific training process of the semantic model is as follows:
Step S302, acquiring second training data and inputting it into the pre-training model.
Step S304, by learning the semantic information of the medical terms in the second training data, the pre-training model embeds the medical terms into a low-dimensional, continuous Hilbert space H to obtain at least one low-dimensional vector, where the semantic information is the medical terms and their corresponding context content.
Step S306, obtaining a semantic model based on the cosine of the angle between the low-dimensional vectors, where the semantic model can be represented by a function f(x, θ), in which x is the one-hot representation of the text input and θ denotes the parameters of the pre-trained language model.
Furthermore, experiments show that the more similar the semantics of two medical terms, the smaller the angle between their vectors in the Hilbert space H, and the closer the cosine of that angle is to 1. Thus, the semantic similarity between medical terms can be measured by the cosine of the angle between their vectors in the Hilbert space H, as shown in Formula 2 below.
$S_2 = \cos\langle f(x_1, \theta), f(x_2, \theta) \rangle$    (Formula 2)
Semantic similarity offers a great advantage when matching words that differ greatly in surface form but are semantically similar. For example, the abbreviation and the full name of hyperthyroidism stand in an abbreviation/full-name relationship, and "stroke" and "cerebral apoplexy" are near-synonyms; because the two words of each pair often appear in similar contexts, the semantic model can learn the similarity between them.
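Formula 2 can be illustrated with a short sketch. The three-dimensional vectors below are invented stand-ins for the outputs f(x, θ) of a pre-trained model and carry no real semantic information; the function name `semantic_similarity` and the dictionary `EMBEDDINGS` are likewise hypothetical.

```python
import math


def semantic_similarity(vec1, vec2):
    """Formula 2: cosine of the angle between two embedding vectors."""
    if len(vec1) != len(vec2):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical low-dimensional embeddings standing in for f(x, theta).
EMBEDDINGS = {
    "stroke": [0.9, 0.1, 0.2],
    "cerebral apoplexy": [0.8, 0.2, 0.2],
    "rhinitis": [0.1, 0.9, 0.3],
}

s2_near = semantic_similarity(EMBEDDINGS["stroke"], EMBEDDINGS["cerebral apoplexy"])
s2_far = semantic_similarity(EMBEDDINGS["stroke"], EMBEDDINGS["rhinitis"])
```

With these toy vectors the near-synonym pair scores higher than the unrelated pair, even though the two terms share no surface characters.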
Fig. 4 is a schematic structural diagram of a medical term recognition apparatus according to an embodiment of the present invention, and as shown in fig. 4, the apparatus 40 includes:
the word segmentation module 41 is configured to perform word segmentation on the word to be recognized to obtain a feature word;
the word recognition module 42 is configured to perform word recognition on the feature words to obtain character features of the words to be recognized;
the semantic recognition module 43 is configured to perform semantic recognition on the feature words to obtain semantic features of the words to be recognized, where the semantic features are used to represent features of the words to be recognized in the corresponding target medical field;
and the first determining module 44 is configured to determine a standard word corresponding to the word to be recognized based on the character features and the semantic features.
In this embodiment of the invention, the word to be recognized is segmented to obtain feature words; word recognition and semantic recognition are performed on the feature words to obtain the character features and the semantic features of the word to be recognized; and the standard word corresponding to the word to be recognized is determined from the character features and the semantic features. By combining the character features with the semantic features, near-synonyms and common abbreviations in a medical term set can be matched to their corresponding standard words as far as possible, which improves the accuracy of the matching result.
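The module structure of the device can be sketched as a small class that wires four callables together. This is only an illustrative arrangement: the class name `MedicalTermRecognizer`, the field names, and the stub functions passed in are hypothetical and do not implement real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class MedicalTermRecognizer:
    """Composes the four modules of the device as plain callables."""
    segment: Callable[[str], List[str]]                        # word segmentation module
    char_features: Callable[[List[str]], Sequence[float]]      # word recognition module
    semantic_features: Callable[[List[str]], Sequence[float]]  # semantic recognition module
    determine: Callable[[Sequence[float], Sequence[float]], Optional[str]]  # first determining module

    def recognize(self, term: str) -> Optional[str]:
        """Run segmentation, both feature extractors, then the decision."""
        words = self.segment(term)
        return self.determine(self.char_features(words),
                              self.semantic_features(words))


# Stub modules, for illustration only; real modules would wrap trained models.
recognizer = MedicalTermRecognizer(
    segment=str.split,
    char_features=lambda words: [float(len(words))],
    semantic_features=lambda words: [float(sum(len(w) for w in words))],
    determine=lambda c, s: "infectious rhinitis" if c and s else None,
)
print(recognizer.recognize("infectious rhinitis"))
```

Keeping each module behind a callable makes it possible to swap the word model or the semantic model without touching the overall flow.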
Optionally, as an embodiment, the first determining module 44 includes:
the character matching sub-module is used for matching the character features by adopting a pre-learned word model to obtain a first matching result representing the similarity between the word to be recognized and the target word;
the semantic matching sub-module is used for matching the semantic features by adopting a pre-learned semantic model to obtain a second matching result representing the similarity between the word to be recognized and the target word in the target medical field;
the obtaining sub-module is used for obtaining a comprehensive matching result between the word to be recognized and the target word based on the first matching result and the second matching result;
and the first determining sub-module is used for determining the target word as a standard word when the comprehensive matching result meets the threshold value.
Optionally, as an embodiment, the obtaining sub-module is configured to:
obtain a comprehensive matching result according to the first matching result, the second matching result and a weight value, wherein the weight value is the weight of the first matching result in the comprehensive matching result or the weight of the second matching result in the comprehensive matching result.
Optionally, as an embodiment, the apparatus further includes:
a first acquisition module for acquiring first training data, the first training data comprising medical terms for a plurality of regions;
and the first obtaining module is used for obtaining a word model based on the first training data.
Optionally, as an embodiment, the first obtaining module includes:
the word segmentation sub-module is used for segmenting medical terms of the first training data and determining character features of the medical terms;
and the second determining sub-module is used for determining the word model based on the character features of the medical terms.
Optionally, as an embodiment, the apparatus further includes:
a second acquisition module for acquiring second training data, the second training data comprising medical data, the medical data comprising medical terms in the medical field;
an identification module for identifying, based on the medical data, the context content of the medical terms of the medical data, the context content being used for determining the medical field to which the medical terms belong;
a second determination module for determining a semantic model based on the medical terms of the medical data and the corresponding context content.
The medical term recognition device provided in the embodiment of the present invention can implement each process implemented in the method embodiment of FIG. 1; to avoid repetition, the details are not described here again.
An electronic device according to an embodiment of the present application is described in detail below with reference to FIG. 5. Referring to FIG. 5, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface and the memory may be interconnected by the internal bus, which may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.
The memory is used for storing a program. Specifically, the program may include program code, and the program code includes computer operating instructions. The memory may include an internal memory and a non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the medical term recognition device at the logical level. The processor is used for executing the program stored in the memory, and is specifically used for performing the following operations:
performing word segmentation on the words to be recognized to obtain feature words;
performing word recognition on the feature words to obtain character features of the words to be recognized;
performing semantic recognition on the feature words to obtain semantic features of the words to be recognized, wherein the semantic features are used for representing the features of the words to be recognized in the corresponding target medical field;
and determining a standard word corresponding to the word to be recognized based on the character features and the semantic features.
In this embodiment of the invention, the word to be recognized is segmented to obtain feature words; word recognition and semantic recognition are performed on the feature words to obtain the character features and the semantic features of the word to be recognized; and the standard word corresponding to the word to be recognized is determined from the character features and the semantic features. By combining the character features with the semantic features, near-synonyms and common abbreviations in a medical term set can be matched to their corresponding standard words as far as possible, which improves the accuracy of the matching result.
The method performed by the medical term recognition apparatus disclosed in the embodiment of FIG. 1 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random-access memory, a flash memory, a read-only memory, a programmable read-only memory, an erasable programmable memory or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
Of course, besides a software implementation, the electronic device of the present application does not exclude other implementations, such as logic devices or a combination of software and hardware. That is, the execution subject of the processing flow is not limited to logic units and may also be hardware or a logic device.
An embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the following steps: segmenting the word to be recognized to obtain feature words; performing word recognition on the feature words to obtain character features of the word to be recognized; performing semantic recognition on the feature words to obtain semantic features of the word to be recognized, wherein the semantic features represent the features of the word to be recognized in the corresponding target medical field; and determining a standard word corresponding to the word to be recognized based on the character features and the semantic features.
In this embodiment of the invention, the word to be recognized is segmented to obtain feature words; word recognition and semantic recognition are then performed separately on the feature words to obtain the character features and the semantic features of the word to be recognized; and the standard word corresponding to the word to be recognized is determined from the character features and the semantic features. Because the match combines character features with semantic features, near-synonyms and common abbreviations in a medical term set can, as far as possible, be matched to their corresponding standard words, which improves the accuracy of the matching result.
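The combination of character features and semantic features can be sketched as a weighted score followed by a threshold test, in the manner set out in claims 2 and 3. The following is an illustrative sketch under stated assumptions, not the patented implementation. It assumes that the two matching results are similarities in [0, 1], that the second result receives the complementary weight `1 - weight`, that "meets a threshold" means greater than or equal to it, and that the best-scoring candidate is chosen when several qualify. The names `combined_score` and `pick_standard_word` and the default values are hypothetical.

```python
from typing import Dict, Optional, Tuple


def combined_score(char_score: float, semantic_score: float, weight: float = 0.5) -> float:
    """Weighted sum of the first (character) and second (semantic) matching results."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    # Assumption: the semantic result receives the complementary weight.
    return weight * char_score + (1.0 - weight) * semantic_score


def pick_standard_word(
    candidates: Dict[str, Tuple[float, float]],
    weight: float = 0.5,
    threshold: float = 0.6,
) -> Optional[str]:
    """Return the best candidate whose combined score meets the threshold, else None."""
    best_word: Optional[str] = None
    best_score = threshold
    for word, (char_score, semantic_score) in candidates.items():
        score = combined_score(char_score, semantic_score, weight)
        if score >= best_score:  # assumption: "meets" means greater than or equal
            best_word, best_score = word, score
    return best_word
```

For example, with purely illustrative scores, an abbreviation might score (0.2, 0.9) against its standard word and (0.6, 0.1) against a similarly spelled but unrelated term. With `weight=0.3` and `threshold=0.6`, the combined scores are 0.69 and 0.25, so only the semantically close standard word is selected.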
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transient media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present invention, and are not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (9)

1. A method of medical term identification, the method comprising:
performing word segmentation on the words to be recognized to obtain feature words;
performing word recognition on the feature words to obtain character features of the words to be recognized;
performing semantic recognition on the feature words to obtain semantic features of the words to be recognized, wherein the semantic features are used for representing the features of the words to be recognized in the corresponding target medical field;
and determining a standard word corresponding to the word to be recognized based on the character features and the semantic features.
2. The method of claim 1, wherein the determining a standard word corresponding to the word to be recognized based on the character features and the semantic features comprises:
matching the character features by adopting a pre-learned word model to obtain a first matching result representing the similarity between the word to be recognized and the target word;
matching the semantic features by adopting a pre-learned semantic model to obtain a second matching result representing the similarity of the words to be recognized and the target words in the target medical field;
obtaining a comprehensive matching result between the word to be recognized and the target word based on the first matching result and the second matching result;
and under the condition that the comprehensive matching result meets a threshold value, determining the target word as the standard word.
3. The method of claim 2, wherein the obtaining a comprehensive matching result between the word to be recognized and the target word based on the first matching result and the second matching result comprises:
and obtaining the comprehensive matching result according to the first matching result, the second matching result and a weight value, wherein the weight value is the weight of the first matching result in the comprehensive matching result or the weight of the second matching result in the comprehensive matching result.
4. The method of claim 2, wherein the method further comprises:
obtaining first training data, the first training data comprising medical terms for a plurality of regions;
and obtaining the word model based on the first training data.
5. The method of claim 4, wherein the deriving the word model based on the first training data comprises:
performing word segmentation on the medical terms of the first training data, and determining character features of the medical terms;
determining the word model based on the character features of the medical terms.
6. The method of claim 2, wherein the method further comprises:
obtaining second training data, the second training data comprising medical material, the medical material comprising medical terms in a medical domain;
identifying, based on the medical material, contextual content of medical terms of the medical material, the contextual content for determining a medical domain of the medical terms of the medical material;
determining the semantic model based on medical terms and corresponding contextual content of the medical material.
7. A medical term recognition apparatus, characterized in that the apparatus comprises:
the word segmentation module is used for segmenting the word to be recognized to obtain feature words;
the word recognition module is used for performing word recognition on the feature words to obtain character features of the word to be recognized;
the semantic recognition module is used for performing semantic recognition on the feature words to obtain semantic features of the words to be recognized, and the semantic features are used for representing the features of the words to be recognized in the corresponding target medical field;
and the first determining module is used for determining the standard word corresponding to the word to be recognized based on the character features and the semantic features.
8. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911364148.2A CN111160012B (en) 2019-12-26 2019-12-26 Medical term identification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911364148.2A CN111160012B (en) 2019-12-26 2019-12-26 Medical term identification method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111160012A true CN111160012A (en) 2020-05-15
CN111160012B CN111160012B (en) 2024-02-06

Family

ID=70556662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911364148.2A Active CN111160012B (en) 2019-12-26 2019-12-26 Medical term identification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111160012B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150227505A1 (en) * 2012-08-27 2015-08-13 Hitachi, Ltd. Word meaning relationship extraction device
CN105894088A (en) * 2016-03-25 2016-08-24 苏州赫博特医疗信息科技有限公司 Medical information extraction system and method based on depth learning and distributed semantic features
US20170351971A1 (en) * 2016-06-07 2017-12-07 International Business Machines Corporation Method and apparatus for informative training repository building in sentiment analysis model learning and customaization
CN106933806A (en) * 2017-03-15 2017-07-07 北京大数医达科技有限公司 The determination method and apparatus of medical synonym
US20190005019A1 (en) * 2017-06-29 2019-01-03 Accenture Global Solutions Limited Contextual pharmacovigilance system
WO2019071661A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Electronic apparatus, medical text entity name identification method, system, and storage medium
CN108804423A (en) * 2018-05-30 2018-11-13 平安医疗健康管理股份有限公司 Medical Text character extraction and automatic matching method and system
CN108829894A (en) * 2018-06-29 2018-11-16 北京百度网讯科技有限公司 Spoken word identification and method for recognizing semantics and its device
CN109256216A (en) * 2018-08-14 2019-01-22 平安医疗健康管理股份有限公司 Medical data processing method, device, computer equipment and storage medium
CN109522551A (en) * 2018-11-09 2019-03-26 天津新开心生活科技有限公司 Entity link method, apparatus, storage medium and electronic equipment
CN109829156A (en) * 2019-01-18 2019-05-31 北京惠每云科技有限公司 Medicine text recognition method and device
CN109920536A (en) * 2019-02-28 2019-06-21 生活空间(沈阳)数据技术服务有限公司 A kind of device and storage medium identifying Single diseases
CN110287337A (en) * 2019-06-19 2019-09-27 上海交通大学 The system and method for medicine synonym is obtained based on deep learning and knowledge mapping

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
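侯庆霖;: "Text classification method based on word vectors and term relation extraction", no. 07 *
冯艳红;于红;孙庚;赵禹锦;: "Domain term recognition method based on word vectors and conditional random fields", no. 11 *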

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652299A (en) * 2020-05-26 2020-09-11 泰康保险集团股份有限公司 Method and equipment for automatically matching service data
CN113822051A (en) * 2020-06-19 2021-12-21 北京彩智科技有限公司 Data processing method and device and electronic equipment
CN113822051B (en) * 2020-06-19 2024-01-30 北京彩智科技有限公司 Data processing method and device and electronic equipment
CN112101021A (en) * 2020-09-03 2020-12-18 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for realizing standard word mapping
CN112257446A (en) * 2020-10-20 2021-01-22 平安科技(深圳)有限公司 Named entity recognition method and device, computer equipment and readable storage medium
CN112541056A (en) * 2020-12-18 2021-03-23 卫宁健康科技集团股份有限公司 Medical term standardization method, device, electronic equipment and storage medium
CN112541056B (en) * 2020-12-18 2024-05-31 卫宁健康科技集团股份有限公司 Medical term standardization method, device, electronic equipment and storage medium
CN113657086B (en) * 2021-08-09 2023-08-15 腾讯科技(深圳)有限公司 Word processing method, device, equipment and storage medium
CN113657086A (en) * 2021-08-09 2021-11-16 腾讯科技(深圳)有限公司 Word processing method, device, equipment and storage medium
CN113793668A (en) * 2021-09-17 2021-12-14 平安科技(深圳)有限公司 Symptom standardization method and device based on artificial intelligence, electronic equipment and medium
CN114613515A (en) * 2022-03-28 2022-06-10 医渡云(北京)技术有限公司 Medical entity relationship extraction method and device, storage medium and electronic equipment
WO2024066903A1 (en) * 2022-09-30 2024-04-04 上海寰通商务科技有限公司 Method and device for recognizing pharmaceutical-industry target object to be recognized, and medium
CN115658891A (en) * 2022-10-18 2023-01-31 支付宝(杭州)信息技术有限公司 Intention identification method and device, storage medium and electronic equipment
CN115658891B (en) * 2022-10-18 2023-07-25 支付宝(杭州)信息技术有限公司 Method and device for identifying intention, storage medium and electronic equipment
CN118035504A (en) * 2024-04-15 2024-05-14 上海森亿医疗科技有限公司 Medical core word knowledge base construction method, device, medium and terminal

Also Published As

Publication number Publication date
CN111160012B (en) 2024-02-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant