CN109920536A - Device and storage medium for identifying single disease categories - Google Patents

Publication number
CN109920536A
Authority
CN
China
Prior art keywords
word
medical
participle
word embedding
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910151998.8A
Other languages
Chinese (zh)
Inventor
代晓宇
张亚军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Living Space (shenyang) Data Technology Service Co Ltd
Original Assignee
Living Space (shenyang) Data Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Living Space (shenyang) Data Technology Service Co Ltd
Priority to CN201910151998.8A
Publication of CN109920536A
Legal status: Pending

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

Embodiments of the present application provide a device and a storage medium for identifying a single disease category. A medical record to be identified is acquired and subjected to word segmentation to obtain a plurality of participles, and a word embedding vector is determined for each participle, where the word embedding vector corresponding to each participle reflects the semantics of that participle in the medical field. The word embedding vectors corresponding to the plurality of participles are input into a single disease category identification model, which identifies whether the disease corresponding to the medical record belongs to a single disease category, to obtain an identification result. That is, in the embodiments of the present application, whether the disease corresponding to the medical record to be identified belongs to a single disease category is identified by the single disease category identification model rather than, as in the conventional technology, according to manually summarized rules, which improves the accuracy of identifying single disease categories.

Description

Device for identifying single disease species and storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a device and a storage medium for identifying a single disease category.
Background
In some situations, it is necessary to identify whether a case belongs to a single disease category. For example, when a medical insurance department determines the amount of a medical reimbursement, it needs to identify whether the case belongs to a single disease category.
At present, single disease categories can be identified according to predetermined rules. Specifically, rules for identifying single disease categories are summarized manually in advance from a number of single disease category cases, and identification is then performed according to these rules.
It will be appreciated that, because the rules are summarized manually from individual cases, they may be influenced by factors such as the experience of the person who summarizes them. As a result, the summarized rules may not apply to all single disease categories; that is, rules summarized from single disease category cases may apply only to common single disease categories and not to rare ones.
In other words, the current approach of identifying single disease categories according to manually summarized rules can identify only common single disease categories and cannot identify rare ones. The identification result may therefore be inaccurate; for example, some rare single disease categories may be identified as not being single disease categories.
Disclosure of Invention
The technical problem to be solved by the present application is that the current approach to identifying single disease categories may produce inaccurate identification results. To this end, a device and a storage medium for identifying a single disease category are provided.
In a first aspect, an embodiment of the present application provides an apparatus for identifying a single disease category, where the apparatus includes: a memory, a processor, and an output device;
the memory for storing a computer program;
the processor is used for executing the computer program to realize the functions of a plurality of modules; the plurality of modules includes:
the acquisition module is used for acquiring a medical record to be identified, wherein the medical record to be identified embodies the relevant information of diseases;
the word segmentation module is used for carrying out word segmentation processing on the medical records to be identified to obtain a plurality of words of the medical records to be identified;
the determining module is used for determining a word embedding vector corresponding to each participle in the multiple participles of the medical record to be identified, wherein the word embedding vector corresponding to each participle reflects the semantic meaning of the participle in the medical field;
the recognition module is used for inputting the word embedding vector corresponding to each participle in the multiple participles of the medical record to be recognized into a single disease category recognition model to obtain a recognition result, and the single disease category recognition model is used for recognizing whether the disease corresponding to the medical record to be recognized belongs to a single disease category;
and the output equipment is used for outputting the identification result.
Optionally, the determining module includes a first training unit and a first determining unit;
the first training unit is used for training by using a word vector training model to obtain a plurality of candidate word embedding vectors corresponding to the first participle; the plurality of candidate word embedding vectors of the first participle are used for representing a plurality of semantics of the first participle, and one candidate word embedding vector represents one semantic of the first participle; the plurality of semantics comprises at least semantics in the medical field; the first participle is any one of the plurality of participles of the medical record to be identified;
the first determining unit is used for determining a word embedding vector corresponding to the first word segmentation according to a medical term library and a plurality of candidate word embedding vectors of the first word segmentation; the medical term library is a collection of terms in the medical field.
Optionally, the first determining unit includes a calculating subunit and a determining subunit:
the calculating subunit is configured to calculate distances between word embedding vectors of terms in the medical field in the medical term library and the multiple candidate word embedding vectors of the first participle, so as to obtain multiple distances;
the determining subunit is configured to determine, according to a minimum distance among the plurality of distances, a word embedding vector corresponding to the first participle.
Optionally, the determining subunit is specifically configured to:
determining the word embedding vector corresponding to the first target term with the minimum distance obtained through calculation as a word embedding vector corresponding to the first participle; or determining the candidate word embedding vector of the first word segmentation with the minimum distance obtained by calculation as a word embedding vector corresponding to the first word segmentation.
Optionally, the apparatus further comprises: a training module for training the single disease recognition model, the training module comprising: the device comprises an acquisition unit, a word segmentation unit, a second determination unit and a second training unit;
the acquisition unit is used for acquiring historical medical records; the historical medical records embody the relevant information of the diseases; the historical medical records are provided with labels, and the labels represent whether diseases corresponding to the historical medical records belong to single disease species or not;
the word segmentation unit is used for carrying out word segmentation processing on the historical medical records to obtain a plurality of words of the historical medical records;
the second determining unit is configured to determine a word embedding vector corresponding to each of a plurality of participles of the historical medical records, and the word embedding vector corresponding to each of the plurality of participles of the historical medical records reflects semantics of the participles in the medical field;
and the second training unit is used for training a single disease type recognition model according to a word embedding vector corresponding to each word in a plurality of words of the historical medical records and a label corresponding to the historical medical records.
Optionally, the second determining unit includes: a training subunit and a determining subunit;
the training subunit is configured to train by using a word vector training model to obtain a plurality of candidate word embedding vectors corresponding to the second participle; the plurality of candidate word embedding vectors of the second participle are used for representing a plurality of semantics of the second participle, and one candidate word embedding vector represents one semantic of the second participle; the plurality of semantics comprises at least semantics in the medical field; the second participle is any one of the plurality of participles of the historical medical record;
the determining subunit is configured to determine, according to a medical term library and a plurality of candidate word embedding vectors corresponding to the second participle, a word embedding vector corresponding to the second participle; the medical term library is a collection of terms in the medical field.
Optionally, the information related to the disease includes any one or a combination of the following:
name of surgery, preoperative diagnosis, and postoperative diagnosis.
Optionally, the output device is at least one of the following:
a display, a printer, or a speaker.
In a second aspect, embodiments of the present application provide a computer-readable storage medium for storing a computer program; the computer program is executed and performs the following operations:
acquiring a medical record to be identified, wherein the medical record to be identified embodies relevant information of diseases;
performing word segmentation processing on the medical records to be identified to obtain a plurality of word segments of the medical records to be identified;
determining a word embedding vector corresponding to each participle in a plurality of participles of the medical record to be identified, wherein the word embedding vector corresponding to each participle reflects the semantic meaning of the participle in the medical field;
and inputting the word embedding vector corresponding to each participle in the multiple participles of the medical record to be recognized into a single disease category recognition model to obtain a recognition result, wherein the single disease category recognition model is used for recognizing whether the disease corresponding to the medical record to be recognized belongs to a single disease category.
In a third aspect, an embodiment of the present application provides a computer program product, which when run on a terminal device, causes the terminal device to perform the following operations:
acquiring a medical record to be identified, wherein the medical record to be identified embodies relevant information of diseases;
performing word segmentation processing on the medical records to be identified to obtain a plurality of word segments of the medical records to be identified;
determining a word embedding vector corresponding to each participle in a plurality of participles of the medical record to be identified, wherein the word embedding vector corresponding to each participle reflects the semantic meaning of the participle in the medical field;
and inputting the word embedding vector corresponding to each participle in the multiple participles of the medical record to be recognized into a single disease category recognition model to obtain a recognition result, wherein the single disease category recognition model is used for recognizing whether the disease corresponding to the medical record to be recognized belongs to a single disease category.
Compared with the prior art, the embodiment of the application has the following advantages:
The embodiments of the present application provide a device and a storage medium for identifying a single disease category. A medical record to be identified can be acquired and subjected to word segmentation to obtain a plurality of participles of the medical record to be identified, and the word embedding vectors corresponding to the plurality of participles are determined, where the word embedding vector corresponding to each participle reflects the semantics of that participle in the medical field. The word embedding vectors corresponding to the plurality of participles are then input into a single disease category identification model, which identifies whether the disease corresponding to the medical record belongs to a single disease category, to obtain an identification result. That is, in the embodiments of the present application, whether the disease corresponding to the medical record to be identified belongs to a single disease category is identified by the single disease category identification model rather than according to manually summarized rules as in the conventional technology, which improves the accuracy of identifying single disease categories.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic structural diagram of an apparatus for identifying a single disease species according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a processor according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a determining module according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of another apparatus for identifying single disease species according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a training module according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Various non-limiting embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, the drawing is a schematic structural diagram of an apparatus for identifying a single disease species according to an embodiment of the present application.
The apparatus for identifying a single disease category provided in the embodiment of the present application may include, for example, a memory 101, a processor 102, and an output device 103.
In the embodiment of the present application, the memory 101 may be used to store a computer program. The memory 101 may further store a medical record database, which may include medical record data corresponding to a plurality of individuals.
In the embodiment of the present application, the processor 102 may be configured to execute the computer program stored in the memory 101.
The functions of the processor 102 are described below in conjunction with the specific structure of the processor 102.
Referring to fig. 2, the diagram is a schematic structural diagram of a processor according to an embodiment of the present application.
The processor of the apparatus for identifying a single disease category provided in the embodiment of the present application may include an obtaining module 201, a word segmentation module 202, a determination module 203, and an identification module 204.
In this embodiment of the application, the obtaining module 201 may be configured to obtain a medical record to be identified. The medical record to be identified can embody the relevant information of the disease. The embodiment of the present application does not specifically limit the information related to the disease, and the information related to the disease may include, for example, the diagnosis result of the disease, such as preoperative diagnosis and postoperative diagnosis, and the information related to the disease may also include the treatment information of the disease, such as the name of an operation and the name of a drug.
As mentioned above, the medical records to be identified may be stored in the memory 101, and therefore, in the embodiment of the present application, the medical records to be identified may be retrieved from the memory 101.
In the embodiment of the present application, the word segmentation module 202 is configured to perform word segmentation processing on the medical records to be identified, so as to obtain a plurality of words of the medical records to be identified.
The number of characters included in a word segmentation is not specifically limited in the embodiments of the present application, and as an example, a word segmentation may include only one character or may include a plurality of characters.
It should be noted that the characters mentioned in the embodiment of the present application may be Chinese characters, English characters, or characters of other languages, and the embodiment of the present application is not specifically limited in this respect. In the embodiment of the present application, a Chinese character may be understood as a single Chinese character (hanzi), and an English character may be understood as an English word.
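The word segmentation step can be illustrated with a minimal sketch. The application does not name a segmentation algorithm, so forward maximum matching is used here purely as an assumption, and the `LEXICON` entries and the `segment` function are hypothetical. The sketch also shows that a participle may be a single character or several characters.

```python
def segment(text, lexicon, max_len=5):
    """Forward maximum matching (an assumed algorithm, not the patent's):
    at each position, take the longest lexicon entry that matches."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback participle.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicon: "acute appendicitis", "appendicitis",
# "postoperative", "laparoscope".
LEXICON = {"急性阑尾炎", "阑尾炎", "术后", "腹腔镜"}

tokens = segment("急性阑尾炎术后", LEXICON)
print(tokens)
```

In practice a dedicated segmentation library with a medical dictionary would likely replace this toy matcher; the point is only the shape of the input (a record string) and the output (a list of participles).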
In an embodiment of the present application, the determining module 203 is configured to determine a word embedding vector corresponding to each participle in the multiple participles of the medical record to be identified, where the word embedding vector corresponding to each participle reflects semantics of the participle in the medical field.
In the embodiment of the application, since the input of the single disease category identification model is a word embedding vector, the word embedding vector corresponding to each participle in the multiple participles of the medical record to be identified may be determined before the single disease category identification model is used for identification. For how the word embedding vector corresponding to each of the multiple participles of the medical record is determined, reference may be made to the description below, which is not detailed here.
The recognition module 204 in the embodiment of the application may input the word embedding vector corresponding to each of the multiple participles of the medical record to be recognized into the single disease category recognition model to obtain a recognition result. The single disease category recognition model is used for recognizing whether the disease corresponding to the medical record to be recognized belongs to a single disease category. In other words, the recognition result indicates whether the disease corresponding to the medical record to be recognized belongs to a single disease category.
The embodiment of the present application does not specifically limit the single disease category identification model, and the single disease category identification model may be a neural network model such as a Recurrent Neural Network (RNN) model. The embodiment of the present application does not specifically limit the structure of the single disease category identification model either; the model may include, for example, a plurality of Long Short-Term Memory (LSTM) layers and a fully connected layer.
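To make the example structure concrete, the following is a minimal pure-Python sketch of a forward pass through stacked LSTM layers followed by a fully connected layer. Everything specific here is an assumption for illustration: the class names, the layer sizes, the random untrained weights, and the use of the last hidden state with a sigmoid output. The application does not prescribe any of these details.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LSTMLayer:
    """One LSTM layer with randomly initialised (untrained) weights."""

    GATES = ("input", "forget", "output", "cell")

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                   for _ in range(hidden_size)]
            for gate in self.GATES
        }
        self.biases = {gate: [0.0] * hidden_size for gate in self.GATES}

    def forward(self, sequence):
        h = [0.0] * self.hidden_size  # hidden state
        c = [0.0] * self.hidden_size  # cell state
        outputs = []
        for x in sequence:
            z = list(x) + h  # concatenate current input and previous hidden state
            pre = {
                gate: [a + b for a, b in
                       zip(matvec(self.weights[gate], z), self.biases[gate])]
                for gate in self.GATES
            }
            i_gate = [sigmoid(v) for v in pre["input"]]
            f_gate = [sigmoid(v) for v in pre["forget"]]
            o_gate = [sigmoid(v) for v in pre["output"]]
            cand = [math.tanh(v) for v in pre["cell"]]
            c = [f * cv + i * g for f, cv, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(cv) for o, cv in zip(o_gate, c)]
            outputs.append(h)
        return outputs


class SingleDiseaseClassifier:
    """Stacked LSTM layers plus one fully connected output unit (hypothetical)."""

    def __init__(self, embedding_size, hidden_size, num_layers=2, seed=0):
        rng = random.Random(seed)
        sizes = [embedding_size] + [hidden_size] * num_layers
        self.layers = [LSTMLayer(sizes[k], sizes[k + 1], rng)
                       for k in range(num_layers)]
        self.fc_weights = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.fc_bias = 0.0

    def predict_proba(self, embeddings):
        sequence = embeddings
        for layer in self.layers:
            sequence = layer.forward(sequence)
        last_hidden = sequence[-1]  # summary of the whole record
        logit = sum(w * h for w, h in zip(self.fc_weights, last_hidden))
        return sigmoid(logit + self.fc_bias)


# Toy record: three participles, each with a 4-dimensional embedding.
model = SingleDiseaseClassifier(embedding_size=4, hidden_size=3)
record = [[0.1, 0.2, 0.0, -0.1], [0.3, -0.2, 0.5, 0.0], [0.0, 0.1, 0.1, 0.2]]
probability = model.predict_proba(record)
print(probability)
```

Because the weights are untrained, the printed probability carries no meaning; a real implementation would more likely use a deep learning framework and train the weights on labeled historical medical records.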
In the embodiment of the present application, after the recognition module 204 obtains the recognition result of the medical record to be recognized, the output device 103 may output the recognition result. The output device 103 is not particularly limited in the embodiments of the present application, and the output device 103 may be any one or more of a display, a printer, or a speaker, for example.
Therefore, the device for identifying a single disease category can acquire the medical record to be identified, perform word segmentation on it to obtain a plurality of participles, and determine the word embedding vectors corresponding to the plurality of participles, where the word embedding vector corresponding to each participle reflects the semantics of that participle in the medical field. The word embedding vectors corresponding to the plurality of participles are then input into a single disease category identification model, which identifies whether the disease corresponding to the medical record belongs to a single disease category, to obtain an identification result. That is, in the embodiment of the present application, whether the disease corresponding to the medical record to be identified belongs to a single disease category is identified by the single disease category identification model rather than according to manually summarized rules as in the conventional technology, which improves the accuracy of identifying single disease categories.
The following introduces a specific implementation manner of the determining module 203 for determining the word embedding vector corresponding to each word in the multiple words of the medical record to be identified. Specifically, referring to fig. 3, which is a schematic structural diagram of a determining module 203 provided in the embodiment of the present application, the determining module 203 may include a first training unit 2031 and a first determining unit 2032.
For convenience of description, any one of the multiple participles of the medical record to be identified is called a first participle. In consideration of the fact that in practical applications, a participle, for example, a first participle, may have multiple semantics, in the embodiment of the present application, a word embedding vector that can embody the semantics of the first participle in the medical field is determined from the multiple semantics of the first participle.
In this embodiment of the present application, in order to determine a target word embedding vector that can represent the semantics of the first participle in the medical field, the first training unit 2031 may first obtain, by training with a word vector training model, a plurality of candidate word embedding vectors that represent different semantics of the first participle, where one candidate word embedding vector represents one semantic of the first participle. Then, the first determining unit 2032 may determine, according to the plurality of candidate word embedding vectors of the first participle and the medical term library, a word embedding vector that can embody the semantics of the first participle in the medical field.
The embodiment of the present application does not specifically limit the word vector training model, and the word vector training model may be, for example, a Skip-gram word vector training model using the Word2Vec algorithm.
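As background on the Skip-gram variant of Word2Vec, the sketch below shows only how (center, context) training pairs are formed from a participle sequence using a context window. The function name, the window size, and the example tokens are assumptions. The sketch does not train any vectors, and it does not show how multiple candidate vectors per participle would be obtained, which the application does not detail.

```python
def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs: Skip-gram learns to predict each
    context word from the center word within the given window."""
    pairs = []
    for center_index, center in enumerate(tokens):
        start = max(0, center_index - window)
        stop = min(len(tokens), center_index + window + 1)
        for context_index in range(start, stop):
            if context_index != center_index:
                pairs.append((center, tokens[context_index]))
    return pairs


# Hypothetical participles from one medical record.
pairs = skipgram_pairs(
    ["preoperative", "diagnosis", "acute", "appendicitis"], window=1
)
for center, context in pairs:
    print(center, "->", context)
```

A word vector training model would then adjust the embedding of each center word so that it scores its observed context words higher than unrelated words.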
The medical term library in the embodiment of the present application may include a plurality of terms in the medical field; in other words, the medical term library is a collection of terms in the medical field. There may be a plurality of implementations by which the first determining unit 2032 determines the word embedding vector corresponding to the first participle according to the medical term library and the plurality of candidate word embedding vectors of the first participle. In one implementation, the first determining unit 2032 includes a calculating subunit and a determining subunit: the calculating subunit may calculate the distances between the plurality of candidate word embedding vectors and the word embedding vectors corresponding to terms in the medical field in the medical term library to obtain a plurality of distances, and the determining subunit may determine the word embedding vector corresponding to the first participle according to the minimum distance among the plurality of distances.
In this embodiment of the application, there may be a plurality of implementations by which the determining subunit determines the word embedding vector corresponding to the first participle according to the minimum distance among the plurality of distances. As one example, the determining subunit may determine the candidate word embedding vector that yields the minimum distance as the word embedding vector corresponding to the first participle. In another implementation, it is considered that the participles in a medical record are obtained by segmenting content entered by a doctor or a nurse, and that this content may not use standard terms in the medical field. To make the identification of whether the disease corresponding to the medical record to be recognized belongs to a single disease category more accurate, the determining subunit may instead determine the word embedding vector of the standard term that embodies the semantics of the first participle as the word embedding vector of the first participle. Since the terms in the medical term library can be regarded as standard terms in the medical field, the word embedding vector of the first target term, that is, the term that yields the minimum distance, can be determined as the word embedding vector corresponding to the first participle.
The word embedding vector corresponding to the first word segment is determined as described above, and an example is now given.
Assuming that the first word segment corresponds to 3 candidate word embedding vectors, a1, a2, and a3, respectively, the medical term library includes 4 terms (shown for ease of understanding, in reality, the number of terms in the medical term library is much greater than 4), and the four terms correspond to word embedding vectors, b1, b2, b3, and b4, respectively. Distances among a1-b1, a1-b2, a1-b3, a1-b4, a2-b1, a2-b2, a2-b3, a2-b4, a3-b1, a3-b2, a3-b3 and a3-b4 are respectively calculated to obtain 12 distances, and if the distance between a2-b4 is the minimum distance of the 12 distances, the candidate word embedding vector a2 can embody the semantic meaning of the first participle in the medical field, and then the candidate word embedding vector a2 can be determined as the word embedding vector corresponding to the first participle. In addition, the term corresponding to b4 can be regarded as a standard term capable of reflecting the semantic meaning of the first participle in the medical field, so b4 can also be determined as the word embedding vector corresponding to the first participle.
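The a1 to a3 and b1 to b4 example can be reproduced with a short sketch. The two-dimensional vector values are made up so that the a2-b4 pair happens to be the closest, and Euclidean distance is an assumption, since the application does not name a distance metric.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest_pair(candidates, term_vectors, metric=euclidean):
    """Return (distance, candidate name, term name) for the smallest of all
    candidate-to-term distances."""
    best = None
    for cand_name, cand_vec in candidates.items():
        for term_name, term_vec in term_vectors.items():
            d = metric(cand_vec, term_vec)
            if best is None or d < best[0]:
                best = (d, cand_name, term_name)
    return best


# Hypothetical candidate word embedding vectors of the first participle.
candidates = {"a1": [0.9, 0.1], "a2": [0.2, 0.8], "a3": [0.5, 0.5]}
# Hypothetical word embedding vectors of terms in the medical term library.
terms = {"b1": [1.0, 1.0], "b2": [0.0, 0.0], "b3": [0.9, 0.6], "b4": [0.2, 0.7]}

min_distance, cand, term = closest_pair(candidates, terms)

# Option 1: use the candidate vector that yields the minimum distance.
embedding_from_candidate = candidates[cand]
# Option 2: use the vector of the standard term that yields the minimum distance.
embedding_from_term = terms[term]

print(cand, term, round(min_distance, 3))
```

With these values the 3 x 4 = 12 distances are compared and the a2-b4 pair is selected, so either `a2` or `b4` can serve as the word embedding vector of the first participle, matching the two options described in the text.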
It should be noted that the single disease category identification model mentioned above in the embodiment of the present application is obtained by pre-training, and the apparatus for identifying a single disease category provided in the embodiment of the present application may further include a training module. Referring to FIG. 4, the figure is a schematic structural diagram of another apparatus for identifying a single disease category according to an embodiment of the present application. The apparatus shown in FIG. 4 includes a training module 205 in addition to the obtaining module 201, the word segmentation module 202, the determining module 203, and the recognition module 204. The manner in which the training module 205 trains the single disease category identification model is described below in combination with the structure of the training module shown in FIG. 5.
The training module 205 comprises: an obtaining unit 2051, a word segmentation unit 2052, a second determination unit 2053, and a second training unit 2054.
First, the acquiring unit 2051 acquires historical medical records; the historical medical records embody the relevant information of the diseases; the historical medical records are provided with labels, and the labels represent whether the diseases corresponding to the historical medical records belong to a single disease category. Next, the word segmentation unit 2052 performs word segmentation processing on the historical medical records to obtain a plurality of participles of the historical medical records. Then, the second determining unit 2053 determines a word embedding vector corresponding to each participle in the multiple participles of the historical medical records, where the word embedding vector corresponding to each participle reflects the semantics of that participle in the medical field. Finally, the second training unit 2054 trains the single disease category recognition model according to the word embedding vector corresponding to each of the multiple participles of the historical medical records and the labels corresponding to the historical medical records.
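The data flow of this training procedure can be sketched as follows. To keep the example short and dependency-free, a logistic regression over mean-pooled word embedding vectors is swapped in for the neural network model; this is a stand-in technique, not the model described in the application. The toy embeddings, labels, and function names are all hypothetical.

```python
import math


def mean_pool(vectors):
    """Average the word embedding vectors of one record into one feature vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train_logistic(samples, labels, epochs=200, lr=0.5):
    """Stochastic gradient descent on the logistic loss (stand-in classifier)."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            logit = sum(w * x for w, x in zip(weights, features)) + bias
            error = 1.0 / (1.0 + math.exp(-logit)) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, features):
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if logit >= 0.0 else 0


# Hypothetical historical records: (word embedding vectors, label),
# where label 1 means the disease belongs to a single disease category.
history = [
    ([[1.0, 0.0], [0.8, 0.2]], 1),
    ([[0.9, 0.1], [1.0, 0.1]], 1),
    ([[0.0, 1.0], [0.2, 0.9]], 0),
    ([[0.1, 0.8], [0.0, 1.0]], 0),
]

samples = [mean_pool(vectors) for vectors, _ in history]
labels = [label for _, label in history]
weights, bias = train_logistic(samples, labels)

predictions = [predict(weights, bias, s) for s in samples]
print(predictions)
```

Mean pooling discards word order, which an LSTM-based model would retain; the sketch is meant only to show how labeled historical records become training inputs and targets.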
It should be noted that, in the embodiment of the present application, the historical medical records are used for training the single-disease identification model. Whether the diseases corresponding to the historical medical records belong to a single disease species is known. In other words, the historical medical records have labels that characterize whether the diseases corresponding to the historical medical records belong to a single disease category.
In the embodiment of the present application, the historical medical records represent information related to diseases, the information related to diseases may include, for example, diagnosis results of diseases such as pre-operative diagnosis and post-operative diagnosis, and the information related to diseases may also include treatment information of diseases such as operation names and drug names.
In this embodiment of the application, the manner in which the second determining unit 2053 determines a word embedding vector corresponding to each of the plurality of participles of the historical medical records is similar to the manner in which the determining module 203 determines a word embedding vector corresponding to each of the plurality of participles of the medical record to be identified. The second determining unit 2053 comprises a training subunit and a determining subunit.
Specifically, for convenience of description, any one of the plurality of participles of the historical medical records is called a second participle. To determine a word embedding vector that embodies the semantics of the second participle in the medical field, the training subunit may first train with a word vector training model to obtain a plurality of candidate word embedding vectors that embody different semantics of the second participle, where one candidate word embedding vector represents one semantic of the second participle. Then, the determining subunit may determine, according to the plurality of candidate word embedding vectors of the second participle and the medical term library, a word embedding vector that embodies the semantics of the second participle in the medical field. This determination may be implemented in a plurality of manners. For example, the determining subunit may calculate the distances between the plurality of candidate word embedding vectors of the second participle and the word embedding vectors corresponding to the terms in the medical term library, and then determine the word embedding vector corresponding to the second participle according to the minimum distance among the plurality of distances.
In this embodiment of the application, "determining a word embedding vector corresponding to the second participle according to a minimum distance among the plurality of distances" may be implemented in various ways. As an example, the candidate word embedding vector of the second participle that yields the minimum distance may be determined as the word embedding vector corresponding to the second participle. In another implementation, the word embedding vector of the second target term, that is, the term in the medical term library that yields the minimum distance, may be determined as the word embedding vector corresponding to the second participle. The second target term may be considered as a standard term embodying the semantics of the second participle in the medical domain.
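The two implementations of the minimum-distance selection described above may be sketched as follows. This is an illustrative example only: the application does not name a distance measure, so the Euclidean distance is an assumption, and the names `closest_pair`, `select_embedding`, and `use_term_vector` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_pair(candidates: List[Vector],
                 term_vectors: Dict[str, Vector]) -> Tuple[int, str, float]:
    """Return (candidate index, term, distance) for the minimum-distance pair."""
    best = None
    for i, cand in enumerate(candidates):
        for term, vec in term_vectors.items():
            d = euclidean(cand, vec)
            if best is None or d < best[2]:
                best = (i, term, d)
    if best is None:
        raise ValueError("need at least one candidate and one term")
    return best


def select_embedding(candidates: List[Vector], term_vectors: Dict[str, Vector],
                     use_term_vector: bool = False) -> List[float]:
    """Pick the embedding of a participle from the minimum-distance pair.

    use_term_vector=False returns the closest candidate vector;
    use_term_vector=True returns the vector of the closest library term.
    """
    i, term, _ = closest_pair(candidates, term_vectors)
    return list(term_vectors[term]) if use_term_vector else list(candidates[i])
```

With the candidates `[[0.9, 0.1], [0.0, 1.0]]` and a hypothetical term library `{"term_a": [1.0, 0.0], "term_b": [0.5, 0.5]}`, the minimum distance is between the first candidate and `term_a`, so the first implementation returns the first candidate vector and the second returns the vector of `term_a`.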
For a specific example of determining the word embedding vector corresponding to the second participle, reference may be made to the description above of determining the word embedding vector corresponding to the first participle; details are not repeated here.
The embodiment of the application also provides a computer readable storage medium for storing a computer program; when executed, the computer program performs the following operations:
acquiring a medical record to be identified, wherein the medical record to be identified embodies relevant information of diseases;
performing word segmentation processing on the medical records to be identified to obtain a plurality of word segments of the medical records to be identified;
determining a word embedding vector corresponding to each participle in a plurality of participles of the medical record to be identified, wherein the word embedding vector corresponding to each participle reflects the semantic meaning of the participle in the medical field;
and inputting the word embedding vector corresponding to each participle in the plurality of participles of the medical record to be identified into a single disease identification model to obtain an identification result, wherein the single disease identification model is used for identifying whether the disease corresponding to the medical record to be identified belongs to a single disease.
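The four operations above may be composed as in the following minimal sketch. The function `identify_single_disease` and its three callable parameters are hypothetical names introduced for illustration; the application does not prescribe any particular interface for the segmentation, embedding, or model components.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def identify_single_disease(
    record_text: str,
    segment: Callable[[str], List[str]],
    embed: Callable[[str], Vector],
    model: Callable[[List[Vector]], bool],
) -> bool:
    """Run the acquire -> segment -> embed -> identify flow on one record."""
    participles = segment(record_text)          # word segmentation
    vectors = [embed(p) for p in participles]   # one embedding per participle
    return model(vectors)                       # True if a single disease


# Toy stand-ins (assumptions): whitespace segmentation, a one-dimensional
# "embedding" equal to the token length, and a threshold rule as the model.
toy_segment = str.split
toy_embed = lambda token: [float(len(token))]
toy_model = lambda vectors: sum(v[0] for v in vectors) > 5
```

Passing the components in as callables keeps the flow independent of how each step is implemented, so a trained model can replace `toy_model` without changing the surrounding code.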
Optionally, the first participle is any one of the multiple participles of the medical record to be identified, and the word embedding vector corresponding to the first participle is determined in the following manner:
training by using a word vector training model to obtain a plurality of candidate word embedding vectors corresponding to the first participle; a plurality of candidate word embedding vectors of the first participle are used for representing a plurality of semantics of the first participle, and one candidate word embedding vector represents one semantic of the first participle; the plurality of semantics comprises at least semantics in a medical domain;
determining a word embedding vector corresponding to the first word segmentation according to a medical term library and a plurality of candidate word embedding vectors of the first word segmentation; the medical term library is a collection of terms in the medical field.
Optionally, the determining a word embedding vector corresponding to the first participle from the medical term library and the plurality of candidate word embedding vectors of the first participle includes:
calculating distances between word embedding vectors of terms in the medical field in the medical term library and a plurality of candidate word embedding vectors of the first participle to obtain a plurality of distances;
and determining a word embedding vector corresponding to the first participle according to the minimum distance in the plurality of distances.
Optionally, the determining a word embedding vector corresponding to the first participle according to the minimum distance in the plurality of distances includes:
determining the word embedding vector corresponding to the first target term with the minimum distance obtained through calculation as a word embedding vector corresponding to the first participle; or,
determining the candidate word embedding vector of the first participle with the minimum distance obtained by calculation as the word embedding vector corresponding to the first participle.
Optionally, the single disease identification model is obtained by training in the following way:
acquiring a historical medical record; the historical medical records embody the relevant information of the diseases; the historical medical records are provided with labels, and the labels represent whether diseases corresponding to the historical medical records belong to single disease species or not;
performing word segmentation processing on the historical medical records to obtain a plurality of words of the historical medical records;
determining a word embedding vector corresponding to each participle in the multiple participles of the historical medical records, wherein the word embedding vector corresponding to each participle in the multiple participles of the historical medical records reflects the semantics of the participle in the medical field;
and training a single disease identification model according to a word embedding vector corresponding to each word in a plurality of words in the historical medical records and a label corresponding to the historical medical records.
Optionally, the second participle is any one of the multiple participles of the historical medical records, and a word embedding vector corresponding to the second participle is determined in the following manner:
training by using a word vector training model to obtain a plurality of candidate word embedding vectors corresponding to the second participle; the candidate word embedding vectors of the second participles are used for representing various semantics of the second participles, and one candidate word embedding vector represents one semantic of the second participles; the plurality of semantics comprises at least semantics in a medical domain;
determining a word embedding vector corresponding to the second word segmentation according to the medical term library and a plurality of candidate word embedding vectors corresponding to the second word segmentation; the medical term library is a collection of terms in the medical field.
Optionally, the information related to the disease includes any one or a combination of the following:
name of surgery, preoperative diagnosis, and postoperative diagnosis.
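As one possible illustration, the disease-related information listed above could be held in a small record structure in which any field may be absent. The class name `MedicalRecord`, its field names, and the choice to join the present fields with spaces before word segmentation are assumptions made for this sketch, not features stated in the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MedicalRecord:
    """Disease-related information; any one field or a combination may be set."""
    surgery_name: Optional[str] = None
    preoperative_diagnosis: Optional[str] = None
    postoperative_diagnosis: Optional[str] = None

    def text(self) -> str:
        """Concatenate the fields that are present into one string for segmentation."""
        parts = [self.surgery_name, self.preoperative_diagnosis,
                 self.postoperative_diagnosis]
        present = [p for p in parts if p]
        if not present:
            raise ValueError("a record needs at least one populated field")
        return " ".join(present)
```

For example, `MedicalRecord(surgery_name="appendectomy", postoperative_diagnosis="acute appendicitis").text()` yields a single string containing both populated fields.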
An embodiment of the present application further provides a computer program product, which when running on a terminal device, causes the terminal device to perform the following operations:
acquiring a medical record to be identified, wherein the medical record to be identified embodies relevant information of diseases;
performing word segmentation processing on the medical records to be identified to obtain a plurality of word segments of the medical records to be identified;
determining a word embedding vector corresponding to each participle in a plurality of participles of the medical record to be identified, wherein the word embedding vector corresponding to each participle reflects the semantic meaning of the participle in the medical field;
and inputting the word embedding vector corresponding to each participle in the plurality of participles of the medical record to be identified into a single disease identification model to obtain an identification result, wherein the single disease identification model is used for identifying whether the disease corresponding to the medical record to be identified belongs to a single disease.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the attached claims.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. An apparatus for identifying a single disease species, the apparatus comprising: a memory, a processor, and an output device;
the memory for storing a computer program;
the processor is used for executing the computer program to realize the functions of a plurality of modules; the plurality of modules includes:
the acquisition module is used for acquiring a medical record to be identified, wherein the medical record to be identified embodies information related to a disease;
the word segmentation module is used for carrying out word segmentation processing on the medical records to be identified to obtain a plurality of words of the medical records to be identified;
the determining module is used for determining a word embedding vector corresponding to each participle in the multiple participles of the medical record to be identified, wherein the word embedding vector corresponding to each participle reflects the semantic meaning of the participle in the medical field;
the recognition module is used for inputting the word embedding vector corresponding to each participle in the plurality of participles of the medical record to be identified into a single disease identification model to obtain a recognition result, wherein the single disease identification model is used for recognizing whether the disease corresponding to the medical record to be identified belongs to a single disease;
and the output equipment is used for outputting the identification result.
2. The apparatus of claim 1, wherein the determination module comprises a first training unit and a first determination unit;
the first training unit is used for training by using a word vector training model to obtain a plurality of candidate word embedded vectors corresponding to the first participle; a plurality of candidate word embedding vectors of the first participle are used for representing a plurality of semantics of the first participle, and one candidate word embedding vector represents one semantic of the first participle; the plurality of semantics comprises at least semantics in a medical domain; the first participle is any one of a plurality of participles of the medical record to be identified;
the first determining unit is used for determining a word embedding vector corresponding to the first word segmentation according to a medical term library and a plurality of candidate word embedding vectors of the first word segmentation; the medical term library is a collection of terms in the medical field.
3. The apparatus of claim 2, wherein the first determining unit comprises a computing subunit and a determining subunit:
the calculating subunit is configured to calculate distances between word embedding vectors of terms in the medical field in the medical term library and the multiple candidate word embedding vectors of the first participle, so as to obtain multiple distances;
the determining subunit is configured to determine, according to a minimum distance among the plurality of distances, a word embedding vector corresponding to the first participle.
4. The apparatus according to claim 3, wherein the determining subunit is specifically configured to:
determining the word embedding vector corresponding to the first target term with the minimum distance obtained through calculation as a word embedding vector corresponding to the first participle; or determining the candidate word embedding vector of the first word segmentation with the minimum distance obtained by calculation as a word embedding vector corresponding to the first word segmentation.
5. The apparatus of claim 1, further comprising: a training module for training the single disease recognition model, the training module comprising: an acquisition unit, a word segmentation unit, a second determination unit, and a second training unit;
the acquisition unit is used for acquiring historical medical records; the historical medical records embody the relevant information of the diseases; the historical medical records are provided with labels, and the labels represent whether diseases corresponding to the historical medical records belong to single disease species or not;
the word segmentation unit is used for carrying out word segmentation processing on the historical medical records to obtain a plurality of words of the historical medical records;
the second determining unit is configured to determine a word embedding vector corresponding to each of a plurality of participles of the historical medical records, and the word embedding vector corresponding to each of the plurality of participles of the historical medical records reflects semantics of the participles in the medical field;
and the second training unit is used for training a single disease type recognition model according to a word embedding vector corresponding to each word in a plurality of words of the historical medical records and a label corresponding to the historical medical records.
6. The apparatus according to claim 5, wherein the second determining unit comprises: a training subunit and a determining subunit;
the training subunit is configured to train by using a word vector training model to obtain a plurality of candidate word embedding vectors corresponding to the second participle; the candidate word embedding vectors of the second participle are used for representing a plurality of semantics of the second participle, and one candidate word embedding vector represents one semantic of the second participle; the plurality of semantics comprises at least semantics in a medical domain; the second participle is any one of the plurality of participles of the historical medical records;
the determining subunit is configured to determine, according to a medical term library and a plurality of candidate word embedding vectors corresponding to the second participle, a word embedding vector corresponding to the second participle; the medical term library is a collection of terms in the medical field.
7. The apparatus according to any one of claims 1 to 6, wherein the information related to the disease comprises any one or a combination of the following:
name of surgery, preoperative diagnosis, and postoperative diagnosis.
8. The apparatus of any one of claims 1-6, wherein the output device is at least one of:
a display, a printer, or a speaker.
9. A computer-readable storage medium for storing a computer program; when executed, the computer program performs the following operations:
acquiring a medical record to be identified, wherein the medical record to be identified embodies relevant information of diseases;
performing word segmentation processing on the medical records to be identified to obtain a plurality of word segments of the medical records to be identified;
determining a word embedding vector corresponding to each participle in a plurality of participles of the medical record to be identified, wherein the word embedding vector corresponding to each participle reflects the semantic meaning of the participle in the medical field;
and inputting the word embedding vector corresponding to each participle in the plurality of participles of the medical record to be identified into a single disease identification model to obtain an identification result, wherein the single disease identification model is used for identifying whether the disease corresponding to the medical record to be identified belongs to a single disease.
10. A computer program product, which, when run on a terminal device, causes the terminal device to:
acquiring a medical record to be identified, wherein the medical record to be identified embodies relevant information of diseases;
performing word segmentation processing on the medical records to be identified to obtain a plurality of word segments of the medical records to be identified;
determining a word embedding vector corresponding to each participle in a plurality of participles of the medical record to be identified, wherein the word embedding vector corresponding to each participle reflects the semantic meaning of the participle in the medical field;
and inputting the word embedding vector corresponding to each participle in the plurality of participles of the medical record to be identified into a single disease identification model to obtain an identification result, wherein the single disease identification model is used for identifying whether the disease corresponding to the medical record to be identified belongs to a single disease.
CN201910151998.8A 2019-02-28 2019-02-28 A kind of device and storage medium identifying Single diseases Pending CN109920536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910151998.8A CN109920536A (en) 2019-02-28 2019-02-28 A kind of device and storage medium identifying Single diseases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910151998.8A CN109920536A (en) 2019-02-28 2019-02-28 A kind of device and storage medium identifying Single diseases

Publications (1)

Publication Number Publication Date
CN109920536A true CN109920536A (en) 2019-06-21

Family

ID=66962845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910151998.8A Pending CN109920536A (en) 2019-02-28 2019-02-28 A kind of device and storage medium identifying Single diseases

Country Status (1)

Country Link
CN (1) CN109920536A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160012A (en) * 2019-12-26 2020-05-15 上海金仕达卫宁软件科技有限公司 Medical term recognition method and device and electronic equipment
CN112836500A (en) * 2019-11-25 2021-05-25 泰康保险集团股份有限公司 System, method, apparatus and computer readable medium for identifying a case

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596238A (en) * 2018-04-19 2018-09-28 北京工业大学 A kind of end-to-end biomedicine signals characterizing semantics method based on depth term vector

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张少聪: ""中医医疗辅助诊断***研究与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836500A (en) * 2019-11-25 2021-05-25 泰康保险集团股份有限公司 System, method, apparatus and computer readable medium for identifying a case
CN111160012A (en) * 2019-12-26 2020-05-15 上海金仕达卫宁软件科技有限公司 Medical term recognition method and device and electronic equipment
CN111160012B (en) * 2019-12-26 2024-02-06 上海金仕达卫宁软件科技有限公司 Medical term identification method and device and electronic equipment

Similar Documents

Publication Publication Date Title
US11687719B2 (en) Post-filtering of named entities with machine learning
CN109767787B (en) Emotion recognition method, device and readable storage medium
CN111368094A (en) Entity knowledge map establishing method, attribute information acquiring method, outpatient triage method and device
CN111985241B (en) Medical information query method, device, electronic equipment and medium
CN113420122B (en) Method, device, equipment and storage medium for analyzing text
CN114462616A (en) Machine learning model for preventing sensitive data from being disclosed online
CN112017744A (en) Electronic case automatic generation method, device, equipment and storage medium
CN111145903A (en) Method and device for acquiring vertigo inquiry text, electronic equipment and inquiry system
CN110188357A (en) The industry recognition methods of object and device
CN109815500A (en) Management method, device, computer equipment and the storage medium of unstructured official document
CN111950262A (en) Data processing method, data processing device, computer equipment and storage medium
CN111180025A (en) Method and device for representing medical record text vector and inquiry system
CN109920536A (en) A kind of device and storage medium identifying Single diseases
CN111695054A (en) Text processing method and device, information extraction method and system, and medium
CN109284497B (en) Method and apparatus for identifying medical entities in medical text in natural language
KR102185733B1 (en) Server and method for automatically generating profile
CN111104800A (en) Entity identification method, device, equipment, storage medium and program product
CN109740156B (en) Feedback information processing method and device, electronic equipment and storage medium
CN112699671B (en) Language labeling method, device, computer equipment and storage medium
CN109660621A (en) Content pushing method and service equipment
CN113658690A (en) Intelligent medical guide method and device, storage medium and electronic equipment
CN117350291A (en) Electronic medical record named entity identification method, device, equipment and storage medium
CN112308048A (en) Medical record integrity judging method, device and system based on small amount of labeled data
CN107577760B (en) text classification method and device based on constraint specification
JP6026036B1 (en) DATA ANALYSIS SYSTEM, ITS CONTROL METHOD, PROGRAM, AND RECORDING MEDIUM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190621