CN110880362A - Large-scale medical data knowledge mining and treatment scheme recommending system - Google Patents
- Publication number
- CN110880362A (application CN201911117826.5A)
- Authority
- CN
- China
- Prior art keywords
- treatment
- patient
- information
- module
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/10—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Pathology (AREA)
- Databases & Information Systems (AREA)
- Chemical & Material Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medicinal Chemistry (AREA)
- Medical Treatment And Welfare Office Work (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The invention discloses a large-scale medical data knowledge mining and treatment scheme recommending system, comprising: a data set preprocessing module for acquiring real electronic medical record data and preprocessing electronic medical record data drawn from multiple heterogeneous data sources; a disease severity prediction module for obtaining a disease severity score for each patient during treatment; a treatment effectiveness measurement module for obtaining effective treatment measure information; a patient similarity measurement module for constructing a similarity measurement relation among patients; and a drug treatment scheme recommendation module for recommending the next-stage drug treatment scheme. The invention predicts the severity of each patient's condition with a multitask bidirectional heterogeneous LSTM and defines a treatment effectiveness measure. It then computes fine-grained patient similarity and recommends the next-stage treatment scheme according to the patient's historical treatment record and the effective treatment schemes of other patients with high pathological similarity.
Description
Technical Field
The invention discloses a system for realizing effective drug treatment scheme discovery and recommendation by applying deep learning and knowledge introduction, and belongs to the field of medical data mining.
Background
Electronic health record (EHR) data from millions of patients is now routinely collected and stored at medical institutions. This data consists of heterogeneous elements, typically including demographics, diagnoses, physical examinations, sensor measurements, laboratory test results, prescribed or administered medications, and clinical records. With the rapid development of information technology and the rapid adoption of electronic medical record (EMR) systems, the amount of digital information stored in electronic medical records in China has increased dramatically over the last decade. It is widely believed that this massive data contains a great deal of hidden knowledge, and that the various types of data in an EMR system provide a way to acquire medical knowledge and thus a foundation for improving the quality and efficiency of care. EMR data already plays an important role in many medical applications. In particular, it can support effective medication recommendations for physicians and patients, which increases cure rates and reduces the risk of death in clinical patients while lowering physicians' decision-making costs and avoiding the extra medical costs of ineffective or harmful treatments.
Although there is great interest in using EMR data to improve medical practice, the gains realized from analyzing EMR data are far smaller than what the data could provide. One reason is that a patient's prognosis is influenced by many factors, such as age, sex, disease severity, and the treatment administered. Although EMR data contains comprehensive information about patients, diagnoses, and treatments, there is no unified framework that integrates all relevant factors for advanced data modeling. Furthermore, EMR data is heterogeneous and longitudinal in nature. For example, a treatment record is a series of orders, each of which typically consists of a medication name, a route of administration, a dose, a start time, and an end time. In general, analyzing large-scale complex EMR data, extracting medical knowledge, and supporting decision making in treatment practice is a considerable challenge.
Researchers have made many useful explorations in EMR data mining in order to analyze large-scale complex EMR data. According to surveys of data mining applied to EMR [1] [2], the recurrent neural network (RNN) and its variants (LSTM, GRU), which are designed for sequence modeling, can capture the complex temporal dynamics in longitudinal EMR data and are the first choice for EMR modeling tasks. Chen, W., Wang, S. et al. [3] dynamically predicted the severity of intensive care unit (ICU) patients' conditions using a multitask RNN that integrates laboratory test results for different organs of the patient. However, the method in [3] does not make full use of the heterogeneous data in EMR; for example, the diagnosis results and the description of the disease are also meaningful for the task. Jin B., Yang H. et al. [4] developed a treatment engine based on historical EMR data that provides patients with next-stage prescriptions based on their condition, laboratory results, treatment records, and demographic information. [4] proposed three different LSTM variants, primarily to address data heterogeneity, but did not propose an overall framework for treatment recommendation. Because its next-stage prescription is derived from historical treatment, it does not address the "cold start" problem, that is, treatment recommendation for a patient hospitalized for the first time, whereas the present invention recognizes that the first 24 hours of treatment are critical for critically ill patients. Leilei Sun, Chuanren Liu et al. [5] proposed a data-driven method for automatic treatment regimen development and recommendation that mainly uses the important information in medical orders; its clustering method ultimately yields only a few types of drug treatment combinations, which cannot satisfy the need for more refined treatment recommendations.
Moreover, none of the above schemes takes into account interactions between drugs or the patient's drug allergy history.
References:
[1]. Shickel B, Tighe P J, Bihorac A, et al. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis [J]. IEEE Journal of Biomedical and Health Informatics, 2017: 1-1.
[2]. Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review [J]. Journal of the American Medical Informatics Association, 2018.
[3]. Chen, W., Wang, S., Long, G., Yao, L., Sheng, Q.Z., Li, X.: Dynamic illness severity prediction via multi-task RNNs for intensive care unit. In: ICDM (2018)
[4]. Jin B., Yang H., Sun L., Liu C., Qu Y., Tong J. A treatment engine by predicting next-period prescriptions. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM (2018), pp. 1608-1616.
[5]. Leilei Sun, Chuanren Liu, Chonghui Guo, Hui Xiong, and Yanming Xie. 2016. Data-driven Automatic Treatment Regimen Development and Recommendation. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, 1865–1874.
Disclosure of Invention
The invention aims to provide a large-scale medical data knowledge mining and treatment scheme recommending system that applies a heterogeneous recurrent neural network and knowledge introduction to discover effective treatment segments in large-scale electronic medical records, and that recommends the patient's next-stage drug treatment in an interpretable way based on fine-grained patient similarity, so as to meet the modeling requirements with good effect.
To achieve this purpose, the invention adopts the following technical scheme:
A large-scale medical data knowledge mining and treatment scheme recommending system, comprising a data set preprocessing module, a disease severity prediction module, a treatment effectiveness measurement module, a patient similarity measurement module, and a drug treatment scheme recommendation module, wherein:
the data set preprocessing module acquires real electronic medical record data and preprocesses the electronic medical record data, which consists of a plurality of heterogeneous data sources; the preprocessed electronic medical record comprises five types of patient information, namely demographic information, diagnosis description information, laboratory indicators, medication prescriptions, and discharge results;
the disease severity prediction module trains the bidirectional heterogeneous LSTM network on the demographic information, the diagnosis description information, and the laboratory indicator data obtained by the data set preprocessing module, to obtain a disease severity score for each patient during treatment;
the treatment effectiveness measurement module obtains effective treatment measure information from the disease severity score obtained by the disease severity prediction module, the degree of influence of the current treatment on the next stage, and the discharge result obtained by the data set preprocessing module;
the patient similarity measurement module constructs a similarity measurement relation among patients and calculates the similarity between patients from the information deposited in the bidirectional heterogeneous LSTM network and the patients' static demographic information;
the drug treatment scheme recommendation module introduces the time sequence of the medication prescription information and, according to the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity measurement relation obtained by the patient similarity measurement module, obtains the next-stage drug treatment scheme recommendation; it filters out treatment combinations that contain drugs with adverse interactions or drugs to which the current patient is allergic, and provides the doctor with the top s effective treatments from patients highly similar to the current patient, together with examples of ineffective or negative-effect treatments.
The electronic medical record data comes from the intensive care medical database MIMIC-III v1.4.
The disease severity prediction module is a bidirectional heterogeneous LSTM network whose input comprises the laboratory physiological indicators, the demographic information, and the diagnosis description information, and whose output is the predicted disease severity score.
The bidirectional heterogeneous LSTM at each time step $t$ is defined as follows (forward direction on the left, backward direction on the right):
$$f_t = \sigma(W_f[\mathrm{Chechup}_t, h_{t-1}] + b_f), \qquad f'_t = \sigma(W'_f[\mathrm{Chechup}_t, h'_{t+1}] + b'_f)$$
$$i_t = \sigma(W_i[\mathrm{Chechup}_t, h_{t-1}] + b_i), \qquad i'_t = \sigma(W'_i[\mathrm{Chechup}_t, h'_{t+1}] + b'_i)$$
$$o_t = \sigma(W_o[\mathrm{Chechup}_t, h_{t-1}] + b_o), \qquad o'_t = \sigma(W'_o[\mathrm{Chechup}_t, h'_{t+1}] + b'_o)$$
$$d_t = \sigma(W_d C_{t-1} + b_d), \qquad d'_t = \sigma(W'_d C'_{t+1} + b'_d)$$
$$h_t = o_t * \tanh(C_t), \qquad h'_t = o'_t * \tanh(C'_t)$$
$$D = \mathrm{relu}(W_{dense}[h_t, h'_t] + W_{static} P_{Static} + b_{dense})$$
where $\sigma$ is the sigmoid function $\sigma(x) = 1/(1+e^{-x})$, $\tanh$ is the hyperbolic tangent function $\tanh(x) = (e^{x}-e^{-x})/(e^{x}+e^{-x})$, and $\mathrm{relu}$ is the ReLU function $f(x) = \max(0, x)$. Each $W$ is a weight matrix and each $b$ is a bias term; $W$ and $b$ are the parameters to be learned by the network. $\mathrm{Diagnosines}_t$ and $\mathrm{Chechup}_t$ are the diagnosis description information and the laboratory indicators at time $t$, respectively. $i$, $f$, $o$, $C$, and $h$ denote the input gate, forget gate, output gate, memory cell, and hidden state, respectively. The decomposition gate $d_t$ is constructed from the cell state $C_{t-1}$ and controls the amount of added information. The forget gate $f_t$ controls how much of the additional candidate value and of the previous cell state $C_{t-1}$ is added to the current cell state $C_t$, and the input gate $i_t$ controls the degree to which the new state information is added to the current cell state $C_t$.
The forward LSTM and the backward LSTM have the same structure; symbols without a prime denote the forward LSTM network, and symbols with a prime denote the backward LSTM network. A fully connected layer $D$ combines the static demographic information $P_{Static}$ with the outputs of the forward and backward LSTMs: $W_{dense}$ is the weight of the dense connection to the forward and backward LSTM outputs, $W_{static}$ is the weight for the static information, and $b_{dense}$ is the bias term of this layer. The result is then fed into a sigmoid output layer (denoted out), which yields the predicted disease severity score. The model is trained with the SOFA score as the true value $y$ of the cross-entropy loss, which is minimized, and finally a disease severity score curve is obtained for each patient. The structure and parameters of the trained bidirectional heterogeneous LSTM network are then frozen, and when a new patient is admitted, the network is used to obtain the patient's real-time disease severity score.
The treatment effectiveness measurement module obtains effective treatment measure information from three aspects: the disease severity score, the degree of influence $K$ of the current treatment on the next stage, and the discharge result $R = \{0001, 0010, 0100, 1000\}$.
The degree of influence $K$ of the current treatment on the next stage is represented by the slope of the disease severity score curve and is computed over a time window, where $T$ is the length of the time window and $y_T$ is the disease severity score within the $T$-th time window.
The effective treatment measure information is $M = W[y_T; K; R]$.
The patient similarity measurement relation constructed by the patient similarity measurement module is as follows:
the $i$-th patient is represented by a vector $P_i$ that combines two components taken from the forward LSTM network, two components taken from the backward LSTM network, and the static demographic information;
the similarity between patients is defined as the 2-norm of the difference between two patient representations:
$$\mathrm{Similar}\langle P_i, P_j \rangle = \lVert P_i - P_j \rVert_2$$
where $j$ denotes the $j$-th patient.
The drug treatment scheme recommendation module takes the effective treatment measure information obtained by the treatment effectiveness measurement module and the inter-patient similarity obtained by the patient similarity measurement module, introduces the time sequence of the medication prescription information, and constructs a tensor table indexed by similarity measure, treatment effectiveness measure, prescription, and time.
Compared with the prior art, the invention adopting the technical scheme has the following beneficial effects:
(1) The system of the invention discovers effective treatment patterns from large-scale real electronic medical records. These patterns are fine-grained and short-term, unlike existing treatment recommendation engines, which cover only coarse-grained treatment regimens. Doctors can thus be guided toward more refined treatment.
(2) The system recommends drug treatment in a personalized way according to the patient's physiological condition, treatment history, medication history, and so on, and updates the recommendation dynamically.
(3) The invention introduces drug interaction knowledge and the patient's allergy history, which reduces drug interactions and allergic reactions and can increase the reliability and effectiveness of treatment. Through fine-grained patient similarity comparison, the complete treatment processes of cases with positive and negative treatment effects are provided to doctors, which enhances the interpretability and reliability of the drug recommendation. Doctors can judge the expected effectiveness of the recommended treatment scheme from the treatment cases of similar patients and the different effects produced by different schemes, and decide whether to adopt the recommended scheme.
Detailed Description
The present invention is explained further below.
Fig. 1 shows the large-scale medical data knowledge mining and treatment scheme recommending system according to the present invention, which includes a data set preprocessing module, a disease severity prediction module, a treatment effectiveness measurement module, a patient similarity measurement module, and a drug treatment scheme recommendation module, wherein:
the data set preprocessing module acquires real electronic medical record data and preprocesses the electronic medical record data, which consists of a plurality of heterogeneous data sources; the preprocessed electronic medical record comprises five types of patient information, namely demographic information, diagnosis description information, laboratory indicators, medication prescriptions, and discharge results;
the disease severity prediction module trains the bidirectional heterogeneous LSTM network on the demographic information, the diagnosis description information, and the laboratory indicator data obtained by the data set preprocessing module, to obtain a disease severity score for each patient during treatment;
the treatment effectiveness measurement module obtains effective treatment measure information from the disease severity score obtained by the disease severity prediction module, the degree of influence of the current treatment on the next stage, and the discharge result obtained by the data set preprocessing module;
the patient similarity measurement module constructs a similarity measurement relation among patients and calculates the similarity between patients from the information deposited in the bidirectional heterogeneous LSTM network and the patients' static demographic information;
the drug treatment scheme recommendation module introduces the time sequence of the medication prescription information and, according to the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity measurement relation obtained by the patient similarity measurement module, obtains the next-stage drug treatment scheme recommendation; it filters out treatment combinations that contain drugs with adverse interactions or drugs to which the current patient is allergic, and provides the doctor with the top s effective treatments from patients highly similar to the current patient, together with examples of ineffective or negative-effect treatments.
The realization process of the large-scale medical data knowledge mining and treatment scheme recommending system is as follows:
Step 1: The data set preprocessing module preprocesses the electronic medical record (EMR) data. EMR databases are typically composed of a variety of heterogeneous data sources, and the data retrieved from them is diverse, incomplete, and redundant, which can greatly affect the final mining results. Therefore, the EMR data must be preprocessed to make it accurate, complete, and consistent. First, the EMR data is improved by filling in missing values, smoothing out noise, and correcting data inconsistencies. Second, EMR data may come from multiple EMR systems, and different data sources naturally lead to heterogeneity, which mainly manifests as inconsistent data attributes, such as attribute names and measurement units. For example, the specific gravity of urine may be recorded as SG or as specific gravity, and the unit of measurement of triglyceride may be mmol/L in some records and mg/dl in others. Third, redundant data is processed; redundancy mainly takes the form of repeated records of data attributes or inconsistent expressions of the same attribute.
The preprocessed electronic medical records typically contain five categories of patient information: demographic information, diagnosis description information, laboratory indicators (physical examination results), medication prescriptions (medical orders), and discharge results (mortality).
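The preprocessing described in Step 1 can be illustrated with a minimal sketch. Everything named here is a hypothetical choice made for illustration and does not come from the patent: the alias table, the record layout `(time, name, value, unit)`, the choice of mmol/L as the canonical triglyceride unit, and mean imputation as the way of filling missing values. The conversion factor of about 88.57 mg/dL per mmol/L for triglyceride is the commonly used value.

```python
# Hypothetical alias table: maps raw attribute names to one canonical name.
ATTRIBUTE_ALIASES = {
    "sg": "urine_specific_gravity",
    "specific gravity": "urine_specific_gravity",
    "tg": "triglyceride",
    "triglyceride": "triglyceride",
}

# Multipliers that convert a (attribute, unit) pair to the canonical unit.
# Assumption: mmol/L is canonical for triglyceride; 1 mmol/L is about 88.57 mg/dL.
UNIT_FACTORS = {
    ("triglyceride", "mmol/l"): 1.0,
    ("triglyceride", "mg/dl"): 1.0 / 88.57,
}


def preprocess(records):
    """Unify names and units, drop repeated records, fill missing values.

    records: iterable of (time, name, value, unit); value may be None.
    Returns a list of (time, canonical_name, value) tuples.
    """
    cleaned, seen = [], set()
    for time, name, value, unit in records:
        canonical = ATTRIBUTE_ALIASES.get(name.strip().lower())
        if canonical is None:
            continue  # unknown attribute: simply skipped in this sketch
        if value is not None:
            factor = UNIT_FACTORS.get((canonical, (unit or "").strip().lower()), 1.0)
            value = value * factor
        # Redundancy: the same measurement recorded twice, possibly in other units.
        key = (time, canonical, None if value is None else round(value, 6))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append([time, canonical, value])

    # Missing values: replaced by the mean of the observed values of that attribute.
    totals = {}
    for _, canonical, value in cleaned:
        if value is not None:
            total, count = totals.get(canonical, (0.0, 0))
            totals[canonical] = (total + value, count + 1)
    for row in cleaned:
        if row[2] is None and row[1] in totals:
            total, count = totals[row[1]]
            row[2] = total / count
    return [tuple(row) for row in cleaned]
```

A real pipeline would likely use clinically motivated imputation and a curated vocabulary rather than these toy tables; the sketch only shows where name unification, unit conversion, deduplication, and gap filling fit.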
Demographic information includes the patient's age, gender, address of residence, educational background, religion, race, marital status, weight, height, and other information. This information is important in clinical decisions; for example, it influences the design of the overall treatment regimen and the dosage of drugs. Demographic information can be considered static during a patient's stay and is denoted $P_{Static}$:
$$P_{Static} = \{P_{Age}, P_{Gender}, P_{Site}, P_{Education}, \ldots\}$$
The diagnosis description information is given by the doctor and includes the type of disease, a qualitative description of its severity, complications, and so on. A patient may suffer from several diseases, and during treatment a disease may gradually heal or may worsen, with new diseases or additional complications appearing. Diagnosis is therefore viewed as a dynamic process, and $\mathrm{Diagnosines}_t$ denotes the diagnosis description information at time $t$.
Laboratory physiological indicators (physical examination results): to assess the efficacy of treatment accurately, multiple examinations are performed during the patient's hospitalization. The invention uses $\mathrm{Chechup}_t$ to denote the physical examination result at time $t$, whose $j$-th component is the value of the $j$-th laboratory physiological indicator at time $t$.
The medication prescription (medical order) includes the drug name, the route of administration, the daily dosage, the start time, and the end time. The invention uses $\mathrm{Treatment}_t$ to denote a medication prescription, which is a combination of a series of drugs. Each drug in the prescription is described by its name; its route of administration, such as intravenous (IV), intramuscular (IM), or oral (per os, PO); its dose; the number of administrations per day; the time of administration; and the day $d$ of treatment. $dr$ is the total number of different drugs in the prescription. In the invention, a time window of a specific size is regarded as one complete treatment, and the medication record is rewritten per time window accordingly.
Discharge outcome (mortality): when the patient is discharged, the doctor gives a discharge evaluation according to the patient's actual condition. The outcome can be cure, improvement, ineffective, or death, and the four outcomes are represented by the one-hot code $R = \{0001, 0010, 0100, 1000\}$.
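As an illustration of the data described above, the following sketch models one medication order and the one-hot discharge code. The class name, field names, and the assignment of the four codes to the four outcomes in listing order are assumptions made for this example; the patent gives the code set but not an explicit mapping.

```python
from dataclasses import dataclass

# Assumed order: the outcomes are paired with the codes in the order both are listed.
DISCHARGE_OUTCOMES = ("cure", "improvement", "ineffective", "death")


def one_hot_outcome(outcome: str) -> str:
    """Return the 4-bit one-hot code for a discharge outcome, e.g. 'cure' -> '0001'."""
    index = DISCHARGE_OUTCOMES.index(outcome)  # raises ValueError for unknown outcomes
    bits = ["0"] * len(DISCHARGE_OUTCOMES)
    bits[len(bits) - 1 - index] = "1"
    return "".join(bits)


@dataclass(frozen=True)
class MedicationOrder:
    """One drug within a prescription (hypothetical field names)."""
    drug_name: str
    route: str          # e.g. "IV", "IM", "PO"
    dose: float
    times_per_day: int
    start_time: str
    end_time: str


order = MedicationOrder("drug_a", "IV", 5.0, 2, "day 1 08:00", "day 3 08:00")
```

Freezing the dataclass makes orders hashable, which is convenient if prescriptions are later compared or stored in sets.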
Step 2: The disease severity prediction module predicts the criticality of ICU patients at short intervals by building a bidirectional heterogeneous LSTM network W1.
In the ICU, the SOFA scoring system reflects the severity of a patient's condition. SOFA assessments are performed at long intervals, such as every 24 hours, which makes them slow to reflect changes in critically ill patients; predicting the disease severity score at shorter intervals is an effective way to monitor ICU patients rapidly.
The input of the bidirectional heterogeneous LSTM network W1 comprises the laboratory physiological indicators (physical examination results), the demographic information, and the diagnosis description information; the heterogeneous LSTM can take these three types of heterogeneous data as input. Its output is the predicted disease severity score.
The LSTM at each time step $t$ comprises $i$, $f$, $o$, $C$, and $h$, which are the input gate, forget gate, output gate, memory cell, and hidden state, respectively. The forget gate controls how much memory is forgotten, the input gate controls the update of each cell, and the output gate controls the exposure of the cell state. If all heterogeneous sequences, such as the laboratory physiological indicators (physical examination results), demographic information, and diagnosis description information, were used as inputs with a sequential hidden state built for each time series, the inherent dynamics of each time series could be confounded by the fully connected hidden neurons of the other time series. To allow flexible interaction among multi-faceted time series, the invention retains only the memory related to the physiological indicators. Under the control of the previous memory, the additional diagnosis description time series affects the cell state only through a dedicated structure called the decomposition gate. The decomposition gate $d_t$ is constructed from the cell state $C_{t-1}$ and controls the amount of added information; through it, the additional candidate value is added to the cell state $C_t$.
The forward LSTM and the backward LSTM have the same structure; symbols without a prime denote the forward LSTM network, and symbols with a prime denote the backward LSTM network. A fully connected layer $D$ combines the static demographic information $P_{Static}$ with the outputs of the forward and backward LSTMs: $W_{dense}$ is the weight of the dense connection to the forward and backward LSTM outputs, $W_{static}$ is the weight for the static information, and $b_{dense}$ is the bias term of this layer. The result is then fed into a sigmoid output layer (denoted out), which yields the predicted disease severity score. The model is trained with the SOFA score as the true value $y$ of the cross-entropy loss, which is minimized, and finally a disease severity score curve is obtained for each patient. The structure and parameters of the trained bidirectional heterogeneous LSTM network are then frozen, and when a new patient is admitted, the network is used to obtain the patient's real-time disease severity score.
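The bidirectional heterogeneous LSTM is defined as follows (forward direction on the left, backward direction on the right):
$$f_t = \sigma(W_f[\mathrm{Chechup}_t, h_{t-1}] + b_f), \qquad f'_t = \sigma(W'_f[\mathrm{Chechup}_t, h'_{t+1}] + b'_f)$$
$$i_t = \sigma(W_i[\mathrm{Chechup}_t, h_{t-1}] + b_i), \qquad i'_t = \sigma(W'_i[\mathrm{Chechup}_t, h'_{t+1}] + b'_i)$$
$$o_t = \sigma(W_o[\mathrm{Chechup}_t, h_{t-1}] + b_o), \qquad o'_t = \sigma(W'_o[\mathrm{Chechup}_t, h'_{t+1}] + b'_o)$$
$$d_t = \sigma(W_d C_{t-1} + b_d), \qquad d'_t = \sigma(W'_d C'_{t+1} + b'_d)$$
$$h_t = o_t * \tanh(C_t), \qquad h'_t = o'_t * \tanh(C'_t)$$
$$D = \mathrm{relu}(W_{dense}[h_t, h'_t] + W_{static} P_{Static} + b_{dense})$$
where $\sigma$ is the sigmoid function $\sigma(x) = 1/(1+e^{-x})$, $\tanh$ is the hyperbolic tangent function $\tanh(x) = (e^{x}-e^{-x})/(e^{x}+e^{-x})$, and $\mathrm{relu}$ is the ReLU function $f(x) = \max(0, x)$. Each $W$ is a weight matrix and each $b$ is a bias term; $W$ and $b$ are the parameters to be learned by the network. $\mathrm{Diagnosines}_t$ and $\mathrm{Chechup}_t$ (abbreviated as Di and Ch, respectively) are the diagnosis description information and the laboratory indicators at time $t$. $i$, $f$, $o$, $C$, and $h$ denote the input gate, forget gate, output gate, memory cell, and hidden state, respectively. The decomposition gate $d_t$ is constructed from the cell state $C_{t-1}$ and controls the amount of added information. The forget gate $f_t$ controls how much of the additional candidate value and of the previous cell state $C_{t-1}$ is added to the current cell state $C_t$, and the input gate $i_t$ controls the degree to which the new state information is added to the current cell state $C_t$.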
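The fully connected layer $D$ and the sigmoid output layer can be sketched in plain Python as follows. This is only an illustration of the formula for $D$ followed by a sigmoid: the output-layer parameters `w_out` and `b_out` are assumptions, since the text states only that a sigmoid layer follows, and the recurrent cell update itself is not sketched because its full formula is not given here.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def severity_head(h_fwd, h_bwd, p_static, W_dense, W_static, b_dense, w_out, b_out):
    """Compute D = relu(W_dense [h, h'] + W_static P_Static + b_dense), then a sigmoid score.

    h_fwd, h_bwd: final hidden states of the forward and backward LSTM (lists).
    w_out, b_out: hypothetical parameters of the sigmoid output layer.
    """
    dense_part = matvec(W_dense, h_fwd + h_bwd)   # [h, h'] is a concatenation
    static_part = matvec(W_static, p_static)
    D = [max(0.0, a + s + b) for a, s, b in zip(dense_part, static_part, b_dense)]
    logit = sum(w * d for w, d in zip(w_out, D)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))         # score in (0, 1)


score = severity_head(
    h_fwd=[1.0, -1.0], h_bwd=[0.5], p_static=[2.0],
    W_dense=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    W_static=[[0.5], [0.0]],
    b_dense=[0.0, 0.0],
    w_out=[1.0, 1.0], b_out=-2.0,
)
```

Because the sigmoid output lies in (0, 1), training against the SOFA score with cross-entropy would presumably require the SOFA score to be rescaled to that range; the text does not say how this is done.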
And step 3: a treatment effectiveness measurement module for defining what treatment is effective; the disease severity score for each patient during treatment obtained in step two. By solidifying the structure and parameters of the trained web W1, when a new patient enters, a real-time disease severity score of the new patient can be obtained by inputting current laboratory physiological characteristic indicators (physical examination results), demographic information and diagnosis description information.
For the treatments in the EMR data, a measure of treatment effectiveness is defined. The evaluation is based on three considerations: the patient's current disease severity, the degree of influence of the current treatment on the next stage (time window), and the outcome of the treatment at final discharge. The degree of influence K of the current treatment on the next stage is represented by the slope of the disease severity score curve; for ease of calculation, and because the score curve is not smooth, K is defined as:
where T is the length of the time window and $y_T$ is the disease severity score within the T-th time window.
The treatment effectiveness is $M = W[y_T; K; R]$. In this embodiment, after $y_T$, K, and R are normalized, $W = [1, 2, 1]$.
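As a minimal sketch of the weighted combination $M = W[y_T; K; R]$ with $W = [1, 2, 1]$: the exact definition of K is not available in this text, so `window_slope` below uses an assumed estimate (average change per step across the window), min-max scaling is an assumed form of normalization, and the discharge result R is assumed to have been mapped to a single number per treatment. All function names are hypothetical.

```python
def min_max(values):
    """Assumed normalization: rescale a list of numbers to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def window_slope(scores):
    """Assumed stand-in for K: average change per step across one time window."""
    if len(scores) < 2:
        return 0.0
    return (scores[-1] - scores[0]) / (len(scores) - 1)

def effectiveness(y_T, K, R, W=(1.0, 2.0, 1.0)):
    """M = W . [y_T; K; R] for each treatment, after normalizing each component."""
    y_n, k_n, r_n = min_max(y_T), min_max(K), min_max(R)
    return [W[0] * y + W[1] * k + W[2] * r for y, k, r in zip(y_n, k_n, r_n)]

# Three hypothetical treatments with made-up component values.
M = effectiveness(y_T=[2.0, 4.0, 6.0], K=[-1.0, 0.0, 1.0], R=[0.0, 1.0, 1.0])
slope = window_slope([8.0, 6.0, 4.0])
```

The text does not state the orientation of each component (for example, whether a falling severity score should raise or lower M), so any real use would need to fix those sign conventions before normalizing.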
Step 4: The patient similarity measurement module constructs the similarity measurement relation between patients. Information such as laboratory physiological indicators, demographic information, diagnosis description information, and disease severity is important for constructing a measure of similarity between patients. Once the disease severity score has been computed with the network W1, the patient's information is already embedded in the network.
Each patient is represented as:
where two of the components come from the forward LSTM, two come from the backward LSTM, and $P^{Static}$ is the static demographic information.
Inter-patient similarity is defined as the 2-norm of the difference between two patient representations:
$\mathrm{Similar}\langle P_i, P_j \rangle = \lVert P_i - P_j \rVert_2$,
where j denotes the j-th patient.
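The 2-norm measure can be sketched in a few lines. Because it is a distance, a smaller value means two patients are more alike. The representation vectors and the helper `most_similar` are hypothetical; in the system described here the vectors would come from the trained network plus the static demographic information.

```python
import math

def similarity(p_i, p_j):
    """Similar<P_i, P_j> = ||P_i - P_j||_2 (smaller means more alike)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_i, p_j)))

def most_similar(query, cohort, k):
    """Return the ids of the k cohort patients closest to the query vector."""
    ranked = sorted(cohort, key=lambda pid: similarity(query, cohort[pid]))
    return ranked[:k]

# Made-up two-dimensional representations for illustration only.
cohort = {"p1": [0.0, 0.0], "p2": [3.0, 4.0], "p3": [1.0, 0.0]}
nearest = most_similar([0.0, 0.0], cohort, k=2)
```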
Step 5: The drug treatment scheme recommendation module searches for and recommends the next-stage drug treatment scheme, and provides interpretability through similar positive and negative treatment samples. It uses the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity relation obtained by the patient similarity measurement module. The time sequence of drug prescription information is introduced, and a similarity measure-treatment effectiveness measure-pharmacy-time tensor table is constructed. When a new patient is hospitalized, the prescription that has the highest treatment effect at the current stage and the highest similarity to the patient is recommended. Note that as the recommended treatment is applied, the patient's status changes, and so does the similarity between the current patient and the patients in the EMR data; the recommendation of the present invention therefore changes dynamically with the patient's status.
To account for adverse reactions between drugs and the patient's allergy history, this embodiment filters out drug combinations with severe adverse reactions and treatments to which the current patient is allergic, and selects the next-best option instead, which makes the recommendation more reliable for the current patient. In addition, the top s effective treatment examples and the top s ineffective or negative-effect treatment examples with high similarity to the patient are provided to the doctor to support a better decision.
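The recommendation and filtering steps can be sketched as follows. This is one possible reading, not the patented procedure: it assumes that "highest similarity and highest treatment effect" is resolved by first taking the k nearest historical cases and then choosing the most effective safe one, and that a threshold on M separates effective from ineffective examples. The class `TreatmentCase`, the parameters `k` and `threshold`, and the drug names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TreatmentCase:
    patient_id: str
    drugs: frozenset
    distance: float       # Similar<P_new, P_case>; smaller is closer
    effectiveness: float  # M for this treatment at the current stage

def is_safe(drugs, allergies, adverse_pairs):
    """Reject prescriptions with an allergen or a known adverse drug pair."""
    if drugs & allergies:
        return False
    return not any(pair <= drugs for pair in adverse_pairs)

def recommend(cases, allergies, adverse_pairs, k, s, threshold):
    nearest = sorted(cases, key=lambda c: c.distance)[:k]
    safe = [c for c in nearest if is_safe(c.drugs, allergies, adverse_pairs)]
    # Falls back to the next-best case when the top one is filtered out.
    best = max(safe, key=lambda c: c.effectiveness, default=None)
    effective = sorted((c for c in nearest if c.effectiveness >= threshold),
                       key=lambda c: c.effectiveness, reverse=True)[:s]
    ineffective = sorted((c for c in nearest if c.effectiveness < threshold),
                         key=lambda c: c.effectiveness)[:s]
    return best, effective, ineffective

# Illustrative data only; names and numbers are made up.
cases = [
    TreatmentCase("A", frozenset({"aspirin", "heparin"}), 0.1, 3.5),
    TreatmentCase("B", frozenset({"penicillin"}), 0.2, 3.9),
    TreatmentCase("C", frozenset({"warfarin", "aspirin"}), 0.3, 3.7),
    TreatmentCase("D", frozenset({"ibuprofen"}), 0.4, 1.0),
    TreatmentCase("E", frozenset({"heparin"}), 5.0, 4.0),
]
best, effective, ineffective = recommend(
    cases,
    allergies={"penicillin"},
    adverse_pairs={frozenset({"warfarin", "aspirin"})},
    k=4, s=2, threshold=2.0,
)
```

In this example the two most effective nearby cases are filtered out (one for an allergy, one for an adverse pair), so the next-best safe case is recommended, while the effective and ineffective examples are still returned for the doctor to review.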
In this embodiment, the electronic medical record data come from the critical care database MIMIC-III. MIMIC-III is a real clinical database containing health data for more than 40,000 patients admitted to the ICU of the Beth Israel Deaconess Medical Center over a period of 11 years. The invention uses the latest version, MIMIC-III v1.4, which includes 50206 medical treatment records involving 6695 different diseases and 4127 drugs. This embodiment excludes patients under 15 years of age and patients who stayed in the ICU for less than 48 hours. Children are excluded because the normal ranges of medical indicators differ between adults and children, and the 48-hour ICU requirement ensures sufficient data for analysis. Patients with large amounts of missing data are also excluded, because estimating large amounts of missing data may introduce bias that degrades the results. Finally, 3255 patients were selected for modeling and analysis.
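The cohort selection rules can be expressed as a simple filter. The age and ICU-stay limits come from the text above; the record field names and the `max_missing` threshold are assumptions, since the text does not say how much missing data counts as "large".

```python
def select_cohort(patients, min_age=15, min_icu_hours=48, max_missing=0.5):
    """Keep patients who meet the age, ICU-stay, and data-completeness criteria.

    max_missing is a hypothetical threshold on the fraction of missing values.
    """
    kept = []
    for p in patients:
        if p["age"] < min_age:
            continue  # children have different normal ranges
        if p["icu_hours"] < min_icu_hours:
            continue  # too little data for analysis
        if p["missing_fraction"] > max_missing:
            continue  # too much would have to be estimated
        kept.append(p)
    return kept

# Made-up records for illustration only.
records = [
    {"id": "a", "age": 60, "icu_hours": 72, "missing_fraction": 0.1},
    {"id": "b", "age": 12, "icu_hours": 90, "missing_fraction": 0.0},
    {"id": "c", "age": 40, "icu_hours": 30, "missing_fraction": 0.2},
    {"id": "d", "age": 55, "icu_hours": 100, "missing_fraction": 0.9},
]
cohort_ids = [p["id"] for p in select_cohort(records)]
```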
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements shall also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A large-scale medical data knowledge mining and treatment scheme recommendation system is characterized in that: the system comprises a data set preprocessing module, a disease severity prediction module, a treatment effectiveness measurement module, a patient similarity measurement module and a drug treatment scheme recommendation module, wherein:
the data set preprocessing module is used for acquiring real electronic medical record data and preprocessing the electronic medical record data, which consists of various heterogeneous data sources; the preprocessed electronic medical record comprises five types of patient information, namely demographic information $P^{Static}$, diagnosis description information Diagnosines, laboratory indices Chechup, drug prescriptions Treatment, and discharge result R;
the disease severity prediction module is used for training the bidirectional heterogeneous LSTM network through the demographic information, the diagnosis description information and the laboratory index data which are obtained by the data set preprocessing module to obtain a disease severity score of each patient in the treatment process;
the treatment effectiveness measurement module is used for obtaining effective treatment measure information through the disease severity score obtained by the disease severity prediction module, the degree of influence of the current treatment on the next stage, and the discharge result obtained by the data set preprocessing module;
the patient similarity measurement module is used for constructing a similarity measurement relation of patients and calculating the similarity between the patients through information deposited in the bidirectional heterogeneous LSTM network and static demographic information of the patients;
the drug treatment scheme recommendation module is used for introducing the time sequence of the drug prescription information, according to the effective treatment measure information obtained by the treatment effectiveness measurement module and the patient similarity relation obtained by the patient similarity measurement module, to obtain the next-stage drug treatment scheme recommendation, filtering out treatments that combine drugs with adverse reactions and treatments containing drugs to which the current patient is allergic, and providing the doctor with the first s effective treatments with high similarity to the patient, together with treatment examples of ineffective or negative-effect treatments.
2. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the electronic medical record data come from the critical care database MIMIC-III.
3. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the disease severity prediction module is a bidirectional heterogeneous LSTM network, and the overall structure of the bidirectional heterogeneous LSTM network is as follows:
wherein input is the input of the heterogeneous LSTM network and comprises the laboratory physiological indicators, the demographic information, and the diagnosis description information, and the output is the disease severity score;
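the bidirectional heterogeneous LSTM at each time step t is defined as follows:
$f_t = \sigma(W_f[\mathrm{Chechup}_t, h_{t-1}] + b_f) \qquad f'_t = \sigma(W'_f[\mathrm{Chechup}_t, h'_{t+1}] + b'_f)$
$i_t = \sigma(W_i[\mathrm{Chechup}_t, h_{t-1}] + b_i) \qquad i'_t = \sigma(W'_i[\mathrm{Chechup}_t, h'_{t+1}] + b'_i)$
$o_t = \sigma(W_o[\mathrm{Chechup}_t, h_{t-1}] + b_o) \qquad o'_t = \sigma(W'_o[\mathrm{Chechup}_t, h'_{t+1}] + b'_o)$
$d_t = \sigma(W_d C_{t-1} + b_d) \qquad d'_t = \sigma(W'_d C'_{t+1} + b'_d)$
$h_t = o_t * \tanh(C_t) \qquad h'_t = o'_t * \tanh(C'_t)$
$D = \mathrm{relu}(W_{dense}[h_t, h'_t] + W_{static} P^{Static} + b_{dense})$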
wherein σ is the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$, tanh is the hyperbolic tangent function $\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$, and ReLU is the function $f(x) = \max(0, x)$; W denotes the weight matrices and b the bias terms, and W and b are the parameters to be learned by the network; $\mathrm{Diagnosines}_t$ and $\mathrm{Chechup}_t$ are the diagnosis description information and the laboratory indices at time t, respectively; i, f, o, C, and h denote the input gate, forget gate, output gate, memory cell, and hidden state, respectively; the decomposition gate $d_t$ is constructed from the cell state $C_{t-1}$ and controls the amount of added information; the forget gate $f_t$ controls how much of the additional candidate value and of the previous cell state $C_{t-1}$ is added to the current cell state $C_t$; the input gate $i_t$ controls the degree to which the new state information is updated and added to the current cell state $C_t$;
the forward LSTM and the backward LSTM have the same structure, the forward LSTM network being denoted by symbols without a prime and the backward LSTM network by symbols with a prime; a fully connected layer D is added to combine the static demographic information $P^{Static}$ with the outputs of the forward and backward LSTMs through dense connections, where $W_{dense}$ is the weight applied to the outputs of the forward and backward LSTMs, $W_{static}$ is the weight applied to the static information, and $b_{dense}$ is the bias term of this layer; the result is then fed into a sigmoid layer, which serves as the output layer (denoted out), to obtain the predicted disease severity score; the model uses the SOFA score as the true value y of the cross entropy, and minimizing the cross entropy trains the bidirectional heterogeneous LSTM model and finally yields a disease severity score curve for each patient; the structure and parameters of the trained bidirectional heterogeneous LSTM network are fixed, and when a new patient is admitted, the network is used to obtain the real-time disease severity score of the new patient.
4. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the treatment effectiveness measurement module obtains effective treatment measure information from three aspects: the disease severity score, the degree of influence K of the current treatment on the next stage, and the discharge result R = {0001, 0010, 0100, 1000};
wherein the degree of influence K of the current treatment on the next stage is represented by the slope of the disease severity score curve, K being defined as:
where T is the length of the time window and $y_T$ is the disease severity score within the T-th time window;
the effective treatment measure information is $M = W[y_T; K; R]$.
5. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the patient similarity measurement relation constructed by the patient similarity measurement module is as follows:
the i-th patient is expressed as:
wherein two of the components come from the forward LSTM network, two come from the backward LSTM network, and $P_i^{Static}$ is the static demographic information;
inter-patient similarity is defined as the 2-norm of the difference between two patient representations:
$\mathrm{Similar}\langle P_i, P_j \rangle = \lVert P_i - P_j \rVert_2$;
wherein j denotes the j-th patient.
6. The large-scale medical data knowledge mining and treatment scheme recommendation system of claim 1, wherein: the drug treatment scheme recommendation module uses the effective treatment measure information obtained by the treatment effectiveness measurement module and the inter-patient similarity obtained by the patient similarity measurement module, introduces the time sequence of drug prescription information, and constructs a similarity measure-treatment effectiveness measure-pharmacy-time tensor table.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911117826.5A CN110880362B (en) | 2019-11-12 | 2019-11-12 | Large-scale medical data knowledge mining and treatment scheme recommending system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911117826.5A CN110880362B (en) | 2019-11-12 | 2019-11-12 | Large-scale medical data knowledge mining and treatment scheme recommending system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110880362A (en) | 2020-03-13 |
CN110880362B (en) | 2022-10-11 |
Family
ID=69728839
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201911117826.5A | Active | CN110880362B (en) | 2019-11-12 | 2019-11-12 | Large-scale medical data knowledge mining and treatment scheme recommending system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110880362B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN110880362B (en) | 2022-10-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||