CN113140273B - ICU patient electronic medical record analysis method and system based on deep learning

Info

Publication number: CN113140273B
Application number: CN202110349716.2A
Authority: CN
Inventors: 杨帆; 梁云帆; 林开标; 赖永炫; 姚毅虹
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2021-03-31
Filing date: 2021-03-31
Publication date: 2022-05-10
Anticipated expiration: 2041-03-31
Also published as: CN113140273A

Abstract

The invention relates to an ICU patient electronic medical record analysis method and system based on deep learning. The method comprises the following steps: receiving input ICD codes and drug vectors, inputting them into two multilayer perceptrons respectively to generate two hidden layers of the same dimension, and calculating the correlation between the two hidden layers; based on an initialized sparse coefficient, using the KL divergence to calculate the distance between the sparse coefficient and the average activation of the neurons after activation of the autoencoder's intermediate layer, and putting the distance into the loss function; taking the hidden layer as the intermediate layer of the autoencoder, decoding it with a multilayer perceptron, and outputting a prescription containing multiple drugs; and weighting the output prescription based on an initialized weighting matrix, and putting the excess prescribed drugs into the loss function as the drug loss. By learning the mapping between a patient's ICD codes and the prescription, the method and system mine the latent associations between drugs and give recommendations with higher reliability.

Description

ICU patient electronic medical record analysis method and system based on deep learning

Technical Field

The invention relates to the field of information intellectualization, in particular to an ICU patient electronic medical record analysis method and system based on deep learning.

Background

With the advance of big data and the digitization of medical records, more and more patient information and electronic medical record data are stored in medical systems and databases. With the rapid development of artificial intelligence and the continuous improvement of server computing power, it has become possible to study the distribution and characteristics of these data, and thereby to support tasks such as assisted clinical diagnosis, prescription recommendation and health management.

Recently, electronic health records (EHR) have gradually become a focus of research. The rich information in EHR data has attracted a large number of scholars; for example, some have built medical knowledge graphs from EHR data. Medical workers can use such a graph to organize different situations and data and to develop intelligent systems that assist medical diagnosis and decision-making. Such a system can record a patient's illness information and physical condition and can even mine the causes and underlying problems. In addition, studies combining deep learning with EHR data mainly fall into three areas: information extraction, representation learning and clinical prediction. In representation learning, besides word-vector embedding tools commonly used in NLP such as Word2Vec and GloVe, there are embedding methods designed specifically for patient information, such as the embedding structure proposed by Choi et al. In information extraction, manual processing consumes a great deal of time and cost because of the huge amount of data; against this background, Valx, an information extraction tool for EHR data, emerged. Clinical prediction feeds patient information such as signs and test data into a model, which then gives a corresponding disease diagnosis; one of the better-known models is vector AI.

Representation learning aims to represent the patients' ICD codes and the corresponding drugs effectively, which benefits model learning. Clinical prediction here means predicting the patient's prescription with high accuracy. This work is formulated herein as a multi-label classification problem, with the aim of making reliable predictions with a reasonable model based on the characteristics of the data distribution. Many multi-label classification models have been developed. The classic Binary Relevance (BR) treats the labels as independent and solves the problem with an ensemble of binary classifiers. Classifier Chain (CC) converts the multi-label classification problem into a chain of classification problems and considers the dependency between labels to a certain extent. Label Powerset (LP) treats each combination of labels as a class, which transforms the problem into a multi-class problem; however, with the large number of labels in prescription data, the time complexity of LP becomes hard to estimate. Researchers have also proposed multi-label classification algorithms based on decision trees and support vector machines, such as multi-label decision trees and Rank-SVM. In addition, researchers have explored correlations between labels using RNNs. For example, the CNN-RNN network proposed by Guibin Chen et al. extracts picture or text information with a CNN, and uses an RNN to consider the correlation between labels and to classify. Exploring the correlation between labels is an important aspect of multi-label classification, and how well it is handled directly affects the performance of the classifier. Although their time complexity is low, traditional multi-label classification models find it difficult to explore hidden relations among labels. Even where some models take the links between labels into account, the way they do so may not be sound and reasonable.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing an ICU patient electronic medical record analysis method and system based on deep learning, which trains a high-accuracy multi-label classification model on electronic health record (EHR) data and can output prescriptions containing multiple drugs.

The technical scheme adopted by the invention for solving the technical problems is as follows:

In one aspect, an ICU patient electronic medical record analysis method based on deep learning comprises the following steps:

S101, receiving input ICD codes and drug vectors, inputting the ICD codes and the drug vectors into two multilayer perceptrons respectively to generate two hidden layers of the same dimension, and calculating the correlation between the two hidden layers;

S102, based on an initialized sparse coefficient, using the KL divergence to calculate the distance between the sparse coefficient and the average activation of the neurons after activation of the autoencoder's intermediate layer, and putting the distance into the loss function;

S103, taking the hidden layer as the intermediate layer of the autoencoder, decoding it with a multilayer perceptron, and outputting a prescription containing multiple drugs;

S104, weighting the output prescription based on an initialized weighting matrix, and putting the excess prescribed drugs into the loss function as the drug loss.
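The data flow of steps S101 and S103 can be illustrated with a small NumPy sketch. Everything concrete in it is an assumption made for illustration: the toy layer sizes, the sigmoid activation, the random untrained weights, the choice to decode the ICD-side hidden layer, and the 0.5 threshold. The helper names `init_mlp` and `mlp` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_mlp(sizes, rng):
    """Create (weight, bias) pairs for a fully connected network."""
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Forward pass; sigmoid after every layer (an assumption)."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x

rng = np.random.default_rng(0)
n_icd, n_drugs, n_hidden = 16, 10, 8   # toy sizes, not taken from the source

# Two encoders with the same output dimension (S101) and one decoder (S103),
# each a two-layer perceptron.
icd_encoder = init_mlp([n_icd, 12, n_hidden], rng)
drug_encoder = init_mlp([n_drugs, 12, n_hidden], rng)
decoder = init_mlp([n_hidden, 12, n_drugs], rng)

icd_batch = rng.normal(size=(4, n_icd))                        # embedded ICD codes
drug_batch = rng.integers(0, 2, size=(4, n_drugs)).astype(float)  # multi-hot drugs

h_icd = mlp(icd_batch, icd_encoder)      # hidden layer from the ICD view
h_drug = mlp(drug_batch, drug_encoder)   # hidden layer from the drug view
scores = mlp(h_icd, decoder)             # decoded drug scores in (0, 1)
prescription = (scores > 0.5).astype(int)  # 0.5 threshold is an assumption
```

The two hidden layers `h_icd` and `h_drug` are what the correlation of S101 is computed on; the losses of S102 and S104 are omitted here.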

Preferably, in S101, the ICD codes are first one-hot encoded and then embedded as word vectors to serve as the input; the drug vector is a one-hot code, so that the mapping from disease to drug is realized.

Preferably, in S101, the correlation between the two hidden layers is calculated as follows:

$$\operatorname{corr}\big(f_1(X_1;\theta_1),\, f_2(X_2;\theta_2)\big)$$

where f₁ and f₂ denote the neural networks of the two hidden layers; X₁ and X₂ denote the inputs of the two neural networks; and θ₁ and θ₂ denote the network parameters of the two neural networks.

Preferably, in S102, the distance between the sparse coefficient and the average activation of the neurons after activation of the autoencoder's intermediate layer is calculated using the KL divergence, specifically as follows:

$$\mathrm{AESPARSE\_LOSS} = \sum_{j=1}^{s_2} \mathrm{KL}\big(\rho \,\|\, \hat{\rho}_j\big)$$

where AESPARSE_LOSS represents the distance between the sparse coefficient ρ and the average activation $\hat{\rho}_j$ of the neurons after activation of the autoencoder's intermediate layer, and s2 represents the number of neurons.
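A minimal NumPy sketch of this sparsity term follows. It assumes the standard sparse-autoencoder form, in which the KL divergence is taken between two Bernoulli distributions whose means are the sparse coefficient ρ and the batch-average activation of each neuron. The function name, the default `rho=0.05` and the clipping constant `eps` are illustrative choices, not values from the source.

```python
import numpy as np

def kl_sparsity_penalty(hidden, rho=0.05, eps=1e-8):
    """Sum over neurons of KL(rho || rho_hat_j).

    hidden: array of shape (batch, s2) with activations in (0, 1).
    rho_hat_j is the mean activation of neuron j over the batch.
    """
    rho_hat = np.clip(hidden.mean(axis=0), eps, 1.0 - eps)
    kl = (rho * np.log(rho / rho_hat)
          + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return float(kl.sum())
```

The penalty is zero when every neuron's average activation equals ρ and grows as the activations move away from it, which is what pushes the intermediate layer toward sparse activity.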

Preferably, in S104, the drug loss is calculated as follows:

WEIGHT_LOSS＝-Σrelu(output-true_label)

where WEIGHT_LOSS represents the drug loss; relu represents the rectified linear unit function; output represents the prescription output; and true_label represents the true prescription label.
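The drug loss can be sketched as below, assuming `output` and `true_label` are arrays of the same shape with one column per drug. The formula is implemented as written in the source, including its leading minus sign; the unsigned amount of excess drugs, which is the quantity the text describes, is returned by a separate helper. Both function names are hypothetical.

```python
import numpy as np

def excess_drug_amount(output, true_label):
    """Sum of relu(output - true_label): drugs output but not in the label."""
    return float(np.maximum(output - true_label, 0.0).sum())

def weight_loss(output, true_label):
    """WEIGHT_LOSS exactly as the source formula states it (leading minus)."""
    return -excess_drug_amount(output, true_label)

# Toy example: the first record contains one drug that is not in the label;
# the second record misses a drug, which relu does not count.
output = np.array([[1.0, 0.0, 1.0, 1.0],
                   [0.0, 1.0, 0.0, 0.0]])
true_label = np.array([[1.0, 0.0, 0.0, 1.0],
                       [0.0, 1.0, 1.0, 0.0]])
excess = excess_drug_amount(output, true_label)
signed = weight_loss(output, true_label)
```

Because relu clips negative differences to zero, only drugs that appear in the output but not in the true prescription contribute; missing drugs are left to the other loss terms.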

Preferably, the loss function is represented as follows:

MODEL_LOSS＝α*DCCA_LOSS+β*AESPARSE_LOSS+η*WEIGHT_LOSS

where MODEL_LOSS represents the total loss; DCCA_LOSS represents the loss of the autoencoder; and α, β and η represent the corresponding weight coefficients.
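As a small illustration, the total loss is a plain weighted sum of the three terms. The default weights of 1.0 below are placeholders; the source does not give values for α, β and η.

```python
def model_loss(dcca_loss, aesparse_loss, weight_loss,
               alpha=1.0, beta=1.0, eta=1.0):
    """MODEL_LOSS = alpha*DCCA_LOSS + beta*AESPARSE_LOSS + eta*WEIGHT_LOSS."""
    return alpha * dcca_loss + beta * aesparse_loss + eta * weight_loss
```

Raising β strengthens the sparsity pressure on the intermediate layer, and raising η strengthens the weight given to excess drugs.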

In another aspect, an ICU patient electronic medical record analysis system based on deep learning includes:

a correlation acquisition module, which receives the input ICD codes and drug vectors, inputs them into two multilayer perceptrons respectively to generate two hidden layers of the same dimension, and calculates the correlation between the two hidden layers;

a distance acquisition module, which, based on the initialized sparse coefficient ρ, uses the KL divergence to calculate the distance between ρ and the average activation $\hat{\rho}_j$ of the neurons after activation of the autoencoder's intermediate layer, and puts the distance into the loss function;

a prescription output module, which takes the hidden layer as the intermediate layer of the autoencoder, decodes it with a multilayer perceptron, and outputs a prescription containing multiple drugs;

and a prescription loss acquisition module, which weights the output prescription based on the initialized weighting matrix and puts the excess prescribed drugs into the loss function as the drug loss.

According to the embodiment of the invention, the invention has the following beneficial effects:

(1) the invention uses a sparse autoencoder to mine the multiple correlations among the labels. The sparse autoencoder explores the correlations among the labels directly through its neurons, which avoids the limitations of a chain structure or an RNN structure, and the time complexity of the method is far lower than that of training a plurality of base classifiers. The loss function of the sparse autoencoder comprises the loss of the autoencoder and the distance, calculated with the KL divergence, between the sparse coefficient ρ and the average activation $\hat{\rho}_j$ of the neurons after activation of the autoencoder's intermediate layer;

(2) the ICU patient medical record analysis method based on deep learning uses a multi-label classification model that embeds with deep canonical correlation analysis (DCCA) and a sparse autoencoder at the same time to learn a hidden space (hidden layer), and then decodes with a decoder. One advantage is that the relationships between the labels can be explored more directly for classification. Another is that, because the large number of drug types makes the data very sparse, the sparse autoencoder embedding makes it easy to keep the output sparse;

(3) the invention optimizes the loss function to fit the actual clinical situation and reduces the occurrence of unnecessary drugs in the prescription.

The invention is further described in detail with reference to the drawings and the embodiments, but the ICU patient electronic medical record analysis method and system based on deep learning of the invention is not limited to the embodiments.

Drawings

FIG. 1 is a flow chart of a method for deep learning based analysis of electronic medical records in an ICU patient according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a multi-label classification model according to an embodiment of the invention;

FIG. 3 is a graph of the change in drug loss with increasing number of iterations for an embodiment of the present invention;

FIG. 4 is a block diagram of an ICU patient electronic medical record analysis system based on deep learning according to an embodiment of the present invention.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings, it being noted that the embodiments described in the drawings are only illustrative and are only for the purpose of explaining the present invention and are not to be construed as limiting the present invention. The following describes an ICU patient electronic medical record analysis method and system based on deep learning according to an embodiment of the invention with reference to the accompanying drawings.

Referring to fig. 1 and 2, a method for analyzing an electronic medical record of an ICU patient based on deep learning includes:

Further, in S101, the ICD codes are first one-hot encoded and then embedded as word vectors to serve as the input; the drug vector is a one-hot code, so that the mapping from disease to drug is realized.

One-hot encoding is a common encoding method. However, although there are thousands of ICD codes, a specific record contains only 30-40 diseases. Therefore, if one-hot encoding is used directly, the input data becomes extremely sparse, which hinders feature extraction by the neural network. Moreover, diseases are related to one another; for example, diabetic patients are prone to complications such as retinopathy and damage to organs and nerves such as the kidney. By its nature, one-hot encoding ignores these relations. Therefore, one-hot encoding alone is not suitable for such medical data.

The invention uses Word2Vec to embed word vectors and reduce the dimension of the input data. Word2Vec uses a shallow neural network to learn the mapping between one-hot codes and word vectors; that is, Word2Vec is a mapping that converts one-hot codes into word vectors. After mapping, the dimensionality of the data is greatly reduced while the information in the data is effectively retained. This mapping can also be understood as a spatial transformation, i.e., the original one-hot code is reduced in dimension with minimal loss of information. The invention uses this method to reduce the dimension of the ICD codes, which can improve the performance of the classifier.
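The relationship between a one-hot code and its word vector can be shown with a toy lookup. This is only a sketch: the four ICD-9 style codes are illustrative, the 3-dimensional `embedding` matrix is random and stands in for a trained Word2Vec matrix, and no Word2Vec training is performed.

```python
import numpy as np

# Illustrative vocabulary; real data has thousands of ICD codes.
vocab = ["4019", "25000", "5849", "41401"]
index = {code: i for i, code in enumerate(vocab)}

rng = np.random.default_rng(0)
# Stand-in for a trained Word2Vec matrix: one dense row per ICD code.
embedding = rng.normal(size=(len(vocab), 3))

def one_hot(code):
    """Sparse representation: a single 1 at the code's index."""
    vec = np.zeros(len(vocab))
    vec[index[code]] = 1.0
    return vec

def embed(code):
    """Dense representation: one-hot vector times the embedding matrix,
    which simply selects the matching row."""
    return one_hot(code) @ embedding
```

Multiplying the one-hot vector by the matrix is the same as reading out one row, which is why the embedding can be seen as a dimension-reducing mapping of the one-hot code.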

Further, in S101, the correlation between the two hidden layers is calculated as follows:

$$\operatorname{corr}\big(f_1(X_1;\theta_1),\, f_2(X_2;\theta_2)\big)$$

where f₁ and f₂ denote the neural networks of the two hidden layers; X₁ and X₂ denote the inputs of the two neural networks; and θ₁ and θ₂ denote the network parameters of the two neural networks.

The ICU patient medical record analysis method based on deep learning uses a multi-label classification model that embeds with deep canonical correlation analysis (DCCA) and a sparse autoencoder at the same time to learn a hidden space, and then decodes with a decoder, which is a two-layer neural network. DCCA computes representations of the two spaces through a stack of multiple nonlinear transformations. Assume that each intermediate layer in the network of the first space has c₁ units and that the final (output) layer has o units. Let $x_1 \in \mathbb{R}^{n_1}$ be an instance of the first space. The output of the first layer for the instance x₁ is

$$h_1 = s\left(W_1^1 x_1 + b_1^1\right) \in \mathbb{R}^{c_1}$$

where $W_1^1 \in \mathbb{R}^{c_1 \times n_1}$ is the weight matrix, $b_1^1 \in \mathbb{R}^{c_1}$ is the offset vector, and $s: \mathbb{R} \to \mathbb{R}$ is a nonlinear function. The output for an input from the other space is calculated in a similar way.

The calculation uses the following variables: $H_1 \in \mathbb{R}^{o \times m}$ and $H_2 \in \mathbb{R}^{o \times m}$ are the outputs of the deep networks for the two different spaces. Define the centered matrices

$$\bar{H}_1 = H_1 - \frac{1}{m} H_1 \mathbf{1}, \qquad \bar{H}_2 = H_2 - \frac{1}{m} H_2 \mathbf{1}$$

where $\mathbf{1}$ is the m × m all-ones matrix. Define

$$\hat{\Sigma}_{12} = \frac{1}{m-1} \bar{H}_1 \bar{H}_2'$$

and define

$$\hat{\Sigma}_{11} = \frac{1}{m-1} \bar{H}_1 \bar{H}_1' + r_1 I$$

where r₁ is a regularization term; $\hat{\Sigma}_{22}$ is defined in the same way with r₂. If r₁ is positive, then $\hat{\Sigma}_{11}$ must be positive definite. According to the formula for the correlation in CCA, the total correlation of the top K components of H₁ and H₂ is the sum of the top K singular values of the matrix T, where

$$T = \hat{\Sigma}_{11}^{-1/2}\, \hat{\Sigma}_{12}\, \hat{\Sigma}_{22}^{-1/2}$$

If K is equal to o, substituting the above formula gives:

$$\operatorname{corr}(H_1, H_2) = \|T\|_{tr} = \operatorname{tr}\left(T'T\right)^{1/2}$$
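The trace-norm formula above can be sketched in NumPy. The centering and the regularized covariance estimates follow the standard DCCA formulation; the function names are hypothetical, and the default regularizer of 1e-4 mirrors the diagonal value mentioned in the parameter settings of this document.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def dcca_correlation(H1, H2, r1=1e-4, r2=1e-4):
    """Total correlation of two views, each of shape (o, m).

    Returns the trace norm of T, i.e. the sum of its singular values.
    """
    o, m = H1.shape
    H1c = H1 - H1.mean(axis=1, keepdims=True)   # center each output unit
    H2c = H2 - H2.mean(axis=1, keepdims=True)
    S12 = H1c @ H2c.T / (m - 1)
    S11 = H1c @ H1c.T / (m - 1) + r1 * np.eye(o)  # regularized, so invertible
    S22 = H2c @ H2c.T / (m - 1) + r2 * np.eye(o)
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return float(np.linalg.svd(T, compute_uv=False).sum())
```

For two identical views the value approaches o, the number of output units, and for unrelated views it stays small. In training, the negative of this value would typically serve as the quantity to minimize, which is an assumption about how the correlation enters the loss.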

One advantage of this design is that the relationships between the labels can be explored more directly for classification. Another is that, because the large number of drug types makes the data very sparse, the sparse autoencoder embedding makes it easy to keep the output sparse. After the model classifies, the drug loss is used to minimize the inclusion in the prescription of drugs that the patient does not need.

Further, in S102, the distance between the sparse coefficient ρ and the average activation $\hat{\rho}_j$ of the neurons after activation of the autoencoder's intermediate layer is calculated using the KL divergence, specifically as follows:

$$\mathrm{AESPARSE\_LOSS} = \sum_{j=1}^{s_2} \mathrm{KL}\big(\rho \,\|\, \hat{\rho}_j\big)$$

where AESPARSE_LOSS represents this distance and s2 represents the number of neurons.

Further, in S104, the drug loss is calculated as follows:

WEIGHT_LOSS＝-∑relu(output-true_label)

Further, the loss function is represented as follows:

MODEL_LOSS＝α*DCCA_LOSS+β*AESPARSE_LOSS+η*WEIGHT_LOSS

Here the loss of the autoencoder is the Euclidean distance between the output of the autoencoder and the true value, and β is the coefficient that controls the degree of sparsity. Because the loss function changes, the gradient calculation formula used in back-propagation changes accordingly.
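For reference, in the standard sparse-autoencoder derivation the β-weighted KL term adds one extra term to the error of each hidden unit. The notation below ($W^{(2)}$ for the decoder weights, $\delta^{(l)}$ for the error terms of layer $l$, $z^{(2)}$ for the hidden pre-activations, $f$ for the activation function, $\hat{\rho}_i$ for the average activation of hidden unit $i$) is assumed for illustration and is not taken from the source.

```latex
% Without the sparsity penalty:
\delta_i^{(2)} = \Big( \sum_{j} W_{ji}^{(2)} \, \delta_j^{(3)} \Big) \, f'\big(z_i^{(2)}\big)

% With the penalty \beta \sum_i \mathrm{KL}(\rho \,\|\, \hat{\rho}_i),
% using \partial\,\mathrm{KL}(\rho\|\hat{\rho}_i) / \partial \hat{\rho}_i
%   = -\rho/\hat{\rho}_i + (1-\rho)/(1-\hat{\rho}_i):
\delta_i^{(2)} = \Big( \sum_{j} W_{ji}^{(2)} \, \delta_j^{(3)}
    + \beta \Big( -\frac{\rho}{\hat{\rho}_i} + \frac{1-\rho}{1-\hat{\rho}_i} \Big) \Big) \, f'\big(z_i^{(2)}\big)
```

The added term is positive when a unit is more active than ρ on average and negative when it is less active, so gradient descent pulls the average activation toward the sparse coefficient.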

As described above, because the model is a multi-label classification model, the evaluation indexes adopted by the invention include the Hamming distance, F1-score and Jaccard loss. The invention also proposes an evaluation index that measures whether the model prescribes extra drugs, i.e., drugs that the patient does not need. Because the model provides a prescription reference, the output would ideally match the prescription the patient needs exactly, but one hundred percent accuracy cannot be achieved in actual training. Therefore, following the principle that it is better to omit a drug than to prescribe one the patient does not need, the model introduces a new evaluation index, defined as WEIGHT_LOSS.
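The evaluation indexes can be sketched for binary label matrices as follows. The averaging conventions are assumptions, since the source does not state them: the F1-score is micro-averaged, the Jaccard value is the per-record similarity averaged over records (a loss would be one minus this value), and the excess-drug index is reported as the average number of unneeded drugs per record.

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    """Fraction of label positions that differ."""
    return float(np.mean(y_true != y_pred))

def micro_f1(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def mean_jaccard(y_true, y_pred):
    """Per-record intersection over union, averaged over records."""
    inter = np.sum((y_true == 1) & (y_pred == 1), axis=1)
    union = np.sum((y_true == 1) | (y_pred == 1), axis=1)
    return float(np.mean(np.where(union > 0, inter / np.maximum(union, 1), 1.0)))

def mean_excess_drugs(y_true, y_pred):
    """Average number of predicted drugs per record that are not in the label."""
    return float(np.mean(np.sum(np.maximum(y_pred - y_true, 0), axis=1)))

y_true = np.array([[1, 0, 0, 1],
                   [0, 1, 1, 0]])
y_pred = np.array([[1, 0, 1, 1],
                   [0, 1, 0, 0]])
```

On this toy pair the first record contains one extra drug and the second misses one, so the Hamming loss is 0.25, the micro F1 is 0.75 and the average number of excess drugs per record is 0.5.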

This is based on the idea of sparse autoencoding: although the way of calculating the distance between distributions is different, this loss can still make the output sparse. WEIGHT_LOSS thus also becomes an important index for measuring the quality of the model.

The multi-label classification model used in the method of the invention (CAAE_SPARSE) is compared with the Binary Relevance (BR), Classifier Chain (CC), Med_AR and RNN_ATT models as follows.

BR and CC are two common multi-label classification models. BR classifies each target directly with a base learner and ignores the correlation among labels. CC considers the correlation among labels: it successively feeds Y, i.e., the labels, into the classifiers as features, and finally combines the results of the classifiers to output the final result. Because the labels are treated as features, the correlation among them can be taken into account. As the simplest example, if the base classifier is an MLP, the correlation between labels is clearly considered when multiple labels are propagated forward as features. RNN_ATT uses attention in an RNN, which also takes correlation into account, and the attention weights different ICD codes differently. Finally, the Med_AR model combines attention, an RNN and RethinkNet, and is a multi-label classification model proposed by the Gansu deer.

The parameter settings for each model are as follows:

BR (Word2Vec): the input data are embedded with Word2Vec; the base classifier is a decision tree with max_depth of 2, and the other parameters of the base classifier are defaults.

CC (Word2Vec): the input ICD codes are likewise embedded with Word2Vec, and an SVM is used as the base classifier.

Med_AR (GloVe): the input ICD codes are embedded as word vectors with GloVe, the time step of the RethinkNet part is 5, and the attention part extracts disease features of 512 dimensions.

Med_AR (Word2Vec): the input ICD codes are embedded with Word2Vec, and the other steps are as above.

RNN_ATT (Word2Vec): the attention vector is first initialized and used to weight the vectorized ICD codes. Then an RNN (LSTM) is used for multi-label classification. The Cell_Num parameter of the RNN is set to 2726, the number of output drugs.

CAAE_SPARSE: the model proposed above is used for the multi-label classification experiments. The word vector embedding method is Word2Vec.

The specific parameter settings of CAAE_SPARSE are as follows: batch_size is set to 100, the learning rate lr is set to 1e-5, and the dimension of Word2Vec is 512. Two-layer MLPs are used in all three places in the model. The total number of neurons is 1020. When computing the correlation, a diagonal matrix with diagonal elements of 1e-4 is added to the covariance matrix to increase the stability of the matrix eigenvalue decomposition. Dropout_rate is set to 0.7, and the training and test sets are drawn by random sampling in a ratio of 7:3.
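The random 7:3 split can be sketched as an index permutation. The function name and the fixed seed are illustrative assumptions; 14727 is the total record count reported with the experimental results in this document.

```python
import numpy as np

def random_split(n_records, train_fraction=0.7, seed=0):
    """Shuffle record indices and cut them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_records)
    n_train = int(round(n_records * train_fraction))
    return order[:n_train], order[n_train:]

train_idx, test_idx = random_split(14727)
```

With 14727 records this yields 10309 training indices and 4418 test indices, and every record lands in exactly one of the two sets.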

Table 1 below shows the experimental results on all data, aggregated by HADM_ID. There are 14727 records in total. Since GloVe and Word2Vec gave nearly identical results, the results presented here are all for Word2Vec.

TABLE 1 Total test results

The above results are from experiments on all 14727 records. In terms of accuracy, both BR and CC are ensemble models. BR splits the labels apart and ignores the correlation between them. CC takes the correlation between labels into account, but in this case its approach is inadequate: the patients use more than two thousand drugs, which makes the label encoding very sparse. Repeatedly feeding labels in as features, as CC does, therefore struggles to achieve the ideal effect, and the correlation among some labels is neglected. Turning to the RNN-based models, they perform better because they consider the differing importance of different diseases, but the experimental results show that this mechanism contributes little to improving the overall result. The Med_AR model differs in that it uses RethinkNet to consider the correlation between labels, which improves the model's performance. The model of the invention considers the correlation between labels more directly and uses a sparse model so that the output follows the real situation, making the results more acceptable to patients.

On the WEIGHT_LOSS index defined by the invention, BR and CC perform well: the average number of drugs per record that are prescribed but not needed is small. However, their accuracy is not high. The WEIGHT_LOSS values of the two RNN-based models are relatively high, which could easily lead to misuse of drugs, so they are not suitable. The model proposed by the invention has the highest accuracy, and its WEIGHT_LOSS is also relatively low, within acceptable limits. FIG. 3 shows how the WEIGHT_LOSS of the model proposed by the invention changes as the number of iterations increases. It can be clearly seen that the average number of excess drugs per record falls significantly as the number of iterations increases, which shows that the loss function and model structure proposed by the invention are effective.

Referring to fig. 4, the invention relates to an ICU patient electronic medical record analysis system based on deep learning, which comprises:

a correlation obtaining module 401, which receives the input ICD codes and drug vectors, inputs them into two multilayer perceptrons respectively to generate two hidden layers of the same dimension, and calculates the correlation between the two hidden layers;

a distance obtaining module 402, which, based on the initialized sparse coefficient ρ, uses the KL divergence to calculate the distance between ρ and the average activation $\hat{\rho}_j$ of the neurons after activation of the autoencoder's intermediate layer, and puts the distance into the loss function;

a prescription output module 403, which takes the hidden layer as the intermediate layer of the autoencoder, decodes it with a multilayer perceptron, and outputs a prescription containing multiple drugs;

a prescription loss obtaining module 404, which weights the output prescription based on the initialized weighting matrix and puts the excess prescribed drugs into the loss function as the drug loss.

The above-mentioned embodiments are intended to illustrate the objects, aspects and effects of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the present invention, and those skilled in the art can make modifications, substitutions and alterations without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims

1. An ICU patient electronic medical record analysis method based on deep learning is characterized by comprising the following steps:

s104, weighting the output prescription based on the initialized weighting matrix, and putting the excessive medicines into a loss function as medicine loss;

in S101, the correlation between the two hidden layers is calculated as follows:

$$\operatorname{corr}\big(f_1(X_1;\theta_1),\, f_2(X_2;\theta_2)\big)$$

wherein $\operatorname{corr}\big(f_1(X_1;\theta_1),\, f_2(X_2;\theta_2)\big)$ represents the correlation of the two hidden layers; f₁ and f₂ respectively represent the neural networks of the two hidden layers; X₁ and X₂ respectively represent the inputs of the two neural networks; θ₁ and θ₂ respectively represent the network parameters of the two neural networks;

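in S102, the distance between the sparse coefficient and the average activation of the neurons after activation of the autoencoder's intermediate layer is calculated using the KL divergence, specifically as follows:

$$\mathrm{AESPARSE\_LOSS} = \sum_{j=1}^{s_2} \mathrm{KL}\big(\rho \,\|\, \hat{\rho}_j\big)$$

wherein AESPARSE_LOSS represents the distance between the sparse coefficient ρ and the average activation $\hat{\rho}_j$ of the neurons after activation of the autoencoder's intermediate layer; s2 represents the number of neurons; ρ(x) represents the sparse coefficient corresponding to the specified input x; $\hat{\rho}_j(x)$ represents the average activation of the j-th neuron, after activation of the autoencoder's intermediate layer, when x is input;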

in S104, the drug loss is calculated as follows:

WEIGHT_LOSS＝-Σrelu(output-true_label)

wherein WEIGHT_LOSS represents the drug loss; relu represents the rectified linear unit function; output represents the prescription output; true_label represents the true prescription label;

the loss function is represented as follows:

MODEL_LOSS＝α*DCCA_LOSS+β*AESPARSE_LOSS+η*WEIGHT_LOSS

2. The deep learning-based ICU patient electronic medical record analysis method as claimed in claim 1, wherein in S101, the ICD codes are first one-hot encoded and then embedded as word vectors to serve as the input; the drug vector is a one-hot code, so that the mapping from disease to drug is realized.

3. An ICU patient electronic medical record analysis system based on deep learning, comprising:

And put into the loss function;

a prescription output module, which takes the hidden layer as the intermediate layer of the autoencoder, decodes it with a multilayer perceptron, and outputs a prescription containing multiple drugs;

a prescription loss acquisition module, which weights the output prescription based on the initialized weighting matrix and puts the excess prescribed drugs into the loss function as the drug loss;

in the correlation obtaining module, the correlation between the two hidden layers is calculated as follows:

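wherein $\operatorname{corr}\big(f_1(X_1;\theta_1),\, f_2(X_2;\theta_2)\big)$ represents the correlation of the two hidden layers; f₁ and f₂ respectively represent the neural networks of the two hidden layers; X₁ and X₂ respectively represent the inputs of the two neural networks; θ₁ and θ₂ respectively represent the network parameters of the two neural networks;

in the distance obtaining module, the distance between the sparse coefficient and the average activation of the neurons after activation of the autoencoder's intermediate layer is calculated using the KL divergence, specifically as follows:

$$\mathrm{AESPARSE\_LOSS} = \sum_{j=1}^{s_2} \mathrm{KL}\big(\rho \,\|\, \hat{\rho}_j\big)$$

wherein AESPARSE_LOSS represents the distance between the sparse coefficient ρ and the average activation $\hat{\rho}_j$ of the neurons after activation of the autoencoder's intermediate layer; s2 represents the number of neurons; ρ(x) represents the sparse coefficient corresponding to the specified input x;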

in the prescription loss obtaining module, the calculation mode of the medicine loss is as follows:

WEIGHT_LOSS＝-∑relu(output-true_label)

the loss function is represented as follows:

MODEL_LOSS＝α*DCCA_LOSS+β*AESPARSE_LOSS+η*WEIGHT_LOSS