WO2022233121A1

WO2022233121A1 - Unsupervised medical behavior compliance assessment method based on electronic medical record

Info

Publication number: WO2022233121A1
Application number: PCT/CN2021/132173
Authority: WO
Inventors: 杨雪; 李孟娇; 兰蓝; 周小波
Original assignee: 四川大学华西医院
Priority date: 2021-05-06
Filing date: 2021-11-22
Publication date: 2022-11-10
Also published as: CN112992370A; CN112992370B

Abstract

An unsupervised medical behavior compliance assessment method based on electronic medical records, comprising: S1, collecting, cleaning, and preprocessing medical record data; S2, classifying the patients' medical record data; S3, clustering the patients' doctor's order data; S4, fusing the clustered doctor's order data with surgery data, and mining a diagnosis and treatment process model by patient category, taking into account each patient's outcome after diagnosis and treatment; and S5, aligning the actual diagnosis and treatment sequence with the mined process model on the basis of a cost function, so as to locate anomalies and calculate their degree of deviation. The method reduces dependence on prior knowledge, makes deep use of the data, yields assessment results with high clinical interpretability, and supports preset logic of high complexity covering factors such as the patient's condition and physical state.

Description

An unsupervised method for assessing medical behavior compliance based on electronic medical records

Technical Field

The invention relates to the field of medical data processing and analysis, and in particular to an unsupervised method for assessing medical behavior compliance based on electronic medical records.

Background

In recent years, as living standards have risen, the medical and health industry has run into many difficulties. On the one hand, medical expenses are growing quite rapidly, and in the clinical diagnosis and treatment process, because medical institutions are influenced by financial interests, most patients experience unreasonable clinical medical behavior. This wastes national medical resources, increases the economic burden on the public, and may even endanger people's health. On the other hand, medical staff may have an insufficient grasp of the intervention procedures and standards required by guidelines, and may not adhere to those guidelines closely enough. The result is non-compliant medical behavior, which leads to longer hospital stays and correspondingly higher infection and mortality rates.

With the arrival of the big data era, a great deal of valuable medical data is recorded in electronic medical records. How to use artificial intelligence and information technology to better mine and exploit this data is a problem that urgently needs solving. A medical behavior compliance assessment system based on electronic medical records, which mines and analyzes historical record data, can provide technical assistance to medical workers and greatly improve the quality and efficiency of clinical diagnosis and treatment.

Traditional assessment systems based on machine learning mostly rely on the prior knowledge of domain experts to label data, and machine learning methods based on behavior analysis require long training times. In practice, labeled data is relatively scarce, so data mining of electronic medical records is better suited to semi-supervised or unsupervised data-driven methods;

Most existing techniques use only the patient's billing item information and do not fully consider information such as admission examination results, allergies, and doctor's orders, so the available information is not used in sufficient depth;

Existing techniques do not consider the clinical values of the individual indicators, so the assessment results lack clinical interpretability;

Existing models overemphasize model uniformity and neglect differences in condition and physical state between patients, resulting in low accuracy and poor adaptability. In addition, the preset logic rules of early-warning systems mostly target a single disease and simple clinical scenarios.

Summary of the Invention

The invention aims to provide an unsupervised method for assessing medical behavior compliance based on electronic medical records. The method reduces dependence on prior knowledge, makes deep use of the data, yields assessment results with strong clinical interpretability, and supports preset logic of high complexity covering factors such as the patient's condition and physical state.

To achieve the above object, the invention adopts the following technical solution:

The invention discloses an unsupervised method for assessing medical behavior compliance based on electronic medical records, comprising the following steps:

S1. Collect, clean, and preprocess case data. The case data include the patient's personal information, admission data, medical history data, examination data, diagnosis data, diagnosis and treatment results, medical operation data, and hospitalization data. The medical operation data include doctor's order data and surgery data, both of which are in time-series form;

S2. Classify the patients' case data according to personal information, medical history data, examination data, diagnosis data, and diagnosis and treatment results, and construct fuzzy concepts with similar indicator values;

S3. Cluster the patients' doctor's order data;

S4. Fuse the clustered doctor's order data with the surgery data and, according to the fuzzy concept category to which each patient belongs and the patient's outcome after diagnosis and treatment, mine a diagnosis and treatment process model;

S5. Define a custom cost function for the diagnosis and treatment process model, and align the actual diagnosis and treatment sequence with the mined process model on the basis of this cost function, so as to locate anomalies and calculate their degree of deviation.

Preferably, in step S2, fuzzy formal concept analysis is used. Each historical patient with a complete clinical pathway is treated as an object of the fuzzy formal context, and each type of indicator is treated as an attribute of the fuzzy formal context. The values of the formal context are normalized, a threshold is set for each attribute, and patients with similar diseases are merged so that the fuzzy formal context can be simplified. Fuzzy concepts are then constructed, each representing a specific patient group with similar indicator values.

Preferably, in step S3, the doctor's order data are first clustered using a multi-grain topic model, and the topic-clustered doctor's order data are then clustered by day using the K-means++ algorithm, which reduces the difficulty of the compliance assessment.

If the number of topics in the doctor's order data is t, the similarity between patient i on day m and patient j on day n is described as follows:

$$\mathrm{Dis}_{i,m} = (p_{im1}k_1,\; p_{im2}k_2,\; \ldots,\; p_{imt}k_t) \tag{2}$$

where $\mathrm{Dis}_{i,m}$ denotes the topic probability distribution of the t-dimensional topic vector of patient i on day m, p denotes a topic probability, and k denotes the weight of the corresponding topic.

Preferably, in step S4, the Imf process discovery algorithm in the ProM process mining software is used to mine the diagnosis and treatment process model from the subdivided data of patients whose outcome is "cured" or "improved".

Preferably, in step S5, once the frequency of specific medical behaviors has been calculated, a process model cost function based on TF-IDF weighting is used to quantify the cost of medical behaviors that are inserted or skipped. The PNetReplyer plug-in in the ProM process mining software aligns the actual diagnosis and treatment sequence with the standard process model, and the position and degree of deviation of each anomaly are determined.

Preferably, in step S5, when an event x is inserted into a medical sequence Seq, the insertion cost Cost(x) is described as follows:

where N is the total number of medical sequences in the sample, $N(\mathrm{Seq})$ is the total number of occurrences of the medical sequence Seq both with and without the inserted event x, $N(\mathrm{Seq})_x$ is the number of occurrences of Seq that include the inserted event x, and $N(x)$ is the number of medical sequences that contain the inserted event x.

Beneficial effects of the invention:

1. The invention proposes a mining scheme that integrates more comprehensive data and takes fuller account of all aspects of patient information.

2. The invention introduces fuzzy formal concept analysis to subdivide the patient population, so that the reference data for each process model covers a more finely defined group. This improves the fit between the standard process model and the actual diagnosis and treatment sequence.

3. The multi-grain topic model clustering method (M-GTM, Multi-Grain Topic Model) adopted by the invention performs significantly better in use than ordinary LDA topic model clustering.

4. Introducing TF-IDF weighting into the calculation of medical behavior frequency improves the accuracy of the cost function.

5. The electronic medical record-based compliance assessment process proposed by the invention requires no labeling of abnormal medical data and is an unsupervised, data-driven method.

Brief Description of the Drawings

FIG. 1 is a schematic flow chart of the invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawing.

As shown in FIG. 1, the invention comprises the following steps:

S1: Collect, clean, and preprocess case data. The case data include the patient's admission data, hospitalization data, medical history data, examination data, and doctor's order data;

S2: Classify the patients according to their personal information, examination data, personal and family medical history, the different diagnosis data, and the outcome after diagnosis and treatment;

S3: Cluster the patients' doctor's order data, which are in time-series form;

S4: Fuse the clustered doctor's order data with surgery data and other time-series medical operation data and, by subdivided patient category and taking the outcome after diagnosis and treatment into account, mine a relatively more effective diagnosis and treatment process model;

S5: Define a custom cost function for the diagnosis and treatment process model, and align the actual diagnosis and treatment sequence with the mined process model on the basis of this cost function, so as to locate anomalies and calculate their degree of deviation.

In step S2, fuzzy formal concept analysis is introduced. Each historical patient with a complete clinical pathway is treated as an object of the fuzzy formal context, and each type of indicator is treated as an attribute of the fuzzy formal context. The values of the formal context are then normalized, a threshold is set for each attribute, and patients with similar diseases are merged, so that the fuzzy formal context can be simplified. The fuzzy concept lattice can then be constructed, with each fuzzy concept representing a specific patient group with similar indicator values.

In step S3, the doctor's order data are first clustered using a multi-grain topic model (Multi-Grain Topic Model), and the topic-clustered doctor's order data are then clustered by day using the K-means++ algorithm, which reduces the difficulty of the compliance assessment. If the number of topics in the doctor's order data is t, the similarity between patient i on day m and patient j on day n is described as follows:

where $\mathrm{Dis}_{i,m} = (p_{im1}k_1,\; p_{im2}k_2,\; \ldots,\; p_{imt}k_t)$ denotes the topic probability distribution of the t-dimensional topic vector of patient i on day m, p denotes a topic probability, and k denotes the weight of the corresponding topic.
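As an illustration only, the weighted topic vector can be built and compared as in the sketch below. The text does not reproduce the similarity formula itself, so cosine similarity is used here purely as an assumed stand-in. The function names and the sample probabilities and weights are hypothetical.

```python
import math

def weighted_topic_vector(topic_probs, topic_weights):
    """Build Dis_{i,m}: each topic probability p multiplied by its weight k."""
    if len(topic_probs) != len(topic_weights):
        raise ValueError("p and k must both have t entries")
    return [p * k for p, k in zip(topic_probs, topic_weights)]

def cosine_similarity(u, v):
    """ASSUMPTION: cosine similarity replaces the formula not shown in the text."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector carries no topic information
    return dot / (norm_u * norm_v)

# Hypothetical data with t = 3 topics.
k = [1.0, 2.0, 0.5]                                   # topic weights
dis_i_m = weighted_topic_vector([0.7, 0.2, 0.1], k)   # patient i, day m
dis_j_n = weighted_topic_vector([0.6, 0.3, 0.1], k)   # patient j, day n
similarity = cosine_similarity(dis_i_m, dis_j_n)
```

Any other vector similarity or distance could be substituted for `cosine_similarity` without changing how the weighted vectors are built.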

In step S4, the Imf (Inductive Miner-frequent) process discovery algorithm in the ProM process mining software is used to mine the diagnosis and treatment process model from the subdivided data of patients whose outcome is "cured" or "improved".

In step S5, once the frequency of specific medical behaviors has been calculated, a process model cost function based on TF-IDF weighting is used to quantify the cost of medical behaviors such as insertions or skips. Taking an event x inserted into a medical sequence Seq as an example, the insertion cost Cost(x) is described as follows:

where N is the total number of medical sequences in the sample, $N(\mathrm{Seq})$ is the total number of occurrences of the medical sequence Seq both with and without the inserted event x, $N(\mathrm{Seq})_x$ is the number of occurrences of Seq that include the inserted event x, and $N(x)$ is the number of medical sequences that contain the inserted event x.
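The cost formula itself is not reproduced in the text, so the sketch below only shows how the four counts defined above could enter a TF-IDF style weight. Reading $N(\mathrm{Seq})_x / N(\mathrm{Seq})$ as the term frequency and $\log(N / N(x))$ as the inverse document frequency is an assumption. The final mapping from weight to cost is likewise hypothetical and not the patent's formula.

```python
import math

def tf_idf_weight(n_seq_with_x, n_seq, n_total, n_x):
    """TF-IDF style weight of inserting event x into sequence Seq.

    n_seq_with_x : N(Seq)_x, occurrences of Seq that include x
    n_seq        : N(Seq), occurrences of Seq with and without x
    n_total      : N, total number of medical sequences
    n_x          : N(x), number of sequences containing x
    """
    if n_seq <= 0 or n_total <= 0 or n_x <= 0:
        raise ValueError("N(Seq), N and N(x) must be positive")
    tf = n_seq_with_x / n_seq        # ASSUMED term-frequency reading
    idf = math.log(n_total / n_x)    # ASSUMED inverse-document-frequency reading
    return tf * idf

def insertion_cost(n_seq_with_x, n_seq, n_total, n_x):
    """HYPOTHETICAL mapping: a higher TF-IDF weight gives a cheaper insertion."""
    return 1.0 / (1.0 + tf_idf_weight(n_seq_with_x, n_seq, n_total, n_x))

# Hypothetical counts: the same sequence context, two different insertions.
common_insertion = insertion_cost(8, 10, 100, 10)
rare_insertion = insertion_cost(1, 10, 100, 10)
```

Under this assumed mapping, an insertion that often accompanies the sequence is penalized less than one that rarely does, and a skip cost could be defined analogously.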

Next, the PNetReplyer plug-in in the ProM process mining software aligns the actual diagnosis and treatment sequence with the standard process model, and the position and degree of deviation of each anomaly are finally determined.

In practical use, taking macrovascular disease as an example, the implementation proceeds as follows:

1. Collection, cleaning, and preprocessing of the electronic medical records of patients with macrovascular disease:

Collect and organize the electronic medical record data of all patients with macrovascular disease, and select each patient's admission data, hospitalization data, medical history data, examination data, and doctor's order data from the records.

The data are then cleaned and preprocessed. This includes unifying the names of similar items or diagnosis and treatment operations, excluding clinical pathways that were entered or exited midway as well as cases where treatment was ineffective, and merging identical operations or doctor's orders issued at the same time.
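The three cleaning rules above can be sketched roughly as follows. The record fields (`patient_id`, `time`, `action`, `pathway_complete`, `outcome`), the alias table, and the sample records are all hypothetical, since the text does not specify a data schema.

```python
# HYPOTHETICAL alias table; a real one would come from a terminology review.
ALIASES = {"computed tomography": "CT", "CT scan": "CT"}
EXCLUDED_OUTCOMES = {"ineffective"}

def clean_orders(records):
    """Apply the three cleaning rules to a list of order records (dicts)."""
    seen = set()
    cleaned = []
    for rec in records:
        # Rule 2: drop pathways entered or exited midway and ineffective cases.
        if not rec["pathway_complete"] or rec["outcome"] in EXCLUDED_OUTCOMES:
            continue
        # Rule 1: unify the naming of similar items or operations.
        action = ALIASES.get(rec["action"], rec["action"])
        # Rule 3: merge identical operations issued at the same time.
        key = (rec["patient_id"], rec["time"], action)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**rec, "action": action})
    return cleaned

records = [
    {"patient_id": 1, "time": "2021-01-01T08:00", "action": "CT scan",
     "pathway_complete": True, "outcome": "cured"},
    {"patient_id": 1, "time": "2021-01-01T08:00", "action": "computed tomography",
     "pathway_complete": True, "outcome": "cured"},
    {"patient_id": 2, "time": "2021-01-02T09:00", "action": "CT",
     "pathway_complete": False, "outcome": "improved"},
    {"patient_id": 3, "time": "2021-01-03T10:00", "action": "CT",
     "pathway_complete": True, "outcome": "ineffective"},
]
cleaned = clean_orders(records)
```

Here the two records for patient 1 collapse into one after name unification, and patients 2 and 3 are excluded.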

2. Patient type classification:

(1) Following fuzzy formal concept analysis, first construct the fuzzy formal context. Organize the information along several dimensions: the patient's personal information, examination data, personal and family medical history, the different diagnosis data, and the outcome after diagnosis and treatment. The attributes of the fuzzy formal context are set according to this information.

(2) The specific information of each patient is converted into a membership degree for each attribute, and the membership degrees of all attributes must be normalized.

(3) Simplify the fuzzy formal context by setting a suitable threshold on the membership degree of each attribute and merging similar patients. The membership degrees of an attribute can be partitioned on the basis of expert experience. Alternatively, the variation of the attribute's membership degree can be fitted to a normal distribution from historical data and a suitable confidence interval chosen (for example, an 80% confidence level with a one-sided or two-sided interval). Membership degrees outside the interval are set to 0 and those inside are set to 1.

(4) Construct the concept lattice. The granularity of patient classification can be controlled by choosing the level of the concept lattice.
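Steps (2) and (3) can be sketched as follows, under stated assumptions: min-max scaling is assumed for the normalization, and a two-sided 80% interval of a fitted normal distribution is used for the threshold. Grouping patients with identical binarized rows stands in for merging similar patients. The sketch does not build the concept lattice of step (4), and all names and values are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist, mean, pstdev

def normalize(values):
    """ASSUMED min-max scaling of one attribute to membership degrees in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def binarize(memberships, confidence=0.80):
    """Set degrees inside the two-sided interval to 1, outside to 0."""
    mu, sigma = mean(memberships), pstdev(memberships)
    if sigma == 0:
        return [1 for _ in memberships]  # no spread, so nothing falls outside
    dist = NormalDist(mu, sigma)
    tail = (1.0 - confidence) / 2.0
    low, high = dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)
    return [1 if low <= m <= high else 0 for m in memberships]

def group_patients(context):
    """Merge patients whose binarized attribute rows are identical."""
    groups = defaultdict(list)
    for patient_id, row in context.items():
        groups[tuple(row)].append(patient_id)
    return dict(groups)

# Hypothetical single attribute measured for five patients.
raw = [50, 52, 51, 49, 90]
flags = binarize(normalize(raw))
context = {pid: (flag,) for pid, flag in zip("ABCDE", flags)}
groups = group_patients(context)
```

With these values the fifth patient falls outside the interval and forms a separate group from the other four.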

3. Doctor's order data clustering module:

Similar patients have similar daily diagnosis and treatment plans. The doctor's order data are therefore first clustered using the multi-grain topic model M-GTM (Multi-Grain Topic Model). To improve the precision of the topic clustering, a small amount of prior knowledge about the classification of doctor's order data can be added. The topic-clustered doctor's order data are then clustered by day according to similarity using the K-means++ algorithm, which reduces the difficulty of the compliance assessment. If the number of topics in the doctor's order data is t, the similarity between patient i on day m and patient j on day n is computed as described for step S3 above.
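The by-day clustering step can be sketched as below: a small standard-library K-means with K-means++ seeding, applied to hypothetical daily topic vectors. Squared Euclidean distance is assumed here in place of the similarity measure, and the function names, the seed, and the sample vectors are illustrative.

```python
import random

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: later centers are drawn with probability
    proportional to the squared distance to the nearest chosen center."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # every point coincides with an existing center
            centers.append(list(rng.choice(points)))
            continue
        r, acc = rng.uniform(0, total), 0.0
        for p, d in zip(points, d2):
            acc += d
            if d > 0 and acc >= r:
                centers.append(list(p))
                break
        else:  # guard against floating-point round-off
            centers.append(list(points[-1]))
    return centers

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each day vector goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical weighted topic vectors, one per patient-day.
day_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centers = kmeans(day_vectors, k=2)
```

Each resulting label can then replace the raw set of orders for that patient-day, so a patient's stay becomes a short sequence of day-cluster labels.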

4. Diagnosis and treatment process model mining:

Fuse the clustered doctor's order data with surgery data and other time-series medical operation data and, by subdivided patient category, take the outcome after diagnosis and treatment into account. In the ProM process mining software, select the Imf (Inductive Miner-frequent) process discovery algorithm and the data of the "cured" and "improved" patient categories to mine a relatively more effective diagnosis and treatment process model.
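The sketch below is not the Imf algorithm. It only illustrates the data preparation described above: selecting the "cured" and "improved" traces of one patient class and counting directly-follows relations, with a simple frequency filter for rare edges. The field names, the `min_ratio` threshold, and the sample cases are hypothetical.

```python
from collections import Counter

def select_traces(cases, patient_class, outcomes=("cured", "improved")):
    """Keep the traces of one patient class with a favorable outcome."""
    return [c["trace"] for c in cases
            if c["class"] == patient_class and c["outcome"] in outcomes]

def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def frequent_edges(dfg, min_ratio=0.5):
    """Drop edges whose count is below min_ratio times the largest count."""
    if not dfg:
        return {}
    top = max(dfg.values())
    return {edge: n for edge, n in dfg.items() if n >= min_ratio * top}

cases = [
    {"class": "A", "outcome": "cured",
     "trace": ["admit", "CT", "surgery", "discharge"]},
    {"class": "A", "outcome": "improved",
     "trace": ["admit", "CT", "surgery", "discharge"]},
    {"class": "A", "outcome": "cured",
     "trace": ["admit", "CT", "antibiotic", "surgery", "discharge"]},
    {"class": "A", "outcome": "unchanged", "trace": ["admit", "discharge"]},
    {"class": "B", "outcome": "cured", "trace": ["admit", "discharge"]},
]
dfg = directly_follows(select_traces(cases, "A"))
main_flow = frequent_edges(dfg)
```

In this example the rare antibiotic detour is filtered out, leaving the admit, CT, surgery, discharge flow as the dominant path for class A.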

5. Discovery of abnormal medical behavior:

(1) Once the frequency of the specific medical behaviors related to macrovascular disease has been calculated, a process model cost function based on TF-IDF weighting is used to quantify the cost of medical behaviors such as insertions or skips. Taking an event x inserted into a medical sequence Seq as an example, the insertion cost Cost(x) is computed as described for step S5 above.

(2) Using the PNetReplyer plug-in in the ProM software, the actual diagnosis and treatment sequence is aligned with the mined standard process model on the basis of this cost function, and the position and degree of deviation of each anomaly are finally determined.
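To illustrate what cost-based alignment produces, the sketch below aligns an actual sequence against a single reference trace by dynamic programming. This is a deliberate simplification and not the ProM plug-in: a mined process model generally admits many traces, whereas one linear trace is assumed here. The cost functions and sample sequences are hypothetical.

```python
def align(actual, reference, insert_cost, skip_cost):
    """Return (total_cost, deviations) for the cheapest alignment.

    An 'inserted' deviation is an event in the actual sequence with no
    counterpart in the reference. A 'skipped' deviation is a reference
    event missing from the actual sequence.
    """
    n, m = len(actual), len(reference)
    # dp[i][j] = minimal cost of aligning actual[:i] with reference[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + insert_cost(actual[i - 1])
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + skip_cost(reference[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(dp[i - 1][j] + insert_cost(actual[i - 1]),
                       dp[i][j - 1] + skip_cost(reference[j - 1]))
            if actual[i - 1] == reference[j - 1]:
                best = min(best, dp[i - 1][j - 1])  # matching events cost nothing
            dp[i][j] = best
    # Backtrack to recover where the deviations occur.
    deviations = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and actual[i - 1] == reference[j - 1]
                and dp[i][j] == dp[i - 1][j - 1]):
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + insert_cost(actual[i - 1]):
            deviations.append(("inserted", i - 1, actual[i - 1]))
            i -= 1
        else:
            deviations.append(("skipped", j - 1, reference[j - 1]))
            j -= 1
    deviations.reverse()
    return dp[n][m], deviations

# Hypothetical sequences with unit costs for both deviation types.
actual = ["admit", "CT", "antibiotic", "surgery", "discharge"]
reference = ["admit", "CT", "surgery", "rehab", "discharge"]
cost, deviations = align(actual, reference, lambda e: 1.0, lambda e: 1.0)
```

The total cost plays the role of the degree of deviation, and the deviation list gives the positions: index 2 of the actual sequence for the inserted event and index 3 of the reference for the skipped one.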

The invention can of course have many other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art can make various corresponding changes and modifications based on the invention, and all such changes and modifications fall within the scope of protection of the appended claims.

Claims

1. An unsupervised method for assessing medical behavior compliance based on electronic medical records, characterized by comprising the following steps:

S1. Collect, clean, and preprocess case data. The case data include the patient's personal information, admission data, medical history data, examination data, diagnosis data, diagnosis and treatment results, medical operation data, and hospitalization data. The medical operation data include doctor's order data and surgery data, both of which are in time-series form;

S2. Classify the patients' case data according to personal information, medical history data, examination data, diagnosis data, and diagnosis and treatment results, and construct fuzzy concepts with similar indicator values;

S3. Cluster the patients' doctor's order data;

S4. Fuse the clustered doctor's order data with the surgery data and, according to the fuzzy concept category to which each patient belongs and the patient's outcome after diagnosis and treatment, mine a diagnosis and treatment process model;

S5. Define a custom cost function for the diagnosis and treatment process model, and align the actual diagnosis and treatment sequence with the mined process model on the basis of this cost function, so as to locate anomalies and calculate their degree of deviation.
2. The unsupervised method for assessing medical behavior compliance based on electronic medical records according to claim 1, characterized in that: in step S2, fuzzy formal concept analysis is used; each historical patient with a complete clinical pathway is treated as an object of the fuzzy formal context, and each type of indicator is treated as an attribute of the fuzzy formal context; the values of the formal context are normalized, a threshold is set for each attribute, and patients with similar diseases are merged, so that the fuzzy formal context can be simplified; fuzzy concepts are constructed, each representing a specific patient group with similar indicator values.
3. The unsupervised method for assessing medical behavior compliance based on electronic medical records according to claim 1, characterized in that: in step S3, the doctor's order data are first clustered using a multi-grain topic model, and the topic-clustered doctor's order data are then clustered by day using the K-means++ algorithm, which reduces the difficulty of the compliance assessment,

if the number of topics in the doctor's order data is t, the similarity between patient i on day m and patient j on day n is described as follows:

$$\mathrm{Dis}_{i,m} = (p_{im1}k_1,\; p_{im2}k_2,\; \ldots,\; p_{imt}k_t) \tag{2}$$

where $\mathrm{Dis}_{i,m}$ denotes the topic probability distribution of the t-dimensional topic vector of patient i on day m, p denotes a topic probability, and k denotes the weight of the corresponding topic.
4. The unsupervised method for assessing medical behavior compliance based on electronic medical records according to claim 1, characterized in that: in step S4, the Imf process discovery algorithm in the ProM process mining software is used to mine the diagnosis and treatment process model from the subdivided data of patients whose outcome is "cured" or "improved".
5. The unsupervised method for assessing medical behavior compliance based on electronic medical records according to claim 1, characterized in that: in step S5, once the frequency of specific medical behaviors has been calculated, a process model cost function based on TF-IDF weighting is used to quantify the cost of medical behaviors that are inserted or skipped; the PNetReplyer plug-in in the ProM process mining software aligns the actual diagnosis and treatment sequence with the standard process model, and the position and degree of deviation of each anomaly are determined.
6. The unsupervised method for assessing medical behavior compliance based on electronic medical records according to claim 5, characterized in that: in step S5, when an event x is inserted into a medical sequence Seq, the insertion cost Cost(x) is described as follows:

where N is the total number of medical sequences in the sample, $N(\mathrm{Seq})$ is the total number of occurrences of the medical sequence Seq both with and without the inserted event x, $N(\mathrm{Seq})_x$ is the number of occurrences of Seq that include the inserted event x, and $N(x)$ is the number of medical sequences that contain the inserted event x.