WO2022222026A1 - Missing medical diagnosis data imputation method and apparatus, electronic device and medium - Google Patents

Missing medical diagnosis data imputation method and apparatus, electronic device and medium

Info

Publication number
WO2022222026A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
model
completion
sample point
initial
Prior art date
Application number
PCT/CN2021/088359
Other languages
English (en)
French (fr)
Inventor
苗晓晔
尹建伟
吴洋洋
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学
Priority to US17/874,230 (US20220367057A1)
Publication of WO2022222026A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 - Design, administration or maintenance of databases
    • G06F16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 - ICT specially adapted for the handling or processing of medical references
    • G16H70/60 - ICT specially adapted for the handling or processing of medical references relating to pathologies
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0475 - Generative networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 - ICT specially adapted for the handling or processing of medical references
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • the present invention relates to database completion technology, and in particular to a method and apparatus for completing missing data in medical diagnosis, an electronic device, and a medium.
  • Missing data is a common problem faced by medical diagnostic data.
  • the main reasons for missing data can include:
  • the purpose of the present invention is to provide a method and apparatus for completing missing data in medical diagnosis, an electronic device, and a medium, so as to solve the problem that traditional completion methods struggle to handle missing medical diagnosis data, to complete medical diagnosis data with missing values effectively, and to improve the integrity of medical data as far as possible.
  • an embodiment of the present invention provides a method for completing missing data in medical diagnosis, including:
  • the chain rule is used to calculate the influence of the sample points in the candidate sample point data on the prediction result of the generative adversarial network initial completion model
  • the generative adversarial network completion model is used to complete the missing medical diagnosis data to be completed.
  • an embodiment of the present invention provides a device for completing missing data in medical diagnosis, including:
  • an acquisition module configured to acquire original data with missing data, wherein the original data is a medical diagnosis dataset with missing data
  • a building module for randomly dividing the original data into initial sample point data and candidate sample point data, and using the initial sample point data to construct and train a generative adversarial network initial completion model
  • a parameter estimation module used for estimating, by using the influence function, the change that the sample points in the candidate sample point data cause in the parameters of the generative adversarial network initial completion model
  • an influence evaluation module configured to calculate the influence of the sample points in the candidate sample point data on the prediction result of the generative adversarial network initial completion model by using the chain rule on the basis of changes in model parameters
  • a result prediction module used for estimating the prediction result of the generative adversarial network initial completion model by using the influence
  • a sampling module used for sampling the most influential sample points in the candidate sample point data by using the binary search algorithm, and for further iteratively optimizing the generative adversarial network initial completion model to obtain the generative adversarial network completion model;
  • a generating module used for completing the missing medical diagnosis data to be completed by using the trained generative adversarial network completion model.
  • an embodiment of the present invention provides a device, including:
  • one or more processors
  • memory for storing one or more programs
  • the one or more programs when executed by the one or more processors, cause the one or more processors to implement the method as described in the first aspect.
  • an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method described in the first aspect is implemented.
  • the embodiment of the present invention constructs and trains a generative adversarial network initial completion model; using the influence function, the present invention estimates the change that the sample points in the candidate sample point data cause in the parameters of the generative adversarial network initial completion model.
  • on the basis of the change in model parameters, the present invention uses the chain rule to calculate the influence of the sample points in the candidate sample point data on the prediction result of the generative adversarial network initial completion model; the present invention uses the influence of the sample points to estimate the prediction result of the adversarial network completion model; the present invention uses the binary search algorithm to sample the most influential sample points in the candidate sample point data, and further iteratively optimizes the generative adversarial network initial completion model to obtain the generative adversarial network completion model.
  • the generative adversarial network completion model realizes the completion of missing data in medical diagnosis.
  • while maintaining the completion accuracy of the model, the completion method, by sampling the most influential sample points, can greatly reduce the training samples and training time required by the model, and greatly enhance the practicability of the completion model and its efficiency in handling large-scale missing data.
  • FIG. 1 is a flowchart of a method for completing missing data in medical diagnosis according to an embodiment of the present invention
  • FIG. 2 is a block diagram of an influence function evaluation method for sample point data according to an embodiment of the present invention
  • FIG. 3 is a block diagram of an apparatus for completing missing data in medical diagnosis according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a method for completing missing data in medical diagnosis according to an embodiment of the present invention, and the method includes the following steps:
  • Step S100 Acquire original data with missing data, wherein the original data is a medical diagnosis data set with missing data.
  • the medical diagnosis data set with missing data may specifically be data collected from medical instruments such as artificial ventilators, heart sound sensors, hemoglobin meters, etc.
  • data is missing from the medical diagnosis data because a failure of the medical diagnosis instrument causes diagnosis data to be omitted.
  • Step S200 randomly dividing the original data into initial sample point data and candidate sample point data, and using the initial sample point data to construct and train a generative adversarial network initial completion model; this step may include the following substeps:
  • Step S201 According to the obtained original data X, a missing matrix M corresponding to the data missing state in the original data X is calculated, wherein if a feature of the original data X is present, the missing state of the corresponding position in the missing matrix M is 1, and if a feature of the data matrix X is missing, the missing state of the corresponding position in the missing matrix M is 0;
  • Step S202 Divide the original data X into initial sample point data X_0 and candidate sample point data X_c.
  • Step S203 According to the initial sample point data X_0, construct and train the generative adversarial network initial completion model.
  • the generative adversarial network initial completion model includes a generator model G and a discriminator model D
  • the generator model G is used to perform data completion on the initial sample point data X_0 and to input the completed data to the discriminator model D
  • the discriminator model D is used to distinguish the completed data from the initial sample point data X_0 to the greatest extent.
  • the generator model and the discriminator model are both deep neural network structures composed of various activation functions.
  • the model parameters of the current discriminator model D are fixed, and the generator model G is trained according to the autoencoder loss function in the generator model G and the discriminator model D's feedback on its discrimination of the data generated by the generator model G.
  • the training process of the generator model G is therefore described as follows:
  • the generator model G is trained by minimizing its loss function, giving the current optimal generator model parameters.
  • the discriminator model D judges the probability that each feature in all samples belongs to the real feature. Therefore, the calculation formula of the loss function of the discriminator model D is as follows:
  • the discriminator model D is trained by minimizing its loss function, giving the current optimal discriminator model parameters.
  • the training strategies of the generator model and the discriminator model are repeated using batch training until the maximum number of iterations of the model is reached, which finally gives the generative adversarial network initial completion model.
  • Step S300 using the influence function, estimate the change that the sample points in the candidate sample point data cause in the parameters of the generative adversarial network initial completion model.
  • FIG. 2 is a block diagram of the influence function evaluation method of sample point data according to the present invention.
  • Step S400 on the basis of the change of the model parameters, using the chain rule to calculate the influence of the sample points in the candidate sample point data on the prediction result of the generative adversarial network initial completion model.
  • the chain rule is used to calculate the influence of sample points, that is, the change in the prediction loss of the initial completion model on the validation set H:
  • Step S500 using the influence to estimate the prediction result of the generative adversarial network initial completion model.
  • Step S600 using a binary search algorithm to sample the most influential sample points in the candidate sample point data, and further iteratively optimize the generative adversarial network initial completion model to obtain a generative adversarial network completion model.
  • the prediction loss on the validation set H of the model trained on the sampled set is guaranteed at the same time, i.e.
  • the initial completion model of the generative adversarial network is further iteratively optimized to obtain the generative adversarial network completion model
  • Step S700 using the generative adversarial network completion model to complete the missing data in medical diagnosis.
  • the embodiment of the present invention constructs and trains the generative adversarial network initial completion model; using the influence function, the present invention estimates the change that the sample points in the candidate sample point data cause in the parameters of the generative adversarial network initial completion model.
  • on the basis of the change in model parameters, the present invention uses the chain rule to calculate the influence of sample points in the candidate sample point data on the prediction results of the generative adversarial network initial completion model; the present invention uses the influence of the sample points to estimate the prediction result of the adversarial network completion model; the present invention uses the binary search algorithm to sample the most influential sample points in the candidate sample point data, and further iteratively optimizes the generative adversarial network initial completion model to obtain the generative adversarial network completion model.
  • the generative adversarial network completion model realizes the completion of missing data in medical diagnosis.
  • while maintaining the completion accuracy of the model, the completion method, by sampling the most influential sample points, can greatly reduce the training samples and training time required by the model, and greatly enhance the practicability of the completion model and its efficiency in handling large-scale missing data.
  • the present application also provides an embodiment of an apparatus for completing missing data in medical diagnosis.
  • Fig. 3 is a block diagram of a device for completing missing data in medical diagnosis according to an exemplary embodiment.
  • the device includes:
  • an acquisition module 91 configured to acquire original data with missing data, wherein the original data is a medical diagnosis dataset with missing data;
  • a construction module 92 is used to randomly divide the original data into initial sample point data and candidate sample point data, and use the initial sample point data to construct and train a generative adversarial network initial completion model;
  • the parameter estimation module 93 is used for estimating, by using the influence function, the change that the sample points in the candidate sample point data cause in the parameters of the generative adversarial network initial completion model;
  • the influence evaluation module 94 is used to calculate the influence of the sample points in the candidate sample point data on the prediction result of the generative adversarial network initial completion model by using the chain rule on the basis of the change of the model parameters;
  • the result prediction module 95 is used for estimating the prediction result of the generative adversarial network initial completion model by using the influence;
  • the sampling module 96 is used for sampling the most influential sample points in the candidate sample point data by using the binary search algorithm, and for further iteratively optimizing the generative adversarial network initial completion model to obtain the generative adversarial network completion model;
  • the generating module 97 is used for completing the missing medical diagnosis data to be completed by using the trained generative adversarial network completion model.
  • the present application also provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the above-mentioned method for completing missing data in medical diagnosis.
  • the present application also provides a computer-readable storage medium on which computer instructions are stored, characterized in that, when the instructions are executed by a processor, the above-mentioned method for complementing missing data in medical diagnosis is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

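A method and apparatus for completing missing medical diagnosis data, an electronic device, and a medium. The method includes: acquiring a medical diagnosis dataset with missing data; randomly dividing the original data into initial sample point data and candidate sample point data, and using the initial sample point data to construct and train a generative adversarial network initial completion model; using an influence function to estimate the influence of sample points on the parameters and on the prediction results of the generative adversarial network initial completion model; and using a binary search algorithm to sample the most influential sample points in the candidate sample point data and further iteratively optimizing the generative adversarial network initial completion model, thereby completing the missing medical diagnosis data. The method addresses the missing data and large data volumes found in medical diagnosis data, and offers good completion quality, high efficiency, and strong scalability.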

Description

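Missing medical diagnosis data imputation method and apparatus, electronic device and medium
Technical Field
The present invention relates to database completion technology, and in particular to a method and apparatus for completing missing medical diagnosis data, an electronic device, and a medium.
Background
Missing data is a common problem in medical diagnosis data. The main causes include:
(a) Unstable operation of medical testing instruments: environmental factors on site or human error cause a medical testing instrument to stop working normally during certain periods, resulting in missing data;
(b) Medical monitoring data: during medical monitoring, abnormal readings often arise from limited instrument accuracy, abnormal production fluctuations, and similar causes. Such "bad data" does not reflect actual production conditions and must be removed, and removing it amounts to introducing missing data.
Missing medical data leaves the data incomplete and directly affects later medical diagnosis. The missing values in medical diagnosis data therefore need to be completed to improve data integrity and, in turn, the quality of later analysis of the medical diagnosis data.
As is well known, completing medical diagnosis data that has missing values is an effective way to improve data integrity. However, traditional completion methods have high model complexity and cannot handle medical diagnosis data directly and effectively. Researchers in China and abroad have done some work on missing data completion, but this work still has limitations: (1) the completion quality of existing methods is limited; (2) the methods have high complexity and cannot handle missing data.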
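Summary of the Invention
The purpose of the present invention is to provide a method and apparatus for completing missing medical diagnosis data, an electronic device, and a medium, so as to solve the problem that traditional completion methods struggle to handle missing medical diagnosis data, to complete medical diagnosis data with missing values effectively, and to improve the integrity of medical data as far as possible.
To achieve the above purpose, the present invention adopts the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for completing missing medical diagnosis data, including:
acquiring original data with missing data, wherein the original data is a medical diagnosis dataset with missing data;
randomly dividing the original data into initial sample point data and candidate sample point data, and using the initial sample point data to construct and train a generative adversarial network initial completion model;
using an influence function to estimate the change that the sample points in the candidate sample point data cause in the parameters of the generative adversarial network initial completion model;
on the basis of the change in model parameters, using the chain rule to calculate the influence of the sample points in the candidate sample point data on the prediction result of the generative adversarial network initial completion model;
using the influence to estimate the prediction result of the generative adversarial network initial completion model;
using a binary search algorithm to sample the most influential sample points in the candidate sample point data, and further iteratively optimizing the generative adversarial network initial completion model to obtain a generative adversarial network completion model;
using the generative adversarial network completion model to complete the missing medical diagnosis data to be completed.
In a second aspect, an embodiment of the present invention provides an apparatus for completing missing medical diagnosis data, including:
an acquisition module, configured to acquire original data with missing data, wherein the original data is a medical diagnosis dataset with missing data;
a construction module, configured to randomly divide the original data into initial sample point data and candidate sample point data, and to use the initial sample point data to construct and train a generative adversarial network initial completion model;
a parameter estimation module, configured to use an influence function to estimate the change that the sample points in the candidate sample point data cause in the parameters of the generative adversarial network initial completion model;
an influence evaluation module, configured to use the chain rule, on the basis of the change in model parameters, to calculate the influence of the sample points in the candidate sample point data on the prediction result of the generative adversarial network initial completion model;
a result prediction module, configured to use the influence to estimate the prediction result of the generative adversarial network initial completion model;
a sampling module, configured to use a binary search algorithm to sample the most influential sample points in the candidate sample point data, and to further iteratively optimize the generative adversarial network initial completion model to obtain a generative adversarial network completion model;
a generating module, configured to use the trained generative adversarial network completion model to complete the missing medical diagnosis data to be completed.
In a third aspect, an embodiment of the present invention provides a device, including:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method described in the first aspect.
According to the above technical solutions, the embodiment of the present invention constructs and trains a generative adversarial network initial completion model; using the influence function, it estimates the change that the sample points in the candidate sample point data cause in the parameters of that model; on the basis of the change in model parameters, it uses the chain rule to calculate the influence of those sample points on the prediction result of the model; it uses the sample point influence to estimate the prediction result of the adversarial network completion model; and it uses a binary search algorithm to sample the most influential sample points in the candidate sample point data and further iteratively optimizes the initial completion model to obtain the generative adversarial network completion model, thereby completing the missing medical diagnosis data. While maintaining the completion accuracy of the model, the method, by sampling the most influential sample points, greatly reduces the training samples and training time the model requires, and greatly improves the practicality of the completion model and its efficiency on large-scale missing data.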
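Brief Description of the Drawings
The accompanying drawings described here provide a further understanding of the present invention and form a part of it. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a method for completing missing medical diagnosis data according to an embodiment of the present invention;
FIG. 2 is a block diagram of an influence function evaluation method for sample point data according to an embodiment of the present invention;
FIG. 3 is a block diagram of an apparatus for completing missing medical diagnosis data according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are further described below with reference to the accompanying drawings and specific embodiments.
Embodiment 1
FIG. 1 is a flowchart of a method for completing missing medical diagnosis data according to an embodiment of the present invention. The method includes the following steps:
Step S100: acquire original data with missing data, wherein the original data is a medical diagnosis dataset with missing data.
The medical diagnosis dataset with missing data may specifically be data collected from medical instruments such as artificial ventilators, heart sound sensors, and hemoglobin meters. Data is missing from the medical diagnosis data because a failure of the medical diagnosis instrument causes diagnosis data to be omitted.
Step S200: randomly divide the original data into initial sample point data and candidate sample point data, and use the initial sample point data to construct and train a generative adversarial network initial completion model. This step may include the following substeps:
Step S201: from the acquired original data X, compute the missing matrix M that records the missing state of the data in X: if a feature of the original data X is present, the entry at the corresponding position in M is 1; if a feature of the data matrix X is missing, the entry at the corresponding position in M is 0;
Step S202: divide the original data X into initial sample point data X_0 and candidate sample point data X_c;
Step S203: from the initial sample point data X_0, construct and train the generative adversarial network initial completion model.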
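Steps S201 and S202 can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: it assumes missing entries are encoded as NaN, and the function names, the `initial_fraction` parameter, and the sample values are hypothetical.

```python
import numpy as np

def missing_mask(X):
    """Missing matrix M: 1 where a feature is present, 0 where it is missing (NaN)."""
    return (~np.isnan(X)).astype(float)

def random_split(X, M, initial_fraction=0.2, seed=0):
    """Randomly split rows into initial sample points (X_0) and candidates (X_c).

    initial_fraction is an assumed knob; the source does not state a ratio.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[0])
    n_init = max(1, int(round(initial_fraction * X.shape[0])))
    init_idx, cand_idx = order[:n_init], order[n_init:]
    return (X[init_idx], M[init_idx]), (X[cand_idx], M[cand_idx])

# Toy records with three features each; NaN marks a missing reading.
X = np.array([[36.6, np.nan, 120.0],
              [np.nan, 72.0, 118.0],
              [37.1, 80.0, np.nan],
              [36.9, 76.0, 125.0],
              [38.2, np.nan, np.nan]])
M = missing_mask(X)
(X0, M0), (Xc, Mc) = random_split(X, M, initial_fraction=0.4)
```

Keeping the mask alongside each split means later steps never have to re-derive which entries were observed.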
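Specifically, the generative adversarial network initial completion model includes a generator model G and a discriminator model D. The generator model G completes the initial sample point data X_0 and feeds the completed data to the discriminator model D; the discriminator model D distinguishes, as far as possible, the completed data from the initial sample point data X_0. Both the generator model and the discriminator model are deep neural network structures composed of multiple activation functions.
The training strategies of the generator model and the discriminator model are described below.
Training strategy of the generator model:
The model parameters of the current discriminator model D are fixed, and the generator model G is trained according to the autoencoder loss function in G and the feedback from D on its discrimination of the data generated by G. The training process of the generator model G is therefore as follows:
First, a random Gaussian noise matrix Z is generated with the size of the original data matrix, and Z is used to initialize the data matrix X_0, giving the noise-completed matrix X^(z):
[equation image PCTCN2021088359-appb-000001]
where the symbol [image PCTCN2021088359-appb-000002] denotes element-wise multiplication;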
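The equation for the noise-completed matrix is an image in the source and is not reproduced here. A form that is consistent with the definitions of M, Z, and element-wise multiplication, and that is common in GAN-based imputation, is X^(z) = M ⊙ X_0 + (1 − M) ⊙ Z. Treating that form as an assumption, a minimal sketch looks like this (the noise scale and function name are hypothetical):

```python
import numpy as np

def noise_fill(X0, M0, scale=0.01, seed=0):
    """Assumed form: X_z = M * X_0 + (1 - M) * Z, with Z Gaussian noise."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(0.0, scale, size=X0.shape)
    X_obs = np.nan_to_num(X0, nan=0.0)  # zero out NaNs so the product is defined
    return M0 * X_obs + (1.0 - M0) * Z

X0 = np.array([[1.0, np.nan],
               [np.nan, 4.0]])
M0 = (~np.isnan(X0)).astype(float)
Xz = noise_fill(X0, M0)  # observed entries kept, missing entries replaced by noise
```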
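Next, the noise-completed matrix X^(z) is fed into the generator model G. The loss function of the generator model, [image PCTCN2021088359-appb-000003], consists of the reconstruction loss function L_rec and the discrimination feedback function L_pro of the discriminator model, as shown below.
[equation image PCTCN2021088359-appb-000004]
Here the hyperparameter λ is a trade-off weight for the generator model, [image PCTCN2021088359-appb-000005] denotes the completed matrix output by the generator model G after completing the original data, and [image PCTCN2021088359-appb-000006] denotes the probability, predicted by the discriminator model D, that each feature of every sample in the completed matrix [image PCTCN2021088359-appb-000007] is a real feature.
Finally, the generator model G is trained by minimizing its loss function [image PCTCN2021088359-appb-000008], giving the current optimal generator model parameters.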
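The generator loss itself is an image in the source, so its exact form is not recoverable from the text. The sketch below is one plausible instance under loudly stated assumptions: L_rec is a squared error over observed entries, L_pro is a log-probability term over imputed entries, and λ multiplies the reconstruction term (as in GAIN-style imputation). All names and the default `lam` are hypothetical.

```python
import numpy as np

def generator_loss(X_obs, M, X_bar, D_prob, lam=10.0, eps=1e-8):
    """Hypothetical generator loss: L_pro + lam * L_rec (assumed form)."""
    n_obs = max(M.sum(), 1.0)
    n_mis = max((1.0 - M).sum(), 1.0)
    # L_rec: the generator should reproduce the entries that were observed.
    l_rec = np.sum(M * (X_obs - X_bar) ** 2) / n_obs
    # L_pro: the generator wants D to rate its imputed entries as real.
    l_pro = -np.sum((1.0 - M) * np.log(D_prob + eps)) / n_mis
    return l_pro + lam * l_rec

M = np.array([[1.0, 0.0], [0.0, 1.0]])
X_obs = np.array([[1.0, 0.0], [0.0, 4.0]])   # missing entries zero-filled
X_bar = np.array([[1.0, 2.0], [3.0, 4.0]])   # generator output
D_prob = np.full((2, 2), 0.5)                # discriminator is undecided
loss = generator_loss(X_obs, M, X_bar, D_prob)
worse = generator_loss(X_obs, M, X_bar + 1.0, D_prob)
```

With a perfect reconstruction of the observed entries, only the feedback term contributes; shifting the generator output away from the observed values raises the loss.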
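Training strategy of the discriminator model:
The parameters of the current generator model are fixed, and the completed matrix [image PCTCN2021088359-appb-000009] that the trained generator model G outputs after completing the original data is used as the input of the discriminator model D, which judges the probability that each feature of every sample is a real feature. The loss function of the discriminator model D is therefore computed as follows:
[equation image PCTCN2021088359-appb-000010]
The discriminator model D is trained by minimizing the loss function [image PCTCN2021088359-appb-000011], giving the current optimal discriminator model parameters.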
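The discriminator loss is likewise an image in the source. Because the text says D predicts, per feature, the probability of being real, a natural candidate is an element-wise binary cross-entropy against the missing matrix M. The sketch below assumes exactly that; it is an illustration, not the formula from the patent.

```python
import numpy as np

def discriminator_loss(M, D_prob, eps=1e-8):
    """Assumed form: binary cross-entropy between D's output and the mask M."""
    return -np.mean(M * np.log(D_prob + eps)
                    + (1.0 - M) * np.log(1.0 - D_prob + eps))

M = np.array([[1.0, 0.0], [0.0, 1.0]])
# A discriminator that rates observed entries 0.99 and imputed entries 0.01.
confident = discriminator_loss(M, 0.98 * M + 0.01)
# A discriminator that cannot tell the two apart.
undecided = discriminator_loss(M, np.full((2, 2), 0.5))
```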
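The training strategies of the generator model and the discriminator model are repeated with batch training until the maximum number of iterations of the model is reached, finally giving the generative adversarial network initial completion model.
Step S300: using the influence function, estimate the change that the sample points in the candidate sample point data cause in the parameters of the generative adversarial network initial completion model. FIG. 2 is a block diagram of the influence function evaluation method for sample point data of the present invention.
Specifically, the influence function [image PCTCN2021088359-appb-000012] is used to compute the change in the parameters of the initial completion model when each sample x is added to the initial training set:
[equation image PCTCN2021088359-appb-000013]
where [image PCTCN2021088359-appb-000014] denotes the Hessian matrix of the model, and [image PCTCN2021088359-appb-000015] denotes the gradient of the model loss function evaluated at the sample point x.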
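The parameter-change equation is an image in the source, but the quantities it names (a Hessian and a per-sample loss gradient) match the standard influence-function expression, in which adding a sample with weight 1/n shifts the parameters by roughly −(1/n) H⁻¹ ∇L(x, θ). The sketch below checks that approximation on a toy least-squares model rather than a GAN, because retraining can then be done exactly; the model, data, and names are all assumptions made for illustration.

```python
import numpy as np

def param_change(H, grad_x, n):
    """First-order parameter change from adding one sample with weight 1/n."""
    return -np.linalg.solve(H, grad_x) / n

rng = np.random.default_rng(0)
n, d = 200, 3
A = rng.normal(size=(n, d))                       # initial training inputs
y = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

theta = np.linalg.solve(A.T @ A, A.T @ y)         # fit on the initial set
H = A.T @ A / n                                   # Hessian of the mean 1/2 squared error

x_new, y_new = np.array([1.0, 0.5, -0.5]), 3.0    # candidate sample point
grad_new = x_new * (x_new @ theta - y_new)        # gradient of 1/2 (x.theta - y)^2

delta_est = param_change(H, grad_new, n)

# Exact change obtained by actually refitting with the candidate included.
theta_refit = np.linalg.solve(A.T @ A + np.outer(x_new, x_new),
                              A.T @ y + x_new * y_new)
delta_exact = theta_refit - theta
```

For this toy model the estimate and the exact refit differ only by a factor close to one, which is the point of the technique: the effect of a candidate is estimated without retraining.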
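Step S400: on the basis of the change in model parameters, use the chain rule to calculate the influence of the sample points in the candidate sample point data on the prediction result of the generative adversarial network initial completion model.
Specifically, on the basis of the change in model parameters, the chain rule is used to calculate the sample point influence [image PCTCN2021088359-appb-000016], that is, the change in the prediction loss of the initial completion model on the validation set H:
[equation image PCTCN2021088359-appb-000017]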
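The chain-rule step can be written out under the standard influence-function formulation. The notation below is assumed, since the source equations are images: θ̂ denotes the trained parameters, **H** with subscript θ̂ the Hessian of the training loss (bold, to distinguish it from the validation set H), and L(H, θ̂) the loss on the validation set.

```latex
\mathcal{I}_{\text{params}}(x) = -\,\mathbf{H}_{\hat\theta}^{-1}\,\nabla_\theta L(x,\hat\theta),
\qquad
\mathcal{I}_{\text{loss}}(x)
  = \nabla_\theta L(H,\hat\theta)^{\top}\,\mathcal{I}_{\text{params}}(x)
  = -\,\nabla_\theta L(H,\hat\theta)^{\top}\,\mathbf{H}_{\hat\theta}^{-1}\,\nabla_\theta L(x,\hat\theta)
```

Under this sign convention a negative influence means that adding x is expected to lower the validation loss.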
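Step S500: use the influence to estimate the prediction result of the generative adversarial network initial completion model.
Specifically, the influences [image PCTCN2021088359-appb-000018] of all sample points are used to estimate the prediction loss of the completion model on the validation set H when all data sample points are used for training:
[equation image PCTCN2021088359-appb-000019]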
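The estimate in step S500 is an image in the source. One first-order reading, assumed here, is that the validation loss after training on all samples is approximately the current loss plus the summed influences scaled by 1/n. The function name and the 1/n scaling are hypothetical.

```python
import numpy as np

def estimated_val_loss(base_loss, influences, n_total):
    """First-order estimate: base loss plus summed influences, each weighted 1/n."""
    return base_loss + np.sum(influences) / n_total

# Toy numbers: two helpful candidates and one harmful one.
est = estimated_val_loss(1.0, np.array([-10.0, -20.0, 5.0]), n_total=100)
```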
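Step S600: use a binary search algorithm to sample the most influential sample points in the candidate sample point data, and further iteratively optimize the generative adversarial network initial completion model to obtain the generative adversarial network completion model.
Specifically, a binary search algorithm is used to retrieve the smallest set of most influential sample points [image PCTCN2021088359-appb-000020], while also guaranteeing the prediction loss on the validation set H of the model trained on [image PCTCN2021088359-appb-000021], namely
[equation image PCTCN2021088359-appb-000022]
On this basis, the generative adversarial network initial completion model is further iteratively optimized to obtain the generative adversarial network completion model;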
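The constraint that the binary search enforces is an image in the source, so the sketch below makes its own assumptions explicit: candidates are ranked by influence (most loss-reducing first), only candidates with negative influence are considered, and the search returns the smallest prefix whose first-order estimated validation loss reaches a hypothetical `target_loss`. Restricting to negative influences makes the estimated loss non-increasing in the prefix length, which is what makes binary search valid.

```python
import numpy as np

def smallest_influential_subset(influences, base_loss, target_loss, n_total):
    """Indices of the smallest top-influence prefix whose estimated loss meets the target."""
    order = np.argsort(influences)                 # most loss-reducing first
    helpful = order[influences[order] < 0]         # drop points estimated to hurt
    prefix = np.concatenate(([0.0], np.cumsum(influences[helpful]))) / n_total
    est = base_loss + prefix                       # est[k]: loss using the top-k points
    if est[-1] > target_loss:
        return helpful                             # target unreachable: use every helpful point
    lo, hi = 0, len(helpful)
    while lo < hi:                                 # binary search over the prefix length
        mid = (lo + hi) // 2
        if est[mid] <= target_loss:
            hi = mid
        else:
            lo = mid + 1
    return helpful[:lo]

influences = np.array([-5.0, 2.0, -30.0, -1.0, -10.0])
selected = smallest_influential_subset(influences, base_loss=1.0,
                                       target_loss=0.65, n_total=100)
everything = smallest_influential_subset(influences, base_loss=1.0,
                                         target_loss=0.1, n_total=100)
```

In the toy data, two of the five candidates are enough to reach the target, so the other three never need to enter training.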
Step S700: use the generative adversarial network completion model to complete the missing medical diagnosis data.
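Step S700 can be illustrated as a final fill-in pass. The sketch assumes the usual imputation convention that observed values are kept and only missing entries are taken from the generator output; the source does not spell this out. The `column_mean_generator` is a hypothetical stand-in for the trained generator G, used only so the example runs.

```python
import numpy as np

def complete(X, M, generator):
    """Keep observed entries; fill missing entries from the generator output."""
    X_obs = np.nan_to_num(X, nan=0.0)
    X_bar = generator(X_obs, M)
    return M * X_obs + (1.0 - M) * X_bar

def column_mean_generator(X_obs, M):
    """Stand-in for a trained generator: predicts each column's observed mean."""
    means = X_obs.sum(axis=0) / np.maximum(M.sum(axis=0), 1.0)
    return np.broadcast_to(means, X_obs.shape)

X = np.array([[1.0, np.nan],
              [3.0, 4.0]])
M = (~np.isnan(X)).astype(float)
X_hat = complete(X, M, column_mean_generator)
```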
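As the above embodiment shows, the embodiment of the present invention constructs and trains a generative adversarial network initial completion model; using the influence function, it estimates the change that the sample points in the candidate sample point data cause in the parameters of that model; on the basis of the change in model parameters, it uses the chain rule to calculate the influence of those sample points on the prediction result of the model; it uses the sample point influence to estimate the prediction result of the adversarial network completion model; and it uses a binary search algorithm to sample the most influential sample points in the candidate sample point data and further iteratively optimizes the initial completion model to obtain the generative adversarial network completion model, thereby completing the missing medical diagnosis data. While maintaining the completion accuracy of the model, the method, by sampling the most influential sample points, greatly reduces the training samples and training time the model requires, and greatly improves the practicality of the completion model and its efficiency on large-scale missing data.
Corresponding to the foregoing embodiment of a method for completing missing medical diagnosis data, the present application also provides an embodiment of an apparatus for completing missing medical diagnosis data.
FIG. 3 is a block diagram of an apparatus for completing missing medical diagnosis data according to an exemplary embodiment. Referring to FIG. 3, the apparatus includes:
an acquisition module 91, configured to acquire original data with missing data, wherein the original data is a medical diagnosis dataset with missing data;
a construction module 92, configured to randomly divide the original data into initial sample point data and candidate sample point data, and to use the initial sample point data to construct and train a generative adversarial network initial completion model;
a parameter estimation module 93, configured to use an influence function to estimate the change that the sample points in the candidate sample point data cause in the parameters of the generative adversarial network initial completion model;
an influence evaluation module 94, configured to use the chain rule, on the basis of the change in model parameters, to calculate the influence of the sample points in the candidate sample point data on the prediction result of the generative adversarial network initial completion model;
a result prediction module 95, configured to use the influence to estimate the prediction result of the generative adversarial network initial completion model;
a sampling module 96, configured to use a binary search algorithm to sample the most influential sample points in the candidate sample point data, and to further iteratively optimize the generative adversarial network initial completion model to obtain a generative adversarial network completion model;
a generating module 97, configured to use the trained generative adversarial network completion model to complete the missing medical diagnosis data to be completed.
For the apparatus in the above embodiment, the specific manner in which each module operates has been described in detail in the method embodiment and is not repeated here.
Since the apparatus embodiment basically corresponds to the method embodiment, refer to the relevant parts of the method embodiment description. The apparatus embodiment described above is merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this application. A person of ordinary skill in the art can understand and implement it without creative effort.
Correspondingly, the present application also provides an electronic device, including: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method for completing missing medical diagnosis data described above.
Correspondingly, the present application also provides a computer-readable storage medium on which computer instructions are stored, characterized in that the instructions, when executed by a processor, implement the method for completing missing medical diagnosis data described above.
The above are preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.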

Claims (10)

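  1. A method for completing missing medical diagnosis data, characterized by comprising:
    acquiring original data with missing data, wherein the original data is a medical diagnosis dataset with missing data;
    randomly dividing the original data into initial sample point data and candidate sample point data, and using the initial sample point data to construct and train a generative adversarial network initial completion model;
    using an influence function to estimate the change that the sample points in the candidate sample point data cause in the parameters of the generative adversarial network initial completion model;
    on the basis of the change in model parameters, using the chain rule to calculate the influence of the sample points in the candidate sample point data on the prediction result of the generative adversarial network initial completion model;
    using the influence to estimate the prediction result of the adversarial network initial completion model;
    using a binary search algorithm to sample the most influential sample points in the candidate sample point data, and further iteratively optimizing the generative adversarial network initial completion model to obtain a generative adversarial network completion model;
    using the generative adversarial network completion model to complete the missing medical diagnosis data to be completed.
  2. The method for completing missing medical diagnosis data according to claim 1, characterized in that: the generative adversarial network initial completion model comprises a generator model and a discriminator model; the generator model is used to complete the initial sample point data and to input the completed data to the discriminator model; the discriminator model is used to distinguish, to the greatest extent, the completed data from the initial sample point data.
  3. The method for completing missing medical diagnosis data according to claim 2, characterized in that: the generator model and the discriminator model are both deep neural network structures composed of multiple activation functions.
  4. The method for completing missing medical diagnosis data according to claim 3, characterized in that: the generator model is trained according to the reconstruction loss function in the generator model and the feedback from the discriminator model on its discrimination of the data generated by the generator model.
  5. An apparatus for completing missing medical diagnosis data, characterized by comprising:
    an acquisition module, configured to acquire original data with missing data, wherein the original data is a medical diagnosis dataset with missing data;
    a construction module, configured to randomly divide the original data into initial sample point data and candidate sample point data, and to use the initial sample point data to construct and train a generative adversarial network initial completion model;
    a parameter estimation module, configured to use an influence function to estimate the change that the sample points in the candidate sample point data cause in the parameters of the generative adversarial network initial completion model;
    an influence evaluation module, configured to use the chain rule, on the basis of the change in model parameters, to calculate the influence of the sample points in the candidate sample point data on the prediction result of the generative adversarial network initial completion model;
    a result prediction module, configured to use the influence to estimate the prediction result of the adversarial network initial completion model;
    a sampling module, configured to use a binary search algorithm to sample the most influential sample points in the candidate sample point data, and to further iteratively optimize the generative adversarial network initial completion model to obtain a generative adversarial network completion model;
    a generating module, configured to use the trained generative adversarial network completion model to complete the missing medical diagnosis data to be completed.
  6. The method for completing missing medical diagnosis data according to claim 5, characterized in that: the generative adversarial network initial completion model comprises a generator model and a discriminator model; the generator model is used to complete the initial sample point data and to input the completed data to the discriminator model; the discriminator model is used to distinguish, to the greatest extent, the completed data from the initial sample point data.
  7. The method for completing missing medical diagnosis data according to claim 6, characterized in that: the generator model and the discriminator model are both deep neural network structures composed of multiple activation functions.
  8. The method for completing missing medical diagnosis data according to claim 7, characterized in that: the generator model is trained according to the reconstruction loss function in the generator model and the feedback from the discriminator model on its discrimination of the data generated by the generator model.
  9. An electronic device, characterized by comprising:
    one or more processors;
    a memory for storing one or more programs;
    when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-4.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-4.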
PCT/CN2021/088359 2021-04-19 2021-04-20 Missing medical diagnosis data imputation method and apparatus, electronic device and medium WO2022222026A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/874,230 US20220367057A1 (en) 2021-04-19 2022-07-26 Missing medical diagnosis data imputation method and apparatus, electronic device and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110419669.4A CN113239022B (zh) 2021-04-19 2021-04-19 Missing medical diagnosis data imputation method and apparatus, electronic device and medium
CN202110419669.4 2021-04-19

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/874,230 Continuation US20220367057A1 (en) 2021-04-19 2022-07-26 Missing medical diagnosis data imputation method and apparatus, electronic device and medium

Publications (1)

Publication Number Publication Date
WO2022222026A1 (zh) 2022-10-27

Family

ID=77128424

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088359 WO2022222026A1 (zh) 2021-04-19 2021-04-20 Missing medical diagnosis data imputation method and apparatus, electronic device and medium

Country Status (3)

Country Link
US (1) US20220367057A1 (zh)
CN (1) CN113239022B (zh)
WO (1) WO2022222026A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116844733A (zh) * 2023-08-31 2023-10-03 吉林大学第一医院 Artificial-intelligence-based medical data integrity analysis method

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
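CN116843941A (zh) * 2023-05-15 2023-10-03 北京中润惠通科技发展有限公司 Intelligent analysis system for power equipment inspection data
CN117421548B (zh) * 2023-12-18 2024-03-12 四川互慧软件有限公司 Method and system for handling missing physiological indicator data based on a convolutional neural network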

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
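CN109165664A (zh) * 2018-07-04 2019-01-08 华南理工大学 Method for completing and predicting attribute-missing datasets based on a generative adversarial network
CN111581189A (zh) * 2020-03-27 2020-08-25 浙江大学 Method and apparatus for completing missing air quality detection data
CN111738420A (zh) * 2020-06-24 2020-10-02 莫毓昌 Method for completing and predicting electromechanical equipment state data based on multi-scale sampling
CN112259247A (zh) * 2020-10-22 2021-01-22 平安科技(深圳)有限公司 Adversarial network training and medical data supplementing method, apparatus, device, and medium
CN112529209A (zh) * 2020-12-07 2021-03-19 上海云从企业发展有限公司 Model training method and apparatus, and computer-readable storage medium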

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
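CN103177088B (zh) * 2013-03-08 2016-05-18 北京理工大学 Method for filling in missing biomedical data
US10225277B1 (en) * 2018-05-24 2019-03-05 Symantec Corporation Verifying that the influence of a user data point has been removed from a machine learning classifier
CN109360159A (zh) * 2018-09-07 2019-02-19 华南理工大学 Image completion method based on a generative adversarial network model
CN109815223B (zh) * 2019-01-21 2020-09-25 北京科技大学 Method and apparatus for completing missing industrial monitoring data
CN110414601A (zh) * 2019-07-30 2019-11-05 南京工业大学 Photovoltaic module fault diagnosis method, system, and device based on a deep convolutional adversarial network
CN112286824B (zh) * 2020-11-18 2022-08-02 长江大学 Test case generation method, system, and electronic device based on binary search iteration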

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116844733A (zh) * 2023-08-31 2023-10-03 吉林大学第一医院 Artificial-intelligence-based medical data integrity analysis method
CN116844733B (zh) * 2023-08-31 2023-11-07 吉林大学第一医院 Artificial-intelligence-based medical data integrity analysis method

Also Published As

Publication number Publication date
CN113239022B (zh) 2023-04-07
CN113239022A (zh) 2021-08-10
US20220367057A1 (en) 2022-11-17

Similar Documents

Publication Publication Date Title
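WO2022222026A1 (zh) Missing medical diagnosis data imputation method and apparatus, electronic device and medium
JP2022540634A (ja) Object detection and instance segmentation of 3D point clouds based on deep learning
CN110427654B (zh) Method and system for constructing a landslide prediction model based on sensitive states
CN108960303B (zh) LSTM-based anomaly detection method for unmanned aerial vehicle flight data
CN111581189B (zh) Method and apparatus for completing missing air quality detection data
CN112613584A (zh) Fault diagnosis method, apparatus, device, and storage medium
CN117079815A (zh) Method for constructing a cardiovascular disease risk prediction model based on a graph neural network
CN117011234A (zh) Chromosome abnormality detection system and method based on a denoising diffusion probabilistic model
CN115051929A (zh) Network fault prediction method and apparatus based on a self-supervised target-aware neural network
CN106778252B (zh) Intrusion detection method based on rough set theory and the WAODE algorithm
CN110399279B (zh) Intelligence measurement method for non-human agents
CN116664265A (zh) Data processing method and apparatus, electronic device, and storage medium
CN116243680A (zh) Black-box domain adaptation industrial equipment diagnosis method, system, and storage medium
CN116992380A (zh) Method and apparatus for constructing an anomaly detection model for satellite multidimensional telemetry sequences, and anomaly detection method and apparatus
CN114224354B (zh) Arrhythmia classification method and apparatus, and readable storage medium
EP4012667A2 (en) Data preparation for artificial intelligence models
Shahid et al. Batch renormalization accumulated residual U-network for artifacts removal in photoacoustic imaging
CN115168326A (zh) Distributed energy data cleaning method and system for a Hadoop big data platform
WO2021103623A1 (zh) Sepsis early warning apparatus, device, and storage medium
EP3869458A1 (en) Annular structure representation
CN111444659A (zh) Centrifugal pump fault diagnosis method, system, and medium based on an improved particle filter
CN117493583B (zh) Method and system for generating process operation sequences combining event logs and knowledge graphs
EP4198997A1 (en) A computer implemented method, a method and a system
CN117688496B (zh) Anomaly diagnosis method, system, and device for satellite telemetry multidimensional time-series data
CN115859201B (zh) Chemical process fault diagnosis method and system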

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21937275

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21937275

Country of ref document: EP

Kind code of ref document: A1