CN116089801A - Medical data missing value repairing method based on multiple confidence degrees - Google Patents

Medical data missing value repairing method based on multiple confidence degrees


Publication number
CN116089801A
CN116089801A (application number CN202310031008.3A)
Authority
CN
China
Prior art keywords
attribute
sample
value
missing
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310031008.3A
Other languages
Chinese (zh)
Inventor
范科峰
曾登辉
杨磊
董建
方春燕
苗宗利
刘立新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
China Electronics Standardization Institute
Original Assignee
Guilin University of Electronic Technology
China Electronics Standardization Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology, China Electronics Standardization Institute filed Critical Guilin University of Electronic Technology
Priority to CN202310031008.3A priority Critical patent/CN116089801A/en
Publication of CN116089801A publication Critical patent/CN116089801A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/70 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients


Abstract

The invention discloses a method for batch repair of medical data missing values based on multiple confidence degrees, which comprises the following steps: updating the sample confidence degrees by using attribute weights, so that samples containing missing values can be introduced into model training; optimizing the loss function with the sample confidence degrees and filling the missing values of the data set. The sample confidence degrees are calculated from the attribute relations among samples: each sample is given a different confidence degree according to the data attribute to be predicted and the number of missing values in the sample, and the degree of influence of the sample on model training is adjusted by dynamically selecting the confidence degree. The model architecture is optimized so that the network can fill multidimensional missing values in one batch, the identity-mapping problem of the network transfer function is eliminated, and the cross-correlation among nodes is enhanced. The invention improves the utilization rate of the data and improves both the filling precision and the filling efficiency.

Description

Medical data missing value repairing method based on multiple confidence degrees
Technical Field
The invention relates to the intersection of medical health and information science, and in particular to a method for repairing medical data missing values based on multiple confidence degrees.
Background
With the vigorous development of the big data industry and the wide adoption of intelligent medical treatment across society, more and more medical data sets are used to assist medical diagnosis, and the quality of these data sets directly influences the diagnosis result. Owing to various unavoidable factors in collection, transmission and storage, medical data inevitably contain missing values. Missing data affects the authenticity of the data itself, reduces its validity, and degrades subsequent data analysis, so filling the missing data is highly necessary.
At present, researchers mostly address missing data through mean filling, regression filling, multiple imputation, nearest-neighbor filling and similar approaches; however, when the samples of a data set contain multi-dimensional missing attributes and the missing rate is large, these filling methods struggle to achieve accurate, effective and rapid filling.
Conventional statistical algorithms and machine learning algorithms mostly fill a single missing attribute at a time. When one attribute in the data set is filled, samples with other missing attributes are deleted, which wastes resources, discards the valuable information contained in the incomplete samples, and may affect the accuracy of subsequent result analysis.
Therefore, those skilled in the art are dedicated to developing a method for repairing missing values of medical data based on multiple confidence degrees, which fills the multidimensional missing values in the data set samples in batches to improve filling efficiency, reasonably adds the data samples containing missing values to the training model, and fully mines the information in the data set, so as to overcome the defects of the prior art.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, the technical problems to be solved by the present invention are as follows: the filling methods disclosed in the prior art mostly fill a single missing attribute; when a certain attribute in the data set is filled, samples with other missing attributes are deleted, which wastes resources and discards the valuable information in the incomplete samples; as a result, the utilization rate of the original data set is low, and the efficiency and accuracy of data filling are limited.
To achieve the above object, the present invention provides a method for repairing missing values of medical data based on multiple confidence degrees. The method takes each attribute of each sample in the data set in turn as the target attribute to be filled, analyses the correlation between every attribute in the data set and the target attribute by a statistical method, and calculates the weight of each attribute relative to the target attribute from this correlation. The confidence degree of each data sample is updated from the attribute weights and the number of missing values in the sample; by dynamically adjusting the confidence degree, the degree of influence of the sample on the overall training model is changed, and every sample is added to the model training process. Missing values are filled in batches by a self-associative neural network model whose transmission paths are optimized, on the basis of the self-associative neural network, to eliminate the identity-mapping problem and enhance the cross-correlation among nodes. Classification errors and sample confidence degrees are introduced to improve the loss function, so as to improve the sample utilization rate and the filling accuracy.
further, the method for repairing the medical data missing values based on the multiple confidence degrees specifically comprises the following steps:
step 1, importing a missing data set;
step 2, eliminating the dimension influence among sample indexes and normalizing the data set, wherein the normalization formula of a data sample is as follows (1):

x_ij = (x - x_min) / (x_max - x_min)        (1)

wherein,
x_ij is the normalized value of the original data in the i-th row and j-th column;
x represents the data to be normalized in the sample;
x_max represents the maximum value of the data attribute;
x_min represents the minimum value of the data attribute;
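As an illustration of step 2, a minimal Python sketch of the min-max normalization in formula (1) might look as follows (the pandas-based data layout and the handling of missing entries are assumptions, not part of the original disclosure):

```python
import pandas as pd

def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Apply formula (1) column by column: x_ij = (x - x_min) / (x_max - x_min).

    Missing entries (NaN) are ignored when computing the column minimum and
    maximum, so still-missing values do not distort the scale.
    """
    normalized = df.copy()
    for col in df.columns:
        col_min = df[col].min(skipna=True)
        col_max = df[col].max(skipna=True)
        span = col_max - col_min
        # Guard against constant columns, where the span would be zero.
        normalized[col] = 0.0 if span == 0 else (df[col] - col_min) / span
    return normalized
```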
step 3, calculating the correlation matrix among all the attributes by a statistical method; the correlation coefficient between features is calculated as formula (2):

ρ_ij = cov(i, j) / sqrt(D_i · D_j)        (2)

wherein,
ρ_ij represents the correlation between attribute i and attribute j;
cov(i, j) represents the covariance of attribute i and attribute j;
D_i and D_j represent the variances of attribute i and attribute j;
step 4, updating the weight of each other attribute relative to the target attribute by using the correlation coefficients obtained in step 3; the target attributes are all attributes in the sample set; the specific weight calculation formula is as follows (3):

W_ij = |ρ_ij| / Σ_{k=1, k≠i}^{d} |ρ_ik|        (3)

wherein,
W_ij represents the weight of attribute j relative to attribute i;
d represents the total number of attributes of the sample;
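A possible realisation of steps 3 and 4 is sketched below: it computes the Pearson correlation matrix and then normalises the absolute correlations of each target attribute into weights, which is one way to obtain W_ij consistent with formula (3). The exact normalisation used in the original is an assumption.

```python
import numpy as np
import pandas as pd

def attribute_weights(df: pd.DataFrame) -> pd.DataFrame:
    """Return a d x d matrix W where W[i, j] is the weight of attribute j
    relative to target attribute i, built from the Pearson correlation matrix."""
    corr = df.corr(method="pearson").abs().to_numpy()   # |rho_ij|, formula (2)
    np.fill_diagonal(corr, 0.0)                         # an attribute gets no weight w.r.t. itself
    row_sums = corr.sum(axis=1, keepdims=True)
    weights = corr / row_sums                           # each row sums to 1, cf. formula (3)
    return pd.DataFrame(weights, index=df.columns, columns=df.columns)
```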
step 5, calculating the confidence degrees of each sample; specifically,
step 5-1, calculating the weight of the other attributes in the sample relative to the target attribute in the manner of step 4, and calculating the multiple confidence degrees of the sample according to formula (4), wherein the degree to which a sample is damaged is obtained by adding the weights of all its missing-value attributes:

R_ki = 1 - Σ_{j∈ms} W_ij        (4)

wherein,
ms represents the set of missing attributes;
R_ki represents the confidence of the k-th sample when predicting its i-th attribute value;
step 5-2, filling all missing values through the self-associative neural network model at one time; without fixing in advance which attributes of a sample are missing, a predicted value is output for every attribute of the sample; when each attribute is predicted, the sample receives a new confidence degree, that is, a sample has as many confidence degrees as it has dimensions;
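The multiple confidence degrees of step 5 can be assembled as in the following sketch, where each sample receives one confidence per target attribute according to formula (4) (variable names and data structures are illustrative assumptions):

```python
import numpy as np
import pandas as pd

def multiple_confidence(df: pd.DataFrame, weights: pd.DataFrame) -> pd.DataFrame:
    """R[k, i] = 1 - sum of W[i, j] over the attributes j missing in sample k."""
    n_samples, n_attrs = df.shape
    confidence = np.ones((n_samples, n_attrs))
    missing_mask = df.isna().to_numpy()          # True where a value is missing
    w = weights.to_numpy()
    for k in range(n_samples):
        missing_attrs = np.where(missing_mask[k])[0]
        for i in range(n_attrs):
            # Weights of the missing attributes relative to target attribute i,
            # excluding the target attribute itself (formula (4)).
            others = [j for j in missing_attrs if j != i]
            confidence[k, i] = 1.0 - w[i, others].sum()
    return pd.DataFrame(confidence, columns=df.columns)
```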
step 6, marking the missing values of the data set, returning the coordinate position information of the missing values in the data set, and pre-filling with the mean, mode or median according to the characteristics of the missing attribute;
step 7, dividing the data set into a training set and a testing set;
step 8, constructing a neural network and optimizing its transmission paths; the model predicts all attribute values of a sample, so the number of outputs equals the number of inputs; the predicted classification results are added to the output layer of the neural network; when a certain missing value is filled, the corresponding input quantity is removed from the training model; the specific network transfer functions are as follows:
the transfer formula from the input layer to the next layer is as follows (5):

Y_hj = g( Σ_{l=1, l≠j}^{d} W_lh · X_il + b_h )        (5)

the transfer formula between hidden layers is as follows (6):

Y_h'j = g( Σ_{h=1}^{n} W_hh' · Y_hj + b_h' )        (6)

when the predicted value is a continuous value, the output value of the network is as shown in formula (7):

Y_j = Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j        (7)

when the predicted value is a classification attribute, the output transfer function of the network is formula (8); when the predicted value is a multi-classification attribute, the output transfer function of the network is formula (9):

Y_j = f( Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j )        (8)

Y_j = h( Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j )        (9)

in formulas (5), (6), (7), (8) and (9),
R_ij is the confidence of an incomplete sample, representing the confidence assigned to the sample when predicting the j-th attribute of the i-th sample;
g() represents the relu activation function;
f() represents the sigmoid activation function;
h() represents the softmax activation function;
Y_hj represents the output of the h-th neuron of the first hidden layer when predicting the j-th attribute;
Y_h'j represents the output of the h'-th neuron of the second hidden layer when predicting attribute j;
Y_j represents the output value of the network model;
W_lh represents the transfer weight between the l-th attribute of the input sample and the h-th neuron of the second layer of the network;
W_hh' represents the transfer weight between the h-th neuron of the first hidden layer and the h'-th neuron of the second hidden layer;
W_h'j represents the transfer weight between the h'-th neuron of the second hidden layer and the attribute j to be predicted in the output layer;
X_il represents the l-th attribute of the i-th input sample;
b_h, b_h' and b_j represent the bias terms of the network model;
n represents the number of neurons in a hidden layer;
d represents the number of attributes of the sample;
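A compact sketch of the network construction in step 8 is given below, using PyTorch-style modules (the use of PyTorch, the hidden-layer width and the single-target formulation are assumptions for illustration only). Masking the target attribute out of the input reproduces the removal of the identity mapping described above; the batch version described in the patent would instead expose one output unit per attribute, with a linear head for continuous attributes (formula (7)) and a sigmoid or softmax head for classification attributes (formulas (8) and (9)).

```python
import torch
import torch.nn as nn

class SelfAssociativeNet(nn.Module):
    """Predict one attribute of a sample from all the other attributes."""

    def __init__(self, n_attrs: int, hidden: int = 32):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(n_attrs, hidden), nn.ReLU(),   # formula (5)
            nn.Linear(hidden, hidden), nn.ReLU(),    # formula (6)
        )
        self.out = nn.Linear(hidden, 1)              # formula (7): linear head for a continuous attribute

    def forward(self, x: torch.Tensor, target_idx: int) -> torch.Tensor:
        masked = x.clone()
        masked[:, target_idx] = 0.0                  # the target attribute never feeds its own prediction
        return self.out(self.hidden(masked)).squeeze(-1)
```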
step 9, performing one-hot coding on the classification attribute in the data set;
step 10, optimizing the loss function; the calculated multiple confidence degrees are applied to the loss function of the neural network model, and the degree of influence of each sample on model training is distinguished by adjusting the sample confidence degrees;
the expression of the loss function is as follows (10):

loss = Σ_{j∈cont} R_ij (x_ij - y_ij)^2 - Σ_{j∈class} R_ij · z_ij · p_ij        (10)

wherein,
R_ij represents the confidence of the i-th sample when filling its j-th missing attribute;
cont indicates that the attribute value is a continuous numerical variable;
class represents a classification attribute;
x_ij represents the j-th attribute of the i-th sample (a continuous variable);
y_ij represents the predicted value of the j-th attribute of the i-th sample;
z_ij represents the classification result of the j-th attribute of the i-th sample (expressed as a one-hot code);
p_ij represents the prediction result of the j-th classification attribute of the i-th sample;
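The confidence-weighted loss of formula (10) can be expressed, for a mini-batch, roughly as in the following sketch (tensor shapes and the framework are assumptions; r holds the confidence R_ij of each sample for the attribute currently being filled):

```python
import torch

def confidence_weighted_loss(r, x_true, y_pred, z_onehot, p_pred, is_continuous):
    """Formula (10): confidence-weighted squared error on continuous attributes,
    minus the confidence-weighted product of one-hot targets and predicted
    probabilities on classification attributes.

    r            : (batch,) confidence R_ij for the attribute being filled
    x_true/y_pred: (batch,) true and predicted continuous values
    z_onehot     : (batch, n_classes) one-hot coding of the accurate class
    p_pred       : (batch, n_classes) predicted class probabilities
    is_continuous: True when the target attribute is continuous
    """
    if is_continuous:
        return torch.sum(r * (x_true - y_pred) ** 2)
    return -torch.sum(r * torch.sum(z_onehot * p_pred, dim=1))
```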
an early-stopping strategy is introduced to maximize the prediction accuracy;
the early-stopping strategy is as follows: in the model training process, the optimal number of training iterations must be determined in order to obtain the best result, because too few training iterations cause under-fitting and too many cause over-fitting; to solve this problem, the early-stopping strategy is introduced: after each epoch, a test result is obtained on a validation set, and when, as the number of epochs increases to a certain value, the validation error changes from a decreasing trend to an increasing trend, training is stopped; the epoch at that moment is the optimal number of training iterations;
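A minimal training loop with this early-stopping strategy might look like the following sketch (stopping at the first upward turn of the validation error mirrors the criterion above; the model, optimizer and the train/validate helpers are assumed to exist and are not specified by the patent):

```python
def train_with_early_stopping(model, optimizer, loss_fn, train_step, validate, max_epochs=500):
    """Stop training as soon as the validation error starts to rise."""
    best_error = float("inf")
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_step(model, optimizer, loss_fn)    # one pass over the training set
        val_error = validate(model)              # error on the validation set
        if val_error < best_error:
            best_error, best_epoch = val_error, epoch
        else:
            break                                # error turned upward: stop here
    return best_epoch, best_error
```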
by adopting the scheme, the medical data missing value repairing method based on the multiple confidence degrees has the following advantages:
(1) According to the medical data missing value restoration method based on multiple confidence degrees, a statistical method and a machine learning method are combined, the number of dimensions of a sample is taken as a judgment basis, and multiple confidence degrees are given to the sample; the loss function of the self-association neural network model is optimized through dynamic selection of the confidence coefficient, incomplete samples are introduced into training of the model, and the utilization rate of data is improved;
(2) According to the medical data missing value restoration method based on the multiple confidence degrees, the transmission path of the self-association neural network model is optimized, the self-mapping problem from the input node to the output node is removed (when a certain attribute is predicted, the attribute does not participate in the input of the network model), and the cross-correlation among the nodes is enhanced;
(3) According to the medical data missing value restoration method based on multiple confidence degrees, the missing values of classification attributes and continuous attributes are predicted synchronously, which improves the filling efficiency of the model; in addition, the transmission path of the self-associative neural network model is optimized, and the self-mapping problem from the input node to the output node is eliminated;
in summary, according to the medical data missing value repairing method based on multiple confidence degrees, disclosed by the invention, the multidimensional data missing values in the data set samples are filled in batches, so that the filling efficiency is improved, the data samples with the multidimensional missing values are reasonably added into the training model, the data information in the data set is fully mined, the utilization rate of the data in the original data set is improved, the data filling efficiency is accelerated, and the data filling accuracy is improved;
the conception, specific technical scheme, and technical effects produced by the present invention will be further described in conjunction with the specific embodiments below to fully understand the objects, features, and effects of the present invention.
Drawings
FIG. 1 is a flow chart of a method of medical data missing value restoration based on multiple confidence levels of the present invention;
FIG. 2 is a network architecture diagram of a method of medical data missing value restoration based on multiple confidence levels in accordance with the present invention;
Detailed Description
The following describes several preferred embodiments of the present invention to make its technical content clearer and easier to understand. The invention may be embodied in many different forms; the embodiments described herein are exemplary, and the scope of the invention is not limited to these embodiments.
Example 1, method of medical data missing value repair based on multiple confidence
The data set is the heart disease data set from the official Kaggle website;
In Example 1, according to the attribute whose missing value is being filled, the weight of each attribute relative to the predicted attribute is calculated; multiple confidence degrees are assigned to each sample in combination with the number of missing values in the sample; and the influence of each sample on model training when each attribute is filled is distinguished by adjusting the sample confidence degrees; specifically,
step 1, loading the heart disease data set from the official Kaggle website; randomly deleting values from the heart disease data set, and storing the relative position coordinates of the missing values in the data set locally, recorded as (a, b); on the data set, the missing values of classification attributes are pre-filled with the mode, and the missing values of continuous data are pre-filled with the mean;
step 2, eliminating the dimension influence among sample indexes and normalizing the data set, wherein the normalization formula of a data sample is as follows (1):

x_ij = (x - x_min) / (x_max - x_min)        (1)

wherein,
x_ij is the normalized value of the original data in the i-th row and j-th column;
x represents the data to be normalized in the sample;
x_max represents the maximum value of the data attribute;
x_min represents the minimum value of the data attribute;
step 3, calculating the correlation matrix among all the attributes by a statistical method; the correlation coefficient between features is calculated as formula (2):

ρ_ij = cov(i, j) / sqrt(D_i · D_j)        (2)

wherein,
ρ_ij represents the correlation between attribute i and attribute j;
cov(i, j) represents the covariance of attribute i and attribute j;
D_i and D_j represent the variances of attribute i and attribute j;
step 4, updating the weight of each other attribute relative to the target attribute by using the correlation coefficients obtained in step 3; the target attributes are all attributes in the sample set; the specific weight calculation formula is as follows (3):

W_ij = |ρ_ij| / Σ_{k=1, k≠i}^{d} |ρ_ik|        (3)

wherein,
W_ij represents the weight of attribute j relative to attribute i;
d represents the total number of attributes of the sample;
step 5, calculating the confidence degrees of each sample; specifically,
step 5-1, calculating the weight of the other attributes in the sample relative to the target attribute in the manner of step 4, and calculating the multiple confidence degrees of the sample according to formula (4), wherein the degree to which a sample is damaged is obtained by adding the weights of all its missing-value attributes:

R_ki = 1 - Σ_{j∈ms} W_ij        (4)

wherein,
ms represents the set of missing attributes;
R_ki represents the confidence of the k-th sample when predicting its i-th attribute value;
step 5-2, filling all missing values through the self-associative neural network model at one time; without fixing in advance which attributes of a sample are missing, a predicted value is output for every attribute of the sample; when each attribute is predicted, the sample receives a new confidence degree, that is, a sample has as many confidence degrees as it has dimensions;
In step 5, the confidence is reset for each sample. The data set has fourteen attributes, and for each attribute a confidence R_ij is assigned to each sample; a sample therefore has fourteen confidence degrees. Assuming the i-th sample lacks the first, third and fifth attributes, its confidence degrees can be expressed as follows: R_i1 = 1 - W_13 - W_15, where W_13 represents the weight of attribute three relative to attribute one when filling attribute one, and W_15 represents the weight of attribute five relative to attribute one when filling attribute one;
similarly, R_i2 = 1 - W_21 - W_23 - W_25 and R_i3 = 1 - W_31 - W_35, and in the same way the values up to R_i14 are calculated;
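As a purely illustrative numeric example (the weight values are assumptions, not taken from the patent), if W_13 = 0.10 and W_15 = 0.08, then the confidence of this sample when filling attribute one is R_i1 = 1 - 0.10 - 0.08 = 0.82, while a sample with no missing attributes other than the target keeps a confidence of 1.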
step 6, marking the missing values of the data set, returning the coordinate position information of the missing values in the data set, and pre-filling with the mean, mode or median according to the characteristics of the missing attribute;
in step 6, the position coordinates (A, B) of the data to be filled in the training set and the test set are updated again according to the relative position coordinates (a, b) recorded in step 1;
step 7, dividing the data set into a training set and a testing set;
in step 7, eighty percent of the data set is used as the training set and twenty percent as the test set;
step 8, constructing a neural network and optimizing its transmission paths; the model predicts all attribute values of a sample, so the number of outputs equals the number of inputs; the predicted classification results are added to the output layer of the neural network; when a certain missing value is filled, the corresponding input quantity is removed from the training model; the specific network transfer functions are as follows:
the transfer formula from the input layer to the next layer is as follows (5):

Y_hj = g( Σ_{l=1, l≠j}^{d} W_lh · X_il + b_h )        (5)

the transfer formula between hidden layers is as follows (6):

Y_h'j = g( Σ_{h=1}^{n} W_hh' · Y_hj + b_h' )        (6)

when the predicted value is a continuous value, the output value of the network is as shown in formula (7):

Y_j = Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j        (7)

when the predicted value is a classification attribute, the output transfer function of the network is formula (8); when the predicted value is a multi-classification attribute, the output transfer function of the network is formula (9):

Y_j = f( Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j )        (8)

Y_j = h( Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j )        (9)

in formulas (5), (6), (7), (8) and (9),
R_ij is the confidence of an incomplete sample, representing the confidence assigned to the sample when predicting the j-th attribute of the i-th sample;
g() represents the relu activation function;
f() represents the sigmoid activation function;
h() represents the softmax activation function;
Y_hj represents the output of the h-th neuron of the first hidden layer when predicting the j-th attribute;
Y_h'j represents the output of the h'-th neuron of the second hidden layer when predicting attribute j;
Y_j represents the output value of the network model;
W_lh represents the transfer weight between the l-th attribute of the input sample and the h-th neuron of the second layer of the network;
W_hh' represents the transfer weight between the h-th neuron of the first hidden layer and the h'-th neuron of the second hidden layer;
W_h'j represents the transfer weight between the h'-th neuron of the second hidden layer and the attribute j to be predicted in the output layer;
X_il represents the l-th attribute of the i-th input sample;
b_h, b_h' and b_j represent the bias terms of the network model;
n represents the number of neurons in a hidden layer;
d represents the number of attributes of the sample;
in the step 8, a specific network structure diagram is shown in fig. 2;
In the heart disease data set, when the first output of the prediction output layer is produced, the first input of the data set does not participate in training; similarly, when the second output of the prediction output layer is produced, the second input of the data set does not participate in the training of the model, and so on until the last predicted value is output;
step 9, performing one-hot coding on the classification attributes in the data set; the prediction outputs of the classification attributes are added to the output layer of the neural network, a set of probability values is output by the softmax activation function, and the product of the one-hot coding of the accurate value and the predicted probabilities is taken as a part of the loss function;
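The classification branch of step 9 can be illustrated with the following sketch (array-based NumPy code is an assumption; the patent does not specify the encoding utility):

```python
import numpy as np

def one_hot(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """One-hot encode integer class labels (step 9)."""
    encoded = np.zeros((labels.shape[0], n_classes))
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

def classification_term(z_onehot: np.ndarray, p_pred: np.ndarray) -> float:
    """Product of the one-hot coding of the accurate value and the predicted
    probabilities; this is the term subtracted in the loss function of step 10."""
    return float(np.sum(z_onehot * p_pred, axis=1).sum())
```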
step 10, optimizing a loss function; applying the calculated multiple confidence coefficient to a loss function of the neural network model, and distinguishing the influence degree of each sample on model training through adjusting the confidence coefficient of the sample;
the expression of the loss function is as follows (10):

loss = Σ_{j∈cont} R_ij (x_ij - y_ij)^2 - Σ_{j∈class} R_ij · z_ij · p_ij        (10)

wherein,
R_ij represents the confidence of the i-th sample when filling its j-th missing attribute;
cont indicates that the attribute value is a continuous numerical variable;
class represents a classification attribute;
x_ij represents the j-th attribute of the i-th sample (a continuous variable);
y_ij represents the predicted value of the j-th attribute of the i-th sample;
z_ij represents the classification result of the j-th attribute of the i-th sample (expressed as a one-hot code);
p_ij represents the prediction result of the j-th classification attribute of the i-th sample;
In step 10, the regression error (x_ij - y_ij)^2 and the classification error z_ij · p_ij are combined and multiplied by the confidence degree of the corresponding sample for the missing value to be filled, forming the final loss function; the number of training iterations follows the early-stopping strategy to obtain the optimal filling effect; the early-stopping strategy is as described above: after each epoch, a test result is obtained on the validation set, and when the validation error changes from a decreasing trend to an increasing trend as the number of epochs increases, training is stopped, and the epoch at that moment is the optimal number of training iterations;
step 11, comparing the filled values with the accurate values in the original data according to the missing value coordinates (A, B) recorded in step 6, and calculating the percentage error rate of continuous-value filling, MAPE = (1/n) Σ_i |x_i - x̂_i| / x_i (where x_i is the accurate value and x̂_i is the filled value), and the accuracy of classification attribute filling, ACC = (1/n) Σ_i I_i (I_i = 1 when the filling is correct and I_i = 0 otherwise);
Comparative Example 2, training the samples directly without setting sample confidence;
step 1, loading the heart disease data set from the official Kaggle website; randomly deleting values from the heart disease data set, and storing the relative position coordinates of the missing values in the data set locally, recorded as (a, b); on the data set, the missing values of classification attributes are pre-filled with the mode, and the missing values of continuous data are pre-filled with the mean;
step 2, eliminating the dimension influence among sample indexes and normalizing the data set, wherein the normalization formula of a data sample is as follows (1):

x_ij = (x - x_min) / (x_max - x_min)        (1)

wherein,
x_ij is the normalized value of the original data in the i-th row and j-th column;
x represents the data to be normalized in the sample;
x_max represents the maximum value of the data attribute;
x_min represents the minimum value of the data attribute;
step 3, marking the missing values of the data set, returning the coordinate position information of the missing values in the data set, and pre-filling with the mean, mode or median according to the characteristics of the missing attribute;
in step 3, the position coordinates (A, B) of the data to be filled in the training set and the test set are updated again according to the relative position coordinates (a, b) recorded in step 1;
step 4, dividing the data set into a training set and a testing set;
in step 4, eighty percent of the data set is used as the training set and twenty percent as the test set;
step 5, building a neural network and optimizing its transmission paths; the model predicts all attribute values of a sample, so the number of outputs equals the number of inputs; the predicted classification results are added to the output layer of the neural network; when a certain missing value is filled, the corresponding input quantity is removed from the training model; the specific network transfer functions are as follows:
the transfer formula from the input layer to the next layer is as follows (5):

Y_hj = g( Σ_{l=1, l≠j}^{d} W_lh · X_il + b_h )        (5)

the transfer formula between hidden layers is as follows (6):

Y_h'j = g( Σ_{h=1}^{n} W_hh' · Y_hj + b_h' )        (6)

when the predicted value is a continuous value, the output value of the network is as shown in formula (7):

Y_j = Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j        (7)

when the predicted value is a classification attribute, the output transfer function of the network is formula (8); when the predicted value is a multi-classification attribute, the output transfer function of the network is formula (9):

Y_j = f( Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j )        (8)

Y_j = h( Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j )        (9)

in formulas (5), (6), (7), (8) and (9),
g() represents the relu activation function;
f() represents the sigmoid activation function;
h() represents the softmax activation function;
Y_hj represents the output of the h-th neuron of the first hidden layer when predicting the j-th attribute;
Y_h'j represents the output of the h'-th neuron of the second hidden layer when predicting attribute j;
Y_j represents the output value of the network model;
W_lh represents the transfer weight between the l-th attribute of the input sample and the h-th neuron of the second layer of the network;
W_hh' represents the transfer weight between the h-th neuron of the first hidden layer and the h'-th neuron of the second hidden layer;
W_h'j represents the transfer weight between the h'-th neuron of the second hidden layer and the attribute j to be predicted in the output layer;
X_il represents the l-th attribute of the i-th input sample;
b_h, b_h' and b_j represent the bias terms of the network model;
n represents the number of neurons in a hidden layer;
d represents the number of attributes of the sample;
step 6, performing one-hot coding on the classification attributes in the data set; the prediction outputs of the classification attributes are added to the output layer of the neural network, a set of probability values is output by the softmax activation function, and the product of the one-hot coding of the accurate value and the predicted probabilities is taken as a part of the loss function;
step 7, optimizing the loss function; the expression of the loss function is as follows (10):

loss = Σ_{j∈cont} (x_ij - y_ij)^2 - Σ_{j∈class} z_ij · p_ij        (10)

wherein,
cont indicates that the attribute value is a continuous numerical variable;
class represents a classification attribute;
x_ij represents the j-th attribute of the i-th sample (a continuous variable);
y_ij represents the predicted value of the j-th attribute of the i-th sample;
z_ij represents the classification result of the j-th attribute of the i-th sample (expressed as a one-hot code);
p_ij represents the prediction result of the j-th classification attribute of the i-th sample;
In step 7, the regression error (x_ij - y_ij)^2 and the classification error z_ij · p_ij are combined to form the final loss function (no sample confidence is applied in this comparative example); the number of training iterations follows the early-stopping strategy to obtain the optimal filling effect;
step 8, comparing the filled values with the accurate values in the original data according to the missing value coordinates (A, B) recorded in step 3, and calculating the percentage error rate of continuous-value filling, MAPE = (1/n) Σ_i |x_i - x̂_i| / x_i (where x_i is the accurate value and x̂_i is the filled value), and the accuracy of classification attribute filling, ACC = (1/n) Σ_i I_i (I_i = 1 when the filling is correct and I_i = 0 otherwise);
Comparative Example 3, calculating the sample confidence according to the number of missing values, and thereby changing the degree of influence of each sample on model training;
step 1, loading the heart disease data set from the official Kaggle website; randomly deleting values from the heart disease data set, and storing the relative position coordinates of the missing values in the data set locally, recorded as (a, b); on the data set, the missing values of classification attributes are pre-filled with the mode, and the missing values of continuous data are pre-filled with the mean;
step 2, eliminating the dimension influence among sample indexes and normalizing the data set, wherein the normalization formula of a data sample is as follows (1):

x_ij = (x - x_min) / (x_max - x_min)        (1)

wherein,
x_ij is the normalized value of the original data in the i-th row and j-th column;
x represents the data to be normalized in the sample;
x_max represents the maximum value of the data attribute;
x_min represents the minimum value of the data attribute;
step 3, counting the number of missing values and the total number of attributes of each data sample, wherein the sample confidence = 1 - (number of missing values in the sample / total number of sample attributes); each sample has only one confidence degree, recorded as R_i for the i-th sample;
step 4, marking the missing values of the data set, returning the coordinate position information of the missing values in the data set, and pre-filling with the mean, mode or median according to the characteristics of the missing attribute;
in step 4, the position coordinates (A, B) of the data to be filled in the training set and the test set are updated again according to the relative position coordinates (a, b) recorded in step 1;
step 5, dividing the data set into a training set and a testing set;
in step 5, eighty percent of the data set is used as the training set and twenty percent as the test set;
step 6, constructing a neural network and optimizing its transmission paths; the model predicts all attribute values of a sample, so the number of outputs equals the number of inputs; the predicted classification results are added to the output layer of the neural network; when a certain missing value is filled, the corresponding input quantity is removed from the training model; the specific network transfer functions are as follows:
the transfer formula from the input layer to the next layer is as follows (5):

Y_hj = g( Σ_{l=1, l≠j}^{d} W_lh · X_il + b_h )        (5)

the transfer formula between hidden layers is as follows (6):

Y_h'j = g( Σ_{h=1}^{n} W_hh' · Y_hj + b_h' )        (6)

when the predicted value is a continuous value, the output value of the network is as shown in formula (7):

Y_j = Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j        (7)

when the predicted value is a classification attribute, the output transfer function of the network is formula (8); when the predicted value is a multi-classification attribute, the output transfer function of the network is formula (9):

Y_j = f( Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j )        (8)

Y_j = h( Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j )        (9)

in formulas (5), (6), (7), (8) and (9),
g() represents the relu activation function;
f() represents the sigmoid activation function;
h() represents the softmax activation function;
Y_hj represents the output of the h-th neuron of the first hidden layer when predicting the j-th attribute;
Y_h'j represents the output of the h'-th neuron of the second hidden layer when predicting attribute j;
Y_j represents the output value of the network model;
W_lh represents the transfer weight between the l-th attribute of the input sample and the h-th neuron of the second layer of the network;
W_hh' represents the transfer weight between the h-th neuron of the first hidden layer and the h'-th neuron of the second hidden layer;
W_h'j represents the transfer weight between the h'-th neuron of the second hidden layer and the attribute j to be predicted in the output layer;
X_il represents the l-th attribute of the i-th input sample;
b_h, b_h' and b_j represent the bias terms of the network model;
n represents the number of neurons in a hidden layer;
d represents the number of attributes of the sample;
step 7, performing one-hot coding on the classification attributes in the data set; the prediction outputs of the classification attributes are added to the output layer of the neural network, a set of probability values is output by the softmax activation function, and the product of the one-hot coding of the accurate value and the predicted probabilities is taken as a part of the loss function;
step 8, optimizing the loss function; the calculated sample confidence is applied to the loss function of the neural network model, and the degree of influence of each sample on model training is distinguished by adjusting the sample confidence;
the expression of the loss function is as follows (10):

loss = Σ_{j∈cont} R_i (x_ij - y_ij)^2 - Σ_{j∈class} R_i · z_ij · p_ij        (10)

wherein,
R_i represents the confidence of the i-th sample;
cont indicates that the attribute value is a continuous numerical variable;
class represents a classification attribute;
x_ij represents the j-th attribute of the i-th sample (a continuous variable);
y_ij represents the predicted value of the j-th attribute of the i-th sample;
z_ij represents the classification result of the j-th attribute of the i-th sample (expressed as a one-hot code);
p_ij represents the prediction result of the j-th classification attribute of the i-th sample;
In step 8, the regression error (x_ij - y_ij)^2 and the classification error z_ij · p_ij are combined and multiplied by the confidence degree of the corresponding sample to form the final loss function; the number of training iterations follows the early-stopping strategy to obtain the optimal filling effect;
step 9, comparing the filled values with the accurate values in the original data according to the missing value coordinates (A, B) recorded in step 4, and calculating the percentage error rate of continuous-value filling, MAPE = (1/n) Σ_i |x_i - x̂_i| / x_i (where x_i is the accurate value and x̂_i is the filled value), and the accuracy of classification attribute filling, ACC = (1/n) Σ_i I_i (I_i = 1 when the filling is correct and I_i = 0 otherwise);
Comparative Example 4, filling the missing value attributes with a random forest algorithm;
The rest is the same as in Example 1, except that a random forest algorithm is used when filling the missing value attributes;
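A straightforward realisation of the random-forest baseline of Comparative Example 4 is sketched below (using scikit-learn, which is an assumption; the patent does not specify the implementation): for each attribute with missing values, a forest is trained on the rows where that attribute is observed and used to predict the missing entries, after a simple mean/mode pre-fill of the predictors.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def random_forest_fill(df: pd.DataFrame, categorical: set) -> pd.DataFrame:
    """Fill each attribute that has missing values with a random forest trained
    on the rows where that attribute is observed."""
    filled = df.copy()
    # Pre-fill the predictors so the forests receive no NaN inputs
    # (mode for classification attributes, mean for continuous ones).
    prefilled = df.copy()
    for col in df.columns:
        if col in categorical:
            prefilled[col] = prefilled[col].fillna(prefilled[col].mode().iloc[0])
        else:
            prefilled[col] = prefilled[col].fillna(prefilled[col].mean())
    for col in df.columns:
        missing = df[col].isna()
        if not missing.any():
            continue
        features = prefilled.drop(columns=[col])
        model = (RandomForestClassifier(n_estimators=100)
                 if col in categorical else RandomForestRegressor(n_estimators=100))
        model.fit(features[~missing], df.loc[~missing, col])
        filled.loc[missing, col] = model.predict(features[missing])
    return filled
```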
Test 5, comparison of the filling results of Example 1, Comparative Example 2, Comparative Example 3 and Comparative Example 4
Setting evaluation indexes:
according to the data type, dividing the data into continuous data and classified data; when calculating the continuous data filling accuracy, taking average absolute percentage error (MAPE) as an index for evaluating the data filling quality; in evaluating the filling quality of classified data, the filling accuracy thereof is described by the filling Accuracy (ACC); the calculation formulas of the two are as follows:
MAPE = (1/n) Σ_{i=1}^{n} |x_i - x̂_i| / x_i

ACC = (1/n) Σ_{i=1}^{n} I_i

wherein,
x̂_i represents the predicted value of the i-th sample;
x_i represents the accurate value of the original sample;
n represents the number of samples in the data set; when classified data are predicted, I_i = 1 when the predicted value coincides with the original data value, and I_i = 0 otherwise;
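The two evaluation indexes can be computed as in the following sketch (a direct transcription of the formulas above; array-based NumPy code is an assumption):

```python
import numpy as np

def mape(true_values: np.ndarray, filled_values: np.ndarray) -> float:
    """Mean absolute percentage error for continuous attributes (lower is better)."""
    return float(np.mean(np.abs(true_values - filled_values) / true_values))

def acc(true_labels: np.ndarray, filled_labels: np.ndarray) -> float:
    """Filling accuracy for classification attributes (higher is better)."""
    return float(np.mean(true_labels == filled_labels))
```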
the final results are shown in Table 1 below:
Table 1. Comparison of the four filling methods
As can be seen, compared with Comparative Examples 2, 3 and 4, the filling results of the method for repairing medical data missing values based on multiple confidence degrees achieve the highest filling accuracy (ACC) and the lowest mean absolute percentage error (MAPE), i.e. the best effect;
In summary, the technical scheme fills the multidimensional missing values in the data set samples in batches, which improves filling efficiency; the data samples with multidimensional missing values are reasonably added to the training model, the data information in the data set is fully mined, the utilization rate of the data in the original data set is improved, and both the speed and the accuracy of data filling are improved;
the foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention without requiring creative effort by one of ordinary skill in the art. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by a person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.

Claims (7)

1. A method for repairing missing values of medical data based on multiple confidence levels, comprising the steps of:
step 1, importing a missing data set;
step 2, eliminating dimension influence among sample indexes, and carrying out normalization processing on the data set;
step 3, calculating an association relation matrix among all the attributes by a statistical method;
step 4, updating the weight of each other attribute relative to the target attribute by using the correlation coefficient obtained in the step 3; the target attributes are all attributes in the sample set;
step 5, calculating the confidence degrees of each sample; specifically comprising the following steps:
step 5-1, calculating the weight of the other attributes in the sample relative to the target attribute in the manner of step 4, and calculating the multiple confidence degrees of the sample according to formula (4), wherein the degree to which a sample is damaged is obtained by adding the weights of all its missing-value attributes:

R_ki = 1 - Σ_{j∈ms} W_ij        (4)

wherein,
ms represents the set of missing attributes;
R_ki represents the confidence of the k-th sample when predicting its i-th attribute value;
step 5-2, filling all missing values through the self-associative neural network model at one time; without fixing in advance which attributes of a sample are missing, a predicted value is output for every attribute of the sample; when each attribute is predicted, the sample receives a new confidence degree, that is, a sample has as many confidence degrees as it has dimensions;
step 6, marking the missing values of the data set, returning the coordinate position information of the missing values in the data set, and pre-filling with the mean, mode or median according to the characteristics of the missing attribute;
step 7, dividing the data set into a training set and a testing set;
step 8, constructing a neural network and optimizing its transmission paths; the model predicts all attribute values of a sample, so the number of outputs equals the number of inputs; the predicted classification results are added to the output layer of the neural network; when a certain missing value is filled, the corresponding input quantity is removed from the training model;
step 9, performing one-hot coding on the classification attribute in the data set;
step 10, optimizing a loss function; applying the calculated multiple confidence coefficient to a loss function of the neural network model, and distinguishing the influence degree of each sample on model training through adjusting the confidence coefficient of the sample; and an early stop strategy is introduced, so that the prediction accuracy is maximized.
2. The method for repairing missing medical data values based on multiple confidence levels according to claim 1, wherein in the step 2,
the normalization formula of a data sample is as follows (1):

x_ij = (x - x_min) / (x_max - x_min)        (1)

wherein,
x_ij is the normalized value of the original data in the i-th row and j-th column;
x represents the data to be normalized in the sample;
x_max represents the maximum value of the data attribute;
x_min represents the minimum value of the data attribute.
3. The method for repairing missing medical data values based on multiple confidence levels according to claim 1, wherein in the step 3,
the correlation coefficient between features is calculated according to the following formula (2):

ρ_ij = cov(i, j) / sqrt(D_i · D_j)        (2)

wherein,
ρ_ij represents the correlation between attribute i and attribute j;
cov(i, j) represents the covariance of attribute i and attribute j;
D_i and D_j represent the variances of attribute i and attribute j.
4. The method for repairing missing medical data values based on multiple confidence levels according to claim 1, wherein in the step 4,
the specific weight calculation formula is as follows (3):

W_ij = |ρ_ij| / Σ_{k=1, k≠i}^{d} |ρ_ik|        (3)

wherein,
W_ij represents the weight of attribute j relative to attribute i;
d represents the total number of attributes of the sample.
5. The method for repairing missing medical data values based on multiple confidence levels as set forth in claim 1, wherein in step 8,
the specific network transfer functions are as follows:
the transfer formula from the input layer to the next layer is as follows (5):

Y_hj = g( Σ_{l=1, l≠j}^{d} W_lh · X_il + b_h )        (5)

the transfer formula between hidden layers is as follows (6):

Y_h'j = g( Σ_{h=1}^{n} W_hh' · Y_hj + b_h' )        (6)

when the predicted value is a continuous value, the output value of the network is as shown in formula (7):

Y_j = Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j        (7)

when the predicted value is a classification attribute, the output transfer function of the network is formula (8); when the predicted value is a multi-classification attribute, the output transfer function of the network is formula (9):

Y_j = f( Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j )        (8)

Y_j = h( Σ_{h'=1}^{n} W_h'j · Y_h'j + b_j )        (9)

in formulas (5), (6), (7), (8) and (9),
R_ij is the confidence of an incomplete sample, representing the confidence assigned to the sample when predicting the j-th attribute of the i-th sample;
g() represents the relu activation function;
f() represents the sigmoid activation function;
h() represents the softmax activation function;
Y_hj represents the output of the h-th neuron of the first hidden layer when predicting the j-th attribute;
Y_h'j represents the output of the h'-th neuron of the second hidden layer when predicting attribute j;
Y_j represents the output value of the network model;
W_lh represents the transfer weight between the l-th attribute of the input sample and the h-th neuron of the second layer of the network;
W_hh' represents the transfer weight between the h-th neuron of the first hidden layer and the h'-th neuron of the second hidden layer;
W_h'j represents the transfer weight between the h'-th neuron of the second hidden layer and the attribute j to be predicted in the output layer;
X_il represents the l-th attribute of the i-th input sample;
b_h, b_h' and b_j represent the bias terms of the network model;
n represents the number of neurons in a hidden layer;
d represents the number of attributes of the sample.
6. The method for repairing missing medical data values based on multiple confidence levels as set forth in claim 1, wherein in step 10,
the expression of the loss function is as follows (10):

loss = Σ_{j∈cont} R_ij (x_ij - y_ij)^2 - Σ_{j∈class} R_ij · z_ij · p_ij        (10)

wherein,
R_ij represents the confidence of the i-th sample when filling its j-th missing attribute;
cont indicates that the attribute value is a continuous numerical variable;
class represents a classification attribute;
x_ij represents the j-th attribute of the i-th sample, the attribute being a continuous variable;
y_ij represents the predicted value of the j-th attribute of the i-th sample;
z_ij represents the classification result of the j-th attribute of the i-th sample, expressed as a one-hot code;
p_ij represents the prediction result of the j-th classification attribute of the i-th sample.
7. The method for repairing missing medical data values based on multiple confidence levels as set forth in claim 1, wherein in step 10,
the early-stopping strategy is as follows: in the model training process, the optimal number of training iterations must be determined in order to obtain the best result, because too few training iterations cause under-fitting and too many cause over-fitting; to solve this problem, the early-stopping strategy is introduced: after each epoch, a test result is obtained on a validation set, and when, as the number of epochs increases to a certain value, the validation error changes from a decreasing trend to an increasing trend, training is stopped; the epoch at that moment is the optimal number of training iterations.
CN202310031008.3A 2023-01-10 2023-01-10 Medical data missing value repairing method based on multiple confidence degrees Pending CN116089801A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310031008.3A CN116089801A (en) 2023-01-10 2023-01-10 Medical data missing value repairing method based on multiple confidence degrees

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310031008.3A CN116089801A (en) 2023-01-10 2023-01-10 Medical data missing value repairing method based on multiple confidence degrees

Publications (1)

Publication Number Publication Date
CN116089801A true CN116089801A (en) 2023-05-09

Family

ID=86211607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310031008.3A Pending CN116089801A (en) 2023-01-10 2023-01-10 Medical data missing value repairing method based on multiple confidence degrees

Country Status (1)

Country Link
CN (1) CN116089801A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421548A (en) * 2023-12-18 2024-01-19 四川互慧软件有限公司 Method and system for treating loss of physiological index data based on convolutional neural network
CN117421548B (en) * 2023-12-18 2024-03-12 四川互慧软件有限公司 Method and system for treating loss of physiological index data based on convolutional neural network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination