CN114121296A

CN114121296A - Data-driven clinical information rule extraction method, storage medium and device

Info

Publication number: CN114121296A
Application number: CN202111500068.2A
Authority: CN
Inventors: 张少典; 马汉东; 位凯; 朱珉; 薛颜波
Original assignee: Shanghai Synyi Medical Technology Co ltd
Current assignee: Shanghai Synyi Medical Technology Co ltd
Priority date: 2021-12-09
Filing date: 2021-12-09
Publication date: 2022-03-01
Anticipated expiration: 2041-12-09
Also published as: CN114121296B

Abstract

The invention provides a data-driven clinical information rule extraction method, a storage medium and a device. The method comprises the following steps: acquiring patient sample data, wherein the patient sample data comprises various clinical characteristics of a patient; generating an initial rule set according to the patient sample data; screening the initial rule set based on the time-series characteristics in the initial rule set to obtain a universal rule set; and determining an optimal rule set according to the accuracy and the interpretability of each rule in the universal rule set. While maintaining accuracy, the invention can mine a series of high-confidence, high-accuracy rules from clinical information. It thereby provides a clear reasoning path to each conclusion and assists physicians in decision-making to a certain extent.

Description

Data-driven clinical information rule extraction method, storage medium and device

Technical Field

The invention belongs to the technical field of data mining and relates to a rule extraction method, in particular to a data-driven clinical information rule extraction method, a storage medium and a device.

Background

Currently, with the development of intelligent medical technology, medical rules play an important role in disease risk prediction, clinical diagnosis and similar processes. Mining high-confidence rules from data such as clinical diagnosis information and demographic information can assist physicians' decisions to a certain extent.

Most existing disease risk and clinical diagnosis rules come from medical scales and machine learning prediction models. (1) A medical scale quantifies a patient's clinical information, demographic information, daily habits and the like, assigns different scores to different characteristics, and finally measures the severity or risk of disease through the total score. However, most existing medical scales were developed abroad and often ignore factors such as race, daily habits and individual differences, which affects the accuracy of scale-based assessment. (2) Machine learning models can improve prediction and diagnostic accuracy to some extent. However, most existing machine learning models do not directly provide interpretable decision rules.

Therefore, providing a data-driven clinical information rule extraction method, storage medium and device that overcome the inability of the prior art to offer a rule extraction scheme with both high accuracy and interpretability is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

In view of the above-mentioned shortcomings of the prior art, the present invention is directed to a data-driven clinical information rule extraction method, a storage medium and a device, which are used to solve the problem that the prior art cannot provide a rule extraction scheme with high accuracy and interpretability.

To achieve the above and other related objects, one aspect of the present invention provides a data-driven clinical information rule extraction method, including: acquiring patient sample data, wherein the patient sample data comprises various clinical characteristics of a patient; generating an initial rule set according to the patient sample data; screening the initial rule set based on the time-series characteristics in the initial rule set to obtain a universal rule set; and determining an optimal rule set according to the accuracy and the interpretability of each rule in the universal rule set.

In an embodiment of the present invention, the patient sample data is tabular data without missing values, in which each row represents a patient sample and each column represents a feature of the patient.

In an embodiment of the present invention, the step of generating an initial rule set according to the patient sample data includes: preprocessing the patient sample data; for the preprocessed patient sample data, using a tree model to extract rules from each node of each generated tree; and generating the initial rule set according to the rule extraction results.

In an embodiment of the present invention, the step of screening the initial rule set based on the time-series characteristics in the initial rule set to obtain a universal rule set includes: acquiring, by a time-series statistical method, the time frequency with which each rule occurs on each node; and selecting the rules whose time frequency meets a preset requirement of the user as the universal rule set.

In an embodiment of the invention, the step of determining the optimal rule set according to the accuracy and interpretability of each rule in the universal rule set comprises: for each rule in the universal rule set, determining an optimal solution through a multi-objective optimization algorithm; and determining the combination of all the optimal solutions as the optimal rule set.

In an embodiment of the present invention, the step of determining the optimal solution through the multi-objective optimization algorithm includes: taking the accuracy and the interpretability of each rule as two optimization objectives; randomly initializing a particle swarm for the optimization objectives; determining the fitness of each particle in the particle swarm; updating the velocity and position of each particle according to the fitness; determining whether the maximum number of iterations has been reached or the global best position satisfies the minimum threshold; and if so, determining the Pareto optimal solution.

In an embodiment of the invention, after the step of determining the optimal rule set according to the accuracy and interpretability of each rule in the universal rule set, the data-driven clinical information rule extraction method further includes: acquiring prediction data of a user for whom a clinical decision needs to be made, all the acquired prediction data forming a prediction data set; and comparing the prediction data with the rules in the optimal rule set one by one, and obtaining the rules satisfied by the prediction data set according to the matching results between the prediction data and the optimal rule set.

In an embodiment of the present invention, the optimal rule set includes a first rule, a second rule and a third rule; the step of comparing the prediction data with the rules in the optimal rule set one by one, and obtaining the rules satisfied by the prediction data set according to the matching results between the prediction data and the optimal rule set, comprises: in response to the prediction data simultaneously satisfying the first rule, the second rule and the third rule, determining the disease probability of the user corresponding to the prediction data set, wherein the disease probability is used to provide auxiliary judgment information to a physician during disease diagnosis.

To achieve the above and other related objects, another aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data-driven clinical information rule extraction method.

To achieve the above and other related objects, a final aspect of the present invention provides an electronic device, comprising: a processor and a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory so as to cause the electronic device to perform the data-driven clinical information rule extraction method.

As described above, the data-driven clinical information rule extraction method, the storage medium, and the device according to the present invention have the following advantages:

According to the invention, an initial rule set is generated from patient sample data, universal rules are then screened according to time-series characteristics, and an optimal rule set is determined using the accuracy and interpretability of each rule. This addresses the low prediction accuracy of medical scales and the poor interpretability of traditional machine learning models: the data-driven rule extraction scheme can mine a series of high-confidence, high-accuracy rules from clinical information while maintaining accuracy. The method can thus provide a clear reasoning path to each conclusion and assist physicians in decision-making to a certain extent.

Drawings

FIG. 1 is a schematic flow chart diagram illustrating a data-driven clinical information rule extraction method according to an embodiment of the present invention.

FIG. 2 is a flow chart of the optimal rule set determination in an embodiment of the data-driven clinical information rule extraction method according to the present invention.

FIG. 3 is a flowchart illustrating the calculation of an optimal solution in an embodiment of the data-driven clinical information rule extraction method according to the present invention.

FIG. 4 is a flowchart illustrating predictive data matching in an embodiment of a data-driven clinical information rule extraction method according to the present invention.

FIG. 5 is a schematic structural connection diagram of an electronic device according to an embodiment of the invention.

Description of the element reference numerals

5 electronic device

51 processor

52 memory

S11 to S16 steps

S141 to S142 steps

S141A to S141F steps

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the drawings only show the components related to the present invention rather than the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.

The data-driven clinical information rule extraction method, storage medium and device of the present invention can mine a series of high-confidence, high-accuracy rules from clinical information while maintaining accuracy, thereby providing a clear reasoning path to each conclusion and assisting physicians in decision-making to a certain extent.

The principle and implementation of a data-driven clinical information rule extraction method, a storage medium and a device according to the present embodiment will be described in detail below with reference to fig. 1 to 5, so that those skilled in the art can understand the data-driven clinical information rule extraction method, the storage medium and the device according to the present embodiment without creative work.

Referring to fig. 1, a schematic flow chart of a data-driven clinical information rule extraction method according to an embodiment of the invention is shown. As shown in fig. 1, the data-driven clinical information rule extraction method specifically includes the following steps:

S11, obtaining patient sample data, which includes various clinical characteristics of the patient.

In practical applications, taking pulmonary embolism as an example, the relevant hospital department extracts the laboratory test data of a batch of patients with known outcome variables as the patient sample data.

S12, generating an initial rule set according to the patient sample data.

In one embodiment, S12 specifically includes the following steps:

(1) Preprocessing the patient sample data.

Specifically, the preprocessing includes existing means such as data cleaning, data merging, data transformation and data normalization, so as to improve the usability of the patient sample data.

(2) For the preprocessed patient sample data, performing rule extraction on each node of each generated tree using a tree model.

Specifically, the tree model may be any robust model such as a decision tree, a random forest, a GBDT (Gradient Boosting Decision Tree) or XGBoost.

In practical application, a random forest algorithm is used to extract rules from each node of each generated tree. A random forest is a stable ensemble learning model based on bagging: multiple training sets are generated by the bootstrap method, a decision tree is constructed for each training set, and the classification results of these decision-tree base classifiers are finally combined to obtain a better-performing prediction model.
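A minimal sketch of this preprocessing step follows, assuming the sample table is held as a list of Python dictionaries (one per patient row) and that the outcome column name is passed in as `label`. It enforces the no-missing-values condition on the tabular data and illustrates one kind of data cleaning (removing exact duplicate rows) and one kind of normalization (min-max scaling); data merging and transformation are omitted, and the function and column names are hypothetical.

```python
import math
from typing import Dict, List

Row = Dict[str, float]


def preprocess(rows: List[Row], label: str) -> List[Row]:
    """Hypothetical preprocessing: validate, de-duplicate and min-max normalise."""
    # The sample data must be tabular with no missing values: reject None/NaN cells.
    for row in rows:
        for value in row.values():
            if value is None or (isinstance(value, float) and math.isnan(value)):
                raise ValueError("patient sample data must not contain missing values")
    # Data cleaning: drop exact duplicate rows, keeping the first occurrence.
    seen = set()
    cleaned: List[Row] = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(dict(row))
    if not cleaned:
        return cleaned
    # Data normalisation: rescale every feature column (not the label) to [0, 1].
    for col in cleaned[0]:
        if col == label:
            continue
        low = min(r[col] for r in cleaned)
        high = max(r[col] for r in cleaned)
        span = high - low
        for r in cleaned:
            r[col] = 0.0 if span == 0 else (r[col] - low) / span
    return cleaned
```

For tree models, scaling does not change which splits are possible, so the normalization mainly matters when the same table also feeds other models.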
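The bagging idea described above can be sketched as follows. It illustrates generic bootstrap sampling and unweighted majority voting only, not the specific random forest implementation of the embodiment; the function names and the fixed seed are assumptions.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_samples(data: Sequence[T], n_sets: int, seed: int = 0) -> List[List[T]]:
    """Draw n_sets bootstrap training sets, each of size len(data), with replacement."""
    rng = random.Random(seed)  # seeded so the training sets are reproducible
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(n_sets)]


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Combine base-classifier outputs: the most common class wins."""
    return Counter(predictions).most_common(1)[0][0]
```

One decision tree would then be trained on each bootstrap set, and `majority_vote` applied to the trees' predictions for a new sample.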

Specifically, given a data set D with feature vectors X and corresponding labels y, let $D=\{(X_i, y_i)\}$, $i=1,2,\dots,n$, where $X_i\in X$, $X_i=(x_{i1},x_{i2},\dots,x_{im})$, m is the number of features, and $y_i\in y=\{0,1,\dots\}$. Gini(D) is defined as the measure of the purity of D and can be expressed as follows:

$$\mathrm{Gini}(D)=\sum_{k=1}^{K}\sum_{k'\neq k}p_k\,p_{k'}=1-\sum_{k=1}^{K}p_k^{2} \qquad (1)$$

In Equation 1, $p_k$ ($k=1,2,\dots,K$) represents the proportion of class-k samples in the current data set, and k' denotes any class other than k. The smaller Gini(D), the higher the purity of data set D. Assuming that feature m has V possible values $\{m^1,m^2,\dots,m^V\}$, dividing data set D by feature m generates V different branch nodes, where the v-th branch is denoted $D^v$. $\mathrm{Gini\_index}(D,m)$ is defined to represent the uncertainty of feature m in D and can be expressed as:

$$\mathrm{Gini\_index}(D,m)=\sum_{v=1}^{V}\frac{|D^v|}{|D|}\,\mathrm{Gini}(D^v) \qquad (2)$$

For the training set D, the learning algorithm that constructs a decision tree can be represented as a mapping from X to y: data set D is recursively divided into subsets, each time using the feature with the lowest Gini index after splitting, to form a tree. The selected feature $m^{*}$ is expressed as:

$$m^{*}=\arg\min_{m}\ \mathrm{Gini\_index}(D,m) \qquad (3)$$

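Then, the classification result is obtained by integrating the weighted outputs of all decision trees:

$$H(x)=\sum_{h=1}^{S}\omega_h\,T_h(x) \qquad (4)$$

In Equation 4, $\omega_h$ represents the weight of the h-th tree and $T_h(x)$ denotes the output of the h-th tree for sample x. A sample can be classified according to the following formula:

$$\hat{y}(x)=\arg\max_{c\in y}\sum_{h=1}^{S}\omega_h\,\mathbb{I}\big(T_h(x)=c\big) \qquad (5)$$

In Equation 5, S represents the number of trees and $\mathbb{I}(\cdot)$ is the indicator function.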
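Equations 1 to 3, together with a weighted vote in the spirit of Equations 4 and 5, can be sketched in plain Python. The function names are hypothetical, features are treated as categorical (one branch per value, as in Equation 2), and the tree weights are supplied by the caller rather than learned here.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Equation 1: Gini(D) = 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_index(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Equation 2: |D^v|/|D|-weighted Gini over the branches created by one feature."""
    branches: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(values, labels):
        branches.setdefault(value, []).append(label)
    n = len(labels)
    return sum(len(branch) / n * gini(branch) for branch in branches.values())


def best_feature(columns: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]) -> str:
    """Equation 3: the feature whose split has the lowest Gini index."""
    return min(columns, key=lambda name: gini_index(columns[name], labels))


def weighted_vote(tree_outputs: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Equation 5: the class with the largest total tree weight."""
    scores: Dict[Hashable, float] = {}
    for output, weight in zip(tree_outputs, weights):
        scores[output] = scores.get(output, 0.0) + weight
    return max(scores, key=scores.get)
```

A perfectly separating feature has a Gini index of 0, so `best_feature` prefers it over an uninformative one.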

(3) Generating the initial rule set according to the rule extraction results.

Specifically, the initial rule set is obtained as follows: the random forest algorithm traverses every path from the root node to each leaf node in each decision tree. The split conditions on the features of the nodes along a path form the rule's conditions, and the class of the leaf node forms the rule's conclusion.

In practical applications, when performing disease prediction or medical diagnosis tasks, the class output by the tree model is determined by the outputs of the individual trees. Because each tree is a "white-box model" that provides a clear path to every conclusion, the rules of all nodes on every tree in the tree model are output as the initial rule set.

S13, screening the initial rule set based on the time-series characteristics in the initial rule set to obtain a universal rule set. By analyzing indicators such as the time frequency with which each rule occurs, this screening avoids keeping rules that reflect isolated "black swan" events and therefore lack generality.
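The path traversal can be sketched as below, using a hypothetical binary `Node` structure (split feature, threshold, children, leaf class) rather than any particular library's tree representation. Each root-to-leaf path yields one rule: the conjunction of the split conditions along the path, concluding with the leaf's class.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    feature: Optional[str] = None      # split feature; None marks a leaf
    threshold: float = 0.0             # samples with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None        # class predicted at a leaf


Condition = Tuple[str, str, float]     # (feature, "<=" or ">", threshold)
Rule = Tuple[List[Condition], Optional[int]]


def extract_rules(root: Node) -> List[Rule]:
    """One rule per root-to-leaf path: path conditions -> leaf class."""
    rules: List[Rule] = []

    def walk(node: Node, path: List[Condition]) -> None:
        if node.feature is None:
            rules.append((list(path), node.label))
            return
        walk(node.left, path + [(node.feature, "<=", node.threshold)])
        walk(node.right, path + [(node.feature, ">", node.threshold)])

    walk(root, [])
    return rules


def extract_forest_rules(forest: List[Node]) -> List[Rule]:
    """Initial rule set: the rules of every tree in the forest."""
    return [rule for tree in forest for rule in extract_rules(tree)]
```

The feature names used when building a tree by hand (for example `age`) are illustrative only.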

In one embodiment, S13 specifically includes the following steps:

(1) Acquiring, by a time-series statistical method, the time frequency with which each rule occurs on each node.

Specifically, the time-series statistical method may be a time-series statistical function or any other implementation that provides equivalent time-series statistics.

In practical application, the time-series data in a rule are analyzed statistically with the Python pandas package, which groups and aggregates the samples on each node by time frequency. For example, it counts attributes with a time-frequency character, such as the number of days, weeks, months or years in which the samples on the node appear, or their start and end times.

(2) Selecting the rules whose time frequency meets the user's preset requirement as the universal rule set.

Specifically, suppose the user's preset requirement is 1 year. If the patient sample data supporting a rule all appear within 2 weeks, the rule extracted from them lacks generality; if they are spread over 2 years, the extracted rule is considered general.

S14, determining an optimal rule set according to the accuracy and the interpretability of each rule in the universal rule set.
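A minimal pandas sketch of this per-node aggregation follows, assuming a hypothetical long-format table with one row per sample, a `node_id` column identifying the node the sample falls on, and a datetime `visit_date` column. It reports the first and last appearance, the span in days and whole weeks, and the number of distinct calendar months per node.

```python
import pandas as pd


def node_time_stats(samples: pd.DataFrame) -> pd.DataFrame:
    """Summarise, per tree node, when the samples falling on that node were recorded.

    Assumes hypothetical columns 'node_id' (node a sample reaches) and
    'visit_date' (datetime64 timestamp of the sample).
    """
    by_node = samples.groupby("node_id")["visit_date"]
    # Named aggregation: first and last appearance plus the sample count per node.
    stats = by_node.agg(start="min", end="max", n_samples="count")
    # Span between first and last appearance, in days and whole weeks.
    stats["span_days"] = (stats["end"] - stats["start"]).dt.days
    stats["span_weeks"] = stats["span_days"] // 7
    # Number of distinct calendar months in which the node's samples appear.
    months = samples["visit_date"].dt.to_period("M")
    stats["n_months"] = months.groupby(samples["node_id"]).nunique()
    return stats
```

Other frequencies (weeks, years) follow the same pattern by changing the period alias passed to `to_period`.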
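The screening in this example can be sketched as a span check, assuming each rule is mapped to the dates of the samples that support it and that the preset requirement is a minimum time span (365 days by default, matching the 1-year example). The rule identifiers and the function name are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def universal_rules(rule_dates: Dict[str, List[date]],
                    min_span: timedelta = timedelta(days=365)) -> List[str]:
    """Keep rules whose supporting samples are spread over at least min_span."""
    kept = []
    for rule_id, dates in rule_dates.items():
        # A rule with no dated samples cannot demonstrate generality.
        if dates and max(dates) - min(dates) >= min_span:
            kept.append(rule_id)
    return kept
```

With the default 1-year requirement, a rule supported only by samples from a 2-week window is dropped, while one spanning 2 years is kept.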

Referring to fig. 2, a flow chart of determining an optimal rule set according to an embodiment of the data-driven clinical information rule extraction method of the present invention is shown. As shown in fig. 2, S14 specifically includes the following steps:

S141, for each rule in the universal rule set, determining an optimal solution through a multi-objective optimization algorithm, which is used to balance the accuracy and interpretability of the rules.

Specifically, the multi-objective optimization algorithm may be any algorithm capable of realizing optimization analysis of two or more objectives, such as a multi-objective particle swarm algorithm, a non-dominated sorting genetic algorithm, a multi-objective evolutionary algorithm, and the like.

Referring to fig. 3, a flowchart of the optimal solution calculation in an embodiment of the data-driven clinical information rule extraction method according to the invention is shown. As shown in fig. 3, S141 specifically includes the following steps:

S141A, taking the accuracy and the interpretability of each rule as the two optimization objectives.

To ensure the accuracy of the rule set, its accuracy, namely the proportion of samples that are predicted correctly, is calculated. Rule accuracy is defined as follows:

$$Q_{ACC}=\frac{1}{Q}\sum_{i=1}^{Q}\mathbb{I}\big(\hat{y}(x_i)=y_i\big) \qquad (6)$$

In Equation 6, $Q_{ACC}$ represents the accuracy of the rule set, Q represents the number of samples, $x_i$ represents the i-th sample, $\hat{y}(x_i)$ is the class the rule set predicts for $x_i$, $y_i$ is its true label, and $\mathbb{I}(\cdot)$ is the indicator function. To measure the interpretability of a rule, we define it as:

$$Q_{INT}=\alpha\,Q_{FEA}+\beta\,Q_{COV}+\gamma\,Q_{CNT} \qquad (7)$$

In Equation 7, $Q_{INT}$ denotes the interpretability, and $Q_{FEA}$, $Q_{COV}$ and $Q_{CNT}$ respectively represent the complexity of the rule, the coverage of the rule and the quality of the rules. α, β and γ are the weights of the three terms and can be set according to the actual situation. Specifically, $Q_{FEA}$ reflects the number of features in each rule: the fewer features a rule involves on average, the larger its $Q_{FEA}$. $Q_{COV}$ indicates the coverage of each rule: the more broadly applicable a rule is, the larger its $Q_{COV}$. $Q_{CNT}$ measures the quality of the rules. The three terms are defined by Equations 8, 9 and 10, respectively.

Equation 8 is computed from the valid features in the i-th rule, Equation 9 from the number of samples that match the i-th rule, and Equation 10 from $rule_{selected}$, the number of rules derived by the algorithm, and Z, the number of generated candidate rules. $Q_{FEA}=1$ indicates that the rule contains only one feature, and $Q_{FEA}=0$ indicates that the rule contains all the features. That is, the larger $Q_{FEA}$, the simpler the rule and the easier it is for a physician to understand during diagnosis.
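The scoring terms can be sketched as follows. Equations 6 and 7 follow the definitions above; the concrete forms used here for Q_FEA, Q_COV and Q_CNT are assumptions chosen only to match the stated behavior (Q_FEA is 1 for a single-feature rule and 0 for a rule using all m features, coverage grows with matched samples, and selecting fewer of the Z candidate rules scores higher). They should not be read as Equations 8 to 10 themselves, and the equal default weights are placeholders.

```python
from typing import Sequence


def q_acc(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Equation 6: proportion of the Q samples whose predicted class equals the label."""
    return sum(p == y for p, y in zip(predicted, actual)) / len(actual)


def q_fea(features_per_rule: Sequence[int], m: int) -> float:
    """ASSUMED form: 1 if a rule uses one feature, 0 if it uses all m (requires m > 1)."""
    return sum((m - f) / (m - 1) for f in features_per_rule) / len(features_per_rule)


def q_cov(matched_per_rule: Sequence[int], n_samples: int) -> float:
    """ASSUMED form: mean share of the samples matched by each rule."""
    return sum(c / n_samples for c in matched_per_rule) / len(matched_per_rule)


def q_cnt(rules_selected: int, z_candidates: int) -> float:
    """ASSUMED form: selecting fewer of the Z candidate rules scores higher."""
    return 1.0 - rules_selected / z_candidates


def interpretability(fea: float, cov: float, cnt: float,
                     alpha: float = 1 / 3, beta: float = 1 / 3, gamma: float = 1 / 3) -> float:
    """Equation 7: weighted sum of the three terms (equal weights are placeholders)."""
    return alpha * fea + beta * cov + gamma * cnt
```

The pair (q_acc, interpretability) is what a multi-objective optimizer would then maximize jointly.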

S141B, randomly initializing a particle swarm for the optimization objectives.

In the invention, each solution of the optimization problem is treated as a "particle", and all particles search in an N-dimensional space. Each particle has only two attributes: velocity, which determines how fast and in which direction the particle moves, and position, which is its current location in the search space. The current position of a particle is a candidate solution of the optimization problem, and the flight of the particle is the search process of that individual.
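A sketch of the random initialization in S141B follows. It assumes an encoding for rule selection that the text does not specify: each particle's position is a vector in [0, 1]^N with one entry per candidate rule, and an entry above 0.5 means the corresponding rule is kept. The class and function names, the velocity bound and the seed are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:
    position: List[float]       # one entry per candidate rule, in [0, 1]
    velocity: List[float]
    best_position: List[float]  # individual best found so far


def init_swarm(n_particles: int, n_rules: int, v_max: float = 0.1,
               seed: int = 0) -> List[Particle]:
    """Place each particle uniformly in [0, 1]^n_rules with a small random velocity."""
    rng = random.Random(seed)
    swarm = []
    for _ in range(n_particles):
        position = [rng.random() for _ in range(n_rules)]
        velocity = [rng.uniform(-v_max, v_max) for _ in range(n_rules)]
        # Initially, a particle's individual best is its starting position.
        swarm.append(Particle(position, velocity, list(position)))
    return swarm


def decode(particle: Particle, threshold: float = 0.5) -> List[int]:
    """Indices of the candidate rules this particle keeps (entries above threshold)."""
    return [j for j, x in enumerate(particle.position) if x > threshold]
```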

S141C, determining the fitness of each particle in the particle swarm.

Specifically, a fitness function is defined to evaluate each particle and determine its individual best solution, and the global best is then selected from the individual best solutions.
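For two maximized objectives, the individual best can be maintained with Pareto dominance, as in the sketch below. The fitness is assumed to be an (accuracy, interpretability) pair. Keeping the old individual best when neither solution dominates, and choosing the global best as the lexicographic maximum, are simplifying assumptions; many multi-objective PSO variants instead keep an external archive and choose among its members at random.

```python
from typing import List, Sequence, Tuple

Fitness = Tuple[float, float]  # (accuracy, interpretability); both are maximised


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a Pareto-dominates b: no worse in every objective, strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def update_personal_best(position: List[float], fitness: Fitness,
                         best_position: List[float],
                         best_fitness: Fitness) -> Tuple[List[float], Fitness]:
    """Adopt the new position as the individual best only if it dominates the old one."""
    if dominates(fitness, best_fitness):
        return list(position), fitness
    return best_position, best_fitness


def pick_global_best(best_fitnesses: Sequence[Fitness]) -> int:
    """Index of a non-dominated individual best.

    The lexicographic maximum (accuracy first) can never be dominated,
    so it is a simple deterministic choice.
    """
    return max(range(len(best_fitnesses)), key=lambda i: best_fitnesses[i])
```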

S141D, updating the velocity and position of the particles according to the fitness.

Specifically, the flight velocity of each particle is dynamically adjusted according to its own historical best position and the historical best position of the swarm, so that the velocity and position of the particle are updated according to the fitness.

S141E, determining whether the maximum number of iterations has been reached or the global best position satisfies the minimum threshold.

The best solution found independently by each particle is called its individual extremum, and the best individual extremum in the swarm is taken as the current global best solution. The velocities and positions are updated iteratively until an optimal solution satisfying the termination condition is obtained. If the maximum number of iterations has not been reached and the global best position does not satisfy the minimum threshold, the process returns to step S141C.

S141F, if so, determining the Pareto optimal solution.

For the final population, obtained when the maximum number of iterations is reached or the global best position satisfies the minimum threshold, the Pareto optimal solutions are determined by fast non-dominated sorting.

S142, determining the combination of all the optimal solutions as the optimal rule set.
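The update in S141D can be illustrated with the standard inertia-weight PSO formulas. The inertia weight `w`, the acceleration coefficients `c1` and `c2`, the velocity bound and the clipping of positions to [0, 1] are assumed values and choices, not parameters given in the text.

```python
import random
from typing import List, Tuple


def update_particle(x: List[float], v: List[float], pbest: List[float], gbest: List[float],
                    rng: random.Random, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                    v_max: float = 0.2) -> Tuple[List[float], List[float]]:
    """Standard inertia-weight PSO step, applied per dimension.

    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x), clipped to [-v_max, v_max];
    x <- x + v, clipped to the [0, 1] search space.
    """
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        vi = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        vi = max(-v_max, min(v_max, vi))
        new_v.append(vi)
        new_x.append(max(0.0, min(1.0, xi + vi)))
    return new_x, new_v
```

A particle already sitting at both its individual best and the global best with zero velocity stays where it is, which is a quick sanity check on the signs.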
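The termination logic of S141E can be sketched as a loop around one swarm update. Here the update is a hypothetical callable that returns a scalar score for the current global best; comparing that score with a minimum threshold is one assumed reading of the termination condition.

```python
from typing import Callable


def iterate_until_done(step: Callable[[], float], max_iter: int,
                       min_threshold: float) -> int:
    """Repeat one swarm update (S141C-S141D) until a termination condition holds.

    `step` performs one iteration and returns a score of the current global best.
    Returns the number of iterations executed.
    """
    for iteration in range(1, max_iter + 1):
        if step() >= min_threshold:
            return iteration   # global best satisfies the minimum threshold
    return max_iter            # maximum number of iterations reached
```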
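Fast non-dominated sorting, as introduced with NSGA-II, can be sketched as follows for maximized objectives; the first front it returns is the set of Pareto optimal solutions. The function names are illustrative, and the objective values would be each particle's (accuracy, interpretability) pair.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a Pareto-dominates b when every objective is maximised."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def fast_non_dominated_sort(points: Sequence[Sequence[float]]) -> List[List[int]]:
    """Split solutions into Pareto fronts; fronts[0] holds the Pareto optimal set."""
    n = len(points)
    dominated: List[List[int]] = [[] for _ in range(n)]  # solutions that p dominates
    counts = [0] * n                                      # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated[p]:
                counts[q] -= 1          # p's front is settled, so it no longer blocks q
                if counts[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]                  # drop the trailing empty front
```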

Specifically, for pulmonary embolism, the optimal rule set is: "1 month_varicose vein of lower limb_diagnosis_any > 0.5, 10000 days_sex_visit_count ≤ 1.5, 10000 days_age_visit_last ≤ 26373.0".

When the conditions "1 month_varicose vein of lower limb_diagnosis_any > 0.5", "10000 days_sex_visit_count ≤ 1.5" and "10000 days_age_visit_last ≤ 26373.0" are all satisfied, the probability that the patient has VTE (venous thromboembolism) is determined to be 90% or more.

Referring to fig. 4, a flow chart of prediction data matching in an embodiment of the data-driven clinical information rule extraction method according to the invention is shown. As shown in fig. 4, after step S14, the data-driven clinical information rule extraction method further includes the following steps:

S15, acquiring prediction data of a user for whom a clinical decision needs to be made; all the acquired prediction data form a prediction data set.

S16, comparing the prediction data with the rules in the optimal rule set one by one, and obtaining the rules satisfied by the prediction data set according to the matching results between the prediction data and the optimal rule set.

In one embodiment, the optimal rule set includes a first rule, a second rule, and a third rule.

In response to the prediction data simultaneously satisfying the first rule, the second rule and the third rule, the disease probability of the user corresponding to the prediction data set is determined; this probability provides auxiliary judgment information to the physician during disease diagnosis.

Specifically, for pulmonary embolism, the optimal rule set is: "1 month_varicose vein of lower limb_diagnosis_any > 0.5, 10000 days_sex_visit_count ≤ 1.5, and 10000 days_age_visit_last ≤ 26373.0". The first rule is 1 month_varicose vein of lower limb_diagnosis_any > 0.5, the second rule is 10000 days_sex_visit_count ≤ 1.5, and the third rule is 10000 days_age_visit_last ≤ 26373.0. When the prediction data of a patient satisfy all three rules simultaneously, the analyzed probability that the patient has pulmonary embolism is above 90%. Knowing this, the physician can take it into account when diagnosing the patient.
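The rule matching of S16 can be sketched as below for the three pulmonary embolism rules. The dictionary keys are hypothetical column names derived from the feature names in the text, and the comparison operators follow the ">" and "≤" conditions stated above. The sketch only reports which rules are matched; the probability of more than 90% comes from the analysis described in the text and is not computed here.

```python
import operator
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, str, float]  # (feature, comparison, threshold)

OPS: Dict[str, Callable[[float, float], bool]] = {">": operator.gt, "<=": operator.le}

# The three rules from the text, keyed by hypothetical column names.
OPTIMAL_RULE_SET: List[Rule] = [
    ("1month_lower_limb_varicose_vein_diagnosis_any", ">", 0.5),  # first rule
    ("10000days_sex_visit_count", "<=", 1.5),                     # second rule
    ("10000days_age_visit_last", "<=", 26373.0),                  # third rule
]


def matched_rules(record: Dict[str, float],
                  rules: List[Rule] = OPTIMAL_RULE_SET) -> List[Rule]:
    """Compare one patient's prediction data with each rule in turn."""
    return [r for r in rules if OPS[r[1]](record[r[0]], r[2])]


def satisfies_all(record: Dict[str, float],
                  rules: List[Rule] = OPTIMAL_RULE_SET) -> bool:
    """True when the first, second and third rules all hold simultaneously."""
    return len(matched_rules(record, rules)) == len(rules)
```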

The effect of the invention is compared with that of an existing machine learning model as follows. Take a hazard-ratio regression model as an example of an existing machine learning model: it evaluates the influence of multiple factors on disease risk or diagnostic outcome simultaneously and, by weighting the factors and applying a nonlinear mapping, yields a function usable for prediction and diagnosis. For example, to predict the probability that chronic kidney disease progresses to renal failure within five years, the following hazard-ratio regression model can be obtained:

This function can produce a relatively accurate prediction, but a rule obtained by weighting or nonlinearly combining factors such as GFR (glomerular filtration rate), ACR (urine albumin-to-creatinine ratio) and age is not interpretable. In contrast, the present invention uses a multi-objective optimization algorithm to mine a series of high-confidence, high-accuracy rules from clinical information while maintaining accuracy.

The protection scope of the data-driven clinical information rule extraction method according to the present invention is not limited to the execution order of the steps listed in this embodiment; all schemes that add, remove or replace steps using the prior art according to the principles of the present invention fall within the protection scope of the present invention.

The present embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data-driven clinical information rule extraction method.

Those of ordinary skill in the art will understand that: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned computer-readable storage media comprise: various computer storage media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Please refer to fig. 5, which is a schematic structural connection diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 5, the present embodiment provides an electronic device 5, which specifically includes: a processor 51 and a memory 52; the memory 52 is used for storing computer programs, and the processor 51 is used for executing the computer programs stored in the memory 52 to make the electronic device 5 execute the steps of the data-driven clinical information rule extraction method.

The processor 51 may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The memory 52 may include a random access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory.

In practice, the electronic device may be a computer that includes all or some of the following components: memory, a memory controller, one or more processing units (CPUs), peripheral interfaces, RF circuits, audio circuits, speakers, microphones, an input/output (I/O) subsystem, a display screen, other output or control devices, and external ports. The computer includes, but is not limited to, personal computers such as desktop computers, notebook computers and tablet computers, smartphones, and personal digital assistants (PDAs). In other embodiments, the electronic device may also be a server, which may be deployed on one or more physical servers according to factors such as function and load, or may be a cloud server formed by a distributed or centralized server cluster; this embodiment imposes no limitation in this respect.

In summary, the data-driven clinical information rule extraction method, storage medium and device of the present invention generate an initial rule set from patient sample data, screen universal rules according to time-series characteristics, and determine an optimal rule set using the accuracy and interpretability of each rule. This addresses the low prediction accuracy of medical scales and the poor interpretability of traditional machine learning models: the data-driven rule extraction scheme can mine a series of high-confidence, high-accuracy rules from clinical information while maintaining accuracy. The method can thus provide a clear reasoning path to each conclusion and assist physicians in decision-making to a certain extent. The invention effectively overcomes various defects of the prior art and has high industrial application value.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes that can be made by those skilled in the art without departing from the spirit and technical idea of the present invention shall be covered by the claims of the present invention.

Claims

1. A data-driven clinical information rule extraction method is characterized by comprising the following steps:

acquiring patient sample data, wherein the patient sample data comprises various clinical characteristics of a patient;

generating an initial rule set according to the patient sample data;

screening the initial rule set based on the time-series characteristics in the initial rule set to obtain a universal rule set;

and determining an optimal rule set through the accuracy and the interpretability of each rule in the universal rule set.

2. The data-driven clinical information rule extraction method according to claim 1, wherein:

the patient sample data is tabular data without missing values, wherein each row of the tabular data represents a patient sample, and each column represents a characteristic of the patient.

3. The data-driven clinical information rule extraction method according to claim 1, wherein the step of generating an initial rule set from the patient sample data comprises:

preprocessing the patient sample data;

for the preprocessed patient sample data, using a tree model to perform rule extraction on each node in each generated tree;

and generating the initial rule set according to the rule extraction result.

4. The method according to claim 3, wherein the step of screening the initial rule set based on the time-series characteristics of the initial rule set to obtain a universal rule set comprises:

acquiring, by a time-series statistical method, the time frequency with which each rule occurs on each node;

and selecting the rules whose time frequency meets a preset requirement of the user as the universal rule set.

5. The method of claim 1, wherein the step of determining the optimal rule set according to the accuracy and interpretability of each rule in the universal rule set comprises:

for each rule in the universal rule set, determining an optimal solution through a multi-objective optimization algorithm;

and determining the combination of all the optimal solutions as the optimal rule set.

6. The data-driven clinical information rule extraction method according to claim 5, wherein the step of determining an optimal solution through a multi-objective optimization algorithm comprises:

taking the accuracy and the interpretability of each rule as two optimization objectives;

randomly initializing a particle swarm for the optimization objectives;

determining the fitness of each particle in the particle swarm;

updating the velocity and the position of each particle according to the fitness;

determining whether the maximum number of iterations has been reached or the global best position satisfies the minimum threshold;

and if so, determining the Pareto optimal solution.

7. The data-driven clinical information rule extraction method according to claim 1, wherein after the step of determining the optimal rule set according to the accuracy and interpretability of each rule in the universal rule set, the data-driven clinical information rule extraction method further comprises:

acquiring prediction data of a user for whom a clinical decision needs to be made, all the acquired prediction data forming a prediction data set;

and comparing the prediction data with the rules in the optimal rule set one by one, and obtaining the rules satisfied by the prediction data set according to the matching results between the prediction data and the optimal rule set.

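8. The data-driven clinical information rule extraction method according to claim 7, wherein the optimal rule set includes a first rule, a second rule and a third rule; the step of comparing the prediction data with the rules in the optimal rule set one by one, and obtaining the rules satisfied by the prediction data set according to the matching results between the prediction data and the optimal rule set, comprises:

in response to the prediction data simultaneously satisfying the first rule, the second rule and the third rule, determining the disease probability of the user corresponding to the prediction data set, wherein the disease probability is used to provide auxiliary judgment information to a physician during disease diagnosis.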

9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data-driven clinical information rule extraction method according to any one of claims 1 to 8.

10. An electronic device, comprising: a processor and a memory;

the memory is configured to store a computer program, and the processor is configured to execute the computer program stored by the memory to cause the electronic device to perform the data-driven clinical information rule extraction method according to any one of claims 1 to 8.