CN114121296A - Data-driven clinical information rule extraction method, storage medium and device - Google Patents


Info

Publication number
CN114121296A
Authority
CN
China
Prior art keywords
rule
data
rule set
optimal
clinical information
Legal status
Granted
Application number
CN202111500068.2A
Other languages
Chinese (zh)
Other versions
CN114121296B (en)
Inventor
张少典
马汉东
位凯
朱珉
薛颜波
Current Assignee
Shanghai Synyi Medical Technology Co ltd
Original Assignee
Shanghai Synyi Medical Technology Co ltd
Application filed by Shanghai Synyi Medical Technology Co ltd
Priority to CN202111500068.2A
Publication of CN114121296A
Application granted
Publication of CN114121296B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a data-driven clinical information rule extraction method, a storage medium and a device. The method comprises the following steps: acquiring patient sample data, wherein the patient sample data comprise various clinical characteristics of a patient; generating an initial rule set from the patient sample data; screening the initial rule set based on the time-series characteristics in the initial rule set to obtain a universal rule set; and determining an optimal rule set according to the accuracy and interpretability of each rule in the universal rule set. While maintaining accuracy, the invention can mine a series of high-confidence, high-accuracy rules from clinical information, thereby providing a clear reasoning path to each conclusion and assisting physicians in decision-making to a certain extent.

Description

Data-driven clinical information rule extraction method, storage medium and device
Technical Field
The invention belongs to the technical field of data mining and relates to a rule extraction method, in particular to a data-driven clinical information rule extraction method, a storage medium and a device.
Background
Currently, with the development of intelligent medical technology, medical rules play an important role in processes such as disease risk prediction and clinical diagnosis; high-confidence rules mined from data such as clinical diagnosis information and demographic information can assist physicians' decision-making to a certain extent.
Most existing disease-risk and clinical-diagnosis rules come from various medical scales and machine learning prediction models. (1) Medical scales quantify patients' clinical information, demographic information, daily habits and the like, assign different scores to different features, and finally measure disease severity, disease risk and so on through a total score. However, most existing medical scales were developed abroad and often ignore factors such as race, daily habits and individual differences, which affects the accuracy of scale assessment to some extent. (2) Machine learning models can improve prediction and diagnostic accuracy to some extent. However, most existing machine learning models do not directly provide interpretable decision rules.
Therefore, providing a data-driven clinical information rule extraction method, storage medium and device that overcome the inability of the prior art to offer a rule extraction scheme with both high accuracy and interpretability is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present invention is directed to a data-driven clinical information rule extraction method, a storage medium and a device, which are used to solve the problem that the prior art cannot provide a rule extraction scheme with high accuracy and interpretability.
To achieve the above and other related objects, one aspect of the present invention provides a data-driven clinical information rule extraction method, including: acquiring patient sample data, wherein the patient sample data comprise various clinical characteristics of a patient; generating an initial rule set from the patient sample data; screening the initial rule set based on the time-series characteristics in the initial rule set to obtain a universal rule set; and determining an optimal rule set according to the accuracy and interpretability of each rule in the universal rule set.
In an embodiment of the present invention, the patient sample data are tabular data without missing values, wherein each row represents a patient sample and each column represents a feature of the patient.
In an embodiment of the present invention, the step of generating an initial rule set from the patient sample data includes: preprocessing the patient sample data; for the preprocessed patient sample data, using a tree model to extract rules from each node of each generated tree; and generating the initial rule set from the rule extraction results.
In an embodiment of the present invention, the step of screening the initial rule set based on the time-series characteristics in the initial rule set to obtain a universal rule set includes: acquiring, by a time-series statistical method, the time frequency with which the rule on each node occurs; and selecting the rules whose time frequency meets a user-preset requirement as the universal rule set.
In an embodiment of the invention, the step of determining the optimal rule set according to the accuracy and interpretability of each rule in the universal rule set comprises: for each rule in the universal rule set, determining an optimal solution through a multi-objective optimization algorithm; and taking the combination of all the optimal solutions as the optimal rule set.
In an embodiment of the present invention, the step of determining the optimal solution through the multi-objective optimization algorithm includes: taking the accuracy and interpretability of each rule as two optimization objectives; randomly initializing a particle swarm for the optimization objectives; determining the fitness of each particle in the particle swarm; updating the velocity and position of each particle according to the fitness; judging whether the maximum number of iterations has been reached or the global best position meets the minimum threshold; and if so, determining the Pareto optimal solution.
In an embodiment of the invention, after the step of determining the optimal rule set according to the accuracy and interpretability of each rule in the universal rule set, the data-driven clinical information rule extraction method further includes: acquiring prediction data of a user for whom a clinical decision is needed, all acquired prediction data forming a prediction data set; and comparing the prediction data with the rules in the optimal rule set one by one, and obtaining, from the matching results between the prediction data and the optimal rule set, the rules that the prediction data set satisfies.
In an embodiment of the present invention, the optimal rule set includes a first rule, a second rule and a third rule; the step of comparing the prediction data with the rules in the optimal rule set one by one and obtaining the rules that the prediction data set satisfies comprises: in response to the prediction data simultaneously satisfying the first rule, the second rule and the third rule, determining the user's disease probability corresponding to the prediction data set, wherein the disease probability provides auxiliary information for the physician during disease diagnosis.
To achieve the above and other related objects, another aspect of the present invention provides a computer-readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the data-driven clinical information rule extraction method.
To achieve the above and other related objects, a final aspect of the present invention provides an electronic device, comprising a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory so that the electronic device performs the data-driven clinical information rule extraction method.
As described above, the data-driven clinical information rule extraction method, the storage medium, and the device according to the present invention have the following advantages:
According to the invention, an initial rule set is generated from patient sample data, universal rules are then screened according to time-series characteristics, and an optimal rule set is determined using the accuracy and interpretability of each rule. This addresses both the low prediction accuracy of medical scales and the poor interpretability of traditional machine learning models: the data-driven rule extraction scheme can mine a series of high-confidence, high-accuracy rules from clinical information while maintaining accuracy. The method thus provides a clear reasoning path to each conclusion and assists physicians in decision-making to a certain extent.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a data-driven clinical information rule extraction method according to an embodiment of the present invention.
FIG. 2 is a flow chart of the optimal rule set determination in an embodiment of the data-driven clinical information rule extraction method according to the present invention.
FIG. 3 is a flowchart illustrating the calculation of an optimal solution in an embodiment of the data-driven clinical information rule extraction method according to the present invention.
FIG. 4 is a flowchart illustrating predictive data matching in an embodiment of a data-driven clinical information rule extraction method according to the present invention.
Fig. 5 is a schematic structural connection diagram of an electronic device according to an embodiment of the invention.
Description of the element reference numerals
5 electronic device
51 processor
52 memory
S11 to S16 steps
S141 to S142 steps
S141A to S141F steps
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the drawings only show the components related to the present invention rather than the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
The data-driven clinical information rule extraction method, storage medium and device of the invention can mine a series of high-confidence, high-accuracy rules from clinical information while maintaining accuracy, thereby providing a clear reasoning path to each conclusion and assisting physicians in decision-making to a certain extent.
The principle and implementation of a data-driven clinical information rule extraction method, a storage medium and a device according to the present embodiment will be described in detail below with reference to fig. 1 to 5, so that those skilled in the art can understand the data-driven clinical information rule extraction method, the storage medium and the device according to the present embodiment without creative work.
Referring to fig. 1, a schematic flow chart of a data-driven clinical information rule extraction method according to an embodiment of the invention is shown. As shown in fig. 1, the data-driven clinical information rule extraction method specifically includes the following steps:
S11, obtaining patient sample data, which include various clinical characteristics of the patient.
In an embodiment of the present invention, the patient sample data is table data without missing values, wherein each row of the table data represents a patient sample, and each column represents a feature of the patient.
In practical applications, taking pulmonary artery embolism as an example, the relevant hospital department extracts laboratory test data, together with outcome variables, for a batch of patients as the patient sample data.
S12, generating an initial rule set from the patient sample data.
In one embodiment, S12 specifically includes the following steps:
(1) pre-processing the patient sample data.
Specifically, the preprocessing includes existing preprocessing means such as data cleaning, data merging, data transformation, and data normalization, so as to improve the availability of patient sample data.
(2) For the preprocessed patient sample data, extracting rules from each node of each generated tree using a tree model.
Specifically, the tree model may be any robust tree-based model, such as a decision tree, a random forest, GBDT (Gradient Boosting Decision Tree) or XGBoost.
In practical applications, a random forest algorithm is used to extract rules from each node of each generated tree. A random forest is a stable ensemble learning model based on bagging (bootstrap aggregating): multiple training sets are generated by bootstrap sampling, a decision tree is constructed for each training set, and the classification results of these decision-tree base classifiers are finally combined to obtain a better prediction model.
Specifically, given a data set D with feature vectors X and corresponding labels y, let $D = \{(X_i, y_i)\}$, $i = 1, 2, \dots, n$, where $X_i \in X$, $X_i = (x_{i1}, x_{i2}, \dots, x_{im})$, m is the number of features, and $y_i \in y = \{0, 1, \dots\}$. Gini(D) is defined as a measure of the purity of D and can be expressed as follows:
$$\mathrm{Gini}(D) = \sum_{k=1}^{K}\sum_{k' \neq k} p_k\, p_{k'} = 1 - \sum_{k=1}^{K} p_k^{2} \qquad (1)$$
In Formula 1, $p_k$ ($k = 1, 2, \dots, K$) represents the proportion of class-k samples in the current data set, and $k'$ represents any class other than k. The smaller Gini(D), the higher the purity of data set D. Suppose that feature m has V possible values $\{m^1, m^2, \dots, m^V\}$; splitting D on feature m generates V branch nodes, the v-th of which is denoted $D^v$. The Gini index $\mathrm{Gini\_index}(D, m)$, which represents the uncertainty of feature m in D, is defined as:
$$\mathrm{Gini\_index}(D, m) = \sum_{v=1}^{V} \frac{|D^{v}|}{|D|}\,\mathrm{Gini}(D^{v}) \qquad (2)$$
For the training set D, the decision-tree learning algorithm can be represented as a mapping from X to y: the data set D is recursively split into subsets using the feature with the lowest Gini index after splitting, forming a tree. The selected feature $m^*$ is expressed as:
$$m^{*} = \arg\min_{m} \mathrm{Gini\_index}(D, m) \qquad (3)$$
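As a minimal sketch of Formulas 1 to 3, the Python below computes Gini(D), the Gini index of a categorical feature and the feature with the lowest Gini index. It assumes categorical features stored in dictionaries and a multiway split with one branch per value, as in Formula 2; libraries such as scikit-learn instead use binary threshold splits on numeric features, so this is illustrative rather than the patent's implementation. The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Formula 1: Gini(D) = 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_index(rows, labels, feature):
    """Formula 2: sum over branches v of |D^v|/|D| * Gini(D^v), one branch per value."""
    n = len(labels)
    branches = {}
    for row, y in zip(rows, labels):
        branches.setdefault(row[feature], []).append(y)
    return sum(len(ys) / n * gini(ys) for ys in branches.values())


def best_feature(rows, labels, features):
    """Formula 3: the feature whose split gives the lowest Gini index."""
    return min(features, key=lambda m: gini_index(rows, labels, m))


# Hypothetical toy data: "varicose" separates the classes perfectly, "sex" does not.
rows = [
    {"sex": 1, "varicose": 1},
    {"sex": 2, "varicose": 1},
    {"sex": 1, "varicose": 0},
    {"sex": 2, "varicose": 0},
]
labels = [1, 1, 0, 0]
print(best_feature(rows, labels, ["sex", "varicose"]))  # prints varicose
```

A pure split yields a Gini index of 0, so the tree would split on "varicose" first.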
Then, the classification result is obtained by combining the weighted outputs of all decision trees:
[Formula 4: weighted combination of the outputs of all decision trees; image not recoverable]
In Formula 4, $\omega_h$ represents the weight of the h-th tree. A sample can then be classified according to the following formula:
[Formula 5: classification rule; image not recoverable]
In Formula 5, S represents the number of trees.
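The bagging and weighted-combination steps can be sketched as follows. This assumes each tree has already produced a class label for a sample and reads Formula 5 as a weighted majority vote with tree weights omega_h; because Formulas 4 and 5 are not recoverable, that vote rule and the helper names are assumptions rather than the patent's exact formulas.

```python
import random
from collections import defaultdict


def bootstrap_sample(n, rng):
    """Draw n row indices with replacement: one bootstrap training set for one tree."""
    return [rng.randrange(n) for _ in range(n)]


def weighted_vote(tree_predictions, weights):
    """Weighted majority vote over S trees (assumed reading of Formulas 4 and 5).

    tree_predictions[h] is the class output by the h-th tree; weights[h] is omega_h.
    """
    if len(tree_predictions) != len(weights) or not weights:
        raise ValueError("need one weight per tree and at least one tree")
    scores = defaultdict(float)
    for pred, w in zip(tree_predictions, weights):
        scores[pred] += w  # accumulate each class's weighted support
    return max(scores, key=scores.get)


rng = random.Random(0)
indices = bootstrap_sample(8, rng)  # rows used to train one hypothetical tree
print(weighted_vote([1, 0, 0], [0.6, 0.3, 0.2]))  # prints 1
```

With equal weights the same three trees would vote 0; the weights let one strong tree outvote two weaker ones.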
(3) Generating the initial rule set from the rule extraction results.
Specifically, the initial rule set is obtained as follows: by traversing every path from the root node to each leaf node in each decision tree, the random forest algorithm takes the features of the nodes along the path as the rule's conditions and the class of the leaf node as the rule's conclusion.
In practical applications, when performing disease prediction or medical diagnosis, the class output by the tree model is determined by the outputs of the individual trees. Since the tree model is a "white-box model" that provides a clear path to each conclusion, the rules of all nodes on every tree in the tree model are output as the initial rule set.
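The root-to-leaf traversal can be sketched as a recursive walk that emits one rule per path. The nested-dictionary tree format, the `<=`/`>` branch convention and the function names are assumptions for illustration; with scikit-learn, the same information is available from a fitted tree's `tree_` attributes (`children_left`, `children_right`, `feature`, `threshold`, `value`).

```python
def extract_rules(node, conditions=()):
    """Return one (conditions, label) rule for every root-to-leaf path under node."""
    if "label" in node:
        # Leaf: the accumulated path conditions imply this class.
        return [(list(conditions), node["label"])]
    feature, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + ((feature, "<=", threshold),))
    right = extract_rules(node["right"], conditions + ((feature, ">", threshold),))
    return left + right


def extract_forest_rules(trees):
    """Pool the path rules of every tree into one initial rule set."""
    return [rule for tree in trees for rule in extract_rules(tree)]


# Hypothetical tree over two illustrative features.
tree = {
    "feature": "varicose_veins_any", "threshold": 0.5,
    "left": {"label": 0},
    "right": {
        "feature": "age_days", "threshold": 26373.0,
        "left": {"label": 1},
        "right": {"label": 0},
    },
}

for conds, label in extract_forest_rules([tree]):
    print(" AND ".join(f"{f} {op} {t}" for f, op, t in conds), "->", label)
```

Each printed line is one candidate rule; pooling the paths of all trees in the forest gives the initial rule set.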
S13, screening the initial rule set based on the time-series characteristics in the initial rule set to obtain a universal rule set. By analyzing and screening indicators such as the time frequency with which each rule occurs, rules that correspond only to isolated "black swan" events and are not universal can be effectively excluded.
In one embodiment, S13 specifically includes the following steps:
(1) Acquiring, by a time-series statistical method, the time frequency with which the rule on each node occurs.
Specifically, the time-series statistical method may be a time-series statistics function or any other implementation that provides time-series statistics.
In practical applications, for the statistical analysis of time-series data in the rules, the Python pandas package is used to group and aggregate the samples on each node by time frequency, for example counting time-frequency attributes such as the number of days, weeks, months or years in which the samples on a node appear, or their start and end times.
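A minimal pandas sketch of this per-node aggregation follows. The long-format table (one row per node and sample, with a visit date), the column names and the chosen statistics are assumptions; `groupby` with named aggregation is standard pandas.

```python
import pandas as pd

# Hypothetical long-format table: which samples fall on which tree node, and when.
records = pd.DataFrame({
    "node_id": [0, 0, 0, 1, 1],
    "visit_date": pd.to_datetime(
        ["2020-01-03", "2020-01-10", "2021-06-01", "2021-03-01", "2021-03-08"]
    ),
})


def node_time_stats(df):
    """Per node: first and last appearance, sample count, distinct months, span in days."""
    stats = df.groupby("node_id").agg(
        first_seen=("visit_date", "min"),
        last_seen=("visit_date", "max"),
        n_samples=("visit_date", "count"),
        n_months=("visit_date", lambda s: s.dt.to_period("M").nunique()),
    )
    stats["span_days"] = (stats["last_seen"] - stats["first_seen"]).dt.days
    return stats


print(node_time_stats(records))
```

Node 0's samples span more than a year, while node 1's fall within one week, which is the kind of contrast the screening step relies on.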
(2) Selecting the rules whose time frequency meets the user-preset requirement as the universal rule set.
Specifically, suppose the user-preset requirement is 1 year. If the patient sample data supporting a rule all appear within 2 weeks, the rule extracted from those data is not universal; if they are spread over 2 years, the rule extracted from them is universal.
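The screening in this example can be sketched with the standard library: a rule is kept only when the samples supporting it span at least the user-preset period. Treating "1 year" as 365 days and using first/last appearance dates as the time-frequency indicator are assumptions; the function names are hypothetical.

```python
from datetime import date, timedelta


def is_universal(first_seen, last_seen, min_span=timedelta(days=365)):
    """True if the rule's supporting samples cover at least min_span."""
    return (last_seen - first_seen) >= min_span


def screen_rules(rule_spans, min_span=timedelta(days=365)):
    """Keep the ids of rules whose (first_seen, last_seen) span meets min_span."""
    return [rule_id for rule_id, (first, last) in rule_spans.items()
            if is_universal(first, last, min_span)]


spans = {
    "rule_a": (date(2021, 3, 1), date(2021, 3, 15)),  # samples within 2 weeks
    "rule_b": (date(2019, 1, 1), date(2021, 1, 1)),   # samples spread over 2 years
}
print(screen_rules(spans))  # prints ['rule_b']
```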
S14, determining an optimal rule set according to the accuracy and interpretability of each rule in the universal rule set.
Referring to fig. 2, a flow chart of determining an optimal rule set in an embodiment of the data-driven clinical information rule extraction method of the present invention is shown. As shown in fig. 2, S14 specifically includes the following steps:
S141, for each rule in the universal rule set, determining an optimal solution through a multi-objective optimization algorithm, which is used to balance the accuracy and interpretability of the rules.
Specifically, the multi-objective optimization algorithm may be any algorithm capable of optimizing two or more objectives, such as a multi-objective particle swarm algorithm, a non-dominated sorting genetic algorithm or a multi-objective evolutionary algorithm.
Referring to fig. 3, a flowchart of the optimal solution calculation in an embodiment of the data-driven clinical information rule extraction method of the invention is shown. As shown in fig. 3, S141 specifically includes the following steps:
S141A, taking the accuracy and interpretability of each rule as the two optimization objectives.
To ensure the accuracy of the rule set, the accuracy of each rule set, i.e., the proportion of samples that are correctly predicted, is calculated. Rule accuracy is defined as follows:
$$Q_{ACC} = \frac{1}{Q}\sum_{i=1}^{Q} \mathbb{I}\left(\hat{y}(x_i) = y_i\right) \qquad (6)$$
In Formula 6, $Q_{ACC}$ represents the accuracy of the rule set, Q represents the number of samples, $x_i$ represents the i-th sample, $\hat{y}(x_i)$ is the label predicted for $x_i$, and $\mathbb{I}(\cdot)$ is the indicator function. To measure the interpretability of a rule (denoted $Q_{INT}$ here), we define it as:
$$Q_{INT} = \alpha Q_{FEA} + \beta Q_{COV} + \gamma Q_{CNT} \qquad (7)$$
In Formula 7, $Q_{FEA}$, $Q_{COV}$ and $Q_{CNT}$ represent the complexity, coverage and quality of the rules, respectively; $\alpha$, $\beta$ and $\gamma$ are their weights, which can be set according to the actual situation. Specifically, $Q_{FEA}$ reflects the number of features in each rule: the fewer features a rule involves on average, the larger its $Q_{FEA}$. $Q_{COV}$ indicates the coverage of each rule: the more widely applicable a rule, the larger its $Q_{COV}$. $Q_{CNT}$ measures the quality of the rules. They are defined as follows:
[Formula 8: definition of $Q_{FEA}$; image not recoverable]
[Formula 9: definition of $Q_{COV}$; image not recoverable]
[Formula 10: definition of $Q_{CNT}$; image not recoverable]
In Formula 8, a per-rule term (not recoverable from the image) represents the valid features in the i-th rule; in Formula 9, a per-rule term represents the number of samples that match the i-th rule. In Formula 10, $rule_{selected}$ represents the number of rules obtained by the algorithm, and Z is the number of generated candidate rules. $Q_{FEA} = 1$ means the rule contains only one feature, and $Q_{FEA} = 0$ means the rule contains all features. That is, the larger $Q_{FEA}$, the more compact the rule and the more easily the physician can understand it during diagnosis.
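The two optimization objectives can be sketched as follows. Accuracy follows Formula 6; interpretability is computed as the weighted sum in Formula 7 from component scores supplied by the caller, because the exact definitions of Q_FEA, Q_COV and Q_CNT (Formulas 8 to 10) are not recoverable. Function names and the example weights are hypothetical.

```python
def rule_accuracy(predictions, labels):
    """Q_ACC: fraction of the Q samples whose predicted label equals the true label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def rule_interpretability(q_fea, q_cov, q_cnt, alpha, beta, gamma):
    """alpha*Q_FEA + beta*Q_COV + gamma*Q_CNT with user-chosen weights."""
    return alpha * q_fea + beta * q_cov + gamma * q_cnt


acc = rule_accuracy([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct
interp = rule_interpretability(0.8, 0.5, 0.6, alpha=0.5, beta=0.3, gamma=0.2)
print(acc, round(interp, 2))
```

Each candidate rule therefore maps to a pair (accuracy, interpretability), which is what the multi-objective search below optimizes.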
S141B, randomly initializing a particle swarm for the optimization objectives.
In the invention, each solution of the optimization problem is regarded as a "particle", and all particles search in an N-dimensional space. Each particle has only two attributes, velocity and position: velocity represents how fast the particle moves, and position represents the direction of movement. The current position of a particle is a candidate solution of the optimization problem, and the particle's flight is the individual's search process.
S141C, determining the fitness of each particle in the particle swarm.
Specifically, a fitness function is defined to determine the individual best solution of each particle, and the global best value is found among the individual best solutions.
S141D, updating the velocity and position of each particle according to the fitness.
Specifically, the flight velocity of each particle is dynamically adjusted according to the particle's own historical best position and the swarm's historical best position, and the velocity and position of the particle are updated according to the fitness.
S141E, judging whether the maximum number of iterations has been reached or the global best position meets the minimum threshold.
The best solution found independently by each particle is called its individual extremum, and the best individual extremum in the particle swarm is taken as the current global best solution. Velocities and positions are updated iteratively until an optimal solution satisfying the termination condition is obtained. If the maximum number of iterations has not been reached and the global best position does not meet the minimum threshold, the process returns to step S141C.
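Steps S141B to S141E can be sketched as a basic multi-objective particle swarm optimizer (MOPSO) that maximizes two objectives and keeps an external archive of non-dominated positions. Several choices here are assumptions: the objective function (mapping a particle's position in [0, 1]^N to a rule's accuracy and interpretability), the coefficients w, c1 and c2, picking a random archive member as the leader, and stopping only on the iteration limit. Practical MOPSO variants also bound the archive size, for example with crowding distance.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def update_archive(archive, pos, fit):
    """Add (pos, fit) unless an archived point dominates or equals it; drop points it dominates."""
    if any(dominates(f, fit) or f == fit for _, f in archive):
        return archive
    return [(p, f) for p, f in archive if not dominates(fit, f)] + [(pos, fit)]


def mopso(objective, dim, n_particles=20, max_iter=50, w=0.5, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    # S141B: random initial positions in [0, 1]^dim, zero initial velocities.
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # S141C: fitness of each particle, e.g. (accuracy, interpretability).
    fit = [objective(p) for p in pos]
    pbest = [list(p) for p in pos]
    pbest_fit = list(fit)
    archive = []
    for p, f in zip(pos, fit):
        archive = update_archive(archive, list(p), f)

    for _ in range(max_iter):  # S141E: stop after max_iter iterations
        for i in range(n_particles):
            leader, _ = rng.choice(archive)  # a non-dominated global guide
            for d in range(dim):  # S141D: velocity and position update
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (leader[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit[i] = objective(pos[i])
            # Replace the personal best unless it dominates the new position.
            if not dominates(pbest_fit[i], fit[i]):
                pbest[i], pbest_fit[i] = list(pos[i]), fit[i]
            archive = update_archive(archive, list(pos[i]), fit[i])
    return archive


# Toy trade-off: the first objective rises with x[0], the second falls with it.
front = mopso(lambda x: (x[0], 1.0 - x[0] ** 2), dim=2)
print(len(front), "non-dominated positions")
```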
S141F, if so, determining the Pareto optimal solution.
Once the maximum number of iterations is reached or the global best position meets the minimum threshold, the Pareto optimal solutions in the final population are determined by fast non-dominated sorting.
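The fast non-dominated sorting in S141F can be sketched as the standard procedure popularized by NSGA-II: count, for every point, how many points dominate it, then peel off successive fronts. Both objectives (accuracy and interpretability) are assumed to be maximized, and the first returned front is the Pareto-optimal set; the candidate scores are hypothetical.

```python
def dominates(a, b):
    """True if a Pareto-dominates b when every objective is maximised."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """Split objective vectors into fronts (lists of indices); fronts[0] is Pareto-optimal."""
    n = len(points)
    dominated = [[] for _ in range(n)]   # dominated[p]: indices that p dominates
    count = [0] * n                      # count[p]: number of points dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:  # q is dominated only by points in earlier fronts
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# (accuracy, interpretability) of five hypothetical candidate rules.
scores = [(0.9, 0.2), (0.7, 0.6), (0.6, 0.5), (0.3, 0.9), (0.2, 0.1)]
print(fast_non_dominated_sort(scores))  # prints [[0, 1, 3], [2], [4]]
```

Rules 0, 1 and 3 each trade accuracy against interpretability without being beaten on both, so they form the Pareto-optimal front.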
S142, determining the combination of all the optimal solutions as the optimal rule set.
Specifically, for pulmonary artery embolism, the optimal rule set is: "1 month_varicose veins of lower limb_diagnosis_any > 0.5, 10000 days_sex_visit_count ≤ 1.5, 10000 days_age_visit_last ≤ 26373.0".
When "1 month_varicose veins of lower limb_diagnosis_any > 0.5", "10000 days_sex_visit_count ≤ 1.5" and "10000 days_age_visit_last ≤ 26373.0" are all satisfied, the probability that the patient has VTE is determined to be 90% or more.
Referring to fig. 4, a flow chart of prediction data matching in an embodiment of the data-driven clinical information rule extraction method of the invention is shown. As shown in fig. 4, after step S14, the data-driven clinical information rule extraction method further includes the following steps:
S15, acquiring the prediction data of users who need a clinical decision; all acquired prediction data constitute a prediction data set.
S16, comparing the prediction data with the rules in the optimal rule set one by one, and obtaining the rules that the prediction data set satisfies from the matching results between the prediction data and the optimal rule set.
In one embodiment, the optimal rule set includes a first rule, a second rule and a third rule.
In response to the prediction data simultaneously satisfying the first rule, the second rule and the third rule, the user's disease probability corresponding to the prediction data set is determined; this probability provides auxiliary information for the physician during disease diagnosis.
Specifically, for pulmonary artery embolism, the optimal rule set is: "1 month_varicose veins of lower limb_diagnosis_any > 0.5, 10000 days_sex_visit_count ≤ 1.5, 10000 days_age_visit_last ≤ 26373.0". The first rule is 1 month_varicose veins of lower limb_diagnosis_any > 0.5, the second rule is 10000 days_sex_visit_count ≤ 1.5, and the third rule is 10000 days_age_visit_last ≤ 26373.0. When a patient's prediction data satisfy all three rules simultaneously, the analyzed probability that the patient has pulmonary artery embolism is more than 90%; once the physician knows this, the physician can take it into account when diagnosing the patient.
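Matching one patient's prediction data against the three rules above can be sketched as follows. The English feature identifiers are hypothetical renderings of the original feature names, and the code only reports which rules are satisfied; the 90% probability is a property of the source's rule set, not something this sketch computes.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# Hypothetical identifiers for the first, second and third rules.
OPTIMAL_RULE_SET = [
    ("1month_lower_limb_varicose_veins_diagnosis_any", ">", 0.5),
    ("10000days_sex_visit_count", "<=", 1.5),
    ("10000days_age_visit_last", "<=", 26373.0),
]


def matched_rules(patient, rules=OPTIMAL_RULE_SET):
    """S16: compare the patient's prediction data with each rule in turn."""
    return [(feat, op, thr) for feat, op, thr in rules
            if feat in patient and OPS[op](patient[feat], thr)]


def meets_all_rules(patient, rules=OPTIMAL_RULE_SET):
    """True only if every rule in the set is satisfied simultaneously."""
    return len(matched_rules(patient, rules)) == len(rules)


patient = {
    "1month_lower_limb_varicose_veins_diagnosis_any": 1.0,
    "10000days_sex_visit_count": 1.0,
    "10000days_age_visit_last": 25000.0,
}
print(meets_all_rules(patient))  # prints True
```

A missing feature is treated as not satisfying its rule, which keeps the check conservative.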
The effect of the invention compared with existing machine learning models is analyzed as follows. Taking a risk ratio regression model as an example of an existing machine learning model: it evaluates the influence of multiple factors on disease risk or diagnosis simultaneously and, by weighting the factors and applying a nonlinear mapping, obtains a function usable for prediction and diagnosis. Taking the prediction of the probability that chronic kidney disease progresses to renal failure within five years as an example, the following risk ratio regression model can be obtained:
[Formula: risk ratio regression model for five-year renal failure risk; image not recoverable]
This function yields fairly accurate predictions, but the rules obtained by weighting or nonlinearly transforming factors such as GFR (glomerular filtration rate), ACR (urine albumin-to-creatinine ratio) and AGE (patient age) are not interpretable. In contrast, the invention uses a multi-objective optimization algorithm to mine a series of high-confidence, high-accuracy rules from clinical information while maintaining accuracy.
The protection scope of the data-driven-based clinical information rule extraction method according to the present invention is not limited to the execution sequence of the steps listed in this embodiment, and all the schemes of adding, subtracting, and replacing steps in the prior art according to the principles of the present invention are included in the protection scope of the present invention.
This embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data-driven clinical information rule extraction method.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be performed by hardware under the control of a computer program. The computer program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The computer-readable storage medium includes any medium capable of storing program code, such as ROM, RAM, or magnetic or optical disks.
Please refer to fig. 5, which is a schematic structural connection diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 5, the present embodiment provides an electronic device 5, which specifically includes: a processor 51 and a memory 52; the memory 52 is used for storing computer programs, and the processor 51 is used for executing the computer programs stored in the memory 52 to make the electronic device 5 execute the steps of the data-driven clinical information rule extraction method.
The processor 51 may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP). It may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The memory 52 may include random access memory (RAM) and may further include non-volatile memory, such as at least one disk memory.
In practice, the electronic device may be a computer that includes all or some of the following components: memory, a memory controller, one or more processing units (CPUs), peripheral interfaces, RF circuits, audio circuits, speakers, microphones, an input/output (I/O) subsystem, a display screen, other output or control devices, and external ports. Such computers include, but are not limited to, personal computers such as desktop computers, notebook computers, tablet computers, smartphones and personal digital assistants (PDAs). In other embodiments, the electronic device may be a server. The server may be deployed on one or more physical servers according to factors such as function and load, or it may be a cloud server formed by a distributed or centralized server cluster. This embodiment places no limitation on this.
In summary, the data-driven clinical information rule extraction method, storage medium and device of the present invention work in three stages. They generate an initial rule set from patient sample data, screen it for universal rules based on time-series characteristics, and determine an optimal rule set from the accuracy and interpretability of each rule. This addresses the low prediction accuracy of medical scoring scales and the poor interpretability of traditional machine learning models. The data-driven rule extraction scheme mines a series of high-confidence, high-accuracy rules from clinical information while maintaining accuracy. It yields a clear reasoning path and can, to some extent, assist doctors in making decisions. The invention thus overcomes several shortcomings of the prior art and has high industrial value.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall be covered by the claims of the present invention.

Claims (10)

1. A data-driven clinical information rule extraction method is characterized by comprising the following steps:
acquiring patient sample data, wherein the patient sample data comprises various clinical characteristics of a patient;
generating an initial rule set according to the patient sample data;
screening the initial rule set based on the time-series characteristics in the initial rule set to obtain a universal rule set;
and determining an optimal rule set according to the accuracy and interpretability of each rule in the universal rule set.
2. The data-driven clinical information rule extraction method according to claim 1, wherein:
the patient sample data are tabular data without missing values, in which each row represents one patient sample and each column represents one characteristic of the patient.
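Claim 2 fixes the input format. The following is a minimal validation sketch in plain Python. It makes two assumptions, neither of which the patent specifies: the table is held as a header plus a list of rows, and missing values appear as None or NaN.

```python
import math
from typing import List, Optional, Sequence

def validate_samples(header: Sequence[str], rows: List[Sequence[Optional[float]]]) -> None:
    """Raise ValueError unless every row is a complete patient sample."""
    for i, row in enumerate(rows):
        # Each row must hold exactly one value per feature column.
        if len(row) != len(header):
            raise ValueError(f"row {i} has {len(row)} values, expected {len(header)}")
        for name, value in zip(header, row):
            # Treat None and float NaN as missing values.
            if value is None or (isinstance(value, float) and math.isnan(value)):
                raise ValueError(f"row {i} is missing a value for {name!r}")

validate_samples(["age", "sex"], [[60.0, 1.0], [45.0, 0.0]])  # passes silently
```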
3. The data-driven clinical information rule extraction method according to claim 1, wherein the step of generating an initial rule set from the patient sample data comprises:
pre-processing the patient sample data;
performing, on the preprocessed patient sample data, rule extraction for each node of each tree generated by a tree model;
and generating the initial rule set from the rule extraction results.
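The per-node extraction in claim 3 can be illustrated on a single binary tree. The conjunction of split conditions on the path from the root to any node forms one candidate rule. This sketch uses a hypothetical hand-built Node class in place of a fitted tree model, because the patent does not name the model type. The root is skipped because its path has no conditions, and extract_from_forest simply concatenates the rules of several trees.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass
class Node:
    """A binary decision-tree node; leaves have feature=None (hypothetical structure)."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch where feature <= threshold
    right: Optional["Node"] = None  # branch where feature > threshold

Condition = Tuple[str, str, float]

def extract_rules(root: Node) -> List[List[Condition]]:
    """Return one rule (the path of conditions from the root) per non-root node."""
    rules: List[List[Condition]] = []
    stack: List[Tuple[Node, List[Condition]]] = [(root, [])]
    while stack:
        node, path = stack.pop()
        if path:  # every node below the root yields a rule
            rules.append(path)
        if node.feature is None:  # leaf: nothing further to split on
            continue
        if node.left is not None:
            stack.append((node.left, path + [(node.feature, "<=", node.threshold)]))
        if node.right is not None:
            stack.append((node.right, path + [(node.feature, ">", node.threshold)]))
    return rules

def extract_from_forest(trees: Sequence[Node]) -> List[List[Condition]]:
    """Pool the node rules of every tree into one initial rule set."""
    return [rule for tree in trees for rule in extract_rules(tree)]

tree = Node("age", 60.0, left=Node(), right=Node("gfr", 30.0, left=Node(), right=Node()))
for rule in extract_rules(tree):
    print(rule)
```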
4. The data-driven clinical information rule extraction method according to claim 3, wherein the step of screening the initial rule set based on the time-series characteristics in the initial rule set to obtain a universal rule set comprises:
acquiring, by a time-series statistical method, the time frequency with which each rule occurs on each node;
and selecting the rules whose time frequency meets a requirement preset by the user as the universal rule set.
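The patent does not define the time-series statistic used in claim 4. As one possible reading, offered only as an assumption, the sketch below treats the "time frequency" of a rule as the fraction of time periods in which the same rule was extracted. The periods could be, for example, models trained on successive data windows. The sketch keeps rules that reach a user-chosen minimum fraction. Conditions are compared as unordered sets, so a rule found several times within one period counts only once for that period.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple

Condition = Tuple[str, str, float]
Rule = FrozenSet[Condition]  # conditions compared as an unordered set

def screen_universal_rules(rules_by_period: Dict[str, Sequence[Sequence[Condition]]],
                           min_frequency: float) -> List[Rule]:
    """Keep rules that appear in at least min_frequency of the time periods."""
    n_periods = len(rules_by_period)
    counts = Counter()
    for rules in rules_by_period.values():
        # Deduplicate within a period so each rule counts once per period.
        counts.update({frozenset(rule) for rule in rules})
    return [rule for rule, n in counts.items() if n / n_periods >= min_frequency]

r1 = [("age", ">", 60.0)]
r2 = [("gfr", "<=", 30.0)]
by_period = {"2019": [r1, r2], "2020": [r1], "2021": [r1, r1]}
print(screen_universal_rules(by_period, min_frequency=0.6))
```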
5. The data-driven clinical information rule extraction method according to claim 1, wherein the step of determining the optimal rule set according to the accuracy and interpretability of each rule in the universal rule set comprises:
determining, for each rule in the universal rule set, an optimal solution by a multi-objective optimization algorithm;
and determining the combination of all the optimal solutions as the optimal rule set.
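Below is a simplified sketch of the selection step. It relies on hypothetical metrics that the patent does not specify. Accuracy is taken as the share of positive labels among the samples a rule covers, and interpretability as 1 / (number of conditions). Candidate rules that no other rule beats on both objectives, the Pareto-optimal ones, are collected into the optimal rule set. Here, Pareto filtering over precomputed scores stands in for the per-rule optimizer of claim 6.

```python
import operator
from typing import Dict, List, Sequence, Tuple

Condition = Tuple[str, str, float]  # (feature, "<=" or ">", threshold)
Scores = Tuple[float, float]        # (accuracy, interpretability), both maximised
OPS = {"<=": operator.le, ">": operator.gt}

def score(rule: Sequence[Condition], samples: Sequence[Dict[str, float]],
          labels: Sequence[int]) -> Scores:
    """Hypothetical metrics: precision on covered samples and 1 / rule length."""
    covered = [y for x, y in zip(samples, labels)
               if all(OPS[op](x[f], t) for f, op, t in rule)]
    accuracy = sum(covered) / len(covered) if covered else 0.0
    interpretability = 1.0 / len(rule)  # rules from non-root nodes have >= 1 condition
    return accuracy, interpretability

def dominates(a: Scores, b: Scores) -> bool:
    """a is no worse than b on both objectives and strictly better on at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def optimal_rule_set(rules: Sequence[Sequence[Condition]],
                     samples: Sequence[Dict[str, float]],
                     labels: Sequence[int]) -> List[Sequence[Condition]]:
    """Keep every rule that no other candidate dominates."""
    scored = [(rule, score(rule, samples, labels)) for rule in rules]
    return [rule for rule, s in scored
            if not any(dominates(other, s) for _, other in scored)]

samples = [{"age": 70, "gfr": 20}, {"age": 65, "gfr": 40},
           {"age": 50, "gfr": 25}, {"age": 72, "gfr": 15}]
labels = [1, 0, 0, 1]
A = [("age", ">", 60.0)]
B = [("age", ">", 60.0), ("gfr", "<=", 30.0)]
C = [("gfr", "<=", 30.0)]
D = [("age", "<=", 60.0), ("gfr", "<=", 30.0)]
print(optimal_rule_set([A, B, C, D], samples, labels))
```

Rule D is dropped because A beats it on both objectives. A, B and C are kept because each one trades accuracy against rule length differently, or ties.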
6. The data-driven clinical information rule extraction method according to claim 5, wherein the step of determining an optimal solution by a multi-objective optimization algorithm comprises:
taking the accuracy and the interpretability of each rule as two optimization objectives;
randomly initializing a particle swarm for the optimization objectives;
determining the fitness of each particle in the particle swarm;
updating the velocity and position of each particle according to the fitness;
determining whether the maximum number of iterations has been reached or whether the global best position satisfies the minimum threshold;
and if so, determining the Pareto optimal solution.
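The steps of claim 6 follow the general pattern of multi-objective particle swarm optimization (MOPSO). The sketch below is one minimal variant, not the patent's algorithm. Each particle position in [0, 1]^d is decoded as a rule that keeps condition d when the coordinate exceeds 0.5, which is an assumed encoding. An external archive holds the non-dominated solutions, and the global guide is drawn at random from that archive. The loop stops at max_iter or when an optional good_enough predicate accepts an archived solution; that predicate stands in for the "minimum threshold". The toy data and both objective definitions are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Objectives = Tuple[float, float]  # (accuracy, interpretability), both maximised
Archive = List[Tuple[List[float], Objectives]]

def dominates(a: Objectives, b: Objectives) -> bool:
    """a is no worse than b on both objectives and strictly better on at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def update_archive(archive: Archive, position: List[float], fitness: Objectives) -> Archive:
    """Insert a solution into the Pareto archive, dropping any members it dominates."""
    if any(dominates(f, fitness) or f == fitness for _, f in archive):
        return archive
    kept = [(p, f) for p, f in archive if not dominates(fitness, f)]
    kept.append((list(position), fitness))
    return kept

def mopso(evaluate: Callable[[List[float]], Objectives], dim: int,
          n_particles: int = 20, max_iter: int = 50,
          w: float = 0.5, c1: float = 1.5, c2: float = 1.5,
          good_enough: Optional[Callable[[Objectives], bool]] = None,
          seed: int = 0) -> Archive:
    rng = random.Random(seed)
    # Random initialisation of positions in [0, 1]^dim; velocities start at zero.
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_fit = [evaluate(p) for p in pos]  # fitness of every particle
    archive: Archive = []
    for p, f in zip(pos, pbest_fit):
        archive = update_archive(archive, p, f)
    for _ in range(max_iter):
        # Stop early once an archived (global best) solution meets the user's criterion.
        if good_enough is not None and any(good_enough(f) for _, f in archive):
            break
        for i in range(n_particles):
            leader, _ = rng.choice(archive)  # global guide drawn from the Pareto archive
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (leader[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = evaluate(pos[i])
            # Replace the personal best unless the new position is dominated by it.
            if not dominates(pbest_fit[i], fit):
                pbest[i], pbest_fit[i] = list(pos[i]), fit
            archive = update_archive(archive, pos[i], fit)
    return archive

# Toy data: three boolean condition outcomes per sample, plus a 0/1 label.
ROWS = [((1, 1, 0), 1), ((1, 1, 1), 1), ((1, 0, 0), 0),
        ((0, 1, 1), 0), ((1, 1, 0), 1), ((0, 0, 1), 0)]

def evaluate(position: Sequence[float]) -> Objectives:
    """Decode coordinates > 0.5 as the selected conditions of one conjunctive rule."""
    selected = [d for d, x in enumerate(position) if x > 0.5]
    covered = [y for conds, y in ROWS if all(conds[d] for d in selected)]
    accuracy = sum(covered) / len(covered) if covered else 0.0
    interpretability = 1.0 - len(selected) / len(position)
    return accuracy, interpretability

front = mopso(evaluate, dim=3)
for _, (acc, interp) in sorted(front, key=lambda s: s[1]):
    print(f"accuracy={acc:.2f} interpretability={interp:.2f}")
```

The returned archive approximates the Pareto front. A user would then pick the trade-off between accuracy and rule length that suits the clinical setting.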
7. The data-driven clinical information rule extraction method according to claim 1, wherein after the step of determining the optimal rule set according to the accuracy and interpretability of each rule in the universal rule set, the method further comprises:
acquiring prediction data of a user for whom a clinical decision is to be made, all the acquired prediction data forming a prediction data set;
and comparing the prediction data with the rules in the optimal rule set one by one, and obtaining the rules satisfied by the prediction data set according to the matching results between the prediction data and the optimal rule set.
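The one-by-one comparison in claim 7 amounts to evaluating every optimal rule against every record in the prediction data set. Below is a minimal sketch that assumes two hypothetical representations: rules are lists of (feature, operator, threshold) conditions with string operators, and records are dictionaries keyed by feature name.

```python
import operator
from typing import Dict, List, Sequence, Tuple

OPS = {"<=": operator.le, "<": operator.lt, ">": operator.gt, ">=": operator.ge}
Condition = Tuple[str, str, float]  # (feature, operator symbol, threshold)

def rule_holds(record: Dict[str, float], rule: Sequence[Condition]) -> bool:
    """A rule is a conjunction: every condition must hold for the record."""
    return all(OPS[op](record[feature], threshold) for feature, op, threshold in rule)

def match_prediction_set(records: Sequence[Dict[str, float]],
                         optimal_rules: Sequence[Sequence[Condition]]) -> List[List[int]]:
    """For each record, return the indices of the optimal rules it satisfies."""
    return [[j for j, rule in enumerate(optimal_rules) if rule_holds(rec, rule)]
            for rec in records]

rules = [[("age", ">", 60.0)],
         [("gfr", "<=", 30.0)],
         [("age", ">", 60.0), ("gfr", "<=", 30.0)]]
records = [{"age": 70.0, "gfr": 20.0}, {"age": 50.0, "gfr": 45.0}, {"age": 55.0, "gfr": 25.0}]
print(match_prediction_set(records, rules))
```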
8. The data-driven clinical information rule extraction method according to claim 7, wherein the optimal rule set comprises a first rule, a second rule and a third rule; and the step of comparing the prediction data with the rules in the optimal rule set one by one and obtaining the rules satisfied by the prediction data set according to the matching results between the prediction data and the optimal rule set comprises:
in response to the prediction data simultaneously satisfying the first rule, the second rule and the third rule, determining the user's disease probability corresponding to the prediction data set, wherein the disease probability is used to provide auxiliary judgment information to a doctor during disease diagnosis.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data-driven clinical information rule extraction method according to any one of claims 1 to 8.
10. An electronic device, comprising: a processor and a memory;
the memory is configured to store a computer program, and the processor is configured to execute the computer program stored by the memory to cause the electronic device to perform the data-driven clinical information rule extraction method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111500068.2A CN114121296B (en) 2021-12-09 2021-12-09 Data-driven clinical information rule extraction method, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN114121296A (en) 2022-03-01
CN114121296B CN114121296B (en) 2024-02-02

Family

ID=80364078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111500068.2A Active CN114121296B (en) 2021-12-09 2021-12-09 Data-driven clinical information rule extraction method, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN114121296B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117059214A (en) * 2023-07-21 2023-11-14 南京智慧云网络科技有限公司 Clinical scientific research data integration and intelligent analysis system and method based on artificial intelligence

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103326353A (en) * 2013-05-21 2013-09-25 武汉大学 Environmental economic power generation dispatching calculation method based on improved multi-objective particle swarm optimization algorithm
CN111489827A (en) * 2020-04-10 2020-08-04 吉林大学 Thyroid disease prediction modeling method based on associative decision tree
US20200357514A1 (en) * 2019-05-07 2020-11-12 International Business Machines Corporation Clinical decision support
CN112071420A (en) * 2020-08-12 2020-12-11 福建中榕数据科技有限公司 Clinical aid decision making method, system, equipment and medium based on real-time data
AU2020103709A4 (en) * 2020-11-26 2021-02-11 Daqing Oilfield Design Institute Co., Ltd A modified particle swarm intelligent optimization method for solving high-dimensional optimization problems of large oil and gas production systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant