CN111986815B - Project combination mining method based on co-occurrence relation and related equipment - Google Patents

Project combination mining method based on co-occurrence relation and related equipment

Info

Publication number
CN111986815B
Authority
CN
China
Prior art keywords
occurrence
graph
relation
item
treatment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010893345.XA
Other languages
Chinese (zh)
Other versions
CN111986815A (en)
Inventor
盛建为
周全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Shenzhen Ping An Medical Health Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ping An Medical Health Technology Service Co Ltd
Priority to CN202010893345.XA
Publication of CN111986815A
Application granted
Publication of CN111986815B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medicinal Chemistry (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to the field of data processing and discloses a project combination mining method based on co-occurrence relations, and related equipment, applied to the field of intelligent medical treatment. The method comprises: preprocessing medical record data and classifying it by disease type; extracting the treatment items in the medical record data; constructing an initial co-occurrence relation network with the treatment items as graph nodes to obtain an item co-occurrence relation graph; simplifying the network and performing combination mining with an item combination mining model to obtain a complete subgraph; and outputting a treatment item combination data set based on the complete subgraph. The method also improves the diagnosis efficiency of doctors and opens further possibilities for subsequent medical intelligence. In addition, the invention relates to blockchain technology, in which the treatment items and co-occurrence relations may be stored.

Description

Project combination mining method based on co-occurrence relation and related equipment
Technical Field
The application relates to the field of data processing, in particular to a project combination mining method based on co-occurrence relations and related equipment.
Background
With the spread of electronic medical records, more and more clinical medical records are digitized and become a data source that computers can process directly. With the continuous development of big data technology, computer techniques are increasingly used to analyze large volumes of health-care data from patients and populations, yielding valuable hidden information that assists clinical researchers, clinicians, managers and health policy makers.
At present, the analysis of medical data in clinical medicine focuses mainly on the clinical symptoms of each disease type and on the treatment effects and pathological reactions of single treatment items; the association relations that arise when several treatment items are applied to the same disease type are rarely analyzed. Although some methods for revealing associations between diseases do analyze such relations when studying the treatment of comorbidities, they do not match the habits of clinical research, so the association relations between treatment items are difficult to connect with clinical research or are poorly reflected.
Disclosure of Invention
The main purpose of the invention is to solve the technical problem in the prior art that association analysis is difficult when several treatment items are used for the same disease type, which lowers diagnosis efficiency.
The first aspect of the invention provides a project combination mining method based on co-occurrence relations, which comprises the following steps:
Acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis sheets;
preprocessing the medical record data, clustering diagnosis sheets with diagnosis results belonging to the same disease type, and extracting all cases of the corresponding disease type and treatment items corresponding to each case from the clustered diagnosis sheets;
Constructing an initial co-occurrence relation network among all the graph nodes by taking all the treatment items as the graph nodes to obtain an item co-occurrence relation graph;
Simplifying an initial co-occurrence relation network in the project co-occurrence relation graph through a preset project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;
And generating a treatment project combination data set corresponding to the disease type based on the network relation among the graph nodes in the complete subgraph.
Optionally, in a first implementation manner of the first aspect of the present invention, the constructing an initial co-occurrence relationship network between the graph nodes by using all treatment items as graph nodes, and obtaining an item co-occurrence relationship graph includes:
Randomly selecting one treatment item from all treatment items as a main node, traversing the rest treatment items, and forming item combinations with each treatment item;
Calculating the probability of the simultaneous occurrence of the item combination in the medical record data;
judging whether the probability meets an initial co-occurrence condition;
if yes, adding an edge between the item combinations to form an initial co-occurrence relation network;
and outputting the project co-occurrence relation graph after all the project combinations are added.
Optionally, in a second implementation manner of the first aspect of the present invention, the item combination includes a first treatment item and a second treatment item, and the calculating the probability that the item combination appears at the same time in the medical record data includes:
Counting a first number of times that the first treatment item and the second treatment item occur simultaneously in the same diagnosis sheet, a second number of times that the first treatment item occurs independently in the diagnosis sheet, and a third number of times that the second treatment item occurs independently in the diagnosis sheet in the medical record data;
Calculating a first probability of occurrence of the combination of items relative to the first treatment item based on the first number of times and the second number of times;
and calculating a second occurrence probability of the item combination relative to the second treatment item according to the first times and the third times.
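The three counts and two probabilities above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each diagnosis sheet is represented as a set of treatment item names, and it reads the "second" and "third" numbers as the counts of all sheets containing the first and the second item respectively. The function name `cooccurrence_probabilities` and the placeholder item names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def cooccurrence_probabilities(
    sheets: Iterable[Set[str]], first: str, second: str
) -> Tuple[float, float]:
    """Return (P(pair | first item), P(pair | second item)) over the sheets."""
    sheets = list(sheets)
    # First number: sheets in which both items occur together.
    n_both = sum(1 for s in sheets if first in s and second in s)
    # Second and third numbers (assumed reading): sheets containing each item.
    n_first = sum(1 for s in sheets if first in s)
    n_second = sum(1 for s in sheets if second in s)
    p_first = n_both / n_first if n_first else 0.0
    p_second = n_both / n_second if n_second else 0.0
    return p_first, p_second


# Placeholder data: four diagnosis sheets with items A, B, C.
sheets = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
p_a, p_b = cooccurrence_probabilities(sheets, "A", "B")
```

If "occurs independently" is instead meant as "occurs without the other item", the denominators would become `n_both` plus the count of sheets containing only that item; the source does not settle this.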
Optionally, in a third implementation manner of the first aspect of the present invention, the determining whether the probability meets an initial co-occurrence condition includes:
comparing the first occurrence probability and the second occurrence probability with initial co-occurrence conditions respectively;
If the first occurrence probability and the second occurrence probability simultaneously meet the initial co-occurrence condition, determining that the item combination is a binding treatment item of the same disease type;
And if at least one of the first occurrence probability and the second occurrence probability does not meet the initial co-occurrence condition, determining that the item combination is an unbound treatment item of the same disease.
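A minimal sketch of how the edge-adding rule could look when both occurrence probabilities must meet the initial co-occurrence condition. It assumes the condition is a single numeric threshold applied to both probabilities and that the graph is stored as a dictionary of neighbour sets; `build_cooccurrence_graph` and `threshold` are hypothetical names, not taken from the source.

```python
from itertools import combinations
from typing import Dict, List, Set


def build_cooccurrence_graph(
    sheets: List[Set[str]], threshold: float
) -> Dict[str, Set[str]]:
    """Add an edge between two items only if both conditional
    co-occurrence probabilities reach the threshold."""
    items = sorted(set().union(*sheets))
    # Number of sheets containing each item (always >= 1 here).
    count = {item: sum(1 for s in sheets if item in s) for item in items}
    graph: Dict[str, Set[str]] = {item: set() for item in items}
    for a, b in combinations(items, 2):
        both = sum(1 for s in sheets if a in s and b in s)
        # Bound pair: the condition must hold relative to BOTH items.
        if both / count[a] >= threshold and both / count[b] >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


sheets = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
graph = build_cooccurrence_graph(sheets, threshold=0.6)
```

In this toy data the pair (A, C) co-occurs in one sheet out of three containing A, so it fails the condition relative to A and stays unbound even though it holds half the time relative to C.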
Optionally, in a fourth implementation manner of the first aspect of the present invention, the simplifying, by presetting an item combination mining model, the initial co-occurrence relationship network in the item co-occurrence relationship graph to obtain a network relationship structure includes:
Extracting the probabilities between a first graph node and the other graph nodes in the project co-occurrence relation graph, and comparing each probability with a preset weight value, wherein the first graph node is the graph node currently selected for simplification, the other graph nodes are the graph nodes other than the first graph node, and the weight value is the ratio of the number of medical records that use both of two medical items to the total number of medical records;
if the probability is lower than the weight value, deleting the corresponding edge from the first graph node;
and after all graph nodes in the project co-occurrence relation graph are compared, outputting a network relation structure.
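The simplification step can be read in more than one way. The sketch below takes one possible reading: for every edge, the share of medical records that use both items is computed and the edge is deleted when that share falls below a preset minimum. The names `prune_edges` and `min_weight` are hypothetical, and the graph is assumed to be a symmetric dictionary of neighbour sets with every node present as a key.

```python
from typing import Dict, List, Set


def prune_edges(
    graph: Dict[str, Set[str]], sheets: List[Set[str]], min_weight: float
) -> Dict[str, Set[str]]:
    """Return a copy of the graph without edges whose joint share of
    records (records using both items / all records) is below min_weight."""
    total = len(sheets)
    pruned = {node: set(neighbours) for node, neighbours in graph.items()}
    for node, neighbours in graph.items():
        for other in neighbours:
            both = sum(1 for s in sheets if node in s and other in s)
            if total == 0 or both / total < min_weight:
                # Remove the edge in both directions to keep symmetry.
                pruned[node].discard(other)
                pruned[other].discard(node)
    return pruned


sheets = [{"A", "B"}, {"A", "B"}, {"A", "B", "C"}, {"C", "D"}]
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
network = prune_edges(graph, sheets, min_weight=0.5)
```

Only the A-B edge survives here, because it is the only pair used together in at least half of the records.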
Optionally, in a fifth implementation manner of the first aspect of the present invention, the adjusting the project co-occurrence relationship diagram based on the network relationship structure, and obtaining the complete sub-graph includes:
traversing all graph nodes, screening out zero-degree nodes, and deleting the zero-degree nodes from the initial co-occurrence relation network, wherein the zero-degree nodes are graph nodes which have no edges with any graph node;
Randomly selecting N graph nodes, and calculating the total edge number of a local relation graph consisting of the N graph nodes;
Judging whether the total edge number is equal to an edge number threshold, wherein the edge number threshold is N(N-1)/2 and N is greater than or equal to 2;
And if so, determining that the local relation graph is a complete subgraph.
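The zero-degree filtering and the N(N-1)/2 edge-count test can be sketched as below. This is an illustration under the assumption that the graph is an undirected dictionary of neighbour sets; the function names are hypothetical, and the node selection is passed in explicitly rather than drawn at random.

```python
from itertools import combinations
from typing import Dict, Iterable, Set


def drop_zero_degree(graph: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Remove graph nodes that have no edge to any other node."""
    return {node: set(nb) for node, nb in graph.items() if nb}


def is_complete_subgraph(graph: Dict[str, Set[str]], nodes: Iterable[str]) -> bool:
    """True if the N selected nodes are pairwise connected,
    i.e. the local graph has exactly N(N-1)/2 edges (N >= 2)."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return False
    edges = sum(1 for a, b in combinations(nodes, 2) if b in graph.get(a, set()))
    return edges == n * (n - 1) // 2


graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),
}
graph = drop_zero_degree(graph)
abc_complete = is_complete_subgraph(graph, ["A", "B", "C"])
bcd_complete = is_complete_subgraph(graph, ["B", "C", "D"])
```

For N = 3 the threshold is 3 edges: A, B, C have all three, while B, C, D have only two because B and D are not connected.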
Optionally, in a sixth implementation manner of the first aspect of the present invention, after the generating, based on the network relationship between graph nodes in the complete sub-graph, a treatment item combination data set corresponding to the disease type, the method further includes:
extracting medicine information corresponding to each disease type and association relation among medicines;
constructing a medicine co-occurrence relation graph corresponding to the disease type according to the medicine information and the corresponding association relation;
and simplifying the medicine co-occurrence relation graph according to a preset medicine combination mining model, and generating a medicine combination data set based on the simplified result.
The second aspect of the present invention provides an item combination mining device based on co-occurrence relationship, the item combination mining device based on co-occurrence relationship comprising:
the data acquisition module is used for acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis sheets;
the preprocessing module is used for preprocessing the medical record data, clustering diagnosis sheets with diagnosis results belonging to the same disease type, and extracting all cases of the corresponding disease type and treatment items corresponding to each case from the clustered diagnosis sheets;
The construction module is used for constructing an initial co-occurrence relation network among all the graph nodes by taking all the treatment items as the graph nodes to obtain an item co-occurrence relation graph;
The mining module is used for simplifying an initial co-occurrence relation network in the project co-occurrence relation graph through a preset project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;
And the generation module is used for generating a treatment project combination data set corresponding to the disease type based on the network relation among the graph nodes in the complete subgraph.
Optionally, in a first implementation manner of the second aspect of the present invention, the building module includes:
The traversing unit is used for randomly selecting one treatment item from all treatment items as a main node, traversing the rest treatment items and forming item combinations with each treatment item;
a first calculation unit for calculating the probability of the item combination occurring at the same time in the medical record data;
A first judging unit, configured to judge whether the probability meets an initial co-occurrence condition;
The creation unit is used for adding an edge between the item combinations when the probability meets the initial co-occurrence condition to form an initial co-occurrence relation network; and outputting the project co-occurrence relation graph after all the project combinations are added.
Optionally, in a second implementation manner of the second aspect of the present invention, the item combination includes a first treatment item and a second treatment item, and the first computing unit is specifically configured to:
Counting a first number of times that the first treatment item and the second treatment item occur simultaneously in the same diagnosis sheet, a second number of times that the first treatment item occurs independently in the diagnosis sheet, and a third number of times that the second treatment item occurs independently in the diagnosis sheet in the medical record data;
Calculating a first probability of occurrence of the combination of items relative to the first treatment item based on the first number of times and the second number of times;
and calculating a second occurrence probability of the item combination relative to the second treatment item according to the first times and the third times.
Optionally, in a third implementation manner of the second aspect of the present invention, the first determining unit is specifically configured to:
comparing the first occurrence probability and the second occurrence probability with initial co-occurrence conditions respectively;
when the first occurrence probability and the second occurrence probability simultaneously meet the initial co-occurrence condition, determining that the item combination is a binding treatment item of the same disease type; and determining that the item combination is an unbound treatment item of the same disease species when at least one of the first occurrence probability and the second occurrence probability does not satisfy the initial co-occurrence condition.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the mining module includes:
The comparison unit is used for extracting the probabilities between a first graph node and the other graph nodes in the project co-occurrence relation graph, and comparing each probability with a preset weight value, wherein the first graph node is the graph node currently selected for simplification, the other graph nodes are the graph nodes other than the first graph node, and the weight value is the ratio of the number of medical records that use both of two medical items to the total number of medical records;
A deleting unit, configured to delete a corresponding edge from the first graph node when the probability is lower than the weight value;
And the output unit is used for outputting a network relation structure after all the graph nodes in the project co-occurrence relation graph are compared.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the mining module further includes:
The screening unit is used for traversing all graph nodes, screening out zero-degree nodes and deleting the zero-degree nodes from the initial co-occurrence relation network, wherein the zero-degree nodes are graph nodes which have no edges with any graph node;
the second calculation unit is used for randomly selecting N graph nodes and calculating the total edge number of the local relation graph formed by the N graph nodes;
a second judging unit, configured to judge whether the total edge number is equal to an edge number threshold, where the edge number threshold is N(N-1)/2 and N is greater than or equal to 2;
And the determining unit is used for determining that the local relation graph is a complete subgraph when the total edge number is equal to the edge number threshold value.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the project combination mining apparatus based on co-occurrence relation further includes an optimization module, which is specifically configured to:
extracting medicine information corresponding to each disease type and association relation among medicines;
constructing a medicine co-occurrence relation graph corresponding to the disease type according to the medicine information and the corresponding association relation;
and simplifying the medicine co-occurrence relation graph according to a preset medicine combination mining model, and generating a medicine combination data set based on the simplified result.
A third aspect of the present invention provides an item combination mining apparatus based on co-occurrence relationships, comprising: a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor being interconnected by a line;
The at least one processor invokes the instructions in the memory to cause the co-occurrence relationship based project combination mining apparatus to perform the co-occurrence relationship based project combination mining method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having a computer program stored therein, which when run on a computer, causes the computer to perform the above-described co-occurrence relationship-based project combination mining method.
According to the technical scheme provided by the invention, medical record data are preprocessed and classified by disease type, the treatment items in the medical record data are extracted, an initial co-occurrence relation network is constructed with the treatment items as graph nodes to obtain an item co-occurrence relation graph, the graph is simplified and mined with an item combination mining model to obtain a complete subgraph, and a treatment item combination data set is output based on the complete subgraph. The method thus shows how diagnosis and treatment items are used in combination in actual clinical practice; it is simple and clear, stays close to clinical practice, and faithfully reflects the combined use of items during clinical treatment. Meanwhile, after diagnosing a specific disease, a doctor can search the treatment item combination data set directly: by fixing one treatment item, the doctor obtains recommendations for the treatment items that must accompany it, which greatly improves the diagnosis efficiency of the doctor and opens further possibilities for subsequent medical intelligence.
Drawings
FIG. 1 is a schematic diagram of a first embodiment of a method for mining a combination of items based on co-occurrence relationships according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a second embodiment of a method for mining a combination of items based on co-occurrence relationships according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a third embodiment of a method for mining a combination of items based on co-occurrence relationships according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a fourth embodiment of a method for mining a combination of items based on co-occurrence relationships according to an embodiment of the present invention;
FIG. 5 is a schematic illustration of a treatment project set in an embodiment of the invention;
FIG. 6 is a schematic diagram of a complete sub-graph in an embodiment of the invention;
FIG. 7 is a schematic diagram of an embodiment of a combined project mining apparatus based on co-occurrence relationships according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of another embodiment of a combined project mining apparatus based on co-occurrence relationships according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an embodiment of a co-occurrence relationship-based project combination mining apparatus according to an embodiment of the present invention.
Detailed Description
In view of the defects in the prior art, the application provides a method for mining the co-occurrence relations of multiple treatment items for the same disease through a graph mining model of diagnosis and treatment item combinations. Specifically, the combination relations among different diagnosis and treatment items are mined with a graph mining model based on co-occurrence relations, so as to determine the treatment items that need to occur together when treating the same disease. When diagnosing, a doctor can quickly provide a corresponding treatment scheme for the patient based on these combination relations, which greatly shortens diagnosis time and improves the diagnosis efficiency of the doctor.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below, referring to fig. 1, and a first embodiment of a method for mining a project combination based on co-occurrence relationships in the embodiment of the present invention includes:
101. Acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis sheets;
In this step, clinical data can be acquired through an interface provided on a data analysis platform, for example a secure transmission interface based on a mobile internet communication protocol. A user can log in to the secure transmission interface with an account to access the clinical data of each medical institution directly. Specifically, when the secure transmission interface is called, a data acquisition program is also called to read the clinical database of the medical institution, thereby acquiring the clinical data in the database.
The clinical data comprises diagnosis sheets. Each diagnosis sheet comprises patient data information, diagnosis information and treatment information; the diagnosis information comprises a diagnosis result, and the treatment information comprises the medical treatment items and the medication information of each item.
102. Preprocessing medical record data, clustering diagnosis sheets with diagnosis results belonging to the same disease type, and extracting all medical records of the corresponding disease type and treatment items corresponding to each medical record from the clustered diagnosis sheets;
in this embodiment, the preprocessing includes a plurality of steps such as feature extraction, data screening, and clustering, and specifically includes:
Screening the medical record data, identifying the information in the medical record data that is not a diagnosis sheet, and deleting it from the medical record data to obtain a diagnosis sheet set;
Extracting the symptoms from each diagnosis sheet in the diagnosis sheet set by using a disease type knowledge graph, and matching a specific disease name based on the extracted symptoms;
Classifying the diagnosis sheets with the same disease name in the diagnosis sheet set into one class by using a clustering algorithm to obtain a disease data set;
further, the diagnosis sheets in each disease data set are grouped by medical record, and the treatment items are extracted from the diagnosis sheets corresponding to each medical record to form item sets.
For example: the medical record data are preprocessed and all cases of a certain disease are screened out, the total number of cases being denoted n. The medical treatment items (medicines or examination items) used by each patient during hospitalization are converted into an item set $\{a_{i1}, a_{i2}, \ldots, a_{ik}\}$, where the subscript i is the medical record label and k indexes the medical treatment items received by the patient.
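A minimal sketch of the item-set conversion in this example, assuming the preprocessed data arrive as (medical record label, treatment item) pairs. The function name `to_item_sets` and the placeholder labels and item names are illustrative and do not come from the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def to_item_sets(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group treatment items by medical record label i,
    giving one item set {a_i1, ..., a_ik} per record."""
    item_sets: Dict[str, Set[str]] = defaultdict(set)
    for record_label, item in rows:
        item_sets[record_label].add(item)
    return dict(item_sets)


# Placeholder rows: two medical records of the same disease type.
rows = [
    ("record_1", "item_A"),
    ("record_1", "item_B"),
    ("record_2", "item_A"),
    ("record_2", "item_C"),
    ("record_2", "item_A"),  # duplicates collapse inside a set
]
item_sets = to_item_sets(rows)
n = len(item_sets)  # total number of cases for this disease type
```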
103. Constructing an initial co-occurrence relation network among all graph nodes by taking all treatment items as graph nodes, and obtaining an item co-occurrence relation graph;
In this step, the item co-occurrence relation graph is constructed in the manner of a complete graph. First, all treatment items extracted from the medical record data are taken as nodes and every pair of nodes is connected to obtain an initial complete graph. The betweenness of the edges between nodes is then calculated on the initial complete graph, and based on the betweenness it is judged whether the association relation between a node and the other nodes meets a preset condition; if so, the edge is retained, otherwise it is deleted. This yields the initial co-occurrence relation network and thus the item co-occurrence relation graph.
In practical application, the shortest path between two nodes in the complete graph can be detected with the algorithms of the Python networkx module, and the betweenness between the two nodes calculated. The specific steps are as follows:
Identifying the module structure in the complete graph: optionally, taking each node in the complete graph as a central node, searching for the nodes connected to it; if the number of nodes found reaches the order of magnitude of a module, recording the module and calculating the betweenness between pairs of nodes in the module.
Specifically, the edge betweenness of all paths connecting the two nodes is calculated, and the path with the smallest betweenness value is selected as the path between the two modules, thereby obtaining the co-occurrence relation between the two nodes.
104. Simplifying an initial co-occurrence relation network in the project co-occurrence relation graph through a preset project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;
In this embodiment, the item combination mining model is a detection model that can identify the co-occurrence relations between the treatment items in each diagnosis sheet; it is obtained by learning, with natural language technology, from the treatment items in the medical data of successful clinical treatments for various disease types.
The model analyzes and mines the relations among different medical items to obtain the diagnosis and treatment items that frequently appear together in clinical practice, and establishes the co-occurrence relation between them together with the corresponding disease type.
In practical application, when the initial co-occurrence relation network is simplified through the model, the treatment items are first classified by disease type; after all treatment items in the medical record data are extracted, they are simplified by the model to obtain a network relation structure. The network relation structure is then classified by disease type to obtain a plurality of complete subgraphs, and the treatment items corresponding to the nodes of each complete subgraph are converted into treatment item sets to obtain a classification chart, as shown in fig. 5.
105. Based on the network relation between graph nodes in the complete subgraph, a treatment item combination data set corresponding to the disease type is generated.
In this step, entity features of the treatment items in each complete subgraph are extracted using a natural language algorithm, a corresponding medical knowledge graph is constructed from these features, and a correspondence between the medical knowledge graph and the disease types is established, thereby forming the treatment item combination data set.
By implementing this method, the combination relations among different diagnosis and treatment items are mined from actual diagnosis and treatment data using a graph mining model based on co-occurrence relations. Representing the combined use of diagnosis and treatment items as a graph is simple and clear, stays close to clinical practice, and faithfully reflects how items are used together during clinical treatment. In addition, after diagnosing a specific disease, a doctor who has determined one treatment item can search the treatment item combination data set directly to obtain recommendations for the treatment items that must accompany it, which greatly improves the doctor's diagnostic efficiency.
Referring to fig. 2, a second embodiment of a method for mining a project combination based on co-occurrence relationships according to an embodiment of the present invention includes:
201. acquiring clinical data and extracting a plurality of diagnostic sheets in the clinical data;
202. preprocessing medical record data, clustering diagnosis sheets with diagnosis results belonging to the same disease type, and extracting all medical records of the corresponding disease type and treatment items corresponding to each medical record from the clustered diagnosis sheets;
203. Randomly selecting one treatment item from all treatment items as a main node, traversing the rest treatment items, and forming item combinations with each treatment item;
In this step, one of the treatment items is selected at random as the main node and combined in turn with each of the remaining treatment items to form item combinations. Once this main node is finished, the next treatment item is selected as the main node and combined in the same way, until all treatment items have been traversed; all combinations are then de-duplicated to obtain the item combinations.
In practical applications, de-duplication only checks whether the treatment items in two combinations are the same and ignores the direction of the connection; that is, the combinations A->B and B->A are duplicates. A treatment item combination is also not limited to two items and may contain three or more.
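The traversal and de-duplication described above amount to enumerating unordered combinations of distinct items. The following is a minimal sketch under that reading; the function name `item_combinations` and the sample item names are hypothetical.

```python
from itertools import combinations

def item_combinations(items, size=2):
    """Return every unordered combination of `size` distinct treatment items.

    Sorting the de-duplicated items first means A->B and B->A collapse into
    the single tuple (A, B), so no separate de-duplication pass is needed.
    """
    unique_items = sorted(set(items))
    return list(combinations(unique_items, size))

# Made-up treatment items; "A" is listed twice on purpose.
pairs = item_combinations(["B", "A", "C", "A"])
triples = item_combinations(["B", "A", "C", "A"], size=3)
```

Passing a larger `size` covers the case in which a combination contains three or more items.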
204. Calculating the probability of simultaneous occurrence of item combinations in medical record data;
In practical application, the simultaneous use of multiple treatment items is evaluated with the diagnosis sheet as the unit; no evaluation is made across different diagnosis sheets. That is, a co-occurrence relationship refers to items being used together on the same disease to treat a patient. Within the same disease, several conditions or symptoms may occur at once, and different treatment items must then be selected clinically to treat each symptom. It must also be determined whether different treatment items conflict with one another, so the co-occurrence relationship here covers relationships of non-coexistence as well as co-occurrence.
205. Judging whether the probability meets the initial co-occurrence condition;
In this step, the initial co-occurrence condition includes the probability that two treatment items are used together in clinical trials. In practice, this probability is also adjusted according to the clinical response to the treatment items themselves.
206. If yes, adding an edge between the project combinations to form an initial co-occurrence relation network;
In this embodiment, the edge added between the treatment items of an item combination follows the shortest path between the two; the shortest path is determined from the betweenness of the multiple paths connecting the treatment items, thereby forming the initial co-occurrence relation network.
207. After all the project combinations are added, outputting a project co-occurrence relation diagram;
In this step, a co-occurrence relationship graph covering all treatment items is constructed from the initial co-occurrence relationship network of the item combinations. The step also includes calculating the co-occurrence relationships between combinations; the calculation principle is the same as that for the treatment items within a combination and is not repeated here.
208. Simplifying an initial co-occurrence relation network in the project co-occurrence relation graph through a preset project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;
In this embodiment, the item combination mining model is obtained by training a deep neural network on the treatment items in diagnosis sheets. Specifically, treatment items extracted from historical diagnosis sheets are combined to form a data set; regression processing is performed on this data set with a regression algorithm, and the data set is divided into a training set and a verification set. The deep neural network is trained on the training set to obtain a model prototype, and the prototype is verified on the verification set. Once the probability that the verification result and the treatment items in the corresponding diagnosis sheets of the verification set occur together reaches a threshold, the item combination mining model is formed.
Edges in the initial co-occurrence relationship graph are simplified based on the project combination mining model, where simplification refers to deletion or modification.
Then, several graph nodes in the simplified graph are selected by a random combination algorithm and combined into a relation subgraph. Whether the number of connecting edges among the nodes of the relation subgraph satisfies the combination formula of a complete subgraph is calculated; if it does, the relation subgraph is output as a complete subgraph.
209. Based on the network relation between graph nodes in the complete subgraph, a treatment item combination data set corresponding to the disease type is generated.
In this embodiment of the method, the combination relations among different diagnosis and treatment items are mined using a graph mining model based on co-occurrence relations. Representing the combined use of diagnosis and treatment items as a graph is simple and clear, stays close to clinical practice, and faithfully reflects how items are used together during clinical treatment. In addition, after diagnosing a specific disease, a doctor who has determined one treatment item can search the treatment item combination data set directly to obtain recommendations for the treatment items that must accompany it, which greatly improves the doctor's diagnostic efficiency.
Referring to fig. 3, a third embodiment of a method for mining a project combination based on co-occurrence relationships according to an embodiment of the present invention includes:
301. Acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis sheets;
302. preprocessing medical record data, clustering diagnosis sheets with diagnosis results belonging to the same disease type, and extracting all medical records of the corresponding disease type and treatment items corresponding to each medical record from the clustered diagnosis sheets;
303. Randomly selecting one treatment item from all treatment items as a main node, traversing the rest treatment items, and forming item combinations with each treatment item;
in this embodiment, the implementation principle of steps 301 to 303 is the same as that of steps 201 to 203 in the above embodiment, and will not be described again here.
304. Counting, in the medical record data, a first number of times that the first treatment item and the second treatment item occur together in the same diagnosis sheet, a second number of times that the first treatment item occurs in a diagnosis sheet, and a third number of times that the second treatment item occurs in a diagnosis sheet;
305. Calculating a first occurrence probability of the item combination relative to the first treatment item according to the first number of times and the second number of times;
306. Calculating a second occurrence probability of the item combination relative to the second treatment item according to the first number of times and the third number of times;
In practical application, the three counts can be taken directly from the medical record data: the number of diagnosis sheets in which the first and second treatment items appear together is used as the first number of times, the number of diagnosis sheets containing the first treatment item is used as the second number of times, and the number of diagnosis sheets containing the second treatment item is used as the third number of times.
Alternatively, the number of diagnosis sheets containing the first treatment item and the number containing the second treatment item may each be counted first, and the number of diagnosis sheets in which both items appear may then be counted from among those sheets; the first and second occurrence probabilities are finally calculated from these three counts.
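The counting in steps 304 to 306 can be sketched as follows. The sketch assumes that each occurrence probability is the ratio of the joint count to the count of the single item, which is one natural reading of "according to the first number of times and the second number of times"; the function name `occurrence_probabilities` and the representation of a diagnosis sheet as a set of item names are illustrative assumptions.

```python
def occurrence_probabilities(sheets, first_item, second_item):
    """Return (first occurrence probability, second occurrence probability).

    sheets: iterable of diagnosis sheets, each a collection of item names.
    Assumption: probability = joint count / count of the single item.
    """
    sheets = [set(sheet) for sheet in sheets]
    n_both = sum(1 for s in sheets if first_item in s and second_item in s)
    n_first = sum(1 for s in sheets if first_item in s)
    n_second = sum(1 for s in sheets if second_item in s)
    # Guard against items that never occur to avoid division by zero.
    p_first = n_both / n_first if n_first else 0.0
    p_second = n_both / n_second if n_second else 0.0
    return p_first, p_second

# Four made-up diagnosis sheets.
example_sheets = [{"A", "B"}, {"A", "B"}, {"A", "B", "C"}, {"A"}]
p_first, p_second = occurrence_probabilities(example_sheets, "A", "B")
```

Here item A appears on four sheets and item B on three, all three of which also contain A, so the two probabilities differ; this asymmetry is why both are compared with the co-occurrence condition in the following steps.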
307. Comparing the first occurrence probability and the second occurrence probability with initial co-occurrence conditions respectively;
308. if the first occurrence probability and the second occurrence probability simultaneously meet the initial co-occurrence condition, determining that the item combination is a binding treatment item of the same disease type;
309. if at least one of the first occurrence probability and the second occurrence probability does not meet the initial co-occurrence condition, determining that the item combination is an unbound treatment item of the same disease;
In practical application, the constructed project co-occurrence relation graph is denoted G(v, e, w), where v represents the nodes, namely all diagnosis and treatment items, e represents the edges between nodes, and w represents the edge weights.
Specifically, the initial co-occurrence condition is the minimum support, and the judgment of whether the initial co-occurrence condition is satisfied is implemented as follows:
An edge is added between items a and b when they meet the co-occurrence relation. Consistent with steps 304 to 308, meeting the initial co-occurrence condition is defined as:
$$\frac{n_{ab}}{n_a} \ge minsup \quad \text{and} \quad \frac{n_{ab}}{n_b} \ge minsup$$
Here $n_{ab}$ is the number of diagnosis sheets in which a and b occur together (the first number of times), $n_a$ and $n_b$ are the numbers of diagnosis sheets containing a and b respectively (the second and third numbers of times), and minsup is the minimum support, an empirical, user-defined parameter: the larger its value, the stricter the co-occurrence relation. Its value is usually determined experimentally and lies between 0.8 and 0.95. Over the item sets of all patients, every pairwise combination of diagnosis and treatment items is checked against the co-occurrence relation, and an edge is added between the two nodes whenever the relation is met.
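Building the initial co-occurrence network from the minimum-support test might look like the sketch below. It assumes the condition is applied to both ratios of joint count to single-item count, as in steps 304 to 308; the function name `build_cooccurrence_graph`, the adjacency-dictionary output and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_graph(sheets, minsup=0.8):
    """Return {item: set of neighbours} for pairs passing the minsup test."""
    single_counts = Counter()
    pair_counts = Counter()
    for sheet in sheets:
        items = sorted(set(sheet))           # one vote per item per sheet
        single_counts.update(items)
        pair_counts.update(combinations(items, 2))
    graph = {item: set() for item in single_counts}
    for (a, b), n_ab in pair_counts.items():
        # Both directions must reach the minimum support.
        if n_ab / single_counts[a] >= minsup and n_ab / single_counts[b] >= minsup:
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Made-up item sets for five patients.
patient_sheets = [["A", "B"], ["A", "B"], ["A", "B", "C"], ["A", "B"], ["C"]]
graph = build_cooccurrence_graph(patient_sheets, minsup=0.8)
```

In this data A and B always appear together and are linked, whereas C accompanies them on only one of four sheets and stays isolated.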
310. If yes, adding an edge between the project combinations to form an initial co-occurrence relation network;
311. after all the project combinations are added, outputting a project co-occurrence relation diagram;
312. Simplifying an initial co-occurrence relation network in the project co-occurrence relation graph through a preset project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;
313. Based on the network relation between graph nodes in the complete subgraph, a treatment item combination data set corresponding to the disease type is generated.
The method mines the combination relations among different diagnosis and treatment items from actual diagnosis and treatment data using a graph mining model based on co-occurrence relations. Representing the combined use of diagnosis and treatment items as a graph is simple and clear, stays close to clinical practice, and faithfully reflects how items are used together during clinical treatment.
Furthermore, in this manner treatment items are mined by disease type and co-occurrence relation, taking one treatment item as the base point. With the mined treatment items, a doctor can determine the specific disease type during diagnosis and then, starting from one treatment item, search the treatment item combination data set directly to obtain recommendations for the treatment items that must accompany it, which greatly improves the doctor's diagnostic efficiency.
Referring to fig. 4, a fourth embodiment of a method for mining a project combination based on co-occurrence relationships according to an embodiment of the present invention includes:
401. Acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis sheets;
402. Preprocessing medical record data, clustering diagnosis sheets with diagnosis results belonging to the same disease type, and extracting all medical records of the corresponding disease type and treatment items corresponding to each medical record from the clustered diagnosis sheets;
403. constructing an initial co-occurrence relation network among all graph nodes by taking all treatment items as graph nodes, and obtaining an item co-occurrence relation graph;
in this embodiment, the implementation principle of the steps 401 to 403 is the same as that of the steps 201 to 203 in the above embodiment, and will not be described again here.
404. Extracting the probability of a first graph node and other graph nodes in the project co-occurrence relation graph, and comparing the probability with a preset weight value respectively;
In this step, the first graph node is the graph node currently being simplified, the other graph nodes are the graph nodes other than the first graph node, and the weight value is the ratio of the number of medical records that use the two medical items at the same time to the total number of medical records.
405. If the probability is lower than the weight value, deleting the corresponding edge from the first graph node;
406. After all graph nodes in the project co-occurrence relationship graph are compared, outputting a network relationship structure;
In practical application, to reduce the complexity of graph mining, the project co-occurrence relationship graph needs to be simplified to some extent. The preset probability value may be used as a weight threshold: edges below the threshold are deleted, and the resulting 0-degree nodes, that is, nodes that share no edge with any other node, are then removed.
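The simplification just described can be sketched in a few lines. The weight follows the definition given for step 404 (records using both items divided by all records); the function names `edge_weights` and `simplify`, the threshold value and the sample records are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def edge_weights(records):
    """Weight of each item pair = records containing both / total records."""
    total = len(records)
    pair_counts = Counter()
    for record in records:
        pair_counts.update(combinations(sorted(set(record)), 2))
    return {pair: count / total for pair, count in pair_counts.items()}

def simplify(weights, threshold):
    """Drop edges below the threshold; nodes left without edges vanish too."""
    network = {}
    for (a, b), weight in weights.items():
        if weight >= threshold:
            network.setdefault(a, set()).add(b)
            network.setdefault(b, set()).add(a)
    return network

# Four made-up medical records.
records = [["A", "B"], ["A", "B", "C"], ["A", "B"], ["C", "D"]]
weights = edge_weights(records)
network = simplify(weights, threshold=0.5)
```

Because the adjacency dictionary is rebuilt only from the surviving edges, 0-degree nodes such as C and D in this example are removed without a separate pass.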
In this embodiment, a complete graph is a graph in which every two nodes are joined by an edge. A complete graph within the project co-occurrence relationship graph means that every two items satisfy the co-occurrence relationship, that is, they all occur together frequently. A maximal complete subgraph is a complete subgraph that stops being complete if any further node is added to it.
Complete subgraphs are extracted from the project co-occurrence relation graph, specifically by using NetworkX to mine all maximal complete subgraphs, so as to find all items that frequently occur together.
In this embodiment, the extraction of the complete subgraph specifically includes:
Adjusting the project co-occurrence relationship graph based on the network relationship structure to obtain a complete subgraph comprises the following steps:
Traversing all graph nodes, screening out zero-degree nodes, and deleting the zero-degree nodes from the initial co-occurrence relation network, wherein the zero-degree nodes are graph nodes which have no edges with any graph node;
Randomly selecting N graph nodes, and calculating the total edge number of a local relation graph consisting of the N graph nodes;
Judging whether the total edge number is equal to an edge number threshold, wherein the edge number threshold is equal to N(N-1)/2 and N is greater than or equal to 2;
If they are equal, the local relation graph is determined to be a complete subgraph; the resulting complete subgraph is shown in fig. 6.
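The edge-count test and the search for maximal complete subgraphs can be sketched as below. The source names NetworkX for this step (its `find_cliques` function enumerates maximal cliques); the standard-library version here uses the basic Bron-Kerbosch recursion instead, and the function names, adjacency layout and sample graph are illustrative assumptions rather than the patent's implementation.

```python
from itertools import combinations

def is_complete(graph, nodes):
    """True if the N chosen nodes share exactly N(N-1)/2 edges, with N >= 2."""
    nodes = list(nodes)
    n = len(nodes)
    edge_total = sum(1 for u, v in combinations(nodes, 2) if v in graph[u])
    return n >= 2 and edge_total == n * (n - 1) // 2

def maximal_complete_subgraphs(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            found.append(sorted(current))    # nothing can extend this clique
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & graph[node],
                   excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return sorted(found)

# Made-up simplified network: a triangle A-B-C plus the extra edge C-D.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
cliques = maximal_complete_subgraphs(graph)
```

Checking random node selections with `is_complete` matches the steps as written, while the enumeration avoids relying on chance to hit every complete subgraph.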
407. Generating a treatment item combination data set corresponding to the disease type based on the network relation among the graph nodes in the complete subgraph;
408. Extracting medicine information corresponding to each disease type and association relation among medicines;
409. constructing a medicine co-occurrence relation diagram corresponding to the disease types according to the medicine information and the corresponding association relation;
410. And simplifying the medicine co-occurrence relation diagram according to a preset medicine combination mining model, and generating a medicine combination data set based on the result after simplifying.
In this embodiment, drug entities are identified from the acquired medical record data;
Preprocessing the extracted drug entities and constructing a co-occurrence matrix of the disease types and the drug entities;
Calculating a confidence value IMPT for the relation between each pair of nodes in the co-occurrence matrix, using either a naive Bayes model or a Noisy-OR model;
Ranking all confidence values in descending order, and constructing the medicine co-occurrence relation graph with all drug entities as nodes and, as edges, either the top n relations or the relations whose confidence value exceeds a certain threshold;
Calling the medicine combination mining model on the medicine co-occurrence relation graph, simplifying the graph, and generating a medicine combination data set based on the simplified result.
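The ranking step above (keep the top n relations, or those above a threshold) can be sketched as follows. The confidence values are made-up numbers standing in for IMPT, whose calculation the text does not spell out; the function name `select_edges` and the drug identifiers are hypothetical.

```python
def select_edges(confidence, top_n=None, threshold=None):
    """Rank relations by confidence (descending) and keep the strongest ones.

    confidence: {(drug_x, drug_y): confidence value}.
    Keeps relations above `threshold` and/or the first `top_n` after ranking.
    """
    ranked = sorted(confidence.items(), key=lambda entry: entry[1], reverse=True)
    if threshold is not None:
        ranked = [(pair, value) for pair, value in ranked if value > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [pair for pair, _ in ranked]

# Placeholder confidence values for four drug pairs.
confidence = {
    ("d1", "d2"): 0.91,
    ("d1", "d3"): 0.40,
    ("d2", "d3"): 0.75,
    ("d3", "d4"): 0.10,
}
top_two = select_edges(confidence, top_n=2)
above_threshold = select_edges(confidence, threshold=0.3)

# All drug entities stay as nodes; only the selected relations become edges.
nodes = sorted({drug for pair in confidence for drug in pair})
```

Keeping every drug entity as a node while filtering only the edges mirrors the construction described above; isolated nodes can then be removed during simplification.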
In practical application, the constructed medicine co-occurrence relation graph can be understood as a relation graph of medicine indications. Specifically, the indications of the medicines applied to the same disease are extracted, and medicine combinations and a relation graph are constructed from those indications; one item is then selected at random from all indications in the relation graph as the main node, the remaining indications are traversed, and an item combination is formed with each of them;
Calculating the probability of the simultaneous occurrence of the item combination in the medical record data;
judging whether the probability meets an initial co-occurrence condition;
if yes, adding an edge between the item combinations to form an initial co-occurrence relation network;
And outputting a medicine co-occurrence relation graph after all the item combinations are added.
By implementing this scheme, the combination relations among different diagnosis and treatment items are mined from actual diagnosis and treatment data using a graph mining model based on co-occurrence relations. Representing the combined use of diagnosis and treatment items as a graph is simple and clear, stays close to clinical practice, and faithfully reflects how items are used together during clinical treatment.
The method for mining the project combination based on the co-occurrence relationship in the embodiment of the present invention is described above, and the device for mining the project combination based on the co-occurrence relationship in the embodiment of the present invention is described below, referring to fig. 7, a first embodiment of the device for mining the project combination based on the co-occurrence relationship in the embodiment of the present invention includes:
a data acquisition module 701, configured to acquire clinical data and extract medical record data in the clinical data, where the medical record data includes at least two diagnostic sheets;
the preprocessing module 702 is configured to preprocess the medical record data, cluster diagnostic sheets with diagnostic results belonging to the same disease, and extract all medical records of the corresponding disease and treatment items corresponding to each medical record from the clustered diagnostic sheets;
a construction module 703, configured to construct an initial co-occurrence relationship network between all the graph nodes by using all the treatment items as graph nodes, so as to obtain an item co-occurrence relationship graph;
The mining module 704 is configured to simplify an initial co-occurrence relationship network in the project co-occurrence relationship graph through a preset project combination mining model, obtain a network relationship structure, and adjust the project co-occurrence relationship graph based on the network relationship structure to obtain a complete subgraph;
and a generating module 705, configured to generate a treatment item combination dataset corresponding to the disease type based on the network relationship between graph nodes in the complete subgraph.
In this embodiment, the project combination mining device based on the co-occurrence relation runs the project combination mining method described above. The medical record data is preprocessed and classified by disease type, and the treatment items in it are extracted; an initial co-occurrence relation network is constructed with the treatment items as graph nodes and simplified to obtain the project co-occurrence relation graph; combination mining is carried out with the project combination mining model to obtain a complete subgraph; and a treatment item combination data set is output based on the complete subgraph. The method shows how diagnosis and treatment items are used together in actual clinical practice, is simple and clear, stays close to clinical practice, and faithfully reflects the combined use of items during clinical treatment. In addition, after diagnosing a specific disease, a doctor who has determined one treatment item can search the treatment item combination data set directly to obtain recommendations for the treatment items that must accompany it, which greatly improves the doctor's diagnostic efficiency and opens further possibilities for subsequent intelligent medical applications.
Referring to fig. 8, a second embodiment of the project combination mining device based on co-occurrence relationships according to the present invention specifically includes:
a data acquisition module 701, configured to acquire clinical data and extract medical record data in the clinical data, where the medical record data includes at least two diagnostic sheets;
the preprocessing module 702 is configured to preprocess the medical record data, cluster diagnostic sheets with diagnostic results belonging to the same disease, and extract all medical records of the corresponding disease and treatment items corresponding to each medical record from the clustered diagnostic sheets;
a construction module 703, configured to construct an initial co-occurrence relationship network between all the graph nodes by using all the treatment items as graph nodes, so as to obtain an item co-occurrence relationship graph;
The mining module 704 is configured to simplify an initial co-occurrence relationship network in the project co-occurrence relationship graph through a preset project combination mining model, obtain a network relationship structure, and adjust the project co-occurrence relationship graph based on the network relationship structure to obtain a complete subgraph;
and a generating module 705, configured to generate a treatment item combination dataset corresponding to the disease type based on the network relationship between graph nodes in the complete subgraph.
Optionally, the building module 703 includes:
A traversing unit 7031 for randomly selecting one treatment item from all treatment items as a master node, traversing the rest of the treatment items, and forming item combinations with each treatment item;
a first calculating unit 7032, configured to calculate a probability that the item combination occurs at the same time in the medical record data;
a first judging unit 7033, configured to judge whether the probability meets an initial co-occurrence condition;
a creating unit 7034, configured to add an edge between the item combinations when the probability meets an initial co-occurrence condition, so as to form an initial co-occurrence relationship network; and outputting the project co-occurrence relation graph after all the project combinations are added.
Optionally, the item combination includes a first treatment item and a second treatment item, and the first computing unit 7032 is specifically configured to:
Counting a first number of times that the first treatment item and the second treatment item occur simultaneously in the same diagnosis sheet, a second number of times that the first treatment item occurs independently in the diagnosis sheet, and a third number of times that the second treatment item occurs independently in the diagnosis sheet in the medical record data;
Calculating a first probability of occurrence of the combination of items relative to the first treatment item based on the first number of times and the second number of times;
and calculating a second occurrence probability of the item combination relative to the second treatment item according to the first times and the third times.
Optionally, the first determining unit 7033 is specifically configured to:
comparing the first occurrence probability and the second occurrence probability with initial co-occurrence conditions respectively;
when the first occurrence probability and the second occurrence probability simultaneously meet the initial co-occurrence condition, determining that the item combination is a binding treatment item of the same disease type; and determining that the item combination is an unbound treatment item of the same disease species when at least one of the first occurrence probability and the second occurrence probability does not satisfy the initial co-occurrence condition.
Optionally, the mining module 704 includes:
a comparing unit 7041, configured to extract probabilities of a first graph node and other graph nodes in the project co-occurrence relationship graph, and compare the probabilities with preset weight values respectively, where the first graph node is the graph node currently selected to be simplified, the other graph nodes are the graph nodes other than the first graph node, and the weight value is the ratio of the number of medical records that use the two medical items simultaneously to the total number of medical records;
A deleting unit 7042, configured to delete a corresponding edge from the first graph node when the probability is lower than the weight value;
And the output unit 7043 is configured to output a network relationship structure after all graph nodes in the project co-occurrence relationship graph are compared.
Optionally, the mining module 704 further includes:
A screening unit 7044, configured to traverse all graph nodes, screen out zero-degree nodes, and delete the zero-degree nodes from the initial co-occurrence relationship network, where the zero-degree nodes are graph nodes that have no edges with any graph node;
a second calculating unit 7045, configured to randomly select N graph nodes, and calculate a total edge number of a local relationship graph formed by the N graph nodes;
A second judging unit 7046 configured to judge whether the total edge number is equal to an edge number threshold, where the edge number threshold is equal to N (N-1)/2, and N is equal to or greater than 2;
A determining unit 7047, configured to determine that the local relationship graph is a complete subgraph when the total edge number is equal to the edge number threshold.
Wherein, the project combination mining device based on co-occurrence relationship further comprises an optimizing module 705, which is specifically configured to:
extracting medicine information corresponding to each disease type and association relation among medicines;
constructing a medicine co-occurrence relation diagram corresponding to the disease types according to the medicine information and the corresponding association relation;
and simplifying the medicine co-occurrence relation graph according to a preset medicine combination mining model, and generating a medicine combination data set based on the result after simplifying.
The project combination mining device based on the co-occurrence relation in the embodiment of the invention is described in detail from the angle of a modularized functional entity in the above figures 7 and 8, the project combination mining device based on the co-occurrence relation in the embodiment of the invention is described in detail from the angle of hardware processing in the following, and the project combination mining device based on the co-occurrence relation can be arranged in the form of a plug-in unit to mine the co-occurrence relation between treatment projects with the project combination mining device based on the co-occurrence relation, so as to extract the use combination of different treatment projects in the disease.
Fig. 9 is a schematic structural diagram of a co-occurrence relationship-based project combination mining apparatus according to an embodiment of the present invention. The co-occurrence relationship-based project combination mining apparatus 600 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPUs) 610, a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 633 or data 632. The memory 620 and the storage medium 630 may be transitory or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown in Fig. 9), each of which may include a series of instruction operations on the co-occurrence relationship-based project combination mining apparatus 600. Further, the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the co-occurrence relationship-based project combination mining apparatus 600, the series of instruction operations in the storage medium 630, so as to implement the steps of the co-occurrence relationship-based project combination mining method described above.
The co-occurrence relationship-based project combination mining apparatus 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the apparatus structure illustrated in Fig. 9 does not limit the co-occurrence relationship-based project combination mining apparatus provided by the present application, which may include more or fewer components than illustrated in Fig. 9, may combine certain components, or may use a different arrangement of components.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
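The cryptographic linking of data blocks described above can be illustrated with a toy Python sketch. It is not the storage scheme of the embodiment: the block fields, the all-zero genesis hash, and the function names `block_hash`, `append_block`, and `is_valid_chain` are assumptions, and consensus and point-to-point transmission are omitted.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(index, prev_hash, transactions):
    """SHA-256 digest over the block contents and the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Append a block that stores a batch of transactions and links to its predecessor."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    }
    chain.append(block)
    return block


def is_valid_chain(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["transactions"]):
            return False
        prev_hash = block["hash"]
    return True
```

Changing the contents of any earlier block changes its recomputed hash, so the validity check fails for that block; this is the anti-counterfeiting property the paragraph refers to.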
The present invention also provides a computer readable storage medium, which may be a non-volatile or a volatile computer readable storage medium. The computer readable storage medium stores instructions which, when executed on a computer, cause the computer to perform the steps of the co-occurrence relationship-based project combination mining method provided in the foregoing embodiments.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in whole or in part, in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. The project combination mining method based on the co-occurrence relationship is characterized by comprising the following steps of:
Acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis sheets;
preprocessing the medical record data, clustering diagnosis sheets with diagnosis results belonging to the same disease type, and extracting all cases of the corresponding disease type and treatment items corresponding to each case from the clustered diagnosis sheets;
Constructing an initial co-occurrence relation network among all the graph nodes by taking all the treatment items as the graph nodes to obtain an item co-occurrence relation graph;
Simplifying an initial co-occurrence relation network in the project co-occurrence relation graph through a preset project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;
Generating a treatment project combination data set corresponding to the disease type based on the network relation among the graph nodes in the complete subgraph;
The simplifying of the initial co-occurrence relation network in the project co-occurrence relation graph through the preset project combination mining model to obtain the network relation structure comprises: extracting the probabilities between a first graph node and the other graph nodes in the project co-occurrence relation graph, and comparing each probability with a preset weight value, wherein the first graph node is the graph node currently selected for simplification, the other graph nodes are the graph nodes other than the first graph node, and the weight value is the ratio of the number of medical records that simultaneously use two items of medical project data to the total number of medical records; if the probability is lower than the weight value, deleting the corresponding edge from the first graph node; and outputting the network relation structure after all graph nodes in the project co-occurrence relation graph have been compared;
The adjusting of the project co-occurrence relation graph based on the network relation structure to obtain the complete subgraph comprises: traversing all graph nodes, screening out zero-degree nodes, and deleting the zero-degree nodes from the initial co-occurrence relation network, wherein the zero-degree nodes are graph nodes that have no edge with any graph node; randomly selecting N graph nodes, and calculating the total edge number of the local relation graph formed by the N graph nodes; judging whether the total edge number is equal to an edge number threshold, wherein the edge number threshold is equal to N(N-1)/2 and N is greater than or equal to 2; and if so, determining that the local relation graph is a complete subgraph.
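The edge-deletion step recited in claim 1 can be illustrated with a minimal Python sketch. The representation of the graph as a dictionary of edge probabilities keyed by node pairs, the use of a single scalar `weight_value`, and the function name `simplify_network` are assumptions made for illustration, not the preset project combination mining model itself.

```python
def simplify_network(edge_probability, weight_value):
    """Delete every edge whose probability is lower than the weight value.

    `edge_probability` maps a node pair (u, v) to the co-occurrence
    probability of that pair; each pair is assumed to appear once.
    Returns an adjacency dictionary: node -> set of remaining neighbours.
    """
    # Start from the initial co-occurrence relation network.
    adjacency = {}
    for u, v in edge_probability:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Visit each node in turn as the "first graph node" and compare the
    # probability of each of its edges with the weight value.
    for first in adjacency:
        for other in list(adjacency[first]):
            probability = edge_probability.get(
                (first, other), edge_probability.get((other, first))
            )
            if probability < weight_value:
                adjacency[first].discard(other)
                adjacency[other].discard(first)
    return adjacency
```

Nodes that lose all of their edges remain in the returned structure with an empty neighbour set, so they can be screened out afterwards as zero-degree nodes.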
2. The method for mining item combinations based on co-occurrence relationships according to claim 1, wherein the constructing an initial co-occurrence relationship network between the graph nodes by using all treatment items as graph nodes, and obtaining an item co-occurrence relationship graph comprises:
Randomly selecting one treatment item from all treatment items as a main node, traversing the rest treatment items, and forming item combinations with each treatment item;
Calculating the probability of the simultaneous occurrence of the item combination in the medical record data;
judging whether the probability meets an initial co-occurrence condition;
if yes, adding an edge between the item combinations to form an initial co-occurrence relation network;
and outputting the project co-occurrence relation graph after all the project combinations are added.
3. The co-occurrence relationship-based item combination mining method of claim 2, wherein the item combination includes a first treatment item and a second treatment item, and wherein calculating a probability that the item combination appears simultaneously in the medical record data includes:
Counting a first number of times that the first treatment item and the second treatment item occur simultaneously in the same diagnosis sheet, a second number of times that the first treatment item occurs independently in the diagnosis sheet, and a third number of times that the second treatment item occurs independently in the diagnosis sheet in the medical record data;
Calculating a first probability of occurrence of the combination of items relative to the first treatment item based on the first number of times and the second number of times;
and calculating a second occurrence probability of the item combination relative to the second treatment item according to the first times and the third times.
4. The co-occurrence relationship-based project combination mining method according to claim 3, wherein the judging whether the probability meets the initial co-occurrence condition comprises:
comparing the first occurrence probability and the second occurrence probability with the initial co-occurrence condition respectively;
If the first occurrence probability and the second occurrence probability both meet the initial co-occurrence condition, determining that the item combination is a binding treatment item of the same disease type;
And if at least one of the first occurrence probability and the second occurrence probability does not meet the initial co-occurrence condition, determining that the item combination is an unbound treatment item of the same disease type.
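The graph construction of claims 2 to 4 can be illustrated with a minimal Python sketch under stated assumptions. The claims do not fix the exact formulas, so the sketch assumes that the second and third numbers count the diagnosis sheets containing the first and second treatment item respectively, that each occurrence probability is the joint count divided by the corresponding single-item count, and that "meeting the initial co-occurrence condition" means being greater than or equal to a threshold. The function name `build_cooccurrence_graph` and the parameter `initial_condition` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(diagnosis_sheets, initial_condition):
    """Build the item co-occurrence graph for one disease type.

    `diagnosis_sheets` is a list of lists of treatment items, one list per
    diagnosis sheet. Returns an adjacency dictionary: item -> set of items
    bound to it.
    """
    item_count = Counter()  # sheets containing each item
    pair_count = Counter()  # sheets containing both items of a pair
    for sheet in diagnosis_sheets:
        items = sorted(set(sheet))
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    adjacency = {item: set() for item in item_count}
    for (first, second), joint in pair_count.items():
        first_probability = joint / item_count[first]    # relative to the first item
        second_probability = joint / item_count[second]  # relative to the second item
        # An edge is added only when both probabilities meet the condition.
        if first_probability >= initial_condition and second_probability >= initial_condition:
            adjacency[first].add(second)
            adjacency[second].add(first)
    return adjacency
```

Requiring both probabilities to meet the condition keeps a rarely used item from being bound to a common item merely because the rare item never appears without it.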
5. The co-occurrence relationship-based item combination mining method according to any one of claims 1 to 4, further comprising, after the generating of the treatment item combination dataset corresponding to the disease species based on the network relationship between graph nodes in the complete subgraph:
extracting medicine information corresponding to each disease type and association relation among medicines;
constructing a medicine co-occurrence relation graph corresponding to the disease type according to the medicine information and the corresponding association relation;
and simplifying the medicine co-occurrence relation graph according to a preset medicine combination mining model, and generating a medicine combination data set based on the result after simplifying.
6. A co-occurrence relationship-based project combination mining apparatus, characterized by comprising:
the data acquisition module is used for acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis sheets;
the preprocessing module is used for preprocessing the medical record data, clustering diagnosis sheets with diagnosis results belonging to the same disease type, and extracting all cases of the corresponding disease type and treatment items corresponding to each case from the clustered diagnosis sheets;
The construction module is used for constructing an initial co-occurrence relation network among all the graph nodes by taking all the treatment items as the graph nodes to obtain an item co-occurrence relation graph;
The mining module is used for simplifying an initial co-occurrence relation network in the project co-occurrence relation graph through a preset project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;
The generation module is used for generating a treatment project combination data set corresponding to the disease type based on the network relation among the graph nodes in the complete subgraph;
The mining module includes: a comparison unit, configured to extract the probabilities between a first graph node and the other graph nodes in the project co-occurrence relation graph, and compare each probability with a preset weight value, wherein the first graph node is the graph node currently selected for simplification, the other graph nodes are the graph nodes other than the first graph node, and the weight value is the ratio of the number of medical records that simultaneously use two items of medical project data to the total number of medical records; a deleting unit, configured to delete the corresponding edge from the first graph node when the probability is lower than the weight value; and an output unit, configured to output the network relation structure after all graph nodes in the project co-occurrence relation graph have been compared;
the mining module further includes: a screening unit, configured to traverse all graph nodes, screen out zero-degree nodes, and delete the zero-degree nodes from the initial co-occurrence relation network, wherein the zero-degree nodes are graph nodes that have no edge with any graph node; a second calculation unit, configured to randomly select N graph nodes and calculate the total edge number of the local relation graph formed by the N graph nodes; a second judging unit, configured to judge whether the total edge number is equal to an edge number threshold, wherein the edge number threshold is equal to N(N-1)/2 and N is greater than or equal to 2; and a determining unit, configured to determine that the local relation graph is a complete subgraph when the total edge number is equal to the edge number threshold.
7. A co-occurrence relationship-based item combination mining apparatus, characterized by comprising: a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor being interconnected by a line;
The at least one processor invokes the instructions in the memory to cause the co-occurrence relationship-based item combination mining apparatus to perform the co-occurrence relationship-based project combination mining method of any one of claims 1-5.
8. A computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the co-occurrence relationship-based project combination mining method of any one of claims 1-5.
CN202010893345.XA 2020-08-31 2020-08-31 Project combination mining method based on co-occurrence relation and related equipment Active CN111986815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010893345.XA CN111986815B (en) 2020-08-31 2020-08-31 Project combination mining method based on co-occurrence relation and related equipment

Publications (2)

Publication Number Publication Date
CN111986815A CN111986815A (en) 2020-11-24
CN111986815B CN111986815B (en) 2024-06-18

Family

ID=73440385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010893345.XA Active CN111986815B (en) 2020-08-31 2020-08-31 Project combination mining method based on co-occurrence relation and related equipment

Country Status (1)

Country Link
CN (1) CN111986815B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590777A (en) * 2021-06-30 2021-11-02 北京百度网讯科技有限公司 Text information processing method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919671A (en) * 2017-02-20 2017-07-04 广东省中医院 A kind of traditional Chinese medical science text medical record is excavated and aid decision intelligence system
CN109670051A (en) * 2018-12-14 2019-04-23 北京百度网讯科技有限公司 Knowledge mapping method for digging, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218397B (en) * 2013-03-12 2016-03-02 浙江大学 A kind of social networks method for secret protection based on non-directed graph amendment

Also Published As

Publication number Publication date
CN111986815A (en) 2020-11-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 2022-06-06
Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province
Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.
Address before: Room 12G, Area H, 666 Beijing East Road, Huangpu District, Shanghai 200001
Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.
GR01 Patent grant