CN116564458A - Data processing method, system, equipment and medium based on electronic medical record - Google Patents

Info

Publication number
CN116564458A
Authority
CN
China
Prior art keywords
data
medical record
electronic medical
training
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310529110.6A
Other languages
Chinese (zh)
Inventor
栗占国
李春
贠泽霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Peoples Hospital
Original Assignee
Peking University Peoples Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Peoples Hospital filed Critical Peking University Peoples Hospital
Priority to CN202310529110.6A priority Critical patent/CN116564458A/en
Publication of CN116564458A publication Critical patent/CN116564458A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Mathematical Physics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)

Abstract

The invention discloses a data processing method, a system, equipment and a medium based on electronic medical records, and relates to the technical field of artificial intelligence; the method comprises the following steps: acquiring information data of a target electronic medical record in a target hospital; converting the information data of the target electronic medical record into target digital information data by adopting a missing value filling method and a data exchange algorithm; inputting the target digital information data into an extreme gradient lifting model to obtain the processing information of each case in the target electronic medical record; sorting the plurality of medical record characteristics according to the corresponding importance values and the set order to obtain a characteristic medical record sequence of the target electronic medical record; the characteristic medical record sequence is used for representing clinical consensus; the extreme gradient lifting model is established by adopting a machine learning method; the invention can realize quick and accurate processing of the data of the electronic medical record.

Description

Data processing method, system, equipment and medium based on electronic medical record
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a data processing method, system, equipment and medium based on electronic medical records.
Background
The electronic medical record contains a large amount of structured data and natural-language-based unstructured data, and is important data for clinical medical practice. The unstructured data mainly covers the history of present illness, past medical history and the like, while the structured data mainly comprises laboratory test results, imaging examination results, demographic characteristics and the like. Reasonably extracting, summarizing and analyzing the features in medical record information often helps to quickly identify the clinical characteristics of a given patient group, so that electronic medical records are used efficiently. In practice, however, electronic medical records come in different formats and contain a wide variety of information, which poses considerable obstacles to clinical extraction and analysis. Manual extraction of information is time-consuming, and professional medical staff are relatively scarce. A simple, convenient and accurate method is therefore urgently needed to assist in processing electronic medical record data, for example identifying, extracting and ranking the important features in electronic medical record data.
Disclosure of Invention
The invention aims to provide a data processing method, a system, equipment and a medium based on an electronic medical record, which can realize rapid and accurate processing of the data of the electronic medical record.
In order to achieve the above object, the present invention provides the following solutions:
a data processing method based on an electronic medical record, the method comprising:
acquiring information data of a target electronic medical record in a target hospital; the information data includes: structured data and unstructured data;
converting the information data of the target electronic medical record into target digital information data by adopting a missing value filling method and a data exchange algorithm;
inputting the target digital information data into an extreme gradient lifting model to obtain the processing information of each case in the target electronic medical record; the processing information includes: the medical record characteristics and the importance values corresponding to the medical record characteristics respectively;
sorting the plurality of medical record characteristics according to the corresponding importance values in a set order to obtain a characteristic medical record sequence of the target electronic medical record; the characteristic medical record sequence is used for representing clinical consensus;
wherein the extreme gradient lifting model is established by a machine learning method.
Optionally, the method for determining the extreme gradient lifting model is as follows:
acquiring training data; the training data includes: training information data and label data of the electronic medical record in each test hospital; the tag data includes: training the processing information of each case in the electronic medical record;
converting the information data of the training electronic medical record into training digital information data by adopting a missing value filling method and a data exchange algorithm;
dividing the training digital information data into a training set and a verification set;
constructing a shared neural network;
inputting the training set and the corresponding label data into the shared neural network, and training the characteristic parameters in the shared neural network by taking the minimum fitting error as a target to obtain a trained shared neural network; the characteristic parameters include: feature extraction parameters and feature importance parameters;
and adjusting the characteristic parameters of the trained shared neural network by adopting the verification set and the corresponding label data to obtain the extreme gradient lifting model.
Optionally, the feature parameters of the trained shared neural network are adjusted by using the verification set and the corresponding tag data, so as to obtain the extreme gradient lifting model, which specifically comprises the following steps:
dividing the verification set to obtain a training subset and a verification subset;
inputting the training subset and the corresponding label data into the trained shared neural network, and training the characteristic parameters of the trained shared neural network by taking the minimum fitting error as a target to obtain the trained shared neural network;
and adjusting the characteristic parameters of the trained shared neural network by adopting the verification subset and the corresponding label data to obtain the extreme gradient lifting model.
Optionally, the method for filling the missing value and the data exchange algorithm are adopted to convert the information data of the target electronic medical record into the target digital information data, which specifically comprises the following steps:
filling the missing data in the information data by adopting a missing value filling method to obtain filled information data;
and carrying out classification variable conversion and normalization processing on the filled information data by adopting a data exchange algorithm to obtain target digital information data.
Optionally, the missing value filling method is specifically a mean value filling method.
Optionally, the missing value filling method is specifically a K nearest neighbor classification algorithm.
A data processing system based on an electronic medical record, the system comprising:
the information data acquisition module is used for acquiring information data of a target electronic medical record in a target hospital; the information data includes: structured data and unstructured data;
the data conversion module is used for converting the information data of the target electronic medical record into target digital information data by adopting a missing value filling method and a data exchange algorithm;
the processing information determining module is used for inputting the target digital information data into an extreme gradient lifting model to obtain the processing information of each case in the target electronic medical record; the processing information includes: the medical record characteristics and the importance values corresponding to the medical record characteristics respectively;
the sorting module is used for sorting the plurality of medical record characteristics according to the corresponding importance values and according to a set sequence to obtain a characteristic medical record sequence of the target electronic medical record; the characteristic medical record sequence is used for representing clinical consensus;
wherein the extreme gradient lifting model is established by a machine learning method.
Optionally, the extreme gradient lifting model in the processing information determining module specifically includes:
the data acquisition sub-module is used for acquiring training data; the training data includes: training information data and label data of the electronic medical record in each test hospital; the tag data includes: training the processing information of each case in the electronic medical record;
the data conversion sub-module is used for converting the information data of the training electronic medical record into training digital information data by adopting a missing value filling method and a data exchange algorithm;
the dividing sub-module is used for dividing the training digital information data into a training set and a verification set;
the shared neural network construction submodule is used for constructing a shared neural network;
the training sub-module is used for inputting the training set and the corresponding label data into the shared neural network, and training the characteristic parameters in the shared neural network by taking the minimum fitting error as a target to obtain a trained shared neural network; the characteristic parameters include: feature extraction parameters and feature importance parameters;
and the verification sub-module is used for adjusting the characteristic parameters of the trained shared neural network by adopting the verification set and the corresponding label data to obtain the extreme gradient lifting model.
An electronic device comprising a memory for storing a computer program and a processor for executing the computer program to cause the electronic device to perform the electronic medical record based data processing method described above.
A computer readable storage medium storing a computer program which when executed by a processor implements the electronic medical record based data processing method described above.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention provides a data processing method, a system, equipment and a medium based on electronic medical records. The acquired information data of a target electronic medical record in a target hospital are converted into target digital information data by adopting a missing value filling method and a data exchange algorithm; the target digital information data are input into an extreme gradient lifting model established by a machine learning method to obtain the processing information of each case in the target electronic medical record; and the plurality of medical record characteristics are sorted by their corresponding importance values in a set order to obtain a characteristic medical record sequence. Because the missing value filling method and the data exchange algorithm convert the information data into target digital information data, the problem of differing formats among current electronic medical records is overcome. Because the extreme gradient lifting model established by a machine learning method processes the target digital information data, the invention achieves rapid and accurate processing of electronic medical record data, in contrast to the prior art, in which the information is voluminous and has to be extracted manually. The obtained characteristic medical record sequence intuitively represents clinical consensus, so that the electronic medical record is used efficiently.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a data processing method based on an electronic medical record according to an embodiment of the present invention;
FIG. 2 is a general frame diagram of a data processing method based on an electronic medical record according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a specific implementation of a data processing method based on an electronic medical record according to an embodiment of the present invention;
FIG. 4 is a block diagram of a data processing system based on electronic medical records according to an embodiment of the present invention.
Symbol description:
1: information data acquisition module; 2: data conversion module; 3: processing information determining module; 4: sorting module.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a data processing method, a system, equipment and a medium based on an electronic medical record, which can realize rapid and accurate processing of the data of the electronic medical record.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1
As shown in fig. 1, an embodiment of the present invention provides a data processing method based on an electronic medical record, where the method includes:
Step 100: acquiring information data of a target electronic medical record in a target hospital; the information data includes: structured data and unstructured data.
Step 200: converting the information data of the target electronic medical record into target digital information data by adopting a missing value filling method and a data exchange algorithm.
Step 300: inputting the target digital information data into an extreme gradient lifting model to obtain the processing information of each case in the target electronic medical record; the processing information includes: a plurality of medical record features and the importance value corresponding to each medical record feature.
Step 400: sorting the plurality of medical record features by their corresponding importance values in a set order to obtain a characteristic medical record sequence of the target electronic medical record; the characteristic medical record sequence is used to characterize clinical consensus.
The extreme gradient lifting model is established by adopting a machine learning method.
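Step 400 above can be sketched in a few lines of Python. This is only an illustration: the feature names and importance values are hypothetical placeholders, and descending order is assumed as the "set order".

```python
def rank_features(importances, descending=True):
    """Return (feature, importance) pairs sorted by importance value."""
    return sorted(importances.items(), key=lambda kv: kv[1], reverse=descending)

# Hypothetical importance values, for illustration only.
example_importances = {
    "rheumatoid factor": 0.21,
    "joint swelling": 0.34,
    "age": 0.05,
}

# The ordered feature names form the "characteristic medical record sequence".
feature_sequence = [name for name, _ in rank_features(example_importances)]
print(feature_sequence)
```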
Specifically, the method for determining the extreme gradient lifting model comprises the following steps:
acquiring training data; the training data includes: training information data and label data of the electronic medical record in each test hospital; tag data comprising: and training the processing information of each case in the electronic medical record.
And converting the information data of the training electronic medical record into training digital information data by adopting a missing value filling method and a data exchange algorithm.
Specifically, the method for filling the missing value and the data exchange algorithm are adopted to convert the information data into the target digital information data, and specifically comprises the following steps:
filling the missing data in the information data by adopting a missing value filling method to obtain filled information data.
And carrying out classification variable conversion and normalization processing on the filled information data by adopting a data exchange algorithm to obtain target digital information data.
The missing value filling method is specifically a mean value filling method. The missing value filling method can also be a K nearest neighbor classification algorithm. Alternatively, the mean value filling method and the K nearest neighbor classification algorithm are used together as the missing value filling method.
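As a rough sketch of the K nearest neighbor option, the following Python fills a missing categorical value by majority vote among the K most similar complete records. The function name `knn_fill`, the two-feature records and the Euclidean distance are assumptions made for illustration; the source does not specify a distance measure.

```python
import math
from collections import Counter

def knn_fill(known_features, labelled_rows, k=3):
    """Fill a missing label by majority vote among the k nearest labelled rows.

    known_features: numeric features of the record whose label is missing.
    labelled_rows: list of (features, label) pairs from complete records.
    """
    nearest = sorted(
        labelled_rows,
        key=lambda row: math.dist(known_features, row[0]),  # Euclidean distance (assumed)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, already-normalized records for illustration only.
rows = [
    ((0.10, 0.20), "normal"),
    ((0.15, 0.25), "normal"),
    ((0.90, 0.80), "abnormal"),
    ((0.85, 0.95), "abnormal"),
    ((0.20, 0.10), "normal"),
]
filled = knn_fill((0.12, 0.22), rows)
print(filled)
```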
The training digital information data is then divided into a training set and a validation set.
Constructing a shared neural network.
Inputting the training set and the corresponding label data into the shared neural network, and training the characteristic parameters in the shared neural network by taking the minimum fitting error as a target to obtain a trained shared neural network; the characteristic parameters include: feature extraction parameters and feature importance parameters.
And adjusting the characteristic parameters of the trained shared neural network by adopting the verification set and the corresponding label data to obtain an extreme gradient lifting model.
The method comprises the steps of adopting a verification set and corresponding tag data to adjust characteristic parameters of a trained shared neural network to obtain an extreme gradient lifting model, and specifically comprises the following steps:
the verification set is divided to obtain a training subset and a verification subset.
The training subset and the corresponding label data are input into the trained shared neural network, and the characteristic parameters of the trained shared neural network are trained with the minimum fitting error as the target to obtain the trained shared neural network.
And adjusting the characteristic parameters of the trained shared neural network by adopting the verification subset and the corresponding label data to obtain an extreme gradient lifting model.
As shown in fig. 2, the key steps of the overall framework in this embodiment begin with obtaining the data of a training set, a test set and an external verification set, which are mainly extracted from natural-language-based electronic medical records. The electronic medical records are then uniformly preprocessed, that is, the data are converted by the missing value filling method and the data exchange algorithm. Once standardized electronic medical records are obtained, the extreme gradient lifting model is trained as follows: the training set is input, and parameter optimization is completed on the test set. External verification testing and final optimization of the model are then carried out with external standardized electronic medical records to obtain the machine learning model, namely the extreme gradient lifting model. In this way, after an electronic medical record from a real medical setting is input, medical record features are automatically and accurately identified, extracted and ranked.
As shown in fig. 3, a specific implementation manner of the data processing method based on the electronic medical record provided in this embodiment is as follows:
Information data of electronic medical records with known processing information are acquired from each test hospital; the training data are obtained by labeling these records according to the diagnosis results of the experts.
Specifically, electronic medical record information is exported from the electronic medical record systems of this hospital and of multiple centers, and the data from this hospital are used as the training data set and the test data set. The multi-center data, together with electronic medical record information from this hospital collected at different times and in different batches, are used as the external verification data set. The training set and the test set together contain 8,724 electronic medical records of patients with various rheumatic diseases, mainly rheumatoid arthritis. The ratio of training set data to test set data is 7:3. The electronic medical record information comprises structured and unstructured data. The structured data include age, gender, laboratory test results, imaging examination results and final diagnosis. The unstructured data include the history of present illness, past medical history, personal history, menstrual and marital history, and family history.
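The 7:3 division can be sketched as follows. The shuffling step, the seed value and the function name `split_records` are assumptions for illustration; the source states only the ratio.

```python
import random

def split_records(records, train_fraction=0.7, seed=1000):
    """Shuffle the records and split them into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split (assumed)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Ten hypothetical record identifiers stand in for the real records.
record_ids = list(range(10))
train_ids, test_ids = split_records(record_ids)
print(len(train_ids), len(test_ids))
```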
The electronic medical records are uniformly preprocessed to complete missing value filling and data conversion, yielding standardized electronic medical records that a computer can recognize, namely the training digital information data. Variables with a missing rate exceeding 70% are deleted from the medical history, laboratory test results, imaging examination results and final diagnosis. Part of the missing data is filled by the mean value filling method and the K nearest neighbor classification algorithm (K=3); the filled objects are mainly the original numerical values of laboratory test results, so as to obtain better training data. The mean value filling method fills the missing value of an attribute with the mean of the values that attribute takes in all other objects of the information table.
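A minimal sketch of the two preprocessing rules just described, assuming a column-oriented table in which missing entries are represented by `None`. The column names and values are hypothetical.

```python
def drop_sparse_columns(table, max_missing_rate=0.7):
    """Remove columns whose fraction of missing (None) values exceeds the threshold."""
    kept = {}
    for name, values in table.items():
        missing_rate = sum(v is None for v in values) / len(values)
        if missing_rate <= max_missing_rate:
            kept[name] = values
    return kept

def mean_fill(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical laboratory columns, for illustration only.
table = {
    "crp": [4.0, None, 8.0, 6.0],             # 25% missing: kept and filled
    "rare_marker": [None, None, None, 1.0],   # 75% missing: dropped
}
cleaned = {name: mean_fill(col) for name, col in drop_sparse_columns(table).items()}
print(cleaned)
```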
The K nearest neighbor classification algorithm retrieves the class labels of the data most similar in features to the object concerned (its nearest neighbors) and selects, as the judged class, the label that occurs most often among the first K such data (K=3 in this method). The K nearest neighbor algorithm is a classification algorithm (it judges the label of an object) and is mainly used to fill in laboratory test results and the like after they have been classified according to the clinical normal value interval. Because the machine learning objective function takes values between 0 and 1, and feature screening and model accuracy evaluation are carried out subsequently, all required data are converted into values in the interval from 0 to 1. For example, gender is converted into a classification variable (male=1, female=0), and age (a continuous variable) is normalized with the following formula:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
wherein $x'$ is the normalized age; $x$ is the age; $x_{\min}$ is the minimum age among all cases; $x_{\max}$ is the maximum age among all cases. Laboratory test results are categorized according to the clinical normal interval (normal=0, abnormal=1), while medical history and imaging results retain their natural-language format. After missing value filling and data conversion, the electronic medical record is in a digital format that a computer can process and can be input into the objective function of a machine learning algorithm.
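The data conversion described above can be illustrated with a short Python sketch. The helper names and the normal-range bounds are placeholders chosen for the example, not clinical reference values.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate case: all values identical
    return [(v - lo) / (hi - lo) for v in values]

def encode_gender(gender):
    """Map gender to the classification variable used in the text."""
    return {"male": 1, "female": 0}[gender]

def encode_lab_result(value, normal_low, normal_high):
    """Return 0 if the value lies inside the normal interval, otherwise 1."""
    return 0 if normal_low <= value <= normal_high else 1

ages = [20, 40, 60]
normalized_ages = min_max_normalize(ages)
print(normalized_ages)
print(encode_gender("female"), encode_lab_result(12.0, 0.0, 10.0))
```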
The training set is input into the constructed shared neural network. The shared neural network uses an embedded feature selection method to preliminarily identify and extract medical record features, and an importance ranking is obtained according to the contribution of each feature to accuracy, precision and recall.
The extreme gradient lifting model is based on the extreme gradient boosting (XGBoost) algorithm, a comprehensively optimized combination of the decision tree algorithm and the gradient boosting algorithm. It largely avoids over-fitting, and it completes machine learning tasks and outputs results rapidly and accurately. The principle of the XGBoost algorithm may be expressed as:
$$\hat{y}_i = \phi(X_i) = \sum_{k=1}^{K} f_k(X_i), \quad f_k \in F$$
wherein, in the general sense, $X_i$ is the i-th record of the total data, $\hat{y}_i$ is the predicted value, $f_k$ represents the k-th tree model, and $F$ is the set of all decision trees. $\phi$ is the function corresponding to the extreme gradient lifting algorithm.
In the present invention, k is the total number of digital information data input; $X_i$ is the i-th piece of data in the digital information data; $\hat{y}_i$ is the obtained processing information; $f_k$ is a decision tree; $F$ is the set of all decision trees.
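The additive form of the prediction, in which the output is the sum of the outputs of the individual trees, can be sketched as follows. The two one-split trees and their feature names are hypothetical and stand in for trees that a trained model would contain.

```python
def predict(x, trees):
    """Additive prediction: the sum of every tree's output for sample x."""
    return sum(tree(x) for tree in trees)

# Two hypothetical depth-1 trees (stumps) over a feature dictionary.
def tree_1(x):
    return 0.6 if x["rf_abnormal"] == 1 else -0.4

def tree_2(x):
    return 0.3 if x["age_norm"] > 0.5 else -0.1

sample = {"rf_abnormal": 1, "age_norm": 0.25}
score = predict(sample, [tree_1, tree_2])
print(score)
```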
A Taylor expansion of the objective function is carried out, the first three terms are taken and the higher-order infinitesimal terms are removed, so that the objective function is finally converted into:
$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$
wherein $\tilde{L}^{(t)}$ is the objective function after conversion; $i$ is the serial number of the digital information data; $n$ is the total number of the digital information data; $g_i f_t(x_i)$ is the first term in the Taylor expansion; $\frac{1}{2} h_i f_t^2(x_i)$ is the second term in the Taylor expansion; $\Omega(f_t)$ is the third term in the Taylor expansion.
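To make the converted objective concrete, the sketch below evaluates it for a candidate tree. Two things are assumed here that the source does not state: a binary logistic loss is used to obtain the gradient terms, and the regularization term takes the usual XGBoost form of gamma times the number of leaves plus half of lambda times the sum of squared leaf weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_grad_hess(y_true, prev_score):
    """First and second derivatives of the logistic loss w.r.t. the previous score."""
    p = sigmoid(prev_score)
    return p - y_true, p * (1.0 - p)

def approx_objective(y, prev_scores, new_tree_outputs, leaf_weights, gamma=0.1, lam=2.0):
    """Second-order approximation: sum(g*f + 0.5*h*f^2) + Omega(f)."""
    total = 0.0
    for yi, si, fi in zip(y, prev_scores, new_tree_outputs):
        g, h = logistic_grad_hess(yi, si)
        total += g * fi + 0.5 * h * fi * fi
    # Assumed regularizer: gamma * (number of leaves) + 0.5 * lambda * sum(w^2).
    omega = gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)
    return total + omega

# Two hypothetical samples, previous scores of zero, and a two-leaf candidate tree.
value = approx_objective(
    y=[1, 0],
    prev_scores=[0.0, 0.0],
    new_tree_outputs=[0.2, -0.2],
    leaf_weights=[0.2, -0.2],
)
print(value)
```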
Training continues until the objective function reaches its minimum; taking the minimum fitting error as the target improves the accuracy of the model.
Parameters are tuned by K-fold cross-validation: all data in the training set are divided into K (K = 10) disjoint subsets; each time, one subset is held out as the validation set and the remaining K-1 subsets are used as the training set to train the model, after which the accuracy of the model's classification results is evaluated. The model with the best output accuracy is selected as the model that passes cross-validation.
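The fold construction can be sketched with the standard library alone. This is an illustrative split, assuming a shuffled round-robin assignment of records to folds; the function name is hypothetical, and the seed simply reuses the value 1000 that appears in the model parameters.

```python
import random

def k_fold_indices(n_samples, k=10, seed=1000):
    """Yield (train_idx, val_idx) pairs over k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment gives k disjoint folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx

splits = list(k_fold_indices(25, k=10))
```

Each record appears in exactly one validation fold, so every record is used for validation once and for training K-1 times.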
The finally extracted medical record features and their corresponding importance values are expressed using the F1-score, in order to evaluate how much each medical record feature contributes to the machine learning model's correct identification, summarization, and processing of medical records (the main reference indicators being sensitivity and specificity) and to measure the importance of the feature values. The model is then optimized on the validation set. The final model parameters based on the XGBoost algorithm are as follows:
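params = {
    'booster': 'gbtree',          # tree-based booster
    'objective': 'multi:softmax', # predict the class label directly
    'num_class': 2,
    'gamma': 0.1,                 # minimum loss reduction required to split
    'max_depth': 12,
    'lambda': 2,                  # L2 regularization weight
    'subsample': 0.8,             # fraction of rows sampled per tree
    'colsample_bytree': 0.7,      # fraction of columns sampled per tree
    'min_child_weight': 3,
    'silent': 1,
    'eta': 0.1,                   # learning rate
    'seed': 1000,
    'nthread': 4,
}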
The model was developed with Python 3.9 and scikit-learn 1.1.3.
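The evaluation indicators named above (F1-score, sensitivity, and specificity) can all be derived from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the labels and predictions are made up for illustration and are not results from the source.

```python
def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, precision, F1 and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on the positive class
    specificity = ratio(tn, tn + fp)  # recall on the negative class
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, len(pairs))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }

# Hypothetical labels and predictions for eight cases.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy helps when the two classes are unevenly represented in the medical records.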
The model can then be externally validated, tested, and optimized using external standard electronic medical records. The final external validation included 240 electronic medical records from external multi-center cohorts and from cohorts at this hospital collected at different times and in different batches. The final features (ordered by importance) are: anti-cyclic citrullinated peptide antibodies, joint deformity (boutonniere deformity, swan-neck deformity, etc.), joint surface erosion or destruction (X-ray), disappearance of the joint space, partial disappearance, narrowing, or blurring of the joint space (X-ray), synovitis (B-mode ultrasound), antiperinuclear factor, joint swelling, synovial hyperplasia (B-mode ultrasound), rheumatoid factor, C-reactive protein, joint blood flow signals (B-mode ultrasound), subcutaneous nodules, morning stiffness, erythrocyte sedimentation rate, cortical bone discontinuity (irregular bone surface) (X-ray), hidden rheumatoid factor (HRF), periarticular soft tissue swelling (X-ray), cystic changes of the joint (X-ray), joint pain, family history of rheumatoid arthritis, bilateral interstitial lung lesions (X-ray or CT), joint effusion (B-mode ultrasound), reduced bone density (osteoporosis) (X-ray), age, pleural effusion (X-ray), and sex.
Example 2
As shown in FIG. 4, an embodiment of the present invention provides a data processing system based on an electronic medical record, the system including: an information data acquisition module 1, a data conversion module 2, a processing information determination module 3 and a sequencing module 4.
The information data acquisition module 1 is used for acquiring information data of a target electronic medical record in a target hospital; the information data includes: structured data and unstructured data.
The data conversion module 2 is used for converting the information data of the target electronic medical record into target digital information data by adopting a missing value filling method and a data exchange algorithm.
The processing information determining module 3 is used for inputting the target digital information data into the extreme gradient lifting model to obtain the processing information of each case in the target electronic medical record; the processing information comprises: the medical record feature and the importance value corresponding to the medical record feature respectively.
The ordering module 4 is used for ordering the plurality of medical record characteristics according to the corresponding importance values and the set order to obtain a characteristic medical record sequence of the target electronic medical record; the sequence of characteristic medical records is used to characterize clinical consensus.
The extreme gradient lifting model is established by adopting a machine learning method.
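The behavior of the ordering module can be sketched in a few lines. The example assumes the "set order" is descending importance; the feature names are taken from the feature list in this document, while the importance values and the function name are invented for illustration.

```python
def rank_features(importances, descending=True):
    """Sort (feature, importance) pairs by importance value."""
    return sorted(importances.items(), key=lambda kv: kv[1], reverse=descending)

# Hypothetical importance values; only the ordering logic is of interest.
importances = {
    "rheumatoid factor": 0.12,
    "anti-cyclic citrullinated peptide antibodies": 0.31,
    "age": 0.02,
    "joint swelling": 0.15,
}
feature_sequence = [name for name, _ in rank_features(importances)]
```

Passing `descending=False` gives the reverse sequence, which corresponds to choosing a different set order.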
Specifically, the extreme gradient lifting model in the processing information determining module 3 specifically includes: the system comprises a data acquisition sub-module, a data conversion sub-module, a dividing sub-module, a shared neural network construction sub-module, a training sub-module and a verification sub-module.
The data acquisition sub-module is used for acquiring training data; the training data includes: information data of the training electronic medical records in each test hospital and label data; the label data includes: the processing information of each case in the training electronic medical records.
The data conversion sub-module is used for converting the information data of the training electronic medical records into training digital information data by adopting a missing value filling method and a data exchange algorithm.
The dividing sub-module is used for dividing the training digital information data into a training set and a verification set.
The shared neural network construction sub-module is used for constructing the shared neural network.
The training sub-module is used for inputting the training set and the corresponding label data into the shared neural network, and training the characteristic parameters in the shared neural network by taking the minimum fitting error as a target to obtain a trained shared neural network; the characteristic parameters include: feature extraction parameters and feature importance parameters.
The verification sub-module is used for adjusting the characteristic parameters of the trained shared neural network by adopting the verification set and the corresponding label data to obtain the extreme gradient lifting model.
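Since the claims name a K nearest neighbor algorithm as one option for missing value filling in the data conversion step, a small sketch of that idea follows. It assumes numeric, already normalized columns, Euclidean distance over the known columns, and neighbors drawn only from complete rows; the function name and the sample rows are hypothetical.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("at least one complete row is required")
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        known = [i for i, v in enumerate(row) if v is not None]

        def dist(other):
            # Euclidean distance measured on the columns this row actually has.
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in known))

        neighbors = sorted(complete, key=dist)[:k]
        filled.append([
            v if v is not None else sum(n[i] for n in neighbors) / len(neighbors)
            for i, v in enumerate(row)
        ])
    return filled

rows = [
    [0.1, 1.0],
    [0.2, 2.0],
    [0.9, 9.0],
    [0.15, None],  # second value is missing
]
result = knn_impute(rows, k=2)
```

Compared with mean filling, this uses only similar records to estimate the missing value, at the cost of a distance computation per incomplete row.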
Example 3
The embodiment of the invention provides an electronic device, which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor runs the computer program to enable the electronic device to execute the data processing method based on the electronic medical record in the embodiment 1.
As an alternative embodiment, the electronic device may be a server.
In one embodiment, the present invention further provides a computer readable storage medium storing a computer program, which when executed by a processor, implements the electronic medical record-based data processing method of embodiment 1.
Machine learning is a branch of artificial intelligence, and artificial intelligence algorithms can efficiently extract feature values, and the relationships between them, from complex data sets. Electronic medical records contain a large amount of structured data and unstructured natural-language data, which makes them particularly suitable for training a machine learning model. A machine learning model based on the extreme gradient lifting algorithm is therefore constructed; after the optimal model is selected through cross-validation, its parameters are optimized according to the results on the test set. Finally, through external multi-center validation, a medical record feature extraction and ranking system based on natural language recognition is obtained, which can identify, extract, and rank the important feature values in electronic medical record data.
Given an electronic medical record, the invention can automatically, simply, and accurately identify, extract, and rank the important features in the electronic medical record data. Many studies have shown that machine learning already assists clinical practice and medical research in many ways, for example: large-scale screening of molecular markers; predicting drug responses; establishing genetics-based disease subtypes to guide patient classification precisely; assisting radiologists in interpreting images; analyzing and interpreting semantics based on natural language; constructing knowledge graphs of traditional Chinese medicine to explore the links between pieces of knowledge and make retrieval easier for doctors; and analyzing patient conditions in real time in combination with wearable devices. In the era of big data, machine learning has promising prospects in clinical disciplines. It can use more complex algorithms to find complex relationships that clinical statisticians or medical practitioners cannot find with traditional models, reduce errors caused by excessive noise in the data, and then proceed layer by layer to obtain results from high-dimensional data.
The extreme gradient lifting model provided by the invention has an accuracy of 95%, and its accuracy in external validation is 93%. The model can perform natural language recognition on medical records, interpret medical record features in depth on the basis of semantic analysis, extract medical record information quickly and accurately, effectively avoid bias caused by confounding factors, and assist doctors in identifying and summarizing patient characteristics for subsequent medical practice. It is expected to address the untimely and inaccurate extraction and identification of medical record information caused by a shortage of specialized medical staff, heavy clinical workloads, and non-uniform electronic medical record formats. In addition, because autoimmune and inflammatory diseases such as rheumatoid arthritis carry a high rate of disability, identifying medical record features quickly at an early stage and classifying patients helps prevent disability, improve patients' quality of life, reduce their medical costs, and thereby save related medical expenditure.
In this specification, the embodiments are described progressively; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Specific examples are used herein to explain the principles and embodiments of the invention; the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Those of ordinary skill in the art may, in light of the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A data processing method based on an electronic medical record, the method comprising:
acquiring information data of a target electronic medical record in a target hospital; the information data includes: structured data and unstructured data;
converting the information data of the target electronic medical record into target digital information data by adopting a missing value filling method and a data exchange algorithm;
inputting the target digital information data into an extreme gradient lifting model to obtain the processing information of each case in the target electronic medical record; the processing information includes: the medical record characteristics and the importance values corresponding to the medical record characteristics respectively;
sorting the plurality of medical record characteristics according to the corresponding importance values in a set order to obtain a characteristic medical record sequence of the target electronic medical record; the characteristic medical record sequence is used for representing clinical consensus;
wherein the extreme gradient lifting model is established by a machine learning method.
2. The electronic medical record-based data processing method according to claim 1, wherein the method for determining the extreme gradient lifting model is as follows:
acquiring training data; the training data includes: information data of the training electronic medical records in each test hospital and label data; the label data includes: the processing information of each case in the training electronic medical records;
converting the information data of the training electronic medical record into training digital information data by adopting a missing value filling method and a data exchange algorithm;
dividing the training digital information data into a training set and a verification set;
constructing a shared neural network;
inputting the training set and the corresponding label data into the shared neural network, and training the characteristic parameters in the shared neural network by taking the minimum fitting error as a target to obtain a trained shared neural network; the characteristic parameters include: feature extraction parameters and feature importance parameters;
and adjusting the characteristic parameters of the trained shared neural network by adopting the verification set and the corresponding label data to obtain the extreme gradient lifting model.
3. The electronic medical record-based data processing method according to claim 2, wherein the feature parameters of the trained shared neural network are adjusted by using the verification set and the corresponding tag data to obtain the extreme gradient lifting model, and specifically comprising:
dividing the verification set to obtain a training subset and a verification subset;
inputting the training subset and the corresponding label data into the trained shared neural network, and training the characteristic parameters of the trained shared neural network by taking the minimum fitting error as a target to obtain the trained shared neural network;
and adjusting the characteristic parameters of the trained shared neural network by adopting the verification subset and the corresponding label data to obtain the extreme gradient lifting model.
4. The electronic medical record-based data processing method according to claim 1, wherein converting the information data of the target electronic medical record into the target digital information data by adopting a missing value filling method and a data exchange algorithm specifically comprises:
filling the missing data in the information data by adopting a missing value filling method to obtain filled information data;
and carrying out classification variable conversion and normalization processing on the filled information data by adopting a data exchange algorithm to obtain target digital information data.
5. The method for processing data based on electronic medical records according to claim 4, wherein the missing value filling method is specifically a mean filling method.
6. The method for processing data based on electronic medical records according to claim 4, wherein the missing value filling method is specifically a K nearest neighbor classification algorithm.
7. A data processing system based on an electronic medical record, the system comprising:
the information data acquisition module is used for acquiring information data of a target electronic medical record in a target hospital; the information data includes: structured data and unstructured data;
the data conversion module is used for converting the information data of the target electronic medical record into target digital information data by adopting a missing value filling method and a data exchange algorithm;
the processing information determining module is used for inputting the target digital information data into an extreme gradient lifting model to obtain the processing information of each case in the target electronic medical record; the processing information includes: the medical record characteristics and the importance values corresponding to the medical record characteristics respectively;
the sorting module is used for sorting the plurality of medical record characteristics according to the corresponding importance values and according to a set sequence to obtain a characteristic medical record sequence of the target electronic medical record; the characteristic medical record sequence is used for representing clinical consensus;
wherein the extreme gradient lifting model is established by a machine learning method.
8. The electronic medical record based data processing system of claim 7, wherein the extreme gradient lifting model in the processing information determination module specifically comprises:
the data acquisition sub-module is used for acquiring training data; the training data includes: information data of the training electronic medical records in each test hospital and label data; the label data includes: the processing information of each case in the training electronic medical records;
the data conversion sub-module is used for converting the information data of the training electronic medical record into training digital information data by adopting a missing value filling method and a data exchange algorithm;
the dividing sub-module is used for dividing the training digital information data into a training set and a verification set;
the shared neural network construction submodule is used for constructing a shared neural network;
the training sub-module is used for inputting the training set and the corresponding label data into the shared neural network, and training the characteristic parameters in the shared neural network by taking the minimum fitting error as a target to obtain a trained shared neural network; the characteristic parameters include: feature extraction parameters and feature importance parameters;
and the verification sub-module is used for adjusting the characteristic parameters of the trained shared neural network by adopting the verification set and the corresponding label data to obtain the extreme gradient lifting model.
9. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the electronic medical record-based data processing method of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the electronic medical record-based data processing method according to any one of claims 1 to 6.
CN202310529110.6A 2023-05-11 2023-05-11 Data processing method, system, equipment and medium based on electronic medical record Pending CN116564458A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310529110.6A CN116564458A (en) 2023-05-11 2023-05-11 Data processing method, system, equipment and medium based on electronic medical record

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310529110.6A CN116564458A (en) 2023-05-11 2023-05-11 Data processing method, system, equipment and medium based on electronic medical record

Publications (1)

Publication Number Publication Date
CN116564458A true CN116564458A (en) 2023-08-08

Family

ID=87491180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310529110.6A Pending CN116564458A (en) 2023-05-11 2023-05-11 Data processing method, system, equipment and medium based on electronic medical record

Country Status (1)

Country Link
CN (1) CN116564458A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421548A (en) * 2023-12-18 2024-01-19 四川互慧软件有限公司 Method and system for treating loss of physiological index data based on convolutional neural network
CN117421548B (en) * 2023-12-18 2024-03-12 四川互慧软件有限公司 Method and system for treating loss of physiological index data based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination