CN113360643A - Electronic medical record data quality evaluation method based on short text classification - Google Patents

Info

Publication number
CN113360643A
Authority
CN
China
Prior art keywords
data
electronic medical
medical record
quality
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110587641.1A
Other languages
Chinese (zh)
Inventor
叶方全
陈逸龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Tianpeng Computer Technology Co ltd
Chongqing Nanpeng Artificial Intelligence Technology Research Institute Co ltd
Original Assignee
Guangzhou Tianpeng Computer Technology Co ltd
Chongqing Nanpeng Artificial Intelligence Technology Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Tianpeng Computer Technology Co ltd and Chongqing Nanpeng Artificial Intelligence Technology Research Institute Co ltd
Priority to CN202110587641.1A
Publication of CN113360643A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Databases & Information Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses an electronic medical record data quality evaluation method based on short text classification, comprising the following steps: S1: data processing; S2: basis identification; S3: quality evaluation. The method splits the original text of an electronic medical record into shorter sentences, constructs a BiLSTM-Attention model to classify the short sentences, and finally evaluates the record according to whether the classification results are consistent with the diagnosis. The method does not require the original text of the electronic medical record to be processed manually, which saves labor and time and reduces the reliance on professional medical personnel. Meanwhile, the deep learning model can make full use of massive electronic medical record data to classify the split sentences effectively and thereby produce a reasonable evaluation.

Description

Electronic medical record data quality evaluation method based on short text classification
Technical Field
The invention belongs to the technical field of electronic medical record data quality evaluation, and particularly relates to an electronic medical record data quality evaluation method based on short text classification.
Background
With the advent of the big data era, computer network technology has been widely applied in the medical field, and medical institutions collect massive amounts of electronic medical record data through information management systems, replacing traditional handwritten paper records. An electronic medical record documents the whole process of a doctor's diagnosis and treatment of a patient, contains information such as symptoms, signs, diagnoses and prescriptions, and has great potential value in fields such as auxiliary diagnosis, risk prediction and medication recommendation. However, because of the limited data management level of medical institutions and the insufficient diagnostic and treatment capability of some doctors, electronic medical record data contains a large amount of non-standard descriptive text, which makes the recorded information inaccurate and incomplete and directly affects the efficiency and quality of medical research and product development. It is therefore necessary to evaluate the data quality of electronic medical records and to screen high-quality records on that basis, reducing the interference of noise and redundant information; this is of great significance for tasks such as medical data analysis, prediction model research and auxiliary system development.
Existing methods for evaluating electronic medical record data quality fall into two main types: manual evaluation, and the combination of information extraction with basis identification. In manual evaluation, professional medical personnel check each electronic medical record directly and rely on their clinical experience to determine whether it suffers from problems such as inaccurate description, incomplete diagnosis or insufficient basis, and thereby produce a reliable evaluation. The advantage of this method is that the evaluation results are stable and valid; the disadvantage is that the labor and time costs are very high. The method combining information extraction with basis identification first uses a question-answering system to extract the patient's key information from the electronic medical record, then builds a basis identification model with machine learning algorithms such as logistic regression, decision trees and random forests, and evaluates the record according to whether the key information provides a sufficient diagnostic basis. Its advantage is that a large number of electronic medical records can be processed efficiently, saving labor and time; its disadvantage is that the performance of the basis identification model depends heavily on the quality of the information extraction result. Information extraction requires medical experts to design rules and formulate standards first, after which structured data is matched from the original text and finally standardized, so the result carries high uncertainty. Because both methods have obvious shortcomings, data quality evaluation of electronic medical records remains a challenge.
In summary, electronic medical records suffer from data quality problems, and an accurate and efficient data quality evaluation method is needed to address them.
Disclosure of Invention
The invention aims to provide an electronic medical record data quality evaluation method based on short text classification, so as to solve the problems in the background technology.
To achieve this purpose, the invention provides the following technical solution: a method for evaluating the quality of electronic medical record data based on short text classification, comprising the following steps:
S1: Data processing:
S1.1: classifying the electronic medical record data into history of present illness, physical examination, imaging examination and laboratory examination, and constructing corresponding data sets according to the diagnostic basis of different diseases;
S1.2: splitting each type of data into short text sequences, using commas and periods as separators, so that the original data forms short sentence samples, and constructing short text data sets from the short sentence samples and their diagnosis results;
S1.3: removing short sentence samples that contain a diagnosis description, so that the corresponding diagnosis result is not directly revealed;
S2: Basis identification:
S2.1: dividing the data set into a training set and a validation set at a ratio of 4:1, wherein the training set is used to train the model, and the model is optimized with a cross-entropy loss function based on the error between the predicted label and the true label, the cross-entropy loss being calculated as follows:
$$\mathrm{Loss} = -\sum_{i=1}^{l} y_i \log \hat{y}_i$$
where $l$ is the number of labels, $y$ is the true label, and $\hat{y}$ is the predicted label;
the validation set is used to verify the performance of the model, whose effectiveness is demonstrated by calculating the precision, the recall and the F1 score; the precision is calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
the recall is calculated as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
the F1 score is calculated as follows:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP is the number of positive samples predicted as positive, FN is the number of positive samples predicted as negative, and FP is the number of negative samples predicted as positive;
S2.2: taking each type of data as model input and training a separate basis identification model for each type;
S2.3: processing the input in sequence as follows: first, the words of the original text x are fed into the embedding layer (Embedding) to obtain the word vector representation e:
$$e_i = \mathrm{Embed}(x_i)$$
then e is input into a bidirectional long short-term memory network (BiLSTM) to obtain the hidden state h:
$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$$
$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$$
$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$$
$$a_t = \tanh(W_a h_{t-1} + U_a x_t + b_a)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot a_t$$
$$h_t = o_t \odot \tanh(c_t)$$
where t is the time step, i is the input gate, f is the forget gate, o is the output gate, a is the candidate cell state, c is the cell state, h is the hidden state, W, U and b are model parameters, and σ and tanh are activation functions; finally, the hidden states are input into the attention layer (Attention) to obtain the predicted label $\hat{y}$:
$$s_i = v \tanh(h_i)$$
$$w_i = \mathrm{softmax}(s_i)$$
$$\hat{y} = \sum_i w_i h_i$$
where w is the attention weight and v is a model parameter;
S2.4: the model outputs the probability that a short text is identified as the diagnostic basis for each disease;
S3: Quality evaluation, wherein the evaluation results comprise pure data, high-quality data, low-quality data and noise data.
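The data processing of steps S1.2 and S1.3 and the 4:1 split of step S2.1 can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the cue-word list `DIAGNOSIS_CUES` and the fixed random seed are hypothetical and not taken from the patent, and a period between digits is deliberately not treated as a separator so that values such as 38.5 survive.

```python
import random
import re

# Separators from step S1.2: commas and periods, in ASCII and full-width form.
# Assumption: a period followed by a digit is a decimal point, not a separator.
SEPARATORS = re.compile(r"[,，。]|\.(?!\d)")

# Hypothetical cue words that mark a diagnosis description (step S1.3).
# A real system would need a curated, language-specific list.
DIAGNOSIS_CUES = ("consider", "diagnosed", "diagnosis")


def split_into_phrases(text):
    """Split one record section into non-empty short sentence samples."""
    return [part.strip() for part in SEPARATORS.split(text) if part.strip()]


def drop_diagnosis_phrases(phrases):
    """Remove phrases that state the diagnosis instead of giving evidence."""
    return [p for p in phrases
            if not any(cue in p.lower() for cue in DIAGNOSIS_CUES)]


def build_samples(text, diagnosis):
    """Pair every remaining phrase with the diagnosis label of its record."""
    phrases = drop_diagnosis_phrases(split_into_phrases(text))
    return [(phrase, diagnosis) for phrase in phrases]


def train_validation_split(samples, seed=0):
    """Shuffle the samples and split them 4:1 into training and validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]
```

Each of the four record types named in step S1.1 would be passed through `build_samples` separately, giving one short text data set per type.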
Preferably, the pure data in step S3 means that the predicted labels of all short sentences in the electronic medical record are consistent with the true label, indicating that the sample has a sufficient diagnostic basis.
Preferably, the high-quality data in step S3 means that the most frequent predicted label in the electronic medical record is the true label, indicating that the sample contains a large amount of diagnostic basis and a small amount of noise information.
Preferably, the low-quality data in step S3 means that the most frequent predicted label in the electronic medical record is not the true label, indicating that the sample contains a small amount of diagnostic basis and a large amount of noise information.
Preferably, the noise data in step S3 means that the predicted labels of all short sentences in the electronic medical record are inconsistent with the true label, indicating that the sample consists entirely of noise information.
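The four quality grades of step S3 can be expressed as a small decision function over the predicted labels of one record. The sketch below is illustrative only: the function name and the returned strings are hypothetical, and the handling of ties (a tie for the most frequent label is graded as low quality) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def evaluate_record(predicted_labels, true_label):
    """Grade one record from the predicted labels of its short sentences."""
    if not predicted_labels:
        raise ValueError("record has no short sentences to evaluate")

    matches = sum(1 for label in predicted_labels if label == true_label)
    if matches == len(predicted_labels):
        return "pure"    # every phrase supports the recorded diagnosis
    if matches == 0:
        return "noise"   # no phrase supports the recorded diagnosis

    counts = Counter(predicted_labels)
    top_count = max(counts.values())
    most_frequent = [label for label, n in counts.items() if n == top_count]
    # Assumption: the true label must be the unique most frequent label.
    if most_frequent == [true_label]:
        return "high quality"
    return "low quality"
```

For instance, a record whose three phrases are predicted as bronchiectasis, bronchiectasis and interstitial lung disease, with a recorded diagnosis of bronchiectasis, is graded as high quality because the true label is the most frequent prediction but not the only one.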
Compared with the prior art, the method provided by the invention has the following advantages:
1) The labor and time costs are low. The manual evaluation method requires every electronic medical record to be checked, and the method combining information extraction with basis identification requires information extraction rules to be formulated; both consume a large amount of labor and time and place high demands on the medical personnel involved. Short text classification only needs to split the original data into short sentences and identify them with the short text classification model; the whole process is carried out by computer, which saves labor and time overall.
2) Phrase-level noise has little impact. For a model that performs basis identification directly on the whole electronic medical record, noise at the short sentence level affects the overall judgment and causes strong interference. The short text classification model used by the invention identifies each short sentence sample independently, so even if some short sentences contain noise information, the data quality evaluation of the whole electronic medical record sample is hardly affected. The method therefore has stronger anti-interference capability.
3) The evaluation result is stable and reliable. A single predicted label for a whole electronic medical record sample is highly subject to chance and is not necessarily convincing. By contrast, a data quality evaluation based on the predicted labels of multiple short sentence samples is stable and reliable, and it also indicates how much noise the electronic medical record sample contains, which makes it better suited to practical application scenarios.
Drawings
FIG. 1 is a schematic view of the overall process of the present invention;
FIG. 2 is a schematic diagram of the data processing of the present invention;
FIG. 3 is a schematic diagram of the structure of the basis identification model of the present invention;
FIG. 4 is a schematic diagram of the bidirectional long short-term memory network (BiLSTM) of the present invention;
FIG. 5 is a schematic diagram of the attention layer (Attention) of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. The following detailed description of the embodiments presented in the figures is therefore not intended to limit the scope of the invention as claimed, but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art without inventive effort on the basis of the embodiments of the present invention fall within the scope of protection of the present invention.
Referring to FIGS. 1-5, the present invention provides a technical solution: a method for evaluating the quality of electronic medical record data based on short text classification, comprising the following steps:
S1: Data processing:
S1.1: classifying the electronic medical record data into history of present illness, physical examination, imaging examination and laboratory examination, and constructing corresponding data sets according to the diagnostic basis of different diseases;
S1.2: splitting each type of data into short text sequences, using commas and periods as separators, so that the original data forms short sentence samples, and constructing short text data sets from the short sentence samples and their diagnosis results;
S1.3: removing short sentence samples that contain a diagnosis description, so that the corresponding diagnosis result is not directly revealed; for example, "consider two lungs with multiple bronchiectasis and infection" gives no diagnostic basis but directly states the diagnosis result, so such a diagnosis description cannot serve as a reference for data quality evaluation;
S2: Basis identification:
S2.1: dividing the data set into a training set and a validation set at a ratio of 4:1, wherein the training set is used to train the model, and the model is optimized with a cross-entropy loss function based on the error between the predicted label and the true label, the cross-entropy loss being calculated as follows:
$$\mathrm{Loss} = -\sum_{i=1}^{l} y_i \log \hat{y}_i$$
where $l$ is the number of labels, $y$ is the true label, and $\hat{y}$ is the predicted label;
the validation set is used to verify the performance of the model, whose effectiveness is demonstrated by calculating the precision, the recall and the F1 score; the precision is calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
the recall is calculated as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
the F1 score is calculated as follows:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP is the number of positive samples predicted as positive, FN is the number of positive samples predicted as negative, and FP is the number of negative samples predicted as positive;
S2.2: taking each type of data as model input and training a separate basis identification model for each type;
S2.3: processing the input in sequence as follows: first, the words of the original text x are fed into the embedding layer (Embedding) to obtain the word vector representation e:
$$e_i = \mathrm{Embed}(x_i)$$
then e is input into a bidirectional long short-term memory network (BiLSTM), as shown in FIG. 3, to obtain the hidden state h:
$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$$
$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$$
$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$$
$$a_t = \tanh(W_a h_{t-1} + U_a x_t + b_a)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot a_t$$
$$h_t = o_t \odot \tanh(c_t)$$
where t is the time step, i is the input gate, f is the forget gate, o is the output gate, a is the candidate cell state, c is the cell state, h is the hidden state, W, U and b are model parameters, and σ and tanh are activation functions; finally, the hidden states are input into the attention layer (Attention), as shown in FIG. 4, to obtain the predicted label $\hat{y}$:
$$s_i = v \tanh(h_i)$$
$$w_i = \mathrm{softmax}(s_i)$$
$$\hat{y} = \sum_i w_i h_i$$
where w is the attention weight and v is a model parameter;
S2.4: the model outputs the probability that a short text is identified as the diagnostic basis for each disease, for example "interstitial lung disease: 0.8538, bronchiectasis: 0.0755, … …";
S3: Quality evaluation, wherein the evaluation results comprise pure data, high-quality data, low-quality data and noise data.
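The forward pass of step S2.3 can be illustrated with a small dependency-free sketch. It follows the gate equations and the attention scoring given above, but several details are assumptions rather than part of the patent: the helper names, the random initialization, concatenating forward and backward hidden states, and the final linear-plus-softmax layer in `classify` that turns the attention-pooled vector into per-disease probabilities. A practical implementation would use a deep learning framework and trained parameters.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pre_activation(gate_params, h_prev, x):
    """Compute W h_{t-1} + U x_t + b for one gate (lists as vectors)."""
    W, U, b = gate_params
    return [sum(w * hj for w, hj in zip(w_row, h_prev))
            + sum(u * xj for u, xj in zip(u_row, x)) + bias
            for w_row, u_row, bias in zip(W, U, b)]


def lstm_step(params, h_prev, c_prev, x):
    """One time step: gates i, f, o, candidate a, cell state c, hidden h."""
    i = [sigmoid(z) for z in pre_activation(params["i"], h_prev, x)]
    f = [sigmoid(z) for z in pre_activation(params["f"], h_prev, x)]
    o = [sigmoid(z) for z in pre_activation(params["o"], h_prev, x)]
    a = [math.tanh(z) for z in pre_activation(params["a"], h_prev, x)]
    c = [ft * cp + it * at for ft, cp, it, at in zip(f, c_prev, i, a)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def run_lstm(params, inputs, hidden_size):
    """Run the LSTM over a sequence and return the hidden state per step."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    states = []
    for x in inputs:
        h, c = lstm_step(params, h, c, x)
        states.append(h)
    return states


def bilstm(forward_params, backward_params, inputs, hidden_size):
    """Concatenate forward and backward hidden states for every position."""
    forward = run_lstm(forward_params, inputs, hidden_size)
    backward = run_lstm(backward_params, inputs[::-1], hidden_size)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, v):
    """s_i = v . tanh(h_i), w = softmax(s), pooled = sum_i w_i h_i."""
    scores = [sum(vj * math.tanh(hj) for vj, hj in zip(v, h)) for h in states]
    weights = softmax(scores)
    pooled = [sum(w * h[k] for w, h in zip(weights, states))
              for k in range(len(v))]
    return pooled, weights


def classify(pooled, output_weights):
    """Assumed output layer: linear map followed by softmax over diseases."""
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in output_weights]
    return softmax(logits)


def random_params(hidden_size, input_size, rng):
    """Untrained placeholder parameters (W, U, b) for the four gates."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]
    return {gate: (matrix(hidden_size, hidden_size),
                   matrix(hidden_size, input_size),
                   [0.0] * hidden_size)
            for gate in "ifoa"}
```

With trained parameters, calling `bilstm`, then `attention_pool`, then `classify` on the embedded words of one short sentence would yield a probability per disease of the kind shown in step S2.4; the attention weights also indicate which words contributed most to the prediction.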
In this embodiment, the pure data in step S3 means that the predicted labels of all short sentences in the electronic medical record are consistent with the true label, indicating that the sample has a sufficient diagnostic basis.
In this embodiment, the high-quality data in step S3 means that the most frequent predicted label in the electronic medical record is the true label, indicating that the sample contains a large amount of diagnostic basis and a small amount of noise information.
In this embodiment, the low-quality data in step S3 means that the most frequent predicted label in the electronic medical record is not the true label, indicating that the sample contains a small amount of diagnostic basis and a large amount of noise information.
In this embodiment, the noise data in step S3 means that the predicted labels of all short sentences in the electronic medical record are inconsistent with the true label, indicating that the sample consists entirely of noise information.
High-quality electronic medical record data contains accurate and complete information from which the patient's disease can be effectively inferred. Low-quality electronic medical record data contains a large amount of erroneous and redundant information, and the clinical presentation is often inconsistent with the diagnosis result. To distinguish the two, a basis identification model is constructed and its predicted labels are compared with the true label of the electronic medical record. To evaluate electronic medical record data quality accurately and efficiently, the invention provides a method based on short text classification, which splits the original text of an electronic medical record into shorter sentences, constructs a BiLSTM-Attention model to classify the short sentences, and evaluates the record according to whether the classification results are consistent with the diagnosis. The method does not require the original text of the electronic medical record to be processed manually, which saves labor and time and reduces the reliance on professional medical personnel. Meanwhile, the deep learning model can make full use of massive electronic medical record data to classify the split sentences effectively and thereby produce a reasonable evaluation.
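The comparison of predicted and true labels used in step S2.1 to train and validate the basis identification model can be sketched as follows. The function names, the one-hot encoding of the true label, the `eps` clamp that avoids taking the logarithm of zero, and the convention of returning 0.0 when a denominator is empty are all assumptions made for illustration.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot true label and predicted probabilities."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))


def precision_recall_f1(true_labels, predicted_labels, positive):
    """Precision, recall and F1 for one disease treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

Because the model distinguishes several diseases, the metrics would be computed once per disease, each time treating that disease as the positive class; how the per-disease scores are averaged is not specified in the text.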
The method provided by the invention has the following advantages:
1) The labor and time costs are low. The manual evaluation method requires every electronic medical record to be checked, and the method combining information extraction with basis identification requires information extraction rules to be formulated; both consume a large amount of labor and time and place high demands on the medical personnel involved. Short text classification only needs to split the original data into short sentences and identify them with the short text classification model; the whole process is carried out by computer, which saves labor and time overall.
2) Phrase-level noise has little impact. For a model that performs basis identification directly on the whole electronic medical record, noise at the short sentence level affects the overall judgment and causes strong interference. The short text classification model used by the invention identifies each short sentence sample independently, so even if some short sentences contain noise information, the data quality evaluation of the whole electronic medical record sample is hardly affected. The method therefore has stronger anti-interference capability.
3) The evaluation result is stable and reliable. A single predicted label for a whole electronic medical record sample is highly subject to chance and is not necessarily convincing. By contrast, a data quality evaluation based on the predicted labels of multiple short sentence samples is stable and reliable, and it also indicates how much noise the electronic medical record sample contains, which makes it better suited to practical application scenarios.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference in the claims is not intended to be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. The specification is written this way only for clarity; those skilled in the art should treat it as a whole, and the technical solutions in the embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.

Claims (5)

1. A method for evaluating the quality of electronic medical record data based on short text classification, characterized by comprising the following steps:
S1: Data processing:
S1.1: classifying the electronic medical record data into history of present illness, physical examination, imaging examination and laboratory examination, and constructing corresponding data sets according to the diagnostic basis of different diseases;
S1.2: splitting each type of data into short text sequences, using commas and periods as separators, so that the original data forms short sentence samples, and constructing short text data sets from the short sentence samples and their diagnosis results;
S1.3: removing short sentence samples that contain a diagnosis description, so that the corresponding diagnosis result is not directly revealed;
S2: Basis identification:
S2.1: dividing the data set into a training set and a validation set at a ratio of 4:1, wherein the training set is used to train the model, and the model is optimized with a cross-entropy loss function based on the error between the predicted label and the true label, the cross-entropy loss being calculated as follows:
$$\mathrm{Loss} = -\sum_{i=1}^{l} y_i \log \hat{y}_i$$
where $l$ is the number of labels, $y$ is the true label, and $\hat{y}$ is the predicted label;
the validation set is used to verify the performance of the model, whose effectiveness is demonstrated by calculating the precision, the recall and the F1 score; the precision is calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
the recall is calculated as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
the F1 score is calculated as follows:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP is the number of positive samples predicted as positive, FN is the number of positive samples predicted as negative, and FP is the number of negative samples predicted as positive;
S2.2: taking each type of data as model input and training a separate basis identification model for each type;
S2.3: processing the input in sequence as follows: first, the words of the original text x are fed into the embedding layer (Embedding) to obtain the word vector representation e:
$$e_i = \mathrm{Embed}(x_i)$$
then e is input into a bidirectional long short-term memory network (BiLSTM) to obtain the hidden state h:
$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$$
$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$$
$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$$
$$a_t = \tanh(W_a h_{t-1} + U_a x_t + b_a)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot a_t$$
$$h_t = o_t \odot \tanh(c_t)$$
where t is the time step, i is the input gate, f is the forget gate, o is the output gate, a is the candidate cell state, c is the cell state, h is the hidden state, W, U and b are model parameters, and σ and tanh are activation functions; finally, the hidden states are input into the attention layer (Attention) to obtain the predicted label $\hat{y}$:
$$s_i = v \tanh(h_i)$$
$$w_i = \mathrm{softmax}(s_i)$$
$$\hat{y} = \sum_i w_i h_i$$
where w is the attention weight and v is a model parameter;
S2.4: the model outputs the probability that a short text is identified as the diagnostic basis for each disease;
S3: Quality evaluation, wherein the evaluation results comprise pure data, high-quality data, low-quality data and noise data.
2. The method for evaluating the quality of electronic medical record data based on short text classification according to claim 1, characterized in that: the pure data in step S3 means that the predicted labels of all short sentences in the electronic medical record are consistent with the true label, indicating that the sample has a sufficient diagnostic basis.
3. The method for evaluating the quality of electronic medical record data based on short text classification according to claim 1, characterized in that: the high-quality data in step S3 means that the most frequent predicted label in the electronic medical record is the true label, indicating that the sample contains a large amount of diagnostic basis and a small amount of noise information.
4. The method for evaluating the quality of electronic medical record data based on short text classification according to claim 1, characterized in that: the low-quality data in step S3 means that the most frequent predicted label in the electronic medical record is not the true label, indicating that the sample contains a small amount of diagnostic basis and a large amount of noise information.
5. The method for evaluating the quality of electronic medical record data based on short text classification according to claim 1, characterized in that: the noise data in step S3 means that the predicted labels of all short sentences in the electronic medical record are inconsistent with the true label, indicating that the sample consists entirely of noise information.
CN202110587641.1A 2021-05-27 2021-05-27 Electronic medical record data quality evaluation method based on short text classification Pending CN113360643A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110587641.1A CN113360643A (en) 2021-05-27 2021-05-27 Electronic medical record data quality evaluation method based on short text classification

Publications (1)

Publication Number Publication Date
CN113360643A true CN113360643A (en) 2021-09-07

Family

ID=77528021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110587641.1A Pending CN113360643A (en) 2021-05-27 2021-05-27 Electronic medical record data quality evaluation method based on short text classification

Country Status (1)

Country Link
CN (1) CN113360643A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189767A (en) * 2018-08-01 2019-01-11 北京三快在线科技有限公司 Data processing method, device, electronic equipment and storage medium
CN110444259A (en) * 2019-06-06 2019-11-12 昆明理工大学 Traditional Chinese medical electronic case history entity relationship extracting method based on entity relationship mark strategy
CN110569353A (en) * 2019-07-03 2019-12-13 重庆大学 Attention mechanism-based Bi-LSTM label recommendation method
CN111488739A (en) * 2020-03-17 2020-08-04 天津大学 Implicit discourse relation identification method based on multi-granularity generated image enhancement representation

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114493810A (en) * 2022-04-14 2022-05-13 成都信息工程大学 Internet of things data processing method, device and medium
CN116719945A (en) * 2023-08-08 2023-09-08 北京惠每云科技有限公司 Medical short text classification method and device, electronic equipment and storage medium
CN116719945B (en) * 2023-08-08 2023-10-24 北京惠每云科技有限公司 Medical short text classification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108831559B (en) Chinese electronic medical record text analysis method and system
CN111540468B (en) ICD automatic coding method and system for visualizing diagnostic reasons
CN109697285B (en) Hierarchical BilSt Chinese electronic medical record disease coding and labeling method for enhancing semantic representation
CN109935336B (en) Intelligent auxiliary diagnosis system for respiratory diseases of children
CN110705293A (en) Electronic medical record text named entity recognition method based on pre-training language model
CN109036577B (en) Diabetes complication analysis method and device
CN106874643A (en) Build the method and system that knowledge base realizes assisting in diagnosis and treatment automatically based on term vector
CN117744654A (en) Semantic classification method and system for numerical data in natural language context based on machine learning
CN106407443A (en) Structured medical data generation method and device
CN110532398B (en) Automatic family map construction method based on multi-task joint neural network model
CN109492105B (en) Text emotion classification method based on multi-feature ensemble learning
CN112241457A (en) Event detection method for event of affair knowledge graph fused with extension features
CN110600121B (en) Knowledge graph-based primary etiology diagnosis method
CN109003677B (en) Structured analysis processing method for medical record data
CN108091397A (en) A kind of bleeding episode Forecasting Methodology for the Ischemic Heart Disease analyzed based on promotion-resampling and feature association
CN111785387B (en) Method and system for classifying disease standardization mapping by using Bert
CN113360643A (en) Electronic medical record data quality evaluation method based on short text classification
CN112541066A (en) Text-structured-based medical and technical report detection method and related equipment
CN112489740A (en) Medical record detection method, training method of related model, related equipment and device
CN112530584A (en) Medical diagnosis assisting method and system
CN114242194A (en) Natural language processing device and method for medical image diagnosis report based on artificial intelligence
CN113779179A (en) ICD intelligent coding method based on deep learning and knowledge graph
CN111524570A (en) Ultrasonic follow-up patient screening method based on machine learning
CN114628008A (en) Social user depression tendency detection method based on heterogeneous graph attention network
CN114188022A (en) Clinical children cough intelligent pre-diagnosis system based on textCNN model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210907