CN111178047A - Ancient medical record prescription extraction method based on hierarchical sequence labeling - Google Patents

Info

Publication number
CN111178047A
Authority
CN
China
Prior art keywords
prescription
labeling
layer
sequence
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911347473.8A
Other languages
Chinese (zh)
Other versions
CN111178047B (en)
Inventor
张引
熊海辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201911347473.8A
Publication of CN111178047A
Application granted
Publication of CN111178047B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Data Mining & Analysis (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses an ancient medical record prescription extraction method based on hierarchical sequence labeling. The method adopts a BERT + CRF hierarchical sequence labeling network comprising an input layer, a feature extraction layer, fully connected layers, a medicine and prescription name prediction CRF layer, and a prescription prediction CRF layer. First, the invention does not need to split medical records into sentences; it takes the complete medical record as input, thereby avoiding the error propagation caused by sentence splitting. Second, the prescription text is obtained by sequence labeling, so the most relevant text fragment is obtained directly. Finally, medicine name and prescription name information is taken into account during identification, which enhances the feature representation used for prescription extraction and yields better results, so that prescription text in ancient medical records can be identified using only a small amount of manually labeled data. The invention also designs a BLEU-based evaluation index suitable for model selection, which quantifies the degree of match between the model's extraction results and the labeled results and is used to obtain the optimal model.

Description

Ancient medical record prescription extraction method based on hierarchical sequence labeling
Technical Field
The invention relates to pre-trained language models in deep learning and to conditional random fields, and in particular to an ancient medical record prescription extraction method based on hierarchical sequence labeling.
Background
Traditional Chinese medical records document the complete course of a patient's treatment, including the symptoms, prescriptions, and medicines recorded during the treatment period. However, because of differences in era and in the personal styles of ancient physicians, ancient medical records vary in level of detail and differ greatly in format. This makes it difficult to format and digitize their content. For scholars and enthusiasts of traditional Chinese medicine, studying the treatment experience of past physicians through medical records is an important way to learn their treatment approach. For this reason, structuring the text of ancient medical records is particularly important. The purpose of medical record structuring is to identify medicines, prescriptions, diseases, symptoms, syndromes, and similar content in unstructured medical record text, thereby forming a structured medical record data resource. Structured medical records are easier for computers to store and retrieve, which makes them easier to browse. They also support mining and research on medical record data, which is of great value to enthusiasts and researchers of traditional Chinese medicine.
Prescription extraction is one of the tasks in medical record structuring. Its basic purpose is to identify and extract prescription text fragments from unstructured medical record text. One existing solution identifies prescriptions by classification at sentence granularity: the medical record is first split into sentences, each sentence is labeled as a prescription or not, a binary classification model is trained on these sentences, and the sentences classified as prescriptions are returned. This approach has two problems: (1) sentence splitting is rule-based, so splitting errors propagate to the classifier; (2) sentence granularity is too coarse, because some prescriptions occupy only part of a sentence rather than the whole sentence. Therefore, to better accomplish ancient medical record prescription extraction, the main technical difficulties are as follows:
1. how to design a model that extracts long text segments;
2. how to reduce the labeling cost and burden so that prescription extraction is achieved with only a small amount of labeled data;
3. how to design an evaluation index that quantifies the effect of the model.
Disclosure of Invention
To solve the above problems, the invention provides a hierarchical sequence labeling model for prescription extraction, which treats prescription extraction as a sequence labeling problem. First, the complete medical record content is used as input, avoiding the error propagation caused by sentence splitting. Second, the prescription text is obtained by sequence labeling, so the most relevant text fragment is obtained directly and the identified content is more accurate. Finally, medicine name and prescription name information is taken into account during identification, which enhances the feature representation used for prescription extraction and yields better results, so that prescription text in ancient medical records can be identified using only a small amount of manually labeled data.
In order to achieve the purpose, the invention adopts the following technical scheme:
an ancient medical record prescription extraction method based on hierarchical sequence labeling comprises the following steps:
1) collecting authoritative medical record data resources, and extracting medical record text information through an OCR tool;
2) labeling prescription data, medicine names and prescription names in part of medical record texts by using a data labeling tool to obtain manual labeling data comprising two labeling sequences, wherein one labeling sequence is a prescription labeling sequence, and the other labeling sequence is a medicine name and prescription name labeling sequence; the two labeling sequences both adopt a BIO label system, wherein B represents the initial part of a prescription, a medicine name or a prescription name, I represents the middle part of the prescription, the medicine name or the prescription name, and O represents the part which is not the prescription, the medicine name or the prescription name;
3) splitting unlabeled medical record text into sentences and filtering out sentences with fewer words than a preset threshold to obtain a pre-training corpus; using the pre-training corpus to fine-tune the parameters of a BERT model;
4) establishing a BERT + CRF hierarchical sequence labeling network comprising an input layer, a feature extraction layer, fully connected layers, a medicine and prescription name prediction CRF layer, and a prescription prediction CRF layer, wherein the feature extraction layer adopts the BERT model trained in step 3); loading the fine-tuned BERT parameters and training the BERT + CRF hierarchical sequence labeling network with the manually labeled data obtained in step 2); using a BLEU-based evaluation index to calculate the matching score between the manual labeling result and the prescription contents decoded from the BIO label sequence output by the prescription prediction CRF layer of the network; and selecting the network parameters of the model with the highest matching score to obtain the BERT + CRF hierarchical sequence labeling model;
5) inputting the medical record text to be processed into the BERT + CRF hierarchical sequence labeling model obtained in step 4), outputting the BIO label sequence predicted for prescription content, and decoding all BI sequences from the BIO label sequence, wherein the text corresponding to the BI sequences is the extracted prescription content.
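The decoding in step 5) can be illustrated with a minimal Python sketch. The function names `decode_bi_spans` and `extract_prescriptions` are hypothetical, and the handling of a stray I label with no preceding B (it is skipped) is an assumption; the patent does not specify that case.

```python
from typing import List, Tuple


def decode_bi_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for every B I* run."""
    spans = []
    start = None  # index of the B that opened the current span, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new B closes any span that is still open and opens a new one.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # I extends an open span; a stray I with no open span is skipped
            # (an assumption, not stated in the source).
            continue
        else:
            # O (or any other label) closes the open span.
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def extract_prescriptions(chars: List[str], tags: List[str]) -> List[str]:
    """Join the characters covered by each decoded BI span."""
    return ["".join(chars[s:e]) for s, e in decode_bi_spans(tags)]
```

Each returned string corresponds to one extracted prescription fragment, in the order the fragments occur in the input.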
Further, the invention designs a BLEU-based evaluation index for quantifying the degree of match between the model's extraction results and the labeled results. The BLEU-based evaluation index is computed as follows:
during training, the BIO label sequence predicted for prescription content by the prescription prediction CRF layer of the hierarchical sequence labeling network is decoded to obtain all BI sequences, and the prescription contents are obtained from the text corresponding to the BI sequences and expressed as $pred = [p_1, p_2, \dots, p_n]$; the manual labeling result is expressed as $label = [t_1, t_2, \dots, t_m]$, where $p_i$ and $t_j$ are text strings, $n$ is the number of predicted prescription contents, and $m$ is the number of manually labeled prescription contents;
define $t = \min(m, n)$ and $T = \max(m, n)$; the BLEU method is used to calculate the correlation of each pair $\{p_i, t_j\}$, $i = 1, 2, \dots, n$, $j = 1, 2, \dots, m$; the sums of correlations over all selections of $t$ entries lying in different rows and different columns are enumerated, the maximum is taken as the numerator and $T$ as the denominator, and the final matching score is calculated as:
$$score = \frac{\max(matrix\_sum)}{T}$$
where $matrix\_sum$ denotes the sum of the correlations of $t$ entries lying in different rows and different columns.
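As a rough illustration of this matching score, the sketch below enumerates every way of pairing min(m, n) predictions and labels one-to-one, takes the largest correlation sum, and divides by max(m, n). The helper `char_bleu` is a simplified character-level stand-in for BLEU written for this example (unigram and bigram precision with a brevity penalty); the patent does not state which BLEU variant or n-gram order it uses. The names `matching_score` and `char_bleu`, and the score returned when both lists are empty, are assumptions.

```python
import math
from collections import Counter
from itertools import permutations
from typing import Callable, List


def char_bleu(pred: str, ref: str, max_n: int = 2) -> float:
    """Simplified character-level BLEU (illustrative stand-in only)."""
    if not pred or not ref:
        return 0.0
    orders = [n for n in range(1, max_n + 1) if len(pred) >= n]
    log_sum = 0.0
    for n in orders:
        p_ngrams = Counter(pred[i:i + n] for i in range(len(pred) - n + 1))
        r_ngrams = Counter(ref[i:i + n] for i in range(len(ref) - n + 1))
        total = sum(p_ngrams.values())
        # Modified precision: each n-gram is credited at most as often
        # as it occurs in the reference.
        overlap = sum(min(c, r_ngrams[g]) for g, c in p_ngrams.items())
        if overlap == 0:
            return 0.0
        log_sum += math.log(overlap / total)
    # Brevity penalty for predictions shorter than the reference.
    bp = 1.0 if len(pred) >= len(ref) else math.exp(1 - len(ref) / len(pred))
    return bp * math.exp(log_sum / len(orders))


def matching_score(pred: List[str], label: List[str],
                   sim: Callable[[str, str], float] = char_bleu) -> float:
    """max(matrix_sum) / T with t = min(m, n) and T = max(m, n)."""
    n, m = len(pred), len(label)
    t, T = min(m, n), max(m, n)
    if T == 0:
        return 1.0  # assumption: nothing predicted and nothing labeled
    if t == 0:
        return 0.0
    corr = [[sim(p, r) for r in label] for p in pred]  # n rows, m columns
    if n > m:
        corr = [list(col) for col in zip(*corr)]  # now t rows, T columns
    # Brute force: assign each of the t rows a distinct column.
    best = max(
        sum(corr[row][col] for row, col in enumerate(cols))
        for cols in permutations(range(T), t)
    )
    return best / T
```

The brute-force enumeration grows factorially with the number of prescriptions, which is tolerable for the handful of prescriptions in a single medical record; an assignment solver would be a more practical choice for larger counts.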
Further, the BERT + CRF hierarchical sequence labeling network comprises an input layer, a feature extraction layer, fully connected layers, a medicine and prescription name prediction CRF layer, and a prescription prediction CRF layer, wherein the feature extraction layer adopts the BERT model trained in step 3);
the input layer maps the input word sequence to the corresponding ID sequence; the BERT model takes the ID sequence as input and produces a 768-dimensional feature representation for each ID; each feature representation is fed into two fully connected layers for feature conversion, each of which reduces the features to 3 dimensions, the output of fully connected layer 1 being the medicine and prescription name features and the output of fully connected layer 2 being the prescription features; the medicine and prescription name features output by fully connected layer 1 are input to the medicine and prescription name prediction CRF layer to obtain the predicted BIO label sequence for medicine and prescription names; the prescription features output by fully connected layer 2 and the medicine and prescription name features output by fully connected layer 1 are added to form new features, which are input to the prescription prediction CRF layer to obtain the predicted BIO label sequence for prescription content.
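The feature flow described above can be written compactly. The symbols below ($h_i$, $W_1$, $b_1$, $W_2$, $b_2$, $f_i$, $L$) are notation introduced here for illustration and do not appear in the patent. Two points are assumptions: each fully connected layer is modeled as a plain affine map, and "added" is read as an element-wise sum of the two 3-dimensional features.

```latex
\begin{aligned}
h_i &= \mathrm{BERT}(x_1, \dots, x_L)_i \in \mathbb{R}^{768} \\
f^{\mathrm{name}}_i &= W_1 h_i + b_1 \in \mathbb{R}^{3} \\
f^{\mathrm{rx}}_i &= W_2 h_i + b_2 \in \mathbb{R}^{3} \\
\hat{y}^{\mathrm{name}} &= \mathrm{CRF}_{\mathrm{name}}\bigl(f^{\mathrm{name}}_1, \dots, f^{\mathrm{name}}_L\bigr) \\
\hat{y}^{\mathrm{rx}} &= \mathrm{CRF}_{\mathrm{rx}}\bigl(f^{\mathrm{rx}}_1 + f^{\mathrm{name}}_1, \dots, f^{\mathrm{rx}}_L + f^{\mathrm{name}}_L\bigr)
\end{aligned}
```

Here $x_1, \dots, x_L$ is the input ID sequence, and the three output dimensions correspond to the B, I, and O labels.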
The invention has the following beneficial effects:
(1) by using sequence labeling, the invention does not need to split medical records into sentences when training the BERT + CRF hierarchical sequence labeling network; the complete medical record is used directly as input and the most relevant text segments are identified directly, which avoids error propagation; the extracted prescription content is more accurate and cleaner and contains no irrelevant text;
(2) the invention uses a pre-trained language model from natural language processing to obtain character representations; the model can be pre-trained on large-scale unlabeled text to learn general semantics and grammar, which reduces the labeling cost and burden, so that prescription extraction is achieved with only a small amount of labeled data;
(3) the prescription extraction system of the invention makes full use of medicine name and prescription name information for prescription identification, which provides useful auxiliary information, enhances the feature representation used for prescription extraction, and yields better results;
(4) the invention designs a BLEU-based evaluation index for quantifying the degree of match between the model's extraction results and the labeled results and for selecting the optimal model during training.
Drawings
FIG. 1 is a diagram of a model structure based on hierarchical sequence labeling;
fig. 2 is an evaluation index explanatory view.
Detailed Description
The present invention is described in detail below with reference to specific examples.
In current information extraction tasks, the goal is mainly to extract named entities from text, and named entities are usually short, whereas a prescription text is usually a long sequence of medicines. The invention therefore considered two methods. The first extracts prescriptions by hierarchical sequence labeling: prescription extraction is treated as a sequence labeling problem, and the fragment corresponding to a prescription is marked with the BIO label system. The second is based on boundary prediction, that is, predicting the start and end positions of the prescription fragment in the text. Extensive experiments showed the hierarchical sequence labeling scheme to be superior to the boundary prediction method, so prescription extraction is ultimately performed by hierarchical sequence labeling.
Because data annotation in the field of traditional Chinese medicine requires annotators with basic domain knowledge, prescription annotation data is difficult to obtain. To address this, the invention uses a pre-trained language model from natural language processing to obtain character representations; such a model can be pre-trained on large-scale unlabeled text to learn general semantics and grammar. The model is then fine-tuned on unlabeled data from the field of traditional Chinese medicine to obtain better domain-specific character features, and is finally trained on labeled data. This reduces the labeling cost and burden, so that prescription extraction is achieved with a small amount of labeled data.
As shown in FIG. 1, the ancient medical record prescription extraction method based on hierarchical sequence labeling provided by the invention adopts a BERT + CRF hierarchical sequence labeling network comprising an input layer, a BERT model, fully connected layers, a medicine and prescription name prediction CRF layer, and a prescription prediction CRF layer;
the input layer maps the input word sequence to the corresponding ID sequence; the BERT model takes the ID sequence as input and produces a 768-dimensional feature representation for each ID; each feature representation is fed into two fully connected layers for feature conversion, each of which reduces the features to 3 dimensions, the output of fully connected layer 1 being the medicine and prescription name features and the output of fully connected layer 2 being the prescription features; the medicine and prescription name features output by fully connected layer 1 are input to the medicine and prescription name prediction CRF layer to obtain the predicted BIO label sequence for medicine and prescription names; the prescription features output by fully connected layer 2 and the medicine and prescription name features output by fully connected layer 1 are added to form new features, which are input to the prescription prediction CRF layer to obtain the predicted BIO label sequence for prescription content.
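To show how a CRF layer turns the 3-dimensional per-token features into a BIO label sequence, here is a minimal Viterbi decoding sketch in plain Python. It is an illustration only: the patent does not describe its CRF implementation, and the transition scores in the usage note are hypothetical values chosen to discourage an I label directly after an O label.

```python
from typing import List, Sequence

TAGS = ["B", "I", "O"]


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[str]:
    """Return the highest-scoring tag path.

    emissions[i][k]   : score of tag k at position i (3 values per token)
    transitions[a][b] : score of moving from tag a to tag b
    """
    if not emissions:
        return []
    k = len(TAGS)
    score = list(emissions[0])  # best path score ending in each tag
    back = []                   # back-pointers for positions 1..L-1
    for emit in emissions[1:]:
        prev_best = []
        new_score = []
        for b in range(k):
            # Best previous tag for arriving at tag b.
            a_best = max(range(k), key=lambda a: score[a] + transitions[a][b])
            prev_best.append(a_best)
            new_score.append(score[a_best] + transitions[a_best][b] + emit[b])
        back.append(prev_best)
        score = new_score
    # Follow the back-pointers from the best final tag.
    last = max(range(k), key=lambda a: score[a])
    path = [last]
    for prev_best in reversed(back):
        last = prev_best[last]
        path.append(last)
    path.reverse()
    return [TAGS[i] for i in path]
```

With emissions `[[0.1, 0.0, 1.0], [0.0, 1.0, 0.9], [0.0, 1.0, 0.0]]`, picking the best tag per token independently gives O, I, I, which is not a valid BIO sequence. With a hypothetical transition matrix that scores O to I at -10 and everything else at 0, `viterbi` returns `["B", "I", "I"]` instead, because the transition scores are considered jointly with the emissions.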
In a specific embodiment of the invention, the BERT + CRF hierarchical sequence labeling network is used to extract prescriptions from ancient medical records, with the following steps:
Step one, classic books of traditional Chinese medical records, such as 'the second-generation famous medical record', are scanned and converted into text by OCR (optical character recognition); relevant marks are inserted during scanning to mark content such as the department, disease, and body text associated with each medical record.
Step two, prescription text fragments in part of the medical record texts are marked with the BIO label system using a medical record prescription labeling tool to obtain manually labeled data comprising two labeling sequences, one being the prescription labeling sequence and the other being the medicine name and prescription name labeling sequence; both labeling sequences adopt the BIO label system, and during labeling each word is assigned one of the B, I, and O labels, where B marks the beginning of a prescription, medicine name, or prescription name, I marks the inside of a prescription, medicine name, or prescription name, and O marks text that is not part of a prescription, medicine name, or prescription name;
for example, for the text "Yupingfeng san plus Guizhi Shaoyao is used to benefit the defense and strengthen the exterior to achieve the effect of sweating self-stopping taking", the labeling results of the prescription labeling sequence and of the medicine and prescription name labeling sequence are as follows:
[Figure: BIO labels of the prescription labeling sequence and of the medicine and prescription name labeling sequence for the example text]
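Since the labeling figure is not reproduced here, the following Python sketch shows how the two BIO sequences could be built from annotated spans. The token list is a romanized stand-in for the example text and the span positions are hypothetical; they are not the patent's actual labels. The function name `spans_to_bio` is likewise introduced only for this illustration.

```python
from typing import List, Tuple


def spans_to_bio(length: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Build a BIO tag list from (start, end) spans, end exclusive.

    Spans are assumed not to overlap.
    """
    tags = ["O"] * length
    for start, end in spans:
        if not 0 <= start < end <= length:
            raise ValueError(f"span {(start, end)} out of range")
        tags[start] = "B"              # first token of the span
        for i in range(start + 1, end):
            tags[i] = "I"              # remaining tokens of the span
    return tags


# Hypothetical tokens standing in for the characters of the example text.
tokens = ["yong", "yu", "ping", "feng", "san", "jia",
          "gui", "zhi", "shao", "yao", "yi", "wei"]

# Layer 1: one span covering the whole (hypothetical) prescription fragment.
prescription_tags = spans_to_bio(len(tokens), [(1, 10)])

# Layer 2: one prescription name and two medicine names inside that fragment.
name_tags = spans_to_bio(len(tokens), [(1, 5), (6, 8), (8, 10)])
```

The two lists have the same length as `tokens`, so each token carries one label in the prescription sequence and one label in the medicine and prescription name sequence, which is the two-layer structure the network is trained on.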
Step three, preparing the pre-training corpus. Unlabeled medical record text is split into sentences, and sentences with fewer than 5 words are filtered out to obtain the pre-training corpus; the pre-training corpus is used to fine-tune the pre-trained BERT model. The corpus format is as follows: each line is one sentence, each medical record is treated as one document, and documents are separated by empty lines in the corpus.
In one embodiment of the present invention, the corpus format is as follows:
[Figure: sample of the pre-training corpus, one sentence per line, with an empty line between medical records]
In the sample, lines 1-6 are the content of one medical record, with each line being one sentence, and lines 8-10 are another medical record; the medical records are separated by an empty line.
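The corpus format described above (one sentence per line, an empty line between medical records, short sentences dropped) can be produced with a short script. This is a sketch under stated assumptions: sentence boundaries are taken to be the punctuation marks 。！？!? and sentence length is counted in characters including punctuation, neither of which is specified in the patent. The function names are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: a sentence is a run of text ending at one of these marks.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the closing punctuation mark."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def build_corpus(cases: Iterable[str], min_chars: int = 5) -> str:
    """Return corpus text: one sentence per line, blank line between cases."""
    blocks = []
    for case in cases:
        # Drop sentences shorter than the threshold (5 in the embodiment).
        kept = [s for s in split_sentences(case) if len(s) >= min_chars]
        if kept:
            blocks.append("\n".join(kept))
    return "\n\n".join(blocks) + "\n"
```

The resulting text can be written to a file and used as input for continued pre-training of BERT; the blank lines mark document boundaries.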
Step four, the fine-tuned BERT parameters are loaded, and the BERT + CRF hierarchical sequence labeling network is trained with the manually labeled data. During training, a BLEU (bilingual evaluation understudy) based evaluation index is applied: the BIO label sequence predicted for prescription content by the prescription prediction CRF layer of the hierarchical sequence labeling network is decoded to obtain all BI sequences, and the prescription contents are obtained from the text corresponding to the BI sequences and expressed as $pred = [p_1, p_2, \dots, p_n]$; the manual labeling result is expressed as $label = [t_1, t_2, \dots, t_m]$, where $p_i$ and $t_j$ are text strings, $n$ is the number of predicted prescription contents, and $m$ is the number of manually labeled prescription contents;
FIG. 2 is an explanatory diagram of the evaluation index, in which A is the case where the number of predicted prescription contents is smaller than the number of manually labeled prescription contents, B is the case where the two numbers are equal, and C is the case where the number of predicted prescription contents is larger. Because the number of predicted prescription contents may differ from the number of manually labeled ones, the evaluation index must penalize predictions whose count is smaller or larger than the labeled count. Therefore define $t = \min(m, n)$ and $T = \max(m, n)$; the BLEU method is used to calculate the correlation of each pair $\{p_i, t_j\}$, $i = 1, 2, \dots, n$, $j = 1, 2, \dots, m$; the sums of correlations over all selections of $t$ entries lying in different rows and different columns are enumerated, the maximum is taken as the numerator and $T$ as the denominator, and the final matching score is calculated as:
$$score = \frac{\max(matrix\_sum)}{T}$$
where $matrix\_sum$ denotes the sum of the correlations of $t$ entries lying in different rows and different columns. The network parameters of the model with the highest matching score are selected to obtain the BERT + CRF hierarchical sequence labeling model. The medical record text to be processed is then input into the BERT + CRF hierarchical sequence labeling model, which outputs the BIO label sequence predicted for prescription content; all BI sequences are decoded from the BIO label sequence, and the text corresponding to the BI sequences is the extracted prescription content, giving the final prescription extraction result.
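A small worked example may make the penalty concrete. The correlation values below are invented for illustration and are not taken from the patent. Suppose the model predicts $n = 2$ prescriptions and $m = 3$ are manually labeled (case A of FIG. 2), so $t = 2$ and $T = 3$, and let rows index predictions and columns index labels:

```latex
S = \begin{pmatrix} 0.9 & 0.1 & 0.0 \\ 0.2 & 0.1 & 0.8 \end{pmatrix},
\qquad
\max(matrix\_sum) = S_{11} + S_{23} = 0.9 + 0.8 = 1.7,
\qquad
score = \frac{1.7}{3} \approx 0.567
```

The other five ways of choosing one entry per row in distinct columns give sums of 1.0, 0.9, 0.3, 0.2, and 0.1, so 1.7 is the maximum. Dividing by $T = 3$ rather than $t = 2$ is what lowers the score for the missed third prescription.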
For instance, for the text "… … it is indicated for the syndrome of downward movement, slow and thready pulse, qi deficiency, exterior deficiency, wind-cold-yang deficiency, insecurity of body fluids and intolerance of wind and malaria. It is administered with Yupingfeng san and Guizhi Shaoyao to strengthen superficies and relieve sweating. The original prescription is used to remove cassia twig and ginseng, and the ginseng is cooked and attached to the skin to be taken for a plurality of times, so that the clothes can be completely removed. … …", the extraction results based on sentence classification are as follows (the parenthesized content after each sentence is the recognition result, and the ellipses indicate omitted preceding and following text):
[Figure: extraction results based on sentence classification for the example text]
The extraction result based on the hierarchical sequence labeling method is as follows (the underlined content is the prescription text fragment, and the ellipses indicate omitted preceding and following text):
[Figure: extraction result based on hierarchical sequence labeling for the example text]
The invention does not need to split medical records into sentences and takes the complete medical record directly as input, thereby avoiding the error propagation caused by sentence splitting. The prescription text is obtained by sequence labeling, so the most relevant text fragment is obtained directly. Finally, medicine name and prescription name information is taken into account during identification, which enhances the feature representation used for prescription extraction and yields better results, so that prescription text in ancient medical records can be identified using a small amount of manually labeled data. Compared with the extraction result based on sentence classification, the prescription content extracted by this method is more accurate and cleaner and contains no irrelevant text.
The above example shows only one embodiment of the present invention; its description is specific and detailed, but it is not to be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (3)

1. An ancient medical record prescription extraction method based on hierarchical sequence labeling is characterized by comprising the following steps:
1) collecting authoritative medical record data resources, and extracting medical record text information through an OCR tool;
2) labeling prescription data, medicine names and prescription names in part of medical record texts by using a data labeling tool to obtain manual labeling data comprising two labeling sequences, wherein one labeling sequence is a prescription labeling sequence, and the other labeling sequence is a medicine name and prescription name labeling sequence; the two labeling sequences both adopt a BIO label system, wherein B represents the initial part of a prescription, a medicine name or a prescription name, I represents the middle part of the prescription, the medicine name or the prescription name, and O represents the part which is not the prescription, the medicine name or the prescription name;
3) splitting unlabeled medical record text into sentences and filtering out sentences with fewer words than a preset threshold to obtain a pre-training corpus; using the pre-training corpus to fine-tune the parameters of a BERT model;
4) establishing a BERT + CRF hierarchical sequence labeling network comprising an input layer, a feature extraction layer, fully connected layers, a medicine and prescription name prediction CRF layer, and a prescription prediction CRF layer, wherein the feature extraction layer adopts the BERT model trained in step 3); loading the fine-tuned BERT parameters and training the BERT + CRF hierarchical sequence labeling network with the manually labeled data obtained in step 2); using a BLEU-based evaluation index to calculate the matching score between the manual labeling result and the prescription contents decoded from the BIO label sequence output by the prescription prediction CRF layer of the network; and selecting the network parameters of the model with the highest matching score to obtain the BERT + CRF hierarchical sequence labeling model;
5) inputting the medical record text to be processed into the BERT + CRF hierarchical sequence labeling model obtained in step 4), outputting the BIO label sequence predicted for prescription content, and decoding all BI sequences from the BIO label sequence, wherein the text corresponding to the BI sequences is the extracted prescription content.
2. The method for extracting ancient medical record prescriptions based on hierarchical sequence labeling of claim 1, wherein the evaluation index method based on BLEU specifically comprises:
during training, the BIO label sequence predicted for prescription content by the prescription prediction CRF layer of the hierarchical sequence labeling network is decoded to obtain all BI sequences, and the prescription contents are obtained from the text corresponding to the BI sequences and expressed as $pred = [p_1, p_2, \dots, p_n]$; the manual labeling result is expressed as $label = [t_1, t_2, \dots, t_m]$, where $p_i$ and $t_j$ are text strings, $n$ is the number of predicted prescription contents, and $m$ is the number of manually labeled prescription contents;
define $t = \min(m, n)$ and $T = \max(m, n)$; the BLEU method is used to calculate the correlation of each pair $\{p_i, t_j\}$, $i = 1, 2, \dots, n$, $j = 1, 2, \dots, m$; the sums of correlations over all selections of $t$ entries lying in different rows and different columns are enumerated, the maximum is taken as the numerator and $T$ as the denominator, and the final matching score is calculated as:
$$score = \frac{\max(matrix\_sum)}{T}$$
where $matrix\_sum$ denotes the sum of the correlations of $t$ entries lying in different rows and different columns.
3. The method for extracting ancient medical record prescriptions based on hierarchical sequence labeling of claim 1, wherein the BERT + CRF hierarchical sequence labeling network comprises an input layer, a feature extraction layer, fully connected layers, a medicine and prescription name prediction CRF layer, and a prescription prediction CRF layer, wherein the feature extraction layer adopts the BERT model trained in step 3);
the input layer maps the input word sequence to the corresponding ID sequence; the BERT model takes the ID sequence as input and produces a 768-dimensional feature representation for each ID; each feature representation is fed into two fully connected layers for feature conversion, each of which reduces the features to 3 dimensions, the output of fully connected layer 1 being the medicine and prescription name features and the output of fully connected layer 2 being the prescription features; the medicine and prescription name features output by fully connected layer 1 are input to the medicine and prescription name prediction CRF layer to obtain the predicted BIO label sequence for medicine and prescription names; the prescription features output by fully connected layer 2 and the medicine and prescription name features output by fully connected layer 1 are added to form new features, which are input to the prescription prediction CRF layer to obtain the predicted BIO label sequence for prescription content.
CN201911347473.8A 2019-12-24 2019-12-24 Ancient medical record prescription extraction method based on hierarchical sequence labeling Active CN111178047B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911347473.8A CN111178047B (en) 2019-12-24 2019-12-24 Ancient medical record prescription extraction method based on hierarchical sequence labeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911347473.8A CN111178047B (en) 2019-12-24 2019-12-24 Ancient medical record prescription extraction method based on hierarchical sequence labeling

Publications (2)

Publication Number Publication Date
CN111178047A (en) 2020-05-19
CN111178047B (en) 2021-08-27

Family

ID=70657427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911347473.8A Active CN111178047B (en) 2019-12-24 2019-12-24 Ancient medical record prescription extraction method based on hierarchical sequence labeling

Country Status (1)

Country Link
CN (1) CN111178047B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083831A (en) * 2019-04-16 2019-08-02 武汉大学 Chinese named entity recognition method based on BERT-BiGRU-CRF
CN110134772A (en) * 2019-04-18 2019-08-16 五邑大学 Medical text relation extraction method based on pre-trained model and fine-tuning technology
CN110516256A (en) * 2019-08-30 2019-11-29 的卢技术有限公司 Chinese named entity extraction method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUAN ZHAO ET AL.: "Pre-Trained Language Model Transfer on Chinese Named Entity Recognition", 《2019 IEEE 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 17TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 5TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS)》 *
WANG Ziniu et al.: "Chinese Named Entity Recognition Method Based on BERT", 《Computer Science》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401077A (en) * 2020-06-02 2020-07-10 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment
CN111401077B (en) * 2020-06-02 2020-09-18 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment
CN112906367A (en) * 2021-02-08 2021-06-04 上海宏原信息科技有限公司 Information extraction structure, labeling method and identification method of consumer text

Also Published As

Publication number Publication date
CN111178047B (en) 2021-08-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant