CN111178047A - Ancient medical record prescription extraction method based on hierarchical sequence labeling
- Publication number
- CN111178047A
- Authority
- CN
- China
- Prior art keywords
- prescription
- labeling
- layer
- sequence
- name
- Legal status
- Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/10—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Data Mining & Analysis (AREA)
- Medicinal Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Chemical & Material Sciences (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses an ancient medical record prescription extraction method based on hierarchical sequence labeling. The method uses a BERT + CRF hierarchical sequence labeling network. The network comprises an input layer, a feature extraction layer, a fully connected layer, a medicine and prescription name prediction CRF layer and a prescription prediction CRF layer. First, the method does not split medical records into sentences. It takes the complete medical record as input, which avoids the error propagation caused by sentence splitting. Second, prescription text is obtained through sequence labeling, so the most relevant text fragment is obtained directly. Finally, medicine-name and prescription-name information is used during recognition, which strengthens the feature representation for prescription extraction and improves results. As a result, prescription text in ancient medical records can be recognized with only a small amount of manually labeled data. The invention also designs a BLEU-based evaluation index for model selection. It quantifies the degree of match between the model's extraction results and the labeled results and is used to select the optimal model.
Description
Technical Field
The invention relates to pre-trained language models and conditional random fields in deep learning. In particular, it relates to an ancient medical record prescription extraction method based on hierarchical sequence labeling.
Background
Traditional Chinese medicine (TCM) medical records document the complete course of a patient's treatment. This includes the symptoms, prescriptions and medicines used during treatment. However, ancient medical records vary widely in level of detail and in format, owing to the era and personal style of each physician. This makes formatting and digitizing their content difficult. For TCM scholars and enthusiasts, studying how earlier physicians treated disease through their medical records is an important way to learn their therapeutic reasoning. Structuring the text of ancient medical records is therefore particularly important. Medical record structuring aims to identify medicines, prescriptions, diseases, symptoms, syndromes and similar content in unstructured medical record text, producing a structured medical record data resource. Structured records are easier for computers to store and retrieve, which makes them easier to browse. They also support mining and research on medical record data, which is valuable to TCM enthusiasts and researchers.
Prescription extraction is one of the tasks in medical record structuring. Its basic purpose is to identify and extract prescription text fragments from unstructured medical record text. One existing solution identifies prescriptions by classification at sentence granularity, as follows:
- The medical record is first split into sentences.
- Each sentence is labeled as prescription or non-prescription.
- A binary classifier is trained on the sentences.
- The sentences classified as prescriptions are output.

This approach has two problems:
(1) sentence splitting is rule-based, so segmentation errors propagate to the classifier;
(2) sentence-level classification is too coarse, because some prescriptions occupy only part of a sentence rather than the whole sentence.

To accomplish ancient medical record prescription extraction better, the following technical difficulties must be addressed:
1. how to design a model that extracts long text segments;
2. how to reduce labeling cost and workload, so that prescription extraction needs only a small amount of labeled data;
3. how to design an evaluation index that quantifies the model's performance.
Disclosure of Invention
To solve these problems, the invention provides a hierarchical sequence labeling model for prescription extraction, which treats prescription extraction as a sequence labeling problem. First, the complete medical record is used as input, which avoids error propagation caused by sentence splitting. Second, prescription text is obtained through sequence labeling, so the most relevant text segment is obtained directly and the recognized content is more accurate. Finally, medicine-name and prescription-name information is used during recognition, which strengthens the feature representation for prescription extraction and improves results. As a result, prescription text in ancient medical records can be recognized with a small amount of manually labeled data.
To achieve this purpose, the invention adopts the following technical scheme.
An ancient medical record prescription extraction method based on hierarchical sequence labeling comprises the following steps:
1) collecting authoritative medical record data resources, and extracting the medical record text with an OCR tool;
2) labeling prescriptions, medicine names and prescription names in part of the medical record texts with a data labeling tool, to obtain manually labeled data comprising two labeling sequences: a prescription labeling sequence, and a medicine name and prescription name labeling sequence; both labeling sequences use the BIO tag scheme, where B marks the beginning of a prescription, medicine name or prescription name, I marks its remaining (non-initial) characters, and O marks text outside any prescription, medicine name or prescription name;
3) splitting unlabeled medical record text into sentences and filtering out sentences with fewer characters than a preset threshold, to obtain a pre-training corpus, and using the pre-training corpus to fine-tune the parameters of a BERT model;
4) establishing a BERT + CRF hierarchical sequence labeling network comprising an input layer, a feature extraction layer, a fully connected layer, a medicine and prescription name prediction CRF layer and a prescription prediction CRF layer, wherein the feature extraction layer is the BERT model trained in step 3); then:
- loading the fine-tuned BERT parameters and training the network with the manually labeled data obtained in step 2);
- decoding the BIO tag sequence for prescription content output by the prescription prediction CRF layer into prescription contents;
- using a BLEU-based evaluation index to compute a matching score between these prescription contents and the manual labeling result;
- selecting the network parameters of the model with the highest matching score as the BERT + CRF hierarchical sequence labeling model;
5) inputting the medical record text to be processed into the BERT + CRF hierarchical sequence labeling model obtained in step 4), outputting the predicted BIO tag sequence for prescription content, and decoding all BI sequences (a B tag followed by any I tags) from it; the text corresponding to each BI sequence is an extracted prescription.
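Steps 2) and 5) convert between labeled spans and per-character BIO tags. The Python sketch below encodes half-open character spans as BIO tags and decodes every BI sequence back into text. It is a minimal sketch under several assumptions:
- The function names are hypothetical.
- The treatment of a stray `I` with no preceding `B` is not specified in the patent.
- The sample phrase was written for this example and is not the patent's annotation.

```python
def spans_to_bio(length: int, spans: list[tuple[int, int]]) -> list[str]:
    """Encode half-open character spans [start, end) as one BIO tag per character."""
    tags = ["O"] * length
    for start, end in spans:
        tags[start] = "B"
        for k in range(start + 1, end):
            tags[k] = "I"
    return tags


def bio_to_spans(tags: list[str]) -> list[tuple[int, int]]:
    """Decode every BI sequence (a B followed by zero or more I tags) into a span."""
    spans: list[tuple[int, int]] = []
    start = None
    for k, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # a new B closes the previous span
                spans.append((start, k))
            start = k
        elif tag == "I":
            if start is None:  # assumption: a stray I opens a new span
                start = k
        elif start is not None:  # an O closes the open span
            spans.append((start, k))
            start = None
    if start is not None:  # span running to the end of the text
        spans.append((start, len(tags)))
    return spans


def extract_prescriptions(text: str, tags: list[str]) -> list[str]:
    """Return the text of every decoded span, i.e. the extracted prescriptions."""
    return [text[s:e] for s, e in bio_to_spans(tags)]


# Hypothetical phrase; the span covers the formula name plus its additions.
text = "方用玉屏风散加桂枝芍药以益卫固表"
tags = spans_to_bio(len(text), [(2, 11)])
print(extract_prescriptions(text, tags))  # ['玉屏风散加桂枝芍药']
```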
Further, the invention designs a BLEU-based evaluation index that quantifies the degree of match between the model's extraction results and the labeled results. The BLEU-based evaluation index method specifically comprises:
during training, the BIO tag sequence for prescription content output by the prescription prediction CRF layer is decoded to obtain all BI sequences. The prescription contents are taken from the text corresponding to those BI sequences and expressed as $\text{pred} = [p_1, p_2, \dots, p_n]$. The manual labeling result is expressed as $\text{label} = [t_1, t_2, \dots, t_m]$. Each $p_i$ and $t_j$ is a text string, $n$ is the number of predicted prescription contents, and $m$ is the number of manually labeled prescription contents;
let $t = \min(m, n)$ and $T = \max(m, n)$, and use BLEU to compute the correlation $\mathrm{BLEU}(p_i, t_j)$ for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$. Enumerate every selection of $t$ entries of this $n \times m$ correlation matrix that lie in $t$ different rows and $t$ different columns, and sum the correlations in each selection. The final matching score takes the maximum sum as the numerator and $T$ as the denominator:

$$\text{score} = \frac{\max(\text{matrix\_sum})}{T}$$

where matrix_sum denotes the sum of the correlations of one selection of $t$ entries in different rows and different columns.
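The matching score can be sketched in Python as follows. The patent does not state how BLEU is tokenized or smoothed, so `char_bleu` is an assumed, simplified character-level sentence BLEU with these settings:
- n-gram order capped at 4 and at the string lengths;
- epsilon in place of zero match counts;
- standard brevity penalty.

`matching_score` follows the description above. It builds the n × m BLEU matrix and enumerates every choice of t = min(m, n) cells in distinct rows and columns. It then divides the best sum by T = max(m, n). The sample strings are placeholders.

```python
import math
from collections import Counter
from itertools import permutations


def char_bleu(candidate: str, reference: str, max_order: int = 4, eps: float = 0.1) -> float:
    """Simplified character-level sentence BLEU (assumed settings, see lead-in)."""
    if not candidate or not reference:
        return 0.0
    # Cap the n-gram order so very short strings can still be compared.
    order = min(max_order, len(candidate), len(reference))
    log_precision = 0.0
    for n in range(1, order + 1):
        cand = Counter(candidate[i:i + n] for i in range(len(candidate) - n + 1))
        ref = Counter(reference[i:i + n] for i in range(len(reference) - n + 1))
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        # Clipped n-gram precision; eps replaces a zero count so log() is defined.
        log_precision += math.log((overlap or eps) / total)
    # Brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_precision / order)


def matching_score(pred: list[str], label: list[str]) -> float:
    """max(matrix_sum) / T over t = min(m, n) cells in distinct rows and columns."""
    n, m = len(pred), len(label)
    if n == 0 or m == 0:
        return 1.0 if n == m else 0.0  # assumption for empty lists
    sim = [[char_bleu(p, t) for t in label] for p in pred]  # n x m BLEU matrix
    if n > m:
        sim = [list(col) for col in zip(*sim)]  # put the shorter side on the rows
    rows, cols = len(sim), len(sim[0])
    # Each permutation picks one distinct column for every row: one matrix_sum.
    best = max(
        sum(sim[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(cols), rows)
    )
    return best / max(n, m)


print(matching_score(["abcd"], ["abcd", "wxyz"]))  # 0.5
```

Exhaustive enumeration grows factorially with t, which is fine for the few prescriptions in one record. For longer lists, the same maximum can be found with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment(matrix, maximize=True)`.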
Further, the BERT + CRF hierarchical sequence labeling network comprises an input layer, a feature extraction layer, a fully connected layer, a medicine and prescription name prediction CRF layer and a prescription prediction CRF layer, wherein the feature extraction layer is the BERT model trained in step 3). The layers operate as follows:
- the input layer maps the input character sequence to the corresponding ID sequence;
- the BERT model takes the ID sequence as input and produces a 768-dimensional feature representation for each ID;
- each feature representation is fed into two fully connected layers, each of which reduces it to 3 dimensions; the output of fully connected layer 1 is the medicine and prescription name feature, and the output of fully connected layer 2 is the prescription feature;
- the medicine and prescription name feature from fully connected layer 1 is fed into the medicine and prescription name prediction CRF layer to obtain the predicted BIO tag sequence for medicine and prescription names;
- the prescription feature from fully connected layer 2 is added to the medicine and prescription name feature from fully connected layer 1, and the sum is fed into the prescription prediction CRF layer as a new feature to obtain the predicted BIO tag sequence for prescription content.
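The following dependency-free Python sketch shows the data flow of the two heads: fully connected layers 1 and 2 and the feature addition. It assumes random weights and random vectors standing in for the 768-dimensional BERT output. The helper names are hypothetical. A real implementation would use a deep learning framework and would pass the two emission sequences to two CRF layers, which are omitted here.

```python
import random

HIDDEN = 768  # BERT feature size per character, as stated in the text
NUM_TAGS = 3  # one score each for B, I, O


def init_linear(rng: random.Random, out_dim: int, in_dim: int):
    """Hypothetical initialisation of a fully connected layer (weights, bias)."""
    scale = in_dim ** -0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def linear(x: list[float], weight, bias) -> list[float]:
    """Fully connected layer y = Wx + b for one character's feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def hierarchical_emissions(features, fc1, fc2):
    """Return per-character scores for the name CRF and the prescription CRF."""
    name_emissions, prescription_emissions = [], []
    for h in features:
        name_feat = linear(h, *fc1)   # fully connected layer 1: name features
        presc_feat = linear(h, *fc2)  # fully connected layer 2: prescription features
        name_emissions.append(name_feat)
        # The prescription CRF receives the element-wise sum of both 3-d features.
        prescription_emissions.append([p + q for p, q in zip(presc_feat, name_feat)])
    return name_emissions, prescription_emissions


rng = random.Random(0)
fc1 = init_linear(rng, NUM_TAGS, HIDDEN)
fc2 = init_linear(rng, NUM_TAGS, HIDDEN)
features = [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN)] for _ in range(5)]  # stand-in for BERT output
name_em, presc_em = hierarchical_emissions(features, fc1, fc2)
```

The addition lets medicine and prescription-name evidence raise or lower the prescription tag scores at each character. The text does not say whether gradients from the prescription loss also update fully connected layer 1.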
The invention has the following beneficial effects:
(1) because sequence labeling is used, the medical records do not need to be split into sentences when training the BERT + CRF hierarchical sequence labeling network. The complete medical record is used directly as input and the most relevant text segments are identified directly, avoiding error propagation. The extracted prescription content is more accurate and clean and contains no irrelevant text;
(2) the invention uses a pre-trained language model from natural language processing to obtain character representations. The model can be pre-trained on large-scale unlabeled text to learn general semantics and grammar. This reduces labeling cost and workload, so prescription extraction needs only a small amount of labeled data;
(3) the prescription extraction system makes full use of medicine-name and prescription-name information for prescription recognition. This information assists recognition, strengthens the feature representation for prescription extraction and improves results;
(4) the invention designs a BLEU-based evaluation index that quantifies the degree of match between the model's extraction results and the labeled results and is used to select the optimal model during training.
Drawings
FIG. 1 is a structural diagram of the model based on hierarchical sequence labeling;
FIG. 2 is an explanatory diagram of the evaluation index.
Detailed Description
The present invention is described in detail below with reference to specific examples.
Current information extraction tasks mainly aim to extract named entities, which are usually short. A prescription, by contrast, is usually a long sequence of medicines. The inventors therefore considered two methods:
- The first extracts prescriptions through hierarchical sequence labeling. Prescription extraction is treated as a sequence labeling problem, and the text fragment corresponding to a prescription is marked with the BIO tag scheme.
- The second is based on boundary prediction, i.e., predicting the start and end positions of the prescription fragment in the text.

Extensive experiments showed that the hierarchical sequence labeling scheme outperforms the boundary-prediction method. Prescription extraction is therefore performed by hierarchical sequence labeling.
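The two target representations compared here can be illustrated for a single prescription span. In this hypothetical Python sketch, sequence labeling assigns a BIO tag to every character. Boundary prediction is modeled as two binary marker sequences for the start and end positions. The patent does not describe its boundary-prediction model, so that encoding is an assumption.

```python
def span_to_bio(length: int, start: int, end: int) -> list[str]:
    """Sequence-labeling target: a tag for every character of the text."""
    return ["B" if k == start else "I" if start < k < end else "O" for k in range(length)]


def span_to_boundaries(length: int, start: int, end: int) -> tuple[list[int], list[int]]:
    """Boundary-prediction target: mark only the first and last characters."""
    starts = [1 if k == start else 0 for k in range(length)]
    ends = [1 if k == end - 1 else 0 for k in range(length)]
    return starts, ends


print(span_to_bio(8, 2, 6))         # ['O', 'O', 'B', 'I', 'I', 'I', 'O', 'O']
print(span_to_boundaries(8, 2, 6))  # ([0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0])
```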
Annotating TCM data requires annotators with basic domain expertise, so labeled prescription data is difficult to obtain. To address this, the invention uses a pre-trained language model from natural language processing to obtain character representations, in three stages:
- The model is pre-trained on large-scale unlabeled text to learn general semantics and grammar.
- It is then further fine-tuned on unlabeled TCM text to obtain better domain-specific character features.
- Finally, it is trained on labeled data.

This reduces labeling cost and workload, so prescription extraction needs only a small amount of labeled data.
As shown in FIG. 1, the ancient medical record prescription extraction method based on hierarchical sequence labeling uses a BERT + CRF hierarchical sequence labeling network. The network comprises an input layer, a BERT model, a fully connected layer, a medicine and prescription name prediction CRF layer and a prescription prediction CRF layer. The layers operate as follows:
- the input layer maps the input character sequence to the corresponding ID sequence;
- the BERT model takes the ID sequence as input and produces a 768-dimensional feature representation for each ID;
- each feature representation is fed into two fully connected layers, each of which reduces it to 3 dimensions; the output of fully connected layer 1 is the medicine and prescription name feature, and the output of fully connected layer 2 is the prescription feature;
- the medicine and prescription name feature from fully connected layer 1 is fed into the medicine and prescription name prediction CRF layer to obtain the predicted BIO tag sequence for medicine and prescription names;
- the prescription feature from fully connected layer 2 is added to the medicine and prescription name feature from fully connected layer 1, and the sum is fed into the prescription prediction CRF layer as a new feature to obtain the predicted BIO tag sequence for prescription content.
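Each CRF layer turns per-character scores into the single best-scoring BIO sequence. The Python sketch below shows standard Viterbi decoding for a linear-chain CRF. The transition and start scores in the example are hypothetical values chosen to forbid `O -> I` and a sequence that starts with `I`. They are not learned parameters from the patent.

```python
from typing import Optional

TAGS = ["B", "I", "O"]


def viterbi_decode(emissions: list[list[float]], transitions: list[list[float]],
                   start_scores: Optional[list[float]] = None) -> list[str]:
    """Best tag path for a linear-chain CRF.

    emissions[t][k]: score of tag k at position t; transitions[a][b]: score of a -> b.
    """
    num_tags = len(transitions)
    if start_scores is None:
        start_scores = [0.0] * num_tags
    score = [start_scores[k] + emissions[0][k] for k in range(num_tags)]
    backpointers = []
    for em in emissions[1:]:
        new_score, ptrs = [], []
        for b in range(num_tags):
            # Best previous tag a for ending in tag b at this position.
            best_a = max(range(num_tags), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + em[b])
            ptrs.append(best_a)
        score = new_score
        backpointers.append(ptrs)
    path = [max(range(num_tags), key=lambda k: score[k])]
    for ptrs in reversed(backpointers):  # follow back-pointers to the first position
        path.append(ptrs[path[-1]])
    path.reverse()
    return [TAGS[k] for k in path]


NEG = -1e4  # large negative score forbids a transition
transitions = [[0.0, 0.0, 0.0],   # from B
               [0.0, 0.0, 0.0],   # from I
               [0.0, NEG, 0.0]]   # from O: O -> I forbidden
start = [0.0, NEG, 0.0]           # a sequence may not start with I
emissions = [[0.0, 0.0, 1.0],     # per-character argmax: O
             [0.9, 1.0, 0.0]]     # per-character argmax: I (invalid after O)
print(viterbi_decode(emissions, transitions, start))  # ['O', 'B']
```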
In a specific embodiment of the invention, the BERT + CRF hierarchical sequence labeling network is used to extract prescriptions from ancient medical records, as follows.
Step one: classic books of TCM medical records, such as 'the second-generation famous medical record', are scanned and converted into text by OCR (optical character recognition). During scanning, marks are inserted to annotate the department, disease, main text and other content of each medical record.
Step two: a medical record prescription labeling tool is used to mark prescription text fragments in part of the medical record texts with the BIO tag scheme. This yields manually labeled data comprising two labeling sequences: a prescription labeling sequence, and a medicine name and prescription name labeling sequence. Both sequences use the BIO tag scheme, and each character is assigned one BIO tag. B marks the beginning of a prescription, medicine name or prescription name, I marks its remaining (non-initial) characters, and O marks text outside any prescription, medicine name or prescription name.
For example, consider the text: "Yupingfeng san plus Guizhi Shaoyao is used to benefit the defensive qi and consolidate the exterior, so that sweating stops on its own after taking it". Its prescription labeling sequence and medicine and prescription name labeling sequence are as follows:
Step three: prepare the pre-training corpus. Unlabeled medical record text is split into sentences, and sentences with fewer than 5 characters are filtered out to obtain the pre-training corpus. The corpus is used to pre-train and fine-tune the BERT model. Each line of the corpus is one sentence, and each medical record is treated as one document, with documents separated by empty lines.
In one embodiment of the present invention, the corpus format is as follows:
Lines 1-6 hold one medical record, one sentence per line, and lines 8-10 hold another medical record; the two records are separated by an empty line.
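Step three can be sketched as follows. Only the threshold of 5 and the layout (one sentence per line, a blank line between records) come from the text. This layout also matches the input format described for the original BERT repository's `create_pretraining_data.py`. The following are assumptions:
- the sentence delimiters;
- counting characters without the closing punctuation;
- the filler strings in the example.

```python
import re

# Assumed sentence delimiters: split after Chinese/ASCII end punctuation.
DELIMITERS = "。！？；!?;"
SENTENCE_END = re.compile(r"(?<=[。！？；!?;])")
MIN_CHARS = 5  # threshold from the embodiment


def split_sentences(record: str) -> list[str]:
    """Split one medical record into sentences and drop the short ones."""
    sentences = (s.strip() for s in SENTENCE_END.split(record))
    # Count characters without the trailing delimiter (an assumption).
    return [s for s in sentences if len(s.rstrip(DELIMITERS)) >= MIN_CHARS]


def build_corpus(records: list[str]) -> str:
    """One sentence per line, one record per document, blank line between records."""
    documents = ["\n".join(sents) for sents in map(split_sentences, records) if sents]
    return "\n\n".join(documents) + "\n"


records = ["甲乙丙丁戊己。甲乙。", "庚辛壬癸甲乙丙。"]  # filler text, not real records
print(build_corpus(records), end="")
```

Splitting on a zero-width lookbehind keeps each delimiter attached to its sentence; this needs Python 3.7 or later.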
Step four: load the fine-tuned BERT parameters and train the BERT + CRF hierarchical sequence labeling network with the manually labeled data. During training, the BLEU (Bilingual Evaluation Understudy) based evaluation index is applied. The BIO tag sequence for prescription content output by the prescription prediction CRF layer is decoded to obtain all BI sequences. The prescription contents are taken from the text corresponding to those BI sequences and expressed as $\text{pred} = [p_1, p_2, \dots, p_n]$. The manual labeling result is expressed as $\text{label} = [t_1, t_2, \dots, t_m]$. Each $p_i$ and $t_j$ is a text string, $n$ is the number of predicted prescription contents, and $m$ is the number of manually labeled prescription contents;
FIG. 2 explains the evaluation index through three cases. In case A, fewer prescription contents are predicted than were manually labeled; in case B, the two numbers are equal; in case C, more are predicted than labeled. Because the two numbers can differ, the evaluation index must penalize predictions with fewer or more items than the labels. Therefore let $t = \min(m, n)$ and $T = \max(m, n)$, and use BLEU to compute the correlation $\mathrm{BLEU}(p_i, t_j)$ for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$. Enumerate every selection of $t$ entries of the correlation matrix that lie in $t$ different rows and $t$ different columns, and sum the correlations in each selection. The final matching score takes the maximum sum as the numerator and $T$ as the denominator:

$$\text{score} = \frac{\max(\text{matrix\_sum})}{T}$$

where matrix_sum denotes the sum of the correlations of one selection of $t$ entries in different rows and different columns. The network parameters of the model with the highest matching score are selected as the BERT + CRF hierarchical sequence labeling model. The medical record text to be processed is then input into this model, which outputs the predicted BIO tag sequence for prescription content. All BI sequences are decoded from it, and the text corresponding to each BI sequence is an extracted prescription, giving the final prescription extraction result.
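As a worked illustration of the penalty, assume hypothetically that m = 3 prescriptions are labeled. Assume also that each predicted prescription that can be paired matches a distinct labeled prescription exactly, so every selected BLEU value is 1 and $\max(\text{matrix\_sum}) = t$. The three cases of FIG. 2 then give:

$$
\begin{aligned}
\text{A } (n = 2):&\quad \text{score} = \tfrac{t}{T} = \tfrac{2}{3} \\
\text{B } (n = 3):&\quad \text{score} = \tfrac{3}{3} = 1 \\
\text{C } (n = 4):&\quad \text{score} = \tfrac{3}{4}
\end{aligned}
$$

Both missing and surplus predictions therefore keep the score below 1, even when every matched pair is perfect.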
For example, take the medical record text "…… it is indicated for the syndrome of downward movement, slow and thready pulse, qi deficiency, exterior deficiency, wind-cold-yang deficiency, insecurity of body fluids and intolerance of wind and malaria. It is administered with Yupingfeng san and Guizhi Shaoyao to strengthen superficies and relieve sweating. The original prescription is used to remove cassia twig and ginseng, and the ginseng is cooked and attached to the skin to be taken for a plurality of times, so that the clothes can be completely removed. ……". The extraction results based on sentence classification are as follows (the content in parentheses after each sentence is the recognition result, and the ellipses mark omitted text before and after the excerpt):
The extraction result of the hierarchical sequence labeling method is as follows (underlined content is a prescription text fragment, and the ellipses mark omitted text before and after the excerpt):
The invention does not need to split medical records into sentences. It takes the complete medical record directly as input, avoiding error propagation caused by sentence splitting. It obtains prescription text through sequence labeling, directly yielding the most relevant text fragments. It also uses medicine-name and prescription-name information during recognition, which strengthens the feature representation for prescription extraction and improves results. As a result, prescription text in ancient medical records can be recognized with a small amount of manually labeled data. Compared with the sentence-classification results, the prescription content extracted by this method is more accurate and clean and contains no irrelevant text.
The above example shows only one embodiment of the present invention. Its description is specific and detailed but should not be construed as limiting the scope of the invention. A person skilled in the art can make several variations and modifications without departing from the inventive concept, and all of them fall within the scope of the invention. Therefore, the protection scope of this patent shall be defined by the appended claims.
Claims (3)
1. An ancient medical record prescription extraction method based on hierarchical sequence labeling, characterized by comprising the following steps:
1) collecting authoritative medical record data resources, and extracting the medical record text with an OCR tool;
2) labeling prescriptions, medicine names and prescription names in part of the medical record texts with a data labeling tool, to obtain manually labeled data comprising two labeling sequences: a prescription labeling sequence, and a medicine name and prescription name labeling sequence; both labeling sequences use the BIO tag scheme, where B marks the beginning of a prescription, medicine name or prescription name, I marks its remaining (non-initial) characters, and O marks text outside any prescription, medicine name or prescription name;
3) splitting unlabeled medical record text into sentences and filtering out sentences with fewer characters than a preset threshold, to obtain a pre-training corpus, and using the pre-training corpus to fine-tune the parameters of a BERT model;
4) establishing a BERT + CRF hierarchical sequence labeling network comprising an input layer, a feature extraction layer, a fully connected layer, a medicine and prescription name prediction CRF layer and a prescription prediction CRF layer, wherein the feature extraction layer is the BERT model trained in step 3); loading the fine-tuned BERT parameters and training the network with the manually labeled data obtained in step 2); using a BLEU-based evaluation index to compute a matching score between the manual labeling result and the prescription contents decoded from the BIO tag sequence for prescription content output by the prescription prediction CRF layer; and selecting the network parameters of the model with the highest matching score as the BERT + CRF hierarchical sequence labeling model;
5) inputting the medical record text to be processed into the BERT + CRF hierarchical sequence labeling model obtained in step 4), outputting the predicted BIO tag sequence for prescription content, and decoding all BI sequences from it, wherein the text corresponding to each BI sequence is an extracted prescription.
2. The ancient medical record prescription extraction method based on hierarchical sequence labeling of claim 1, wherein the BLEU-based evaluation index method specifically comprises:
during training, decoding the BIO tag sequence for prescription content output by the prescription prediction CRF layer to obtain all BI sequences, and taking the prescription contents from the text corresponding to those BI sequences, expressed as $\text{pred} = [p_1, p_2, \dots, p_n]$; expressing the manual labeling result as $\text{label} = [t_1, t_2, \dots, t_m]$, wherein each $p_i$ and $t_j$ is a text string, $n$ is the number of predicted prescription contents, and $m$ is the number of manually labeled prescription contents;
letting $t = \min(m, n)$ and $T = \max(m, n)$, and using BLEU to compute the correlation $\mathrm{BLEU}(p_i, t_j)$ for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$; enumerating every selection of $t$ entries that lie in $t$ different rows and $t$ different columns, summing the correlations in each selection, and computing the final matching score with the maximum sum as the numerator and $T$ as the denominator:

$$\text{score} = \frac{\max(\text{matrix\_sum})}{T}$$

wherein matrix_sum denotes the sum of the correlations of one selection of $t$ entries in different rows and different columns.
3. The ancient medical record prescription extraction method based on hierarchical sequence labeling of claim 1, wherein the BERT + CRF hierarchical sequence labeling network comprises an input layer, a feature extraction layer, a fully connected layer, a medicine and prescription name prediction CRF layer and a prescription prediction CRF layer, and the feature extraction layer is the BERT model trained in step 3);
the input layer maps the input character sequence to the corresponding ID sequence; the BERT model takes the ID sequence as input and produces a 768-dimensional feature representation for each ID; each feature representation is fed into two fully connected layers, each of which reduces it to 3 dimensions, wherein the output of fully connected layer 1 is the medicine and prescription name feature and the output of fully connected layer 2 is the prescription feature; the medicine and prescription name feature from fully connected layer 1 is fed into the medicine and prescription name prediction CRF layer to obtain the predicted BIO tag sequence for medicine and prescription names; and the prescription feature from fully connected layer 2 is added to the medicine and prescription name feature from fully connected layer 1, and the sum is fed into the prescription prediction CRF layer as a new feature to obtain the predicted BIO tag sequence for prescription content.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911347473.8A CN111178047B (en) | 2019-12-24 | 2019-12-24 | Ancient medical record prescription extraction method based on hierarchical sequence labeling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111178047A (en) | 2020-05-19 |
CN111178047B (en) | 2021-08-27 |
Family ID: 70657427
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911347473.8A (Active, CN111178047B) | Ancient medical record prescription extraction method based on hierarchical sequence labeling | 2019-12-24 | 2019-12-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111178047B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110083831A (en) * | 2019-04-16 | 2019-08-02 | Wuhan University | A Chinese named entity recognition method based on BERT-BiGRU-CRF |
CN110134772A (en) * | 2019-04-18 | 2019-08-16 | Wuyi University | Medical text relation extraction method based on a pre-trained model and fine-tuning technology |
CN110516256A (en) * | 2019-08-30 | 2019-11-29 | Dilu Technology Co., Ltd. | A Chinese named entity extraction method and system |
Non-Patent Citations (2)
Title |
---|
Huan Zhao et al.: "Pre-Trained Language Model Transfer on Chinese Named Entity Recognition", 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) * |
Wang Ziniu et al.: "Chinese Named Entity Recognition Method Based on BERT", Computer Science * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401077A (en) * | 2020-06-02 | 2020-07-10 | Tencent Technology (Shenzhen) Co., Ltd. | Language model processing method and device and computer equipment |
CN111401077B (en) * | 2020-06-02 | 2020-09-18 | Tencent Technology (Shenzhen) Co., Ltd. | Language model processing method and device and computer equipment |
CN112906367A (en) * | 2021-02-08 | 2021-06-04 | Shanghai Hongyuan Information Technology Co., Ltd. | Information extraction structure, labeling method and identification method for consumer text |
Also Published As
Publication number | Publication date |
---|---|
CN111178047B (en) | 2021-08-27 |
Similar Documents
Publication | Title |
---|---|
WO2020211275A1 (en) | Pre-trained model and fine-tuning technology-based medical text relationship extraction method |
WO2021139424A1 (en) | Text content quality evaluation method, apparatus and device, and storage medium |
CN110134953B (en) | Traditional Chinese medicine named entity recognition method and recognition system based on traditional Chinese medicine ancient book literature |
CN109508459B (en) | Method for extracting theme and key information from news |
CN108628824A (en) | An entity recognition method based on Chinese electronic health records |
CN107368547A (en) | An intelligent medical automatic question-answering method based on deep learning |
CN112002411A (en) | Cardiovascular and cerebrovascular disease knowledge graph question-answering method based on electronic medical records |
CN108108449A (en) | An implementation method for a medical-domain question answering system based on multi-source heterogeneous data, and the system |
CN109697285A (en) | Hierarchical BiLSTM disease code annotation method for Chinese electronic health records with enhanced semantic representation |
CN106980609A (en) | A named entity recognition method using a conditional random field based on word-vector representations |
CN109344250A (en) | Rapid structuring method for single-disease diagnostic information based on medical insurance data |
Wang et al. | A framework and its empirical study of automatic diagnosis of traditional Chinese medicine utilizing raw free-text clinical records |
CN111178047B (en) | Ancient medical record prescription extraction method based on hierarchical sequence labeling |
CN109003677B (en) | Structured analysis processing method for medical record data |
CN109858042A (en) | A method and device for determining translation quality |
CN110569343B (en) | Clinical text structuring method based on question answering |
Flickinger et al. | WikiWoods: Syntacto-semantic annotation for English Wikipedia |
CN113632092A (en) | Entity recognition method and device, dictionary establishing method, equipment and medium |
CN110675962A (en) | Traditional Chinese medicine pharmacological action identification method and system based on machine learning and text rules |
CN110600123A (en) | Clinical auxiliary diagnosis method for traditional Chinese medicine |
CN116911300A (en) | Language model pre-training method, entity recognition method and device |
CN113254609B (en) | Question-answering model integration method based on negative sample diversity |
CN113111660A (en) | Data processing method, device, equipment and storage medium |
CN113130025A (en) | Entity relationship extraction method, terminal equipment and computer-readable storage medium |
CN112613315B (en) | Text knowledge automatic extraction method, device, equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |