CN113486666A - Medical named entity recognition method and system - Google Patents
Medical named entity recognition method and system
- Publication number
- CN113486666A (application CN202110770186.9A)
- Authority
- CN
- China
- Prior art keywords
- named entity
- entity recognition
- feature extraction
- medical
- character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a medical named entity recognition method and system. The method comprises: acquiring text data to be recognized; and performing named entity recognition on the text data based on a medical named entity recognition model, wherein the model comprises an input layer, a feature extraction layer and a labeling layer connected in sequence, and the feature extraction layer comprises a character embedding module and a word embedding module. The method considers the sentences in the text at both the character level and the word level, captures the information content and meaning of the embedded words more fully, and thereby helps improve the accuracy of named entity recognition.
Description
Technical Field
The invention belongs to the technical field of medical text processing, and particularly relates to a medical named entity recognition method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Named entity recognition (NER) is a basic task in natural language processing (NLP) and an important building block for most NLP tasks, such as question answering, machine translation and syntactic analysis. Earlier approaches were mainly dictionary-based and rule-based. Dictionary-based methods rely on fuzzy search or exact string matching, but they are limited by the quality and size of the dictionary as new entity names continue to emerge. Rule-based methods manually specify rules and expand the rule set using the characteristics of entity names and their common collocations with phrases, but they consume considerable human effort and time; the rules are generally effective only in a specific field, manual migration is costly, and portability is poor. Most subsequent work on named entity recognition adopts machine learning, in which model training is optimized iteratively so that the trained model performs well in test evaluation. The most widely used models include hidden Markov models (HMMs), support vector machines (SVMs), maximum entropy Markov models (MEMMs) and conditional random fields (CRFs). Because the conditional random field model effectively handles the influence of adjacent labels on the predicted sequence, it is widely used in entity recognition with good results. At present, deep learning algorithms are generally adopted for sequence labeling problems. Compared with traditional algorithms, deep learning removes the step of manual feature extraction and can effectively extract discriminative features.
In the biomedical field, literature resources grow by thousands of times every year, and most of this information is stored as unstructured text. Biomedical named entity recognition aims to convert the unstructured text into structured text and to recognize and classify specific entity names, such as genes, proteins and diseases, in biomedical texts. Biomedical named entity recognition currently faces several difficulties: entity names carry many modifiers, which makes entity boundaries harder to determine; multiple entity names may share a word; strict naming standards are lacking; and abbreviations are ambiguous. In recent years, neural network methods combining bidirectional long short-term memory (BiLSTM) with conditional random fields (CRF) have achieved good results on various NER data sets. Although BiLSTM captures a great deal of context information, medical terms occur rarely in existing pre-trained word embeddings, so accurate word senses cannot be obtained and the label of each word cannot be predicted correctly.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a medical named entity recognition method and system. A multidimensional Transformer is used to explore word embedding information, which compensates for the missing word embedding information of specialized vocabulary and improves the accuracy of named entity recognition.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
a medical named entity recognition method, comprising the steps of:
acquiring text data to be identified;
carrying out named entity recognition on the text data to be recognized based on the medical named entity recognition model,
the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, wherein the feature extraction layer comprises a character embedding module and a word embedding module.
Further, the character embedding module firstly carries out local Transformer feature extraction and global Transformer feature extraction on the text data to be recognized respectively, and then fuses character features.
Further, the global Transformer feature extraction includes:
combining the characters of all sentences in the text data to be recognized;
extracting character context information by using a bidirectional long short-term memory neural network;
and performing global Transformer feature extraction.
Further, fusing the character features comprises:
concatenating the character features obtained by the local Transformer feature extraction and the global Transformer feature extraction.
Further, the word embedding module adopts a BERT model for feature extraction.
Further, the labeling layer performs labeling and segmentation using a conditional random field.
One or more embodiments provide a medical named entity recognition system, comprising:
a data acquisition module configured to acquire text data to be recognized;
a named entity recognition module configured to perform named entity recognition on the text data to be recognized based on a medical named entity recognition model, wherein the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, and the feature extraction layer comprises a character embedding module and a word embedding module.
Further, the character embedding module firstly carries out local Transformer feature extraction and global Transformer feature extraction on the text data to be recognized respectively, and then fuses character features.
One or more embodiments provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the medical named entity recognition method when executing the program.
One or more embodiments provide a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the medical named entity recognition method.
The above one or more technical solutions have the following beneficial effects:
Multi-dimensional embedding information is acquired for the sentences in the text at both the character level and the word level, which compensates for the missing word embedding information of specialized vocabulary and improves the accuracy of named entity recognition.
Word embedding information is explored through a local Transformer and a global Transformer, and word-level feature information is obtained through BERT; finally, the word embedding features of different dimensions are concatenated into a single embedding vector, which improves the training performance of the model and greatly expands the vocabulary the model can handle.
Before the global Transformer feature extraction, a BiLSTM first extracts character context information and the global Transformer feature extraction is then executed, which avoids the loss of context information and improves feature extraction efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.
Fig. 1 is a flowchart of a medical named entity recognition method according to one or more embodiments of the present disclosure.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
In recent years the Transformer has become an important tool for improving task performance in natural language processing. It extracts features within a sentence using a multi-head attention mechanism and position encoding. The position of each word in the sentence is encoded with sine and cosine functions to obtain position encoding information, which is then combined with the original word embedding through mask fusion to form a new word embedding. A multi-head attention mechanism then extracts features from the words in the sentence and strengthens their important feature information. Using a Transformer to extract word embedding information from sentences has become a new technique in the field of named entity recognition.
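As a rough illustration of the sine and cosine position encoding mentioned above, the following sketch builds a position table and combines it with word embeddings. The text does not specify the fusion operation in detail, so element-wise addition is an assumption here, and the function names `sinusoidal_position_encoding` and `add_position_information` are hypothetical.

```python
import math


def sinusoidal_position_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sine/cosine position codes."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency: 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_position_information(embeddings):
    """Fuse position codes into embeddings by element-wise addition (assumed)."""
    seq_len, d_model = len(embeddings), len(embeddings[0])
    codes = sinusoidal_position_encoding(seq_len, d_model)
    return [
        [value + code for value, code in zip(emb_row, code_row)]
        for emb_row, code_row in zip(embeddings, codes)
    ]
```

Because the codes depend only on position and dimension, the table can be computed once and reused for every sentence of the same length.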
Bidirectional Encoder Representations from Transformers (BERT) has in recent years become a mature feature extraction tool in natural language processing. BERT is pre-trained on a corpus from a specialized domain and then used to model downstream tasks. Recent studies have shown that fine-tuning the model on a downstream task can achieve strong performance on that task.
Example one
This embodiment discloses a medical named entity recognition method that mines deep word-sense information with a neural network model combining a multi-level Transformer and BERT, thereby improving the accuracy of named entity recognition. As shown in FIG. 1, the method comprises the following steps:
Step 1: acquiring text data to be recognized, and preprocessing the text data;
the preprocessing includes operations such as word segmentation.
Step 2: performing named entity recognition on the text data to be recognized based on the medical named entity recognition model.
The construction method of the medical named entity recognition model comprises the following steps:
Step A: acquiring a training data set;
the training data set is text data that has been word-segmented and pre-labeled; in this embodiment, a professional medical corpus is adopted as the training data set;
Step B: training the named entity recognition model on the training data set to obtain the medical named entity recognition model.
The medical named entity recognition model architecture comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, wherein the feature extraction layer comprises a character embedding module and a word embedding module.
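The layered architecture described above can be sketched as a small container that wires the two embedding modules into the labeling layer. This is only a structural illustration: the class name `MedicalNerModel`, its field names, and the choice to join character-level and word-level features by concatenation are assumptions, and the three components are passed in as plain callables rather than trained networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class MedicalNerModel:
    """Input layer -> feature extraction layer -> labeling layer (hypothetical wiring)."""

    char_embedder: Callable[[Sequence[str]], List[Vector]]  # character embedding module
    word_embedder: Callable[[Sequence[str]], List[Vector]]  # word embedding module
    labeler: Callable[[List[Vector]], List[str]]            # labeling layer

    def extract_features(self, tokens: Sequence[str]) -> List[Vector]:
        char_features = self.char_embedder(tokens)
        word_features = self.word_embedder(tokens)
        # Concatenate the two feature vectors of each token (assumed fusion).
        return [c + w for c, w in zip(char_features, word_features)]

    def predict(self, tokens: Sequence[str]) -> List[str]:
        return self.labeler(self.extract_features(tokens))
```

Keeping the modules as injected callables makes it easy to swap a stub for a trained component when testing the pipeline end to end.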
Specifically, the character embedding module comprises a local Transformer feature extraction sub-module, a global Transformer feature extraction sub-module and a character feature fusion sub-module.
The local Transformer feature extraction (LTT) sub-module uses a Transformer to mine the key components of the local characters within each word, and then extracts a word embedding from them by max-pooling. As an extension of the native word embedding, it increases the amount of information carried by the embedded word.
The global Transformer feature extraction (GTT) sub-module first merges the characters of all sentences in each batch, and then extracts word embeddings at the global character level using Transformer feature extraction. However, applying Transformer feature extraction directly at the global character level may lose contextual information. Therefore, in this embodiment, a BiLSTM first extracts the character context information, and the global Transformer feature extraction is then performed. Experiments have found that using a BiLSTM yields not only better context information but also better computational efficiency. The LTT and GTT procedures are as follows:
Character-based local Transformer feature extraction first models the characters of each word as one-hot vectors, then applies position encoding to the resulting character embedding matrix, fuses the position information with the original character features, computes multi-head attention over the fused features, and finally samples the attended character embeddings with a pooling layer to select the appropriate dimensions. Character-based global Transformer feature extraction first uses a Bi-GRU to capture sentence-level context over the one-hot character matrix, then performs Transformer feature extraction, and finally samples with a pooling layer to form the corresponding word embeddings.
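The difference between the local and global character paths can be illustrated with a deliberately simplified sketch. It uses one-hot characters, a single attention head without learned projections, and max-pooling; position encoding and the recurrent context encoder that precedes the global path are omitted. All function names are hypothetical, and this is an assumption-laden illustration rather than the patented implementation.

```python
import math


def build_char_vocab(sentences):
    """Map every character in the batch to an index."""
    chars = sorted({ch for sent in sentences for word in sent for ch in word})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product attention with Q = K = V = vectors."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)])
    return out


def max_pool(vectors):
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(col) for col in zip(*vectors)]


def local_features(words, vocab):
    """Local path: attention only among the characters of one word."""
    feats = []
    for word in words:
        chars = [one_hot(vocab[ch], len(vocab)) for ch in word]
        feats.append(max_pool(self_attention(chars)))
    return feats


def global_features(sentences, vocab):
    """Global path: attention over all characters in the batch, pooled per word."""
    merged, spans = [], []
    for sent in sentences:
        for word in sent:
            start = len(merged)
            merged.extend(one_hot(vocab[ch], len(vocab)) for ch in word)
            spans.append((start, len(merged)))
    # A recurrent context encoder would run on `merged` here; it is omitted in this sketch.
    attended = self_attention(merged)
    return [max_pool(attended[s:e]) for s, e in spans]
```

Both paths return one vector per word, so their outputs can be combined word by word in a later fusion step.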
The character feature fusion sub-module concatenates the word embeddings of different dimensions to generate the embedding vector required by the downstream task.
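Concatenation-based fusion can be sketched in a few lines. The function name `fuse_embeddings` is hypothetical, and the example dimensions are arbitrary; the only assumption is that each feature set supplies one vector per word.

```python
def fuse_embeddings(*feature_sets):
    """Concatenate, word by word, the vectors from several feature extractors."""
    if not feature_sets:
        raise ValueError("at least one feature set is required")
    n_words = len(feature_sets[0])
    if any(len(fs) != n_words for fs in feature_sets):
        raise ValueError("all feature sets must cover the same number of words")
    fused = []
    for per_word in zip(*feature_sets):
        vector = []
        for part in per_word:
            vector.extend(part)  # append this extractor's features for the word
        fused.append(vector)
    return fused
```

The fused vector's length is the sum of the input dimensions, so the downstream layer must be sized accordingly.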
For medical texts, pre-trained word embedding vectors are usually used for model training in the next step. However, commonly used pre-trained word embeddings offer limited support for specialized vocabulary, so a large number of words are out-of-vocabulary (OOV). Therefore, in this embodiment a multidimensional Transformer is used to explore word embedding information and compensate for the missing word embedding information of specialized vocabulary.
In the biomedical field, when recognizing the names of genes, diseases and proteins, entities are generally labeled with tag schemes such as {B, I, O} or {B, I, O, E, S}, where B marks the beginning of an entity, I the inside of an entity, E the end of an entity, S a single-token entity, and O a non-entity component. For example, "B-GENE" is the tag for the start position of a gene entity. The BiLSTM outputs a score for each label; simply selecting the highest-scoring label at each position independently is inaccurate, so a CRF layer is needed to ensure that the label sequence is valid.
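To make the {B, I, O} scheme concrete, the sketch below checks whether a tag sequence is well formed and recovers entity spans from it. The function names and the example sentence are illustrative assumptions, not taken from the source.

```python
def is_valid_bio(tags):
    """An I-X tag is valid only directly after B-X or I-X of the same type."""
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            if prev == "O" or prev[2:] != tag[2:]:
                return False
        prev = tag
    return True


def extract_entities(tokens, tags):
    """Group tokens into (entity text, entity type) pairs from BIO tags."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O" or an inconsistent I- tag closes any open entity.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities
```

A sequence such as `["O", "I-GENE"]` is exactly the kind of output that independent per-position selection can produce and that a sequence-level layer is meant to rule out.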
The word embedding module, namely BERT-based feature extraction, obtains word embedding information from a mature BERT-based pre-trained model and uses it for word embedding. Traditional word feature extraction uses previously trained word embeddings, but for large volumes of medical text this may fail to provide embeddings for specialized words.
When performing the named entity recognition task, the labeling layer labels and segments the sequence data with a conditional random field (CRF), which yields a more accurate final sequence labeling. The CRF is a variant of the Markov random field and is built on top of the Transformer; it models the conditional probability of the output label sequence given the observation sequence and normalizes globally over all features, which gives it an advantage over other machine learning methods.
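How a CRF-style labeling layer picks a globally consistent tag sequence can be illustrated with Viterbi decoding over per-position scores and tag-to-tag transition scores. This is a minimal sketch of decoding only (no training); the function name `viterbi_decode` and the idea of forbidding a transition with a large negative score are assumptions made for illustration.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[i] for i in path]
```

With a strongly negative score on the transition from "O" to "I-GENE", the decoder prefers "B-GENE" after "O" even when "I-GENE" has the higher score at that position, which is the validity guarantee that independent per-position selection lacks.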
Example two
It is an object of the present embodiment to provide a medical named entity recognition system. The system comprises:
a data acquisition module configured to acquire text data to be recognized;
a named entity recognition module configured to perform named entity recognition on the text data to be recognized based on a medical named entity recognition model, wherein the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, and the feature extraction layer comprises a character embedding module and a word embedding module.
Example three
The embodiment aims at providing an electronic device.
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program, comprising:
acquiring text data to be identified;
carrying out named entity recognition on the text data to be recognized based on the medical named entity recognition model,
the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, wherein the feature extraction layer comprises a character embedding module and a word embedding module.
Example four
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, performs the steps of:
acquiring text data to be recognized;
performing named entity recognition on the text data to be recognized based on the medical named entity recognition model,
wherein the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, and the feature extraction layer comprises a character embedding module and a word embedding module.
The steps involved in the second to fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present invention.
In one or more of the above embodiments, the named entity recognition task considers the sentences in the text at both the character level and the word level: a Transformer extracts features from local characters and from global characters, BERT provides word-level feature information, and finally the word embedding features of different dimensions are concatenated into the embedding vector required by the downstream task. This scheme can steadily improve the training performance of the model, and the word-level representations greatly expand the vocabulary the model can handle.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the scope of the present invention; those skilled in the art should understand that various modifications and variations can be made on the basis of the technical solution of the present invention without inventive effort.
Claims (10)
1. A medical named entity recognition method, comprising the steps of:
acquiring text data to be identified;
carrying out named entity recognition on the text data to be recognized based on the medical named entity recognition model,
the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, wherein the feature extraction layer comprises a character embedding module and a word embedding module.
2. The medical named entity recognition method of claim 1, wherein the character embedding module first performs local Transformer feature extraction and global Transformer feature extraction on text data to be recognized respectively, and then fuses character features.
3. The medical named entity recognition method of claim 2, wherein the global Transformer feature extraction comprises:
combining characters of all sentences in the text data to be recognized;
extracting character context information by using a bidirectional long-short term memory neural network;
and performing global Transformer feature extraction.
4. The medical named entity recognition method of claim 2, wherein fusing the character features comprises:
concatenating the character features obtained by the local Transformer feature extraction and the global Transformer feature extraction.
5. The medical named entity recognition method of claim 1, wherein the word embedding module employs a BERT model for feature extraction.
6. The medical named entity recognition method of claim 1, wherein the labeling layer employs a conditional random field for labeling and segmentation.
7. A medical named entity recognition system, comprising:
a data acquisition module configured to acquire text data to be recognized;
a named entity recognition module configured to perform named entity recognition on the text data to be recognized based on a medical named entity recognition model, wherein the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, and the feature extraction layer comprises a character embedding module and a word embedding module.
8. The medical named entity recognition system of claim 7, wherein the character embedding module first performs local Transformer feature extraction and global Transformer feature extraction on text data to be recognized, respectively, and then fuses character features.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the medical named entity recognition method according to any one of claims 1 to 6 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a medical named entity recognition method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110770186.9A CN113486666A (en) | 2021-07-07 | 2021-07-07 | Medical named entity recognition method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113486666A (en) | 2021-10-08 |
Family
ID=77937878
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110770186.9A Pending CN113486666A (en) | 2021-07-07 | 2021-07-07 | Medical named entity recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113486666A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287334A (en) * | 2019-06-13 | 2019-09-27 | 淮阴工学院 | A kind of school's domain knowledge map construction method based on Entity recognition and attribute extraction model |
CN112002409A (en) * | 2020-07-27 | 2020-11-27 | 山东师范大学 | Traditional Chinese medicine auxiliary diagnosis system |
CN112541356A (en) * | 2020-12-21 | 2021-03-23 | 山东师范大学 | Method and system for recognizing biomedical named entities |
CN113066572A (en) * | 2021-03-03 | 2021-07-02 | 山东师范大学 | Traditional Chinese medicine auxiliary diagnosis system and method for enhancing local feature extraction |
- 2021-07-07: Application CN202110770186.9A filed in CN (publication CN113486666A), status Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108959242B (en) | Target entity identification method and device based on part-of-speech characteristics of Chinese characters | |
CN109543181B (en) | Named entity model and system based on combination of active learning and deep learning | |
CN106599032B (en) | Text event extraction method combining sparse coding and structure sensing machine | |
CN109960728B (en) | Method and system for identifying named entities of open domain conference information | |
CN112541356B (en) | Method and system for recognizing biomedical named entities | |
WO2008107305A2 (en) | Search-based word segmentation method and device for language without word boundary tag | |
CN112100332A (en) | Word embedding expression learning method and device and text recall method and device | |
CN111061882A (en) | Knowledge graph construction method | |
CN111046660B (en) | Method and device for identifying text professional terms | |
CN112417823B (en) | Chinese text word order adjustment and word completion method and system | |
CN114153971A (en) | Error-containing Chinese text error correction, identification and classification equipment | |
CN115600597A (en) | Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium | |
CN110287483B (en) | Unregistered word recognition method and system utilizing five-stroke character root deep learning | |
CN117454898A (en) | Method and device for realizing legal entity standardized output according to input text | |
CN112989839A (en) | Keyword feature-based intent recognition method and system embedded in language model | |
CN116029300A (en) | Language model training method and system for strengthening semantic features of Chinese entities | |
CN113468311B (en) | Knowledge graph-based complex question and answer method, device and storage medium | |
Kang et al. | Two approaches for the resolution of word mismatch problem caused by English words and foreign words in Korean information retrieval | |
CN109960782A (en) | A kind of Tibetan language segmenting method and device based on deep neural network | |
CN115358227A (en) | Open domain relation joint extraction method and system based on phrase enhancement | |
CN114239555A (en) | Training method of keyword extraction model and related device | |
Tolegen et al. | Voted-perceptron approach for Kazakh morphological disambiguation | |
CN113486666A (en) | Medical named entity recognition method and system | |
CN112966510A (en) | Weapon equipment entity extraction method, system and storage medium based on ALBERT | |
CN113408267A (en) | Word alignment performance improving method based on pre-training model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||