CN113486666A - Medical named entity recognition method and system - Google Patents

Medical named entity recognition method and system

Publication number
CN113486666A
Authority
CN
China
Prior art keywords
named entity
entity recognition
feature extraction
medical
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110770186.9A
Other languages
Chinese (zh)
Inventor
潘景山
徐卫志
范胜玉
涂阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Supercomputing Technology Research Institute
Shandong Normal University
Original Assignee
Jinan Supercomputing Technology Research Institute
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-07-07
Filing date
2021-07-07
Publication date
2021-10-08
Application filed by Jinan Supercomputing Technology Research Institute and Shandong Normal University
Priority to CN202110770186.9A
Publication of CN113486666A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a medical named entity recognition method and system. The method comprises: acquiring text data to be recognized; and performing named entity recognition on the text data based on a medical named entity recognition model, wherein the model comprises an input layer, a feature extraction layer and a labeling layer connected in sequence, and the feature extraction layer comprises a character embedding module and a word embedding module. The method treats the sentences in the text at both the character level and the word level, so that the information content and meaning of the embedded words are captured more fully, which helps improve the accuracy of named entity recognition.

Description

Medical named entity recognition method and system
Technical Field
The invention belongs to the technical field of medical text processing, and particularly relates to a medical named entity recognition method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Named entity recognition (NER) is a basic task in natural language processing (NLP) and an important building block for most NLP tasks, such as question answering, machine translation and syntactic analysis. Earlier approaches were mainly dictionary-based or rule-based. Dictionary-based methods find entities by fuzzy or exact string matching, but because new entity names keep emerging, they are limited by the quality and size of the dictionary. Rule-based methods rely on manually specified rules and extend the rule set using the characteristics of entity names and their common collocations with phrases; however, they consume large amounts of labor and time, the rules are generally effective only in one specific domain, manual migration is costly, and the rules port poorly. Later work mostly adopted machine learning methods, in which model training is optimized iteratively so that the trained model performs well in test evaluation. The most widely applied models include hidden Markov models (HMMs), support vector machines (SVMs), maximum entropy Markov models (MEMMs) and conditional random fields (CRFs). Because the conditional random field can effectively account for the influence of adjacent labels on the predicted sequence, it is widely used in entity recognition with good results. At present, deep learning algorithms are generally used for sequence labeling. Compared with traditional algorithms, deep learning removes the manual feature extraction step and can effectively extract discriminative features.
In the biomedical field, literature resources grow by thousands of times every year, and most of this information is stored as unstructured text. Biomedical named entity recognition aims to convert unstructured text into structured text by recognizing and classifying specific entity names, such as genes, proteins and diseases, in biomedical texts. Biomedical named entity recognition currently faces several difficulties: entity names carry many modifiers, which makes entity boundaries harder to determine; multiple entity names may share a word; strict naming standards are lacking; and abbreviations are ambiguous. In recent years, neural network methods that combine bidirectional long short-term memory (BiLSTM) with conditional random fields (CRF) have achieved good results on various NER data sets. Although BiLSTM exploits a large amount of context information, specialized medical vocabulary occurs rarely in existing pre-trained word embeddings, so accurate word senses cannot be obtained and the label of each word cannot always be predicted correctly.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a medical named entity recognition method and system. A multi-dimensional Transformer is used to mine word embedding information, which compensates for the missing word embedding information of specialized vocabulary and improves the accuracy of named entity recognition.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
a medical named entity recognition method, comprising the steps of:
acquiring text data to be recognized;
performing named entity recognition on the text data to be recognized based on a medical named entity recognition model,
wherein the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, and the feature extraction layer comprises a character embedding module and a word embedding module.
Further, the character embedding module first performs local Transformer feature extraction and global Transformer feature extraction on the text data to be recognized, respectively, and then fuses the character features.
Further, the global Transformer feature extraction comprises:
merging the characters of all sentences in the text data to be recognized;
extracting character context information with a bidirectional long short-term memory neural network;
performing global Transformer feature extraction.
Further, fusing the character features comprises:
concatenating the character features obtained by the local Transformer feature extraction and the global Transformer feature extraction.
Further, the word embedding module uses a BERT model for feature extraction.
Further, the labeling layer uses a conditional random field for labeling and segmentation.
One or more embodiments provide a medical named entity recognition system, comprising:
a data acquisition module configured to acquire text data to be recognized;
a named entity recognition module configured to perform named entity recognition on the text data to be recognized based on a medical named entity recognition model, wherein the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, and the feature extraction layer comprises a character embedding module and a word embedding module.
Further, the character embedding module firstly carries out local Transformer feature extraction and global Transformer feature extraction on the text data to be recognized respectively, and then fuses character features.
One or more embodiments provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the medical named entity recognition method when executing the program.
One or more embodiments provide a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the medical named entity recognition method.
The above one or more technical solutions have the following beneficial effects:
Multi-dimensional embedding information is acquired for the sentences in the text at both the character level and the word level, which compensates for the missing word embedding information of specialized vocabulary and improves the accuracy of named entity recognition.
Word embedding information is mined by a local Transformer and a global Transformer, and word-level feature information is obtained by BERT. Finally, the feature information of these different dimensions is combined into a single embedding vector by concatenation, which improves the training performance of the model and greatly enlarges the vocabulary the model can handle.
Before global Transformer feature extraction, BiLSTM is first used to extract character context information, and global Transformer feature extraction is then performed, which avoids losing context information and improves the efficiency of feature extraction.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention. They illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.
Fig. 1 is a flowchart of a medical named entity recognition method according to one or more embodiments of the present disclosure.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
The Transformer has been an important tool for improving task performance in natural language processing in recent years. It extracts features within a sentence using a multi-head attention mechanism and position encoding. The position of each word in the sentence is encoded with sine and cosine functions to obtain position encoding information, and this information is mask-fused with the original word embedding to form a new word embedding. The multi-head attention mechanism then performs feature extraction on the words in the sentence and strengthens their important feature information. Using a Transformer to extract word embedding information from sentences has become a new technique in the field of named entity recognition.
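The sine and cosine position encoding mentioned above can be illustrated with a minimal sketch. The patent text does not give the exact formula, so the base constant 10000 and the fusion by element-wise addition are assumptions borrowed from the commonly used Transformer formulation, and the function names are hypothetical.

```python
import math

def sinusoidal_position_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sine/cosine position codes."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency (assumed base 10000).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_position(embeddings):
    """Fuse position codes with embeddings by element-wise addition (assumption)."""
    seq_len, d_model = len(embeddings), len(embeddings[0])
    codes = sinusoidal_position_encoding(seq_len, d_model)
    return [[e + p for e, p in zip(e_row, p_row)]
            for e_row, p_row in zip(embeddings, codes)]
```

Even dimensions use the sine and odd dimensions the cosine, so each position receives a distinct code that the attention layers can use to tell word order apart.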
In recent years, Bidirectional Encoder Representations from Transformers (BERT) has become a mature feature extraction tool in natural language processing. BERT is pre-trained on a corpus from a specialized domain and then used to model downstream tasks. Recent studies have shown that fine-tuning the model on downstream tasks can achieve superior performance on each task.
Example one
The embodiment discloses a medical named entity recognition method, in which deep word sense information is mined by a neural network model combining a multi-level Transformer and BERT, so that the accuracy of named entity recognition is improved. As shown in FIG. 1, the method comprises the following steps:
Step 1: acquiring text data to be recognized, and preprocessing the text data;
the preprocessing specifically includes word segmentation and similar operations.
Step 2: performing named entity recognition on the text data to be recognized based on the medical named entity recognition model.
The construction method of the medical named entity recognition model comprises the following steps:
Step A: acquiring a training data set;
the training data set is text data that has been segmented into words and pre-labeled, and a professional medical corpus is used as the training data set in this embodiment;
Step B: training the named entity recognition model on the training data set to obtain the medical named entity recognition model.
The medical named entity recognition model architecture comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, wherein the feature extraction layer comprises a character embedding module and a word embedding module.
Specifically, the character embedding module comprises a local Transformer feature extraction sub-module, a global Transformer feature extraction sub-module and a character feature fusion sub-module.
The local Transformer feature extraction (LTT) sub-module uses a Transformer to mine the key components of local characters so as to embed the characters into words, and then extracts the word embedding by max-pooling. As an extension of the native word embedding, it increases the amount of information carried by the embedded word. The details of LTT are as follows:
[Formula images BDA0003152685810000061 and BDA0003152685810000062: LTT details; the formula content is not available in this text]
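Since the LTT formulas are not available in this text, the following is only a rough sketch of the idea described for the sub-module: attention over the characters of one word, followed by max-pooling into a single word vector. It uses a single attention head with identity projections instead of learned multi-head weights, which is a simplifying assumption, and all function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Single-head scaled dot-product attention with identity projections."""
    d = len(vectors[0])
    attended = []
    for query in vectors:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                  for key in vectors]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, vectors))
                         for j in range(d)])
    return attended

def local_word_embedding(char_vectors):
    """Attend over one word's characters, then max-pool each dimension."""
    attended = self_attention(char_vectors)
    return [max(column) for column in zip(*attended)]
```

The output has the same width as one character vector regardless of word length, which is what allows it to be treated as a word-level embedding.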
The global Transformer feature extraction (GTT) sub-module first merges the characters of all sentences in each batch, and then extracts word embeddings at the global character level using Transformer feature extraction. However, applying Transformer feature extraction directly at the global character level may lose context information. Therefore, in this embodiment, BiLSTM is first used to extract character context information, and global Transformer feature extraction is then performed. Experiments found that using BiLSTM yields not only better context information but also better computational efficiency. The specific GTT algorithm is as follows:
[Formula images BDA0003152685810000063, BDA0003152685810000064 and BDA0003152685810000065: GTT algorithm; the formula content is not available in this text]
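The GTT algorithm itself is not available in this text. As a hedged illustration of its first and last steps only, the sketch below merges the characters of every sentence in a batch into one sequence and later pools encoded character vectors back into one vector per word. The BiLSTM and Transformer encoding in between is omitted, the use of max-pooling for this sub-module is an assumption, and the function names and sample words are hypothetical.

```python
def merge_batch_characters(batch):
    """Flatten a batch of segmented sentences into one character sequence.

    Also returns, for each character, the (sentence index, word index) it
    came from, so that word-level vectors can be rebuilt after encoding.
    """
    chars, owners = [], []
    for s_idx, sentence in enumerate(batch):
        for w_idx, word in enumerate(sentence):
            for ch in word:
                chars.append(ch)
                owners.append((s_idx, w_idx))
    return chars, owners

def pool_by_word(char_vectors, owners):
    """Max-pool encoded character vectors into one vector per word."""
    pooled = {}
    for vector, owner in zip(char_vectors, owners):
        if owner in pooled:
            pooled[owner] = [max(a, b) for a, b in zip(pooled[owner], vector)]
        else:
            pooled[owner] = list(vector)
    return pooled
```

Keeping the owner index alongside each character is one simple way to let an encoder see the whole batch as a single sequence while still producing one embedding per word.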
Character-based Transformer feature extraction first models the characters in each word with one-hot encoding, then applies position encoding to the resulting character embedding matrices, fuses the position encoding information with the original character feature information, performs multi-head attention on the fused features, and finally samples the attended character embeddings with a pooling layer to select information of a suitable dimension. Character-based global Transformer feature extraction first uses Bi-GRU to capture context information over the sentence characters from the one-hot character matrix, then performs Transformer feature extraction, and finally samples with a pooling layer to form the corresponding word embedding.
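The one-hot character modeling step can be sketched as follows. The alphabet and the choice to map unknown characters to an all-zero row are assumptions made for illustration, and `one_hot_encode` is a hypothetical helper name.

```python
def one_hot_encode(word, alphabet):
    """Encode each character of a word as a one-hot row over the alphabet.

    Characters outside the alphabet become all-zero rows (an assumption).
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    size = len(alphabet)
    return [[1 if index.get(ch) == i else 0 for i in range(size)]
            for ch in word]
```

For example, `one_hot_encode("ab", "abc")` yields one row per character, each with a single 1 in the column of that character.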
The character feature fusion sub-module fuses the word embeddings of different dimensions by concatenation to generate the embedding vector required by the downstream task.
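Concatenation fusion can be sketched in a few lines. The sketch assumes each view (for instance local and global character features) supplies one vector per word in the same word order; `concat_fuse` is a hypothetical name.

```python
def concat_fuse(*views):
    """Concatenate several per-word embedding views along the feature axis."""
    if len({len(view) for view in views}) != 1:
        raise ValueError("all views must cover the same number of words")
    return [[x for vector in word_vectors for x in vector]
            for word_vectors in zip(*views)]
```

The fused width is the sum of the widths of the individual views, so views of different dimensions can be combined without projecting them to a common size first.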
For medical texts, pre-trained word embedding vectors are usually used for the next stage of model training. However, commonly used pre-trained word embeddings offer only limited support for specialized vocabulary; that is, a large number of words are out-of-vocabulary (OOV) and have no embedding vector. Therefore, in this embodiment, a multi-dimensional Transformer is used to mine word embedding information and compensate for the missing word embedding information of specialized vocabulary.
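To make the OOV problem concrete, the sketch below measures how many tokens of a text lack an entry in a pre-trained vocabulary. The vocabulary and tokens in the usage note are invented for illustration, and `oov_report` is a hypothetical helper.

```python
def oov_report(tokens, vocabulary):
    """Return the OOV rate and the list of tokens missing from the vocabulary."""
    if not tokens:
        return 0.0, []
    missing = [token for token in tokens if token not in vocabulary]
    return len(missing) / len(tokens), missing
```

With a vocabulary of {"the", "patient", "has"} and the tokens "the patient has glioblastoma", one token in four is missing, which is the kind of gap that character-level features are meant to fill.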
In the biomedical field, when labeling genes, diseases and proteins, entities are generally tagged with schemes such as {B, I, O} or {B, I, O, E, S}, where B marks the beginning of an entity, I the inside of an entity, E the end of an entity, S a single-token entity, and O a non-entity token. For example, "B-GENE" is the tag for the start position of a GENE entity. BiLSTM outputs a score for each label, but simply selecting the highest-scoring label for each unit is not accurate, and a CRF layer is needed to ensure that the label sequence is legal.
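What "legal" means for a {B, I, O} label sequence can be shown with a short sketch: an I tag may only continue an entity of the same type. The sketch also groups tagged tokens into entity spans. Only the "B-GENE" tag comes from the text; the "DISEASE" type, the sample sentence and the function names are illustrative assumptions.

```python
def is_legal_transition(prev_tag, tag):
    """In the BIO scheme, I-X may only follow B-X or I-X."""
    if tag.startswith("I-"):
        entity_type = tag[2:]
        return prev_tag in ("B-" + entity_type, "I-" + entity_type)
    return True

def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity text, entity type) pairs."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # An O tag or an illegal I tag closes any open entity.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities
```

A sequence such as O followed by I-GENE fails the transition check, which is exactly the kind of output that per-token argmax can produce and a CRF layer is meant to rule out.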
The word embedding module, that is, BERT-based feature extraction, obtains word embedding information from a mature BERT-based pre-trained model and uses it for word embedding. Traditional word feature extraction uses previously trained word embeddings, but with that approach the embeddings of specialized words may be unavailable for a large amount of medical text.
When performing the named entity recognition task, the labeling layer labels and segments sequence-structured data with a conditional random field (CRF), which yields a more accurate final sequence labeling. The CRF is a variant of the Markov random field and is built on top of the Transformer. Given the output labels and an observation sequence, it represents the model as a conditional probability and normalizes globally over all features, which gives it an advantage over other machine learning methods.
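The patent does not describe how the CRF layer is decoded. A common choice for linear-chain CRFs is Viterbi decoding, sketched below under that assumption. The emission and transition scores in the usage note are invented; in a trained model they would be learned, and `viterbi_decode` is a hypothetical name.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag path for a linear-chain model.

    emissions: one {tag: score} dict per token.
    transitions: {(prev_tag, tag): score}; missing pairs score 0.0.
    """
    best = {tag: emissions[0][tag] for tag in tags}
    backpointers = []
    for scores in emissions[1:]:
        step, pointers = {}, {}
        for tag in tags:
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, tag), 0.0))
            step[tag] = best[prev] + transitions.get((prev, tag), 0.0) + scores[tag]
            pointers[tag] = prev
        best = step
        backpointers.append(pointers)
    last = max(tags, key=lambda tag: best[tag])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With a strongly negative score on the transition from O to I-GENE, the decoder prefers the path B-GENE, I-GENE even when the first token's highest emission score is O, whereas picking the best label per token independently would give the illegal sequence O, I-GENE.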
Example two
It is an object of the present embodiment to provide a medical named entity recognition system. The system comprises:
a data acquisition module configured to acquire text data to be recognized;
a named entity recognition module configured to perform named entity recognition on the text data to be recognized based on a medical named entity recognition model, wherein the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, and the feature extraction layer comprises a character embedding module and a word embedding module.
Example three
The embodiment aims at providing an electronic device.
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program, comprising:
acquiring text data to be identified;
carrying out named entity recognition on the text data to be recognized based on the medical named entity recognition model,
the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, wherein the feature extraction layer comprises a character embedding module and a word embedding module.
Example four
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, performs the steps of:
acquiring text data to be recognized;
performing named entity recognition on the text data to be recognized based on a medical named entity recognition model, wherein the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, and the feature extraction layer comprises a character embedding module and a word embedding module.
The steps involved in the second to fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present invention.
When performing the named entity recognition task, one or more embodiments treat the sentences in the text at both the character level and the word level: a Transformer performs feature extraction on local characters and on global characters, and BERT provides word-level feature information. Finally, the word embedding feature information of different dimensions is fused by concatenation into the embedding vector required by the downstream task. This scheme steadily improves the training performance of the model, and the word-level representation greatly enlarges the vocabulary that the model can handle.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the scope of the present invention, and those skilled in the art should understand that various modifications and variations can be made without inventive effort on the basis of the technical solution of the present invention.

Claims (10)

1. A medical named entity recognition method, comprising the steps of:
acquiring text data to be identified;
carrying out named entity recognition on the text data to be recognized based on the medical named entity recognition model,
the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, wherein the feature extraction layer comprises a character embedding module and a word embedding module.
2. The medical named entity recognition method of claim 1, wherein the character embedding module first performs local Transformer feature extraction and global Transformer feature extraction on text data to be recognized respectively, and then fuses character features.
3. The medical named entity recognition method of claim 2, wherein the global Transformer feature extraction comprises:
merging the characters of all sentences in the text data to be recognized;
extracting character context information with a bidirectional long short-term memory neural network;
performing global Transformer feature extraction.
4. The medical named entity recognition method of claim 2, wherein the fusing character features comprises:
concatenating the character features obtained by the local Transformer feature extraction and the global Transformer feature extraction.
5. The medical named entity recognition method of claim 1, wherein the word embedding module employs a BERT model for feature extraction.
6. The medical named entity recognition method of claim 1, wherein the labeling layer employs a conditional random field for labeling and segmentation.
7. A medical named entity recognition system, comprising:
a data acquisition module configured to acquire text data to be recognized;
a named entity recognition module configured to perform named entity recognition on the text data to be recognized based on a medical named entity recognition model, wherein the medical named entity recognition model comprises an input layer, a feature extraction layer and a labeling layer which are sequentially connected, and the feature extraction layer comprises a character embedding module and a word embedding module.
8. The medical named entity recognition system of claim 7, wherein the character embedding module first performs local Transformer feature extraction and global Transformer feature extraction on text data to be recognized, respectively, and then fuses character features.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the medical named entity recognition method according to any one of claims 1 to 6 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a medical named entity recognition method according to any one of claims 1 to 6.
CN202110770186.9A 2021-07-07 2021-07-07 Medical named entity recognition method and system Pending CN113486666A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110770186.9A CN113486666A (en) 2021-07-07 2021-07-07 Medical named entity recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110770186.9A CN113486666A (en) 2021-07-07 2021-07-07 Medical named entity recognition method and system

Publications (1)

Publication Number Publication Date
CN113486666A true CN113486666A (en) 2021-10-08

Family

ID=77937878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110770186.9A Pending CN113486666A (en) 2021-07-07 2021-07-07 Medical named entity recognition method and system

Country Status (1)

Country Link
CN (1) CN113486666A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287334A (en) * 2019-06-13 2019-09-27 淮阴工学院 A kind of school's domain knowledge map construction method based on Entity recognition and attribute extraction model
CN112002409A (en) * 2020-07-27 2020-11-27 山东师范大学 Traditional Chinese medicine auxiliary diagnosis system
CN112541356A (en) * 2020-12-21 2021-03-23 山东师范大学 Method and system for recognizing biomedical named entities
CN113066572A (en) * 2021-03-03 2021-07-02 山东师范大学 Traditional Chinese medicine auxiliary diagnosis system and method for enhancing local feature extraction

Similar Documents

Publication Publication Date Title
CN108959242B (en) Target entity identification method and device based on part-of-speech characteristics of Chinese characters
CN109543181B (en) Named entity model and system based on combination of active learning and deep learning
CN106599032B (en) Text event extraction method combining sparse coding and structure sensing machine
CN109960728B (en) Method and system for identifying named entities of open domain conference information
CN112541356B (en) Method and system for recognizing biomedical named entities
WO2008107305A2 (en) Search-based word segmentation method and device for language without word boundary tag
CN112100332A (en) Word embedding expression learning method and device and text recall method and device
CN111061882A (en) Knowledge graph construction method
CN111046660B (en) Method and device for identifying text professional terms
CN112417823B (en) Chinese text word order adjustment and word completion method and system
CN114153971A (en) Error-containing Chinese text error correction, identification and classification equipment
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
CN110287483B (en) Unregistered word recognition method and system utilizing five-stroke character root deep learning
CN117454898A (en) Method and device for realizing legal entity standardized output according to input text
CN112989839A (en) Keyword feature-based intent recognition method and system embedded in language model
CN116029300A (en) Language model training method and system for strengthening semantic features of Chinese entities
CN113468311B (en) Knowledge graph-based complex question and answer method, device and storage medium
Kang et al. Two approaches for the resolution of word mismatch problem caused by English words and foreign words in Korean information retrieval
CN109960782A (en) A kind of Tibetan language segmenting method and device based on deep neural network
CN115358227A (en) Open domain relation joint extraction method and system based on phrase enhancement
CN114239555A (en) Training method of keyword extraction model and related device
Tolegen et al. Voted-perceptron approach for Kazakh morphological disambiguation
CN113486666A (en) Medical named entity recognition method and system
CN112966510A (en) Weapon equipment entity extraction method, system and storage medium based on ALBERT
CN113408267A (en) Word alignment performance improving method based on pre-training model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination