CN107908614A - Named entity recognition method based on Bi-LSTM - Google Patents
Named entity recognition method based on Bi-LSTM (Download PDF)
- Publication number: CN107908614A
- Authority: CN (China)
- Prior art keywords: lstm, word, character
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Abstract
The present invention relates to a named entity recognition method based on Bi-LSTM. The method comprises: 1) annotating a training corpus for named entity recognition to form an annotated corpus; 2) converting the words and characters in the annotated corpus into vectors; 3) establishing a Bi-LSTM-based named entity recognition model using the word and character vectors, and training the parameters of the named entity recognition model; 4) using the trained named entity recognition model to perform named entity recognition prediction on data to be predicted. By using word- and character-based vectors, the present invention obtains the features of characters and words at the same time while also evading the out-of-vocabulary problem; in addition, compared with a traditional pure CRF model, the bidirectional long short-term memory neural network Bi-LSTM absorbs more character and word features, further improving the accuracy of entity recognition.
Description
Technical field
The invention belongs to the field of information technology, and in particular relates to a named entity recognition method based on Bi-LSTM.
Background technology
Named entity recognition (NER) refers to identifying entities with specific meaning in text, mainly including person names, place names, organization names, proper nouns, and so on.
Practical application scenarios of named entity recognition include:
Scene 1: Event detection. Place, time, and person are basic components of an event. When constructing an event summary, the related persons, places, and organizations can be highlighted, and in an event search system the related persons, times, and places can serve as index keywords. The relations among these components describe the event in more detail at the semantic level.
Scene 2: Information retrieval. Named entities can be used to improve the effectiveness of search systems: when a user enters the query "重大", the user most likely wants to retrieve "Chongqing University" (重庆大学) rather than the corresponding adjective sense "major". Moreover, when building an inverted index, cutting a named entity into multiple words reduces retrieval efficiency. Search engines are also evolving toward semantic understanding and computing answers directly.
Scene 3: Semantic networks. A semantic network generally contains concepts, instances, and their corresponding relations: for example, "country" is a concept, "China" is an instance, and "China is a country" expresses the relation between the entity and the concept. A large proportion of the instances in a semantic network are named entities.
Scene 4: Machine translation. The translation of named entities often follows special rules. For example, a Chinese person name is rendered into English using the pinyin of the name, with the given name first and the surname last, whereas common words are translated into the corresponding English words. Accurately recognizing the named entities in a text is therefore important for improving the quality of machine translation.
Scene 5: Question answering systems. Accurately identifying each part of a question, its related domain, and the related concepts is especially important. At present, most question answering systems can only search for an answer rather than compute one: searching performs keyword matching and the user manually extracts the answer from the results, whereas a friendlier approach computes the answer and presents it to the user. Some questions require the relations between entities to be considered; for example, for "the 45th President of the United States", current search engines can return the answer "Donald Trump" in a specific format.
Traditional named entity recognition methods can be divided into dictionary-based methods, methods based on word-frequency statistics, and methods based on artificial neural network models. A dictionary-based method collects as many entity terms of different categories as possible into a dictionary; at recognition time, the text is matched against the words in the dictionary, and matches are labeled with the corresponding entity class. Statistical methods such as CRF (conditional random fields) learn the semantic information of the preceding and following words and then make a classification decision.
Dictionary-based named entity recognition depends heavily on the dictionary and cannot identify out-of-vocabulary words. Statistical methods such as HMM (hidden Markov models) and CRF can only associate the semantics of the word immediately preceding the current word, so their recognition accuracy is not high enough, and the recognition rate for out-of-vocabulary words in particular is low. Methods based on artificial neural network models suffer from the vanishing gradient problem during training, so in practice the number of network layers is small and the final named entity recognition results show no clear advantage.
The content of the invention
In view of the above problems, the present invention provides a named entity recognition method based on Bi-LSTM (Bi-directional Long Short-Term Memory, a bidirectional long short-term memory neural network), which can effectively improve the accuracy of named entity recognition.
In the present invention, an in-vocabulary word refers to a word already present in the vocabulary, and an out-of-vocabulary word refers to a word that does not appear in the vocabulary.
The technical solution adopted by the present invention is as follows:
A named entity recognition method based on Bi-LSTM, characterized in that it comprises the following steps:
1) annotating a training corpus for named entity recognition to form an annotated corpus;
2) converting the words and characters in the annotated corpus into vectors;
3) establishing a Bi-LSTM-based named entity recognition model using the word and character vectors, and training the parameters of the named entity recognition model;
4) using the trained named entity recognition model to perform named entity recognition prediction on data to be predicted.
Further, step 1) annotates the training corpus in the IOBES format.
Further, step 2) first converts each input word into a vector, then decomposes the word into its characters, converts all characters contained in the word into a vector with a Bi-LSTM model, and concatenates the vectors converted from the word and the characters.
Further, step 3) trains the parameters of the named entity recognition model with the Adam gradient descent algorithm.
Further, during parameter training, step 3) splits the training corpus into sentences according to Chinese syntactic rules, and pads with 0 any sentence whose character length after splitting is less than the number of neurons.
Further, each iteration of the Adam gradient descent algorithm randomly selects, without replacement, one sentence group from the training corpus data set, and draws several sentences from that group as the data for a single model iteration.
Further, step 4) first preprocesses the data to be predicted, then performs character and word vectorization, and then performs named entity recognition prediction.
Further, the preprocessing comprises sentence splitting and word segmentation; the vectorization comprises word vectorization and character vectorization, with the word vector and character vector concatenated.
With its word- and character-based vectors, the named entity recognition method of the invention based on Bi-LSTM obtains the features of characters and words at the same time while also evading the out-of-vocabulary problem; in addition, compared with a traditional pure CRF model, the bidirectional long short-term memory neural network Bi-LSTM absorbs more character and word features, further improving the accuracy of entity recognition.
Brief description of the drawings
Fig. 1 is a step flow chart of the Bi-LSTM entity recognition method of the present invention.
Fig. 2 is a schematic diagram of the Bi-LSTM entity recognition model of the present invention.
Fig. 3 is a schematic diagram of an LSTM cell.
Fig. 4 is a structure chart of the Bi-LSTM character vector.
Embodiment
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, the invention is described in further detail below through a specific implementation case with reference to the accompanying drawings.
The invention discloses a named entity recognition method based on Bi-LSTM that recognizes, for example, person names, place names, organization names, brand names, and company names in unstructured text. The invention addresses two key problems: 1. using an LSTM-CRF model to improve the accuracy of named entity recognition; 2. adding character-vector features of words to solve the recognition of out-of-vocabulary (OOV) named entities.
To improve the accuracy of named entity recognition, we add a Bi-LSTM character feature layer and a Bi-LSTM word feature layer on top of the traditional CRF model; the detailed structure is shown in Fig. 2, Fig. 3, and Fig. 4.
Because in many situations the entity to be recognized is not in the vocabulary, the invention adds the character-vector feature extraction shown in Fig. 4 in order to improve the recognition of out-of-vocabulary words. This reflects the fact that an entity is strongly related not only to the word segmentation result but also to the characters themselves; for example, when a common surname character such as "Zhao, Qian, Sun, Li" (赵, 钱, 孙, 李) appears as the first character, the word combination that immediately follows is very likely a person name.
The flow of the named entity recognition method of the present invention is shown in Fig. 1. The method is divided into two stages: a training stage and a prediction stage.
(1) Training stage ("training" in Fig. 1):
Step 1: prepare the annotated corpus.
Step 2: vectorize characters and words.
Step 3: build the Bi-LSTM entity recognition model.
Step 4: train the model parameters.
Step 5: save the model results.
(2) Prediction stage ("prediction" in Fig. 1):
Step 1: preprocess the data.
Step 2: vectorize characters and words.
Step 3: perform entity recognition prediction on the prediction data with the model saved at the end of training stage (1).
The specific implementation of the two stages is described below.
(1) Training stage:
Step 1: prepare the annotated corpus.
The training corpus for entity recognition is annotated in the IOBES (Inside, Other, Begin, End, Single) format (other schemes may also be used, e.g. the labels 0, 1, 2, 3, 4). If a segmentation unit is a single complete entity, it is labeled S-...; if it begins an entity, B-...; if it is a word in the middle of an entity, I-...; if it ends an entity, E-...; and if it is not an entity, O. For example, for the sentence "Xiao Ming was born in Yunnan and now works at Zhichuang Space in Chengdu, Sichuan Province, China", taking the most common entity types person name (PER), place name (LOC), and organization name (ORG), the segmentation and corpus annotation result is:
Xiao Ming S-PER
born O
in O
Yunnan S-LOC
, O
now O
in O
China B-LOC
Sichuan Province I-LOC
Chengdu E-LOC
Zhichuang B-ORG
Space E-ORG
works O
. O
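As an illustration of the IOBES scheme above, the following is a minimal sketch (a hypothetical helper, not from the patent) that assigns IOBES tags to a list of (token, entity-type) pairs; it assumes consecutive tokens of the same type form one entity span.

```python
def iobes_tags(tokens):
    """tokens: list of (word, etype) where etype is e.g. 'PER', 'LOC', or None.
    Single-token entities get S-, multi-token spans get B-/I-/E-, others get O."""
    tags = []
    i = 0
    while i < len(tokens):
        word, etype = tokens[i]
        if etype is None:
            tags.append("O")
            i += 1
            continue
        # extend the span over consecutive tokens of the same entity type
        j = i
        while j + 1 < len(tokens) and tokens[j + 1][1] == etype:
            j += 1
        if i == j:
            tags.append("S-" + etype)
        else:
            tags.append("B-" + etype)
            tags.extend("I-" + etype for _ in range(i + 1, j))
            tags.append("E-" + etype)
        i = j + 1
    return tags
```

Applied to the example sentence, "Yunnan" would receive S-LOC, while the three-token place "China / Sichuan Province / Chengdu" would receive B-LOC, I-LOC, E-LOC.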
Step 2: vectorize characters and words.
Step 2-1: word vectors.
Because a computer can only operate on numeric types while the input word x is a string, the word cannot be used directly and must be converted into a numeric vector. Here the known word2vec is used; the invention converts each word into a 300-dimensional vector.
Step 2-2: character vectors.
Character vectorization follows the Bi-LSTM model shown in Fig. 4. A word is first split into characters; for example, the word "China" (中国) is split into the two characters 中 and 国. Each character is converted into a numeric ID and fed into the forward LSTM structure, where the output of the i-th neuron serves as the input of the (i+1)-th neuron; the outputs are finally aggregated into a 64-dimensional numeric vector. The character vectors are likewise fed into the backward LSTM units, where, in the backward transmission process, the output of the (i+1)-th neuron serves as the input of the i-th neuron; these are again aggregated into a 64-dimensional numeric vector. The forward and backward vectors are then concatenated into a single 128-dimensional vector. For example, for the word "U.S." (美国) obtained in step 2-1, the two characters 美 and 国 can be converted by the bidirectional Bi-LSTM model into one 128-dimensional vector.
Step 2-3: concatenation.
The two vectors obtained in steps 2-1 and 2-2 are concatenated; for example, the segmented word "U.S." yields a 300 + 128 = 428-dimensional vector.
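The concatenation in steps 2-1 to 2-3 can be sketched as follows; the zero vectors are placeholders (in the method they would come from word2vec and the trained character Bi-LSTM), and only the dimensions from the text are assumed.

```python
# Dimensions from the description: 300-d word2vec vector, 64-d per char-LSTM direction.
DIM_WORD = 300
DIM_CHAR_DIR = 64

def splice(word_vec, char_fwd, char_bwd):
    """Concatenate the word vector with the forward and backward character summaries."""
    return list(word_vec) + list(char_fwd) + list(char_bwd)

# placeholder vectors standing in for trained embeddings
spliced = splice([0.0] * DIM_WORD, [0.0] * DIM_CHAR_DIR, [0.0] * DIM_CHAR_DIR)
assert len(spliced) == 428  # 300 + 64 + 64 = 428 dimensions per token
```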
Step 3: build the Bi-LSTM entity recognition model.
The entity recognition model is built according to the Bi-LSTM architecture of Fig. 2. The character and word vectors concatenated in step 2 are fed into the first-layer LSTM neuron units (Fig. 2 illustrates this with the sentence "I am Chinese."), where the output of the i-th LSTM unit of the first layer also serves as the input of the (i+1)-th LSTM unit of that layer. The same concatenated character and word vectors are simultaneously fed into the second-layer LSTM neuron units, where the output of the (i+1)-th LSTM unit also serves as the input of the i-th unit. The output of every neural unit of the bidirectional LSTM is then used as the input of the sequence labeling model CRF, so that every input character x_i yields a prediction y_i computed by the above model. Let ȳ_i denote the ground-truth annotation in the corpus, and construct an entropy-based loss function L:

L = (1/n) · Σ_i [ ȳ_i · log(y_i) + (1 − ȳ_i) · log(1 − y_i) ]

where n is the number of training samples. The invention then converts this loss function L into an optimization problem and solves:

Min L = (1/n) · Σ_i [ ȳ_i · log(y_i) + (1 − ȳ_i) · log(1 − y_i) ]

The "O" in the CRF layer of Fig. 2 denotes the non-entity type, and "Loc" denotes a place-name entity.
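The entropy-based loss function L can be sketched in plain Python as below. Note a sign assumption: the patent writes the sum without a leading minus, while the conventional binary cross-entropy negates it so that minimization drives y_i toward ȳ_i; the conventional form is used here.

```python
import math

def entropy_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over n samples (conventional sign convention)."""
    n = len(y_true)
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        yp = min(max(yp, eps), 1 - eps)  # clip to avoid log(0)
        total += yt * math.log(yp) + (1 - yt) * math.log(1 - yp)
    return -total / n

assert entropy_loss([1.0, 0.0], [1.0, 0.0]) < 1e-6   # near-perfect predictions
assert entropy_loss([1.0, 0.0], [0.5, 0.5]) > 0.69   # about log 2 for uninformative ones
```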
Fig. 3 shows the LSTM unit in detail, where the symbols have the following meanings:
w: the parameter list to be solved.
C_{i-1}, C_i: the semantic information accumulated over the first i-1 characters and over the first i characters, respectively.
h_{i-1}, h_i: the feature information of the (i-1)-th character and of the i-th character, respectively.
f: the forget gate, controlling how much of the accumulated semantic information C_{i-1} of the first i-1 characters is retained.
i: the input gate, controlling how much of the input data (w and h_{i-1}) is retained.
o: the output gate, controlling how much of the i-th character's feature information is emitted.
tanh: the hyperbolic tangent function.
u: together with the input gate i, controls how much of the i-th character's feature information is stored in the cell state C_i.
*, +: element-wise multiplication and element-wise addition, respectively.
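Assuming the standard LSTM formulation, which matches the gate descriptions above (the exact parameterization of the patent's Fig. 3 may differ in detail), the cell update can be written as:

```latex
\begin{aligned}
f_i &= \sigma(W_f\,[h_{i-1}, x_i] + b_f) &\text{(forget gate)}\\
i_i &= \sigma(W_i\,[h_{i-1}, x_i] + b_i) &\text{(input gate)}\\
u_i &= \tanh(W_u\,[h_{i-1}, x_i] + b_u) &\text{(candidate state)}\\
C_i &= f_i * C_{i-1} + i_i * u_i &\text{(cell state update)}\\
o_i &= \sigma(W_o\,[h_{i-1}, x_i] + b_o) &\text{(output gate)}\\
h_i &= o_i * \tanh(C_i) &\text{(character feature output)}
\end{aligned}
```

where σ is the logistic sigmoid and * is element-wise multiplication, consistent with the *, + notation above.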
Step 4: train the model parameters.
To solve for the parameters w in the optimization function L, the invention uses the known Adam gradient descent algorithm to train the parameters in L. Parameter training involves the following key issues:
Step 4-1: sentence splitting.
The training corpus is split into sentences according to Chinese syntactic rules. Let l_i denote the length of the i-th sentence; sentences with |l_i − l_j| < δ are placed in the same group, where δ denotes a sentence-length interval. Denote the grouped data GroupData, with m groups in total.
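The length-based grouping can be sketched as a simple greedy pass (a hypothetical helper: sort by length and start a new group whenever the gap to the group's shortest sentence reaches δ):

```python
def group_by_length(sentences, delta):
    """Bucket sentences so lengths within a group differ by less than delta."""
    groups = []
    for s in sorted(sentences, key=len):
        # compare against the shortest sentence of the current group
        if groups and len(s) - len(groups[-1][0]) < delta:
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups
```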
Step 4-2: input data padding.
Because the input neuron units of the Bi-LSTM entity recognition model of Fig. 2 have a fixed length, a sentence whose character length after splitting is less than the number of neurons of the Bi-LSTM entity recognition model must be padded with the value 0.
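The zero-padding step can be sketched as follows (a hypothetical helper; the handling of over-long sentences is an assumption, since the text only covers sentences shorter than the neuron count):

```python
def pad_to(ids, n_units, pad=0):
    """Right-pad an ID sequence with 0 to match the fixed neuron count."""
    if len(ids) > n_units:
        raise ValueError("sentence longer than the fixed input length")
    return ids + [pad] * (n_units - len(ids))

assert pad_to([5, 7, 9], 6) == [5, 7, 9, 0, 0, 0]
```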
Step 4-3: selection of the iteration batch.
In each iteration of the Adam gradient descent algorithm, the invention randomly selects, without replacement, one sentence group from the training corpus data set, and draws BatchSize sentences from that group as the data for a single model iteration (the value of BatchSize can be chosen arbitrarily).
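The per-iteration sampling can be sketched as below (a hypothetical helper; `random.sample` draws without replacement within the chosen group, and the batch is capped at the group size as an assumption):

```python
import random

def draw_batch(group_data, batch_size, rng=None):
    """Pick one sentence group at random, then draw batch_size sentences
    from it without replacement."""
    rng = rng or random.Random()
    group = rng.choice(group_data)
    return rng.sample(group, min(batch_size, len(group)))
```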
Step 4-4: iteration stopping conditions.
For terminating the Adam gradient descent training of the parameters in L, the invention sets two stopping conditions: 1) a maximum number of iterations Max_Iteration; 2) the change of the loss between iterations, |L_i − L_{i+1}| < ε, where ε denotes an acceptable error range.
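The two stopping conditions can be combined in a training loop like the sketch below, where `step_fn` stands in for one Adam update returning the current loss (its name and signature are assumptions for illustration):

```python
def train_until_converged(step_fn, max_iteration, eps):
    """Run step_fn until the loss change falls below eps or max_iteration is hit;
    returns (iterations_used, final_loss)."""
    prev_loss = float("inf")
    for it in range(1, max_iteration + 1):
        loss = step_fn()
        if abs(prev_loss - loss) < eps:  # |L_i - L_{i+1}| < epsilon
            break
        prev_loss = loss
    return it, loss
```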
Step 5: save the model results.
Finally, the model parameters trained in steps 1-4 are saved so that the prediction stage can use them.
(2) Prediction stage:
Step 1: data preprocessing.
The data preprocessing of the prediction stage of entity recognition mainly includes two steps:
Step 1-1: sentence splitting.
A passage submitted for entity recognition is first split into sentences. For example, "Xiao Ming was born in China, he is Chinese, he loves China. Xiao Li was born in the U.S., he is American, he also loves China." is split according to Chinese grammar into:
First: Xiao Ming was born in China, he is Chinese, he loves China.
Second: Xiao Li was born in the U.S., he is American, he also loves China.
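A minimal stand-in for the grammar-based sentence splitting is to cut at Chinese terminal punctuation (this punctuation-only rule is an assumption; the patent's "Chinese syntactic rules" are not specified further):

```python
import re

def split_sentences(text):
    """Split Chinese text into sentences at the terminators 。 ！ ？."""
    # zero-width lookbehind keeps the terminator attached to its sentence
    return [p for p in re.split(r"(?<=[。！？])", text) if p.strip()]
```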
Step 1-2: word segmentation.
The sentence-splitting result of step 1-1 is then segmented. The invention uses the known jieba segmenter, which combines a dictionary with an HMM (hidden Markov) model for recognizing out-of-vocabulary words. The segmentation result of step 1-1 is:
First: Xiao Ming / was born / in / China / , / he / is / China / person / , / he / loves / China / .
Second: Xiao Li / was born / in / the U.S. / , / he / is / the U.S. / person / , / he / also / loves / China / .
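As a toy illustration of dictionary-based segmentation (the method itself uses jieba's dictionary plus HMM; the greedy forward-maximum-matching rule and the tiny lexicon below are assumptions for illustration only):

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary
    word, falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words
```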
Step 2: vectorize characters and words.
This can be split into the following three steps:
Step 2-1: word vectors.
Each word in the segmentation result of step 1-2 is converted into a vector using the known word2vec; for example, the word "U.S." (美国) in the segmentation result of step 1-2 is first converted by word2vec into a 300-dimensional vector.
Step 2-2: character vectors.
Character vectorization follows the Bi-LSTM model shown in Fig. 4; for example, the two characters 美 and 国 of the word "U.S." in step 2-1 are converted by the LSTM model into one 128-dimensional vector.
Step 2-3: concatenation.
The two vectors obtained in steps 2-1 and 2-2 are concatenated; for example, the segmented word "U.S." yields a 300 + 128 = 428-dimensional vector.
Step 3: entity recognition prediction.
The vector data concatenated in step 2-3 are input into the model saved in step 5 of training stage (1), yielding a prediction result for each input datum. Sentence splitting and input-data padding are also required for the input during prediction; this completes the entity recognition prediction process. The prediction result for the segmentation of step 1-2 is:
First: Xiao Ming/S-PER was born/O in/O China/S-LOC ,/O he/O is/O China/S-ORG person/O ,/O he/O loves/O China/S-ORG ./O
Second: Xiao Li/S-PER was born/O in/O the U.S./S-LOC ,/O he/O is/O the U.S./S-ORG person/O ,/O he/O also/O loves/O China/S-ORG ./O
The above embodiments merely illustrate the technical solution of the present invention and do not limit it. A person of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the present invention; the scope of protection of the invention shall be defined by the claims.
Claims (10)
1. A named entity recognition method based on Bi-LSTM, characterized in that it comprises the following steps:
1) annotating a training corpus for named entity recognition to form an annotated corpus;
2) converting the words and characters in the annotated corpus into vectors;
3) establishing a Bi-LSTM-based named entity recognition model using the word and character vectors, and training the parameters of the named entity recognition model;
4) using the trained named entity recognition model to perform named entity recognition prediction on data to be predicted.
2. The method of claim 1, characterized in that step 1) annotates the training corpus in the IOBES format.
3. The method of claim 1, characterized in that step 2) first converts each input word into a vector, then decomposes the word into its characters, converts all characters contained in the word into a vector with a Bi-LSTM model, and concatenates the vectors converted from the word and the characters.
4. The method of claim 3, characterized in that the Bi-LSTM-based named entity recognition model of step 3) comprises LSTM layers and a CRF layer; the character and word vectors concatenated in step 2) are input into the first-layer LSTM neuron units, where the output of the i-th LSTM unit of the first layer serves as the input of the (i+1)-th LSTM unit of that layer; the same concatenated character and word vectors are simultaneously input into the second-layer LSTM neuron units, where the output of the (i+1)-th LSTM unit simultaneously serves as the input of the i-th LSTM unit; the output of each neural unit of the bidirectional LSTM is then used as the input of the CRF model, so that a y_i is computed for each input character x_i; and, letting ȳ_i denote the ground-truth annotation in the corpus, an entropy-based loss function L is constructed:
L = (1/n) · Σ_i [ ȳ_i · log(y_i) + (1 − ȳ_i) · log(1 − y_i) ]
wherein n denotes the number of training samples; this loss function L is then converted into an optimization problem and solved:
Min L = (1/n) · Σ_i [ ȳ_i · log(y_i) + (1 − ȳ_i) · log(1 − y_i) ].
5. The method of claim 4, characterized in that step 3) trains the parameters in L using the Adam gradient descent algorithm.
6. The method of claim 5, characterized in that, during parameter training, step 3) splits the training corpus into sentences according to Chinese syntactic rules, and pads with 0 any sentence whose character length after splitting is less than the number of neurons.
7. The method of claim 6, characterized in that each iteration of the Adam gradient descent algorithm randomly selects, without replacement, one sentence group from the training corpus data set, and draws several sentences from that group as the data for a single model iteration.
8. The method of claim 5, characterized in that the stopping conditions of the Adam gradient descent iteration are: 1) a maximum number of iterations; 2) the loss change between iterations, |L_i − L_{i+1}| < ε, where ε denotes an acceptable error range.
9. The method of claim 1, characterized in that step 4) first preprocesses the data to be predicted, then performs character and word vectorization, and then performs named entity recognition prediction.
10. The method of claim 9, characterized in that the preprocessing comprises sentence splitting and word segmentation; the vectorization comprises word vectorization and character vectorization, with the word vector and character vector concatenated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710946713.0A CN107908614A (en) | 2017-10-12 | 2017-10-12 | A kind of name entity recognition method based on Bi LSTM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107908614A true CN107908614A (en) | 2018-04-13 |
Family
ID=61840480
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710946713.0A Withdrawn CN107908614A (en) | 2017-10-12 | 2017-10-12 | A kind of name entity recognition method based on Bi LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107908614A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140236578A1 (en) * | 2013-02-15 | 2014-08-21 | Nec Laboratories America, Inc. | Question-Answering by Recursive Parse Tree Descent |
CN104899304A (en) * | 2015-06-12 | 2015-09-09 | 北京京东尚科信息技术有限公司 | Named entity identification method and device |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN106682220A (en) * | 2017-01-04 | 2017-05-17 | 华南理工大学 | Online traditional Chinese medicine text named entity identifying method based on deep learning |
CN107203511A (en) * | 2017-05-27 | 2017-09-26 | 中国矿业大学 | A kind of network text name entity recognition method based on neutral net probability disambiguation |
Application events: 2017-10-12: CN application CN201710946713.0A filed; published as CN107908614A; legal status: withdrawn.
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140236578A1 (en) * | 2013-02-15 | 2014-08-21 | Nec Laboratories America, Inc. | Question-Answering by Recursive Parse Tree Descent |
CN104899304A (en) * | 2015-06-12 | 2015-09-09 | 北京京东尚科信息技术有限公司 | Named entity identification method and device |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN106682220A (en) * | 2017-01-04 | 2017-05-17 | 华南理工大学 | Online traditional Chinese medicine text named entity identifying method based on deep learning |
CN107203511A (en) * | 2017-05-27 | 2017-09-26 | 中国矿业大学 | A kind of network text name entity recognition method based on neutral net probability disambiguation |
Non-Patent Citations (1)
Title |
---|
ONUR KURU et al.: "CharNER: Character-Level Named Entity Recognition", The 26th International Conference on Computational Linguistics: Technical Papers * |
Cited By (58)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108920445A (en) * | 2018-04-23 | 2018-11-30 | 华中科技大学鄂州工业技术研究院 | A kind of name entity recognition method and device based on Bi-LSTM-CRF model |
CN108920445B (en) * | 2018-04-23 | 2022-06-17 | 华中科技大学鄂州工业技术研究院 | Named entity identification method and device based on Bi-LSTM-CRF model |
CN108768824A (en) * | 2018-05-15 | 2018-11-06 | 腾讯科技(深圳)有限公司 | Information processing method and device |
WO2019228466A1 (en) * | 2018-06-01 | 2019-12-05 | 中兴通讯股份有限公司 | Named entity recognition method, device and apparatus, and storage medium |
CN108845988B (en) * | 2018-06-07 | 2022-06-10 | 苏州大学 | Entity identification method, device, equipment and computer readable storage medium |
CN108845988A (en) * | 2018-06-07 | 2018-11-20 | 苏州大学 | A kind of entity recognition method, device, equipment and computer readable storage medium |
CN108932229A (en) * | 2018-06-13 | 2018-12-04 | 北京信息科技大学 | A kind of money article proneness analysis method |
CN109241520A (en) * | 2018-07-18 | 2019-01-18 | 五邑大学 | A kind of sentence trunk analysis method and system based on the multilayer error Feedback Neural Network for segmenting and naming Entity recognition |
CN109165279A (en) * | 2018-09-06 | 2019-01-08 | 深圳和而泰数据资源与云技术有限公司 | information extraction method and device |
CN109271631A (en) * | 2018-09-12 | 2019-01-25 | 广州多益网络股份有限公司 | Segmenting method, device, equipment and storage medium |
CN109271631B (en) * | 2018-09-12 | 2023-01-24 | 广州多益网络股份有限公司 | Word segmentation method, device, equipment and storage medium |
CN109522546A (en) * | 2018-10-12 | 2019-03-26 | 浙江大学 | Entity recognition method is named based on context-sensitive medicine |
CN109493956A (en) * | 2018-10-15 | 2019-03-19 | 海口市人民医院(中南大学湘雅医学院附属海口医院) | Diagnosis guiding method |
US11093531B2 (en) | 2018-10-25 | 2021-08-17 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for recalling points of interest using a tagging model |
CN111191107A (en) * | 2018-10-25 | 2020-05-22 | 北京嘀嘀无限科技发展有限公司 | System and method for recalling points of interest using annotation model |
CN111191107B (en) * | 2018-10-25 | 2023-06-30 | 北京嘀嘀无限科技发展有限公司 | System and method for recalling points of interest using annotation model |
CN109543151A (en) * | 2018-10-31 | 2019-03-29 | 昆明理工大学 | A method of improving Laotian part-of-speech tagging accuracy rate |
CN109472026A (en) * | 2018-10-31 | 2019-03-15 | 北京国信云服科技有限公司 | Accurate emotion information extracting methods a kind of while for multiple name entities |
CN109543151B (en) * | 2018-10-31 | 2021-05-25 | 昆明理工大学 | Method for improving part-of-speech tagging accuracy of the Lao language |
CN109493265A (en) * | 2018-11-05 | 2019-03-19 | 北京奥法科技有限公司 | A kind of Policy Interpretation method and Policy Interpretation system based on deep learning |
CN109284400A (en) * | 2018-11-28 | 2019-01-29 | 电子科技大学 | A kind of name entity recognition method based on Lattice LSTM and language model |
CN109710927B (en) * | 2018-12-12 | 2022-12-20 | 东软集团股份有限公司 | Named entity identification method and device, readable storage medium and electronic equipment |
CN109710927A (en) * | 2018-12-12 | 2019-05-03 | 东软集团股份有限公司 | Named entity recognition method and device, readable storage medium and electronic equipment |
CN110162772A (en) * | 2018-12-13 | 2019-08-23 | 北京三快在线科技有限公司 | Name entity recognition method and device |
CN110162772B (en) * | 2018-12-13 | 2020-06-26 | 北京三快在线科技有限公司 | Named entity identification method and device |
CN109753650A (en) * | 2018-12-14 | 2019-05-14 | 昆明理工大学 | A kind of Laotian name place name entity recognition method merging multiple features |
CN111401064B (en) * | 2019-01-02 | 2024-04-19 | ***通信有限公司研究院 | Named entity identification method and device and terminal equipment |
CN111414757A (en) * | 2019-01-04 | 2020-07-14 | 阿里巴巴集团控股有限公司 | Text recognition method and device |
CN111414757B (en) * | 2019-01-04 | 2023-06-20 | 阿里巴巴集团控股有限公司 | Text recognition method and device |
CN111428500A (en) * | 2019-01-09 | 2020-07-17 | 阿里巴巴集团控股有限公司 | Named entity identification method and device |
CN111428501A (en) * | 2019-01-09 | 2020-07-17 | 北大方正集团有限公司 | Named entity recognition method, recognition system and computer readable storage medium |
CN111428500B (en) * | 2019-01-09 | 2023-04-25 | 阿里巴巴集团控股有限公司 | Named entity identification method and device |
CN109815952A (en) * | 2019-01-24 | 2019-05-28 | 珠海市筑巢科技有限公司 | Brand name recognition methods, computer installation and computer readable storage medium |
CN109871545A (en) * | 2019-04-22 | 2019-06-11 | 京东方科技集团股份有限公司 | Name entity recognition method and device |
WO2020215870A1 (en) * | 2019-04-22 | 2020-10-29 | 京东方科技集团股份有限公司 | Named entity identification method and apparatus |
CN109871545B (en) * | 2019-04-22 | 2022-08-05 | 京东方科技集团股份有限公司 | Named entity identification method and device |
US11574124B2 (en) | 2019-04-22 | 2023-02-07 | Boe Technology Group Co., Ltd. | Method and apparatus of recognizing named entity |
WO2020232882A1 (en) * | 2019-05-20 | 2020-11-26 | 平安科技(深圳)有限公司 | Named entity recognition method and apparatus, device, and computer readable storage medium |
CN110222343A (en) * | 2019-06-13 | 2019-09-10 | 电子科技大学 | A kind of Chinese medicine plant resource name entity recognition method |
CN110377731A (en) * | 2019-06-18 | 2019-10-25 | 深圳壹账通智能科技有限公司 | Complain text handling method, device, computer equipment and storage medium |
CN110232192A (en) * | 2019-06-19 | 2019-09-13 | 中国电力科学研究院有限公司 | Electric power term names entity recognition method and device |
CN110309769B (en) * | 2019-06-28 | 2021-06-15 | 北京邮电大学 | Method for segmenting character strings in picture |
CN110309769A (en) * | 2019-06-28 | 2019-10-08 | 北京邮电大学 | The method that character string in a kind of pair of picture is split |
CN110334357A (en) * | 2019-07-18 | 2019-10-15 | 北京香侬慧语科技有限责任公司 | A kind of method, apparatus, storage medium and electronic equipment for naming Entity recognition |
CN110717331B (en) * | 2019-10-21 | 2023-10-24 | 北京爱医博通信息技术有限公司 | Chinese named entity recognition method, device and equipment based on neural network and storage medium |
CN110717331A (en) * | 2019-10-21 | 2020-01-21 | 北京爱医博通信息技术有限公司 | Neural network-based Chinese named entity recognition method, device, equipment and storage medium |
CN110738319A (en) * | 2019-11-11 | 2020-01-31 | 四川隧唐科技股份有限公司 | LSTM model unit training method and device for recognizing bid-winning units based on CRF |
CN111310472B (en) * | 2020-01-19 | 2024-02-09 | 合肥讯飞数码科技有限公司 | Alias generation method, device and equipment |
CN111310472A (en) * | 2020-01-19 | 2020-06-19 | 合肥讯飞数码科技有限公司 | Alias generation method, device and equipment |
CN111523325A (en) * | 2020-04-20 | 2020-08-11 | 电子科技大学 | Chinese named entity recognition method based on strokes |
CN111581387A (en) * | 2020-05-09 | 2020-08-25 | 电子科技大学 | Entity relation joint extraction method based on loss optimization |
CN111581387B (en) * | 2020-05-09 | 2022-10-11 | 电子科技大学 | Entity relation joint extraction method based on loss optimization |
CN111476022B (en) * | 2020-05-15 | 2023-07-07 | 湖南工商大学 | Character embedding and mixed LSTM entity identification method, system and medium for entity characteristics |
CN111476022A (en) * | 2020-05-15 | 2020-07-31 | 湖南工商大学 | Character embedding and mixed LSTM entity identification method, system and medium for entity characteristics |
CN112036178A (en) * | 2020-08-25 | 2020-12-04 | 国家电网有限公司 | Distribution network entity related semantic search method |
CN112101023B (en) * | 2020-10-29 | 2022-12-06 | 深圳市欢太科技有限公司 | Text processing method and device and electronic equipment |
CN112101023A (en) * | 2020-10-29 | 2020-12-18 | 深圳市欢太科技有限公司 | Text processing method and device and electronic equipment |
CN114385795A (en) * | 2021-08-05 | 2022-04-22 | 应急管理部通信信息中心 | Accident information extraction method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107908614A (en) | A kind of name entity recognition method based on Bi LSTM | |
CN107885721A (en) | A kind of name entity recognition method based on LSTM | |
CN107291693B (en) | Semantic calculation method for improved word vector model | |
CN111931506B (en) | Entity relationship extraction method based on graph information enhancement | |
CN109684642B (en) | Abstract extraction method combining page parsing rule and NLP text vectorization | |
CN107797987B (en) | Bi-LSTM-CNN-based mixed corpus named entity identification method | |
CN110134946B (en) | Machine reading understanding method for complex data | |
CN106599032B (en) | Text event extraction method combining sparse coding and structure sensing machine | |
CN107818164A (en) | A kind of intelligent answer method and its system | |
CN110362819B (en) | Text emotion analysis method based on convolutional neural network | |
CN107832289A (en) | A kind of name entity recognition method based on LSTM CNN | |
CN110597998A (en) | Military scenario entity relationship extraction method and device combined with syntactic analysis | |
CN108874896B (en) | Humor identification method based on neural network and humor characteristics | |
CN104778256B (en) | A kind of the quick of field question answering system consulting can increment clustering method | |
CN107977353A (en) | A kind of mixing language material name entity recognition method based on LSTM-CNN | |
CN107967251A (en) | A kind of name entity recognition method based on Bi-LSTM-CNN | |
CN111274794B (en) | Synonym expansion method based on transmission | |
CN107894975A (en) | A kind of segmenting method based on Bi LSTM | |
CN107797988A (en) | A kind of mixing language material name entity recognition method based on Bi LSTM | |
CN112364623A (en) | Bi-LSTM-CRF-based three-in-one word notation Chinese lexical analysis method | |
CN113312922A (en) | Improved chapter-level triple information extraction method | |
Ayifu et al. | Multilingual named entity recognition based on the BiGRU-CNN-CRF hybrid model | |
CN107844475A (en) | A kind of segmenting method based on LSTM | |
CN107894976A (en) | A kind of mixing language material segmenting method based on Bi LSTM | |
CN107943783A (en) | A kind of segmenting method based on LSTM CNN |
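The similar documents above share one core scheme: embed each token, run a bidirectional LSTM over the sequence, and project the concatenated forward/backward hidden states to per-token tag scores. This is an illustrative numpy sketch of that scheme only, not the implementation claimed in any of the patents listed; all function names, weight shapes, and dimensions here are invented for the example, and weights are random, so the emitted tags are not meaningful.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # All four gates computed jointly: input, forget, output, candidate.
    z = W @ x + U @ h + b
    H = h.size
    i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
    o, g = sigmoid(z[2 * H:3 * H]), np.tanh(z[3 * H:])
    c = f * c + i * g
    return o * np.tanh(c), c

def bilstm_tag_scores(X, params):
    """Run a forward and a backward LSTM over the (T x D) embedding
    matrix X, concatenate the hidden states of both directions at each
    position, and project to per-token tag scores."""
    Wf, Uf, bf, Wb, Ub, bb, Wout = params
    T, H = X.shape[0], Uf.shape[1]
    fwd, bwd = np.zeros((T, H)), np.zeros((T, H))
    h, c = np.zeros(H), np.zeros(H)
    for t in range(T):                      # left-to-right pass
        h, c = lstm_step(X[t], h, c, Wf, Uf, bf)
        fwd[t] = h
    h, c = np.zeros(H), np.zeros(H)
    for t in reversed(range(T)):            # right-to-left pass
        h, c = lstm_step(X[t], h, c, Wb, Ub, bb)
        bwd[t] = h
    return np.concatenate([fwd, bwd], axis=1) @ Wout  # shape (T, n_tags)

# Toy run: 5 tokens, 8-dim embeddings, hidden size 6, 4 BIO-style tags.
rng = np.random.default_rng(0)
D, H, T, n_tags = 8, 6, 5, 4
params = (rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H),
          rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H),
          rng.normal(size=(2 * H, n_tags)))
scores = bilstm_tag_scores(rng.normal(size=(T, D)), params)
tags = scores.argmax(axis=1)  # one tag index per token
```

In the variants listed above, the per-token argmax is typically replaced by a CRF decoding layer (as in CN106569998A and the Bi-LSTM-CRF documents), which scores whole tag sequences instead of independent positions.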
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20180413 |