CN107908614A - Named entity recognition method based on Bi-LSTM - Google Patents

Named entity recognition method based on Bi-LSTM

Info

Publication number
CN107908614A
Authority
CN
China
Prior art keywords
lstm
word
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710946713.0A
Other languages
Chinese (zh)
Inventor
岳永鹏
唐华阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Future Information Technology Co Ltd
Original Assignee
Beijing Future Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Future Information Technology Co Ltd
Priority to CN201710946713.0A
Publication of CN107908614A
Legal status: Withdrawn


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Character Discrimination (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a named entity recognition method based on Bi-LSTM. The method includes: 1) annotating a training corpus for named entity recognition to form a labeled corpus; 2) converting the words and characters in the labeled corpus into vectors; 3) building a named entity recognition model based on Bi-LSTM from the word and character vectors, and training the parameters of the named entity recognition model; 4) using the trained named entity recognition model to perform named entity recognition prediction on data to be predicted. By using both word-based and character-based vectors, the present invention obtains the features of characters and words at the same time while also avoiding the unregistered-word problem; in addition, compared with the traditional pure CRF model, the bidirectional long short-term memory neural network Bi-LSTM absorbs more character and word features, thereby further improving the precision of entity recognition.

Description

Named entity recognition method based on Bi-LSTM
Technical field
The invention belongs to the field of information technology, and in particular relates to a named entity recognition method based on Bi-LSTM.
Background technology
Named entity recognition (NER) refers to identifying entities with specific meaning in text, mainly including person names, place names, organization names, proper nouns, and so on.
Practical application scenarios of named entity recognition include:
Scene 1: Event detection. Place, time and person are basic components of an event. When constructing an event summary, the related persons, places, organizations and so on can be highlighted. In an event search system, the related persons, times and places can serve as index keywords, and the relations between the components of an event describe the event in more detail at the semantic level.
Scene 2: Information retrieval. Named entities can be used to improve and refine a search system. When a user enters "重大" (the abbreviated name of Chongqing University, which also reads as the adjective "major"), it can be found that what the user actually wants to retrieve is "Chongqing University" rather than the corresponding adjective meaning. In addition, when building an inverted index, cutting a named entity into multiple words reduces retrieval efficiency. Search engines are also developing in the direction of semantic understanding and computing answers.
Scene 3: Semantic networks. A semantic network generally contains concepts, instances and their corresponding relations. For example, "country" is a concept, "China" is an instance, and "China is a country" expresses a relation between an instance and a concept. A large proportion of the instances in a semantic network are named entities.
Scene 4: Machine translation. The translation of named entities often follows special rules. For example, Chinese person names are rendered in English using the pinyin of the name, following the rule of surname first and given name after, whereas common words are translated into the corresponding English words. Accurately recognizing the named entities in text is therefore important for improving the effect of machine translation.
Scene 5: Question answering systems. Accurately identifying each component of a question, the domain of the question and the related concepts is especially important. At present, most question answering systems can only search for answers rather than compute them: keyword matching is performed and the user manually extracts the answer from the search results, whereas a friendlier approach is to compute the answer and present it to the user. Some questions also require considering the relations between entities; for example, for the question "the 45th President of the United States", current search engines can return the answer "Donald Trump" in a particular format.
Traditional named entity recognition methods can be divided into dictionary-based methods, word-frequency/statistics-based methods, and methods based on artificial neural network models. The principle of the dictionary-based methods is to collect as many entity words of different categories as possible into a dictionary; during recognition, the text is matched against the words in the dictionary, and matched items are labeled with the corresponding entity class. Statistics-based methods such as CRF (conditional random field) learn the semantic information of the preceding and following words and then make a classification decision.
Dictionary-based named entity recognition depends heavily on the dictionary and cannot recognize unregistered words. Statistics-based methods such as HMM (hidden Markov model) and CRF (conditional random field) can only associate the semantics of the word immediately preceding the current word, so the recognition precision is not high enough, and the recognition rate of unregistered words in particular is low. Methods based on artificial neural network models suffer from the vanishing gradient problem during training, and the number of network layers used in practice is small, so the final named entity recognition results show no obvious advantage.
Content of the invention
In view of the above problems, the present invention provides a named entity recognition method based on Bi-LSTM (Bi-directional Long Short-Term Memory, a bidirectional long short-term memory neural network), which can effectively improve the precision of named entity recognition.
In the present invention, a registered word refers to a word that is already present in the vocabulary, and an unregistered word refers to a word that does not appear in the vocabulary.
The technical solution adopted by the present invention is as follows:
A named entity recognition method based on Bi-LSTM, characterized by comprising the following steps:
1) annotating a training corpus for named entity recognition to form a labeled corpus;
2) converting the words and characters in the labeled corpus into vectors;
3) building a named entity recognition model based on Bi-LSTM from the word and character vectors, and training the parameters of the named entity recognition model;
4) using the trained named entity recognition model to perform named entity recognition prediction on the data to be predicted.
Further, step 1) labels the training corpus in the IOBES scheme.
Further, step 2) first converts each input word into a vector, then splits the word into its characters, converts all the characters contained in the word into a vector with a Bi-LSTM model, and concatenates the vectors obtained from the word and from its characters.
Further, step 3) trains the parameters of the named entity recognition model using the Adam gradient descent algorithm.
Further, during parameter training, step 3) splits the training corpus into sentences according to Chinese syntactic rules, and pads with the value 0 the sentences whose character length after splitting is less than the number of neurons.
Further, in each iteration of the Adam gradient descent algorithm, one sentence group is randomly selected without replacement from the training corpus data set, and several sentences are drawn from that group as the data for a single model iteration.
Further, step 4) first preprocesses the data to be predicted, then performs character and word vectorization, and then performs named entity recognition prediction.
Further, the preprocessing includes sentence splitting and word segmentation; the vectorization includes word vectorization and character vectorization, and the word vector and the character vector are concatenated.
The named entity recognition method based on Bi-LSTM of the present invention uses both word-based and character-based vectors, so it obtains the features of characters and words at the same time while also avoiding the unregistered-word problem; in addition, compared with the traditional pure CRF model, the bidirectional long short-term memory neural network Bi-LSTM absorbs more character and word features, thereby further improving the precision of entity recognition.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the Bi-LSTM entity recognition method of the present invention.
Fig. 2 is a schematic diagram of the Bi-LSTM entity recognition model of the present invention.
Fig. 3 is a schematic diagram of an LSTM cell.
Fig. 4 is a structure diagram of the Bi-LSTM character vector.
Detailed description of the embodiments
In order to make the above objectives, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below through specific implementation cases and with reference to the accompanying drawings.
The invention discloses a named entity recognition method based on Bi-LSTM, for example for recognizing person names, place names, organization names, brand names, company names and so on from unstructured text. The invention addresses two key problems: 1. using the LSTM-CRF model to improve the precision of named entity recognition; 2. adding character-vector features of words to handle the recognition of named entities that are out-of-vocabulary (OOV) words.
To improve the precision of named entity recognition, a Bi-LSTM character feature layer and a Bi-LSTM word feature layer are added on top of the traditional CRF model; the detailed structure is shown in Fig. 2, Fig. 3 and Fig. 4.
Because in many cases the entities to be recognized are not in the vocabulary, the present invention adds the character-vector feature extraction shown in Fig. 4 in order to improve the recognition of unregistered words. This reflects the fact that an entity is related not only to the word segmentation result but also to the characters themselves; for example, in Chinese, when a surname character such as "Zhao, Qian, Sun, Li, ..." appears as the first character, the word combination that immediately follows it is very likely a person name.
The flow of the named entity recognition method of the present invention is shown in Fig. 1. The method is divided into two stages: a training stage and a prediction stage.
(1) Training stage ("training" in Fig. 1):
Step 1: Prepare the labeled corpus.
Step 2: Character and word vectorization.
Step 3: Build the Bi-LSTM entity recognition model.
Step 4: Train the model parameters.
Step 5: Save the model results.
(2) Prediction stage ("prediction" in Fig. 1):
Step 1: Data preprocessing.
Step 2: Character and word vectorization.
Step 3: Use the model saved in Step 5 of training stage (1) to perform entity recognition prediction on the prediction data.
The specific implementation of the two stages is described below.
(1) Training stage:
Step 1: Prepare the labeled corpus.
The training corpus for entity recognition is labeled in the IOBES scheme (Inside, Other, Begin, End, Single); other labeling schemes may also be used, for example replacing the tags with 0, 1, 2, 3, 4. If a word-segmentation unit is a single complete entity, it is labeled (tag S-...); if it is the beginning of an entity, it is labeled (tag B-...); if it is a word in the middle of an entity, it is labeled (tag I-...); if it is the end of an entity, it is labeled (tag E-...); if it is not part of an entity, it is labeled (tag O). For example, for the sentence "Xiao Ming was born in Yunnan, and now works at Zhichuang Space in Chengdu, Sichuan Province, China.", taking the most common entity types of person name (PER), place name (LOC) and organization name (ORG) as examples, the word segmentation and corpus labeling result is as follows (an illustrative tagging sketch is given after the labeled example):
Xiao Ming S-PER
born O
in O
Yunnan S-LOC
, O
now O
in O
China B-LOC
Sichuan Province I-LOC
Chengdu E-LOC
Zhichuang B-ORG
Space E-ORG
work O
. O
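The IOBES scheme above can be illustrated with a minimal Python sketch; the helper function, token list and entity spans below are hypothetical, given only to show how the tags are assigned, and are not part of the patent:

```python
# Minimal IOBES tagging sketch (illustrative only; entity spans are assumed to be given).
def iobes_tags(tokens, entities):
    """tokens: segmented words; entities: {(start, end): type} over token indices (end exclusive)."""
    tags = ["O"] * len(tokens)                    # tag O: not part of an entity
    for (start, end), etype in entities.items():
        if end - start == 1:
            tags[start] = "S-" + etype            # single-unit entity
        else:
            tags[start] = "B-" + etype            # entity begins
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + etype            # inside the entity
            tags[end - 1] = "E-" + etype          # entity ends
    return list(zip(tokens, tags))

# Tokens mirroring the labeled example above.
tokens = ["Xiao Ming", "born", "in", "Yunnan", ",", "now", "in",
          "China", "Sichuan Province", "Chengdu", "Zhichuang", "Space", "work", "."]
entities = {(0, 1): "PER", (3, 4): "LOC", (7, 10): "LOC", (10, 12): "ORG"}
for token, tag in iobes_tags(tokens, entities):
    print(token, tag)
```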
Step 2: Vectorization of characters and words.
Step 2-1: Word vectors.
Because a computer can only operate on numeric types while the input word x is of character type and cannot be computed directly, the word needs to be converted into a numerical vector. Here the known word2vec is used to convert a word into a vector; in the present invention each word is converted into a vector of 300 dimensions.
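A minimal sketch of this word-vectorization step, assuming the gensim implementation of word2vec (the patent does not name a specific toolkit) and a toy pre-segmented corpus:

```python
# Word -> 300-dimensional vector with word2vec (gensim assumed; the corpus is a toy placeholder).
from gensim.models import Word2Vec

corpus = [["Xiao Ming", "born", "in", "Yunnan"],
          ["Xiao Li", "born", "in", "the U.S."]]        # pre-segmented sentences
# vector_size is the gensim >= 4 keyword; older gensim versions call it size.
model = Word2Vec(sentences=corpus, vector_size=300, window=5, min_count=1)
vec = model.wv["Yunnan"]                                 # numpy array of shape (300,)
print(vec.shape)
```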
Step 2-2: Character vectors.
Character vector conversion is carried out according to the Bi-LSTM model shown in Fig. 4. A word is first split into its characters; for example, the word "中国" (China) is split into the two characters "中" and "国". Each character is then converted into a numerical ID and input into the forward LSTM neurons; in the forward LSTM the output of the i-th neuron serves as the input of the (i+1)-th neuron, and the outputs are finally aggregated into a numerical vector of dimension 64. On the other side, the character vectors are input into the backward LSTM neural units; in the transmission process of the backward LSTM, the output of the (i+1)-th neuron serves as the input of the i-th neuron, and the outputs are likewise aggregated into a numerical vector of dimension 64. The forward and backward vectors are then concatenated into a single vector of 128 dimensions. For example, for the word "美国" (United States) obtained in step 2-1, its two characters "美" and "国" can be converted into a 128-dimensional vector through the bidirectional Bi-LSTM model.
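A minimal PyTorch sketch of this character-level encoding; the module name, character-embedding size and toy character IDs are illustrative assumptions, only the dimensions 64, 128, 300 and 428 come from the text, and the final lines also show the concatenation of step 2-3:

```python
# Character-level Bi-LSTM encoder: character IDs -> 64-dim forward + 64-dim backward -> 128-dim vector.
import torch
import torch.nn as nn

class CharBiLSTM(nn.Module):
    def __init__(self, n_chars, char_emb_dim=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_emb_dim)        # character ID -> embedding
        self.lstm = nn.LSTM(char_emb_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, char_ids):                               # char_ids: (batch, word_length)
        _, (h, _) = self.lstm(self.emb(char_ids))
        # h: (2, batch, 64) = final forward state and final backward state.
        return torch.cat([h[0], h[1]], dim=-1)                 # (batch, 128)

encoder = CharBiLSTM(n_chars=5000)
char_vec = encoder(torch.tensor([[17, 42]]))                   # e.g. the two characters of one word
word_vec = torch.randn(1, 300)                                 # 300-dim word2vec vector from step 2-1
combined = torch.cat([word_vec, char_vec], dim=-1)             # 300 + 128 = 428-dim vector (step 2-3)
print(char_vec.shape, combined.shape)
```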
Step 2-3: Concatenation.
The two vectors obtained in steps 2-1 and 2-2 are concatenated; for example, for the segmented word "美国" a vector of 300+128=428 dimensions is obtained.
Step 3: Build the Bi-LSTM entity recognition model.
The entity recognition model is built according to the framework of the Bi-LSTM entity recognition model in Fig. 2. The character and word vectors concatenated in step 2 are input into the first-layer LSTM neural units (taking "I am Chinese." as the example shown in Fig. 2), and the output of the i-th LSTM unit of the first layer simultaneously serves as the input of the (i+1)-th LSTM unit of the first layer. At the same time, the character and word vectors concatenated in step 2 are input into the second-layer LSTM neural units, and the output of the (i+1)-th LSTM unit of the second layer simultaneously serves as the input of the i-th LSTM unit. The output of every neural unit of the bidirectional LSTM is then used as the input of the sequence labeling model CRF, so that for each input character x_i the model computes a prediction y_i. Let the true label in the corpus be denoted \bar{y}_i. An entropy-based loss function L is constructed:

$L=\frac{1}{n}\sum_i \bar{y}_i\log(y_i)+(1-\bar{y}_i)\log(1-y_i)$

where n is the number of training samples. The present invention then converts this loss function L into an optimization problem and solves:

$\min L=\frac{1}{n}\sum_i \bar{y}_i\log(y_i)+(1-\bar{y}_i)\log(1-y_i)$

"O" in the CRF layer of Fig. 2 represents the non-entity type, and "Loc" represents a place-name entity.
The LSTM unit is described in detail in Fig. 3, where the meaning of each symbol is as follows:
w: the list of parameters to be solved.
C_{i-1}, C_i: the semantic information accumulated over the first i-1 characters and over the first i characters, respectively.
h_{i-1}, h_i: the feature information of the (i-1)-th character and of the i-th character, respectively.
f: the forget gate, which controls how much of the accumulated semantic information of the first i-1 characters (C_{i-1}) is retained.
i: the input gate, which controls how much of the input data (w and h_{i-1}) is retained.
o: the output gate, which controls how much feature information is emitted among the features of the i-th character.
tanh: the hyperbolic tangent function.
u: the candidate value produced by tanh; together with the input gate i, it controls how much feature information of the i-th character is retained in C_i.
*, +: element-wise multiplication and element-wise addition, respectively.
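A minimal numpy sketch of one LSTM cell step built from the gates listed above; the weight layout and dimensions are illustrative assumptions and are not specified in the patent:

```python
# One LSTM cell step with forget gate f, input gate i, output gate o and candidate u (numpy sketch).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """x: current input; h_prev, c_prev: previous feature / semantic states; W, b: parameters w."""
    z = W @ np.concatenate([x, h_prev]) + b       # all four gates computed from [x, h_{i-1}]
    d = h_prev.shape[0]
    f = sigmoid(z[0:d])                           # forget gate: how much of C_{i-1} to keep
    i = sigmoid(z[d:2*d])                         # input gate: how much new information to admit
    o = sigmoid(z[2*d:3*d])                       # output gate: how much feature information to emit
    u = np.tanh(z[3*d:4*d])                       # candidate values
    c = f * c_prev + i * u                        # C_i: accumulated semantic information
    h = o * np.tanh(c)                            # h_i: feature information of the i-th character
    return h, c

d, x_dim = 64, 32
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4 * d, x_dim + d)), np.zeros(4 * d)
h, c = lstm_step(rng.normal(size=x_dim), np.zeros(d), np.zeros(d), W, b)
print(h.shape, c.shape)
```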
Step 4: Training of the model parameters.
To solve for the parameters w in the optimization function L, the present invention trains the parameters of L with the known Adam gradient descent algorithm. Parameter training involves the following key issues:
Step 4-1: Sentence splitting.
The training corpus is split into sentences according to Chinese syntactic rules. Let l_i denote the length of the i-th sentence; sentences with |l_i - l_j| < δ are put into the same group, where δ denotes a sentence-length interval. Let the grouped data be GroupData, with M groups in total.
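A minimal sketch of this length-based grouping; the grouping strategy and the value of δ are illustrative, the only requirement taken from the text being that lengths within one group differ by less than δ:

```python
# Group sentences so that the lengths inside each group differ by less than delta.
def group_by_length(sentences, delta):
    groups = []                                     # GroupData: a list of M groups
    for sent in sorted(sentences, key=len):
        # Every member differs from the group's shortest sentence by < delta,
        # so any two members of a group differ by < delta as well.
        if groups and len(sent) - len(groups[-1][0]) < delta:
            groups[-1].append(sent)
        else:
            groups.append([sent])                   # start a new group
    return groups

sentences = ["Short.", "Tiny.", "A noticeably longer example sentence.",
             "Another sentence of comparable length here."]
for group in group_by_length(sentences, delta=10):
    print([len(s) for s in group])
```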
Step 4-2: Input data padding.
Because the input neural units of the Bi-LSTM entity recognition structural model of Fig. 2 have a fixed length, sentences whose character length after splitting is less than the number of neurons of the Bi-LSTM entity recognition model need to be padded with the value 0.
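A minimal sketch of this zero-padding, where max_len is an assumed name for the fixed number of input neurons:

```python
# Pad each ID sequence with 0 up to the fixed input length of the Bi-LSTM model.
def pad_to_length(id_sequences, max_len):
    return [seq + [0] * (max_len - len(seq)) for seq in id_sequences]

print(pad_to_length([[5, 9, 3], [7, 2]], max_len=6))
# [[5, 9, 3, 0, 0, 0], [7, 2, 0, 0, 0, 0]]
```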
Step 4-3: Selection of the iteration batch data.
In each iteration of the Adam gradient descent algorithm, the present invention randomly selects, without replacement, one sentence group from the training corpus data set, and draws BatchSize sentences from that group as the data for a single model iteration (the value of BatchSize can be chosen arbitrarily).
Step 4-4: Iteration stopping conditions.
For the stopping condition of the model while training the parameters of L with the Adam gradient descent algorithm, the present invention sets two stopping conditions: 1) a maximum number of iterations Max_Iteration; 2) the change of the loss value between iterations |L_i - L_{i+1}| < ε, where ε denotes an acceptable error range.
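Putting steps 4-1 to 4-4 together, the following sketch shows the overall shape of the training loop; the Adam update itself is replaced by a dummy function, and only the group sampling, batch drawing and stopping logic follow the text:

```python
# Training-loop sketch: pick a sentence group without replacement, draw BatchSize sentences,
# perform one update, and stop on Max_Iteration or |L_i - L_{i+1}| < eps.
import random

def train(groups, step_fn, batch_size=32, max_iteration=1000, eps=1e-4):
    pool = list(groups)                                        # groups not yet used (no replacement)
    prev_loss, iteration = None, 0
    while iteration < max_iteration and pool:
        group = pool.pop(random.randrange(len(pool)))          # random group, without replacement
        batch = random.sample(group, min(batch_size, len(group)))  # BatchSize sentences
        loss = step_fn(batch)                                  # one Adam update on this batch
        if prev_loss is not None and abs(prev_loss - loss) < eps:
            break                                              # |L_i - L_{i+1}| < eps
        prev_loss, iteration = loss, iteration + 1
    return prev_loss

# Toy usage: a fake "loss" that shrinks each call, standing in for the Bi-LSTM + CRF model update.
state = {"loss": 1.0}
def fake_step(batch):
    state["loss"] *= 0.9
    return state["loss"]

groups = [["sentence a", "sentence b", "sentence c"], ["sentence d", "sentence e"]]
print(train(groups, fake_step, batch_size=2, max_iteration=50, eps=1e-3))
```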
Step 5: Save the model results.
Finally, the model parameters trained in steps 1-4 are saved so that the prediction stage can use these parameters.
(2) Prediction stage:
Step 1: Data preprocessing.
In the prediction stage of entity recognition, the data preprocessing of the present invention mainly includes two steps:
Step 1-1: Sentence splitting.
For a passage of text to undergo entity recognition, sentence splitting is performed first. For example, "Xiao Ming was born in China, he is Chinese, he loves China. Xiao Li was born in the U.S., he is American, he also loves China." is split according to Chinese grammar into:
Sentence 1: Xiao Ming was born in China, he is Chinese, he loves China.
Sentence 2: Xiao Li was born in the U.S., he is American, he also loves China.
Step 1-2: Word segmentation.
The sentence-splitting result of step 1-1 is segmented into words. For word segmentation the present invention uses the known jieba segmenter, which combines a dictionary with an HMM (hidden Markov) model for recognizing unregistered words (a minimal jieba usage example is given after the result below). The word segmentation result of step 1-1 is:
Sentence 1: Xiao Ming was born in China, he is Chinese, he loves China.
Sentence 2: Xiao Li was born in the U.S., he is American, he also loves China.
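A minimal example of this segmentation step with the jieba toolkit; the Chinese sentence below is an assumed stand-in for the original example text, and HMM=True enables jieba's hidden-Markov handling of unregistered words:

```python
# Word segmentation with jieba (dictionary + HMM for unregistered words).
import jieba

sentence = "小明出生于中国，他是中国人，他爱中国。"
words = list(jieba.cut(sentence, HMM=True))
print("/".join(words))
```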
Step 2: Character and word vectorization.
Character and word vectorization can be divided into the following three steps:
Step 2-1: Word vectors.
The words in the word segmentation result of step 1-2 are converted into vectors using the known word2vec; for example, the word "美国" (United States) in the segmentation result of step 1-2 is first converted by word2vec into a vector of 300 dimensions.
Step 2-2: Character vectors.
Character vector conversion is carried out according to the Bi-LSTM model shown in Fig. 4; for example, the two characters "美" and "国" of the word "美国" in step 2-1 can be converted by the LSTM model into a vector of 128 dimensions.
Step 2-3: Concatenation.
The two vectors obtained in steps 2-1 and 2-2 are concatenated; for example, for the segmented word "美国" a vector of 300+128=428 dimensions is obtained.
Step 3: Entity recognition prediction.
The vector data concatenated in step 2-3 is input into the model saved in Step 5 of training stage (1), and the prediction result for each input is obtained. During prediction, sentence splitting of the input and input-data padding are also required; with this, the prediction process of entity recognition is complete. The prediction result for the word segmentation result of step 1-2 is:
Sentence 1: Xiao Ming/S-PER was born/O in/O China/S-LOC ,/O he/O is/O China/S-ORG people/O ,/O he/O loves/O China/S-ORG ./O
Sentence 2: Xiao Li/S-PER was born/O in/O the U.S./S-LOC ,/O he/O is/O U.S./S-ORG people/O ,/O he/O also/O loves/O China/S-ORG ./O
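A minimal sketch of the save/load round trip between Step 5 of the training stage and this prediction step; torch.save and torch.load are assumed here as the persistence mechanism, and the stand-in network is not the full Bi-LSTM + CRF model of Fig. 2:

```python
# Save trained parameters (training stage, Step 5) and reload them for prediction (prediction stage, Step 3).
import torch

model = torch.nn.LSTM(428, 64, bidirectional=True, batch_first=True)   # stand-in for the trained model
torch.save(model.state_dict(), "ner_model.pt")        # Step 5: save the model results

restored = torch.nn.LSTM(428, 64, bidirectional=True, batch_first=True)
restored.load_state_dict(torch.load("ner_model.pt"))  # prediction stage: reuse the saved parameters
restored.eval()
x = torch.randn(1, 6, 428)                             # one padded sentence of 428-dim spliced vectors
scores, _ = restored(x)
print(scores.shape)                                    # (1, 6, 128)
```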
The above embodiments are merely illustrative of the technical solution of the present invention and are not limiting. A person of ordinary skill in the art may modify the technical solution of the present invention or replace it with an equivalent without departing from the spirit and scope of the present invention; the protection scope of the present invention shall be defined by the claims.

Claims (10)

1. A named entity recognition method based on Bi-LSTM, characterized by comprising the following steps:
1) annotating a training corpus for named entity recognition to form a labeled corpus;
2) converting the words and characters in the labeled corpus into vectors;
3) building a named entity recognition model based on Bi-LSTM from the word and character vectors, and training the parameters of the named entity recognition model;
4) using the trained named entity recognition model to perform named entity recognition prediction on the data to be predicted.
2. The method according to claim 1, characterized in that step 1) labels the training corpus in the IOBES scheme.
3. The method according to claim 1, characterized in that step 2) first converts each input word into a vector, then splits the word into its characters, converts all the characters contained in the word into a vector with a Bi-LSTM model, and concatenates the vectors obtained from the word and from its characters.
4. The method according to claim 3, characterized in that the named entity recognition model based on Bi-LSTM in step 3) comprises an LSTM layer and a CRF layer; the character and word vectors concatenated in step 2) are input into the first-layer LSTM neural units, with the output of the i-th LSTM unit of the first layer serving as the input of the (i+1)-th LSTM unit of the first layer; at the same time, the character and word vectors concatenated in step 2) are input into the second-layer LSTM neural units, with the output of the (i+1)-th LSTM unit of the second layer simultaneously serving as the input of the i-th LSTM unit; the output of every neural unit of the bidirectional LSTM is then used as the input of the CRF model, so as to compute, for each input character x_i, the corresponding y_i; letting the true labeled result in the corpus be \bar{y}_i, an entropy-based loss function L is constructed:

$L=\frac{1}{n}\sum_i \bar{y}_i\log(y_i)+(1-\bar{y}_i)\log(1-y_i)$

where n denotes the number of training samples; this loss function L is then converted into an optimization problem and solved:

$\min L=\frac{1}{n}\sum_i \bar{y}_i\log(y_i)+(1-\bar{y}_i)\log(1-y_i)$.
5. The method according to claim 4, characterized in that step 3) trains the parameters in L using the Adam gradient descent algorithm.
6. The method according to claim 5, characterized in that during parameter training step 3) splits the training corpus into sentences according to Chinese syntactic rules, and pads with the value 0 the sentences whose character length after splitting is less than the number of neurons.
7. The method according to claim 6, characterized in that in each iteration of the Adam gradient descent algorithm one sentence group is randomly selected without replacement from the training corpus data set, and several sentences are drawn from that group as the data for a single model iteration.
8. The method according to claim 5, characterized in that the stopping conditions of iteration in the Adam gradient descent algorithm are: 1) a maximum number of iterations; 2) the change of the loss value between iterations |L_i - L_{i+1}| < ε, where ε denotes an acceptable error range.
9. The method according to claim 1, characterized in that step 4) first preprocesses the data to be predicted, then performs character and word vectorization, and then performs named entity recognition prediction.
10. The method according to claim 9, characterized in that the preprocessing includes sentence splitting and word segmentation; the vectorization includes word vectorization and character vectorization, and the word vector and the character vector are concatenated.
CN201710946713.0A 2017-10-12 2017-10-12 Named entity recognition method based on Bi-LSTM Withdrawn CN107908614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710946713.0A CN107908614A (en) 2017-10-12 2017-10-12 Named entity recognition method based on Bi-LSTM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710946713.0A CN107908614A (en) 2017-10-12 2017-10-12 Named entity recognition method based on Bi-LSTM

Publications (1)

Publication Number Publication Date
CN107908614A true CN107908614A (en) 2018-04-13

Family

ID=61840480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710946713.0A Withdrawn CN107908614A (en) 2017-10-12 2017-10-12 Named entity recognition method based on Bi-LSTM

Country Status (1)

Country Link
CN (1) CN107908614A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140236578A1 (en) * 2013-02-15 2014-08-21 Nec Laboratories America, Inc. Question-Answering by Recursive Parse Tree Descent
CN104899304A (en) * 2015-06-12 2015-09-09 北京京东尚科信息技术有限公司 Named entity identification method and device
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106682220A (en) * 2017-01-04 2017-05-17 华南理工大学 Online traditional Chinese medicine text named entity identifying method based on deep learning
CN107203511A (en) * 2017-05-27 2017-09-26 中国矿业大学 A kind of network text name entity recognition method based on neutral net probability disambiguation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ONUR KURU et al.: "CharNER: Character-Level Named Entity Recognition", The 26th International Conference on Computational Linguistics: Technical Papers *

Cited By (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920445A (en) * 2018-04-23 2018-11-30 华中科技大学鄂州工业技术研究院 A kind of name entity recognition method and device based on Bi-LSTM-CRF model
CN108920445B (en) * 2018-04-23 2022-06-17 华中科技大学鄂州工业技术研究院 Named entity identification method and device based on Bi-LSTM-CRF model
CN108768824A (en) * 2018-05-15 2018-11-06 腾讯科技(深圳)有限公司 Information processing method and device
WO2019228466A1 (en) * 2018-06-01 2019-12-05 中兴通讯股份有限公司 Named entity recognition method, device and apparatus, and storage medium
CN108845988B (en) * 2018-06-07 2022-06-10 苏州大学 Entity identification method, device, equipment and computer readable storage medium
CN108845988A (en) * 2018-06-07 2018-11-20 苏州大学 A kind of entity recognition method, device, equipment and computer readable storage medium
CN108932229A (en) * 2018-06-13 2018-12-04 北京信息科技大学 A kind of money article proneness analysis method
CN109241520A (en) * 2018-07-18 2019-01-18 五邑大学 A kind of sentence trunk analysis method and system based on the multilayer error Feedback Neural Network for segmenting and naming Entity recognition
CN109165279A (en) * 2018-09-06 2019-01-08 深圳和而泰数据资源与云技术有限公司 information extraction method and device
CN109271631A (en) * 2018-09-12 2019-01-25 广州多益网络股份有限公司 Segmenting method, device, equipment and storage medium
CN109271631B (en) * 2018-09-12 2023-01-24 广州多益网络股份有限公司 Word segmentation method, device, equipment and storage medium
CN109522546A (en) * 2018-10-12 2019-03-26 浙江大学 Entity recognition method is named based on context-sensitive medicine
CN109493956A (en) * 2018-10-15 2019-03-19 海口市人民医院(中南大学湘雅医学院附属海口医院) Diagnosis guiding method
US11093531B2 (en) 2018-10-25 2021-08-17 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for recalling points of interest using a tagging model
CN111191107A (en) * 2018-10-25 2020-05-22 北京嘀嘀无限科技发展有限公司 System and method for recalling points of interest using annotation model
CN111191107B (en) * 2018-10-25 2023-06-30 北京嘀嘀无限科技发展有限公司 System and method for recalling points of interest using annotation model
CN109543151A (en) * 2018-10-31 2019-03-29 昆明理工大学 A method of improving Laotian part-of-speech tagging accuracy rate
CN109472026A (en) * 2018-10-31 2019-03-15 北京国信云服科技有限公司 Accurate emotion information extracting methods a kind of while for multiple name entities
CN109543151B (en) * 2018-10-31 2021-05-25 昆明理工大学 Method for improving wording accuracy of Laos language
CN109493265A (en) * 2018-11-05 2019-03-19 北京奥法科技有限公司 A kind of Policy Interpretation method and Policy Interpretation system based on deep learning
CN109284400A (en) * 2018-11-28 2019-01-29 电子科技大学 A kind of name entity recognition method based on Lattice LSTM and language model
CN109710927B (en) * 2018-12-12 2022-12-20 东软集团股份有限公司 Named entity identification method and device, readable storage medium and electronic equipment
CN109710927A (en) * 2018-12-12 2019-05-03 东软集团股份有限公司 Name recognition methods, device, readable storage medium storing program for executing and the electronic equipment of entity
CN110162772A (en) * 2018-12-13 2019-08-23 北京三快在线科技有限公司 Name entity recognition method and device
CN110162772B (en) * 2018-12-13 2020-06-26 北京三快在线科技有限公司 Named entity identification method and device
CN109753650A (en) * 2018-12-14 2019-05-14 昆明理工大学 A kind of Laotian name place name entity recognition method merging multiple features
CN111401064B (en) * 2019-01-02 2024-04-19 ***通信有限公司研究院 Named entity identification method and device and terminal equipment
CN111414757A (en) * 2019-01-04 2020-07-14 阿里巴巴集团控股有限公司 Text recognition method and device
CN111414757B (en) * 2019-01-04 2023-06-20 阿里巴巴集团控股有限公司 Text recognition method and device
CN111428500A (en) * 2019-01-09 2020-07-17 阿里巴巴集团控股有限公司 Named entity identification method and device
CN111428501A (en) * 2019-01-09 2020-07-17 北大方正集团有限公司 Named entity recognition method, recognition system and computer readable storage medium
CN111428500B (en) * 2019-01-09 2023-04-25 阿里巴巴集团控股有限公司 Named entity identification method and device
CN109815952A (en) * 2019-01-24 2019-05-28 珠海市筑巢科技有限公司 Brand name recognition methods, computer installation and computer readable storage medium
CN109871545A (en) * 2019-04-22 2019-06-11 京东方科技集团股份有限公司 Name entity recognition method and device
WO2020215870A1 (en) * 2019-04-22 2020-10-29 京东方科技集团股份有限公司 Named entity identification method and apparatus
CN109871545B (en) * 2019-04-22 2022-08-05 京东方科技集团股份有限公司 Named entity identification method and device
US11574124B2 (en) 2019-04-22 2023-02-07 Boe Technology Group Co., Ltd. Method and apparatus of recognizing named entity
WO2020232882A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Named entity recognition method and apparatus, device, and computer readable storage medium
CN110222343A (en) * 2019-06-13 2019-09-10 电子科技大学 A kind of Chinese medicine plant resource name entity recognition method
CN110377731A (en) * 2019-06-18 2019-10-25 深圳壹账通智能科技有限公司 Complain text handling method, device, computer equipment and storage medium
CN110232192A (en) * 2019-06-19 2019-09-13 中国电力科学研究院有限公司 Electric power term names entity recognition method and device
CN110309769B (en) * 2019-06-28 2021-06-15 北京邮电大学 Method for segmenting character strings in picture
CN110309769A (en) * 2019-06-28 2019-10-08 北京邮电大学 The method that character string in a kind of pair of picture is split
CN110334357A (en) * 2019-07-18 2019-10-15 北京香侬慧语科技有限责任公司 A kind of method, apparatus, storage medium and electronic equipment for naming Entity recognition
CN110717331B (en) * 2019-10-21 2023-10-24 北京爱医博通信息技术有限公司 Chinese named entity recognition method, device and equipment based on neural network and storage medium
CN110717331A (en) * 2019-10-21 2020-01-21 北京爱医博通信息技术有限公司 Neural network-based Chinese named entity recognition method, device, equipment and storage medium
CN110738319A (en) * 2019-11-11 2020-01-31 四川隧唐科技股份有限公司 LSTM model unit training method and device for recognizing bid-winning units based on CRF
CN111310472B (en) * 2020-01-19 2024-02-09 合肥讯飞数码科技有限公司 Alias generation method, device and equipment
CN111310472A (en) * 2020-01-19 2020-06-19 合肥讯飞数码科技有限公司 Alias generation method, device and equipment
CN111523325A (en) * 2020-04-20 2020-08-11 电子科技大学 Chinese named entity recognition method based on strokes
CN111581387A (en) * 2020-05-09 2020-08-25 电子科技大学 Entity relation joint extraction method based on loss optimization
CN111581387B (en) * 2020-05-09 2022-10-11 电子科技大学 Entity relation joint extraction method based on loss optimization
CN111476022B (en) * 2020-05-15 2023-07-07 湖南工商大学 Character embedding and mixed LSTM entity identification method, system and medium for entity characteristics
CN111476022A (en) * 2020-05-15 2020-07-31 湖南工商大学 Method, system and medium for recognizing STM entity by embedding and mixing L characters of entity characteristics
CN112036178A (en) * 2020-08-25 2020-12-04 国家电网有限公司 Distribution network entity related semantic search method
CN112101023B (en) * 2020-10-29 2022-12-06 深圳市欢太科技有限公司 Text processing method and device and electronic equipment
CN112101023A (en) * 2020-10-29 2020-12-18 深圳市欢太科技有限公司 Text processing method and device and electronic equipment
CN114385795A (en) * 2021-08-05 2022-04-22 应急管理部通信信息中心 Accident information extraction method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN107908614A (en) A kind of name entity recognition method based on Bi LSTM
CN107885721A (en) A kind of name entity recognition method based on LSTM
CN107291693B (en) Semantic calculation method for improved word vector model
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN109684642B (en) Abstract extraction method combining page parsing rule and NLP text vectorization
CN107797987B (en) Bi-LSTM-CNN-based mixed corpus named entity identification method
CN110134946B (en) Machine reading understanding method for complex data
CN106599032B (en) Text event extraction method combining sparse coding and structure sensing machine
CN107818164A (en) A kind of intelligent answer method and its system
CN110362819B (en) Text emotion analysis method based on convolutional neural network
CN107832289A (en) A kind of name entity recognition method based on LSTM CNN
CN110597998A (en) Military scenario entity relationship extraction method and device combined with syntactic analysis
CN108874896B (en) Humor identification method based on neural network and humor characteristics
CN104778256B (en) A kind of the quick of field question answering system consulting can increment clustering method
CN107977353A (en) A kind of mixing language material name entity recognition method based on LSTM-CNN
CN107967251A (en) A kind of name entity recognition method based on Bi-LSTM-CNN
CN111274794B (en) Synonym expansion method based on transmission
CN107894975A (en) A kind of segmenting method based on Bi LSTM
CN107797988A (en) A kind of mixing language material name entity recognition method based on Bi LSTM
CN112364623A (en) Bi-LSTM-CRF-based three-in-one word notation Chinese lexical analysis method
CN113312922A (en) Improved chapter-level triple information extraction method
Ayifu et al. Multilingual named entity recognition based on the BiGRU-CNN-CRF hybrid model
CN107844475A (en) A kind of segmenting method based on LSTM
CN107894976A (en) A kind of mixing language material segmenting method based on Bi LSTM
CN107943783A (en) A kind of segmenting method based on LSTM CNN

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20180413

WW01 Invention patent application withdrawn after publication