CN104899304A - Named entity identification method and device - Google Patents
- Publication number
- CN104899304A CN104899304A CN201510321448.8A CN201510321448A CN104899304A CN 104899304 A CN104899304 A CN 104899304A CN 201510321448 A CN201510321448 A CN 201510321448A CN 104899304 A CN104899304 A CN 104899304A
- Authority
- CN
- China
- Prior art keywords
- word
- sample
- measured
- vector
- vector corresponding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a named entity recognition method and device capable of accurately identifying named entities, in particular named entities in the e-commerce domain. The method comprises: acquiring a vector library; segmenting a training-corpus text string to obtain a plurality of sample words; querying the vector library for each sample word in order to obtain a first feature vector, which comprises the word vector and part-of-speech vector of the sample word as well as the entity-tag vector of the word preceding it; taking all the first feature vectors together as the input quantity and training a neural-network named entity recognition model; segmenting a to-be-predicted text string to obtain a plurality of to-be-tested words; querying the vector library for each to-be-tested word in order to obtain a second feature vector, which comprises the word vector and part-of-speech vector of the to-be-tested word as well as the entity-tag vector of the word preceding it; and inputting the second feature vector of each to-be-tested word into the model to output the entity tag of that word.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a named entity recognition method and device.
Background technology
With the rapid development of Internet technology, information services have become increasingly widespread. Named entity recognition is fundamental to information-service applications such as information extraction, question answering, syntactic analysis, machine translation, and Internet metadata annotation. A named entity (entity for short) refers to person names, organization names, place names, and all other entities identified by a name; in a broader sense, named entities also cover numbers, dates, currency amounts, addresses, and so on.
Techniques that train named entity recognizers with neural networks already exist in the prior art. Existing methods have at least the following shortcomings: (1) they rely mainly on the word itself as the input feature, so the model's features are one-dimensional and the sequential dependency between entity tags is not modeled directly, resulting in low recognition accuracy — named entities in the e-commerce domain in particular are often misidentified; (2) because the network parameters are initialized randomly, the final parameter-optimization result may be poor, and long training times reduce development efficiency; (3) the distribution of the training data is not fully considered, so the model fits different entity types unevenly.
Named entities in the e-commerce domain — for example product names (Nokia 1020, ThinkPad E431 14-inch notebook), prices, and product attributes — are usually composed of one or more consecutive words in a sentence, with part-of-speech patterns such as "noun + numeral". In short, e-commerce named entities have distinctive characteristics, and a recognition method and device tailored to this domain are urgently needed.
Summary of the invention
In view of this, the present invention provides a named entity recognition method and device that can identify named entities accurately, particularly named entities in the e-commerce domain.
To achieve the above object, according to one aspect of the present invention, a named entity recognition method is provided, comprising: obtaining a vector library, the vector library containing a word vector for each of a plurality of words, a part-of-speech vector for each of a plurality of part-of-speech classes, and an entity-tag vector for each of a plurality of entity-tag classes; segmenting a training-corpus text string to obtain an ordered sequence of sample words; querying the vector library for each sample word in order to build a first feature vector, the first feature vector comprising the word vector of the sample word, the part-of-speech vector of the sample word, and the entity-tag vector of the preceding word; taking the first feature vectors of all sample words together as the training input of a neural network and solving for the network parameters with the back-propagation (BP) algorithm to obtain a neural-network named entity recognition model; segmenting the text string to be predicted to obtain an ordered sequence of words to be tested; querying the vector library for each word to be tested in order to build a second feature vector, the second feature vector comprising the word vector of the word to be tested, its part-of-speech vector, and the entity-tag vector of the preceding word; and inputting the second feature vector of each word to be tested into the neural-network named entity recognition model to output the entity tag of that word.
Optionally, the first feature vector further comprises the word vectors and part-of-speech vectors of the words adjacent to the sample word, and the second feature vector further comprises the word vectors and part-of-speech vectors of the words adjacent to the word to be tested.
Optionally, when the first feature vector is built for the first sample word in the ordered sequence, the preceding word of the first sample word is a preset character string; likewise, when the second feature vector is built for the first word to be tested, its preceding word is the preset character string.
Optionally, the training input of the neural network further comprises negative samples.
To achieve the above object, according to another aspect of the present invention, a named entity recognition device is provided, comprising: a vector-library acquisition module for obtaining a vector library, the vector library containing a word vector for each of a plurality of words, a part-of-speech vector for each of a plurality of part-of-speech classes, and an entity-tag vector for each of a plurality of entity-tag classes; a first segmentation module for segmenting a training-corpus text string into an ordered sequence of sample words; a first building module for querying the vector library for each sample word in order to build a first feature vector comprising the word vector of the sample word, its part-of-speech vector, and the entity-tag vector of the preceding word; a training module for taking the first feature vectors of all sample words together as the training input of a neural network and solving for the network parameters with the back-propagation (BP) algorithm to obtain a neural-network named entity recognition model; a second segmentation module for segmenting the text string to be predicted into an ordered sequence of words to be tested; a second building module for querying the vector library for each word to be tested in order to build a second feature vector comprising the word vector of the word to be tested, its part-of-speech vector, and the entity-tag vector of the preceding word; and a prediction module for inputting the second feature vector of each word to be tested into the neural-network named entity recognition model and outputting the entity tag of that word.
Optionally, the first feature vector further comprises the word vectors and part-of-speech vectors of the words adjacent to the sample word, and the second feature vector further comprises the word vectors and part-of-speech vectors of the words adjacent to the word to be tested.
Optionally, the first building module is further configured to use a preset character string as the preceding word of the first sample word when building its first feature vector, and the second building module is further configured to use the preset character string as the preceding word of the first word to be tested when building its second feature vector.
Optionally, in the training module, the training input of the neural network further comprises negative samples.
According to the technical solution of the present invention, a more reasonable feature vector is used both to train the model and to make predictions with it. This feature vector contains not only the features of the current word itself but also its part-of-speech features and the entity-tag feature of the preceding word. Compared with existing recognition techniques that consider only the word itself, the information taken into account is more comprehensive, so the final recognition result is more accurate — particularly for entity recognition in the e-commerce domain.
Accompanying drawing explanation
The accompanying drawings aid understanding of the present invention and do not constitute an improper limitation thereof. In the drawings:
Fig. 1 is a flowchart of the main steps of the named entity recognition method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main components of the named entity recognition device according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are explained below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will appreciate that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted in the following description.
To aid understanding, the relevant terms are briefly introduced first.
Word: the surface form of a word itself.
Word vector: the vectorized representation of a word; each word is represented by a multi-dimensional vector.
Part of speech: the grammatical category of a word. Words are usually divided into two classes comprising twelve parts of speech. One class is content words: nouns, verbs, adjectives, numerals, adverbs, onomatopoeia, measure words, and pronouns. The other class is function words: prepositions, conjunctions, auxiliary words, and interjections.
Part-of-speech vector: the vectorized representation of a part of speech; each part of speech is represented by a multi-dimensional vector, preferably one in discrete form.
Entity tag: each entity tag denotes an entity type. For example, WID denotes a product ID, WB the first word of a product name, WI a middle word of a product name, WE the last word of a product name, and O any other word. For example: Xiaomi (WB) 2s (WI) red (WI) mobile phone (WE) how-about (O).
Entity-tag vector: the vectorized representation of an entity tag; each entity tag is represented by a multi-dimensional vector, preferably one in discrete form.
Note that the dimensions of the word vectors, part-of-speech vectors, and entity-tag vectors need not be the same; they can be set flexibly as required.
Fig. 1 is a flowchart of the main steps of the named entity recognition method according to an embodiment of the present invention. As shown in Fig. 1, the method may comprise steps A through G.
Step A: obtain a vector library. The vector library contains a word vector for each of a plurality of words, a part-of-speech vector for each part-of-speech class, and an entity-tag vector for each entity-tag class.
In one embodiment of the present invention, for a given corpus, word2vec may be used to determine the word vector of each word in the corpus. Word2vec is a tool, open-sourced by Google in 2013, that represents words as real-valued vectors; it maps words into a K-dimensional vector space, in which vector operations between words can even correspond to semantic relations. Precomputing word vectors with word2vec therefore saves time, improves efficiency, and can improve accuracy. The part-of-speech vectors and entity-tag vectors may be obtained by random initialization. The word vectors, part-of-speech vectors, and entity-tag vectors obtained in this way are stored in the vector library for later use.
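The vector library described in step A can be sketched as three plain lookup tables. This is a minimal stdlib-only illustration: in the patent the word vectors come from word2vec trained on a corpus, but here all three tables are randomly initialized (as the patent prescribes for the part-of-speech and entity-tag vectors) so the sketch stays self-contained; the vocabulary, classes, and dimensions are hypothetical.

```python
import random

random.seed(0)

# Dimensions of the three vector types; per the patent they need not match.
WORD_DIM, POS_DIM, TAG_DIM = 8, 4, 3

def random_vector(dim):
    """Random initialization, as the patent prescribes for POS and tag vectors."""
    return [random.uniform(-0.5, 0.5) for _ in range(dim)]

def build_vector_library(words, pos_classes, tag_classes):
    """Build the three lookup tables that make up the vector library.

    In the patent the word vectors come from word2vec; they are randomly
    initialized here as a stand-in so the sketch needs no external tool."""
    return {
        "word": {w: random_vector(WORD_DIM) for w in words},
        "pos": {p: random_vector(POS_DIM) for p in pos_classes},
        "tag": {t: random_vector(TAG_DIM) for t in tag_classes},
    }

library = build_vector_library(
    words=["iphone", "price", "$BEGIN"],
    pos_classes=["n", "v", "begin"],
    tag_classes=["W", "O", "begin"],
)
```

Note that "$BEGIN" and its "begin" POS/tag classes are stored in the library up front, matching the patent's remark that the entity-tag vector of the preset string already exists in the vector library.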
Step B: segment the training-corpus text string to obtain an ordered sequence of sample words.
In an embodiment of the present invention, corpus text strings may be extracted from the data of an e-commerce website and then segmented to obtain ordered sample words, as shown in Table 1:
Table 1 corpus text string and sample word
Corpus text string | Ordered sample words |
"iphone price" | "iphone" "price" |
"Huawei Honor 6" | "Huawei" "Honor" "6" |
"Xiaomi 1s red mobile phone" | "Xiaomi" "1s" "red" "mobile phone" |
…… | …… |
Step C: query the vector library for each sample word in order to build its first feature vector. The first feature vector comprises the word vector of the sample word, its part-of-speech vector, and the entity-tag vector of the preceding word. Besides the surface-form and part-of-speech information of the sample word itself, the first feature vector thus also carries the entity-tag information of the preceding word. The present method trains the model on these first feature vectors; compared with prior art that trains only on the information of the word itself, the information considered is more comprehensive, so the final recognition result is more accurate.
It should be noted that "the first feature vector comprises the word vector of the sample word, its part-of-speech vector, and the entity-tag vector of the preceding word" means that the first feature vector is the concatenation of those three vectors, for example: first feature vector = [word vector of the sample word, part-of-speech vector of the sample word, entity-tag vector of the preceding word]. The present invention does not restrict the concatenation order, and different orders do not affect its principle. However, once the order is chosen it must remain fixed throughout the method, so that all first feature vectors have a consistent format.
The detailed process of step C is illustrated as follows. Suppose the ordered sample words "sample word 1 + sample word 2 + sample word 3 + sample word 4 …" have been obtained; first feature vectors are then built for sample word 1, sample word 2, sample word 3, sample word 4, and so on, in order. Assume the word-window width is set to 0. When building the first feature vector for sample word 1 (the first sample word), no preceding word exists, so the preset character string "$BEGIN" is artificially added as its preceding word. The entity-tag vector of "$BEGIN" already exists in the vector library and is generally a randomly initialized vector. For sample word 1, suppose its word vector looked up from the vector library is denoted X1, its part-of-speech vector Z1, and the entity-tag vector of "$BEGIN" T0; then the first feature vector of sample word 1 = [X1, Z1, T0]. Next, for sample word 2, suppose its word vector is denoted X2, its part-of-speech vector Z2, and the entity-tag vector of its preceding word (sample word 1) T1; then the first feature vector of sample word 2 = [X2, Z2, T1]. Proceeding by analogy, the first feature vectors of all sample words are obtained.
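The left-to-right construction above can be sketched as follows. This is a minimal illustration with window width 0, using a toy vector library with hypothetical dimensions (4/2/2); the gold tag of each word supplies the entity-tag feature T of the next word, and "$BEGIN" supplies it for the first word.

```python
import random

random.seed(0)

def rv(dim):
    """Toy random vector standing in for a library entry."""
    return [random.uniform(-0.5, 0.5) for _ in range(dim)]

# Toy vector library (hypothetical dimensions: word 4, POS 2, tag 2).
word_vec = {w: rv(4) for w in ["$BEGIN", "iphone", "price"]}
pos_vec = {p: rv(2) for p in ["begin", "n"]}
tag_vec = {t: rv(2) for t in ["begin", "W", "O"]}

def build_first_feature_vectors(samples):
    """samples: ordered list of (word, pos, gold_tag) triples.

    With window width 0, each feature vector is the concatenation
    [X_i, Z_i, T_{i-1}], where T_{i-1} is the entity-tag vector of the
    preceding word ("$BEGIN" for the first sample word)."""
    features = []
    prev_tag = "begin"  # entity tag of the artificial "$BEGIN"
    for word, pos, gold_tag in samples:
        fv = word_vec[word] + pos_vec[pos] + tag_vec[prev_tag]
        features.append(fv)
        prev_tag = gold_tag  # this word's gold tag feeds the next word
    return features

feats = build_first_feature_vectors([("iphone", "n", "W"), ("price", "n", "O")])
```

Each resulting vector has length 4 + 2 + 2 = 8, and the concatenation order is fixed across all samples, matching the consistency requirement stated above.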
In an embodiment of the present invention, the first feature vector may further include the word vectors and part-of-speech vectors of the words adjacent to the sample word. "Further include" again means "further concatenate". The "adjacent words" of a sample word are the sample words before or after it at a distance no greater than the word-window width. For example, if the word-window width is 1, the adjacent words of the current sample word are the word immediately before it and the word immediately after it. The first feature vector of the current sample word can then be written as [word vector of the preceding word, word vector of the current word, word vector of the following word, part-of-speech vector of the preceding word, part-of-speech vector of the current word, part-of-speech vector of the following word, entity-tag vector of the preceding word]. Other window widths follow by analogy and are not repeated here. Note that the present invention does not restrict the window width; it can be set flexibly as required, but once chosen it must remain fixed so that all first feature vectors have a consistent format. Note also that when the window width increases, preset character strings can be added before the first sample word to serve as its missing preceding neighbors, and likewise after the last sample word to serve as its missing following neighbors; those skilled in the art can derive the specific practice from the above, and it is not repeated here. In this embodiment the first feature vector additionally takes into account the surface-form and part-of-speech information of the adjacent words, so the information considered is more comprehensive and the final recognition result more accurate.
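The seven-part window-width-1 layout described above can be sketched as follows; this is an illustration under the same toy-library assumptions as before (hypothetical dimensions: word 3, POS 2, tag 2), with "$BEGIN"/"$END" padding the missing neighbors at the sequence edges.

```python
import random

random.seed(1)

def rv(dim):
    return [random.uniform(-0.5, 0.5) for _ in range(dim)]

# Toy vector library; "pad" is a hypothetical POS class for the padding strings.
word_vec = {w: rv(3) for w in ["$BEGIN", "$END", "iphone", "price"]}
pos_vec = {p: rv(2) for p in ["pad", "n"]}
tag_vec = {t: rv(2) for t in ["begin", "W", "O"]}

def windowed_feature(i, words, poses, prev_tag):
    """Feature vector for position i with word-window width 1:
    [X_{i-1}, X_i, X_{i+1}, Z_{i-1}, Z_i, Z_{i+1}, T_{i-1}].
    "$BEGIN"/"$END" stand in for the missing neighbors at the edges."""
    padded_w = ["$BEGIN"] + words + ["$END"]
    padded_p = ["pad"] + poses + ["pad"]
    j = i + 1  # index into the padded lists
    return (word_vec[padded_w[j - 1]] + word_vec[padded_w[j]] + word_vec[padded_w[j + 1]]
            + pos_vec[padded_p[j - 1]] + pos_vec[padded_p[j]] + pos_vec[padded_p[j + 1]]
            + tag_vec[prev_tag])

# Feature vector of the first word "iphone" in "iphone price".
fv = windowed_feature(0, ["iphone", "price"], ["n", "n"], prev_tag="begin")
```

The length is 3 word vectors of dimension 3, plus 3 POS vectors of dimension 2, plus 1 tag vector of dimension 2, i.e. 17 components; the splicing order is fixed once and reused for every position.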
Step D: take the first feature vectors of all sample words together as the training input of the neural network, and solve for the network parameters with the back-propagation (BP) algorithm to obtain the neural-network named entity recognition model. Specifically, the squared error may be used to build the model's overall objective function, and stochastic gradient descent may be used to solve for the neural network's parameters, yielding the final neural-network named entity recognition model.
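Step D can be sketched with a deliberately minimal "network". The patent does not specify the topology, so this illustration uses a single sigmoid unit trained by stochastic gradient descent on the squared-error objective (h(X) − t)², with tag W encoded as 1 and O as 0; the training data here are hypothetical.

```python
import math
import random

random.seed(2)

def train_tagger(features, targets, epochs=200, lr=0.5):
    """Single-neuron sketch of step D: sigmoid output h(X), squared-error
    objective (h(X) - t)^2, parameters solved by stochastic gradient descent.
    The patent's actual network topology is not specified beyond its use of BP."""
    dim = len(features[0])
    w = [random.uniform(-0.1, 0.1) for _ in range(dim)]
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(features, targets):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            h = 1.0 / (1.0 + math.exp(-z))        # network output h(X)
            grad = 2.0 * (h - t) * h * (1.0 - h)  # d(squared error)/dz
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical feature vectors with targets: tag W -> 1, tag O -> 0.
X = [[1.0, 0.0, 0.3], [0.0, 1.0, -0.2], [0.9, 0.1, 0.4], [0.1, 0.8, -0.1]]
T = [1, 0, 1, 0]
w, b = train_tagger(X, T)
```

After training, `predict` plays the role of the model queried in step G; a real implementation would add hidden layers and back-propagate through them, but the objective and update rule are the same in spirit.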
In an embodiment of the present invention, negative samples may also be included in the training input of the neural network. The entity tags in real corpus text strings are usually unevenly distributed, which causes the model to fit some named entities poorly. To address this, during model training, negative samples can be drawn at random in proportion to the distribution of the entity tags, keeping the distribution as even as possible and thereby ensuring that the model fits all named-entity tags more accurately.
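The balancing idea above can be sketched as follows. The patent leaves the exact sampling ratio open, so this illustration makes the simplest choice: down-sample every tag to the size of the rarest tag; the data and tag names are hypothetical.

```python
import random
from collections import Counter

random.seed(3)

def balance_by_tag(samples, rng=random):
    """Down-sample over-represented entity tags so the tag distribution is
    as even as possible, in the spirit of the patent's proportional
    negative-example sampling. `samples` is a list of (feature_vector, tag)
    pairs; cutting every tag to the size of the rarest tag is one possible
    ratio, chosen here for simplicity."""
    counts = Counter(tag for _, tag in samples)
    target = min(counts.values())
    kept, seen = [], Counter()
    shuffled = samples[:]
    rng.shuffle(shuffled)  # random draw, as the patent suggests
    for fv, tag in shuffled:
        if seen[tag] < target:
            kept.append((fv, tag))
            seen[tag] += 1
    return kept

# Hypothetical skewed data: 90 "other" words for every 10 product words.
data = [([0.0], "O")] * 90 + [([1.0], "W")] * 10
balanced = balance_by_tag(data)
```

The balanced list then replaces the raw sample list as the training input, so the model sees each entity tag with roughly equal frequency.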
Step E: segment the text string to be predicted to obtain an ordered sequence of words to be tested.
In an embodiment of the present invention, the text string to be predicted may be obtained from a user input sentence and then segmented to obtain the ordered words to be tested.
Step F: query the vector library for each word to be tested in order to build its second feature vector. The second feature vector comprises the word vector of the word to be tested, its part-of-speech vector, and the entity-tag vector of the preceding word.
It should be noted that when the second feature vector is built for the first word to be tested in the ordered sequence, the preset character string "$BEGIN" may be added before it as its preceding word. This operation is similar to adding the preset character string before the first sample word, as described above.
It should also be noted that the format of the second feature vector of a word to be tested must be consistent with that of the first feature vector of a sample word. This means that the kinds of sub-vectors included in the second feature vector, and their concatenation order, must match the first feature vector. For example, when the first feature vector also includes the word vectors and part-of-speech vectors of the sample word's adjacent words, the second feature vector must correspondingly include the word vectors and part-of-speech vectors of the adjacent words of the word to be tested.
Step G: input the second feature vector of each word to be tested into the neural-network named entity recognition model, and output the entity tag of that word.
For better understanding, a specific example of the named entity recognition method is given below.
(1) Use the word2vec tool to obtain the vector library.
(2) Suppose one corpus text string is "iphone price"; segmentation yields two sample words, "iphone" and "price". The part of speech of "iphone" is noun (n), and its entity tag is the product entity tag W. The part of speech of "price" is noun (n), and its entity tag is the other-entity tag O.
(3) First build the first feature vector for "iphone". Because "iphone" is the first sample word, "$BEGIN" (whose word vector, part-of-speech vector, and entity-tag vector are all randomly initialized) must be added before it. Suppose the word-window width in this embodiment is 1. Query the vector library and denote the word vectors of the preceding word "$BEGIN", the current sample word "iphone", and the following word "price" as Xi-1, Xi, Xi+1; denote the part-of-speech vectors of these three words as Zi-1, Zi, Zi+1; and denote the entity-tag vector of the added "$BEGIN" as Ti-1. Concatenating these seven vectors in order gives the first feature vector of "iphone" = [Xi-1, Xi, Xi+1, Zi-1, Zi, Zi+1, Ti-1].
(4) Input the first feature vector into the input layer of the neural network to obtain the output h(X). In this embodiment, the entity tags W/O are converted to the discrete representation 1/0. Because the entity tag of "iphone" is known to be "W", the desired output here is 1. Gradient descent is used to optimize the parameters so that the error is minimized. Running all corpus text strings through the above training process yields the final neural-network named entity recognition model.
(5) Suppose a text string to be predicted is "Nokia white"; segmentation yields two words to be tested, "Nokia" and "white", and the parts of speech of "Nokia" and "white" are known to be noun (n).
(6) The second feature vector for "Nokia" is built as follows: add "$BEGIN" before "Nokia". Query the vector library to obtain the word vectors of "$BEGIN", "Nokia", and "white", then their part-of-speech vectors, and the entity-tag vector of "$BEGIN". Concatenating these seven vectors in order gives the second feature vector of "Nokia".
(7) Input the second feature vector of "Nokia" into the neural-network named entity recognition model obtained in step (4) to predict the entity tag of "Nokia". If the model outputs h(X) = 0.8, which is greater than the midpoint 0.5, "Nokia" is tagged W (product entity). If the model outputs h(X) = 0.2, which is less than the midpoint 0.5, "Nokia" is tagged O (other entity).
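The prediction side of the worked example can be sketched as a greedy left-to-right loop: the tag just predicted supplies the entity-tag feature for the next word, and the 0.5 midpoint rule maps the scalar output to a tag. `feature_fn` and `model_fn` are hypothetical stand-ins for the feature builder and the trained network.

```python
def predict_tags(words, feature_fn, model_fn):
    """Greedy left-to-right decoding: the tag predicted for word i-1 supplies
    the entity-tag feature for word i ("begin" stands in for the "$BEGIN" tag
    before the first word). `feature_fn` and `model_fn` are hypothetical hooks
    for the feature builder and the trained network."""
    tags, prev_tag = [], "begin"
    for w in words:
        h = model_fn(feature_fn(w, prev_tag))  # scalar network output h(X)
        tag = "W" if h > 0.5 else "O"          # the 0.5 midpoint rule
        tags.append(tag)
        prev_tag = tag                         # feeds the next word's features
    return tags

# Toy stand-ins mirroring the example: the "model" fires on "Nokia" (h = 0.8)
# and stays low elsewhere (h = 0.2).
demo = predict_tags(
    ["Nokia", "white"],
    feature_fn=lambda w, prev: w,
    model_fn=lambda f: 0.8 if f == "Nokia" else 0.2,
)
```

At prediction time the gold tags are unavailable, which is why the predicted tag, rather than a known tag, is fed forward; this is the main difference from the training-time feature construction.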
Fig. 2 is a schematic diagram of the main components of the named entity recognition device according to an embodiment of the present invention. As shown in Fig. 2, the named entity recognition device 20 may comprise: a vector-library acquisition module 21, a first segmentation module 22, a first building module 23, a training module 24, a second segmentation module 25, a second building module 26, and a prediction module 27.
The vector-library acquisition module 21 obtains the vector library, which contains a word vector for each of a plurality of words, a part-of-speech vector for each part-of-speech class, and an entity-tag vector for each entity-tag class. Optionally, word2vec is used to determine the word vectors of the words; precomputing with word2vec saves training time.
The first segmentation module 22 segments the training-corpus text string into ordered sample words.
The first building module 23 queries the vector library for each sample word in order to build its first feature vector, which comprises the word vector of the sample word, its part-of-speech vector, and the entity-tag vector of the preceding word.
The training module 24 takes the first feature vectors of all sample words together as the training input of the neural network and solves for the network parameters with the BP algorithm to obtain the neural-network named entity recognition model.
The second segmentation module 25 segments the text string to be predicted into ordered words to be tested.
The second building module 26 queries the vector library for each word to be tested in order to build its second feature vector, which comprises the word vector of the word to be tested, its part-of-speech vector, and the entity-tag vector of the preceding word.
The prediction module 27 inputs the second feature vector of each word to be tested into the neural-network named entity recognition model and outputs the entity tag of that word.
In embodiments of the present invention, can also comprise in first eigenvector: sample word is close to term vector corresponding to word and sample word is close to part of speech vector corresponding to word, and, can also comprise in second feature vector: word to be measured is close to term vector corresponding to word and word to be measured is close to part of speech vector corresponding to word.In this embodiment, first eigenvector and second feature vector have further contemplated word information and the part-of-speech information of contiguous word, and the information of consideration is more comprehensive, cause the recognition result that finally obtains more accurate.
In embodiments of the present invention, first builds module 23 can also be used for: when building first eigenvector for the first sample word in ordered multiple sample words, the last word of first sample word is book character string, and, second builds module 26 can also be used for: when building second feature vector for the word first to be measured in ordered word multiple to be measured, the last word of first word to be measured is book character string.This addresses the problem before first sample word or first word to be measured and originally lack word problem.
In an embodiment of the present invention, the training input of the neural network used by training module 24 further comprises negative example samples. Introducing negative examples keeps the sample distribution as even as possible, which ensures that the model fits all named-entity tags more accurately.
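The patent does not say how the negative examples are chosen; one plausible sketch is to sample non-entity words so that positive and negative counts are roughly even. The sampling strategy below is an illustrative assumption.

```python
import random

def balance_with_negatives(positives, negatives, seed=0):
    """positives/negatives: lists of (feature, tag) pairs.

    Draw as many negative examples as there are positives (or all of them,
    if fewer), so the training input's tag distribution stays even.
    """
    rng = random.Random(seed)
    k = min(len(negatives), len(positives))
    return positives + rng.sample(negatives, k)
```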
In summary, the named entity recognition method and device of the present invention train the model, and predict with it, using a more reasonable feature vector. This feature vector comprises not only the features of the current word itself but also the part-of-speech features of the current word and the entity tag feature of the preceding word. Compared with existing recognition techniques that consider only the word itself, the information taken into account is more comprehensive, so the final recognition result is more accurate; the accuracy is particularly high for entity recognition in the e-commerce domain.
The above embodiments do not limit the scope of protection of the present invention. It should be understood that, depending on design requirements and other factors, those skilled in the art may make various modifications, combinations, sub-combinations, and substitutions. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (8)
1. A named entity recognition method, characterized in that it comprises:
obtaining a vector library, wherein the vector library comprises word vectors respectively corresponding to multiple words, part-of-speech vectors respectively corresponding to multiple classes of part of speech, and entity tag vectors respectively corresponding to multiple classes of entity tag;
segmenting a corpus text string to obtain multiple ordered sample words;
querying the vector library for each sample word, in sequence, to build a first feature vector, wherein the first feature vector comprises the word vector corresponding to the sample word, the part-of-speech vector corresponding to the sample word, and the entity tag vector corresponding to the word preceding the sample word;
taking the first feature vectors corresponding to all sample words, as a whole, as the training input of a neural network, solving for the network parameters using the neural-network back-propagation (BP) algorithm, and obtaining a neural-network named-entity recognition model;
segmenting a text string to be predicted to obtain multiple ordered words to be predicted;
querying the vector library for each word to be predicted, in sequence, to build a second feature vector, wherein the second feature vector comprises the word vector corresponding to the word to be predicted, the part-of-speech vector corresponding to the word to be predicted, and the entity tag vector corresponding to the word preceding the word to be predicted;
inputting the second feature vector corresponding to each word to be predicted into the neural-network named-entity recognition model, and outputting the entity tag of the word to be predicted.
2. The method according to claim 1, characterized in that
the first feature vector further comprises the word vectors and part-of-speech vectors corresponding to the words adjacent to the sample word, and
the second feature vector further comprises the word vectors and part-of-speech vectors corresponding to the words adjacent to the word to be predicted.
3. The method according to claim 1, characterized in that,
when the first feature vector is built for the first sample word among the multiple ordered sample words, a predetermined character string is used as the word preceding the first sample word, and,
when the second feature vector is built for the first word to be predicted among the multiple ordered words to be predicted, a predetermined character string is used as the word preceding the first word to be predicted.
4. The method according to claim 1, characterized in that the training input of the neural network further comprises negative example samples.
5. A named entity recognition device, characterized in that it comprises:
a vector library acquisition module, configured to obtain a vector library, wherein the vector library comprises word vectors respectively corresponding to multiple words, part-of-speech vectors respectively corresponding to multiple classes of part of speech, and entity tag vectors respectively corresponding to multiple classes of entity tag;
a first word segmentation module, configured to segment a corpus text string into multiple ordered sample words;
a first building module, configured to query the vector library for each sample word, in sequence, to build a first feature vector, wherein the first feature vector comprises the word vector corresponding to the sample word, the part-of-speech vector corresponding to the sample word, and the entity tag vector corresponding to the word preceding the sample word;
a training module, configured to take the first feature vectors corresponding to all sample words, as a whole, as the training input of a neural network, solve for the network parameters using the neural-network back-propagation (BP) algorithm, and obtain a neural-network named-entity recognition model;
a second word segmentation module, configured to segment a text string to be predicted into multiple ordered words to be predicted;
a second building module, configured to query the vector library for each word to be predicted, in sequence, to build a second feature vector, wherein the second feature vector comprises the word vector corresponding to the word to be predicted, the part-of-speech vector corresponding to the word to be predicted, and the entity tag vector corresponding to the word preceding the word to be predicted;
a prediction module, configured to input the second feature vector corresponding to each word to be predicted into the neural-network named-entity recognition model and output the entity tag of the word to be predicted.
6. The device according to claim 5, characterized in that
the first feature vector further comprises the word vectors and part-of-speech vectors corresponding to the words adjacent to the sample word, and
the second feature vector further comprises the word vectors and part-of-speech vectors corresponding to the words adjacent to the word to be predicted.
7. The device according to claim 5, characterized in that
the first building module is further configured to use a predetermined character string as the word preceding the first sample word when building the first feature vector for the first sample word among the multiple ordered sample words, and
the second building module is further configured to use a predetermined character string as the word preceding the first word to be predicted when building the second feature vector for the first word to be predicted among the multiple ordered words to be predicted.
8. The device according to claim 5, characterized in that, in the training module, the training input of the neural network further comprises negative example samples.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510321448.8A CN104899304B (en) | 2015-06-12 | 2015-06-12 | Name entity recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510321448.8A CN104899304B (en) | 2015-06-12 | 2015-06-12 | Name entity recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104899304A true CN104899304A (en) | 2015-09-09 |
CN104899304B CN104899304B (en) | 2018-02-16 |
Family
ID=54031966
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510321448.8A Active CN104899304B (en) | 2015-06-12 | 2015-06-12 | Name entity recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104899304B (en) |
Cited By (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105468780A (en) * | 2015-12-18 | 2016-04-06 | 北京理工大学 | Normalization method and device of product name entity in microblog text |
CN105550172A (en) * | 2016-01-13 | 2016-05-04 | 夏峰 | Distributive text detection method and system |
CN105550227A (en) * | 2015-12-07 | 2016-05-04 | 中国建设银行股份有限公司 | Named entity identification method and device |
CN105701075A (en) * | 2016-01-13 | 2016-06-22 | 夏峰 | Joint detection method and system for literature |
CN105701086A (en) * | 2016-01-13 | 2016-06-22 | 夏峰 | Method and system for detecting literature through sliding window |
CN105701213A (en) * | 2016-01-13 | 2016-06-22 | 夏峰 | Literature comparison method and system |
CN105701077A (en) * | 2016-01-13 | 2016-06-22 | 夏峰 | Multi-language literature detection method and system |
CN105701087A (en) * | 2016-01-13 | 2016-06-22 | 夏峰 | Formula plagiarism detection method and system |
CN106095988A (en) * | 2016-06-21 | 2016-11-09 | 上海智臻智能网络科技股份有限公司 | Automatic question-answering method and device |
CN106202255A (en) * | 2016-06-30 | 2016-12-07 | 昆明理工大学 | Merge the Vietnamese name entity recognition method of physical characteristics |
CN106202054A (en) * | 2016-07-25 | 2016-12-07 | 哈尔滨工业大学 | A kind of name entity recognition method learnt based on the degree of depth towards medical field |
CN106294313A (en) * | 2015-06-26 | 2017-01-04 | 微软技术许可有限责任公司 | Study embeds for entity and the word of entity disambiguation |
CN106557462A (en) * | 2016-11-02 | 2017-04-05 | 数库(上海)科技有限公司 | Name entity recognition method and system |
CN106570170A (en) * | 2016-11-09 | 2017-04-19 | 武汉泰迪智慧科技有限公司 | Text classification and naming entity recognition integrated method and system based on depth cyclic neural network |
CN106682220A (en) * | 2017-01-04 | 2017-05-17 | 华南理工大学 | Online traditional Chinese medicine text named entity identifying method based on deep learning |
CN106815194A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | Model training method and device and keyword recognition method and device |
CN106815193A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | Model training method and device and wrong word recognition methods and device |
CN106844351A (en) * | 2017-02-24 | 2017-06-13 | 黑龙江特士信息技术有限公司 | A kind of medical institutions towards multi-data source organize class entity recognition method and device |
CN106933803A (en) * | 2017-02-24 | 2017-07-07 | 黑龙江特士信息技术有限公司 | A kind of medical equipment class entity recognition method and device towards multi-data source |
CN106933802A (en) * | 2017-02-24 | 2017-07-07 | 黑龙江特士信息技术有限公司 | A kind of social security class entity recognition method and device towards multi-data source |
CN107122582A (en) * | 2017-02-24 | 2017-09-01 | 黑龙江特士信息技术有限公司 | Towards the diagnosis and treatment class entity recognition method and device of multi-data source |
CN107195296A (en) * | 2016-03-15 | 2017-09-22 | 阿里巴巴集团控股有限公司 | A kind of audio recognition method, device, terminal and system |
CN107291693A (en) * | 2017-06-15 | 2017-10-24 | 广州赫炎大数据科技有限公司 | A kind of semantic computation method for improving term vector model |
CN107506345A (en) * | 2016-06-14 | 2017-12-22 | 科大讯飞股份有限公司 | The construction method and device of language model |
CN107766559A (en) * | 2017-11-06 | 2018-03-06 | 第四范式(北京)技术有限公司 | Training method, trainer, dialogue method and the conversational system of dialog model |
CN107818080A (en) * | 2017-09-22 | 2018-03-20 | 新译信息科技(北京)有限公司 | Term recognition methods and device |
CN107832289A (en) * | 2017-10-12 | 2018-03-23 | 北京知道未来信息技术有限公司 | A kind of name entity recognition method based on LSTM CNN |
CN107885721A (en) * | 2017-10-12 | 2018-04-06 | 北京知道未来信息技术有限公司 | A kind of name entity recognition method based on LSTM |
CN107886943A (en) * | 2017-11-21 | 2018-04-06 | 广州势必可赢网络科技有限公司 | Voiceprint recognition method and device |
CN107908614A (en) * | 2017-10-12 | 2018-04-13 | 北京知道未来信息技术有限公司 | A kind of name entity recognition method based on Bi LSTM |
CN107967251A (en) * | 2017-10-12 | 2018-04-27 | 北京知道未来信息技术有限公司 | A kind of name entity recognition method based on Bi-LSTM-CNN |
CN108074565A (en) * | 2016-11-11 | 2018-05-25 | 上海诺悦智能科技有限公司 | Phonetic order redirects the method and system performed with detailed instructions |
CN108228682A (en) * | 2016-12-21 | 2018-06-29 | 财团法人工业技术研究院 | Character string verification method, character string expansion method and verification model training method |
CN108428137A (en) * | 2017-02-14 | 2018-08-21 | 阿里巴巴集团控股有限公司 | Generate the method and device of abbreviation, verification electronic banking rightness of business |
CN108920457A (en) * | 2018-06-15 | 2018-11-30 | 腾讯大地通途(北京)科技有限公司 | Address Recognition method and apparatus and storage medium |
CN109101481A (en) * | 2018-06-25 | 2018-12-28 | 北京奇艺世纪科技有限公司 | A kind of name entity recognition method, device and electronic equipment |
CN109657230A (en) * | 2018-11-06 | 2019-04-19 | 众安信息技术服务有限公司 | Merge the name entity recognition method and device of term vector and part of speech vector |
CN110083820A (en) * | 2018-01-26 | 2019-08-02 | 普天信息技术有限公司 | A kind of improved method and device of benchmark participle model |
CN110162772A (en) * | 2018-12-13 | 2019-08-23 | 北京三快在线科技有限公司 | Name entity recognition method and device |
RU2699687C1 (en) * | 2018-06-18 | 2019-09-09 | Общество с ограниченной ответственностью "Аби Продакшн" | Detecting text fields using neural networks |
CN110276066A (en) * | 2018-03-16 | 2019-09-24 | 北京国双科技有限公司 | The analysis method and relevant apparatus of entity associated relationship |
CN110309515A (en) * | 2019-07-10 | 2019-10-08 | 北京奇艺世纪科技有限公司 | Entity recognition method and device |
CN111079418A (en) * | 2019-11-06 | 2020-04-28 | 科大讯飞股份有限公司 | Named body recognition method and device, electronic equipment and storage medium |
CN111444720A (en) * | 2020-03-30 | 2020-07-24 | 华南理工大学 | Named entity recognition method for English text |
CN113408273A (en) * | 2021-06-30 | 2021-09-17 | 北京百度网讯科技有限公司 | Entity recognition model training and entity recognition method and device |
US11675978B2 (en) | 2021-01-06 | 2023-06-13 | International Business Machines Corporation | Entity recognition based on multi-task learning and self-consistent verification |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050209844A1 (en) * | 2004-03-16 | 2005-09-22 | Google Inc., A Delaware Corporation | Systems and methods for translating chinese pinyin to chinese characters |
US7171350B2 (en) * | 2002-05-03 | 2007-01-30 | Industrial Technology Research Institute | Method for named-entity recognition and verification |
CN101075228A (en) * | 2006-05-15 | 2007-11-21 | 松下电器产业株式会社 | Method and apparatus for named entity recognition in natural language |
CN101576910A (en) * | 2009-05-31 | 2009-11-11 | 北京学之途网络科技有限公司 | Method and device for identifying product naming entity automatically |
CN104615589A (en) * | 2015-02-15 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Named-entity recognition model training method and named-entity recognition method and device |
- 2015-06-12 CN CN201510321448.8A patent/CN104899304B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7171350B2 (en) * | 2002-05-03 | 2007-01-30 | Industrial Technology Research Institute | Method for named-entity recognition and verification |
US20050209844A1 (en) * | 2004-03-16 | 2005-09-22 | Google Inc., A Delaware Corporation | Systems and methods for translating chinese pinyin to chinese characters |
CN101075228A (en) * | 2006-05-15 | 2007-11-21 | 松下电器产业株式会社 | Method and apparatus for named entity recognition in natural language |
CN101576910A (en) * | 2009-05-31 | 2009-11-11 | 北京学之途网络科技有限公司 | Method and device for identifying product naming entity automatically |
CN104615589A (en) * | 2015-02-15 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Named-entity recognition model training method and named-entity recognition method and device |
Non-Patent Citations (2)
Title |
---|
Yao Lin et al.: "Chinese Named Entity Recognition with Word-Boundary Character Vectors", 《智能***学报》 * |
Bi Haibin et al.: "Chinese Entity Relation Extraction Based on Semantics and SVM", Proceedings of the 18th National Conference on Information Storage Technology * |
Cited By (68)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106294313A (en) * | 2015-06-26 | 2017-01-04 | 微软技术许可有限责任公司 | Study embeds for entity and the word of entity disambiguation |
CN106815193A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | Model training method and device and wrong word recognition methods and device |
CN106815194A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | Model training method and device and keyword recognition method and device |
CN105550227A (en) * | 2015-12-07 | 2016-05-04 | 中国建设银行股份有限公司 | Named entity identification method and device |
CN105468780B (en) * | 2015-12-18 | 2019-01-29 | 北京理工大学 | The normalization method and device of ProductName entity in a kind of microblogging text |
CN105468780A (en) * | 2015-12-18 | 2016-04-06 | 北京理工大学 | Normalization method and device of product name entity in microblog text |
CN105701077B (en) * | 2016-01-13 | 2018-04-13 | 夏峰 | A kind of multilingual literature detection method and system |
CN105701087A (en) * | 2016-01-13 | 2016-06-22 | 夏峰 | Formula plagiarism detection method and system |
CN105701077A (en) * | 2016-01-13 | 2016-06-22 | 夏峰 | Multi-language literature detection method and system |
CN105550172B (en) * | 2016-01-13 | 2018-06-01 | 夏峰 | A kind of distributed text detection method and system |
CN105701086B (en) * | 2016-01-13 | 2018-06-01 | 夏峰 | A kind of sliding window document detection method and system |
CN105701213A (en) * | 2016-01-13 | 2016-06-22 | 夏峰 | Literature comparison method and system |
CN105701087B (en) * | 2016-01-13 | 2018-03-16 | 夏峰 | A kind of formula plagiarizes detection method and system |
CN105701075B (en) * | 2016-01-13 | 2018-04-13 | 夏峰 | A kind of document associated detecting method and system |
CN105701213B (en) * | 2016-01-13 | 2018-12-28 | 夏峰 | A kind of document control methods and system |
CN105701086A (en) * | 2016-01-13 | 2016-06-22 | 夏峰 | Method and system for detecting literature through sliding window |
CN105701075A (en) * | 2016-01-13 | 2016-06-22 | 夏峰 | Joint detection method and system for literature |
CN105550172A (en) * | 2016-01-13 | 2016-05-04 | 夏峰 | Distributive text detection method and system |
CN107195296A (en) * | 2016-03-15 | 2017-09-22 | 阿里巴巴集团控股有限公司 | A kind of audio recognition method, device, terminal and system |
CN107506345A (en) * | 2016-06-14 | 2017-12-22 | 科大讯飞股份有限公司 | The construction method and device of language model |
CN106095988A (en) * | 2016-06-21 | 2016-11-09 | 上海智臻智能网络科技股份有限公司 | Automatic question-answering method and device |
CN106202255A (en) * | 2016-06-30 | 2016-12-07 | 昆明理工大学 | Merge the Vietnamese name entity recognition method of physical characteristics |
CN106202054B (en) * | 2016-07-25 | 2018-12-14 | 哈尔滨工业大学 | A kind of name entity recognition method towards medical field based on deep learning |
CN106202054A (en) * | 2016-07-25 | 2016-12-07 | 哈尔滨工业大学 | A kind of name entity recognition method learnt based on the degree of depth towards medical field |
CN106557462A (en) * | 2016-11-02 | 2017-04-05 | 数库(上海)科技有限公司 | Name entity recognition method and system |
CN106570170A (en) * | 2016-11-09 | 2017-04-19 | 武汉泰迪智慧科技有限公司 | Text classification and naming entity recognition integrated method and system based on depth cyclic neural network |
CN108074565A (en) * | 2016-11-11 | 2018-05-25 | 上海诺悦智能科技有限公司 | Phonetic order redirects the method and system performed with detailed instructions |
CN108228682B (en) * | 2016-12-21 | 2020-09-29 | 财团法人工业技术研究院 | Character string verification method, character string expansion method and verification model training method |
CN108228682A (en) * | 2016-12-21 | 2018-06-29 | 财团法人工业技术研究院 | Character string verification method, character string expansion method and verification model training method |
CN106682220A (en) * | 2017-01-04 | 2017-05-17 | 华南理工大学 | Online traditional Chinese medicine text named entity identifying method based on deep learning |
CN108428137A (en) * | 2017-02-14 | 2018-08-21 | 阿里巴巴集团控股有限公司 | Generate the method and device of abbreviation, verification electronic banking rightness of business |
CN106844351A (en) * | 2017-02-24 | 2017-06-13 | 黑龙江特士信息技术有限公司 | A kind of medical institutions towards multi-data source organize class entity recognition method and device |
CN107122582A (en) * | 2017-02-24 | 2017-09-01 | 黑龙江特士信息技术有限公司 | Towards the diagnosis and treatment class entity recognition method and device of multi-data source |
CN106933803A (en) * | 2017-02-24 | 2017-07-07 | 黑龙江特士信息技术有限公司 | A kind of medical equipment class entity recognition method and device towards multi-data source |
CN106933803B (en) * | 2017-02-24 | 2020-02-21 | 黑龙江特士信息技术有限公司 | Medical equipment type entity identification method and device oriented to multiple data sources |
CN106844351B (en) * | 2017-02-24 | 2020-02-21 | 易保互联医疗信息科技(北京)有限公司 | Medical institution organization entity identification method and device oriented to multiple data sources |
CN106933802B (en) * | 2017-02-24 | 2020-02-21 | 黑龙江特士信息技术有限公司 | Multi-data-source-oriented social security entity identification method and device |
CN107122582B (en) * | 2017-02-24 | 2019-12-06 | 黑龙江特士信息技术有限公司 | diagnosis and treatment entity identification method and device facing multiple data sources |
CN106933802A (en) * | 2017-02-24 | 2017-07-07 | 黑龙江特士信息技术有限公司 | A kind of social security class entity recognition method and device towards multi-data source |
CN107291693B (en) * | 2017-06-15 | 2021-01-12 | 广州赫炎大数据科技有限公司 | Semantic calculation method for improved word vector model |
CN107291693A (en) * | 2017-06-15 | 2017-10-24 | 广州赫炎大数据科技有限公司 | A kind of semantic computation method for improving term vector model |
CN107818080A (en) * | 2017-09-22 | 2018-03-20 | 新译信息科技(北京)有限公司 | Term recognition methods and device |
CN107967251A (en) * | 2017-10-12 | 2018-04-27 | 北京知道未来信息技术有限公司 | A kind of name entity recognition method based on Bi-LSTM-CNN |
CN107908614A (en) * | 2017-10-12 | 2018-04-13 | 北京知道未来信息技术有限公司 | A kind of name entity recognition method based on Bi LSTM |
CN107832289A (en) * | 2017-10-12 | 2018-03-23 | 北京知道未来信息技术有限公司 | A kind of name entity recognition method based on LSTM CNN |
CN107885721A (en) * | 2017-10-12 | 2018-04-06 | 北京知道未来信息技术有限公司 | A kind of name entity recognition method based on LSTM |
CN107766559A (en) * | 2017-11-06 | 2018-03-06 | 第四范式(北京)技术有限公司 | Training method, trainer, dialogue method and the conversational system of dialog model |
CN107766559B (en) * | 2017-11-06 | 2019-12-13 | 第四范式(北京)技术有限公司 | training method, training device, dialogue method and dialogue system for dialogue model |
CN107886943A (en) * | 2017-11-21 | 2018-04-06 | 广州势必可赢网络科技有限公司 | Voiceprint recognition method and device |
CN110083820A (en) * | 2018-01-26 | 2019-08-02 | 普天信息技术有限公司 | A kind of improved method and device of benchmark participle model |
CN110083820B (en) * | 2018-01-26 | 2023-06-27 | 普天信息技术有限公司 | Improvement method and device of benchmark word segmentation model |
CN110276066B (en) * | 2018-03-16 | 2021-07-27 | 北京国双科技有限公司 | Entity association relation analysis method and related device |
CN110276066A (en) * | 2018-03-16 | 2019-09-24 | 北京国双科技有限公司 | The analysis method and relevant apparatus of entity associated relationship |
CN108920457A (en) * | 2018-06-15 | 2018-11-30 | 腾讯大地通途(北京)科技有限公司 | Address Recognition method and apparatus and storage medium |
RU2699687C1 (en) * | 2018-06-18 | 2019-09-09 | Общество с ограниченной ответственностью "Аби Продакшн" | Detecting text fields using neural networks |
CN109101481A (en) * | 2018-06-25 | 2018-12-28 | 北京奇艺世纪科技有限公司 | A kind of name entity recognition method, device and electronic equipment |
CN109101481B (en) * | 2018-06-25 | 2022-07-22 | 北京奇艺世纪科技有限公司 | Named entity identification method and device and electronic equipment |
CN109657230A (en) * | 2018-11-06 | 2019-04-19 | 众安信息技术服务有限公司 | Merge the name entity recognition method and device of term vector and part of speech vector |
CN110162772B (en) * | 2018-12-13 | 2020-06-26 | 北京三快在线科技有限公司 | Named entity identification method and device |
CN110162772A (en) * | 2018-12-13 | 2019-08-23 | 北京三快在线科技有限公司 | Name entity recognition method and device |
CN110309515A (en) * | 2019-07-10 | 2019-10-08 | 北京奇艺世纪科技有限公司 | Entity recognition method and device |
CN110309515B (en) * | 2019-07-10 | 2023-08-11 | 北京奇艺世纪科技有限公司 | Entity identification method and device |
CN111079418B (en) * | 2019-11-06 | 2023-12-05 | 科大讯飞股份有限公司 | Named entity recognition method, device, electronic equipment and storage medium |
CN111079418A (en) * | 2019-11-06 | 2020-04-28 | 科大讯飞股份有限公司 | Named body recognition method and device, electronic equipment and storage medium |
CN111444720A (en) * | 2020-03-30 | 2020-07-24 | 华南理工大学 | Named entity recognition method for English text |
US11675978B2 (en) | 2021-01-06 | 2023-06-13 | International Business Machines Corporation | Entity recognition based on multi-task learning and self-consistent verification |
CN113408273B (en) * | 2021-06-30 | 2022-08-23 | 北京百度网讯科技有限公司 | Training method and device of text entity recognition model and text entity recognition method and device |
CN113408273A (en) * | 2021-06-30 | 2021-09-17 | 北京百度网讯科技有限公司 | Entity recognition model training and entity recognition method and device |
Also Published As
Publication number | Publication date |
---|---|
CN104899304B (en) | 2018-02-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104899304A (en) | Named entity identification method and device | |
CN111125331B (en) | Semantic recognition method, semantic recognition device, electronic equipment and computer readable storage medium | |
CN109685056A (en) | Obtain the method and device of document information | |
CN110427623A (en) | Semi-structured document Knowledge Extraction Method, device, electronic equipment and storage medium | |
CN111666427B (en) | Entity relationship joint extraction method, device, equipment and medium | |
CN104615589A (en) | Named-entity recognition model training method and named-entity recognition method and device | |
CN111563384B (en) | Evaluation object identification method and device for E-commerce products and storage medium | |
CN111475617A (en) | Event body extraction method and device and storage medium | |
CN112800239B (en) | Training method of intention recognition model, and intention recognition method and device | |
CN112650858B (en) | Emergency assistance information acquisition method and device, computer equipment and medium | |
CN112990035B (en) | Text recognition method, device, equipment and storage medium | |
WO2021027125A1 (en) | Sequence labeling method and apparatus, computer device and storage medium | |
CN111160041B (en) | Semantic understanding method and device, electronic equipment and storage medium | |
CN110110213B (en) | Method and device for mining user occupation, computer readable storage medium and terminal equipment | |
CN109408802A (en) | A kind of method, system and storage medium promoting sentence vector semanteme | |
CN109508458A (en) | The recognition methods of legal entity and device | |
CN113449528B (en) | Address element extraction method and device, computer equipment and storage medium | |
CN109308311A (en) | A kind of multi-source heterogeneous data fusion system | |
CN113723077B (en) | Sentence vector generation method and device based on bidirectional characterization model and computer equipment | |
CN114240672A (en) | Method for identifying green asset proportion and related product | |
CN116644148A (en) | Keyword recognition method and device, electronic equipment and storage medium | |
CN113032523B (en) | Extraction method and device of triple information, electronic equipment and storage medium | |
CN114936271A (en) | Method, apparatus and medium for natural language translation database query | |
CN111708819B (en) | Method, apparatus, electronic device, and storage medium for information processing | |
CN110019829A (en) | Data attribute determines method, apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |