CN104899304A - Named entity identification method and device - Google Patents

Named entity identification method and device

Info

Publication number
CN104899304A
CN104899304A · Application CN201510321448.8A · Granted publication CN104899304B
Authority
CN
China
Prior art keywords
word
sample word
to-be-predicted word
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510321448.8A
Other languages
Chinese (zh)
Other versions
CN104899304B (en)
Inventor
姜文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201510321448.8A priority Critical patent/CN104899304B/en
Publication of CN104899304A publication Critical patent/CN104899304A/en
Application granted granted Critical
Publication of CN104899304B publication Critical patent/CN104899304B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a named entity identification method and a named entity identification device capable of accurately identifying named entities, in particular named entities in the e-commerce field. The method comprises: acquiring a vector library; segmenting a training-corpus text string into a plurality of ordered sample words; querying the vector library for each sample word in turn to build a first feature vector, which comprises the word vector and part-of-speech vector corresponding to that sample word as well as the entity tag vector corresponding to the preceding word; taking all the first feature vectors together as the training input and training a neural-network named entity recognition model; segmenting a to-be-predicted text string into a plurality of to-be-predicted words; querying the vector library for each to-be-predicted word in turn to build a second feature vector, which comprises the word vector and part-of-speech vector corresponding to that to-be-predicted word as well as the entity tag vector corresponding to the preceding word; and inputting the second feature vector of each to-be-predicted word into the model to output the entity tag of that word.

Description

Named entity recognition method and device
Technical field
The present invention relates to the field of natural language processing technology, and in particular to a named entity recognition method and device.
Background technology
With the rapid development of Internet technology, information services are becoming increasingly widespread. Named entity recognition is fundamental to information-service applications such as information extraction, question answering systems, syntactic analysis, machine translation and Internet metadata annotation. A named entity (entity for short) refers to person names, organization names, place names and all other entities identified by a name; in a broader sense, named entities also include numbers, dates, currencies, addresses and the like.
Techniques that train named entity recognition with neural network technology already exist in the prior art. Existing methods have at least the following shortcomings: (1) they rely mainly on the word itself as the input feature, so the model features are limited and the forward-backward dependency between entity tags is not introduced directly, which leads to low recognition accuracy, especially when recognizing named entities in the e-commerce field; (2) because the initial values of the network are generated randomly, the final parameter optimization result may not be good enough, and the long training time lowers development efficiency; (3) the distribution of the training data is not fully taken into account, so the model fits different entities unevenly.
Named entities in the e-commerce field, such as product names (Nokia 1020, ThinkPad E431 14-inch notebook computer), prices and product attributes, are usually made up of one or more consecutive words in a sentence, and their part-of-speech pattern is typically of the form "noun + number". In short, named entities in the e-commerce field have distinctive characteristics, and a recognition method or device tailored to them is urgently needed.
Summary of the invention
In view of this, the present invention provides a named entity recognition method and device that can accurately recognize named entities, in particular named entities in the e-commerce field.
To achieve the above object, according to one aspect of the present invention, a named entity recognition method is provided, comprising: acquiring a vector library, the vector library comprising word vectors respectively corresponding to a plurality of words, part-of-speech vectors respectively corresponding to a plurality of part-of-speech classes, and entity tag vectors respectively corresponding to a plurality of entity tag classes; segmenting a training-corpus text string into a plurality of ordered sample words; querying the vector library for each sample word in turn to build a first feature vector, the first feature vector comprising the word vector corresponding to the sample word, the part-of-speech vector corresponding to the sample word and the entity tag vector corresponding to the word preceding the sample word; taking all the first feature vectors corresponding to the sample words as a whole as the training input of a neural network, solving the network parameters with the neural network back-propagation (BP) algorithm, and obtaining a neural-network named entity recognition model; segmenting a to-be-predicted text string into a plurality of ordered to-be-predicted words; querying the vector library for each to-be-predicted word in turn to build a second feature vector, the second feature vector comprising the word vector corresponding to the to-be-predicted word, the part-of-speech vector corresponding to the to-be-predicted word and the entity tag vector corresponding to the word preceding the to-be-predicted word; and inputting the second feature vector corresponding to each to-be-predicted word into the neural-network named entity recognition model respectively, and outputting the entity tag of the to-be-predicted word.
Optionally, the first feature vector further comprises the word vectors and part-of-speech vectors corresponding to the words adjacent to the sample word, and the second feature vector further comprises the word vectors and part-of-speech vectors corresponding to the words adjacent to the to-be-predicted word.
Optionally, when the first feature vector is built for the first of the ordered sample words, the word preceding the first sample word is a preset character string; and when the second feature vector is built for the first of the ordered to-be-predicted words, the word preceding the first to-be-predicted word is a preset character string.
Optionally, the training input of the neural network further comprises negative example samples.
To achieve the above object, according to another aspect of the present invention, a named entity recognition device is provided, comprising: a vector library acquisition module for acquiring a vector library, the vector library comprising word vectors respectively corresponding to a plurality of words, part-of-speech vectors respectively corresponding to a plurality of part-of-speech classes, and entity tag vectors respectively corresponding to a plurality of entity tag classes; a first word segmentation module for segmenting a training-corpus text string into a plurality of ordered sample words; a first building module for querying the vector library for each sample word in turn to build a first feature vector, the first feature vector comprising the word vector corresponding to the sample word, the part-of-speech vector corresponding to the sample word and the entity tag vector corresponding to the word preceding the sample word; a training module for taking all the first feature vectors corresponding to the sample words as a whole as the training input of a neural network, solving the network parameters with the neural network back-propagation (BP) algorithm, and obtaining a neural-network named entity recognition model; a second word segmentation module for segmenting a to-be-predicted text string into a plurality of ordered to-be-predicted words; a second building module for querying the vector library for each to-be-predicted word in turn to build a second feature vector, the second feature vector comprising the word vector corresponding to the to-be-predicted word, the part-of-speech vector corresponding to the to-be-predicted word and the entity tag vector corresponding to the word preceding the to-be-predicted word; and a prediction module for inputting the second feature vector corresponding to each to-be-predicted word into the neural-network named entity recognition model respectively, and outputting the entity tag of the to-be-predicted word.
Optionally, the first feature vector further comprises the word vectors and part-of-speech vectors corresponding to the words adjacent to the sample word, and the second feature vector further comprises the word vectors and part-of-speech vectors corresponding to the words adjacent to the to-be-predicted word.
Optionally, the first building module is further configured to use a preset character string as the word preceding the first sample word when building the first feature vector for the first of the ordered sample words, and the second building module is further configured to use a preset character string as the word preceding the first to-be-predicted word when building the second feature vector for the first of the ordered to-be-predicted words.
Optionally, in the training module, the training input of the neural network further comprises negative example samples.
According to the technical solution of the present invention, a more reasonable feature vector is used both to train the model and to make predictions: it contains not only the features of the current word itself but also the part-of-speech feature of the current word and the entity tag feature of the word preceding it. Compared with existing recognition techniques that consider only the word itself, the information taken into account is more comprehensive, so the final recognition result is more accurate; the accuracy is particularly high when recognizing entities in the e-commerce field.
Brief description of the drawings
The accompanying drawings are provided for a better understanding of the present invention and do not constitute an undue limitation of the present invention, wherein:
Fig. 1 is a flowchart of the main steps of a named entity recognition method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main components of a named entity recognition device according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are explained below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will appreciate that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
For better understanding by those skilled in the art, the relevant terms are briefly introduced first.
Word: the surface form of the word itself.
Word vector: the vectorized representation of a word; each word is represented by a multi-dimensional vector.
Part of speech: the grammatical category of a word. Words are usually divided into two classes covering 12 parts of speech. One class is content words: nouns, verbs, adjectives, numerals, adverbs, onomatopoeia, measure words and pronouns. The other class is function words: prepositions, conjunctions, auxiliary words and interjections.
Part-of-speech vector: the vectorized representation of a part of speech; each part of speech is represented by a multi-dimensional vector, preferably a discrete multi-dimensional vector.
Entity tag: each entity tag denotes one type of entity. For example, WID denotes a commodity ID, WB denotes the first word of a product name, WI denotes a middle word of a product name, WE denotes the last word of a product name, and O denotes any other word. For example: Xiaomi (WB) 2s (WI) red (WI) mobile phone (WE) how-about-it (O).
Entity tag vector: the vectorized representation of an entity tag; each entity tag is represented by a multi-dimensional vector, preferably a discrete multi-dimensional vector.
It should be noted that the dimensions of the word vectors, part-of-speech vectors and entity tag vectors do not need to be the same and can be set flexibly as required.
Fig. 1 is a flowchart of the main steps of a named entity recognition method according to an embodiment of the present invention. As shown in Fig. 1, the named entity recognition method may comprise steps A to G.
Step A: acquire a vector library. The vector library comprises word vectors respectively corresponding to a plurality of words, part-of-speech vectors respectively corresponding to a plurality of part-of-speech classes, and entity tag vectors respectively corresponding to a plurality of entity tag classes.
In an embodiment of the invention, for a given corpus, word2vec can be used to determine the word vector corresponding to each word in the corpus. Word2vec is a tool open-sourced by Google in 2013 that characterizes words as real-valued vectors; it maps words into a K-dimensional vector space, and vector operations between words can even correspond to semantic relations. Pre-computing word vectors with word2vec therefore saves time, improves efficiency and can improve accuracy. The part-of-speech vectors and entity tag vectors can be obtained by random initialization. The word vectors, part-of-speech vectors and entity tag vectors obtained by the above process are stored in the vector library for later use.
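A minimal sketch of step A follows. It assumes the gensim implementation of word2vec (the embodiment only requires a word2vec tool) and uses illustrative vector dimensions and tag sets; the preset string "$BEGIN" is added to every lookup table so that the later steps can query it.

```python
import numpy as np
from gensim.models import Word2Vec  # assumption: gensim's word2vec stands in for the word2vec tool

# Segmented training corpus (illustrative tokens taken from Table 1).
corpus = [["iphone", "price"], ["Huawei", "Honor", "6"], ["Xiaomi", "1s", "red", "phone"]]

# Word vectors: pre-trained with word2vec over the corpus.
w2v = Word2Vec(sentences=corpus, vector_size=50, window=5, min_count=1, seed=1)
word_vec = {w: w2v.wv[w] for w in w2v.wv.index_to_key}

rng = np.random.default_rng(1)

# Part-of-speech vectors and entity tag vectors: randomly initialised, as described above.
pos_classes = ["n", "v", "a", "m", "d", "o", "q", "r", "p", "c", "u", "e"]  # 12 illustrative POS codes
pos_vec = {p: rng.normal(size=10) for p in pos_classes}

entity_tags = ["W", "WID", "WB", "WI", "WE", "O", "$BEGIN"]  # "W" is the simplified commodity tag of the worked example
tag_vec = {t: rng.normal(size=5) for t in entity_tags}

# The preset string "$BEGIN" also receives a random word vector and POS vector (see the worked example).
word_vec["$BEGIN"] = rng.normal(size=50)
pos_vec["$BEGIN"] = rng.normal(size=10)

vector_library = {"word": word_vec, "pos": pos_vec, "tag": tag_vec}
```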
Step B: segment the training-corpus text string into a plurality of ordered sample words.
In an embodiment of the present invention, training-corpus text strings can be extracted from e-commerce website data and then segmented to obtain a plurality of ordered sample words, as shown in Table 1 (a segmentation sketch follows the table):
Table 1: Training-corpus text strings and sample words
Training-corpus text string | Ordered sample words
"iphone price" | "iphone", "price"
"Huawei Honor 6" | "Huawei", "Honor", "6"
"Xiaomi 1s red mobile phone" | "Xiaomi", "1s", "red", "mobile phone"
...... | ......
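A sketch of step B, assuming the jieba segmenter (the embodiment only requires word segmentation, so any segmenter with a similar interface would do); the example strings mirror Table 1.

```python
import jieba  # assumption: jieba is used as the word segmenter

corpus_strings = ["iphone价格", "华为荣耀6", "小米1s红色手机"]   # raw training-corpus text strings
sample_words = [jieba.lcut(s) for s in corpus_strings]          # ordered sample words per string
# Expected to be close to [['iphone', '价格'], ['华为', '荣耀', '6'], ['小米', '1s', '红色', '手机']]
# (the exact segmentation depends on jieba's dictionary).
```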
Step C: query the vector library for each sample word in turn to build a first feature vector. The first feature vector comprises the word vector corresponding to the sample word, the part-of-speech vector corresponding to the sample word and the entity tag vector corresponding to the word preceding the sample word. Besides the word and part-of-speech information of the sample word itself, the first feature vector thus also carries the entity tag information of the preceding word. The method of the present invention trains the model on the basis of this first feature vector; compared with prior art that trains the model relying only on the information of the word itself, the information considered is more comprehensive, so the final recognition result is more accurate.
It should be noted that "the first feature vector comprises the word vector corresponding to the sample word, the part-of-speech vector corresponding to the sample word and the entity tag vector corresponding to the preceding word" means that the first feature vector is spliced together from those three vectors, for example: first feature vector = [word vector of the sample word, part-of-speech vector of the sample word, entity tag vector of the preceding word]. The present invention does not restrict the splicing order when the vectors are concatenated, and different splicing orders do not affect the principle of the present invention; however, once the splicing order is determined for the whole method it is no longer changed, so that all first feature vectors have a consistent format.
The detailed process of step C is illustrated as follows. Suppose the ordered sample words obtained above are "sample word 1 + sample word 2 + sample word 3 + sample word 4 ..."; first feature vectors then need to be built in turn for sample word 1, sample word 2, sample word 3, sample word 4 and so on. Assume the word window width is set to 0. When building the first feature vector for sample word 1 (i.e. the first sample word), there is originally no word before it, so the preset character string "$BEGIN" is artificially added as the word preceding sample word 1. The entity tag vector of this preset character string "$BEGIN" already exists in the vector library and is generally a randomly initialized vector. For sample word 1, suppose the word vector of sample word 1 queried from the vector library is denoted X1, the part-of-speech vector of sample word 1 is denoted Z1, and the entity tag vector of "$BEGIN" is denoted T0; then the first feature vector of sample word 1 = [X1, Z1, T0]. For sample word 2, suppose the word vector of sample word 2 queried from the vector library is denoted X2, the part-of-speech vector of sample word 2 is Z2, and the entity tag vector of the preceding word (i.e. sample word 1) is denoted T1; then the first feature vector of sample word 2 = [X2, Z2, T1]. By analogy, the first feature vectors corresponding to all the sample words can be obtained.
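The construction above (word window width 0) can be sketched as follows; `vector_library` is the library from step A, `pos_of` maps each word to its part of speech, and the annotated entity tags of the training corpus supply the "preceding tag" for the next word. The function name is illustrative, not part of the patent.

```python
import numpy as np

def first_feature_vectors(words, pos_of, gold_tags, lib):
    """Build the first feature vector of every sample word, window width 0."""
    feats = []
    prev_tag = "$BEGIN"                       # preset string stands in for the missing preceding word
    for word, tag in zip(words, gold_tags):
        x = lib["word"][word]                 # word vector of the current sample word (X)
        z = lib["pos"][pos_of[word]]          # part-of-speech vector of the current sample word (Z)
        t = lib["tag"][prev_tag]              # entity tag vector of the PRECEDING word (T)
        feats.append(np.concatenate([x, z, t]))
        prev_tag = tag                        # the annotated tag becomes the "preceding tag" for the next word
    return feats

# e.g. first_feature_vectors(["iphone", "price"], {"iphone": "n", "price": "n"}, ["W", "O"], vector_library)
```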
In an embodiment of the present invention, the first feature vector may further comprise the word vectors and part-of-speech vectors corresponding to the words adjacent to the sample word. "Further comprise" here again means "additionally spliced from the following vectors". The "adjacent words" of a sample word are the sample words located before or after the current sample word at a distance not greater than the word window width. For example, if the word window width is 1, the adjacent words of a sample word are the word immediately before and the word immediately after the current sample word, and the first feature vector of the current sample word can be written as [word vector of the preceding word, word vector of the current sample word, word vector of the following word, part-of-speech vector of the preceding word, part-of-speech vector of the current sample word, part-of-speech vector of the following word, entity tag vector of the preceding word]. Other values of the word window width can be handled by analogy and are not repeated here. It should be noted that the present invention does not restrict the value of the word window width, which can be set flexibly as required; however, once determined it is no longer changed, so that all first feature vectors have a consistent format. It should also be noted that, when the word window width is increased, preset character strings can be added before the first sample word to serve as the adjacent words located before it, and preset character strings can likewise be added after the last sample word to serve as the adjacent words located after it; those skilled in the art can derive the specific practice from the above, which is not repeated here. In this embodiment, the first feature vector additionally takes into account the word and part-of-speech information of the adjacent words, so the information considered is more comprehensive and the final recognition result is more accurate.
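With a word window width of 1, the seven vectors are spliced in exactly the order listed above. The sketch below assumes that the vector library and the `pos_of` lookup also hold entries for the boundary preset strings ("$BEGIN" before the first word and a hypothetical "$END" after the last one); "$END" is an illustrative name, not taken from the patent.

```python
def windowed_first_feature(i, words, pos_of, prev_tag, lib):
    """First feature vector of words[i] for word window width 1 (seven spliced vectors)."""
    left = words[i - 1] if i > 0 else "$BEGIN"
    right = words[i + 1] if i + 1 < len(words) else "$END"   # hypothetical trailing preset string
    parts = [lib["word"][left], lib["word"][words[i]], lib["word"][right],               # word vectors: prev, current, next
             lib["pos"][pos_of[left]], lib["pos"][pos_of[words[i]]], lib["pos"][pos_of[right]],  # POS vectors: prev, current, next
             lib["tag"][prev_tag]]                                                       # entity tag vector of the preceding word
    return np.concatenate(parts)
```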
Step D: take all the first feature vectors corresponding to the sample words as a whole as the training input of the neural network, solve the network parameters with the neural network back-propagation (BP) algorithm, and obtain the neural-network named entity recognition model. Specifically, the squared error can be used to build the overall objective function of the model, and stochastic gradient descent can be used to solve the parameters of the neural network, yielding the final neural-network named entity recognition model.
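Step D can be sketched as a small feed-forward network trained by back-propagation with a squared-error objective and stochastic gradient descent, as described above; the hidden-layer size, learning rate and epoch count are illustrative assumptions.

```python
import numpy as np

class TinyNERNet:
    """One hidden layer, sigmoid activations, squared-error loss, plain SGD."""
    def __init__(self, n_in, n_hidden=30, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(self, x):
        self.h = self._sigmoid(x @ self.W1 + self.b1)        # hidden activations
        self.o = self._sigmoid(self.h @ self.W2 + self.b2)   # network output h(X)
        return self.o

    def sgd_step(self, x, y):
        """One stochastic-gradient step on the loss 0.5 * (h(X) - y)^2."""
        o = self.forward(x)
        err = o - y                                  # d(loss)/d(output)
        delta_o = err * o * (1.0 - o)                # back-propagate through the output sigmoid
        delta_h = (delta_o @ self.W2.T) * self.h * (1.0 - self.h)
        self.W2 -= self.lr * np.outer(self.h, delta_o)
        self.b2 -= self.lr * delta_o
        self.W1 -= self.lr * np.outer(x, delta_h)
        self.b1 -= self.lr * delta_h
        return float(0.5 * err[0] ** 2)

# Training loop: labels are 1 for an entity tag and 0 for "other" (O), as in the worked example below.
# net = TinyNERNet(n_in=len(feats[0]))
# for epoch in range(20):
#     for x, y in zip(feats, labels):
#         net.sgd_step(np.asarray(x), float(y))
```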
In an embodiment of the present invention, negative example samples may also be included in the training input of the neural network. Because the entity tags in real training-corpus text strings are usually unevenly distributed, the model may fit some named entities poorly. To address this, during model training, negative examples can be sampled at random in proportion to the distribution of the entity tags, so that the distribution is as even as possible and the model fits all named entity tags more accurately.
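A sketch of the negative-example balancing described above: "other" (O) words are down-sampled at random so that positive and negative examples enter the training input in a roughly fixed proportion; the ratio is an illustrative assumption.

```python
import random

def balance_training_input(examples, neg_per_pos=1.0, seed=0):
    """examples: list of (feature_vector, label) pairs, label 1 = entity tag, 0 = other (O)."""
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    rng = random.Random(seed)
    keep = min(len(negatives), int(neg_per_pos * len(positives)))
    balanced = positives + rng.sample(negatives, keep)  # keep all positives, down-sample negatives
    rng.shuffle(balanced)
    return balanced
```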
Step E: segment the to-be-predicted text string into a plurality of ordered to-be-predicted words.
In an embodiment of the present invention, the to-be-predicted text string can be obtained from a sentence input by the user and then segmented to obtain a plurality of ordered to-be-predicted words.
Step F: query the vector library for each to-be-predicted word in turn to build a second feature vector. The second feature vector comprises the word vector corresponding to the to-be-predicted word, the part-of-speech vector corresponding to the to-be-predicted word and the entity tag vector corresponding to the word preceding the to-be-predicted word.
It should be noted that, when the second feature vector is built for the first of the ordered to-be-predicted words, the preset character string "$BEGIN" can be added before the first to-be-predicted word as its preceding word. This operation is similar to adding the preset character string before the first sample word, as described above.
It should also be noted that the format of the second feature vector corresponding to a to-be-predicted word should be consistent with that of the first feature vector corresponding to a sample word. That is, the kinds of component vectors included in the second feature vector and their splicing order must match the first feature vector. For example, when the first feature vector further comprises the word vectors and part-of-speech vectors of the words adjacent to the sample word, the second feature vector correspondingly also comprises the word vectors and part-of-speech vectors of the words adjacent to the to-be-predicted word.
Step G: input the second feature vector corresponding to each to-be-predicted word into the neural-network named entity recognition model respectively, and output the entity tag of the to-be-predicted word.
For better understanding by those skilled in the art, a specific example of the named entity recognition method is given below.
(1) Use the word2vec tool to obtain the vector library.
(2) Suppose one training-corpus text string is "iphone price"; segmentation yields two sample words, "iphone" and "price". The part of speech of "iphone" is noun (n) and its entity tag is the commodity entity tag W. The part of speech of "price" is noun (n) and its entity tag is the other-entity tag O.
(3) First build the first feature vector corresponding to "iphone". Because "iphone" is the first sample word, "$BEGIN" (whose word vector, part-of-speech vector and entity tag vector are all randomly initialized) needs to be added before it. Suppose the word window width in this embodiment is 1. Query the vector library and take out the word vectors corresponding to the preceding word "$BEGIN", the current sample word "iphone" and the following word "price", denoted Xi-1, Xi and Xi+1; the part-of-speech vectors corresponding to these three words, denoted Zi-1, Zi and Zi+1; and the entity tag vector of the added "$BEGIN", denoted Ti-1. These seven vectors are spliced together in order to form the first feature vector corresponding to "iphone" = [Xi-1, Xi, Xi+1, Zi-1, Zi, Zi+1, Ti-1].
(4) Feed the first feature vector into the input layer of the neural network as the input quantity and obtain the output h(X). In this embodiment, the entity tags W/O are converted to the discrete representation 1/0. Since the entity tag of "iphone" is known to be "W", the desired output here is 1. Use the gradient descent algorithm for parameter optimization so that the error is minimized. After running all training-corpus text strings through the above training process, the final neural-network named entity recognition model is obtained. (5) Suppose one to-be-predicted text string is "Nokia white"; the segmentation result is the two to-be-predicted words "Nokia" and "white", and the parts of speech of "Nokia" and "white" are known to be noun (n).
(6) The process of building the second feature vector corresponding to "Nokia" is as follows: add "$BEGIN" before "Nokia"; query the vector library to obtain the word vectors corresponding to "$BEGIN", "Nokia" and "white", then the part-of-speech vectors corresponding to "$BEGIN", "Nokia" and "white", and finally the entity tag vector of "$BEGIN". These seven vectors are spliced together in order to obtain the second feature vector corresponding to "Nokia".
(7) Input the second feature vector corresponding to "Nokia" into the neural-network named entity recognition model obtained in step (4) to predict the entity tag of "Nokia". If the model outputs h(X) = 0.8, which is greater than the midpoint 0.5, "Nokia" is labeled W (commodity entity); if the model outputs h(X) = 0.2, which is less than the midpoint 0.5, "Nokia" is labeled O (other entity).
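Steps (5) to (7) can be sketched end to end as follows, reusing the vector library and network from the earlier sketches: each to-be-predicted word is turned into a second feature vector (here with window width 0, feeding back the predicted tag of the preceding word) and the output h(X) is thresholded at the midpoint 0.5. The simplified W/O tag set matches the worked example; the function name is illustrative.

```python
def predict_entity_tags(words, pos_of, lib, net):
    """Predict the entity tag of each to-be-predicted word (simplified W/O tagging)."""
    tags, prev_tag = [], "$BEGIN"
    for word in words:
        x = np.concatenate([lib["word"][word],        # word vector of the to-be-predicted word
                            lib["pos"][pos_of[word]], # its part-of-speech vector
                            lib["tag"][prev_tag]])    # entity tag vector of the preceding word
        score = net.forward(x)[0]                     # h(X)
        tag = "W" if score > 0.5 else "O"             # threshold at the midpoint 0.5
        tags.append(tag)
        prev_tag = tag                                # the predicted tag feeds the next word's feature
    return tags

# e.g. predict_entity_tags(["Nokia", "white"], {"Nokia": "n", "white": "n"}, vector_library, net)
# (assumes "Nokia" and "white" have word vectors in the library; unseen words would need an OOV vector)
```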
Fig. 2 is a schematic diagram of the main components of a named entity recognition device according to an embodiment of the present invention. As shown in Fig. 2, the named entity recognition device 20 may comprise: a vector library acquisition module 21, a first word segmentation module 22, a first building module 23, a training module 24, a second word segmentation module 25, a second building module 26 and a prediction module 27.
The vector library acquisition module 21 is used to acquire the vector library, which comprises word vectors respectively corresponding to a plurality of words, part-of-speech vectors respectively corresponding to a plurality of part-of-speech classes, and entity tag vectors respectively corresponding to a plurality of entity tag classes. Optionally, word2vec is used to determine the word vectors corresponding to the plurality of words; pre-computing them with word2vec saves training time.
The first word segmentation module 22 is used to segment the training-corpus text string into a plurality of ordered sample words.
The first building module 23 is used to query the vector library for each sample word in turn to build the first feature vector, which comprises the word vector corresponding to the sample word, the part-of-speech vector corresponding to the sample word and the entity tag vector corresponding to the word preceding the sample word.
The training module 24 is used to take all the first feature vectors corresponding to the sample words as a whole as the training input of the neural network, solve the network parameters with the neural network back-propagation (BP) algorithm, and obtain the neural-network named entity recognition model.
The second word segmentation module 25 is used to segment the to-be-predicted text string into a plurality of ordered to-be-predicted words.
The second building module 26 is used to query the vector library for each to-be-predicted word in turn to build the second feature vector, which comprises the word vector corresponding to the to-be-predicted word, the part-of-speech vector corresponding to the to-be-predicted word and the entity tag vector corresponding to the word preceding the to-be-predicted word.
The prediction module 27 is used to input the second feature vector corresponding to each to-be-predicted word into the neural-network named entity recognition model respectively, and to output the entity tag of the to-be-predicted word.
In an embodiment of the present invention, the first feature vector may further comprise the word vectors and part-of-speech vectors corresponding to the words adjacent to the sample word, and the second feature vector may further comprise the word vectors and part-of-speech vectors corresponding to the words adjacent to the to-be-predicted word. In this embodiment, the first and second feature vectors additionally take into account the word and part-of-speech information of the adjacent words, so the information considered is more comprehensive and the final recognition result is more accurate.
In an embodiment of the present invention, the first building module 23 may further be used to take a preset character string as the word preceding the first sample word when building the first feature vector for the first of the ordered sample words, and the second building module 26 may further be used to take a preset character string as the word preceding the first to-be-predicted word when building the second feature vector for the first of the ordered to-be-predicted words. This solves the problem that no word originally exists before the first sample word or the first to-be-predicted word.
In an embodiment of the present invention, in the training module 24, negative example samples are also included in the training input of the neural network. Introducing negative example samples keeps the sample distribution as even as possible, so that the model fits all named entity tags more accurately.
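A structural sketch of the device of Fig. 2, assuming a plain-Python composition in which each method plays the role of one of the modules 21 to 27 and reuses the step sketches above (jieba, first_feature_vectors, predict_entity_tags and TinyNERNet); the class and method names are illustrative, and the sketch assumes the segmented words and their POS tags are covered by the vector library.

```python
class NamedEntityRecognitionDevice:
    """Named entity recognition device 20: modules 21-27 wired together."""

    def __init__(self, vector_library, net, pos_of):
        self.lib = vector_library     # module 21: vector library acquisition
        self.net = net                # model produced/used by modules 24 and 27
        self.pos_of = pos_of          # part-of-speech lookup shared by the building modules

    def train(self, corpus_strings, gold_tags_per_string, epochs=20):
        for _ in range(epochs):
            for text, gold in zip(corpus_strings, gold_tags_per_string):
                words = jieba.lcut(text)                                           # module 22: first segmentation
                feats = first_feature_vectors(words, self.pos_of, gold, self.lib)  # module 23: first building
                for x, tag in zip(feats, gold):                                    # module 24: BP training
                    self.net.sgd_step(np.asarray(x), 1.0 if tag != "O" else 0.0)

    def predict(self, text):
        words = jieba.lcut(text)                                                   # module 25: second segmentation
        return predict_entity_tags(words, self.pos_of, self.lib, self.net)         # modules 26-27: building + prediction
```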
In summary, the named entity recognition method and device of the present invention use a more reasonable feature vector both to train the model and to make predictions. This feature vector contains not only the features of the current word itself but also the part-of-speech feature of the current word and the entity tag feature of the word preceding it. Compared with existing recognition techniques that consider only the word itself, the information taken into account is more comprehensive, so the final recognition result is more accurate, and the accuracy is particularly high when recognizing entities in the e-commerce field.
The above embodiments do not limit the scope of protection of the present invention. It should be understood that, depending on design requirements and other factors, those skilled in the art may make various modifications, combinations, sub-combinations and substitutions. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (8)

1. A named entity recognition method, characterized by comprising:
acquiring a vector library, the vector library comprising word vectors respectively corresponding to a plurality of words, part-of-speech vectors respectively corresponding to a plurality of part-of-speech classes, and entity tag vectors respectively corresponding to a plurality of entity tag classes;
segmenting a training-corpus text string into a plurality of ordered sample words;
querying the vector library for each sample word in turn to build a first feature vector, the first feature vector comprising the word vector corresponding to the sample word, the part-of-speech vector corresponding to the sample word, and the entity tag vector corresponding to the word preceding the sample word;
taking all the first feature vectors corresponding to the sample words as a whole as the training input of a neural network, solving the network parameters with the neural network back-propagation (BP) algorithm, and obtaining a neural-network named entity recognition model;
segmenting a to-be-predicted text string into a plurality of ordered to-be-predicted words;
querying the vector library for each to-be-predicted word in turn to build a second feature vector, the second feature vector comprising the word vector corresponding to the to-be-predicted word, the part-of-speech vector corresponding to the to-be-predicted word, and the entity tag vector corresponding to the word preceding the to-be-predicted word;
inputting the second feature vector corresponding to each to-be-predicted word into the neural-network named entity recognition model respectively, and outputting the entity tag of the to-be-predicted word.
2. The method according to claim 1, characterized in that
the first feature vector further comprises the word vectors corresponding to the words adjacent to the sample word and the part-of-speech vectors corresponding to the words adjacent to the sample word, and
the second feature vector further comprises the word vectors corresponding to the words adjacent to the to-be-predicted word and the part-of-speech vectors corresponding to the words adjacent to the to-be-predicted word.
3. The method according to claim 1, characterized in that
when the first feature vector is built for the first of the ordered sample words, the word preceding the first sample word is a preset character string, and
when the second feature vector is built for the first of the ordered to-be-predicted words, the word preceding the first to-be-predicted word is a preset character string.
4. The method according to claim 1, characterized in that the training input of the neural network further comprises negative example samples.
5. A named entity recognition device, characterized by comprising:
a vector library acquisition module for acquiring a vector library, the vector library comprising word vectors respectively corresponding to a plurality of words, part-of-speech vectors respectively corresponding to a plurality of part-of-speech classes, and entity tag vectors respectively corresponding to a plurality of entity tag classes;
a first word segmentation module for segmenting a training-corpus text string into a plurality of ordered sample words;
a first building module for querying the vector library for each sample word in turn to build a first feature vector, the first feature vector comprising the word vector corresponding to the sample word, the part-of-speech vector corresponding to the sample word, and the entity tag vector corresponding to the word preceding the sample word;
a training module for taking all the first feature vectors corresponding to the sample words as a whole as the training input of a neural network, solving the network parameters with the neural network back-propagation (BP) algorithm, and obtaining a neural-network named entity recognition model;
a second word segmentation module for segmenting a to-be-predicted text string into a plurality of ordered to-be-predicted words;
a second building module for querying the vector library for each to-be-predicted word in turn to build a second feature vector, the second feature vector comprising the word vector corresponding to the to-be-predicted word, the part-of-speech vector corresponding to the to-be-predicted word, and the entity tag vector corresponding to the word preceding the to-be-predicted word;
a prediction module for inputting the second feature vector corresponding to each to-be-predicted word into the neural-network named entity recognition model respectively, and outputting the entity tag of the to-be-predicted word.
6. The device according to claim 5, characterized in that
the first feature vector further comprises the word vectors corresponding to the words adjacent to the sample word and the part-of-speech vectors corresponding to the words adjacent to the sample word, and
the second feature vector further comprises the word vectors corresponding to the words adjacent to the to-be-predicted word and the part-of-speech vectors corresponding to the words adjacent to the to-be-predicted word.
7. The device according to claim 5, characterized in that
the first building module is further configured to use a preset character string as the word preceding the first sample word when building the first feature vector for the first of the ordered sample words, and
the second building module is further configured to use a preset character string as the word preceding the first to-be-predicted word when building the second feature vector for the first of the ordered to-be-predicted words.
8. The device according to claim 5, characterized in that, in the training module, the training input of the neural network further comprises negative example samples.
CN201510321448.8A 2015-06-12 2015-06-12 Named entity recognition method and device Active CN104899304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510321448.8A CN104899304B (en) 2015-06-12 2015-06-12 Named entity recognition method and device

Publications (2)

Publication Number Publication Date
CN104899304A true CN104899304A (en) 2015-09-09
CN104899304B CN104899304B (en) 2018-02-16

Family

ID=54031966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510321448.8A Active CN104899304B (en) 2015-06-12 2015-06-12 Named entity recognition method and device

Country Status (1)

Country Link
CN (1) CN104899304B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050209844A1 (en) * 2004-03-16 2005-09-22 Google Inc., A Delaware Corporation Systems and methods for translating chinese pinyin to chinese characters
US7171350B2 (en) * 2002-05-03 2007-01-30 Industrial Technology Research Institute Method for named-entity recognition and verification
CN101075228A (en) * 2006-05-15 2007-11-21 松下电器产业株式会社 Method and apparatus for named entity recognition in natural language
CN101576910A (en) * 2009-05-31 2009-11-11 北京学之途网络科技有限公司 Method and device for identifying product naming entity automatically
CN104615589A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Named-entity recognition model training method and named-entity recognition method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
姚霖等 (YAO Lin et al.): "词边界字向量的中文命名实体识别" ("Chinese named entity recognition with word-boundary character vectors"), 《智能***学报》 *
毕海滨等 (BI Haibin et al.): "基于语义与SVM的中文实体关系抽取" ("Chinese entity relation extraction based on semantics and SVM"), 《第18届全国信息存储技术学术会议论文集》 (Proceedings of the 18th National Conference on Information Storage Technology) *

Cited By (68)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294313A (en) * 2015-06-26 2017-01-04 微软技术许可有限责任公司 Study embeds for entity and the word of entity disambiguation
CN106815193A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and wrong word recognition methods and device
CN106815194A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and keyword recognition method and device
CN105550227A (en) * 2015-12-07 2016-05-04 中国建设银行股份有限公司 Named entity identification method and device
CN105468780B (en) * 2015-12-18 2019-01-29 北京理工大学 The normalization method and device of ProductName entity in a kind of microblogging text
CN105468780A (en) * 2015-12-18 2016-04-06 北京理工大学 Normalization method and device of product name entity in microblog text
CN105701077B (en) * 2016-01-13 2018-04-13 夏峰 A kind of multilingual literature detection method and system
CN105701087A (en) * 2016-01-13 2016-06-22 夏峰 Formula plagiarism detection method and system
CN105701077A (en) * 2016-01-13 2016-06-22 夏峰 Multi-language literature detection method and system
CN105550172B (en) * 2016-01-13 2018-06-01 夏峰 A kind of distributed text detection method and system
CN105701086B (en) * 2016-01-13 2018-06-01 夏峰 A kind of sliding window document detection method and system
CN105701213A (en) * 2016-01-13 2016-06-22 夏峰 Literature comparison method and system
CN105701087B (en) * 2016-01-13 2018-03-16 夏峰 A kind of formula plagiarizes detection method and system
CN105701075B (en) * 2016-01-13 2018-04-13 夏峰 A kind of document associated detecting method and system
CN105701213B (en) * 2016-01-13 2018-12-28 夏峰 A kind of document control methods and system
CN105701086A (en) * 2016-01-13 2016-06-22 夏峰 Method and system for detecting literature through sliding window
CN105701075A (en) * 2016-01-13 2016-06-22 夏峰 Joint detection method and system for literature
CN105550172A (en) * 2016-01-13 2016-05-04 夏峰 Distributive text detection method and system
CN107195296A (en) * 2016-03-15 2017-09-22 阿里巴巴集团控股有限公司 A kind of audio recognition method, device, terminal and system
CN107506345A (en) * 2016-06-14 2017-12-22 科大讯飞股份有限公司 The construction method and device of language model
CN106095988A (en) * 2016-06-21 2016-11-09 上海智臻智能网络科技股份有限公司 Automatic question-answering method and device
CN106202255A (en) * 2016-06-30 2016-12-07 昆明理工大学 Merge the Vietnamese name entity recognition method of physical characteristics
CN106202054B (en) * 2016-07-25 2018-12-14 哈尔滨工业大学 A kind of name entity recognition method towards medical field based on deep learning
CN106202054A (en) * 2016-07-25 2016-12-07 哈尔滨工业大学 A kind of name entity recognition method learnt based on the degree of depth towards medical field
CN106557462A (en) * 2016-11-02 2017-04-05 数库(上海)科技有限公司 Name entity recognition method and system
CN106570170A (en) * 2016-11-09 2017-04-19 武汉泰迪智慧科技有限公司 Text classification and naming entity recognition integrated method and system based on depth cyclic neural network
CN108074565A (en) * 2016-11-11 2018-05-25 上海诺悦智能科技有限公司 Phonetic order redirects the method and system performed with detailed instructions
CN108228682B (en) * 2016-12-21 2020-09-29 财团法人工业技术研究院 Character string verification method, character string expansion method and verification model training method
CN108228682A (en) * 2016-12-21 2018-06-29 财团法人工业技术研究院 Character string verification method, character string expansion method and verification model training method
CN106682220A (en) * 2017-01-04 2017-05-17 华南理工大学 Online traditional Chinese medicine text named entity identifying method based on deep learning
CN108428137A (en) * 2017-02-14 2018-08-21 阿里巴巴集团控股有限公司 Generate the method and device of abbreviation, verification electronic banking rightness of business
CN106844351A (en) * 2017-02-24 2017-06-13 黑龙江特士信息技术有限公司 A kind of medical institutions towards multi-data source organize class entity recognition method and device
CN107122582A (en) * 2017-02-24 2017-09-01 黑龙江特士信息技术有限公司 Towards the diagnosis and treatment class entity recognition method and device of multi-data source
CN106933803A (en) * 2017-02-24 2017-07-07 黑龙江特士信息技术有限公司 A kind of medical equipment class entity recognition method and device towards multi-data source
CN106933803B (en) * 2017-02-24 2020-02-21 黑龙江特士信息技术有限公司 Medical equipment type entity identification method and device oriented to multiple data sources
CN106844351B (en) * 2017-02-24 2020-02-21 易保互联医疗信息科技(北京)有限公司 Medical institution organization entity identification method and device oriented to multiple data sources
CN106933802B (en) * 2017-02-24 2020-02-21 黑龙江特士信息技术有限公司 Multi-data-source-oriented social security entity identification method and device
CN107122582B (en) * 2017-02-24 2019-12-06 黑龙江特士信息技术有限公司 diagnosis and treatment entity identification method and device facing multiple data sources
CN106933802A (en) * 2017-02-24 2017-07-07 黑龙江特士信息技术有限公司 A kind of social security class entity recognition method and device towards multi-data source
CN107291693B (en) * 2017-06-15 2021-01-12 广州赫炎大数据科技有限公司 Semantic calculation method for improved word vector model
CN107291693A (en) * 2017-06-15 2017-10-24 广州赫炎大数据科技有限公司 A kind of semantic computation method for improving term vector model
CN107818080A (en) * 2017-09-22 2018-03-20 新译信息科技(北京)有限公司 Term recognition methods and device
CN107967251A (en) * 2017-10-12 2018-04-27 北京知道未来信息技术有限公司 A kind of name entity recognition method based on Bi-LSTM-CNN
CN107908614A (en) * 2017-10-12 2018-04-13 北京知道未来信息技术有限公司 A kind of name entity recognition method based on Bi LSTM
CN107832289A (en) * 2017-10-12 2018-03-23 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM CNN
CN107885721A (en) * 2017-10-12 2018-04-06 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM
CN107766559A (en) * 2017-11-06 2018-03-06 第四范式(北京)技术有限公司 Training method, trainer, dialogue method and the conversational system of dialog model
CN107766559B (en) * 2017-11-06 2019-12-13 第四范式(北京)技术有限公司 training method, training device, dialogue method and dialogue system for dialogue model
CN107886943A (en) * 2017-11-21 2018-04-06 广州势必可赢网络科技有限公司 Voiceprint recognition method and device
CN110083820A (en) * 2018-01-26 2019-08-02 普天信息技术有限公司 A kind of improved method and device of benchmark participle model
CN110083820B (en) * 2018-01-26 2023-06-27 普天信息技术有限公司 Improvement method and device of benchmark word segmentation model
CN110276066B (en) * 2018-03-16 2021-07-27 北京国双科技有限公司 Entity association relation analysis method and related device
CN110276066A (en) * 2018-03-16 2019-09-24 北京国双科技有限公司 The analysis method and relevant apparatus of entity associated relationship
CN108920457A (en) * 2018-06-15 2018-11-30 腾讯大地通途(北京)科技有限公司 Address Recognition method and apparatus and storage medium
RU2699687C1 (en) * 2018-06-18 2019-09-09 Общество с ограниченной ответственностью "Аби Продакшн" Detecting text fields using neural networks
CN109101481A (en) * 2018-06-25 2018-12-28 北京奇艺世纪科技有限公司 A kind of name entity recognition method, device and electronic equipment
CN109101481B (en) * 2018-06-25 2022-07-22 北京奇艺世纪科技有限公司 Named entity identification method and device and electronic equipment
CN109657230A (en) * 2018-11-06 2019-04-19 众安信息技术服务有限公司 Merge the name entity recognition method and device of term vector and part of speech vector
CN110162772B (en) * 2018-12-13 2020-06-26 北京三快在线科技有限公司 Named entity identification method and device
CN110162772A (en) * 2018-12-13 2019-08-23 北京三快在线科技有限公司 Name entity recognition method and device
CN110309515A (en) * 2019-07-10 2019-10-08 北京奇艺世纪科技有限公司 Entity recognition method and device
CN110309515B (en) * 2019-07-10 2023-08-11 北京奇艺世纪科技有限公司 Entity identification method and device
CN111079418B (en) * 2019-11-06 2023-12-05 科大讯飞股份有限公司 Named entity recognition method, device, electronic equipment and storage medium
CN111079418A (en) * 2019-11-06 2020-04-28 科大讯飞股份有限公司 Named body recognition method and device, electronic equipment and storage medium
CN111444720A (en) * 2020-03-30 2020-07-24 华南理工大学 Named entity recognition method for English text
US11675978B2 (en) 2021-01-06 2023-06-13 International Business Machines Corporation Entity recognition based on multi-task learning and self-consistent verification
CN113408273B (en) * 2021-06-30 2022-08-23 北京百度网讯科技有限公司 Training method and device of text entity recognition model and text entity recognition method and device
CN113408273A (en) * 2021-06-30 2021-09-17 北京百度网讯科技有限公司 Entity recognition model training and entity recognition method and device

Also Published As

Publication number Publication date
CN104899304B (en) 2018-02-16

Similar Documents

Publication Publication Date Title
CN104899304A (en) Named entity identification method and device
CN111125331B (en) Semantic recognition method, semantic recognition device, electronic equipment and computer readable storage medium
CN109685056A (en) Obtain the method and device of document information
CN110427623A (en) Semi-structured document Knowledge Extraction Method, device, electronic equipment and storage medium
CN111666427B (en) Entity relationship joint extraction method, device, equipment and medium
CN104615589A (en) Named-entity recognition model training method and named-entity recognition method and device
CN111563384B (en) Evaluation object identification method and device for E-commerce products and storage medium
CN111475617A (en) Event body extraction method and device and storage medium
CN112800239B (en) Training method of intention recognition model, and intention recognition method and device
CN112650858B (en) Emergency assistance information acquisition method and device, computer equipment and medium
CN112990035B (en) Text recognition method, device, equipment and storage medium
WO2021027125A1 (en) Sequence labeling method and apparatus, computer device and storage medium
CN111160041B (en) Semantic understanding method and device, electronic equipment and storage medium
CN110110213B (en) Method and device for mining user occupation, computer readable storage medium and terminal equipment
CN109408802A (en) A kind of method, system and storage medium promoting sentence vector semanteme
CN109508458A (en) The recognition methods of legal entity and device
CN113449528B (en) Address element extraction method and device, computer equipment and storage medium
CN109308311A (en) A kind of multi-source heterogeneous data fusion system
CN113723077B (en) Sentence vector generation method and device based on bidirectional characterization model and computer equipment
CN114240672A (en) Method for identifying green asset proportion and related product
CN116644148A (en) Keyword recognition method and device, electronic equipment and storage medium
CN113032523B (en) Extraction method and device of triple information, electronic equipment and storage medium
CN114936271A (en) Method, apparatus and medium for natural language translation database query
CN111708819B (en) Method, apparatus, electronic device, and storage medium for information processing
CN110019829A (en) Data attribute determines method, apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant