CN109902303A - A kind of entity recognition method and relevant device - Google Patents
- Publication number: CN109902303A
- Application number: CN201910158600.3A
- Authority
- CN
- China
- Prior art keywords
- mark
- entity
- corpus
- path
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Character Discrimination (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present invention disclose an entity recognition method and a related device, comprising: first obtaining a plurality of labeled corpora, each labeled corpus in the plurality of labeled corpora carrying annotation information; then establishing a hypergraph model according to a preset entity labeling rule; then determining, according to the annotation information and the entity labeling rule, the labeled path graph corresponding to each labeled corpus, and establishing a model to be trained according to the hypergraph model and a preset neural network model; and finally inputting the labeled path graphs into the model to be trained to obtain an entity recognition model, and identifying at least one named entity in an input corpus according to the entity recognition model. With the embodiments of the present invention, entities with nested structures can be effectively identified, thereby improving the accuracy of entity recognition and entity extraction.
Description
Technical field
The present invention relates to the technical field of information processing, and more particularly to an entity recognition method and a related device.
Background art
In the era of information explosion, how to quickly and effectively extract the required information from massive data has become a hot research topic, which has spurred research on natural language processing. Entity extraction has long received extensive attention in the field of natural language processing: it is a preliminary step of many natural language processing tasks, so its performance directly affects the performance of downstream natural language processing tasks such as entity linking, entity relation classification and knowledge graph reasoning. Here, an entity means a named entity, i.e., a person name, organization name, place name or any other entity identified by a name in natural language; in a broader sense, entities can also include numbers, dates, currencies, addresses and so on. In an entity extraction task, entity overlap and entity nesting may occur. As shown on the left-hand side of Fig. 1, the character string X1X2X3 is labeled as a person-name entity (PER) and the character string X2X3X4 is labeled as a place-name entity (GPE); the two overlap (in X2X3). As shown on the right-hand side of Fig. 1, the character string X1X2 is labeled as PER and the character string X1X2X3X4 is labeled as GPE; X1X2 is a substring of X1X2X3X4, so the two form a nested structure. At present, the mainstream extraction models for the entity extraction task are the conditional random field (CRF) model and the neural-network-CRF model. Such models cannot handle nested structures directly and can recognize nested entities only by stacking multiple models; however, because the stacked CRF models are mutually independent, the dependencies between entities cannot be effectively captured, resulting in poor entity recognition performance and low entity extraction accuracy.
Summary of the invention
The present invention provides an entity recognition method and a related device that can effectively identify entities with nested structures, thereby improving the accuracy of entity recognition and entity extraction.
In a first aspect, an embodiment of the present invention provides an entity recognition method, comprising:
obtaining a plurality of labeled corpora, each labeled corpus in the plurality of labeled corpora carrying annotation information;
establishing a hypergraph model according to a preset entity labeling rule;
determining, according to the annotation information and the entity labeling rule, the labeled path graph corresponding to each labeled corpus;
establishing a model to be trained according to the hypergraph model and a preset neural network model;
inputting the labeled path graphs into the model to be trained for training, obtaining an entity recognition model;
identifying at least one named entity in an input corpus according to the entity recognition model.
In a second aspect, an embodiment of the present invention provides an entity recognition device, comprising:
an obtaining module, configured to obtain a plurality of labeled corpora, each labeled corpus in the plurality of labeled corpora carrying annotation information;
a modeling module, configured to establish a hypergraph model according to a preset entity labeling rule;
a labeling module, configured to determine, according to the annotation information and the entity labeling rule, the labeled path graph corresponding to each labeled corpus;
the modeling module being further configured to establish a model to be trained according to the hypergraph model and a preset neural network model;
a training module, configured to input the labeled path graphs into the model to be trained for training, obtaining an entity recognition model;
an identification module, configured to identify at least one named entity in an input corpus according to the entity recognition model.
In a third aspect, an embodiment of the present invention provides entity recognition equipment, comprising: a processor, a memory and a communication bus, wherein the communication bus is used to realize connection and communication between the processor and the memory, and the processor executes a program stored in the memory to realize the steps of the entity recognition method provided by the above first aspect.
In a possible design, the entity recognition equipment provided by the present invention may include modules corresponding to the behaviors in the above method. The modules may be software and/or hardware.
Another aspect of the embodiments of the present invention provides a computer-readable storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to execute the methods described in the above aspects.
Another aspect of the embodiments of the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to execute the methods described in the above aspects.
By implementing the embodiments of the present invention, a plurality of labeled corpora are obtained first, each labeled corpus in the plurality of labeled corpora carrying annotation information; a hypergraph model is then established according to a preset entity labeling rule; next, the labeled path graph corresponding to each labeled corpus is determined according to the annotation information and the entity labeling rule, and a model to be trained is established according to the hypergraph model and a preset neural network model; finally, the labeled path graphs are input into the model to be trained to obtain an entity recognition model, and at least one named entity in an input corpus is identified according to the entity recognition model. Entities with nested structures can thus be effectively identified, improving the accuracy of entity recognition and entity extraction.
Brief description of the drawings
In order to illustrate the technical solutions in the embodiments of the present invention or in the background art more clearly, the drawings required in the embodiments of the present invention or in the background art are described below.
Fig. 1 is a schematic diagram of nested entities and overlapping entities described in the background art;
Fig. 2 is a structural schematic diagram of an information extraction system provided in an embodiment of the present invention;
Fig. 3 is a schematic flowchart of an entity recognition method provided in an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of a fully connected hypergraph model provided in an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of a labeled path graph provided in an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of a model to be trained provided in an embodiment of the present invention;
Fig. 7 is a schematic flowchart of another entity recognition method provided in an embodiment of the present invention;
Fig. 8 is a structural schematic diagram of an entity recognition device provided in an embodiment of the present invention;
Fig. 9 is a structural schematic diagram of entity recognition equipment provided in an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 2, Fig. 2 is a structural schematic diagram of an information extraction system provided in an embodiment of the present invention. The information extraction system includes information processing equipment, a database and other equipment. The information processing equipment may be a computer, a mobile phone or a server (such as a database server or a file server). The database stores a large amount of voice information, text information and the like; it may be a local database of the information processing equipment, or another database that the information processing equipment is allowed to access. The information processing equipment can obtain information from the database and can also receive information sent by the other equipment, and performs entity extraction on the obtained or received information, so as to execute subsequent information processing tasks (such as knowledge graph reasoning, entity linking and entity relation classification) or push information to the other equipment, where the other equipment may likewise be mobile phones, computers, servers and the like. The information processing equipment can first identify the named entities in the obtained or received information, and then extract the required named entities from the various named entities identified. Based on the above system, the embodiments of the present invention provide the following entity recognition methods.
Referring to Fig. 3, Fig. 3 is a schematic flowchart of an entity recognition method provided in an embodiment of the present invention. The method includes but is not limited to the following steps.
S301: obtain a plurality of labeled corpora, the plurality of labeled corpora carrying annotation information.
In a specific implementation, a labeled corpus can be a sentence of arbitrary length, such as "He talked to the U.S.president" or "Zhang San is an office worker of company XXX". The annotation information of each labeled corpus includes the annotation label of each character/word in that corpus. According to the annotation labels, it can be determined whether the corpus contains named entities and, if so, the types of the named entities it contains; for brevity of description, named entities are referred to simply as entities below. The entity types can include, but are not limited to, person-name entities and place-name entities, and an annotation label can be a number, a character or a character string; the annotation label of a person-name entity can be, but is not limited to, PER, and the annotation label of a place-name entity can be, but is not limited to, GPE. For example, the labeled corpus "He talked to the U.S.president" is annotated as follows:
"He" is a person-name entity, and its annotation label is PER; "talked" and "to" are neither person-name entities nor place-name entities, and their annotation label is "O"; "the U.S.president" is a person-name entity, and its annotation label is PER; and "U.S." is a place-name entity, and its annotation label is "GPE".
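As an illustrative aside (the representation, names and helpers below are our own assumptions, not part of the patent's disclosure), such an annotation can be expressed as labeled spans over the tokens; unlike per-token tags, a span representation directly accommodates the nested "U.S." inside "the U.S.president":

```python
# Hypothetical sketch: nested annotations as labeled token spans.
tokens = ["He", "talked", "to", "the", "U.S.", "president"]

# (start, end, label) over token indices; end is exclusive.
annotations = [
    (0, 1, "PER"),   # "He"
    (3, 6, "PER"),   # "the U.S. president"
    (4, 5, "GPE"),   # "U.S." -- nested inside the PER span above
]

def entities(spans, toks):
    """Materialize each labeled span as a (text, label) pair."""
    return [(" ".join(toks[s:e]), lab) for s, e, lab in spans]

def is_nested(inner, outer):
    """True if the inner span lies within the outer span and differs from it."""
    return outer[0] <= inner[0] and inner[1] <= outer[1] and inner[:2] != outer[:2]
```

With this layout, the GPE span (4, 5) is nested inside the PER span (3, 6), which is exactly the structure a linear-chain tagging scheme cannot express.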
S302: establish a hypergraph model according to a preset entity labeling rule.
In a specific implementation, the hypergraph model is a generalization of the traditional graph model: an edge in a hypergraph model can connect two or more nodes, an edge connecting more than two nodes commonly being called a hyperedge. By contrast, in a traditional graph model each edge can connect at most two nodes, and in a traditional conditional-random-field graph model each edge can connect only one node.
In order to label nested entities, the embodiment of the present invention proposes a special hypergraph model that contains multiple parent nodes, each parent node corresponding to child nodes of multiple types. Each parent node corresponds to the input word on one time step: for a sentence containing N characters/words, one character/word can be input on each time step, and the character/word input on a time step is the input word on that time step; a time step can be regarded as a period of preset length, for example 0.1 millisecond. Each parent node can correspond to, but is not limited to, child nodes of seven types. In the embodiments of the present invention, A_k denotes the parent node corresponding to the k-th time step; then:
A_k denotes an entity starting at position k or a later position, where position k, here and below, is understood as the position in the corpus of the input word on the k-th time step;
the child node E_k of the first type denotes an entity whose left boundary is at position k;
the child node T_k^j of the second type denotes an entity whose left boundary is at position k and whose type is j;
the child node B_k^j of the third type denotes an entity of type j starting at position k;
the child node I_k^j of the fourth type denotes an entity of type j covering position k;
the child node L_k^j of the fifth type denotes an entity of type j ending at position k;
the child node U_k^j of the sixth type denotes an entity of type j of unit length at position k, where unit length means that the beginning and the end of the entity are at the same position;
the child node X of the seventh type denotes the end of an entity.
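The node inventory above can be sketched programmatically. The following is a minimal illustration under our own naming; the patent defines only the notation A, E, T, B, I, L, U, X, not any code:

```python
# Illustrative sketch of the per-time-step node inventory of the hypergraph
# model; the dataclass and helper are assumptions made for this sketch.
from dataclasses import dataclass

UNTYPED = ("A", "E", "X")          # nodes without an entity-type index
TYPED = ("T", "B", "I", "L", "U")  # nodes parameterized by entity type j

@dataclass(frozen=True)
class Node:
    kind: str  # one of UNTYPED + TYPED
    k: int     # time step / position in the sentence
    j: int = 0 # entity type index (0 for untyped nodes)

def nodes_for_step(k, m):
    """All nodes attached to parent A_k when there are m entity types."""
    nodes = [Node(kind, k) for kind in UNTYPED]
    nodes += [Node(kind, k, j) for j in range(1, m + 1) for kind in TYPED]
    return nodes
```

With m = 2 entity types, each time step carries 3 untyped nodes plus 5 × 2 typed nodes, i.e., 13 nodes in total.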
The length of each labeled corpus obtained can first be counted, where the length of a labeled corpus equals the number of characters/words it contains. For example, "He talked to the U.S.president" contains 6 words, so its length is 6; "Zhang San is an office worker of company XXX" contains 7 words, so its length is 7. The number of parent nodes in the hypergraph model, i.e., the unrolled length of the hypergraph model over time, can then be determined by, but not limited to, the maximum length: for example, if the maximum length is 10, the number of parent nodes is determined to be 10. The multiple types of child nodes are then connected, and the multiple parent nodes are connected, according to the preset entity labeling rule. The multiple parent nodes include a first parent node and a second parent node that are adjacent in time step: if the first parent node is A_k, the second parent node is A_{k+1}, i.e., the first parent node is the parent node on the k-th time step and the second parent node is the parent node on the (k+1)-th time step. Then:
1. A_k can connect to A_{k+1} and E_k.
2. E_k can connect to T_k^1, T_k^2, …, T_k^m, where m, a positive integer not less than 1, is the number of entity types; E_k can be connected by one hyperedge to any n of T_k^1, T_k^2, …, T_k^m, where n is a positive integer not exceeding m. The nodes T_{k+1}^1, T_{k+1}^2, …, T_{k+1}^m have the same meaning as T_k^1, T_k^2, …, T_k^m.
3. T_k^j can connect to: (1) U_k^j, indicating that the entity of type j at position k has unit length; (2) B_k^j, indicating that there is an entity of type j at position k that will continue to the next position; (3) U_k^j and B_k^j, indicating that cases (1) and (2) both occur at position k, i.e., there is a nesting relation between two entities of the same type.
4. B_k^j can connect to: (1) I_{k+1}^j, indicating that the entity of type j starting at position k continues at position k+1; (2) L_{k+1}^j, indicating that the entity of type j starting at position k ends at position k+1; (3) I_{k+1}^j and L_{k+1}^j, indicating that cases (1) and (2) both occur at position k.
5. U_k^j can connect only to the X node, because the beginning and end of a unit-length entity are at the same position.
6. I_k^j can connect to: (1) I_{k+1}^j, indicating that an entity of type j covers positions k and k+1; (2) L_{k+1}^j, indicating that an entity of type j covers position k and ends at position k+1; (3) I_{k+1}^j and L_{k+1}^j, indicating that cases (1) and (2) both occur at position k.
The hypergraph model in the embodiment of the present invention is a fully connected hypergraph model. According to the above node connection rules, the fully connected hypergraph model shown in Fig. 4 can be established; it contains all possible connections between the nodes covered by the above rules, where identical line styles indicate the same edge. It should be noted that, for different parent nodes A, the meanings expressed by each type of child node corresponding to them are the same; the subscripts k and k+1 only distinguish two different nodes of the same type. For example, I_k^j and I_{k+1}^j are both essentially child nodes of the fourth type, and connecting I_k^j to I_{k+1}^j amounts to connecting two child nodes of the fourth type to each other. The subscripts k and k+1 are therefore omitted in Fig. 4.
In the fully connected hypergraph model shown in Fig. 4, the graph formed by each parent node and its corresponding child nodes is a special hypergraph (a subgraph of the fully connected hypergraph model), in which multiple child nodes can form an ordered list of children. The hypergraph is turned into a conditional probability model over the possible output sequences s corresponding to an input sequence x:

p(s | x) = exp(W · G(x, s)) / Z(x)   (1)

where x can be a word sequence and s the corresponding sequence of annotation labels of the words, G(x, s) is the feature function, W is the weight vector, and Z(x) is the normalization factor over all possible edges s on x. Then, in order to find the optimal segmentation, let a_j denote the optimal segmentation end point of the j-th input, and let (m, y) denote having label y at the m-th position; a_j can then be computed recursively as

a_j = max_y ψ(j-1, y) + a_{j-1}   (2)

where ψ(j-1, y) is the feature value defined on the edge s = (j-1, y). The optimal segmentation sequence in the hypergraph model, i.e., the segmentation sequence with the maximum feature value, can be found quickly by the above equation.
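Equation (2) is a standard max-sum recursion. A minimal sketch, with made-up labels and scores purely to illustrate the recurrence (none of this is the patent's code), might read:

```python
# Illustrative sketch of the recursion a_j = max_y psi(j-1, y) + a_{j-1}.
# psi is an assumed table of edge feature values; tracking the argmax at each
# step recovers the maximum-feature-value segmentation sequence.
def best_segmentation(psi):
    """psi: list over positions; each entry maps label y -> feature value."""
    a, labels = 0.0, []
    for step in psi:
        y = max(step, key=step.get)   # argmax_y psi(j-1, y)
        a += step[y]                  # a_j = max_y psi(j-1, y) + a_{j-1}
        labels.append(y)
    return a, labels
```

The per-position maximization is what makes the search fast: the best score is accumulated left to right rather than enumerated over all label sequences.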
S303: determine, according to the annotation information and the entity labeling rule, the labeled path graph corresponding to each labeled corpus.
In a specific implementation, each character/word in a labeled corpus is labeled to obtain the unique labeled path of each character/word, and the hypergraph formed by combining the labeled paths of the words serves as the labeled path graph corresponding to the corpus; the labeled path graph is a subgraph of the fully connected hypergraph model.
For example, as shown in Fig. 5, suppose there are two entity types in total, the first being place-name entities and the second being person-name entities. Then in the labeled corpus "He talked to the U.S.president", it is determined from the annotation information that "He" is a person-name entity, which can be labeled with a U^2 node (a child node of the sixth type above); "the U.S.president" is a person-name entity, which can be labeled with multiple I^2 nodes (child nodes of the fourth type above) and a B^2 node (a child node of the third type above); meanwhile, within "the U.S.president", "U.S." is a place-name entity, which can be labeled with a U^1 node. The remaining words are not entities, and their corresponding parent nodes and related child nodes can be directly connected to the X node, indicating that there is no entity there.
S304: establish a model to be trained according to the hypergraph model and the preset neural network model.
In a specific implementation, as shown in Fig. 6, the hypergraph model and the neural network model can serve as two logical layers of the model to be trained, establishing a neural-network-layer / hypergraph-layer structure for the model to be trained. The preset neural network model can be, but is not limited to, a bidirectional long short-term memory neural network (Bidirectional Long Short-Term Memory, BiLSTM).
S305: input the labeled path graphs into the model to be trained for training, obtaining an entity recognition model.
In a specific implementation, the hypergraph model contains multiple labeled paths for each labeled corpus, namely all possible labeled paths of that corpus. The labeled path graph corresponding to each labeled corpus contains the target labeled path among those multiple labeled paths, the target labeled path being the most reasonable labeled path for that corpus. In addition, the model to be trained contains multiple training parameters, which can be, but are not limited to, the weight coefficients corresponding to the edges of the hypergraph model and/or a state-transition matrix. The multiple training parameters can first be initialized arbitrarily; the score of each of the multiple labeled paths is then computed according to the hypergraph model and the neural network model, and the multiple training parameters are adjusted according to the scores until the score of the target labeled path is the highest among the multiple labeled paths.
S306: identify at least one named entity in an input corpus according to the entity recognition model.
In a specific implementation, an input corpus can be input into the entity recognition model, and the entity recognition model will output a labeled path graph, similar to Fig. 5, corresponding to the input corpus. The labeled path of the input corpus can thus be determined first; the annotation labels corresponding to the input corpus, including the annotation label of each character/word in it, are then determined according to the labeled path; and at least one named entity is identified according to the annotation labels. For example, if the annotation label of "U.S." in the input corpus is GPE, it is determined that "U.S." is a place-name entity.
Optionally, after at least one named entity in the input corpus has been identified according to the entity recognition model, a selection instruction input by a user can be received, the selection instruction carrying entity type information; the named entities matching the entity type information are then extracted from the at least one named entity. For example, if the entity type information carried in the selection instruction is PER, all person-name entities in the input corpus can be extracted.
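This optional selection step admits a one-line sketch; the function name and tuple layout are our assumptions for illustration:

```python
# Hypothetical sketch: keep only the recognized named entities whose type
# matches the entity-type information carried by the selection instruction.
def select_entities(recognized, wanted_type):
    """recognized: list of (entity_text, entity_type) pairs."""
    return [(text, t) for text, t in recognized if t == wanted_type]
```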
In the embodiments of the present invention, a plurality of labeled corpora are obtained first, each labeled corpus in the plurality of labeled corpora carrying annotation information; a hypergraph model is then established according to a preset entity labeling rule; next, the labeled path graph corresponding to each labeled corpus is determined according to the annotation information and the entity labeling rule, and a model to be trained is established according to the hypergraph model and a preset neural network model; finally, the labeled path graphs are input into the model to be trained to obtain an entity recognition model, and at least one named entity in an input corpus is identified according to the entity recognition model. Entities with nested structures can thus be effectively identified, improving the accuracy of entity recognition and entity extraction.
Referring to Fig. 7, Fig. 7 is a schematic flowchart of another entity recognition method provided in an embodiment of the present invention. The method includes but is not limited to the following steps.
S701: obtain a plurality of labeled corpora, the plurality of labeled corpora carrying annotation information. This step is the same as S301 in the previous embodiment and is not repeated here.
S702: establish a hypergraph model according to a preset entity labeling rule. This step is the same as S302 in the previous embodiment and is not repeated here.
S703: determine, according to the annotation information and the entity labeling rule, the labeled path graph corresponding to each labeled corpus. This step is the same as S303 in the previous embodiment and is not repeated here.
S704: establish a model to be trained according to the hypergraph model and the preset neural network model. This step is the same as S304 in the previous embodiment and is not repeated here.
S705: input the labeled path graphs into the model to be trained for training.
In a specific implementation, the hypergraph model contains multiple labeled paths for each labeled corpus, namely all possible labeled paths of that corpus. The labeled path graph corresponding to each labeled corpus contains the target labeled path among those multiple labeled paths, the target labeled path being the most reasonable labeled path for that corpus. In addition, the model to be trained contains multiple training parameters, which can be, but are not limited to, the weight coefficients corresponding to the edges of the hypergraph model and/or a state-transition matrix; the multiple training parameters can be initialized arbitrarily before the model to be trained is trained.
First, the first feature score of each of the multiple labeled paths can be determined according to the neural network model, and the second feature score of each labeled path can be determined according to the hypergraph model, where the feature scores can be computed by, but not limited to, combining the forward-backward algorithm and the expectation-maximization (Expectation Maximization, EM) algorithm. The sum of the first feature score and the second feature score then serves as the score of each labeled path, and the score of each labeled path can be used to measure the reasonableness of that labeled path.
In order to improve the accuracy of the labeled-path scores, the feature scores can be computed over different feature dimensions. The neural network model corresponds to at least one first corpus feature, and the hypergraph model corresponds to at least one second corpus feature; therefore, the first feature component value of each labeled path can first be determined according to each first corpus feature among the at least one first corpus feature, and the second feature component value of each labeled path can be determined according to each second corpus feature among the at least one second corpus feature; the sum of the first feature component values then serves as the first feature score, and the sum of the second feature component values serves as the second feature score. The first corpus features can capture the context features of the labeled corpus through the forward-backward algorithm; the second corpus features can include, but are not limited to, at least one of state-transition features, word features (the window size can be 3), language-pattern features (n-gram features), part-of-speech label features (the window size can be 3), bag-of-words features (the window size can be 5) and word-shape features, where the n-gram features can include word n-gram features and part-of-speech n-gram features with n being 2, 3 or 4. Word shapes can include at least one of all-capitals, all-digits, alphanumeric, containing a digit, containing a dot, containing a hyphen, initial capital, lone initial, punctuation mark, Roman numeral, single character and URL. The state-transition features describe the probability (alternatively, the score) that a character/word in the labeled corpus changes from one type of entity to another type of entity.
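The word-shape features listed above can be sketched as simple predicates; the exact feature set and names below are assumptions made for illustration:

```python
import re

# Illustrative word-shape features; each predicate mirrors one of the shapes
# named in the text (all-capitals, all-digits, containing a dot, etc.).
def word_shape_features(w):
    return {
        "all_caps": w.isupper(),
        "all_digits": w.isdigit(),
        "contains_digit": any(c.isdigit() for c in w),
        "contains_dot": "." in w,
        "contains_hyphen": "-" in w,
        "initial_cap": w[:1].isupper(),
        "single_char": len(w) == 1,
        "roman_numeral": bool(re.fullmatch(r"[IVXLCDM]+", w)),
    }
```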
For example, suppose the first corpus feature is the contextual feature and the second corpus feature is the state-transition feature. Then for a labeled corpus [x]_T containing T words in total, [f_θ]_{i,t} can denote the score, computed according to the neural network model, that the t-th word in [x]_T is labeled as an entity of the i-th type, so that a feature matrix [f_θ] is obtained from the scores of the characters/words, θ being the parameters of the neural network. The state-transition feature matrix used in the hypergraph model is [A], where [A]_{i,j} denotes the score of transitioning from the i-th state to the j-th state; it should be explained here that the same transition feature matrix is used for every character/word in the corpus. Finally, the feature score s([x]_T, [i]_T, θ) of the labeled path corresponding to a label sequence [i]_T of the labeled corpus [x]_T can be computed by formula (3):

s([x]_T, [i]_T, θ) = Σ_{t=1..T} ( [A]_{[i]_{t-1}, [i]_t} + [f_θ]_{[i]_t, t} )   (3)

When the first corpus features and the second corpus features each include several features, the feature score matrices corresponding to the corpus features are added in turn according to formula (3). The relationship between a label sequence and a labeled path can be illustrated with the labeled corpus "He talked to the U.S.president": the annotation labels of "He", "talked", "to", "the U.S.president" and "U.S." in this corpus are PER, O, O, PER and GPE respectively, so the label sequence is PER, O, O, PER, GPE, and the labeled path graph corresponding to this label sequence is as illustrated in Fig. 5.
As another example, a labeled corpus X contains 4 words x_1, x_2, x_3 and x_4, and there are 3 preset entity types a_1, a_2 and a_3 in total. The feature matrix obtained according to the neural network model is W, where the element in row i, column j denotes the score that x_i belongs to an entity of the j-th type; the feature score of each labeled path can then be obtained from W. For example, one possible labeled path ω of X corresponds to the label sequence a_1, a_3, a_2, a_1; according to W, the score that x_1 belongs to a_1 is 1.5, the score that x_2 belongs to a_3 is 0.11, the score that x_3 belongs to a_2 is 0.002, and the score that x_4 belongs to a_1 is 0.12, so the first feature score of the labeled path ω is 1.5 + 0.11 + 0.002 + 0.12 = 1.732. The state-transition matrix of the hypergraph model is Q, where the element in row m, column n denotes the probability (score) of changing from the m-th type to the n-th type. For the above labeled path ω, the score of transitioning from type a_1 to type a_3 at x_1 is 0.1, the score of transitioning from type a_3 to type a_2 at x_2 is 0, and the score of transitioning from type a_2 to type a_1 at x_3 is 0.008, so the second feature score of the labeled path ω is 0.1 + 0 + 0.008 = 0.108, and the score of the labeled path ω is 1.732 + 0.108 = 1.84.
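A sketch of the path-score arithmetic, summing the per-word scores and transition scores listed above (the flat lists stand in for lookups into the assumed matrices W and Q):

```python
# Sketch of the path-score computation for the labeled path w = (a1, a3, a2, a1):
# per-word scores come from the neural feature matrix W, transition scores
# from the hypergraph state-transition matrix Q; values are those listed above.
emission = [1.5, 0.11, 0.002, 0.12]   # W[x_i, label of x_i] for i = 1..4
transition = [0.1, 0.0, 0.008]        # Q[a1->a3], Q[a3->a2], Q[a2->a1]

first_feature_score = sum(emission)     # contextual (neural) contribution
second_feature_score = sum(transition)  # state-transition contribution
path_score = first_feature_score + second_feature_score
```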
Then, the multiple training parameters are adjusted according to the score of each labeled path so as to increase the score of the target labeled path.
Compared with an entity recognition model based on the hypergraph model alone in the prior art, the entity recognition model trained in the embodiment of the present invention additionally computes a neural-network feature matrix through the neural network, on the basis of the traditional feature matrices of the labeled paths obtained with the hypergraph model (such as the state-transition matrix and n-gram feature matrices), and uses them together to compute the feature scores of the labeled paths, improving the accuracy with which the reasonableness of a labeled path is judged. In addition, the special hypergraph model proposed in the embodiments of the present invention for nested entities can effectively solve the problem that nested entities cannot be identified in the prior art.
S706: determine whether the score of the target mark path contained in the mark path graph is the top score among the multiple mark paths included in the hypergraph model. If so, execute S707; if not, continue to execute S705.
In specific implementation, the acquired mark corpora can be divided into several batches. For example, if 1000 mark corpora are acquired in total, they can be divided into 4 batches of 250 each. One batch can first be input into the model to be trained for training; if, after training completes, the score of the target mark path is not the top score among the multiple mark paths, training continues with another batch, so that the training parameters of the model to be trained keep being adjusted until the score of the target mark path is the top score among the multiple mark paths.
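The batching loop described here can be sketched as below; `train_step` and `target_is_best` are hypothetical stand-ins for one training pass (S705) and the top-score check (S706).

```python
def make_batches(corpora, batch_size):
    # Split the acquired mark corpora into batches, e.g. 1000 -> 4 x 250.
    return [corpora[i:i + batch_size]
            for i in range(0, len(corpora), batch_size)]

def train(corpora, batch_size, train_step, target_is_best, max_epochs=10):
    # Keep feeding batches until the target mark path has the top score.
    for _ in range(max_epochs):
        for batch in make_batches(corpora, batch_size):
            train_step(batch)        # adjust the training parameters (S705)
            if target_is_best():     # target path has top score? (S706)
                return True          # parameters become set parameters (S707)
    return False
```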
S707: take the training parameters corresponding to the top score as the set parameters of the model to be trained, obtaining the entity recognition model.
In specific implementation, when the score of the target mark path is the top score among the multiple mark paths, the occurrence probability of the target mark path in the fully connected hypergraph model is the largest. The relevant parameters of the model to be trained can then be fixed as the set parameters, after which the model serves as the entity recognition model.
S708: identify at least one named entity in the input corpus according to the entity recognition model. This step is identical to S306 in the previous embodiment and is not repeated here.
In the embodiment of the present invention, multiple mark corpora are obtained first, each mark corpus carrying mark information; a hypergraph model is then established according to a preset entity marking rule; next, the mark path graph corresponding to each mark corpus is determined according to the mark information and the entity marking rule, and a model to be trained is established according to the hypergraph model and a preset neural network model; finally, the mark path graphs are input into the model to be trained for training to obtain an entity recognition model, and at least one named entity in the input corpus is identified according to the entity recognition model. Entities with nested structure can thus be effectively identified, improving the accuracy of entity recognition and entity extraction.
The above illustrates the method of the embodiment of the present invention; the relevant devices of the embodiment of the present invention are provided below.
Refer to Fig. 8, which is a structural schematic diagram of an entity recognition device provided by an embodiment of the present invention. The entity recognition device may include:
an obtaining module 801, configured to obtain multiple mark corpora, each mark corpus carrying mark information.
In specific implementation, a mark corpus can be a sentence of any length, such as "He talked to the U.S.president" or "Zhang San is an office worker of company XXX". The mark information of each mark corpus includes the mark label of each word/character in that corpus. From the mark labels it can be determined whether the corpus contains named entities and, if so, the types of the named entities it contains; for brevity of description, named entities are simply called entities below. Entity types can include, but are not limited to, person-name entities and place-name entities; mark labels can be numbers, characters or strings, where the mark label of a person-name entity can be, but is not limited to, PER, and the mark label of a place-name entity can be, but is not limited to, GPE.
a modeling module 802, configured to establish a hypergraph model according to a preset entity marking rule.
In specific implementation, a hypergraph model is a generalization of the traditional graph model: an edge in a hypergraph model can connect two or more nodes, and an edge connecting more than two nodes is commonly called a hyperedge. By contrast, in a traditional graph model each edge connects at most two nodes, and in the traditional conditional random field graph model each node connects only to one node.
In order to mark nested entities, the embodiment of the present invention proposes a special hypergraph model containing multiple father nodes, each father node corresponding to child nodes of multiple types. Each father node corresponds to the input word on one time step: for a sentence containing N words/characters, one word/character can be input on each time step, and the word/character input on a time step is the input word on that time step. A time step can be regarded as a period of preset length, e.g. 0.1 millisecond. Each father node can correspond to, but is not limited to, child nodes of seven types. In the embodiment of the present invention, Ak denotes the father node corresponding to the k-th time step; then:
Ak denotes an entity starting at position k or a later position, where position k is understood, here and below, as the position in the corpus of the input word on the k-th time step;
the child node Ek of the first type of Ak denotes an entity whose left boundary is at position k;
the child node Tk^j of the second type of Ak denotes an entity whose left boundary is at position k and whose type is j;
the child node Bk^j of the third type of Ak denotes an entity of type j starting at position k;
the child node Ik^j of the fourth type of Ak denotes an entity of type j covering position k;
the child node Lk^j of the fifth type of Ak denotes an entity of type j ending at position k;
the child node Uk^j of the sixth type of Ak denotes an entity of type j having unit length at position k, where unit length means the beginning and the end of the entity are at the same position;
the child node X of the seventh type denotes the end of an entity.
The length of each acquired mark corpus can be counted first, where the length of a mark corpus equals the number of words/characters it contains. For example, "He talked to the U.S.president" contains 6 words, so its length is 6; "Zhang San is an office worker of company XXX" contains 7 words/characters, so its length is 7. The number of father nodes in the hypergraph model, i.e. the time-extension length of the hypergraph model, can then be determined, for example but not exclusively, according to the maximum length: if the maximum length is 10, the number of father nodes is determined to be 10. Then the child nodes of the multiple types are connected, and the multiple father nodes are connected, according to the preset entity marking rule. The multiple father nodes include a first father node and a second father node, two father nodes on adjacent time steps: if the first father node is Ak, the second father node is Ak+1, i.e. the first father node is on the k-th time step and the second father node is on the (k+1)-th time step. Then:
1. Ak can connect to Ak+1 and Ek.
2. Ek can connect to Tk^1, Tk^2, ..., Tk^m, where m is the number of entity types, a positive integer not less than 1. Ek can be connected through one hyperedge to any n of Tk^1, Tk^2, ..., Tk^m, where n is a positive integer not exceeding m. The types Tk+1^1, Tk+1^2, ..., Tk+1^m are the same as Tk^1, Tk^2, ..., Tk^m.
3. Tk^j can connect to (1) Uk^j, denoting that the entity of type j at position k has unit length; (2) Bk^j, denoting that an entity of type j starts at position k and proceeds to the next position; or (3) both Uk^j and Bk^j, denoting that situations (1) and (2) both occur at position k, i.e. two entities of the same type are nested.
4. Bk^j can connect to (1) Ik+1^j, denoting that the entity of type j starting at position k continues at position k+1; (2) Lk+1^j, denoting that the entity of type j starting at position k ends at position k+1; or (3) both Ik+1^j and Lk+1^j, denoting that situations (1) and (2) both occur at position k.
5. Uk^j can only connect to node X, because a unit-length entity begins and ends at the same position.
6. Ik^j can connect to (1) Ik+1^j, denoting an entity of type j covering positions k and k+1; (2) Lk+1^j, denoting an entity of type j covering position k and ending at position k+1; or (3) both Ik+1^j and Lk+1^j, denoting that situations (1) and (2) both occur at position k.
The hypergraph model in the embodiment of the present invention is a fully connected hypergraph model. The fully connected hypergraph model shown in Fig. 4 can be established according to the above node connection rules; it contains all possible connections between the nodes covered by those rules, where identical line styles denote the same edge.
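Connection rules 1 to 6 can be summarized as an adjacency table. This is our reading of the text, not code from the patent: node names follow the symbols used above, superscript j ranges over the m entity types, and rule 2's hyperedge to any n of the m type nodes is collapsed into one generic entry.

```python
# Which nodes each node may connect to, per rules 1-6 above.
ALLOWED = {
    "A_k":   ["A_k+1", "E_k"],         # rule 1
    "E_k":   ["T_k^j"],                # rule 2: one hyperedge to any n of m type nodes
    "T_k^j": ["U_k^j", "B_k^j"],       # rule 3: either or both (nested entities)
    "B_k^j": ["I_k+1^j", "L_k+1^j"],   # rule 4: continue at k+1, or end at k+1
    "U_k^j": ["X"],                    # rule 5: unit length, entity ends at once
    "I_k^j": ["I_k+1^j", "L_k+1^j"],   # rule 6: keep covering, or end at k+1
}
```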
a labeling module 803, configured to determine the mark path graph corresponding to each mark corpus according to the mark information and the entity marking rule.
In specific implementation, each word/character in a mark corpus is labeled to obtain the unique mark path of that word/character, and the hypergraph formed by combining the mark paths of the words serves as the mark path graph corresponding to the mark corpus; the mark path graph is a subgraph of the fully connected hypergraph model. For example, as shown in Fig. 5, suppose there are two entity types in total, the first being place-name entities and the second person-name entities. In the mark corpus "He talked to the U.S.president", it is determined from the mark information that "He" is a person-name entity and can be labeled with a U2 node (a child node of the sixth type above); "the U.S.president" is also a person-name entity and can be labeled with several I2 nodes (child nodes of the fourth type above) and a B2 node (a child node of the third type above); meanwhile, within "the U.S.president", "U.S." is a place-name entity and can be labeled with a U1 node. The remaining words are not entities, so their corresponding father nodes and related child nodes can be connected directly to node X, indicating that no entity exists there.
The modeling module 802 is also configured to establish a model to be trained according to the hypergraph model and a preset neural network model.
In specific implementation, as shown in Fig. 6, the hypergraph model and the neural network model can serve as two logical layers of the model to be trained, establishing a neural-network-layer/hypergraph-layer structure. The preset neural network model can be, but is not limited to, a BiLSTM.
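The two logical layers can be sketched as below; both layers are stand-in stubs intended only to show the neural-network-layer/hypergraph-layer composition (a real feature layer would be, e.g., a BiLSTM, and a real hypergraph layer would score mark paths rather than sum features).

```python
class FeatureLayer:
    """Stand-in for the neural network layer (e.g. a BiLSTM):
    emits one feature vector per input word."""
    def forward(self, words):
        return [[float(len(w))] for w in words]  # toy feature: word length

class HypergraphLayer:
    """Stand-in for the hypergraph layer: aggregates the per-word
    features into one path score."""
    def forward(self, features):
        return sum(f[0] for f in features)

class ModelToTrain:
    """Neural-network-layer / hypergraph-layer stack (cf. Fig. 6)."""
    def __init__(self):
        self.neural = FeatureLayer()
        self.hypergraph = HypergraphLayer()
    def forward(self, words):
        return self.hypergraph.forward(self.neural.forward(words))
```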
a training module 804, configured to input the mark path graphs into the model to be trained for training, obtaining the entity recognition model.
In specific implementation, the hypergraph model contains multiple mark paths for each mark corpus, namely all possible mark paths of that mark corpus. The mark path graph corresponding to each mark corpus contains the target mark path among those multiple mark paths, the target mark path being the most reasonable mark path of the mark corpus. In addition, the model to be trained contains multiple training parameters, which can be, but are not limited to, the weight coefficient of each edge in the hypergraph model and/or the state-transition matrix; the multiple training parameters can be initialized arbitrarily before the model to be trained is trained.
First, the first feature score of each of the multiple mark paths can be determined according to the neural network model, and the second feature score of each mark path can be determined according to the hypergraph model, where the feature scores can be computed by, but not exclusively by, combining the forward-backward algorithm with the expectation-maximization (EM) algorithm. The sum of the first feature score and the second feature score is then taken as the score of each mark path; the score of each mark path can be used to measure the reasonableness of that mark path.
To improve the accuracy of the mark path scores, the feature scores can be computed along different feature dimensions. The hypergraph model corresponds to at least one first corpus feature, and the neural network model corresponds to at least one second corpus feature. Therefore the first feature component value of each mark path can first be determined according to each first corpus feature of the at least one first corpus feature, and the second feature component value of each mark path determined according to each second corpus feature of the at least one second corpus feature; the sum of the first feature component values is then taken as the first feature score, and the sum of the second feature component values as the second feature score. The first corpus features can capture the context features of the mark corpus through the forward-backward algorithm. The second corpus features can include, but are not limited to, at least one of: state-transition features, word features (window size can be 3), language-pattern features (n-gram features), part-of-speech label features (window size can be 3), bag-of-words features (window size can be 5) and word-shape features, where the n-gram features may include word n-gram features and part-of-speech n-gram features, n being 2, 3 or 4. Word shapes may include at least one of: all capitals, all digits, all alphanumeric, contains a digit, contains a dot, contains a hyphen, initial capital, lone initial, punctuation mark, Roman numeral, single character and URL. A state-transition feature describes the probability (or score) that a word/character in the mark corpus transitions from one type of entity to another type of entity.
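The word-shape features listed above can be approximated with simple predicates. The exact definitions used by the patent are not given, so the following is a plausible stand-in covering a subset of the listed shapes.

```python
import re

def word_shape(w):
    """Boolean word-shape predicates for one token (our approximations
    of a subset of the shapes listed in the text)."""
    return {
        "all_caps": w.isupper(),                       # all capitals
        "all_digits": w.isdigit(),                     # all digits
        "contains_digit": any(c.isdigit() for c in w),
        "contains_dot": "." in w,
        "contains_hyphen": "-" in w,
        "initial_cap": w[:1].isupper() and not w.isupper(),
        "single_char": len(w) == 1,
        "url": bool(re.match(r"https?://", w)),
    }
```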
Then, the multiple training parameters are adjusted according to the score of each mark path, so that the score of the target mark path becomes the top score among the multiple mark paths, and the training parameters corresponding to the top score are taken as the set parameters of the model to be trained, obtaining the entity recognition model. The acquired mark corpora can be divided into several batches; for example, if 1000 mark corpora are acquired in total, they can be divided into 4 batches of 250 each. One batch can first be input into the model to be trained for training; if the score of the target mark path is not the top score among the multiple mark paths after training completes, training continues with another batch, so that the training parameters of the model to be trained keep being adjusted until the score of the target mark path is the top score among the multiple mark paths.
an identification module 805, configured to identify at least one named entity in the input corpus according to the entity recognition model.
In specific implementation, the input corpus can be input into the entity recognition model, which then outputs a mark path graph for the input corpus similar to Fig. 5. The mark path of the input corpus can thus be determined first; the mark labels corresponding to the input corpus, including the mark label of each character/word in the input corpus, are then determined according to the mark path; and at least one named entity is identified according to the mark labels. For example, if the mark label of "U.S." in the input corpus is GPE, "U.S." is determined to be a place-name entity.
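Once per-token mark labels are available, grouping them into entities can be sketched as below. This flat grouping is only illustrative: it treats "the U.S.president" token by token and cannot recover the nested "U.S." span, which in the patent is handled by the hypergraph decoding itself.

```python
def labels_to_entities(tokens, labels):
    """Group consecutive tokens with the same non-O label into
    (text, type) entities — a flat approximation of the decoding step."""
    entities, cur = [], None
    for tok, lab in zip(tokens, labels):
        if lab == "O":
            if cur:
                entities.append(cur)
                cur = None
        elif cur and cur[1] == lab:
            cur = (cur[0] + " " + tok, lab)   # extend the running entity
        else:
            if cur:
                entities.append(cur)
            cur = (tok, lab)                  # start a new entity
    if cur:
        entities.append(cur)
    return entities

tokens = ["He", "talked", "to", "the", "U.S.president"]
labels = ["PER", "O", "O", "PER", "PER"]
print(labels_to_entities(tokens, labels))
# [('He', 'PER'), ('the U.S.president', 'PER')]
```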
Optionally, the device described in the embodiment of the present invention further includes an extraction module, which can be configured, after at least one named entity in the input corpus has been identified according to the entity recognition model, to receive a selection instruction input by the user, the selection instruction carrying entity type information, and then to extract from the at least one named entity the named entities matching the entity type information. For example, if the entity type information carried in the selection instruction is PER, all person-name entities in the input corpus can be extracted.
In the embodiment of the present invention, multiple mark corpora are obtained first, each mark corpus carrying mark information; a hypergraph model is then established according to a preset entity marking rule; next, the mark path graph corresponding to each mark corpus is determined according to the mark information and the entity marking rule, and a model to be trained is established according to the hypergraph model and a preset neural network model; finally, the mark path graphs are input into the model to be trained for training to obtain an entity recognition model, and at least one named entity in the input corpus is identified according to the entity recognition model. Entities with nested structure can thus be effectively identified, improving the accuracy of entity recognition and entity extraction.
Refer to Fig. 9, a structural schematic diagram of an entity recognition apparatus provided by an embodiment of the present invention. As shown, the entity recognition apparatus may include at least one processor 901, at least one communication interface 902, at least one memory 903 and at least one communication bus 904.
The processor 901 can be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logic blocks, modules and circuits described in the disclosure of the invention. The processor may also be a combination realizing computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The communication bus 904 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, among others. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation only one thick line is drawn in Fig. 9, but this does not mean there is only one bus or only one type of bus. The communication bus 904 is used to realize connection and communication between these components. The communication interface 902 of the apparatus in the embodiment of the present invention is used for signaling or data communication with other node apparatuses. The memory 903 may include volatile memory, such as nonvolatile random access memory (NVRAM), phase-change random access memory (PRAM) or magnetoresistive random access memory (MRAM), and may also include nonvolatile memory, such as at least one magnetic disk storage device, electrically erasable programmable read-only memory (EEPROM), a flash memory device such as NOR flash memory or NAND flash memory, or a semiconductor device such as a solid-state disk (SSD). The memory 903 may optionally also be at least one storage device located remotely from the aforementioned processor 901. A set of program code is stored in the memory 903, and the processor 901 executes the program in the memory 903 to perform the following:
obtaining multiple mark corpora, each mark corpus carrying mark information;
establishing a hypergraph model according to a preset entity marking rule;
determining the mark path graph corresponding to each mark corpus according to the mark information and the entity marking rule;
establishing a model to be trained according to the hypergraph model and a preset neural network model;
inputting the mark path graphs into the model to be trained for training, obtaining an entity recognition model;
identifying at least one named entity in an input corpus according to the entity recognition model.
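The six stored program steps can be strung together as one driver function; every helper passed in is a hypothetical stand-in for the corresponding module of the device.

```python
def recognize_entities(mark_corpora, input_corpus,
                       build_hypergraph, build_path_graphs,
                       build_model, train, identify):
    """End-to-end sketch of the stored program: model building,
    training, then identification. All helpers are stand-ins."""
    hypergraph = build_hypergraph()                         # entity marking rule
    path_graphs = build_path_graphs(mark_corpora, hypergraph)
    model_to_train = build_model(hypergraph)                # + neural network layer
    recognizer = train(model_to_train, path_graphs)         # fix set parameters
    return identify(recognizer, input_corpus)               # named entities
```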
Optionally, the hypergraph model includes multiple father nodes, each father node among the multiple father nodes corresponding to child nodes of multiple types;
the processor 901 is also configured to perform the following step:
connecting the child nodes of the multiple types and connecting the multiple father nodes according to the entity marking rule, obtaining the hypergraph model.
Optionally, the multiple father nodes include a first father node and a second father node;
the processor 901 is also configured to perform the following steps:
connecting the child node of the first type of the first father node to the child nodes of the second type of the first father node; and
connecting each child node of the second type of the first father node to at least one of the child node of the third type and the child node of the sixth type of the first father node; and
connecting the child node of the third type of the first father node to at least one of the child node of the fourth type and the child node of the fifth type of the second father node; and
connecting the child node of the fourth type of the first father node to at least one of the child node of the fourth type and the child node of the fifth type of the second father node; and
connecting the child node of the sixth type and the child node of the fifth type of the first father node to the child node of the seventh type of the first father node; and
connecting the first father node to the second father node.
Optionally, the model to be trained includes multiple training parameters; the hypergraph model includes multiple mark paths of each mark corpus; and the mark path graph includes the target mark path among the multiple mark paths;
optionally, the processor 901 is also configured to perform the following steps:
determining the first feature score of each of the multiple mark paths according to the neural network model, and determining the second feature score of each mark path according to the hypergraph model;
taking the sum of the first feature score and the second feature score as the score of each mark path;
adjusting the multiple training parameters according to the score of each mark path, so that the score of the target mark path is the top score among the multiple mark paths;
taking the multiple training parameters corresponding to the top score as the set parameters of the model to be trained, obtaining the entity recognition model.
Optionally, the hypergraph model corresponds to at least one first corpus feature and the neural network model corresponds to at least one second corpus feature;
the processor 901 is also configured to perform the following steps:
determining the first feature component value of each mark path according to each first corpus feature of the at least one first corpus feature, and determining the second feature component value of each mark path according to each second corpus feature of the at least one second corpus feature;
taking the sum of the first feature component values as the first feature score and the sum of the second feature component values as the second feature score.
Optionally, the processor 901 is also configured to perform the following steps:
inputting the input corpus into the entity recognition model, obtaining the mark path of the input corpus;
determining the mark labels corresponding to the input corpus according to the mark path;
identifying the at least one named entity according to the mark labels.
Optionally, the processor 901 is also configured to perform the following steps:
receiving a selection instruction input by the user, the selection instruction carrying entity type information;
extracting the named entities matching the entity type information from the at least one named entity.
Further, the processor can also cooperate with the memory and the communication interface to execute the operations of the entity recognition device in the foregoing embodiments of the invention.
In the above embodiments, implementation may be wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, it can be realized wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g. infrared, radio, microwave) manner. The computer-readable storage medium can be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium can be a magnetic medium (e.g. floppy disk, hard disk, magnetic tape), an optical medium (e.g. DVD) or a semiconductor medium (e.g. solid-state disk (SSD)), etc.
The above specific embodiments further describe in detail the purposes, technical solutions and beneficial effects of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (9)
1. An entity recognition method, characterized in that the method comprises:
obtaining multiple mark corpora, each mark corpus carrying mark information;
establishing a hypergraph model according to a preset entity marking rule;
determining the mark path graph corresponding to each mark corpus according to the mark information and the entity marking rule;
establishing a model to be trained according to the hypergraph model and a preset neural network model;
inputting the mark path graphs into the model to be trained for training, obtaining an entity recognition model;
identifying at least one named entity in an input corpus according to the entity recognition model.
2. The method according to claim 1, characterized in that the hypergraph model includes multiple father nodes, each father node among the multiple father nodes corresponding to child nodes of multiple types;
establishing the hypergraph model according to the preset entity marking rule comprises:
connecting the child nodes of the multiple types and connecting the multiple father nodes according to the entity marking rule, obtaining the hypergraph model.
3. The method according to claim 2, characterized in that the multiple father nodes include a first father node and a second father node;
connecting the child nodes of the multiple types and connecting the multiple father nodes according to the entity marking rule, obtaining the hypergraph model, comprises:
connecting the child node of the first type of the first father node to the child nodes of the second type of the first father node; and
connecting each child node of the second type of the first father node to at least one of the child node of the third type and the child node of the sixth type of the first father node; and
connecting the child node of the third type of the first father node to at least one of the child node of the fourth type and the child node of the fifth type of the second father node; and
connecting the child node of the fourth type of the first father node to at least one of the child node of the fourth type and the child node of the fifth type of the second father node; and
connecting the child node of the sixth type and the child node of the fifth type of the first father node to the child node of the seventh type of the first father node; and
connecting the first father node to the second father node.
4. The method according to claim 1, characterized in that the model to be trained includes multiple training parameters; the hypergraph model includes multiple mark paths of each mark corpus; and the mark path graph includes a target mark path among the multiple mark paths;
inputting the mark path graphs into the model to be trained for training, obtaining the entity recognition model, comprises:
determining the first feature score of each of the multiple mark paths according to the neural network model, and determining the second feature score of each mark path according to the hypergraph model;
taking the sum of the first feature score and the second feature score as the score of each mark path;
adjusting the multiple training parameters according to the score of each mark path, so that the score of the target mark path is the top score among the multiple mark paths;
taking the multiple training parameters corresponding to the top score as the set parameters of the model to be trained, obtaining the entity recognition model.
5. The method according to claim 4, characterized in that the hypergraph model corresponds to at least one first corpus feature and the neural network model corresponds to at least one second corpus feature;
determining the first feature score of each of the multiple mark paths according to the neural network model, and determining the second feature score of each mark path according to the hypergraph model, comprises:
determining the first feature component value of each mark path according to each first corpus feature of the at least one first corpus feature, and determining the second feature component value of each mark path according to each second corpus feature of the at least one second corpus feature;
taking the sum of the first feature component values as the first feature score and the sum of the second feature component values as the second feature score.
6. The method according to claim 5, characterized in that the at least one second corpus feature includes at least one of: a state-transition feature, a word feature, a language-pattern feature, a part-of-speech label feature, a bag-of-words feature and a word-shape feature.
7. The method according to claim 1, wherein identifying at least one named entity in the input corpus according to the entity recognition model comprises:
inputting the input corpus into the entity recognition model to obtain the mark path of the input corpus;
determining, according to the mark path, the mark labels corresponding to the input corpus;
identifying the at least one named entity according to the mark labels.
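The identification flow just claimed (input corpus → mark path → mark labels → named entities) might look like the sketch below. The patent does not fix a label scheme, so the BIO-style labels, the stand-in model output, and all names here are assumptions.

```python
# Illustrative decode step for claim 7: turn per-token mark labels (assumed
# BIO-style) into named entities. The labels stand in for the mark path that
# the trained entity recognition model would produce.

def labels_to_entities(tokens, labels):
    """Collect B-/I- labelled token runs into (text, type) entities."""
    entities, current, etype = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:                            # close the previous entity
                entities.append((" ".join(current), etype))
            current, etype = [tok], lab[2:]
        elif lab.startswith("I-") and current:
            current.append(tok)                    # extend the open entity
        else:
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

tokens = ["Acme", "Corp", "hired", "Alice"]
labels = ["B-ORG", "I-ORG", "O", "B-PER"]   # mark labels from the mark path
entities = labels_to_entities(tokens, labels)
# entities == [("Acme Corp", "ORG"), ("Alice", "PER")]
```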
8. The method according to any one of claims 1 to 7, wherein after identifying at least one named entity in the input corpus according to the entity recognition model, the method further comprises:
receiving a selection instruction input by a user, the selection instruction carrying entity type information;
extracting, from the at least one named entity, the named entity that matches the entity type information.
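The post-recognition extraction just claimed reduces to filtering the recognised entities by the type carried in the user's selection instruction. The tuple shape and type strings below are illustrative assumptions.

```python
# Illustrative sketch of claim 8: filter recognised named entities by the
# entity type information in the user's selection instruction.

def extract_by_type(entities, entity_type):
    """Return the named entities whose type matches the selection."""
    return [text for text, etype in entities if etype == entity_type]

recognised = [("Acme Corp", "ORG"), ("Alice", "PER"), ("Beijing", "LOC")]
selected = extract_by_type(recognised, "PER")
# selected == ["Alice"]
```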
9. An entity recognition apparatus, wherein the apparatus comprises:
an obtaining module, configured to obtain multiple mark corpora, each mark corpus of the multiple mark corpora carrying markup information;
a modeling module, configured to establish a hypergraph model according to a preset entity mark rule;
a labeling module, configured to determine the mark path profile corresponding to each mark corpus according to the markup information and the entity mark rule;
the modeling module being further configured to establish a model to be trained according to the hypergraph model and a preset neural network model;
a training module, configured to input the mark path profiles into the model to be trained to obtain an entity recognition model;
an identification module, configured to identify at least one named entity in an input corpus according to the entity recognition model.
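One way to picture the apparatus above is as one method per claimed module. Every name and the toy token-lookup "model" in this sketch are illustrative assumptions, since the claim defines the modules only functionally.

```python
# Illustrative sketch of claim 9's apparatus: one method per claimed module,
# wired together as a toy pipeline. Nothing here is the claimed implementation.

class EntityRecognitionDevice:
    def __init__(self):
        self.model = None

    def obtain_module(self, raw_pairs):
        # Obtaining module: mark corpora, each carrying its markup information.
        return [{"tokens": text.split(), "markup": m} for text, m in raw_pairs]

    def modeling_module(self, entity_mark_rule):
        # Modeling module: stand-in "hypergraph model" built from the mark rule.
        return {"rule": entity_mark_rule}

    def labeling_module(self, corpora):
        # Labeling module: one mark path profile per mark corpus.
        return [tuple(zip(c["tokens"], c["markup"])) for c in corpora]

    def training_module(self, path_profiles):
        # Training module: memorise token -> label pairs as a toy "model".
        self.model = {tok: lab for profile in path_profiles
                      for tok, lab in profile}

    def identification_module(self, text):
        # Identification module: label each token of the input corpus.
        return [(tok, self.model.get(tok, "O")) for tok in text.split()]

device = EntityRecognitionDevice()
corpora = device.obtain_module([("Acme hired Alice", ["B-ORG", "O", "B-PER"])])
device.modeling_module("nested-entity-rule")
device.training_module(device.labeling_module(corpora))
result = device.identification_module("Alice visited Acme")
# result == [("Alice", "B-PER"), ("visited", "O"), ("Acme", "B-ORG")]
```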
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910158600.3A CN109902303B (en) | 2019-03-01 | 2019-03-01 | Entity identification method and related equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910158600.3A CN109902303B (en) | 2019-03-01 | 2019-03-01 | Entity identification method and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109902303A true CN109902303A (en) | 2019-06-18 |
CN109902303B CN109902303B (en) | 2023-05-26 |
Family
ID=66946183
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910158600.3A Active CN109902303B (en) | 2019-03-01 | 2019-03-01 | Entity identification method and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109902303B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427624A (en) * | 2019-07-30 | 2019-11-08 | 北京百度网讯科技有限公司 | Entity relation extraction method and device |
WO2021072848A1 (en) * | 2019-10-18 | 2021-04-22 | 平安科技(深圳)有限公司 | Text information extraction method and apparatus, and computer device and storage medium |
CN112861533A (en) * | 2019-11-26 | 2021-05-28 | 阿里巴巴集团控股有限公司 | Entity word recognition method and device |
CN113033207A (en) * | 2021-04-07 | 2021-06-25 | 东北大学 | Biomedical nested type entity identification method based on layer-by-layer perception mechanism |
CN113971733A (en) * | 2021-10-29 | 2022-01-25 | 京东科技信息技术有限公司 | Model training method, classification method and device based on hypergraph structure |
WO2023035332A1 (en) * | 2021-09-08 | 2023-03-16 | 深圳前海环融联易信息科技服务有限公司 | Date extraction method and apparatus, computer device, and storage medium |
WO2023109436A1 (en) * | 2021-12-13 | 2023-06-22 | 广州大学 | Part of speech perception-based nested named entity recognition method and system, device and storage medium |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040133536A1 (en) * | 2002-12-23 | 2004-07-08 | International Business Machines Corporation | Method and structure for template-based data retrieval for hypergraph entity-relation information structures |
CN106844947A (en) * | 2017-01-18 | 2017-06-13 | 清华大学 | A kind of locomotive energy saving optimizing automatic Pilot method based on high-order relational learning |
CN106951438A (en) * | 2017-02-13 | 2017-07-14 | 北京航空航天大学 | A kind of event extraction system and method towards open field |
CN107016012A (en) * | 2015-09-11 | 2017-08-04 | 谷歌公司 | Handle the failure in processing natural language querying |
US20180212996A1 (en) * | 2017-01-23 | 2018-07-26 | Cisco Technology, Inc. | Entity identification for enclave segmentation in a network |
CN108874997A (en) * | 2018-06-13 | 2018-11-23 | 广东外语外贸大学 | A kind of name name entity recognition method towards film comment |
CN108897805A (en) * | 2018-06-15 | 2018-11-27 | 江苏大学 | A kind of patent text automatic classification method |
CN109142317A (en) * | 2018-08-29 | 2019-01-04 | 厦门大学 | A kind of Raman spectrum substance recognition methods based on Random Forest model |
CN109191485A (en) * | 2018-08-29 | 2019-01-11 | 西安交通大学 | A kind of more video objects collaboration dividing method based on multilayer hypergraph model |
CN109299458A (en) * | 2018-09-12 | 2019-02-01 | 广州多益网络股份有限公司 | Entity recognition method, device, equipment and storage medium |
CN109359293A (en) * | 2018-09-13 | 2019-02-19 | 内蒙古大学 | Mongolian name entity recognition method neural network based and its identifying system |
Non-Patent Citations (9)
Title |
---|
ALDRIAN OBAJA MUIS et al.: "Learning to Recognize Discontiguous Entities", arXiv:1810.08579v1 * |
ARZOO KATIYAR et al.: "Nested Named Entity Recognition Revisited", Proceedings of NAACL-HLT 2018 * |
BAILIN WANG et al.: "Neural Segmental Hypergraphs for Overlapping Mention Recognition", Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing * |
JERRY CHUN-WEI LIN et al.: "A Bi-LSTM Mention Hypergraph Model with Encoding Schema for Mention Extraction", Engineering Applications of Artificial Intelligence * |
JINGWEI ZHUO et al.: "Segment-Level Sequence Modeling using Gated Recursive Semi-Markov Conditional Random Fields", Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics * |
LEV RATINOV et al.: "Design Challenges and Misconceptions in Named Entity Recognition", Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL) * |
WEI LU et al.: "Joint Mention Extraction and Classification with Mention Hypergraphs", Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing * |
XU Jianzhong et al.: "Hypergraph-based Discontinuous Legal Entity Recognition", Information Technology and Informatization * |
JIN Guozhe et al.: "A New Part-of-Speech Tagging Method for Korean", Journal of Chinese Information Processing * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427624A (en) * | 2019-07-30 | 2019-11-08 | 北京百度网讯科技有限公司 | Entity relation extraction method and device |
CN110427624B (en) * | 2019-07-30 | 2023-04-25 | 北京百度网讯科技有限公司 | Entity relation extraction method and device |
WO2021072848A1 (en) * | 2019-10-18 | 2021-04-22 | 平安科技(深圳)有限公司 | Text information extraction method and apparatus, and computer device and storage medium |
CN112861533A (en) * | 2019-11-26 | 2021-05-28 | 阿里巴巴集团控股有限公司 | Entity word recognition method and device |
CN113033207A (en) * | 2021-04-07 | 2021-06-25 | 东北大学 | Biomedical nested type entity identification method based on layer-by-layer perception mechanism |
CN113033207B (en) * | 2021-04-07 | 2023-08-29 | 东北大学 | Biomedical nested type entity identification method based on layer-by-layer perception mechanism |
WO2023035332A1 (en) * | 2021-09-08 | 2023-03-16 | 深圳前海环融联易信息科技服务有限公司 | Date extraction method and apparatus, computer device, and storage medium |
CN113971733A (en) * | 2021-10-29 | 2022-01-25 | 京东科技信息技术有限公司 | Model training method, classification method and device based on hypergraph structure |
WO2023109436A1 (en) * | 2021-12-13 | 2023-06-22 | 广州大学 | Part of speech perception-based nested named entity recognition method and system, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109902303B (en) | 2023-05-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109902303A (en) | A kind of entity recognition method and relevant device | |
CN111444320B (en) | Text retrieval method and device, computer equipment and storage medium | |
CN108804677B (en) | Deep learning problem classification method and system combining multi-level attention mechanism | |
CN107180023B (en) | Text classification method and system | |
WO2020073664A1 (en) | Anaphora resolution method and electronic device and computer-readable storage medium | |
CN108829801A (en) | A kind of event trigger word abstracting method based on documentation level attention mechanism | |
CN108509413A (en) | Digest extraction method, device, computer equipment and storage medium | |
CN109522557A (en) | Training method, device and the readable storage medium storing program for executing of text Relation extraction model | |
CN108197109A (en) | A kind of multilingual analysis method and device based on natural language processing | |
CN108363790A (en) | For the method, apparatus, equipment and storage medium to being assessed | |
US20120253792A1 (en) | Sentiment Classification Based on Supervised Latent N-Gram Analysis | |
CN110334357A (en) | A kind of method, apparatus, storage medium and electronic equipment for naming Entity recognition | |
CN106326484A (en) | Error correction method and device for search terms | |
WO2021051574A1 (en) | English text sequence labelling method and system, and computer device | |
CN106844350A (en) | A kind of computational methods of short text semantic similarity | |
CN107832292A (en) | A kind of conversion method based on the image of neural network model to Chinese ancient poetry | |
CN109446885A (en) | A kind of text based Identify chip method, system, device and storage medium | |
CN111898374B (en) | Text recognition method, device, storage medium and electronic equipment | |
CN107193915A (en) | A kind of company information sorting technique and device | |
CN106970981B (en) | Method for constructing relation extraction model based on transfer matrix | |
CN113220876B (en) | Multi-label classification method and system for English text | |
CN109739975A (en) | Focus incident abstracting method, device, readable storage medium storing program for executing and electronic equipment | |
CN110472062A (en) | The method and device of identification name entity | |
CN110222329A (en) | A kind of Chinese word cutting method and device based on deep learning | |
CN112287656B (en) | Text comparison method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||