CN104820694A - Automatic Q&A method and system based on multi-knowledge base and integral linear programming ILP - Google Patents


Info

Publication number
CN104820694A
Authority
CN
China
Prior art keywords
resource
entity
relation
candidate resource
knowledge base
Prior art date
Legal status
Granted
Application number
CN201510208978.1A
Other languages
Chinese (zh)
Other versions
CN104820694B (en)
Inventor
刘康 (Liu Kang)
赵军 (Zhao Jun)
徐立恒 (Xu Liheng)
张元哲 (Zhang Yuanzhe)
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201510208978.1A
Publication of CN104820694A
Application granted
Publication of CN104820694B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an automatic question-answering (Q&A) method and system based on multiple knowledge bases and integer linear programming (ILP). The method comprises: creating a resource dictionary that indexes the entities and/or relations of a plurality of knowledge bases; querying the resource dictionary to map a plurality of text fragments of a natural-language question to a plurality of entities and/or relations, which form a plurality of candidate resources; converting the candidate resources into a plurality of corresponding templates; combining the text fragments, mappings, candidate resources, conversions and templates into a disambiguation graph; and performing joint inference on the disambiguation graph by ILP to select at least one template to be queried and generate a formal query statement. With this method, a natural-language question can be answered precisely over multiple knowledge bases.

Description

Automatic question-answering method and system based on multiple knowledge bases and integer linear programming (ILP)
Technical field
The invention belongs to the technical field of natural language processing, and relates in particular to an automatic question-answering method and system based on multiple knowledge bases and integer linear programming (ILP).
Background art
With the development of the Semantic Web and Linked Data, the number of knowledge bases keeps growing, and making this knowledge easy for people to use has become a research hotspot. Although these knowledge bases generally provide a dedicated query language such as SPARQL, searching a knowledge base in this way requires the user not only to master the vocabulary and grammar of the query language but also to understand the internal structure of the knowledge base being searched, which is very difficult for ordinary users. On the other hand, keyword-based question-answering systems are easy to use, but keywords alone cannot fully express a user's information need. By contrast, a natural-language interface both expresses the user's information need fully and lets users ask questions in their own words. Knowledge-base question-answering systems use natural language as exactly this kind of interface to a knowledge base, and have therefore attracted wide attention and research. The goal of knowledge-base question answering is to find, given a natural-language question, its answer in the knowledge base. The difficulty lies in converting the natural-language question into a formal, unambiguous semantic representation that can easily be translated into the query language of the knowledge base.
With the rapid growth in the number of knowledge bases, systems that answer questions over multiple knowledge bases have received more research attention in recent years. Such a system must find the knowledge bases relevant to a question and map the question onto the semantic resources of those knowledge bases. A more complex case is a question that can only be answered by combining several knowledge bases, each of which supplies only part of the answer; these partial answers must be joined to obtain the final result. This problem poses two challenges. First, as the number of knowledge bases increases, a word or phrase in a natural-language question may correspond to more knowledge-base resources, so resource ambiguity becomes more severe. Second, different knowledge bases are heterogeneous, with different structures and different ways of expressing entities; forming a unified query statement requires discovering and understanding the existing links between the knowledge bases and obtaining the correspondences between them.
Existing techniques all adopt a pipeline architecture: the result of resource mapping is used to build the query, but query construction cannot feed back into resource mapping. This can cause a specific kind of error: resources obtained in the resource-mapping step turn out to be unusable in the query-construction step.
Summary of the invention
The invention provides an automatic question-answering method and system based on multiple knowledge bases and integer linear programming (Integer Linear Programming, ILP), so that query results for a natural-language question can be obtained from multiple knowledge bases.
A first aspect of the invention provides an automatic question-answering method based on multiple knowledge bases and integer linear programming (ILP), comprising:
Creating a resource dictionary for indexing the entities and/or relations of multiple knowledge bases;
Querying the resource dictionary and mapping multiple text fragments of a natural-language question to multiple entities and/or multiple relations, which form multiple candidate resources;
Performing a conversion on each candidate resource to obtain multiple corresponding templates;
Combining the text fragments, mappings, candidate resources, conversions and templates into a disambiguation graph;
Performing joint inference on the disambiguation graph according to ILP, and selecting at least one template to be queried to generate a formal query statement.
A second aspect of the invention provides an automatic question-answering system based on multiple knowledge bases and integer linear programming (ILP), comprising:
A multi-knowledge-base index module, for creating a resource dictionary for indexing the entities and/or relations of multiple knowledge bases;
A text mapping module, for querying the resource dictionary and mapping multiple text fragments of a natural-language question to multiple entities and/or multiple relations, which form multiple candidate resources;
A resource conversion module, for performing a conversion on each candidate resource to obtain multiple corresponding templates;
A graph generation module, for combining the text fragments, mappings, candidate resources, conversions and templates into a disambiguation graph;
An ILP module, for performing joint inference on the disambiguation graph according to ILP, and selecting at least one template to be queried to generate a formal query statement.
Beneficial effects of the invention:
During query construction, the automatic question-answering method based on multiple knowledge bases and ILP performs joint inference over the resources obtained by mapping the text fragments and over the converted triple templates. That is, it imposes constraints on the selected text fragments of the natural-language question, the candidate resources, and the mapping and conversion relations while maximizing an objective function. This yields a more accurate formal query statement, so that the final formal query over multiple knowledge bases returns more accurate results.
Brief description of the drawings
Fig. 1 is a flowchart of Embodiment 1 of the automatic question-answering method based on multiple knowledge bases and integer linear programming (ILP) according to the invention;
Fig. 2 is an architecture diagram of Embodiment 1 of the automatic question-answering method based on multiple knowledge bases and ILP according to the invention;
Fig. 3 is a structural block diagram of Embodiment 1 of the automatic question-answering system based on multiple knowledge bases and ILP according to the invention.
Detailed description of the embodiments
Fig. 1 is a flowchart and Fig. 2 is a schematic diagram of Embodiment 1 of the automatic question-answering method based on multiple knowledge bases and ILP according to the invention. As shown in Fig. 1 and Fig. 2, the method comprises:
S101: Create a resource dictionary for indexing the entities and/or relations of multiple knowledge bases.
Preferably, creating the resource dictionary for indexing the entities and relations of multiple knowledge bases comprises:
Labelling the entities and/or relations of the multiple knowledge bases with resource-type tags and with entity tags or relation tags, so that a user can look up entities or relations of the corresponding resource type in the resource dictionary by resource-type tag and entity tag or relation tag.
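A minimal sketch of such a resource dictionary, assuming an in-memory inverted index from normalized labels to resources. The class names `Resource` and `ResourceDictionary`, the field names and the label normalization are hypothetical illustrations, not part of the patent:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    kb: str     # source knowledge base, e.g. "drugbank" (hypothetical id)
    uri: str    # identifier of the resource in that knowledge base
    rtype: str  # resource-type tag: "entity" or "relation"
    label: str  # entity tag or relation tag (surface name)


class ResourceDictionary:
    """Hypothetical inverted index: normalized label -> resources from several KBs."""

    def __init__(self):
        self._index = defaultdict(list)

    @staticmethod
    def _norm(text: str) -> str:
        # Case-fold and collapse whitespace so "Side  Effect" matches "side effect".
        return " ".join(text.lower().split())

    def add(self, res: Resource) -> None:
        self._index[self._norm(res.label)].append(res)

    def lookup(self, label: str, rtype: Optional[str] = None) -> List[Resource]:
        # Resources from every KB sharing the label, optionally filtered by type tag.
        hits = self._index.get(self._norm(label), [])
        return [r for r in hits if rtype is None or r.rtype == rtype]
```

Keeping all knowledge bases in one index means a single lookup returns competing resources from different sources, which is exactly the ambiguity the later disambiguation graph has to resolve.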
S102: Query the resource dictionary and map multiple text fragments of the natural-language question to multiple entities and/or multiple relations, which form multiple candidate resources.
Preferably, querying the resource dictionary and mapping the text fragments to form the candidate resources comprises:
S1021: Querying the resource dictionary and computing, for each text fragment of the natural-language question, the similarity between the fragment and each of the entities and/or relations whose labels contain it;
S1022: If the similarity is higher than a first threshold, taking the entity or relation as a candidate resource and retaining the corresponding text fragment.
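Steps S1021 and S1022 could be sketched as follows. The patent does not specify the similarity measure, how text fragments are enumerated, or the value of the first threshold; this sketch assumes word n-grams of up to three words, `difflib.SequenceMatcher` string similarity and a threshold of 0.8, all of which are illustrative choices:

```python
from difflib import SequenceMatcher


def fragments(question, max_len=3):
    """Enumerate contiguous word n-grams as (start, end, text) text fragments."""
    words = question.lower().rstrip("?").split()
    for i in range(len(words)):
        for j in range(i + 1, min(i + max_len, len(words)) + 1):
            yield i, j, " ".join(words[i:j])


def candidate_resources(question, labels, threshold=0.8):
    """Return (fragment, resource, similarity) for every pair above the threshold.

    labels: iterable of (label, resource_id) pairs, e.g. taken from a resource dictionary.
    """
    out = []
    for start, end, frag in fragments(question):
        for label, resource in labels:
            sim = SequenceMatcher(None, frag, label.lower()).ratio()
            if sim > threshold:  # S1022: keep the resource and its text fragment
                out.append(((start, end, frag), resource, round(sim, 3)))
    return out
```

Several overlapping fragments (for example "side effects" and "the side effects") can survive the threshold; choosing among them is deferred to the ILP rather than decided here.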
S103: Perform a conversion on each candidate resource to obtain multiple corresponding templates.
Preferably, performing the conversion on each candidate resource comprises converting according to heuristic rules to obtain triple templates, including:
Performing a heuristic conversion on a relation candidate resource to obtain a triple template consisting of a first variable, the relation candidate resource and a second variable;
Performing a heuristic conversion on an entity candidate resource together with a relation candidate resource to obtain a triple template consisting of the first variable or the second variable, the relation candidate resource and the entity candidate resource, as shown in Table 1:
Table 1
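Since Table 1 is not reproduced here, the two heuristic conversions described above can be sketched as follows. The variable names `?x` and `?y`, the function names and the type label `"R"` for relation-only templates are hypothetical; only the `"ER"` type name appears in the patent, and a real system may apply further rules:

```python
def relation_template(relation):
    """Relation candidate alone -> (first variable, relation, second variable)."""
    return ("?x", relation, "?y")


def entity_relation_templates(entity, relation):
    """Entity + relation candidates -> ER templates; the entity fills either argument."""
    return [("?x", relation, entity), (entity, relation, "?y")]


def convert(entities, relations):
    """Apply both heuristics to all candidate resources; returns (template, type) pairs."""
    out = [(relation_template(r), "R") for r in relations]
    out += [(t, "ER") for e in entities for r in relations
            for t in entity_relation_templates(e, r)]
    return out
```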
S104: Combine the text fragments, mappings, candidate resources, conversions and templates into a disambiguation graph.
Preferably, the text fragments, mappings, candidate resources, conversions and templates are combined into the disambiguation graph according to the popularity, association degree and combination degree of the candidate resources. The popularity $P_E$ of an entity candidate resource and the popularity $P_R$ of a relation candidate resource are computed by formulas (1) and (2), respectively:
$$P_E(r)=\frac{\mathrm{InDegree}(r)+\mathrm{OutDegree}(r)}{\max_{r'\in KB}\{\mathrm{InDegree}(r')+\mathrm{OutDegree}(r')\}} \tag{1}$$
$$P_R(r)=\frac{\mathrm{Frequency}(r)}{\max_{r'\in KB}\mathrm{Frequency}(r')} \tag{2}$$
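A sketch of formulas (1) and (2) for a single knowledge base given as a list of (subject, predicate, object) triples. It assumes that an entity's in-degree and out-degree count the triples in which it is the object and the subject, and that a relation's frequency counts the triples that use it as predicate; the function name is hypothetical:

```python
from collections import Counter


def popularity(triples):
    """Return (P_E, P_R) dictionaries for one KB of (subject, predicate, object) triples."""
    degree = Counter()  # in-degree + out-degree per entity
    freq = Counter()    # occurrence count per relation
    for s, p, o in triples:
        degree[s] += 1  # out-degree of the subject
        degree[o] += 1  # in-degree of the object
        freq[p] += 1
    max_deg = max(degree.values())
    max_freq = max(freq.values())
    p_e = {e: d / max_deg for e, d in degree.items()}   # formula (1)
    p_r = {r: f / max_freq for r, f in freq.items()}    # formula (2)
    return p_e, p_r
```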
The association degree $R_{EE}$ between two entity candidate resources (EE type), $R_{RR}$ between two relation candidate resources (RR type) and $R_{ER}$ between an entity candidate resource and a relation candidate resource (ER type) are computed by formulas (3), (4) and (5), respectively:
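$$R_{EE}=\big(\#\mathrm{sharedRelArg1}(r_1,r_2)+\#\mathrm{sharedRelArg2}(r_1,r_2)\big)\times\mathrm{pop}^{-1}(r_1)\times\mathrm{pop}^{-1}(r_2) \tag{3}$$
$$R_{RR}=\big(\#\mathrm{sharedEntArg1}(r_1,r_2)+\#\mathrm{sharedEntArg2}(r_1,r_2)\big)\times\mathrm{pop}^{-1}(r_1)\times\mathrm{pop}^{-1}(r_2) \tag{4}$$
$$R_{ER}=\#\mathrm{cooccurrence}(r_1,r_2)\times\mathrm{pop}^{-1}(r_1)\times\mathrm{pop}^{-1}(r_2) \tag{5}$$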
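Formulas (3) to (5) can be sketched over the same triple representation. The sketch assumes that the `#` counts are over distinct shared relations or arguments, that #cooccurrence(e, r) counts triples with predicate r that contain e, and that `pop` is a precomputed dictionary of popularity values; all helper names are hypothetical:

```python
from collections import defaultdict


def build_index(triples):
    """Index which relations each entity takes part in, and each relation's arguments."""
    rel_as_arg1 = defaultdict(set)  # entity -> relations where it is the first argument
    rel_as_arg2 = defaultdict(set)  # entity -> relations where it is the second argument
    arg1_of = defaultdict(set)      # relation -> its first arguments
    arg2_of = defaultdict(set)      # relation -> its second arguments
    for s, p, o in triples:
        rel_as_arg1[s].add(p)
        rel_as_arg2[o].add(p)
        arg1_of[p].add(s)
        arg2_of[p].add(o)
    return rel_as_arg1, rel_as_arg2, arg1_of, arg2_of


def r_ee(e1, e2, idx, pop):
    """Formula (3): shared relations of two entities, scaled by inverse popularity."""
    a1, a2, _, _ = idx
    shared = len(a1[e1] & a1[e2]) + len(a2[e1] & a2[e2])
    return shared / (pop[e1] * pop[e2])


def r_rr(r1, r2, idx, pop):
    """Formula (4): shared arguments of two relations, scaled by inverse popularity."""
    _, _, s, o = idx
    shared = len(s[r1] & s[r2]) + len(o[r1] & o[r2])
    return shared / (pop[r1] * pop[r2])


def r_er(e, r, triples, pop):
    """Formula (5): co-occurrences of entity e and relation r in triples."""
    cooc = sum(1 for s, p, o in triples if p == r and e in (s, o))
    return cooc / (pop[e] * pop[r])
```

Multiplying by $\mathrm{pop}^{-1}$ means that links between rare resources count for more than links between very common ones.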
The combination degrees $R_{RR}$, $R_{ER}$ and $R_{ERR}$ are computed by formulas (6), (7) and (8), respectively: $R_{RR}$ combines two relation candidate resources $r_1, r_2$; $R_{ER}$ combines entities $e_1, e_2$ with relations $r_1, r_2$; and $R_{ERR}$ combines an entity $e$ with relations $r_1, r_2$:
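$$R_{RR}=\mathrm{confidence}(r_1)\times\mathrm{confidence}(r_2)\times\mathrm{pop}(r_1)\times\mathrm{pop}(r_2) \tag{6}$$
$$\begin{aligned}R_{ER}={}&\mathrm{relatedness}(e_1,r_1)\times\mathrm{relatedness}(e_2,r_2)\times\mathrm{confidence}(e_1)\times\mathrm{confidence}(r_1)\\&\times\mathrm{confidence}(e_2)\times\mathrm{confidence}(r_2)\times\mathrm{pop}(e_1)\times\mathrm{pop}(r_1)\times\mathrm{pop}(e_2)\times\mathrm{pop}(r_2)\end{aligned} \tag{7}$$
$$\begin{aligned}R_{ERR}={}&\mathrm{relatedness}(e,r_2)\times\mathrm{confidence}(r_1)\times\mathrm{confidence}(r_2)\\&\times\mathrm{confidence}(r_1)\times\mathrm{pop}(r_1)\times\mathrm{pop}(r_2)\times\mathrm{pop}(e)\end{aligned} \tag{8}$$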
In formulas (1) to (8), InDegree(r) denotes the in-degree of resource r, OutDegree(r) its out-degree, and Frequency(r) the frequency with which r occurs in its knowledge base; r' denotes any resource and KB denotes the knowledge base (Knowledge Base). sharedRelArg1(r1, r2) denotes the relations shared by entities r1 and r2 when both act as the first argument, and sharedRelArg2(r1, r2) the relations they share when both act as the second argument; sharedEntArg1(r1, r2) denotes the first arguments shared by relations r1 and r2, and sharedEntArg2(r1, r2) the second arguments they share; # denotes the number of such items. cooccurrence(r1, r2) denotes the number of triples in which resources r1 and r2 co-occur; pop(r1) denotes the popularity of resource r1; confidence(r1) denotes the confidence of resource r1; and relatedness(e, r2) denotes the relatedness between entity e and relation r2.
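Formulas (6) to (8) are plain products once the confidence, popularity and relatedness values are known. The sketch below takes them as dictionaries, which is an assumption since the patent does not say how confidence and relatedness are obtained; the function names are hypothetical, and formula (8) is transcribed literally, including its repeated confidence(r1) factor:

```python
import math


def comb_rr(r1, r2, conf, pop):
    """Formula (6)."""
    return conf[r1] * conf[r2] * pop[r1] * pop[r2]


def comb_er(e1, r1, e2, r2, rel, conf, pop):
    """Formula (7); rel maps (entity, relation) pairs to relatedness values."""
    return (rel[(e1, r1)] * rel[(e2, r2)]
            * math.prod(conf[x] for x in (e1, r1, e2, r2))
            * math.prod(pop[x] for x in (e1, r1, e2, r2)))


def comb_err(e, r1, r2, rel, conf, pop):
    """Formula (8), transcribed as printed (confidence(r1) appears twice)."""
    return (rel[(e, r2)] * conf[r1] * conf[r2] * conf[r1]
            * pop[r1] * pop[r2] * pop[e])
```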
The calculating of popularity, the degree of association and combination degree is as table 2a), 2b) and 2c) shown in:
Table 2a)
Table 2b)
Table 2c);
S105: Perform joint inference on the disambiguation graph according to integer linear programming (ILP), and select at least one template to be queried to generate a formal query statement.
Preferably, this comprises maximizing the objective function under the ILP constraints a) to l) below, so as to infer and select at least one template to be queried (i.e., a triple template) and generate the formal query statement. Here $F_i$, $M_{ij}$, $R_{kl}$, $T_{km}$ and $P_{uv}$ are 0-1 variables indicating, respectively, that text fragment $f_i$ is selected, that the mapping edge from $f_i$ to candidate resource $r_j$ is selected, that candidate resources $r_k$ and $r_l$ are selected together, that the conversion edge from $r_k$ to template $p_m$ is selected, and that templates $p_u$ and $p_v$ are selected together ($P_{uu}=1$ meaning that template $p_u$ is selected on its own):
a) If text fragment $f_i$ is selected, a mapping edge $M_{i\cdot}$ from it must be selected:
$$F_i \le \sum_j M_{ij},\quad \forall i$$
b) Each text fragment is mapped to at most one candidate resource:
$$\sum_j M_{ij} \le 1,\quad \forall i$$
c) If mapping edge $M_{ij}$ is selected, the corresponding text fragment $f_i$ and candidate resource $r_j$ must be selected:
$$M_{ij} \le F_i \quad\text{and}\quad M_{ij} \le \sum_k R_{kj} + \sum_l R_{jl},\quad \forall i, j$$
d) If candidate resources $r_k$ and $r_l$ are selected together, i.e. $R_{kl}=1$, then text fragments must be mapped to both $r_k$ and $r_l$:
$$R_{kl} \le \sum_i M_{ik} \quad\text{and}\quad R_{kl} \le \sum_j M_{jl}$$
e) Overlapping text fragments cannot be selected together, where $O$ is a set of overlapping text fragments:
$$\sum_{t \in O} F_t \le 1,\quad \forall O$$
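f) If conversion edge $T_{km}$ is selected, the corresponding candidate resource $r_k$ and template $p_m$ must be selected:
$$T_{km} \le \sum_i R_{ki} + \sum_j R_{jk}$$
$$T_{km} \le \sum_i P_{im} + \sum_j P_{mj}$$
g) If $R_{mn}$ is selected, conversion edges $T_{m\cdot}$ and $T_{n\cdot}$ must also be selected:
$$R_{mn} \le \sum_k T_{mk} \quad\text{and}\quad R_{mn} \le \sum_k T_{nk}$$
h) If $P_{uv}$ is selected, conversion edges $T_{\cdot u}$ and $T_{\cdot v}$ must also be selected:
$$P_{uv} \le \sum_m T_{mu} \quad\text{and}\quad P_{uv} \le \sum_m T_{mv}$$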
i) If an ER-type triple template is selected, its corresponding conversion edges (from its entity and from its relation) must be selected together:
$$T_{mu} = T_{nu},\quad \forall u \ \text{s.t.}\ \mathrm{Type}(p_u) = \mathrm{ER}$$
j) To guarantee a result, at least one $P_{uv}$ must be selected, and if only one template is selected, its type must be ER:
$$\sum_{u,v} P_{uv} \ge 1,\qquad \mathrm{Type}(p_u) = \mathrm{ER}\ \text{or}\ \mathrm{Type}(p_v) = \mathrm{ER}$$
k) $P_{uu}=1$, which means that template $p_u$ is selected on its own, is allowed only when no two templates are linked:
$$\sum_{u} P_{uu} \cdot \sum_{m \ne n} P_{mn} = 0$$
l) If $P_{uu}=1$, no other $P_{mm}=1$ ($m \ne u$) may hold, because two separately selected templates $p_u$ and $p_m$ would be unlinked; if they were linked, $P_{um}=1$ would be selected instead.
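The objective function to be maximized is:
$$\alpha\sum_{i,j} s_{ij} M_{ij} + \beta\sum_{k,l} w_{kl} R_{kl} + \gamma\sum_{u,v} c_{uv} P_{uv} + \lambda\sum_i F_i \times \frac{\mathrm{length}(f_i)}{\mathrm{length}(\mathit{question})}$$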
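Solving the full program requires an ILP solver. To show what is being maximized, the following sketch instead enumerates every assignment of the mapping variables $M_{ij}$ on a toy input and keeps the best feasible one. It models only constraints a), b), the $M_{ij} \le F_i$ part of c), and e), and only the $\alpha$ and $\lambda$ terms of the objective; the resource, conversion and template variables are omitted, measuring length in words is an assumption, and the function names are hypothetical:

```python
from itertools import product


def overlaps(a, b):
    """True if word spans a=(start, end) and b=(start, end) share a word."""
    return a[0] < b[1] and b[0] < a[1]


def best_mapping(question_len, frags, edges, alpha=1.0, lam=1.0):
    """Exhaustively maximize a reduced objective over the mapping variables M_ij.

    frags: list of (start, end) word spans of the text fragments f_i.
    edges: list of (i, resource, s_ij) candidate mapping edges.
    """
    best, best_score = [], float("-inf")
    for bits in product((0, 1), repeat=len(edges)):
        chosen = [e for e, b in zip(edges, bits) if b]
        used = [i for i, _, _ in chosen]
        if len(used) != len(set(used)):  # b) at most one resource per fragment
            continue
        spans = [frags[i] for i in used]
        if any(overlaps(x, y) for k, x in enumerate(spans) for y in spans[k + 1:]):
            continue  # e) overlapping fragments cannot both be selected
        # F_i = 1 exactly for fragments with a chosen edge, so a) and M_ij <= F_i hold.
        score = alpha * sum(s for _, _, s in chosen)
        score += lam * sum((frags[i][1] - frags[i][0]) / question_len for i in used)
        if score > best_score:
            best, best_score = chosen, score
    return best, best_score
```

Enumeration grows as $2^n$ in the number of edges, so for realistic disambiguation graphs the same variables and constraints would be handed to an ILP solver instead.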
Preferably, after joint inference on the disambiguation graph according to ILP has selected at least one template to be queried and generated the formal query statement, the method further comprises:
S106: Executing the formal query statement over the multiple knowledge bases to obtain the final query result.
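As an illustration of generating and executing the formal query, the selected triple templates can be serialized into a SPARQL query, the query language named in the background section. This sketch assumes that resources are absolute IRIs, that variables start with `?`, and that `?x` is the answer variable; the function name and example IRIs are hypothetical, and how the query is dispatched to the multiple knowledge bases is not specified by the patent and is left out:

```python
def to_sparql(templates, answer_var="?x"):
    """Join triple templates into one SPARQL SELECT query (a basic graph pattern)."""
    def term(t):
        # Variables are kept as-is; resources are written as IRIs in angle brackets.
        return t if t.startswith("?") else f"<{t}>"

    patterns = " .\n  ".join(" ".join(term(t) for t in tpl) for tpl in templates)
    return f"SELECT DISTINCT {answer_var} WHERE {{\n  {patterns} .\n}}"
```

Templates selected from different knowledge bases share variables, so the joined pattern expresses the cross-knowledge-base join described in the background section.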
The performance of the automatic question-answering method based on multiple knowledge bases and ILP is illustrated by the following experiments:
1) Test data
We evaluate on three data sets: a benchmark data set for question answering over interlinked knowledge bases, the data set of QALD-4 Task 2, and a Chinese question-answering test set over multiple knowledge bases. The benchmark for question answering over interlinked knowledge bases was created by Shekarpour et al. [2013]. It contains 25 questions and is the first data set for joint queries over Linked Data. It relies on three knowledge bases: Drugbank (describing the active ingredients of FDA-approved drugs), Sider (describing drugs and their side effects) and Diseasome (describing diseases and genetic disorders). Question Answering over Linked Data 4 (QALD-4) is the fourth evaluation campaign for question answering over Linked Data; its second task is question answering over interlinked knowledge bases, and its training set and test set contain 25 questions each. The Chinese data set was created manually. To make the questions diverse, five people were asked to pose questions independently. The questions are based on three knowledge bases: MOVIE, a knowledge base about the film domain; MUSIC, a knowledge base about the music domain; and GENERAL, which provides general knowledge such as relations between people. The facts in all three knowledge bases were extracted from related websites, namely Mtime, Douban Music and Baidu Baike, respectively.
2) Improvement achieved by the method
The effectiveness of the method is shown by comparison with existing methods; the results of the method and of the other methods are given in Table 3:
Table 3
On the benchmark data set, the system compared against is SINA; on the QALD-4 Task 2 data set, we compare with all participating systems: GFMed, POMELO and RO_FII. For the Chinese data set, we found that SINA could be applied after only adapting its question-segmentation component to Chinese. All of the systems above adopt a pipeline architecture. In Table 3, one "Ours" row gives the result of this method without using the links between the multiple knowledge bases, and the other "Ours" row gives the final result of this method.
The experimental results show that the ILP-based automatic question-answering method for multiple knowledge bases performs well on multi-knowledge-base question-answering tasks, which supports its effectiveness.
The automatic question-answering method based on multiple knowledge bases and ILP adopts a joint model for question answering over multiple knowledge bases, placing resource mapping and query construction in a single unified framework so that the two processes can interact. Resource mapping supplies the resources needed for query construction, while query construction prevents resource mapping from producing wrong resources, reducing the error accumulation caused by pipeline methods. Experiments show that the F-measure of multi-knowledge-base question answering improves on several data sets. In other words, by performing joint inference during query construction over the resources obtained by mapping the text fragments and over the converted triple templates, that is, by imposing constraints on the selected text fragments, candidate resources, and mapping and conversion relations while maximizing the objective function, a more accurate formal query statement is obtained, so that the final formal query over multiple knowledge bases returns more accurate results.
Fig. 3 is an architecture diagram of Embodiment 1 of the automatic question-answering system based on multiple knowledge bases and integer linear programming (ILP) according to the invention. As shown in Fig. 3, the system comprises:
Multi-knowledge-base index module 21, for creating a resource dictionary for indexing the entities and/or relations of multiple knowledge bases;
Text mapping module 22, for querying the resource dictionary and mapping multiple text fragments of a natural-language question to multiple entities and/or multiple relations, which form multiple candidate resources;
Resource conversion module 23, for performing a conversion on each candidate resource to obtain multiple corresponding templates;
Graph generation module 24, for combining the text fragments, mappings, candidate resources, conversions and templates into a disambiguation graph;
ILP module 25, for performing joint inference on the disambiguation graph according to integer linear programming (ILP), and selecting at least one template to be queried to generate a formal query statement.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the invention. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention.

Claims (8)

1. An automatic question-answering method based on multiple knowledge bases and integer linear programming (ILP), characterized by comprising:
Creating a resource dictionary for indexing the entities and/or relations of multiple knowledge bases;
Querying the resource dictionary and mapping multiple text fragments of a natural-language question to multiple entities and/or multiple relations, which form multiple candidate resources;
Performing a conversion on each candidate resource to obtain multiple corresponding templates;
Combining the text fragments, mappings, candidate resources, conversions and templates into a disambiguation graph;
Performing joint inference on the disambiguation graph according to ILP, and selecting at least one template to be queried to generate a formal query statement.
2. The automatic question-answering method based on multiple knowledge bases and ILP according to claim 1, characterized in that creating the resource dictionary for indexing the entities and relations of multiple knowledge bases comprises:
Labelling the entities and/or relations of the multiple knowledge bases with resource-type tags and with entity tags or relation tags, so that a user can look up entities or relations of the corresponding resource type in the resource dictionary by resource-type tag and entity tag or relation tag.
3. The automatic question-answering method based on multiple knowledge bases and ILP according to claim 1, characterized in that querying the resource dictionary and mapping multiple text fragments of the natural-language question to multiple entities and/or relations to form multiple candidate resources comprises:
Querying the resource dictionary and computing, for each text fragment of the natural-language question, the similarity between the fragment and each of the entities and/or relations whose labels contain it;
If the similarity is higher than a first threshold, taking the entity or relation as a candidate resource and retaining the corresponding text fragment.
4. The automatic question-answering method based on multiple knowledge bases and ILP according to claim 1, characterized in that performing the conversion on each candidate resource to obtain multiple corresponding templates comprises converting according to heuristic rules to obtain triple templates, including:
Performing a heuristic conversion on a relation candidate resource to obtain a triple template consisting of a first variable, the relation candidate resource and a second variable;
Performing a heuristic conversion on an entity candidate resource together with a relation candidate resource to obtain a triple template consisting of the first variable or the second variable, the relation candidate resource and the entity candidate resource.
5. The automatic question-answering method based on multiple knowledge bases and ILP according to claim 1, characterized in that combining the text fragments, mappings, candidate resources, conversions and templates into the disambiguation graph comprises combining them according to the popularity, association degree and combination degree of the candidate resources, wherein the popularity $P_E$ of an entity candidate resource and the popularity $P_R$ of a relation candidate resource are computed by formulas (1) and (2), respectively:
$$P_E(r)=\frac{\mathrm{InDegree}(r)+\mathrm{OutDegree}(r)}{\max_{r'\in KB}\{\mathrm{InDegree}(r')+\mathrm{OutDegree}(r')\}} \tag{1}$$
$$P_R(r)=\frac{\mathrm{Frequency}(r)}{\max_{r'\in KB}\mathrm{Frequency}(r')} \tag{2}$$
The association degree $R_{EE}$ between two entity candidate resources, $R_{RR}$ between two relation candidate resources and $R_{ER}$ between an entity candidate resource and a relation candidate resource are computed by formulas (3), (4) and (5), respectively:
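$$R_{EE}=\big(\#\mathrm{sharedRelArg1}(r_1,r_2)+\#\mathrm{sharedRelArg2}(r_1,r_2)\big)\times\mathrm{pop}^{-1}(r_1)\times\mathrm{pop}^{-1}(r_2) \tag{3}$$
$$R_{RR}=\big(\#\mathrm{sharedEntArg1}(r_1,r_2)+\#\mathrm{sharedEntArg2}(r_1,r_2)\big)\times\mathrm{pop}^{-1}(r_1)\times\mathrm{pop}^{-1}(r_2) \tag{4}$$
$$R_{ER}=\#\mathrm{cooccurrence}(r_1,r_2)\times\mathrm{pop}^{-1}(r_1)\times\mathrm{pop}^{-1}(r_2) \tag{5}$$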
The combination degrees $R_{RR}$, $R_{ER}$ and $R_{ERR}$ are computed by formulas (6), (7) and (8), respectively: $R_{RR}$ combines two relation candidate resources $r_1, r_2$; $R_{ER}$ combines entities $e_1, e_2$ with relations $r_1, r_2$; and $R_{ERR}$ combines an entity $e$ with relations $r_1, r_2$:
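$$R_{RR}=\mathrm{confidence}(r_1)\times\mathrm{confidence}(r_2)\times\mathrm{pop}(r_1)\times\mathrm{pop}(r_2) \tag{6}$$
$$\begin{aligned}R_{ER}={}&\mathrm{relatedness}(e_1,r_1)\times\mathrm{relatedness}(e_2,r_2)\times\mathrm{confidence}(e_1)\times\mathrm{confidence}(r_1)\\&\times\mathrm{confidence}(e_2)\times\mathrm{confidence}(r_2)\times\mathrm{pop}(e_1)\times\mathrm{pop}(r_1)\times\mathrm{pop}(e_2)\times\mathrm{pop}(r_2)\end{aligned} \tag{7}$$
$$\begin{aligned}R_{ERR}={}&\mathrm{relatedness}(e,r_2)\times\mathrm{confidence}(r_1)\times\mathrm{confidence}(r_2)\\&\times\mathrm{confidence}(r_1)\times\mathrm{pop}(r_1)\times\mathrm{pop}(r_2)\times\mathrm{pop}(e)\end{aligned} \tag{8}$$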
Wherein, InDegree (r) represents the in-degree of resource r, OutDegree (r) represents the out-degree of resource r, Frequency (r) represents that the frequency that resource r occurs in its knowledge base, r' and KB represent that r ' represents any one resource, and KB represents knowledge base; SharedRelArg1 (r1, r2) presentation-entity r1 and r2 simultaneously as the first variable the relation shared, sharedRelArg2 (r1, r2) presentation-entity r1 and r2 simultaneously as the second variable the relation shared, sharedEntArg1 (r1, r2) represent relation r1 and r2 the first variable of sharing, sharedEntArg1 (r1, r2) represent relation r1 and r2 the second variable of sharing; Cooccurrence (r1, r2) represent that resource r1 and r2 appears at the number of times of a tlv triple jointly, pop (r1) represents the popularity of relation 1, confidence (r1) represents the degree of confidence of resource r1, the correlation degree of relatedness (e, r2) presentation-entity e and relation r2.
6. The automatic question-answering method based on multiple knowledge bases and integer linear programming (ILP) according to claim 5, characterized in that performing joint inference on the disambiguation graph according to ILP and selecting at least one query template to generate a formal query statement comprises: under ILP constraints a) to l), maximizing the objective function so as to infer and select at least one query template from which the formal query statement is generated:
a) If text fragment $f_i$ is selected, then at least one of its mapping edges $M_{i\cdot}$ must be selected:
$$F_i \le \sum_j M_{ij}$$
b) Each text fragment is mapped to at most one candidate resource:
$$\sum_j M_{ij} \le 1, \quad \forall i$$
c) If mapping edge $M_{ij}$ is selected, then the corresponding text fragment $f_i$ and candidate resource $r_j$ must be selected:
$$M_{ij} \le F_i, \ \forall j \quad \text{and} \quad M_{ij} \le \sum_k R_{kj} + \sum_l R_{jl}$$
d) If candidate resources $r_k$ and $r_l$ are both selected, i.e. $R_{kl} = 1$, then text fragments must be mapped onto $r_k$ and $r_l$:
$$R_{kl} \le \sum_i M_{ik} \quad \text{and} \quad R_{kl} \le \sum_j M_{jl}$$
e) If two text fragments overlap, they cannot both be selected:
$$\sum_{t \in O} F_t \le 1, \quad \forall O$$
where $O$ is any set of mutually overlapping text fragments.
f) If conversion edge $T_{km}$ is selected, then the corresponding candidate resource $r_k$ and template $p_m$ must be selected:
$$T_{km} \le \sum_i R_{ki} + \sum_j R_{jk}$$
$$T_{km} \le \sum_i P_{im} + \sum_j P_{mj}$$
g) If $R_{mn}$ is selected, then conversion edges $T_{m\cdot}$ and $T_{n\cdot}$ must also be selected:
$$R_{mn} \le \sum_k T_{mk} \quad \text{and} \quad R_{mn} \le \sum_k T_{nk}$$
h) If $P_{uv}$ is selected, then conversion edges $T_{\cdot u}$ and $T_{\cdot v}$ must also be selected:
$$P_{uv} \le \sum_m T_{mu} \quad \text{and} \quad P_{uv} \le \sum_m T_{mv}$$
i) If an entity-relation triple template is selected, then the corresponding conversion edges are also selected:
$$T_{mu} = T_{nu}, \quad \forall p \ \text{s.t.}\ \mathrm{Type}(p_u) = \mathrm{ER}$$
j) To guarantee that a result is obtained, at least one $P_{uv}$ must be selected; and if only one template $P_{uv}$ is selected, its type must be ER:
$$\sum_{u,v} P_{uv} \ge 1, \quad \mathrm{Type}(p_u) = \mathrm{ER} \ \text{or}\ \mathrm{Type}(p_v) = \mathrm{ER}$$
k) $P_{uv} = 1$ with $u = v$, which means that template $p_u$ is selected on its own, is allowed only when no two templates are connected:
$$\sum_{u,v} P_{uv} \cdot \sum_{m,n} P_{mn} = 0, \quad u = v,\ m \ne n$$
l) If $P_{uv} = 1$ with $u = v$, then no other $P_{mn} = 1$ with $m = n$ may hold, because there is no connection between templates $p_u$ and $p_m$ (otherwise $P_{um} = 1$).
The objective function is:
$$\alpha \sum_{i,j} s_{ij} M_{ij} + \beta \sum_{k,l} w_{kl} R_{kl} + \gamma c_{uv} P_{uv} + \lambda \sum_i F_i \times \frac{\mathrm{length}(f_i)}{\mathrm{length}(\mathrm{question})}$$
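Constraints a) to l) and the objective above define a 0/1 integer program. As a rough illustration only, the sketch below keeps just the text-fragment part of the model: constraints a), b), the fragment half of c), and e), plus the α and λ terms of the objective. It drops the R, T and P variables (and with them the β and γ terms and constraints d and f to l), treats $s_{ij}$ as a given mapping score, measures length in tokens, and finds the optimum by exhaustive enumeration instead of an ILP solver, which is only workable for a handful of variables. The question, fragments, candidate resources and scores are hypothetical.

```python
from itertools import product


def feasible(F, M, edges, overlaps):
    """Check constraints a), b), c) (fragment side only) and e)."""
    for i, f_i in enumerate(F):
        out = sum(m for m, (src, _, _) in zip(M, edges) if src == i)
        if f_i > out or out > 1:  # a) F_i <= sum_j M_ij ; b) sum_j M_ij <= 1
            return False
    if any(m > F[src] for m, (src, _, _) in zip(M, edges)):  # c) M_ij <= F_i
        return False
    return all(sum(F[t] for t in group) <= 1 for group in overlaps)  # e)


def objective(F, M, fragments, edges, q_len, alpha, lam):
    """alpha * sum s_ij M_ij + lam * sum F_i * length(f_i) / length(question)."""
    mapping = sum(s * m for m, (_, _, s) in zip(M, edges))
    coverage = sum(f * len(frag.split()) / q_len for f, frag in zip(F, fragments))
    return alpha * mapping + lam * coverage


def solve(question, fragments, edges, overlaps, alpha=1.0, lam=1.0):
    """Enumerate every 0/1 assignment and keep the best feasible one."""
    n_f, q_len = len(fragments), len(question.split())
    best_val, best = float("-inf"), None
    for bits in product((0, 1), repeat=n_f + len(edges)):
        F, M = bits[:n_f], bits[n_f:]
        if not feasible(F, M, edges, overlaps):
            continue
        val = objective(F, M, fragments, edges, q_len, alpha, lam)
        if val > best_val:
            best_val, best = val, M
    chosen = [(fragments[src], res) for m, (src, res, _) in zip(best, edges) if m]
    return best_val, chosen


QUESTION = "who is the wife of barack obama"
FRAGMENTS = ["wife", "barack obama", "obama"]
# Mapping edges (fragment index, candidate resource, score s_ij); hypothetical values.
EDGES = [(0, "spouse", 0.9), (0, "Wife_(film)", 0.3),
         (1, "Barack_Obama", 0.95), (2, "Obama_(town)", 0.6)]
OVERLAPS = [{1, 2}]  # "barack obama" and "obama" share a token

value, chosen = solve(QUESTION, FRAGMENTS, EDGES, OVERLAPS)
print(chosen)  # [('wife', 'spouse'), ('barack obama', 'Barack_Obama')]
```

Constraint e) rules out keeping both "barack obama" and "obama", and the length term favors the longer fragment. A practical system would pass the full model to an ILP solver; enumeration is used here only so the sketch needs nothing beyond the standard library.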
7. An automatic question-answering method based on multiple knowledge bases and integer linear programming (ILP), characterized in that, after performing joint inference on the disambiguation graph according to ILP and selecting at least one query template to generate a formal query statement, the method further comprises:
Executing the formal query statement on the multiple knowledge bases to obtain the final query result.
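Claim 7 does not say how the formal query is executed across several knowledge bases. One possible reading is sketched below under explicit assumptions: each knowledge base is a list of (subject, relation, object) triples, the formal query is a list of triple patterns whose SPARQL-style `?variables` must bind consistently, the query is evaluated against each knowledge base separately, and the answers are merged by set union. Joins across knowledge bases are not handled. All names (`match`, `evaluate`, `query_all`, `kb_a`, `kb_b`) are hypothetical.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):               # query variable
            if new.setdefault(p, t) != t:   # already bound to a different value
                return None
        elif p != t:                        # constant that does not match
            return None
    return new


def evaluate(patterns, triples, binding=None):
    """Yield every binding that satisfies all triple patterns in one knowledge base."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from evaluate(patterns[1:], triples, extended)


def query_all(patterns, knowledge_bases, var):
    """Run the same query against every knowledge base and merge the answers for `var`."""
    return {b[var] for kb in knowledge_bases.values() for b in evaluate(patterns, kb)}


KBS = {
    "kb_a": [("Barack_Obama", "spouse", "Michelle_Obama"),
             ("Michelle_Obama", "bornIn", "Chicago")],
    "kb_b": [("Barack_Obama", "spouse", "Michelle_Obama"),
             ("Barack_Obama", "bornIn", "Honolulu")],
}

print(query_all([("Barack_Obama", "spouse", "?x")], KBS, "?x"))  # {'Michelle_Obama'}
```

A two-pattern query such as "where was Obama's spouse born" only succeeds in `kb_a`, since `kb_b` has no birthplace for Michelle_Obama; merging per-knowledge-base answers is a simplifying design choice, not something the claim specifies.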
8. An automatic question-answering system based on multiple knowledge bases and integer linear programming (ILP), characterized by comprising:
A multi-knowledge-base indexing module, configured to index the multiple knowledge bases and create a resource dictionary of entities and/or relations;
A text mapping module, configured to query the resource dictionary and map multiple text fragments of a natural-language question to multiple entities and/or relations, forming multiple candidate resources;
A resource conversion module, configured to perform a conversion on each candidate resource to obtain the corresponding templates;
A graph generation module, configured to combine the text fragments, mappings, candidate resources, conversions and templates into a disambiguation graph;
An ILP module, configured to perform joint inference on the disambiguation graph according to ILP and select at least one query template to generate a formal query statement.
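The first two modules of claim 8 can be sketched as follows, under assumptions not found in the claims: each knowledge base supplies (resource ID, surface label) pairs, the resource dictionary is an exact-match index on lower-cased labels, and the text fragments of a question are its word n-grams up to a fixed length. Overlapping fragments such as "barack obama" and "obama" are deliberately kept, since choosing among them is left to the ILP step (constraint e of claim 6). The knowledge-base names, resource IDs and function names are hypothetical.

```python
from collections import defaultdict


def build_resource_dictionary(knowledge_bases):
    """Indexing module (sketch): map each lower-cased label to (kb, resource) pairs."""
    index = defaultdict(set)
    for kb_name, labels in knowledge_bases.items():
        for resource_id, label in labels:
            index[label.lower()].add((kb_name, resource_id))
    return index


def map_fragments(question, index, max_n=3):
    """Text mapping module (sketch): look up every n-gram of the question."""
    tokens = question.lower().split()
    candidates = {}
    for n in range(max_n, 0, -1):
        for start in range(len(tokens) - n + 1):
            fragment = " ".join(tokens[start:start + n])
            if fragment in index:  # membership test does not create a key
                candidates[(start, start + n, fragment)] = sorted(index[fragment])
    return candidates


LABELS = {
    "kb_a": [("Barack_Obama", "Barack Obama"), ("Obama_(town)", "Obama"), ("spouse", "wife")],
    "kb_b": [("b:BarackObama", "Barack Obama"), ("b:isMarriedTo", "wife")],
}

INDEX = build_resource_dictionary(LABELS)
CANDIDATES = map_fragments("Who is the wife of Barack Obama", INDEX)
for (start, end, fragment), resources in CANDIDATES.items():
    print(fragment, resources)
```

Keying candidates by token span keeps the overlap information that the later graph and ILP steps need.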
CN201510208978.1A 2015-04-28 2015-04-28 Automatic question-answering method and system based on multiple knowledge base and integral linear programming ILP Active CN104820694B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510208978.1A CN104820694B (en) 2015-04-28 2015-04-28 Automatic question-answering method and system based on multiple knowledge base and integral linear programming ILP

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510208978.1A CN104820694B (en) 2015-04-28 2015-04-28 Automatic question-answering method and system based on multiple knowledge base and integral linear programming ILP

Publications (2)

Publication Number Publication Date
CN104820694A CN104820694A (en) 2015-08-05
CN104820694B CN104820694B (en) 2019-03-15

Family

ID=53730989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510208978.1A Active CN104820694B (en) 2015-04-28 2015-04-28 Automatic question-answering method and system based on multiple knowledge base and integral linear programming ILP

Country Status (1)

Country Link
CN (1) CN104820694B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101330432A (en) * 2007-06-18 2008-12-24 阿里巴巴集团控股有限公司 System and method for implementing on-line QA
CN102789496A (en) * 2012-07-13 2012-11-21 携程计算机技术(上海)有限公司 Method and system for implementing intelligent response
CN103049433A (en) * 2012-12-11 2013-04-17 微梦创科网络科技(中国)有限公司 Automatic question answering method, automatic question answering system and method for constructing question answering case base
CN103902652A (en) * 2014-02-27 2014-07-02 深圳市智搜信息技术有限公司 Automatic question-answering system
CN104361127A (en) * 2014-12-05 2015-02-18 广西师范大学 Multilanguage question and answer interface fast constituting method based on domain ontology and template logics

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
C. Unger et al.: "Template-based Question Answering over RDF Data", International Conference on World Wide Web *
Guangyou Zhou et al.: "Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives", Meeting of the Association for Computational Linguistics: Human Language Technologies *
Mohamed Yahya et al.: "Natural Language Questions for the Web of Data", Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning *
Xu Kun et al.: "Semantic understanding of Chinese natural language questions over knowledge bases", Journal of Peking University (Natural Science Edition) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570138A (en) * 2016-11-03 2017-04-19 北京百度网讯科技有限公司 Information search method and device based on artificial intelligence
CN106570138B (en) * 2016-11-03 2020-03-03 北京百度网讯科技有限公司 Information searching method and device based on artificial intelligence
CN107451240B (en) * 2017-07-26 2019-12-13 北京大学 interaction-based knowledge-graph question-answer Q/A system retrieval and promotion method and device
CN107992528A (en) * 2017-11-13 2018-05-04 清华大学 Utilize more relation question answering systems of interpretable inference network
CN107992528B (en) * 2017-11-13 2022-07-05 清华大学 Multi-relational question-answering system using interpretable reasoning network
CN108664465A (en) * 2018-03-07 2018-10-16 珍岛信息技术(上海)股份有限公司 One kind automatically generating text method and relevant apparatus
CN108920488A (en) * 2018-05-14 2018-11-30 平安科技(深圳)有限公司 The natural language processing method and device that multisystem combines
CN108920488B (en) * 2018-05-14 2021-09-28 平安科技(深圳)有限公司 Multi-system combined natural language processing method and device
CN109376298A (en) * 2018-09-14 2019-02-22 广州神马移动信息科技有限公司 Data processing method, device, terminal device and computer storage medium
CN109656952A (en) * 2018-10-31 2019-04-19 北京百度网讯科技有限公司 Inquiry processing method, device and electronic equipment
CN109656952B (en) * 2018-10-31 2021-04-13 北京百度网讯科技有限公司 Query processing method and device and electronic equipment
CN112256847A (en) * 2020-09-30 2021-01-22 昆明理工大学 Knowledge base question-answering method integrating fact texts
CN112256847B (en) * 2020-09-30 2023-04-07 昆明理工大学 Knowledge base question-answering method integrating fact texts

Also Published As

Publication number Publication date
CN104820694B (en) 2019-03-15

Similar Documents

Publication Publication Date Title
CN104820694A (en) Automatic Q&A method and system based on multi-knowledge base and integral linear programming ILP
CN106934012B (en) Natural language question-answering implementation method and system based on knowledge graph
Khot et al. Scitail: A textual entailment dataset from science question answering
Nguyen et al. Codewebs: scalable homework search for massive open online programming courses
US10769552B2 (en) Justifying passage machine learning for question and answer systems
Aizawa et al. NTCIR-11 Math-2 Task Overview.
Aizawa et al. NTCIR-10 Math Pilot Task Overview.
CN102262634B (en) Automatic questioning and answering method and system
US20160026378A1 (en) Answer Confidence Output Mechanism for Question and Answer Systems
US20130198192A1 (en) Author disambiguation
CN105138864B (en) Protein interactive relation data base construction method based on Biomedical literature
US20160378853A1 (en) Systems and methods for reducing search-ability of problem statement text
WO2020010834A1 (en) Faq question and answer library generalization method, apparatus, and device
CN112328800A (en) System and method for automatically generating programming specification question answers
CN104484380A (en) Personalized search method and personalized search device
Giordani et al. Semantic mapping between natural language questions and SQL queries via syntactic pairing
CN110427471B (en) Natural language question-answering method and system based on knowledge graph
Risse et al. The ARCOMEM architecture for social-and semantic-driven web archiving
CN108520038B (en) Biomedical literature retrieval method based on sequencing learning algorithm
Zheng et al. Question answering over knowledge graphs via structural query patterns
CN114780740A (en) Construction method of tea knowledge graph
Toti AQUEOS: a system for question answering over semantic data
Wang et al. Semi-supervised chinese open entity relation extraction
Waltinger et al. Usi answers: Natural language question answering over (semi-) structured industry data
CN116860991A (en) API recommendation-oriented intent clarification method based on knowledge graph driving path optimization

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant