CN103440287B - Web question-answering retrieval system based on product information structuring - Google Patents
- Publication number
- CN103440287B (CN201310354888.4A)
- Authority
- CN
- China
- Prior art keywords
- information
- module
- product
- product information
- question sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention is a Web question-answering retrieval system based on product information structuring. It comprises a user interface, a product information crawling module, an information extraction module, an inverted index building module, a database interface, an information integration module, a question processing module, and a database. The invention can track the latest changes in online product information in real time and, through the information extraction and integration modules, promptly update the structured product data already in the database or add new structured product data, so that the system adapts to changes in online product information. In addition, the invention collects product information from multiple information sources and, through the information extraction and integration modules, integrates the information that different websites publish about the same product: contradictory information is adjudicated and missing information is complemented across sources, which ensures the completeness and authenticity of the retrieved information. The invention is a Web question-answering retrieval system based on product information structuring with high retrieval efficiency.
Description
Technical field
The present invention relates to the field of extraction, modeling, and retrieval of unstructured and semi-structured Internet information, and specifically to a Web question-answering retrieval system and method based on product information structuring. It is an innovation in Web question-answering retrieval systems based on product information structuring.
Background technology
The 21st century is the information age, and the network has become an indispensable part of people's lives. With the rapid development of the Internet, people's demand for online information grows daily, and a massive amount of information exists on the Internet. However, because of inherent characteristics of the Internet such as its large capacity and dynamic nature, this information is often fragmented and unorganized and contains a large amount of invalid data, which reduces how efficiently people can use these rich information resources. To solve this "information overload" problem, many companies and research institutions have turned to research on automatic question answering systems.
A question answering system (Question Answering System, QA) is an advanced form of information retrieval system. It answers questions that users pose in natural language with accurate, concise natural language. The main driver of this research is people's need to obtain information quickly and accurately. Question answering is currently a research direction in artificial intelligence and natural language processing that receives much attention and has broad prospects.
In terms of knowledge domain, existing question answering systems fall into two classes: "closed-domain" and "open-domain" systems. Closed-domain systems focus on answering questions in a specific domain, and most current question answering systems are closed-domain. Open-domain systems aim not to restrict the subject of the question, which is considerably harder.
An existing closed-domain question answering system is described in application No. 200810233734 by Kunming University of Science and Technology, entitled "Answer extraction method for a tourism question answering system based on ontology reasoning". That method concentrates on answer extraction for a tourism question answering system. First, the concepts, attributes, and relations of the tourism domain are defined manually, a tourism domain ontology knowledge base is constructed manually, and the consistency of the ontology is checked. Second, the semantic information in the ontology knowledge base is used to semantically disambiguate the user's question. Third, semantic rules for the tourism domain are defined manually. Fourth, based on the question analysis results after disambiguation, answers are extracted from the ontology knowledge base by combining reasoning over the corresponding semantic rules with information retrieval. Finally, an answer extraction algorithm is designed for each question type, improving the responsiveness and recall of the system.
As can be seen, the method adopted by that invention requires a great deal of manual intervention: constructing the knowledge base, defining concepts and attributes, and formulating semantic rules all require human participation. Excessive manual participation increases labor costs, and staff must be retained to maintain and update the system.
Summary of the invention
In view of the above problems, the object of the present invention is to provide a Web question-answering retrieval system based on product information structuring that guarantees the completeness and authenticity of the retrieved information and has high retrieval efficiency.
The technical solution of the present invention is as follows. The Web question-answering retrieval system based on product information structuring comprises a user interface, a product information crawling module, an information extraction module, an inverted index building module, a database interface, an information integration module, a question processing module, and a database, wherein:
The user interface handles all communication between the Web question answering system and the user: it obtains the product-related natural-language question entered by the user and passes it to the question processing module, and it returns the corresponding search results and related web pages to the user.
The product information crawling module crawls web pages at fixed time intervals, stores the crawled pages, and passes them to the information extraction module for processing.
The information extraction module processes the unstructured web page information crawled by the product information crawling module, converts it into structured information, connects to the structured product information database through the database interface, and stores the processed structured information in the database.
The inverted index building module extracts key content from the web pages crawled by the product information crawling module and builds an inverted index over those pages.
The database interface provides a unified interface for database operations such as accessing and updating structured product data, together with access permission control.
The information integration module integrates the structured information from multiple data sources output by the information extraction module, connects to the database through the database interface, and saves the integrated structured data in the database.
The question processing module converts the natural-language question entered by the user into a structured statement. It connects to the user through the user interface to obtain the question, connects to the database through the database interface, queries the database with the converted statement, and returns the query result to the user through the user interface.
The question processing module converts the natural-language question in two steps: it first classifies the question with a trained Naive Bayes classifier, and then uses a skip-chain CRF model to recognize and extract the named entities in the question.
The named entities are mobile phone names and mobile phone attributes.
The skip-chain CRF model is developed from the linear-chain conditional random field (Linear CRF) model and is one kind of conditional random field (CRF) model.
Conventional named entity recognition methods ignore the role that conjunctions such as "and" and "or" play in a sentence; the skip-chain CRF model establishes a link between the words immediately before and after a conjunction, which helps improve the final precision. As the recognition model for extracting named entities from query questions, the skip-chain CRF model is trained on the training set to obtain the recognition and decision criteria for product-information named entities, and the question is then converted into keywords and product attributes that are meaningful for retrieval.
The information integration module first derives an attribute mapping table from the attribute values in the two tables to be processed, mapping attribute names in the two tables that have the same meaning but may be named differently, to facilitate the next integration step. It then creates a target table from the mapping and reorders the column names of the two tables to match. Whether records in the two tables are comparable is decided by the primary key value that uniquely identifies a record: records whose key values are equal are considered comparable. Comparable records are merged or de-duplicated, the result is inserted into the target table, and the corresponding records in the original tables are marked. Finally, the unmarked records are inserted into the target table one by one, yielding an integrated target table. If there are more than two tables, two tables are processed at a time and the method is repeated until the final result is obtained.
The product information crawling module crawls, at fixed time intervals, the web pages on large digital product websites such as pconline and Bubble that present detailed information about digital products, stores the crawled pages, and passes them to the information extraction module for processing.
The question processing module converts the natural-language question entered by the user into a structured SQL statement. It connects to the user through the user interface to obtain the question, connects to the structured product information database through the database interface, queries the database with the resulting SQL statement, and returns the query result to the user through the user interface.
For the analysis system of unstructured and semi-structured product information, the present invention integrates the information from multiple sources about the same product, ensuring that the information is authentic and complete; at the same time, it uses a classification algorithm and a named entity recognition algorithm to convert natural-language questions into structured database query statements. For the fine-grained sentiment analysis system of product review information, an algorithm based on instance similarity is used to integrate the information from different sources about the same product.
The instance-similarity algorithm integrates information in two steps, mapping and merging. In the mapping step, it computes the similarity between corresponding elements of the two tables; in the merging step, it merges the two tables according to the result of the mapping step. For the fine-grained sentiment analysis system of product review information, the question is first classified, a recognition model is then built to extract the named entities in the question, and finally, based on the results of these two steps, the natural-language question is converted into an SQL statement using the corresponding rules.
The Web question-answering retrieval system based on product information structuring of the present invention has the following advantages:
1) The invention adapts well to changing product information on the Internet. The periodic information update and collection technique proposed in this system promptly captures changes in product information on the Internet, so the latest online product information can be obtained in real time. Through the information extraction and integration modules, the structured product data already in the database can be updated promptly, or new structured product data added, so that the system adapts to changes in online product information.
2) The product information collected by the invention is more complete and more authentic. The invention collects product information from multiple information sources and, through the information extraction and integration modules, integrates the information that different websites publish about the same product: contradictory information is adjudicated and missing information is complemented across sources, which ensures the completeness and authenticity of the information.
3) The invention has high retrieval efficiency. Unlike a traditional information retrieval system, which returns web pages related to keywords, the invention, in addition to providing related web page information, passes the user's natural-language question through the question processing module, where a series of steps such as question classification and named entity recognition convert it into a structured SQL statement; the resulting SQL statement is then used to query the database and return the most concise result to the user.
The invention is a convenient and practical Web question-answering retrieval system based on product information structuring. It is an advanced form of information retrieval that can answer questions posed by users in natural language with accurate, concise language.
Brief description of the drawings
Fig. 1 is the architecture diagram of the Web question answering system of the present invention;
Fig. 2 is a schematic diagram of the implementation of the inverted index building module of the present invention;
Fig. 3 is a schematic diagram of the implementation of the data integration module of the present invention;
Fig. 4 is a schematic diagram of the implementation of the question processing module of the present invention;
Fig. 5 is a schematic diagram of the implementation of question classification in the question processing module of the present invention;
Fig. 6 is a schematic diagram of the implementation of named entity recognition in the question processing module of the present invention;
Fig. 7 is the graph structure of the Linear-CRF model, using the named entity task as an example.
Detailed description of the invention
Embodiment:
To make the object, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to a specific embodiment and the accompanying drawings.
Fig. 1 shows the architecture of the Web question answering system based on product information structuring of the present invention.
With reference to Fig. 1, the Web question answering system of the present invention comprises a user interface, a question processing module, a database interface, a structured product information database, an information integration module, an information extraction module, a product information crawling module, and an inverted index building module.
The user interface handles all communication between the Web question answering system and the user: it obtains the product-related natural-language question entered by the user and passes it to the question processing module, and it returns the corresponding search results and related web pages to the user.
The product information crawling module crawls, at fixed time intervals, the web pages on large digital product websites such as pconline and Bubble that present detailed information about digital products such as mobile phones and computers, stores the crawled pages, and passes them to the information extraction module for processing.
The information extraction module processes the unstructured web page information crawled by the product information crawling module, for example a mobile phone's clock frequency and screen size. It converts this unstructured information into structured information, connects to the structured product information database through the database interface, and stores the processed structured information in the database.
The inverted index building module extracts key content from the web pages crawled by the product information crawling module and builds an inverted index over those pages.
The database interface provides a unified interface for database operations such as accessing and updating structured product data, together with access permission control.
The information integration module integrates the structured information from multiple data sources output by the information extraction module, connects to the database through the database interface, and saves the integrated structured data in the database. The invention first derives an attribute mapping table from information such as the attribute values in the two tables to be processed, mapping attribute names in the two tables that have the same meaning but may be named differently, to facilitate the next integration step. It then creates a target table from the mapping and reorders the column names of the two tables to match. Whether records in the two tables are comparable is decided by the primary key value that uniquely identifies a record: records whose key values are equal are considered comparable. Comparable records are merged or de-duplicated, the result is inserted into the target table, and the corresponding records in the original tables are marked. Finally, the unmarked records are inserted into the target table one by one, yielding an integrated target table. If there are more than two tables, two tables are processed at a time and the method is repeated until the final result is obtained.
The question processing module converts the natural-language question entered by the user into a structured SQL statement. It connects to the user through the user interface to obtain the question, connects to the structured product information database through the database interface, queries the database with the resulting SQL statement, and returns the query result to the user through the user interface. The invention converts the question in two steps: it first classifies the question with a trained Naive Bayes classifier, and then uses a skip-chain CRF model to recognize and extract the named entities in the question, such as mobile phone names and mobile phone attributes. The skip-chain CRF model is developed from the Linear CRF (linear-chain conditional random field) model and is one kind of CRF (conditional random field) model. Conventional named entity recognition methods usually ignore the role that conjunctions such as "and" and "or" play in a sentence; the skip-chain CRF model establishes a link between the words immediately before and after a conjunction, which helps improve the final precision.
The present invention uses an algorithm based on similarity computation to integrate information about the same product from different sources. The product information crawling module of this system crawls multiple digital product websites so that the collected product information is as complete and rich as possible. However, different websites may use different names for the same attribute, or give different attribute values, so the information about the same product from different sources may be redundant or contradictory. The similarity-based algorithm adopted by the invention can effectively integrate this redundant or contradictory information, which both keeps the data complete and gives the data higher correctness.
The present invention uses question classification and named entity recognition to convert a natural-language question into a structured SQL statement. Classifying the question allows finer-grained processing: applying different conversion rules to different classes of question improves the system's ability to understand natural-language questions. Recognizing the named entities in a natural-language question essentially means identifying the subject or object of the question; only when the subject and object are correctly understood can the question be converted using the specific conversion rules. The question conversion algorithm used by the invention can convert natural-language questions of many classes and achieves high accuracy.
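The conversion described above can be sketched as follows. This is a minimal illustration, assuming the classifier has already produced a question class and the entity recognizer has already produced product names and attribute names. The class labels (`attribute_lookup`, `comparison`), the table name `products`, and the column names are hypothetical and do not come from the patent.

```python
# Hypothetical schema: table "products" with a "name" column plus these attributes.
ALLOWED_COLUMNS = {"screen_size", "cpu_frequency", "price"}


def question_to_sql(question_class, product_names, attributes):
    """Build a parameterized SQL query from a question class and recognized entities."""
    if not product_names:
        raise ValueError("at least one product name is required")
    # Keep only attributes that map to known columns (guards against SQL injection).
    columns = [a for a in attributes if a in ALLOWED_COLUMNS]
    if question_class == "attribute_lookup":
        if not columns:
            raise ValueError("no known attribute recognized in the question")
        select_list = ", ".join(["name"] + columns)
    elif question_class == "comparison":
        # A comparison question returns the full rows of the compared products.
        select_list = "*"
    else:
        raise ValueError(f"unsupported question class: {question_class}")
    placeholders = ", ".join("?" for _ in product_names)
    sql = f"SELECT {select_list} FROM products WHERE name IN ({placeholders})"
    return sql, list(product_names)
```

Product names are passed as query parameters rather than being spliced into the SQL text, and each question class selects its own conversion rule, which mirrors the per-class rules the text describes.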
In summary, the main modules of the system are the question processing module, the data integration module, and the inverted index building module. These three modules are described in further detail below with reference to the drawings.
Fig. 2 is a schematic diagram of the implementation of the inverted index building module. With reference to Fig. 2, this module extracts key content from the web pages crawled by the product information crawling module, builds an inverted index over those pages, and stores it. Index construction can be divided into three parts:
1) Preprocessing. Htmlparser is used to extract the key content of each web page and remove the noise in the page, which improves the accuracy of later retrieval. The extracted data is used to construct Lucene Document objects and the corresponding Field objects.
2) Analysis. The data is passed to Lucene for indexing by calling the addDocument(Document) method of the index writer (IndexWriter). When indexing, Lucene first analyzes the data to make it more suitable for indexing.
3) Writing the index. After the input data has been analyzed, the result is written to the index file, so that the input data is stored in the inverted index data structure.
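The three stages above can be illustrated with a plain-Python sketch of the inverted index data structure. It does not use Lucene or Htmlparser: the standard-library `html.parser` stands in for content extraction, a simple lowercase tokenizer stands in for Lucene's analysis, and all function names are illustrative assumptions.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping script and style blocks (a crude noise filter)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Stage 1: preprocessing, keep only the text content of the page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def analyze(text):
    """Stage 2: analysis, lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Stage 3: map each term to the sorted list of page ids that contain it."""
    index = defaultdict(set)
    for doc_id, html in pages.items():
        for term in analyze(extract_text(html)):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}
```

A query term is then answered by a single dictionary lookup, for example `build_inverted_index(pages).get("nokia", [])`.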
Fig. 3 is a schematic diagram of the implementation of the data integration module. With reference to Fig. 3, this module integrates the structured information from multiple data sources output by the information extraction module and stores the integrated structured data in the database. The module can be divided into two submodules:
1) An attribute mapping table is derived from information such as the attribute values in the two tables to be processed, mapping attribute names in the two tables that have the same meaning but may be named differently, to facilitate the next integration step.
2) A target table is created from the mapping obtained in step 1, and the column names of the two tables are reordered to match. Whether records in the two tables are comparable is decided by the primary key value that uniquely identifies a record: records whose key values are equal are considered comparable. Comparable records are merged or de-duplicated, the result is inserted into the target table, and the corresponding records in the original tables are marked. Finally, the unmarked records are inserted into the target table one by one, yielding an integrated target table.
If there are more than two tables, two tables are processed at a time, and steps 1 and 2 are repeated until the final result is obtained.
The detailed steps for building the mapping table and integrating the information are as follows.
1) Obtaining the mapping table:
1. Obtain the attribute values of the two tables and store them in result1 and result2 respectively, for example
$\text{result1} = \text{List}\langle a_1, a_2, a_3, \dots, a_m \rangle$, $a_i = \langle a_{i1}, a_{i2}, a_{i3}, \dots, a_{in} \rangle$, $i = 1, 2, 3, \dots, m$,
where $m$ is the number of attribute columns in the first table and $n$ is the number of rows in those columns. That is, the columns of the first table are stored in $a_1, a_2, a_3, \dots, a_m$. Likewise:
$\text{result2} = \text{List}\langle b_1, b_2, b_3, \dots, b_m \rangle$, $b_i = \langle b_{i1}, b_{i2}, b_{i3}, \dots, b_{in} \rangle$, $i = 1, 2, 3, \dots, m$.
2. Segment $a_1, a_2, a_3, \dots, a_m$ and $b_1, b_2, b_3, \dots, b_m$ with the Chinese Academy of Sciences word segmentation tool imdict-chinese-analyzer and store the results respectively in
$\text{result1SegmentFilter} = \text{List}\langle a'_1, a'_2, a'_3, \dots, a'_m \rangle$, $a'_i = \langle a'_{i1}, a'_{i2}, \dots, a'_{ik} \rangle$,
$\text{result2SegmentFilter} = \text{List}\langle b'_1, b'_2, b'_3, \dots, b'_m \rangle$, $b'_i = \langle b'_{i1}, b'_{i2}, \dots, b'_{ik} \rangle$.
3. Convert each of $a'_1, a'_2, a'_3, \dots, a'_m$ and each of $b'_1, b'_2, b'_3, \dots, b'_m$ to a set, that is, remove repeated values, and store the results in
$\text{result1Set} = \text{List}\langle a''_1, a''_2, a''_3, \dots, a''_m \rangle$, $a''_i = \langle a''_{i1}, a''_{i2}, \dots, a''_{iL_i} \rangle$, where $L_i$ is the number of words in $a''_i$,
$\text{result2Set} = \text{List}\langle b''_1, b''_2, b''_3, \dots, b''_m \rangle$, $b''_i = \langle b''_{i1}, b''_{i2}, \dots, b''_{iL'_i} \rangle$, where $L'_i$ is the number of words in $b''_i$.
4. Compute the similarity of every pair of elements $a''_i$ and $b''_j$ from result1Set and result2Set:
A) If the numbers of words in $a''_i$ and $b''_j$ do not differ much, compute the similarity of $a''_i$ and $b''_j$ directly. Similarity formula: [formula not preserved in this text], where the function same counts the number of words that $a''_i$ and $b''_j$ have in common. The result is stored in $M(i, j)$. For each $i$, find the value of $j$ that maximizes $M(i, j)$; this $j$ is the number of the column in the second table that most likely corresponds to column $i$ of the first table. If $M(i, j)$ exceeds a given threshold, columns $i$ and $j$ are considered to correspond, and the pair is output to the mapping table.
B) If the numbers of words in $a''_i$ and $b''_j$ differ greatly, the one with more words must be preprocessed: count the word frequencies, sort the words by frequency from high to low, and truncate the list at a suitable position to obtain new $a''_i$ and $b''_j$. Then go to step A.
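The mapping step can be sketched as follows. Because the similarity formula itself is not preserved in this text, the sketch assumes an overlap measure (number of shared words divided by the size of the smaller set); that measure, the whitespace tokenizer used in place of imdict-chinese-analyzer, the default threshold, and all function names are assumptions for illustration only. The frequency-based truncation of step B is omitted.

```python
def column_tokens(values):
    """Tokenize every cell of a column and return the set of distinct tokens."""
    tokens = set()
    for value in values:
        tokens.update(str(value).lower().split())
    return tokens


def similarity(a, b):
    """Assumed measure: number of shared tokens divided by the smaller set size."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def map_columns(table1, table2, threshold=0.5):
    """table1 and table2 map column names to lists of cell values.

    Returns a dict from each column name of table1 to its best-matching
    column name in table2, keeping only matches above the threshold.
    """
    sets2 = {name: column_tokens(values) for name, values in table2.items()}
    mapping = {}
    for name1, values1 in table1.items():
        set1 = column_tokens(values1)
        best_name, best_score = None, 0.0
        for name2, set2 in sets2.items():
            score = similarity(set1, set2)
            if score > best_score:
                best_name, best_score = name2, score
        if best_name is not None and best_score > threshold:
            mapping[name1] = best_name
    return mapping
```

The inner loop plays the role of finding, for each $i$, the $j$ that maximizes $M(i, j)$, and the final comparison plays the role of the threshold test.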
2) Integrating the information in the two tables:
1. Using the primary key of the two tables (the attribute value that uniquely identifies a record), integrate the records whose primary keys are equal, for example by removing redundancy, completing missing information, and resolving conflicts, and mark the processed records in each of the two tables. Insert the processed records into the target table.
2. After the loop over the key values of the first table has finished, find the records in the two tables that have not been marked and insert them into the target table. This completes the integration of the two tables. To integrate more than two tables, simply treat the integrated table obtained by the above method as an ordinary table to be integrated.
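The merge step can be sketched as follows, assuming the column names of the two tables have already been aligned by the mapping step. The patent does not say how contradictory values are adjudicated, so this sketch keeps the value from the first table and reports the conflict; that policy and the function name are assumptions.

```python
def integrate(table1, table2, key):
    """Merge two lists of dict records that share the primary-key column `key`.

    Returns (target, conflicts): the integrated table and a list of
    (key value, column, value in table1, value in table2) tuples.
    """
    target, conflicts = [], []
    index2 = {record[key]: record for record in table2}
    marked = set()  # key values of table2 records that have been processed
    for record1 in table1:
        record2 = index2.get(record1[key])
        if record2 is None:
            target.append(dict(record1))  # no counterpart: copy as-is
            continue
        marked.add(record1[key])
        merged = dict(record1)
        for column, value in record2.items():
            current = merged.get(column)
            if current in (None, ""):
                merged[column] = value  # complement a missing value
            elif value not in (None, "") and value != current:
                # Assumed policy: keep table1's value and report the conflict.
                conflicts.append((record1[key], column, current, value))
        target.append(merged)
    # Insert the table2 records that were never matched.
    for record2 in table2:
        if record2[key] not in marked:
            target.append(dict(record2))
    return target, conflicts
```

To integrate more than two tables, the returned target can be passed back in as `table1` together with the next table.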
Fig. 4 is a schematic diagram of the implementation of the question processing module. With reference to Fig. 4, this module converts the natural-language question entered by the user into a structured SQL statement. In this module the conversion proceeds in three steps: text preprocessing, question classification, and named entity recognition. The text preprocessing step mainly performs word segmentation and part-of-speech tagging of the question. Question classification and named entity recognition are described in detail here.
Fig. 5 is a schematic diagram of the implementation of question classification in the present invention. With reference to Fig. 5, the invention uses the Naive Bayes algorithm to classify natural-language questions and selects the final class according to the maximum-likelihood criterion. Assume the set of classes is $C = \{C_1, C_2, \dots, C_n\}$ and the segmented input question is $X = \{x_1, x_2, \dots, x_m\}$, where $x_i$ is a word in the question. Using the training data, the probability that the question belongs to each class is computed as:
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}, \qquad P(X \mid C_i) = \prod_{j=1}^{m} P(x_j \mid C_i)$$
$P(X)$ is fixed for a given natural-language question, so, following the maximum-likelihood criterion, only $P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \dots \times P(x_m \mid C_i)$ needs to be computed, and the class with the largest value is selected as the final class.
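The classification rule can be sketched as follows. Following the text, the sketch scores each class by the product of the word likelihoods only (computed as a sum of logarithms to avoid underflow). The add-one smoothing for unseen words and the function names are assumptions that the text does not specify.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (tokens, label) pairs. Returns per-class word counts and the vocabulary."""
    counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        counts[label].update(tokens)
        vocab.update(tokens)
    return counts, vocab


def classify(tokens, counts, vocab):
    """Return the class C_i that maximizes sum_j log P(x_j | C_i)."""
    best_label, best_score = None, -math.inf
    for label, counter in counts.items():
        total = sum(counter.values())
        # Add-one smoothing (an assumption) keeps unseen words from zeroing the product.
        score = sum(
            math.log((counter[t] + 1) / (total + len(vocab))) for t in tokens
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In practice the training samples would be segmented questions labeled with their question class; the class labels themselves are not listed in the text.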
Fig. 6 is a schematic diagram of the implementation of named entity recognition in the present invention. With reference to Fig. 6, the invention uses a CRF model with a skip-chain structure to recognize the named entities in a question. This model is the key to the natural-language question conversion performed by the invention, so the structure, principle, and advantages of the skip-chain CRF model are described in detail below.
By observing and analyzing a large number of questions about product information, we found that many questions contain two or more named entities at once, for example "Nokia5230 and Nokia N8, which is better?" or "Nokia 5230 and Nokia 5233, which is better?". In such questions the entity names are usually joined by a conjunction such as "and" or "or". As a result, if the word before the conjunction is judged to be an entity name, the word after the conjunction is very likely to be an entity name of the same kind. This phenomenon has been mentioned in previous work, but no good solution has been proposed. By constructing a CRF model with skip-chains, the invention links the words before and after a conjunction, so that this information is taken into account during labeling, which helps improve named entity recognition accuracy. In Fig. 6, the label T1 denotes the front part of an entity word, T2 denotes the rear part of an entity word, and O denotes any other word.
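As a small illustration of where the skip-chains are placed, the sketch below returns the index pairs of the words immediately before and after each conjunction. The English conjunction list and the function name are assumptions standing in for the conjunctions of the original questions.

```python
CONJUNCTIONS = {"and", "or"}  # assumed English stand-ins for the source conjunctions


def skip_edges(tokens):
    """Return the pairs (j, j + 2) such that token j + 1 is a conjunction."""
    return [
        (j, j + 2)
        for j in range(len(tokens) - 2)
        if tokens[j + 1].lower() in CONJUNCTIONS
    ]
```

For the tokens `["Nokia5230", "and", "NokiaN8", "which", "is", "better"]` this yields the single edge `(0, 2)`, which connects the labels of the two entity names.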
Skip-chain CRF is a special kind of CRF model. A CRF is an undirected graphical model that, given a feature set, models the conditional probability distribution of a label sequence. Taking the most basic Linear-CRF as an example, the conditional probability of a label sequence $Y = (y_1, \dots, y_I)$ given an observation sequence $X$ can be written formally as:
$$P(Y \mid X) = \frac{1}{Z(X)} \prod_{i=1}^{I} \psi_i(y_i, y_{i-1}, X)$$
where $\psi_i$ is a potential function in the sense of undirected graphical models, and
$$Z(X) = \sum_{Y'} \prod_{i=1}^{I} \psi_i(y'_i, y'_{i-1}, X)$$
is the normalization factor over all possible label sequences of length $I$. The potential function $\psi_i$ can be decomposed into the following form, where $f_k$ are the defined feature functions:
$$\psi_i(y_i, y_{i-1}, X) = \exp\Big(\sum_k \lambda_k f_k(y_i, y_{i-1}, X, i)\Big)$$
The corresponding graphical model structure is shown in Fig. 7. Taking the named entity recognition task as an example, the preprocessed text is the input and the corresponding Linear-CRF model is built. Linear-CRF models the conditional probability of the label sequence directly. Unlike directed graphical models such as the HMM (hidden Markov model), it can introduce rich features without making independence assumptions between features. It can also be regarded as a globally normalized MEMM (maximum entropy Markov model), which avoids the label bias problem of the MEMM. Linear-CRF therefore achieves good results on sequence labeling problems such as named entity recognition.
Skip-chain CRF is a CRF model that improves on Linear-CRF. As shown in the graphical model of the skip-chain CRF in Fig. 6, in addition to the linear-chain edges between adjacent nodes of the Linear-CRF, its structure introduces skip-chain edges between the two words before and after a conjunction, thereby adding to the Linear-CRF the dependency between the labels of the words on either side of the conjunction.
The formal description of the skip-chain CRF is as follows:
where Ψi is the potential function defined on adjacent label nodes, φj,j+2 is the potential function defined on a skip-chain, and S = {(j, j+2)} is the set of all skip-chains. They are defined as follows:
where fk(yi, yi-1, X, i) is the feature function defined on the linear-chain and fl(yj, yj+2, X, j, j+2) is the feature function defined on the skip-chain.
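The formulas referred to in this passage do not appear in the text. Under the usual skip-chain CRF formulation they can be sketched as below. The normalization symbol Z(X) and the exact argument lists of the potentials are assumptions, while Ψi, φj,j+2, S, fk, fl, λk, and ηl are the symbols named in the description.

```latex
P(Y \mid X) = \frac{1}{Z(X)}
  \prod_{i=1}^{I} \Psi_i(y_i, y_{i-1}, X)
  \prod_{(j,\, j+2) \in S} \varphi_{j,j+2}(y_j, y_{j+2}, X)

\Psi_i(y_i, y_{i-1}, X) = \exp\Big( \sum_{k} \lambda_k \, f_k(y_i, y_{i-1}, X, i) \Big)

\varphi_{j,j+2}(y_j, y_{j+2}, X) = \exp\Big( \sum_{l} \eta_l \, f_l(y_j, y_{j+2}, X, j, j+2) \Big)
```

Here Z(X) would sum the product of all potentials over every possible label sequence, so that the probabilities of all label sequences add up to one.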
During model training, the present invention uses the L-BFGS algorithm to train the unrolled skip-chain CRF model and learn the model parameters λk and ηl.
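L-BFGS training maximizes the conditional likelihood that these parameters define. The following is a minimal sketch of that quantity for a single question, computed by brute-force enumeration of all label sequences rather than by any efficient inference procedure. The feature templates (one emission-style and one transition-style feature on the linear chain, one label-pair feature on each skip-chain), the `START` pseudo-label, the weight values, and all function names are assumptions made for illustration; the training loop itself is not shown.

```python
from itertools import product
from math import exp

LABELS = ("T1", "T2", "O")  # label set used in Fig. 6

def log_score(labels, tokens, skip_edges, lam, eta):
    """Sum of weighted linear-chain and skip-chain features (log of the potential product)."""
    score = 0.0
    prev = "START"  # assumed pseudo-label before the first token
    for i, cur in enumerate(labels):
        # Linear-chain part: sum_k lambda_k * f_k(y_i, y_{i-1}, X, i)
        score += lam.get(("emit", tokens[i], cur), 0.0)
        score += lam.get(("trans", prev, cur), 0.0)
        prev = cur
    for j, k in skip_edges:
        # Skip-chain part: sum_l eta_l * f_l(y_j, y_{j+2}, X, j, j+2)
        score += eta.get((labels[j], labels[k]), 0.0)
    return score

def conditional_probability(labels, tokens, skip_edges, lam, eta):
    """P(Y | X), normalized by enumerating every label sequence (exponential cost)."""
    z = sum(exp(log_score(y, tokens, skip_edges, lam, eta))
            for y in product(LABELS, repeat=len(tokens)))
    return exp(log_score(labels, tokens, skip_edges, lam, eta)) / z

tokens = ["Nokia5230", "and", "NokiaN8"]
skip_edges = [(0, 2)]                 # the conjunction sits at index 1
lam = {("emit", "and", "O"): 1.0}     # made-up weights
eta = {("T1", "T1"): 2.0}
p = conditional_probability(("T1", "O", "T1"), tokens, skip_edges, lam, eta)
```

A trainer would sum the logarithm of this probability over the labeled training questions and pass the negated sum and its gradient to an L-BFGS optimizer; enumeration is only practical for very short inputs.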
The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the present invention in detail. It should be understood that the foregoing are only specific embodiments of the present invention and do not limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (9)
1. A Web question and answer searching system based on product information structure, characterized by comprising a user interface, a product information crawling module, an information extraction module, an inverted index building module, a database interface, an information integration module, a question processing module, and a database, wherein:
the user interface handles all communication between the Web question answering system and the user, including obtaining the product-related natural language question entered by the user and passing it to the question processing module, and returning the corresponding search results and related web pages to the user;
the product information crawling module crawls web pages at set time intervals, stores the crawled pages, and passes them to the information extraction module for processing;
the information extraction module processes the unstructured page information in the web pages crawled by the product information crawling module, converts this unstructured information into structured information, connects to the structured product information database through the database interface, and stores the processed structured information in the database;
the inverted index building module extracts key content from the web pages crawled by the product information crawling module and builds an inverted index over these pages;
the database interface provides a unified interface for accessing and updating the structured product data, together with access permission control;
the information integration module integrates the structured information from multiple data sources output by the information extraction module, connects to the database through the database interface, and saves the integrated structured data in the database;
the question processing module converts the natural language question entered by the user into a structured statement; it connects to the user through the user interface to obtain the natural language question, connects to the database through the database interface, queries the database with the converted statement, and returns the query result to the user through the user interface;
the information integration module first obtains an attribute mapping table from the attribute value information in the two tables to be processed, mapping attribute names that have the same meaning but possibly different names in the two tables, to facilitate the subsequent integration; it then creates a target table according to the mapping information and rearranges the column names of the two tables in order; it determines whether corresponding records in the two tables are comparable according to the primary key value that uniquely identifies a record, the records being considered comparable if the key values are equal; if they are comparable, the information in the two records is merged or de-duplicated, the result is inserted into the target table, and the corresponding records in the original tables are marked; finally, the unmarked records are inserted into the target table one by one, yielding an integrated target table; if there are more than two tables, two tables are processed at a time and the above method is repeated to obtain the final result.
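The integration procedure in claim 1 can be illustrated with a small sketch. It assumes that each table is a list of dictionaries, that the attribute mapping table is supplied by the caller rather than derived from attribute values, and that on a conflict the value from the first table is kept; none of these choices comes from the patent, and the name `integrate` is hypothetical.

```python
def integrate(table_a, table_b, attr_map, key):
    """Merge two record lists into one target table.

    attr_map maps column names of table_b to the equivalent names in table_a.
    key is the (already mapped) primary-key column that identifies a record.
    """
    # Mapping step: give table_b the same column names as table_a.
    renamed_b = [{attr_map.get(col, col): val for col, val in row.items()}
                 for row in table_b]
    index_b = {row[key]: i for i, row in enumerate(renamed_b)}

    target = []
    marked_b = set()  # positions of table_b records that were merged
    for row_a in table_a:
        i = index_b.get(row_a[key])
        if i is None:
            target.append(dict(row_a))  # no comparable record in table_b
            continue
        marked_b.add(i)
        merged = dict(renamed_b[i])
        # Assumed conflict rule: table_a wins, table_b only fills missing values.
        merged.update({col: val for col, val in row_a.items() if val is not None})
        target.append(merged)

    # Finally insert the unmarked table_b records one by one.
    target.extend(row for i, row in enumerate(renamed_b) if i not in marked_b)
    return target

# Made-up example records.
site_a = [{"model": "N8", "price": 2000, "weight": None}]
site_b = [{"name": "N8", "cost": 2100, "weight": "135g"},
          {"name": "5230", "cost": 900, "weight": "113g"}]
merged = integrate(site_a, site_b, {"name": "model", "cost": "price"}, "model")
```

With more than two sources, the same function could be applied pairwise, for example with `functools.reduce`, which mirrors the "two tables at a time" wording of the claim.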
2. The Web question and answer searching system based on product information structure according to claim 1, characterized in that the question processing module converts the natural language question in two steps: it first classifies the natural language question with a trained Naive Bayes classifier, and then uses a skip-chain CRF model to recognize and extract the named entities in the natural language question.
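The first of the two steps, question classification, can be sketched with a small multinomial Naive Bayes classifier using add-one smoothing. The question categories ("comparison", "attribute"), the training sentences, and the class name are made up for illustration; the patent does not specify the category set or the features used.

```python
from collections import Counter, defaultdict
from math import log

class NaiveBayesQuestionClassifier:
    """Multinomial Naive Bayes over question tokens with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        for tokens, label in samples:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # Smoothed denominator: words seen in this class plus vocabulary size.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = log(count / total)
            for tok in tokens:
                score += log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up training questions with hypothetical category labels.
training = [
    ("which is better a or b".split(), "comparison"),
    ("which is cheaper a or b".split(), "comparison"),
    ("what is the price of a".split(), "attribute"),
    ("what is the weight of a".split(), "attribute"),
]
classifier = NaiveBayesQuestionClassifier().fit(training)
print(classifier.predict("which is better n8 or 5230".split()))  # prints comparison
```

The predicted category would then decide which query template the recognized entities are filled into.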
3. The Web question and answer searching system based on product information structure according to claim 2, characterized in that the named entities are mobile phone names and mobile phone attributes.
4. The Web question and answer searching system based on product information structure according to claim 2, characterized in that the skip-chain CRF model is developed on the basis of the linear-chain conditional random field model and is a kind of conditional random field model.
5. The Web question and answer searching system based on product information structure according to claim 2, characterized in that, in the named entity recognition method, the role of the conjunctions "and" and "or" in the sentence is ignored and a connection is established in the skip-chain CRF model between the two words before and after the conjunction, which helps improve the final precision; for the recognition model used to extract named entities from query questions, the skip-chain CRF model is trained on a training set to obtain the criteria for named entity recognition of product information, and the question is then converted into keywords and product attributes that are meaningful for retrieval.
6. The Web question and answer searching system based on product information structure according to claim 1, characterized in that the product information crawling module crawls, at set time intervals, the web pages that present digital product details on large digital product websites such as pconline and Bubble, stores the crawled pages, and passes them to the information extraction module for processing.
7. The Web question and answer searching system based on product information structure according to claim 1, characterized in that the question processing module converts the natural language question entered by the user into a structured SQL statement; the module connects to the user through the user interface to obtain the natural language question, connects to the structured product information database through the database interface, queries the database with the converted SQL statement, and returns the query result of the SQL statement to the user through the user interface.
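The conversion to SQL can be sketched as filling a query template with the recognized entities. The table name `products`, its columns, the attribute whitelist, the sample rows, and the function `build_query` are all hypothetical; the patent does not disclose its schema or conversion rules. The sketch uses Python's standard `sqlite3` module and passes product names as bound parameters.

```python
import sqlite3

ALLOWED_ATTRIBUTES = {"price", "weight"}  # hypothetical column whitelist

def build_query(attribute, product_names):
    """Turn a recognized attribute and product names into a parameterized SQL query."""
    if attribute not in ALLOWED_ATTRIBUTES:
        raise ValueError(f"unknown attribute: {attribute}")
    placeholders = ", ".join("?" for _ in product_names)
    sql = (f"SELECT name, {attribute} FROM products "
           f"WHERE name IN ({placeholders}) ORDER BY name")
    return sql, list(product_names)

# In-memory database with made-up rows, standing in for the product database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT PRIMARY KEY, price INTEGER, weight TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [("Nokia 5230", 900, "113g"), ("Nokia N8", 2000, "135g")])

# Entities as they might come out of the recognition step for a comparison question.
sql, params = build_query("price", ["Nokia 5230", "Nokia N8"])
rows = conn.execute(sql, params).fetchall()
```

Column names cannot be bound as parameters, which is why the attribute is checked against a whitelist before it is placed into the statement.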
8. The Web question and answer searching system based on product information structure according to claim 1, characterized in that, for the analysis system for unstructured and semi-structured product information, information about a product from multiple sources is integrated to ensure that the information is true and complete; at the same time, a classification algorithm and a named entity recognition algorithm are used to convert the natural language question into a structured database query statement; for the fine-grained sentiment analysis system for product review information, an algorithm based on instance similarity is used to integrate information about the same product from different sources.
9. The Web question and answer searching system based on product information structure according to claim 8, characterized in that the algorithm based on instance similarity integrates the information in two steps, mapping and merging: in the mapping step, the algorithm computes the similarity of corresponding elements of the two tables; in the merging step, the two tables are merged according to the result of the previous step; for the fine-grained sentiment analysis system for product review information, the question is first classified, a recognition model is then built to extract the named entities in the question, and finally the natural language question is converted into an SQL statement using the corresponding rules according to the results of the first two steps.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310354888.4A CN103440287B (en) | 2013-08-14 | 2013-08-14 | A kind of Web question and answer searching system based on product information structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103440287A (en) | 2013-12-11 |
CN103440287B (en) | 2016-12-28 |
Family
ID=49693979
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310354888.4A Active CN103440287B (en) | 2013-08-14 | 2013-08-14 | A kind of Web question and answer searching system based on product information structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103440287B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104102738B (en) * | 2014-07-28 | 2018-04-27 | 百度在线网络技术(北京)有限公司 | A kind of method and device for expanding entity storehouse |
CN105302841A (en) * | 2014-07-31 | 2016-02-03 | 青岛海尔智能家电科技有限公司 | Information integration apparatus, server and method |
US9928269B2 (en) | 2015-01-03 | 2018-03-27 | International Business Machines Corporation | Apply corrections to an ingested corpus |
CN105045909B (en) * | 2015-08-11 | 2018-04-03 | 北京京东尚科信息技术有限公司 | The method and apparatus that trade name is identified from text |
CN105260590A (en) * | 2015-09-16 | 2016-01-20 | 西部天使(北京)健康科技有限公司 | Method and system for combining multiple follow-up visit plans |
CN106919563A (en) * | 2015-12-24 | 2017-07-04 | 神州数码信息***有限公司 | A kind of cross-border issue of government affairs machine question answering system is classified, distributes automatically, the method for response |
CN105786794B (en) * | 2016-02-05 | 2018-09-04 | 青岛理工大学 | A kind of question and answer are to search method and community's question and answer searching system |
CN107741939B (en) * | 2016-10-31 | 2020-05-12 | 腾讯科技(深圳)有限公司 | Webpage information identification method and device |
CN108182295B (en) * | 2018-02-09 | 2021-09-10 | 重庆电信***集成有限公司 | Enterprise knowledge graph attribute extraction method and system |
CN109002501A (en) * | 2018-06-29 | 2018-12-14 | 北京百度网讯科技有限公司 | For handling method, apparatus, electronic equipment and the computer readable storage medium of natural language dialogue |
CN109271459B (en) * | 2018-09-18 | 2021-12-21 | 四川长虹电器股份有限公司 | Chat robot based on Lucene and grammar network and implementation method thereof |
CN111914087B (en) * | 2020-07-30 | 2023-09-19 | 广州城市信息研究所有限公司 | Public opinion analysis method |
CN112507098B (en) * | 2020-12-18 | 2022-01-28 | 北京百度网讯科技有限公司 | Question processing method, question processing device, electronic equipment, storage medium and program product |
CN113987146B (en) * | 2021-10-22 | 2023-01-31 | 国网江苏省电力有限公司镇江供电分公司 | Dedicated intelligent question-answering system of electric power intranet |
CN117132392B (en) * | 2023-10-23 | 2024-01-30 | 蓝色火焰科技成都有限公司 | Vehicle loan fraud risk early warning method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101373532A (en) * | 2008-07-10 | 2009-02-25 | 昆明理工大学 | FAQ Chinese request-answering system implementing method in tourism field |
CN102262634A (en) * | 2010-05-24 | 2011-11-30 | 北京大学深圳研究生院 | Automatic questioning and answering method and system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006252382A (en) * | 2005-03-14 | 2006-09-21 | Fuji Xerox Co Ltd | Question answering system, data retrieval method and computer program |
US20090327234A1 (en) * | 2008-06-27 | 2009-12-31 | Google Inc. | Updating answers with references in forums |
Non-Patent Citations (3)
Title |
---|
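Web question and answer retrieval *** V1.0 based on structured product information; DMIR; 《http://dmirlab.com/Achievement.php?id=135》; 20110901; pp. 1-16 * |
Heterogeneous Web record integration method based on mixed skip-chain conditional random fields; Huang Jianbin et al.; 《Journal of Software》; 20080831; Vol. 19 (No. 8); abstract * |
Research on Chinese sentiment question analysis and comparative sentiment question answering methods; Huang Gaohui; 《China Master's Theses Full-text Database, Information Science and Technology》; 20101015; pp. 2150-2153 * |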
Also Published As
Publication number | Publication date |
---|---|
CN103440287A (en) | 2013-12-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20190425. Address after: 528000 No. 18 Jiangwan Road, Chancheng District, Foshan, Guangdong. Co-patentee after: Guangdong University of Technology. Patentee after: Foshan Science & Technology College. Co-patentee after: BEIMING SOFTWARE CO., LTD. Address before: 510006 Panyu District, Guangzhou, Guangdong, Panyu District, No. 100, West Ring Road, outside the city. Patentee before: Guangdong University of Technology |