CN108280218A - A pipeline system for hybrid retrieval-based and generative question answering - Google Patents

Info

Publication number
CN108280218A
Authority
CN
China
Prior art keywords
question
retrieval
model
answer
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810123117.7A
Other languages
Chinese (zh)
Inventor
王春辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yyi (beijing) Technology Co Ltd
Original Assignee
Yyi (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yyi (beijing) Technology Co Ltd
Priority to CN201810123117.7A
Publication of CN108280218A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/355: Class or cluster creation or modification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a pipeline system for hybrid retrieval-based and generative question answering, comprising a classifier, a candidate set, a database, and model selection. The beneficial effects of the invention are as follows. The classifier is aware of the context of a question and can therefore classify questions with high accuracy. It classifies questions by combining a deep-learning classification model with regular-expression recognition, so it can extract the important attributes of a question and call the corresponding API for a real-time query. During question matching, the candidate-set retrieval system builds an inverted index and performs both original-sentence retrieval and synonym-based query-expansion retrieval, so it can find the sentences most similar to the question, which addresses the problem of imprecise retrieval. The dialogue model generates replies with a seq2seq model that incorporates an attention mechanism, and a beam search mechanism is added at the decoder; the generated sentences are more logical and better structured, and the replies are more diverse.

Description

A pipeline system for hybrid retrieval-based and generative question answering
Technical field
The present invention relates to a pipeline system, specifically a pipeline system for hybrid retrieval-based and generative question answering, and belongs to the technical field of information retrieval and processing.
Background
In recent years, question-answering robots have attracted growing attention from technology companies and research institutions because of their wide range of application scenarios and considerable commercial value, and many outstanding products have appeared, such as Microsoft's XiaoIce, Apple's Siri, and Google's Google Assistant. Unlike traditional apps, these products do not require users to enter fixed commands (such as "submit" or "purchase"); users can communicate with the app in natural language.
Question answering has long been regarded as one of the hardest problems in artificial intelligence. In recent years, however, the emergence of question-and-answer communities and social networking sites has led to explosive growth in the amount of dialogue corpora, and advances in hardware have greatly increased computing power. Together these provide new opportunities for the development of question answering systems.
Question answering systems can be divided into vertical-domain and open-domain systems: open-domain systems are mainly chit-chat systems, while vertical-domain systems are mainly assistants. The current mainstream techniques for building dialogue robots fall into two kinds: retrieval models and generative models.
In a retrieval model, the system searches a database of question-answer pairs for the stored question that is semantically most similar to the user's question and returns the answer paired with it. This approach has two main problems. First, the number of question-answer pairs in the database is limited, so an answer to the user's question may not be found. Second, the question-answer pairs are fixed, so the returned answer may not correspond exactly to the question the user asked.
In a generative model, the dialogue system first understands the user's question and then generates the answer word by word. The mainstream approach at present is the Seq2Seq model from deep learning: an encoder encodes the question into a vector representation, and a decoder decodes that vector into a reply. The main problem with this model is that it tends to generate generic, dull replies (such as "I don't know" or "OK"), which carry little information and have no substantive meaning.
Summary of the invention
The purpose of the present invention is to solve the above problems by providing a pipeline system for hybrid retrieval-based and generative question answering.
The present invention achieves this purpose through the following technical solution: a pipeline system for hybrid retrieval-based and generative question answering, comprising:
A classifier, which classifies a query.
A candidate set: for a question that could not be classified, the retrieval system finds the stored questions closest to it, and the candidate set is the subset of candidate sentences screened out in this way.
A database, which stores question sentences so that the question semantically most similar to the user's question can be found.
Model selection, which calls the generation system to generate a corresponding answer and return a reply.
The classifier divides questions into six types: "weather", "news", "joke", "flight/high-speed rail", "nearby (geographic location)", and "other". The candidate set obtains a vector representation of each sentence using an autoencoder model based on a recurrent neural network, and uses BM25 scoring to compute the similarity between the question and the sentences in the database. Model selection uses a Seq2Seq-based model to build the generation system.
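The BM25 scoring mentioned above can be illustrated with a minimal sketch. The patent does not give its BM25 parameters or tokenization, so the values k1=1.5 and b=0.75, the smoothed IDF variant, the pre-tokenized English sentences, and the function name bm25_scores are all assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized sentence in corpus_tokens against query_tokens."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of sentences that contain each term.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            # Smoothed IDF; the "+ 1.0" keeps the value non-negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            tf = term_freq[term]
            # Length normalization: longer sentences are penalized slightly.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy stored questions (assumed data, already split into tokens).
stored_questions = [
    ["what", "is", "the", "weather", "today"],
    ["tell", "me", "a", "joke"],
    ["how", "is", "the", "weather", "in", "beijing"],
]
scores = bm25_scores(["weather", "beijing"], stored_questions)
best = max(range(len(scores)), key=scores.__getitem__)
```

Here the third stored question matches both query terms and receives the highest score, while the joke question shares no terms with the query and scores zero.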
The pipeline system for hybrid retrieval-based and generative question answering is built mainly through the following steps:
Step A: build the query classifier using two methods, a classification model based on a convolutional neural network (CNN) and regular expressions.
Step B: when building the retrieval system, use the in-memory key-value database Redis to build the inverted index and store the question-answer corpus; implement ordinary retrieval, expanded queries, and BM25 similarity scoring in Python; and train an autoencoder model with TensorFlow to handle semantic recognition of sentences during retrieval.
Step C: use the open-source framework TensorFlow to build the dialogue model for the dialogue generation system. TensorFlow is an artificial intelligence framework developed by Google that can be used in many deep learning fields, such as image processing and natural language processing.
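Step B mentions sentence vectors produced by an autoencoder, but the patent does not say how those vectors are compared. One common choice, assumed here, is cosine similarity. The sketch below takes the vectors as given (toy three-dimensional lists stand in for real autoencoder outputs), and the names cosine_similarity and rerank are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rerank(query_vec, candidates):
    """candidates: list of (sentence, vector) pairs; returns sentences, most similar first."""
    ranked = sorted(candidates,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [sentence for sentence, _ in ranked]

# Toy vectors standing in for autoencoder encodings (assumed data).
candidates = [
    ("tell me a joke", [0.9, 0.1, 0.0]),
    ("what is the weather today", [0.1, 0.9, 0.2]),
]
ranked = rerank([0.0, 1.0, 0.1], candidates)
```

A re-ranking step of this kind could complement term-based scoring, since two sentences with no words in common can still have nearby vectors.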
Preferably, so that questions are classified with high accuracy, the classifier is aware of the context of a question and combines the question with its context.
Preferably, so that the important attributes of a question can be extracted for real-time queries, the classifier classifies questions by combining a deep-learning classification model with regular-expression recognition.
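A minimal sketch of how regular-expression recognition and a learned classifier might be combined follows. The patent does not list its patterns or say which method runs first, so the patterns, the rule-first ordering, and the model_classify stub (which stands in for the CNN classifier) are all assumptions.

```python
import re

# Hypothetical patterns; the patent does not publish its regular expressions.
RULES = [
    ("weather",
     re.compile(r"weather in (?P<city>[A-Za-z ]+?)(?: today| tomorrow)?\??$", re.I)),
    ("flight/high-speed rail",
     re.compile(r"(?:flight|train) from (?P<origin>\w+) to (?P<destination>\w+)", re.I)),
]

def model_classify(question):
    """Stand-in for the CNN classification model: always answers 'other'."""
    return "other"

def classify(question):
    """Return (category, attributes); attributes come from named regex groups."""
    for category, pattern in RULES:
        match = pattern.search(question)
        if match:
            # The named groups are the attributes an API call would need.
            return category, match.groupdict()
    # No rule fired: defer to the learned model.
    return model_classify(question), {}

weather = classify("What is the weather in Beijing today?")
flight = classify("Book a flight from Beijing to Shanghai")
fallback = classify("hello there")
```

The named groups give the attribute extraction for free: a weather match yields the city, and a flight match yields the origin and destination that a real-time query would need.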
Preferably, to address the problem of imprecise retrieval, the candidate-set retrieval system builds an inverted index during question matching and performs both original-sentence retrieval and synonym-based query-expansion retrieval.
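The inverted index with synonym query expansion can be sketched as follows. A plain Python dictionary stands in for the Redis store named in step B, whitespace splitting stands in for a real tokenizer (Chinese text would need a word segmenter), and the synonym table is hypothetical.

```python
from collections import defaultdict

def build_inverted_index(sentences):
    """Map each token to the set of ids of the sentences that contain it."""
    index = defaultdict(set)
    for sent_id, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            index[token].add(sent_id)
    return index

def search(query, index, synonyms):
    """Return candidate sentence ids, those matching the most query terms first."""
    hits = defaultdict(int)
    for token in query.lower().split():
        # Query expansion: look up the term itself plus its synonyms.
        expanded = {token} | set(synonyms.get(token, ()))
        matched = set()
        for term in expanded:
            matched |= index.get(term, set())
        for sent_id in matched:
            hits[sent_id] += 1
    return sorted(hits, key=lambda sent_id: (-hits[sent_id], sent_id))

sentences = [
    "how do i buy a ticket",
    "tell me a joke",
    "where can i purchase a ticket",
]
index = build_inverted_index(sentences)
synonyms = {"buy": ["purchase"]}  # hypothetical synonym table
with_expansion = search("buy", index, synonyms)
without_expansion = search("buy", index, {})
```

Without expansion, the query "buy" reaches only the first sentence; with expansion it also reaches the sentence that says "purchase", which is the kind of miss the synonym retrieval is meant to avoid.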
Preferably, so that generated sentences are more logical and better structured, in step C the dialogue model generates replies with a seq2seq model that incorporates an attention mechanism, and a beam search mechanism is added at the decoder.
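Beam search at the decoder can be illustrated independently of any neural network. In the sketch below, a small lookup table of next-token probabilities stands in for a trained seq2seq decoder; the table, the token names, and the beam_search signature are assumptions made for illustration, not the patent's implementation.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=2, max_len=10):
    """step_fn(prefix) returns {next_token: probability} for the given prefix."""
    beams = [([start_token], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        expanded = []
        for tokens, score in beams:
            for token, prob in step_fn(tokens).items():
                expanded.append((tokens + [token], score + math.log(prob)))
        # Keep only the beam_width most probable partial sequences.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for tokens, score in expanded[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # sequences cut off at max_len
    return sorted(finished, key=lambda item: item[1], reverse=True)

# Toy next-token table standing in for a trained decoder (assumed data).
TABLE = {
    "<s>": {"it": 0.6, "sunny": 0.4},
    "it": {"is": 0.5, "rains": 0.5},
    "sunny": {"today": 0.9, "</s>": 0.1},
    "is": {"sunny": 1.0},
    "rains": {"</s>": 1.0},
    "today": {"</s>": 1.0},
}

def toy_decoder(prefix):
    # This toy model conditions only on the last token of the prefix.
    return TABLE[prefix[-1]]

results = beam_search(toy_decoder, "<s>", "</s>", beam_width=2)
best_tokens, best_score = results[0]
```

In this toy table, greedy decoding would start with "it" (probability 0.6) and could not finish with a sequence probability above 0.3, whereas a beam of width 2 keeps "sunny" alive and finds "sunny today" with probability 0.36.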
The beneficial effects of the invention are as follows. The pipeline system for hybrid retrieval-based and generative question answering is reasonably designed. The classifier is aware of the context of a question and combines the question with its context, so it can classify questions with high accuracy. The classifier classifies questions by combining a deep-learning classification model with regular-expression recognition, so it can extract the important attributes of a question and call the corresponding API for a real-time query, which gives the system strong real-time performance. During question matching, the candidate-set retrieval system builds an inverted index and performs both original-sentence retrieval and synonym-based query-expansion retrieval, so it can find the sentences most similar to the question, which addresses the problem of imprecise retrieval. In step C, the dialogue model generates replies with a seq2seq model that incorporates an attention mechanism, and a beam search mechanism is added at the decoder. Sentences generated by the Seq2Seq model with attention and beam search are more logical and better structured, and the replies are more diverse.
Description of the drawings
Fig. 1 is a schematic structural diagram of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, a pipeline system for hybrid retrieval-based and generative question answering comprises:
A classifier, which classifies a query.
A candidate set: for a question that could not be classified, the retrieval system finds the stored questions closest to it, and the candidate set is the subset of candidate sentences screened out in this way.
A database, which stores question sentences so that the question semantically most similar to the user's question can be found.
Model selection, which calls the generation system to generate a corresponding answer and return a reply.
The classifier divides questions into six types: "weather", "news", "joke", "flight/high-speed rail", "nearby (geographic location)", and "other". The candidate set obtains a vector representation of each sentence using an autoencoder model based on a recurrent neural network, and uses BM25 scoring to compute the similarity between the question and the sentences in the database. Model selection uses a Seq2Seq-based model to build the generation system.
The pipeline system for hybrid retrieval-based and generative question answering is built mainly through the following steps:
Step A: build the query classifier using two methods, a classification model based on a convolutional neural network (CNN) and regular expressions.
Step B: when building the retrieval system, use the in-memory key-value database Redis to build the inverted index and store the question-answer corpus; implement ordinary retrieval, expanded queries, and BM25 similarity scoring in Python; and train an autoencoder model with TensorFlow to handle semantic recognition of sentences during retrieval.
Step C: use the open-source framework TensorFlow to build the dialogue model for the dialogue generation system. TensorFlow is an artificial intelligence framework developed by Google that can be used in many deep learning fields, such as image processing and natural language processing.
The classifier is aware of the context of a question and combines the question with its context, so it can classify questions with high accuracy. The classifier classifies questions by combining a deep-learning classification model with regular-expression recognition, so it can extract the important attributes of a question and call the corresponding API for a real-time query, which gives the system strong real-time performance. During question matching, the candidate-set retrieval system builds an inverted index and performs both original-sentence retrieval and synonym-based query-expansion retrieval, so it can find the sentences most similar to the question, which addresses the problem of imprecise retrieval. In step C, the dialogue model generates replies with a seq2seq model that incorporates an attention mechanism, and a beam search mechanism is added at the decoder. Sentences generated by the Seq2Seq model with attention and beam search are more logical and better structured, and the replies are more diverse.
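The overall flow of the embodiment (classify, then retrieve, then generate) might be wired together roughly as in the sketch below. The patent does not state the rule for handing a question from retrieval to generation, so treating "other" as the unclassified case, the min_score threshold, and every stub component shown here are assumptions.

```python
def answer(question, classify, call_api, retrieve, generate, min_score=1.0):
    """Route a question through the classifier, retrieval, and generation in turn."""
    category, attributes = classify(question)
    if category != "other":
        # Recognized task category: query the matching real-time API.
        return call_api(category, attributes)
    match = retrieve(question)  # (stored_answer, score) or None
    if match is not None and match[1] >= min_score:
        return match[0]
    # No sufficiently similar stored question: fall back to generation.
    return generate(question)

# Stub components, for illustration only.
def classify(question):
    if "weather" in question:
        return "weather", {"city": "Beijing"}
    return "other", {}

def call_api(category, attributes):
    return f"[{category} API result for {attributes}]"

def retrieve(question):
    stored = {"who are you": ("I am a question-answering robot.", 2.3)}
    return stored.get(question)

def generate(question):
    return "[generated reply]"

replies = [answer(q, classify, call_api, retrieve, generate)
           for q in ("weather in Beijing", "who are you", "what is love")]
```

Passing the components in as functions keeps the routing logic separate from the classifier, retrieval system, and generation system, each of which can then be replaced or tested on its own.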
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments and that it can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced by the invention. No reference sign in a claim should be construed as limiting that claim.
In addition, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity; those skilled in the art should treat the specification as a whole, and the technical solutions of the various embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.

Claims (6)

1. A pipeline system for hybrid retrieval-based and generative question answering, characterized by comprising:
a classifier, which classifies a query;
a candidate set, wherein for a question that could not be classified, the retrieval system finds the stored questions closest to it, and the candidate set is the subset of candidate sentences screened out in this way;
a database, which stores question sentences so that the question semantically most similar to the user's question can be found;
model selection, which calls the generation system to generate a corresponding answer and return a reply;
wherein the classifier divides questions into six types, "weather", "news", "joke", "flight/high-speed rail", "nearby", and "other"; the candidate set obtains a vector representation of each sentence using an autoencoder model based on a recurrent neural network, and uses BM25 scoring to compute the similarity between the question and the sentences in the database; and model selection uses a Seq2Seq-based model to build the generation system.
2. The pipeline system for hybrid retrieval-based and generative question answering according to claim 1, characterized in that the pipeline system is built through the following steps:
Step A: build the query classifier using two methods, a classification model based on a convolutional neural network and regular expressions;
Step B: when building the retrieval system, use the in-memory key-value database Redis to build the inverted index and store the question-answer corpus; implement ordinary retrieval, expanded queries, and BM25 similarity scoring in Python; and train an autoencoder model with TensorFlow to handle semantic recognition of sentences during retrieval;
Step C: use the open-source framework TensorFlow, which can be used in many deep learning fields such as image processing and natural language processing, to build the dialogue model for the dialogue generation system.
3. The pipeline system for hybrid retrieval-based and generative question answering according to claim 1, characterized in that the classifier is aware of the context of a question.
4. The pipeline system for hybrid retrieval-based and generative question answering according to claim 1, characterized in that the classifier classifies questions by combining a deep-learning classification model with regular-expression recognition.
5. The pipeline system for hybrid retrieval-based and generative question answering according to claim 1, characterized in that the candidate-set retrieval system builds an inverted index during question matching and performs both original-sentence retrieval and synonym-based query-expansion retrieval.
6. The pipeline system for hybrid retrieval-based and generative question answering according to claim 2, characterized in that in step C the dialogue model generates replies with a seq2seq model that incorporates an attention mechanism, and a beam search mechanism is added at the decoder.
Application CN201810123117.7A, priority date 2018-02-07, filing date 2018-02-07: A pipeline system for hybrid retrieval-based and generative question answering. Status: Pending. Published as CN108280218A (en).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810123117.7A CN108280218A (en) 2018-02-07 2018-02-07 A pipeline system for hybrid retrieval-based and generative question answering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810123117.7A CN108280218A (en) 2018-02-07 2018-02-07 A pipeline system for hybrid retrieval-based and generative question answering

Publications (1)

Publication Number Publication Date
CN108280218A true CN108280218A (en) 2018-07-13

Family

ID=62807935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810123117.7A Pending CN108280218A (en) 2018-02-07 2018-02-07 A pipeline system for hybrid retrieval-based and generative question answering

Country Status (1)

Country Link
CN (1) CN108280218A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1928864A (en) * 2006-09-22 2007-03-14 浙江大学 FAQ based Chinese natural language ask and answer method
CN101373532A (en) * 2008-07-10 2009-02-25 昆明理工大学 FAQ Chinese request-answering system implementing method in tourism field
CN104050256A (en) * 2014-06-13 2014-09-17 西安蒜泥电子科技有限责任公司 Initiative study-based questioning and answering method and questioning and answering system adopting initiative study-based questioning and answering method
CN105824933A (en) * 2016-03-18 2016-08-03 苏州大学 Automatic question answering system based on main statement position and implementation method thereof
CN107562792A (en) * 2017-07-31 2018-01-09 同济大学 A kind of question and answer matching process based on deep learning
CN107463699A (en) * 2017-08-15 2017-12-12 济南浪潮高新科技投资发展有限公司 A kind of method for realizing question and answer robot based on seq2seq models

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657041A (en) * 2018-12-04 2019-04-19 南京理工大学 The problem of based on deep learning automatic generation method
CN109657041B (en) * 2018-12-04 2023-09-29 南京理工大学 Deep learning-based automatic problem generation method
CN109657126A (en) * 2018-12-17 2019-04-19 北京百度网讯科技有限公司 Answer generation method, device, equipment and medium
CN109657126B (en) * 2018-12-17 2021-03-23 北京百度网讯科技有限公司 Answer generation method, device, equipment and medium
CN109918484A (en) * 2018-12-28 2019-06-21 中国人民大学 Talk with generation method and device
CN109918484B (en) * 2018-12-28 2020-12-15 中国人民大学 Dialog generation method and device
CN110297895A (en) * 2019-05-24 2019-10-01 山东大学 A kind of dialogue method and system based on free text knowledge
CN110297895B (en) * 2019-05-24 2021-09-17 山东大学 Dialogue method and system based on free text knowledge
CN110362651A (en) * 2019-06-11 2019-10-22 华南师范大学 Dialogue method, system, device and the storage medium that retrieval and generation combine
CN111090664A (en) * 2019-07-18 2020-05-01 重庆大学 High-imitation person multi-mode dialogue method based on neural network
US20210365810A1 (en) * 2020-05-12 2021-11-25 Bayestree Intelligence Pvt Ltd. Method of automatically assigning a classification
CN111966782A (en) * 2020-06-29 2020-11-20 百度在线网络技术(北京)有限公司 Retrieval method and device for multi-turn conversations, storage medium and electronic equipment
CN111966782B (en) * 2020-06-29 2023-12-12 百度在线网络技术(北京)有限公司 Multi-round dialogue retrieval method and device, storage medium and electronic equipment
US11947578B2 (en) 2020-06-29 2024-04-02 Baidu Online Network Technology (Beijing) Co., Ltd. Method for retrieving multi-turn dialogue, storage medium, and electronic device
CN113220856A (en) * 2021-05-28 2021-08-06 天津大学 Multi-round dialogue system based on Chinese pre-training model

Similar Documents

Publication Publication Date Title
CN108280218A (en) A pipeline system for hybrid retrieval-based and generative question answering
WO2021159632A1 (en) Intelligent questioning and answering method and apparatus, computer device, and computer storage medium
US10649990B2 (en) Linking ontologies to expand supported language
CN104461525B (en) A kind of intelligent consulting platform generation system that can customize
CN110209897B (en) Intelligent dialogue method, device, storage medium and equipment
CN109960786A (en) Chinese Measurement of word similarity based on convergence strategy
CN110096567B (en) QA knowledge base reasoning-based multi-round dialogue reply selection method and system
TW202009749A (en) Human-machine dialog method, device, electronic apparatus and computer readable medium
CN111639252A (en) False news identification method based on news-comment relevance analysis
CN110019729B (en) Intelligent question-answering method, storage medium and terminal
CN111353049A (en) Data updating method and device, electronic equipment and computer readable storage medium
CN112632239A (en) Brain-like question-answering system based on artificial intelligence technology
Dsouza et al. Chat with bots intelligently: A critical review & analysis
CN108364066B (en) Artificial neural network chip and its application method based on N-GRAM and WFST model
CN110377752A (en) A kind of knowledge base system applied to the operation of government affairs hall
CN116932733A (en) Information recommendation method and related device based on large language model
CN112364148A (en) Deep learning method-based generative chat robot
CN116541493A (en) Interactive response method, device, equipment and storage medium based on intention recognition
CN117251552A (en) Dialogue processing method and device based on large language model and electronic equipment
KR20180116103A (en) Continuous conversation method and system by using automating conversation scenario network
CN114330704A (en) Statement generation model updating method and device, computer equipment and storage medium
CN113065324A (en) Text generation method and device based on structured triples and anchor templates
CN111767386A (en) Conversation processing method and device, electronic equipment and computer readable storage medium
CN115378890B (en) Information input method, device, storage medium and computer equipment
CN116957128A (en) Service index prediction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180713