CN106649868B - Question and answer matching process and device - Google Patents


Info

Publication number
CN106649868B
Authority
CN
China
Prior art keywords
question sentence
text
sentence text
answer
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611271173.2A
Other languages
Chinese (zh)
Other versions
CN106649868A (en
Inventor
周建设
袁家政
刘宏哲
刘琴
史金生
刘杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Original Assignee
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University
Priority to CN201611271173.2A
Publication of CN106649868A
Application granted
Publication of CN106649868B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a question-answer matching method and device and relates to the technical field of intelligent question answering. The method comprises: extracting keywords from an input question text; determining, according to the keywords, target matching question texts from a pre-established question base by index filtering; determining, based on the Levenshtein distance algorithm, the best-matching question text, namely the target matching question text with the highest similarity to the input question text; and outputting, according to the best-matching question text, an answer text corresponding to the input question text. The invention can output an answer corresponding to the input question in a relatively short time, which both shortens question-answer matching time and improves accuracy.

Description

Question and answer matching method and device
Technical field
The present invention relates to the technical field of intelligent question answering, and in particular to a question-answer matching method and device.
Background art
With the development of science and technology, convenient question answering systems are gradually appearing in daily life. A question answering system can automatically provide an answer to a user's question, thereby enabling human-computer interaction.
A question answering system essentially searches an existing set of question-answer pairs for a question text that matches the user's question and presents the corresponding answer to the user. Its core idea is to compute the similarity between the question posed by the user and the questions recorded in the question base. Most existing question answering systems use a TF-IDF question similarity calculation method based on the vector space model. In human-computer interaction, however, users' questions are mostly short, and for short questions this method extracts keywords with low accuracy and takes a long time to match. After posing a question, the user has to wait a long time to receive the matched answer, so the user experience is poor.
No effective solution has yet been proposed for the low accuracy and long matching time of the question-answer matching approach used in the prior art.
Summary of the invention
In view of this, the purpose of the present invention is to provide a question-answer matching method and device that alleviate the low accuracy and long matching time of prior-art question-answer matching.
To achieve the above purpose, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a question-answer matching method, comprising: extracting keywords from an input question text; determining, according to the keywords, target matching question texts from a pre-established question base by index filtering; determining, based on the Levenshtein distance algorithm, the best-matching question text with the highest similarity to the input question text from among the target matching question texts; and outputting, according to the best-matching question text, an answer text corresponding to the input question text.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein extracting the keywords from the input question text comprises: segmenting the input question text into words to generate a word sequence; removing the stop words from the word sequence to obtain terms; and calculating the weight of each term using an improved information entropy formula. The improved information entropy formula is:
[formula not reproduced in this text]
where H(t) is the weight of term t; f_tk is the frequency with which term t appears in text k; n_t is the frequency with which term t appears in the whole text collection; and N is the total number of texts in the text collection. The method further comprises: sorting all terms by their calculated weights to obtain a weight ranking table; and extracting keywords from the weight ranking table according to a preset extraction ratio.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein determining, according to the keywords, the target matching question texts from the pre-established question base by index filtering comprises: obtaining, according to the keywords in the input question text and the index relation between preset keywords and preset question texts in the pre-established question base, a matching value of each preset question text with respect to the input question text; and determining the preset question texts whose matching value is greater than a preset matching threshold as the target matching question texts.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein obtaining the matching value comprises: taking the preset keywords in the pre-established question base that are identical to the keywords in the input question text as matching keywords; traversing the preset question texts in the question base according to the index relation between the preset keywords and the preset question texts, to determine the number of matching keywords contained in each preset question text; and taking that number as the matching value of the preset question text with respect to the input question text.
With reference to the second or third possible implementation of the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein establishing the question base comprises: setting in advance the preset question texts and the standard answer text corresponding to each preset question text, and storing both in the question base; assigning a numeric identifier to each preset question text; extracting the preset keywords of each preset question text; and establishing the index relation between the preset keywords and the preset question texts, wherein, in the index relation, each preset keyword corresponds to the numeric identifiers of the one or more preset question texts that contain it.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein outputting the answer text corresponding to the input question text according to the best-matching question text comprises: judging whether the similarity of the best-matching question text reaches a preset similarity threshold; if so, looking up the standard answer text corresponding to the best-matching question text in the question base and outputting it as the answer text corresponding to the input question text; if not, searching the Internet for a network answer text corresponding to the input question text and outputting it as the answer text corresponding to the input question text.
In a second aspect, an embodiment of the present invention further provides a question-answer matching device, comprising: an extraction module, configured to extract keywords from an input question text; a first determining module, configured to determine, according to the keywords, target matching question texts from a pre-established question base by index filtering; a second determining module, configured to determine, based on the Levenshtein distance algorithm, the best-matching question text with the highest similarity to the input question text from among the target matching question texts; and an answer output module, configured to output, according to the best-matching question text, an answer text corresponding to the input question text.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, wherein the extraction module comprises: a segmentation unit, configured to segment the input question text into words to generate a word sequence; a stop-word removal unit, configured to remove the stop words from the word sequence to obtain terms; and a weight calculation unit, configured to calculate the weight of each term using an improved information entropy formula. The improved information entropy formula is:
[formula not reproduced in this text]
where H(t) is the weight of term t; f_tk is the frequency with which term t appears in text k; n_t is the frequency with which term t appears in the whole text collection; and N is the total number of texts in the text collection. The extraction module further comprises: a sorting unit, configured to sort all terms by their calculated weights to obtain a weight ranking table; and a keyword extraction unit, configured to extract keywords from the weight ranking table according to a preset extraction ratio.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, wherein the first determining module comprises: a matching value acquiring unit, configured to obtain, according to the keywords in the input question text and the index relation between preset keywords and preset question texts in the pre-established question base, a matching value of each preset question text with respect to the input question text; and a first determination unit, configured to determine the preset question texts whose matching value is greater than a preset matching threshold as the target matching question texts.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, wherein the answer output module comprises: a judging unit, configured to judge whether the similarity of the best-matching question text reaches a preset similarity threshold; a standard answer output unit, configured to, when the similarity reaches the preset similarity threshold, look up the standard answer text corresponding to the best-matching question text in the question base and output it as the answer text corresponding to the input question text; and a network answer output unit, configured to, when the similarity does not reach the preset similarity threshold, search the Internet for a network answer text corresponding to the input question text and output it as the answer text corresponding to the input question text.
The embodiments of the present invention provide a question-answer matching method and device. After the keywords in the input question text are extracted, target matching question texts are determined from the question base by index filtering, which narrows the range of questions in the question base that may match the input question text. The best-matching question text with the highest similarity to the input question text is then determined based on the Levenshtein distance algorithm, and finally the answer text corresponding to the input question text is output. Compared with the low accuracy and long matching time of the question-answer matching approach used in the prior art, the method and device provided by the embodiments of the present invention can output the answer corresponding to a question in a relatively short time, which both shortens question-answer matching time and improves accuracy.
Other features and advantages of the present invention will be set forth in the description that follows and will in part become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention are realized and obtained by the structures particularly pointed out in the description, the claims, and the drawings.
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a flow chart of a question-answer matching method provided by an embodiment of the present invention;
Fig. 2 shows a detailed flow chart of a question-answer matching method provided by an embodiment of the present invention;
Fig. 3 shows a flow chart of a method for establishing a question base provided by an embodiment of the present invention;
Fig. 4 shows a structural block diagram of a question-answer matching device provided by an embodiment of the present invention;
Fig. 5 shows a detailed structural block diagram of a question-answer matching device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Human-computer interaction is gradually becoming part of daily life, and devices and application software that respond automatically to users' questions are now common. They mostly achieve question-answer matching through a question answering system that stores a set of question-answer pairs. However, prior-art question-answer matching mostly uses a TF-IDF question similarity calculation method based on the vector space model to find a match for the user's question, and this approach has low accuracy and takes a long time. On this basis, the question-answer matching method and device provided by the embodiments of the present invention improve matching accuracy while shortening matching time. The embodiments of the present invention are described in detail below.
Embodiment one:
Referring to the flow chart of a question-answer matching method shown in Fig. 1, the method comprises the following steps:
Step S102: extract keywords from the input question text. The input question text is the question text that the user enters through human-computer interaction. When the user uses voice input, the user's spoken question must first be converted into written text, which is then used as the input question text.
Step S104: according to the keywords, determine target matching question texts from the pre-established question base by index filtering. The target matching question texts comprise multiple texts. The purpose is to narrow down in advance the range of preset texts in the question base that may match the user's input question text, which speeds up subsequent question-answer matching.
Step S106: based on the Levenshtein distance algorithm, determine from the target matching question texts the best-matching question text with the highest similarity to the input question text. The Levenshtein distance algorithm measures the similarity between two character strings by computing the minimum number of edit operations (insertions, deletions, and substitutions) needed to transform one string into the other. With this algorithm, the matching question most similar to the input question text can be found quickly and accurately among the pre-screened target matching question texts and is taken as the best-matching question text.
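The edit distance described in step S106 (the Levenshtein distance) can be sketched as follows. This is a minimal illustration, not the patent's implementation. The text does not state how the distance is turned into a similarity score, so the normalization in `similarity` (one minus the distance divided by the longer string's length) is an assumption, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(query: str, candidates):
    """Return the candidate question with the highest similarity to query."""
    return max(candidates, key=lambda c: similarity(query, c))
```

Because the distance is computed only against the pre-screened candidates, the quadratic cost per pair is paid for a small number of question texts.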
Step S108: according to the best-matching question text, output the answer text corresponding to the input question text.
In the method of this embodiment, after the keywords in the input question text are extracted, target matching question texts are determined from the question base by index filtering, which narrows the range of questions in the question base that may match the input question text. The best-matching question text with the highest similarity to the input question text is then determined based on the Levenshtein distance algorithm, and finally the answer text corresponding to the input question text is output. The method can output the answer corresponding to a question in a relatively short time, which both shortens question-answer matching time and improves accuracy.
Specifically, the prior art mostly uses a TF-IDF question similarity calculation method based on the vector space model. That method is mainly suited to computing the similarity of longer sentences or documents, and its keyword extraction accuracy for short questions is low. Because the questions users ask in human-computer interaction are usually short, the method often fails to give the answer the user expects. It also requires building a vector space model, a complex and time-consuming process, so finding the answer that matches the user's input question in the question base (or question answering system) takes a long time. Given the particular nature of speech recognition and human-machine question answering, matching speed is also an important factor in user experience. In summary, the prior art gives a poor user experience, whereas the method provided by this embodiment processes the input question text simply, matches quickly, is not limited by question length, and is suitable for short sentences. It can effectively improve matching accuracy and give users a good experience.
For ease of understanding and implementation, refer to the detailed flow chart of a question-answer matching method shown in Fig. 2, which comprises the following steps:
Step S202: segment the input question text into words to generate a word sequence. That is, the input question text is split into individual words, and the segmented input question text is called the word sequence.
Step S204: remove the stop words from the word sequence to obtain terms. To save storage space and improve search efficiency, search engines automatically ignore certain characters or words when indexing pages or processing search requests; these are called stop words. Examples are modal particles and other words that usually carry no meaning by themselves. The stop words in the word sequence can be removed according to a pre-established stop-word list.
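Steps S202 and S204 can be sketched as below. This is a simplified illustration under stated assumptions: Chinese text would need a dedicated word segmenter, which the patent does not name, so the sketch splits space-delimited text with a regular expression instead. The stop-word list and the function names are hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would load a pre-established one.
STOP_WORDS = frozenset({"the", "a", "an", "of", "is", "are", "what", "which"})


def segment(question: str):
    """Split a question into a lowercase word sequence (stand-in for step S202)."""
    return re.findall(r"\w+", question.lower())


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Drop stop words from the word sequence, keeping order (step S204)."""
    return [w for w in words if w not in stop_words]
```

For example, `remove_stop_words(segment("What are the four great masterpieces of China?"))` keeps only `four`, `great`, `masterpieces`, and `china` as candidate terms.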
Step S206: calculate the weight of each term using the improved information entropy formula, which is:
[formula not reproduced in this text]
where H(t) is the weight of term t; f_tk is the frequency with which term t appears in text k; n_t is the frequency with which term t appears in the whole text collection; and N is the total number of texts in the text collection.
Calculating each term's weight with the improved information entropy formula makes it possible to identify keywords from those weights, which improves keyword extraction accuracy. The calculation is also relatively simple and quick, which helps increase question-answer matching speed.
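The improved information entropy formula itself does not survive in this text, so it cannot be shown here. Purely as a stand-in, the sketch below uses the classic log-entropy term weight from information retrieval, which is built from the same three quantities (f_tk, n_t, and N): the weight is 1 plus the sum over texts of p log p, divided by log N, with p = f_tk / n_t. This is an assumption for illustration and should not be read as the patent's formula; the function name is hypothetical.

```python
import math
from collections import Counter


def entropy_weights(texts):
    """texts: list of term lists, one per text. Returns {term: weight}.

    Stand-in log-entropy weight (NOT the patent's formula):
    weight(t) = 1 + sum_k(p_tk * log p_tk) / log N, with p_tk = f_tk / n_t.
    """
    n_texts = len(texts)                        # N
    per_text = [Counter(t) for t in texts]      # f_tk for each text k
    totals = Counter()                          # n_t over the whole collection
    for counts in per_text:
        totals.update(counts)

    weights = {}
    for term, n_t in totals.items():
        if n_texts < 2:
            weights[term] = 1.0  # log N would be 0; treat as fully concentrated
            continue
        entropy_sum = 0.0
        for counts in per_text:
            f_tk = counts.get(term, 0)
            if f_tk:
                p = f_tk / n_t
                entropy_sum += p * math.log(p)
        weights[term] = 1.0 + entropy_sum / math.log(n_texts)
    return weights
```

Under this stand-in, a term spread evenly over all texts gets a weight near 0, and a term concentrated in one text gets a weight of 1, so distinctive terms rank highest.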
Step S208: sort all terms by their calculated weights to obtain a weight ranking table. The terms may be sorted in descending or ascending order, as the situation requires.
Step S210: extract keywords from the weight ranking table according to a preset extraction ratio. For example, if the extraction ratio is set to 30 percent, the 30 percent of terms with the highest weights are extracted from the weight ranking table: if the table is sorted by weight in descending order and contains 100 terms, the first 30 are extracted as keywords. This effectively narrows the range and helps improve subsequent question-answer matching efficiency.
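Steps S208 and S210 can be sketched as follows, taking the term weights as given. The rounding rule and the choice to keep at least one keyword are assumptions, since the text only gives the 30-out-of-100 example; the function name is hypothetical.

```python
def extract_keywords(weights, ratio=0.3):
    """Rank terms by weight (descending) and keep the top `ratio` share.

    weights: {term: weight}. The count is rounded to the nearest integer
    and at least one term is kept (both are assumptions).
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    count = max(1, round(len(ranked) * ratio))
    return [term for term, _ in ranked[:count]]
```

Rounding is used instead of truncation or ceiling so that floating-point noise in `len(ranked) * ratio` does not shift the count by one.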
For ease of understanding, this embodiment provides a specific example of applying steps S202 to S210. Suppose the input question text is "China's four great masterpieces". Segmentation yields the word sequence "China / four / great / masterpieces". Stop words are then removed, the weight of each term is calculated using the average information entropy (that is, the improved information entropy formula above), and the keywords finally obtained are {China, masterpieces}.
Step S212: according to the keywords in the input question text and the index relation between the preset keywords and the preset question texts in the pre-established question base, obtain the matching value of each preset question text with respect to the input question text.
One specific implementation is as follows:
(1) Take the preset keywords in the pre-established question base that are identical to the keywords in the input question text as matching keywords.
(2) According to the index relation between the preset keywords and the preset question texts in the question base, traverse the preset question texts in the question base to determine the number of matching keywords contained in each preset question text, and take that number as the matching value of the preset question text with respect to the input question text.
In addition, for ease of understanding, this embodiment gives an example of this implementation. Suppose the input question text has m keywords. A one-dimensional array of length N, initialized to 0, records for each text in the question base the number k of the specified keywords it contains. The index chains of the m keywords in the input question are then traversed, and each time a text appears, the corresponding array position is incremented by 1. When the traversal is complete, the k values of all texts have been obtained; these k values are the matching values.
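The counting-array traversal in this example can be sketched as below. The layout of the index (a dictionary mapping each preset keyword to a list of 0-based question identifiers) and the function names are assumptions for illustration. The second helper applies the "greater than the preset matching threshold" rule of step S214.

```python
def match_values(query_keywords, index, n_texts):
    """Count, for each preset question text, how many query keywords it contains.

    index: {preset_keyword: [text_id, ...]} with 0-based IDs (assumed layout).
    Returns a list of length n_texts holding the k value of each text.
    """
    counts = [0] * n_texts                      # one-dimensional array, initialized to 0
    for keyword in set(query_keywords):         # each of the m keywords once
        for text_id in index.get(keyword, ()):  # walk that keyword's index chain
            counts[text_id] += 1
    return counts


def filter_candidates(counts, threshold):
    """Keep the IDs of texts whose matching value is greater than the threshold."""
    return [text_id for text_id, k in enumerate(counts) if k > threshold]
```

Only the index chains of the query's keywords are visited, so the cost depends on how many texts share those keywords and not on the total length of the question texts.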
Step S214: determine the preset question texts whose matching value is greater than the preset matching threshold as the target matching question texts.
Measuring the similarity between each preset question text and the input question text by the matching value gives accurate and reliable results. Pre-screening the preset question texts in the question base by matching value effectively narrows the range of questions that can match the input question, which makes the subsequent determination of the matching text more efficient and shortens matching time.
Step S216: based on the Levenshtein distance algorithm, determine from the target matching question texts the best-matching question text with the highest similarity to the input question text. With this algorithm, the matching question most similar to the input question text (the one containing the most identical keywords) can be found quickly and accurately among the pre-screened target matching question texts and is taken as the best-matching question text.
Step S218: judge whether the similarity of the best-matching question text reaches the preset similarity threshold. If so, execute step S220; if not, execute step S222. After the best-matching question text has been determined from the question base, this step makes a final check of whether it is suitable as the matching result. This avoids what happens in the prior art, where the answer is output blindly once the closest match has been found, which produces irrelevant answers and a poor user experience.
Step S220: look up the standard answer text corresponding to the best-matching question text in the question base, and output the standard answer text as the answer text corresponding to the input question text. Each preset question text and its corresponding standard answer text are stored in the question base in advance.
Step S222: search the Internet for the network answer text corresponding to the input question text, and output the network answer text as the answer text corresponding to the input question text. The user's input question can be submitted directly to the Internet, for example through a search engine, to search for the network answer text. When no text matching the user's question is found in the question base, the network answer text meets the user's need and improves the user experience.
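The decision in steps S218 to S222 can be sketched as follows. The threshold value of 0.8, the `standard_answers` mapping, and the `web_search` callback are hypothetical; the patent does not specify a threshold value or a particular search interface.

```python
def choose_answer(question, best_question, best_similarity,
                  standard_answers, web_search, threshold=0.8):
    """Return the stored answer if the best match is similar enough,
    otherwise fall back to a network search.

    standard_answers: {preset_question: standard_answer} (assumed layout).
    web_search: caller-supplied function taking the question text and
                returning an answer text (hypothetical hook).
    """
    if best_similarity >= threshold:           # step S218: threshold reached
        return standard_answers[best_question]  # step S220: standard answer
    return web_search(question)                # step S222: network answer
```

Passing the search step in as a callback keeps the sketch independent of any particular search engine.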
Steps S202 to S210 in Fig. 2 correspond to step S102 in Fig. 1; steps S212 and S214 in Fig. 2 correspond to step S104 in Fig. 1; step S216 in Fig. 2 corresponds to step S106 in Fig. 1; and steps S218 to S222 in Fig. 2 correspond to step S108 in Fig. 1.
By executing the steps in Fig. 2, the answer text corresponding to the user's input question text can be obtained quickly and accurately, which improves the user experience.
Further, this embodiment provides a process for establishing the question base. Referring to the flow chart of a method for establishing a question base shown in Fig. 3, the question base can be established through the following steps:
Step S302: set in advance the preset question texts and the standard answer text corresponding to each preset question text, and store the preset question texts and standard answer texts in the question base.
Step S304: assign a numeric identifier to each preset question text.
Step S306: extract the preset keywords of each preset question text. For the specific way of extracting the preset keywords, refer to steps S202 to S210 in Fig. 2.
Step S308: establish the index relation between the preset keywords and the preset question texts. In the index relation, each preset keyword corresponds to the numeric identifiers of the one or more preset question texts that contain it.
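Steps S302 to S308 amount to building an inverted index from keywords to question identifiers. The sketch below is one possible layout, not the patent's data structure: identifiers are 0-based list positions, and the keyword extractor is passed in as a function. All names are hypothetical.

```python
def build_question_base(qa_pairs, extract_keywords):
    """Build the question base and its keyword index.

    qa_pairs: iterable of (preset_question, standard_answer) tuples.
    extract_keywords: function mapping a question text to its keywords.
    Returns (questions, answers, index), where the numeric identifier of a
    question is its position in `questions` and
    index = {preset_keyword: [text_id, ...]}.
    """
    questions, answers, index = [], [], {}
    for text_id, (question, answer) in enumerate(qa_pairs):  # S304: numbering
        questions.append(question)                           # S302: store question
        answers.append(answer)                               # S302: store answer
        for keyword in set(extract_keywords(question)):      # S306: keywords
            index.setdefault(keyword, []).append(text_id)    # S308: index relation
    return questions, answers, index
```

Storing identifiers instead of full question texts in the index keeps each index chain small, which matches the storage and lookup benefits the description attributes to numbering.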
The question base provided by this embodiment is not merely the collection of question-answer pairs used by prior-art question answering systems; that collection is processed further. For example, keywords are extracted in advance from each question, an index is established between each keyword and the questions that contain it, and questions are referenced by numeric identifier, which reduces storage space and increases search speed. This further shortens the time needed to look up texts in the question base during question-answer matching.
In summary, the question-answer matching method provided by this embodiment can output the answer corresponding to the user's input question in a relatively short time, with the answer output within 1 s. It shortens question-answer matching time, improves accuracy, and improves the overall user experience.
Embodiment two:
Corresponding to the question-answer matching method provided in Embodiment one, this embodiment of the present invention provides a question-answer matching device. Referring to Fig. 4, the device comprises the following modules:
An extraction module 402, configured to extract keywords from the input question text;
A first determining module 404, configured to determine, according to the keywords, target matching question texts from a pre-established question bank by means of index filtering;
A second determining module 406, configured to determine, based on the Levenshtein distance algorithm, the best matching question text, that is, the one with the highest similarity to the input question text, from the target matching question texts;
An answer output module 408, configured to output the answer text corresponding to the input question text according to the best matching question text.
In the device of this embodiment, after the extraction module 402 extracts the keywords of the input question text, the first determining module 404 determines target matching question texts from the question bank by index filtering, which narrows the range of questions in the question bank that may match the input question text. The second determining module 406 then determines, based on the Levenshtein distance algorithm, the best matching question text with the highest similarity to the input question text, and finally the answer output module 408 outputs the answer text corresponding to the input question text. The device can output the answer corresponding to a question in a short time, which both shortens the question-answer matching time and improves accuracy.
For ease of understanding and implementation, on the basis of Fig. 4, refer to the detailed structural block diagram of a question-answer matching device shown in Fig. 5, in which:
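The role of the second determining module 406 can be illustrated with a short sketch. The edit-distance routine below is the standard dynamic-programming form of Levenshtein distance. The conversion from distance to a similarity score (one minus the distance divided by the longer length) is an assumption made for illustration, since this passage does not state how similarity is derived from the distance. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a                      # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    # Assumed normalization: 1.0 for identical strings, 0.0 for fully different ones.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(input_text, candidates):
    """Return (similarity, text) for the candidate most similar to input_text."""
    scored = ((similarity(input_text, c), c) for c in candidates)
    return max(scored, default=(0.0, None))


score, text = best_match("reset password", ["reset passwords", "change email"])
```

Running the edit distance only on the candidates that survive index filtering is what keeps this step cheap, because the cost of each comparison grows with the product of the two string lengths.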
The extraction module 402 includes: a word segmentation unit 4021, configured to segment the input question text into words and generate a word sequence; a stop word removal unit 4022, configured to remove the stop words from the word sequence to obtain terms; and a weight calculation unit 4023, configured to calculate the weight corresponding to each term using an improved information entropy formula. The improved information entropy formula is:
where H(t) is the weight corresponding to term t; f_tk is the frequency with which term t appears in text k; n_t is the frequency with which term t appears in the whole text collection; and N is the total number of texts in the text collection.
The extraction module 402 further includes a sorting unit 4024, configured to sort all terms by their calculated weights to obtain a weight ranking table; and a keyword extraction unit 4025, configured to extract keywords from the weight ranking table according to a preset extraction ratio.
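The work of the sorting unit 4024 and the keyword extraction unit 4025 can be sketched as follows. This is an illustration only: the term weights are passed in as a plain dictionary with made-up values rather than computed from the entropy formula, and rounding the cut-off up while keeping at least one keyword is an assumption, since the text only specifies "a preset extraction ratio".

```python
import math


def extract_keywords(weights, ratio):
    """Rank terms by weight (highest first) and keep the top `ratio` fraction.

    weights: dict mapping term -> weight H(t); the values here are hypothetical.
    ratio: preset extraction ratio in (0, 1].
    """
    # Weight ranking table: terms sorted by descending weight.
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    keep = max(1, math.ceil(len(ranked) * ratio))   # assumed rounding rule
    return [term for term, _ in ranked[:keep]]


example_weights = {"password": 0.9, "reset": 0.7, "account": 0.4, "please": 0.1}
top_keywords = extract_keywords(example_weights, ratio=0.5)
```

With four terms and a ratio of 0.5, the two highest-weighted terms are kept as keywords.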
The first determining module 404 includes a matching value acquiring unit 4041, configured to obtain the matching value between each preset question text and the input question text according to the keywords in the input question text and the index relationship between preset keywords and preset question texts in the pre-established question bank. Specifically, the matching value acquiring unit 4041 may include a matching keyword determining subunit, configured to take the preset keywords in the pre-established question bank that are identical to keywords in the input question text as matching keywords; and a matching value determining subunit, configured to traverse the preset question texts in the question bank according to the index relationship between preset keywords and preset question texts, determine the number of matching keywords contained in each preset question text, and take that number as the matching value between the preset question text and the input question text. These subunits are not shown in Fig. 5.
The first determining module 404 further includes a first determining unit 4042, configured to determine the preset question texts whose matching value is greater than a preset matching threshold as target matching question texts.
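The matching value computation and threshold filter can be sketched as below. The function name and the sample index are hypothetical. As a simplification, the sketch walks the index entries of the matching keywords instead of traversing every preset question; the resulting counts are the same, because a question that contains no matching keyword has a matching value of zero.

```python
from collections import Counter


def filter_candidates(input_keywords, index, threshold):
    """Return {question ID: matching value} for questions above the threshold.

    index: dict mapping preset keyword -> set of question IDs containing it.
    The matching value is the number of matching keywords a question contains.
    """
    counts = Counter()
    for keyword in set(input_keywords):       # each matching keyword counts once
        for qid in index.get(keyword, ()):    # keywords absent from the index match nothing
            counts[qid] += 1
    return {qid: n for qid, n in counts.items() if n > threshold}


index = {"reset": {1}, "password": {1, 2}, "change": {2}, "email": {3}}
candidates = filter_candidates(["reset", "password"], index, threshold=1)
```

Here question 1 contains both input keywords and passes the threshold of 1, question 2 contains only one and is dropped, and question 3 is never touched.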
The answer output module 408 includes: a judging unit 4081, configured to judge whether the similarity of the best matching question text reaches a preset similarity threshold; a standard answer output unit 4082, configured to, when the similarity of the best matching question text is judged to reach the preset similarity threshold, look up the standard answer text corresponding to the best matching question text in the question bank and output it as the answer text corresponding to the input question text; and a network answer output unit 4083, configured to, when the similarity of the best matching question text is judged not to reach the preset similarity threshold, search the Internet for a network answer text corresponding to the input question text and output it as the answer text corresponding to the input question text.
The implementation principle and technical effects of the device provided in this embodiment are the same as those of the foregoing method embodiment. For brevity, for anything not mentioned in the device embodiment, refer to the corresponding content in the foregoing method embodiment.
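The branching performed by units 4081 to 4083 can be sketched in a few lines. All names are hypothetical, and `search_web` is a caller-supplied placeholder: the text does not say how the Internet lookup is carried out, so the sketch only shows where it would be invoked.

```python
def output_answer(input_text, best_id, best_similarity, answers, threshold, search_web):
    """Choose between the stored standard answer and an Internet fallback.

    answers: dict mapping question ID -> standard answer text.
    search_web: hypothetical callable that returns a network answer for a text.
    """
    # Judging unit: does the best match reach the preset similarity threshold?
    if best_id is not None and best_similarity >= threshold:
        return answers[best_id]        # standard answer output unit
    return search_web(input_text)      # network answer output unit


def fake_search(text):
    # Stand-in for a real Internet search, used only to exercise the fallback.
    return "web result for: " + text


answers = {1: "Use the reset link."}
hit = output_answer("reset password", 1, 0.93, answers, 0.8, fake_search)
miss = output_answer("weather today", 1, 0.20, answers, 0.8, fake_search)
```

Treating "reaches the threshold" as greater than or equal is an interpretation of the wording, not something the text fixes.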
In conclusion question and answer matching process provided in an embodiment of the present invention and device, in extracting input question sentence text After keyword, index filtering by way of from problem base determine object matching question sentence text, with reduce in problem base with it is defeated Enter the question sentence range that question sentence text matches, then determines based on Lay Weinstein distance algorithm and input the similarity of question sentence text most High best match question sentence text, finally output answer text corresponding with input question sentence text.With in the prior art use The matched mode accuracy rate of question and answer is lower and used time longer problem is compared, and method and device provided in an embodiment of the present invention can be with Answer corresponding with question sentence is exported in a relatively short period of time, can not only shorten question and answer matching duration, but also can promote accuracy rate.
The computer program product of question and answer matching process and device provided by the embodiment of the present invention, including store program The computer readable storage medium of code, the instruction that said program code includes can be used for executing described in previous methods embodiment Method, specific implementation can be found in embodiment of the method, details are not described herein.
In addition, in the description of the embodiments of the present invention, unless otherwise expressly specified and limited, the terms "mounted", "connected" and "coupled" are to be understood broadly. For example, a connection may be a fixed connection, a detachable connection or an integral connection; it may be a mechanical connection or an electrical connection; and it may be a direct connection, an indirect connection through an intermediary, or an internal connection between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
In the description of the present invention, it should be noted that orientation or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" are based on the orientation or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they are therefore not to be construed as limiting the present invention. In addition, the terms "first", "second" and "third" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are merely specific implementations of the present invention, intended to illustrate its technical solution rather than to limit it, and the scope of protection of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (8)

1. A question-answer matching method, characterized by comprising:
extracting keywords from an input question text;
determining, according to the keywords, target matching question texts from a pre-established question bank by means of index filtering;
determining, based on the Levenshtein distance algorithm, the best matching question text with the highest similarity to the input question text from the target matching question texts;
outputting, according to the best matching question text, the answer text corresponding to the input question text;
wherein extracting the keywords from the input question text comprises:
segmenting the input question text into words to generate a word sequence;
removing the stop words from the word sequence to obtain terms;
calculating the weight corresponding to each term using an improved information entropy formula, the improved information entropy formula being:
where H(t) is the weight corresponding to term t; f_tk is the frequency with which term t appears in text k; n_t is the frequency with which term t appears in the whole text collection; and N is the total number of texts in the text collection;
sorting all terms by their calculated weights to obtain a weight ranking table;
extracting keywords from the weight ranking table according to a preset extraction ratio.
2. The method according to claim 1, characterized in that determining, according to the keywords, target matching question texts from the pre-established question bank by means of index filtering comprises:
obtaining the matching value between each preset question text and the input question text according to the keywords in the input question text and the index relationship between preset keywords and preset question texts in the pre-established question bank;
determining the preset question texts whose matching value is greater than a preset matching threshold as target matching question texts.
3. The method according to claim 2, characterized in that obtaining the matching value between each preset question text and the input question text according to the keywords in the input question text and the index relationship between preset keywords and preset question texts in the pre-established question bank comprises:
taking the preset keywords in the pre-established question bank that are identical to keywords in the input question text as matching keywords;
traversing the preset question texts in the question bank according to the index relationship between preset keywords and preset question texts in the question bank, to determine the number of matching keywords contained in each preset question text; and taking the number of matching keywords contained in the preset question text as the matching value between the preset question text and the input question text.
4. The method according to claim 2 or 3, characterized in that establishing the question bank comprises:
setting preset question texts and the standard answer text corresponding to each preset question text in advance, and storing the preset question texts and the standard answer texts in the question bank;
assigning a numeric identifier to each preset question text;
extracting the preset keywords corresponding to each preset question text;
establishing an index relationship between the preset keywords and the preset question texts, wherein, in the index relationship, each preset keyword corresponds to the numeric identifiers of the one or more preset question texts that contain the preset keyword.
5. The method according to claim 1, characterized in that outputting, according to the best matching question text, the answer text corresponding to the input question text comprises:
judging whether the similarity of the best matching question text reaches a preset similarity threshold;
if so, looking up the standard answer text corresponding to the best matching question text in the question bank, and outputting the standard answer text as the answer text corresponding to the input question text;
if not, searching the Internet for a network answer text corresponding to the input question text, and outputting the network answer text as the answer text corresponding to the input question text.
6. A question-answer matching device, characterized by comprising:
an extraction module, configured to extract keywords from an input question text;
a first determining module, configured to determine, according to the keywords, target matching question texts from a pre-established question bank by means of index filtering;
a second determining module, configured to determine, based on the Levenshtein distance algorithm, the best matching question text with the highest similarity to the input question text from the target matching question texts;
an answer output module, configured to output, according to the best matching question text, the answer text corresponding to the input question text;
wherein the extraction module comprises:
a word segmentation unit, configured to segment the input question text into words to generate a word sequence;
a stop word removal unit, configured to remove the stop words from the word sequence to obtain terms;
a weight calculation unit, configured to calculate the weight corresponding to each term using an improved information entropy formula, the improved information entropy formula being:
where H(t) is the weight corresponding to term t; f_tk is the frequency with which term t appears in text k; n_t is the frequency with which term t appears in the whole text collection; and N is the total number of texts in the text collection;
a sorting unit, configured to sort all terms by their calculated weights to obtain a weight ranking table;
a keyword extraction unit, configured to extract keywords from the weight ranking table according to a preset extraction ratio.
7. The device according to claim 6, characterized in that the first determining module comprises:
a matching value acquiring unit, configured to obtain the matching value between each preset question text and the input question text according to the keywords in the input question text and the index relationship between preset keywords and preset question texts in the pre-established question bank;
a first determining unit, configured to determine the preset question texts whose matching value is greater than a preset matching threshold as target matching question texts.
8. The device according to claim 6, characterized in that the answer output module comprises:
a judging unit, configured to judge whether the similarity of the best matching question text reaches a preset similarity threshold;
a standard answer output unit, configured to, when the similarity of the best matching question text is judged to reach the preset similarity threshold, look up the standard answer text corresponding to the best matching question text in the question bank and output the standard answer text as the answer text corresponding to the input question text;
a network answer output unit, configured to, when the similarity of the best matching question text is judged not to reach the preset similarity threshold, search the Internet for a network answer text corresponding to the input question text and output the network answer text as the answer text corresponding to the input question text.
CN201611271173.2A 2016-12-30 2016-12-30 Question and answer matching process and device Active CN106649868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611271173.2A CN106649868B (en) 2016-12-30 2016-12-30 Question and answer matching process and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611271173.2A CN106649868B (en) 2016-12-30 2016-12-30 Question and answer matching process and device

Publications (2)

Publication Number Publication Date
CN106649868A CN106649868A (en) 2017-05-10
CN106649868B true CN106649868B (en) 2019-03-26

Family

ID=58839104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611271173.2A Active CN106649868B (en) 2016-12-30 2016-12-30 Question and answer matching process and device

Country Status (1)

Country Link
CN (1) CN106649868B (en)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273350A (en) * 2017-05-16 2017-10-20 广东电网有限责任公司江门供电局 A kind of information processing method and its device for realizing intelligent answer
CN107590192B (en) * 2017-08-11 2023-05-05 深圳市腾讯计算机***有限公司 Mathematical processing method, device, equipment and storage medium for text questions
CN107862058B (en) * 2017-11-10 2021-10-22 北京百度网讯科技有限公司 Method and apparatus for generating information
CN110110049A (en) * 2017-12-29 2019-08-09 深圳市优必选科技有限公司 Service consultation method, device, system, service robot and storage medium
CN108345644A (en) * 2018-01-15 2018-07-31 阿里巴巴集团控股有限公司 A kind of method and device of data processing
CN108509482B (en) * 2018-01-23 2020-12-08 深圳市阿西莫夫科技有限公司 Question classification method and device, computer equipment and storage medium
CN108415980A (en) * 2018-02-09 2018-08-17 平安科技(深圳)有限公司 Question and answer data processing method, electronic device and storage medium
CN110555093B (en) * 2018-03-30 2024-02-13 华为技术有限公司 Text matching method, device and equipment
CN108595619A (en) * 2018-04-23 2018-09-28 海信集团有限公司 A kind of answering method and equipment
CN108595629B (en) * 2018-04-24 2021-08-06 北京慧闻科技发展有限公司 Data processing method and application for answer selection system
CN110597966A (en) * 2018-05-23 2019-12-20 北京国双科技有限公司 Automatic question answering method and device
CN108897771B (en) * 2018-05-30 2021-03-12 东软集团股份有限公司 Automatic question answering method and device, computer readable storage medium and electronic equipment
CN108763529A (en) * 2018-05-31 2018-11-06 苏州大学 A kind of intelligent search method, device and computer readable storage medium
CN109190115B (en) * 2018-08-14 2023-05-26 重庆邂智科技有限公司 Text matching method, device, server and storage medium
CN109582966A (en) * 2018-12-03 2019-04-05 北京容联易通信息技术有限公司 A kind of information matching method and device
CN109597994B (en) * 2018-12-04 2023-06-06 挖财网络技术有限公司 Short text problem semantic matching method and system
CN109800416A (en) * 2018-12-14 2019-05-24 天津大学 A kind of power equipment title recognition methods
CN109684442B (en) * 2018-12-21 2021-03-23 东软集团股份有限公司 Text retrieval method, device, equipment and program product
WO2020133360A1 (en) * 2018-12-29 2020-07-02 深圳市优必选科技有限公司 Question text matching method and apparatus, computer device and storage medium
CN109783516A (en) * 2019-02-19 2019-05-21 北京奇艺世纪科技有限公司 A kind of query statement retrieval answering method and device
CN111611356B (en) * 2019-02-25 2023-06-16 北京嘀嘀无限科技发展有限公司 Information searching method, device, electronic equipment and readable storage medium
CN111858863B (en) * 2019-04-29 2023-07-14 深圳市优必选科技有限公司 Reply recommendation method, reply recommendation device and electronic equipment
CN110737751B (en) * 2019-09-06 2023-10-20 平安科技(深圳)有限公司 Search method and device based on similarity value, computer equipment and storage medium
CN111782776A (en) * 2019-09-26 2020-10-16 北京沃东天骏信息技术有限公司 Method and device for realizing intention identification through slot filling
CN110727764A (en) * 2019-10-10 2020-01-24 珠海格力电器股份有限公司 Phone operation generation method and device and phone operation generation equipment
CN111241378A (en) * 2020-01-07 2020-06-05 郇延强 Teaching information query method and device
CN113807148B (en) * 2020-06-16 2024-07-02 阿里巴巴集团控股有限公司 Text recognition matching method and device and terminal equipment
CN111858891A (en) * 2020-07-23 2020-10-30 平安科技(深圳)有限公司 Question-answer library construction method and device, electronic equipment and storage medium
CN111984763B (en) * 2020-08-28 2023-09-19 海信电子科技(武汉)有限公司 Question answering processing method and intelligent device
CN112905760B (en) * 2021-02-02 2023-01-13 天津弈博益商信息科技有限公司 Instant messaging intelligent question-answering, quality testing and anti-cheating system
CN114936272A (en) * 2021-04-27 2022-08-23 华为技术有限公司 Question answering method and system
CN116244418B (en) * 2023-05-11 2023-09-01 腾讯科技(深圳)有限公司 Question answering method, device, electronic equipment and computer readable storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101630312A (en) * 2009-08-19 2010-01-20 腾讯科技(深圳)有限公司 Clustering method for question sentences in question-and-answer platform and system thereof
CN102193929B (en) * 2010-03-08 2013-03-13 阿里巴巴集团控股有限公司 Method and equipment for searching by using word information entropy
US9299024B2 (en) * 2012-12-11 2016-03-29 International Business Machines Corporation Method of answering questions and scoring answers using structured knowledge mined from a corpus of data
US20140229580A1 (en) * 2013-02-12 2014-08-14 Sony Corporation Information processing device, information processing method, and information processing system
CN105989040B (en) * 2015-02-03 2021-02-09 创新先进技术有限公司 Intelligent question and answer method, device and system
CN105955976B (en) * 2016-04-15 2019-05-14 中国工商银行股份有限公司 A kind of automatic answering system and method
CN105975460A (en) * 2016-05-30 2016-09-28 上海智臻智能网络科技股份有限公司 Question information processing method and device

Also Published As

Publication number Publication date
CN106649868A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN106649868B (en) Question and answer matching process and device
CN107993724B (en) Medical intelligent question and answer data processing method and device
WO2021093755A1 (en) Matching method and apparatus for questions, and reply method and apparatus for questions
CN103544255B (en) Text semantic relativity based network public opinion information analysis method
CN103365924B (en) A kind of method of internet information search, device and terminal
CN104281702B (en) Data retrieval method and device based on electric power critical word participle
US20150074112A1 (en) Multimedia Question Answering System and Method
CN113112164A (en) Transformer fault diagnosis method and device based on knowledge graph and electronic equipment
CN103324700B (en) Noumenon concept attribute learning method based on Web information
CN108062304A (en) A kind of sentiment analysis method of the comment on commodity data based on machine learning
CN105159938B (en) Search method and device
CN104392006B (en) A kind of event query processing method and processing device
CA2612513A1 (en) Speech recognition training method for audio and video files indexing on a search engine
CN103077713B (en) A kind of method of speech processing and device
CN113268569B (en) Semantic-based related word searching method and device, electronic equipment and storage medium
CN105677795B (en) Recommended method, recommendation apparatus and the recommender system of abstract semantics
WO2014054052A2 (en) Context based co-operative learning system and method for representing thematic relationships
CN112256861B (en) Rumor detection method based on search engine return result and electronic device
CN108664599A (en) Intelligent answer method, apparatus, intelligent answer server and storage medium
CN111782800B (en) Intelligent conference analysis method for event tracing
CN115563313A (en) Knowledge graph-based document book semantic retrieval system
CN109147793A (en) The processing method of voice data, apparatus and system
CN112836029A (en) Graph-based document retrieval method, system and related components thereof
CN110413997B (en) New word discovery method, system and readable storage medium for power industry
CN115905487A (en) Document question and answer method, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant