CN112966081B

CN112966081B - Method, device, equipment and storage medium for processing question and answer information

Info

Publication number: CN112966081B
Application number: CN202110249012.8A
Authority: CN
Inventors: 庞海龙; 詹俊峰; 张文君; 薛璐影; 施鹏
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2024-03-08
Anticipated expiration: 2041-03-05
Also published as: CN112966081A

Abstract

The disclosure provides a method, a device, electronic equipment, a storage medium and a program product for processing question and answer information, and relates to the technical field of computers, in particular to the technical field of question answering. The specific implementation scheme is as follows: obtaining at least one high-quality question-answer pair, wherein each high-quality question-answer pair comprises a target question and a high-quality answer matched with the target question; and for the target question and the high-quality answer of each high-quality question-answer pair, performing the following operations: selecting from a question library at least one similar question that satisfies a predetermined similarity condition with the target question, and forming a question-answer pair to be verified from the high-quality answer and each similar question; determining, by using a question-answer relevance model, whether each question-answer pair to be verified passes verification; and associating the similar question in each question-answer pair determined to pass verification with the high-quality answer, so as to provide the high-quality answer in response to a request for the similar question.

Description

Method, device, equipment and storage medium for processing question and answer information

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to the field of question answering technologies.

Background

A knowledge question-answering platform is an interactive, open platform that connects the public's demand for knowledge with its supply. The platform mainly works as follows: users raise questions according to their own needs, and other users give answers. As users contribute, the platform accumulates more and more high-quality content, and a single high-quality answer can satisfy several similar questions. Reusing answers in this way maximizes the value of the platform's high-quality answers, improves question-answer coverage, and avoids wasting limited answering capacity on repeated answers. In the prior art, existing high-quality answers on the platform are usually matched to the questions awaiting answers either manually or through a simple relevance algorithm, but neither approach can meet the requirements of large attachment scale (that is, attaching answers to a large number of questions) and high accuracy at the same time.

Disclosure of Invention

The present disclosure provides a method, apparatus, device, storage medium, and program product for processing question-answer information.

According to an aspect of the present disclosure, there is provided a method of processing question-answer information, including: obtaining at least one high-quality question-answer pair, wherein each high-quality question-answer pair comprises a target question and a high-quality answer matched with the target question; for the target question and the high-quality answer of each high-quality question-answer pair, selecting from a question library at least one similar question that satisfies a predetermined similarity condition with the target question, and forming a question-answer pair to be verified from the high-quality answer and each similar question; determining, by using a question-answer relevance model, whether each question-answer pair to be verified passes verification; and associating the similar question in each question-answer pair determined to pass verification with the high-quality answer, so as to provide the high-quality answer in response to a request for the similar question.

According to another aspect of the present disclosure, there is provided an apparatus for processing question-answer information, including: a high-quality question-answer module for obtaining at least one high-quality question-answer pair, each high-quality question-answer pair comprising a target question and a high-quality answer matched with the target question; a question-answer pair module for selecting from a question library, for the target question and the high-quality answer of each high-quality question-answer pair, at least one similar question that satisfies a predetermined similarity condition with the target question, and forming a question-answer pair to be verified from the high-quality answer and each similar question; a verification module for determining, by using a question-answer relevance model, whether each question-answer pair to be verified passes verification; and an association module for associating the similar question in each question-answer pair determined to pass verification with the high-quality answer, so as to provide the high-quality answer in response to a request for the similar question.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method as described above.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 schematically illustrates an exemplary system architecture of a method of processing question-answer information according to an embodiment of the disclosure;

FIG. 2 schematically illustrates a flow chart of a method of processing question-answer information according to an embodiment of the disclosure;

FIG. 3 schematically illustrates a schematic diagram of constructing a question-answer pair to be verified in accordance with an embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow chart of obtaining high-quality question-answer pairs in accordance with an embodiment of the disclosure;

FIG. 5 schematically illustrates a flow chart of a method of processing question-answer information according to another embodiment of the disclosure;

FIG. 6 schematically illustrates a block diagram of an apparatus for processing question-answer information according to an embodiment of the disclosure;

FIG. 7 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Where expressions such as "at least one of A, B and C, etc." are used, they should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).

An embodiment of the disclosure provides a method of processing question-answer information, which includes the following steps: obtaining at least one high-quality question-answer pair, wherein each high-quality question-answer pair comprises a target question and a high-quality answer matched with the target question; for the target question and the high-quality answer of each high-quality question-answer pair, selecting from a question library at least one similar question that satisfies a predetermined similarity condition with the target question, and forming a question-answer pair to be verified from the high-quality answer and each similar question; determining, by using a question-answer relevance model, whether each question-answer pair to be verified passes verification; and associating the similar question in each question-answer pair determined to pass verification with the high-quality answer, so as to provide the high-quality answer in response to a request for the similar question.

FIG. 1 schematically illustrates an exemplary system architecture 100 of a method of processing question-answer information according to an embodiment of the disclosure. It should be noted that FIG. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios.

As shown in FIG. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a question and answer type application, a web browser application, a search type application, an instant messaging tool, and/or social platform software, etc., may be installed on the terminal devices 101, 102, 103, as just examples.

The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.

For example, the server 105 may provide a knowledge question-answering service. A user browses a question-answering page provided by the server 105 through a terminal device, and may submit questions or answer questions posed by other users through that page. The server 105 may store the questions and answers submitted by users.

It should be noted that the method for processing question-answer information provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the apparatus for processing question-answer information provided by the embodiments of the present disclosure may be generally provided in the server 105. The method of processing question and answer information provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the apparatus for processing question and answer information provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

It should be understood that the number of terminal devices, networks and servers in FIG. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

A method of processing question-answer information according to an exemplary embodiment of the present disclosure is described below with reference to FIGS. 2 to 5 in conjunction with the application scenario of FIG. 1.

Fig. 2 schematically illustrates a flowchart of a method of processing question-answer information according to an embodiment of the disclosure.

As shown in FIG. 2, the method 200 of processing question-answer information of the embodiment of the present disclosure may include, for example, operations S210 to S240.

In operation S210, at least one high-quality question-answer pair is obtained, each high-quality question-answer pair comprising a target question and a high-quality answer matched with the target question.

In operation S220, for the target question and the high-quality answer of each high-quality question-answer pair, at least one similar question satisfying a predetermined similarity condition with the target question is selected from the question library, and the high-quality answer and each similar question are formed into a question-answer pair to be verified.

In operation S230, it is determined, by using the question-answer relevance model, whether each question-answer pair to be verified passes verification.

In operation S240, the similar question in each question-answer pair determined to pass verification is associated with the high-quality answer, so as to provide the high-quality answer in response to a request for the similar question.

For example, each question-answer pair may comprise a question and an answer, and a high-quality question-answer pair may refer to a question-answer pair whose question and/or answer meet a predetermined high-quality condition.

FIG. 3 schematically illustrates a schematic diagram of constructing a question-answer pair to be verified in accordance with an embodiment of the present disclosure.

As shown in FIGS. 2 and 3, a high-quality question-answer pair 310 includes a target question A and a high-quality answer B matched with the target question A. A plurality of questions C₁ to Cₙ similar to the target question A are selected from the question library, and each similar question is combined with the high-quality answer B to form a question-answer pair 320 to be verified. Each question-answer pair 320 is verified by using the question-answer relevance model to check whether the relevance between the similar question and the high-quality answer in the pair meets the requirement; if the verification passes, the high-quality answer is associated with the similar question. For example, if the question-answer pair 320 to be verified that consists of the similar question C₁ and the high-quality answer B passes verification, the high-quality answer B is associated with the similar question C₁, so that the high-quality answer B serves as an answer to the similar question C₁. When a user requests to open the page corresponding to the similar question C₁, the high-quality answer B can be displayed in the answer area of that page as the answer to the similar question C₁.
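The flow of operations S210 to S240 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the disclosure: the function name `associate_answers`, the callables `find_similar` and `relevance`, and the threshold value are hypothetical stand-ins for the question-library search and the question-answer relevance model.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical alias: (target question, high-quality answer).
QAPair = Tuple[str, str]


def associate_answers(
    quality_pairs: Iterable[QAPair],
    find_similar: Callable[[str], List[str]],
    relevance: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed value; the disclosure fixes no number
) -> Dict[str, str]:
    """Map each verified similar question to a high-quality answer."""
    associations: Dict[str, str] = {}
    for target_question, answer in quality_pairs:           # S210
        for similar in find_similar(target_question):       # S220
            # S230: the pair (similar question, answer) is verified
            # by a relevance score compared against a threshold.
            if relevance(similar, answer) >= threshold:
                # S240: keep the first verified answer per question.
                associations.setdefault(similar, answer)
    return associations
```

When a request for a similar question arrives, the server could look the question up in the returned mapping and display the associated answer; keeping only the first verified answer per question is a design choice of this sketch.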

According to the embodiments of the present disclosure, on the one hand, the high cost, low efficiency and limited applicability of matching answers manually can be avoided, so that answers can be attached at large scale, across a wide range of questions, and with high efficiency. On the other hand, model-based verification ensures that each newly combined similar question and high-quality answer are relevant to each other, improves the relevance and accuracy of the newly combined question-answer pairs, and avoids answers that are irrelevant to the question. The method of processing question-answer information can therefore meet the requirements of large attachment scale and high accuracy at the same time. In addition, the method can answer a plurality of similar questions at once from a single high-quality question-answer pair, which improves the coverage of the high-quality answer and the processing efficiency, and raises the proportion of high-quality content, thereby improving the user experience.

FIG. 4 schematically illustrates a flow chart of obtaining high-quality question-answer pairs in accordance with an embodiment of the disclosure.

As shown in FIG. 4, operation S210 includes operation S411 and operation S412 according to an embodiment of the present disclosure.

In operation S411, a plurality of candidate question-answer pairs, each including a question and an answer, are obtained.

In operation S412, candidate question-answer pairs satisfying a predetermined high-quality condition are determined, from the plurality of candidate question-answer pairs, as the high-quality question-answer pairs by using a high-quality question-answer fusion model.

For example, the plurality of candidate question-answer pairs may be a set of question-answer pairs newly generated on the platform over a period of time. A candidate question-answer pair <D, E> comprising a question D and an answer E is described below as an example.

In embodiments of the present disclosure, the high-quality question-answer fusion model may include a question-answer relevance model. In the process of verifying the candidate question-answer pair <D, E> by using the high-quality question-answer fusion model, the question D and the answer E of the candidate question-answer pair can be input together into the question-answer relevance model to obtain the relevance between the question D and the answer E. The question-answer relevance model can be trained in advance on a plurality of question-answer samples and the relevance verification labels of those samples.

In embodiments of the present disclosure, the high-quality question-answer fusion model may also include an authoritative user model. In the process of verifying the candidate question-answer pair <D, E> by using the high-quality question-answer fusion model, the user characteristic information of the questioning user and of the answering user of the candidate question-answer pair can be obtained and input separately into the authoritative user model to determine whether the questioning user and the answering user are authoritative users. The user characteristic information includes, for example, the number of answers and/or questions submitted by the user, the number of times the questions or answers submitted by the user were automatically deleted and/or retained by the platform, the number of times the questions or answers submitted by the user were marked as high-quality questions and answers, and the like. The authoritative user model can be a classification model whose output is a binary classification result: authoritative user or not an authoritative user. The question-answer relevance model can be a machine learning model, and the authoritative user model can be trained in advance on the user characteristics of a plurality of users and the verification labels of the corresponding users.

In embodiments of the present disclosure, the high-quality question-answer fusion model may also include a typesetting model. In the process of verifying the candidate question-answer pair <D, E> by using the high-quality question-answer fusion model, the question D and the answer E of the candidate question-answer pair can be input separately into the typesetting model to obtain the typesetting quality of the question D and of the answer E. The typesetting model can be, for example, a classification model, and the typesetting quality can be expressed as a binary classification result: high-quality typesetting or low-quality typesetting. In an embodiment of the present disclosure, before the question D and the answer E are input into the typesetting model, their typesetting features may be extracted separately. The typesetting features include, for example: whether line breaks are present, whether a picture is present, the position and proportion of the picture, the number of punctuation marks such as commas, and the like. The typesetting features can be counted by traversal, for example, traversing from the first character of the question document, incrementing the comma count by 1 each time a comma is found, incrementing the line count by 1 each time a line-break character is found, and so on. The picture position and proportion may be determined, for example, as follows: while traversing the document, the starting position and ending position of the picture in the document are determined, where the starting position is the insertion position of the picture; if the picture is inserted after the 10th character, the starting position is 11. If the picture occupies the space of 5 characters, the position of the picture is 11 to 15, and if the document occupies 20 characters in total, the proportion of the picture is 5/20. The typesetting model can be trained in advance on the typesetting features of a plurality of question-answer samples and the verification labels of those samples.
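The traversal-based feature counting described above can be sketched as follows. This is a minimal illustration under assumptions that are not part of the disclosure: a picture is represented by a hypothetical placeholder character (`PICTURE_MARK`), repeated once for each character slot the picture occupies, and only the first contiguous run of placeholders is reported as the picture span.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical placeholder: one mark per character slot a picture occupies.
PICTURE_MARK = "\ufffc"


@dataclass
class LayoutFeatures:
    comma_count: int
    line_break_count: int
    has_picture: bool
    picture_span: Optional[Tuple[int, int]]  # 1-based (start, end)
    picture_ratio: float


def extract_layout_features(document: str) -> LayoutFeatures:
    """Traverse the document once and count simple typesetting features."""
    commas = line_breaks = picture_slots = 0
    start: Optional[int] = None
    end: Optional[int] = None
    for position, char in enumerate(document, start=1):
        if char in (",", "\uff0c"):  # ASCII and full-width comma
            commas += 1
        elif char == "\n":
            line_breaks += 1
        elif char == PICTURE_MARK:
            picture_slots += 1
            if start is None:
                start = end = position       # first slot of the picture
            elif position == end + 1:
                end = position               # extend the contiguous run
    span = (start, end) if start is not None else None
    ratio = picture_slots / len(document) if document else 0.0
    return LayoutFeatures(commas, line_breaks, span is not None, span, ratio)
```

For a 20-character document whose picture is inserted after the 10th character and occupies 5 slots, this sketch reports the span (11, 15) and the ratio 5/20, matching the example in the text.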

In embodiments of the present disclosure, the high-quality question-answer fusion model may also include a domain model and a duplicate question model. In the process of verifying the candidate question-answer pair <D, E> by using the high-quality question-answer fusion model, the question D of the candidate question-answer pair can be input into the domain model to obtain the domain to which the question D belongs. The domains may include, for example, economics, science, games, entertainment, and the like. After the domain of the question D is determined, a plurality of historical questions belonging to the same domain as the question D can be selected from a preset high-quality question-answer library, the question D and each historical question form a question pair, and each question pair is input in turn into the duplicate question model, which judges whether the two questions of the pair are duplicates of each other. In this way it can be determined whether the question D duplicates any historical question in the preset high-quality question-answer library.

In the embodiments of the present disclosure, the high-quality question-answer library may contain a large number of high-quality question-answer pairs, and the high-quality question-answer pairs identified by the embodiments of the present disclosure may also be added to the library. If the library already contains a question that duplicates the question D, the candidate question-answer pair containing the question D need not be processed; this avoids duplicate high-quality question-answer pairs, avoids processing the same question repeatedly, and improves processing efficiency. The domain model and the duplicate question model can be deep learning models trained in advance on sample data and verification labels. Determining the domain first, selecting only the questions of that domain from the preset high-quality question-answer library, and then checking the selected questions for duplicates of the question D improves judging efficiency and avoids unnecessary computation.
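The domain-first duplicate check can be sketched as follows. This is an illustrative sketch only: `classify_domain` and `is_duplicate` are hypothetical callables standing in for the trained domain model and duplicate question model, and the library is assumed to be indexed by domain.

```python
from typing import Callable, Dict, List


def is_new_question(
    question: str,
    classify_domain: Callable[[str], str],
    is_duplicate: Callable[[str, str], bool],
    library_by_domain: Dict[str, List[str]],
) -> bool:
    """Return True if no historical question in the same domain duplicates it."""
    domain = classify_domain(question)
    # Only questions of the same domain are compared, which keeps the
    # number of pairwise model calls small.
    history = library_by_domain.get(domain, [])
    return not any(is_duplicate(question, past) for past in history)
```

Restricting the comparison to one domain reflects the efficiency argument in the text: the pairwise duplicate model is called only for historical questions that could plausibly match.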

In embodiments of the present disclosure, the high-quality question-answer fusion model may also include a question model. In the process of verifying the candidate question-answer pair <D, E> by using the high-quality question-answer fusion model, the question of the candidate question-answer pair can be input into the question model to determine whether it conforms to the question (interrogative) form. The question model can be a classification model whose output is a binary classification result: conforms to the question form or does not conform to the question form. The question model can be a deep learning model trained in advance on sample data and verification labels.

In an embodiment of the present disclosure, the predetermined high-quality condition includes at least one of the following conditions (1) to (5):

(1) The relevance between the question and the answer is greater than or equal to a preset relevance threshold.

(2) The questioning user and/or the answering user of the candidate question-answer pair is an authoritative user.

(3) The typesetting of the question and/or the answer is high-quality typesetting.

(4) The question does not duplicate the question of any historical high-quality question-answer pair.

(5) The question conforms to the question form.

Regarding the models contained in the high-quality question-answer fusion model and their corresponding judging conditions, all or only some of the models may be used according to actual needs, and the order in which the models are used may likewise be determined according to actual needs.

For example, the question model may be used first. If it is determined that the question D does not conform to the question form, the question D is considered not to be a genuine question, and the candidate question-answer pair containing the question D may be discarded. If it is determined that the question D is a genuine question, the question-answer relevance model, the authoritative user model, the typesetting model and the duplicate question model can each be applied to the candidate question-answer pair to obtain the respective model results, and a comprehensive judgment is made by combining the results of the four models. For example, if the candidate question-answer pair simultaneously satisfies the four conditions that the relevance exceeds the threshold, the questioning user and/or the answering user is an authoritative user, the question and/or the answer has high-quality typesetting, and the question is not a duplicate question, the candidate question-answer pair may be determined to be a high-quality question-answer pair.

For another example, the question model may be used first. If it is determined that the question D is a genuine question, the domain model and the duplicate question model may then be used; if it is determined that the question D is not a duplicate question, the question-answer relevance model, the authoritative user model and the typesetting model may then be used to obtain model results, and a comprehensive judgment is made by combining the results of the three models. For example, if the candidate question-answer pair satisfies the three conditions that the relevance exceeds the threshold, the question is not a duplicate question, and the typesetting is of high quality, the candidate question-answer pair may be determined to be a high-quality question-answer pair even if no authoritative user is involved.
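The two example judgments above can be sketched as one decision function. This is a hedged illustration, not the fusion model of the disclosure: the `ModelResults` fields, the threshold value and the `require_authority` flag are hypothetical, and "and/or" is read here as "at least one of".

```python
from dataclasses import dataclass


@dataclass
class ModelResults:
    """Hypothetical container for the outputs of the individual models."""
    is_question_form: bool
    relevance: float
    asker_authoritative: bool
    answerer_authoritative: bool
    question_layout_good: bool
    answer_layout_good: bool
    is_duplicate: bool


def is_high_quality(
    r: ModelResults,
    relevance_threshold: float = 0.8,  # assumed value
    require_authority: bool = True,
) -> bool:
    # Question model first: a non-question is discarded immediately.
    if not r.is_question_form:
        return False
    # Duplicate questions are not processed again.
    if r.is_duplicate:
        return False
    if r.relevance < relevance_threshold:
        return False
    if not (r.question_layout_good or r.answer_layout_good):
        return False
    # The second example in the text drops the authority condition.
    if require_authority and not (
        r.asker_authoritative or r.answerer_authoritative
    ):
        return False
    return True
```

Calling the function with `require_authority=True` corresponds to the four-condition example, and with `require_authority=False` to the three-condition example.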

According to the embodiments of the present disclosure, using the relevance between the question and the answer as a reference condition ensures that high-quality question-answer pairs are highly relevant. Because answers given by authoritative users are generally of higher quality, and questions submitted by authoritative users are more likely to receive high-quality answers, using whether the questioning user and/or the answering user is an authoritative user as a reference condition helps ensure that high-quality question-answer pairs are of higher quality. Using the typesetting quality of the question and the answer as a reference condition ensures that high-quality question-answer pairs are well typeset and prevents question-answer pairs with disordered typesetting from being judged as high-quality. Using whether the question duplicates a question already in the high-quality question-answer library as a reference condition avoids processing the same question repeatedly and improves processing efficiency. Using whether the question conforms to the question form as a reference condition prevents question-answer pairs whose question is not a genuine question from being identified as high-quality question-answer pairs.

According to an embodiment of the present disclosure, the method of processing question-answer information may further include: determining whether the questions and answers of each candidate question-answer pair meet a pre-condition, the pre-condition comprising at least one of a word count condition and a sensitive word condition; discarding candidate question-answer pairs which do not meet the preconditions; and determining whether each candidate question-answer pair meeting the precondition meets the predetermined high-quality condition by utilizing the high-quality question-answer fusion model.

For example, the preconditions may include: whether the answer is blank, whether the word count of the question and/or the answer meets a specified word-count requirement, whether the question and/or the answer contains sensitive words, and the like. If a candidate question-answer pair does not meet the preconditions, it is discarded and is not input into the models above; if it meets the preconditions, the models above are used to further judge whether the candidate question-answer pair is a high-quality question-answer pair. In this way, some candidate question-answer pairs can be eliminated in advance, which reduces the amount of model computation and improves the efficiency of identifying high-quality question-answer pairs.
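A precondition filter of this kind can be sketched as follows. This is a minimal sketch under assumptions: the length limits are counted in characters, and the limits and the sensitive-word list are hypothetical parameters that the disclosure does not specify.

```python
from typing import Iterable


def passes_preconditions(
    question: str,
    answer: str,
    sensitive_words: Iterable[str],
    min_chars: int = 5,      # assumed lower bound
    max_chars: int = 2000,   # assumed upper bound
) -> bool:
    """Cheap checks applied before any model is run."""
    if not answer.strip():   # blank answer
        return False
    words = list(sensitive_words)
    for text in (question, answer):
        length = len(text.strip())
        if not (min_chars <= length <= max_chars):
            return False
        if any(word in text for word in words):
            return False
    return True
```

Only pairs for which this function returns True would be passed on to the fusion model, so the comparatively expensive model calls are skipped for obviously unsuitable pairs.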

According to an embodiment of the present disclosure, the method of processing question-answer information may further include: for candidate question-answer pairs to which the manual label is added, the relevance between the questions and the answers of the candidate question-answer pairs is changed based on the manual label.

For example, if a candidate question-answer pair carries a manual label, for example a manual label marking it as a high-quality question-answer pair, and the question-answer relevance model determines that the relevance of the candidate question-answer pair is low, the relevance of the candidate question-answer pair may be corrected, according to the manual label, to be greater than the preset relevance threshold. Based on this scheme, the model result can be corrected, so that the judgment result is more accurate.
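A minimal sketch of this correction follows. The threshold value, the margin, and the label string `"high_quality"` are hypothetical; the disclosure only states that the corrected relevance exceeds the preset threshold.

```python
from typing import Optional

RELEVANCE_THRESHOLD = 0.8  # hypothetical preset threshold
MARGIN = 0.05              # hypothetical amount by which to exceed the threshold


def corrected_relevance(model_score: float,
                        manual_label: Optional[str] = None,
                        threshold: float = RELEVANCE_THRESHOLD) -> float:
    """Raise a low model score above the threshold when a manual label says so."""
    if manual_label == "high_quality" and model_score <= threshold:
        return min(1.0, threshold + MARGIN)
    return model_score  # unlabeled pairs and already-high scores are unchanged
```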

According to an embodiment of the present disclosure, selecting at least one similar question from a question library that satisfies a predetermined similarity condition with a target question includes: determining, by using the similarity determination fusion model, candidate questions meeting the predetermined similarity condition from a plurality of candidate questions in the question library as similar questions of the target question.

For example, the question library may refer to a collection of questions newly generated on a platform over a period of time. All the questions in the question library may first be preliminarily screened, and a plurality of questions similar to the target question are screened out as candidate questions. For example, for a high-quality question-answer pair <q1, r1>, an Elasticsearch (ES) distributed search engine may be employed to screen out, from the question library, a candidate question set {q2, q4, …} that meets a similarity threshold. After the candidate question set is obtained, the similarity determination fusion model can then be used to determine the similar questions from the candidate question set.
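The preliminary screening can be sketched as follows. This is not an Elasticsearch query: a simple token-overlap (Jaccard) score is swapped in as a stand-in for the search engine's own scoring, and the function names and the 0.5 threshold are assumptions for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap score in [0, 1]; a stand-in for the search engine's relevance score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def screen_candidates(target: str, library: list, threshold: float = 0.5) -> list:
    """Return library questions whose score against the target meets the threshold."""
    return [q for q in library if q != target and jaccard(target, q) >= threshold]
```

The cheap screening step keeps the candidate set small, so that the more expensive fusion model only has to examine a handful of questions per target question.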

The following description takes as an example the judgment of whether the candidate question q2 is a similar question of the target question q1.

In embodiments of the present disclosure, the similarity determination fusion model may include a similarity model. In the process of judging whether the candidate question q2 is a similar question of the target question q1 by using the similarity determination fusion model, the target question q1 and the candidate question q2 can be input into the similarity model to obtain the similarity between the target question q1 and the candidate question q2. The similarity model can be a deep learning model, and can be obtained by training in advance with question samples and question similarity verification labels.

In embodiments of the present disclosure, the similarity determination fusion model may also include a new word discovery model. In the process of judging whether the candidate question q2 is a similar question of the target question q1 by using the similarity determination fusion model, the target question q1 and the candidate question q2 may be respectively input into the new word discovery model to determine whether each of them contains a new word; if a question contains a new word, the contained new word may further be output. The new word discovery model may employ a hidden Markov model (HMM).

In embodiments of the present disclosure, the similarity determination fusion model may also include a question category model. In the process of judging whether the candidate question q2 is a similar question of the target question q1 by using the similarity determination fusion model, the target question q1 and the candidate question q2 can be respectively input into the question category model to obtain the respective question categories of the target question q1 and the candidate question q2. The question categories may include three categories: a class of questions about objective facts, a "how to" class, and a "what" class. The objective-fact class may include, for example, question forms such as "what is xxx" and "who is xxx". The "how to" class may include, for example, the question form "how to do xxx". The "what" class may include, for example, question forms such as "which xxx". The question category model can be obtained by training a deep learning model in advance with question samples and the category verification labels of the question samples.

In an embodiment of the present disclosure, the predetermined similar condition may include at least one of the following conditions (1) to (3):

(1) The similarity between the candidate question and the target question is greater than or equal to a predetermined similarity threshold.

(2) The new words contained in the candidate question are consistent with the new words contained in the target question, or neither the candidate question nor the target question contains new words.

(3) The question category of the candidate question is consistent with the question category of the target question.

All or part of the models contained in the similarity judgment fusion model and corresponding judgment conditions can be used according to actual needs, and the use sequence of each model can be determined according to the actual needs.

For example, the similarity model, the new word discovery model, and the question category model may be used simultaneously, and the determination may be made by combining the results of the three models. For example, if the target question q1 and the candidate question q2 satisfy the three conditions that their similarity is greater than the similarity threshold, no new word is found in either question, and the two questions belong to the same question category, the candidate question q2 may be determined to be a similar question of the target question q1. Alternatively, if the similarity between the target question q1 and the candidate question q2 is greater than the similarity threshold and the new words identified in the two questions are identical, the two questions may be determined to be similar questions even if they do not belong to the same category.
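The two example rules can be combined into one decision function, sketched below. The `QuestionSignals` type, the threshold value, and the handling of the case where neither question has new words but the categories differ (treated here as not similar) are assumptions; the three model outputs are taken as precomputed inputs rather than produced by real models.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class QuestionSignals:
    new_words: FrozenSet[str]  # output of the new word discovery model
    category: str              # output of the question category model


def is_similar(similarity: float,
               target: QuestionSignals,
               candidate: QuestionSignals,
               threshold: float = 0.8) -> bool:
    """Fuse the three model outputs into one similar / not-similar decision."""
    if similarity <= threshold:
        return False
    # Rule 1: no new words in either question, so the categories must match.
    if not target.new_words and not candidate.new_words:
        return target.category == candidate.category
    # Rule 2: identical new words are enough, even across categories.
    # A new word in only one of the two questions makes the sets differ.
    return target.new_words == candidate.new_words
```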

For another example, since it is generally not the case that one of two similar questions contains a new word while the other does not, the new word discovery model may first be used to determine whether each of the two questions contains a new word. If one of the questions is found to contain a new word and the other does not, the two questions may be determined to be dissimilar; if neither question contains a new word, the similarity model and the question category model can then be used for further judgment.

According to the embodiment of the disclosure, using the similarity between the two questions as a reference condition for whether the two questions are similar ensures that the text similarity between the two questions is high. Since it is generally not the case that one of two similar questions contains a new word while the other does not, using whether each of the two questions contains new words, and whether the contained new words are consistent, as a reference condition for similarity can improve recognition accuracy. Using whether the two questions belong to the same question category as a reference condition for similarity can likewise improve recognition accuracy.

According to an embodiment of the present disclosure, the method of processing question-answer information may further include: replacing non-universal words in the candidate questions and the target questions with universal words; and removing the entity words repeatedly appearing in the candidate questions and removing the entity words repeatedly appearing in the target questions.

For example, the target problem q1 and the candidate problem q2 may be processed before the target problem q1 and the candidate problem q2 are judged using the similarity judgment fusion model.

In the embodiment of the disclosure, a general replacement word library may be preset, where correspondence between some non-general words and general words is recorded in the word library. In the case that there is a non-generic word in the target question q1 and/or the candidate question q2, the corresponding generic word may be found using the lexicon, and the non-generic word in the target question q1 and/or the candidate question q2 may be replaced with the generic word.

In the embodiment of the present disclosure, a repeated entity word library may be preset, in which some repeated entity words are recorded; for example, "photo" and "graph" may be considered repeated entity words. When consecutive repeated entity words occur in the target question q1 or the candidate question q2, for example where the target question q1 contains the sentence "what is XY" and X and Y are repeated entity words, one of the repeated entity words may be deleted; for example, Y may be deleted, so that "what is XY" is reduced to "what is X".

The above operations of replacing non-universal words and removing repeated entity words can reduce computation and improve recognition accuracy.
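The two preprocessing operations can be sketched as below. Both lexicons are hypothetical examples (only the "photo"/"graph" pair comes from the description), and whitespace tokenization is assumed for simplicity.

```python
# Hypothetical lexicons; only the "photo"/"graph" pair appears in the description.
GENERIC_REPLACEMENTS = {"pic": "photo", "cellphone": "phone"}
DUPLICATE_ENTITY_GROUPS = [{"photo", "graph"}]


def normalize_question(question: str) -> str:
    """Replace non-universal words, then drop consecutive repeated entity words."""
    tokens = [GENERIC_REPLACEMENTS.get(t, t) for t in question.lower().split()]
    result = []
    for tok in tokens:
        prev = result[-1] if result else None
        # A token is a repeat if it and the previously kept token share a group.
        is_repeat = prev is not None and any(
            prev in group and tok in group for group in DUPLICATE_ENTITY_GROUPS
        )
        if not is_repeat:
            result.append(tok)
    return " ".join(result)
```

Both the target question and the candidate question would pass through the same normalization before being given to the similarity determination fusion model.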

In the embodiment of the disclosure, after the similar questions of the target question q1 are obtained, the high-quality question-answer fusion model is first used to judge whether each similar question already has a high-quality answer of its own. If so, the similar question and its high-quality answer form a high-quality question-answer pair, which is put into the high-quality question-answer library; otherwise, the similar question and the high-quality answer of the target question q1 may be formed into a question-answer pair to be verified, and the following operation is performed.

According to an embodiment of the present disclosure, determining whether each question-answer pair to be verified passes verification using a question-answer relevance model includes: inputting the similar question and the high-quality answer in the question-answer pair to be verified into the question-answer relevance model to obtain the relevance between the similar question and the high-quality answer; and if the relevance is greater than or equal to a preset relevance threshold, determining that the question-answer pair to be verified passes verification.

For example, after finding a similar question of the target question and forming a question-answer pair to be verified, a relevance model may be used to determine whether the similar question and the good answer are matched, specifically, a relevance model may be used to calculate a relevance between the similar question and the good answer, and if the relevance between the similar question and the good answer is greater than a predetermined threshold, the similar question and the good answer may be considered to be matched, and the question-answer pair to be verified passes verification.
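Pair construction and verification can be sketched together as follows. The relevance model is passed in as a plain callable because the disclosure does not describe its interface; the function names, the dictionary of existing high-quality answers, and the 0.8 threshold are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

RELEVANCE_THRESHOLD = 0.8  # hypothetical preset threshold


def build_pairs_to_verify(similar_questions: List[str],
                          premium_answer: str,
                          own_premium_answers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair the target's high-quality answer with each similar question
    that does not already have a high-quality answer of its own."""
    return [(q, premium_answer) for q in similar_questions
            if q not in own_premium_answers]


def verify(pairs: List[Tuple[str, str]],
           relevance_model: Callable[[str, str], float],
           threshold: float = RELEVANCE_THRESHOLD) -> List[Tuple[str, str]]:
    """Keep only the pairs whose relevance is greater than or equal to the threshold."""
    return [(q, a) for q, a in pairs if relevance_model(q, a) >= threshold]
```

The pairs returned by `verify` are the ones whose similar question would then be associated with the high-quality answer.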

Fig. 5 schematically illustrates a flowchart of a method of processing question-answer information according to another embodiment of the present disclosure.

As shown in fig. 5, in operation S501, a high-quality question-answer fusion model may be used to determine whether each candidate question-answer pair is a high-quality question-answer pair, which may include a target question and a high-quality answer. In operation S502, candidate questions close to the target question of the high-quality question-answer pair may be preliminarily screened out from the question library. In operation S503, similar questions of the target question may be selected from the plurality of candidate questions using the similarity determination fusion model, and each similar question and the high-quality answer may be formed into a question-answer pair to be verified. In operation S504, the question-answer relevance model may be used to determine whether each question-answer pair to be verified passes verification. In operation S505, for the question-answer pairs that pass the verification, the original answers of the similar question and the high-quality answer may be ranked in the following manner.

According to an embodiment of the present disclosure, the method of processing question-answer information may further include: the following is performed for each validated question-answer pair containing similar questions and premium answers: in the case where there are a plurality of original answers to similar questions, obtaining ranking conditions of each of the plurality of original answers and the high-quality answer, the ranking conditions including: at least one of relevance of the questions to the answers, answer time, whether the answers are video answers, and whether the answers are answers of authoritative users; and inputting the sorting conditions of the high-quality answers and the plurality of original answers into a sorting model to obtain the display sequence of the high-quality answers and the plurality of original answers.

For example, for each question-answer pair that passes verification, it is checked whether the similar question itself already has one or more answers, that is, whether one or more users have answered the similar question. If so, the high-quality answer may be ranked together with those original answers to determine an appropriate display order.

If the question-answer pair < q3, r1> consisting of the similar question q3 of the target question q1 and the high-quality answer r1 of the target question q1 passes verification, and the similar question q3 has the original answers r2 and r3, the relevance of the answers r1 to r3 with the similar question q3, the answer time of the answers r1 to r3, whether the answers r1 to r3 are video answers, whether the answers r1 to r3 are answers of authoritative users or not and other features are obtained, and the features are input into a sorting model, so that the display sequence of the answers r1 to r3 can be obtained. The ranking model can be a machine learning model, and the ranking model can be obtained by training answer sample data and ranking verification tags in advance.

According to the embodiment of the disclosure, by ranking the plurality of original answers and the high-quality answer in consideration of conditions such as question-answer relevance, answer time, whether the answer is a video answer, and whether the answer comes from an authoritative user, the plurality of answers corresponding to the similar question can be displayed in a reasonable order, which ensures that the answers displayed first have higher reference value. For example, answers given a long time ago may have lower reference value and may be placed in later positions, whereas video answers, answers from authoritative users, and answers highly relevant to the question may be placed in earlier positions.
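The ranking step can be illustrated with a hand-weighted scoring function. This is a stand-in only: the disclosure describes a trained ranking model, whereas the weights, the recency formula, and the `AnswerFeatures` type below are assumptions chosen to reflect the stated tendencies (recent, video, authoritative, and highly relevant answers rank earlier).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnswerFeatures:
    answer_id: str
    relevance: float        # relevance to the similar question, in [0, 1]
    age_days: float         # time elapsed since the answer was given
    is_video: bool
    is_authoritative: bool


def score(f: AnswerFeatures) -> float:
    """Hypothetical linear score standing in for the trained ranking model."""
    recency = 1.0 / (1.0 + f.age_days / 365.0)  # older answers score lower
    return (0.5 * f.relevance + 0.2 * recency
            + 0.15 * f.is_video + 0.15 * f.is_authoritative)


def display_order(answers: List[AnswerFeatures]) -> List[str]:
    """Return answer ids from first displayed to last displayed."""
    return [a.answer_id for a in sorted(answers, key=score, reverse=True)]
```

In a real system the hand-set weights would be replaced by a model trained on answer samples and ranking labels, as the description states.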

Another aspect of the disclosed embodiments also provides an apparatus for processing question-answer information.

Fig. 6 schematically illustrates a block diagram of an apparatus for processing question-answer information according to an embodiment of the disclosure.

As shown in fig. 6, the apparatus 600 includes a quality question-answering module 610, a question-answer pair module 620, a verification module 630, and an association module 640.

The quality question-answering module 610 is configured to obtain at least one high-quality question-answer pair, each of which includes a target question and a high-quality answer matching the target question.

The question-answer pair module 620 is configured to, for the target question and the high-quality answer of each high-quality question-answer pair, select from the question library at least one similar question satisfying a predetermined similarity condition with the target question, and form the high-quality answer and each similar question into a question-answer pair to be verified.

The verification module 630 is configured to determine whether each question-answer pair to be verified is verified by using the question-answer correlation model.

The association module 640 is for associating similar questions in a question-answer pair determined to pass verification with a premium answer to provide a premium answer in response to a request for similar questions.

According to an embodiment of the present disclosure, the quality question-answering module may include an acquisition module and a high-quality judging module.

The acquisition module is used for acquiring a plurality of candidate question-answer pairs, wherein each candidate question-answer pair comprises a question and an answer.

The high-quality judging module is used for determining, by utilizing a high-quality question-answer fusion model, candidate question-answer pairs meeting a predetermined high-quality condition from the plurality of candidate question-answer pairs as high-quality question-answer pairs. The predetermined high-quality condition includes at least one of the following conditions: the relevance between the question and the answer is greater than or equal to a preset relevance threshold; the questioning user and/or the answering user of the candidate question-answer pair is an authoritative user; the typesetting of the question and/or the answer belongs to high-quality typesetting; the question does not coincide with the questions of historical high-quality question-answer pairs; and the question is in question form.

According to an embodiment of the present disclosure, the apparatus for processing question-answer information may further include a pre-module for: determining whether the questions and answers of each candidate question-answer pair meet a pre-condition, the pre-condition comprising at least one of a word count condition and a sensitive word condition; discarding candidate question-answer pairs which do not meet the preconditions; and determining whether each candidate question-answer pair meeting the precondition meets the predetermined high-quality condition by utilizing the high-quality question-answer fusion model.

According to an embodiment of the present disclosure, the apparatus may further include a modification module for: for candidate question-answer pairs to which the manual label is added, the relevance between the questions and the answers of the candidate question-answer pairs is changed based on the manual label.

According to an embodiment of the present disclosure, the question-answer pair module may include a similarity determination module for determining, from a plurality of candidate questions of the question bank, a candidate question satisfying a predetermined similarity condition as a similar question of the target question using a similarity determination fusion model.

The predetermined similarity condition includes at least one of the following conditions: the similarity between the candidate question and the target question is greater than or equal to a preset similarity threshold; the new words contained in the candidate question are consistent with the new words contained in the target question, or neither the candidate question nor the target question contains new words; and the question category of the candidate question is consistent with the question category of the target question.

According to an embodiment of the present disclosure, the apparatus for processing question-answer information may further include a question processing module for: replacing non-universal words in the candidate questions and the target questions with universal words; and removing the entity words repeatedly appearing in the candidate questions and removing the entity words repeatedly appearing in the target questions.

According to an embodiment of the present disclosure, the verification module is further configured to: inputting similar questions and high-quality answers in the question-answer pair to be verified into a question-answer relevance model to obtain relevance between the similar questions and the high-quality answers; if the correlation is greater than or equal to a preset correlation threshold, the question-answer pair to be verified passes verification.

According to an embodiment of the present disclosure, the apparatus for processing question-answer information may further include a ranking module for performing the following operations on the similar question and the high-quality answer contained in each question-answer pair that passes verification: in the case where there are a plurality of original answers to the similar question, obtaining ranking conditions of each of the plurality of original answers and the high-quality answer, the ranking conditions including at least one of: the relevance of the question to the answer, the answer time, whether the answer is a video answer, and whether the answer is an answer of an authoritative user; and inputting the ranking conditions of the high-quality answer and the plurality of original answers into a ranking model to obtain the display order of the high-quality answer and the plurality of original answers.

Note that, in the embodiment of the present disclosure, the device portion for processing the question-answer information corresponds to the method portion for processing the question-answer information in the embodiment of the present disclosure, and the description of the device portion for processing the question-answer information specifically refers to the method portion for processing the question-answer information, which is not described herein.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.

Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit 701 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, for example, the method of processing question-answer information. For example, in some embodiments, the method of processing question-answer information may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the method of processing question-answer information described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method of processing question-answer information in any other suitable way (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, which is not limited herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A method of processing question-answer information, comprising:

obtaining at least one high-quality question-answer pair, wherein each high-quality question-answer pair comprises a target question and a high-quality answer matched with the target question;

for the target question and the high-quality answer of each high-quality question-answer pair, selecting from a question library at least one similar question that satisfies a predetermined similarity condition with the target question, and forming the high-quality answer and each similar question into a question-answer pair to be verified;

determining whether each question-answer pair to be verified passes verification or not by using a question-answer correlation model; and

associating similar questions in the question-answer pair determined to pass verification with the premium answers, providing the premium answers in response to the request for the similar questions,

wherein the determining whether each question-answer pair to be verified passes verification by using the question-answer relevance model comprises:

inputting the similar question and the high-quality answer in the question-answer pair to be verified into the question-answer relevance model to obtain the relevance between the similar question and the high-quality answer; and

if the relevance is greater than or equal to a preset relevance threshold, determining that the question-answer pair to be verified passes verification.

2. The method of claim 1, wherein the obtaining at least one premium question-answer pair comprises:

obtaining a plurality of candidate question-answer pairs, each of the candidate question-answer pairs comprising a question and an answer;

determining candidate question-answer pairs satisfying a predetermined quality condition from the plurality of candidate question-answer pairs as the quality question-answer pairs using a quality question-answer fusion model,

wherein the predetermined quality condition includes at least one of the following conditions:

the relevance between the question and the answer is greater than or equal to a preset relevance threshold;

the asking user and/or the answering user of the candidate question-answer pair is an authoritative user;

the typesetting of the question and/or the answer qualifies as high-quality typesetting;

the question does not duplicate the question of any historical high-quality question-answer pair; and

the question conforms to a question form.
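
The conditions listed in claim 2 can be sketched as a set of named checks, of which any subset may be enabled. This is a rule-based stand-in rather than the fusion model itself, and the `Candidate` fields, the function names, the 0.8 threshold, and the trailing-question-mark test for "question form" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class Candidate:
    question: str
    answer: str
    relevance: float              # assumed to come from a relevance model
    by_authoritative_user: bool   # asker and/or answerer is authoritative
    well_typeset: bool            # assumed output of a typesetting check


def make_conditions(
    threshold: float, historical_questions: Set[str]
) -> Dict[str, Callable[[Candidate], bool]]:
    """Build one predicate per condition named in the claim."""
    return {
        "relevance": lambda c: c.relevance >= threshold,
        "authority": lambda c: c.by_authoritative_user,
        "typesetting": lambda c: c.well_typeset,
        "novelty": lambda c: c.question not in historical_questions,
        # Assumption: a question form is approximated by a trailing "?".
        "question_form": lambda c: c.question.rstrip().endswith("?"),
    }


def select_quality_pairs(
    candidates: Iterable[Candidate],
    conditions: Dict[str, Callable[[Candidate], bool]],
    enabled: List[str],
) -> List[Candidate]:
    """Keep candidates that satisfy every enabled condition."""
    return [c for c in candidates if all(conditions[name](c) for name in enabled)]


candidates = [
    Candidate("How do I reset my router?", "Hold the reset button.", 0.9, True, True),
    Candidate("Router reset", "Hold the reset button.", 0.9, True, True),
    Candidate("How do I reset my phone?", "Open the settings app.", 0.4, False, True),
]
conditions = make_conditions(threshold=0.8, historical_questions=set())
selected = select_quality_pairs(candidates, conditions, enabled=["relevance", "question_form"])
print([c.question for c in selected])
```

Because the claim requires only "at least one of" the conditions, the `enabled` list lets a deployment choose which checks apply.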

3. The method of claim 2, further comprising:

determining whether the question and the answer of each candidate question-answer pair satisfy a precondition, wherein the precondition comprises at least one of a word-count condition and a sensitive-word condition;

discarding candidate question-answer pairs that do not satisfy the precondition; and

determining, by using the high-quality question-answer fusion model, whether each candidate question-answer pair that satisfies the precondition satisfies the predetermined high-quality condition.
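
The precondition filter of claim 3 can be sketched as a cheap check that runs before any model is invoked. The function name `meets_preconditions`, the word-count bounds, and the whitespace tokenization are assumptions; Chinese text in particular would need a word segmenter instead of `split()`.

```python
from typing import FrozenSet


def meets_preconditions(
    question: str,
    answer: str,
    min_words: int = 3,
    max_words: int = 200,
    sensitive: FrozenSet[str] = frozenset(),
) -> bool:
    """Return True if both texts pass the word-count and sensitive-word checks."""
    for text in (question, answer):
        words = text.lower().split()
        # Word-count condition: reject texts that are too short or too long.
        if not (min_words <= len(words) <= max_words):
            return False
        # Sensitive-word condition: reject texts containing any listed word.
        if any(word.strip(".,!?") in sensitive for word in words):
            return False
    return True


SENSITIVE = frozenset({"spam"})
ANSWER = "Hold the reset button for ten seconds."
ok = meets_preconditions("How do I reset my router?", ANSWER, sensitive=SENSITIVE)
too_short = meets_preconditions("Help?", ANSWER, sensitive=SENSITIVE)
flagged = meets_preconditions("How do I reset this spam router?", ANSWER, sensitive=SENSITIVE)
print(ok, too_short, flagged)
```

Pairs for which the function returns False would be discarded, and only the remainder passed on to the quality check.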

4. The method of claim 2, further comprising:

for a candidate question-answer pair to which a manual label has been added, adjusting the relevance between the question and the answer of the candidate question-answer pair based on the manual label.
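
The claim does not state how the label changes the relevance. One possible reading, sketched below purely as an assumption, is that a human judgment overrides the model score; the function name `adjusted_relevance` and the use of 1.0 and 0.0 as override values are hypothetical.

```python
from typing import Optional


def adjusted_relevance(model_relevance: float, manual_label: Optional[bool]) -> float:
    """Return the model score unless a manual label is present.

    Assumption: True means a reviewer marked the pair as relevant,
    False means irrelevant, and None means no label was added.
    """
    if manual_label is None:
        return model_relevance
    return 1.0 if manual_label else 0.0


print(adjusted_relevance(0.55, None), adjusted_relevance(0.55, True), adjusted_relevance(0.95, False))
```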

5. The method of claim 1, wherein the selecting from a question library at least one similar question that satisfies a predetermined similarity condition with the target question comprises:

determining, by using a similarity determination fusion model, a candidate question that satisfies the predetermined similarity condition among a plurality of candidate questions in the question library as a similar question to the target question,

wherein the predetermined similarity condition includes at least one of the following conditions:

the similarity between the candidate question and the target question is greater than or equal to a preset similarity threshold;

the candidate question and the target question contain the same new words, or neither the candidate question nor the target question contains a new word; and

the question category of the candidate question is the same as the question category of the target question.
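
The three conditions of claim 5 can be sketched as follows. Token-set Jaccard overlap stands in for whatever similarity measure the fusion model uses, and the function name `is_similar`, the `new_words` vocabulary, the `category_of` classifier, and the 0.5 threshold are assumptions. The sketch applies all three conditions together, although the claim requires only at least one.

```python
from typing import Callable, Set


def is_similar(
    candidate: str,
    target: str,
    new_words: Set[str],
    category_of: Callable[[str], str],
    threshold: float = 0.5,
) -> bool:
    """Check similarity threshold, new-word consistency, and category match."""
    cand_tokens = set(candidate.lower().split())
    target_tokens = set(target.lower().split())
    union = cand_tokens | target_tokens
    # Jaccard overlap as a simple stand-in for a learned similarity score.
    similarity = len(cand_tokens & target_tokens) / len(union) if union else 0.0
    # Equal sets covers both cases: the same new words, or none in either question.
    same_new_words = (cand_tokens & new_words) == (target_tokens & new_words)
    same_category = category_of(candidate) == category_of(target)
    return similarity >= threshold and same_new_words and same_category


def toy_category(question: str) -> str:
    """Toy classifier (assumption): every question is filed under one category."""
    return "technology"


NEW_WORDS = {"chatgpt"}
TARGET = "how to use chatgpt for homework"
same_topic = is_similar("how to use chatgpt for essays", TARGET, NEW_WORDS, toy_category)
other_tool = is_similar("how to use excel for homework", TARGET, NEW_WORDS, toy_category)
print(same_topic, other_tool)
```

The second candidate overlaps heavily with the target at the token level but is rejected because it lacks the new word that the target contains.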

6. The method of claim 5, further comprising:

replacing non-generic words in the candidate questions and the target question with generic words; and

removing repeated entity words from the candidate questions and removing repeated entity words from the target question.
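
The two normalization steps of claim 6 can be sketched as a single pass over the tokens of a question. The function name `normalize`, the `generic_map` dictionary, and the `entity_words` set are assumptions; in practice both would come from a lexicon or an entity recognizer.

```python
from typing import Dict, List, Set


def normalize(question: str, generic_map: Dict[str, str], entity_words: Set[str]) -> str:
    """Replace non-generic words with generic ones and drop repeated entity words."""
    seen_entities: Set[str] = set()
    tokens: List[str] = []
    for token in question.lower().split():
        # Step 1: map a non-generic word to its generic form when one is known.
        token = generic_map.get(token, token)
        # Step 2: keep only the first occurrence of each entity word.
        if token in entity_words:
            if token in seen_entities:
                continue
            seen_entities.add(token)
        tokens.append(token)
    return " ".join(tokens)


normalized = normalize(
    "how to fix iphone iphone screen",
    generic_map={"fix": "repair"},
    entity_words={"iphone"},
)
print(normalized)  # how to repair iphone screen
```

Applying the same function to both the candidate question and the target question before comparing them keeps the similarity score from being skewed by wording variants or duplicated entities.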

7. The method of claim 1, further comprising performing the following operations for each verified question-answer pair comprising a similar question and a high-quality answer:

in a case where the similar question has a plurality of original answers, obtaining ranking conditions of the high-quality answer and of each of the plurality of original answers, the ranking conditions including at least one of: the relevance between the question and the answer, the answer time, whether the answer is a video answer, and whether the answer is from an authoritative user; and

inputting the ranking conditions of the high-quality answer and the plurality of original answers into a ranking model to obtain a display order of the high-quality answer and the plurality of original answers.
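
The ranking step of claim 7 can be sketched with a hand-weighted linear score in place of a trained ranking model. The `RankedAnswer` fields mirror the four ranking conditions; the weights, the recency formula, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

SECONDS_PER_DAY = 86400.0


@dataclass
class RankedAnswer:
    text: str
    relevance: float             # relevance between the question and this answer
    answered_at: float           # answer time as a Unix timestamp
    is_video: bool
    by_authoritative_user: bool


def score(answer: RankedAnswer, now: float) -> float:
    """Weighted sum of the ranking conditions (weights are arbitrary assumptions)."""
    age_days = max(0.0, (now - answer.answered_at) / SECONDS_PER_DAY)
    recency = 1.0 / (1.0 + age_days)  # newer answers score closer to 1
    return (
        1.0 * answer.relevance
        + 0.2 * recency
        + 0.1 * float(answer.is_video)
        + 0.3 * float(answer.by_authoritative_user)
    )


def display_order(answers: List[RankedAnswer], now: float) -> List[RankedAnswer]:
    """Return answers sorted from highest to lowest score."""
    return sorted(answers, key=lambda a: score(a, now), reverse=True)


NOW = 1_700_000_000.0
answers = [
    RankedAnswer("original B", 0.3, NOW - 9 * SECONDS_PER_DAY, False, False),
    RankedAnswer("original A", 0.6, NOW, True, False),
    RankedAnswer("high-quality answer", 0.9, NOW - SECONDS_PER_DAY, False, True),
]
ordered = [a.text for a in display_order(answers, NOW)]
print(ordered)
```

With these weights the high-quality answer is displayed first, but nothing in the sketch forces that outcome: an original answer with better ranking conditions would be placed ahead of it.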

8. An apparatus for processing question-answer information, comprising:

a high-quality question-answer module configured to obtain at least one high-quality question-answer pair, each high-quality question-answer pair comprising a target question and a high-quality answer matched with the target question;

a question-answer pair module configured to, for the target question and the high-quality answer of each high-quality question-answer pair, select from a question library at least one similar question that satisfies a predetermined similarity condition with the target question, and form a question-answer pair to be verified from the high-quality answer and each similar question;

a verification module configured to determine, by using a question-answer relevance model, whether each question-answer pair to be verified passes verification; and

an association module configured to associate the similar question in each question-answer pair determined to pass verification with the high-quality answer, so as to provide the high-quality answer in response to a request for the similar question,

wherein the verification module is further configured to:

input the similar question and the high-quality answer of the question-answer pair to be verified into the question-answer relevance model to obtain a relevance between the similar question and the high-quality answer; and

if the relevance is greater than or equal to a preset relevance threshold, determine that the question-answer pair to be verified passes verification.

9. The apparatus of claim 8, wherein the high-quality question-answer module comprises:

an acquisition module configured to obtain a plurality of candidate question-answer pairs, each candidate question-answer pair comprising a question and an answer; and

a quality judgment module configured to determine, by using a high-quality question-answer fusion model, candidate question-answer pairs that satisfy a predetermined high-quality condition among the plurality of candidate question-answer pairs as the high-quality question-answer pairs,

wherein the predetermined high-quality condition includes at least one of the following conditions: the relevance between the question and the answer is greater than or equal to a preset relevance threshold; the asking user and/or the answering user of the candidate question-answer pair is an authoritative user; the typesetting of the question and/or the answer qualifies as high-quality typesetting; the question does not duplicate the question of any historical high-quality question-answer pair; and the question conforms to a question form.

10. The apparatus of claim 8, wherein the question-answer pair module comprises:

a similarity determination module configured to determine, by using a similarity determination fusion model, a candidate question that satisfies a predetermined similarity condition among a plurality of candidate questions in the question library as a similar question to the target question,

11. The apparatus of claim 8, further comprising:

a ranking module configured to perform the following operations for each verified question-answer pair comprising a similar question and a high-quality answer:

12. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

13. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.