CN117743519A - Question-answering knowledge base optimizing method and device - Google Patents

Publication number
CN117743519A
Authority
CN
China
Prior art keywords
question
target
word segmentation
confusion
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211110038.5A
Other languages
Chinese (zh)
Inventor
李鹏
徐超
熊超
包勇军
颜伟鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202211110038.5A
Priority to PCT/CN2023/088448 (WO2024055582A1)
Publication of CN117743519A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G06F40/247 Thesauruses; Synonyms
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method and a device for optimizing a question-answer knowledge base. The method comprises the following steps: selecting a target question sentence from a question set included in the question-answer knowledge base, and acquiring a target confusion question sentence corresponding to the target question sentence according to the question set; and determining the associated knowledge point corresponding to the target confusion question sentence, and attributing the target confusion question sentence to the question set corresponding to the associated knowledge point. By mining the confusion question sentences of each knowledge point and attributing them to the corresponding associated knowledge points, the question-answer knowledge base is expanded and its coverage of the question sentences a user may ask is improved, so that when a frequently asked questions (FAQ) or retrieval function is built on the question-answer knowledge base, a question sentence asked by a user can be correctly matched to its knowledge point, which improves the response quality.

Description

Question-answering knowledge base optimizing method and device
Technical Field
The application relates to the field of natural language processing, in particular to a method and a device for optimizing a question-answer knowledge base.
Background
At present, a question-answer knowledge base is expanded by mining, under each knowledge point, similar question sentences that are synonymous with the standard question sentence but worded differently; the standard question sentence and these similar question sentences correspond to the same knowledge point. When a frequently asked questions (FAQ) system uses the question-answer knowledge base to respond to a question sentence asked by a user, it matches that question sentence by similarity against the question sentences under each knowledge point (that is, the standard questions and their differently worded synonymous similar questions), and can thereby match the user's question sentence to the corresponding knowledge point, so that the FAQ system is not affected by synonymous wording. A confusing question sentence, however, is a question sentence asked by a user that is similar to question sentences under several knowledge points in the knowledge base. With current question-answer knowledge bases it is difficult to match such a sentence to the correct knowledge point, so the response accuracy based on the question-answer knowledge base is low.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
For this purpose, a first object of the present application is to propose a method for optimizing a knowledge base of questions and answers.
A second object of the present application is to provide an optimizing apparatus for a question-answering knowledge base.
A third object of the present application is to propose an electronic device.
A fourth object of the present application is to propose a non-transitory computer readable storage medium.
A fifth object of the present application is to propose a computer program product.
To achieve the above objective, an embodiment of a first aspect of the present application provides a method for optimizing a knowledge base of questions and answers, including: determining a question-answer knowledge base, wherein the question-answer knowledge base comprises knowledge points and a question set corresponding to the knowledge points; selecting a target question sentence from a question set included in the question-answer knowledge base, and acquiring a target confusion question sentence corresponding to the target question sentence according to the question set; and determining an associated knowledge point corresponding to the target confusion question sentence, and attributing the target confusion question sentence to a question set corresponding to the associated knowledge point.
According to the method, a target question sentence is selected from a question set included in the question-answer knowledge base, and a target confusion question sentence corresponding to the target question sentence is acquired according to the question set; the associated knowledge point corresponding to the target confusion question sentence is determined, and the target confusion question sentence is attributed to the question set corresponding to the associated knowledge point. By mining the confusion question sentences of each knowledge point and attributing them to the corresponding associated knowledge points, the question-answer knowledge base is expanded and its coverage of the question sentences a user may ask is improved, so that when a frequently asked questions (FAQ) or retrieval function is built on the question-answer knowledge base, a question sentence asked by a user can be correctly matched to its knowledge point, which improves the response quality.
To achieve the above object, an embodiment of a second aspect of the present application provides an optimizing apparatus for a question-answer knowledge base, including: a first determining module, used for determining a question-answer knowledge base, wherein the question-answer knowledge base comprises knowledge points and a question set corresponding to the knowledge points; an acquisition module, used for selecting a target question sentence from a question set included in the question-answer knowledge base and acquiring a target confusion question sentence corresponding to the target question sentence according to the question set; and a second determining module, used for determining the associated knowledge point corresponding to the target confusion question sentence and attributing the target confusion question sentence to the question set corresponding to the associated knowledge point.
According to the apparatus, a target question sentence is selected from a question set included in the question-answer knowledge base, and a target confusion question sentence corresponding to the target question sentence is acquired according to the question set; the associated knowledge point corresponding to the target confusion question sentence is determined, and the target confusion question sentence is attributed to the question set corresponding to the associated knowledge point. By mining the confusion question sentences of each knowledge point and attributing them to the corresponding associated knowledge points, the question-answer knowledge base is expanded and its coverage of the question sentences a user may ask is improved, so that when a frequently asked questions (FAQ) or retrieval function is built on the question-answer knowledge base, a question sentence asked by a user can be correctly matched to its knowledge point, which improves the response quality.
To achieve the above object, an embodiment of a third aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to implement the method for optimizing a question-answer knowledge base according to the embodiment of the first aspect of the present application.
To achieve the above object, an embodiment of a fourth aspect of the present application proposes a non-transitory computer readable storage medium storing computer instructions for implementing a method for optimizing a knowledge base of questions and answers according to an embodiment of the first aspect of the present application.
To achieve the above object, an embodiment of a fifth aspect of the present application proposes a computer program product comprising a computer program for implementing an optimization method of a question-answer knowledge base according to an embodiment of the first aspect of the present application when the computer program is executed by a processor.
Drawings
FIG. 1 is a flowchart of a method for optimizing a knowledge base of questions and answers according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for optimizing a knowledge base of questions and answers according to another embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for optimizing a knowledge base of questions and answers according to another embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for optimizing a knowledge base of questions and answers according to another embodiment of the present disclosure;
FIG. 5 is a schematic flow chart of judging whether a sentence is legal or not in the method for optimizing a question-answer knowledge base according to an embodiment of the present application;
FIG. 6 is a block diagram of an optimizing apparatus for question-answering knowledge base proposed in the present application;
FIG. 7 is a block diagram of an electronic device provided herein.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
Fig. 1 is a schematic flow chart of a method for optimizing a question-answer knowledge base according to an embodiment of the present application, where the method for optimizing a question-answer knowledge base according to the embodiment of the present application may be executed by an apparatus for optimizing a question-answer knowledge base according to the embodiment of the present application, and the apparatus for optimizing a question-answer knowledge base may be set in electronic devices such as a terminal and a server. As shown in fig. 1, the method for optimizing the question-answer knowledge base according to the embodiment of the application includes the following steps:
S101, determining a question-answer knowledge base, wherein the question-answer knowledge base comprises knowledge points and a question set corresponding to the knowledge points.
In the embodiment of the application, a question-answer knowledge base to be optimized is determined. The knowledge base comprises a plurality of knowledge points and a question set corresponding to each knowledge point; the question set can comprise the standard question sentence and the similar question sentences corresponding to the knowledge point, together with the answer used to respond to them, as shown in Table 1:
TABLE 1 question and answer knowledge base example Table
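The structure described above can be sketched as a small in-memory data model. This is only an illustrative sketch: the class name `KnowledgePoint`, its fields, and the sample questions and answers are assumptions made for the example and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgePoint:
    """One knowledge point: an answer plus the question sentences mapped to it."""
    kp_id: str
    answer: str
    standard_question: str
    similar_questions: list = field(default_factory=list)

    def question_set(self):
        # The question set is the standard question followed by its similar questions.
        return [self.standard_question, *self.similar_questions]


# Hypothetical contents, invented for illustration only.
knowledge_base = {
    "kp1": KnowledgePoint(
        "kp1",
        "Existing resources are listed on the resource overview page.",
        "prompts insufficient resources, how to view the existing resources",
    ),
    "kp2": KnowledgePoint(
        "kp2",
        "Additional resources can be requested through the application form.",
        "displays insufficient resources, how to apply",
    ),
}

all_questions = [q for kp in knowledge_base.values() for q in kp.question_set()]
```

Keeping the standard question separate from the similar questions makes it easy to append mined question sentences later without touching the standard wording.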
S102, selecting a target question sentence from a question set included in the question-answer knowledge base, and acquiring a target confusion question sentence corresponding to the target question sentence according to the question set.
In the embodiment of the application, any question sentence in the question-answer knowledge base, that is, any standard question sentence or any similar question sentence under any knowledge point, can be used as the target question sentence, and the method for obtaining the corresponding target confusion question sentence is then applied to it. A confusion question sentence can be understood as a question sentence that is similar to question sentences under several knowledge points, so that its corresponding knowledge point is difficult to determine. For example, "insufficient resources" can be regarded as a confusion question sentence, because it has high similarity both to "prompts insufficient resources, how to view the existing resources" under knowledge point 1 and to "displays insufficient resources, how to apply" under knowledge point 2. When the confusion question sentence has not been added under a knowledge point, it is difficult for the FAQ system to determine the corresponding knowledge point accurately, which leads to inaccurate responses that fail to meet the user's needs.
S103, determining associated knowledge points corresponding to the target confusion question sentences, and attributing the target confusion question sentences to the question sets corresponding to the associated knowledge points.
In the embodiment of the application, the knowledge point associated with the target confusion question sentence can be determined by analyzing the actual application scenario or business content. For example, the historical question sentences asked by users are analyzed to find which knowledge point users were most often querying when they asked the target confusion question; the associated knowledge point of the target confusion question is then determined manually, and the target confusion question sentence is attributed to the question set under that associated knowledge point, for example by storing it in the question set as a similar question sentence of the associated knowledge point.
Therefore, when the question sentence proposed by the user is the target confusion question sentence or the similarity between the question sentence and the target confusion question sentence is high, the question sentence can be accurately matched with the corresponding knowledge point.
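The attribution step can be sketched as follows. The application describes the final decision as manual, so the majority count below should be read only as a way of preparing a suggestion for a reviewer. The function names and the shape of the history data (a list of knowledge point identifiers that users ended up consulting) are assumptions.

```python
from collections import Counter


def suggest_associated_knowledge_point(resolved_kp_ids):
    """Suggest the knowledge point most often consulted by users who asked
    the confusion question. `resolved_kp_ids` is a hypothetical log extract."""
    if not resolved_kp_ids:
        return None
    kp_id, _count = Counter(resolved_kp_ids).most_common(1)[0]
    return kp_id


def attribute(question_sets, confusion_question, kp_id):
    """Store the confusion question as a similar question under `kp_id`."""
    questions = question_sets.setdefault(kp_id, [])
    if confusion_question not in questions:  # avoid duplicate entries
        questions.append(confusion_question)


question_sets = {"kp1": ["q_a"], "kp2": ["q_b"]}
suggested = suggest_associated_knowledge_point(["kp1", "kp2", "kp1", "kp1"])
attribute(question_sets, "insufficient resources", suggested)
```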
For example, for the confusion question "insufficient resources" asked by a user, an FAQ system will typically compute that "insufficient resources" is about equally similar to both knowledge points; if the similarity to knowledge point 2 happens to be higher, knowledge point 2 is used to respond. In the actual service scenario, however, most users who ask this confusion question are querying knowledge point 1, so the user question "insufficient resources" is better answered with knowledge point 1. With the question-answer knowledge base optimization method, the confusion question sentences of each question sentence can be mined and manually attributed to the associated knowledge points based on business knowledge, which improves the response quality.
The embodiment of the application provides a method for optimizing a question-answer knowledge base, in which a target question sentence is selected from a question set included in the question-answer knowledge base, and a target confusion question sentence corresponding to the target question sentence is acquired according to the question set; the associated knowledge point corresponding to the target confusion question sentence is determined, and the target confusion question sentence is attributed to the question set corresponding to the associated knowledge point. By mining the confusion question sentences of each knowledge point and attributing them to the corresponding associated knowledge points, the question-answer knowledge base is expanded and its coverage of the question sentences a user may ask is improved, so that when a frequently asked questions (FAQ) or retrieval function is built on the question-answer knowledge base, a question sentence asked by a user can be correctly matched to its knowledge point, which improves the response quality.
On the basis of the above embodiment, as shown in FIG. 2, the step of "acquiring, according to the question set, the target confusion question sentence corresponding to the target question sentence" in the above step S102 may include the following steps:
S201, determining a reference question sentence for generating the target confusion question sentence.
In the embodiment of the application, the reference question sentences used to generate the target confusion question sentence corresponding to the target question sentence are determined from the question sentences in the question-answer knowledge base. A reference question sentence can be any question sentence in the question-answer knowledge base other than the target question sentence, or any question sentence in a question set other than the question set in which the target question sentence is located.
S202, determining the similarity between the target question sentence and the reference question sentence.
In the embodiment of the application, the similarity between the target question sentence and any reference question sentence is calculated respectively.
The cosine value between the word vector of the target question sentence and the word vector of any reference question sentence can be calculated, and the cosine value of the word vector is used for representing the similarity between the question sentences. In some embodiments the similarity between two question sentences may also be calculated based on a neural network.
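As a minimal sketch of the cosine measure mentioned above, the function below works on plain Python lists of floats. How the sentence vectors are produced (for example by averaging word vectors or by a neural encoder) is outside the sketch and is left as an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```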
S203, determining a similar question sentence set of the target question sentence according to the similarity.
In the embodiment of the application, whether the reference question sentence can be used as the similar question sentence of the target question sentence or not can be determined according to the similarity between the target question sentence and the reference question sentence, so that a similar question sentence set of the target question sentence is obtained.
In some embodiments, the reference question sentence may be screened for similar question sentences by setting a similarity threshold.
In some embodiments, the similar question sentence set may also be simplified as follows: the similar question sentences in the set are grouped by knowledge point, and one representative similar question sentence is selected from the similar question sentences that belong to the same knowledge point. This shrinks the similar question sentence set and reduces the amount of calculation.
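Threshold screening and the per-knowledge-point simplification can be sketched together. The threshold value of 0.8 and the rule for picking the representative (the reference question with the highest similarity in each group) are assumptions; the application does not fix either.

```python
def similar_question_set(scored_refs, threshold=0.8):
    """Keep reference questions whose similarity reaches the threshold,
    then retain one representative per knowledge point.

    scored_refs: iterable of (question, kp_id, similarity) tuples.
    Returns a dict mapping kp_id to the representative question.
    """
    best = {}
    for question, kp_id, sim in scored_refs:
        if sim < threshold:
            continue  # not similar enough to the target question
        if kp_id not in best or sim > best[kp_id][1]:
            best[kp_id] = (question, sim)
    return {kp_id: question for kp_id, (question, _sim) in best.items()}


scored = [
    ("q1", "kp1", 0.90),
    ("q2", "kp1", 0.95),
    ("q3", "kp2", 0.50),
    ("q4", "kp3", 0.85),
]
representatives = similar_question_set(scored)
```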
S204, obtaining the target confusion question sentence according to the target question sentence and the similar question sentence set.
In the embodiment of the application, the target confusion question sentence corresponding to the target question sentence is mined based on the target question sentence and the similar question sentence set corresponding to the target question sentence. Wherein the target confusion question sentence can be one or more, and the specific number is not limited in this application.
On the basis of the above embodiment, as shown in fig. 3, the "obtaining the target confusion question sentence according to the target question sentence and the set of similar question sentences" in the above step S204 may include the following steps:
S301, forming a question sentence pair from the target question sentence and any similar question sentence in the similar question sentence set.
S302, obtaining the common word segmentation sequence corresponding to the question sentence pair based on the word segmentation sequences of the two question sentences in the pair.
In the embodiment of the present application, word segmentation is performed on the target question sentence and on the similar question sentence in the question sentence pair to obtain the word segmentation sequence of each, as shown in Table 2:
TABLE 2 schematic table of word segmentation results for question sentences
Based on the word segmentation sequence of the target question sentence and the word segmentation sequence of the similar question sentence, the synonymous tokens in the two sequences are found (that is, synonyms or synonymous phrases across the two word segmentation sequences are searched for), and tokens that are synonyms of one another are normalized into a single target token. For example, "prompt" in the target question sentence and "display" in the similar question sentence in Table 2 are synonyms, so the two tokens are represented by one target token, which may be either "prompt" or "display". The synonymous tokens in the two word segmentation sequences are replaced with the target token to obtain two replaced word segmentation sequences; this normalizes the two sequences and aligns the words between them, as shown in Table 3:
TABLE 3 Schematic table of the normalized word segmentation sequences of the question sentences
Synonyms or synonymous phrases may be looked up in a predefined synonym dictionary or mined with a neural network model; the present application does not limit this.
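A dictionary-based version of the normalization can be sketched as below. The synonym groups are passed in as plain lists, and the first member of each group is used as the target token; both choices are assumptions, since any member of a group may serve as the target token. The English tokens are glosses used only for illustration.

```python
def normalize_synonyms(tokens_a, tokens_b, synonym_groups):
    """Replace every token that belongs to a synonym group with that
    group's target token, in both word segmentation sequences."""
    canonical = {}
    for group in synonym_groups:
        target = group[0]  # assumption: first member acts as the target token
        for word in group:
            canonical[word] = target

    def normalize(tokens):
        return [canonical.get(token, token) for token in tokens]

    return normalize(tokens_a), normalize(tokens_b)


target_tokens = ["prompt", "resource", "insufficient", "how", "view", "existing", "resource"]
similar_tokens = ["display", "resource", "insufficient", "how", "apply"]
norm_target, norm_similar = normalize_synonyms(
    target_tokens, similar_tokens, synonym_groups=[["prompt", "display"]]
)
```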
The calculation using the neural network model can be realized through the following processes:
Step one: use BERT to compute the semantic vector of each token in each question sentence, that is, encode each token with BERT to obtain a token vector.
Step two: compute the similarity between each token of the target question sentence and each token of the similar question sentence in the question sentence pair, and treat token pairs whose similarity is greater than a threshold as synonymous tokens. Assuming that question sentence 1 contains M tokens and question sentence 2 contains N tokens, the synonymous tokens can be obtained using the following formula:
$$S_i^{1} = \{\, w_j^{2} \mid \mathrm{sim}(w_i^{1}, w_j^{2}) > \delta,\ 1 \le j \le N \,\}, \quad 1 \le i \le M$$
where the superscript indicates the number of the question sentence, the subscript indicates the number of the token, $S_i^{1}$ represents the set of synonyms of the i-th token in question sentence 1, $\mathrm{sim}(\cdot,\cdot)$ represents the similarity between tokens (the cosine of the token vectors may be used), and $\delta$ represents the threshold.
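The thresholding in step two can be sketched without any model by working on precomputed token vectors. The two-dimensional vectors below are toy values standing in for BERT outputs, and the function name `synonym_sets` and the threshold 0.95 are assumptions made for the example.

```python
import math


def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def synonym_sets(vectors_1, vectors_2, delta):
    """For each token i of sentence 1, list the indices j of the tokens of
    sentence 2 whose cosine similarity to token i exceeds delta."""
    return [
        [j for j, v in enumerate(vectors_2) if _cosine(u, v) > delta]
        for u in vectors_1
    ]


# Toy token vectors (hypothetical stand-ins for BERT token embeddings).
sentence_1_vectors = [[1.0, 0.0], [0.0, 1.0]]
sentence_2_vectors = [[0.9, 0.1], [1.0, 1.0]]
sets_for_sentence_1 = synonym_sets(sentence_1_vectors, sentence_2_vectors, delta=0.95)
```

Here only the first token of sentence 1 finds a synonym (index 0 of sentence 2); the second token's set is empty.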
The two word segmentation sequences, after synonym normalization, are compared, and the tokens common to both are selected to obtain the common word segmentation sequence of the question sentence pair. The common word segmentation sequence may be the longest common word segmentation sequence; for example, the longest common word segmentation sequence of the question sentence pair in Table 3 is [prompt, resource, insufficient, how]. Alternatively, the common word segmentation sequence may be a sequence that includes both the tokens shared by the two question sentences and the tokens unique to each question sentence; the way the common word segmentation sequence is formed may be set as required.
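One plausible reading of the maximum common sequence is the longest common subsequence of the two normalized token lists. The sketch below uses the standard dynamic-programming formulation under that assumption; the application itself does not name a specific algorithm, and the English tokens are glosses.

```python
def longest_common_token_sequence(a, b):
    """Longest common subsequence of two token lists (order preserved)."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover one longest sequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


normalized_target = ["prompt", "resource", "insufficient", "how", "view", "existing", "resource"]
normalized_similar = ["prompt", "resource", "insufficient", "how", "apply"]
common = longest_common_token_sequence(normalized_target, normalized_similar)
```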
As a possible implementation manner, the word segmentation of the question sentences can be performed before this step: all question sentences in the question-answer knowledge base are segmented directly to obtain the word segmentation sequence corresponding to each question sentence.
S303, generating the target confusion question sentence according to the common word segmentation sequence.
In some embodiments, the target confusion question sentence of the target question sentence is obtained according to the common word segmentation sequence corresponding to the question sentence pair.
On the basis of the above embodiment, as shown in FIG. 4, the step of "generating the target confusion question sentence according to the common word segmentation sequence" in the above step S303 may include the following steps:
S401, obtaining candidate confusion question sentences of the target question sentence based on the common word segmentation sequence.
In this embodiment of the present application, as shown in FIG. 5, it is determined whether the sentence corresponding to the common word segmentation sequence is legal. The sentence corresponding to the common word segmentation sequence is the sentence formed by splicing together the tokens in the sequence; for example, the sentence corresponding to the common word segmentation sequence [prompt, resource, insufficient, how] is "prompt resource insufficient how".
In response to the sentence corresponding to the common word segmentation sequence being illegal, the common word segmentation sequence is edited to generate a new common word segmentation sequence, and the step of judging whether the corresponding sentence is legal is executed again on the new sequence. In response to the sentence corresponding to the common word segmentation sequence being legal, the legal sentence is determined to be a candidate confusion question sentence.
A preset condition for exiting the loop can be checked before the common word segmentation sequence is edited. The preset condition can be that the total number of edits made to the common word segmentation sequence is greater than a preset number, or that the number of tokens in the common word segmentation sequence is smaller than a preset number of tokens. In response to the common word segmentation sequence meeting the preset condition, the sequence is discarded; in response to the sequence not meeting the preset condition, the step of editing the common word segmentation sequence is executed.
The common word segmentation sequence can be edited by deleting its first token or its last token, and the sequence after deletion is used as the new common word segmentation sequence. The common word segmentation sequence can also be edited by a model.
As a possible implementation manner, whether the sentence is legal or not may be determined according to the completeness of the sentence and/or the probability of the sentence appearing in the question-answer scene.
As another possible implementation manner, the judgment of whether the sentence is legal or not can be realized by constructing a model and training the model, for example, whether the sentence is legal or not is judged based on a trained target classification model, wherein the target classification model can be obtained by training based on a problem sentence in a user log as a positive sample and a positive sample after character deletion as a negative sample.
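The construction of training samples described above can be sketched as follows. Deleting exactly one randomly chosen character per positive sample is an assumption (the application only says that characters are deleted), and the classifier itself is not shown.

```python
import random


def make_training_samples(log_questions, seed=0):
    """Build (sentence, label) pairs: each logged question is a positive
    sample (label 1), and a copy with one character removed is a negative
    sample (label 0)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    samples = []
    for question in log_questions:
        samples.append((question, 1))
        if len(question) > 1:
            k = rng.randrange(len(question))
            samples.append((question[:k] + question[k + 1:], 0))
    return samples


samples = make_training_samples(["how to apply", "view resources"])
```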
For example: the sentence corresponding to the common word segmentation sequence [prompt, resource, insufficient, how] is judged, and the output is illegal, because "prompt resource insufficient how" is an incomplete sentence. Assume the preset exit condition is that the number of tokens in the word segmentation sequence is smaller than 3. Whether the common word segmentation sequence meets the exit condition is judged, and the output is that it does not; the last token is therefore deleted, giving [prompt, resource, insufficient] as the new common word segmentation sequence. Whether the sentence corresponding to the new common word segmentation sequence is legal is then judged; "prompt resource insufficient" is output as a legal sentence and is determined to be a candidate confusion question sentence.
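The judge-and-edit loop can be sketched as below, assuming the edit is always deletion of the last token. The legality check is passed in as a function; `toy_is_legal` is a hypothetical stand-in for the completeness check or trained classifier, and the tokens are English glosses joined with spaces (Chinese tokens would be concatenated directly).

```python
def candidate_from_common_sequence(tokens, is_legal, min_tokens=3, max_edits=10):
    """Return a legal sentence derived from the common token sequence,
    or None if the sequence has to be discarded."""
    tokens = list(tokens)
    edits = 0
    while tokens:
        sentence = " ".join(tokens)
        if is_legal(sentence):
            return sentence
        # Exit conditions are checked before editing.
        if edits > max_edits or len(tokens) < min_tokens:
            return None
        tokens.pop()  # edit: delete the last token
        edits += 1
    return None


def toy_is_legal(sentence):
    # Hypothetical rule: a sentence ending in a dangling "how" is incomplete.
    return not sentence.endswith("how")


candidate = candidate_from_common_sequence(
    ["prompt", "resource", "insufficient", "how"], toy_is_legal
)
```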
S402, determining the confusion degree score of the candidate confusion question sentence.
In the embodiment of the present application, the confusion score of each candidate confusion question sentence is calculated, where the calculation method of the confusion score may be the following process:
The similarity between the candidate confusion question sentence and each question sentence in the question-answer knowledge base is calculated, and a preset number of the highest-ranking similarities are selected. For example, in the embodiment of the present application, the two highest similarities are selected, and the confusion score of the candidate confusion question sentence is determined from these two similarities. The calculation formula is as follows:
$$\mathrm{score}(C) = -\left|\,\mathrm{sim}(C, K_1) - \mathrm{sim}(C, K_2)\,\right|$$
where $C$ is the candidate confusion question sentence, $\mathrm{score}(C)$ is its confusion score, $K_1$ and $K_2$ are the two question sentences in the question-answer knowledge base with the greatest similarity to the candidate confusion question sentence, and $\mathrm{sim}(C, K_1)$ and $\mathrm{sim}(C, K_2)$ are the corresponding two highest similarities obtained by searching the knowledge base. The larger the difference between the two similarities, the lower the confusion; the smaller the difference, the higher the confusion.
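The formula above can be sketched in a few lines. The similarity measure is not specified in this passage, so the default `string_similarity` below (based on the standard-library `difflib.SequenceMatcher`) is only a stand-in, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable


def string_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; the actual measure is not specified in the text."""
    return SequenceMatcher(None, a, b).ratio()


def confusion_score(
    candidate: str,
    kb_questions: Iterable[str],
    sim: Callable[[str, str], float] = string_similarity,
) -> float:
    """score(C) = -|sim(C, K1) - sim(C, K2)| over the two most similar questions."""
    top_two = sorted((sim(candidate, q) for q in kb_questions), reverse=True)[:2]
    if len(top_two) < 2:
        raise ValueError("at least two knowledge-base questions are required")
    return -abs(top_two[0] - top_two[1])
```

Because the score is the negated gap, it is at most 0, and a candidate that is almost equally close to its two nearest questions scores near 0, which is the high-confusion case.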
S403, determining a target confusion question sentence from the candidate confusion question sentences according to the confusion degree score.
In the embodiment of the present application, the candidate confusion question sentences corresponding to the target question sentence are ranked by confusion score, and a preset number of them, taken from the highest score downward, may be selected as target confusion question sentences as required.
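The ranking and selection step can be sketched as follows, assuming the confusion scores have already been computed and are held in a dictionary keyed by candidate sentence. The name `select_targets` and the parameter `top_n` are hypothetical.

```python
from typing import Dict, List


def select_targets(scored_candidates: Dict[str, float], top_n: int) -> List[str]:
    """Return the top_n candidates with the highest confusion scores."""
    ranked = sorted(scored_candidates.items(), key=lambda item: item[1], reverse=True)
    return [sentence for sentence, _ in ranked[:top_n]]
```

Since scores are negated similarity gaps, sorting from high to low places the most confusable candidates first.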
The embodiment of the present application provides a method for optimizing a question-answer knowledge base. A target question sentence is selected from a question set included in the question-answer knowledge base, and a target confusion question sentence corresponding to the target question sentence is obtained according to the question set; an associated knowledge point corresponding to the target confusion question sentence is determined, and the target confusion question sentence is attributed to the question set corresponding to the associated knowledge point. By mining the confusion question sentences of each knowledge point and attributing them to the corresponding associated knowledge points, the question-answer knowledge base is expanded and its coverage of the question sentences that users may raise is improved. When a common question-answering or retrieval function is implemented on the basis of the question-answer knowledge base, the knowledge point corresponding to a question sentence raised by a user can be matched correctly, which improves the response effect.
Fig. 6 is a block diagram of an optimizing device for a question-answer knowledge base. As shown in fig. 6, the optimizing device 600 for a question-answer knowledge base includes a first determining module 601, an obtaining module 602, and a second determining module 603, wherein:
the first determining module 601 is configured to determine a question-answer knowledge base, where the question-answer knowledge base includes knowledge points and a question set corresponding to the knowledge points.
The obtaining module 602 is configured to select a target question sentence from a question set included in the question-answer knowledge base, and obtain a target confusion question sentence corresponding to the target question sentence according to the question set.
The second determining module 603 is configured to determine an associated knowledge point corresponding to the target confusion question sentence, and attribute the target confusion question sentence to a question set corresponding to the associated knowledge point.
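The data that the three modules operate on can be pictured with a small sketch. The class name `QAKnowledgeBase`, its field and its method are hypothetical, and how the associated knowledge point is chosen is outside this sketch, so it is simply passed in by the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class QAKnowledgeBase:
    """Knowledge points mapped to their question sets (illustrative structure only)."""
    question_sets: Dict[str, Set[str]] = field(default_factory=dict)

    def attribute(self, knowledge_point: str, question: str) -> None:
        """Attribute a question sentence to the question set of a knowledge point."""
        self.question_sets.setdefault(knowledge_point, set()).add(question)


kb = QAKnowledgeBase({"resource shortage": {"what to do when resources are insufficient"}})
# A mined target confusion question is added to its associated knowledge point.
kb.attribute("resource shortage", "prompt resource shortage")
```

Using a set for each question set means that attributing the same confusion question twice leaves the knowledge base unchanged.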
According to one embodiment of the present application, the obtaining module 602 is further configured to: determine a reference question sentence for generating the target confusion question sentence; determine the similarity between the target question sentence and the reference question sentence; determine, according to the similarity, a similar question sentence set of the target question sentence; and obtain the target confusion question sentence according to the target question sentence and the similar question sentence set.
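One way to picture the selection of the similar question sentence set is the sketch below. Both the similarity measure (token-level Jaccard overlap) and the use of a fixed threshold are assumptions of this sketch, since this passage only says that the set is determined according to the similarity; the function names are hypothetical.

```python
from typing import Callable, Iterable, List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for the unspecified similarity measure."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def similar_question_set(
    target: str,
    references: Iterable[str],
    sim: Callable[[str, str], float] = jaccard,
    threshold: float = 0.5,  # assumed cut-off value
) -> List[str]:
    """Keep reference questions whose similarity to the target reaches the threshold."""
    return [q for q in references if q != target and sim(target, q) >= threshold]
```

A top-k cut-off instead of a threshold would fit the same description equally well; the threshold is used here only because it keeps the sketch short.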
According to one embodiment of the present application, the obtaining module 602 is further configured to: perform word segmentation on the question sentences in the question-answer knowledge base to obtain the word segmentation sequence corresponding to each question sentence.
According to one embodiment of the present application, the obtaining module 602 is further configured to: form a question sentence pair from the target question sentence and any similar question sentence in the similar question sentence set; obtain the common word segmentation sequence corresponding to the question sentence pair based on the word segmentation sequences of the two question sentences in the pair; and generate the target confusion question sentence according to the common word segmentation sequence.
According to one embodiment of the present application, the obtaining module 602 is further configured to: acquire the synonymous word segments in the two word segmentation sequences; normalize the synonymous word segments to obtain a target word segment for them; replace the synonymous word segments in the two word segmentation sequences with the target word segment to obtain two replaced word segmentation sequences; and compare the two replaced word segmentation sequences to generate the common word segmentation sequence.
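The synonym normalization and comparison steps can be sketched as below. This passage does not say how the two replaced sequences are compared, so treating the comparison as a longest common subsequence is an assumption of this sketch; the synonym table and function names are hypothetical.

```python
from typing import Dict, List


def normalize(tokens: List[str], synonym_to_target: Dict[str, str]) -> List[str]:
    """Replace each synonymous word segment with its normalized target segment."""
    return [synonym_to_target.get(token, token) for token in tokens]


def common_sequence(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two word segmentation sequences (assumed comparison)."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one longest common subsequence.
    result, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            result.append(a[i])
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


synonyms = {"insufficient": "shortage", "lack": "shortage"}  # hypothetical synonym table
first = normalize(["prompt", "resource", "insufficient", "how", "handle"], synonyms)
second = normalize(["prompt", "resource", "lack", "how", "solve"], synonyms)
print(common_sequence(first, second))
# prints: ['prompt', 'resource', 'shortage', 'how']
```

Normalizing before comparing matters here: without it, "insufficient" and "lack" would not match and the shared sequence would lose the word segment that carries the meaning.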
According to one embodiment of the present application, the obtaining module 602 is further configured to: obtain candidate confusion question sentences of the target question sentence based on the common word segmentation sequence; determine a confusion score of each candidate confusion question sentence; and determine the target confusion question sentence from the candidate confusion question sentences according to the confusion score.
According to one embodiment of the present application, the obtaining module 602 is further configured to: judge whether the sentence corresponding to the common word segmentation sequence is legal; in response to the sentence corresponding to the common word segmentation sequence being illegal, edit the common word segmentation sequence to generate a new common word segmentation sequence, and execute, for the new common word segmentation sequence, the step of judging whether the corresponding sentence is legal; and in response to the sentence corresponding to the common word segmentation sequence being legal, determine the legal sentence as a candidate confusion question sentence.
According to one embodiment of the present application, the obtaining module 602 is further configured to: before executing, for the new common word segmentation sequence, the step of judging whether the corresponding sentence is legal, discard the common word segmentation sequence in response to the new common word segmentation sequence meeting the preset condition; and in response to the new common word segmentation sequence not meeting the preset condition, execute, for the new common word segmentation sequence, the step of judging whether the corresponding sentence is legal.
According to one embodiment of the application, the preset condition is that the number of edits of the common word segmentation sequence is greater than a preset number, or that the number of words of the new common word segmentation sequence is less than the preset number of words.
According to one embodiment of the present application, the obtaining module 602 is further configured to: judge whether the sentence is legal according to the completeness of the sentence and/or the probability of the sentence appearing in a question-answer scene.
According to one embodiment of the present application, the obtaining module 602 is further configured to: judge whether the sentence is legal based on a trained target classification model, where the target classification model is trained using question sentences from a user log as positive samples and the positive samples after character deletion as negative samples.
According to one embodiment of the present application, the obtaining module 602 is further configured to: delete the first word segment or the last word segment in the common word segmentation sequence, and use the common word segmentation sequence after deletion as the new common word segmentation sequence.
According to one embodiment of the present application, the obtaining module 602 is further configured to: determine the similarity between the candidate confusion question sentence and each question sentence in the question-answer knowledge base; and determine the confusion score of the candidate confusion question sentence according to the similarities.
It should be noted that the explanation of the above embodiment of the method for optimizing a question-answer knowledge base is also applicable to the optimizing device for a question-answer knowledge base in this embodiment, and the specific process is not repeated here.
The embodiment of the present application provides an optimizing device for a question-answer knowledge base. A target question sentence is selected from a question set included in the question-answer knowledge base, and a target confusion question sentence corresponding to the target question sentence is obtained according to the question set; an associated knowledge point corresponding to the target confusion question sentence is determined, and the target confusion question sentence is attributed to the question set corresponding to the associated knowledge point. By mining the confusion question sentences of each knowledge point and attributing them to the corresponding associated knowledge points, the question-answer knowledge base is expanded and its coverage of the question sentences that users may raise is improved. When a common question-answering or retrieval function is implemented on the basis of the question-answer knowledge base, the knowledge point corresponding to a question sentence raised by a user can be matched correctly, which improves the response effect.
In order to implement the foregoing embodiments, the embodiments of the present application further provide an electronic device 700. As shown in fig. 7, the electronic device 700 includes at least one processor 701 and a memory 702 communicatively connected to the at least one processor 701, where the memory 702 stores instructions executable by the at least one processor 701, and the instructions are executed by the at least one processor 701 to implement the method for optimizing a question-answer knowledge base as in the above embodiments.
In order to implement the above embodiment, the present application further proposes a non-transitory computer-readable storage medium storing computer instructions for causing a computer to implement the method for optimizing a question-answer knowledge base as shown in the above embodiment.
In order to implement the above embodiments, the embodiments of the present application further provide a computer program product, including a computer program, which when executed by a processor implements the method for optimizing a question-answer knowledge base as shown in the above embodiments.
In the description of the present application, it should be understood that the terms "center," "longitudinal," "transverse," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," etc. indicate orientations or positional relationships based on the orientations or positional relationships illustrated in the drawings, are merely for convenience in describing the present application and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be configured and operated in a particular orientation, and therefore should not be construed as limiting the present application.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification, and the features of the different embodiments or examples, may be combined by those skilled in the art provided that they do not contradict each other.
Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (21)

1. The method for optimizing the question-answer knowledge base is characterized by comprising the following steps of:
determining a question-answer knowledge base, wherein the question-answer knowledge base comprises knowledge points and a question set corresponding to the knowledge points;
selecting a target question sentence from a question set included in the question-answer knowledge base, and acquiring a target confusion question sentence corresponding to the target question sentence according to the question set;
and determining an associated knowledge point corresponding to the target confusion question sentence, and attributing the target confusion question sentence to a question set corresponding to the associated knowledge point.
2. The optimization method according to claim 1, wherein the obtaining, according to the question set, a target confusion question sentence corresponding to the target question sentence includes:
determining a reference question sentence for generating the target confusion question sentence;
determining the similarity of the target question sentence and the reference question sentence;
according to the similarity, determining a similar question sentence set of the target question sentence;
and acquiring the target confusion question sentence according to the target question sentence and the similar question sentence set.
3. The optimization method according to claim 2, wherein before the obtaining the target confusion question sentence according to the target question sentence and the set of similar question sentences, the optimization method further comprises:
and performing word segmentation processing on the question sentences in the question-answer knowledge base to obtain word segmentation sequences corresponding to the question sentences.
4. The optimization method according to claim 2 or 3, wherein the obtaining the target confusion question sentence according to the target question sentence and the similar question sentence set includes:
forming a question sentence pair by the target question sentence and any similar question sentence in the similar question sentence set;
based on the word segmentation sequences of the two question sentences in the question sentence pair, obtaining a common word segmentation sequence corresponding to the question sentence pair;
and generating the target confusion question sentence according to the common word segmentation sequence.
5. The optimization method according to claim 4, wherein the obtaining the common word segmentation sequence corresponding to the question sentence pair based on the word segmentation sequences of the two question sentences in the question sentence pair includes:
acquiring synonymous word segments in the two word segmentation sequences;
normalizing the synonymous word segments to obtain a target word segment of the synonymous word segments;
replacing the synonymous word segments in the two word segmentation sequences with the target word segment to obtain two replaced word segmentation sequences;
and comparing the two replaced word segmentation sequences to generate the common word segmentation sequence.
6. The optimization method of claim 4, wherein the generating the target confusion question sentence from the common word segmentation sequence comprises:
obtaining candidate confusion question sentences of the target question sentence based on the common word segmentation sequence;
determining a confusion score for the candidate confusion question sentence;
and determining the target confusion question sentence from the candidate confusion question sentences according to the confusion degree score.
7. The optimization method of claim 6, wherein the obtaining candidate confusion question sentences of the target question sentence based on the common word segmentation sequence comprises:
judging whether the sentence corresponding to the common word segmentation sequence is legal;
in response to the sentence corresponding to the common word segmentation sequence being illegal, editing the common word segmentation sequence to generate a new common word segmentation sequence, and executing, for the new common word segmentation sequence, the step of judging whether the sentence corresponding to the common word segmentation sequence is legal;
and in response to the sentence corresponding to the common word segmentation sequence being legal, determining the legal sentence as the candidate confusion question sentence.
8. The optimization method of claim 7, wherein before editing the common word segmentation sequence, further comprising:
discarding the common word segmentation sequence in response to the common word segmentation sequence meeting a preset condition;
and executing the step of editing the common word segmentation sequence in response to the common word segmentation sequence not meeting the preset condition.
9. The optimization method according to claim 8, wherein the preset condition is that the total number of edits of the common word segmentation sequence is greater than a preset number or that the number of words of the common word segmentation sequence is less than a preset number of words.
10. The optimization method according to claim 7, wherein the determining whether the sentence corresponding to the common word segmentation sequence is legal comprises:
judging whether the statement is legal or not according to the completeness of the statement and/or the probability of the statement in a question-answer scene.
11. The optimization method according to claim 7, wherein the determining whether the sentence corresponding to the common word segmentation sequence is legal comprises:
judging whether the sentence is legal based on a trained target classification model, wherein the target classification model is obtained by training with question sentences in a user log as positive samples and the positive samples after character deletion as negative samples.
12. The optimization method of claim 7, wherein the editing the common word segmentation sequence to generate the new common word segmentation sequence comprises:
deleting the first word segment or the last word segment in the common word segmentation sequence, and taking the common word segmentation sequence after deletion as the new common word segmentation sequence.
13. The optimization method of claim 6, wherein said determining a confusion score for the candidate confusion question sentence comprises:
respectively determining the similarity between the candidate confusion question sentence and any question sentence in the question-answering knowledge base;
and determining the confusion degree score of the candidate confusion question sentence according to the similarity.
14. An optimizing device of a question-answer knowledge base, comprising:
the first determining module is used for determining a question-answer knowledge base, wherein the question-answer knowledge base comprises knowledge points and a question set corresponding to the knowledge points;
the acquisition module is used for selecting a target question sentence from a question set included in the question-answer knowledge base and acquiring a target confusion question sentence corresponding to the target question sentence according to the question set;
and the second determining module is used for determining the associated knowledge points corresponding to the target confusion question sentences and attributing the target confusion question sentences to the question sets corresponding to the associated knowledge points.
15. The optimization device of claim 14, wherein the acquisition module is further configured to:
determining a reference question sentence for generating the target confusion question sentence;
determining the similarity of the target question sentence and the reference question sentence;
according to the similarity, determining a similar question sentence set of the target question sentence;
and acquiring the target confusion question sentence according to the target question sentence and the similar question sentence set.
16. The optimization device of claim 15, wherein the acquisition module is further configured to:
and performing word segmentation processing on the question sentences in the question-answer knowledge base to obtain word segmentation sequences corresponding to the question sentences.
17. The optimization device of claim 15 or 16, wherein the acquisition module is further configured to:
forming a question sentence pair by the target question sentence and any similar question sentence in the similar question sentence set;
based on the word segmentation sequences of the two question sentences in the question sentence pair, obtaining a common word segmentation sequence corresponding to the question sentence pair;
and generating the target confusion question sentence according to the common word segmentation sequence.
18. The optimization device of claim 17, wherein the acquisition module is further configured to:
obtaining candidate confusion question sentences of the target question sentence based on the common word segmentation sequence;
determining a confusion score for the candidate confusion question sentence;
and determining the target confusion question sentence from the candidate confusion question sentences according to the confusion degree score.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
20. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-13.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-13.
CN202211110038.5A 2022-09-13 2022-09-13 Question-answering knowledge base optimizing method and device Pending CN117743519A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211110038.5A CN117743519A (en) 2022-09-13 2022-09-13 Question-answering knowledge base optimizing method and device
PCT/CN2023/088448 WO2024055582A1 (en) 2022-09-13 2023-04-14 Optimization method and apparatus for question-and-answer knowledge base

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211110038.5A CN117743519A (en) 2022-09-13 2022-09-13 Question-answering knowledge base optimizing method and device

Publications (1)

Publication Number Publication Date
CN117743519A true CN117743519A (en) 2024-03-22

Family

ID=90257759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211110038.5A Pending CN117743519A (en) 2022-09-13 2022-09-13 Question-answering knowledge base optimizing method and device

Country Status (2)

Country Link
CN (1) CN117743519A (en)
WO (1) WO2024055582A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8155951B2 (en) * 2003-06-12 2012-04-10 Patrick William Jamieson Process for constructing a semantic knowledge base using a document corpus
CN110019305B (en) * 2017-12-18 2024-03-15 上海智臻智能网络科技股份有限公司 Knowledge base expansion method, storage medium and terminal
CN111125379B (en) * 2019-12-26 2022-12-06 科大讯飞股份有限公司 Knowledge base expansion method and device, electronic equipment and storage medium
CN113158688B (en) * 2021-05-11 2023-12-01 科大讯飞股份有限公司 Domain knowledge base construction method, device, equipment and storage medium
CN113536776B (en) * 2021-06-22 2024-06-14 深圳价值在线信息科技股份有限公司 Method for generating confusion statement, terminal device and computer readable storage medium

Also Published As

Publication number Publication date
WO2024055582A1 (en) 2024-03-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination