CN111160512B - Method for constructing double-discriminant dialogue generation model based on generation type countermeasure network - Google Patents
- Publication number
- CN111160512B CN111160512B CN201911224148.2A CN201911224148A CN111160512B CN 111160512 B CN111160512 B CN 111160512B CN 201911224148 A CN201911224148 A CN 201911224148A CN 111160512 B CN111160512 B CN 111160512B
- Authority
- CN
- China
- Prior art keywords
- model
- rewriting
- reply
- rewrite
- discriminant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for constructing a dual-discriminator dialogue generation model based on a generative adversarial network. First, the corpus is processed to obtain a quaternary corpus containing similar dialogue information. A rewrite model and a discrimination model are then pre-trained: the former rewrites the matched similar reply to generate a reply better suited to the current context, while the latter distinguishes true sentences from false ones, i.e. whether a sentence comes from the corpus or from the rewrite model. Finally, the rewrite model and the discrimination model undergo adversarial learning, and the optimal rewriting effect is obtained through the game between them. By introducing two discriminators, the invention improves the generation model from multiple angles and makes substantial progress in the grammar, contextual relevance, and other aspects of the generated sentences.
Description
Technical Field
The invention relates to natural language processing, deep learning, and dialogue systems, and in particular to a method for constructing a dual-discriminator dialogue generation model based on a generative adversarial network (GAN).
Background
With the spread of smartphones and smart homes, human-machine interaction has become increasingly frequent, and users now expect smooth, fluent, and varied conversations with machines. The template-based dialogue systems currently common in industry struggle to meet these demands: a large number of dialogues are manually curated and defined as templates, and a user's input is matched against the predefined templates to obtain a fixed reply, an approach that covers few topics and carries a high labor cost. With advances in big data, deep learning, and computer hardware, academia has explored automatically generated dialogue systems extensively: big data supplies large analyzable corpora, deep learning provides complex computational frameworks, and hardware provides high-speed computation; together, the three drive the development of dialogue systems.
A generative dialogue system, as the name implies, creates the reply itself: no one gives the system fixed answers in advance; instead it is simply fed large amounts of data and learns on its own, a process of "self-learning" that relies on deep learning. The self-learning of most current generative dialogue systems is based on the seq2seq model: the model learns to encode the input and then decode it into a reply, improving by continually narrowing the gap between the generated sentence and the real one. However, this makes the model "lazy": it learns only the easiest generations and tends to produce generic replies such as "I don't know" or "OK". Moreover, because these traditional methods use a binary corpus of "context + reply", the reply sentences do not intervene in the generation process, so the model does not know what a correct sentence looks like and sometimes produces utterances with grammatical errors or poor semantics. A method is therefore needed to improve the diversity, contextual relevance, and grammatical accuracy of automatically generated dialogue systems.
Disclosure of Invention
The invention aims to address the shortcomings of existing generative dialogue-system models by providing a method for constructing a dual-discriminator dialogue generation model based on a generative adversarial network. The method focuses on two core problems of reply generation, grammar and semantics, and optimizes both effectively.
The specific technical scheme for realizing the aim of the invention is as follows:
A method for constructing a dual-discriminator dialogue generation model based on a generative adversarial network comprises the following specific steps:
Step 1: Corpus processing
According to the current context C, a text-matching algorithm is used to find similar contexts C' in a corpus, thereby obtaining the replies R' under those similar contexts and forming quaternary corpus entries <C, R, C', R'>;
step 2: initializing a rewrite model
Training the quaternary data obtained in the step 1 under a seq2seq frame to obtain a primary rewrite model, wherein the rewrite model can rewrite R 'by combining the contexts C and C' to generate a reply R, and the rewrite model at the moment does not reach an ideal effect and has larger loss;
step 3: initializing a discriminant model
Training two discriminators of a discrimination model by using the reply R generated in the step 2 and the true reply R, wherein the two discriminators respectively judge from two aspects of the rewriting effect and the context correlation, and the specific process is as follows:
a discriminator_1 for discriminating whether the effect of the rewriting is good or bad, which is inputted as a sentence before and after the rewriting, wherein the sentence before the rewriting is "True" and the sentence after the rewriting is "False", and the discriminator_1 is operative to discriminate the two classes as much as possible;
the discriminator_2 is used for discriminating whether the context correlation is good or bad, so that the input of the discriminator_2 is 'current context + generated reply' and 'current context + real reply', obviously, the context correlation of the discriminator_2 is strongest, the category of the discriminator_2 is marked as 'True', and the role of the discriminator_2 is to distinguish two categories as far as possible;
step 4: model rewriting and discriminant model countermeasure training
Updating parameters by the rewritten model according to feedback of the discrimination model, then transmitting the generated sentences to the discrimination model by the new rewritten model, updating the model parameters by the discrimination model by using the accuracy of discriminating true and false sentences, wherein in the countermeasure process, loss of both the models is in a descending trend until the models tend to be stable, and the countermeasure training is terminated; and the obtained rewritten model is the optimal model, namely the dialogue generating model.
In the process of countermeasure training, the lower the accuracy of the discriminator is, the better the effect of rewriting the model is, because the discriminator is "confused"; however, the discriminators also increase their discrimination with increasing effect of rewriting the model and decreasing accuracy of themselves, which is a so-called "countermeasure" process. When the accuracy of the loss of the generated model and the discrimination model is reduced to a certain degree and tends to be stable, the two models are considered to be 'equivalent' and the antagonism can be stopped. The generated model obtained at this time is the optimal model, and the rewritten model is taken as the final dialogue generated model.
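The alternating dynamic described above, where the rewrite model improves on the discriminators' feedback while the discriminators' accuracy falls toward the equilibrium, can be illustrated with a toy numeric simulation. This is a sketch of the training dynamic only: the scalar "quality" and "accuracy" quantities and their update rules below are illustrative assumptions, not the patent's actual loss functions.

```python
def adversarial_training(steps=200, lr=0.05):
    """Toy simulation of the adversarial process: the rewrite model
    improves using the discriminator's feedback, and the discriminator's
    accuracy drifts toward the equilibrium near chance level."""
    gen_quality = 0.0      # 0 = easily detected, 1 = indistinguishable
    disc_accuracy = 1.0    # starts perfect against the untrained rewriter
    for _ in range(steps):
        # Rewrite-model update: the stronger the discriminator,
        # the larger the learning signal it provides.
        gen_quality = min(1.0, gen_quality + lr * (disc_accuracy - 0.5))
        # Discriminator update: accuracy falls as generated replies
        # become harder to tell from real ones.
        disc_accuracy = 0.5 + 0.5 * (1.0 - gen_quality)
    return gen_quality, disc_accuracy

q, acc = adversarial_training()
# Training stabilizes with the discriminator near chance level (~0.5),
# the "evenly matched" stopping condition described in the text.
```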
Compared with the prior art, the invention has the following advantages:
1) Correct grammar: compared with generating a sentence from scratch, rewriting an existing sentence provides a sound grammatical basis;
2) Smooth semantics: the method rewrites a reply drawn from a similar context and therefore has a better contextual basis, so the resulting reply fits the context more closely;
3) Strong automatic learning: the rewrite model and the discriminators learn adversarially, each modifying its parameters according to the other party's performance rather than only its own loss.
Drawings
Fig. 1 is an overall framework diagram of the present invention.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments and the drawings. Except where specifically noted below, the procedures, conditions, and experimental methods for carrying out the invention are common knowledge in the art, and the invention is not specially limited in these respects.
Examples
Referring to Fig. 1, the invention provides a method for constructing a dual-discriminator dialogue generation model based on a generative adversarial network: a dialogue generation model is built by rewriting rather than creating from scratch, with the rewrite model learning through a game against the discriminators. As shown in the figure, given the current context C, a text-matching algorithm matches a similar context C' and its reply R'. The dashed box on the left of the figure is the rewrite model of this embodiment, based on the seq2seq framework: the encoder encodes R', and the decoder decodes while incorporating the difference diff(C, C') between C and C', yielding the generated reply R̂. The dashed box on the right is the discrimination model of this embodiment: discriminator_1 learns to distinguish the reply R̂ produced by the rewrite model from the real reply R, discriminator_2 learns to distinguish the false dialogue C+R̂ from the real dialogue C+R, and the feedback from the two discriminators is passed back to the rewrite model. The embodiment specifically comprises the following steps:
step 1: data preprocessing
According to observation and experience, the modification is easier than the creation from zero, so the embodiment proposes to obtain the reply of the dialogue by rewriting a sentence, and the corpus is processed first. At present, the existing dialogue corpora are paired, namely, the current context corresponds to a sentence reply (response), in order to have a better rewriting basis, the current context and the current context are ensured to be under similar context conditions, so that similar contexts are obtained according to a matching algorithm, corresponding replies are extracted to form a quaternary corpus < C, R, C ', R ' >, the text matching algorithm is directly called, and the first 10C ' with the highest matching score with C are obtained to form the required corpus. After the quaternary corpus is obtained, the data is further cleaned, the maximum sentence length is set to be 50 words, and the data is discarded after exceeding the threshold value. Further, the corpus is divided into a training set, a verification set and a test set, and the proportion is 7:2:1.
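The preprocessing above can be sketched as follows. This is a minimal illustration only: the Jaccard word-overlap score stands in for the unnamed text-matching algorithm, and all function names are hypothetical.

```python
import random

def match_score(a, b):
    """Hypothetical stand-in for the text-matching algorithm:
    Jaccard word-overlap similarity between two contexts."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def build_quad_corpus(pairs, top_k=10, max_len=50):
    """Turn binary (context, reply) pairs into <C, R, C', R'> quadruples,
    keeping the top_k most similar contexts and dropping any quadruple
    that contains a sentence longer than max_len words."""
    quads = []
    for i, (c, r) in enumerate(pairs):
        # Rank every other context by similarity to C.
        ranked = sorted((p for j, p in enumerate(pairs) if j != i),
                        key=lambda p: match_score(c, p[0]), reverse=True)
        for c2, r2 in ranked[:top_k]:
            if all(len(s.split()) <= max_len for s in (c, r, c2, r2)):
                quads.append((c, r, c2, r2))
    return quads

def split_corpus(quads, seed=0):
    """7:2:1 train/validation/test split, as in the embodiment."""
    shuffled = quads[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    return (shuffled[:int(0.7 * n)],
            shuffled[int(0.7 * n):int(0.9 * n)],
            shuffled[int(0.9 * n):])
```

For example, with three (context, reply) pairs and top_k=2, each context is paired with its two nearest neighbours, giving six quadruples before the split.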
Step 2: pre-training rewrite model
The rewriting model is based on a common framework of a generating model, namely seq2seq, the input of the framework is R ', the target output is R, the framework generates R by word, and an Attention mechanism is introduced in the decoding process, wherein the Attention of the Attention mechanism is from an edit vector diff (C, C'), R 'and a generated word sequence, and the edit vector is a difference word vector sequence of C and C'. The model is trained by using the training set obtained in the step 1, a model is obtained after each round of training is finished, the verification set obtained in the step 1 is used for verification, and the model with the best effect is selected as an initialization model for rewriting the model in the countermeasure process.
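The edit vector and the attention it feeds can be sketched with toy word vectors. The exact construction of diff(C, C') is not specified here, so the insert/delete reading below is an assumption, and every name (VOCAB, EMB, the query folding) is illustrative rather than the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["i", "like", "tea", "coffee", "very", "much"]
EMB = {w: rng.standard_normal(8) for w in VOCAB}  # toy 8-dim word vectors

def edit_vector(c, c_prime, dim=8):
    """One plausible reading of diff(C, C'): concatenate the mean
    embeddings of the words inserted into C and deleted from C'."""
    ins = [w for w in c.split() if w not in c_prime.split()]
    dele = [w for w in c_prime.split() if w not in c.split()]
    mean = lambda ws: (np.mean([EMB[w] for w in ws], axis=0)
                       if ws else np.zeros(dim))
    return np.concatenate([mean(ins), mean(dele)])  # shape (2 * dim,)

def attention(query, keys):
    """Dot-product attention over the encoded R' tokens (softmax weights)."""
    scores = keys @ query
    w = np.exp(scores - scores.max())
    return w / w.sum()

c, c_prime, r_prime = "i like coffee", "i like tea", "i like tea very much"
ev = edit_vector(c, c_prime)
keys = np.stack([EMB[w] for w in r_prime.split()])  # toy "encoder states"
query = ev[:8] - ev[8:]           # fold insert/delete halves into one query
weights = attention(query, keys)  # one attention weight per R' token
```

In the real model the keys would be recurrent encoder states of R' and the query would also depend on the words decoded so far; the sketch keeps only the edit-vector conditioning.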
Step 3: pre-training discriminant model
The decision maker is based on a neural network, which functions to decide whether a sentence/group of conversations is true or not. Before inputting sentences into a discriminator, firstly labeling sentences, wherein the labels from a corpus are 1, and the labels generated by a rewritten model are 0; the discriminator is then trained to distinguish as far as possible between the two classes of sentences. To distinguish the authenticity of sentences from different aspects, the present embodiment uses two discriminators whose inputs are shown in fig. 1, and uses R and R' to train the discriminators_1 to discriminate the syntacticity of sentences, and uses c+r and c+r to train the discriminators_2 to discriminate the context.
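The label-1/label-0 training scheme can be illustrated with a minimal logistic-regression discriminator over toy features. This is a stand-in for the neural discriminators described above: in the real system the input X would hold sentence (or context+reply) encodings, and all names here are illustrative.

```python
import numpy as np

def train_discriminator(X, y, lr=0.1, epochs=500):
    """Minimal logistic-regression discriminator: label 1 = from the
    corpus, label 0 = produced by the rewrite model."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(X.shape[1]) * 0.01
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
        grad = p - y                            # d(cross-entropy)/d(logit)
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def accuracy(w, b, X, y):
    """Fraction of real/generated examples classified correctly."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return float(((p > 0.5) == y).mean())
```

On well-separated toy features (e.g. "real" encodings clustered away from "generated" ones) the discriminator reaches high accuracy, which is exactly the pre-training goal before the adversarial phase begins.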
Step 4: writing model and discriminating model countermeasure learning
In the process of countermeasure learning, the object of the rewriting model is to generate a real deception judging model as much as possible, and the object of the judging model is to distinguish the reply generated by the rewriting model from the real reply as much as possible. Thus, the rewrite model and the discrimination model constitute a dynamic countermeasure. In the most ideal case, the rewrite model G may generate sufficient replies to "spurious true". For the discriminant model D, it is difficult to determine whether the reply generated by the rewritten model is true or not, and the accuracy of the discriminant is D (G) =0.5. In the countermeasure process, parameters are continuously adjusted to enable the accuracy of the judging model to approach 0.5, and meanwhile, the loss of the rewritten model is reduced and finally tends to be stable. The finally obtained dialogue generation model improves the generated reply in terms of grammar and context, and results of 0.629, 0.755 and 0.682 are respectively obtained on vector cosine similarity measurement index Greedy, average, extrema. This results in a generative model that can be used to generate the appropriate replies.
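Greedy, Average, and Extrema are standard embedding-based similarity metrics for dialogue evaluation; they are not defined in the text, so the sketch below follows their usual definitions, operating on raw word-vector matrices A and B (one row per word) for the generated and reference replies.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def average_metric(A, B):
    """'Average': cosine of the mean word vectors of the two replies."""
    return cos(A.mean(axis=0), B.mean(axis=0))

def extrema_metric(A, B):
    """'Extrema': cosine of the per-dimension extreme values
    (the value with the largest magnitude in each dimension)."""
    def ext(M):
        idx = np.abs(M).argmax(axis=0)
        return M[idx, np.arange(M.shape[1])]
    return cos(ext(A), ext(B))

def greedy_metric(A, B):
    """'Greedy': each word greedily matches its most similar word on
    the other side; the two matching directions are averaged."""
    S = np.array([[cos(a, b) for b in B] for a in A])
    return 0.5 * (S.max(axis=1).mean() + S.max(axis=0).mean())
```

All three metrics return 1.0 when the two replies have identical word vectors, and values such as the reported 0.629/0.755/0.682 indicate how close generated replies come to the references in embedding space.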
Claims (1)
1. A method for constructing a dual-discriminator dialogue generation model based on a generative adversarial network, characterized by comprising the following specific steps:
step 1: corpus processing
According to the current context C, matching similar contexts C 'in a corpus by using a text matching algorithm, so as to obtain replies R' under the similar contexts, and forming quaternary corpuses < C, R, C ', R' >;
step 2: initializing a rewrite model
Training the quaternary data obtained in the step 1 under a seq2seq frame to obtain a primary rewrite model, wherein the rewrite model can rewrite R 'by combining the contexts C and C', and a reply R is generated;
step 3: initializing a discriminant model
Training two discriminators of a discrimination model by using the reply R generated in the step 2 and the true reply R, wherein the two discriminators respectively judge from two aspects of the rewriting effect and the context correlation, and the specific process is as follows:
a discriminator_1 for discriminating the effect of rewriting, which is input as a sentence before and after rewriting, wherein the sentence type before rewriting is "True", the sentence type after rewriting is "False", and the discriminator_1 is to discriminate the two types;
a discriminator_2 for discriminating the context correlation, the inputs of which are "current context+generated reply" and "current context+true reply", the latter having the strongest context correlation, the former being "True", the discriminator_2 being to discriminate two categories;
step 4: model rewriting and discriminant model countermeasure training
The rewritten model updates parameters according to feedback of the discrimination model, the updated rewritten model transmits the generated sentences to the discrimination model, the discrimination model updates the model parameters by utilizing the accuracy of discriminating true and false sentences, and in the countermeasure process, loss of both the models is in a descending trend until the models tend to be stable, and the countermeasure training is terminated; and the obtained rewritten model is the optimal model, namely the dialogue generating model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911224148.2A CN111160512B (en) | 2019-12-04 | 2019-12-04 | Method for constructing double-discriminant dialogue generation model based on generation type countermeasure network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911224148.2A CN111160512B (en) | 2019-12-04 | 2019-12-04 | Method for constructing double-discriminant dialogue generation model based on generation type countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111160512A CN111160512A (en) | 2020-05-15 |
CN111160512B true CN111160512B (en) | 2023-06-13 |
Family
ID=70556431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911224148.2A Active CN111160512B (en) | 2019-12-04 | 2019-12-04 | Method for constructing double-discriminant dialogue generation model based on generation type countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111160512B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111783955A (en) * | 2020-06-30 | 2020-10-16 | 北京市商汤科技开发有限公司 | Neural network training method, neural network training device, neural network dialogue generating method, neural network dialogue generating device, and storage medium |
CN113076078B (en) * | 2021-03-11 | 2022-03-22 | 山东大学 | Hybrid drive-based conversational information acquisition method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004102596A (en) * | 2002-09-09 | 2004-04-02 | Mitsubishi Electric Corp | Language understanding model generating device |
CN108734276A (en) * | 2018-04-28 | 2018-11-02 | 同济大学 | A kind of learning by imitation dialogue generation method generating network based on confrontation |
US10152970B1 (en) * | 2018-02-08 | 2018-12-11 | Capital One Services, Llc | Adversarial learning and generation of dialogue responses |
CN110188182A (en) * | 2019-05-31 | 2019-08-30 | 中国科学院深圳先进技术研究院 | Model training method, dialogue generation method, device, equipment and medium |
CN110347792A (en) * | 2019-06-25 | 2019-10-18 | 腾讯科技(深圳)有限公司 | Talk with generation method and device, storage medium, electronic equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10115055B2 (en) * | 2015-05-26 | 2018-10-30 | Booking.Com B.V. | Systems methods circuits and associated computer executable code for deep learning based natural language understanding |
-
2019
- 2019-12-04 CN CN201911224148.2A patent/CN111160512B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004102596A (en) * | 2002-09-09 | 2004-04-02 | Mitsubishi Electric Corp | Language understanding model generating device |
US10152970B1 (en) * | 2018-02-08 | 2018-12-11 | Capital One Services, Llc | Adversarial learning and generation of dialogue responses |
CN108734276A (en) * | 2018-04-28 | 2018-11-02 | 同济大学 | A kind of learning by imitation dialogue generation method generating network based on confrontation |
CN110188182A (en) * | 2019-05-31 | 2019-08-30 | 中国科学院深圳先进技术研究院 | Model training method, dialogue generation method, device, equipment and medium |
CN110347792A (en) * | 2019-06-25 | 2019-10-18 | 腾讯科技(深圳)有限公司 | Talk with generation method and device, storage medium, electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111160512A (en) | 2020-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109003624B (en) | Emotion recognition method and device, computer equipment and storage medium | |
CN109977207A (en) | Talk with generation method, dialogue generating means, electronic equipment and storage medium | |
CN108710704B (en) | Method and device for determining conversation state, electronic equipment and storage medium | |
CN104050160B (en) | Interpreter's method and apparatus that a kind of machine is blended with human translation | |
CN111738016B (en) | Multi-intention recognition method and related equipment | |
CN110059169B (en) | Intelligent robot chat context implementation method and system based on corpus labeling | |
CN108847241A (en) | It is method, electronic equipment and the storage medium of text by meeting speech recognition | |
CN105095182A (en) | Reply information recommendation method and apparatus | |
KR102654480B1 (en) | Knowledge based dialogue system and method for language learning | |
CN111160512B (en) | Method for constructing double-discriminant dialogue generation model based on generation type countermeasure network | |
CN112364631B (en) | Chinese grammar error detection method and system based on hierarchical multitask learning | |
CN114676255A (en) | Text processing method, device, equipment, storage medium and computer program product | |
CN112463942A (en) | Text processing method and device, electronic equipment and computer readable storage medium | |
CN116821290A (en) | Multitasking dialogue-oriented large language model training method and interaction method | |
CN116187320A (en) | Training method and related device for intention recognition model | |
CN113326367B (en) | Task type dialogue method and system based on end-to-end text generation | |
CN113362815A (en) | Voice interaction method, system, electronic equipment and storage medium | |
CN117370512A (en) | Method, device, equipment and storage medium for replying to dialogue | |
CN112084788A (en) | Automatic marking method and system for implicit emotional tendency of image captions | |
CN111046674A (en) | Semantic understanding method and device, electronic equipment and storage medium | |
CN112002325B (en) | Multi-language voice interaction method and device | |
CN114330285B (en) | Corpus processing method and device, electronic equipment and computer readable storage medium | |
CN108962281A (en) | A kind of evaluation of language expression and householder method and device | |
CN114358019A (en) | Method and system for training intention prediction model | |
CN115017876A (en) | Method and terminal for automatically generating emotion text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |