CN116701566A - Multi-round dialogue model and dialogue method based on emotion - Google Patents


Info

Publication number
CN116701566A
Authority
CN
China
Prior art keywords
emotion
dialogue
attention
module
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310486999.4A
Other languages
Chinese (zh)
Inventor
王博
刘书雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202310486999.4A priority Critical patent/CN116701566A/en
Publication of CN116701566A publication Critical patent/CN116701566A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/01 Customer relationship services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses an emotion-based multi-round dialogue model and dialogue method. The multi-round dialogue model comprises a dialogue history embedding module, an emotion prediction module and an emotion embedding module. The dialogue history embedding module comprises a word-level encoder and a dialogue history self-attention module and is used for inputting dialogue history information into the multi-round dialogue model; different dialogue history information represents different rounds of dialogue. The emotion prediction module predicts the emotion of the reply, and the emotion category it predicts replaces a manually specified emotion category; the module obtains the emotion category from the dialogue history, outputting one emotion category for the sentence vectors of a section of dialogue history. The emotion embedding module generates a reply and embeds a specified emotion category in it, the embedded category being provided by the emotion prediction module, so that the multi-round dialogue model finally generates a reply with emotion and the reply quality is improved.

Description

Multi-round dialogue model and dialogue method based on emotion
Technical Field
The application relates to the technical field of multi-round dialogue and deep learning, in particular to a multi-round dialogue modeling method based on emotion.
Background
With the rapid development of artificial intelligence technology, computing power and computing infrastructure, the possibilities of intelligent dialogue systems have increased, and the application of dialogue systems in social life is becoming increasingly widespread. Various intelligent assistants on intelligent terminal devices can complete simple tasks according to the input information (including voice and text) of a user, such as querying the weather, making a phone call, adding schedule items and engaging in simple chit-chat. Intelligent customer service in every industry can also provide users with detailed explanations of plans and terms according to their query requests. Intelligent dialogue systems not only meet the needs of users but also save a great deal of labor cost for the platform side and improve overall productivity.
Although the nature of a dialogue system is always to reply to user input, dialogue systems can be broadly divided into two categories depending on the particular application: task-oriented dialogue systems and open-domain dialogue systems. A task-oriented dialogue system helps a user complete a particular task through dialogue. In contrast, an open-domain dialogue system does not aim to help the user accomplish a specific task, but interacts with the user through coherent and engaging conversation. The replies generated by such systems need to be as close as possible to human replies [1].
Open-domain dialogue systems have more abstract goals than task-oriented dialogue systems, and the replies generated by the system have no explicit standard answer. Because explicit mathematical evaluation criteria are lacking, optimizing such a system is more difficult. Open-domain dialogue therefore poses greater challenges, requiring the system to understand the user's context and emotional needs in depth. In addition, because the replies generated by an open-domain dialogue system are context-dependent, consistency with the dialogue history and with the speaker's personality and attributes must be considered. The dialogue quality of open-domain systems has improved greatly, but the replies they generate still differ from human replies. Open-domain dialogue systems are therefore harder to research and have a relatively narrower range of application, yet they are not independent of task-oriented dialogue systems, which can also provide everyday conversation functions. Research on open-domain dialogue systems can enhance the user experience of dialogue systems.
Compared with single-round dialogue, multi-round dialogue can simulate human linguistic behavior across multiple conversational turns. Such models must consider context information, dialogue history and so on to determine the response of the dialogue system, and multi-round dialogue better matches the conversational process between humans in real life. Recent work on multi-round dialogue generation has mainly focused on incorporating dialogue history into the model. In 2016, Serban et al. [5] proposed the HRED model, which, unlike ordinary dialogue models, uses a recurrent neural network to fully consider context information. After Vaswani et al. [6] proposed the Transformer model in 2017, most work on multi-round dialogue systems [8,9] adopted the attention mechanism for processing.
Secondly, emotional communication is an important problem in human-machine dialogue systems. Many works [2,3,4] indicate that, in addition to the content of replies, emotional communication between machine and person is also an important issue. Humans can express their own emotions through language and thereby obtain comfort. To communicate emotion effectively with humans, a dialogue system must have emotion-understanding capability, including identifying the user's emotion and expressing an appropriate emotion. If the dialogue system has no emotional factor, the user may feel that its replies are stiff. How to endow a machine with the capability to understand and express emotion is therefore a new opportunity and challenge in the field of human-machine dialogue. A typical emotional dialogue system is the ECM model proposed by Zhou H et al. [10] in 2018, which can generate a response carrying a particular emotion by specifying that emotion.
Adding emotional factors to a multi-round dialogue system and learning the emotional changes within multi-round dialogue bring the replies generated by the system closer to those of humans, improving the user's experience of the dialogue system. Generating a reply from learned emotional change conforms better to the rules of human dialogue than generating it from a manually specified emotion, and helps the dialogue system produce more natural replies.
[ reference ]
[1]Chen H,Liu X,Yin D,et al.A survey on dialogue systems:Recent advances and new frontiers[J].Acm Sigkdd Explorations Newsletter,2017,19(2):25-35.
[2]Hazarika D,Poria S,Mihalcea R,et al.Icon:Interactive conversational memory network for multimodal emotion detection[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.2018:2594-2604.
[3]Zhou H,Huang M,Zhang T,et al.Emotional chatting machine:Emotional conversation generation with internal and external memory[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018,32(1).
[4]Asghar N,Poupart P,Hoey J,et al.Affective neural response generation[C]//European Conference on Information Retrieval.Springer,Cham,2018:154-166.
[5]Serban I,Sordoni A,Bengio Y,et al.Building end-to-end dialogue systems using generative hierarchical neural network models[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2016,30(1).
[6]Vaswani A,Shazeer N,Parmar N,et al.Attention is all you need[J].Advances in neural information processing systems,2017,30.
[7]Sun B,Feng S,Li Y,et al.Generating relevant and coherent dialogue responses using self-separated conditional variational autoencoders[J].arXiv preprint arXiv:2106.03410,2021.
[8]Paulhus,D.L.,Vazire,S.2007.The self-report method.Handbook of research methods in personality psychology,1,224-239.
[9]Xu J,Lei Z,Wang H,et al.Discovering dialog structure graph for coherent dialog generation[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing(Volume 1:Long Papers).2021:1726-1739.
[10]Zhou H,Huang M,Zhang T,et al.Emotional chatting machine:Emotional conversation generation with internal and external memory[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018,32(1).
Disclosure of Invention
The application aims to overcome the defects in the prior art and provide a multi-round dialogue model and a dialogue method based on emotion, namely emotion factors are added into a multi-round dialogue system, so that more reasonable replies are generated.
The application aims at realizing the following technical scheme:
a multi-round dialogue model based on emotion comprises a dialogue history embedding module, an emotion prediction module and an emotion embedding module;
the dialogue history embedding module comprises a word level encoder and a dialogue history self-attention module, and is used for inputting dialogue history information into the emotion-based multi-round dialogue model; different dialog history information represents different rounds of dialog;
the emotion prediction module is used for predicting the replied emotion, and the emotion type predicted by the emotion prediction module replaces the manually appointed emotion type; the emotion prediction module obtains emotion types through the dialogue history, wherein the emotion prediction module outputs one emotion type for a sentence vector of a section of dialogue history;
and the emotion embedding module is used for generating a reply and embedding a designated emotion type in the reply, wherein the embedded emotion type is provided by the emotion prediction module, so that the multi-round dialogue model generates the reply with emotion finally, and the reply quality is improved.
Further, the method comprises the following steps:
(1) Define the dialogue histories as C_1 = {s_1, ..., s_N} and C_2 = {t_1, ..., t_N}, where C_1 and C_2 respectively denote the dialogue histories of speaker one and speaker two, s_i and t_i respectively denote sentences in the two speakers' dialogue histories, and i denotes the i-th sentence, i = 1, 2, ..., N. Taking s_i as an example, s_i = {x_1, ..., x_M}, where x_k denotes a word in the sentence. The context encoding h_M is computed as follows:
i_k = σ(W_i [h_{k-1}, ω_k])
f_k = σ(W_f [h_{k-1}, ω_k])
o_k = σ(W_o [h_{k-1}, ω_k])
l_k = tanh(W_l [h_{k-1}, ω_k])
c_k = f_k ⊙ c_{k-1} + i_k ⊙ l_k
h_k = o_k ⊙ tanh(c_k)
where i_k, f_k and o_k are respectively the input gate, the memory gate and the output gate; ω_k is the word embedding of the corresponding word x_k; h_k denotes the hidden variable corresponding to word k in the LSTM and is computed from ω_k and h_{k-1}; c_k denotes the state of the neural cell at time k; σ denotes the sigmoid function; W_i, W_f, W_o and W_l are parameter matrices whose parameters are initialized with given random numbers and continuously adjusted through training;
(2) Given a matrix of n query vectors Q ∈ R^{n×d}, a key matrix K ∈ R^{n×d} and a value matrix V ∈ R^{n×d}, the dot-product attention is computed as:
Attention(Q, K, V) = softmax(QK^T / √d) V
where d is the number of hidden units in the dialogue history self-attention module, which consists of H heads, each head focusing on a different part of the channel; for the i-th head, W_i^Q, W_i^K and W_i^V denote the parameters learned through training corresponding to the queries, keys and values respectively. The dialogue history self-attention module is formulated as:
M_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
after each parallel head passes the calculation, all vectors are connected together to form one vector, and the calculation process is as follows:
M=Contact(M 1 ,...,M H )
O=MW
wherein the method comprises the steps ofAnd->Wherein O is the output of the dialogue history vectors of two speakers obtained by the self-attention module, and O is used for the output of the speaker I 1 To express, output of speaker II is represented by O 2 Representing; for O 1 And O 2 Outputting speaker conversation history representations +.>And->Calculation ofThe process is as follows:
wherein FNN represents a feedforward neural network;
(3) The vector representation g of the dialogue history is input into a fully connected layer, which maps the hidden-layer-dimension vector into a vector whose dimension equals the number of categories; the calculation process is as follows:
h=ReLU(W 1 g+b 1 )
where ReLU is the activation function, W_1 and b_1 are learnable parameters of the fully connected layer, and h is the resulting hidden vector. The vector h is passed through a softmax layer to finally obtain E = {e_1, ..., e_M}, the probabilities of the reply emotion categories corresponding to the dialogue history; the emotion category z with the highest probability is taken as the final result:
e_i = softmax(w_i h + b_i),  z = argmax_i e_i
where e_i is the probability that the current dialogue history information belongs to the i-th emotion category, and w_i and b_i are respectively the weight and bias of the i-th category;
(4) Constructing a speaker attention decoder;
Using the speaker-one dialogue history representation H̃_1 as the keys and values in the attention mechanism and the hidden representation O_r of the response as the query, the output of the speaker-one attention decoder is denoted O_d1 and computed as:
O_d1 = Attention(O_r, H̃_1, H̃_1)
where Attention is the attention computation. Using H̃_2 as the keys and values in the attention mechanism and the output O_d1 of the speaker-one attention decoder as the query, the output of the speaker-two attention decoder is denoted O_d2 and computed as:
O_d2 = Attention(O_d1, H̃_2, H̃_2)
Finally, the output O_d2 of the speaker-two attention decoder is passed through a feedforward neural network to obtain O_d:
O_d = FNN(O_d2)
wherein FNN represents a feedforward neural network;
(5) Emotion embedding;
In the state update of the emotion embedding module, the state of the emotion-based multi-round dialogue model is updated by additionally inputting an emotion vector; the emotion embedding module is calculated as follows:
h j =LSTM([Emb(y j-1 );e j ],h j-1 )
where e_j denotes the emotion vector representation; Emb(y_{j-1}) denotes the word vector representation of the previously predicted word, computed by the word2vec method; h_{j-1} denotes the state of the LSTM at the previous time step, where at the first time step the state is initialized from the encoder and Emb(y_0) is the word vector representation of the sentence-start identifier <EOS>; [·;·] denotes the operation of concatenating two feature vectors, the two feature vectors being separated by ";". The emotion vector representation e_j is determined by the emotion category z predicted by the emotion prediction module and is computed as:
e_j = Σ_{k=1}^{T_z} a_jk w_k^z
where w_k^z denotes the k-th word of emotion category z in the emotion dictionary, T_z is the number of words of emotion category z in the emotion dictionary, and α, β and γ are all learnable parameters in the emotion embedding module; a_jk denotes the attention over the k-th word of the emotion dictionary at time step j and is computed from c_jk, which is determined by the hidden state h_{j-1} of the previous time step, the output O_d of the speaker attention mechanism, and the word vector representation w_k^z of the k-th word of category z in the emotion vocabulary.
The application also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the emotion-based multi-round dialog method when executing the program.
The present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the emotion-based multi-round dialog method.
Compared with the prior art, the technical scheme of the application has the following beneficial effects:
1. Using the self-attention-based dialogue history method that distinguishes speakers, the generated replies can better reference the dialogue history information and are more reasonable and diverse; the model gains the ability to distinguish differences between speakers, effectively improving the quality of its generated replies.
2. Using the emotion prediction module and the emotion embedding module, the multi-round dialogue model gains the ability to understand and express emotion, so the model can automatically derive an appropriate emotion from the dialogue history and generate an emotional reply free of logical and grammatical errors.
3. Whereas prior work mostly embeds the dialogue history as one long text, the application designs a self-attention-based dialogue history method that distinguishes the speakers, which better helps the multi-round dialogue model to distinguish speakers and to generate more reasonable replies.
Drawings
FIG. 1 is a schematic diagram of the overall framework of the emotion-based multi-round dialog method of the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and the specific examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The embodiment provides a multi-round dialogue model based on emotion, which comprises three parts: the system comprises a dialogue history embedding module, an emotion prediction module and an emotion embedding module.
The dialogue history embedding module is used for inputting dialogue history information into the emotion-based multi-round dialogue model. Different dialogue history information represents different rounds of dialogue; compared with previous work that embeds the dialogue history as one long text, this embodiment designs a dialogue history method that distinguishes speakers based on a self-attention mechanism.
The emotion prediction module predicts the emotion of the reply, and its predicted emotion category replaces a manually specified one. The module obtains the emotion category from the dialogue history. Its task is similar to text emotion classification, which takes a sentence vector as input and outputs the emotion category of that sentence; the emotion prediction task takes the sentence vectors of a section of dialogue history as input and outputs one emotion category, so this embodiment draws on the text emotion classification task.
The emotion embedding module generates the reply and embeds a specified emotion category in it; the embedded emotion category is provided by the emotion prediction module, so the model finally generates a reply with appropriate emotion, improving the reply quality.
Specifically, as shown in fig. 1, the workflow of the multi-round dialogue method of the present embodiment is as follows:
1 dialogue history embedding module
1.1 word level encoder
Define the dialogue histories as C_1 = {s_1, ..., s_N} and C_2 = {t_1, ..., t_N}, where C_1 and C_2 respectively denote the dialogue histories of speaker one and speaker two, and s_i and t_i respectively denote sentences in the two speakers' dialogue histories, where i denotes the i-th sentence. Taking s_i as an example, s_i = {x_1, ..., x_M}, where x_k denotes a word in the sentence. The context encoding h_M is computed as follows:
i_k = σ(W_i [h_{k-1}, ω_k])
f_k = σ(W_f [h_{k-1}, ω_k])
o_k = σ(W_o [h_{k-1}, ω_k])
l_k = tanh(W_l [h_{k-1}, ω_k])
c_k = f_k ⊙ c_{k-1} + i_k ⊙ l_k
h_k = o_k ⊙ tanh(c_k)
where i_k, f_k and o_k are respectively the input, memory and output gates; ω_k is the word embedding of the corresponding word x_k; h_k denotes the hidden variable corresponding to word k in the LSTM and is computed from ω_k and h_{k-1}; c_k denotes the state of the neural cell at time k; σ denotes the sigmoid function; W_i, W_f, W_o and W_l are parameter matrices whose parameters are initialized with given random numbers and continuously adjusted through training.
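As a concrete illustration of the word-level encoder above, the gate equations can be sketched in NumPy. The parameter shapes, random initialization and helper names here are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(h_prev, c_prev, w_k, params):
    """One LSTM step over word embedding w_k, following the gate equations
    above. Parameter names (Wi, Wf, Wo, Wl) mirror the text; shapes are
    illustrative assumptions."""
    Wi, Wf, Wo, Wl = params              # each of shape (hidden, hidden + embed)
    z = np.concatenate([h_prev, w_k])    # [h_{k-1}, omega_k]
    i = sigmoid(Wi @ z)                  # input gate i_k
    f = sigmoid(Wf @ z)                  # memory (forget) gate f_k
    o = sigmoid(Wo @ z)                  # output gate o_k
    l = np.tanh(Wl @ z)                  # candidate update l_k
    c = f * c_prev + i * l               # cell state c_k
    h = o * np.tanh(c)                   # hidden state h_k
    return h, c

def encode_sentence(embeddings, params, hidden):
    """Run the word-level encoder over a sentence; the final h is the
    context encoding h_M."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for w in embeddings:
        h, c = lstm_step(h, c, w, params)
    return h  # h_M
```

Each sentence of a speaker's history would be encoded this way to obtain the sentence vectors fed to the self-attention module.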
1.2. Dialogue history self-attention module
The dialogue self-attention mechanism captures long-range dependency information well and is computationally efficient. The specific steps are as follows:
Given a matrix of n query vectors Q ∈ R^{n×d}, a key matrix K ∈ R^{n×d} and a value matrix V ∈ R^{n×d}, the dot-product attention is computed as:
Attention(Q, K, V) = softmax(QK^T / √d) V
where d is the number of hidden units in the dialogue history self-attention module, which consists of H heads, each head focusing on a different part of the channel. For the i-th head, this application uses W_i^Q, W_i^K and W_i^V to denote the parameters learned through training corresponding to the queries, keys and values respectively. The dialogue history self-attention module is formulated as:
M_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
after each parallel head passes the calculation, all vectors are connected together to form one vector, and the calculation process is as follows:
M=Contact(M 1 ,...,M H )
O=MW
wherein the method comprises the steps ofAnd->Wherein O is the output of the dialogue history vectors of two speakers obtained by the self-attention module, and O is used for the output of the speaker I 1 To express, output of speaker II is represented by O 2 The representation is performed. For O 1 And O 2 Outputting a representation of a speaker conversation history using a feedforward neural network, respectively>And->The calculation process is as follows:
wherein FNN represents a feedforward neural network.
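The scaled dot-product attention and the multi-head concatenation described above can be sketched as follows. The head projections are illustrative assumptions, and the final mixing matrix W is omitted (folded into the identity) for brevity:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def multi_head_self_attention(X, heads):
    """Self-attention over sentence vectors X (n x d). `heads` is a list of
    (Wq, Wk, Wv) projection triples, one per head; each head's output M_i is
    concatenated as in M = Concat(M_1, ..., M_H)."""
    outs = [dot_product_attention(X @ Wq, X @ Wk, X @ Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outs, axis=-1)
```

In the model, X would be the stack of sentence vectors of one speaker's dialogue history, giving O_1 or O_2 after the (omitted) projection W.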
2 emotion prediction module
The vector representation g of the dialogue history is input into a fully connected layer, which maps the hidden-layer-dimension vector into a vector whose dimension equals the number of categories; the calculation process is as follows:
h=ReLU(W 1 g+b 1 )
where ReLU is the activation function, W_1 and b_1 are learnable parameters of the fully connected layer, and h is the resulting hidden vector. The vector h is passed through a softmax layer to finally obtain E = {e_1, ..., e_M}, the probabilities of the reply emotion categories corresponding to the dialogue history; the emotion category z with the highest probability is taken as the final result:
e_i = softmax(w_i h + b_i),  z = argmax_i e_i
where e_i is the probability that the current dialogue history information belongs to the i-th emotion category, and w_i and b_i are respectively the weight and bias of the i-th category.
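A minimal sketch of the emotion prediction step (ReLU fully connected layer followed by softmax and argmax). Stacking the per-class weights w_i and biases b_i into Ws and bs is an assumption made for illustration:

```python
import numpy as np

def predict_emotion(g, W1, b1, Ws, bs):
    """Map the dialogue-history vector g to an emotion class: a ReLU
    fully-connected layer followed by a softmax layer, as described above.
    Ws/bs stack the per-class weights w_i and biases b_i."""
    h = np.maximum(0.0, W1 @ g + b1)   # h = ReLU(W1 g + b1)
    logits = Ws @ h + bs
    e = np.exp(logits - logits.max())
    e = e / e.sum()                    # E = {e_1, ..., e_M}, a distribution
    z = int(np.argmax(e))              # category with highest probability
    return z, e
```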
3 emotion embedding module
3.1 speaker attention decoder construction
In this application, the speaker-one dialogue history representation H̃_1 is used as the keys and values in the attention mechanism and the hidden representation O_r of the response as the query; the output of the speaker-one attention decoder is denoted O_d1 and computed as:
O_d1 = Attention(O_r, H̃_1, H̃_1)
where Attention is the attention computation. Using H̃_2 as the keys and values in the attention mechanism and the output O_d1 of the speaker-one attention decoder as the query, the output of the speaker-two attention decoder is denoted O_d2 and computed as:
O_d2 = Attention(O_d1, H̃_2, H̃_2)
Similarly, Attention is the attention computation. Finally, the output O_d2 of the speaker-two attention decoder is passed through a feedforward neural network to obtain O_d:
O_d = FNN(O_d2)
wherein FNN represents a feedforward neural network.
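The chained speaker attentions of the decoder can be sketched as follows, assuming the Attention(query, keys, values) form used in this description; the `fnn` callable stands in for the feed-forward network and is an assumption:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention with query Q, keys K, values V."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def speaker_attention_decode(O_r, H1, H2, fnn):
    """Chain the two speaker attentions: the response representation O_r
    queries speaker one's history H1, the result then queries speaker two's
    history H2, and a feed-forward net yields the final representation O_d."""
    O_d1 = attention(O_r, H1, H1)    # speaker-one attention decoder
    O_d2 = attention(O_d1, H2, H2)   # speaker-two attention decoder
    return fnn(O_d2)                 # O_d = FNN(O_d2)
```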
3.2 emotion embedding
The main body of this part is an LSTM model; unlike an ordinary LSTM, the state update of the emotion embedding module additionally takes an emotion vector as input to update the model state. The module is calculated as shown in the formula:
h j =LSTM([Emb(y j-1 );e j ],h j-1 )
where e_j denotes the emotion vector representation; Emb(y_{j-1}) denotes the word vector representation of the previously predicted word, computed by the word2vec method; h_{j-1} denotes the state of the LSTM at the previous time step, where at the first time step the state is initialized from the encoder and Emb(y_0) is the word vector representation of the sentence-start identifier <EOS>; [·;·] denotes the operation of concatenating two feature vectors, the two feature vectors being separated by ";". The emotion vector representation e_j is determined by the emotion category z predicted by the emotion prediction module and is computed as:
e_j = Σ_{k=1}^{T_z} a_jk w_k^z
where w_k^z denotes the k-th word of emotion category z in the emotion dictionary, T_z is the number of words of emotion category z in the emotion dictionary, and α, β and γ are all learnable parameters in the emotion embedding module.
The LSTM model enables the calculation of the attention score. a_jk denotes the attention over the k-th word of the emotion dictionary at time step j; it is computed from c_jk, which is determined by the hidden state h_{j-1} of the previous time step, the output O_d of the speaker attention mechanism, and the word vector representation w_k^z of the k-th word of category z in the emotion vocabulary. During the generation of each word in the reply, this vocabulary-attention-based method ensures that the more relevant a word in the emotion vocabulary is to the given emotion, the larger its influence on the emotion vector, and the more relevant emotion words obtain a higher generation probability, realizing emotion embedding in the reply.
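The vocabulary-attention construction of the emotion vector might be sketched as below. Since the exact c_jk scoring formula is not legible in the source, a simple additive score built from h_{j-1}, O_d and the lexicon word vectors is assumed here, with α, β, γ as learnable weight vectors:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def emotion_vector(h_prev, O_d, emo_words, alpha, beta, gamma):
    """Sketch of the lexicon-attention emotion vector e_j: each word vector
    w_k^z of the predicted category's emotion lexicon is scored against the
    previous decoder state h_{j-1} and the speaker-attention output O_d,
    then the lexicon words are averaged by attention weight a_jk.
    The additive scoring form is an assumption."""
    scores = np.array([alpha @ h_prev + beta @ O_d + gamma @ w
                       for w in emo_words])      # c_jk, one per lexicon word
    a = softmax(scores)                          # a_jk over the T_z words
    return sum(a_k * w for a_k, w in zip(a, emo_words))  # e_j
```

The resulting e_j would then be concatenated with Emb(y_{j-1}) and fed to the LSTM state update h_j = LSTM([Emb(y_{j-1}); e_j], h_{j-1}).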
This embodiment performs experiments on two real datasets, Daily-Dialog and MELD. The Daily-Dialog dataset was collected from an English learning site; because the data originates from a learning site, its dialogues are more grammatically standard. The Daily-Dialog dataset has 13000 groups of dialogues, each group averaging 8 pairs of sentences, and the dialogues cover more than ten topics including work, health, travel, economics and politics. In addition, the Daily-Dialog dataset is rich in emotion, with emotion labels categorized into seven major classes: neutral, surprise, happiness, anger, sadness, fear and disgust. The MELD dataset, published by Poria et al. in 2018 and derived from excerpted clips of the TV series Friends, is a multi-modal dataset that contains not only dialogue text but also the corresponding audio and video. The MELD text likewise covers seven emotions: anger, disgust, sadness, happiness, neutrality, surprise and fear. The MELD dataset contains 1400 groups of dialogues comprising 13000 utterances in total. The characteristics of these two datasets are summarized in Table 1.
Table 1 dataset information statistics
For the evaluation of the replies generated by the multi-round dialogue model, both automatic evaluation indexes and manual evaluation indexes are adopted. For automatic evaluation, this embodiment adopts a language-model-based metric (PPL) and word-overlap-based metrics (BLEU, Distinct-1, Distinct-2). The perplexity (PPL) is calculated as follows:

PPL = p(w_1, ..., w_N)^(-1/N)

where p(w_1, ..., w_N) is the probability that the generated sentence conforms to the language model and N is the sentence length. Therefore, the smaller the perplexity value, the more grammatically correct the replies generated by the multi-round dialogue model are and the better they fit the language model. However, some generic replies such as "I don't know" obtain the lowest perplexity in dialogue tasks, so perplexity alone cannot truly judge the quality of the replies generated by a dialogue model.
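The perplexity computation can be sketched directly from per-token probabilities. A minimal example, assuming the language model already supplies p(w_i | w_1..w_{i-1}) for each token of the sentence:

```python
import math

def perplexity(token_probs):
    """Perplexity of one sentence from its per-token model probabilities.

    token_probs: list of p(w_i | w_1..w_{i-1}), one per token.
    Lower perplexity means the reply better fits the language model.
    """
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)
```

For example, a 4-token sentence whose tokens each have probability 0.25 has perplexity 4.0, matching the formula p(s)^(-1/N).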
The BLEU index evaluates the model using the principle of word overlap: it measures the similarity between the generated replies and the reference replies in the test set, with values ranging from 0 to 1. BLEU can be divided into several indexes according to the n-gram used, where n usually takes 1, 2, 3 or 4 and denotes the maximum number of words considered in each comparison. The higher the overlap, the higher the BLEU value; for the same sentence pair, the BLEU score with a small n is generally higher than with a large n. In the experiments of this embodiment, n = 2 is selected.
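A minimal sketch of BLEU with n = 2 (clipped n-gram precision with a brevity penalty); the function names and whitespace tokenization are assumptions, and production evaluation would normally use an established BLEU implementation rather than this sketch:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu2(candidate, reference):
    """BLEU with n = 2: geometric mean of clipped 1- and 2-gram
    precision, times a brevity penalty for too-short candidates."""
    precisions = []
    for n in (1, 2):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(candidate) >= len(reference) \
        else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 2)
```

An identical candidate and reference give a score of 1.0, and a candidate sharing no words with the reference gives 0.0.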
The Distinct index represents the ratio of the number of different n-grams to the total number of n-grams in the generated sentences. The purpose of this index is to evaluate the diversity and originality of the generated replies. Dialogue models tend to produce generic replies that carry no specific meaning in the conversation, yet such replies may still not score low on other indexes; the Distinct index therefore evaluates whether the model produces too many generic or repetitive replies. Its calculation formula is:

Distinct-n = U / G

where U represents the number of distinct n-gram types and G represents the total number of n-grams. The larger the Distinct value, the more n-gram types and the fewer repetitions the reply contains, so the Distinct index can be used to evaluate the diversity of the model's replies.
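The ratio U/G above can be sketched in a few lines; the function name and tokenization are assumptions:

```python
def distinct_n(tokens, n):
    """Distinct-n = (number of unique n-gram types U) / (total n-grams G)."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    return len(set(grams)) / len(grams)
```

A highly repetitive reply such as "i am i am" yields Distinct-2 = 2/3, since the bigram (i, am) occurs twice among three bigrams.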
In manual evaluation, two dimensions are assessed: content and emotion. Concretely, 50 dialogues are randomly extracted for each emotion category, and the emotion category together with the dialogue history is input into the present model and the other baseline models to obtain a reply from each. The generated replies, the dialogue histories and the specified emotion categories are then handed to four researchers in the field, who score them on content and emotion; the content index score ranges from 0 to 2 and the emotion index score from 0 to 1. The content index evaluates whether a reply is coherent and relevant to the dialogue history: a score of 0 means the reply has grammar problems or is inconsistent with the dialogue history; a score of 1 means the grammar is correct but the reply is meaningless; a score of 2 means the generated reply is grammatically correct, relevant to the dialogue history and fully consistent with the previous speaker. The emotion index evaluates whether the generated reply matches the specified emotion category: a score of 0 means the reply emotion is inconsistent with the specified emotion category; a score of 1 means the generated reply emotion is consistent with the specified emotion category.
Tables 2 and 3 show the automatic evaluation results on the Daily-Dialog dataset and the MELD dataset, respectively. Tables 4 and 5 show the manual evaluation results of the present application on the Daily-Dialog dataset and the MELD dataset, respectively. The results are compared with the existing traditional emotion dialogue models Seq2Seq and ECM.
TABLE 2 Daily-Dialog dataset automatic evaluation
Table 3 automatic evaluation of MELD dataset
TABLE 4 Daily-Dialog dataset manual evaluation
Table 5 manual evaluation of MELD dataset
The automatic evaluation results show that, on both datasets, the present application outperforms the two baseline models on the emotion-based multi-round dialogue modeling task in terms of the automatic evaluation indexes, which do not themselves reflect emotion. In terms of manual evaluation, although the ranking of individual model scores under different emotion categories in the Daily-Dialog and MELD datasets may vary, the present application achieves the best performance in both content and emotion when the scores are averaged over all categories.
Preferably, the embodiment of the present application further provides a specific implementation of an electronic device capable of implementing all the steps of the emotion-based multi-round dialogue method in the foregoing embodiment, where the electronic device specifically includes the following:
a processor (processor), a memory (memory), a communication interface (Communications Interface), and a bus;
the processor, the memory and the communication interface complete communication with each other through buses; the communication interface is used for realizing information transmission among relevant equipment such as server-side equipment, metering equipment and user-side equipment.
The processor is configured to invoke the computer program in the memory, and when the processor executes the computer program, the processor implements all the steps in the emotion-based multi-round dialogue method in the above embodiment.
The embodiment of the present application also provides a computer-readable storage medium capable of implementing all the steps in the emotion-based multi-round dialogue method in the above embodiment, on which a computer program is stored, which when executed by a processor implements all the steps in the emotion-based multi-round dialogue method in the above embodiment.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a hardware+program class embodiment, the description is relatively simple, as it is substantially similar to the method embodiment, as relevant see the partial description of the method embodiment.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The application is not limited to the embodiments described above. The above description of specific embodiments is intended to describe and illustrate the technical aspects of the present application, and is intended to be illustrative only and not limiting. Numerous specific modifications can be made by those skilled in the art without departing from the spirit of the application and scope of the claims, which are within the scope of the application.

Claims (4)

1. The emotion-based multi-round dialogue model is characterized by comprising a dialogue history embedding module, an emotion prediction module and an emotion embedding module;
the dialogue history embedding module comprises a word level encoder and a dialogue history self-attention module, and is used for inputting dialogue history information into the emotion-based multi-round dialogue model; different dialog history information represents different rounds of dialog;
the emotion prediction module is used for predicting the replied emotion, and the emotion type predicted by the emotion prediction module replaces the manually appointed emotion type; the emotion prediction module obtains emotion types through the dialogue history, wherein the emotion prediction module outputs one emotion type for a sentence vector of a section of dialogue history;
and the emotion embedding module is used for generating a reply and embedding a designated emotion type in the reply, wherein the embedded emotion type is provided by the emotion prediction module, so that the multi-round dialogue model generates the reply with emotion finally, and the reply quality is improved.
2. A multi-round dialogue method based on emotion and based on the multi-round dialogue model as claimed in claim 1, characterized by comprising the following steps:
(1) Define the dialogue histories as C_1 = {s_1, ..., s_N} and C_2 = {t_1, ..., t_N}, where C_1 and C_2 respectively represent the dialogue histories of speaker one and speaker two, s_i and t_i respectively represent the sentences in the two speakers' dialogue histories, and i denotes the ith sentence, i = 1, 2, ..., N; taking s_i as an example, s_i = {x_1, ..., x_M}, where x_k represents a word in the sentence; the context encoding h_M is computed as follows:
i_k = σ(W_i [h_{k-1}, ω_k])
f_k = σ(W_f [h_{k-1}, ω_k])
o_k = σ(W_o [h_{k-1}, ω_k])
l_k = tanh(W_l [h_{k-1}, ω_k])
c_k = f_k ⊙ c_{k-1} + i_k ⊙ l_k
h_k = o_k ⊙ tanh(c_k)
where i_k, f_k and o_k are the input gate, forget gate and output gate respectively; ω_k is the word embedding of the corresponding word x_k; h_k is the hidden variable corresponding to word k in the LSTM, obtained by combining ω_k and h_{k-1}; c_k is the state of the neural cell at time k; σ is the sigmoid function; W_i, W_f, W_o and W_l are parameter matrices, initialized with given random numbers and continuously adjusted through training;
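The six equations above describe one standard LSTM step. A minimal NumPy sketch under assumed shapes (each gate matrix has shape (d, d + e) and acts on the concatenation of h_{k-1} and the word embedding):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(h_prev, c_prev, w_emb, W_i, W_f, W_o, W_l):
    """One LSTM step following the claim's equations.

    h_prev, c_prev: previous hidden and cell states, shape (d,)
    w_emb:          word embedding ω_k, shape (e,)
    W_*:            gate parameter matrices, shape (d, d + e)
    """
    x = np.concatenate([h_prev, w_emb])  # [h_{k-1}, ω_k]
    i = sigmoid(W_i @ x)                 # input gate
    f = sigmoid(W_f @ x)                 # forget gate
    o = sigmoid(W_o @ x)                 # output gate
    l = np.tanh(W_l @ x)                 # candidate cell state
    c = f * c_prev + i * l               # element-wise cell update
    h = o * np.tanh(c)                   # new hidden state
    return h, c
```

Running this step over x_1, ..., x_M and keeping the final hidden state yields the sentence's context encoding h_M.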
(2) Given a query matrix Q ∈ R^{n×d} of n query vectors, a key matrix K ∈ R^{n×d} and a value matrix V ∈ R^{n×d}, the dot-product attention is calculated as:

Attention(Q, K, V) = softmax(QK^T / √d) V

where d is the number of hidden units in the dialogue history self-attention module, which consists of H heads, each head focusing on a different part of the representation; for the ith head, W_i^Q, W_i^K and W_i^V represent the parameters learned through training corresponding to the queries, keys and values respectively; the dialogue history self-attention module is formulated as:
M_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
after each parallel head passes the calculation, all vectors are connected together to form one vector, and the calculation process is as follows:
M = Concat(M_1, ..., M_H)
O=MW
where W is the output projection matrix; O is the output of the self-attention module for the dialogue history vectors of the two speakers, denoted O_1 for speaker one and O_2 for speaker two; O_1 and O_2 are each passed through a feed-forward neural network to output the speaker dialogue history representations g_1 and g_2, computed as:

g_1 = FNN(O_1), g_2 = FNN(O_2)

where FNN represents a feed-forward neural network;
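The sequence above (scaled dot-product attention, per-head projections, concatenation and output projection) can be sketched as follows; the shapes, single-head configuration and parameter packaging are assumptions made for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def multi_head(X, heads, W_out):
    """Multi-head self-attention over sequence X, shape (n, d).

    heads: list of (W_q, W_k, W_v) projection matrices, one tuple per head.
    The head outputs are concatenated, M = Concat(M_1..M_H), then
    projected by W_out, O = M W, as in the claim.
    """
    M = np.concatenate([attention(X @ Wq, X @ Wk, X @ Wv)
                        for Wq, Wk, Wv in heads], axis=-1)
    return M @ W_out
```

Applying this module separately to the two speakers' word-level encodings yields the outputs denoted O_1 and O_2 above.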
(3) The vector representation g of the dialogue history is input into a fully connected layer, which maps the hidden-layer-dimension vector to a vector whose dimension equals the number of categories; the calculation process is as follows:
h=ReLU(W 1 g+b 1 )
where ReLU is the activation function, W_1 and b_1 are learnable parameters of the fully connected layer, and h is the resulting hidden vector; the vector h is passed through a softmax layer to finally obtain E = {e_1, ..., e_M}, representing the probabilities of the reply emotion categories corresponding to the dialogue history; the emotion category z with the highest probability is taken as the final result, and the calculation process is as follows:

e_i = softmax(w_i h + b_i), z = argmax_i e_i
where e_i is the probability that the current dialogue history belongs to the ith emotion category, and w_i and b_i are the weight and bias of the ith category respectively;
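The fully connected layer, ReLU and softmax of step (3) can be sketched as follows; the shapes are assumptions (seven categories, matching the emotion sets of the datasets used):

```python
import numpy as np

def predict_emotion(g, W1, b1, W2, b2):
    """Fully connected layer + ReLU, then softmax over emotion categories.

    g:  dialogue history representation, shape (d,)
    W1, b1: fully connected layer parameters, shapes (h, d) and (h,)
    W2, b2: per-category weights and biases, shapes (M, h) and (M,)
    Returns the probability vector E and the predicted category index z.
    """
    h = np.maximum(0.0, W1 @ g + b1)   # hidden vector, ReLU activation
    logits = W2 @ h + b2               # one logit per emotion category
    e = np.exp(logits - logits.max())
    e /= e.sum()                       # softmax -> category probabilities
    return e, int(np.argmax(e))
```

The index z returned here is the predicted emotion category that the emotion embedding module consumes in place of a manually specified one.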
(4) Constructing a speaker attention decoder;
Using the speaker-one dialogue history representation g_1 as the query and the hidden representation O_r of the reply as the keys and values in the attention mechanism, the output of the speaker-one attention decoder is denoted O_d1 and calculated as:

O_d1 = Attention(g_1, O_r, O_r)
wherein Attention is Attention calculation; usingAs keys and values in the attention mechanism, the output O of the speaker-attention decoder d1 O for output of speaker two-attention decoder as query d2 Representing, the calculation process is as follows:
where Attention is the attention calculation; finally, the output O_d2 of the speaker-two attention decoder is passed through a feed-forward neural network to obtain O_d; the calculation process is as follows:

O_d = FNN(O_d2)
wherein FNN represents a feedforward neural network;
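The chain of step (4), two attention applications followed by a feed-forward network, can be sketched as follows; the assignment of queries, keys and values here is an assumption, since the original formula images are not reproduced:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def speaker_attention_decoder(g1, g2, O_r, fnn):
    """Chain the reply representation through both speakers' histories.

    g1, g2: speaker one / speaker two history representations, shape (n, d)
    O_r:    hidden representation of the reply, shape (n, d)
    fnn:    feed-forward network applied to the final decoder output
    """
    O_d1 = attention(g1, O_r, O_r)   # speaker-one attention decoder
    O_d2 = attention(O_d1, g2, g2)   # speaker-two attention decoder
    return fnn(O_d2)                 # O_d, consumed by emotion embedding
```

The result O_d is the speaker attention mechanism output that later enters the word-level emotion attention of step (5).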
(5) Emotion embedding;
in the state updating of the emotion embedding module, the state of the emotion-based multi-round dialogue model is updated by inputting emotion vectors, and the emotion embedding module is calculated as follows:
h j =LSTM([Emb(y j-1 );e j ],h j-1 )
where e_j denotes the emotion vector representation; Emb(y_{j-1}) denotes the word vector representation of the previously predicted word, calculated by the word2vec method; h_{j-1} denotes the state of the LSTM at the previous time step, where for the first time step Emb(y_0) is the word vector representation of the sentence start identifier <EOS>; [·;·] denotes the operation of concatenating two feature vectors, the two feature vectors being separated by ";"; the emotion vector representation e_j is determined by the emotion category z predicted by the emotion prediction module, and the specific calculation process of e_j is as follows:
a_jk = exp(c_jk) / Σ_{k'=1}^{T_z} exp(c_jk'), e_j = Σ_{k=1}^{T_z} a_jk ω_k^z

where ω_k^z denotes the word vector of the kth word of emotion category z in the emotion dictionary, T_z is the number of words of emotion category z in the emotion dictionary, and α, β and γ are all learnable parameters of the emotion embedding module; a_jk denotes the attention on the kth word of the emotion dictionary at time step j; the attention a_jk is calculated through c_jk, which is determined by the hidden state h_{j-1} of the previous time step, the output O_d of the speaker attention mechanism, and the word vector representation ω_k^z of the kth word of category z in the emotion vocabulary.
3. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the emotion-based multi-round dialog method of claim 2 when the program is executed by the processor.
4. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the emotion-based multi-round dialog method of claim 2.
CN202310486999.4A 2023-04-29 2023-04-29 Multi-round dialogue model and dialogue method based on emotion Pending CN116701566A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310486999.4A CN116701566A (en) 2023-04-29 2023-04-29 Multi-round dialogue model and dialogue method based on emotion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310486999.4A CN116701566A (en) 2023-04-29 2023-04-29 Multi-round dialogue model and dialogue method based on emotion

Publications (1)

Publication Number Publication Date
CN116701566A true CN116701566A (en) 2023-09-05

Family

ID=87830164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310486999.4A Pending CN116701566A (en) 2023-04-29 2023-04-29 Multi-round dialogue model and dialogue method based on emotion

Country Status (1)

Country Link
CN (1) CN116701566A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117131182A (en) * 2023-10-26 2023-11-28 江西拓世智能科技股份有限公司 AI-based dialogue reply generation method and system


Similar Documents

Publication Publication Date Title
CN110427490B (en) Emotional dialogue generation method and device based on self-attention mechanism
CN109522545B (en) A kind of appraisal procedure that more wheels are talked with coherent property amount
Nie et al. Long dialogue emotion detection based on commonsense knowledge graph guidance
CN106448670A (en) Dialogue automatic reply system based on deep learning and reinforcement learning
CN110297887B (en) Service robot personalized dialogue system and method based on cloud platform
CN111460132B (en) Generation type conference abstract method based on graph convolution neural network
CN114722838A (en) Conversation emotion recognition method based on common sense perception and hierarchical multi-task learning
CN113987179A (en) Knowledge enhancement and backtracking loss-based conversational emotion recognition network model, construction method, electronic device and storage medium
CN113435211B (en) Text implicit emotion analysis method combined with external knowledge
CN111966800A (en) Emotional dialogue generation method and device and emotional dialogue model training method and device
CN112115687A (en) Problem generation method combining triples and entity types in knowledge base
CN112905776B (en) Emotional dialogue model construction method, emotional dialogue system and method
CN113065344A (en) Cross-corpus emotion recognition method based on transfer learning and attention mechanism
CN114168707A (en) Recommendation-oriented emotion type conversation method
CN116150338A (en) Intelligent customer service method and system based on multi-round dialogue
CN116701566A (en) Multi-round dialogue model and dialogue method based on emotion
Wang et al. Empathetic response generation through graph-based multi-hop reasoning on emotional causality
CN110297894B (en) Intelligent dialogue generating method based on auxiliary network
CN110046239B (en) Dialogue method based on emotion editing
CN117808011B (en) Chat robot method, medium and system with simulated emotion
CN114912020A (en) Multi-sub-target dialogue recommendation method based on user preference graph
CN114239575B (en) Statement analysis model construction method, statement analysis method, device, medium and computing equipment
CN115796187A (en) Open domain dialogue method based on dialogue structure diagram constraint
CN115795010A (en) External knowledge assisted multi-factor hierarchical modeling common-situation dialogue generation method
Jiang et al. An affective chatbot with controlled specific emotion expression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination