CN109522545B - Method for evaluating the coherence quality of multi-turn dialogue - Google Patents

Method for evaluating the coherence quality of multi-turn dialogue

Info

Publication number
CN109522545B
Authority
CN
China
Prior art keywords
vector
semantic
dialogue
turn
utterance
Prior art date
Legal status
Active
Application number
CN201811181214.8A
Other languages
Chinese (zh)
Other versions
CN109522545A (en
Inventor
兰曼
周云晓
Current Assignee
East China Normal University
Original Assignee
East China Normal University
Priority date
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN201811181214.8A priority Critical patent/CN109522545B/en
Publication of CN109522545A publication Critical patent/CN109522545A/en
Application granted granted Critical
Publication of CN109522545B publication Critical patent/CN109522545B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis


Abstract

The invention discloses a method for evaluating the coherence quality of multi-turn dialogue. Its main feature is that it takes multi-turn dialogue text as input and, using a hierarchical attention mechanism, fuses the semantic information and intent information of the dialogue at both the level of the single utterance and the overall level of the multi-turn dialogue, thereby automatically assessing the coherence quality of multi-turn dialogue. Compared with the prior art, the invention trains quickly and analyses accurately; it needs no entity extraction from the text, avoiding the propagation of entity-extraction errors, and is especially suitable for large-scale and real-time corpora. By combining the semantic and intent information contained in the dialogue, it effectively and automatically evaluates the coherence quality of multi-turn dialogue, guides multi-turn dialogue generation systems to produce higher-quality dialogue text, and helps dialogue generation systems develop faster and better.

Description

Method for evaluating the coherence quality of multi-turn dialogue
Technical Field
The invention relates to the technical field of deep learning models on the internet, and in particular to a method for evaluating the coherence quality of multi-turn dialogue using deep learning.
Background
In recent years, more and more human-machine dialogue systems have appeared in daily life, such as chat and customer-service robots (chatbots) like Apple's Siri and Microsoft's XiaoIce. Dialogue is a basic mode of information exchange in human social activity, including television interviews, question-and-answer exchanges, WeChat chats and so on. A core technology of human-machine dialogue is multi-turn dialogue generation (Multi-turn Dialog Generation), a marker of the state of artificial intelligence development and a research hotspot in natural language processing attracting growing attention from researchers. The quality of multi-turn dialogue generation is generally assessed in one of two ways: 1) borrowing objective evaluation metrics such as BLEU and ROUGE from other text generation tasks (e.g. machine translation or automatic summarisation); these automatic metrics are based on word overlap, ignore the diversity and interactivity inherent in dialogue, and are therefore poorly suited to evaluating dialogue quality; 2) manual evaluation, which is accurate but costly in labour and time and cannot be applied to large-scale, real-time dialogue. Since a dialogue usually takes place between two or more parties, the quality of the content generated over multiple turns depends on the text coherence (Text Coherence) between preceding and following turns; a dialogue lacking coherence easily falls into logical confusion and cannot continue. Whether objective automatic evaluation or manual evaluation is used, coherence evaluation of multi-turn dialogue is therefore a key index of dialogue quality.
Text coherence evaluation mostly uses methods based on entity grids (Entity Grid) or entity sequences (Entity Sequence): first, entities such as person and place names in the text, together with their grammatical roles (subject, predicate and so on), are extracted and filled into grid nodes; the transitions of an entity's grammatical role between adjacent sentences become the connections between grid points; text coherence is then evaluated with manually extracted features or with a convolutional neural network (CNN).
The two examples of multi-turn dialogue in Table 1 further illustrate the problems of the prior art:
TABLE 1 example of two multi-turn conversations
The left and right columns of Table 1 are, respectively, a coherent and an incoherent multi-turn dialogue example (each with 6 turns), where H denotes the human (Human) and C the chatbot (Chatbot). Each sentence of H or C is an utterance (Utterance); H1 and C1, for example, are both utterances. The utterance H1 on the left of Table 1, "I want to buy some flowers for my wife", expresses the speaker's wish to buy flowers, and the utterance C1 replies reasonably by recommending suitable flowers. The intent information of a dialogue is typically represented with dialogue acts (Dialog Act); the dialogue-act markup of H1 on the left shows that a coherent multi-turn dialogue contains reasonable transitions of dialogue intent, i.e. dialogue-act transitions, which keep the dialogue developing logically around its topic and intent.
The existing dialogue coherence quality assessment techniques have the following shortcomings: (1) they ignore the linguistic diversity and multi-turn interactivity of dialogue, so the accuracy of dialogue quality assessment is low; (2) they lack the overall semantic information of the multi-turn dialogue; (3) they lack the intent information implied in the multi-turn dialogue; (4) they lack a deep fusion of the semantic and intent information of the multi-turn dialogue; (5) labour and time costs are high, so they cannot be applied to large-scale, real-time dialogue quality assessment; (6) methods based on entity grids or sequences depend on the performance of entity extraction, and extraction errors directly affect the performance of the subsequent steps.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method for evaluating the coherence quality of multi-turn dialogue. It takes multi-turn dialogue text as input and uses a hierarchical attention mechanism to fuse the semantic and intent information of the dialogue at the single-utterance level and at the overall level of the multi-turn dialogue, realising automatic assessment of coherence quality. Such automatic assessment can guide a multi-turn dialogue generation system to produce higher-quality dialogue text and help dialogue generation systems develop faster and better. The method needs no entity extraction from the text, avoiding the propagation of entity-extraction errors, and no manually designed features, saving cost; it trains quickly and is especially suitable for large-scale, real-time corpora. By combining the semantic and intent information of the dialogue at both the single-utterance and overall levels, it effectively evaluates the coherence quality of multi-turn dialogue and greatly improves analysis accuracy.
The purpose of the invention is realised as follows: a method for evaluating the coherence quality of multi-turn dialogue, characterised in that deep learning is used for hierarchical semantic modelling of the dialogue: after the semantic vector representation of each utterance is obtained, the utterance sequence of the dialogue is modelled to obtain the overall semantic information of the multi-turn dialogue. The coherence quality evaluation proceeds in the following steps:
Step one: using pre-trained word vectors, each word in a single utterance is initialised to a word vector by table lookup; the word-vector sequence of the utterance is taken as input and a deep learning model (such as an LSTM model) learns the semantics of the utterance, yielding the semantic vector of each utterance.
Step two: the dialogue-act type of the single utterance is initialised to a dialogue-act vector representation, and an attention mechanism fuses the dialogue-act vector with the semantic vector of the utterance, yielding the semantic-intent fusion vector of the single utterance.
Step three: for the utterance sequence of the multi-turn dialogue, the semantic-intent fusion vectors of the single utterances are used to construct the fusion vector sequence of the dialogue; this sequence is taken as input and a deep learning model (such as an LSTM model) learns the multi-turn dialogue, yielding its overall vector.
Step four: the utterance semantic vector sequence of the dialogue is taken as input to a deep learning model (such as an LSTM model) that models the semantics of the dialogue, yielding its overall semantic vector; the dialogue-act sequence of the dialogue is taken as input to a deep learning model (such as a CNN model) that learns the overall intent, yielding the overall intent vector of the dialogue.
Step five: an attention mechanism fuses the overall semantic vector and the overall intent vector of the multi-turn dialogue, yielding the overall semantic-intent fusion vector of the dialogue.
Step six: the dialogue vectors obtained by fusing semantic and intent information at the different levels are combined (e.g. by simple concatenation) into the overall coherence vector of the dialogue; with this vector as input, a scoring function produces the coherence quality assessment score of the multi-turn dialogue; a cross-entropy loss function computes the coherence score error on the training data, and back propagation updates the model parameters to train the model.
Step seven: the multi-turn dialogues to be evaluated and their dialogue-act sequences are input into the trained automatic coherence quality assessment model, which predicts their coherence quality scores.
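The seven steps above can be sketched end to end as follows. This is a minimal NumPy illustration only: the averaging, maximum and concatenation operations are crude stand-ins for the LSTM, CNN and attention components described in the steps, and every dimension and variable name is an illustrative assumption, not part of the patent.

```python
# Hypothetical end-to-end sketch of the seven-step pipeline (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N, n, d = 6, 5, 8                          # 6 utterances, 5 words each, dim 8 (assumed)
dialog_words = rng.normal(size=(N, n, d))  # step 1 input: word vectors per utterance
act_vecs = rng.normal(size=(N, d))         # step 2 input: dialogue-act vectors

utt_sem = dialog_words.mean(axis=1)        # step 1: per-utterance semantic vectors (stand-in)
utt_fused = 0.5 * (utt_sem + act_vecs)     # step 2: semantic-intent fusion (stand-in)
h_d = utt_fused.mean(axis=0)               # step 3: overall dialogue vector (stand-in)
h_sem = utt_sem.mean(axis=0)               # step 4a: overall semantic vector (stand-in)
v_d = act_vecs.max(axis=0)                 # step 4b: overall intent vector (stand-in)
h_sd = 0.5 * (h_sem + v_d)                 # step 5: overall fusion (stand-in)
h_c = np.concatenate([h_d, h_sd])          # step 6: overall coherence vector
score = 1.0 / (1.0 + np.exp(-h_c.sum()))   # step 6: scoring-function stand-in (sigmoid)
```

Each stand-in is replaced by the real component (LSTM, attention, CNN, softmax scoring) in the detailed steps that follow.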
Compared with the prior art, the method trains quickly, analyses accurately, requires no manually designed features (saving cost) and no entity extraction from the text (avoiding the propagation of entity-extraction errors), and is especially suitable for large-scale, real-time corpora.
Drawings
FIG. 1 is a flow chart of the operation of the present invention.
Detailed Description
The invention will now be described in further detail with reference to specific embodiments, conditions and experimental methods, wherein the terms used are defined as follows:
Multi-turn dialogue (Multi-turn Dialog): a multi-turn dialogue consists of two or more ordered turns, each containing one utterance (Utterance), i.e. the text spoken by a single interlocutor at one time. The multi-turn dialogue shown in Table 2 below comprises 6 turns, i.e. 6 utterances; H1, "I want to buy some flowers for my wife.", is the first utterance of the dialogue.
Dialogue acts (Dialog Act): a dialogue act reflects the intent of each utterance; there are 42 classes in total, including statement, question, instruction, claim, explanation and so on. The dialogue-act type of each utterance is shown in Table 2 below, where the utterance H2, "How much is the red rose?", is a question expressing the speaker's query, while the utterance C2 answers it by stating the price of the red rose.
TABLE 2 example of six-wheel dialog text and behavior types for each turn of dialog
Word vector (Word Vector): each word in the text is represented by a vector of low-dimensional continuous values. Word vectors are obtained from a corpus by pre-training.
Deep Learning Model (Deep Learning Model): the deep learning model is divided into three parts: an input layer, a hidden layer and an output layer. The hidden layer can be expanded into a multilayer structure, the neurons between two adjacent layers are connected with each other, and the neurons in the layer are not connected. Common deep learning models include Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long-Short Term Memory Neural Network (LSTM), and the like.
Referring to fig. 1, the input of the invention is a multi-turn dialogue D comprising N utterances u, denoted D = {u_1, u_2, ..., u_i, ..., u_N}, i = 1, 2, ..., N, where u_i represents the i-th utterance. Each utterance u contains n words and is represented as the word sequence u = {w_1, w_2, ..., w_j, ..., w_n}, j = 1, 2, ..., n. To obtain a semantic vector representation of the multi-turn dialogue as a whole, the semantic vector representation of each single utterance must be obtained first.
The invention uses deep learning for hierarchical semantic modelling of the multi-turn dialogue: after the semantic vector representation of each utterance is obtained, the utterance sequence of the dialogue is modelled to obtain the overall semantic information of the multi-turn dialogue. The coherence quality evaluation proceeds in the following steps:
Step one: at the single-utterance level, a deep learning model (such as a CNN, RNN or LSTM model) models the semantics of each utterance u in the multi-turn dialogue D, yielding the semantic vector representation of the single utterance. The specific process is as follows:
Input: a multi-turn dialogue D containing N utterances u, a word-vector dictionary, the deep learning model and related parameters;
Output: the semantic vector representation of a single utterance;
The process is as follows. Step a1: each utterance typically contains multiple words; to model a single utterance semantically, each word is first converted to its corresponding word vector representation by table lookup in a pre-trained word-vector dictionary. The utterance u contains n words, i.e. the word sequence u = {w_1, w_2, ..., w_j, ..., w_n}, j = 1, 2, ..., n; by looking up the pre-trained word-vector dictionary, each word w_j of the utterance u is initialised to a word vector, giving the word-vector sequence s = {x_1, x_2, ..., x_j, ..., x_n} of the utterance u, where x_j is the word vector of the j-th word.
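Step a1 amounts to a table lookup. A toy sketch follows; the vocabulary, the 4-dimensional vectors and the `utterance_to_vectors` helper are hypothetical, with random vectors standing in for a pre-trained dictionary:

```python
import numpy as np

# Hypothetical pre-trained word-vector dictionary: word -> row of an embedding table.
vocab = {"i": 0, "want": 1, "to": 2, "buy": 3, "flowers": 4, "<unk>": 5}
emb = np.random.default_rng(0).normal(size=(len(vocab), 4))  # toy 4-dim vectors

def utterance_to_vectors(words):
    """Initialise each word w_j of a single utterance to its word vector x_j
    by table lookup, yielding the word-vector sequence s of the utterance."""
    ids = [vocab.get(w.lower(), vocab["<unk>"]) for w in words]
    return emb[ids]  # shape (n, 4): one row per word

s = utterance_to_vectors("I want to buy flowers".split())
```

The resulting sequence `s` is the input of the semantic model in step a2.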
Step a2: the word-vector sequence s of the utterance u is taken as input, and a deep learning model (such as a CNN, RNN or LSTM model) models the semantics of the utterance u, yielding the semantic vector representation of the single utterance. Taking the LSTM model as an example, the process is as follows:
(1) The input is the word-vector sequence s = {x_1, x_2, ..., x_j, ..., x_n} of the utterance;
(2) Each word vector in the sequence is processed in turn by the repeating module of the LSTM model, according to the following formulas (a) to (e):
f_t = σ(W_f x_t + U_f h_{t-1} + b_f) (a)
i_t = σ(W_i x_t + U_i h_{t-1} + b_i) (b)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o) (c)
c_t = f_t * c_{t-1} + i_t * tanh(W_c x_t + U_c h_{t-1} + b_c) (d)
h_t = o_t * tanh(c_t) (e)
where x_t is the t-th word vector in the sequence s (t = 1, 2, ..., n), used as the input vector at time step t; f_t, i_t and o_t are, respectively, the forget gate, input gate and output gate at time step t; W_f, W_i, W_o, W_c and U_f, U_i, U_o, U_c are all weight parameters, and b_f, b_i, b_o, b_c are all bias terms; σ is the sigmoid activation function and tanh the hyperbolic tangent function; * denotes element-wise multiplication (Element-wise Multiplication) of the corresponding elements of two vectors; h_t is the hidden-layer state at time step t.
(3) The hidden-layer state h_n of the last time step n is output as the semantic vector representation h of the utterance u. This first step models the semantics of each utterance with a deep learning model and obtains only the semantic information of the single utterance, without considering its intent information.
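Formulas (a) to (e) can be written directly in NumPy. The sketch below is an illustrative, untrained LSTM cell with randomly initialised parameters; all names and dimensions are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM repeating module, following formulas (a)-(e)."""
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])  # (a) forget gate
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])  # (b) input gate
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])  # (c) output gate
    c = f * c_prev + i * np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])  # (d)
    h = o * np.tanh(c)                                       # (e)
    return h, c

def encode_utterance(word_vecs, p, d_hid):
    """Run the word-vector sequence through the LSTM; the last hidden state
    h_n is output as the semantic vector of the utterance."""
    h, c = np.zeros(d_hid), np.zeros(d_hid)
    for x in word_vecs:
        h, c = lstm_step(x, h, c, p)
    return h

rng = np.random.default_rng(0)
d_in, d_hid, n = 4, 5, 3                        # illustrative dimensions
p = {f"W{g}": rng.normal(size=(d_hid, d_in)) for g in "fioc"}
p |= {f"U{g}": rng.normal(size=(d_hid, d_hid)) for g in "fioc"}
p |= {f"b{g}": np.zeros(d_hid) for g in "fioc"}
h = encode_utterance(rng.normal(size=(n, d_in)), p, d_hid)
```

Since h_t = o_t * tanh(c_t) with o_t in (0, 1), every component of the semantic vector lies strictly inside (-1, 1).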
Step two: at the single-utterance level, the semantic vector representation obtained in step one is fused with the intent information of the utterance by an attention mechanism, yielding the semantic-intent fusion vector of the single utterance. The specific process is as follows:
Input: the semantic vector representation of the single utterance obtained in step one, the dialogue-act type of the utterance, the deep learning model and related parameters;
Output: the semantic-intent fusion vector of the single utterance;
The process is as follows. Step b1: the dialogue-act type of the utterance is initialised to a dialogue-act vector representation, and an attention mechanism fuses the dialogue-act vector of the single utterance with the semantic vector representation obtained in step one, giving the semantic-intent fusion vector of the utterance. The process is as follows:
(1) The 42 dialogue-act types are randomly initialised to fixed-dimension vector representations (200 dimensions, for example), forming a dialogue-act vector dictionary E ∈ R^{42×200}.
(2) The dialogue-act type of a single utterance u is initialised to a vector v_da by looking up the dialogue-act vector dictionary E.
(3) The attention mechanism fuses the dialogue-act vector v_da of the single utterance with the semantic vector representation h obtained in step one, giving the semantic-intent fusion vector h_da of the utterance. Taking 5 linear transformations as an example, the calculation follows formulas (i) to (k):
z_i = W_i h + b_i, i = 1, 2, ..., 5 (i)
α_i = exp(v_da · z_i) / Σ_{j=1}^{5} exp(v_da · z_j) (j)
h_da = Σ_{i=1}^{5} α_i z_i (k)
where W_i is a weight, b_i a bias term and z_i the result of the i-th linear transformation of the semantic vector h; α_i is a scalar, the weight of z_i computed with the dialogue-act vector v_da; h_da, the semantic-intent fusion vector of the utterance, is the weighted sum of the z_i and fuses the semantic and intent information of the utterance u.
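One plausible reading of step b1's attention fusion, sketched in NumPy with random parameters and illustrative dimensions; the softmax weighting of the z_i by v_da is an assumption based on the description:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def fuse_semantic_and_act(h, v_da, Ws, bs):
    """Fuse the semantic vector h of one utterance with its dialogue-act
    vector v_da: 5 linear transforms z_i of h are scored against v_da,
    and h_da is the weighted sum of the z_i."""
    zs = np.stack([W @ h + b for W, b in zip(Ws, bs)])  # z_i = W_i h + b_i
    alpha = softmax(zs @ v_da)                          # weight of each z_i from v_da
    return alpha @ zs                                   # h_da: weighted sum of z_i

rng = np.random.default_rng(1)
d = 6                                                   # illustrative dimension
Ws = [rng.normal(size=(d, d)) for _ in range(5)]
bs = [np.zeros(d) for _ in range(5)]
h_da = fuse_semantic_and_act(rng.normal(size=d), rng.normal(size=d), Ws, bs)
```

The same pattern is reused at the overall dialogue level in step five, with the overall intent vector playing the role of v_da.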
Step three: for the utterance sequence of the multi-turn dialogue, the semantic-intent fusion vectors obtained in step two are used to construct the semantic-intent fusion vector sequence of the whole dialogue, and a deep learning model (such as an LSTM) yields the overall vector representation of the multi-turn dialogue. Since the coherence quality of a multi-turn dialogue is evaluated over the dialogue as a whole, step three uses the semantic-intent fusion vector h_da of each utterance, obtained in step two, to build the fusion vector sequence of the whole dialogue, and models the utterance sequence of the dialogue with a deep learning model (an LSTM model, for example) to obtain the overall vector representation h_d of the multi-turn dialogue. The specific process is as follows:
Input: the semantic-intent fusion vector h_da of each utterance obtained in step two and the fusion vector sequence constructed from them, the deep learning model and related parameters;
Output: the overall vector representation of the multi-turn dialogue;
The process is as follows. Step c1: for the utterance sequence {u_1, u_2, ..., u_i, ..., u_N} of the multi-turn dialogue D, the semantic-intent fusion vector h_da of each single utterance (step two) is used to construct the fusion vector sequence {h_da1, h_da2, ..., h_dai, ..., h_daN}, i = 1, 2, ..., N, of the dialogue, where h_dai is the semantic-intent fusion vector of the i-th utterance u_i.
Step c2: the semantic-intent fusion vector sequence of the multi-turn dialogue is taken as input and modelled with an LSTM model, yielding the overall vector representation h_d of the dialogue; for the LSTM model, see step a2.
Step four: deep learning models (such as LSTM or CNN) model the utterance sequence and the dialogue-act sequence of the multi-turn dialogue respectively, yielding its overall semantic vector representation and overall intent vector representation. Steps two and three fuse semantic and intent information with an attention mechanism at the single-utterance level and then derive an overall dialogue representation from the fused sequence. To fuse the semantic and intent information of the dialogue with an attention mechanism at the overall level, overall semantic modelling and intent modelling must be performed first; step four therefore uses a deep learning model (such as a CNN, RNN or LSTM model) to model the utterance sequence and the dialogue-act sequence of the dialogue separately, obtaining the overall semantic vector representation and the overall intent vector representation of the multi-turn dialogue. The specific process is as follows:
Input: the semantic vectors of the single utterances obtained in step one, forming the semantic vector sequence of the multi-turn dialogue; the dialogue-act sequence of the dialogue; the deep learning models and related parameters;
Output: the overall semantic vector representation and overall intent vector representation of the multi-turn dialogue;
The process is as follows. Step d1: for the utterance sequence {u_1, u_2, ..., u_i, ..., u_N} of the multi-turn dialogue D, the semantic vector representation h of each utterance (step one) forms the utterance semantic vector sequence s_h = {h_1, h_2, ..., h_i, ..., h_N}, i = 1, 2, ..., N. This sequence is the input of an LSTM model that models the semantics of the dialogue and yields its overall semantic vector representation h_sem; for the LSTM model, see step a2.
Step d2: the dialogue-act sequence of the multi-turn dialogue is taken as input, and a deep learning model (such as a CNN, RNN or LSTM model) models the overall intent of the dialogue, yielding its overall intent vector representation. The process is as follows:
(1) For the utterance sequence {u_1, u_2, ..., u_i, ..., u_N} of the multi-turn dialogue D, the dialogue-act type of each utterance gives the dialogue-act sequence s_da = {da_1, da_2, ..., da_i, ..., da_N}, i = 1, 2, ..., N, where da_i is the dialogue-act type of the i-th utterance u_i.
(2) A deep learning model models the dialogue-act sequence to obtain the dialogue-act representation of the whole dialogue; taking a CNN model as an example, the process is as follows:
(I) For the dialogue-act sequence s_da = {da_1, da_2, ..., da_i, ..., da_N}, each dialogue act da_i (i = 1, 2, ..., N) is looked up in the dialogue-act vector dictionary E constructed in step two to obtain its vector representation, converting the sequence into a vector sequence {v_1, v_2, ..., v_i, ..., v_N} of N vectors, the input of the CNN model.
(II) The convolution layer convolves the input vector sequence according to the following formula (f):
c_j = f(u^T v_{j-k+1:j} + b) (f)
where f is a non-linear function, such as the hyperbolic tangent (tanh) or sigmoid activation function; b is a bias term; u is a convolution filter with window size k; and v_{j-k+1:j} denotes the (j-k+1)-th to j-th dialogue-act vectors in the sequence. With m filters and a sliding window of size k, m feature-value sequences C = {C_1, C_2, ..., C_m} are obtained, where each feature-value sequence C_i = [c_1, c_2, ..., c_{N-k+1}], C_i ∈ R^{N-k+1}, i = 1, ..., m.
(III) To extract the important features and keep the output size fixed, the pooling layer pools the m feature-value sequences C into an m-dimensional feature vector, the overall intent vector representation v_d of the multi-turn dialogue.
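The convolution of formula (f) followed by pooling can be sketched as follows; the filter count, window size and the choice of max-pooling are illustrative assumptions:

```python
import numpy as np

def conv_feature(V, u, b, k):
    """One convolution filter u of window size k over the dialogue-act vector
    sequence V (formula f): c_j = tanh(u . V[j-k+1:j] + b), giving N-k+1 values."""
    N = len(V)
    window = lambda j: V[j - k + 1 : j + 1].ravel()  # flattened k-vector window
    return np.array([np.tanh(u @ window(j) + b) for j in range(k - 1, N)])

def dialog_intent_vector(V, filters, biases, k):
    """m filters followed by max-pooling give an m-dimensional feature vector,
    the overall intent vector v_d of the dialogue."""
    return np.array([conv_feature(V, u, b, k).max() for u, b in zip(filters, biases)])

rng = np.random.default_rng(2)
N, d, k, m = 6, 4, 2, 3                     # 6 dialogue acts, window 2, 3 filters (assumed)
V = rng.normal(size=(N, d))                 # dialogue-act vector sequence {v_1..v_N}
filters = [rng.normal(size=k * d) for _ in range(m)]
v_d = dialog_intent_vector(V, filters, np.zeros(m), k)
```

Each filter produces a feature sequence of length N - k + 1, and pooling keeps the maximum per filter, so the output dimension m is independent of the number of turns N.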
Step five: at the overall dialogue level, an attention mechanism fuses the two vector representations obtained in step four into the overall semantic-intent fusion vector of the dialogue. To fuse the semantic and intent information at the overall level of the multi-turn dialogue, step five applies an attention mechanism to the overall semantic vector representation h_sem obtained in step d1 and the overall intent vector representation v_d obtained in step d2, yielding the overall semantic-intent fusion vector h_sd of the dialogue. The specific process is as follows:
Input: the overall semantic vector representation h_sem and the overall intent vector representation v_d of the multi-turn dialogue, the deep learning model and related parameters;
Output: the overall semantic-intent fusion vector of the multi-turn dialogue;
The process is as follows. Step e1: an attention mechanism fuses the overall semantic vector representation h_sem and the overall intent vector representation v_d of the multi-turn dialogue. Taking 5 linear transformations as an example, the calculation follows formulas (g), (l) and (n):
g_i = W_di h_sem + b_di, i = 1, 2, ..., 5 (g)
β_i = exp(v_d · g_i) / Σ_{j=1}^{5} exp(v_d · g_j) (l)
h_sd = Σ_{i=1}^{5} β_i g_i (n)
where W_di is a weight, b_di a bias term and g_i the result of the i-th linear transformation of the semantic vector h_sem; β_i is a scalar, the weight of g_i computed with the overall intent representation v_d; h_sd, the weighted sum of the g_i, is the overall semantic-intent fusion vector of the dialogue and fuses the semantic and intent information of the multi-turn dialogue.
Step six: the preceding steps use a hierarchical attention mechanism to fuse the semantic and intent information of the dialogue at the two levels of the single utterance and the whole multi-turn dialogue, yielding two overall vector representations of the dialogue: the overall vector h_d from step three and the overall semantic-intent fusion vector from step five. Step six combines these two representations and obtains the coherence quality assessment score of the dialogue with a scoring function (such as the softmax or sigmoid function); the automatic multi-turn dialogue coherence quality assessment model is trained with a cross-entropy loss function and back propagation. The specific process is as follows:
Input: the overall vector representation h_d of the multi-turn dialogue obtained in step three, the overall semantic-intent fusion vector of the dialogue obtained in step five, the deep learning model and related parameters;
Output: the coherence quality assessment score of the multi-turn dialogue;
The process is as follows. Step f1: steps three and five fuse the dialogue's semantic and intent information at different levels, yielding the two dialogue vector representations h_d and h_sd (the overall semantic-intent fusion vector from step five); these are combined (by simple concatenation, for example) into the overall coherence vector of the dialogue: h_c = h_d ⊕ h_sd, where ⊕ denotes the concatenation operation.
Step f2: with the vector h_c as input, a scoring function (such as the softmax or sigmoid function) scores the overall coherence quality of the multi-turn dialogue, yielding the coherence probability distribution h_s of the dialogue. Taking the softmax function as an example, the calculation follows formula (m):
h_s = softmax(W_s h_c + b_s) (m)
wherein, WsIs a weight, bsAs a bias term, a vector h of overall coherence for a multi-turn dialogcMapping to a two-dimensional target space to obtain a two-dimensional vector hoAs an input to the softmax function, the softmax function is specifically calculated according to the following equation p:
h_s^(j) = exp(h_o^(j)) / (exp(h_o^(0)) + exp(h_o^(1)))    (p)
where h_s^(j) denotes the j-th dimension of the vector h_s, j ∈ {0, 1}; h_s^(0) and h_s^(1) are real numbers between 0 and 1 whose sum is 1, and h_s^(1) represents the coherence probability of the dialogue, i.e., the coherence quality assessment score of the dialogue.
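Formulas (m) and (p) can be sketched together as a small scoring function. The weights W_s and bias b_s below are randomly initialized stand-ins for illustration; in the trained model they are the learned parameters of step f2.

```python
import numpy as np

def coherence_score(h_c, W_s, b_s):
    """Formulas (m) and (p): project h_c to two dimensions, then softmax."""
    h_o = W_s @ h_c + b_s            # two-dimensional vector h_o
    e = np.exp(h_o - h_o.max())      # numerically stable exponentiation
    h_s = e / e.sum()                # h_s[0] + h_s[1] sums to 1
    return h_s

rng = np.random.default_rng(1)
h_c = rng.normal(size=128)               # overall coherence vector from step f1
W_s = 0.01 * rng.normal(size=(2, 128))   # untrained weights, illustration only
b_s = np.zeros(2)
h_s = coherence_score(h_c, W_s, b_s)
# h_s[1] is the coherence probability of the dialogue
```

Subtracting the maximum logit before exponentiation does not change the softmax result but avoids overflow for large logits.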
Step f3: training the multi-turn dialogue coherence quality automatic assessment model is a supervised learning process. After the coherence quality assessment scores of the multi-turn dialogues are obtained, the cross-entropy loss over the training data T is computed according to formula (q):
loss(θ) = −Σ_{i=1}^{|T|} [ y_i log h_s,i^(1) + (1 − y_i) log h_s,i^(0) ]    (q)
where θ is the set of parameters to be trained and updated in the model, comprising the weight and bias parameters involved in each step and the dialogue-act vector dictionary E constructed in step two; |T| denotes the number of dialogues in the training data set; and y_i denotes the coherence label of the i-th multi-turn dialogue in the training data set, equal to 1 if the dialogue is coherent and 0 otherwise.
To train the multi-turn dialogue coherence quality automatic assessment model, this step differentiates the loss function, back-propagates the coherence error, and updates the model parameters; when loss(θ) falls below a preset threshold τ, updating stops and model training is complete.
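The training loop of step f3 can be sketched as below. For illustration the gradient step is restricted to the scoring-layer parameters W_s and b_s on toy two-dimensional inputs; in the full model, back-propagation would also flow into the deep-learning layers of the earlier steps and the dialogue-act vector dictionary E.

```python
import numpy as np

def train_step(batch, labels, W_s, b_s, lr=0.5):
    """One gradient step on the cross-entropy loss of formula (q),
    restricted to the scoring-layer parameters for illustration."""
    grad_W = np.zeros_like(W_s)
    grad_b = np.zeros_like(b_s)
    total_loss = 0.0
    for h_c, y in zip(batch, labels):
        h_o = W_s @ h_c + b_s
        e = np.exp(h_o - h_o.max())
        h_s = e / e.sum()                 # coherence probability distribution
        total_loss += -np.log(h_s[y] + 1e-12)
        d = h_s.copy()
        d[y] -= 1.0                       # dLoss/dh_o for softmax + cross entropy
        grad_W += np.outer(d, h_c)
        grad_b += d
    n = len(batch)
    W_s -= lr * grad_W / n                # in-place gradient descent update
    b_s -= lr * grad_b / n
    return total_loss / n                 # mean cross-entropy loss

# Toy, clearly separable data (hypothetical; real h_c come from the model)
batch = [np.array([1.0, 0.2]), np.array([-1.0, 0.1]),
         np.array([0.8, -0.3]), np.array([-0.9, 0.4])]
labels = [1, 0, 1, 0]                     # 1 = coherent, 0 = incoherent
W_s, b_s = np.zeros((2, 2)), np.zeros(2)
losses = [train_step(batch, labels, W_s, b_s) for _ in range(20)]
# the loss decreases toward the preset threshold tau as training proceeds
```

The stopping rule of the patent, loss(θ) < τ, would replace the fixed 20 iterations in a real training run.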
Step seven: the multi-turn dialogue to be assessed and its dialogue-act sequence are input into the trained multi-turn dialogue coherence quality automatic assessment model to predict the coherence quality assessment score of the dialogue. Concretely, the seventh step of the invention feeds the multi-turn dialogue to be assessed and its dialogue-act sequence into the model trained in step six, obtains a real value between 0 and 1 representing the degree of dialogue coherence, and outputs this value as the coherence quality assessment score of the multi-turn dialogue.
The invention has been described above in further detail by way of specific embodiments; this description is not intended to limit the scope of the invention, and all equivalent embodiments are intended to be included within the scope of the following claims.

Claims (1)

1. A method for assessing the coherence quality of multi-turn dialogues, characterized in that deep learning is used to perform hierarchical semantic modeling of the multi-turn dialogue: the word sequence of each utterance is modeled to obtain the semantic vector representation of that utterance, the overall semantic information of the multi-turn dialogue is then obtained, and coherence quality assessment is carried out according to the following steps:

Step one: each word in a single utterance is initialized to a word vector by table lookup into pre-trained word vectors; with the word-vector sequence of the single utterance as input, a deep learning model performs semantic learning on the utterance to obtain a semantic vector for each utterance;

Step two: the dialogue-act type of a single utterance is initialized to a dialogue-act vector representation, and an attention mechanism then fuses the dialogue-act vector of the single utterance with its semantic vector to obtain the semantic-intent fusion vector of the single utterance;

Step three: for the utterance sequence of the multi-turn dialogue, the semantic-intent fusion vector sequence of the multi-turn dialogue is constructed from the semantic-intent fusion vectors of the single utterances; with this sequence as input, a deep learning model learns over the multi-turn dialogue to obtain the overall vector of the multi-turn dialogue;

Step four: with the utterance semantic-vector sequence of the multi-turn dialogue as input to a deep learning model, semantic modeling of the multi-turn dialogue yields its overall semantic vector; with the dialogue-act sequence of the multi-turn dialogue as input, a deep learning model learns the overall intent of the multi-turn dialogue to obtain its overall intent vector;

Step five: an attention mechanism fuses the overall semantic vector and the overall intent vector of the multi-turn dialogue to obtain the overall semantic-intent fusion vector of the dialogue;

Step six: the dialogue vector representations obtained in steps three and five by fusing dialogue semantic and intent information at different levels are combined to obtain the overall coherence vector of the dialogue; with this vector as input, a scoring function yields the coherence quality assessment score of the multi-turn dialogue; a cross-entropy loss function then computes the coherence-score error on the training data, and the model parameters are updated through back propagation to train the model;

Step seven: the multi-turn dialogue to be assessed and its dialogue-act sequence are input into the trained multi-turn dialogue coherence quality automatic assessment model, which predicts the coherence quality assessment score of the dialogue.
CN201811181214.8A 2018-10-11 2018-10-11 A kind of appraisal procedure that more wheels are talked with coherent property amount Active CN109522545B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811181214.8A CN109522545B (en) 2018-10-11 2018-10-11 A kind of appraisal procedure that more wheels are talked with coherent property amount

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811181214.8A CN109522545B (en) 2018-10-11 2018-10-11 A kind of appraisal procedure that more wheels are talked with coherent property amount

Publications (2)

Publication Number Publication Date
CN109522545A CN109522545A (en) 2019-03-26
CN109522545B true CN109522545B (en) 2019-08-23

Family

ID=65770262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811181214.8A Active CN109522545B (en) 2018-10-11 2018-10-11 A kind of appraisal procedure that more wheels are talked with coherent property amount

Country Status (1)

Country Link
CN (1) CN109522545B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188190B (en) * 2019-04-03 2023-07-14 创新先进技术有限公司 Dialogue analysis method, device, server and readable storage medium
CN110162779B (en) * 2019-04-04 2023-08-04 北京百度网讯科技有限公司 Medical record quality evaluation method, device and equipment
RU2744032C2 (en) * 2019-04-15 2021-03-02 Общество С Ограниченной Ответственностью "Яндекс" Method and system for determining result of task execution in crowdsourced environment
CN111159356B (en) * 2019-12-31 2023-06-09 重庆和贯科技有限公司 Knowledge graph construction method based on teaching content
RU2020107002A (en) 2020-02-14 2021-08-16 Общество С Ограниченной Ответственностью «Яндекс» METHOD AND SYSTEM FOR RECEIVING A LABEL FOR A DIGITAL PROBLEM PERFORMED IN A CROWDSORING ENVIRONMENT
CN111460115B (en) * 2020-03-17 2023-05-26 深圳市优必选科技股份有限公司 Intelligent man-machine conversation model training method, model training device and electronic equipment
CN111428470B (en) * 2020-03-23 2022-04-22 北京世纪好未来教育科技有限公司 Text continuity judgment method, text continuity judgment model training method, electronic device and readable medium
CN111241263A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Text generation method and device and electronic equipment
CN112487158B (en) * 2020-11-06 2023-05-05 泰康保险集团股份有限公司 Multi-round dialogue problem positioning method and device
CN112417112A (en) * 2020-11-10 2021-02-26 中山大学 Open domain dialogue system evaluation method based on graph characterization enhancement

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9348816B2 (en) * 2008-10-14 2016-05-24 Honda Motor Co., Ltd. Dialog coherence using semantic features
CN106599196A (en) * 2016-12-14 2017-04-26 竹间智能科技(上海)有限公司 Artificial intelligence conversation method and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Analyzing Dialog Coherence Using Transition Patterns in Lexical and Semantic Features; Amruta Purandare et al.; Proceedings of the Twenty-First International FLAIRS Conference; Dec. 31, 2008; pp. 195-200
A Survey of Deep-Learning-Based Generative Chatbot Algorithms; Sun Liru et al.; Computer Knowledge and Technology; Aug. 2018; vol. 14, no. 23; pp. 227-228

Also Published As

Publication number Publication date
CN109522545A (en) 2019-03-26

Similar Documents

Publication Publication Date Title
CN109522545B (en) A kind of appraisal procedure that more wheels are talked with coherent property amount
US20180329884A1 (en) Neural contextual conversation learning
CN108549658B (en) Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree
CN110837548B (en) Answer matching method and device, electronic equipment and storage medium
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
Shi et al. Unsupervised dialog structure learning
CN115329779B (en) Multi-person dialogue emotion recognition method
CN111966800A (en) Emotional dialogue generation method and device and emotional dialogue model training method and device
CN107679225B (en) Reply generation method based on keywords
CN115495568B (en) Training method and device for dialogue model, dialogue response method and device
CN115495552A (en) Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment
CN112364148B (en) Deep learning method-based generative chat robot
CN116150338A (en) Intelligent customer service method and system based on multi-round dialogue
CN114153955B (en) Construction method of multi-skill task type dialogue system fusing chatting and common knowledge
CN113656564A (en) Power grid service dialogue data emotion detection method based on graph neural network
CN112287106A (en) Online comment emotion classification method based on dual-channel hybrid neural network
CN110597968A (en) Reply selection method and device
CN113255366A (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN111046157B (en) Universal English man-machine conversation generation method and system based on balanced distribution
Wang et al. Empathetic response generation through graph-based multi-hop reasoning on emotional causality
Li et al. Voice Interaction Recognition Design in Real-Life Scenario Mobile Robot Applications
CN114912020A (en) Multi-sub-target dialogue recommendation method based on user preference graph
CN117494727B (en) De-biasing method for large language model
Ai et al. A Two-Stage Multimodal Emotion Recognition Model Based on Graph Contrastive Learning
CN117037789B (en) Customer service voice recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant