CN109522545B - Method for evaluating the coherence quality of multi-turn dialogue - Google Patents

Method for evaluating the coherence quality of multi-turn dialogue

Info

Publication number
CN109522545B
Authority
CN
China
Prior art keywords
vector
semantic
dialogue
turn
utterance
Prior art date
Legal status
Active
Application number
CN201811181214.8A
Other languages
Chinese (zh)
Other versions
CN109522545A (en
Inventor
兰曼
周云晓
Current Assignee
East China Normal University
Original Assignee
East China Normal University
Priority date
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN201811181214.8A priority Critical patent/CN109522545B/en
Publication of CN109522545A publication Critical patent/CN109522545A/en
Application granted granted Critical
Publication of CN109522545B publication Critical patent/CN109522545B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis


Abstract

The invention discloses a method for evaluating the coherence quality of multi-turn dialogue. Its main feature is that it takes multi-turn dialogue text as input and, using a hierarchical attention mechanism, fuses the semantic information and intent information of the dialogue at both the level of the single utterance and the overall level of the multi-turn dialogue, thereby automatically assessing the coherence quality of multi-turn dialogue. Compared with the prior art, the invention trains quickly and analyses accurately; it needs no entity extraction from the text, avoiding the propagation of entity-extraction errors, and is especially suitable for large-scale and real-time corpora. By combining the semantic and intent information contained in the dialogue, it effectively and automatically evaluates the coherence quality of multi-turn dialogue, guides multi-turn dialogue generation systems to produce higher-quality dialogue text, and helps dialogue generation systems develop faster and better.

Description

Method for evaluating the coherence quality of multi-turn dialogue
Technical Field
The invention relates to the technical field of deep learning models on the internet, and in particular to a method for evaluating the coherence quality of multi-turn dialogue using deep learning.
Background
In recent years, more and more human-machine dialogue systems have appeared in daily life, such as chat and customer-service robots (chatbots) like Apple's Siri and Microsoft's XiaoIce. Dialogue is a basic mode of information exchange in human social activity, including television interviews, question-and-answer exchanges, WeChat chats and so on. A core technology of human-machine dialogue is multi-turn dialogue generation (Multi-turn Dialog Generation), a marker of the state of artificial intelligence development and a research hotspot in natural language processing attracting growing attention from researchers. The quality of multi-turn dialogue generation is generally assessed in one of two ways: 1) borrowing objective evaluation metrics such as BLEU and ROUGE from other text generation tasks (e.g. machine translation or automatic summarisation); these automatic metrics are based on word overlap, ignore the diversity and interactivity inherent in dialogue, and are therefore poorly suited to evaluating dialogue quality; 2) manual evaluation, which is accurate but costly in labour and time and cannot be applied to large-scale, real-time dialogue. Since a dialogue usually takes place between two or more parties, the quality of the content generated over multiple turns depends on the text coherence (Text Coherence) between preceding and following turns; a dialogue lacking coherence easily falls into logical confusion and cannot continue. Whether objective automatic evaluation or manual evaluation is used, coherence evaluation of multi-turn dialogue is therefore a key index of dialogue quality.
Text coherence evaluation mostly uses methods based on entity grids (Entity Grid) or entity sequences (Entity Sequence): first, entities such as person and place names in the text, together with their grammatical roles (subject, predicate and so on), are extracted and filled into grid nodes; the transitions of an entity's grammatical role between adjacent sentences become the connections between grid points; text coherence is then evaluated with manually extracted features or with a convolutional neural network (CNN).
The two examples of multi-turn dialogue in Table 1 further illustrate the problems of the prior art:
TABLE 1 example of two multi-turn conversations
The left and right columns of Table 1 are, respectively, a coherent and an incoherent multi-turn dialogue example (each with 6 turns), where H denotes the human (Human) and C the chatbot (Chatbot). Each sentence of H or C is an utterance (Utterance); H1 and C1, for example, are both utterances. The utterance H1 on the left of Table 1, "I want to buy some flowers for my wife", expresses the speaker's wish to buy flowers, and the utterance C1 replies reasonably by recommending suitable flowers. The intent information of a dialogue is typically represented with dialogue acts (Dialog Act); the dialogue-act markup of H1 on the left shows that a coherent multi-turn dialogue contains reasonable transitions of dialogue intent, i.e. dialogue-act transitions, which keep the dialogue developing logically around its topic and intent.
The existing dialogue coherence quality assessment techniques have the following shortcomings: (1) they ignore the linguistic diversity and multi-turn interactivity of dialogue, so the accuracy of dialogue quality assessment is low; (2) they lack the overall semantic information of the multi-turn dialogue; (3) they lack the intent information implied in the multi-turn dialogue; (4) they lack a deep fusion of the semantic and intent information of the multi-turn dialogue; (5) labour and time costs are high, so they cannot be applied to large-scale, real-time dialogue quality assessment; (6) methods based on entity grids or sequences depend on the performance of entity extraction, and extraction errors directly affect the performance of the subsequent steps.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method for evaluating the coherence quality of multi-turn dialogue. It takes multi-turn dialogue text as input and uses a hierarchical attention mechanism to fuse the semantic and intent information of the dialogue at the single-utterance level and at the overall level of the multi-turn dialogue, realising automatic assessment of coherence quality. Such automatic assessment can guide a multi-turn dialogue generation system to produce higher-quality dialogue text and help dialogue generation systems develop faster and better. The method needs no entity extraction from the text, avoiding the propagation of entity-extraction errors, and no manually designed features, saving cost; it trains quickly and is especially suitable for large-scale, real-time corpora. By combining the semantic and intent information of the dialogue at both the single-utterance and overall levels, it effectively evaluates the coherence quality of multi-turn dialogue and greatly improves analysis accuracy.
The purpose of the invention is realised as follows: a method for evaluating the coherence quality of multi-turn dialogue, characterised in that deep learning is used for hierarchical semantic modelling of the dialogue: after the semantic vector representation of each utterance is obtained, the utterance sequence of the dialogue is modelled to obtain the overall semantic information of the multi-turn dialogue. The coherence quality evaluation proceeds in the following steps:
Step one: using pre-trained word vectors, each word in a single utterance is initialised to a word vector by table lookup; the word-vector sequence of the utterance is taken as input and a deep learning model (such as an LSTM model) learns the semantics of the utterance, yielding the semantic vector of each utterance.
Step two: the dialogue-act type of the single utterance is initialised to a dialogue-act vector representation, and an attention mechanism fuses the dialogue-act vector with the semantic vector of the utterance, yielding the semantic-intent fusion vector of the single utterance.
Step three: for the utterance sequence of the multi-turn dialogue, the semantic-intent fusion vectors of the single utterances are used to construct the fusion vector sequence of the dialogue; this sequence is taken as input and a deep learning model (such as an LSTM model) learns the multi-turn dialogue, yielding its overall vector.
Step four: the utterance semantic vector sequence of the dialogue is taken as input to a deep learning model (such as an LSTM model) that models the semantics of the dialogue, yielding its overall semantic vector; the dialogue-act sequence of the dialogue is taken as input to a deep learning model (such as a CNN model) that learns the overall intent, yielding the overall intent vector of the dialogue.
Step five: an attention mechanism fuses the overall semantic vector and the overall intent vector of the multi-turn dialogue, yielding the overall semantic-intent fusion vector of the dialogue.
Step six: the dialogue vectors obtained by fusing semantic and intent information at the different levels are combined (e.g. by simple concatenation) into the overall coherence vector of the dialogue; with this vector as input, a scoring function produces the coherence quality assessment score of the multi-turn dialogue; a cross-entropy loss function computes the coherence score error on the training data, and back propagation updates the model parameters to train the model.
Step seven: the multi-turn dialogues to be evaluated and their dialogue-act sequences are input into the trained automatic coherence quality assessment model, which predicts their coherence quality scores.
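The seven steps above can be sketched end to end as follows. This is a minimal NumPy illustration only: the averaging, maximum and concatenation operations are crude stand-ins for the LSTM, CNN and attention components described in the steps, and every dimension and variable name is an illustrative assumption, not part of the patent.

```python
# Hypothetical end-to-end sketch of the seven-step pipeline (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N, n, d = 6, 5, 8                          # 6 utterances, 5 words each, dim 8 (assumed)
dialog_words = rng.normal(size=(N, n, d))  # step 1 input: word vectors per utterance
act_vecs = rng.normal(size=(N, d))         # step 2 input: dialogue-act vectors

utt_sem = dialog_words.mean(axis=1)        # step 1: per-utterance semantic vectors (stand-in)
utt_fused = 0.5 * (utt_sem + act_vecs)     # step 2: semantic-intent fusion (stand-in)
h_d = utt_fused.mean(axis=0)               # step 3: overall dialogue vector (stand-in)
h_sem = utt_sem.mean(axis=0)               # step 4a: overall semantic vector (stand-in)
v_d = act_vecs.max(axis=0)                 # step 4b: overall intent vector (stand-in)
h_sd = 0.5 * (h_sem + v_d)                 # step 5: overall fusion (stand-in)
h_c = np.concatenate([h_d, h_sd])          # step 6: overall coherence vector
score = 1.0 / (1.0 + np.exp(-h_c.sum()))   # step 6: scoring-function stand-in (sigmoid)
```

Each stand-in is replaced by the real component (LSTM, attention, CNN, softmax scoring) in the detailed steps that follow.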
Compared with the prior art, the method trains quickly, analyses accurately, requires no manually designed features (saving cost) and no entity extraction from the text (avoiding the propagation of entity-extraction errors), and is especially suitable for large-scale, real-time corpora.
Drawings
FIG. 1 is a flow chart of the operation of the present invention.
Detailed Description
The invention will now be described in further detail with reference to specific embodiments, conditions and experimental methods, wherein the terms used are defined as follows:
Multi-turn dialogue (Multi-turn Dialog): a multi-turn dialogue consists of two or more ordered turns, each containing one utterance (Utterance), i.e. the text spoken by a single interlocutor at one time. The multi-turn dialogue shown in Table 2 below comprises 6 turns, i.e. 6 utterances; H1, "I want to buy some flowers for my wife.", is the first utterance of the dialogue.
Dialogue acts (Dialog Act): a dialogue act reflects the intent of each utterance; there are 42 classes in total, including statement, question, instruction, claim, explanation and so on. The dialogue-act type of each utterance is shown in Table 2 below, where the utterance H2, "How much is the red rose?", is a question expressing the speaker's query, while the utterance C2 answers it by stating the price of the red rose.
TABLE 2 example of six-wheel dialog text and behavior types for each turn of dialog
Word vector (Word Vector): each word in the text is represented by a vector of low-dimensional continuous values. Word vectors are obtained from a corpus by pre-training.
Deep Learning Model (Deep Learning Model): the deep learning model is divided into three parts: an input layer, a hidden layer and an output layer. The hidden layer can be expanded into a multilayer structure, the neurons between two adjacent layers are connected with each other, and the neurons in the layer are not connected. Common deep learning models include Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long-Short Term Memory Neural Network (LSTM), and the like.
Referring to fig. 1, the input of the invention is a multi-turn dialogue D comprising N utterances u, denoted D = {u_1, u_2, ..., u_i, ..., u_N}, i = 1, 2, ..., N, where u_i represents the i-th utterance. Each utterance u contains n words and is represented as the word sequence u = {w_1, w_2, ..., w_j, ..., w_n}, j = 1, 2, ..., n. To obtain a semantic vector representation of the multi-turn dialogue as a whole, the semantic vector representation of each single utterance must be obtained first.
The invention uses deep learning for hierarchical semantic modelling of the multi-turn dialogue: after the semantic vector representation of each utterance is obtained, the utterance sequence of the dialogue is modelled to obtain the overall semantic information of the multi-turn dialogue. The coherence quality evaluation proceeds in the following steps:
Step one: at the single-utterance level, a deep learning model (such as a CNN, RNN or LSTM model) models the semantics of each utterance u in the multi-turn dialogue D, yielding the semantic vector representation of the single utterance. The specific process is as follows:
Input: a multi-turn dialogue D containing N utterances u, a word-vector dictionary, the deep learning model and related parameters;
Output: the semantic vector representation of a single utterance;
The process is as follows. Step a1: each utterance typically contains multiple words; to model a single utterance semantically, each word is first converted to its corresponding word vector representation by table lookup in a pre-trained word-vector dictionary. The utterance u contains n words, i.e. the word sequence u = {w_1, w_2, ..., w_j, ..., w_n}, j = 1, 2, ..., n; by looking up the pre-trained word-vector dictionary, each word w_j of the utterance u is initialised to a word vector, giving the word-vector sequence s = {x_1, x_2, ..., x_j, ..., x_n} of the utterance u, where x_j is the word vector of the j-th word.
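Step a1 amounts to a table lookup. A toy sketch follows; the vocabulary, the 4-dimensional vectors and the `utterance_to_vectors` helper are hypothetical, with random vectors standing in for a pre-trained dictionary:

```python
import numpy as np

# Hypothetical pre-trained word-vector dictionary: word -> row of an embedding table.
vocab = {"i": 0, "want": 1, "to": 2, "buy": 3, "flowers": 4, "<unk>": 5}
emb = np.random.default_rng(0).normal(size=(len(vocab), 4))  # toy 4-dim vectors

def utterance_to_vectors(words):
    """Initialise each word w_j of a single utterance to its word vector x_j
    by table lookup, yielding the word-vector sequence s of the utterance."""
    ids = [vocab.get(w.lower(), vocab["<unk>"]) for w in words]
    return emb[ids]  # shape (n, 4): one row per word

s = utterance_to_vectors("I want to buy flowers".split())
```

The resulting sequence `s` is the input of the semantic model in step a2.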
Step a2: the word-vector sequence s of the utterance u is taken as input, and a deep learning model (such as a CNN, RNN or LSTM model) models the semantics of the utterance u, yielding the semantic vector representation of the single utterance. Taking the LSTM model as an example, the process is as follows:
(1) The input is the word-vector sequence s = {x_1, x_2, ..., x_j, ..., x_n} of the utterance;
(2) Each word vector in the sequence is processed in turn by the repeating module of the LSTM model, according to the following formulas (a) to (e):
f_t = σ(W_f x_t + U_f h_{t-1} + b_f) (a)
i_t = σ(W_i x_t + U_i h_{t-1} + b_i) (b)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o) (c)
c_t = f_t * c_{t-1} + i_t * tanh(W_c x_t + U_c h_{t-1} + b_c) (d)
h_t = o_t * tanh(c_t) (e)
where x_t is the t-th word vector in the sequence s (t = 1, 2, ..., n), used as the input vector at time step t; f_t, i_t and o_t are, respectively, the forget gate, input gate and output gate at time step t; W_f, W_i, W_o, W_c and U_f, U_i, U_o, U_c are all weight parameters, and b_f, b_i, b_o, b_c are all bias terms; σ is the sigmoid activation function and tanh the hyperbolic tangent function; * denotes element-wise multiplication (Element-wise Multiplication) of the corresponding elements of two vectors; h_t is the hidden-layer state at time step t.
(3) The hidden-layer state h_n of the last time step n is output as the semantic vector representation h of the utterance u. This first step models the semantics of each utterance with a deep learning model and obtains only the semantic information of the single utterance, without considering its intent information.
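Formulas (a) to (e) can be written directly in NumPy. The sketch below is an illustrative, untrained LSTM cell with randomly initialised parameters; all names and dimensions are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM repeating module, following formulas (a)-(e)."""
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])  # (a) forget gate
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])  # (b) input gate
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])  # (c) output gate
    c = f * c_prev + i * np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])  # (d)
    h = o * np.tanh(c)                                       # (e)
    return h, c

def encode_utterance(word_vecs, p, d_hid):
    """Run the word-vector sequence through the LSTM; the last hidden state
    h_n is output as the semantic vector of the utterance."""
    h, c = np.zeros(d_hid), np.zeros(d_hid)
    for x in word_vecs:
        h, c = lstm_step(x, h, c, p)
    return h

rng = np.random.default_rng(0)
d_in, d_hid, n = 4, 5, 3                        # illustrative dimensions
p = {f"W{g}": rng.normal(size=(d_hid, d_in)) for g in "fioc"}
p |= {f"U{g}": rng.normal(size=(d_hid, d_hid)) for g in "fioc"}
p |= {f"b{g}": np.zeros(d_hid) for g in "fioc"}
h = encode_utterance(rng.normal(size=(n, d_in)), p, d_hid)
```

Since h_t = o_t * tanh(c_t) with o_t in (0, 1), every component of the semantic vector lies strictly inside (-1, 1).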
Step two: at the single-utterance level, the semantic vector representation obtained in step one is fused with the intent information of the utterance by an attention mechanism, yielding the semantic-intent fusion vector of the single utterance. The specific process is as follows:
Input: the semantic vector representation of the single utterance obtained in step one, the dialogue-act type of the utterance, the deep learning model and related parameters;
Output: the semantic-intent fusion vector of the single utterance;
The process is as follows. Step b1: the dialogue-act type of the utterance is initialised to a dialogue-act vector representation, and an attention mechanism fuses the dialogue-act vector of the single utterance with the semantic vector representation obtained in step one, giving the semantic-intent fusion vector of the utterance. The process is as follows:
(1) The 42 dialogue-act types are randomly initialised to fixed-dimension vector representations (200 dimensions, for example), forming a dialogue-act vector dictionary E ∈ R^{42×200}.
(2) The dialogue-act type of a single utterance u is initialised to a vector v_da by looking up the dialogue-act vector dictionary E.
(3) The attention mechanism fuses the dialogue-act vector v_da of the single utterance with the semantic vector representation h obtained in step one, giving the semantic-intent fusion vector h_da of the utterance. Taking 5 linear transformations as an example, the calculation follows formulas (i) to (k):
z_i = W_i h + b_i, i = 1, 2, ..., 5 (i)
α_i = exp(v_da · z_i) / Σ_{j=1}^{5} exp(v_da · z_j) (j)
h_da = Σ_{i=1}^{5} α_i z_i (k)
where W_i is a weight, b_i a bias term and z_i the result of the i-th linear transformation of the semantic vector h; α_i is a scalar, the weight of z_i computed with the dialogue-act vector v_da; h_da, the semantic-intent fusion vector of the utterance, is the weighted sum of the z_i and fuses the semantic and intent information of the utterance u.
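One plausible reading of step b1's attention fusion, sketched in NumPy with random parameters and illustrative dimensions; the softmax weighting of the z_i by v_da is an assumption based on the description:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def fuse_semantic_and_act(h, v_da, Ws, bs):
    """Fuse the semantic vector h of one utterance with its dialogue-act
    vector v_da: 5 linear transforms z_i of h are scored against v_da,
    and h_da is the weighted sum of the z_i."""
    zs = np.stack([W @ h + b for W, b in zip(Ws, bs)])  # z_i = W_i h + b_i
    alpha = softmax(zs @ v_da)                          # weight of each z_i from v_da
    return alpha @ zs                                   # h_da: weighted sum of z_i

rng = np.random.default_rng(1)
d = 6                                                   # illustrative dimension
Ws = [rng.normal(size=(d, d)) for _ in range(5)]
bs = [np.zeros(d) for _ in range(5)]
h_da = fuse_semantic_and_act(rng.normal(size=d), rng.normal(size=d), Ws, bs)
```

The same pattern is reused at the overall dialogue level in step five, with the overall intent vector playing the role of v_da.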
Step three: for the utterance sequence of the multi-turn dialogue, the semantic-intent fusion vectors obtained in step two are used to construct the semantic-intent fusion vector sequence of the whole dialogue, and a deep learning model (such as an LSTM) yields the overall vector representation of the multi-turn dialogue. Since the coherence quality of a multi-turn dialogue is evaluated over the dialogue as a whole, step three uses the semantic-intent fusion vector h_da of each utterance, obtained in step two, to build the fusion vector sequence of the whole dialogue, and models the utterance sequence of the dialogue with a deep learning model (an LSTM model, for example) to obtain the overall vector representation h_d of the multi-turn dialogue. The specific process is as follows:
Input: the semantic-intent fusion vector h_da of each utterance obtained in step two and the fusion vector sequence constructed from them, the deep learning model and related parameters;
Output: the overall vector representation of the multi-turn dialogue;
The process is as follows. Step c1: for the utterance sequence {u_1, u_2, ..., u_i, ..., u_N} of the multi-turn dialogue D, the semantic-intent fusion vector h_da of each single utterance (step two) is used to construct the fusion vector sequence {h_da1, h_da2, ..., h_dai, ..., h_daN}, i = 1, 2, ..., N, of the dialogue, where h_dai is the semantic-intent fusion vector of the i-th utterance u_i.
Step c2: the semantic-intent fusion vector sequence of the multi-turn dialogue is taken as input and modelled with an LSTM model, yielding the overall vector representation h_d of the dialogue; for the LSTM model, see step a2.
Step four: deep learning models (such as LSTM or CNN) model the utterance sequence and the dialogue-act sequence of the multi-turn dialogue respectively, yielding its overall semantic vector representation and overall intent vector representation. Steps two and three fuse semantic and intent information with an attention mechanism at the single-utterance level and then derive an overall dialogue representation from the fused sequence. To fuse the semantic and intent information of the dialogue with an attention mechanism at the overall level, overall semantic modelling and intent modelling must be performed first; step four therefore uses a deep learning model (such as a CNN, RNN or LSTM model) to model the utterance sequence and the dialogue-act sequence of the dialogue separately, obtaining the overall semantic vector representation and the overall intent vector representation of the multi-turn dialogue. The specific process is as follows:
Input: the semantic vectors of the single utterances obtained in step one, forming the semantic vector sequence of the multi-turn dialogue; the dialogue-act sequence of the dialogue; the deep learning models and related parameters;
Output: the overall semantic vector representation and overall intent vector representation of the multi-turn dialogue;
The process is as follows. Step d1: for the utterance sequence {u_1, u_2, ..., u_i, ..., u_N} of the multi-turn dialogue D, the semantic vector representation h of each utterance (step one) forms the utterance semantic vector sequence s_h = {h_1, h_2, ..., h_i, ..., h_N}, i = 1, 2, ..., N. This sequence is the input of an LSTM model that models the semantics of the dialogue and yields its overall semantic vector representation h_sem; for the LSTM model, see step a2.
Step d2: the dialogue-act sequence of the multi-turn dialogue is taken as input, and a deep learning model (such as a CNN, RNN or LSTM model) models the overall intent of the dialogue, yielding its overall intent vector representation. The process is as follows:
(1) For the utterance sequence {u_1, u_2, ..., u_i, ..., u_N} of the multi-turn dialogue D, the dialogue-act type of each utterance gives the dialogue-act sequence s_da = {da_1, da_2, ..., da_i, ..., da_N}, i = 1, 2, ..., N, where da_i is the dialogue-act type of the i-th utterance u_i.
(2) A deep learning model models the dialogue-act sequence to obtain the dialogue-act representation of the whole dialogue; taking a CNN model as an example, the process is as follows:
(I) For the dialogue-act sequence s_da = {da_1, da_2, ..., da_i, ..., da_N}, each dialogue act da_i (i = 1, 2, ..., N) is looked up in the dialogue-act vector dictionary E constructed in step two to obtain its vector representation, converting the sequence into a vector sequence {v_1, v_2, ..., v_i, ..., v_N} of N vectors, the input of the CNN model.
(II) The convolution layer convolves the input vector sequence according to the following formula (f):
c_j = f(u^T v_{j-k+1:j} + b) (f)
where f is a non-linear function, such as the hyperbolic tangent (tanh) or sigmoid activation function; b is a bias term; u is a convolution filter with window size k; and v_{j-k+1:j} denotes the (j-k+1)-th to j-th dialogue-act vectors in the sequence. With m filters and a sliding window of size k, m feature-value sequences C = {C_1, C_2, ..., C_m} are obtained, where each feature-value sequence C_i = [c_1, c_2, ..., c_{N-k+1}], C_i ∈ R^{N-k+1}, i = 1, ..., m.
(III) To extract the important features and keep the output size fixed, the pooling layer pools the m feature-value sequences C into an m-dimensional feature vector, the overall intent vector representation v_d of the multi-turn dialogue.
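The convolution of formula (f) followed by pooling can be sketched as follows; the filter count, window size and the choice of max-pooling are illustrative assumptions:

```python
import numpy as np

def conv_feature(V, u, b, k):
    """One convolution filter u of window size k over the dialogue-act vector
    sequence V (formula f): c_j = tanh(u . V[j-k+1:j] + b), giving N-k+1 values."""
    N = len(V)
    window = lambda j: V[j - k + 1 : j + 1].ravel()  # flattened k-vector window
    return np.array([np.tanh(u @ window(j) + b) for j in range(k - 1, N)])

def dialog_intent_vector(V, filters, biases, k):
    """m filters followed by max-pooling give an m-dimensional feature vector,
    the overall intent vector v_d of the dialogue."""
    return np.array([conv_feature(V, u, b, k).max() for u, b in zip(filters, biases)])

rng = np.random.default_rng(2)
N, d, k, m = 6, 4, 2, 3                     # 6 dialogue acts, window 2, 3 filters (assumed)
V = rng.normal(size=(N, d))                 # dialogue-act vector sequence {v_1..v_N}
filters = [rng.normal(size=k * d) for _ in range(m)]
v_d = dialog_intent_vector(V, filters, np.zeros(m), k)
```

Each filter produces a feature sequence of length N - k + 1, and pooling keeps the maximum per filter, so the output dimension m is independent of the number of turns N.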
Step five: at the overall dialogue level, an attention mechanism fuses the two vector representations obtained in step four into the overall semantic-intent fusion vector of the dialogue. To fuse the semantic and intent information at the overall level of the multi-turn dialogue, step five applies an attention mechanism to the overall semantic vector representation h_sem obtained in step d1 and the overall intent vector representation v_d obtained in step d2, yielding the overall semantic-intent fusion vector h_sd of the dialogue. The specific process is as follows:
Input: the overall semantic vector representation h_sem and the overall intent vector representation v_d of the multi-turn dialogue, the deep learning model and related parameters;
Output: the overall semantic-intent fusion vector of the multi-turn dialogue;
The process is as follows. Step e1: an attention mechanism fuses the overall semantic vector representation h_sem and the overall intent vector representation v_d of the multi-turn dialogue. Taking 5 linear transformations as an example, the calculation follows formulas (g), (l) and (n):
g_i = W_di h_sem + b_di, i = 1, 2, ..., 5 (g)
β_i = exp(v_d · g_i) / Σ_{j=1}^{5} exp(v_d · g_j) (l)
h_sd = Σ_{i=1}^{5} β_i g_i (n)
where W_di is a weight, b_di a bias term and g_i the result of the i-th linear transformation of the semantic vector h_sem; β_i is a scalar, the weight of g_i computed with the overall intent representation v_d; h_sd, the weighted sum of the g_i, is the overall semantic-intent fusion vector of the dialogue and fuses the semantic and intent information of the multi-turn dialogue.
Step six: the preceding steps use a hierarchical attention mechanism to fuse the semantic and intent information of the dialogue at the two levels of the single utterance and the whole multi-turn dialogue, yielding two overall vector representations of the dialogue: the overall vector h_d from step three and the overall semantic-intent fusion vector from step five. Step six combines these two representations and obtains the coherence quality assessment score of the dialogue with a scoring function (such as the softmax or sigmoid function); the automatic multi-turn dialogue coherence quality assessment model is trained with a cross-entropy loss function and back propagation. The specific process is as follows:
Input: the overall vector representation h_d of the multi-turn dialogue obtained in step three, the overall semantic-intent fusion vector of the dialogue obtained in step five, the deep learning model and related parameters;
Output: the coherence quality assessment score of the multi-turn dialogue;
The process is as follows. Step f1: steps three and five fuse the dialogue's semantic and intent information at different levels, yielding the two dialogue vector representations h_d and h_sd (the overall semantic-intent fusion vector from step five); these are combined (by simple concatenation, for example) into the overall coherence vector of the dialogue: h_c = h_d ⊕ h_sd, where ⊕ denotes the concatenation operation.
Step f2: with the vector h_c as input, a scoring function (such as the softmax or sigmoid function) scores the overall coherence quality of the multi-turn dialogue, yielding the coherence probability distribution h_s of the dialogue. Taking the softmax function as an example, the calculation follows formula (m):
h_s = softmax(W_s h_c + b_s) (m)
wherein, WsIs a weight, bsAs a bias term, a vector h of overall coherence for a multi-turn dialogcMapping to a two-dimensional target space to obtain a two-dimensional vector hoAs an input to the softmax function, the softmax function is specifically calculated according to the following equation p:
h_s^(j) = exp(h_o^(j)) / (exp(h_o^(0)) + exp(h_o^(1)))    (p)
where h_s^(j) denotes the j-th dimension of the vector h_s, j ∈ {0, 1}; h_s^(0) and h_s^(1) are real numbers between 0 and 1 whose sum is 1, and h_s^(1) represents the coherence probability of the dialogue, i.e., the coherence quality assessment score of the dialogue.
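Formulas (m) and (p) can be sketched together as a small scoring function. The weights W_s and bias b_s below are randomly initialized stand-ins for illustration; in the trained model they are the learned parameters of step f2.

```python
import numpy as np

def coherence_score(h_c, W_s, b_s):
    """Formulas (m) and (p): project h_c to two dimensions, then softmax."""
    h_o = W_s @ h_c + b_s            # two-dimensional vector h_o
    e = np.exp(h_o - h_o.max())      # numerically stable exponentiation
    h_s = e / e.sum()                # h_s[0] + h_s[1] sums to 1
    return h_s

rng = np.random.default_rng(1)
h_c = rng.normal(size=128)               # overall coherence vector from step f1
W_s = 0.01 * rng.normal(size=(2, 128))   # untrained weights, illustration only
b_s = np.zeros(2)
h_s = coherence_score(h_c, W_s, b_s)
# h_s[1] is the coherence probability of the dialogue
```

Subtracting the maximum logit before exponentiation does not change the softmax result but avoids overflow for large logits.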
Step f3: training the multi-turn dialogue coherence quality automatic assessment model is a supervised learning process. After the coherence quality assessment scores of the multi-turn dialogues are obtained, the cross-entropy loss over the training data T is computed according to formula (q):
loss(θ) = −Σ_{i=1}^{|T|} [ y_i log h_s,i^(1) + (1 − y_i) log h_s,i^(0) ]    (q)
where θ is the set of parameters to be trained and updated in the model, comprising the weight and bias parameters involved in each step and the dialogue-act vector dictionary E constructed in step two; |T| denotes the number of dialogues in the training data set; and y_i denotes the coherence label of the i-th multi-turn dialogue in the training data set, equal to 1 if the dialogue is coherent and 0 otherwise.
To train the multi-turn dialogue coherence quality automatic assessment model, this step differentiates the loss function, back-propagates the coherence error, and updates the model parameters; when loss(θ) falls below a preset threshold τ, updating stops and model training is complete.
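The training loop of step f3 can be sketched as below. For illustration the gradient step is restricted to the scoring-layer parameters W_s and b_s on toy two-dimensional inputs; in the full model, back-propagation would also flow into the deep-learning layers of the earlier steps and the dialogue-act vector dictionary E.

```python
import numpy as np

def train_step(batch, labels, W_s, b_s, lr=0.5):
    """One gradient step on the cross-entropy loss of formula (q),
    restricted to the scoring-layer parameters for illustration."""
    grad_W = np.zeros_like(W_s)
    grad_b = np.zeros_like(b_s)
    total_loss = 0.0
    for h_c, y in zip(batch, labels):
        h_o = W_s @ h_c + b_s
        e = np.exp(h_o - h_o.max())
        h_s = e / e.sum()                 # coherence probability distribution
        total_loss += -np.log(h_s[y] + 1e-12)
        d = h_s.copy()
        d[y] -= 1.0                       # dLoss/dh_o for softmax + cross entropy
        grad_W += np.outer(d, h_c)
        grad_b += d
    n = len(batch)
    W_s -= lr * grad_W / n                # in-place gradient descent update
    b_s -= lr * grad_b / n
    return total_loss / n                 # mean cross-entropy loss

# Toy, clearly separable data (hypothetical; real h_c come from the model)
batch = [np.array([1.0, 0.2]), np.array([-1.0, 0.1]),
         np.array([0.8, -0.3]), np.array([-0.9, 0.4])]
labels = [1, 0, 1, 0]                     # 1 = coherent, 0 = incoherent
W_s, b_s = np.zeros((2, 2)), np.zeros(2)
losses = [train_step(batch, labels, W_s, b_s) for _ in range(20)]
# the loss decreases toward the preset threshold tau as training proceeds
```

The stopping rule of the patent, loss(θ) < τ, would replace the fixed 20 iterations in a real training run.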
Step seven: the multi-turn dialogue to be assessed and its dialogue-act sequence are input into the trained multi-turn dialogue coherence quality automatic assessment model to predict the coherence quality assessment score of the dialogue. Concretely, the seventh step of the invention feeds the multi-turn dialogue to be assessed and its dialogue-act sequence into the model trained in step six, obtains a real value between 0 and 1 representing the degree of dialogue coherence, and outputs this value as the coherence quality assessment score of the multi-turn dialogue.
The invention has been described above in further detail by way of specific embodiments; this description is not intended to limit the scope of the invention, and all equivalent embodiments are intended to be included within the scope of the following claims.

Claims (1)

1. A method for assessing the coherence quality of multi-turn dialogues, characterized in that deep learning is used to perform hierarchical semantic modeling of the multi-turn dialogue: the word sequence of each utterance is modeled to obtain the semantic vector representation of that utterance, the overall semantic information of the multi-turn dialogue is then obtained, and coherence quality assessment is carried out according to the following steps:

Step one: each word in a single utterance is initialized to a word vector by table lookup into pre-trained word vectors; with the word-vector sequence of the single utterance as input, a deep learning model performs semantic learning on the utterance to obtain a semantic vector for each utterance;

Step two: the dialogue-act type of a single utterance is initialized to a dialogue-act vector representation, and an attention mechanism then fuses the dialogue-act vector of the single utterance with its semantic vector to obtain the semantic-intent fusion vector of the single utterance;

Step three: for the utterance sequence of the multi-turn dialogue, the semantic-intent fusion vector sequence of the multi-turn dialogue is constructed from the semantic-intent fusion vectors of the single utterances; with this sequence as input, a deep learning model learns over the multi-turn dialogue to obtain the overall vector of the multi-turn dialogue;

Step four: with the utterance semantic-vector sequence of the multi-turn dialogue as input to a deep learning model, semantic modeling of the multi-turn dialogue yields its overall semantic vector; with the dialogue-act sequence of the multi-turn dialogue as input, a deep learning model learns the overall intent of the multi-turn dialogue to obtain its overall intent vector;

Step five: an attention mechanism fuses the overall semantic vector and the overall intent vector of the multi-turn dialogue to obtain the overall semantic-intent fusion vector of the dialogue;

Step six: the dialogue vector representations obtained in steps three and five by fusing dialogue semantic and intent information at different levels are combined to obtain the overall coherence vector of the dialogue; with this vector as input, a scoring function yields the coherence quality assessment score of the multi-turn dialogue; a cross-entropy loss function then computes the coherence-score error on the training data, and the model parameters are updated through back propagation to train the model;

Step seven: the multi-turn dialogue to be assessed and its dialogue-act sequence are input into the trained multi-turn dialogue coherence quality automatic assessment model, which predicts the coherence quality assessment score of the dialogue.
CN201811181214.8A 2018-10-11 2018-10-11 A kind of appraisal procedure that more wheels are talked with coherent property amount Active CN109522545B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811181214.8A CN109522545B (en) 2018-10-11 2018-10-11 A kind of appraisal procedure that more wheels are talked with coherent property amount

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811181214.8A CN109522545B (en) 2018-10-11 2018-10-11 A kind of appraisal procedure that more wheels are talked with coherent property amount

Publications (2)

Publication Number Publication Date
CN109522545A CN109522545A (en) 2019-03-26
CN109522545B true CN109522545B (en) 2019-08-23

Family

ID=65770262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811181214.8A Active CN109522545B (en) 2018-10-11 2018-10-11 A kind of appraisal procedure that more wheels are talked with coherent property amount

Country Status (1)

Country Link
CN (1) CN109522545B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188190B (en) * 2019-04-03 2023-07-14 创新先进技术有限公司 Dialogue analysis method, device, server and readable storage medium
CN110162779B (en) * 2019-04-04 2023-08-04 北京百度网讯科技有限公司 Medical record quality evaluation method, device and equipment
RU2744032C2 (en) * 2019-04-15 2021-03-02 Общество С Ограниченной Ответственностью "Яндекс" Method and system for determining result of task execution in crowdsourced environment
CN111159356B (en) * 2019-12-31 2023-06-09 重庆和贯科技有限公司 Knowledge graph construction method based on teaching content
RU2020107002A (en) 2020-02-14 2021-08-16 Общество С Ограниченной Ответственностью «Яндекс» METHOD AND SYSTEM FOR RECEIVING A LABEL FOR A DIGITAL PROBLEM PERFORMED IN A CROWDSORING ENVIRONMENT
CN111460115B (en) * 2020-03-17 2023-05-26 深圳市优必选科技股份有限公司 Intelligent man-machine conversation model training method, model training device and electronic equipment
CN111428470B (en) * 2020-03-23 2022-04-22 北京世纪好未来教育科技有限公司 Text continuity judgment method, text continuity judgment model training method, electronic device and readable medium
CN111241263A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Text generation method and device and electronic equipment
CN112487158B (en) * 2020-11-06 2023-05-05 泰康保险集团股份有限公司 Multi-round dialogue problem positioning method and device
CN112417112A (en) * 2020-11-10 2021-02-26 中山大学 Open domain dialogue system evaluation method based on graph characterization enhancement

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9348816B2 (en) * 2008-10-14 2016-05-24 Honda Motor Co., Ltd. Dialog coherence using semantic features
CN106599196A (en) * 2016-12-14 2017-04-26 竹间智能科技(上海)有限公司 Artificial intelligence conversation method and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Analyzing Dialog Coherence Using Transition Patterns in Lexical and Semantic Features; Amruta Purandare et al.; Proceedings of the Twenty-First International FLAIRS Conference; Dec. 31, 2008; pp. 195-200
A Survey of Deep-Learning-Based Generative Chatbot Algorithms; Sun Liru et al.; Computer Knowledge and Technology; Aug. 2018; vol. 14, no. 23; pp. 227-228

Also Published As

Publication number Publication date
CN109522545A (en) 2019-03-26

Similar Documents

Publication Publication Date Title
CN109522545B (en) A kind of appraisal procedure that more wheels are talked with coherent property amount
US20180329884A1 (en) Neural contextual conversation learning
CN108549658B (en) Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree
CN110837548B (en) Answer matching method and device, electronic equipment and storage medium
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
Shi et al. Unsupervised dialog structure learning
CN115329779B (en) Multi-person dialogue emotion recognition method
CN111966800A (en) Emotional dialogue generation method and device and emotional dialogue model training method and device
CN107679225B (en) Reply generation method based on keywords
CN115495568B (en) Training method and device for dialogue model, dialogue response method and device
CN115495552A (en) Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment
CN112364148B (en) Deep learning method-based generative chat robot
CN116150338A (en) Intelligent customer service method and system based on multi-round dialogue
CN114153955B (en) Construction method of multi-skill task type dialogue system fusing chatting and common knowledge
CN113656564A (en) Power grid service dialogue data emotion detection method based on graph neural network
CN112287106A (en) Online comment emotion classification method based on dual-channel hybrid neural network
CN110597968A (en) Reply selection method and device
CN113255366A (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN111046157B (en) Universal English man-machine conversation generation method and system based on balanced distribution
Wang et al. Empathetic response generation through graph-based multi-hop reasoning on emotional causality
Li et al. Voice Interaction Recognition Design in Real-Life Scenario Mobile Robot Applications
CN114912020A (en) Multi-sub-target dialogue recommendation method based on user preference graph
CN117494727B (en) De-biasing method for large language model
Ai et al. A Two-Stage Multimodal Emotion Recognition Model Based on Graph Contrastive Learning
CN117037789B (en) Customer service voice recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant