CN111382257A - Method and system for generating dialog context - Google Patents
- Publication number
- CN111382257A (application CN202010470216.XA)
- Authority
- CN
- China
- Prior art keywords
- knowledge
- vector
- current time
- generating
- context
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 70
- 239000013598 vector Substances 0.000 claims abstract description 207
- 230000004927 fusion Effects 0.000 claims abstract description 38
- 230000006870 function Effects 0.000 claims description 28
- 238000012545 processing Methods 0.000 claims description 10
- 230000004913 activation Effects 0.000 claims description 9
- 238000012549 training Methods 0.000 claims description 7
- 230000015654 memory Effects 0.000 claims description 4
- 238000004422 calculation algorithm Methods 0.000 claims description 3
- 230000008569 process Effects 0.000 description 19
- 238000004364 calculation method Methods 0.000 description 15
- 230000007246 mechanism Effects 0.000 description 6
- 238000010586 diagram Methods 0.000 description 5
- 238000012986 modification Methods 0.000 description 5
- 230000004048 modification Effects 0.000 description 5
- 239000011159 matrix material Substances 0.000 description 4
- 230000009466 transformation Effects 0.000 description 4
- 230000006872 improvement Effects 0.000 description 3
- 239000000463 material Substances 0.000 description 3
- 238000003058 natural language processing Methods 0.000 description 3
- 230000000644 propagated effect Effects 0.000 description 3
- 230000006978 adaptation Effects 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 2
- 238000004590 computer program Methods 0.000 description 2
- 230000004075 alteration Effects 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 230000000712 assembly Effects 0.000 description 1
- 238000000429 assembly Methods 0.000 description 1
- 230000008846 dynamic interplay Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000002996 emotional effect Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 239000003607 modifier Substances 0.000 description 1
- 210000002569 neuron Anatomy 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 230000006403 short-term memory Effects 0.000 description 1
- 238000000844 transformation Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Machine Translation (AREA)
Abstract
The embodiments of this specification disclose a method and system for generating a dialog context (the system's reply). The method comprises the following steps: acquiring a dialog history, acquiring at least one knowledge text related to the dialog history, and generating at least one knowledge vector k_1~k_m corresponding to the at least one knowledge text, the knowledge texts being stored in a knowledge base; generating a knowledge fusion vector kf_t for the current time from the at least one knowledge vector k_1~k_m and the decoding hidden state S_t of the current time using a first attention model; and generating the dialog context word y_t at the current time based on the knowledge fusion vector kf_t of the current time, the context vector c_t of the current time, and the decoding hidden state S_t of the current time. The dialog context words y_1~y_t constitute the dialog context, y_1 representing the dialog context word at t = 1.
Description
Technical Field
The present description relates to the field of Natural Language Processing (NLP), and more particularly to a method and system for generating a dialog context.
Background
In recent years, with the rise of artificial intelligence, human-machine conversation has received wide attention from academia and industry as an important challenge of artificial intelligence. Task-independent chit-chat dialogue has gradually become a research focus because it can provide users with a more intelligent and lively conversational experience and address their emotional needs. Current approaches to generating chit-chat dialogue fall roughly into retrieval-based and generation-based methods.
A retrieval-based dialogue system uses information retrieval technology to recall relevant candidate replies from a corpus according to the user's input. Unlike retrieval methods, generative methods do not select historical replies from a corpus but generate entirely new ones. Because chit-chat is open-ended, with no definite goal and no limited knowledge range, the challenge for generative dialogue is how to smoothly introduce knowledge from an external knowledge base and reduce "safe" replies such as "yes" and "I see".
Therefore, a method for generating a dialog context is desired that can dynamically select knowledge while the system generates a reply, so that the dialog system can naturally switch topics during a human-machine conversation.
Disclosure of Invention
One embodiment of the present specification provides a method for generating a dialog context. The method comprises the following steps:
acquiring a dialog history, acquiring at least one knowledge text related to the dialog history, and generating at least one knowledge vector k_1~k_m corresponding to the at least one knowledge text, the knowledge texts being stored in a knowledge base; generating a knowledge fusion vector kf_t for the current time from the at least one knowledge vector k_1~k_m and the decoding hidden state S_t of the current time using a first attention model; and generating the dialog context word y_t at the current time based on the knowledge fusion vector kf_t of the current time, the context vector c_t of the current time, and the decoding hidden state S_t of the current time. The dialog context words y_1~y_t constitute the dialog context, y_1 representing the dialog context word at t = 1.
One of the embodiments of the present specification provides a system for generating a dialog context, the system comprising:
a knowledge vector generation module, configured to acquire a dialog history, acquire at least one knowledge text related to the dialog history, and generate at least one knowledge vector k_1~k_m corresponding to the at least one knowledge text, the knowledge texts being stored in a knowledge base; a knowledge fusion vector generation module, configured to generate a knowledge fusion vector kf_t for the current time from the at least one knowledge vector k_1~k_m and the decoding hidden state S_t of the current time using a first attention model; a dialog context word generation module, configured to generate the dialog context word y_t at the current time based on the knowledge fusion vector kf_t of the current time, the context vector c_t of the current time, and the decoding hidden state S_t of the current time; and a dialog context generation module, configured to compose the dialog context from the dialog context words y_1~y_t, y_1 representing the dialog context word at t = 1.
One of the embodiments of the present specification provides an apparatus for generating a dialog context, the apparatus including:
at least one processor and at least one memory; the at least one memory is for storing computer instructions; the at least one processor is configured to execute at least some of the computer instructions to implement a method of generating a dialog context.
One of the embodiments of the present specification provides a computer-readable storage medium storing computer instructions, and when the computer instructions in the storage medium are read by a computer, the computer executes at least part of the instructions to realize a method for generating a dialog context.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a structured flow diagram of a method of generating a dialog context, shown in accordance with some embodiments of the present description;
FIG. 2 is a schematic structural diagram of a first attention model according to some embodiments herein;
FIG. 3 is a diagram of an application scenario for generating a dialog context, shown in accordance with some embodiments of the present description;
FIG. 4 is an exemplary conversation between a chat robot and a user, shown in accordance with some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; those steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the preceding or following operations are not necessarily performed in the exact order in which they are performed. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to the processes, or a certain step or several steps of operations may be removed from the processes.
FIG. 1 is a structured flow diagram of a method of generating a dialog context, shown in some embodiments in accordance with the present description.
In some embodiments, the dialog history may be the dialog statements of a user chatting with the dialog system. A chit-chat dialog system is a kind of human-machine conversation system that enables a machine to understand natural language and use it to communicate with a user emotionally. Unlike human-machine dialog systems for specific services such as information query and retrieval, conversation through a chit-chat dialog system is open-ended: the user's dialog statements can be any reasonable natural-language sentences, such as "I want to go mountain climbing" or "Do you like watching movies?". The user's dialog may be entered into the dialog system through a human-machine interface including, but not limited to, voice input or text input. In some embodiments, so that the system reply can use more of the preceding information, the dialog history may include all of the historical turns of the user and the system during a conversation.
In some embodiments, the dialog system may be implemented with a Sequence-to-Sequence (Seq2Seq) network architecture based on a context attention mechanism. A Seq2Seq model consists of a dialog encoder and a dialog decoder: the dialog encoder converts the variable-length dialog history into a fixed-length vector representation, and the dialog decoder converts that fixed-length vector into a variable-length dialog context. In some embodiments, the dialog encoder may be constructed based on a Bidirectional Long Short-Term Memory (Bi-directional LSTM) model. A Bi-directional LSTM is composed of two LSTM models: the first processes the input token sequence from left to right, the other from right to left; at each encoding step the hidden states obtained by the two LSTMs are combined and output as the hidden state of the whole model. Because a Bi-directional LSTM takes the information of the whole context into account when encoding, it encodes better than a unidirectional LSTM. In some embodiments, the dialog encoder may also be constructed based on other sequence models, such as a bidirectional GRU (Gated Recurrent Unit), and is not limited by the description herein.
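As a minimal sketch of the bidirectional encoding just described, the following runs one left-to-right and one right-to-left pass over a toy token sequence and concatenates the two hidden states at each step. A simple tanh RNN cell stands in for the LSTM cell, and all sizes and random parameters are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_emb, d_hid, n = 4, 3, 5          # embedding size, hidden size, sequence length

# Hypothetical parameters of one simple tanh RNN cell (stand-in for an LSTM cell).
def make_cell():
    return (rng.normal(size=(d_hid, d_emb)) * 0.1,   # W_x: input -> hidden
            rng.normal(size=(d_hid, d_hid)) * 0.1,   # W_h: hidden -> hidden
            np.zeros(d_hid))                          # bias

def run_rnn(cell, xs):
    """Run one direction over the sequence, returning all hidden states."""
    W_x, W_h, b = cell
    state = np.zeros(d_hid)
    states = []
    for x in xs:
        state = np.tanh(W_x @ x + W_h @ state + b)
        states.append(state)
    return states

tokens = [rng.normal(size=d_emb) for _ in range(n)]   # word-embedding vectors

fwd = run_rnn(make_cell(), tokens)              # left-to-right pass
bwd = run_rnn(make_cell(), tokens[::-1])[::-1]  # right-to-left pass, realigned

# At each time step, concatenate the two hidden states: h_1 ~ h_n.
h = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
print(len(h), h[0].shape)   # n hidden states, each of size 2 * d_hid
```

The last state h[n-1] plays the role of h_n below: it is the only vector that has seen the whole sequence in the forward direction, combined with the backward view of the first step.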
In some embodiments, all tokens in the dialog history are converted into word-embedding vectors using a word-embedding model and then input to the dialog encoder, which outputs a sequence of encoding hidden states h_1~h_n, one per time step, where h_n contains all of the information of the dialog history. The word-embedding model may include, but is not limited to: the Word2vec model, the Term Frequency-Inverse Document Frequency (TF-IDF) model, the SSWE-C (skip-gram based combined sentiment word embedding) model, or the like.
In some embodiments, the encoding hidden state h_n generated by the dialog encoder at the last moment of step 110 is used as the initial intermediate semantic vector between the dialog encoder and the dialog decoder; that is, the encoding hidden state h_n serves as the initial decoding hidden state S_0 of the dialog decoder. At time t = 1, the dialog decoder decodes from the initial decoding hidden state S_0 to generate the first decoding hidden state S_1.
In some embodiments, because the decoding process of the dialog context is unidirectional, the dialog decoder may be constructed based on a unidirectional LSTM model. In some embodiments, the dialog decoder may also be constructed based on other sequence-based models, and is not limited by the description herein.
In some embodiments, at the current time (any time t after the first), the decoding hidden state S_{t-1} of the previous time, the decoder input x_t of the current time, and the knowledge fusion vector kf_{t-1} of the previous time can be taken as the input of the dialog decoder to generate the decoding hidden state S_t of the current time:

S_t = LSTM(S_{t-1}, [x_t, g_{t-1} * kf_{t-1}])  (1)

where kf_{t-1} is the knowledge fusion vector of the previous time (see step 150 for details of the knowledge fusion vector), and g_{t-1} is a knowledge gate that determines the proportion of the knowledge fusion vector in the input. The gate is obtained by the following formula:

g_{t-1} = sigmoid(W_g [S_{t-1}, kf_{t-1}] + b_g)  (2)

where W_g and b_g are learnable parameters: the concatenation of S_{t-1} and kf_{t-1} is linearly transformed by W_g and b_g, and the result is mapped into the range (0, 1) by the sigmoid function. The sigmoid function, also called the logistic function, is commonly used for the output of hidden-layer neurons.

x_t = W_x [e(y_{t-1}), c_{t-1}] + b_x  (3)

where W_x and b_x are learnable parameters, c_{t-1} is the context vector generated at the previous time (see step 160 for details of the context vector), and e(y_{t-1}) is the word-embedding vector of the dialog context word y_{t-1} generated at the previous time; x_t is obtained by linearly transforming the concatenation of e(y_{t-1}) and c_{t-1} with W_x and b_x.

As can be seen from equations (1) to (3), when the decoding hidden state S_t is generated at the current time, not only are the context vector c_{t-1} and the dialog context word y_{t-1} of the previous time used as decoder input (as in an existing context-attention-based decoder), but the knowledge fusion vector kf_{t-1} of the previous time also dynamically integrates external knowledge into the decoding hidden state S_t generated by the dialog decoder. As described in step 150, the knowledge fusion vector is generated dynamically with an attention mechanism, so that the dialog decoder can focus on different parts of the external knowledge at each decoding step.
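One decoding step with the knowledge gate can be sketched as follows. The gate is computed from the concatenation of the previous decoding hidden state and the previous knowledge fusion vector, and the gated fusion vector is concatenated into the decoder input; a simple tanh cell stands in for the LSTM cell, and all names, sizes, and random parameters are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d_s, d_x, d_k = 4, 3, 4      # sizes of S_t, x_t, and the knowledge fusion vector

# Hypothetical learnable parameters for the gate and the (stand-in) recurrent cell.
W_g = rng.normal(size=(d_k, d_s + d_k)) * 0.1
b_g = np.zeros(d_k)
W_s = rng.normal(size=(d_s, d_s + d_x + d_k)) * 0.1

def decoder_step(S_prev, x_t, k_fuse_prev):
    """One decoding step: gate the previous knowledge fusion vector into the input."""
    g = sigmoid(W_g @ np.concatenate([S_prev, k_fuse_prev]) + b_g)  # knowledge gate
    inp = np.concatenate([x_t, g * k_fuse_prev])                    # gated input
    # A simple tanh cell stands in for the LSTM cell here.
    S_t = np.tanh(W_s @ np.concatenate([S_prev, inp]))
    return S_t, g

S_prev = np.zeros(d_s)
x_t = rng.normal(size=d_x)
k_fuse_prev = rng.normal(size=d_k)
S_t, g = decoder_step(S_prev, x_t, k_fuse_prev)
print(S_t.shape, g.shape)
```

Because the gate values lie strictly in (0, 1), the decoder can scale how much of the previous step's knowledge enters the current state, rather than always injecting it in full.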
In some embodiments, at least one knowledge text related to the dialog history may be obtained according to the dialog history. Specifically, at least one knowledge text related to the dialog history is recalled by querying a knowledge base.
In some embodiments, the knowledge text may be in triple format, i.e., subject + predicate + object. For example: "Zhang San, height, 226" or "Li Qu, representative work, Drama A". In some embodiments, the knowledge text may also be a passage of text or in another format, and is not limited by the description herein. In some embodiments, the language of the knowledge text may be Chinese, English, or another language, and is not limited by the description of the present specification. In some embodiments, the knowledge base may be queried by a retrieval system according to keywords or words in the dialog history, the relevant knowledge texts recalled and ranked, and the top N knowledge texts selected; the value of N may be chosen by weighing the computation load of the model against the richness of the acquired knowledge. For example, N may be 30 or another value. In some embodiments, the content of the knowledge base may come from an open-source data set, such as Wizard-of-Wikipedia or DuConv.
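The recall-and-rank retrieval described above might be sketched as follows; the toy knowledge base, the keyword-overlap scoring, and the tokenizer are illustrative assumptions, not the patent's retrieval system:

```python
# A minimal sketch of recalling the top-N knowledge texts for a dialog history.
knowledge_base = [
    "Tom is a cat",
    "Jerry is a mouse",
    "Tom chases Jerry in the cartoon",
    "Gold is a metal",
]

def tokenize(text):
    """Lowercase and strip trailing punctuation; a stand-in for a real segmenter."""
    return {w.strip("?.,!").lower() for w in text.split()}

def recall_knowledge(dialog_history, top_n=2):
    """Score each knowledge text by word overlap with the history, keep the top N."""
    query = tokenize(dialog_history)
    scored = [(len(query & tokenize(k)), k) for k in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [k for score, k in scored[:top_n] if score > 0]

print(recall_knowledge("Why does Tom chase Jerry?"))
# -> ['Tom chases Jerry in the cartoon', 'Tom is a cat']
```

A production system would replace the overlap score with a proper ranking model, but the recall-then-rank shape is the same.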
In some embodiments, for any knowledge text, a knowledge encoder may be used to encode each word in the knowledge text, generating a plurality of word vectors. Specifically, a word segmentation model may be used to split the knowledge text into a token sequence; the token sequence is input into the knowledge encoder, which encodes each token and generates the word-vector sequence corresponding to the knowledge text. Finally, sequences Z_1~Z_m of word vectors corresponding to the m knowledge texts are obtained. In some embodiments, the Transformer's encoder may be used as the knowledge encoder. The Transformer is a classic model of natural language processing; its encoder does not adopt the sequential structure of an RNN model but processes each input token in parallel through self-attention layers and feed-forward layers, with residual connections between sub-layers, and finally encodes each input token into a word-vector representation carrying global information.
In some embodiments, a third attention model may be used to combine the obtained m word-vector sequences Z_1~Z_m into m knowledge vectors k_1~k_m. See step 140 for a detailed description of the third attention model. In some embodiments, other methods may also be used to obtain the at least one knowledge vector k_1~k_m corresponding to the at least one knowledge text; for example, the encoding may be performed by a knowledge-graph embedding method, and is not limited by the description of the present specification.
In some embodiments, there may be many knowledge texts obtained in step 130 (e.g., 30 knowledge texts). In order to use all the information contained in these texts for decoding, each knowledge text is first represented by one knowledge vector, and the knowledge fusion vector is then generated from the knowledge vectors of all knowledge texts according to the method described in step 150. Thus, in some embodiments, a knowledge vector may be generated from the plurality of generated word vectors using the third attention model.
In some embodiments, the input of the third attention model may be the j-th of the m word-vector sequences Z_1~Z_m obtained in step 130, Z_j, which comprises the word vectors z_j^1~z_j^l; the output may be the j-th of the m knowledge vectors k_1~k_m described in step 130, k_j. The third attention model is implemented as follows:

(I) Each of the word vectors z_j^i is weighted, and the result is processed with an activation function to generate the word attention vector w_j:

w_j^i = V_z tanh(W_z z_j^i)  (4)

where V_z and W_z are learnable parameters. In some embodiments, through training, the word attention value w_j^i corresponding to a word vector more relevant to the dialog history can be higher.

(II) Based on the word attention vector w_j, a scoring function is used to generate the word attention weights a_j^1~a_j^l corresponding to the word vectors:

a_j^i = softmax(w_j^i)  (5)

where the softmax function normalizes each element of w_j into the range (0, 1). In some embodiments, the word attention weights may also be obtained by computing cosine similarity. In some embodiments, other ways of calculating the word attention weights may be used, and are not limited by the description herein.

(III) The products of the word vectors z_j^i and the word attention weights a_j^i are computed and summed to generate the knowledge vector k_j:

k_j = sum_i a_j^i z_j^i  (6)

By using an attention mechanism, the word vectors z_j^1~z_j^l are combined by importance into the knowledge vector k_j. The knowledge vector k_j has the same dimension as a word vector and contains the information of all the word vectors, which is convenient for subsequent calculation.
In some embodiments, the knowledge fusion vector kf_t of the current time may be generated from the at least one knowledge vector k_1~k_m and the decoding hidden state S_t of the current time using the first attention model.

In some embodiments, as shown in FIG. 2, the input of the first attention model may be the at least one knowledge vector k_1~k_m and the decoding hidden state S_t of the current time, and the output may be the knowledge fusion vector kf_t of the current time. See step 130 for details of the knowledge vectors k_1~k_m. Regarding the decoding hidden state S_t of the current time, referring to step 120, at each decoding step the dialog decoder generates a decoding hidden state according to the method described in step 120, so that t-1 decoding hidden states S_1~S_{t-1} have been generated before the current time t. The first attention model is implemented as follows:

(I) Each of the knowledge vectors k_1~k_m is weighted together with the decoding hidden state S_t of the current time, and the result is processed with an activation function to generate the knowledge attention vector e_t:

e_t^i = V_k tanh(W_k [k_i, S_t])  (7)

where V_k and W_k are learnable parameters. The obtained e_t, a vector of m real numbers, represents the degree of correlation between the knowledge vectors k_1~k_m and the decoding hidden state S_t of the current time. In some embodiments, the parameters are trained so that a knowledge vector k_i highly correlated with the current decoding hidden state S_t yields a higher e_t^i.

(II) Based on the knowledge attention vector e_t, a scoring function is used to generate the knowledge attention weights d_t^1~d_t^m corresponding to the knowledge vectors k_1~k_m:

d_t^i = softmax(e_t^i)  (8)

where the softmax function normalizes each element of e_t into the range (0, 1). In some embodiments, the knowledge attention weights may also be obtained by computing cosine similarity. In some embodiments, other ways of computing the knowledge attention weights may be used, and are not limited by the description herein.

(III) The products of the knowledge vectors k_1~k_m and the knowledge attention weights d_t^1~d_t^m are computed and summed to generate the knowledge fusion vector kf_t of the current time:

kf_t = sum_i d_t^i k_i  (9)

In the embodiments described in this specification, by introducing the knowledge fusion vector kf_t in the decoding process, the dialog decoder can access the knowledge vectors k_1~k_m at every decoding step, which avoids losing part of the knowledge information during decoding. Moreover, the knowledge attention weights d_t computed from the current decoding hidden state S_t give the knowledge vectors k_1~k_m different attention at different times. For example: at the current time, if the knowledge information contained in the knowledge vector k_2 is highly correlated with the current decoding hidden state S_t, the corresponding knowledge attention weight d_2 will also be higher; at the next time, if the knowledge information contained in the knowledge vector k_3 is highly correlated with the decoding hidden state S_{t+1} of the next time, the corresponding knowledge attention weight d_3 will also be higher. This lets the dialog decoder focus on different parts of the external knowledge at different decoding steps; for example, at the current time in the example above the decoder may attend to k_2 among k_1~k_m, and at the next time to k_3.
In step 120, the encoding hidden state h_n, which stores the semantic information of the dialog history, is used as the initial intermediate semantic vector C between the dialog encoder and the dialog decoder. Since the length of this vector is fixed, when the dialog history is long the intermediate semantic vector C cannot hold all of the semantic information, which limits the comprehension of the dialog decoder. Thus, in some embodiments, a dynamic attention mechanism is used to generate the context vector c_t of the current time as the intermediate semantic vector of the current time between the dialog encoder and the dialog decoder. In some embodiments, the context vector c_t of the current time may be generated from the sequence of encoding hidden states h_1~h_n and the decoding hidden state S_t of the current time using a second attention model.

In some embodiments, the input of the second attention model may be the decoding hidden state S_t of the current time and the sequence of encoding hidden states h_1~h_n, and the output may be the context vector c_t of the current time. The second attention model is implemented as follows:

(I) Each encoding hidden state in the sequence h_1~h_n is weighted together with the decoding hidden state S_t of the current time, and the result is processed with an activation function to generate the context attention vector u_t:

u_t^i = V_h tanh(W_h [h_i, S_t])  (10)

where V_h and W_h are learnable parameters. The obtained u_t, a vector of n real numbers, represents the correlation between the sequence of encoding hidden states h_1~h_n and the decoding hidden state S_t of the current time; an encoding hidden state h_i with high correlation has a higher corresponding u_t^i.

(II) Based on the context attention vector u_t, a scoring function is used to generate the context attention weights b_t^1~b_t^n corresponding to the sequence of encoding hidden states h_1~h_n:

b_t^i = softmax(u_t^i)  (11)

where the softmax function normalizes each element of u_t into the range (0, 1). In some embodiments, the context attention weights may also be obtained by computing cosine similarity. In some embodiments, other ways of computing the context attention weights may be used, and are not limited by the description herein.

(III) The products of the encoding hidden states h_1~h_n and the context attention weights b_t^1~b_t^n are computed and summed to generate the context vector c_t of the current time:

c_t = sum_i b_t^i h_i  (12)

By using c_t, the dialog decoder can access all the information in the sequence of encoding hidden states h_1~h_n at every decoding step. At the same time, the context attention weights b_t computed from the current decoding hidden state S_t give the sequence h_1~h_n different attention at different times, so that the dialog decoder can focus on different parts of the dialog history at each decoding step. For example: the dialog encoder generates the sequence of encoding hidden states h_1~h_3 from the dialog history "Tom chases Jerry". At decoding time t = 1, the context attention weights give h_1 the highest attention, so the dialog decoder may focus in c_1 on the information of the word "Tom"; similarly, at decoding time t = 2 the dialog decoder may focus in c_2 on the information of the word "chases", and at t = 3 it may focus in c_3 on the information of the word "Jerry". If the sequence h_1~h_n were not given different attention at different times, the information available to the dialog decoder would be the same at every decoding step.
At step 170, the dialog context word at the current time is generated.
In some embodiments, the dialog context word yt at the current time may be generated based on the knowledge fusion vector of the current time, the context vector of the current time, and the decoding hidden state St at the current time. Specifically, the splicing matrix formed by the context vector of the current time, the decoding hidden state St at the current time, and the knowledge fusion vector of the current time is input into a word selection model, and the word selection model predicts the dialog context word yt at the current time. The calculation formula is as follows:

yt = softmax(V2 · (V1 · [St, ct, k̃t] + b1) + b2)    (13)

wherein V1, V2, b1 and b2 are learnable parameters, ct denotes the context vector of the current time, and k̃t denotes the knowledge fusion vector of the current time. In equation (13), the splicing matrix [St, ct, k̃t] composed of St, ct and k̃t undergoes two linear transformations with the above parameters: 1. [St, ct, k̃t] is dot-multiplied with V1, and the resulting vector is added to the offset vector b1. 2. The result of linear transformation 1 is dot-multiplied with V2, and the resulting vector is added to the offset vector b2, finally yielding a vector of M real numbers. M is the size of the vocabulary used by the dialog system, and the M real numbers respectively represent the similarity between the predicted dialog context word and the M words in the vocabulary. Then, this vector is normalized by a softmax function to obtain M probability scores in the range 0-1, and the word in the vocabulary corresponding to the highest probability is selected as the dialog context word yt at the current time.
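The two linear transformations and softmax described above can be sketched as follows; the dimensions, parameter values, and input vectors are random stand-ins, not the trained parameters of the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, M = 6, 10        # concat dimension and vocabulary size (illustrative values)

# Stand-ins for the learnable parameters V1, V2, b1, b2 of equation (13);
# real values would come from training, these are random.
V1, b1 = rng.normal(size=(dim, dim)), rng.normal(size=dim)
V2, b2 = rng.normal(size=(dim, M)), rng.normal(size=M)

def select_word(S_t, c_t, k_t):
    """Two linear transformations over the splice [S_t, c_t, k_t], then softmax."""
    x = np.concatenate([S_t, c_t, k_t])   # splice the three input vectors
    h = x @ V1 + b1                       # linear transformation 1
    logits = h @ V2 + b2                  # linear transformation 2 -> M scores
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                  # softmax over the vocabulary
    return int(np.argmax(probs)), probs   # highest-probability word index

word_index, probs = select_word(rng.normal(size=2), rng.normal(size=2),
                                rng.normal(size=2))
```

The returned index would then be looked up in the vocabulary to emit the word yt.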
In some embodiments, the dialog context words y1~yt constitute the dialog context, where y1 represents the dialog context word at t = 1. The embodiments described in this specification add a knowledge fusion vector to the existing context-attention-based Seq2Seq network structure, realizing dynamic interaction between external knowledge and the dialog decoder, so that the dialog decoder can dynamically introduce external knowledge during decoding and the dialog system can switch to new topics naturally and smoothly. The following takes the set of human-machine dialog and knowledge texts shown in fig. 4 as an example:
In the history dialog shown in fig. 4, A is the chat robot's side of the dialog and B is the user's side. The dialog so far is: "A: Let me recommend a well-reviewed movie to you.", "A: OK. Movie a, by director F, may be a truly inspiring movie.", "B: That director is really impressive!". At the start of the conversation, the chat robot recalls the set of knowledge texts shown in fig. 4 from the knowledge base: "F, ancestry, USA", "F, gender, male", "F, representative work, movie b", "F, profession, director", "F, date of birth, May 12, 1925", "movie a, director, F", "movie a, reviews, a well-reviewed romance movie", "movie a, awards, American golden award nomination, television category, best miniseries", for a total of 8 knowledge texts; please refer to step 130 for the related description of knowledge texts. Knowledge vectors k1~k8 corresponding to the above 8 knowledge texts can then be generated according to the methods described in steps 130 and 140 of this specification. The generated dialog shown in fig. 4 has integrated the knowledge texts "movie a, reviews, a well-reviewed romance movie" and "movie a, director, F". The last dialog in fig. 4 is "B: That director is really impressive!
", focusing the dialog on the director-F, the dialog decoder generates a decoded hidden state S at time t =1 from the intermediate semantic vector generated above the dialog1(see step 120 for details), then based on the knowledge vector k using a dynamic attention mechanism as described in step 1501~k8And decoding the state S1Generating knowledge fusion vectorsWherein knowledge fusion vectors are being generated due to the conversational contextIn the process of (1), the knowledge text "F, date of birth, 5/12/1925" and "F" is given a higher attention to the knowledge vector corresponding to the movie b ", wherein the degree of matching between" F, date of birth, 5/12/1925 "and the dialogue text is relatively higher, so that the attention given to the corresponding knowledge vector is the highest. The knowledge is then fused into vectors as described in step 170As one of the inputs of the word selection model, the word selection model may extract the knowledge information with the highest current attention in the process of generating the following words: "F, date of birth, 5 months and 12 days 1925". At the time t =2, knowledgeFused vectorAgain, this is one of the inputs to the dialog decoder, so the decoded hidden state S2 generated by the dialog decoder at time t =2 contains the most recently interesting knowledge information: "F, date of birth, 5 months and 12 days 1925". By analogy, vectors are fused by the knowledge described aboveThe repeated interaction process with the dialog decoder can coherently integrate the knowledge text "F, date of birth, 5/12/1925" and "F, representative, movie b" into the dialog context, resulting in a dialog context "he was born at 5/12/1925" and he also has a representation, movie b ".
In some embodiments, a training set consisting of the dialog context, the at least one knowledge text, and the corresponding dialog context words may be obtained, and the dialog system composed of the dialog encoder, the dialog decoder, the knowledge encoder, the first attention model, the second attention model, the third attention model, and the softmax function may be trained using a back propagation algorithm. Specifically, the dialog context words can be used as labels, and model training is performed in an end-to-end manner to obtain a trained dialog system.
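The end-to-end pattern (cross-entropy loss over the vocabulary, gradients propagated back through all modules) can be illustrated with a deliberately tiny stand-in model; the sizes, random data, and single weight matrix below are placeholders for the full encoder/decoder/attention stack:

```python
import numpy as np

rng = np.random.default_rng(1)
M, d, n = 5, 4, 8                        # vocabulary size, feature dim, batch (toy)
W = rng.normal(scale=0.1, size=(d, M))   # stands in for ALL learnable parameters
X = rng.normal(size=(n, d))              # toy features from the decoder side
y = rng.integers(0, M, size=n)           # target dialog context words as labels

def predict(X):
    """Softmax over vocabulary scores, as in the word selection model."""
    logits = X @ W
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

losses = []
for _ in range(100):                     # back-propagation training loop
    P = predict(X)
    losses.append(-np.log(P[np.arange(n), y]).mean())  # cross-entropy loss
    G = P.copy()
    G[np.arange(n), y] -= 1.0            # gradient of the loss w.r.t. logits
    W -= 0.1 * (X.T @ G / n)             # gradient-descent parameter update
```

In the real system the same loss would back-propagate through the attention models and both encoders rather than a single matrix.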
It should be noted that the above description of the process 100 is for illustration and description only, and does not limit the scope of the application of the present disclosure. Various modifications and alterations to process 100 will become apparent to those skilled in the art in light of the present description. However, such modifications and variations are intended to be within the scope of the present description. For example, step 130 and step 140 may be combined into one step, and a word vector corresponding to the knowledge text is generated in the same step, and a knowledge vector is generated based on the word vector.
FIG. 3 is a diagram of an application scenario for generating a dialog context, shown in some embodiments according to the present description.
As shown in fig. 3, during the chat between the chat robot and the user, the following dialog occurs: "Let me recommend a TV series to you.", "OK, please recommend one.". Using the method described in this specification, the chat robot generates the dialog context and carries on a smooth conversation with the user: "There is a TV series called a, starring the actor Zhang San.", "I really like Zhang San.", .... For details of the method for generating the dialog context, please refer to fig. 1, which is not described herein again.
The method described in this specification can also be applied to other application scenarios, and is not limited by the description of this specification.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present description may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any network format, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as a software as a service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as implying that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, the claimed subject matter may be characterized as having less than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
For each patent, patent application publication, and other material cited in this specification, such as articles, books, specifications, publications, and documents, the entire contents thereof are hereby incorporated by reference into this specification, except for any application history document that is inconsistent with or conflicts with the contents of this specification, and except for any document that would limit the broadest scope of the claims of this specification (whether presently or later appended to this specification). It is to be understood that if the descriptions, definitions, and/or uses of terms in the materials accompanying this specification are inconsistent with or contrary to the descriptions, definitions, and/or uses of terms in this specification, the descriptions, definitions, and/or uses of terms in this specification shall control.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.
Claims (24)
1. A method of generating a dialog context, the method comprising:
acquiring a dialog context, acquiring at least one knowledge text related to the dialog context according to the dialog context, and generating at least one knowledge vector k1~km corresponding to the at least one knowledge text; the knowledge text is stored in a knowledge base;
according to at least one knowledge vector k1~kmAnd decoding hidden state S at current timetGenerating a knowledge fusion vector for a current time using a first attention model;
generating the dialog context word yt at the current time based on the knowledge fusion vector of the current time, the context vector of the current time, and the decoding hidden state St of the current time;
the dialog context words y1~yt make up the dialog context, wherein y1 represents the dialog context word at t = 1.
2. The method of claim 1, wherein the context vector of the current time is generated in the following manner:
generating a sequence of encoded hidden states h1~hn from the dialog context using a dialog encoder;
obtaining the decoding hidden state St of the current time;
3. The method of claim 2, wherein the generating at least one knowledge vector k1~km corresponding to the at least one knowledge text comprises:
for any of the knowledge texts, encoding each word in the knowledge text using a knowledge encoder, and generating the knowledge vector using a third attention model based on the generated plurality of word vectors.
4. The method of claim 3, wherein the generating the knowledge vector using a third attention model from the generated plurality of word vectors comprises:
for each of the plurality of word vectors, performing a weighted operation on the word vector, and processing an operation result by using an activation function to generate a word attention vector;
generating a plurality of word attention weights corresponding to a plurality of the word vectors using a scoring function based on the word attention vectors;
generating the knowledge vector based on a plurality of the word vectors and a plurality of the word attention weights.
5. The method of claim 4, wherein the generating a knowledge fusion vector of the current time using a first attention model based on at least one of the knowledge vectors k1~km and the decoding hidden state St at the current time comprises:
for each knowledge vector of the at least one knowledge vector k1~km, performing a weighted summation operation on the knowledge vector and the decoding hidden state St at the current time, and processing the operation result by using an activation function to generate a knowledge attention vector;
generating at least one knowledge attention weight corresponding to at least one of the knowledge vectors using a scoring function based on the knowledge attention vector;
6. The method of claim 5, wherein the obtaining the decoding hidden state St of the current time comprises:
at the current time, generating the decoding hidden state St of the current time by taking the decoding hidden state St-1 of the previous time, the decoding input of the current time, and the knowledge fusion vector of the previous time as inputs of a dialog decoder; wherein a knowledge gate is used to determine the ratio of the decoding input of the current time to the knowledge fusion vector of the previous time;
7. The method of claim 6, wherein the generating a context vector of the current time using a second attention model according to the sequence of encoded hidden states h1~hn and the decoding hidden state St of the current time comprises:
for each encoded hidden state in the sequence of encoded hidden states h1~hn, performing a weighted summation operation on the encoded hidden state and the decoding hidden state St at the current time, and processing the operation result by using an activation function to generate a context attention vector;
generating a plurality of contextual attention weights corresponding to the encoded hidden states h1~hn using a scoring function based on the context attention vector;
8. The method of claim 7, wherein the generating the dialog context word yt at the current time based on the knowledge fusion vector of the current time, the context vector of the current time, and the decoding hidden state St of the current time comprises:
9. The method of claim 8, wherein said obtaining at least one knowledge text from the dialog context comprises:
querying the knowledge base, and recalling the at least one knowledge text related to the dialog context.
10. The method of claim 9, wherein the knowledge encoder is a Transformer encoder.
11. The method of claim 10, wherein the method further comprises:
obtaining a training set consisting of the dialog context, the at least one knowledge text, and the dialog context words, and training a dialog system composed of the dialog encoder, the dialog decoder, the knowledge encoder, the first attention model, the second attention model, the third attention model, and the softmax function using a back propagation algorithm.
12. A system for generating a dialog context, the system comprising:
a knowledge vector generation module, configured to obtain a dialog context, obtain at least one knowledge text related to the dialog context according to the dialog context, and generate at least one knowledge vector k1~km corresponding to the at least one knowledge text; the knowledge text is stored in a knowledge base;
a knowledge fusion vector generation module, configured to generate a knowledge fusion vector of the current time using the first attention model based on at least one knowledge vector k1~km and the decoding hidden state St of the current time;
a dialog context word generation module, configured to generate the dialog context word yt of the current time based on the knowledge fusion vector of the current time, the context vector of the current time, and the decoding hidden state St of the current time;
a dialog context generation module, configured to make up the dialog context from the dialog context words y1~yt, wherein y1 represents the dialog context word at t = 1.
13. The system of claim 12, further comprising:
an encoding module, configured to generate a sequence of encoded hidden states h1~hn from the dialog context using a dialog encoder;
a decoding module, configured to obtain the decoding hidden state St of the current time;
14. The system of claim 13, wherein the generating at least one knowledge vector k1~km corresponding to the at least one knowledge text comprises:
for any of the knowledge texts, encoding each word in the knowledge text using a knowledge encoder, and generating the knowledge vector using a third attention model based on the generated plurality of word vectors.
15. The system of claim 14, wherein the generating the knowledge vector using a third attention model from the generated plurality of word vectors comprises:
for each of the plurality of word vectors, performing a weighted operation on the word vector, and processing an operation result by using an activation function to generate a word attention vector;
generating a plurality of word attention weights corresponding to a plurality of the word vectors using a scoring function based on the word attention vectors;
generating the knowledge vector based on a plurality of the word vectors and a plurality of the word attention weights.
16. The system of claim 15, wherein the generating a knowledge fusion vector of the current time using a first attention model based on at least one of the knowledge vectors k1~km and the decoding hidden state St at the current time comprises:
for each knowledge vector of the at least one knowledge vector k1~km, performing a weighted summation operation on the knowledge vector and the decoding hidden state St at the current time, and processing the operation result by using an activation function to generate a knowledge attention vector;
generating at least one knowledge attention weight corresponding to at least one of the knowledge vectors using a scoring function based on the knowledge attention vector;
17. The system of claim 16, wherein the obtaining the decoding hidden state St of the current time comprises:
at the current time, generating the decoding hidden state St of the current time by taking the decoding hidden state St-1 of the previous time, the decoding input of the current time, and the knowledge fusion vector of the previous time as inputs of a dialog decoder; wherein a knowledge gate is used to determine the ratio of the decoding input of the current time to the knowledge fusion vector of the previous time;
18. The system of claim 17, wherein the generating a context vector of the current time using a second attention model according to the sequence of encoded hidden states h1~hn and the decoding hidden state St of the current time comprises:
for each encoded hidden state in the sequence of encoded hidden states h1~hn, performing a weighted summation operation on the encoded hidden state and the decoding hidden state St at the current time, and processing the operation result by using an activation function to generate a context attention vector;
generating a plurality of contextual attention weights corresponding to the encoded hidden states h1~hn using a scoring function based on the context attention vector;
19. The system of claim 18, wherein the generating the dialog context word yt at the current time based on the knowledge fusion vector of the current time, the context vector of the current time, and the decoding hidden state St of the current time comprises:
20. The system of claim 19, wherein said obtaining at least one knowledge text from the dialog context comprises:
querying the knowledge base, and recalling the at least one knowledge text related to the dialog context.
21. The system of claim 20, wherein the knowledge encoder is a Transformer encoder.
22. The system of claim 21, wherein the system further comprises:
a training module, configured to obtain a training set of the dialog context, the at least one knowledge text, and the dialog context words, and to train a dialog system composed of the dialog encoder, the dialog decoder, the knowledge encoder, the first attention model, the second attention model, the third attention model, and the softmax function using a back propagation algorithm.
23. An apparatus to generate a dialog context, wherein the apparatus comprises at least one processor and at least one memory;
the at least one memory is for storing computer instructions;
the at least one processor is configured to execute at least some of the computer instructions to implement the method of any of claims 1-11.
24. A computer-readable storage medium storing computer instructions which, when read by a computer, cause the computer to perform the method of any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010470216.XA CN111382257A (en) | 2020-05-28 | 2020-05-28 | Method and system for generating dialog context |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010470216.XA CN111382257A (en) | 2020-05-28 | 2020-05-28 | Method and system for generating dialog context |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111382257A true CN111382257A (en) | 2020-07-07 |
Family
ID=71217697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010470216.XA Pending CN111382257A (en) | 2020-05-28 | 2020-05-28 | Method and system for generating dialog context |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111382257A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112084314A (en) * | 2020-08-20 | 2020-12-15 | 电子科技大学 | Knowledge-introducing generating type session system |
CN112214591A (en) * | 2020-10-29 | 2021-01-12 | 腾讯科技(深圳)有限公司 | Conversation prediction method and device |
CN112328756A (en) * | 2020-10-13 | 2021-02-05 | 山东师范大学 | Context-based dialog generation method and system |
CN113032545A (en) * | 2021-05-29 | 2021-06-25 | 成都晓多科技有限公司 | Method and system for conversation understanding and answer configuration based on unsupervised conversation pre-training |
CN113919293A (en) * | 2021-09-29 | 2022-01-11 | 北京搜狗科技发展有限公司 | Formula recognition model training method and device |
CN114942986A (en) * | 2022-06-21 | 2022-08-26 | 平安科技(深圳)有限公司 | Text generation method and device, computer equipment and computer readable storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110399460A (en) * | 2019-07-19 | 2019-11-01 | 腾讯科技(深圳)有限公司 | Dialog process method, apparatus, equipment and storage medium |
CN110851575A (en) * | 2019-09-23 | 2020-02-28 | 上海深芯智能科技有限公司 | Dialogue generating system and dialogue realizing method |
CN110858215A (en) * | 2018-08-23 | 2020-03-03 | 广东工业大学 | End-to-end target guiding type dialogue method based on deep learning |
CN111125333A (en) * | 2019-06-06 | 2020-05-08 | 北京理工大学 | Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism |
CN111159368A (en) * | 2019-12-12 | 2020-05-15 | 华南理工大学 | Reply generation method for personalized dialogue |
CN111159467A (en) * | 2019-12-31 | 2020-05-15 | 青岛海信智慧家居***股份有限公司 | Method and equipment for processing information interaction |
CN111191015A (en) * | 2019-12-27 | 2020-05-22 | 上海大学 | Neural network movie knowledge intelligent dialogue method |
- 2020-05-28 CN CN202010470216.XA patent/CN111382257A/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110858215A (en) * | 2018-08-23 | 2020-03-03 | 广东工业大学 | End-to-end target guiding type dialogue method based on deep learning |
CN111125333A (en) * | 2019-06-06 | 2020-05-08 | 北京理工大学 | Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism |
CN110399460A (en) * | 2019-07-19 | 2019-11-01 | 腾讯科技(深圳)有限公司 | Dialog process method, apparatus, equipment and storage medium |
CN110851575A (en) * | 2019-09-23 | 2020-02-28 | 上海深芯智能科技有限公司 | Dialogue generating system and dialogue realizing method |
CN111159368A (en) * | 2019-12-12 | 2020-05-15 | 华南理工大学 | Reply generation method for personalized dialogue |
CN111191015A (en) * | 2019-12-27 | 2020-05-22 | 上海大学 | Neural network movie knowledge intelligent dialogue method |
CN111159467A (en) * | 2019-12-31 | 2020-05-15 | 青岛海信智慧家居***股份有限公司 | Method and equipment for processing information interaction |
Non-Patent Citations (6)
Title |
---|
HAO ZHOU等: "Commonsense Knowledge Aware Conversation Generation with Graph Attention", 《INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE 2018》 * |
JANE: "对话清华大学周昊,详解IJCAI杰出论文及其背后的故事", 《HTTPS://JUEJIN.IM/POST/5B6A9E085188251A8D37136D》 * |
PETAR VELICKOVIC等: "GRAPH ATTENTION NETWORKS", 《PUBLISHED AS A CONFERENCE PAPER AT ICLR 2018》 * |
学习ML的皮皮虾: "基于常识知识图谱的对话模型【阅读笔记】", 《HTTPS://ZHUANLAN.ZHIHU.COM/P/50502922》 * |
李少博等: "基于知识拷贝机制的生成式对话模型", 《第十八届全国计算语言学学术会议THE 18TH CHINESE NATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS (CCL 2019)》 * |
陈晨等: "基于深度学习的开放领域对话***研究综述", 《计算机学报》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112084314A (en) * | 2020-08-20 | 2020-12-15 | 电子科技大学 | Knowledge-introducing generating type session system |
CN112084314B (en) * | 2020-08-20 | 2023-02-21 | 电子科技大学 | Knowledge-introducing generating type session system |
CN112328756A (en) * | 2020-10-13 | 2021-02-05 | 山东师范大学 | Context-based dialog generation method and system |
CN112214591A (en) * | 2020-10-29 | 2021-01-12 | 腾讯科技(深圳)有限公司 | Conversation prediction method and device |
CN112214591B (en) * | 2020-10-29 | 2023-11-07 | 腾讯科技(深圳)有限公司 | Dialog prediction method and device |
CN113032545A (en) * | 2021-05-29 | 2021-06-25 | 成都晓多科技有限公司 | Method and system for conversation understanding and answer configuration based on unsupervised conversation pre-training |
CN113919293A (en) * | 2021-09-29 | 2022-01-11 | 北京搜狗科技发展有限公司 | Formula recognition model training method and device |
CN114942986A (en) * | 2022-06-21 | 2022-08-26 | 平安科技(深圳)有限公司 | Text generation method and device, computer equipment and computer readable storage medium |
CN114942986B (en) * | 2022-06-21 | 2024-03-19 | 平安科技(深圳)有限公司 | Text generation method, text generation device, computer equipment and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11423233B2 (en) | On-device projection neural networks for natural language understanding | |
CN110782870B (en) | Speech synthesis method, device, electronic equipment and storage medium | |
Kamath et al. | Deep learning for NLP and speech recognition | |
CN110326002B (en) | Sequence processing using online attention | |
CN111382257A (en) | Method and system for generating dialog context | |
CN111312245B (en) | Voice response method, device and storage medium | |
CN109508377A (en) | Text feature extraction method and device based on a fusion model, chat robot, and storage medium | |
CN112214591B (en) | Dialog prediction method and device | |
US11132994B1 (en) | Multi-domain dialog state tracking | |
JP7229345B2 (en) | Sentence processing method, sentence decoding method, device, program and device | |
US11961515B2 (en) | Contrastive Siamese network for semi-supervised speech recognition | |
WO2023231513A1 (en) | Conversation content generation method and apparatus, and storage medium and terminal | |
Mathur et al. | A scaled‐down neural conversational model for chatbots | |
Pieraccini | AI assistants | |
Hsueh et al. | A Task-oriented Chatbot Based on LSTM and Reinforcement Learning | |
CN116611459B (en) | Translation model training method and device, electronic equipment and storage medium | |
CN114373443A (en) | Speech synthesis method and apparatus, computing device, storage medium, and program product | |
CN112150103B (en) | Schedule setting method, schedule setting device and storage medium | |
CN117980915A (en) | Contrast learning and masking modeling for end-to-end self-supervised pre-training | |
CN115204181A (en) | Text detection method and device, electronic equipment and computer readable storage medium | |
KR20230146398A (en) | Sequence text summary processing device using bart model and control method thereof | |
Li et al. | Audio-LLM: Activating the Capabilities of Large Language Models to Comprehend Audio Data | |
CN117521674B (en) | Method, device, computer equipment and storage medium for generating countermeasure information | |
CN117727288B (en) | Speech synthesis method, device, equipment and storage medium | |
CN115577084B (en) | Prediction method and prediction device for dialogue strategy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200707 |