CN109308316B - Adaptive dialog generation system based on topic clustering - Google Patents
- Publication number
- CN109308316B CN109308316B CN201810823424.6A CN201810823424A CN109308316B CN 109308316 B CN109308316 B CN 109308316B CN 201810823424 A CN201810823424 A CN 201810823424A CN 109308316 B CN109308316 B CN 109308316B
- Authority
- CN
- China
- Prior art keywords
- module
- clustering
- dialogue data
- seq2seq
- model
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Abstract
The invention discloses an adaptive dialogue generation system based on topic clustering, comprising a dialogue data module, a vectorization module, a clustering module, and a Seq2Seq module. The dialogue data module constructs a dialogue data set before training; the vectorization module vectorizes the dialogue data set before clustering, so that the vectorized data serve as the input and basis of the clustering model; the clustering module clusters the vectorized dialogue data set into a plurality of clusters; and the Seq2Seq module constructs Seq2Seq models and generates corresponding replies for the dialogue data in the clusters obtained by the clustering module. The system can cluster dialogue data by topic and train the dialogue data of each category with a dedicated Seq2Seq model. A classical single Seq2Seq model tends to generate meaningless replies; the model provided by the invention enables the dialogue system to generate replies that better match the topic and carry more meaning. Such replies make users more willing to communicate with the dialogue system and improve the user experience.
Description
Technical Field
The invention relates to the field of dialogue generation, and in particular to an adaptive dialogue generation system based on topic clustering.
Background
Currently, with the development of artificial intelligence, dialogue systems are receiving more and more attention. In the Turing test, the dialogue system serves as an index for judging whether a computer is intelligent. Products with dialogue systems have been introduced into people's lives, such as Apple's Siri and Microsoft's Cortana. The application and popularity of dialogue systems allow people to interact with computers through natural language, making human communication with computers more natural.
With the development of deep learning, many key technologies, such as image recognition, speech recognition, and machine translation, have made great progress. In solving speech recognition and machine translation problems, the Sequence-to-Sequence (Seq2Seq) model is commonly used. Unlike other models, the Seq2Seq model can accept a sequence of indefinite length as input and output a sequence of indefinite length. This feature is important because in many problems, such as machine translation and speech recognition, the lengths of the input and output cannot be known in advance. In recent years, researchers have also applied the Seq2Seq model to dialogue generation tasks, since dialogue generation can likewise be viewed as a sequence-to-sequence task. On many data sets, the Seq2Seq model achieves better results than dialogue generation with the traditional N-Gram model.
However, applying the Seq2Seq model directly to conversation causes the model to tend to generate meaningless replies such as "good" or "no". Such replies occur in large numbers in dialogue data sets, so the model becomes prone to generating them during learning. While such a reply is relatively safe, it leaves the dialogue between the computer and the user very brief, and it is difficult for the user to continue the conversation based on these responses. Many researchers have proposed improved models to alleviate this problem. However, existing Seq2Seq-based dialogue generation systems typically use a single Seq2Seq model to train on all dialogue data. Different conversations contain different topics with different characteristics, which are often difficult for a single Seq2Seq model to capture well.
Disclosure of Invention
The invention aims to provide an adaptive dialog generating system based on topic clustering, which can better capture the characteristics among different topics and generate answers related to the topics.
The purpose of the invention can be realized by the following technical scheme:
a self-adaptive dialog generation system based on topic clustering comprises a dialog data module, a vectorization module, a clustering module and a Seq2Seq module;
the dialogue data module is used for constructing a dialogue data set before training;
the vectorization module is used for vectorizing the dialogue data set before clustering and taking the vectorized dialogue data set as the input of a clustering model to become a clustering basis;
the clustering module is used for clustering the vectorized dialogue data set into clusters;
and the Seq2Seq module is used for constructing a Seq2Seq model and generating corresponding replies to the dialogue data sets in the clusters obtained by the clustering module.
In the Seq2Seq module, the dialogue data is divided into a plurality of clusters according to the clustering module, a Seq2Seq model is constructed for each cluster to train the input dialogue data, and a dialogue reply is generated.
Specifically, in the vectorization module, a bag-of-words model is constructed using tf-idf as the weight calculation method, and the dialogue data set output by the dialogue data module is vectorized.
Before clustering with the clustering module, the dialogue data set needs to be vectorized; the bag-of-words model is a common way to vectorize a dialogue data set in natural language processing. When converting the dialogue data set with the bag-of-words model, each word is uniquely fixed at a certain position in the vector. For a given dialogue item, if a word appears, the corresponding position is set to a non-zero number; if it does not appear, the position is set to 0. How to choose the non-zero number then becomes very important. A simple approach is to set these numbers to 1, but then the relative importance of words is not distinguished, so tf-idf is used as the weight calculation in the clustering module.
As a weight calculation method, tf-idf evaluates how important a word is to a document in a corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to its frequency across the corpus. Tf is the term frequency and idf is the inverse document frequency. Specifically for idf: if fewer documents contain the term t, i.e. n is smaller, idf is larger, indicating that the term t has good category-distinguishing ability; a common form is idf(t) = log(N / n), where N is the total number of documents and n the number of documents containing t.
Further, in the present system, the conversion from the dialogue data set to tf-idf vectors is realized by a package encapsulated in scikit-learn.
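The tf-idf weighting described above can be sketched in pure Python. This is a simplified illustration rather than the scikit-learn package the system actually uses; the idf form log(N / n) is one common variant (scikit-learn's default adds smoothing), and the toy documents are invented for the example.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute tf-idf vectors for a list of tokenized documents.

    tf  = count of the term in the document / document length
    idf = log(N / n), where n is the number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vocab = sorted(df)
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors

docs = [["good", "morning", "everyone"],
        ["good", "night"],
        ["see", "you", "tomorrow", "morning"]]
vocab, vecs = tfidf(docs)
```

Here "good" appears in two of three documents, so its idf (and hence its weight) is lower than that of "night", which appears in only one document — exactly the category-distinguishing behavior described above.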
After the dialogue data set is vectorized, clustering can be carried out on it. In this system, the clustering algorithm is k-means. It is a hard clustering algorithm and a typical prototype-based objective-function clustering method: a distance from each data point to a prototype serves as the objective function to be optimized, and the update rule of the iterative procedure is obtained by solving for the function's extremum. The algorithm takes Euclidean distance as the similarity measure and solves for the optimal classification corresponding to an initial cluster-center vector V, so that the evaluation index J is minimized; the sum-of-squared-errors criterion is used as the clustering criterion function. The selection of the k initial cluster centers has a large influence on the clustering result, because in its first step the algorithm randomly selects k objects as the initial cluster centers, each initially representing one cluster. In each iteration, the algorithm reassigns every remaining object in the data set to the nearest cluster according to its distance from each cluster center. After all data objects have been examined, one iteration is complete and new cluster centers are computed. If the value of J does not change between iterations, the algorithm has converged.
Further, in the system, the specific working steps of the k-means algorithm in the clustering module are as follows:
1. Randomly select k items from the N dialogue items as centroids;
2. For each of the remaining dialogue items, measure the distance to each centroid and assign it to the nearest one;
3. Recalculate the centroid of each resulting class;
4. Repeat steps 2 and 3 until the new centroids equal the original ones or the change is smaller than a specified threshold.
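The four steps above can be sketched in pure Python. This is a minimal illustration on hand-picked 2-D points; the system itself runs k-means (via scikit-learn) on the tf-idf vectors of the dialogue data.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means on 2-D points, following the four steps above."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # step 1: random centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                           # step 2: assign to nearest
            d = [math.dist(p, c) for c in centroids]
            clusters[d.index(min(d))].append(p)
        new_centroids = [                          # step 3: recompute centroids
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:             # step 4: stop at convergence
            break
        centroids = new_centroids
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centroids, clusters = kmeans(pts, 2)
```

On these two well-separated groups, the iteration converges to one cluster per group regardless of which points are drawn as initial centroids.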
Further, the Seq2Seq model includes two sub-modules: an Encoder and a Decoder. The Encoder receives an input of indefinite length and converts it into a vector of fixed length. The Decoder then generates an output sequence from the vector produced by the Encoder.
The goal of the Seq2Seq model is to model a conditional probability p(y_1, y_2, ..., y_T | x_1, x_2, ..., x_{T'}), where x_1, x_2, ..., x_{T'} is the input sequence and y_1, y_2, ..., y_T is the output sequence. The length T' of the input sequence and the length T of the output sequence may differ. The probability can be expressed by the following equation:

p(y_1, ..., y_T | x_1, ..., x_{T'}) = ∏_{t=1}^{T} p(y_t | v, y_1, ..., y_{t-1})

where v is the fixed-length vector generated by the Encoder sub-module.
Further, to implement the Seq2Seq model, a special symbol is appended after each sequence so that the model can recognize the stop flag of a sequence.
Further, the Encoder sub-module and the Decoder sub-module are each constituted by an RNN (Recurrent Neural Network). RNNs are a type of artificial neural network; unlike conventional neural networks, data are fed in at different time steps, and the internal state of an RNN enables it to retain information from previous inputs.
Specifically, the output of the RNN is calculated as:

o_t = f(s_t)

where o_t and s_t are respectively the output and the hidden state at time t in the RNN. The function f may differ for different tasks; in the present system, what is desired is the probability of each word at the current time, so the softmax function is used. The hidden state of the RNN is updated by the following equation:

s_t = g(s_{t-1}, x_t)

where g(·) is a non-linear activation function, which may be a sigmoid function or a more complex one; x_t is the input at time step t, and s_{t-1} is the hidden state of the RNN at time step t-1.
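The two update equations above can be sketched for a toy RNN step in pure Python. The dimensions, the hand-picked weight matrices, and the choice of tanh for g are illustrative assumptions; the system itself uses GRU-based RNNs built in PyTorch.

```python
import math

def softmax(v):
    """Turn scores into a probability distribution (the function f above)."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def rnn_step(s_prev, x_t, U, W, V):
    """One step: s_t = tanh(U s_{t-1} + W x_t); o_t = softmax(V s_t)."""
    def matvec(M, v):
        return [sum(m * u for m, u in zip(row, v)) for row in M]
    s_t = [math.tanh(a + b) for a, b in zip(matvec(U, s_prev), matvec(W, x_t))]
    o_t = softmax(matvec(V, s_t))
    return s_t, o_t

# Illustrative weights: hidden size 2, one-hot inputs over a 3-word lexicon.
U = [[0.1, 0.0], [0.0, 0.1]]
W = [[0.2, 0.0, 0.0], [0.0, 0.2, 0.0]]
V = [[0.3, 0.0], [0.0, 0.3], [0.1, 0.1]]   # one row per word in the lexicon
s1, o1 = rnn_step([0.0, 0.0], [1.0, 0.0, 0.0], U, W, V)
```

The output o1 is a probability distribution over the 3-word lexicon, matching the statement that the output at each time is the probability of each word.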
The internal memory mechanism of RNNs allows them to capture information from previous inputs, which is important for natural language processing tasks, since in such tasks the input and output should not be treated as independent. However, when the input sequence is too long, standard RNNs suffer from the long-term dependency problem. Therefore, the GRU (gated recurrent unit) is used as the building block of the RNNs.
The GRU is a simplified variant of the LSTM (long short-term memory). Unlike the neuron structure in classical RNNs, the GRU introduces a reset gate and an update gate. The reset gate is calculated as:

r_t = σ(W_r x_t + U_r h_{t-1})

where σ is the sigmoid function, x_t is the input, and h_{t-1} is the hidden state of the GRU at time t-1. For the GRU, h_t plays the role of the s_t described for RNNs above; two different symbols are used here only for distinction. W_r and U_r are weight matrices to be learned.

The update gate z_t is calculated as:

z_t = σ(W_z x_t + U_z h_{t-1})

where W_z and U_z are weight matrices to be learned.

The hidden state h_t is then:

h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h̃_t

where the candidate state is:

h̃_t = tanh(W x_t + U (r_t ⊙ h_{t-1}))

Here ⊙ is element-wise multiplication, W and U are weight matrices to be learned, and tanh is the hyperbolic tangent function.

From the above equations, if the reset gate is 0, the candidate state is affected only by the current input. Through this mechanism, the hidden state can forget unimportant information, so the model can perform better. And from h_t it can be seen that the update gate z_t determines how much influence the previous hidden state has on the current one.
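As a sanity check on the equations above, a scalar GRU cell can be written in pure Python. Because z_t lies in (0, 1), the new hidden state is always a convex combination of the previous state and the candidate state; the scalar weights here are illustrative values, not learned parameters.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x_t, h_prev, Wr, Ur, Wz, Uz, W, U):
    """Scalar GRU update following the reset-gate / update-gate equations."""
    r = sigmoid(Wr * x_t + Ur * h_prev)              # reset gate r_t
    z = sigmoid(Wz * x_t + Uz * h_prev)              # update gate z_t
    h_tilde = math.tanh(W * x_t + U * (r * h_prev))  # candidate state
    return z * h_prev + (1.0 - z) * h_tilde          # convex combination

# Run a few steps with illustrative weights; h stays bounded in [-1, 1]
# because tanh is bounded and z blends the two bounded terms.
h = 0.5
for x in (1.0, -0.3, 0.8):
    h = gru_step(x, h, 0.4, 0.2, 0.1, -0.3, 0.7, 0.5)
```

Setting W = U = 0 makes the candidate state zero, so the output reduces to z_t · h_{t-1} — directly showing that the update gate controls how much of the previous hidden state carries over.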
When a sentence is very long, taking only the hidden state of the Encoder's last step easily loses the information of the first half of the sentence. To solve this problem, an attention mechanism is adopted: instead of representing the whole sentence by the hidden state at the last moment, the attention mechanism represents it by a vector c_i, calculated as:

c_i = Σ_j a_{ij} h'_j

where a_{ij} is calculated by the following formula:

a_{ij} = exp(e_{ij}) / Σ_k exp(e_{ik}),   e_{ij} = η(s'_{i-1}, h'_j)

Here η is typically implemented with a multilayer perceptron, s'_{i-1} is the hidden state of the Decoder at time i-1, h'_j is the hidden state of the Encoder at time j, and exp is an exponential function whose base may be the natural constant or another constant. With the introduction of the attention mechanism, the model can more fully consider the input information at each moment, and more weight can be assigned to information that the model considers important.
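The attention computation above can be sketched in pure Python. For illustration the scoring function η is replaced by a simple dot product (the text says η is typically a multilayer perceptron), and the encoder states are invented toy vectors.

```python
import math

def attention_context(s_dec, enc_states):
    """c_i = sum_j a_ij * h'_j, with a_ij a softmax over scores e_ij."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    scores = [dot(s_dec, h) for h in enc_states]     # e_ij (dot-product stand-in)
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    weights = [e / total for e in exps]              # a_ij, sums to 1
    dim = len(enc_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, enc_states))
               for d in range(dim)]                  # c_i
    return weights, context

enc = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
w, c = attention_context([1.0, 0.0], enc)
```

Encoder states that align with the decoder state receive larger weights, which is exactly the "more weight to important information" behavior described above.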
Further, the specific process of training and generating the reply of the Seq2Seq model is as follows:
during the training process, the Encoder receives each word in the above as an input, and the input word needs to be converted into word embedding. After the word embedding of each word in the above information is sequentially input into the Encoder, the Decoder integrates the output of each time by adopting an attention mechanism and generates the output of each time by combining the input. The output at each time in the Decoder is the probability of each word in the lexicon occurring at the current time.
Further, word embedding is a technique of representing each word by a vector of fixed length. The word embeddings in the Seq2Seq model can be initialized directly with values pre-trained by Google, or by randomly initializing a vector for each word. The embedding of each word can be kept as a fixed value or fine-tuned by the Seq2Seq model during training.
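The word embedding described above is essentially a lookup table from words to fixed-length vectors. A minimal randomly initialized version (the alternative to loading pre-trained values) can be sketched as follows; the vocabulary, dimension, and initialization range are illustrative choices.

```python
import random

class EmbeddingTable:
    """Map each word to a fixed-length vector, randomly initialized."""
    def __init__(self, vocab, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.table = {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
                      for w in vocab}

    def __getitem__(self, word):
        # The same word always maps to the same fixed-length vector.
        return self.table[word]

emb = EmbeddingTable(["hello", "world", "<eos>"], dim=4)
vec = emb["hello"]
```

In training, such a table would either stay fixed or be fine-tuned along with the rest of the Seq2Seq parameters, as the text notes.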
In the generation process, the word embedding corresponding to each word of the context is input into the Seq2Seq model. The Decoder integrates the Encoder outputs at each time step using the attention mechanism and combines them with its inputs to produce an output at each time step, which is again the probability of each word in the lexicon occurring at the current time.
Further, when selecting the generated reply, the most probable word may be chosen at each time step, or the several most probable partial outputs may be retained at each step. The latter option sometimes yields better replies.
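The two decoding options above — taking the single most probable word at each step versus retaining the several most probable partial outputs (a beam) — can be contrasted on a hypothetical two-step toy distribution. The probabilities below are invented purely to show why keeping several candidates can find a better overall reply.

```python
# Hypothetical toy model: a step-1 distribution, and step-2 distributions
# conditioned on the step-1 choice.
STEP1 = {"A": 0.6, "B": 0.4}
STEP2 = {"A": {"x": 0.3, "y": 0.3, "z": 0.4},
         "B": {"x": 0.9, "y": 0.05, "z": 0.05}}

def greedy():
    """Pick the single most probable word at each step."""
    w1 = max(STEP1, key=STEP1.get)
    w2 = max(STEP2[w1], key=STEP2[w1].get)
    return (w1, w2), STEP1[w1] * STEP2[w1][w2]

def beam(width=2):
    """Keep the `width` best partial sequences after step 1, then expand all."""
    partial = sorted(STEP1.items(), key=lambda kv: -kv[1])[:width]
    full = [((w1, w2), p1 * p2)
            for w1, p1 in partial
            for w2, p2 in STEP2[w1].items()]
    return max(full, key=lambda sp: sp[1])

g_seq, g_prob = greedy()
b_seq, b_prob = beam()
```

Greedy commits to "A" (0.6) and ends with joint probability 0.6 × 0.4 = 0.24, while the beam keeps "B" alive and finds ("B", "x") with 0.4 × 0.9 = 0.36 — the higher-probability reply.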
Specifically, when building and implementing the Seq2Seq model in the Seq2Seq module, implementing everything from individual neurons upward would involve a large workload. The model is therefore implemented with a deep learning framework, which encapsulates common operations and brings considerable convenience when building a Seq2Seq model.
Further, PyTorch is selected as the deep learning framework. The previously produced vectors and the scikit-learn k-means implementation also use the Python language, so the two combine well. In addition, PyTorch supports tensor operations and the construction of dynamic networks. In terms of speed, PyTorch supports GPU acceleration, which further reduces the time the model consumes when training on large-scale data. Using the modules packaged in PyTorch, each part of the Seq2Seq model can be implemented relatively easily.
Compared with the prior art, the invention has the following beneficial effects:
1. In this system, the dialogue data are clustered, the dialogue data of each category are considered to share one topic, and the dialogue data of each topic are trained with a dedicated Seq2Seq model. During application, each sentence spoken by the user is first assigned to a category and then processed by the corresponding Seq2Seq model, so this topic-clustering-based adaptive process enables the system to capture characteristics specific to each topic. The system can therefore generate topic-sensitive and more meaningful replies, improving the user's experience with the dialogue system.
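The application-time flow described here can be sketched as follows. The centroids and per-cluster models are hypothetical stubs standing in for the trained k-means centroids and the per-topic Seq2Seq models; a real system would first vectorize the user sentence with the trained tf-idf model.

```python
import math

def nearest_cluster(vec, centroids):
    """Assign an input vector to the nearest k-means centroid (Euclidean)."""
    dists = [math.dist(vec, c) for c in centroids]
    return dists.index(min(dists))

def reply(user_vec, centroids, cluster_models):
    """Route a vectorized user sentence to the Seq2Seq model of its topic."""
    cluster_id = nearest_cluster(user_vec, centroids)
    return cluster_models[cluster_id](user_vec)

# Stub centroids and stand-in models for two topics.
centroids = [(0.0, 0.0), (5.0, 5.0)]
models = {0: lambda v: "reply from topic-0 model",
          1: lambda v: "reply from topic-1 model"}
```

A sentence whose vector lies near a given centroid is handled by that topic's model, which is the adaptive per-topic dispatch the beneficial-effects item describes.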
Drawings
FIG. 1 is a general schematic diagram of an adaptive dialog generation system based on topic clustering according to the present invention;
FIG. 2 is a schematic diagram of the Seq2Seq model constructed in the Seq2Seq module.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
Fig. 1 is a schematic diagram of an adaptive dialog generation system based on topic clustering, where the system includes a dialog data module, a vectorization module, a clustering module, and a Seq2Seq module;
the dialogue data module is used for constructing a dialogue data set before training;
the vectorization module is used for vectorizing the dialogue data set before clustering and taking the vectorized dialogue data set as the input of a clustering model to become a clustering basis;
the clustering module is used for clustering the vectorized dialogue data set into clusters;
and the Seq2Seq module is used for constructing a Seq2Seq model and generating corresponding replies to the dialogue data sets in the clusters obtained by the clustering module.
In the Seq2Seq module, the dialogue data is divided into a plurality of clusters according to the clustering module, a Seq2Seq model is constructed for each cluster to train the input dialogue data, and a dialogue reply is generated.
Specifically, in the vectorization module, a bag-of-words model is constructed using tf-idf as the weight calculation method, and the dialogue data set output by the dialogue data module is vectorized.
Before clustering with the clustering module, the dialogue data set needs to be vectorized; the bag-of-words model is a common way to vectorize a dialogue data set in natural language processing. When converting the dialogue data set with the bag-of-words model, each word is uniquely fixed at a certain position in the vector. For a given dialogue item, if a word appears, the corresponding position is set to a non-zero number; if it does not appear, the position is set to 0. How to choose the non-zero number then becomes very important. A simple approach is to set these numbers to 1, but then the relative importance of words is not distinguished, so tf-idf is used as the weight calculation in the clustering module.
As a weight calculation method, tf-idf evaluates how important a word is to a document in a corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to its frequency across the corpus, because the more documents a word appears in, the weaker its ability to distinguish between documents. Tf is the term frequency and idf is the inverse document frequency. Specifically for idf: if fewer documents contain the term t, i.e. n is smaller, idf is larger, indicating that the term t has good category-distinguishing ability.
Further, in the present system, the conversion from the dialogue data set to tf-idf vectors is realized by a package encapsulated in scikit-learn.
After the dialogue data set is vectorized, clustering can be carried out on it. In this system, the clustering algorithm is k-means. It is a hard clustering algorithm and a typical prototype-based objective-function clustering method: a distance from each data point to a prototype serves as the objective function to be optimized, and the update rule of the iterative procedure is obtained by solving for the function's extremum. The algorithm takes Euclidean distance as the similarity measure and solves for the optimal classification corresponding to an initial cluster-center vector V, so that the evaluation index J is minimized; the sum-of-squared-errors criterion is used as the clustering criterion function. The selection of the k initial cluster centers has a large influence on the clustering result, because in its first step the algorithm randomly selects k objects as the initial cluster centers, each initially representing one cluster. In each iteration, the algorithm reassigns every remaining object in the data set to the nearest cluster according to its distance from each cluster center. After all data objects have been examined, one iteration is complete and new cluster centers are computed. If the value of J does not change between iterations, the algorithm has converged.
Further, in the system, the specific working steps of the k-means algorithm in the clustering module are as follows:
1. Randomly select k items from the N dialogue items as centroids;
2. For each of the remaining dialogue items, measure the distance to each centroid and assign it to the nearest one;
3. Recalculate the centroid of each resulting class;
4. Repeat steps 2 and 3 until the new centroids equal the original ones or the change is smaller than a specified threshold.
Further, as shown in FIG. 2, each Seq2Seq model includes two sub-modules: an Encoder and a Decoder. The Encoder receives an input of indefinite length and converts it into a vector of fixed length. The Decoder then generates an output sequence from the vector produced by the Encoder.
The goal of the Seq2Seq model is to model a conditional probability p(y_1, y_2, ..., y_T | x_1, x_2, ..., x_{T'}), where x_1, x_2, ..., x_{T'} is the input sequence and y_1, y_2, ..., y_T is the output sequence. The length T' of the input sequence and the length T of the output sequence may differ. The probability can be expressed by the following equation:

p(y_1, ..., y_T | x_1, ..., x_{T'}) = ∏_{t=1}^{T} p(y_t | v, y_1, ..., y_{t-1})

where v is the fixed-length vector generated by the Encoder sub-module.
Further, to implement the Seq2Seq model, a special symbol is appended after each sequence so that the model can recognize the stop flag of a sequence.
Further, the Encoder sub-module and the Decoder sub-module are each constituted by an RNN (Recurrent Neural Network). RNNs are a type of artificial neural network; unlike conventional neural networks, data are fed in at different time steps, and the internal state of an RNN enables it to retain information from previous inputs.
Specifically, the output of the RNN is calculated as:

o_t = f(s_t)

where o_t and s_t are respectively the output and the hidden state at time t in the RNN. The function f may differ for different tasks; in the present system, what is desired is the probability of each word at the current time, so the softmax function is used. The hidden state of the RNN is updated by the following equation:

s_t = g(s_{t-1}, x_t)

where g(·) is a non-linear activation function, which may be a sigmoid function or a more complex one; x_t is the input at time step t, and s_{t-1} is the hidden state of the RNN at time step t-1.
The internal memory mechanism of RNNs allows them to capture information from previous inputs, which is important for natural language processing tasks, since in such tasks the input and output should not be treated as independent. However, when the input sequence is too long, standard RNNs suffer from the long-term dependency problem. Therefore, the GRU (gated recurrent unit) is used as the building block of the RNNs.
The GRU is a simplified variant of the LSTM (long short-term memory). Unlike the neuron structure in classical RNNs, the GRU introduces a reset gate and an update gate. The reset gate is calculated as:

r_t = σ(W_r x_t + U_r h_{t-1})

where σ is the sigmoid function, x_t is the input, and h_{t-1} is the hidden state of the GRU at time t-1. For the GRU, h_t plays the role of the s_t described for RNNs above; two different symbols are used here only for distinction. W_r and U_r are weight matrices to be learned.

The update gate z_t is calculated as:

z_t = σ(W_z x_t + U_z h_{t-1})

where W_z and U_z are weight matrices to be learned.

The hidden state h_t is then:

h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h̃_t

where the candidate state is:

h̃_t = tanh(W x_t + U (r_t ⊙ h_{t-1}))

Here ⊙ is element-wise multiplication, W and U are weight matrices to be learned, and tanh is the hyperbolic tangent function.

From the above equations, if the reset gate is 0, the candidate state is affected only by the current input. Through this mechanism, the hidden state can forget unimportant information, so the model can perform better. And from h_t it can be seen that the update gate z_t determines how much influence the previous hidden state has on the current one.
When a sentence is very long, taking only the hidden state of the Encoder's last step easily loses the information of the first half of the sentence. To solve this problem, an attention mechanism is adopted: instead of representing the whole sentence by the hidden state at the last moment, the attention mechanism represents it by a vector c_i, calculated as:

c_i = Σ_j a_{ij} h'_j

where a_{ij} is calculated by the following formula:

a_{ij} = exp(e_{ij}) / Σ_k exp(e_{ik}),   e_{ij} = η(s'_{i-1}, h'_j)

Here η is typically implemented with a multilayer perceptron, s'_{i-1} is the hidden state of the Decoder at time i-1, h'_j is the hidden state of the Encoder at time j, and exp is an exponential function whose base may be the natural constant or another constant. With the introduction of the attention mechanism, the model can more fully consider the input information at each moment, and more weight can be assigned to information that the model considers important.
Further, the specific process of training and generating the reply of the Seq2Seq model is as follows:
during the training process, the Encoder receives each word in the above as an input, and the input word needs to be converted into word embedding. After the word embedding of each word in the above information is sequentially input into the Encoder, the Decoder integrates the output of each time by adopting an attention mechanism and generates the output of each time by combining the input. The output at each time in the Decoder is the probability of each word in the lexicon occurring at the current time.
Further, word embedding is a technique of representing each word by a vector of fixed length. The word embeddings in the Seq2Seq model can be initialized directly with values pre-trained by Google, or by randomly initializing a vector for each word. The embedding of each word can be kept as a fixed value or fine-tuned by the Seq2Seq model during training.
In the generation process, the word embedding corresponding to each word of the context is input into the Seq2Seq model. The Decoder integrates the Encoder outputs at each time step using the attention mechanism and combines them with its inputs to produce an output at each time step, which is again the probability of each word in the lexicon occurring at the current time.
Further, when selecting the generated reply, the most probable word may be chosen at each time step, or the several most probable partial outputs may be retained at each step. The latter option sometimes yields better replies.
Specifically, when building and implementing the Seq2Seq model in the Seq2Seq module, implementing everything from individual neurons upward would involve a large workload. The model is therefore implemented with a deep learning framework, which encapsulates common operations and brings considerable convenience when building a Seq2Seq model.
Further, PyTorch is selected as the deep learning framework. The previously produced vectors and the scikit-learn k-means implementation also use the Python language, so the two combine well. In addition, PyTorch supports tensor operations and the construction of dynamic networks. In terms of speed, PyTorch supports GPU acceleration, which further reduces the time the model consumes when training on large-scale data. Using the modules packaged in PyTorch, each part of the Seq2Seq model can be implemented relatively easily.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (7)
1. An adaptive dialog generation system based on topic clustering, characterized in that the dialogue data of each topic in the system are trained by a dedicated Seq2Seq model; in application, the category of each sentence spoken by the user is judged, and the sentence is processed by the corresponding Seq2Seq model; the system comprises a dialogue data module, a vectorization module, a clustering module and a Seq2Seq module;
the dialogue data module is used for constructing a dialogue data set before training;
the vectorization module is used for vectorizing the dialogue data set before clustering and taking the vectorized dialogue data set as the input of a clustering model to become a clustering basis;
the clustering module is used for clustering the vectorized dialogue data set into clusters;
the Seq2Seq module is used for constructing a Seq2Seq model and generating corresponding replies to the dialogue data sets in the clusters obtained by the clustering module;
in the Seq2Seq module, the dialogue data are divided into a plurality of clusters according to the clustering module, and a Seq2Seq model is constructed for each cluster to train on the input dialogue data and generate a dialogue reply;
the Seq2Seq model comprises two sub-modules, an Encoder and a Decoder; the Encoder receives an input of indefinite length and converts it into a vector of fixed length; the Decoder generates an output sequence from the vector obtained by the Encoder; the Encoder sub-module and the Decoder sub-module are each composed of an RNN;
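The Encoder/Decoder division in claim 1 can be sketched as follows. This is a toy illustration only: the recurrent updates, weights, and output values below are made up for readability and are not the patented model — the point is that the Encoder folds a variable-length input into one fixed-length vector, from which the Decoder unrolls an output sequence.

```python
import math

def encode(tokens, hidden_size=3):
    """Encoder: fold a variable-length sequence into one fixed-length vector."""
    h = [0.0] * hidden_size
    for tok in tokens:
        # a trivial recurrent update standing in for an RNN cell
        h = [math.tanh(0.1 * tok + hi) for hi in h]
    return h  # length is hidden_size, regardless of len(tokens)

def decode(context, steps=4):
    """Decoder: unroll the fixed-length context vector into an output sequence."""
    out, h = [], list(context)
    for _ in range(steps):
        h = [math.tanh(0.5 * hi) for hi in h]
        out.append(round(sum(h), 3))  # stand-in for emitting a token
    return out

c = encode([3, 1, 4, 1, 5])
print(len(c))        # fixed length, independent of the input length
print(decode(c))
```

Inputs of length 5 and length 1 both yield a context vector of the same size, which is exactly the property the attention mechanism of the next claims then relaxes.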
the attention mechanism is used: instead of representing the whole sentence by the implicit state of the Encoder at the last time step, under the attention mechanism the sentence is represented at decoding step i by a context vector c_i, the specific calculation formula of which is as follows:

c_i = Σ_j a_ij h'_j

wherein the weight a_ij is calculated by the following formula:

a_ij = exp(e_ij) / Σ_k exp(e_ik), where e_ij = η(s'_{i-1}, h'_j)

where η is typically implemented with a multilayer perceptron, s'_{i-1} is the implicit state of the Decoder at time i-1, and h'_j is the implicit state of the Encoder at time j.
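The attention computation above can be sketched in a few lines. In this illustration the scoring function η is replaced by a plain dot product (the claim says it is typically a multilayer perceptron), and the vectors are tiny hand-picked lists; only the softmax-weighted sum of encoder states is faithful to the claim.

```python
import math

def attention_context(s_prev, encoder_states):
    """Return (c_i, weights): a softmax-weighted sum of encoder states h'_j."""
    # e_ij: alignment score between the previous decoder state and each h'_j
    # (dot product here as a stand-in for the perceptron eta)
    scores = [sum(a * b for a, b in zip(s_prev, h)) for h in encoder_states]
    # a_ij: softmax over the scores, so the weights sum to 1
    m = max(scores)
    exp_s = [math.exp(e - m) for e in scores]
    total = sum(exp_s)
    weights = [e / total for e in exp_s]
    # c_i = sum_j a_ij * h'_j
    dim = len(encoder_states[0])
    c_i = [sum(w * h[d] for w, h in zip(weights, encoder_states))
           for d in range(dim)]
    return c_i, weights

ctx, w = attention_context([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(w)  # the state aligned with s_prev receives the larger weight
```

Because the first encoder state points in the same direction as the decoder state, it scores higher and dominates the context vector, which is how attention lets the Decoder focus on different parts of the input at each step.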
2. The adaptive dialog generation system based on topic clustering according to claim 1, characterized in that in the vectorization module, tf-idf is used as a weight calculation method to construct a bag-of-words model, and the dialog data set output by the dialog data module is vectorized.
3. The adaptive dialog generation system based on topic clustering according to claim 2, characterized in that in the system, the conversion from the dialogue data set to tf-idf vectors is done by means of the encapsulated package in scikit-learn.
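The tf-idf weighting of claims 2 and 3 can be illustrated as follows. In the system itself this is done with scikit-learn's packaged vectorizer; this hand-rolled sketch just exposes the weight calculation, and it uses the plain idf = log(N / df) convention, which differs slightly from scikit-learn's smoothed default.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, tf-idf vector per document) for whitespace-tokenized docs."""
    vocab = sorted({w for d in docs for w in d.split()})
    n = len(docs)
    # df: number of documents containing each term
    df = {w: sum(1 for d in docs if w in d.split()) for w in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d.split())  # raw term frequency in this document
        vectors.append([tf[w] * math.log(n / df[w]) for w in vocab])
    return vocab, vectors

vocab, vecs = tfidf_vectors(["good morning", "good night", "hello world"])
# "good" appears in 2 of 3 docs -> low idf; "hello" in 1 of 3 -> higher idf
print(vocab)
```

Terms shared across many dialogues get small weights, so the resulting vectors emphasize topic-specific words — which is why they make a good clustering basis.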
4. The adaptive dialog generation system based on topic clustering according to claim 2, characterized in that in the system, the specific working steps of the k-means algorithm in the clustering module are:
(1) randomly selecting k items from the N items of dialogue data as centroids;
(2) for each of the remaining items of dialogue data, measuring the distance to each centroid and assigning it to the nearest centroid;
(3) recalculating the centroid of each resulting class;
(4) repeating step (2) and step (3) until the new centroids are equal to the original centroids or the change is smaller than a specified threshold value.
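The four steps above can be sketched directly. For brevity this version clusters 1-D points rather than tf-idf vectors (the system would cluster the vectors, in practice via scikit-learn's packaged k-means); the data and seed are illustrative.

```python
import random

def kmeans(points, k, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # step (1): random centroids
    while True:
        clusters = [[] for _ in range(k)]
        for p in points:                         # step (2): assign to nearest
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        new = [sum(c) / len(c) if c else centroids[i]  # step (3): recompute
               for i, c in enumerate(clusters)]
        # step (4): stop once the centroids no longer move
        if all(abs(a - b) < 1e-9 for a, b in zip(new, centroids)):
            return sorted(new)
        centroids = new

centers = kmeans([1.0, 1.1, 0.9, 9.0, 9.2, 8.8], k=2)
print(centers)
```

On this well-separated toy data the loop converges in a few iterations to the two cluster means, one per topic-like group.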
5. The adaptive dialog generation system based on topic clustering according to claim 1, characterized in that when implementing the Seq2Seq model, in order to make the model recognize the stop sign of the sequence, a special symbol is added after each sequence to indicate the stop of the sequence.
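The stop symbol of claim 5 amounts to a tiny preprocessing and decoding convention. The token name `"<EOS>"` below is a hypothetical choice (the claim only requires some special symbol); the sketch shows both sides of the convention: appending the marker to training sequences, and stopping generation when the model emits it.

```python
EOS = "<EOS>"  # hypothetical end-of-sequence marker

def add_eos(token_seq):
    """Append the stop symbol so the model can learn where sequences end."""
    return token_seq + [EOS]

def generate_until_eos(step_fn, max_len=10):
    """Decode tokens until the model emits EOS (or a safety cap is hit)."""
    out = []
    for _ in range(max_len):
        tok = step_fn(out)       # step_fn stands in for one decoder step
        if tok == EOS:
            break
        out.append(tok)
    return out

print(add_eos(["how", "are", "you"]))
```

Without such a marker a Seq2Seq decoder has no learned signal for when a reply is complete, which is why the claim adds it after every training sequence.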
6. The adaptive dialog generation system based on topic clustering according to claim 1, characterized in that GRUs are taken as the building blocks of the RNNs; unlike the neuron structure in classical RNNs, the GRU incorporates a reset gate and an update gate mechanism; the calculation formula of the reset gate r_t is as follows:

r_t = σ(W_r x_t + U_r h_{t-1})

where σ is the sigmoid function, x_t is the input and h_{t-1} is the implicit state of the GRU at time t-1; W_r and U_r are weight matrices to be learned;

the calculation formula of the update gate z_t is as follows:

z_t = σ(W_z x_t + U_z h_{t-1})

where W_z and U_z are weight matrices to be learned;

and the implicit state h_t is:

h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t

wherein the candidate state h̃_t is:

h̃_t = tanh(W x_t + U (r_t ⊙ h_{t-1}))

where W and U are weight matrices to be learned and ⊙ denotes element-wise multiplication.
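The GRU gating of claim 6 can be traced with scalar arithmetic. This sketch uses hidden size 1 and hand-picked weights so every gate value is easy to follow (real models use weight matrices), and it follows the original GRU formulation of Cho et al., where the new state interpolates between the old state and a candidate state.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, Wr=0.5, Ur=0.5, Wz=0.5, Uz=0.5, W=1.0, U=1.0):
    """One GRU step with scalar (hidden size 1) illustrative weights."""
    r = sigmoid(Wr * x_t + Ur * h_prev)              # reset gate r_t
    z = sigmoid(Wz * x_t + Uz * h_prev)              # update gate z_t
    h_tilde = math.tanh(W * x_t + U * (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde          # h_t: old/new blend

h = 0.0
for x in [1.0, -1.0, 0.5]:
    h = gru_step(x, h)
print(round(h, 4))
```

The reset gate decides how much of the previous state feeds the candidate, while the update gate decides how much of the candidate replaces the previous state; both stay in (0, 1) thanks to the sigmoid, so the hidden state remains bounded.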
7. The adaptive dialog generation system based on topic clustering according to claim 1, characterized in that when constructing and implementing the Seq2Seq model in the Seq2Seq module, PyTorch is used as the deep learning framework.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810823424.6A CN109308316B (en) | 2018-07-25 | 2018-07-25 | Adaptive dialog generation system based on topic clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109308316A CN109308316A (en) | 2019-02-05 |
CN109308316B true CN109308316B (en) | 2021-05-14 |
Family
ID=65225979
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810823424.6A Active CN109308316B (en) | 2018-07-25 | 2018-07-25 | Adaptive dialog generation system based on topic clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109308316B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110297909B (en) * | 2019-07-05 | 2021-07-02 | 中国工商银行股份有限公司 | Method and device for classifying unlabeled corpora |
CN113836275B (en) * | 2020-06-08 | 2023-09-05 | 菜鸟智能物流控股有限公司 | Dialogue model establishment method and device, nonvolatile storage medium and electronic device |
CN111751714A (en) * | 2020-06-11 | 2020-10-09 | 西安电子科技大学 | Radio frequency analog circuit fault diagnosis method based on SVM and HMM |
CN112115687B (en) * | 2020-08-26 | 2024-04-26 | 华南理工大学 | Method for generating problem by combining triplet and entity type in knowledge base |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107133224A (en) * | 2017-04-25 | 2017-09-05 | 中国人民大学 | A kind of language generation method based on descriptor |
US9830315B1 (en) * | 2016-07-13 | 2017-11-28 | Xerox Corporation | Sequence-based structured prediction for semantic parsing |
CN108319668A (en) * | 2018-01-23 | 2018-07-24 | 义语智能科技(上海)有限公司 | Generate the method and apparatus of text snippet |
Non-Patent Citations (5)
Title |
---|
Research and Analysis of Chinese Word Segmentation Based on Bidirectional LSTMN Neural Networks; Huang Jiyang; China Master's Theses Full-text Database, Information Science and Technology; 2016-10-15 (No. 10); p. 34 *
A Dialogue Generation Algorithm for Open-Domain Chatbots Based on Reinforcement Learning; Cao Dongyan; China Master's Theses Full-text Database, Information Science and Technology; 2018-02-15 (No. 2); full text *
Research on Intelligent Chatbots Based on Deep Learning; Liang Miaomiao; China Master's Theses Full-text Database, Information Science and Technology; 2018-06-15 (No. 6); full text *
Event-Oriented Automatic Summarization of Social Media Text; Guan Chenyu; China Master's Theses Full-text Database, Information Science and Technology; 2017-08-15 (No. 8); pp. 12, 16-18, 25, 28, 33, 36-37 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111368996B (en) | Retraining projection network capable of transmitting natural language representation | |
KR102071582B1 (en) | Method and apparatus for classifying a class to which a sentence belongs by using deep neural network | |
CN110188358B (en) | Training method and device for natural language processing model | |
CN109308316B (en) | Adaptive dialog generation system based on topic clustering | |
CN111858931B (en) | Text generation method based on deep learning | |
CN110427461B (en) | Intelligent question and answer information processing method, electronic equipment and computer readable storage medium | |
CN110609891A (en) | Visual dialog generation method based on context awareness graph neural network | |
CN110377916B (en) | Word prediction method, word prediction device, computer equipment and storage medium | |
CN110990543A (en) | Intelligent conversation generation method and device, computer equipment and computer storage medium | |
CN112115687B (en) | Method for generating problem by combining triplet and entity type in knowledge base | |
WO2022217849A1 (en) | Methods and systems for training neural network model for mixed domain and multi-domain tasks | |
CN112417894B (en) | Conversation intention identification method and system based on multi-task learning | |
CN112749274B (en) | Chinese text classification method based on attention mechanism and interference word deletion | |
CN111581970B (en) | Text recognition method, device and storage medium for network context | |
CN114676234A (en) | Model training method and related equipment | |
CN114596844B (en) | Training method of acoustic model, voice recognition method and related equipment | |
CN111046157B (en) | Universal English man-machine conversation generation method and system based on balanced distribution | |
KR20240089276A (en) | Joint unsupervised and supervised training for multilingual automatic speech recognition. | |
CN111899766A (en) | Speech emotion recognition method based on optimization fusion of depth features and acoustic features | |
Yang et al. | Sequence-to-sequence prediction of personal computer software by recurrent neural network | |
WO2023159759A1 (en) | Model training method and apparatus, emotion message generation method and apparatus, device and medium | |
CN116150334A (en) | Chinese co-emotion sentence training method and system based on UniLM model and Copy mechanism | |
CN111274359B (en) | Query recommendation method and system based on improved VHRED and reinforcement learning | |
CN113901820A (en) | Chinese triplet extraction method based on BERT model | |
Tascini | Al-Chatbot: elderly aid |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||