Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure.
Fig. 1 is a flow diagram of some embodiments of a method of generating natural language text of the present disclosure. The method may be performed, for example, by a natural language text generating device.
As shown in FIG. 1, the method of this embodiment includes steps 101-103.
In step 101, structured data to be generated is acquired as input information.
The structured data includes at least an intent and, optionally, one or more slots and their values (slot-value pairs). The structured data may be represented, for example, using Structured Query Language (SQL). For example, in an ordering service, one piece of structured data is hello, whose intent is "hello"; it includes no slots or values and expresses the meaning "hello". As another example, the structured data select (name=kender) has the intent "select", the slot "name", and the slot value "kender", meaning "select a restaurant whose name is kender". As yet another example, the structured data select (name=kender, count=2) has the intent "select", indicating that matching restaurants were found; the combination of slots and their values is "name=kender, count=2", meaning "found 2 restaurants named kender".
In some embodiments, the structured data to be generated may be one-hot encoded. Each part of the structured data (e.g., the intent, the slots and their values) is one-hot encoded separately. Each part corresponds to one one-hot vector, and the position of the 1 in that vector is determined by the position of the part's value in the ordered list of all possible values for that part. The ordering itself is not limited, but it must be consistent between the training process and the prediction process. For example, in an ordering service with 5 intents in total, if the intent to be encoded is select and select is ranked first in the ordered list of all intents, the corresponding one-hot vector is denoted (1, 0, 0, 0, 0).
In some embodiments, if the structured data includes multiple parts such as an intent and slots with their values, the one-hot vectors of those parts (e.g., the one-hot vector of the intent and the one-hot vectors of the slot-value pairs) are concatenated to obtain the one-hot vector of the structured data as a whole (e.g., the intent-slot-value combination).
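The encoding described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation; the ordered lists of intents and slot-value pairs are hypothetical examples of a service's value inventories.

```python
# Sketch of one-hot encoding for structured data, assuming a fixed ordering
# of intents and slot-value pairs that is kept consistent between training
# and prediction (the lists below are illustrative).

INTENTS = ["select", "hello", "bye", "inform", "request"]   # hypothetical ordering
SLOT_VALUES = ["name=kender", "count=2", "area=north"]      # hypothetical ordering

def one_hot(value, ordered_values):
    """Return a vector with a 1 at value's position in the ordered list."""
    vec = [0] * len(ordered_values)
    vec[ordered_values.index(value)] = 1
    return vec

def encode_structured_data(intent, slot_values):
    """Concatenate the intent's one-hot vector with each slot-value's one-hot vector
    to obtain the one-hot vector of the structured data as a whole."""
    vec = one_hot(intent, INTENTS)
    for sv in slot_values:
        vec += one_hot(sv, SLOT_VALUES)
    return vec

# "select" is first in the intent ordering, so its vector is (1, 0, 0, 0, 0).
encoded = encode_structured_data("select", ["name=kender", "count=2"])
```

With the hypothetical orderings above, the encoding of select (name=kender, count=2) is the 5-dimensional intent vector followed by one 3-dimensional vector per slot-value pair.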
In step 102, the structured data to be generated is input into a natural language text generation model and a corresponding text sentence pattern is output.
Step 102 includes two sub-steps, 1021 and 1022.
In step 1021, the structured data to be generated is input to a natural language text generation model, and predictive text nodes of a plurality of output parts are sequentially obtained.
The structured data to be generated is input into the natural language text generation model, which sequentially produces a plurality of output parts through learned prediction. Each output part includes at least one predicted text node (a text node may be a word, represented by a corresponding identifier, also referred to as an ID), and each output part serves as input for predicting the next output part, until prediction ends. After prediction finishes, the identifiers are converted back into the corresponding text nodes according to the word stock constructed during training.
In step 1022, the text sentence patterns corresponding to the structured data are obtained by combining the predicted text nodes of the output parts.
An exemplary method for combining text nodes into the corresponding text sentence patterns is as follows. First, according to the prediction probabilities of the text nodes output by each output part, a preset number (at least one) of text nodes with the highest prediction probability is selected from each output part as that part's predicted text node set. Then, one text node is selected from the predicted text node set of each output part, and the selected text nodes are combined into a text node combination; traversing the text nodes in the predicted text node sets of all output parts in turn yields a plurality of text node combinations. Finally, the preset number of combinations with the highest combination probability is selected from these combinations as the text sentence patterns corresponding to the structured data.
Combining text nodes into the corresponding text sentence pattern can be implemented with a beam search algorithm, specifically as follows. The preset number is determined by setting the beam width (Beam-Width). When each output part is predicted, the heuristic cost of each of its text nodes is calculated, and the text nodes of that output part are sorted in descending order of heuristic cost. Following this descending order, the preset number of text nodes is retained for expansion at the next layer, and the remaining text nodes are pruned and not expanded. The heuristic cost may be calculated, for example, as log(P(x)), where x is a text node or a text node combination and P(x) is the probability of x.
Therefore, if the beam width (i.e., the preset number) is greater than 1, a plurality of text sentence patterns may be generated for one piece of input structured data, increasing the diversity of the predicted text sentence patterns.
The combination probability of a text node combination may be determined by summing the prediction probabilities (here, log probabilities) of all text nodes in that combination. The preset number of combinations with the highest combination probability is then selected from the candidate combinations as the final text sentence patterns corresponding to the structured data, so that the preset number of text sentence patterns is ultimately predicted for the structured data.
Fig. 2 is a schematic diagram of combining text nodes into a corresponding text sentence pattern using the beam search algorithm of the present disclosure. As shown in Fig. 2, assume that predicting a certain piece of structured data with the natural language text generation model yields two output parts, a first output part and a second output part, and that the preset number is set to 2. The first output part has three text nodes "a", "b", and "c", with prediction (log) probabilities of -1.05, -0.92, and -1.39, respectively. Since the two text nodes with the highest probability are "a" (-1.05) and "b" (-0.92), the lowest-probability node "c" is pruned, and the text node set of the first output part is {a, b}. In the second output part, the three text nodes expanded from "a" are "d" (-1.90), "e" (-0.22), and "f" (-3.00); following the rule of keeping the two highest-probability nodes, the lowest-probability node "f" is pruned, and the second-part text node set obtained from "a" is {d, e}. The three text nodes expanded from "b" are "g" (-0.92), "h" (-0.69), and "i" (-2.30); pruning the lowest-probability node "i" gives the second-part text node set {g, h} obtained from "b". Four text node combinations are thus obtained: "ad" (probability -1.05 + (-1.90) = -2.95), "ae" (-1.05 + (-0.22) = -1.27), "bg" (-0.92 + (-0.92) = -1.84), and "bh" (-0.92 + (-0.69) = -1.61).
The two combinations with the highest combination probability, "ae" and "bh", are selected from the four text node combinations as the final text sentence patterns corresponding to the structured data.
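The pruning in Fig. 2 can be sketched as a minimal beam search over per-step candidates. The two-step candidate tables below reproduce the example's log probabilities; the data layout (a dict from surviving prefix to candidate nodes) is an illustrative choice, not the disclosed implementation.

```python
def beam_search(step_candidates, beam_width=2):
    """step_candidates: one dict per output part, mapping each surviving prefix
    to its candidate text nodes and their log probabilities. Keeps only the
    beam_width highest-scoring partial combinations at each step."""
    beams = [("", 0.0)]  # (text node combination, cumulative log probability)
    for candidates in step_candidates:
        expanded = []
        for prefix, score in beams:
            for node, logp in candidates.get(prefix, {}).items():
                expanded.append((prefix + node, score + logp))
        # prune: keep the beam_width combinations with the highest total log prob
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams

# First output part: a=-1.05, b=-0.92, c=-1.39 ("c" is pruned).
step1 = {"": {"a": -1.05, "b": -0.92, "c": -1.39}}
# Second output part depends on the surviving prefix.
step2 = {"a": {"d": -1.90, "e": -0.22, "f": -3.00},
         "b": {"g": -0.92, "h": -0.69, "i": -2.30}}
# The surviving combinations are "ae" (-1.27) and "bh" (-1.61), as in Fig. 2.
result = beam_search([step1, step2], beam_width=2)
```

Setting `beam_width=1` reduces this to greedy decoding, which would return only "bh" here; widths greater than 1 are what produce multiple candidate sentence patterns per input.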
In step 103, the placeholders of the slots in the text sentence patterns are replaced as needed to obtain the corresponding natural language text.
Specifically, if the structured data to be generated includes slots and their values, the placeholders of the slots in the text sentence pattern are replaced with the values of the corresponding slots in the structured data to obtain the corresponding natural language text; if the structured data does not include slots and values, the text sentence pattern is used directly as the natural language text generated for the structured data.
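The replacement in step 103 can be sketched as follows, assuming slot placeholders are written as "[slot]" inside the text sentence pattern (the bracket convention is an illustrative assumption).

```python
# Minimal sketch of step 103: fill slot placeholders with slot values.
# The "[slot]" placeholder syntax is assumed, not specified by the disclosure.

def fill_slots(sentence_pattern, slot_values):
    """Replace each slot placeholder with its value from the structured data;
    patterns without slots are returned unchanged."""
    text = sentence_pattern
    for slot, value in slot_values.items():
        text = text.replace(f"[{slot}]", str(value))
    return text

pattern = "found [count] restaurants named [name]"
text = fill_slots(pattern, {"name": "kender", "count": 2})
# → "found 2 restaurants named kender"
```

A pattern with no placeholders, such as one generated from hello, passes through unchanged, matching the second branch of step 103.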
According to this embodiment, a recurrent neural network is trained with structured data training samples and their corresponding text templates to obtain the natural language text generation model; structured data to be generated is input into the model, which outputs the corresponding text sentence pattern; and, where necessary, the slot placeholders in the text sentence pattern are replaced with the slot values in the structured data to finally generate the natural language text. In this way, the text sentence patterns corresponding to the text templates can be learned flexibly and automatically, greatly reducing manual configuration work and saving manpower. In addition, if the beam width is greater than 1, a plurality of text sentence patterns may be generated for one piece of input structured data, increasing the diversity of the predicted text sentence patterns.
FIG. 3 is a flow diagram of some embodiments of the present disclosure generating a natural language text generation model. As shown in FIG. 3, the method of this embodiment includes steps 301-303.
In step 301, a training sample set is obtained, which includes a plurality of structured data training samples and the text template corresponding to each training sample, and the data in the training sample set is preprocessed into input information for training the model.
Obtaining the structured data training samples includes: extracting all intents and slots from the system replies of the dialogue system of a given service, and combining the intents and slots to obtain the structured data training samples.
Obtaining the text template corresponding to a structured data training sample includes: in the system replies of the dialogue system of a given service, replacing the specific slot values in a reply sentence with the corresponding slot placeholders to obtain the text template corresponding to the training sample. For example, the reply "found 2 restaurants for you" becomes, after replacement, "found [count] restaurants for you", where [count] is the placeholder for the number of restaurants.
For example, one list library includes 3 intent-slot combinations (i.e., 3 structured data training samples), with the text template corresponding to each combination given in brackets. The list library data includes: select_name (found a restaurant named [name]), bye_ (goodbye), select_name_count (found [count] restaurants named [name]).
The obtained structured data training samples and their corresponding text templates may be preprocessed. Preprocessing of the text templates includes padding, addition of sentence-pattern symbols, ID mapping of words, and word vector mapping. Preprocessing of the structured data includes one-hot encoding. Each is described in detail below.
A filling process (also called padding) is performed on the text templates so that they all have the same length. For example, sentences whose word count exceeds a fixed length are truncated by cutting words from the end of the sentence, while forward padding prepends the same filler word to sentences whose word count is insufficient, so that all processed sentences have the same length.
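The padding rule can be sketched in a few lines; the "PAD" filler token is an illustrative assumption, as the disclosure only says the same word is prepended.

```python
# Sketch of forward (pre-)padding to a fixed length: truncate at the end
# when too long, prepend a filler token when too short.

def pad_tokens(tokens, fixed_length, pad_token="PAD"):
    """Return a token list of exactly fixed_length words."""
    if len(tokens) > fixed_length:
        return tokens[:fixed_length]          # cut words from the sentence end
    return [pad_token] * (fixed_length - len(tokens)) + tokens  # pad in front

padded = pad_tokens(["found", "a", "restaurant"], 5)
# → ["PAD", "PAD", "found", "a", "restaurant"]
```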
Sentence-pattern symbols are added to the text templates: a start symbol <sos> is added before each text template, indicating the start of the sentence, and an end symbol <eos> is added after each text template, indicating the end of the sentence.
Words in the text templates are ID-mapped based on a word stock. The process of constructing the word stock is as follows: based on the word-segmented text templates, each word is assigned a unique identifier, and the mapping between words and identifiers is stored. The word stock arranges and stores words in descending order of word frequency. For example, ordered from highest to lowest word frequency, the constructed word stock may be expressed as: "UNK", "SOS", "EOS", "NAME", "COUNT", "?", "restaurant", "you", "find", "home", "good", "have", "phone", and so on. Here "UNK" represents words that do not appear in the word vectors (e.g., words whose frequency is below a certain value, so that rarely occurring words are culled), "SOS" is the start indicator marking the beginning of a text template, and "EOS" is the end indicator marking its end. The word frequency is calculated by dividing the number of occurrences of a word in the word stock by the total number of occurrences of all words in the word stock.
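Word-stock construction as described above can be sketched as follows. Placing the special tokens first and applying a minimum-count cutoff for "UNK" are illustrative assumptions consistent with, but not mandated by, the description.

```python
from collections import Counter

# Sketch of word-stock construction: special tokens first, then words in
# descending order of frequency; words below min_count map to UNK.

def build_vocab(segmented_templates, min_count=1):
    """Return a word -> ID mapping: special tokens, then words by frequency."""
    counts = Counter(w for template in segmented_templates for w in template)
    vocab = {"UNK": 0, "SOS": 1, "EOS": 2}
    for word, count in counts.most_common():      # descending word frequency
        if count >= min_count and word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def word_to_id(word, vocab):
    """ID mapping: unknown words fall back to the UNK identifier."""
    return vocab.get(word, vocab["UNK"])

vocab = build_vocab([["found", "a", "restaurant"], ["found", "NAME"]])
# "found" occurs most often, so it receives the first non-special ID.
```

The same mapping must be reused at prediction time so that the output IDs can be converted back into text nodes.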
Word vector mapping for a text template refers to mapping each word of the word-segmented text template into a word vector. The word vector generation method employed may include, for example, the GloVe model or the word2vec model.
The preprocessing of generating one-hot codes for the structured data refers to one-hot encoding each part (such as the intent, the slots, and the slot values) of a structured data training sample. Each part corresponds to one one-hot vector, and the position of the 1 in that vector is determined by the position of the part's value in the ordered list of all possible values for that part. The ordering itself is not limited, but it must be consistent between the prediction process and the training process. For example, in an ordering service with 5 intents in total, if the intent to be encoded is select and select is ranked first in the ordered list of all intents, the corresponding one-hot vector is denoted (1, 0, 0, 0, 0).
The word vectors obtained from the preprocessed text template are connected with the one-hot vector obtained from the preprocessed structured data (for example, using a concat operation) to obtain a connection vector of the one-hot vector and the word vectors.
In step 302, the input information is input into a recurrent neural network, a plurality of output parts are sequentially obtained, and the total loss is determined from the losses between the predicted word of each output part and the actual word of the corresponding part of the text template.
In some embodiments, training of the recurrent neural network (RNN) is implemented with an encoder-decoder architecture. The encoder handles the encoding of the data in the training sample set, and the decoder handles the mapping of the output word IDs back into words.
The connection vector of the one-hot vector and the word vectors obtained in step 301 is input into the recurrent neural network, and a plurality of output parts are obtained sequentially, where each output part contains one predicted word and each output part serves as input for training the next output part; that is, the word at the next position is predicted from the word at the current position. For example, if the next-position word predicted from the current-position word "find" is "to", then the preceding "find" and "to" together serve as input for the next prediction.
Each prediction step yields a predicted word from one output part; the loss between the predicted word and the actual word of the corresponding part of the text template is calculated, and the losses of all output parts are summed to obtain the total loss. The loss may be calculated, for example, as the softmax cross-entropy between the predicted word and the actual word.
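The total loss described above can be sketched as a sum of per-position softmax cross-entropies. The toy logits below are illustrative stand-ins for the network's per-step outputs.

```python
import math

# Sketch of the total loss in step 302: per-output-part softmax cross-entropy,
# summed over all output parts (logits here are illustrative).

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_id):
    """Negative log probability that softmax assigns to the actual word's ID."""
    return -math.log(softmax(logits)[target_id])

def total_loss(per_step_logits, target_ids):
    """Sum the per-step losses against the actual words in the text template."""
    return sum(cross_entropy(l, t) for l, t in zip(per_step_logits, target_ids))

# Two output parts, actual word IDs 0 and 1.
loss = total_loss([[2.0, 0.5, 0.1], [0.3, 1.8, 0.2]], [0, 1])
```

The gradient of this total loss is what step 303 back-propagates through the network weights.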
In step 303, the gradient is calculated based on the total loss, and when the gradient satisfies a preset condition, the training of the recurrent neural network is completed and the natural language text generation model is obtained.
In some embodiments, the gradient-based algorithm for the loss includes the back-propagation algorithm, which computes the gradient of the loss and updates the weights of the recurrent neural network according to that gradient. The weights of the recurrent neural network may be trained using the TensorFlow framework.
In some embodiments, the preset condition includes a set threshold: when the gradient is less than or equal to the set threshold, the training process of the recurrent neural network is complete, and the natural language text generation model is obtained.
FIG. 4 is a schematic diagram of some embodiments of training and prediction of a natural language text generation model of the present disclosure, where H0, H1, H2, …, H6 represent the hidden layers of the recurrent neural network; typically, the initial state H0 is obtained by random initialization.
As shown in Fig. 4, a specific training procedure is as follows. Each piece of structured data, for example select (name=kender, count=2), is trained starting from the start symbol SOS. In the first step, after passing through hidden layer H1 of the recurrent neural network, a first output part X1 (the first predicted word) is obtained. The connection vector formed by the one-hot vector of the structured data and the word vector of X1 from its text template serves as input to the second step, and after hidden layer H2, a second output part X2 is obtained. Similarly, the connection vector formed by the one-hot vector of the structured data and the word vector of X2 from the text template serves as input to the third step, yielding a third output part X3 after hidden layer H3, and so on, until five output parts X1, X2, X3, X4, and X5 are obtained; encountering the sentence-pattern end symbol EOS indicates that the training pass over this piece of structured data has ended. The losses between the predicted word and the actual word of each output part are then calculated and summed to obtain the total loss for this piece of structured data, and the gradient is calculated from the total loss, until the gradient satisfies the preset condition and the recurrent neural network's training on this piece of structured data ends. Training then continues with the next piece of structured data until all pieces have been trained, yielding the natural language text generation model.
As shown in Fig. 4, a specific prediction procedure is as follows. Starting from the start symbol SOS, the structured data to be generated passes through hidden layer H1 of the recurrent neural network in the first step, producing a first output part X1 containing one or more predicted words. X1 together with the one-hot vector of the structured data serves as input to the second step; after hidden layer H2, a second output part X2 containing one or more predicted words is obtained. X2 together with the one-hot vector of the structured data serves as input to the third step, yielding a third output part X3 after hidden layer H3, and so on, until the predicted words of five output parts X1, X2, X3, X4, and X5 are obtained; encountering the sentence-pattern end symbol EOS ends the prediction process for this structured data. The predicted words of the five output parts are then combined to generate one or more text sentence patterns, which may be obtained using the beam search method. Finally, the slot placeholders in the text sentence patterns are replaced with slot values as needed to obtain the corresponding natural language text.
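The control flow of the prediction loop can be sketched as follows. The `rnn_step` function here is a toy deterministic stand-in for one pass through a hidden layer (the real model's computation is not specified in this form), used purely to show how each output feeds the next step and how EOS terminates prediction; the vocabulary and emitted sequence are hypothetical.

```python
# Sketch of the Fig. 4 prediction loop with a stand-in hidden-layer step.
# VOCAB and the emitted word sequence are illustrative assumptions.

VOCAB = {0: "SOS", 1: "EOS", 2: "found", 3: "NAME", 4: "restaurant"}

def rnn_step(hidden, input_id, one_hot_vec):
    """Toy stand-in for one hidden-layer pass: returns (new_hidden, word_id).
    Emits a fixed word sequence, then EOS."""
    sequence = [2, 3, 4, 1]  # "found", "NAME", "restaurant", EOS
    word_id = sequence[min(hidden, len(sequence) - 1)]
    return hidden + 1, word_id

def predict(one_hot_vec, max_steps=10):
    """Greedy decode: start from SOS, feed each output into the next step,
    stop at the end symbol EOS."""
    hidden, input_id, words = 0, 0, []   # start from SOS (ID 0)
    for _ in range(max_steps):
        hidden, word_id = rnn_step(hidden, input_id, one_hot_vec)
        if VOCAB[word_id] == "EOS":      # end symbol ends prediction
            break
        words.append(VOCAB[word_id])
        input_id = word_id               # each output part feeds the next step
    return words

sentence_pattern = predict([1, 0, 0, 0, 0])
```

In the actual model, each step would return per-word probabilities rather than a single ID, and beam search (as in Fig. 2) would keep several candidates per step instead of the single greedy choice shown here.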
Fig. 5 is a schematic diagram of some embodiments of a natural language text generating device of the present disclosure.
As shown in fig. 5, the natural language text generating device 500 of this embodiment includes a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to perform the natural language text generating method in any of the embodiments of the present disclosure based on instructions stored in the memory 510.
The memory 510 may include, for example, system memory, fixed nonvolatile storage media, and the like. The system memory stores, for example, an operating system, application programs, boot loader (BootLoader), database, and other programs.
Fig. 6 is a schematic diagram of a dialog system of an exemplary embodiment of the present disclosure.
As shown in fig. 6, the apparatus of this embodiment includes: structured data generation means 601, natural language text generation means 602.
The structured data generating device 601 is configured to cause the dialog system to produce a system reply according to a user request, and to obtain structured data based on the system reply. The structured data generating device 601 includes a user request module 6011, a natural language text understanding module 6012, and a dialog management module 6013.
The user request module 6011 is configured to cause the dialog system to receive a user request from a user side. The user request is natural language text, and may be either a voice request or a text request.
The natural language text understanding module 6012 is configured to process the received user request into structured data that the dialog system can understand, that is, structured data corresponding to the user request, through Natural Language Understanding (NLU) techniques.
The dialog management module 6013 is configured to generate a system reply in the form of structured data according to the structured data corresponding to the user request.
The natural language text generating device 602 is configured to convert the structured data of the system reply acquired by the structured data generating device 601 into a corresponding natural language text by using a natural language text generating method in any of some embodiments of the present disclosure, and output the corresponding natural language text. The natural language text generating device 602 includes a natural language text generating module 6021 and a replying user module 6022.
The natural language text generation module 6021 is configured to convert the system reply requested by the user into a corresponding natural language text using the natural language text generation method in any of some embodiments of the present disclosure, and output the converted natural language text.
The reply user module 6022 is configured to send the natural language text output by the natural language text generating module 6021 to the user, so as to form a final reply of the dialogue system to the user request.
It will be appreciated by those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flowchart and/or block of the flowchart illustrations and/or block diagrams, and combinations of flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the preferred embodiments of the present disclosure is not intended to limit the disclosure; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall fall within its scope.