CN113255372B - Information generation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number: CN113255372B (grant of application CN202110534852.9A)
Authority: CN (China)
Prior art keywords: vector, word, target, dimension, sub
Legal status: Active (Google's assumption; not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN113255372A
Inventors: 崔志, 张嘉益
Current assignee (list may be inaccurate): Beijing Xiaomi Mobile Software Co Ltd; Beijing Xiaomi Pinecone Electronic Co Ltd
Original assignee: Beijing Xiaomi Mobile Software Co Ltd; Beijing Xiaomi Pinecone Electronic Co Ltd
Application filed by Beijing Xiaomi Mobile Software Co Ltd and Beijing Xiaomi Pinecone Electronic Co Ltd
Priority: CN202110534852.9A
Publication of application: CN113255372A
Publication of grant: CN113255372B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G06F 40/35: Discourse or dialogue representation
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure provides an information generation method and device, an electronic device, and a storage medium. The method includes: after a dialogue input is received, determining a target reference vector applicable to the current dialogue input according to the similarity between each reference vector in a reference vector set and a hidden vector corresponding to the dialogue input; selecting a target word from a plurality of words contained in a preset word library according to the hidden vector and the target reference vector; and generating a reply sentence according to the target word. Because the target reference vectors applicable to different dialogue inputs differ, different target words, and ultimately different reply sentences, can be determined by combining different target reference vectors, achieving diversity in the generated replies.

Description

Information generation method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of computer communication, and in particular relates to an information generation method and device, electronic equipment and a storage medium.
Background
With the development of natural language processing technology, human-machine interaction makes human-machine dialogue possible, allowing people to obtain answers to their questions quickly. After receiving dialogue content input by a user, an electronic device performs semantic analysis on the content, then determines and outputs a reply sentence.
At present, the reply sentences output by machines in human-machine interaction tend to be monotonous. Enriching the diversity of the reply sentences output by electronic devices, and thereby improving the user experience, is therefore an urgent technical problem for those skilled in the art.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides an information generating method and apparatus.
According to a first aspect of embodiments of the present disclosure, there is provided an information generating method, the method including:
after receiving a dialogue input, determining a hidden vector corresponding to the dialogue input;
determining a target reference vector according to the similarity between each reference vector in a reference vector set and the hidden vector;
selecting a target word from a plurality of words included in a preset word library according to the hidden vector and the target reference vector;
and generating a reply sentence according to the target word.
In some embodiments, the word library is presented in vector form; selecting a target word from a plurality of words included in a preset word library according to the hidden vector and the target reference vector includes the following steps:
splicing the hidden vector and the target reference vector to obtain a combined vector;
inputting the combined vector into a decoder, so that the decoder outputs an intermediate vector;
performing dimension conversion on the intermediate vector to obtain a target vector, the dimension of the target vector being less than or equal to the dimension of the word library;
and selecting the target word from the plurality of words included in the word library according to the element values of the elements in the target vector.
In some embodiments, generating a reply sentence according to the target word includes:
inputting the target word into the decoder, so that the decoder outputs a new intermediate vector;
performing dimension conversion on the new intermediate vector to obtain a new target vector, the dimension of the new target vector being less than or equal to the dimension of the word library;
selecting a new target word from the plurality of words included in the word library according to the element values of the elements in the new target vector;
and, after cyclically feeding each newly selected target word back into the decoder, generating the reply sentence from the target words selected from the word library over the successive iterations.
In some embodiments, the method further comprises:
extracting a sub-word library from the word library according to the combined vector;
in this case, performing dimension conversion on the intermediate vector to obtain a target vector includes:
performing dimension conversion on the intermediate vector to obtain the target vector with the same dimension as the sub-word library;
and selecting the target word according to the element values of the elements in the target vector includes:
selecting the target word from the plurality of words included in the sub-word library according to the element values of the elements in the target vector.
In some embodiments, extracting the sub-word library from the word library according to the combined vector includes:
performing dimension conversion on the combined vector to obtain a conversion vector with the same dimension as the word library;
acquiring the set of elements in the conversion vector whose element values satisfy an element-value condition;
acquiring the set of words located at the corresponding positions in the word library according to the positions of those elements in the conversion vector;
and generating the sub-word library from that word set.
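The extraction steps above can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the element-value condition is assumed here to be "the k largest elements", and the word library and conversion vector are toy examples.

```python
import numpy as np

# Sketch of sub-word-library extraction: threshold the conversion vector
# (here: keep the top-k elements), then collect the words at the same
# positions in the full word library.
def extract_sub_word_library(conversion_vector, word_library, k):
    """Return the k words whose conversion-vector elements are largest,
    together with their positions in the full word library."""
    positions = np.argsort(conversion_vector)[::-1][:k]  # element set: top-k positions
    positions = np.sort(positions)                       # keep word-library order
    sub_library = [word_library[p] for p in positions]   # words at the same positions
    return sub_library, positions

conversion_vector = np.array([0.1, 0.9, 0.05, 0.7, 0.3])
word_library = ["hello", "weather", "the", "sunny", "today"]
sub_library, positions = extract_sub_word_library(conversion_vector, word_library, k=3)
# sub_library -> ["weather", "sunny", "today"]; positions -> [1, 3, 4]
```

In a trained network the conversion vector would come from the sub-word-library generation sub-network rather than being hand-written as here.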
In some embodiments, the word library is presented in the form of a 1×N-dimensional vector, the sub-word library is presented in the form of a 1×n-dimensional vector, and n is less than N; the dimension of the intermediate vector is 1×M.
The step of performing dimension conversion on the intermediate vector to obtain the target vector with the same dimension as the sub-word library comprises the following steps:
determining a designated vector corresponding to each word in the sub-word library, wherein the position of the non-zero element in the designated vector is the same as the position of the corresponding word in the word library, and the dimension of the designated vector is the same as the dimension of the word library;
combining the designated vectors corresponding to all the words in the sub-word library into a matrix with dimension n×N;
and determining the target vector with dimension 1×n from the global conversion matrix with dimension M×N and the transpose of the n×N matrix.
In some embodiments, determining the target vector with dimension 1×n from the global conversion matrix with dimension M×N and the transpose of the n×N matrix includes:
determining a local conversion matrix with dimension M×n as the product of the global conversion matrix and the transposed matrix;
and determining the target vector with dimension 1×n as the product of the intermediate vector and the local conversion matrix.
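The matrix arithmetic above can be checked with a small numerical sketch (dimensions chosen arbitrarily; the matrices are random stand-ins for learned weights): the M×N global conversion matrix times the transpose of the n×N one-hot matrix gives an M×n local conversion matrix, and the 1×M intermediate vector times that gives the 1×n target vector, which equals the sub-library columns of the full-vocabulary projection.

```python
import numpy as np

# Illustrative sketch (not the patented implementation) of the dimension
# conversion: N = full word-library size, n = sub-word-library size,
# M = intermediate-vector size.
N, n, M = 6, 3, 4
rng = np.random.default_rng(0)

# Positions of the sub-word-library words inside the full word library.
sub_positions = [1, 3, 5]

# Designated vectors: one 1xN one-hot row per sub-library word -> n x N matrix.
one_hot = np.zeros((n, N))
for row, pos in enumerate(sub_positions):
    one_hot[row, pos] = 1.0

global_matrix = rng.standard_normal((M, N))   # global conversion matrix, M x N
local_matrix = global_matrix @ one_hot.T      # local conversion matrix, M x n
intermediate = rng.standard_normal((1, M))    # decoder output, 1 x M
target = intermediate @ local_matrix          # target vector, 1 x n

# The local projection simply selects the sub-library columns of the global one:
full_target = intermediate @ global_matrix    # 1 x N
```

The point of the construction is that only an M×n multiplication is needed at decode time instead of an M×N one.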
In some embodiments, the method is performed by an information generation network comprising an encoder, a reference vector determination sub-network, and the decoder; the information generation network is trained through the following steps:
inputting a sample dialogue input into the information generation network, so that the decoder in the information generation network cyclically outputs sample intermediate vectors;
in response to each sample intermediate vector output by the decoder, performing dimension conversion on it to obtain a corresponding sample target vector;
in the word library, replacing the word currently to be determined in the sample reply sentence with a first identifier and replacing the other words with a second identifier, to obtain a first standard vector;
determining the difference between the sample target vector obtained this time and the first standard vector;
and, after the decoder has output sample intermediate vectors multiple times, adjusting the parameters of the information generation network according to the differences between the sample target vectors obtained over those iterations and the corresponding first standard vectors.
In some embodiments, the information generation network further comprises a sub-word library generation sub-network; training the information generation network then further includes:
after inputting the sample dialogue input into the information generation network, acquiring a sample conversion vector determined by the sub-word library generation sub-network;
in the word library, replacing words included in the sample reply sentence with a third identifier, and replacing words not included in the sample reply sentence with a fourth identifier to obtain a second standard vector;
and adjusting parameters in the information generation network according to the difference between the sample conversion vector and the second standard vector.
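The two "standard vectors" above can be sketched as follows. The identifier values are an assumption (1 for the first/third identifier, 0 for the second/fourth), which makes the first standard vector a one-hot target for the current word and the second a multi-hot membership vector over the word library.

```python
import numpy as np

# Sketch of the training targets: the first standard vector marks only the
# word currently to be decoded; the second marks every library word that
# appears anywhere in the sample reply sentence.
word_library = ["<pad>", "it", "is", "sunny", "today", "rain"]
sample_reply = ["it", "is", "sunny"]

def first_standard_vector(current_word, library):
    v = np.zeros(len(library))
    v[library.index(current_word)] = 1.0   # first identifier = 1, second = 0
    return v

def second_standard_vector(reply, library):
    v = np.zeros(len(library))
    for w in reply:
        v[library.index(w)] = 1.0          # third identifier = 1, fourth = 0
    return v

fsv = first_standard_vector("is", word_library)
ssv = second_standard_vector(sample_reply, word_library)
# fsv -> [0, 0, 1, 0, 0, 0]; ssv -> [0, 1, 1, 1, 0, 0]
```

In training, the difference between each sample target vector and the first standard vector (and between the sample conversion vector and the second standard vector) would drive the parameter updates; a concrete loss function is not specified in this section.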
According to a second aspect of the embodiments of the present disclosure, there is provided an information generating apparatus, the apparatus including:
The hidden vector determining module is configured to determine a hidden vector corresponding to the dialogue input after receiving the dialogue input;
A reference vector determination module configured to determine a target reference vector based on similarities of reference vectors in a set of reference vectors and the hidden vector;
The target word selecting module is configured to select target words from a plurality of words included in a preset word library according to the hidden vector and the target reference vector;
and the reply sentence generating module is configured to generate a reply sentence according to the target word.
In some embodiments, the word stock is presented in a vector form; the target word selecting module comprises:
The combined vector obtaining submodule is configured to splice the hidden vector and the target reference vector to obtain a combined vector;
a decoder use submodule configured to input the combined vector to a decoder such that the decoder outputs an intermediate vector;
the dimension conversion sub-module is configured to perform dimension conversion on the intermediate vector to obtain a target vector, and the dimension of the target vector is smaller than or equal to the dimension of the word stock;
And the target word selecting sub-module is configured to select the target word from a plurality of words included in the word stock according to the element value size of the element in the target vector.
In some embodiments, the reply sentence generation module includes:
A target word input sub-module configured to input the target word into the decoder such that the decoder outputs a new intermediate vector;
the vector dimension conversion sub-module is configured to perform dimension conversion on the new intermediate vector to obtain a new target vector, and the dimension of the new target vector is smaller than or equal to the dimension of the word stock;
A target word selecting sub-module configured to select a new target word from a plurality of words included in the word stock according to an element value size of an element in the new target vector;
and a reply sentence generation sub-module configured to cyclically feed each newly selected target word back into the decoder, and generate the reply sentence from the target words selected from the word library over the successive iterations.
In some embodiments, the apparatus further comprises:
a sub-word library extraction module configured to extract a sub-word library from the word library according to the combined vector;
The dimension conversion sub-module is configured to perform dimension conversion on the intermediate vector to obtain the target vector with the same dimension as the sub-word library;
The target word selecting sub-module is configured to select the target word from a plurality of words included in the sub-word library according to the element value size of the element in the target vector.
In some embodiments, the sub-word library extraction module includes:
The combined vector dimension conversion sub-module is configured to dimension convert the combined vector to obtain a converted vector with the same dimension as the word stock;
An element set obtaining sub-module configured to obtain an element set in which element values in the conversion vector satisfy element value conditions;
a word set acquisition sub-module configured to acquire a word set located at the same position in the word library according to the position of the element set in the conversion vector;
and the sub-word library generation sub-module is configured to generate the sub-word library according to the word set.
In some embodiments, the word library is presented in the form of a 1×N-dimensional vector, the sub-word library is presented in the form of a 1×n-dimensional vector, and n is less than N; the dimension of the intermediate vector is 1×M.
The dimension conversion sub-module comprises:
a designated vector obtaining unit configured to determine a designated vector corresponding to each word in the sub-word library, wherein the position of the non-zero element in the designated vector is the same as the position of the corresponding word in the word library, and the dimension of the designated vector is the same as the dimension of the word library;
a designated vector combination unit configured to combine the designated vectors corresponding to all the words in the sub-word library into a matrix with dimension n×N;
and a target vector determination unit configured to determine the target vector with dimension 1×n from the global conversion matrix with dimension M×N and the transpose of the n×N matrix.
In some embodiments, the target vector determination unit includes:
a matrix multiplication subunit configured to determine a local conversion matrix with dimension M×n as the product of the global conversion matrix and the transposed matrix;
and a vector multiplication subunit configured to determine the target vector with dimension 1×n as the product of the intermediate vector and the local conversion matrix.
In some embodiments, the operations of the apparatus are performed by an information generation network comprising an encoder, a reference vector determination sub-network, and the decoder; the apparatus further comprises a network training module, which includes:
A sample acquisition sub-module configured to acquire sample dialogue inputs and sample reply sentences;
An information generation network usage submodule configured to input the sample dialogue input into the information generation network such that the decoder in the information generation network cyclically outputs sample intermediate vectors;
a sample vector dimension conversion sub-module configured to, in response to each sample intermediate vector output by the decoder, perform dimension conversion on it to obtain a corresponding sample target vector;
The first standard vector obtaining sub-module is configured to replace a word to be determined currently in the sample reply sentence with a first identifier, and replace other words except the word to be determined currently with a second identifier in the word library to obtain a first standard vector;
A difference determination sub-module configured to determine a difference between the sample target vector obtained this time and the first standard vector;
and a first parameter adjustment sub-module configured to, after the decoder has output sample intermediate vectors multiple times, adjust the parameters of the information generation network according to the differences between the sample target vectors obtained over those iterations and the corresponding first standard vectors.
In some embodiments, the network training module further specifically includes:
a sample conversion vector acquisition sub-module configured to acquire a sample conversion vector determined by the sub-word stock generation sub-network after inputting the sample dialogue input into the information generation network;
the second standard vector obtaining sub-module is configured to replace words included in the sample reply sentence with third identifiers and replace words not included in the sample reply sentence with fourth identifiers in the word library to obtain a second standard vector;
And a second parameter adjustment sub-module configured to adjust parameters in the information generation network according to a difference between the sample conversion vector and the second standard vector.
According to a third aspect of embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the first aspects described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
A processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of the first aspects above.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
In the embodiments of the disclosure, after receiving a dialogue input, an electronic device determines a target reference vector applicable to the current dialogue input according to the similarity between each reference vector in a reference vector set and a hidden vector corresponding to the dialogue input, selects a target word from a plurality of words included in a preset word library according to the hidden vector and the target reference vector, and generates a reply sentence according to the target word. Because the target reference vectors applicable to different dialogue inputs differ, different target words, and ultimately different reply sentences, can be determined by combining different target reference vectors, achieving diversity in reply-sentence generation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart of an information generation method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a flowchart of a training method for an information generation network, according to an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an information generation network, shown in accordance with an exemplary embodiment of the present disclosure;
FIG. 4 is a block diagram of an information generating apparatus according to an exemplary embodiment of the present disclosure;
Fig. 5 is a schematic structural view of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, this information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
With the development of natural language processing technology, human-machine interaction makes human-machine dialogue possible, allowing people to obtain answers to their questions quickly. After receiving a question sentence input by a user, an electronic device performs semantic analysis on it, then determines and outputs a reply sentence.
At present, the reply sentences output by electronic devices in human-machine interaction tend to be monotonous. Enriching the diversity of the reply sentences output by electronic devices, and thereby improving the user experience, is therefore an urgent technical problem for those skilled in the art.
The inventors found in research that the common practice for dialogue generation in the related art relies on a sequence-to-sequence (seq2seq) model, which contains an encoder and a decoder. The encoder encodes the user's query to obtain a hidden vector for that query, and the decoder decodes the corresponding reply from the hidden vector.
However, the seq2seq model described above has several problems. First, generation requires a selection from the full vocabulary, and a vocabulary typically contains between 30,000 and 50,000 words; matrix operations of that size usually slow generation down. Second, the encoder typically loses some information when encoding the query, and without additional optimization this directly affects the quality of the generated reply. Third, current seq2seq models frequently generate monotonous, uninteresting replies, and have difficulty generating different replies for the same query.
Based on this, embodiments of the disclosure provide an information generation method applicable to an electronic device: after receiving a dialogue input (a question sentence), the device determines a target reference vector applicable to the current input according to the similarity between each reference vector in a reference vector set and a hidden vector corresponding to the input, selects a target word from a plurality of words included in a preset word library according to the hidden vector and the target reference vector, and generates a reply sentence according to the target word. Because the target reference vectors applicable to different dialogue inputs differ, different target words, and ultimately different reply sentences, can be determined by combining different target reference vectors, achieving diversity in reply-sentence generation.
Fig. 1 is a flowchart illustrating an information generation method according to an exemplary embodiment. The method shown in Fig. 1 is applied to an electronic device, which may be a mobile device such as a smartphone, tablet computer, smart watch, smart bracelet, smart head-mounted display, smart speaker, or PDA (personal digital assistant), or a stationary device such as a desktop computer or smart television.
The method shown in fig. 1 comprises the following steps:
In step 101, after receiving the dialogue input, a hidden vector corresponding to the dialogue input is determined.
In some embodiments, the information generating method is performed by an information generating network, the information generating network comprising an encoder. The dialogue input is input into the encoder such that the encoder outputs a hidden vector corresponding to the dialogue input.
The hidden vector output by the encoder is an abstract representation of the semantics of the dialogue input.
The network structure of the encoder may be set as desired, for example, the encoder may be composed of at least one unidirectional GRU network.
For example, a dialogue input consists of a set of words: one dialogue input is query = [x1, x2, x3, ..., xm], where xi denotes the i-th word in the query. Inputting [x1, x2, x3, ..., xm] into the unidirectional GRU network causes it to convert [x1, x2, x3, ..., xm] into hidden states [h1, h2, h3, ..., hm], where hm is the hidden vector corresponding to the dialogue input.
In addition, a method in the related art may be used to determine a hidden vector corresponding to the dialogue input.
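The encoding step above can be sketched in plain numpy. This is a minimal sketch of a unidirectional GRU, assuming random weights and random word embeddings purely to illustrate the data flow; a real encoder would use trained parameters and learned embeddings.

```python
import numpy as np

# Minimal numpy sketch of a unidirectional GRU encoder: the input
# [x1, ..., xm] is converted into hidden states [h1, ..., hm], and hm
# serves as the hidden vector of the dialogue input.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_encode(embeddings, Wz, Wr, Wh, Uz, Ur, Uh):
    h = np.zeros(Uz.shape[0])
    hidden_states = []
    for x in embeddings:                       # one step per word
        z = sigmoid(Wz @ x + Uz @ h)           # update gate
        r = sigmoid(Wr @ x + Ur @ h)           # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h))
        h = (1 - z) * h + z * h_tilde
        hidden_states.append(h)
    return hidden_states                       # [h1, ..., hm]

rng = np.random.default_rng(1)
emb_dim, hid_dim, m = 8, 5, 4                  # a 4-word dialogue input
embeddings = rng.standard_normal((m, emb_dim))
Wz, Wr, Wh = (rng.standard_normal((hid_dim, emb_dim)) for _ in range(3))
Uz, Ur, Uh = (rng.standard_normal((hid_dim, hid_dim)) for _ in range(3))
states = gru_encode(embeddings, Wz, Wr, Wh, Uz, Ur, Uh)
hidden_vector = states[-1]                     # hm, the hidden vector
```

Because each new state is a convex combination of the previous state and a tanh candidate, every component of the hidden vector stays within [-1, 1].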
In step 102, a target reference vector is determined based on the similarity of the reference vectors in the set of reference vectors to the hidden vector.
Matching an input query against different reference vectors produces different replies; the reference vector set can therefore include multiple reference vectors, each representing a possible reply, thereby enabling diversified replies.
In some embodiments, the set of reference vectors may be pre-set, or the set of reference vectors may be randomly introduced. The number of reference vectors included in the reference vector set may be set as desired.
In some embodiments, for each reference vector in the set of reference vectors, a similarity of the reference vector to the hidden vector is determined, and a target reference vector whose similarity satisfies a similarity condition is selected from the set of reference vectors.
For example, a target reference vector having the greatest similarity is selected from the set of reference vectors.
The similarity in this embodiment may be cosine similarity or other types of similarity.
In some embodiments, the dimensions of the reference vector may be the same as the dimensions of the hidden vector. For example, the dimensions of the reference vector and the hidden vector are each 1×s, where s is a positive integer.
In some embodiments, the information generating method may be performed by an information generating network, which may include a reference vector determining sub-network.
The set of reference vectors and the hidden vector may be input into a reference vector determination sub-network such that the reference vector determination sub-network determines a target reference vector based on similarities of the reference vectors in the set of reference vectors and the hidden vector and outputs the target reference vector.
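Step 102 can be sketched as follows, using cosine similarity and the "greatest similarity" condition from the example above; the vectors are toy values chosen for illustration.

```python
import numpy as np

# Sketch of step 102: pick the target reference vector as the member of the
# reference vector set with the greatest cosine similarity to the hidden vector.
def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_target_reference(reference_set, hidden_vector):
    sims = [cosine_similarity(r, hidden_vector) for r in reference_set]
    best = int(np.argmax(sims))
    return reference_set[best], best

hidden = np.array([1.0, 0.0, 1.0])
reference_set = [np.array([0.0, 1.0, 0.0]),    # orthogonal to hidden: similarity 0
                 np.array([2.0, 0.0, 2.0]),    # parallel to hidden: similarity 1
                 np.array([1.0, 1.0, 0.0])]    # similarity 0.5
target_ref, idx = select_target_reference(reference_set, hidden)
# idx -> 1: the parallel vector has the greatest cosine similarity
```

In the network form described above, this selection would be carried out inside the reference vector determination sub-network rather than by an explicit loop.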
In step 103, a target word is selected from a plurality of words included in a preset word library according to the hidden vector and the target reference vector.
In some embodiments, the word library is presented in a vector form. For example, the word library is presented in the form of a 1×N-dimensional vector.
The target word may be selected from a plurality of words included in a preset word library according to the hidden vector and the target reference vector as follows. A first step: splicing the hidden vector and the target reference vector to obtain a combined vector. A second step: inputting the combined vector into a decoder so that the decoder outputs an intermediate vector. A third step: performing dimension conversion on the intermediate vector to obtain a target vector, where the dimension of the target vector is smaller than or equal to the dimension of the word library. A fourth step: selecting a target word from the plurality of words included in the word library according to the element values of the elements in the target vector.
For the first step, for example, the hidden vector is [a1, a2, a3, ..., a10], the target reference vector is [b1, b2, b3, ..., b10], and the combined vector of the two is [a1, a2, a3, ..., a10, b1, b2, b3, ..., b10].
For the second step, the information generating method is performed by an information generating network, the information generating network comprising a decoder. The network structure of the decoder may be set as desired, for example, the decoder may be composed of at least one unidirectional GRU network.
The dimension of the intermediate vector may be the same as the dimension of the hidden vector.
For the third step, the intermediate vector is the vector that the decoder outputs in each cycle, representing the semantics of the word to be output currently. To obtain the current word, dimension conversion is performed on the intermediate vector through a fully connected layer to obtain a target vector, the position of the maximum value in the target vector is determined, and the word at the same position is taken from the word library, thereby obtaining the current word.
For the fourth step, the first case: and under the condition that the dimension of the target vector is equal to the dimension of the word stock, determining the position of the element with the maximum element value in the target vector, and selecting the target word positioned at the same position from a plurality of words included in the word stock.
Second case: after obtaining the combined vector, the electronic device may further extract a sub-word library from the word library according to the combined vector.
In this case, the electronic device may perform dimension conversion on the intermediate vector to obtain a target vector having the same dimension as the sub-word library, and select a target word from a plurality of words included in the sub-word library according to the element value size of the element in the target vector.
For example, a position of an element of a maximum element value in the target vector is determined, and a target word located at the same position is selected from a plurality of words included in the sub-word library.
In this case, because a target vector is computed each time a target word is generated, reducing the dimension of the target vector effectively increases the speed of generating the target vector and thus improves the efficiency of the overall decoding.
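The four steps above can be sketched as follows. The stub decoder and the 2×3 projection matrix are hypothetical stand-ins for the learned networks; only the splice → decode → project → argmax flow follows the embodiment.

```python
# A minimal sketch of the four-step selection, under toy dimensions.
def select_target_word(hidden, target_ref, decoder, projection, word_library):
    combined = hidden + target_ref                      # step 1: splice
    intermediate = decoder(combined)                    # step 2: decode
    # step 3: fully connected layer = multiply by a projection matrix
    # whose column count equals the word-library dimension
    target_vec = [sum(h * w for h, w in zip(intermediate, col))
                  for col in zip(*projection)]
    # step 4: take the word at the position of the largest element value
    best = max(range(len(target_vec)), key=target_vec.__getitem__)
    return word_library[best]

library = ["eat", "very", "full"]
projection = [[1.0, 0.0, 0.0],    # rows: intermediate dims; cols: library slots
              [0.0, 0.0, 10.0]]
stub_decoder = lambda v: v[:2]    # hypothetical stand-in for the GRU decoder
word = select_target_word([0.5, 0.1], [0.2], stub_decoder, projection, library)
```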
The implementation of extracting a sub-word library from the word library according to the combined vector in the second case above is described by way of example.
First, the combined vector is subjected to dimension conversion to obtain a conversion vector with the same dimension as the word stock.
The combined vector may be dimension-converted through a fully connected layer to obtain the conversion vector.
The information generating method provided by the embodiment can be executed through an information generating network, the information generating network can comprise a sub-word library generating sub-network, the combined vector can be input into the sub-word library generating sub-network, the sub-word library generating sub-network performs dimension conversion on the combined vector to obtain a conversion vector with the same dimension as the word library, and the conversion vector is output.
Next, an element set is acquired in which element values in the conversion vector satisfy element value conditions.
The element values of the elements in the transformation vector may be scores.
The element value condition may define an element value threshold, and when the element value reaches the element value threshold, it is determined that the element value satisfies the element value condition. Or the element value condition may define an element value range, and when the element value is within the element value range, it is determined that the element value satisfies the element value condition. Or the element value condition may define an element value of Q before the ranking, and when the ranking of the element value is located before Q, it is determined that the element value satisfies the element value condition.
For example, if Q is 1000, the 1000 top-ranked element values are determined to satisfy the element value condition, and those 1000 element values are combined to obtain the element set.
And thirdly, acquiring a word set at the same position in the word library according to the position of the element set in the conversion vector.
The word stock is presented in the form of vectors, the transformation vectors having the same dimensions as the word stock.
The position of the element set in the transformation vector can be understood as: in the transformation vector, the position of each element in the element set.
And finally, generating a sub-word library according to the word set.
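The extraction procedure just described (dimension conversion, element set, word set) reduces, at its core, to keeping the words at the top-Q scored positions. A minimal sketch, assuming the conversion vector is already available as a list of scores and using a toy library:

```python
def extract_sub_word_library(conversion_vector, word_library, q):
    """Keep the q words whose scores in the conversion vector rank highest."""
    ranked = sorted(range(len(conversion_vector)),
                    key=conversion_vector.__getitem__, reverse=True)
    positions = sorted(ranked[:q])     # positions of the element set in the vector
    return [word_library[i] for i in positions]

library = ["eat", "very", "full", "today", "you"]
scores = [0.9, 0.1, 0.8, 0.2, 0.05]   # conversion vector, same dimension as library
sub_library = extract_sub_word_library(scores, library, q=2)
```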
The process of performing dimension conversion on the intermediate vector in the second case to obtain the target vector having the same dimension as the sub-word library is described by way of example.
Assume that: the word library is presented in the form of a 1×N-dimensional vector, the sub-word library is presented in the form of a 1×n-dimensional vector, n is smaller than N, and the dimension of the intermediate vector is 1×M.
Determining a specified vector corresponding to each word in the sub-word library, where the element position of the non-zero element in the specified vector is the same as the position of the corresponding word in the word library, and the dimension of the specified vector is the same as the dimension of the word library, that is, 1×N. For example, the non-zero element is 1.
After the specified vector corresponding to each word in the sub-word library is obtained, the specified vectors corresponding to all the words in the sub-word library are combined to obtain a matrix of dimension n×N, and a target vector of dimension 1×n is determined according to the global conversion matrix of dimension M×N and the transposed matrix of this matrix.
For example, the global conversion matrix of dimension M×N may be referred to as matrix A and the matrix of dimension n×N as matrix B; a local conversion matrix of dimension M×n is determined from the product of matrix A and the transposed matrix of matrix B, and the target vector of dimension 1×n is determined from the product of the intermediate vector and the local conversion matrix.
The dimensions of the target vector and the sub-word library obtained in this way are both 1×n.
Illustratively, the word library is presented in the form of a 1×20,000-dimensional vector, the sub-word library is presented in the form of a 1×1000-dimensional vector, and n (1000) is less than N (20,000); the dimension of the intermediate vector is 1×512. The dimension of the specified vector of each word in the sub-word library is 1×20,000. The specified vectors corresponding to all the words in the sub-word library are combined to obtain a matrix of dimension 1000×20,000. The global conversion matrix of dimension 512×20,000 is called matrix A, the matrix of dimension 1000×20,000 is called matrix B, and matrix A is multiplied by the transposed matrix of matrix B to obtain a local conversion matrix of dimension 512×1000. The intermediate vector of dimension 1×512 is multiplied by the local conversion matrix of dimension 512×1000 to obtain the target vector of dimension 1×1000.
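The matrix arithmetic of this scheme can be checked with toy sizes (M=2, N=4, n=2 below, chosen only for readability; the one-hot rows of B play the role of the specified vectors):

```python
def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1.0, 0.0, 2.0, 0.0],      # global conversion matrix, M x N
     [0.0, 1.0, 0.0, 3.0]]
B = [[1, 0, 0, 0],              # specified (one-hot) vectors of the two
     [0, 0, 0, 1]]              # sub-word-library words, n x N
local = matmul(A, transpose(B))           # local conversion matrix, M x n
intermediate = [[0.5, 2.0]]               # intermediate vector, 1 x M
target = matmul(intermediate, local)      # target vector, 1 x n
```

Multiplying by B's transpose simply selects the columns of A that correspond to the sub-word-library words, which is why the local matrix is so much smaller than the global one.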
Compared with the global conversion matrix, the local conversion matrix has a smaller dimension, so performing dimension conversion on the intermediate vector with the local conversion matrix requires less computation; the target vector can therefore be generated quickly, and the generation rate of the reply sentence is improved.
Because the sub-word library includes fewer words than the full word library, the target vector obtained with the local conversion matrix has a smaller dimension than one obtained with the global conversion matrix. Selecting the target word according to the element values of this smaller target vector therefore requires less computation, so the target word can be determined quickly and the generation rate of the reply sentence is improved.
By adopting the method, the calculation process is simplified, the target vector can be rapidly determined, and the generation rate of reply sentences is improved.
In step 104, a reply sentence is generated from the target word.
In some embodiments, for some types of decoders, the decoder may output multiple intermediate vectors at a time according to the input combination vector, determine all target words according to the multiple intermediate vectors, and combine all target words to obtain the reply sentence.
In this embodiment, all the target words are determined in step 103.
In some embodiments, for some types of decoders, e.g., decoders made up of a GRU network, the decoder may output one or more intermediate vectors at a time from the input combined vector, and determine some of all target words from the one or more intermediate vectors.
In this case, after step 103 ends, the target word determined in step 103 may be input into the decoder so that the decoder outputs a new intermediate vector. Dimension conversion is performed on the new intermediate vector to obtain a new target vector whose dimension is smaller than or equal to the dimension of the word library, and a new target word is selected from the plurality of words included in the word library according to the element values of the elements in the new target vector. After the newly selected target word is cyclically input into the decoder in this way, the reply sentence is generated from the target words selected from the word library over the multiple cycles.
After the newly selected target word is input to the decoder, other new target words may be selected with reference to the above manner of selecting target words.
The information generating method provided by this embodiment may be performed by an information generating network, and the information generating network may include a reference vector determination sub-network. After the reference vector determination sub-network determines the target reference vector according to the similarity of the reference vectors in the reference vector set to the hidden vector, the network may, in each cycle, select a new target word from the plurality of words included in the word library and output the new target word.
A stop symbol, such as EOS, may be introduced during training of the information generating network to instruct the network to stop inputting the newly selected target word into the decoder, at which point the looping process ends. After training, the information generating network can recognize when the cyclic process should end.
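The cyclic decoding with an EOS stop symbol can be sketched as follows; `next_word` is a hypothetical stand-in that collapses the decoder, dimension conversion, and word selection into a single call, and the scripted transitions are illustrative only.

```python
def generate_reply(first_word, next_word, eos="EOS", max_len=20):
    """Feed each selected target word back until the stop symbol appears."""
    words, word = [], first_word
    while word != eos and len(words) < max_len:
        words.append(word)
        word = next_word(word)   # decoder + projection + argmax, collapsed
    return " ".join(words)

# Hypothetical stub that deterministically walks through one reply.
script = {"eat": "very", "very": "full", "full": "EOS"}
reply = generate_reply("eat", lambda w: script[w])
```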
In the embodiment of the disclosure, after receiving a dialogue input, an electronic device determines a target reference vector applicable to a current dialogue input according to similarity of a reference vector in a reference vector set and a hidden vector corresponding to the dialogue input, selects a target word from a plurality of words included in a preset word library according to the hidden vector and the target reference vector, and generates a reply sentence according to the target word. Because the target reference vectors suitable for different dialogue inputs have differences, different target words and sentences can be determined by combining different target reference vectors, and finally different reply sentences are determined, so that the diversity of the reply sentences is realized.
In some embodiments, the information generating method provided by the present embodiment is performed by an information generating network including an encoder, a reference vector determination sub-network, and a decoder.
Training of the information generating network is required before the information generating network is used.
Fig. 2 is a flowchart of a training method of an information generating network, according to an exemplary embodiment, referring to fig. 2, the training method of the information generating network includes:
in step 201, sample dialogue input and sample reply sentences are acquired.
For example, the sample dialogue input is "do you eat today" and the sample reply sentence is "eat, eat very full".
In step 202, the sample dialogue input is input into the information generating network, so that a decoder in the information generating network cyclically outputs sample intermediate vectors.
In step 203, in response to the decoder outputting the sample intermediate vector once, the dimension conversion is performed on the sample intermediate vector outputted once, so as to obtain a corresponding sample target vector.
In step 204, in the word library, the word to be currently determined in the sample reply sentence is replaced by a first identifier, and the other words except the word to be currently determined are replaced by a second identifier, so as to obtain a first standard vector.
For example, in the word library, the word to be determined currently in the sample reply sentence is replaced by 1, and the other words except the word to be determined currently are replaced by 0, so as to obtain a first standard vector.
In step 205, the difference between the sample target vector obtained this time and the first standard vector is determined.
In step 206, after the decoder outputs the sample intermediate vector a plurality of times, parameters in the information generating network are adjusted according to the difference between the sample target vector obtained a plurality of times and the first standard vector.
The multiple determined differences are counted, e.g., directly summed, weighted summed, or otherwise calculated, and parameters in the information generating network are adjusted based on the counted results.
There are various ways of adjusting the parameters in the information generating network according to the differences between the sample target vectors obtained multiple times and the first standard vectors: for example, the parameters may be adjusted a preset number of times, or adjusted until the difference is smaller than or equal to a difference threshold, or until the difference is minimized.
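Steps 204 and 205 can be illustrated with a toy word library. The squared difference below is one simple stand-in for the difference measure actually used, which the embodiment does not specify.

```python
def first_standard_vector(word_library, current_word):
    """One-hot over the word library: current word -> 1, all others -> 0."""
    return [1.0 if w == current_word else 0.0 for w in word_library]

def squared_difference(target_vec, standard_vec):
    """A simple difference measure used to drive parameter updates."""
    return sum((t - s) ** 2 for t, s in zip(target_vec, standard_vec))

library = ["eat", "very", "full", "today"]
standard = first_standard_vector(library, "full")
# hypothetical sample target vector output by the network for this step
loss = squared_difference([0.1, 0.0, 0.8, 0.1], standard)
```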
In some embodiments, the information generating network may further include a sub-word library generating sub-network, where the sub-word library generating sub-network is configured to perform dimension conversion on the combined vector, obtain a conversion vector having the same dimension as the word library, obtain an element set in which an element value in the conversion vector satisfies an element value condition, obtain a word set in the same position in the word library according to a position of the element set in the conversion vector, and generate the sub-word library according to the word set.
Fig. 3 is a schematic diagram of an information generation network according to an exemplary embodiment. Referring to fig. 3, the information generation network includes an encoder, a reference vector determination sub-network, a sub-word library generation sub-network, and a decoder; the arrow direction in fig. 3 indicates the transmission direction of data.
Under such a network structure, the method for training the information generating network may further include: after inputting the sample dialogue input into the information generation network, acquiring a sample conversion vector determined by the sub-word library generation sub-network; in the word library, replacing words included in the sample reply sentence with third identifiers, and replacing words not included in the sample reply sentence with fourth identifiers to obtain a second standard vector; and adjusting parameters in the information generation network according to the difference between the sample conversion vector and the second standard vector.
For example, in the word stock, the words included in the sample reply sentence are replaced by 1, the words not included in the sample reply sentence are replaced by 0, and a second standard vector is obtained.
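The construction of the second standard vector can be sketched as follows, again with a toy word library:

```python
def second_standard_vector(word_library, reply_words):
    """Multi-hot over the word library: words in the sample reply -> 1, others -> 0."""
    present = set(reply_words)
    return [1.0 if w in present else 0.0 for w in word_library]

library = ["eat", "very", "full", "today", "you"]
standard2 = second_standard_vector(library, ["eat", "full"])
```

Unlike the first standard vector, which is a one-hot target for a single decoding step, this vector marks every word of the reply at once, which is what makes it suitable for supervising the sub-word library generation sub-network.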
There are various ways to adjust the parameters in the information generating network according to the difference between the sample conversion vector and the second standard vector: for example, the parameters may be adjusted a preset number of times, or adjusted until the difference is smaller than or equal to a difference threshold, or until the difference is minimized.
In the process of training the network, parameters in the information generation network can be adjusted according to the difference between the sample target vector and the first standard vector and the difference between the sample conversion vector and the second standard vector, which are obtained for many times, so that the optimization of the information generation network is better realized.
In this embodiment, the multi-classification loss of the sub-word library is optimized during network training, so that the sub-word library can include all the words required for generating the reply sentence. The target word is thus always selected from the one determined sub-word library rather than from different sub-word libraries, which improves the generation rate of the reply sentence.
In some embodiments, an appropriate target reference vector may be selected from the set of reference vectors via Gumbel-softmax. Gumbel-softmax is chosen because multiple discretized reference vectors are introduced, and with Gumbel-softmax the whole training can proceed directly end-to-end while gradients can still flow through all parameters.
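A plain-Python sketch of the Gumbel-softmax sampling arithmetic follows. The real, tensor-based version is what makes training end-to-end differentiable; this sketch only illustrates the perturb-and-normalize computation, and the logits and temperature are illustrative.

```python
import math
import random

def gumbel_softmax(logits, temperature=1.0, rng=None):
    """Sample a near-one-hot weighting over candidate reference vectors."""
    rng = rng or random.Random(0)
    # Gumbel noise: -log(-log(U)) for U ~ Uniform(0, 1)
    gumbels = [-math.log(-math.log(rng.random())) for _ in logits]
    perturbed = [(l + g) / temperature for l, g in zip(logits, gumbels)]
    # numerically stable softmax over the perturbed logits
    m = max(perturbed)
    exps = [math.exp(p - m) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]

weights = gumbel_softmax([2.0, 0.5, 0.1], temperature=0.5)
```

Lowering the temperature pushes the output weights toward a one-hot selection, which is how a discrete choice among reference vectors is approximated by a differentiable operation.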
In some embodiments, the sequence-to-sequence network in the related art may be modified to obtain the information generation network used in the above embodiments.
For the foregoing method embodiments, for simplicity of explanation, the methodologies are shown as a series of acts, but one of ordinary skill in the art will appreciate that the present disclosure is not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the disclosure.
Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
Corresponding to the embodiment of the application function implementation method, the disclosure also provides an embodiment of the application function implementation device and corresponding electronic equipment.
Fig. 4 is a block diagram of an information generating apparatus according to an exemplary embodiment, the apparatus including:
a hidden vector determination module 31 configured to determine, after receiving a dialogue input, a hidden vector corresponding to the dialogue input;
A reference vector determination module 32 configured to determine a target reference vector based on similarities of reference vectors in the set of reference vectors and the hidden vector;
a target word selecting module 33 configured to select a target word from a plurality of words included in a preset word library according to the hidden vector and the target reference vector;
the reply sentence generation module 34 is configured to generate a reply sentence from the target word.
In some embodiments, the word stock is presented in a vector form based on the information generating apparatus shown in fig. 4; the target word selecting module 33 may include:
The combined vector obtaining submodule is configured to splice the hidden vector and the target reference vector to obtain a combined vector;
a decoder use submodule configured to input the combined vector to a decoder such that the decoder outputs an intermediate vector;
the dimension conversion sub-module is configured to perform dimension conversion on the intermediate vector to obtain a target vector, and the dimension of the target vector is smaller than or equal to the dimension of the word stock;
And the target word selecting sub-module is configured to select the target word from a plurality of words included in the word stock according to the element value size of the element in the target vector.
In some embodiments, the reply sentence generation module 34 may include:
A target word input sub-module configured to input the target word into the decoder such that the decoder outputs a new intermediate vector;
the vector dimension conversion sub-module is configured to perform dimension conversion on the new intermediate vector to obtain a new target vector, and the dimension of the new target vector is smaller than or equal to the dimension of the word stock;
A target word selecting sub-module configured to select a new target word from a plurality of words included in the word stock according to an element value size of an element in the new target vector;
And the reply sentence generation sub-module is configured to generate the reply sentence according to target words selected from the word library for a plurality of times after inputting the newly selected target words into the decoder in a circulating way.
In some embodiments, the apparatus may further comprise:
the sub-word library extracting module is configured to extract a sub-word library from the word library according to the combination vector;
The dimension conversion sub-module is configured to perform dimension conversion on the intermediate vector to obtain the target vector with the same dimension as the sub-word library;
The target word selecting sub-module is configured to select the target word from a plurality of words included in the sub-word library according to the element value size of the element in the target vector.
In some embodiments, the sub-word library extraction module may include:
The combined vector dimension conversion sub-module is configured to dimension convert the combined vector to obtain a converted vector with the same dimension as the word stock;
An element set obtaining sub-module configured to obtain an element set in which element values in the conversion vector satisfy element value conditions;
a word set acquisition sub-module configured to acquire a word set located at the same position in the word library according to the position of the element set in the conversion vector;
and the sub-word library generation sub-module is configured to generate the sub-word library according to the word set.
In some embodiments, the word library is presented in the form of a 1×N-dimensional vector, the sub-word library is presented in the form of a 1×n-dimensional vector, and n is less than N; the dimension of the intermediate vector is 1×M;
The dimension conversion sub-module may include:
A specified vector obtaining unit configured to determine a specified vector corresponding to each word in the sub-word library, wherein the element position of a non-zero element in the specified vector is the same as the position of the corresponding word in the sub-word library, and the dimension of the specified vector is the same as the dimension of the word library;
The specified vector combination unit is configured to combine the specified vectors corresponding to all the words in the sub-word library to obtain a matrix of dimension n×N;
And a target vector determination unit configured to determine the target vector of dimension 1×n from a global conversion matrix of dimension M×N and a transposed matrix of the matrix of dimension n×N.
In some embodiments, the target vector determining unit may include:
A matrix multiplication subunit configured to obtain a local conversion matrix of dimension M×n according to the product of the global conversion matrix and the transposed matrix;
and a vector multiplication subunit configured to obtain the target vector with a dimension of 1×n according to a product of the intermediate vector and the local conversion matrix.
In some embodiments, the method is performed by an information generation network comprising an encoder, a reference vector determination sub-network, and the decoder; the apparatus may further include:
A sample acquisition sub-module configured to acquire sample dialogue inputs and sample reply sentences;
An information generation network usage submodule configured to input the sample dialogue input into the information generation network such that the decoder in the information generation network cyclically outputs sample intermediate vectors;
the sample vector dimension conversion submodule is configured to respond to the primary output sample intermediate vector of the decoder, and perform dimension conversion on the primary output sample intermediate vector to obtain a corresponding sample target vector;
The first standard vector obtaining sub-module is configured to replace a word to be determined currently in the sample reply sentence with a first identifier, and replace other words except the word to be determined currently with a second identifier in the word library to obtain a first standard vector;
A difference determination sub-module configured to determine a difference between the sample target vector obtained this time and the first standard vector;
And the first parameter adjustment submodule is configured to adjust parameters in the information generating network according to the difference between the sample target vector obtained multiple times and the first standard vector after the decoder outputs the sample intermediate vector multiple times.
In some embodiments, the network training module may further specifically include:
a sample conversion vector acquisition sub-module configured to acquire a sample conversion vector determined by the sub-word stock generation sub-network after inputting the sample dialogue input into the information generation network;
the second standard vector obtaining sub-module is configured to replace words included in the sample reply sentence with third identifiers and replace words not included in the sample reply sentence with fourth identifiers in the word library to obtain a second standard vector;
And a second parameter adjustment sub-module configured to adjust parameters in the information generation network according to a difference between the sample conversion vector and the second standard vector.
Fig. 5 is a schematic diagram illustrating a configuration of an electronic device 1600, according to an example embodiment. For example, electronic device 1600 may be a user device, which may be embodied as a mobile phone, computer, digital broadcast electronic device, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, wearable device such as a smart watch, smart glasses, smart bracelet, smart running shoe, and the like.
Referring to fig. 5, the electronic device 1600 may include one or more of the following components: a processing component 1602, a memory 1604, a power component 1606, a multimedia component 1608, an audio component 1610, an input/output (I/O) interface 1612, a sensor component 1614, and a communication component 1616.
The processing component 1602 generally controls overall operation of the electronic device 1600, such as operations associated with display, telephone call, data communication, camera operation, and recording operation. The processing component 1602 may include one or more processors 1620 to execute instructions to perform all or part of the steps of the methods described above. In addition, the processing component 1602 may include one or more modules that facilitate interactions between the processing component 1602 and other components. For example, the processing component 1602 may include a multimedia module to facilitate interactions between the multimedia component 1608 and the processing component 1602.
The memory 1604 is configured to store various types of data to support operations at the device 1600. Examples of such data include instructions for any application or method operating on the electronic device 1600, contact data, phonebook data, messages, pictures, video, and so forth. The memory 1604 may be implemented by any type of volatile or nonvolatile memory device or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 1606 provides power to the various components of the electronic device 1600. Power supply component 1606 can include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for electronic device 1600.
The multimedia component 1608 includes a screen that provides an output interface between the electronic device 1600 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only a boundary of a touch or a sliding action but also a duration and a pressure related to the touch or the sliding operation. In some embodiments, the multimedia component 1608 includes a front-facing camera and/or a rear-facing camera. When the electronic device 1600 is in an operational mode, such as an adjustment mode or a video mode, the front-facing camera and/or the rear-facing camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 1610 is configured to output and/or input audio signals. For example, the audio component 1610 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 1600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 1604 or transmitted via the communication component 1616. In some embodiments, the audio component 1610 further includes a speaker for outputting audio signals.
The I/O interface 1612 provides an interface between the processing component 1602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1614 includes one or more sensors for providing status assessments of various aspects of the electronic device 1600. For example, the sensor assembly 1614 may detect an on/off state of the electronic device 1600 and a relative positioning of components, such as the display and keypad of the electronic device 1600. The sensor assembly 1614 may also detect a change in position of the electronic device 1600 or a component thereof, the presence or absence of user contact with the electronic device 1600, an orientation or acceleration/deceleration of the electronic device 1600, and a change in temperature of the electronic device 1600. The sensor assembly 1614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 1614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1616 is configured to facilitate wired or wireless communication between the electronic device 1600 and other devices. The electronic device 1600 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1616 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements, for performing the methods described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as the memory 1604 including instructions that, when executed by the processor 1620 of the electronic device 1600, enable the electronic device 1600 to perform an information generation method comprising: after receiving a dialogue input, determining a hidden vector corresponding to the dialogue input; determining a target reference vector according to similarities between reference vectors in a reference vector set and the hidden vector; selecting a target word from a plurality of words included in a preset word library according to the hidden vector and the target reference vector; and generating a reply sentence according to the target word.
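The front end of the method above can be sketched in plain Python. This is an illustrative toy, not the patented implementation: the vectors, the cosine-similarity criterion, and all numeric values are assumptions introduced for illustration (the claims only require selecting a reference vector "according to the similarity").

```python
# Toy sketch: pick the target reference vector most similar to the hidden
# vector, then concatenate ("splice") the two to form the combined vector.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def pick_target_reference(hidden, reference_set):
    # The target reference vector is the member of the set most similar
    # to the hidden vector (cosine similarity is an assumed measure).
    return max(reference_set, key=lambda r: cosine_similarity(hidden, r))

hidden = [0.2, 0.9]                                   # hypothetical hidden vector
references = [[1.0, 0.0], [0.1, 1.0], [0.5, 0.5]]     # hypothetical reference set
target_ref = pick_target_reference(hidden, references)
combined = hidden + target_ref                        # concatenation of the two
```

The combined vector is then fed to the decoder, as described in the claims below.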
The non-transitory computer readable storage medium may be a ROM, random-access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. An information generation method, the method comprising:
after receiving a dialogue input, determining a hidden vector corresponding to the dialogue input;
determining a target reference vector according to similarities between reference vectors in a reference vector set and the hidden vector;
selecting a target word from a plurality of words included in a preset word library according to the hidden vector and the target reference vector; and
generating a reply sentence according to the target word;
wherein the word library is presented in a vector form, and selecting the target word from the plurality of words included in the preset word library according to the hidden vector and the target reference vector comprises:
splicing the hidden vector and the target reference vector to obtain a combined vector;
inputting the combined vector to a decoder such that the decoder outputs an intermediate vector;
performing dimension conversion on the intermediate vector to obtain a target vector, wherein a dimension of the target vector is smaller than or equal to a dimension of the word library;
and selecting the target word from the plurality of words included in the word library according to element values of elements in the target vector.
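The selection procedure of claim 1 can be sketched as follows. This is an illustrative sketch under stated assumptions: the decoder is replaced by a fixed linear map, the dimension conversion by a fixed matrix, and all weights and the word library are made-up; "largest element value wins" is one natural reading of selecting by element value.

```python
# Sketch of claim 1's target-word selection: decode the combined vector,
# dimension-convert the intermediate vector, then pick the word whose
# element value in the target vector is largest.

def rowvec_matmul(vec, matrix):
    # (1 x K) row vector times (K x J) matrix -> (1 x J) row vector
    return [sum(v * row[j] for v, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]

word_library = ["hello", "world", "ok", "bye"]        # word library, N = 4

combined = [1.0, 0.0]                                 # hidden ++ target reference
decoder_weights = [[0.1, 0.3], [0.2, 0.4]]            # stand-in "decoder"
intermediate = rowvec_matmul(combined, decoder_weights)   # 1 x M intermediate

conversion = [[0.0, 1.0, 0.5, 0.2],                   # M x N dimension conversion
              [1.0, 0.0, 0.3, 0.1]]
target_vector = rowvec_matmul(intermediate, conversion)   # 1 x N target vector

# Select the word whose element value is largest.
target_word = word_library[max(range(len(target_vector)),
                               key=target_vector.__getitem__)]
```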
2. The method of claim 1, wherein the generating a reply sentence from the target word comprises:
Inputting the target word into the decoder so that the decoder outputs a new intermediate vector;
performing dimension conversion on the new intermediate vector to obtain a new target vector, wherein a dimension of the new target vector is smaller than or equal to the dimension of the word library;
selecting a new target word from the plurality of words included in the word library according to element values of elements in the new target vector;
and after cyclically inputting each newly selected target word into the decoder, generating the reply sentence according to the target words selected from the word library over a plurality of iterations.
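The generation loop of claim 2 can be sketched as a simple autoregressive routine. `decode_step`, the fixed transition table, the `<eos>` stop token, and the maximum length are all hypothetical stand-ins for the decoder plus dimension conversion; the patent specifies only that each newly selected word is cyclically fed back to the decoder.

```python
# Sketch of claim 2: feed each selected target word back into the decoder
# until a stop condition, then assemble the reply from the selected words.

def generate_reply(first_word, decode_step, max_len=8, stop="<eos>"):
    words = [first_word]
    while len(words) < max_len:
        next_word = decode_step(words[-1])   # decoder output -> new target word
        if next_word == stop:
            break
        words.append(next_word)
    return " ".join(words)

# Toy decode_step: a fixed transition table standing in for the network.
transitions = {"how": "are", "are": "you", "you": "<eos>"}
reply = generate_reply("how", lambda w: transitions.get(w, "<eos>"))
```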
3. The method according to claim 1 or 2, characterized in that the method further comprises:
extracting a sub-word library from the word library according to the combined vector;
the step of performing dimension conversion on the intermediate vector to obtain a target vector includes:
performing dimension conversion on the intermediate vector to obtain the target vector with the same dimension as the sub-word library;
The selecting the target word from the words included in the word library according to the element value size of the element in the target vector, including:
And selecting the target word from a plurality of words included in the sub word library according to the element value size of the element in the target vector.
4. The method of claim 3, wherein extracting the sub-word library from the word library according to the combined vector comprises:
performing dimension conversion on the combined vector to obtain a conversion vector with the same dimension as the word library;
acquiring a set of elements whose element values in the conversion vector meet an element value condition;
acquiring a set of words located at corresponding positions in the word library according to the positions of those elements in the conversion vector;
and generating the sub-word library according to the set of words.
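The sub-word-library extraction of claim 4 can be sketched as follows. The threshold condition is an assumption for illustration; the claim only requires that the retained elements "meet an element value condition", and the positional correspondence between conversion-vector elements and library words is what the claim specifies.

```python
# Sketch of claim 4: keep the words whose corresponding element in the
# conversion vector satisfies the element-value condition (here: a
# hypothetical threshold of 0.5).

def extract_sub_word_library(word_library, conversion_vector, threshold=0.5):
    # Element i of the conversion vector corresponds to word i of the library.
    return [word for word, value in zip(word_library, conversion_vector)
            if value >= threshold]

word_library = ["hello", "world", "ok", "bye"]
conversion_vector = [0.9, 0.1, 0.7, 0.2]     # same 1 x N dimension as the library
sub_library = extract_sub_word_library(word_library, conversion_vector)
```

Restricting later word selection to this smaller sub-library is what makes the local dimension conversion in claims 5 and 6 cheaper than scoring the full library.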
5. The method of claim 3, wherein the word library is presented in the form of a 1×N-dimensional vector, the sub-word library is presented in the form of a 1×n-dimensional vector, n is less than N, and the dimension of the intermediate vector is 1×M;
wherein performing dimension conversion on the intermediate vector to obtain the target vector with the same dimension as the sub-word library comprises:
determining a designated vector corresponding to each word in the sub-word library, wherein the element position of the non-zero element in the designated vector is the same as the position of the corresponding word in the word library, and the dimension of the designated vector is the same as the dimension of the word library;
combining the designated vectors corresponding to all the words in the sub-word library to obtain a matrix with a dimension of n×N;
and determining the target vector with a dimension of 1×n according to a global conversion matrix with a dimension of M×N and a transposed matrix of the matrix with the dimension of n×N.
6. The method of claim 5, wherein determining the target vector with the dimension of 1×n according to the global conversion matrix with the dimension of M×N and the transposed matrix of the matrix with the dimension of n×N comprises:
determining a local conversion matrix with a dimension of M×n according to a product of the global conversion matrix and the transposed matrix;
and determining the target vector with the dimension of 1×n according to a product of the intermediate vector and the local conversion matrix.
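The matrix construction of claims 5 and 6 can be checked with a small worked example. The one-hot "designated vectors" and all numeric weights are hypothetical; the dimensions follow the claims: stacking the n designated vectors gives an n×N matrix, whose N×n transpose folds the global M×N conversion matrix down to a local M×n one.

```python
# Sketch of claims 5-6: one-hot designated vectors for the sub-library,
# local conversion matrix = global (M x N) @ transpose (N x n),
# target vector = intermediate (1 x M) @ local (M x n) -> 1 x n.

def matmul(a, b):
    # Naive (rows x cols) matrix product for small examples.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

word_library = ["hello", "world", "ok", "bye"]           # N = 4
sub_library = ["hello", "ok"]                            # n = 2

# One designated (one-hot) 1 x N vector per sub-library word; stacked: n x N.
one_hot = [[1.0 if w == s else 0.0 for w in word_library] for s in sub_library]
transposed = [list(col) for col in zip(*one_hot)]        # N x n

global_conversion = [[0.0, 1.0, 0.5, 0.2],               # M x N, M = 2
                     [1.0, 0.0, 0.3, 0.1]]
local_conversion = matmul(global_conversion, transposed) # M x n

intermediate = [[0.1, 0.3]]                              # 1 x M
target_vector = matmul(intermediate, local_conversion)[0]  # 1 x n
```

In effect, the transpose picks out the columns of the global conversion matrix that correspond to sub-library words, so only n scores are computed instead of N.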
7. The method of claim 4, wherein the method is performed by an information generation network comprising an encoder, a reference vector determination sub-network, and the decoder, and the information generation network is trained through the following steps:
inputting a sample dialogue input into the information generation network, such that the decoder in the information generation network cyclically outputs sample intermediate vectors;
in response to the decoder outputting a sample intermediate vector at one iteration, performing dimension conversion on the sample intermediate vector output at that iteration to obtain a corresponding sample target vector;
in the word library, replacing the word currently to be determined in a sample reply sentence with a first identifier, and replacing words other than the word currently to be determined with a second identifier, to obtain a first standard vector;
determining a difference between the sample target vector obtained at that iteration and the first standard vector;
and after the decoder has output sample intermediate vectors a plurality of times, adjusting parameters in the information generation network according to the differences between the sample target vectors obtained over the plurality of iterations and the corresponding first standard vectors.
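The per-step training target of claim 7 can be sketched as follows. Using 1 as the first identifier, 0 as the second identifier, and a squared-error "difference" are all assumptions for illustration; the claim requires only two distinguishable identifiers and some measure of difference.

```python
# Sketch of claim 7's first standard vector: mark the word to be generated
# at this step with the first identifier (1.0) and every other library word
# with the second identifier (0.0), then measure the difference.

def first_standard_vector(word_library, current_word):
    return [1.0 if w == current_word else 0.0 for w in word_library]

def difference(sample_target, standard):
    # Assumed difference measure: sum of squared errors.
    return sum((a - b) ** 2 for a, b in zip(sample_target, standard))

word_library = ["hello", "world", "ok", "bye"]
standard = first_standard_vector(word_library, "ok")
loss = difference([0.2, 0.1, 0.6, 0.1], standard)
```

Accumulating this difference over every decoding step, then adjusting the network parameters, corresponds to the final step of the claim.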
8. The method of claim 7, wherein the information generation network further comprises a sub-word library generation sub-network, and training the information generation network further comprises:
after inputting the sample dialogue input into the information generation network, acquiring a sample conversion vector determined by the sub-word library generation sub-network;
in the word library, replacing words included in the sample reply sentence with a third identifier, and replacing words not included in the sample reply sentence with a fourth identifier, to obtain a second standard vector;
and adjusting parameters in the information generation network according to a difference between the sample conversion vector and the second standard vector.
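The supervision target for the sub-word-library sub-network in claim 8 can be sketched similarly. Using 1 as the third identifier and 0 as the fourth identifier is an assumption; the claim requires only that words present in the sample reply are marked differently from words absent from it.

```python
# Sketch of claim 8's second standard vector: words appearing in the sample
# reply get the third identifier (1.0), all other library words the fourth
# identifier (0.0).

def second_standard_vector(word_library, sample_reply_words):
    present = set(sample_reply_words)
    return [1.0 if w in present else 0.0 for w in word_library]

word_library = ["hello", "world", "ok", "bye"]
standard = second_standard_vector(word_library, ["ok", "bye"])
```

The sample conversion vector produced by the sub-network is then pulled toward this target, teaching it which region of the word library the reply should draw from.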
9. An information generating apparatus, characterized in that the apparatus comprises:
a hidden vector determination module configured to, after receiving a dialogue input, determine a hidden vector corresponding to the dialogue input;
a reference vector determination module configured to determine a target reference vector according to similarities between reference vectors in a reference vector set and the hidden vector;
a target word selection module configured to select a target word from a plurality of words included in a preset word library according to the hidden vector and the target reference vector, wherein the word library is presented in a vector form;
and a reply sentence generation module configured to generate a reply sentence according to the target word;
wherein the target word selection module comprises:
a vector obtaining submodule configured to splice the hidden vector and the target reference vector to obtain a combined vector;
a decoder use submodule configured to input the combined vector to a decoder, such that the decoder outputs an intermediate vector;
a dimension conversion submodule configured to perform dimension conversion on the intermediate vector to obtain a target vector, wherein a dimension of the target vector is smaller than or equal to a dimension of the word library;
and a target word selection submodule configured to select the target word from the plurality of words included in the word library according to element values of elements in the target vector.
10. A non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the method of any of claims 1-8.
11. An electronic device, comprising:
A processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any of claims 1-8.
CN202110534852.9A 2021-05-17 2021-05-17 Information generation method and device, electronic equipment and storage medium Active CN113255372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110534852.9A CN113255372B (en) 2021-05-17 2021-05-17 Information generation method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113255372A CN113255372A (en) 2021-08-13
CN113255372B true CN113255372B (en) 2024-06-14

Family

ID=77182151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110534852.9A Active CN113255372B (en) 2021-05-17 2021-05-17 Information generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113255372B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885756B (en) * 2016-09-30 2020-05-08 华为技术有限公司 Deep learning-based dialogue method, device and equipment
CN108509411B (en) * 2017-10-10 2021-05-11 腾讯科技(深圳)有限公司 Semantic analysis method and device
CN109543005A (en) * 2018-10-12 2019-03-29 平安科技(深圳)有限公司 The dialogue state recognition methods of customer service robot and device, equipment, storage medium
CN110222164B (en) * 2019-06-13 2022-11-29 腾讯科技(深圳)有限公司 Question-answer model training method, question and sentence processing device and storage medium

Non-Patent Citations (2)

Title
Research on Chinese Multi-turn Dialogue Task Methods Based on the HRED Model; Wang Mengyu, Yu Dingyao, Yan Rui, Hu Wenpeng, Zhao Dongyan; Journal of Chinese Information Processing (No. 08); full text *
Dialogue Generation via Deep Reinforcement Learning Based on Hierarchical Encoding; Zhao Yuqing, Xiang Yang; Journal of Computer Applications (No. 10); full text *


Similar Documents

Publication Publication Date Title
CN113743535B (en) Neural network training method and device and image processing method and device
CN112185389B (en) Voice generation method, device, storage medium and electronic equipment
JP2017535007A (en) Classifier training method, type recognition method and apparatus
CN111831806B (en) Semantic integrity determination method, device, electronic equipment and storage medium
JP2021114277A (en) Information processing method, device and storage medium
CN109979450B (en) Information processing method and device and electronic equipment
CN109144285A (en) A kind of input method and device
JP7116088B2 (en) Speech information processing method, device, program and recording medium
CN108958503A (en) input method and device
CN111814538B (en) Method and device for identifying category of target object, electronic equipment and storage medium
CN110930984A (en) Voice processing method and device and electronic equipment
CN116737883A (en) Man-machine interaction method, device, equipment and storage medium
CN110781674B (en) Information processing method, device, computer equipment and storage medium
CN112948565A (en) Man-machine conversation method, device, electronic equipment and storage medium
CN113255372B (en) Information generation method and device, electronic equipment and storage medium
CN109948155B (en) Multi-intention selection method and device and terminal equipment
CN113901832A (en) Man-machine conversation method, device, storage medium and electronic equipment
CN113254611A (en) Question recommendation method and device, electronic equipment and storage medium
CN117642817A (en) Method, device and storage medium for identifying audio data category
CN113923517A (en) Background music generation method and device and electronic equipment
CN113703621A (en) Voice interaction method, storage medium and equipment
CN112306251A (en) Input method, input device and input device
CN111753069B (en) Semantic retrieval method, device, equipment and storage medium
CN113822020B (en) Text processing method, text processing device and storage medium
CN113704457B (en) Method and device for generating abstract and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant