CN113377938A - Conversation processing method and device - Google Patents

Conversation processing method and device

Info

Publication number
CN113377938A
CN113377938A · CN202110706487.5A
Authority
CN
China
Prior art keywords
character, data, dialogue, reply, network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110706487.5A
Other languages
Chinese (zh)
Inventor
张嘉益
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd, Beijing Xiaomi Pinecone Electronic Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202110706487.5A
Publication of CN113377938A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 — Information retrieval of unstructured textual data
    • G06F 16/33 — Querying
    • G06F 16/332 — Query formulation
    • G06F 16/3329 — Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure relates to the field of human-computer dialogue technologies, and in particular to a conversation processing method and device. The conversation processing method comprises the following steps: obtaining dialogue query data, wherein the dialogue query data comprises a query character sequence; determining a candidate character set for the current character to be generated, based on the query character sequence and the reply character sequence generated before the current character; according to predetermined target personality features, taking the target character determined from the candidate character set as the current character to be generated; and updating the reply character sequence based on the target character until all characters in the reply character sequence have been generated, thereby obtaining target reply data. The disclosed method improves the performance of human-computer dialogue.

Description

Conversation processing method and device
Technical Field
The present disclosure relates to the field of human-computer conversation technologies, and in particular, to a conversation processing method and device.
Background
With the development of intelligent terminals and AI (Artificial Intelligence) technology, human-computer dialogue systems (conversation systems) have become a new generation of human-computer interaction. Various human-machine dialogue systems are already widely used by industry across many types of products and services.
Meanwhile, as AI technology continues to develop, people place ever higher demands on the performance of human-machine dialogue systems.
Disclosure of Invention
To improve the performance of human-machine dialogue systems, embodiments of the present disclosure provide a conversation processing method, a conversation processing device, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a dialog processing method, including:
obtaining dialogue query data, wherein the dialogue query data comprises a query character sequence;
determining a candidate character set for the current character to be generated, based on the query character sequence and the reply character sequence generated before the current character;
according to predetermined target personality features, taking the target character determined from the candidate character set as the current character to be generated; and
updating the reply character sequence based on the target character until all characters in the reply character sequence have been generated, to obtain target reply data.
In some embodiments, determining the candidate character set for the current character to be generated, based on the query character sequence and the reply character sequence generated before the current character, comprises:
inputting the query character sequence into a pre-trained dialogue generation network to obtain candidate characters for the current character to be generated, together with a first weight for each candidate character, both output by the dialogue generation network;
and ranking the first weights in descending order, and forming the candidate character set from the candidate characters corresponding to a preset number of the highest first weights.
In some embodiments, taking the target character determined from the candidate character set as the current character to be generated, according to the predetermined target personality features, includes:
for each candidate character in the candidate character set, combining the candidate character with the reply character sequence generated before the current character to obtain a candidate sequence;
inputting each candidate sequence into a pre-trained target personality discrimination network to obtain a second weight for each candidate sequence output by the network, wherein the target personality discrimination network is selected in advance from a plurality of personality discrimination networks;
fusing the first weight and the second weight of each candidate character to obtain a third weight for each candidate character;
and, according to the third weights, taking the target character determined from the candidate character set as the current character to be generated.
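The patent does not specify how the first and second weights are fused into the third weight. Purely as an illustration, the sketch below uses a log-linear interpolation; the function names and the `alpha` hyperparameter are assumptions, not part of the disclosure.

```python
import math

def fuse_weights(first, second, alpha=0.7):
    # Hypothetical fusion: log-linear interpolation of the generation
    # network's first weight and the discriminator's second weight.
    # alpha (assumed) trades fluency against personality conformance.
    return alpha * math.log(first) + (1 - alpha) * math.log(second)

def pick_target_character(candidates):
    # candidates: list of (character, first_weight, second_weight) triples.
    # The character with the highest fused (third) weight is selected.
    return max(candidates, key=lambda c: fuse_weights(c[1], c[2]))[0]
```

With a fusion like this, a candidate that the generator finds slightly less probable can still win if the personality discriminator strongly prefers it.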
In some embodiments, obtaining the target reply data once all characters in the reply character sequence have been generated includes:
in response to determining that the current character to be generated is a terminator, determining that all characters in the reply character sequence have been generated, and obtaining the target reply data.
In some embodiments, each personality discrimination network is trained as follows:
acquiring first dialogue data, wherein the first dialogue data comprises dialogue query data, dialogue reply data, and the personality data corresponding to the dialogue reply data;
inputting the dialogue reply data of the first dialogue data into the personality discrimination network to obtain the predicted personality output by the network;
and adjusting the network parameters of the personality discrimination network according to the difference between the predicted personality and the personality data, until a convergence condition is met.
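The discriminator architecture is left open by the disclosure. Purely to make the predict-and-adjust loop above concrete, the sketch below trains a toy bag-of-words perceptron in place of a neural personality discriminator; the personality labels, function names, and data are illustrative assumptions only.

```python
LABELS = ["optimistic", "pessimistic", "caring"]  # assumed example personalities

def featurize(reply):
    # Crude stand-in for a learned text encoder: a set of lowercase words.
    return set(reply.lower().split())

def predict(weights, reply):
    # Predicted personality = label with the highest feature score.
    feats = featurize(reply)
    return max(LABELS, key=lambda lab: sum(weights[lab].get(w, 0) for w in feats))

def train(pairs, epochs=20):
    # pairs: (dialogue reply data, personality data) examples.
    # Parameters are adjusted from the difference between the predicted
    # personality and the labeled personality, as in the step above.
    weights = {lab: {} for lab in LABELS}
    for _ in range(epochs):
        for reply, label in pairs:
            pred = predict(weights, reply)
            if pred != label:  # prediction error -> parameter update
                for w in featurize(reply):
                    weights[label][w] = weights[label].get(w, 0) + 1
                    weights[pred][w] = weights[pred].get(w, 0) - 1
    return weights
```

A real implementation would replace the perceptron with a neural text classifier, but the training signal — reply in, personality out, update on the mismatch — is the same.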
In some embodiments, the training process of each personality discrimination network further includes:
sampling the dialogue reply data of the first dialogue data to obtain incomplete dialogue data containing truncated dialogue reply data;
inputting the truncated dialogue reply data contained in the incomplete dialogue data into the personality discrimination network to obtain the predicted personality output by the network;
and adjusting the network parameters of the personality discrimination network according to the difference between the predicted personality and the personality data of the incomplete dialogue data, until a convergence condition is met.
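The disclosure does not fix how incomplete replies are sampled. One simple assumed scheme is to emit every proper prefix of a reply, so the discriminator learns to score the partially generated sequences it will see during character-by-character decoding:

```python
def sample_incomplete(reply_chars, min_len=1):
    # Hypothetical sampling of truncated reply data: every proper prefix
    # of the reply character sequence, from min_len characters up to
    # (but not including) the full reply.
    return [reply_chars[:k] for k in range(min_len, len(reply_chars))]
```

Each prefix keeps the personality label of its full reply, so the same adjust-on-error training step applies unchanged.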
In some embodiments, the training dialogue data used to train the dialogue generation network and/or the personality discrimination networks is obtained as follows:
for each personality discrimination network, the target reply data obtained in the current round is used as the dialogue query data of the next round, and the process of obtaining target reply data is iterated in a loop until a convergence condition is met;
first training dialogue data is obtained from each pair of dialogue query data and target reply data generated during the loop iteration;
and second training dialogue data is obtained from each pair of dialogue query data and target reply data generated during the loop iteration, together with the personality data corresponding to the personality discrimination network.
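The loop iteration described above amounts to a self-chat data collection: each round's reply becomes the next round's query. A minimal sketch, where `generate_reply` is a hypothetical callable standing in for the full generate-and-rerank pipeline and a fixed round count stands in for the unspecified convergence condition:

```python
def self_chat(generate_reply, seed_query, rounds=3):
    # Each round: produce target reply data for the current query,
    # record the (query, reply) pair, then feed the reply back in
    # as the next round's dialogue query data.
    pairs, query = [], seed_query
    for _ in range(rounds):
        reply = generate_reply(query)
        pairs.append((query, reply))
        query = reply
    return pairs
```

The collected pairs give the first training dialogue data directly; tagging each pair with the personality of the discriminator used during generation gives the second training dialogue data.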
In a second aspect, an embodiment of the present disclosure provides a dialogue processing apparatus, including:
an acquisition module configured to acquire dialogue query data, the dialogue query data including a query character sequence;
a first determining module configured to determine a candidate character set for the current character to be generated, based on the query character sequence and the reply character sequence generated before the current character;
a second determining module configured to take the target character determined from the candidate character set as the current character to be generated, according to predetermined target personality features;
and an updating module configured to update the reply character sequence based on the target character until all characters in the reply character sequence have been generated, to obtain target reply data.
In some embodiments, the first determining module is specifically configured to:
input the query character sequence into a pre-trained dialogue generation network to obtain candidate characters for the current character to be generated, together with a first weight for each candidate character;
and rank the first weights in descending order, forming the candidate character set from the candidate characters corresponding to a preset number of the highest first weights.
In some embodiments, the second determining module is specifically configured to:
for each candidate character in the candidate character set, combine the candidate character with the reply character sequence generated before the current character to obtain a candidate sequence;
input each candidate sequence into a pre-trained target personality discrimination network to obtain a second weight for each candidate sequence, wherein the target personality discrimination network is selected in advance from a plurality of personality discrimination networks;
fuse the first weight and the second weight of each candidate character to obtain a third weight for each candidate character;
and, according to the third weights, take the target character determined from the candidate character set as the current character to be generated.
In some embodiments, the apparatus of the present disclosure further comprises a network training module configured to:
acquire first dialogue data comprising dialogue query data, dialogue reply data, and the personality data corresponding to the dialogue reply data;
input the dialogue reply data of the first dialogue data into the personality discrimination network to obtain the predicted personality output by the network;
and adjust the network parameters of the personality discrimination network according to the difference between the predicted personality and the personality data, until a convergence condition is met.
In some embodiments, the network training module is further configured to:
sample the dialogue reply data of the first dialogue data to obtain incomplete dialogue data containing truncated dialogue reply data;
input the truncated dialogue reply data contained in the incomplete dialogue data into the personality discrimination network to obtain the predicted personality output by the network;
and adjust the network parameters of the personality discrimination network according to the difference between the predicted personality and the personality data of the incomplete dialogue data, until a convergence condition is met.
In some embodiments, the apparatus of the present disclosure further comprises a training data acquisition module configured to:
for each personality discrimination network, use the target reply data obtained in the current round as the dialogue query data of the next round, iterating the process of obtaining target reply data in a loop until a convergence condition is met;
obtain first training dialogue data from each pair of dialogue query data and target reply data generated during the loop iteration;
and obtain second training dialogue data from each pair of dialogue query data and target reply data generated during the loop iteration, together with the personality data corresponding to the personality discrimination network.
In a third aspect, the disclosed embodiments provide an electronic device, including:
a processor; and
a memory storing computer instructions readable by the processor; when the computer instructions are read, the processor performs the method according to any of the embodiments of the first aspect.
In a fourth aspect, the disclosed embodiments provide a storage medium for storing computer-readable instructions for causing a computer to perform the method according to any one of the embodiments of the first aspect.
The conversation processing method comprises: obtaining dialogue query data comprising a query character sequence; determining a candidate character set for the current character to be generated, based on the query character sequence and the reply character sequence generated before the current character; according to predetermined target personality features, taking the target character determined from the candidate character set as the current character to be generated; and updating the reply character sequence based on the target character until all characters in the reply character sequence have been generated, obtaining target reply data. In the embodiments of the present disclosure, reply data matching the target personality can be output by fusing the target personality features, which improves the performance of human-machine dialogue. Because general personality attributes are used rather than manually configured persona attributes, the application scenarios of human-machine dialogue are broader and the practicality is greater. Meanwhile, when the reply character sequence of the target reply data is generated, each reply character is generated one by one under the guidance of the target personality features, so less computation is required for the target reply data, which can improve the efficiency and performance of human-machine dialogue.
Drawings
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present disclosure; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a conversation processing method in some embodiments according to the present disclosure.
FIG. 2 is a block diagram of a human-machine dialog system in accordance with some embodiments of the present disclosure.
Fig. 3 is a flow diagram of a conversation processing method in some embodiments according to the present disclosure.
FIG. 4 is a schematic diagram of a dialog processing method in some embodiments according to the present disclosure.
Fig. 5 is a flow diagram of a conversation processing method in some embodiments according to the present disclosure.
Fig. 6 is a flow diagram of a conversation processing method in some embodiments according to the present disclosure.
Fig. 7 is a flow diagram of a conversation processing method in some embodiments according to the present disclosure.
Fig. 8 is a flow diagram of a conversation processing method in some embodiments according to the present disclosure.
Fig. 9 is a flow diagram of a conversation processing method in some embodiments according to the present disclosure.
Fig. 10 is a block diagram of a dialog processing device according to some embodiments of the present disclosure.
FIG. 11 is a block diagram of an electronic device suitable for implementing the disclosed method.
Detailed Description
The technical solutions of the present disclosure will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure. In addition, technical features involved in different embodiments of the present disclosure described below may be combined with each other as long as they do not conflict with each other.
At present, human-computer dialogue systems (conversation systems) are widely used in daily life, for example in intelligent customer service and mobile phone voice assistants. Users can handle business transactions, control IoT devices, and more through such systems, which have become a new generation of human-computer interaction.
With the continuous development of AI technology, people are no longer content to use human-machine dialogue systems only for business functions; increasingly, they hope such systems can communicate like a real person. For example, for "Xiao Ai", the intelligent voice assistant on Xiaomi mobile phones, more and more users hope that it can hold natural conversations like a real person, which adds interest and gives users a genuine sense of companionship.
To personalize system replies, human-machine dialogue systems in the related art often pre-train several different dialogue generation networks based on the speaking styles of particular characters. For example, dialogue generation networks embodying a character's style may be trained in advance on dialogue material from characters such as Sun Wukong or Guo Degang, so that when the user selects the Sun Wukong dialogue style, the system outputs replies in that character's distinctive manner, making the conversation more interesting.
However, in actual conversation scenarios, such a system merely imitates the speaking style of a fixed character chosen by the user, which is of limited practicality. More importantly, since the system must offer multiple character styles for the user to choose from, several dialogue generation networks must be trained in advance, and each generation network has an extremely large number of parameters (for example, on the order of billions), so the training difficulty and scale are also large, making this approach ill-suited to industrial deployment.
In view of the above drawbacks of the related art, the embodiments of the present disclosure provide a dialogue processing method, an apparatus, an electronic device, and a storage medium, which enhance the performance of human-machine dialogue, reduce the training complexity of the dialogue system, and facilitate deployment.
In a first aspect, the disclosed embodiments provide a dialogue processing method, which may be applied to a terminal device that includes a human-machine dialogue system, enabling the user to converse with the terminal device. The terminal device may be, for example, a smartphone, a tablet computer, a smart wearable device, a personal computer, or a handheld terminal. The present disclosure is not limited in this respect.
As shown in fig. 1, in some embodiments, a dialog processing method of an example of the present disclosure includes:
and S110, acquiring dialogue inquiry data.
Specifically, one complete piece of dialogue data may include dialogue query data (a query) and dialogue reply data (a response). For example, one piece of dialogue data may be <"What are you doing?", "Missing you.">, where "What are you doing?" is the dialogue query data and "Missing you." is the dialogue reply data.
It should be understood that the dialogue query data need not be a grammatical question; data with any sentence pattern can serve as dialogue query data. Likewise, the dialogue reply data need not be a grammatical statement, as long as it forms a semantically coherent dialogue with the dialogue query data.
In the disclosed embodiments, the acquired dialogue query data may be voice information, such as the user's speech picked up by the dialogue system through a microphone; it may also be text information, such as text entered by the user and captured by the dialogue system via a text box. The present disclosure is not limited in this respect. When the dialogue query data is voice information, the system can perform speech recognition on it and convert it into text characters, as those skilled in the art will understand; the details are not repeated here.
The dialogue query data includes a query character sequence, i.e., a data sequence formed by multiple characters in semantic order. In one example, the dialogue query data is "Do you prefer tea", which includes the four characters "Do", "you", "prefer", and "tea"; these constitute the query character sequence in semantic order.
It is understood that the characters described in the embodiments of the present disclosure may be any type of characters, such as one or more of chinese characters, letters, numbers, punctuation marks, and operation symbols, and the present disclosure does not limit the same.
S120: determining a candidate character set for the current character to be generated, based on the query character sequence and the reply character sequence generated before the current character.
In the embodiments of the present disclosure, after the dialogue query data is acquired, the entire character sequence of the target reply data is not generated from it in one pass; instead, each character of the reply sequence is generated one by one. The benefit is that the personality feature information can better guide the target reply data, the dialogue generation network and the personality discrimination networks are decoupled from each other, and network training is made easier.
Specifically, in some embodiments, the query character sequence of the acquired dialogue query data is denoted X = {x_1, x_2, ..., x_n}, where n is the sequence length of the query character sequence, i.e., the number of characters. Suppose the reply character sequence of the target reply data to be generated is Y = {y_1, y_2, ..., y_m}, where m is the sequence length, i.e., the number of characters, of the reply character sequence.
When generating the current character y_i of the reply character sequence Y, the candidate character set for y_i is determined from the query character sequence X and the reply characters {y_1, y_2, ..., y_{i-1}} already generated before y_i.
In one example, suppose the reply character sequence already generated before the current character y_i is {Do, you}, i.e., two characters have been generated so far. Based on this generated reply character sequence and the query character sequence X, the candidate character set for y_i can be determined as Z = {want, prefer, ..., thus}, where the character represented by each element of the set can be semantically combined with "Do you" into a coherent sentence.
In some embodiments, a dialogue generation network may be used to obtain the candidate character set: its input is the query character sequence, and its output is the candidate character set for the current character to be generated. The network structure, principles, and training process are described in detail in later embodiments and are not elaborated here.
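A minimal sketch of this candidate-set step, assuming the generation network exposes a next-character probability distribution (the dict-based interface and function name are assumptions, not from the disclosure):

```python
def candidate_set(next_char_probs, k=3):
    # next_char_probs: dict mapping each possible next character to the
    # generation network's first weight for it. Rank the first weights
    # in descending order and keep the top-k characters as the
    # candidate character set.
    ranked = sorted(next_char_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [ch for ch, _ in ranked[:k]]
```

In a neural implementation the dict would be the softmax over the vocabulary at the current decoding step, and the top-k cut-off corresponds to the "preset number" of candidates mentioned above.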
S130: according to the predetermined target personality features, taking the target character determined from the candidate character set as the current character to be generated.
It should be noted that in the embodiments of the present disclosure, personality features are fused when generating the dialogue reply data. The personality features of the disclosed embodiments are distinct from features such as a person's speaking style (persona) or role attributes: they do not depend on the attribute information of any individual, but are general attribute information.
For example, a speaking style (persona) concerns only the manner of speech of a specific character: a dialogue network conforming to that character's persona is trained on the dialogue data of that fixed character, so that the dialogue system can imitate the character's speaking style. Similarly, role attributes concern only a specific character's interests, temperament, and other attribute information, so a dialogue network conforming to those role attributes is trained on that character's dialogue data and the dialogue system can match that character's way of speaking.
The personality features of the disclosed embodiments, however, are different: they focus on general personality attributes, such as optimistic, pessimistic, and caring. They are not tied to a specific character but instead cover a broad range of personality traits.
In some embodiments, multiple personality discrimination networks for different personalities can be trained in advance on dialogue data containing personality feature information; each personality discrimination network can represent one personality, such as optimistic, pessimistic, or caring. These are described in detail in later embodiments and are not elaborated here.
The predetermined target personality features may correspond to a target personality discrimination network selected by the user in advance from the multiple personality discrimination networks. For example, if the user prefers the dialogue system to have a "caring" personality, the "caring" personality may be selected, so that the system sets the personality discrimination network corresponding to "caring" as the target personality discrimination network. The user's advance selection of the target personality is not limited and can be made as needed, for example by choosing the dialogue system's target personality through voice or an app interface.
Continuing the earlier example, suppose the candidate character set for the current character y_i is Z = {want, prefer, ..., thus}. Each candidate character in Z can be combined with the previously generated reply character sequence {Do, you} to form a new reply character sequence, for example: {Do, you, want}, {Do, you, prefer}, ..., {Do, you, thus}.
By fusing the target personality features, it can be determined which of these new reply character sequences best matches the target personality features; the candidate character in that sequence is the target character conforming to the target personality features, and it can be determined as the current character y_i to be generated.
For example, each new reply character sequence may be input into the target personality discrimination network. If the result output by the network is {Do, you, prefer}, this indicates that this reply character sequence best conforms to the personality desired by the user; that is, the target character "prefer" in the candidate character set Z is determined as the current character y_i to be generated.
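The selection step just described can be sketched as follows, where `score_fn` stands in for the target personality discrimination network (a hypothetical callable scoring a full candidate sequence):

```python
def pick_by_personality(prefix, candidates, score_fn):
    # Append each candidate character to the generated reply prefix and
    # let the target personality discriminator (score_fn) score each
    # full candidate sequence; the highest-scoring candidate becomes y_i.
    return max(candidates, key=lambda ch: score_fn(prefix + [ch]))
```

Because the discriminator only re-ranks a small candidate set at each step, it can steer generation toward the target personality without retraining the generation network.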
S140: updating the reply character sequence based on the target character until all characters in the reply character sequence have been generated, obtaining the target reply data.
After the character for the current position y_i is determined in S130, the reply character sequence can be updated with it: the previously generated reply character sequence {y_1, y_2, ..., y_{i-1}} is updated to {y_1, y_2, ..., y_{i-1}, y_i}.
After the character for y_i has been generated, S120 and S130 may be executed again in a loop: based on the updated reply character sequence {y_1, y_2, ..., y_i} and the query character sequence, the character for y_{i+1} is generated next. Those skilled in the art can understand and implement this by reference to the steps above; the details are not repeated here.
These steps are executed in turn for each character of the reply character sequence until all characters of Y = {y_1, y_2, ..., y_m} have been generated; the resulting character sequence Y is the target reply data.
In some embodiments, the end of generation may be detected by checking whether the current character to be generated is a terminator; if it is, it is determined that all characters in the reply character sequence have been generated, and the target reply data is obtained.
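Putting S110–S140 together, the decoding loop can be sketched end to end. `next_candidates` and `pick_target` stand in for the generation and discrimination networks, and the `<eos>` terminator token, the callables, and `max_len` are all assumptions for illustration:

```python
EOS = "<eos>"  # assumed terminator token

def generate_reply(query_chars, next_candidates, pick_target, max_len=20):
    # Generate the reply character sequence one character at a time,
    # stopping when the terminator is produced (or at a safety cap).
    reply = []
    while len(reply) < max_len:
        cands = next_candidates(query_chars, reply)  # S120: candidate set
        target = pick_target(reply, cands)           # S130: personality pick
        if target == EOS:                            # terminator check
            break
        reply.append(target)                         # S140: update sequence
    return reply
```

Each iteration mirrors one pass of S120–S140, with the updated reply prefix fed back in, which is exactly the loop described in the preceding paragraphs.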
It can be understood that in the dialogue processing method of the disclosed embodiments, the generated target reply data is fused with the target personality features predetermined by the user, such as an optimistic or caring personality, so that reply data conforming to the target personality can be output and the performance of the human-machine dialogue is improved. Moreover, by adopting general personality attributes, compared with human-computer interaction that relies on fixed, manually configured persona attributes, the dialogue method of the present disclosure applies to broader scenarios and is more practical.
Meanwhile, when the reply character sequence of the target reply data is generated, each reply character is generated one by one based on the target personality features, so the amount of calculation for obtaining the target reply data is small, which improves the efficiency and performance of the human-computer dialog. In addition, the personality discrimination network guides the dialog generation network to generate the target reply data, and the two networks are independent of each other, which reduces the difficulty of network training, as specifically described below.
Fig. 2 shows a network structure of a human-computer conversation system in an example of the present disclosure, and a conversation processing method in the example of the present disclosure is specifically described below with reference to fig. 2.
As shown in FIG. 2, in some embodiments, the human-machine dialog system 100 of the disclosed example employs a structured generation architecture. Specifically, the human-machine dialog system 100 includes a dialog generation network 110 and a plurality of personality discrimination networks 120.
The dialog generation network 110 may employ a basic dialog generator (base-generator), whose input is the dialog query data and whose output is a set of candidate characters for the character currently to be generated.
The plurality of character discrimination networks 120 are pre-trained discriminator networks, where each character discrimination network 120 may correspond to one personality; for example, the plurality of character discrimination networks may correspond to personalities such as "optimistic", "pessimistic", "caring", and the like. The input of a character discrimination network is a reply character sequence formed from each candidate character in the candidate character set, and the output is the probability corresponding to each candidate sequence.
Based on the system structure shown in fig. 2, fig. 3 and fig. 4 show some embodiments of the dialog processing method of the present disclosure, which are described in detail below with reference to fig. 3 and fig. 4.
As shown in fig. 3, in some embodiments, a dialog processing method of an example of the present disclosure includes:
S310, inputting the query character sequence into the pre-trained dialog generation network to obtain candidate characters of the current character to be generated output by the dialog generation network and first weight values corresponding to the candidate characters.
S320, sorting the first weight values from high to low, and determining a candidate character set consisting of the candidate characters corresponding to the top preset number of first weight values.
In particular, in some embodiments, the dialog generation network may be a base dialog generator, pre-trained with dialog data. The following embodiments of the present disclosure will specifically describe the training process of the dialog generation network, and will not be described in detail here.
In the embodiment of the present disclosure, the inputs to the dialog generation network 110 are the query character sequence and the reply character sequence generated before the current character to be generated. In one example, the query character sequence is X, the current character to be generated is y3, and the reply character sequence generated before y3 is Y' = {y1, y2}; the inputs to the dialog generation network 110 are then X and Y'.
As in the example of fig. 4, assume that the specific character sequence corresponding to the reply character sequence Y' = {y1, y2} is Y' = {Do, you}, i.e., the character represented by y1 is "Do" and the character represented by y2 is "you".
In this example, the dialog generation network 110 predicts and outputs the candidate characters of the character to be generated currently and the first weight values corresponding to the candidate characters according to the query character sequence X and the reply character sequence Y'.
For example, in the example of fig. 4, the candidate characters output by the dialog generation network 110 include: want, prefer, thus, and so on, and the first weight value corresponding to each candidate character is: <want, 0.3>, <prefer, 0.3>, <thus, 0.1>, and so on. Here, <want, 0.3> indicates that the probability that the current character to be generated y3 is "want" is 0.3, i.e., the first weight corresponding to "want" is 0.3. The other pairs are interpreted in the same way and are not described again in the present disclosure.
In the embodiment of the disclosure, the candidate characters are sorted according to the first weight value from high to low, and the candidate characters corresponding to the first weight value with a preset number before are determined to form a candidate character set.
For example, in the example of fig. 4, the candidate characters are sorted from high to low according to the first weight. It can be understood that the higher a candidate character is ranked, the higher the probability that it is the current character to be generated. Therefore, the candidate characters corresponding to a preset number of top-ranked first weights are screened out; for example, only the Top-3 candidate characters want, prefer, and thus in fig. 4 are combined into a candidate character set, that is, the candidate character set Z = (want, prefer, thus).
Those skilled in the art will appreciate that the preset number may be selected according to specific requirements of a scene, and is not limited to the Top-3 candidate characters, but may be any other number of candidate characters, which is not limited by the present disclosure.
Therefore, in the embodiment of the present disclosure, a candidate character set is formed by determining a preset number of candidate characters according to the first weight of each candidate character, so as to reduce the amount of calculation data and improve the system operation speed.
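The screening of S310 to S320 can be sketched as follows. This is a minimal illustration that assumes the generator exposes its output as a character-to-weight mapping; the function and variable names are hypothetical:

```python
def top_k_candidates(char_weights, k=3):
    """Sort candidate characters by first weight (descending) and keep the top k."""
    ranked = sorted(char_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [char for char, _ in ranked[:k]]

# First weights as in the fig. 4 example: <want, 0.3>, <prefer, 0.3>, <thus, 0.1>, ...
first_weights = {"want": 0.3, "prefer": 0.3, "thus": 0.1, "the": 0.05}
candidate_set = top_k_candidates(first_weights, k=3)
# candidate_set is ["want", "prefer", "thus"]
```

Because Python's sort is stable, candidates with equal first weights keep their original order, so ties such as "want" and "prefer" above are resolved deterministically.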
As shown in fig. 5, in some embodiments, the dialog processing method of the present disclosure further includes:
S510, for each candidate character in the candidate character set, combining the candidate character with the reply character sequence generated before the current character to be generated to obtain a candidate sequence.
S520, inputting each candidate sequence into the pre-trained target character discrimination network to obtain a second weight corresponding to each candidate sequence output by the target character discrimination network.
Specifically, the target character discrimination network 120 is a network predetermined from a plurality of character discrimination networks. For example, in the system architecture shown in fig. 2, a plurality of character discrimination networks may be trained in advance, where each character discrimination network corresponds to one personality. When a user needs a dialog system with a certain personality, the corresponding personality may be selected in advance, so that the system can determine, from the plurality of character discrimination networks, the character discrimination network corresponding to the personality selected by the user, that is, the target character discrimination network.
The specific training process of the character discrimination network is described in the following embodiments of the present disclosure and is not detailed here.
For example, in the example of fig. 4, the obtained candidate character set of the current character to be generated y3 is Z = (want, prefer, thus). Each element in the candidate character set Z may form a new character sequence, i.e., a candidate sequence, with the reply character sequence Y'. For example, the element "want" in the candidate character set Z may be combined with the reply character sequence Y' to form a new character sequence "Do you want"; the element "prefer" may be combined with Y' to form "Do you prefer"; and the element "thus" may be combined with Y' to form "Do you thus". Thus, the obtained candidate sequences may include (Do you want, Do you prefer, Do you thus).
It will be appreciated that the target character discrimination network may be a binary classifier whose purpose is to determine whether the input dialog data is data of the target personality. Therefore, in the embodiment of the present disclosure, each candidate sequence is input into the target character discrimination network, so as to obtain the second weight of each candidate sequence, where the second weight may represent the probability that the candidate sequence matches the target personality.
For example, as shown in fig. 4, the second weight value corresponding to the candidate sequence "Do you want" output by the target character recognition network 120 is 0.4, the second weight value corresponding to the candidate sequence "Do you prefer" is 0.8, and the second weight value corresponding to the candidate sequence "Do you thus" is 0.9. It can be understood that the higher the second weight value is, the higher the probability that the candidate sequence corresponding to the second weight value is the target character is.
S530, the first weight and the second weight corresponding to each candidate character are subjected to fusion processing, and a third weight corresponding to each candidate character is obtained.
S540, determining a target character in the candidate character set according to the third weights, and taking the target character as the current character to be generated.
In the embodiment of the present disclosure, after the first weight and the second weight of each candidate character are obtained, it can be understood that the first weight represents the probability that the candidate character is the current character to be generated, and the second weight represents the probability that the corresponding candidate sequence matches the target personality. Therefore, the first weight and the second weight are fused to obtain a third weight corresponding to each candidate character, where the third weight represents the probability that the candidate character is the current character to be generated with the target personality features fused in.
As for the specific process of the fusion, in the example shown in fig. 4, the first weight corresponding to each candidate character is multiplied by the corresponding second weight to obtain the third weight. For example, the first weight 0.3 of the candidate character "want" is multiplied by the second weight 0.4 to obtain a third weight of 0.12; the first weight 0.3 of the candidate character "prefer" is multiplied by the second weight 0.8 to obtain a third weight of 0.24; and the first weight 0.1 of the candidate character "thus" is multiplied by the second weight 0.9 to obtain a third weight of 0.09.
It should be noted that fig. 4 is only an exemplary implementation of the method of the present disclosure, and in other embodiments, the first weight and the second weight may be fused in other manners, which is not limited in the present disclosure.
After the third weight of each candidate character is obtained, the candidate character corresponding to the largest third weight can be determined as the target character, that is, the current character to be generated. For example, in the example of fig. 4, the third weight corresponding to the candidate character "prefer" is the largest, so the candidate character "prefer" is determined as the target character, i.e., the current character to be generated y3.
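The fusion and selection of S530 to S540 can be sketched with the fig. 4 values. This is a minimal illustration assuming multiplicative fusion as in the example above; the names are hypothetical:

```python
def fuse_and_pick(first_w, second_w):
    """Multiply each candidate's first weight (generator) by its second weight
    (personality discriminator) and pick the candidate with the largest product."""
    third_w = {c: first_w[c] * second_w[c] for c in first_w}
    target = max(third_w, key=third_w.get)
    return target, third_w

# Values from the fig. 4 example.
first = {"want": 0.3, "prefer": 0.3, "thus": 0.1}
second = {"want": 0.4, "prefer": 0.8, "thus": 0.9}
target, third = fuse_and_pick(first, second)
# third["prefer"] = 0.24 is the largest product, so "prefer" is selected.
```

As noted above, multiplication is only one possible fusion; a weighted geometric mean or log-linear combination would follow the same pattern.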
It should be noted that, as can be seen from the principle shown in fig. 4, in the embodiment of the present disclosure, the target personality features are fused in during generation of the reply character sequence of the target reply data to assist the generation of each reply character, and the dialog generation network 110 and the target character discrimination network 120 are independent network structures. For example, in the structure shown in fig. 4, even if the target character discrimination network 120 is removed, the dialog generation network 110 can still work as a basic dialog generator. Therefore, the two networks do not need to be trained jointly and can be trained independently, which makes the system more robust; when more personalities are added, only the corresponding character discrimination networks need to be trained, which reduces the amount of training work.
Continuing with the example of fig. 4, after the current character to be generated y3 is determined to be "prefer", the reply character sequence Y' = {y1, y2} may be updated based on y3; the updated reply character sequence may be denoted as Y = {y1, y2, y3}, and the corresponding specific character sequence is Y = {Do, you, prefer}. The updated reply character sequence is then fed back into the dialog generation network 110 for determining the next character y4, thereby assisting the generation of y4. The specific process is similar to the above, and those skilled in the art can execute the above steps in a loop, which is not repeated here. When all characters in the reply character sequence Y = {y1, y2, ..., ym} have been generated, the obtained reply character sequence Y is the target reply data.
Specifically, in some embodiments, determining that generation of all characters in the reply character sequence is complete may include:
In response to detecting that the current character to be generated includes the terminator, it is determined that all characters in the reply character sequence have been generated, and the target reply data is obtained.
Specifically, a terminator may be set at the end of the dialog reply data, so that each generated character is detected during the generation of the character in the reply character sequence, and when the terminator is detected, it indicates that all the characters in the reply character sequence are completely generated, thereby obtaining complete target reply data.
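The terminator-based stopping condition can be sketched as follows. This is a minimal illustration in which a hypothetical `next_char_fn` stands in for one pass of candidate generation plus personality-guided selection, and `<eos>` is an assumed terminator token:

```python
EOS = "<eos>"  # hypothetical terminator appended to the end of dialog reply data

def generate_reply(next_char_fn, max_len=50):
    """Generate the reply character sequence one character at a time,
    stopping once the terminator is produced (S140)."""
    reply = []
    for _ in range(max_len):
        ch = next_char_fn(reply)  # one round of candidate generation + selection
        if ch == EOS:
            break
        reply.append(ch)
    return reply

# Stub standing in for the generator + discriminator pipeline of fig. 4.
script = iter(["Do", "you", "prefer", EOS])
reply = generate_reply(lambda prefix: next(script))
```

The `max_len` guard mirrors a practical deployment concern: even if the terminator is never produced, generation still halts after a bounded number of characters.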
The above description is made on the system structure and the working principle of the dialog processing method according to the embodiment of the present disclosure, and on the basis of the system structure shown in fig. 2, the dialog processing method according to the present disclosure further includes a network training process, which is specifically described below.
Based on the system structure shown in fig. 2, the network training process of the dialog system can be divided into two parts: a training process for the dialog generation network 110, and a training process for the individual personality discrimination networks 120. These two processes will be described in detail below.
As shown in fig. 6, in some embodiments, the training process of the dialog generation network includes:
S610, second dialogue data is obtained.
In particular, the second dialog data may be a generic dialog corpus, i.e. any corpus pair comprising dialog query data (query) and dialog reply data (response).
In some embodiments, the source of the second dialog data may be various open-source dialog corpuses on the network.
In other embodiments, the second dialogue data can be obtained through self-learning of the network based on reinforcement learning. This is explained later in the present disclosure and is not detailed here.
And S620, inputting the dialogue inquiry data of the second dialogue data into the dialogue generating network to obtain the prediction reply data output by the dialogue generating network.
In particular, the dialog generation network may be a basic dialog generator. In the embodiments of the present disclosure, an autoregressive generator network may be employed, i.e., one that generates the reply character sequence step by step rather than the entire reply character sequence at once, which facilitates fusing the target personality features into the reply characters.
In one example, the dialog generation network of the disclosed embodiments employs a GPT (Generative Pre-Training) network, wherein the principle of the GPT network is as follows:
The network input is a query character sequence: X = {x1, x2, ..., xn};
The network output is a reply character sequence: Y = {y1, y2, ..., ym};
where n and m represent the lengths of the character sequences, i.e., the numbers of characters.
The probability P(Y|X) that the dialog generation network outputs Y given a query character sequence X is expressed as:

P(Y|X) = ∏_{i=1}^{m} P(yi | y1:i-1, X)

where yi represents the current character to be generated, and y1:i-1 represents the reply character sequence generated before the current character to be generated. The specific principle is the same as the one-by-one generation of reply characters described above; those skilled in the art can understand and fully implement it with reference to the above embodiments, and the details are not repeated in the present disclosure.
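The factorization above can be illustrated numerically. This is a toy sketch with a stand-in per-step probability function rather than a real GPT network; all names are hypothetical:

```python
import math

def sequence_log_prob(step_prob_fn, reply, query):
    """log P(Y | X) = sum_i log P(y_i | y_{1:i-1}, X), i.e. the chain-rule
    factorization used by an autoregressive generator."""
    total = 0.0
    for i, y_i in enumerate(reply):
        # step_prob_fn(current char, previously generated chars, query sequence)
        total += math.log(step_prob_fn(y_i, reply[:i], query))
    return total

# Toy per-step model: every character is assigned probability 0.5.
logp = sequence_log_prob(lambda y, prefix, x: 0.5, ["Do", "you", "prefer"], ["X"])
# logp equals 3 * log(0.5), one factor per generated character.
```

Working in log space is the standard trick here: multiplying many per-character probabilities underflows quickly, while their logs sum safely.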
Therefore, when the dialogue generation network is trained, dialogue query data (query) in the second dialogue data is input into the dialogue generation network to be trained, and therefore prediction reply data output by the dialogue generation network can be obtained through the process.
S630, according to the difference between the prediction reply data and the dialogue reply data, adjusting the network parameters of the dialogue generating network until the convergence condition is met.
Specifically, the prediction reply data represents prediction data output by the dialog generation network to be trained, and the dialog reply data (response) represents real data corresponding to the dialog query data (query) in the second dialog data. Therefore, network parameters of the dialogue generating network can be adjusted and optimized according to the difference, namely loss, between the prediction reply data and the dialogue reply data until the convergence condition of network training is met, and the trained dialogue generating network is obtained.
As shown in fig. 7, in some embodiments, the training process of each personality discriminating network includes:
S710, first dialogue data is acquired.
Specifically, the first session data includes session query data (query), session reply data (response), and personality data (personality) corresponding to the session reply data, that is, the behavior of each first session data is represented as (query, response, personality).
In one example, the first dialogue data may be (query: why are you unhappy, response: cheer up! nine times out of ten life doesn't go as we wish, but after the wind and rain there is a rainbow!, personality: optimistic).
In another example, the first dialogue data may be (query: what should I do, I'm feeling down, response: what's wrong? you can talk it over with me, I am a very good listener!, personality: caring/considerate of others).
It can be seen from the above examples that, for the same dialogue query data, the generated dialogue reply data differ under different personalities. This also illustrates that varying a generic personality can not only make the dialogue more interesting, but also bring the human-computer dialogue closer to a real-person dialogue scene, giving it wider use scenarios than simply imitating the speaking style of a specific person.
S720, inputting the dialogue reply data of the first dialogue data into the character discrimination network to obtain the prediction character output by the character discrimination network.
Specifically, each of the first dialog data has corresponding personality data (personality) for the dialog reply data (response), which is real data of the dialog reply data.
In some embodiments, the personality discriminating network, i.e., the personality classifier (e.g., a binary classifier), outputs whether the dialog reply data conforms to a certain personality given the input dialog reply data. Therefore, the dialogue reply data of the first dialogue data is input into the character discrimination network to be trained, and the predicted character output by the character discrimination network can be output, wherein the predicted character is the predicted data output by the character discrimination network.
And S730, adjusting the network parameters of the character discrimination network according to the difference between the predicted characters and the character data until a convergence condition is met.
Specifically, the predicted character represents predicted data output by the character discrimination network to be trained, and the character data (personality) represents real data corresponding to the dialogue reply data (response) in the first dialogue data. Therefore, network parameters of the character discrimination network can be adjusted and optimized according to the difference between the predicted characters and the character data, namely loss until the convergence condition of network training is met, and the trained character discrimination network is obtained.
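As an illustration only, the discriminator training of S710 to S730 can be miniaturized to a bag-of-words logistic classifier trained by gradient descent on log-loss. The real character discrimination network is a neural discriminator; all names and training data below are hypothetical:

```python
import math

def train_personality_classifier(data, epochs=200, lr=0.5):
    """Tiny stand-in for a binary personality discriminator: given a reply,
    predict whether it matches one personality (label 1) or not (label 0)."""
    vocab = sorted({w for reply, _ in data for w in reply.split()})
    w = {v: 0.0 for v in vocab}
    b = 0.0
    for _ in range(epochs):
        for reply, label in data:
            words = reply.split()
            z = b + sum(w[t] for t in words)
            p = 1.0 / (1.0 + math.exp(-z))     # predicted personality probability
            g = p - label                      # gradient of the log-loss
            b -= lr * g
            for t in words:
                w[t] -= lr * g
    def predict(reply):
        z = b + sum(w.get(t, 0.0) for t in reply.split())
        return 1.0 / (1.0 + math.exp(-z))
    return predict

data = [("life always has a rainbow", 1),   # reply labeled as the target personality
        ("everything is hopeless", 0)]      # reply labeled as not matching it
predict = train_personality_classifier(data)
```

The loop mirrors S730 at toy scale: parameters are adjusted from the gap between the predicted personality and the labeled personality data until the classifier separates the two classes.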
It is to be understood that the above describes the training process of one character discrimination network; the training of the plurality of character discrimination networks included in the human-machine dialogue system may be performed with reference to the above, so that multiple character discrimination networks corresponding to different personalities can be obtained, for example, networks corresponding to personalities such as optimistic, pessimistic, and caring. A character discrimination network can be trained for whatever personality is needed.
Therefore, in the embodiment of the disclosure, the human-machine dialogue system adopts a 1+N (1 dialogue generation network + N character discrimination networks) distributed structure. The dialogue generation network and the character discrimination networks are decoupled and can be trained independently, so that when a new character discrimination network is added later, it can be trained separately and then connected to the dialogue generation network without joint training with the dialogue generation network, which reduces training difficulty and facilitates seamless expansion of the network structure.
In addition, compared with training a plurality of dialogue generation networks with different personality features, this distributed network structure greatly reduces the workload of network training. This is because a dialogue generation network generally has a large number of parameters, even up to hundreds of millions. In the embodiment of the disclosure, however, only one dialogue generation network needs to be trained, and each character discrimination network is a discriminator whose parameter count is far smaller than that of the dialogue generation network, generally only on the order of tens of thousands to millions. The workload of network training can therefore be greatly reduced, which is more favorable for industrial deployment.
As can be seen from the principle shown in fig. 4, the character discrimination network needs to determine whether a reply belongs to the target personality based on incomplete dialogue reply data. If the character discrimination network were trained only on complete dialogue reply data, its generalization ability might be poor, resulting in poor performance of the dialogue system.
Therefore, in some embodiments, as shown in fig. 8, the training process of the character recognition network of the present disclosure further includes:
and S810, sampling the dialogue reply data of the first dialogue data to obtain the incomplete dialogue data containing the incomplete dialogue reply data.
Specifically, incomplete dialogue reply data refers to a semantically incomplete dialogue reply sequence. The purpose of sampling the dialogue reply data (response) of the first dialogue data is as follows: when the reply character sequence is generated, the personality features need to be determined based on the incomplete sequence generated so far; that is, the character discrimination network must potentially be able to judge whether an incomplete reply sequence has a certain personality. Therefore, by sampling the dialogue reply data in the first dialogue data, incomplete sequences can be obtained as training data, namely incomplete dialogue data.
For example, a complete piece of first dialogue data includes: (query: {x1, x2, ..., xn}, response: {y1, y2, ..., ym}, personality: P). Thus, the incomplete dialogue data obtained after the sampling process may include:
(query:{x1,x2,......,xn},response:{y1,y2},personality:P)
(query:{x1,x2,......,xn},response:{y1,y2,y3},personality:P)
(query:{x1,x2,......,xn},response:{y1,y2,y3,y4},personality:P)
……
(query:{x1,x2,......,xn},response:{y1,y2,y3,y4,……,ym},personality:P)
It can be seen that the dialogue reply data (response) included in the incomplete dialogue data are multiple pieces of incomplete dialogue reply data.
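The sampling of S810 amounts to expanding one complete triple into all prefixes of its reply. A minimal sketch, with a hypothetical function name and a configurable minimum prefix length:

```python
def make_incomplete_samples(query, reply, personality, min_len=2):
    """Expand one complete (query, response, personality) triple into
    training samples whose responses are every reply prefix of length
    at least min_len, as in the enumeration above."""
    return [(query, reply[:i], personality) for i in range(min_len, len(reply) + 1)]

samples = make_incomplete_samples(["x1", "x2"], ["y1", "y2", "y3", "y4"], "P")
# responses: {y1, y2}, {y1, y2, y3}, {y1, y2, y3, y4}
```

Note that the last sample is the complete reply itself, so the discriminator is still trained on full sequences in addition to the truncated ones.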
And S820, inputting the incomplete dialogue reply data included in the incomplete dialogue data into the character discrimination network to obtain the predicted character output by the character discrimination network.
S830, adjusting the network parameters of the character discrimination network according to the difference between the predicted character and the character data of the incomplete dialogue data until a convergence condition is met.
Specifically, the processes of S820 to S830 are similar to those of S720 to S730, and those skilled in the art can understand and fully implement the processes with reference to the foregoing descriptions, and the details of the disclosure are not repeated herein.
Therefore, in the embodiment of the disclosure, the character discrimination network is trained by using the incomplete dialog reply data, so that the character discrimination network can better identify the target character when generating each reply character in an auxiliary manner, and the performance of the man-machine dialog system is improved.
In some embodiments, in the embodiments of the present disclosure, a training mode of reinforcement learning is adopted for the human-computer dialog system, so that training data can be expanded based on the system itself, and then the dialog system is further trained by using the training data, thereby enhancing the system performance. This will be explained with reference to fig. 9.
As shown in fig. 9, in some embodiments, the dialogue data for training the dialogue generation network and the personality determination network is obtained by:
S910, for each character discrimination network, the target reply data obtained in the current round is used as the dialogue query data of the next round, and the process of obtaining the target reply data is executed in loop iteration until a convergence condition is met.
It is understood that, for example, the human-computer dialog system shown in fig. 2, which includes a plurality of character discrimination networks, in order to obtain more comprehensive training data, the character discrimination networks of the human-computer dialog system may be sequentially set as target character discrimination networks, so as to obtain training data under each character discrimination network.
Taking the character discrimination network corresponding to the "caring" personality as an example, the target personality of the human-machine dialogue system can be set to "caring"; then the target reply data obtained in the current round is used as the dialogue query data of the next round, and the dialogue process is executed in loop iteration, so that dialogue data under the "caring" personality can be obtained.
In one example, the dialog data generated by the human-machine dialog system is shown in the following table one:
Table 1
[Table 1: rounds of generated dialogue data, in which each round's reply becomes the next round's query; provided as an image in the original publication]
As shown in Table 1 above, the dialogue reply data of each round is used as the dialogue query data of the next round, and this process is iterated in a loop until the convergence condition is met, at which point multiple pieces of dialogue data generated by the system are obtained.
It is to be understood that the above example is dialogue data generated for the "caring" personality; dialogue data for other personalities can be obtained by switching the target character discrimination network to another character discrimination network and repeating the above process. The dialogue data is saved and can then be used for subsequent network training.
In some embodiments, the above reinforcement learning process may be performed iteratively by a single system in a loop, and complete training data is obtained by switching the system's target character discrimination network.
In other embodiments, the reinforcement learning process may be implemented by two dialogue systems conversing with each other. Specifically, the dialogue reply data generated by system 1 is used as the dialogue query data of system 2, and the dialogue reply data generated by system 2 is in turn used as the dialogue query data of system 1; this process repeats in a loop until the convergence condition is met. Meanwhile, the target character discrimination networks of the two dialogue systems can be switched so as to obtain the full set of dialogue data.
It can be understood that both of the above two embodiments can achieve the purpose of enhancing learning of the embodiments of the present disclosure, and the mutual conversation of the two conversation systems is closer to the real-person conversation scene, which also can relatively improve the reliability of the conversation data.
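The loop iteration of S910 can be sketched as follows, with a stub reply function standing in for the full personality-guided generation pipeline; the names are hypothetical, and a real system would stop on a convergence condition rather than a fixed round count:

```python
def self_play(seed_query, reply_fn, rounds=3):
    """Iterate S910: each round's target reply becomes the next round's
    dialogue query; collect the (query, reply) pairs along the way."""
    dialogue = []
    query = seed_query
    for _ in range(rounds):
        reply = reply_fn(query)
        dialogue.append((query, reply))
        query = reply  # this round's reply is next round's query
    return dialogue

# Stub generator; a real system would run the fused decoding of fig. 4.
pairs = self_play("how are you", lambda q: q + " ?", rounds=2)
```

The collected pairs correspond directly to the first training dialogue data of S920; attaching the active personality label to each pair yields the second training dialogue data of S930.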
S920, obtaining first training dialogue data according to each group of dialogue inquiry data and target reply data generated in the loop iteration process.
Specifically, the first training session data is training data of a training session generation network, and since the session generation network itself does not focus on character features, the first training session data can be formed by performing one-to-one correspondence combination on session query data and target reply data in the generated session data.
For example, in the example of table one, the first training session data may be generated by: (query: do you
And S930, obtaining second training dialogue data according to each group of dialogue inquiry data, target reply data and character data corresponding to the character discrimination network generated in the loop iteration process.
Specifically, the second training dialogue data is training data for training the character discrimination network. Since the role of the character discrimination network is to fuse personality features, the second training dialogue data concerns not only the dialogue query data and the target reply data but also the personality data; therefore, the second training dialogue data can be formed by combining, in one-to-one correspondence, the dialogue query data, target reply data, and personality data in the generated dialogue data.
For example, in the example of Table 1, the second training dialogue data may be generated as: (query: how are you, response: I'm fine, and you?, personality: caring).
Therefore, in the embodiment of the present disclosure, reinforcement learning is used so that the system generates its own training data, and this training data can in turn be used to further train the current networks, which improves the performance of the human-machine dialogue system and reduces the scale of data collection required.
In a second aspect, the disclosed embodiments provide a dialogue processing apparatus, which can be applied to a terminal device including a man-machine dialogue system, enabling a user to converse with the terminal device. The terminal device may be, for example, a smart phone, a tablet computer, a smart wearable device, a personal computer, or a handheld terminal. The present disclosure is not limited in this respect.
As shown in fig. 10, in some embodiments, a dialog processing device of an example of the present disclosure includes:
an acquisition module 101 configured to acquire dialog query data, the dialog query data including a query character sequence;
a first determining module 102, configured to determine a candidate character set of a character to be generated currently based on a query character sequence and a reply character sequence generated before the character to be generated currently;
a second determining module 103, configured to take the target character determined in the candidate character set as the current character to be generated according to a predetermined target personality feature;
and the updating module 104 is configured to update the reply character sequence based on the target character until all the characters in the reply character sequence are generated, so as to obtain target reply data.
In some embodiments, the first determination module 102 is specifically configured to:
inputting the query character sequence into a pre-trained dialog generation network to obtain each candidate character of the current character to be generated output by the dialog generation network and a first weight corresponding to each candidate character;
and sorting the first weights from high to low, and determining the candidate character set consisting of the candidate characters corresponding to a preset number of the highest first weights.
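A minimal sketch of this top-k selection, assuming the dialogue generation network has already produced a first weight per vocabulary character (the weights below are illustrative stand-ins for real network outputs):

```python
def candidate_set(first_weights, k):
    """Sort first weights high-to-low and keep the top-k characters
    as the candidate character set for the current position."""
    ranked = sorted(first_weights.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

# Hypothetical first weights output by the dialogue generation network.
first_weights = {"fine": 0.4, "good": 0.3, "ok": 0.2, "bad": 0.1}
top2 = candidate_set(first_weights, k=2)
```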
In some embodiments, the second determination module 103 is specifically configured to:
for each candidate character in the candidate character set, combining the candidate character with a reply character sequence generated before the current character to be generated to obtain a candidate sequence;
inputting each candidate sequence into a pre-trained target personality discrimination network to obtain a second weight corresponding to each candidate sequence output by the target personality discrimination network; wherein the target personality discrimination network is a network predetermined from a plurality of personality discrimination networks;
fusing the first weight and the second weight corresponding to each candidate character to obtain a third weight corresponding to each candidate character;
and according to the third weight, taking the determined target character in the candidate character set as the current character to be generated.
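The combine-score-fuse-select flow above can be sketched as follows. The fusion formula (a geometric interpolation controlled by `alpha`) is one plausible choice, not a formula fixed by the disclosure, and `disc` is a hypothetical stand-in for the trained target personality discrimination network:

```python
def select_next_char(prefix, first_weights, discriminator, alpha=0.5):
    """Fuse the generation network's first weight with the personality
    discrimination network's second weight, and pick the best candidate."""
    fused = {}
    for ch, w1 in first_weights.items():
        candidate_seq = prefix + [ch]       # combine with the generated prefix
        w2 = discriminator(candidate_seq)   # second weight for this sequence
        fused[ch] = (w1 ** alpha) * (w2 ** (1 - alpha))  # third weight
    return max(fused, key=fused.get)        # target character for this position

# Toy discriminator that prefers sequences ending in "fine".
disc = lambda seq: 0.9 if seq[-1] == "fine" else 0.1
nxt = select_next_char(["I", "am"], {"fine": 0.3, "good": 0.4}, disc)
```

Here "good" has the higher first weight, but "fine" wins after fusing in the personality score, which is exactly the point of the second determining step.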
In some embodiments, the apparatus of the present disclosure further comprises a network training module configured to:
acquiring first dialogue data, wherein the first dialogue data comprises dialogue query data, dialogue reply data, and personality data corresponding to the dialogue reply data;
inputting the dialogue reply data of the first dialogue data into a personality discrimination network to obtain a predicted personality output by the personality discrimination network;
and adjusting the network parameters of the personality discrimination network according to the difference between the predicted personality and the personality data until a convergence condition is met.
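A minimal sketch of this training loop, assuming a bag-of-words logistic model in place of a real neural personality discrimination network. The loop shape is what matters here: predict a personality from the reply, compare it with the personality label, and adjust the parameters until a convergence condition is met. All data and hyperparameters are illustrative:

```python
import math

def train_discriminator(data, vocab, lr=0.5, eps=0.05, max_steps=500):
    """data: list of (reply_tokens, label) with label 1 = extrovert,
    0 = introvert. Returns per-token weights and a bias."""
    w = {t: 0.0 for t in vocab}
    b = 0.0
    for _ in range(max_steps):
        loss = 0.0
        for reply, label in data:
            z = b + sum(w[t] for t in reply if t in w)
            p = 1.0 / (1.0 + math.exp(-z))       # predicted P(extrovert)
            loss -= math.log(p if label else 1.0 - p)
            g = p - label                        # gradient of the loss w.r.t. z
            b -= lr * g                          # adjust network parameters
            for t in reply:
                if t in w:
                    w[t] -= lr * g
        if loss / len(data) < eps:               # convergence condition
            break
    return w, b

data = [(["love", "parties"], 1), (["prefer", "quiet"], 0)]
w, b = train_discriminator(data, vocab={"love", "parties", "prefer", "quiet"})
```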
In some embodiments, the network training module is configured to:
sampling the dialogue reply data of the first dialogue data to obtain incomplete dialogue data containing incomplete dialogue reply data;
inputting the incomplete dialogue reply data of the incomplete dialogue data into the personality discrimination network to obtain a predicted personality output by the personality discrimination network;
and adjusting the network parameters of the personality discrimination network according to the difference between the predicted personality and the personality data of the incomplete dialogue data until a convergence condition is met.
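A sketch of the sampling step, assuming "sampling" means keeping random prefixes of the full reply — one plausible reading, since at decoding time the discrimination network must judge partially generated replies; the disclosure does not fix the sampling scheme here:

```python
import random

def sample_incomplete(reply_chars, n_samples, rng):
    """Return truncated copies of the reply, each keeping a random prefix
    of at least one character, to serve as incomplete dialogue reply data."""
    samples = []
    for _ in range(n_samples):
        cut = rng.randint(1, len(reply_chars))
        samples.append(reply_chars[:cut])
    return samples

rng = random.Random(0)  # seeded only to make the illustration repeatable
incomplete = sample_incomplete(list("I am fine, and you"), 3, rng)
```

Each truncated sample inherits the personality label of the full reply, so the network learns to discriminate personality from partial sequences.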
In some embodiments, the apparatus of the present disclosure further comprises a training data acquisition module configured to:
for each personality discrimination network, taking the target reply data obtained in the current round as the dialogue query data of the next round, and iteratively executing the process of obtaining target reply data until a convergence condition is met;
obtaining first training dialogue data according to each group of dialogue query data and target reply data generated in the loop iteration process;
and obtaining second training dialogue data according to each group of dialogue query data and target reply data generated in the loop iteration process and the personality data corresponding to the personality discrimination network.
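A sketch of this loop iteration, with `generate_reply` as a hypothetical stand-in for the full generation-plus-discrimination decoding procedure, and a fixed turn budget as one plausible convergence condition (the disclosure leaves the criterion open):

```python
def self_play(seed_query, generate_reply, max_turns):
    """Run the loop iteration: the target reply of the current round
    becomes the dialogue query of the next round."""
    turns = []
    query = seed_query
    for _ in range(max_turns):          # convergence condition: turn budget
        reply = generate_reply(query)   # target reply data for this round
        turns.append((query, reply))
        query = reply                   # reply becomes next round's query
    return turns

echo = lambda q: q + "!"               # toy reply generator for illustration
dialogue = self_play("hello", echo, max_turns=3)
```

The accumulated `(query, reply)` pairs are exactly the groups from which the first and second training dialogue data are then assembled.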
Therefore, the dialogue processing device of the embodiments of the present disclosure can output reply data matching the target personality by fusing the target personality features, thereby improving the performance of man-machine dialogue; and because it adopts universal personality attributes, its application scenarios are wider and its practicability stronger than with manually configured persona attributes. Meanwhile, when the reply character sequence of the target reply data is generated, each reply character is generated one by one based on the target personality features, so the computation required for the target reply data is smaller, improving the efficiency and performance of man-machine dialogue.
In a third aspect, the disclosed embodiments provide an electronic device, including:
a processor; and
a memory storing computer instructions readable by the processor, wherein the processor performs the method according to any of the embodiments of the first aspect when the computer instructions are read.
In a fourth aspect, the disclosed embodiments provide a storage medium for storing computer-readable instructions for causing a computer to perform a method according to any one of the embodiments of the first aspect.
Fig. 11 is a block diagram of an electronic device according to some embodiments of the present disclosure, and the following describes principles related to the electronic device and a storage medium according to some embodiments of the present disclosure with reference to fig. 11.
Referring to fig. 11, the electronic device 1800 may include one or more of the following components: processing component 1802, memory 1804, power component 1806, multimedia component 1808, audio component 1810, input/output (I/O) interface 1812, sensor component 1816, and communications component 1818.
The processing component 1802 generally controls the overall operation of the electronic device 1800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1802 may include one or more processors 1820 to execute instructions. Further, the processing component 1802 may include one or more modules that facilitate interaction between the processing component 1802 and other components. For example, the processing component 1802 can include a multimedia module to facilitate interaction between the multimedia component 1808 and the processing component 1802. As another example, the processing component 1802 can read executable instructions from a memory to implement electronic device related functions.
The memory 1804 is configured to store various types of data to support operation at the electronic device 1800. Examples of such data include instructions for any application or method operating on the electronic device 1800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 1806 provides power to various components of the electronic device 1800. The power components 1806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1800.
The multimedia component 1808 includes a display screen that provides an output interface between the electronic device 1800 and a user. In some embodiments, the multimedia component 1808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera can receive external multimedia data when the electronic device 1800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
Audio component 1810 is configured to output and/or input audio signals. For example, the audio component 1810 can include a Microphone (MIC) that can be configured to receive external audio signals when the electronic device 1800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1804 or transmitted via the communication component 1818. In some embodiments, audio component 1810 also includes a speaker for outputting audio signals.
I/O interface 1812 provides an interface between processing component 1802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 1816 includes one or more sensors to provide status evaluations of various aspects of the electronic device 1800. For example, the sensor component 1816 can detect an open/closed state of the electronic device 1800 and the relative positioning of components, such as the display and keypad of the electronic device 1800. The sensor component 1816 can also detect a change in position of the electronic device 1800 or of one of its components, the presence or absence of user contact with the electronic device 1800, the orientation or acceleration/deceleration of the electronic device 1800, and changes in its temperature. The sensor component 1816 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 1816 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1816 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1818 is configured to facilitate communications between the electronic device 1800 and other devices in a wired or wireless manner. The electronic device 1800 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, 3G, 4G, 5G, or 6G, or a combination thereof. In an exemplary embodiment, the communication component 1818 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1818 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components.
It should be understood that the above embodiments are only examples given to clearly illustrate the present disclosure and are not intended to limit it. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Obvious variations or modifications may be made without departing from the scope of the present disclosure.

Claims (15)

1. A conversation processing method, comprising:
obtaining dialogue query data, wherein the dialogue query data comprises a query character sequence;
determining a candidate character set of the current character to be generated based on the query character sequence and a reply character sequence generated before the current character to be generated;
according to a predetermined target personality feature, taking the target character determined in the candidate character set as the current character to be generated;
and updating the reply character sequence based on the target character until all characters in the reply character sequence are generated, and obtaining target reply data.
2. The method of claim 1, wherein determining the candidate character set for the current character to be generated based on the query character sequence and a reply character sequence generated before the current character to be generated comprises:
inputting the query character sequence into a pre-trained dialog generation network to obtain candidate characters of the current character to be generated output by the dialog generation network and first weights corresponding to the candidate characters;
and sorting the first weights from high to low, and determining the candidate character set consisting of the candidate characters corresponding to a preset number of the highest first weights.
3. The method according to claim 2, wherein the taking the determined target character in the candidate character set as the current character to be generated according to a predetermined target personality feature comprises:
for each candidate character in the candidate character set, combining the candidate character with the reply character sequence generated before the current character to be generated to obtain a candidate sequence;
inputting each candidate sequence into a pre-trained target personality discrimination network to obtain a second weight corresponding to each candidate sequence output by the target personality discrimination network; wherein the target personality discrimination network is a network predetermined from a plurality of personality discrimination networks;
fusing the first weight and the second weight corresponding to each candidate character to obtain a third weight corresponding to each candidate character;
and according to the third weight, taking the determined target character in the candidate character set as the current character to be generated.
4. The method according to any one of claims 1 to 3, wherein obtaining target reply data until all characters in the reply character sequence are generated comprises:
and in response to the fact that the current character to be generated comprises the terminator, determining that all the characters in the reply character sequence are generated completely, and obtaining target reply data.
5. The method of claim 3, wherein each of the personality discrimination networks is trained as follows:
acquiring first dialogue data, wherein the first dialogue data comprises dialogue query data, dialogue reply data, and personality data corresponding to the dialogue reply data;
inputting the dialogue reply data of the first dialogue data into the personality discrimination network to obtain a predicted personality output by the personality discrimination network;
and adjusting the network parameters of the personality discrimination network according to the difference between the predicted personality and the personality data until a convergence condition is met.
6. The method of claim 5, wherein the training of each of the personality discrimination networks further comprises:
sampling the dialogue reply data of the first dialogue data to obtain incomplete dialogue data containing incomplete dialogue reply data;
inputting the incomplete dialogue reply data contained in the incomplete dialogue data into the personality discrimination network to obtain a predicted personality output by the personality discrimination network;
and adjusting the network parameters of the personality discrimination network according to the difference between the predicted personality and the personality data of the incomplete dialogue data until a convergence condition is met.
7. The method of claim 3, wherein training dialogue data for training the dialogue generation network and/or the personality discrimination networks is obtained by:
for each personality discrimination network, taking the target reply data obtained in the current round as the dialogue query data of the next round, and iteratively executing the process of obtaining target reply data until a convergence condition is met;
obtaining first training dialogue data according to each group of dialogue query data and target reply data generated in the loop iteration process;
and obtaining second training dialogue data according to each group of dialogue query data and target reply data generated in the loop iteration process and the personality data corresponding to the personality discrimination network.
8. A conversation processing apparatus, comprising:
an acquisition module configured to acquire dialog query data, the dialog query data including a query character sequence;
a first determining module, configured to determine a candidate character set of a current character to be generated based on the query character sequence and a reply character sequence generated before the current character to be generated;
a second determining module configured to take the target character determined in the candidate character set as the current character to be generated according to a predetermined target personality feature;
and the updating module is configured to update the reply character sequence based on the target character until all characters in the reply character sequence are generated, so as to obtain target reply data.
9. The apparatus of claim 8, wherein the first determination module is specifically configured to:
inputting the query character sequence into a pre-trained dialog generation network to obtain candidate characters of the current character to be generated output by the dialog generation network and first weights corresponding to the candidate characters;
and sorting the first weights from high to low, and determining the candidate character set consisting of the candidate characters corresponding to a preset number of the highest first weights.
10. The apparatus of claim 9, wherein the second determining module is specifically configured to:
for each candidate character in the candidate character set, combining the candidate character with the reply character sequence generated before the current character to be generated to obtain a candidate sequence;
inputting each candidate sequence into a pre-trained target personality discrimination network to obtain a second weight corresponding to each candidate sequence output by the target personality discrimination network; wherein the target personality discrimination network is a network predetermined from a plurality of personality discrimination networks;
fusing the first weight and the second weight corresponding to each candidate character to obtain a third weight corresponding to each candidate character;
and according to the third weight, taking the determined target character in the candidate character set as the current character to be generated.
11. The apparatus of claim 10, further comprising a network training module configured to:
acquiring first dialogue data, wherein the first dialogue data comprises dialogue query data, dialogue reply data, and personality data corresponding to the dialogue reply data;
inputting the dialogue reply data of the first dialogue data into the personality discrimination network to obtain a predicted personality output by the personality discrimination network;
and adjusting the network parameters of the personality discrimination network according to the difference between the predicted personality and the personality data until a convergence condition is met.
12. The apparatus of claim 11, wherein the network training module is configured to:
sampling the dialogue reply data of the first dialogue data to obtain incomplete dialogue data containing incomplete dialogue reply data;
inputting the incomplete dialogue reply data contained in the incomplete dialogue data into the personality discrimination network to obtain a predicted personality output by the personality discrimination network;
and adjusting the network parameters of the personality discrimination network according to the difference between the predicted personality and the personality data of the incomplete dialogue data until a convergence condition is met.
13. The apparatus of claim 10, further comprising a training data acquisition module configured to:
for each personality discrimination network, taking the target reply data obtained in the current round as the dialogue query data of the next round, and iteratively executing the process of obtaining target reply data until a convergence condition is met;
obtaining first training dialogue data according to each group of dialogue query data and target reply data generated in the loop iteration process;
and obtaining second training dialogue data according to each group of dialogue query data and target reply data generated in the loop iteration process and the personality data corresponding to the personality discrimination network.
14. An electronic device, comprising:
a processor; and
a memory storing computer instructions readable by the processor, the processor performing the method of any of claims 1 to 7 when the computer instructions are read.
15. A storage medium storing computer readable instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202110706487.5A 2021-06-24 2021-06-24 Conversation processing method and device Pending CN113377938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110706487.5A CN113377938A (en) 2021-06-24 2021-06-24 Conversation processing method and device


Publications (1)

Publication Number Publication Date
CN113377938A true CN113377938A (en) 2021-09-10

Family

ID=77578997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110706487.5A Pending CN113377938A (en) 2021-06-24 2021-06-24 Conversation processing method and device

Country Status (1)

Country Link
CN (1) CN113377938A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115101151A (en) * 2022-08-25 2022-09-23 北京聆心智能科技有限公司 Character testing method and device based on man-machine conversation and electronic equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503043A (en) * 2016-09-21 2017-03-15 北京光年无限科技有限公司 A kind of interaction data processing method for intelligent robot
CN106649704A (en) * 2016-12-20 2017-05-10 竹间智能科技(上海)有限公司 Intelligent dialogue control method and intelligent dialogue control system
JP2018055548A (en) * 2016-09-30 2018-04-05 株式会社Nextremer Interactive device, learning device, interactive method, learning method, and program
CN110187760A (en) * 2019-05-14 2019-08-30 北京百度网讯科技有限公司 Intelligent interactive method and device
CN110188177A (en) * 2019-05-28 2019-08-30 北京搜狗科技发展有限公司 Talk with generation method and device
CN110399461A (en) * 2019-07-19 2019-11-01 腾讯科技(深圳)有限公司 Data processing method, device, server and storage medium
CN110427462A (en) * 2019-08-06 2019-11-08 北京云迹科技有限公司 With method, apparatus, storage medium and the service robot of user interaction
CN111159368A (en) * 2019-12-12 2020-05-15 华南理工大学 Reply generation method for personalized dialogue
CN111564202A (en) * 2020-04-30 2020-08-21 深圳市镜象科技有限公司 Psychological counseling method based on man-machine conversation, psychological counseling terminal and storage medium
CN112765333A (en) * 2021-01-08 2021-05-07 山东师范大学 Automatic dialogue generation method and system based on emotion and prompt word combination
CN112948534A (en) * 2019-12-10 2021-06-11 中兴通讯股份有限公司 Interaction method and system for intelligent man-machine conversation and electronic equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AMIR ALY ET AL.: "A Model for Synthesizing a Combined Verbal and Nonverbal Behavior Based on Personality Traits in Human-Robot Interaction", 2013 8TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 31 December 2013 (2013-12-31) *
YAN JIZHENG: "Research on an Interactive Emotion Model and Control Method for Expression Robots", CHINA DOCTORAL DISSERTATIONS FULL-TEXT DATABASE, 15 August 2016 (2016-08-15), pages 140 - 1 *


Similar Documents

Publication Publication Date Title
US11443742B2 (en) Method and apparatus for determining a dialog state, dialog system, computer device, and storage medium
CN105119812B (en) In the method, apparatus and terminal device of chat interface change emoticon
US10462568B2 (en) Terminal and vehicle control method of mobile terminal using machine learning
CN111612070B (en) Image description generation method and device based on scene graph
CN110909815B (en) Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment
CN107944447B (en) Image classification method and device
CN107133354B (en) Method and device for acquiring image description information
CN111210844B (en) Method, device and equipment for determining speech emotion recognition model and storage medium
CN110503952B (en) Voice processing method and device and electronic equipment
CN110633470A (en) Named entity recognition method, device and storage medium
CN113032627A (en) Video classification method and device, storage medium and terminal equipment
CN110619357B (en) Picture processing method and device and electronic equipment
CN111382748A (en) Image translation method, device and storage medium
CN112559673A (en) Language processing model training method and device, electronic equipment and storage medium
CN112445906A (en) Method and device for generating reply message
CN113377938A (en) Conversation processing method and device
CN113656557A (en) Message reply method, device, storage medium and electronic equipment
CN117424956A (en) Setting item processing method and device, electronic equipment and storage medium
CN111145080B (en) Training method of image generation model, image generation method and device
CN112948565A (en) Man-machine conversation method, device, electronic equipment and storage medium
CN110858099B (en) Candidate word generation method and device
CN109447124B (en) Image classification method, device, electronic equipment and storage medium
CN110895558B (en) Dialogue reply method and related device
CN117992579A (en) Man-machine conversation method, conversation network model training method and device
CN115146633A (en) Keyword identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination