CN117271745A - Information processing method and device, computing equipment and storage medium - Google Patents

Information processing method and device, computing equipment and storage medium Download PDF

Info

Publication number
CN117271745A
CN117271745A (application number CN202311380546.XA)
Authority
CN
China
Prior art keywords
dialogue
training
character
information
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311380546.XA
Other languages
Chinese (zh)
Inventor
陈春全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311380546.XA priority Critical patent/CN117271745A/en
Publication of CN117271745A publication Critical patent/CN117271745A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/0455: Auto-encoder networks; Encoder-decoder networks
    • G06N3/08: Learning methods
    • G06N3/09: Supervised learning


Abstract

The application provides an information processing method and apparatus, a computing device, and a storage medium, relating to natural-language-processing technology in artificial intelligence and capable of realizing a chat conversation function. The method comprises: obtaining interactive content information and character attribute information; obtaining a text feature vector and a character feature vector, and predicting a weight coefficient based on the two vectors, where the weight coefficient represents the degree of association between the dialogue content information to be generated in reply to the interactive content information and the character attribute information; and, according to the predicted weight coefficient, performing dialogue prediction processing on the text feature vector and the character feature vector to obtain the dialogue content information corresponding to the interactive content information. The method and apparatus can flexibly generate personalized chat dialogue content.

Description

Information processing method and device, computing equipment and storage medium
Technical Field
The present disclosure relates to the field of computer applications, and in particular, to an information processing method and apparatus, a computing device, and a storage medium.
Background
A chat robot is a program or system capable of simulating a conversation, and as a tool for human-machine interaction it has been widely used in many fields. It uses natural language processing and artificial intelligence techniques to understand a user's input and generate a corresponding response. Chat robots can be used in a variety of contexts, such as customer service and virtual assistants on various devices. With the continuous development of artificial intelligence technology, the functions and performance of chat robots are also continuously improving.
The core technologies of a chat robot include natural language processing, text generation, dialogue management and the like, and the chat conversation function is realized by training an optimized dialogue model. Because the training data used for the dialogue model is persona-sparse, that is, most dialogues in the training data are unrelated to character personality, chat replies produced by such a dialogue model tend to be flat and generic, which makes the approach inflexible.
Disclosure of Invention
The embodiments of the present application provide an information processing method, an information processing apparatus, a computing device and a storage medium, which can combine interactive content with character attributes to flexibly realize dialogue interaction.
In one aspect, an embodiment of the present application provides an information processing method, including:
acquiring interactive content information and role attribute information;
acquiring text feature vectors and character feature vectors, wherein the text feature vectors are obtained by encoding interactive content information, and the character feature vectors are obtained by encoding character attribute information associated with the interactive content information;
based on the text feature vector and the character feature vector, predicting a weight coefficient, wherein the weight coefficient represents the degree of association between the dialogue content information to be generated for the interactive content information and the character attribute information;
and according to the predicted weight coefficient, performing dialogue prediction processing on the text feature vector and the character feature vector to obtain the dialogue content information corresponding to the interactive content information.
On the other hand, the embodiment of the application also provides an information processing device, which comprises:
the interface unit is used for carrying out information interaction;
the processing unit is used for acquiring the interactive content information and the character attribute information; acquiring a text feature vector and a character feature vector, wherein the text feature vector is obtained by encoding the interactive content information, and the character feature vector is obtained by encoding the character attribute information associated with the interactive content information; predicting a weight coefficient based on the text feature vector and the character feature vector, wherein the weight coefficient represents the degree of association between the dialogue content information to be generated for the interactive content information and the character attribute information; and according to the predicted weight coefficient, performing dialogue prediction processing on the text feature vector and the character feature vector to obtain the dialogue content information corresponding to the interactive content information.
Correspondingly, the embodiment of the application also provides a computing device which comprises an interaction interface, a storage device and a processor; the interaction interface is used for carrying out information interaction; the storage device stores a computer program, and the processor executes the computer program stored in the storage device to realize the corresponding information processing method.
Accordingly, the embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, the computer program comprises program instructions, and when the program instructions are executed by a processor, the program instructions cause a computing device with the processor to realize the corresponding information processing method of the application.
The embodiments of the present application also provide a computer program product comprising a computer program or computer instructions which, when executed by a processor, implement the above-mentioned information processing method.
In the process of generating chat dialogue information, the embodiments of the present application combine the interactive content information, the character attribute information, and a weight coefficient representing the degree of association between the dialogue content information to be generated and the character attribute information, so that dialogue content related to personalized factors such as character attributes can be generated flexibly and to varying degrees.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an information handling system architecture according to an embodiment of the present application;
FIG. 2 is a page schematic diagram of an interactive session page according to an embodiment of the present application;
FIG. 3 is a flow chart of an information processing method according to an embodiment of the present application;
FIG. 4 is a flow chart of a method of obtaining a dialog model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a model structure according to an embodiment of the present application;
FIG. 6 is a flow chart of a method of determining personalized dialog training data according to an embodiment of the present application;
FIG. 7 is a schematic diagram of another model structure according to an embodiment of the present application;
fig. 8 is a schematic structural view of an information processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The chat scheme of the present application may incorporate awareness of personalized information such as character attributes into conversations. In the information input stage of a chat conversation, the system obtains not only the content input by the object, such as voice, text and other interactive content information, but also the character attribute information that is associated with the account initiating the interactive content information and that characterizes the character's personality, such as the object's interests and hobbies. A predicted weight coefficient then determines how strongly the character attribute information influences the reply to be generated for the interactive content information, and the reply is generated by integrating the weight coefficient, the interactive content information and the character attribute information, realizing personalized conversations to varying degrees.
In one possible implementation, the process from inputting the interactive content information and the character attribute information to finally generating the reply may be implemented by a dialogue model. During training, a general dialogue model, also called an intermediate model, is first pre-trained on a large amount of open-domain dialogue data (the first dialogue corpus) to learn basic dialogue generation capability and language structure. It is then fine-tuned on persona-sparse personalized dialogue data (the second dialogue corpus) to construct a personalized dialogue model, i.e. the final dialogue model, which learns character traits and persona-related knowledge while fully utilizing the coherent dialogue generation capability of the general model. The dialogue model also contains an attention routing mechanism: based on features of the dialogue history, the decoder judges whether a persona-related response should be generated, dynamically balancing and controlling the degree to which character personalization information is expressed. This dynamic attention routing mechanism makes full use of the sparse persona dialogue data during training and controls the degree of persona expression when decoding replies.
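The balancing step described above can be sketched as a convex blend of two decoder attention results gated by the predicted weight coefficient. This is a minimal illustrative sketch, not the patented implementation; the function and variable names are assumptions.

```python
def route_attention(context_attn, persona_attn, alpha):
    """Blend a persona-conditioned attention result with a context-only one.

    alpha is the predicted weight coefficient in [0, 1]: alpha = 1 expresses
    the character personality fully, alpha = 0 suppresses it entirely.
    """
    return [alpha * p + (1.0 - alpha) * c
            for c, p in zip(context_attn, persona_attn)]

# alpha = 0.5 mixes the two attention vectors equally
blended = route_attention([1.0, 0.0], [0.0, 1.0], 0.5)
```

Because the blend is linear, the decoder moves smoothly between a persona-neutral reply and a strongly persona-flavoured one as the coefficient varies.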
The present application relates to artificial intelligence (AI): the theory, methods, techniques, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that machines can perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields and involving both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, pre-training model technology, operation/interaction systems, mechatronics, and the like. The pre-training model, also called the large model or foundation model, can be widely applied after fine-tuning to downstream tasks in all major directions of artificial intelligence. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, machine learning/deep learning, and other directions.
The present application relates to natural language processing (Natural Language Processing, NLP), an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing concerns natural language, the language objects use in daily life, and is closely related to linguistics as well as to computer science and mathematics. The pre-training model, an important technique for model training in the artificial intelligence domain, was developed from the large language model (Large Language Model) in the NLP field. Through fine-tuning, large language models can be widely applied to downstream tasks. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robotic question answering, knowledge graph techniques, and the like.
Machine Learning (ML) is very important for dialogue models. ML is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specifically studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence, the fundamental way to give computers intelligence, and is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction. The pre-training model, the latest development of deep learning, integrates these techniques.
The scheme of the present application generates the required dialogue model based on a pre-training and fine-tuning method: a general dialogue model is first pre-trained with a large amount of general dialogue data and then fine-tuned on persona-sparse personalized dialogue data. This combines the general language representation capability of the pre-trained model with character personalization. In another implementation, personalized dialogue content can also be realized with a hybrid of two models: a general dialogue model responsible for dialogues unrelated to character personality, and a character-personalized dialogue model for handling persona-related dialogues. Whether the input dialogue is related to character personality is judged from its content, and the appropriate model is selected to generate the answer. In yet another implementation, personalized replies can be obtained through a condition-generation model: character description information is added to the model input to guide the model toward persona-related answers. For example, a character description tag can be added to the input of a Transformer model so that the model takes character traits into account when generating answers.
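The hybrid-model variant can be sketched as a simple dispatcher. The keyword check below stands in for a learned persona-relevance classifier, and all names are illustrative assumptions rather than the patent's implementation.

```python
def route_to_model(utterance, persona_keywords, general_model, persona_model):
    """Dispatch a dialogue turn to the persona dialogue model when it
    appears persona-related, otherwise to the general dialogue model."""
    if any(keyword in utterance for keyword in persona_keywords):
        return persona_model(utterance)
    return general_model(utterance)

# Toy stand-ins for the two trained dialogue models
general = lambda u: "general reply"
persona = lambda u: "persona-flavoured reply"

reply = route_to_model("What sports do you like?", ["sports", "hobby"],
                       general, persona)
```

In a real system the dispatcher would be a trained classifier over the dialogue history, but the control flow stays the same: classify first, then generate with the selected model.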
The scheme of the application has wide application scenes and can be used in conversation scenes such as chat robots, electronic game roles and the like. The chat robot can be endowed with specific roles and personalized information, so that the chat robot shows unique personalized characteristics when interacting with the object, the participation degree and satisfaction degree of the object can be improved, and the chat experience of the object is enhanced. In an electronic game, a personalized dialogue model is designed for a game character, so that the character presents unique individuality when interacting with a player, and the game character has more realism when interacting with the player, which can enhance the immersion and interestingness of the game.
Referring to fig. 1, an architecture diagram of an information processing system according to an embodiment of the present application includes a server 101 and a computing device 102. Based on this system, on the one hand, an object may experience the chat function through a client installed on the computing device 102. The computing device 102 may be any of various smart devices, for example a smart phone, tablet computer, personal computer, smart wearable device, vehicle-mounted device, or home appliance such as a smart television; it may also be a server with corresponding functions. On the other hand, the computing device 102 may be connected to the server 101 to receive services provided by the server 101, including but not limited to support for the chat function of the computing device and the deployment, update and optimization of the dialogue model used to generate replies. The server 101 may be a single server, a server group formed by a plurality of servers, or a cloud-based server. Cloud technology is a hosting technology that unifies hardware, software, network and other resources in a wide area network or local area network to realize the computation, storage, processing and sharing of data. It is a general term for the network, information, integration, management-platform and application technologies applied under the cloud computing business model, and can form a resource pool used flexibly on demand. Background services of technical network systems, such as video websites, image websites and portals, require large amounts of computing and storage resources, for which cloud computing will be an important support.
With the continued development of the internet industry, each item may in the future carry its own identification mark that must be transmitted to a background system for logical processing; data at different levels will be processed separately, and all kinds of industry data require strong back-end system support, which can only be realized through cloud computing.
The object may initiate a chat session on a page provided by a client of the computing device. The page may be a web page or a dedicated chat user interface (UI). After an object with chat requirements logs in to its account, chat data may be entered on a page such as the one shown in fig. 2. The entered chat data may be text characters or data such as voice, and are uniformly converted into interactive content information in a corresponding format, so that reply content is generated from the interactive content information in combination with the character attribute information set for the chat robot or the character attribute information associated with the account of the object initiating the chat data. In fig. 2, the content replied by the chat robot, i.e. the obtained dialogue content information, is displayed at the position corresponding to the chat robot's account icon 201, and the account icon 202 of the object accompanies the content sent by the object initiating the interactive content information. In one possible implementation, the chat robot may generate reply content through the dialogue model to obtain the dialogue content information and display it at the position corresponding to the chat robot account icon 201 as shown in fig. 2. The interactive content information fed to the chat robot's dialogue model as the original input is not limited to a single sentence; it may include multiple messages the object has exchanged with the chat robot, that is, a dialogue history comprising one or more messages sent by the object and one or more messages generated by the chat robot is input to the dialogue model. For the process of generating and training the chat robot's dialogue model, refer to the description of the following embodiments, in particular figs. 4, 5, 6 and 7 and the corresponding text.
Referring to fig. 3, a flowchart of an information processing method according to an embodiment of the present application may be implemented by the above-mentioned computing device. In some possible implementations it may be implemented by a server, with the computing device responsible for collecting and presenting the corresponding data; in other possible implementations it may be implemented by the computing device and the server cooperatively. The method of the embodiment of the application comprises the following steps.
S301: acquire the interactive content information and the character attribute information. The interactive content information may be obtained from information collected through an interactive session page, and that information may be used directly as the interactive content information; specifically, it may be any one or more of acquired text characters, text characters converted from audio data, and text characters recognized from an image. The term "plurality" as used in the various embodiments of the present application means two or more.
In one possible implementation, the chat information may be obtained through an interactive session page, where the chat information includes one or more of: characters input by the object on the interactive session page, characters converted from audio data received through the interactive session page, and characters recognized from image data received through the interactive session page. The interactive content information is then obtained from the chat information through processing such as deleting simple meaningless words and combining sentences. The interactive content information may be obtained from text characters derived from the most recently acquired chat message, or from text characters derived from several recently acquired chat messages.
For example, the object Zhang San chats with the chat robot on the interactive session page as follows:
Zhang San: Will everyone's graduation thesis defense be postponed?
Chat robot: That depends on the school; some schools will postpone it.
Zhang San: Is it that serious?
Chat robot: I'm not sure either; the students at our school don't seem to be affected.
Zhang San: Then it depends on how University A handles the postponement.
The interactive session information acquired in S301 may be obtained from the most recently acquired chat message, "Then it depends on how University A handles the postponement", for which the chat robot subsequently needs to generate a reply. The interactive session information may also be assembled from several recently acquired chat messages joined with corresponding identifiers, for example: "[bos] Will everyone's graduation thesis defense be postponed? [sep] That depends on the school; some schools will postpone it. [sep] Is it that serious? [eos]" is one piece of interactive session information, where [bos], [sep] and [eos] are special identifiers defined to mark the start, sentence boundaries, and end of the interactive session information.
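The concatenation of recent chat messages with special identifiers can be sketched as follows. The helper name and default token strings are assumptions for illustration, mirroring the [bos]/[sep]/[eos] markers described above.

```python
def build_dialogue_input(history, bos="[bos]", sep="[sep]", eos="[eos]"):
    """Join recent chat messages, oldest first, into one model input
    string delimited by the start/separator/end special identifiers."""
    return bos + sep.join(history) + eos

text = build_dialogue_input([
    "Will everyone's graduation thesis defense be postponed?",
    "That depends on the school; some schools will postpone it.",
    "Is it that serious?",
])
```

The tokenizer of the dialogue model would then map these special identifiers to reserved token ids so the encoder can tell turns apart.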
The character attribute information may be derived from personality-related information set for the chat robot itself: for example, a question-answering chat robot set as female with a fondness for ball sports, or a game character set as male with a fondness for travel.
The character attribute information may also be determined from the account of the object initiating the chat, and mainly includes the object's interests, basic information such as gender and age, and other information capable of expressing personality. This information may be filled in manually by the object, or summarized statistically from the object's dialogue information during chats.
That is, in the embodiments of the present application, the chat robot may have its own personality and generate dialogue content information based on it to complete the chat with the object, or it may generate dialogue content information with reference only to the object's personality to suit the communicating object. During a chat, the object can set, through the interactive session page, the source of the character attribute information to be used, so that the chat robot can hold chat sessions with different character personalities.
It should be noted that, in the present application, related data in the information processing process is referred to, for example: various chat data, interest and hobbies related to character attribute information, identity data (such as gender, age, nickname, region and face information) of an object, and the like, when all embodiments of the application are applied to specific products or technologies, the acquisition of the data needs to obtain permission or consent of the object, and the related data collection, use and processing processes need to comply with related laws and regulations and standards, conform to legal, legal and necessary principles and do not relate to the data types forbidden or limited by the acquired laws and regulations. In some alternative embodiments, the related data in the embodiments of the present application is acquired after the object is individually authorized, for example, by means of a pop-up window or the like, to initiate a prompt to the object about the specific content, use, etc. of the data to be acquired, and wait for the confirmation of the object, after which the individual authorization is considered to be acquired.
The character attribute information associated with the account that initiates the interactive content information may be acquired according to that account; with the object's permission, it is obtained by analyzing one or more of the historical interaction data, account registration data, interaction behavior data, and social relationship data submitted through the account for character attribute analysis. Of course, the character attribute information may also be preset information, for example character attribute information set for a chat robot.
S302: obtaining a text feature vector and a character feature vector, wherein the text feature vector is obtained by encoding the interactive content information, and the character feature vector is obtained by encoding the character attribute information associated with the interactive content information. In the embodiment of the present application, the dialogue content information serving as the reply is generated by a dialogue model based on an encoder-decoder structure. The interactive content information is encoded by the encoder to obtain the text feature vector, and the character attribute information associated with the interactive content information is encoded by the encoder to obtain the character feature vector.
S303: predicting a weight coefficient based on the text feature vector and the character feature vector, wherein the weight coefficient represents the degree of association between the dialogue content information to be obtained (corresponding to the interactive content information) and the character attribute information. The weight coefficient can be predicted by a weight predictor included in the dialogue model, which takes the text feature vector and the character feature vector as input and predicts whether the dialogue is related to character personality information. The larger the weight coefficient, the greater the role the character feature vector plays in the decoding process and the more strongly the character personality information is expressed, finally generating a reply related to the character's personality; conversely, the smaller the weight coefficient, the smaller the role of the character feature vector in decoding, and the finally generated reply may be dialogue content information without personality characteristics or emotional color.
S304: performing dialogue prediction processing on the text feature vector and the character feature vector according to the predicted weight coefficient, to obtain the dialogue content information corresponding to the interactive content information. That is, the dialogue model takes the weight coefficient, the text feature vector, and the character feature vector as inputs, and outputs the corresponding dialogue content information.
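The S301 to S304 flow can be sketched end to end as follows. This is a minimal illustrative sketch, not the actual model: encode_text, encode_persona, predict_weight, and decode are hypothetical toy stand-ins for the dialogue model's encoder, weight predictor, and decoder, chosen only to show how the weight coefficient ties the three stages together.

```python
# Hypothetical sketch of the S301-S304 inference flow. All function
# bodies are toy stand-ins; the real model uses a Transformer
# encoder, a neural weight predictor, and an autoregressive decoder.

def encode_text(text):
    # S302 stand-in "encoder": map each character to a small number.
    return [ord(c) % 7 for c in text]

def encode_persona(persona):
    # S302 stand-in encoding of the character attribute information.
    return [ord(c) % 5 for c in persona]

def predict_weight(text_vec, persona_vec):
    # S303 stand-in weight predictor: overlap of the two feature
    # vectors squashed into [0, 1] (real model: classifier confidence).
    overlap = len(set(text_vec) & set(persona_vec))
    return min(1.0, overlap / 5.0)

def decode(alpha, text_vec, persona_vec):
    # S304 stand-in decoder: blend the two representations with the
    # predicted weight coefficient.
    return [alpha * p + (1 - alpha) * t
            for t, p in zip(text_vec, persona_vec)]

text_vec = encode_text("nice weather today")        # S302
persona_vec = encode_persona("likes hiking trips")  # S302
alpha = predict_weight(text_vec, persona_vec)       # S303
reply_repr = decode(alpha, text_vec, persona_vec)   # S304
```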
In generating chat conversation information, the embodiment of the present application combines the interactive content information, the character attribute information, and a weight coefficient representing the degree of association between the dialogue content information to be obtained and the character attribute information, so that dialogue content related to personalized factors such as character attributes can be generated flexibly and to varying degrees.
In addition, in some possible implementations, clients of the chat application may select different interactive information processing manners according to actual needs. Before S301, the method may further include: detecting the interaction scene type of the account initiating the interactive content information; if the interaction scene type is detected to be a first type, acquiring the interactive content information and generating dialogue feedback information according to the interactive content information; and if the interaction scene type is detected to be a second type, triggering the acquisition of the interactive content information and the character attribute information. That is, during the chat the object may select the manner in which reply information is generated as needed: either generating dialogue content information as the chat reply in the manner of S301 to S304 described in the present application, or generating dialogue content information in a common manner. A selection control can be arranged on the interactive session page: if the interaction scene type selected by the object through the control is the first type, the dialogue feedback information is generated only from the interactive content information, and chat feedback can be completed in a conventional processing manner; if the interaction scene type selected through the control is the second type, steps S301 to S304 described above are performed.
In one possible implementation, whether to acquire the character attribute information may be determined by identifying whether a reporting mark set on the interactive session page is in an on state, and the reporting mark may also serve as the selection control for determining the interaction scene type. If the object turns the reporting mark on, the reporting mark is in the on state, indicating that the object allows its character attribute information to be reported; the interaction scene type at this time is the second type, so the character attribute information of the object can be obtained. If the object does not turn the reporting mark on, the reporting mark is in the off state, indicating that the object does not allow its character attribute information to be reported; the interaction scene type at this time is the first type. In addition, the object can switch between the first type and the second type at any time during the chat via the selection control or the reporting mark, so that the whole chat process can be switched between personalized and non-personalized dialogue replies as required.
The selection of interaction scene types provides the object with a more flexible chat mode. An object may select the first type not only because it is unwilling to report character attribute information, but also because the chat is a formal one, for example a knowledge question-and-answer session, or using the chat robot as a knowledge retrieval tool. In such scenarios personalized chat replies are unnecessary, so the object can manually select the first type as the interaction scene type, or turn off the reporting mark.
Of course, the computing device may also determine the current interaction scene type by analyzing the chat information already input, thereby selecting the interaction scene type automatically, without requiring the object to operate the selection control or the reporting mark manually.
For the above-mentioned dialogue model: in general, the dialogue content information corresponding to the interactive content information is generated by the dialogue model; the text feature vector and the character feature vector are generated by the encoder in the dialogue model, whose inputs include an input vector obtained from the interactive content information and an input vector obtained from the character attribute information; the weight coefficient is predicted by the weight predictor in the dialogue model, whose inputs include the text feature vector and the character feature vector; and the dialogue content information is obtained from the output of the decoder in the dialogue model, whose inputs are the weight coefficient, the text feature vector, and the character feature vector. The process of generating the dialogue model according to the present application is described below.
Referring to fig. 4, a flowchart of a method for obtaining a dialogue model according to an embodiment of the present application, which may be implemented by a server or a computing device. In the embodiment of the present application, a general dialogue model, namely an intermediate model, is first pre-trained on a large amount of general dialogue data, and is then fine-tuned on character-sparse personalized dialogue data to obtain the final dialogue model, so that the dialogue model combines the general language representation capability of the pre-trained model with character personalization characteristics. The method comprises the following steps.
S401: acquiring first dialogue corpora, and generating a first dialogue training data set according to each set of first dialogue corpora. In S401, various social media platforms can serve as data sources, and a large number of open-domain dialogue corpora can be captured by web crawlers and other means. These corpora do not emphasize character personalization information and need not embody character attributes such as interests and hobbies. In some possible implementations, the collected raw dialogue corpus data may be preprocessed and cleaned to improve data quality. The preprocessing steps include: removing irrelevant information such as links, HTML (Hypertext Markup Language) tags, and advertisements; unifying the case; and removing duplicate, meaningless, or low-quality dialogues. Dialogues between three or more objects are filtered out, and only dialogues between two objects are retained as training data. The purpose of the preprocessing is to normalize the data so that the corresponding training data can be determined later.
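The cleaning steps above can be sketched as follows. This is an illustrative sketch under an assumed representation (each dialogue is a list of (speaker, text) turns); the tag/link patterns and the function name preprocess are hypothetical, not from the source.

```python
import re

# Illustrative sketch of the S401 preprocessing: strip links and HTML
# tags, unify case, drop duplicates, and keep only two-speaker
# dialogues. Assumed input: a list of dialogues, each a list of
# (speaker, text) turns.
def preprocess(dialogues):
    cleaned, seen = [], set()
    for turns in dialogues:
        # Keep only dialogues between exactly two objects.
        if len({spk for spk, _ in turns}) != 2:
            continue
        norm = []
        for spk, text in turns:
            text = re.sub(r"<[^>]+>", "", text)       # strip HTML tags
            text = re.sub(r"https?://\S+", "", text)  # strip links
            text = text.lower().strip()               # unify case
            if text:
                norm.append((spk, text))
        key = tuple(t for _, t in norm)
        if norm and key not in seen:                  # remove duplicates
            seen.add(key)
            cleaned.append(norm)
    return cleaned
```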
In one possible implementation, the first dialogue training data in the first dialogue training data set includes a text sequence pair consisting of a first training text sequence and a first supervision text sequence obtained from a set of first dialogue corpora. The acquisition of the first dialogue training data set is described below taking a first target dialogue corpus among the first dialogue corpora as an example. The first target dialogue corpus comprises texts corresponding to N rounds of dialogue, where N is an integer greater than or equal to 2. Generating the first dialogue training data set from each set of first dialogue corpora includes: splicing the first N-1 texts of the N rounds of dialogue included in the first target dialogue corpus with splicing marks to obtain a first spliced character sequence; adding a start mark at the beginning of the first spliced character sequence and an end mark at its end to obtain a first input text sequence corresponding to the first target dialogue corpus; and encoding the first input text sequence to obtain the first training text sequence. A first output text sequence is obtained from the last text of the N rounds of dialogue included in the first target dialogue corpus, and is encoded to obtain the first supervision text sequence corresponding to the first target dialogue corpus. In an alternative implementation, the encoding mentioned here includes word embedding and position encoding.
For example, the following multi-round dialogue data between speaker A and speaker B may be regarded as a set of first dialogue corpora. This example includes N = 4 rounds of dialogue, i.e., N = 4 texts.
(1) Speaker a: do the family go beyond the question of a graduated paper?
(2) Speaker B: this looks at school bars, and some schools will.
(3) Speaker a: this is so serious.
(4) Speaker B: i are also less clear and the students of our school do not seem to be.
For this set of first dialogue corpora, the multi-round dialogue data is further preprocessed to convert it into a form acceptable to the dialogue model. First, a special symbol "bos" is added at the beginning of the dialogue history (i.e., the three texts (1)(2)(3)) to mark the beginning of the sentence, the rounds of the dialogue history are spliced together with the special symbol "sep", and a special symbol "eos" is added at the end of the dialogue history to mark its end, yielding the first input text sequence. Likewise, a special symbol "bos" is added at the beginning of the reply (i.e., text (4)) and a special symbol "eos" at its end, yielding the first output text sequence. The training data of the Transformer model is a sentence pair comprising the dialogue history and the corresponding reply, as shown below: x1 represents the input text, corresponding to the first input text sequence mentioned above, and y1 represents the output text, corresponding to the first output text sequence mentioned above.
x1 = [bos] Will anyone graduate late because they fail the thesis defense? [sep] It depends on the school; some schools will. [sep] Is it that serious? [eos]
y1 = [bos] I'm not sure either; the students at our school don't seem to. [eos]
For x1 and y1, the dialogue text is further segmented into tokens and indexed, and then used as input to the dialogue model. In order for the Transformer model to learn the positional information of words in a sentence, positional encodings must be added to the input data. As shown below, the first training text sequence X1 obtained from x1 is the input of the encoder, the first supervision text sequence Y1 obtained from y1 is the input of the decoder, and Y1' is the target sequence output by the decoder. The input data of the decoder consists of all words of the output text except the last word 'eos', and the corresponding label data consists of all words of the output text except the first word 'bos'.
On the basis of the first input text sequence, word embedding (Word Embedding on the left side in fig. 5) and position encoding (Position Embedding on the left side in fig. 5) are performed to obtain the first training text sequence; similarly, word embedding (Word Embedding on the right side in fig. 5) and position encoding (Position Embedding on the right side in fig. 5) are performed on the first output text sequence to obtain the first supervision text sequence, shown schematically below.
X1 = ['bos', 'will', 'anyone', 'graduate', 'late', 'because', 'they', 'fail', 'the', 'thesis', 'defense', '?', 'sep', 'it', 'depends', 'on', 'the', 'school', ';', 'some', 'schools', 'will', '.', 'sep', 'is', 'it', 'that', 'serious', '?', 'eos']
Y1 = ['bos', 'i', "'m", 'not', 'sure', 'either', ';', 'the', 'students', 'at', 'our', 'school', 'do', "n't", 'seem', 'to', '.']
Y1' = ['i', "'m", 'not', 'sure', 'either', ';', 'the', 'students', 'at', 'our', 'school', 'do', "n't", 'seem', 'to', '.', 'eos']
After the first training text sequence X1 and the first supervision text sequence Y1 are obtained, the following S402 may be executed to train the initial model.
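The construction of the (x1, y1) pair described above can be sketched as a small helper. This is an illustrative sketch; the function name build_pair is an assumption, and the bracketed token names follow the example in the text.

```python
# Sketch of first-dialogue-training-pair construction: the first N-1
# turns are joined with [sep] and wrapped with [bos]/[eos] to form the
# input text, and the last turn becomes the supervised output text.
def build_pair(turns):
    assert len(turns) >= 2, "need at least N = 2 rounds of dialogue"
    history, reply = turns[:-1], turns[-1]
    x = "[bos] " + " [sep] ".join(history) + " [eos]"
    y = "[bos] " + reply + " [eos]"
    return x, y
```

Tokenization, indexing, word embedding, and position encoding would then be applied to x and y to obtain X1 and Y1.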
S402: training the initial model with the first dialogue training data set to obtain an intermediate model. That is, a universal dialogue model is obtained by pre-training, and the dialogue model is then obtained by further training on that basis.
As above, the dialogue model adopts the Transformer model structure, which is an encoder-decoder structure based on the self-attention mechanism. Thus, the initial model is also an encoder-decoder structure based on self-attention, wherein the encoder comprises a plurality of feature processing layers, each comprising a multi-head self-attention sub-layer and a feed-forward neural network sub-layer, each sub-layer being followed by a residual connection and layer normalization; the decoder comprises a plurality of feature processing layers, each comprising a multi-head self-attention sub-layer, an encoder-decoder attention sub-layer, and a feed-forward neural network sub-layer, each sub-layer likewise being followed by a residual connection and layer normalization, wherein the multi-head self-attention sub-layer in the decoder performs position masking through a mask.
That is, the encoder is stacked from multiple identical layers, each layer comprising two sub-layers: a multi-head self-attention mechanism (Multi-Head Self-Attention) and a feed-forward neural network (Feed-Forward Neural Network). Furthermore, each sub-layer is followed by a residual connection (Residual Connection) and layer normalization (Layer Normalization). The multi-head self-attention mechanism calculates the degree of association between each word and the other words in the input sequence, thereby capturing long-distance dependencies in sentences, and the multi-head mechanism allows the model to attend to information at different positions simultaneously. The feed-forward neural network is used to extract local features of the input sequence and typically comprises two fully connected layers and an activation function. The encoder is responsible for encoding the input sentence into a continuous vector representation, capturing the semantic and structural information in the input text.
The decoder is also stacked from a plurality of identical layers, each layer comprising three sub-layers: a multi-head self-attention mechanism, an encoder-decoder attention mechanism (Encoder-Decoder Attention), and a feed-forward neural network. As in the encoder, each sub-layer is followed by a residual connection and layer normalization. The multi-head self-attention mechanism of the decoder is similar to that in the encoder and captures long-range dependencies in the target sequence (i.e., the output sequence). To prevent the decoder from seeing subsequent words in advance when generating the t-th word, a mask (Mask) is required to shield the information at subsequent positions. The encoder-decoder attention mechanism calculates the degree of association between each word in the decoder and the words in the encoder input sequence, thereby capturing the association and correspondence between the target sequence and the input sequence. As in the encoder, the feed-forward neural network extracts local features of the target sequence. The decoder is responsible for generating the next word from the output of the encoder and the partial target sequence already generated. The Transformer model has strong modeling capability and good scalability, and parallelizes computation well; in one possible implementation, the model structure of the initial model is shown in fig. 5.
It should be noted that the input of the decoder during training is Y1 and the corresponding output of the decoder is Y1', the two being offset by one position. For example, the input token y1_0 corresponds to the output token y1_1, the input y1_1 corresponds to the output y1_2, and so on.
When training the dialogue model, a cross-entropy loss function can be used to measure the difference between the target sequence Y1' generated by the initial model during training and the first supervision text sequence Y1, and the model parameters of the initial model are updated by minimizing this loss function to obtain the general dialogue model, namely the intermediate model. On the basis of a first dialogue training data set comprising a large number of text sequence pairs obtained from a massive (billion-scale) open-domain dialogue corpus, the initial model with the Transformer structure is pre-trained with the first dialogue training data set, yielding the general dialogue model, i.e., the above-mentioned intermediate model. Owing to the huge volume of open-domain dialogue data, the pre-trained general dialogue model has learned basic dialogue generation capability and language structure, possesses good human-machine interaction and dialogue capability, and can generate fluent replies that conform to the context.
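The one-position offset and the cross-entropy objective can be sketched as follows. This is an illustrative sketch with toy numbers; the function names are assumptions, and the real model averages this loss over the whole first dialogue training data set.

```python
import math

# Sketch of the teacher-forcing setup and cross-entropy loss used in
# pre-training.
def shift_targets(output_tokens):
    # Decoder input drops the final 'eos'; labels drop the leading
    # 'bos', so each input position predicts the next token.
    return output_tokens[:-1], output_tokens[1:]

def cross_entropy(prob_rows, label_ids):
    # Mean negative log-likelihood of the reference token at each
    # position; prob_rows[i] is the model's probability distribution
    # at position i, label_ids[i] the index of the reference token.
    return -sum(math.log(row[t])
                for row, t in zip(prob_rows, label_ids)) / len(label_ids)

dec_in, labels = shift_targets(["bos", "i", "am", "not", "sure", "eos"])
```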
S403: acquiring second dialogue corpora that reflect character attribute information, and generating a second dialogue training data set according to each set of second dialogue corpora. Actual dialogue data is character-sparse: in daily dialogues, speakers will in most cases not show their personality, and only a small part of the dialogue is personality-related. In real conversations, most dialogues are indeed independent of character personality, because objects tend to focus more on the topics and content of the conversation in daily communication than on the personality and characteristics of the participants. For example, when two people discuss purchasing goods in a store, their dialogue content is in many cases independent of their personalities and focuses instead on the price, quality, and style of the goods. On the other hand, dialogues related to character personality also exist; they generally involve interpersonal relationships, emotional expression, personal views, and the like, and are more profound and individualized. For example, in a conversation between two friends, they may discuss their interests, personal preferences, and family backgrounds, which are related to their personalities and experiences and have richer, more personalized features.
In one possible implementation, fig. 6 is a flowchart of a method for determining personalized dialogue training data according to an embodiment of the present application. A personalized dialogue data set containing character information may be constructed by crawling. First, in S601, data sources are determined: movie and television scripts typically contain rich character descriptions and character dialogues, novels and story texts also typically contain personality-related information and dialogues, and character accounts on social media and online forums may have unique personalities and dialogue styles. Movie and television scripts, novels and story texts, and social media and online forums can therefore serve as data sources for personalized dialogue training data. In S602, data is crawled or collected from the data sources determined in S601; dialogue data and character description information can be captured from these sources by a crawler or by manual collection. When organizing the data, it is necessary to ensure that the dialogue content and the character description information match each other. In S603, the captured or collected data is preprocessed, including text cleaning: removing irrelevant information such as HTML tags and advertisements, unifying the case, removing duplicate dialogue data, and so on. In S604, the data is formatted and the second dialogue training data set is obtained from the formatted data; the preprocessed data is arranged into a format suitable for model training, typically comprising an input (dialogue content and character description information) and an output (the reply corresponding to the input dialogue). Finally, in S605, whether the reply is related to character personalization information is determined by manual labeling or keyword matching.
If the reply is related to character personality, it is marked as 1; if the reply is independent of character personality, it is marked as 0.
In general, the second dialogue training data in the second dialogue training data set includes: character attribute training information, a second training text sequence and a second supervision text sequence obtained from a set of second dialogue corpora, and a character-related label indicating whether the output of the pre-training model obtained from the intermediate model is related to the character attribute training information. The value of the character-related label, specifically 1 or 0, is determined by the manual labeling or keyword matching mentioned above in S605. Keyword matching means presetting several keywords considered to reflect character personality and then matching them against the obtained second dialogue corpus: if the corpus contains one or more of the preset keywords, the character-related label is set to R = 1; otherwise R = 0.
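The keyword-matching rule for the character-related label can be sketched as follows. The keyword list here is an illustrative assumption; in practice the preset keywords would be curated to reflect character personality.

```python
# Sketch of the S605 keyword-matching rule: if the reply contains any
# preset personality keyword, the character-related label R = 1,
# otherwise R = 0. The keyword set below is a hypothetical example.
PERSONA_KEYWORDS = {"hiking", "favorite", "hobby", "only child", "girl"}

def role_label(reply):
    text = reply.lower()
    return 1 if any(k in text for k in PERSONA_KEYWORDS) else 0
```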
In one possible implementation, for a second target dialogue corpus among the second dialogue corpora, the second target dialogue corpus includes texts corresponding to M rounds of dialogue, where M is an integer greater than or equal to 2. Generating the second dialogue training data set from the sets of second dialogue corpora includes: acquiring character attribute training information associated with the account corresponding to the second target dialogue corpus (the character attribute training information can be obtained by encoding collected character personalization information such as hobbies and gender); splicing the first M-1 texts of the M rounds of dialogue included in the second target dialogue corpus with splicing marks to obtain a second spliced character sequence, adding a start mark at its beginning and an end mark at its end to obtain a second input text sequence, and encoding the second input text sequence to obtain a second training text sequence; obtaining a second output text sequence from the last text of the M rounds of dialogue included in the second target dialogue corpus, and encoding it to obtain the second supervision text sequence corresponding to the second target dialogue corpus; and setting a character-related label indicating whether the output of the pre-training model obtained from the intermediate model is related to the character attribute training information.
Combining the description of fig. 6, an example of personalized dialogue data containing character information is illustrated below. p represents the character personalization information of the replier, which is used to obtain the character attribute training information P; x2 represents the dialogue history, namely the second input text sequence, which is used to obtain the second training text sequence X2; y2 represents the corresponding reply, namely the second output text sequence, which is used to obtain the second supervision text sequence Y2; and R indicates whether the reply is related to the character personalization information, namely the character-related label.
p = [bos] My name is Sophie. [sep] I am a girl. [sep] I am an only child. [sep] I like mountain climbing and hiking. [sep] My favorite song is "Daoxiang". [eos]
x2 = [bos] Are you resting today? [sep] I'm free today with nothing to do. [sep] The weather is very cool today! [eos]
y2 = [bos] Yes, it's a great day to go hiking! [eos]
R = 1, i.e., the reply is related to character personality.
Based on the above p, x2, and y2, after encoding processing such as word embedding (Word Embedding in fig. 7) and position encoding (Position Embedding in fig. 7), P, X2, and Y2 can be obtained correspondingly, shown schematically below.
P = ['bos', 'my', 'name', 'is', 'Sophie', '.', 'sep', 'i', 'am', 'a', 'girl', '.', 'sep', 'i', 'am', 'an', 'only', 'child', '.', 'sep', 'i', 'like', 'mountain', 'climbing', 'and', 'hiking', '.', 'sep', 'my', 'favorite', 'song', 'is', 'Daoxiang', '.', 'eos']
X2 = ['bos', 'are', 'you', 'resting', 'today', '?', 'sep', 'i', "'m", 'free', 'today', 'with', 'nothing', 'to', 'do', '.', 'sep', 'the', 'weather', 'is', 'very', 'cool', 'today', '!', 'eos']
Y2 = ['bos', 'yes', ',', 'it', "'s", 'a', 'great', 'day', 'to', 'go', 'hiking', '!', 'eos']
R = 1 or 0; the character-related label indicates whether the reply is related to character personality, and can be understood as the supervision data for the real weight coefficient.
It should be noted that the character attribute training information characterizes the personality information of the object corresponding to the determined second input text sequence; that is, the character attribute training information used during training is the character personalization information of the replier. For example, p characterizes the personalization information of the object that initiates y2.
According to the P, X2, Y2, and R corresponding to a large number of second target dialogue corpora, the second dialogue training data set can be obtained, after which the next stage of training can be performed.
S404: training the pre-training model obtained from the intermediate model with the second dialogue training data set to obtain the dialogue model. In the embodiment of the present application, the obtained intermediate model is adjusted so as to learn personalization-related knowledge. In one possible implementation, a weight predictor for predicting the weight coefficient is added to the trained intermediate model, and the encoder-decoder attention sub-layer in the trained intermediate model is adjusted into an attention-routing sub-layer, thereby obtaining the pre-training model, wherein the input of the attention-routing sub-layer comprises the output of the weight predictor, the output of the encoder, and the output of the sub-layer preceding the attention-routing sub-layer in the decoder. Please refer to fig. 7 for the structure of the pre-training model obtained by adjusting the intermediate model.
The pre-training model corresponding to the pre-trained universal dialogue model is fine-tuned on the second dialogue training data set obtained from the collected character-sparse dialogue data, so as to finally obtain a dialogue model capable of personalized perception. The second dialogue training data of the second dialogue training data set includes: the character attribute training information P, specifically personalization information that describes the character in several sentences; X2, the second training text sequence corresponding to the dialogue history; Y2, the second supervision text sequence corresponding to the reply; and R, the character-related label indicating whether the reply is related to the character personalization information. The encoder of the pre-training model encodes X2 into the training text feature vector E_X, and encodes the replier's P into the training character feature vector E_P. E_X and E_P are input to the decoder, which decodes the output sequence Y2' in an autoregressive manner.
In one possible implementation, after the pre-training model and the related second dialogue training data are obtained, training the pre-training model obtained from the intermediate model with the second dialogue training data set to obtain the dialogue model includes: taking the second dialogue training data set as input to the pre-training model obtained from the intermediate model, and modifying the model parameters of the pre-training model using the loss calculation results of a first loss function and a second loss function, so as to train the dialogue model. The first loss function determines the difference between the output of the weight predictor in the pre-training model and the character-related label. The second loss function determines the difference between the second supervision text sequence and the sequence obtained by autoregressive decoding of the training text feature vector and the training character feature vector output by the encoder of the pre-training model. The training text feature vector, namely E_X, is obtained by the encoder of the pre-training model processing the second training text sequence obtained from a set of second dialogue corpora; the training character feature vector, namely E_P, is obtained by the encoder of the pre-training model processing the character attribute training information.
In one possible implementation manner, in order to fully utilize the sparse personalized dialogue corpus of characters, the following observation is used: when the initial model is trained, the training samples focus more on dialogue topics and content, and the personalized information of characters is rarely or never involved in the decoding process. Therefore, when training the pre-training model, the dialogue data related to characters is modeled so that the decoding process contains a large number of character personalized features. The method proposed by the present application designs an attention routing mechanism in the decoder that, based on E_X derived from the dialogue history, judges whether a response related to character personality should be generated, so as to control the role that the training character feature vector E_P plays in the decoding process, thereby dynamically weighting and controlling the expression degree of the character personalization information.
Extending the original attention mechanism, the output E_y of the previous layer of the decoder is selected as the query of the attention mechanism, and E_X and E_P are modeled separately; each set of attention is called an attention route, as shown below.
Ox = MultiHead(E_y, E_X, E_X)
Op = MultiHead(E_y, E_P, E_P)
As shown schematically in FIG. 7, E_y is the layer-normalized output of the previous layer in the decoder on the right. Based on Ox and Op, a weight coefficient α ∈ [0, 1] is used to combine Ox and Op: the larger the weight coefficient α, the greater the role the character vector plays in the decoding process and the greater the degree to which the character personalization information is expressed, so that a reply related to the character's personality is generated; conversely, a smaller weight coefficient α indicates that the character vector representation plays a smaller role in the decoding process.
Omerge = α·Op + (1 − α)·Ox
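As an illustrative sketch (not the actual implementation), the two attention routes and their weighted merge can be written in NumPy, with single-head scaled dot-product attention standing in for the multi-head attention named above; the dimensions and the helper names are assumptions for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = query.shape[-1]
    return softmax(query @ key.T / np.sqrt(d)) @ value

def attention_route(E_y, E_X, E_P, alpha):
    # Two attention routes sharing the same query E_y:
    #   Ox attends over the encoded dialogue history E_X,
    #   Op attends over the encoded character information E_P.
    O_x = attention(E_y, E_X, E_X)
    O_p = attention(E_y, E_P, E_P)
    # Omerge = alpha * Op + (1 - alpha) * Ox
    return alpha * O_p + (1 - alpha) * O_x

rng = np.random.default_rng(0)
E_y = rng.standard_normal((4, 16))   # output of the previous decoder sublayer
E_X = rng.standard_normal((10, 16))  # training text feature vectors (history)
E_P = rng.standard_normal((6, 16))   # training character feature vectors (persona)

merged = attention_route(E_y, E_X, E_P, alpha=0.8)
print(merged.shape)  # (4, 16)
```

With α = 0 the merge reduces to the history route Ox alone; with α = 1 only the persona route Op contributes, matching the combination formula above.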
The value of the weight coefficient α is determined according to whether the dialogue is related to character personality information. The present application uses a neural network module as a weight predictor, which automatically calculates the weight coefficient α during training. The neural network module can be modeled as a binary classifier P_θ(r|E_X, E_P) that takes E_X and E_P as input and predicts whether the dialogue is related to character personality information: r = 1 indicates that the dialogue is related to the character information, and r = 0 indicates that the dialogue is independent of the character information. The confidence of the binary classifier's prediction is taken as the weight coefficient α.
α = P_θ(r = 1 | E_X, E_P)
θ represents the trainable parameters of the neural network module. R indicates whether the dialogue is related to character personalization information and serves as the true weight parameter. A cross-entropy loss function is used to measure the difference between the binary classifier's predicted output and the true weight parameter. The expression of L_1(θ) below is the first loss function, used to determine the difference between the output of the weight predictor in the pre-training model and the character-related label; in it, P_θ(r = 1 | E_X, E_P) represents the predicted output of the weight predictor (the binary classifier), and R is the true weight parameter.
L_1(θ) = −[R·log(P_θ(r = 1 | E_X, E_P)) + (1 − R)·log(1 − P_θ(r = 1 | E_X, E_P))]
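A minimal numeric sketch of this first loss function, with the predicted weight coefficient α and the label R passed in directly; the function name and the example values are illustrative only:

```python
import numpy as np

def weight_predictor_loss(alpha_pred, R):
    """Cross-entropy between the predicted weight alpha = P_theta(r=1|E_X, E_P)
    and the true role-related label R (1: persona-related, 0: unrelated)."""
    eps = 1e-12  # guard against log(0)
    return -(R * np.log(alpha_pred + eps) + (1 - R) * np.log(1 - alpha_pred + eps))

# Confident, correct prediction (alpha=0.9 on a persona-related dialogue): small loss
print(weight_predictor_loss(0.9, 1))
# Same prediction when the dialogue is actually unrelated (R=0): large loss
print(weight_predictor_loss(0.9, 0))
```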
In addition, during the fine-tuning process, a loss calculation result is determined from the sequence obtained by decoding E_X and E_P in an autoregressive manner (i.e., the output of the pre-training model) and the second supervision text sequence Y2 = {y_0, y_1, y_2, ..., y_n}, so as to fine-tune the pre-training model. In the fine-tuning process, the loss function for the predicted reply is:
L_2(φ) = −Σ_i log P_φ(y_i | y_0, ..., y_{i−1}, E_X, E_P)

wherein φ represents the trainable model parameters of the dialogue model, initialized from the parameters of the pre-training model. The expression of L_2(φ) above is the second loss function; i corresponds to the position of a word in the second supervision text sequence. Taking the Y2 example above, which contains n = 8 tokens including the bos and eos marks, i is at most n = 8.
In general, the overall loss function during the fine-tuning process is:

L(θ, φ) = L_2(φ) + λ·L_1(θ)
where λ is a hyperparameter that balances the two losses; for example, λ may take a value between 0.5 and 2. In the fine-tuning process, the model parameters of the pre-training model are updated by minimizing this loss function, i.e., by updating the θ and φ described above, so that the required dialogue model is finally obtained.
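The combined fine-tuning objective can be illustrated with a small NumPy sketch; the helper names and the per-token probabilities are assumptions for the example, standing in for the model's actual outputs:

```python
import numpy as np

def weight_predictor_loss(alpha_pred, R):
    # L1: cross-entropy between the predicted weight and the role-related label R
    return -(R * np.log(alpha_pred) + (1 - R) * np.log(1 - alpha_pred))

def reply_nll(token_probs):
    # L2: negative log-likelihood of the supervised reply tokens y_i
    return -np.sum(np.log(token_probs))

def finetune_loss(token_probs, alpha_pred, R, lam=1.0):
    # Combined objective: L = L2 + lambda * L1, with lambda balancing the two
    return reply_nll(token_probs) + lam * weight_predictor_loss(alpha_pred, R)

# Probabilities the model assigns to each ground-truth reply token (illustrative)
probs = np.array([0.8, 0.6, 0.9])
loss = finetune_loss(probs, alpha_pred=0.7, R=1, lam=1.0)
print(loss)
```

Setting λ = 0 recovers the pure reply loss L_2, which shows how λ trades off reply fidelity against the weight predictor's classification accuracy.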
It should be noted that, in other embodiments of the present application, the first dialogue training data included in the first dialogue training data set may also be data before the above-mentioned word encoding and position encoding. In that case, the word-encoding and position-encoding process may serve as the data processing part of the initial model or the pre-training model; for example, the initial model and the pre-training model each include corresponding sublayers for performing word encoding and position encoding, and the relevant model parameters of those sublayers may also be optimized during training.
In addition, the softmax layer in fig. 5 and 7 is an output layer of the model, outputting the corresponding probability.
The method for generating a dialogue described above can realize personalized perception. Real dialogue data is sparse: in daily real dialogues, most dialogues are independent of character personality, and only a small part are related to it. Training a personalized dialogue model directly on the sparse character dialogue data is likely to cause the model to focus on the majority of dialogue data unrelated to character personality, while ignoring as noise the small amount of dialogue that is related to it. The method therefore first pre-trains a universal dialogue model on a large amount of open-domain dialogue corpora, helping the model learn basic dialogue generation capability and language structure. Then, personalized dialogue data is acquired from data sources selected in a targeted manner, and the model is further fine-tuned on this data to construct a personalized dialogue model, learning character-related characteristics and knowledge while fully utilizing the coherent dialogue generation capability of the universal dialogue model. The method combines the universal language representation capability of the pre-training model with character personalization, avoids maintaining two independent models, and reduces computation and storage costs. In addition, the present application provides an attention routing mechanism in the dialogue model: the decoder judges, according to the dialogue history, whether a response related to the character's personality should be generated, so that the expression degree of the character personalization information is dynamically weighted and controlled. This dynamic attention routing mechanism makes it possible to fully exploit the sparse dialogue data of the character during training, and to control the degree to which character personalization information is expressed when decoding to generate replies.
Referring to fig. 8 again, a schematic structural diagram of an information processing apparatus according to an embodiment of the present application is shown. The apparatus of the embodiment of the present application at least includes an interface unit 801 and a processing unit 802. The specific uses of each unit are as follows.
An interface unit 801 for performing information interaction; a processing unit 802, configured to obtain interactive content information and character attribute information; acquiring text feature vectors and character feature vectors, wherein the text feature vectors are obtained by encoding interactive content information, and the character feature vectors are obtained by encoding character attribute information associated with the interactive content information; based on the text feature vector and the character feature vector, predicting to obtain a weight coefficient, wherein the weight coefficient is used for representing dialogue content information corresponding to the interaction content information to be obtained and the association degree of the dialogue content information and the character attribute information; and according to the predicted weight coefficient, performing dialogue prediction processing on the text feature vector and the character feature vector to obtain dialogue content information corresponding to the interaction content information.
In an optional implementation, the processing unit 802 is further configured to detect an interaction scenario type of an account initiating the interaction content information; if the interaction scene type is detected to be the first type, acquiring interaction content information, and generating dialogue feedback information according to the interaction content information; and if the interaction scene type is detected to be the second type, acquiring interaction content information and role attribute information.
In an alternative implementation, the processing unit 802, when configured to acquire the interactive content information and the character attribute information, is configured to acquire chat information through the interactive session page, where the chat information includes: one or more of characters input by the object and obtained through the interactive session page, characters obtained by converting audio data received through the interactive session page, and characters obtained by identifying image data received through the interactive session page; obtaining interactive content information according to the chat information; and acquiring character attribute information, wherein the character attribute information is associated with an account initiating the interactive content information, and the character attribute information is obtained by analyzing and processing one or more of historical interaction data, account registration data, interaction behavior data and social relationship data submitted through the account for character attribute analysis, or is preset.
In an alternative implementation, the dialogue content information corresponding to the interaction content information is generated through a dialogue model; the apparatus further comprises: the training unit 803 is configured to obtain a first dialogue corpus, and generate a first dialogue training data set according to each group of the first dialogue corpus; training the initial model by using the first dialogue training data set to obtain an intermediate model; acquiring second dialogue corpus used for reflecting character attribute information, and generating a second dialogue training data set according to each group of second dialogue corpus; training the pre-training model obtained according to the intermediate model by using the second dialogue training data set to obtain a dialogue model.
In an alternative implementation, the first dialog training data in the first set of dialog training data includes: a text sequence pair consisting of a first training text sequence and a first supervision text sequence obtained from a set of first dialogue corpora; the second dialog training data in the second set of dialog training data comprises: character attribute training information, a second training text sequence and a second supervision text sequence obtained from a set of second dialogue corpora, and character-related labels for indicating whether an output of a pre-training model obtained from the intermediate model is related to the character attribute training information.
In an alternative implementation, the text feature vector and the character feature vector are generated by an encoder in the dialog model, the input to the encoder comprising: an input vector obtained according to the interactive content information and an input vector obtained according to the character attribute information; the weight coefficients are predicted by a weight predictor in the dialog model, the inputs to the weight predictor comprising: text feature vectors and character feature vectors; the dialogue content information is obtained from the output of a decoder in the dialogue model, and the input of the decoder is a weight coefficient, a text feature vector and a character feature vector.
In an optional implementation manner, the first target dialogue corpus in the first dialogue corpus includes texts corresponding to N rounds of dialogues, where N is an integer greater than or equal to 2; training unit 803, when generating a first dialog training data set according to each group of first dialog corpus, is configured to splice the first N-1 texts in the texts corresponding to the N rounds of dialogues included in the first target dialog corpus through splice marks to obtain a first spliced character sequence, add a start mark at the beginning of the first spliced character sequence, add an end mark at the end of the first spliced character sequence to obtain a first input text sequence corresponding to the first target dialog corpus, and encode the first input text sequence to obtain a first training text sequence; and obtaining a first output text sequence according to the last text in the texts corresponding to the N rounds of dialogue included in the first target dialogue corpus, and encoding the first output text sequence to obtain a first supervision text sequence corresponding to the first target dialogue corpus.
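The splicing described above can be sketched as follows; the marker tokens `[BOS]`, `[SEP]`, `[EOS]` and the example turns are assumptions, standing in for whatever start, splice, and end marks the implementation actually uses:

```python
def build_first_training_pair(dialogue_turns, sep="[SEP]", bos="[BOS]", eos="[EOS]"):
    """Build (input, supervision) text sequences from an N-turn dialogue:
    the first N-1 turns, joined with a splice mark and wrapped with start/end
    marks, form the input sequence; the last turn is the supervision target."""
    assert len(dialogue_turns) >= 2
    history, reply = dialogue_turns[:-1], dialogue_turns[-1]
    x1 = bos + " " + f" {sep} ".join(history) + " " + eos
    y1 = bos + " " + reply + " " + eos
    return x1, y1

turns = ["How was your weekend?", "Great, I went outside.",
         "What did you do?", "I went hiking."]
x1, y1 = build_first_training_pair(turns)
print(x1)  # [BOS] How was your weekend? [SEP] Great, I went outside. [SEP] What did you do? [EOS]
print(y1)  # [BOS] I went hiking. [EOS]
```

In the actual method, both sequences would then be word-encoded and position-encoded to obtain the first training text sequence and first supervision text sequence.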
In an alternative implementation, the initial model is an encoder-decoder architecture based on the self-attention mechanism, wherein the encoder includes a plurality of feature processing layers, each feature processing layer of the encoder includes a multi-head self-attention mechanism sublayer and a feed-forward neural network sublayer, and each sublayer is followed by a residual connection and layer normalization processing layer; the decoder includes a plurality of feature processing layers, each feature processing layer of the decoder including: a multi-head self-attention mechanism sublayer, an encoder-decoder attention mechanism sublayer, and a feed-forward neural network sublayer, each sublayer likewise followed by a residual connection and layer normalization processing layer, wherein the multi-head self-attention mechanism sublayer in the decoder performs position shielding processing through a mask.
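The position shielding performed by the decoder's masked self-attention can be illustrated with a small NumPy sketch of a causal mask (a simplified single-head view; the helper names are illustrative):

```python
import numpy as np

def causal_mask(n):
    # Upper-triangular mask: position i may not attend to positions j > i
    return np.triu(np.ones((n, n), dtype=bool), k=1)

def masked_attention_weights(scores, mask):
    # Masked positions are set to -inf before softmax, so they get zero weight
    s = scores.copy()
    s[mask] = -np.inf
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4))  # uniform raw scores, for illustration
weights = masked_attention_weights(scores, causal_mask(4))
print(weights[0])  # first position attends only to itself: [1. 0. 0. 0.]
print(weights[1])  # second position attends to positions 0 and 1 equally
```

This shielding is what keeps autoregressive decoding causal: each target position can only condition on earlier positions of the supervision sequence during training.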
In an optional implementation manner, the second target dialogue corpus in the second dialogue corpus includes texts corresponding to M rounds of dialogue, where M is an integer greater than or equal to 2; training unit 803, when used for generating a second dialogue training data set according to each group of second dialogue corpus, is used for obtaining character attribute training information associated with an account corresponding to the second target dialogue corpus; splicing the first M-1 texts in the texts corresponding to the M rounds of dialogue included in the second target dialogue corpus through splicing marks to obtain a second spliced character sequence, adding a start mark at the beginning of the second spliced character sequence and an end mark at the end of the second spliced character sequence to obtain a second input text sequence, and encoding the second input text sequence to obtain a second training text sequence; obtaining a second output text sequence according to the last text in the texts corresponding to the M rounds of dialogue included in the second target dialogue corpus, and encoding the second output text sequence to obtain a second supervision text sequence corresponding to the second target dialogue corpus; and setting a character-related label to indicate whether an output of the pre-training model obtained from the intermediate model is related to the character attribute training information.
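Analogously, a second-stage training sample bundling the persona text P, the input sequence X2, the supervision sequence Y2, and the role-related label R can be sketched as follows (the marker tokens, field names, and example texts are assumptions for illustration):

```python
def build_second_training_sample(persona, dialogue_turns, role_related,
                                 sep="[SEP]", bos="[BOS]", eos="[EOS]"):
    """Second-stage sample: persona attribute text P, input sequence X2 built
    from the first M-1 turns, supervision sequence Y2 from the last turn, and
    a role-related label R (1 if the reply reflects the persona, else 0)."""
    assert len(dialogue_turns) >= 2
    history, reply = dialogue_turns[:-1], dialogue_turns[-1]
    return {
        "P": persona,
        "X2": bos + " " + f" {sep} ".join(history) + " " + eos,
        "Y2": bos + " " + reply + " " + eos,
        "R": int(role_related),
    }

sample = build_second_training_sample(
    "I love hiking. I live near the mountains.",
    ["Any plans?", "The weather is great.", "I might go hiking."],
    role_related=True)
print(sample["Y2"])  # [BOS] I might go hiking. [EOS]
```

During fine-tuning, R would supervise the weight predictor via the first loss function, while Y2 would supervise the decoder via the second loss function.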
In an alternative implementation, a weight predictor for predicting the resulting weight coefficient is provided in the trained intermediate model, and the encoder-decoder attention mechanism sublayer in the trained intermediate model is adjusted into an attention router sublayer to obtain the pre-training model, wherein the inputs of the attention router sublayer include the output of the weight predictor, the output of the encoder, and the output of the sublayer preceding the attention router sublayer in the decoder.
In an alternative implementation, the training unit 803 is configured, when configured to train the pre-training model obtained according to the intermediate model by using the second session training data set to obtain a session model, to use the second session training data set as an input of the pre-training model obtained according to the intermediate model, and to modify model parameters of the pre-training model by using the loss calculation results obtained by the first loss function and the second loss function, so as to train to obtain the session model; wherein the first penalty function is used to determine a difference between the output of the weight predictor in the pre-training model and the role-related label; the second loss function is used for determining a difference between a second supervision text sequence and a sequence obtained by autoregressive decoding of training text feature vectors and training character feature vectors output by an encoder of the pre-training model; the training text feature vector is obtained by processing a second training text sequence obtained from a group of second dialogue corpus by an encoder of the pre-training model, and the training character feature vector is obtained by processing the character attribute training information by the encoder of the pre-training model.
Based on the same inventive concept, the principles and beneficial effects of the information processing apparatus provided in the embodiments of the present application in solving the problems are similar to those of the methods in the method embodiments described in the present application; reference may be made to those embodiments, and details are not repeated here for brevity.
Referring to fig. 9, a schematic structural diagram of a computing device according to an embodiment of the present application is shown. The computing device may include an interaction interface 901, a storage 902, and a processor 903, and may also include other functional modules, such as a power supply unit, a camera, a network interface, a communication interface, etc.
The interaction interface 901 is used for information interaction. It may be implemented by components such as a touch screen and physical keys, and is used for receiving various operations from an operating object (such as text or voice input), presenting corresponding information to the object (including displaying text and playing audio and video), and implementing other functions.
The storage 902 may include volatile memory, such as random-access memory (RAM); the storage 902 may also include non-volatile memory, such as flash memory or a solid-state drive (SSD); the storage 902 may also include a combination of the above types of memory.
The processor 903 may be a central processing unit (central processing unit, CPU). The processor 903 may further comprise a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (programmable logic device, PLD), or the like. The PLD may be a field-programmable gate array (FPGA), general-purpose array logic (generic array logic, GAL), or the like.
Optionally, the storage 902 is also used for storing a computer program. The processor 903 may execute these computer programs to implement the various methods, steps as mentioned in the previous embodiments of the present application.
In an alternative embodiment, the processor 903 is configured to obtain interactive content information and character attribute information; acquiring text feature vectors and character feature vectors, wherein the text feature vectors are obtained by encoding interactive content information, and the character feature vectors are obtained by encoding character attribute information associated with the interactive content information; based on the text feature vector and the character feature vector, predicting to obtain a weight coefficient, wherein the weight coefficient is used for representing dialogue content information corresponding to the interaction content information to be obtained and the association degree of the dialogue content information and the character attribute information; and according to the predicted weight coefficient, performing dialogue prediction processing on the text feature vector and the character feature vector to obtain dialogue content information corresponding to the interaction content information.
In an alternative implementation, the processor 903 is further configured to detect an interaction scenario type of an account that initiates the interaction content information; if the interaction scene type is detected to be the first type, acquiring interaction content information, and generating dialogue feedback information according to the interaction content information; and if the interaction scene type is detected to be the second type, acquiring interaction content information and role attribute information.
In an alternative implementation, the processor 903, when configured to obtain the interactive content information and the character attribute information, is configured to obtain chat information through an interactive session page, where the chat information includes: one or more of characters input by the object and obtained through the interactive session page, characters obtained by converting audio data received through the interactive session page, and characters obtained by identifying image data received through the interactive session page; obtaining interactive content information according to the chat information; and acquiring character attribute information, wherein the character attribute information is associated with an account initiating the interactive content information, and the character attribute information is obtained by analyzing and processing one or more of historical interaction data, account registration data, interaction behavior data and social relationship data submitted through the account for character attribute analysis, or is preset.
In an alternative implementation, the dialogue content information corresponding to the interaction content information is generated through a dialogue model; the processor 903 is configured to obtain a first dialogue corpus, and generate a first dialogue training data set according to each group of the first dialogue corpus; training the initial model by using the first dialogue training data set to obtain an intermediate model; acquiring second dialogue corpus used for reflecting character attribute information, and generating a second dialogue training data set according to each group of second dialogue corpus; training the pre-training model obtained according to the intermediate model by using the second dialogue training data set to obtain a dialogue model.
In an alternative implementation, the first dialog training data in the first set of dialog training data includes: a text sequence pair consisting of a first training text sequence and a first supervision text sequence obtained from a set of first dialogue corpora; the second dialog training data in the second set of dialog training data comprises: character attribute training information, a second training text sequence and a second supervision text sequence obtained from a set of second dialogue corpora, and character-related labels for indicating whether an output of a pre-training model obtained from the intermediate model is related to the character attribute training information.
In an alternative implementation, the text feature vector and the character feature vector are generated by an encoder in the dialog model, the input to the encoder comprising: an input vector obtained according to the interactive content information and an input vector obtained according to the character attribute information; the weight coefficients are predicted by a weight predictor in the dialog model, the inputs to the weight predictor comprising: text feature vectors and character feature vectors; the dialogue content information is obtained from the output of a decoder in the dialogue model, and the input of the decoder is a weight coefficient, a text feature vector and a character feature vector.
In an optional implementation manner, the first target dialogue corpus in the first dialogue corpus includes texts corresponding to N rounds of dialogues, where N is an integer greater than or equal to 2; the processor 903 is configured to, when generating a first dialog training data set according to each set of first dialog corpus, splice first N-1 texts in texts corresponding to N rounds of dialogues included in a first target dialog corpus through splice marks to obtain a first spliced character sequence, add a start mark at the beginning of the first spliced character sequence and an end mark at the end of the first spliced character sequence to obtain a first input text sequence corresponding to the first target dialog corpus, and encode the first input text sequence to obtain a first training text sequence; and obtaining a first output text sequence according to the last text in the texts corresponding to the N rounds of dialogue included in the first target dialogue corpus, and encoding the first output text sequence to obtain a first supervision text sequence corresponding to the first target dialogue corpus.
In an alternative implementation, the initial model is an encoder-decoder architecture based on the self-attention mechanism, wherein the encoder includes a plurality of feature processing layers, each feature processing layer of the encoder includes a multi-head self-attention mechanism sublayer and a feed-forward neural network sublayer, and each sublayer is followed by a residual connection and layer normalization processing layer; the decoder includes a plurality of feature processing layers, each feature processing layer of the decoder including: a multi-head self-attention mechanism sublayer, an encoder-decoder attention mechanism sublayer, and a feed-forward neural network sublayer, each sublayer likewise followed by a residual connection and layer normalization processing layer, wherein the multi-head self-attention mechanism sublayer in the decoder performs position shielding processing through a mask.
In an optional implementation manner, the second target dialogue corpus in the second dialogue corpus includes texts corresponding to M rounds of dialogue, where M is an integer greater than or equal to 2; the processor 903 is configured to, when generating a second dialogue training data set according to each group of second dialogue corpus, obtain character attribute training information associated with an account corresponding to the second target dialogue corpus; splice the first M-1 texts in the texts corresponding to the M rounds of dialogue included in the second target dialogue corpus through splicing marks to obtain a second spliced character sequence, add a start mark at the beginning of the second spliced character sequence and an end mark at the end of the second spliced character sequence to obtain a second input text sequence, and encode the second input text sequence to obtain a second training text sequence; obtain a second output text sequence according to the last text in the texts corresponding to the M rounds of dialogue included in the second target dialogue corpus, and encode the second output text sequence to obtain a second supervision text sequence corresponding to the second target dialogue corpus; and set a character-related label to indicate whether an output of the pre-training model obtained from the intermediate model is related to the character attribute training information.
In an alternative implementation, a weight predictor for predicting the resulting weight coefficient is provided in the trained intermediate model, and the encoder-decoder attention mechanism sublayer in the trained intermediate model is adjusted into an attention router sublayer to obtain the pre-training model, wherein the inputs of the attention router sublayer include the output of the weight predictor, the output of the encoder, and the output of the sublayer preceding the attention router sublayer in the decoder.
In an alternative implementation, the processor 903 is configured, when configured to train the pre-training model obtained according to the intermediate model using the second session training data set to obtain a session model, to use the second session training data set as an input to the pre-training model obtained according to the intermediate model, and to modify model parameters of the pre-training model using the first loss function and the loss calculation result obtained by the second loss function to train to obtain the session model; wherein the first penalty function is used to determine a difference between the output of the weight predictor in the pre-training model and the role-related label; the second loss function is used for determining a difference between a second supervision text sequence and a sequence obtained by autoregressive decoding of training text feature vectors and training character feature vectors output by an encoder of the pre-training model; the training text feature vector is obtained by processing a second training text sequence obtained from a group of second dialogue corpus by an encoder of the pre-training model, and the training character feature vector is obtained by processing the character attribute training information by the encoder of the pre-training model.
Based on the same inventive concept, the computing device provided in the embodiments of the present application may be, for example, the computing device mentioned in the embodiments of fig. 1, etc., and the principle and beneficial effects of solving the problem are similar to those of each method in the embodiments of the present application, and may be referred to the principle and beneficial effects of the embodiments of the method, and are not repeated herein for brevity.
Embodiments of the present application also provide a computer program product comprising a computer program or computer instructions stored in a computer-readable storage medium. The computer program or computer instructions, when executed by a processor, implement the methods described in the method embodiments described above.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The modules in the device of the embodiment of the application can be combined, divided and deleted according to actual needs.
Those skilled in the art will appreciate that the processes implementing all or part of the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, and the program may be stored in a computer readable storage medium, and the program may include the processes of the embodiments of the methods as above when executed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), or the like.
The above disclosure describes only some examples of the present application and is not intended to limit the scope of the claims. Those of ordinary skill in the art will understand that implementations of all or part of the above-described embodiments, as well as equivalent changes made according to the claims of the present application, still fall within the scope of the present invention.

Claims (15)

1. An information processing method, characterized by comprising:
acquiring interactive content information and role attribute information;
acquiring text feature vectors and character feature vectors, wherein the text feature vectors are obtained by encoding interactive content information, and the character feature vectors are obtained by encoding character attribute information associated with the interactive content information;
predicting a weight coefficient based on the text feature vector and the character feature vector, wherein the weight coefficient is used for representing the degree of association between the dialogue content information to be obtained, corresponding to the interactive content information, and the character attribute information;
and according to the predicted weight coefficient, performing dialogue prediction processing on the text feature vector and the role feature vector to obtain dialogue content information corresponding to the interaction content information.
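The claim above describes gating character features with a predicted association weight before dialogue prediction. The patent does not disclose an implementation; the following is a minimal numerical sketch of that idea, in which the weight predictor (a single learned projection followed by a sigmoid) and the fusion-by-scaling step are both illustrative assumptions, not the claimed model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_weight(text_vec, role_vec, w_proj):
    # Hypothetical weight predictor: project the concatenated text and
    # character features to a scalar in (0, 1) indicating how strongly the
    # reply should depend on the character attributes.
    feats = np.concatenate([text_vec, role_vec])
    return sigmoid(feats @ w_proj)

def fuse(text_vec, role_vec, weight):
    # Scale the character features by the predicted weight before handing
    # both feature vectors to the (not shown) dialogue decoder.
    return np.concatenate([text_vec, weight * role_vec])

rng = np.random.default_rng(0)
d = 4
text_vec = rng.normal(size=d)   # stands in for the encoded interactive content
role_vec = rng.normal(size=d)   # stands in for the encoded character attributes
w_proj = rng.normal(size=2 * d) # stands in for learned predictor parameters

w = predict_weight(text_vec, role_vec, w_proj)
fused = fuse(text_vec, role_vec, w)
```

A weight near 0 would effectively suppress the character features, letting the model fall back to generic replies, while a weight near 1 keeps the reply strongly persona-conditioned.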
2. The method of claim 1, wherein before the acquiring the interactive content information and the character attribute information, further comprising:
detecting the type of an interaction scene of an account initiating interaction content information;
if the interaction scene type is detected to be the first type, acquiring interaction content information, and generating dialogue feedback information according to the interaction content information;
and if the interaction scene type is detected to be the second type, triggering and executing to acquire the interaction content information and the role attribute information.
3. The method of claim 1, wherein the acquiring interactive content information and character attribute information comprises:
the chat information is obtained through an interactive session page, and the chat information comprises one or more of: characters input by an object via the interactive session page, characters obtained by converting audio data received via the interactive session page, and characters obtained by recognizing image data received via the interactive session page;
obtaining interactive content information according to the chat information;
and acquiring character attribute information, wherein the character attribute information is associated with an account initiating the interactive content information, and the character attribute information is obtained by analyzing and processing one or more of historical interaction data, account registration data, interaction behavior data and social relationship data submitted through the account for character attribute analysis, or is preset.
4. A method according to any one of claims 1-3, wherein the session content information corresponding to the interactive content information is generated by a session model; the method further comprises the steps of:
acquiring first dialogue corpus, and generating a first dialogue training data set according to each group of first dialogue corpus;
training the initial model by using the first dialogue training data set to obtain an intermediate model;
acquiring second dialogue corpus used for reflecting character attribute information, and generating a second dialogue training data set according to each group of second dialogue corpus;
training a pre-training model obtained according to the intermediate model by using a second dialogue training data set to obtain the dialogue model.
5. The method of claim 4, wherein the first dialog training data in the first set of dialog training data comprises: a text sequence pair consisting of a first training text sequence and a first supervision text sequence obtained from a set of first dialogue corpora;
the second dialog training data in the second set of dialog training data comprises: character attribute training information, a second training text sequence and a second supervision text sequence obtained from a set of second dialogue corpora, and character correlation tags for indicating whether an output of a pre-training model obtained from the intermediate model is correlated with the character attribute training information.
6. The method of claim 4, wherein the text feature vector and the character feature vector are generated by an encoder in a dialog model, the input of the encoder comprising: an input vector obtained according to the interactive content information and an input vector obtained according to the character attribute information;
the weight coefficient is predicted by a weight predictor in the dialogue model, and the input of the weight predictor comprises: the text feature vector and the character feature vector;
the dialogue content information is obtained according to the output of a decoder in the dialogue model, and the input of the decoder is the weight coefficient, the text feature vector and the character feature vector.
7. The method of claim 5, wherein a first target dialog corpus in the first dialog corpus comprises text corresponding to N rounds of dialog, N being an integer greater than or equal to 2; the generating a first dialogue training data set according to each group of first dialogue corpus comprises the following steps:
splicing the first N-1 texts in the texts corresponding to the N rounds of dialogue included in the first target dialogue corpus through splicing marks to obtain a first spliced character sequence, adding a start mark at the beginning of the first spliced character sequence and an end mark at the end of the first spliced character sequence to obtain a first input text sequence corresponding to the first target dialogue corpus, and encoding the first input text sequence to obtain a first training text sequence;
and obtaining a first output text sequence according to the last text in the texts corresponding to the N rounds of dialogue included in the first target dialogue corpus, and encoding the first output text sequence to obtain a first supervision text sequence corresponding to the first target dialogue corpus.
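Claim 7 builds each training pair by splicing the first N-1 dialogue turns with a splice mark, bracketing the result with start and end marks, and taking the last turn as the supervision target. A minimal sketch of that construction, where the concrete mark strings (`[SEP]`, `[BOS]`, `[EOS]`) are assumptions and the final token-encoding step is omitted:

```python
SEP, BOS, EOS = "[SEP]", "[BOS]", "[EOS]"  # assumed splice/start/end marks

def build_pair(dialogue_turns):
    # Join the first N-1 turns with the splice mark to form the context,
    # then add the start mark at the beginning and the end mark at the end.
    context = SEP.join(dialogue_turns[:-1])
    input_seq = f"{BOS}{context}{EOS}"
    # The last turn becomes the output (supervision) text sequence.
    output_seq = dialogue_turns[-1]
    return input_seq, output_seq

turns = ["Hi, how are you?", "Fine, and you?", "Great, thanks!"]
inp, out = build_pair(turns)
# inp == "[BOS]Hi, how are you?[SEP]Fine, and you?[EOS]"
# out == "Great, thanks!"
```

Claim 9's second-stage construction follows the same pattern, with character attribute training information and a role-relevance label attached to each pair.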
8. The method of claim 7, wherein the initial model is an encoder-decoder architecture based on a self-attention mechanism, wherein the encoder comprises a plurality of feature processing layers, each feature processing layer of the encoder comprises a multi-headed self-attention mechanism sublayer and a feed-forward neural network sublayer, and each sublayer is followed by a residual connection and layer normalization processing layer;
the decoder comprises a plurality of feature processing layers, each feature processing layer of the decoder comprising: a multi-headed self-attention mechanism sublayer, an encoder-decoder attention mechanism sublayer and a feed-forward neural network sublayer, and each sublayer is followed by a residual connection and layer normalization processing layer, wherein the multi-headed self-attention mechanism sublayer in the decoder performs position shielding processing through a mask.
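The "position shielding processing through a mask" in the decoder's self-attention sublayer is the standard causal mask that prevents a position from attending to later positions. A small numpy sketch of that masking, offered as an illustration of the standard technique rather than the patent's specific implementation:

```python
import numpy as np

def causal_mask(seq_len):
    # True above the diagonal: position i may not attend to positions j > i.
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

def masked_softmax(scores, mask):
    # Masked positions get a large negative score, so their attention
    # probability collapses to (effectively) zero after the softmax.
    scores = np.where(mask, -1e9, scores)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

m = causal_mask(3)
attn = masked_softmax(np.zeros((3, 3)), m)  # uniform scores, then masked
```

With uniform raw scores, the first row attends only to position 0, the second row splits attention between positions 0 and 1, and so on; future tokens never receive weight.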
9. The method of claim 5, wherein the second target dialogue corpus in the second dialogue corpus comprises text corresponding to M rounds of dialogue, M being an integer greater than or equal to 2; the generating a second dialogue training data set according to each group of second dialogue corpus comprises the following steps:
acquiring character attribute training information associated with an account corresponding to the second target dialogue corpus;
splicing the first M-1 texts in the texts corresponding to the M rounds of dialogue included in the second target dialogue corpus through splicing marks to obtain a second spliced character sequence, adding a start mark at the beginning of the second spliced character sequence and an end mark at the end of the second spliced character sequence to obtain a second input text sequence, and encoding the second input text sequence to obtain a second training text sequence;
obtaining a second output text sequence according to the last text in the texts corresponding to the M rounds of dialogue included in the second target dialogue corpus, and encoding the second output text sequence to obtain a second supervision text sequence corresponding to the second target dialogue corpus;
setting a character correlation label for indicating whether the output of the pre-training model obtained according to the intermediate model is correlated with character attribute training information.
10. The method of claim 9, wherein a weight predictor for predicting the weight coefficient is provided in the trained intermediate model, and an encoder-decoder attention mechanism sublayer in the trained intermediate model is adapted into an attention router sublayer to obtain the pre-training model, wherein inputs to the attention router sublayer comprise the output of the weight predictor, the output of the encoder, and the output of the previous sublayer of the attention router sublayer in the decoder.
11. The method of claim 9, wherein training the pre-training model from the intermediate model using the second set of dialogue training data to obtain the dialogue model comprises:
using a second dialogue training data set as an input of a pre-training model obtained according to the intermediate model, and modifying model parameters of the pre-training model by using loss calculation results obtained by a first loss function and a second loss function so as to train and obtain the dialogue model;
wherein the first penalty function is used to determine a difference between the output of the weight predictor in the pre-training model and the role-related label;
the second loss function is used for determining the difference between the second supervision text sequence and a sequence obtained by autoregressive decoding of training text feature vectors and training character feature vectors output by an encoder of the pre-training model;
the training text feature vector is obtained by processing the second training text sequence obtained from a group of second dialogue corpus by an encoder of the pre-training model, and the training character feature vector is obtained by processing the character attribute training information by the encoder of the pre-training model.
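Claims 11 and 17 describe training with two losses: one measuring the gap between the weight predictor's output and the role-relevance label, and one measuring the gap between the decoder's autoregressive output and the supervision sequence. The patent gives no formulas; a plausible sketch uses binary cross-entropy for the first and token-level negative log-likelihood for the second, both of which are assumptions:

```python
import numpy as np

def bce(pred, label):
    # First loss (assumed BCE): difference between the weight predictor's
    # output in (0, 1) and the binary role-relevance label.
    eps = 1e-9
    return -(label * np.log(pred + eps) + (1 - label) * np.log(1 - pred + eps))

def token_nll(probs, target_ids):
    # Second loss (assumed NLL): negative log-likelihood of the supervision
    # text sequence under the decoder's per-step token distributions.
    eps = 1e-9
    return -np.mean([np.log(probs[t, tid] + eps) for t, tid in enumerate(target_ids)])

# Toy decoder output: 2 decoding steps over a 3-token vocabulary.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
total = bce(0.9, 1.0) + token_nll(probs, [0, 1])
```

In practice the two terms would be summed (possibly with a balancing coefficient) and back-propagated jointly, so the weight predictor and the decoder are trained at the same time.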
12. An information processing apparatus, characterized by comprising:
the interface unit is used for carrying out information interaction;
the processing unit is used for acquiring the interactive content information and the character attribute information; acquiring text feature vectors and character feature vectors, wherein the text feature vectors are obtained by encoding interactive content information, and the character feature vectors are obtained by encoding character attribute information associated with the interactive content information; predicting to obtain a weight coefficient based on the text feature vector and the character feature vector, wherein the weight coefficient is used for representing dialogue content information corresponding to the interaction content information to be obtained and the association degree with the character attribute information; and according to the predicted weight coefficient, performing dialogue prediction processing on the text feature vector and the role feature vector to obtain dialogue content information corresponding to the interaction content information.
13. A computing device comprising an interactive interface, a storage device, and a processor;
the interaction interface is used for carrying out information interaction;
the storage device has stored therein a computer program, and the processor executes the computer program stored in the storage device to implement the method of any of claims 1-11.
14. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause a computing device having the processor to implement the method of any of claims 1-11.
15. A computer program product, characterized in that the computer program product comprises a computer program or computer instructions which, when executed by a processor, implement the method according to any of claims 1-11.
CN202311380546.XA 2023-10-23 2023-10-23 Information processing method and device, computing equipment and storage medium Pending CN117271745A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311380546.XA CN117271745A (en) 2023-10-23 2023-10-23 Information processing method and device, computing equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311380546.XA CN117271745A (en) 2023-10-23 2023-10-23 Information processing method and device, computing equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117271745A true CN117271745A (en) 2023-12-22

Family

ID=89212439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311380546.XA Pending CN117271745A (en) 2023-10-23 2023-10-23 Information processing method and device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117271745A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117743560A (en) * 2024-02-21 2024-03-22 北京面壁智能科技有限责任公司 Multi-role intelligent dialogue method, device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN111897941B (en) Dialogue generation method, network training method, device, storage medium and equipment
CN107846350B (en) Method, computer readable medium and system for context-aware network chat
CN112214591B (en) Dialog prediction method and device
CN111966800B (en) Emotion dialogue generation method and device and emotion dialogue model training method and device
Wilks et al. A prototype for a conversational companion for reminiscing about images
CN117332072B (en) Dialogue processing, voice abstract extraction and target dialogue model training method
CN116861258B (en) Model processing method, device, equipment and storage medium
CN117521675A (en) Information processing method, device, equipment and storage medium based on large language model
CN117271745A (en) Information processing method and device, computing equipment and storage medium
CN111666400A (en) Message acquisition method and device, computer equipment and storage medium
CN113761156A (en) Data processing method, device and medium for man-machine interaction conversation and electronic equipment
CN117540703A (en) Text generation method, model training method, device and electronic equipment
CN113741759B (en) Comment information display method and device, computer equipment and storage medium
CN116414951A (en) Intelligent dialogue method, model training method, device, storage medium and equipment
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN116975654B (en) Object interaction method and device, electronic equipment and storage medium
CN113590780B (en) Feedback type dialogue intention acquisition method based on trigger type rule
CN117711001B (en) Image processing method, device, equipment and medium
CN112256833B (en) Mobile phone problem intelligent question answering method based on big data and AI algorithm
CN117540024B (en) Classification model training method and device, electronic equipment and storage medium
CN116975016A (en) Data processing method, device, equipment and readable storage medium
CN116955546A (en) Reply information generation method, device, equipment and storage medium
CN116962840A (en) Video transition method, electronic device, storage medium, and program product
CN118013246A (en) Data processing method, computing device and computer readable storage medium
CN117113968A (en) Data processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication