CN111967224A - Method and device for processing dialog text, electronic equipment and storage medium - Google Patents

Method and device for processing dialog text, electronic equipment and storage medium

Info

Publication number
CN111967224A
CN111967224A
Authority
CN
China
Prior art keywords
text
feature
self
processing model
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010832715.9A
Other languages
Chinese (zh)
Inventor
陶健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd, Shenzhen Huantai Technology Co Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010832715.9A priority Critical patent/CN111967224A/en
Publication of CN111967224A publication Critical patent/CN111967224A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the application disclose a method and device for processing dialog text, an electronic device, and a storage medium. The method comprises: acquiring a first text sequence; obtaining text features of the first text sequence through a text processing model, and obtaining a second text sequence according to the text features, the second text sequence being the reference resolution text corresponding to the dialog text of the latest dialog round. The text processing model comprises an encoder and a decoder: the encoder obtains a first self-attention feature of the first text sequence, the decoder obtains a second self-attention feature according to the first self-attention feature, and the encoder and the decoder are obtained by pre-training with a natural language processing model based on a self-attention mechanism. The method, device, electronic device, and storage medium for processing dialog text disclosed in the embodiments of the application can accurately identify the object actually referred to by a pronoun in the dialog text and improve the processing capability for dialog text.

Description

Method and device for processing dialog text, electronic equipment and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method and a device for processing a dialog text, an electronic device and a storage medium.
Background
In the field of artificial intelligence, Natural Language Processing (NLP) has long been one of the major research directions. In recent years, with the rapid development of NLP technologies, automatic replies to simple single-turn dialogs by electronic devices (e.g., smartphones, smart wearable devices) have advanced greatly. However, in more complex dialog scenarios, because pronouns frequently appear in the dialog content, it is difficult for an electronic device to understand the user's real intention as effectively as a human. How to enable electronic devices to more accurately identify the true meaning represented by pronouns in dialog content has become a problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the application discloses a processing method and device of a dialog text, electronic equipment and a storage medium, which can accurately identify an object actually referred by a pronoun in the dialog text and improve the processing capacity of the dialog text.
The embodiments of the application also disclose a training method and device for a text processing model, an electronic device, and a storage medium, which can be used to train a text processing model capable of accurately identifying the actual referent of a pronoun in dialog text, thereby improving the processing capability for dialog text.
The embodiment of the application discloses a method for processing a dialog text, which comprises the following steps:
acquiring a first text sequence, wherein the first text sequence at least comprises a dialog text of a latest dialog round;
acquiring text characteristics of the first text sequence through a text processing model, and acquiring a second text sequence according to the text characteristics, wherein the second text sequence is a reference resolution text corresponding to the dialog text of the latest round of dialog,
the text feature comprises a first self-attention feature and a second self-attention feature, the text processing model comprises an encoder and a decoder, the encoder is used for obtaining the first self-attention feature of the first text sequence, the decoder is used for obtaining the second self-attention feature according to the first self-attention feature, and the encoder and the decoder are obtained through pre-training of a natural language processing model based on a self-attention mechanism.
The embodiment of the application discloses a processing device of a dialog text, which comprises:
the text sequence acquisition module is used for acquiring a first text sequence, wherein the first text sequence at least comprises a dialog text of the latest dialog round;
a text processing module, configured to obtain a text feature of the first text sequence through a text processing model, and obtain a second text sequence according to the text feature, where the second text sequence is a reference resolution text corresponding to the dialog text of the latest round of dialog,
the text feature comprises a first self-attention feature and a second self-attention feature, the text processing model comprises an encoder and a decoder, the encoder is used for obtaining the first self-attention feature of the first text sequence, the decoder is used for obtaining the second self-attention feature according to the first self-attention feature, and the encoder and the decoder are obtained through pre-training of a natural language processing model based on a self-attention mechanism.
The embodiment of the application discloses an electronic device, which comprises a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor is enabled to realize the processing method of the dialog text.
An embodiment of the application discloses a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the processing method of dialog text as described above.
The method, device, electronic device, and storage medium for processing dialog text disclosed in the embodiments of the application obtain a first text sequence comprising at least the dialog text of the latest dialog round, obtain the text features of the first text sequence through a text processing model, and obtain, according to the text features, the reference resolution text corresponding to the dialog text of the latest dialog round. The text features comprise a first self-attention feature and a second self-attention feature, and the text processing model comprises an encoder and a decoder: the encoder obtains the first self-attention feature of the first text sequence, and the decoder obtains the second self-attention feature according to the first self-attention feature. The text processing model can process reference resolution tasks for one or more dialog rounds, accurately identify the object actually referred to by a pronoun in the dialog text, and improve the electronic device's comprehension of the dialog text. In addition, because the encoder and the decoder are obtained by pre-training with a natural language processing model based on the self-attention mechanism, the model complexity of the text processing model can be reduced and its performance improved.
The embodiment of the application discloses a training method of a text processing model, which comprises the following steps:
pre-training an encoder in a text processing model through a natural language processing model based on an attention mechanism so as to enable the pre-trained encoder to be fitted with the natural language processing model;
pre-training a decoder in the text processing model through the natural language processing model so as to fit the pre-trained decoder with the natural language processing model;
in the text processing model, a first self-attention sample feature of a first text sample is obtained through an encoder obtained through the pre-training, a second self-attention sample feature of the first text sample is obtained through a decoder obtained through the pre-training, and a prediction reference resolution text is obtained according to the first self-attention sample feature and the second self-attention sample feature;
adjusting parameters of the text processing model based on the predicted reference resolution text and the actual reference resolution text of the first text sample.
The embodiment of the application discloses a training device of a text processing model, comprising:
the first pre-training module is used for pre-training an encoder in a text processing model through a natural language processing model based on an attention mechanism so as to enable the pre-trained encoder to be fitted with the natural language processing model;
the second pre-training module is used for pre-training a decoder in the text processing model through the natural language processing model so as to enable the pre-trained decoder to be fitted with the natural language processing model;
the sample processing module is used for obtaining a first self-attention sample characteristic of a first text sample through the encoder obtained through the pre-training in the text processing model, obtaining a second self-attention sample characteristic of the first text sample through the decoder obtained through the pre-training, and obtaining a prediction reference resolution text according to the first self-attention sample characteristic and the second self-attention sample characteristic;
a parameter adjustment module to adjust parameters of the text processing model based on the predicted reference resolution text and the actual reference resolution text of the first text sample.
The embodiment of the application discloses an electronic device, which comprises a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor is enabled to realize the training method of the text processing model.
The embodiment of the application discloses a computer readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to realize the training method of the text processing model.
The embodiments of the application disclose a training method and device for a text processing model, an electronic device, and a storage medium. An encoder and a decoder in the text processing model are each pre-trained through a natural language processing model based on a self-attention mechanism, so that the pre-trained encoder and decoder both fit the natural language processing model. After pre-training, a first self-attention sample feature of a first text sample is obtained through the pre-trained encoder, a second self-attention sample feature of the first text sample is obtained through the pre-trained decoder, and a predicted reference resolution text is obtained according to the first and second self-attention sample features; the parameters of the text processing model are then adjusted based on the predicted reference resolution text and the actual reference resolution text of the first text sample. By pre-training the encoder and decoder with a self-attention-based natural language processing model and then performing formal training with text samples, the encoder and decoder can quickly learn the text processing capability of the natural language processing model, so that the text processing model's ability to handle the reference resolution task of dialog text is improved while its model complexity is reduced and its performance is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1A is a diagram illustrating an exemplary implementation of a method for processing dialog text;
FIG. 1B is a diagram illustrating an electronic device performing a reference resolution task, in accordance with an embodiment;
FIG. 2 is a flowchart of a method of processing dialog text in one embodiment;
FIG. 3 is an architectural diagram of a text processing model in one embodiment;
FIG. 4 is a flowchart of a processing method of a dialog text in another embodiment;
FIG. 5 is a flow diagram of obtaining a first self-attention feature of a first text sequence by an encoder in one embodiment;
FIG. 6 is a block diagram of a text processing model according to another embodiment;
FIG. 7 is a flowchart of a processing method of a dialog text in another embodiment;
FIG. 8A is a diagram illustrating pre-training of an encoder in one embodiment;
FIG. 8B is a diagram illustrating pre-training of a decoder in one embodiment;
FIG. 9 is a diagram illustrating a formal training phase of a text processing model in one embodiment;
FIG. 10 is a flowchart of a method of training a text processing model in one embodiment;
FIG. 11 is a block diagram of a device for processing dialog text in one embodiment;
FIG. 12 is a block diagram of an apparatus for training a text processing model in one embodiment;
FIG. 13 is a block diagram showing the structure of an electronic apparatus according to an embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the examples and figures of the present application are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another. For example, a first self-attention feature may be referred to as a second self-attention feature, and similarly, a second self-attention feature may be referred to as a first self-attention feature, without departing from the scope of the present application. The first self-attention feature and the second self-attention feature are both self-attention features, but they are not the same self-attention feature.
Reference resolution is one of the major tasks in the NLP field, and is mainly applied to scenarios such as multi-turn dialog, chat semantic understanding, and long-text semantic understanding in articles. In scenarios such as multi-turn dialog and chat semantic understanding, the semantic relationships between the events and entities a user cares about are often scattered across different positions of the dialog text, and the entities involved can usually be expressed in several different ways; for example, an entity in a certain semantic relationship may appear as a pronoun. To understand more accurately the real meaning expressed by the user in a multi-turn dialog, the pronouns, zero pronouns, and the like appearing in the dialog must undergo reference resolution. The reference resolution task is of great significance for functions such as intelligent reply and voice assistants on electronic devices.
In the related art, the reference resolution task is mainly performed in two ways:
Mode 1: performing the reference resolution task with rules. Logic rules are built from a large amount of manually constructed domain and language knowledge, and the text is resolved using these rules. This mode requires extensive manual participation; the system's degree of automation is very low and its portability is poor.
Mode 2: performing the reference resolution task with a multi-task model. The multi-task model may be a Convolutional Neural Network (CNN) model, which obtains the pronouns and candidate entities in a text and links entities to pronouns through one-to-one matching and ranking.
The embodiment of the application provides a processing method and device of a dialog text, an electronic device and a storage medium, which can automatically process a reference resolution task of one or more rounds of dialog, can accurately identify an object actually referred by a pronoun in the dialog text, and improve the comprehension capability of the electronic device on the dialog text.
Fig. 1A is an application scenario diagram of a method for processing dialog text in an embodiment. As shown in FIG. 1A, a user may engage in a conversation through the electronic device 10, which may be a conversation with another electronic device or with a voice assistant on the electronic device 10. The electronic device 10 may acquire the dialog text of one or more dialog rounds, input the acquired dialog text into a trained text processing model, perform the reference resolution task through the trained text processing model, and output a reference resolution text for the dialog text of the one or more dialog rounds. This can help the electronic device 10 understand the dialog content more accurately and quickly.
FIG. 1B is a diagram illustrating an electronic device performing a reference resolution task, in accordance with an embodiment. As shown in fig. 1B, user A has conducted 3 rounds of dialog with user B via the electronic device 10. Round 1: "Xiaobu, do you know Xiaoyi?"; round 2: "I seem to have heard of him, but I don't know him"; round 3: "I don't know him." The electronic device 10 may input all 3 rounds of dialog text into the trained text processing model, obtain the first and second self-attention features of the 3 rounds of dialog text through the encoder and decoder of the text processing model, respectively, and output the reference resolution text "I don't know Xiaoyi" based on the first and second self-attention features. The reference resolution text may correspond only to the dialog text of the last round of dialog (i.e., the 3rd round).
As shown in fig. 2, in an embodiment, a method for processing dialog text is provided, which is applicable to electronic devices such as a mobile phone, a smart wearable device, a tablet computer, and a Personal Computer (PC); the embodiments of the present application are not limited thereto. The method comprises the following steps:
step 210, a first text sequence is obtained, where the first text sequence at least includes a dialog text of the latest dialog turn.
When the electronic device is triggered to enter a dialog scenario, the dialog text of one or more dialog rounds in that scenario can be acquired for the reference resolution task. Optionally, the dialog scenario may include, but is not limited to, a chat scenario, an automatic reply scenario, a voice analysis scenario, and the like. At least the dialog text of the latest dialog round in the dialog scenario is obtained; the dialog text of the latest dialog round may refer to the dialog text most recently entered by the user, or most recently received from another electronic device, or most recently auto-replied by the electronic device, and so on. After the dialog text of the latest dialog round, no new dialog text has yet been generated.
The electronic device can generate a first text sequence from the acquired dialog texts by splicing them in order from earliest to latest according to the dialog time corresponding to each acquired dialog text. In some embodiments, the first text sequence may include all characters contained in all of the acquired dialog text, where characters may include, but are not limited to, words, symbols, numbers, and the like. In some embodiments, the acquired dialog text may also be processed so that the first text sequence includes only some of the characters in the acquired dialog text. For example, punctuation marks contained in the dialog text can be removed to obtain the first text sequence. For instance, if the acquired dialog text includes the 1st-round dialog "Have you eaten breakfast yet?" and the 2nd-round dialog "Not yet!", then after removing the punctuation marks the first text sequence "Have you eaten breakfast yet Not yet" is obtained, but this is not limiting.
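The splicing and punctuation-stripping described above might look like the following sketch; the function name and the punctuation set are illustrative assumptions, not part of the disclosed embodiments:

```python
import string

# Full-width punctuation common in Chinese dialog text (illustrative set).
CN_PUNCT = "，。？！、；：“”‘’"

def build_first_text_sequence(turns):
    """Splice dialog turns from earliest to latest and strip punctuation."""
    drop = set(string.punctuation) | set(CN_PUNCT)
    spliced = " ".join(turns)  # turns are assumed pre-sorted by dialog time
    return "".join(ch for ch in spliced if ch not in drop)

turns = ["Have you eaten breakfast yet?", "Not yet!"]
print(build_first_text_sequence(turns))   # → Have you eaten breakfast yet Not yet
```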
Step 220, obtaining the text characteristics of the first text sequence through the text processing model, and obtaining a second text sequence according to the text characteristics, wherein the second text sequence is a reference resolution text corresponding to the dialog text of the latest round of dialog.
In the embodiments of the present application, the text processing model may be a network model based on a Self-Attention mechanism. The self-attention mechanism simulates human visual attention: human visual perception generally does not take in a scene all at once from beginning to end, but tends to focus on a specific part as needed. Moreover, when people find that something they want to observe often appears in a certain part of a scene, they learn to focus their attention on that part when similar scenes appear in the future.
The text processing model may extract text features of the first text sequence, which may include a first self-attention feature and a second self-attention feature. The text processing model may include an encoder and a decoder, and a first text sequence may be input to the encoder, and the input first text sequence may be converted to a second text sequence output by the encoder and the decoder. Wherein the encoder is operable to obtain a first self-attention feature of the first text sequence. The encoder may include at least one self-attention layer, and after the first text sequence is input into the encoder, the encoder may calculate self-attention of each character in the first text sequence through the self-attention layer when encoding the first text sequence, so as to obtain a first self-attention feature of the first text sequence. Wherein the self-attention of a character is understood to be the connection between the character and other characters in the first text sequence entered, i.e. the "attention" of the character to other characters.
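The self-attention computation described above — each character attending to every other character in the input sequence — can be illustrated with a minimal NumPy sketch. This is a deliberate simplification (single head, Q = K = V, no learned projection matrices), not the model disclosed in the embodiments:

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention over a (seq_len, d) matrix of character embeddings.

    For simplicity Q = K = V = X; a real layer would apply learned projections.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # each character's "attention" to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ X                               # mix every position into each output

X = np.eye(3)                                        # three toy one-hot "characters"
print(self_attention(X).shape)                       # → (3, 3)
```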
The decoder may be configured to obtain a second self-attention feature from the first self-attention feature. In some embodiments, the decoder may include at least two self-attention layers, where a first self-attention layer is used to calculate self-attention for each character in a text sequence input to the decoder (which may be a text sequence output from a previous round), and a second self-attention layer is used to focus on a portion of the first text sequence. After the encoder obtains the first self-attention feature of the first text sequence, the first self-attention feature may be input to a second self-attention layer in the decoder, and the self-attention of the character obtained by the first self-attention layer may also be input to the second self-attention layer, so as to associate the two to obtain the second self-attention feature.
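The two attention layers of the decoder described above can be sketched as follows. This is an illustrative toy (no masking, no learned projections; the name `decoder_block` is invented for this sketch), showing only how the first layer attends within the decoder's own sequence while the second attends over the encoder's output:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(Q, K, V):
    """Scaled dot-product attention: Q attends over K/V."""
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def decoder_block(prev_output, encoder_feature):
    # First self-attention layer: each position of the decoder's output-so-far
    # attends to the other positions of that same output sequence.
    h = attend(prev_output, prev_output, prev_output)
    # Second self-attention layer: those positions then attend over the
    # encoder's first self-attention feature of the first text sequence.
    return attend(h, encoder_feature, encoder_feature)

out = decoder_block(np.ones((2, 4)), np.ones((5, 4)))
print(out.shape)   # → (2, 4)
```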
The text processing model may derive the second text sequence from the first self-attention feature and the second self-attention feature, and the second text sequence may be the reference resolution text of the dialog text of the latest dialog round. Optionally, if the dialog text of the latest round has no reference resolution text, i.e., no reference resolution is required, the second text sequence may be empty. No matter how many rounds of dialog text the first text sequence input to the text processing model contains, the text processing model outputs the reference resolution text only for the latest round of dialog text; the effective information of the multi-round dialog is thereby compressed into a single round of dialog text, so resources can be used more efficiently.
The text processing model in the above embodiment is exemplarily described with reference to fig. 3. FIG. 3 is an architectural diagram of a text processing model in one embodiment. As shown in fig. 3, the electronic device inputs a first text sequence to the text processing model, the text processing model performs a reference resolution task based on the first text sequence, and outputs a second text sequence. The text processing model may include an encoder 310, a decoder 320, and an output layer 330. The encoder 310 may include a self-attention layer and a feedforward layer (i.e., a feedforward neural network), and after the first text sequence is input into the encoder 310, the self-attention of each character in the first text sequence may be calculated by the self-attention layer and the calculation result may be input into the feedforward layer, and the feedforward layer may obtain the first self-attention feature based on the input calculation result and output the first self-attention feature into the output layer 330 and the decoder 320.
The output layer 330 may obtain the current output sequence from the first self-attention feature output by the encoder 310 and the second self-attention feature output by the decoder 320 at each step, and may use the current output sequence as the input of the decoder 320 at the next step, implementing an autoregressive loop. The decoder 320 may include a self-attention layer 322, a self-attention layer 324, and a feedforward layer. The decoder 320 may input the output sequence from the previous step of the output layer 330 into the self-attention layer 322, compute the self-attention of each character in that output sequence through the self-attention layer 322 to obtain a first calculation result, and input the first calculation result into the self-attention layer 324; the first calculation result may represent the relationship between each character in the output sequence and the other characters in the output sequence. The self-attention layer 324 may incorporate the first self-attention feature output by the encoder 310 and recompute the self-attention of the characters in the output sequence based on the first calculation result, obtaining a second calculation result, which may represent the relationship between each character in the output sequence and both the other characters in the output sequence and the characters in the first text sequence. The self-attention layer 324 may input the second calculation result into the feedforward layer to obtain the current second self-attention feature, which is input into the output layer 330. The output layer 330 may then obtain the current output sequence from the first self-attention feature output by the encoder 310 and the current second self-attention feature output by the decoder 320, and the above process is repeated until the output layer 330 outputs the complete second text sequence.
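The autoregressive loop between the decoder 320 and the output layer 330 — each step's output sequence feeding the next decoder call until the complete second text sequence is emitted — can be sketched with stub components. Everything here is a stand-in (a real model predicts tokens from the self-attention features); only the loop structure is the point:

```python
END = "<eos>"

def encode(first_text_sequence):
    # Stand-in for encoder 310: the "first self-attention feature" as a character list.
    return list(first_text_sequence)

def step(prev_output, memory):
    # Stand-in for decoder 320 + output layer 330: emit the next character of
    # the encoder memory, then the end token.
    i = len(prev_output)
    return memory[i] if i < len(memory) else END

def generate(first_text_sequence, max_len=50):
    memory = encode(first_text_sequence)
    output = []                                # the output sequence so far
    for _ in range(max_len):
        token = step(output, memory)           # fed back into the next decoder call
        if token == END:
            break
        output.append(token)
    return "".join(output)                     # the complete second text sequence

print(generate("abc"))   # → abc
```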
In the embodiment of the present application, both the decoder and the encoder of the text processing model may be obtained by pre-training with a natural language processing model based on a self-attention mechanism. Optionally, the natural language processing model may be a BERT (Bidirectional Encoder Representations from Transformers) model, which is a semantic analysis model implemented with a Transformer-based encoder, but is not limited thereto. The natural language processing model can be trained on a large number of text samples, so that it has the capability of deeply understanding text semantics. The decoder and the encoder in the text processing model are respectively pre-trained through the complete natural language processing model, so that the pre-trained decoder and encoder fit the natural language processing model. After the decoder and the encoder in the text processing model are pre-trained with the complete natural language processing model, the text processing model can be formally trained with text samples, so that the predicted reference resolution text output by the text processing model meets the output expectation, improving the capability of the text processing model to perform the reference resolution task.
In the embodiment of the application, one or more rounds of dialog reference resolution tasks can be processed through the text processing model, the object actually referred to by a pronoun in the dialog text can be accurately identified, and the comprehension capability of the electronic device for the dialog text is improved. In addition, because the encoder and the decoder are obtained by pre-training with the natural language processing model based on the self-attention mechanism, the model complexity of the text processing model can be reduced and its performance improved.
As shown in fig. 4, in an embodiment, another processing method for dialog text is provided, which is applicable to the electronic device described above, and the method may include the following steps:
step 402, obtaining an original text, and generating a first text sequence according to the original text.
Original text may refer to unprocessed dialog text obtained directly from a dialog scene. The original text may include the dialog text of the latest dialog round and the dialog text of the N rounds before the latest dialog round, where N may be a natural number that can be set according to actual requirements or adjusted for different dialog scenarios.
As an embodiment, the electronic device may obtain the dialog text of every turn in the current conversation scene as the original text, in which case the original text includes all the dialog text from the beginning of the conversation scene to the current moment. As another embodiment, the electronic device may acquire, as the original text, the dialog text within a certain time period before the current time; for example, all the dialog text within the 5 minutes before the current time may be acquired as the original text. As another embodiment, the electronic device may obtain the dialog text of a fixed number of turns closest to the current time as the original text; for example, the dialog text of the last 3 turns of dialog may be acquired as the original text, but the disclosure is not limited thereto.
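Two of the selection strategies above can be sketched as follows. This is an illustrative sketch only: the representation of the dialog history as a list of (timestamp, text) pairs and the function names are assumptions for illustration, not part of this embodiment.

```python
# Hypothetical sketch of selecting the original text from a dialog history.
# history is assumed to be a list of (timestamp_in_seconds, dialog_text) pairs.
def last_n_turns(history, n):
    """Keep only the dialog text of the n turns closest to the current time."""
    return [text for _, text in history[-n:]]

def within_window(history, now, window_seconds):
    """Keep the dialog text whose timestamp lies within window_seconds of now."""
    return [text for ts, text in history if now - ts <= window_seconds]

history = [(0, "turn 1"), (100, "turn 2"), (290, "turn 3"), (310, "turn 4")]
print(last_n_turns(history, 3))
print(within_window(history, now=320, window_seconds=300))
```

Both calls select the last three turns here; the two strategies only diverge when turns are spaced unevenly in time.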
In some embodiments, each time the electronic device acquires the dialog text of a new round of dialog, the electronic device may re-acquire the original text so that it contains the dialog text of the new round, generate a first text sequence, and perform the reference resolution task on the first text sequence through the text processing model. Taking the dialog scenario shown in fig. 1B as an example, when the electronic device 10 acquires the 1st round of dialog, a first text sequence containing the 1st round of dialog may be generated and the reference resolution task performed on it; when the 2nd round of dialog is acquired, a first text sequence containing the 1st and 2nd rounds of dialog may be generated and the reference resolution task performed on it; and when the 3rd round of dialog is acquired, a first text sequence containing the 1st to 3rd rounds of dialog may be generated and the reference resolution task performed on it.
When the reference resolution task is performed, not only the dialog text of the latest round of dialog but also the N rounds of dialog text before it are comprehensively considered, so that by combining historical dialog information the output reference resolution text can be made more accurate.
In some embodiments, after the original text is obtained, the original text may be normalized to obtain the first text sequence. Normalizing the original text may refer to mapping characters in the original text that do not conform to preset rules into unified characters. The electronic device may recognize characters of a first type contained in the original text, and optionally, the first type may be set according to actual requirements; for example, the first type may include number and symbol types. In some embodiments, when the original text is Chinese text, the first type of character may be a non-Chinese type of character, such as the number, symbol, and letter types, because non-Chinese characters generally do not carry important information in Chinese dialog text.
After the characters of the first type contained in the original text are recognized, they may be deleted to obtain the first text sequence, or they may be replaced by a target character to obtain the first text sequence. The target character can be set according to actual requirements and is not limited herein. For example, if the original text is "Hi, where are you going", the non-Chinese characters "H", "i" and "," in the original text may be deleted to obtain the first text sequence "where are you going", or the non-Chinese characters "H", "i" and "," may each be replaced with the target character to obtain the first text sequence, but the disclosure is not limited thereto. In this way, characters carrying no important information can be removed from the original text, which reduces the data processing amount of the text processing model, improves the text processing efficiency, reduces the interference caused by such characters, and improves the accuracy of the output reference resolution text.
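The delete and replace variants of this normalization can be sketched with a regular expression. The assumption that "first type" means any character outside the CJK Unified Ideographs range, and the choice of "_" as the target character, are both illustrative and not fixed by this embodiment.

```python
import re

# Assumed definition of the "first type": any non-Chinese character
# (outside the CJK Unified Ideographs block U+4E00..U+9FFF).
NON_CHINESE = re.compile(r'[^\u4e00-\u9fff]')

def normalize_delete(text):
    """Delete every first-type character from the original text."""
    return NON_CHINESE.sub('', text)

def normalize_replace(text, target='_'):
    """Replace every first-type character with a target character."""
    return NON_CHINESE.sub(target, text)

print(normalize_delete("Hi, 你好"))    # only the Chinese characters remain
print(normalize_replace("Hi你"))
```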
After obtaining the first text sequence, the electronic device may input the first text sequence into the text processing model, and the text processing model may obtain text features of each character in the first text sequence. In one embodiment, the text features may include a round-number feature of the character, which may be used to characterize the dialog round to which the character belongs in the original text. After the electronic device acquires the original text, a corresponding round-number feature can be added to each character contained in the original text. For example, if the original text includes a 1st round of dialog "have you had breakfast" and a 2nd round of dialog "not yet", then the round-number feature 1 may be added to each character of the 1st round of dialog and the round-number feature 2 to each character of the 2nd round of dialog. The round-number feature may represent the order of the dialog text to which the character belongs within the dialog scene.
In some embodiments, the text features of each character may also include position information of the character, which may be used to characterize the position of the character in the dialog text to which it belongs. After the electronic device obtains the original text, it can label each character contained in the original text with position information. For the dialog text of each round of dialog contained in the original text, the position of each character in that round's dialog text can be determined and labeled. For example, if the original text includes a 1st round of dialog of six characters and a 2nd round of dialog of three characters, the position information of the characters in the 1st round may be 1, 2, 3, 4, 5 and 6 respectively, and the position information of the characters in the 2nd round may be 1, 2 and 3 respectively.
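The round-number and position labeling described above can be sketched in a few lines; the function name and the (character, round, position) tuple representation are illustrative assumptions.

```python
def annotate(rounds):
    """Attach (round_number, position) features to every character.

    rounds is a list of per-round dialog texts; both the round number and the
    in-round position are 1-indexed, matching the examples in the text.
    """
    feats = []
    for r, text in enumerate(rounds, start=1):
        for p, ch in enumerate(text, start=1):
            feats.append((ch, r, p))
    return feats

print(annotate(["ab", "c"]))
```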
In the embodiment of the application, introducing the round-number feature of each character and the position information of the character in the original dialog text can help the text processing model more accurately identify the object actually referred to by a pronoun in the dialog text, improving the comprehension capability of the electronic device for the dialog text.
Step 404, a first self-attention feature of the first text sequence is obtained through the encoder.
The electronic device can input the first text sequence into an encoder of the text processing model and obtain, by the encoder, a first self-attention feature of the first text sequence. As shown in FIG. 5, in one embodiment, step 404 includes the following steps 502-506.
Step 502, obtaining an embedded vector of each character in the first text sequence.
An embedding vector (Embedding) converts each character into a vector representation of fixed length, thereby facilitating numerical processing. Each character in the first text sequence may first be converted from text into a numerically represented embedding vector by a trained embedding-vector generation model, and the length of the embedding vector may be a length set by an operator, such as 300, 200, or 256. Optionally, the embedding-vector generation model may be obtained by training on a constructed training data set, where the training data set may include a large number of characters, each character corresponding to an occurrence frequency, a sequence number, and the like, from which a dictionary file is generated. Through the dictionary file, the embedding-vector generation model can learn the relationships between characters, so that the distance between the embedding vectors of similar characters is small.
In some embodiments, after obtaining the embedding vector of each character in the first text sequence, the round-number feature and the position information of each character may be concatenated with the embedding vector, adding the round number and position to the embedding vector to obtain an embedding vector carrying round-number and position features. The round-number feature and the position information can help the encoder and the decoder determine the position of each character in the dialog text when extracting self-attention features. For example, if the character "it" has an embedding vector of [0, 0.2, 0.84, 0.6], a round-number feature of 3, and position information of 4, then [0, 0.2, 0.84, 0.6, 3, 4] can be obtained by concatenation.
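The concatenation in the example above is a plain list append; a one-line sketch (function name assumed):

```python
def concat_features(embedding, round_no, position):
    """Append the round-number feature and position information to an embedding."""
    return embedding + [round_no, position]

# Reproduces the example from the text: embedding [0, 0.2, 0.84, 0.6],
# round number 3, position 4.
print(concat_features([0, 0.2, 0.84, 0.6], 3, 4))
```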
Step 504, the embedded vector of each character is input into the encoder, and the embedded vector of each character is normalized by the encoder.
In some embodiments, the encoder may further include a normalization layer operable to normalize the input embedding vector of each character so that the embedding vectors conform to a common data standard. Unlike encoders based on the self-attention mechanism in the related art, the normalization layer of the encoder in the embodiment of the present application may be disposed before the self-attention layer, so that the self-attention of each character is calculated after normalization. Optionally, the normalization layer may normalize by the Z-score method (standard score normalization) or by the max-min method, which is not limited in the embodiment of the present application. Performing normalization first and then calculating the self-attention of each character can improve the accuracy of the output self-attention feature.
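As a concrete instance of one of the options named above, Z-score normalization maps a vector to zero mean and unit standard deviation. This sketch operates on a single plain vector for clarity; how the real normalization layer batches and parameterizes this is not specified here.

```python
import math

def z_score(vec):
    """Z-score (standard score) normalization of a single vector."""
    mean = sum(vec) / len(vec)
    std = math.sqrt(sum((x - mean) ** 2 for x in vec) / len(vec))
    if std == 0:
        return [0.0] * len(vec)  # constant vector: define the result as all zeros
    return [(x - mean) / std for x in vec]

print(z_score([1.0, 2.0, 3.0]))
```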
Step 506, respectively calculating the self-attention of each character in the first text sequence according to the embedded vector after the normalization processing, and generating a first self-attention feature according to the self-attention obtained through calculation.
The self-attention layer of the encoder may mainly include the following steps when calculating the self-attention of each character:
Step one, generating a query vector, a key vector, and a value vector according to the embedding vector of the character, and representing the character by the query vector, the key vector, and the value vector. Optionally, three weight matrices may be constructed in advance, and the embedding vector is multiplied by each of the three weight matrices to obtain the query vector, the key vector, and the value vector.
Step two, calculating the score of each character in the first text sequence. The character whose self-attention is being calculated may be taken as the target character, and the target character may be scored against the other characters in the first text sequence, the score indicating the degree of attention the target character pays to the other characters when encoding. The score of the target character may be calculated by taking the dot product of the key vector of each other character in the first text sequence with the query vector of the target character.
Step three, multiplying the score of each character by the corresponding value vector to obtain the self-attention vector of the character.
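The three steps above can be sketched in pure Python. One hedged addition: a softmax normalization of the scores is inserted between steps two and three, as in the standard self-attention formulation, although the steps above do not spell it out; the weight matrices here are toy identity matrices, not trained parameters.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def self_attention(embeds, Wq, Wk, Wv):
    """Step one: project each embedding to query/key/value vectors.
    Step two: score each target character against every character (dot product).
    Step three: weight the value vectors by the (softmax-normalized) scores."""
    Q = [matvec(Wq, e) for e in embeds]
    K = [matvec(Wk, e) for e in embeds]
    V = [matvec(Wv, e) for e in embeds]
    out = []
    for q in Q:
        scores = softmax([sum(qi * ki for qi, ki in zip(q, k)) for k in K])
        out.append([sum(s * v[d] for s, v in zip(scores, V))
                    for d in range(len(V[0]))])
    return out

identity = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention([[1.0, 0.0], [0.0, 1.0]], identity, identity, identity)
print(out)
```

With identity projections, each character attends most strongly to itself, which is the expected behavior for orthogonal embeddings.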
In one embodiment, the encoder may further include an addition layer. After the self-attention vector of each character is calculated by the self-attention layer, it may be output to the addition layer, which may add the self-attention vector of each character to its embedding vector and input the summed vector into the feedforward layer, through which the first self-attention feature is obtained.
Step 406, the first self-attention feature is input into a decoder, and a second self-attention feature is obtained by the decoder according to the first self-attention feature.
In one embodiment, the architecture of the decoder may include, connected in sequence: a first normalization layer, a first self-attention layer, a first addition layer, a second normalization layer, a second self-attention layer, a second addition layer, and a feedforward layer. After the output sequence of the previous step is obtained, the embedding vector of each character of that output sequence can be obtained and input into the decoder. The first normalization layer of the decoder normalizes the input embedding vector of each character of the output sequence and then inputs the result into the first self-attention layer. A first self-attention vector of each character of the output sequence is calculated by the first self-attention layer and then added to the embedding vector of each character of the output sequence by the first addition layer, yielding a first vector result. After obtaining the first self-attention feature of the first text sequence, the encoder may input the first self-attention feature into the second normalization layer of the decoder, which normalizes the first self-attention feature together with the first vector result output by the first addition layer. The second self-attention layer then calculates a second self-attention vector of each character of the output sequence according to the normalized first self-attention feature and first vector result. The second self-attention vector and the first vector result are added by the second addition layer to obtain a second vector result, which is input into the feedforward layer, and finally the second self-attention feature of the current step is obtained through the feedforward layer.
The process of calculating each layer in the decoder is similar to that of the encoder, and reference may be made to the description of the encoder in the above embodiments, which is not repeated herein.
Step 408, analyzing the first self-attention feature and the second self-attention feature through the pointer network model, and outputting the position coordinates.
The text processing model may also include a pointer network model. The pointer network model can obtain the position coordinates of the character to be output at the current step according to the first self-attention feature output by the encoder and the second self-attention feature output by the decoder at the current step, and the position coordinates can be used to indicate the position of the output character in the original text. Optionally, the position coordinates may include the round number and the position within that round's dialog text; for example, if the position coordinates output by the pointer network model are (2, 1), the character output at this step is the 1st character of the 2nd round of dialog in the original text. Unlike the related art, which searches a dictionary prepared in advance for the characters to be output, the embodiment of the present application directly outputs the position coordinates of the character in the original text, so that the character to be output can be found more quickly and accurately.
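The mapping from a (round number, position) coordinate back to a character is a direct 1-indexed lookup; a minimal sketch (function name assumed, original text represented as a list of per-round strings):

```python
def lookup(original_rounds, coord):
    """Map a (round, position) coordinate from the pointer network to a character.

    Both the round number and the in-round position are 1-indexed, matching
    the (2, 1) example in the text.
    """
    turn, pos = coord
    return original_rounds[turn - 1][pos - 1]

rounds = ["abc", "de", "fgh"]
print(lookup(rounds, (2, 1)))  # the 1st character of the 2nd round
```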
Step 410, searching characters matched with the position coordinates in the original text, and obtaining a second text sequence according to the searched characters.
After the pointer network model outputs the position coordinates of the current step, the character matching those position coordinates can be looked up in the original text and spliced onto the previous intermediate text sequence to obtain the intermediate text sequence of the current step, which serves as the input of the decoder at the next step, and the second self-attention feature of the next step is output through the decoder.
In some embodiments, for the intermediate text sequence first output by the pointer network model, since only the first self-attention feature has been input into the pointer network model at that point and the decoder has not yet produced a second self-attention feature, the pointer network model may output an intermediate text sequence containing only the start character, and this intermediate text sequence is used as the input of the decoder. The decoder obtains a second self-attention feature based on the input intermediate text sequence, and the pointer network model obtains the position coordinates of the first character to be output by combining the first self-attention feature with the latest second self-attention feature obtained by the decoder. After the first character to be output is found, the intermediate text sequence is updated to the start character plus the first character and input into the decoder again. This process loops until the pointer network model outputs the terminator.
When the pointer network model outputs the terminator, a complete reference resolution text has been produced, and the last obtained intermediate text sequence can be output as the second text sequence.
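The loop described above can be sketched with the decoder and pointer network replaced by a stub that replays a fixed list of coordinates (returning None to stand for the terminator). Everything here other than the start/terminator markers and the coordinate convention from the text is an illustrative assumption.

```python
def decode(original_rounds, pointer_model):
    """Autoregressive decode loop: repeatedly ask the pointer model for the
    next (round, position) coordinate, splice the matched character onto the
    intermediate sequence, and stop when the terminator is emitted."""
    sequence = ["<sos>"]                 # first intermediate sequence: start character only
    while True:
        coord = pointer_model(sequence)
        if coord is None:                # stands for the terminator <eos>
            sequence.append("<eos>")
            return sequence
        turn, pos = coord                # 1-indexed, as in the text
        sequence.append(original_rounds[turn - 1][pos - 1])

def replay(coords_list):
    """Stub pointer model that ignores its input and replays fixed coordinates."""
    it = iter(coords_list)
    return lambda seq: next(it)

result = decode(["ab", "cd", "xy"], replay([(3, 1), (3, 2), None]))
print(result)  # ['<sos>', 'x', 'y', '<eos>']
```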
The process of obtaining the reference resolution text through the text processing model described in the above embodiments is exemplarily described with reference to fig. 6. FIG. 6 is a block diagram of a text processing model according to another embodiment. As shown in fig. 6, the text processing model includes an encoder 610, a decoder 620, and a pointer network model 630. The electronic device may input a first text sequence into the text processing model, obtain the embedding vector of each character after the first text sequence passes through the embedding step, input the embedding vector of each character in the first text sequence into the encoder 610, and obtain the first self-attention feature of the first text sequence through the encoder 610. The encoder 610 may input the resulting first self-attention feature into the pointer network model 630 and into the second normalization layer 622 of the decoder 620.
Take as an example a first text sequence formed from a multi-round dialog about having breakfast, whose reference resolution result is "I have already had breakfast". The encoder 610 extracts the first self-attention feature of the first text sequence. When no second self-attention feature has yet been received from the decoder 620, the pointer network model 630 outputs the start character <sos>, so the intermediate text sequence is "<sos>", and the embedding vector of each character of this intermediate text sequence is input into the decoder 620 so that the decoder 620 obtains the second self-attention feature of the current step. The pointer network model 630 outputs the position coordinates (3, 1) according to the first self-attention feature and the second self-attention feature obtained by the decoder 620 at the current step, finds that the matching character is "I" (the 1st character of the 3rd round of dialog), and splices "I" onto the previous intermediate text sequence to obtain the intermediate text sequence "<sos> I" of the current step, which is then input into the decoder 620 again. Finally, when the pointer network model 630 is recognized to output the terminator <eos>, the second text sequence "<sos> I have already had breakfast <eos>" is obtained.
In the embodiment of the application, the encoder and the decoder of the text processing model can accurately acquire the self-attention feature of each character in the input dialog text, and the pointer network model can accurately obtain the position coordinates, in the original text, of each character to be output according to these self-attention features, so that the lookup is fast and the result more accurate. The text processing model can process one or more rounds of dialog reference resolution tasks, accurately identifying the object actually referred to by a pronoun in the dialog text and improving the comprehension capability of the electronic device for the dialog text.
As shown in fig. 7, in an embodiment, another processing method for dialog text is provided, which is applicable to the electronic device described above, and the method may include the following steps:
Step 702, pre-training an encoder in the text processing model through a natural language processing model based on a self-attention mechanism, so that the pre-trained encoder fits the natural language processing model.
The training process of the text processing model can comprise pre-training and formal training. The pre-training can be performed using a trained natural language processing model based on a self-attention mechanism, so that the encoder and the decoder have the capability of accurately extracting self-attention features; the formal training then trains the whole text processing model using first text samples, so that the text processing model has the capability of performing the reference resolution task. The pre-training process may include steps 702-704, and the formal training process may include steps 706-708.
In some embodiments, during the pre-training of the encoder in the text processing model, the electronic device may perform knowledge distillation training on the encoder through the natural language processing model, with the natural language processing model as the teacher model and the encoder in the text processing model as the student model. Knowledge distillation is a simple and effective model compression method, which can be understood as the student model learning the necessary and important knowledge from the teacher model, while less relevant knowledge need not be learned. The natural language processing model is more complex and has a deeper network than the encoder, which is a lighter model. The natural language processing model can undergo supervised model training on a large number of text samples, so that it has the capability of deeply understanding text semantics. The encoder is then trained through the complete natural language processing model, so that the encoder captures the most effective information from the natural language processing model and learns its text processing capability, thereby realizing model compression for the encoder.
As an embodiment, knowledge distillation training of an encoder through a natural language processing model may include: and obtaining a first characteristic of the second text sample through the natural language processing model, obtaining a second characteristic of the second text sample through the encoder, calculating a first loss of the second characteristic relative to the first characteristic, and adjusting parameters of the encoder according to the first loss.
The second text sample may be a text sample without a labeling result, such as a sample of dialog text of one or more dialog rounds. The second text sample may be input into the natural language processing model and the encoder respectively; the natural language processing model may extract a first feature of the second text sample, and the encoder may extract a second feature of the second text sample. A first loss of the second feature relative to the first feature may be calculated by a loss function, and the first loss may be used to represent the error between the second feature and the first feature.
Optionally, it may be determined whether the first loss is within an expected range, which may be a preset acceptable error range. If the first loss is within the expected range, the encoder can be said to fit the result output by the natural language processing model closely, and the encoder already has the text processing capability of the natural language processing model. If the first loss is not within the expected range, the parameters of the encoder can be adjusted according to the first loss, new second text samples input into the natural language processing model and the encoder again, and the encoder iteratively updated until the calculated first loss falls within the expected range, completing the pre-training process of the encoder. In the embodiments of the present application, the first feature and the second feature may both be self-attention features.
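The first-loss computation and the expected-range check can be sketched as follows. The patent does not name a concrete loss function, so mean squared error between the teacher and student feature vectors is an assumption, as are the function names and the tolerance value.

```python
def mse_loss(student_feat, teacher_feat):
    """Assumed first loss: mean squared error of the student (encoder) feature
    relative to the teacher (natural language processing model) feature."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(teacher_feat)

def within_expected_range(loss, tolerance=1e-3):
    """Check whether the loss falls inside a preset acceptable error range."""
    return loss <= tolerance

print(mse_loss([1.0, 2.0], [1.0, 4.0]))
```

In the pre-training loop sketched by the text, `within_expected_range` returning True is the stopping condition; otherwise the encoder parameters are adjusted and a new second text sample is drawn.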
FIG. 8A is a diagram illustrating pre-training of an encoder in one embodiment. As shown in fig. 8A, the natural language processing model may be a BERT model 800, and the BERT model 800 may be used as the teacher model to pre-train the encoder 810. A second text sample without a labeling result is input into the BERT model 800 and the encoder 810 respectively; the first feature of the second text sample can be obtained through the BERT model 800, and the second feature of the second text sample can be obtained through the encoder 810. The first and second features may be input into the loss function 812, the loss of the second feature relative to the first feature calculated by the loss function 812, and the parameters of the encoder 810 adjusted according to the loss.
Step 704, pre-training the decoder in the text processing model through the natural language processing model, so that the pre-trained decoder fits the natural language processing model.
In one embodiment, during pre-training of a decoder in a text processing model, the electronic device may perform knowledge distillation training on the decoder through a natural language processing model with the natural language processing model as a teacher model and the decoder as a student model. The decoder is trained through the complete natural language processing model, so that the decoder captures more effective information from the natural language processing model and learns the text processing capability of the natural language processing model.
As an embodiment, performing knowledge distillation training on the decoder through the natural language processing model includes the following steps: obtaining a third feature of a third text sample through the encoder obtained by pre-training; inputting the third feature into the decoder so that the decoder obtains an output text sequence and a fourth feature for the third text sample according to the third feature; splicing the third text sample with the output text sequence and obtaining a fifth feature of the spliced text through the natural language processing model; calculating a second loss of the fourth feature relative to the fifth feature; and adjusting the parameters of the decoder according to the second loss.
Because the decoder needs to incorporate the self-attention features output by the encoder during feature extraction, the encoder can be pre-trained first, and the pre-trained encoder is then used to cooperate in the pre-training of the decoder. The third text sample may also be a text sample without a labeling result, such as a sample of dialog text of one or more dialog rounds, and it may be the same sample as the second text sample or a different one. The third text sample may be input into the pre-trained encoder, and the pre-trained encoder may output the third feature of the third text sample. The third feature may be input into the decoder, and the decoder may generate an output text sequence for the third text sample according to the third feature and obtain the fourth feature corresponding to the output text sequence.
The third text sample and the output text sequence produced by the decoder may be spliced, and the spliced text input into the natural language processing model. Optionally, when the third text sample and the output text sequence are spliced, a sentence tag may be inserted between them to distinguish the two parts. The natural language processing model can identify the sentence tag contained in the input spliced text and acquire the fifth feature corresponding to the spliced text.
In some embodiments, since the fifth feature corresponds to the spliced text, which contains both the third text sample and the output text sequence produced by the decoder, after the fifth feature of the spliced text is obtained through the natural language processing model, the fifth feature may be truncated according to the feature length of the fourth feature obtained by the decoder, so that the truncated partial feature matches the fourth feature; this partial feature is the feature of the output text sequence as obtained by the natural language processing model. Optionally, the natural language processing model may also directly distinguish the third text sample from the output text sequence according to the recognized sentence tag and output only the partial feature corresponding to the output text sequence.
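The truncation step can be sketched as a slice. One assumption is made explicit here: since the output text sequence is spliced after the third text sample, its features are taken to occupy the tail of the fifth feature, so the trailing slice of the fourth feature's length is kept.

```python
def truncate_to_match(fifth_feature, fourth_feature):
    """Keep the trailing part of the spliced-text feature whose length matches
    the fourth feature, assumed to correspond to the appended output text
    sequence (per-position features flattened into one list for illustration)."""
    return fifth_feature[-len(fourth_feature):]

print(truncate_to_match([1, 2, 3, 4, 5], [9, 9]))
```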
A second loss of the fourth feature relative to the truncated partial feature may be calculated by a loss function, and the second loss may be used to represent the error between the fourth feature and the truncated partial feature. If the second loss is not within the expected range, the parameters of the decoder can be adjusted according to the second loss, a new third text sample input into the pre-trained encoder again, and the training process repeated to iteratively update the decoder until the calculated second loss falls within the expected range, completing the pre-training process of the decoder. In the embodiments of the present application, the third feature, the fourth feature, and the fifth feature may all be self-attention features.
FIG. 8B is a diagram illustrating pre-training of a decoder, according to one embodiment. As shown in fig. 8B, the natural language processing model may be a BERT model 800, and the BERT model 800 may be used as a teacher model to pre-train the decoder 820. The third text sample without a labeling result is input to the pre-trained encoder 810, the pre-trained encoder 810 obtains a third feature of the third text sample, and the third feature is input to the decoder 820. The decoder 820 may derive an output text sequence for the third text sample and a fourth feature of the output text sequence according to the third feature. The output text sequence from the decoder 820 may be spliced with the third text sample, and the spliced text may be input into the BERT model 800. The BERT model 800 analyzes the input spliced text to obtain a fifth feature of the spliced text, and may truncate the fifth feature according to the feature length of the fourth feature to obtain the partial feature of the fifth feature that matches the fourth feature. The fourth feature obtained by the decoder 820 and the truncated partial feature may be input into the loss function 822, the loss of the fourth feature with respect to the truncated partial feature may be calculated by the loss function 822, and the parameters of the decoder 820 may be adjusted according to the loss. The loss function 822 in fig. 8B and the loss function 812 in fig. 8A may be the same loss function or different loss functions, which is not limited herein.
Pre-training the encoder and the decoder with a fully trained natural language processing model can speed up the fitting of the encoder and the decoder and reduce the model complexity of the encoder and the decoder.
Step 706, in the text processing model, a first self-attention sample feature of the first text sample is obtained through the pre-trained encoder, a second self-attention sample feature of the first text sample is obtained through the pre-trained decoder, and the predicted reference resolution text is then obtained according to the first self-attention sample feature and the second self-attention sample feature.
After the pre-training of the encoder and the decoder, the text processing model can be formally trained through a first text sample, wherein the first text sample can be a sample with a labeling result, and the labeling result can comprise an actual reference resolution text of the first text sample. The first text sample and the actual reference resolution text thereof can be input into a text processing model, and the text processing model can obtain a predicted reference resolution text of the first text sample.
In some embodiments, for each character included in the first text sample, a round number feature and position information may be labeled: the round number feature may indicate the dialog turn to which the character belongs in the first text sample, and the position information may indicate the character's position order in the dialog text. After the first text sample is input into the text processing model, a first self-attention sample feature of the first text sample can be obtained through the pre-trained encoder, and the pre-trained decoder can obtain a second self-attention sample feature based on the first self-attention sample feature. The first and second self-attention sample features may be analyzed by a pointer network model, which outputs position coordinates representing the position, in the first text sample, of the character to be output. The character matching the position coordinates is then looked up in the first text sample, and the predicted reference resolution text is obtained from the characters found.
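A minimal sketch of the per-character labeling described above; the dict-based representation, 1-based turn numbers, and 0-based positions are assumptions for illustration:

```python
def annotate_characters(dialog_turns):
    # Label every character with the dialog turn it belongs to
    # (round number feature) and its position in the overall
    # character order of the text (position information).
    annotated, pos = [], 0
    for turn_no, turn_text in enumerate(dialog_turns, start=1):
        for ch in turn_text:
            annotated.append({"char": ch, "turn": turn_no, "pos": pos})
            pos += 1
    return annotated

labeled = annotate_characters(["hi", "yes"])
```

Each annotated character then carries both signals into the model's embedding step.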
After the pre-trained encoder obtains the first self-attention sample feature of the first text sample, the pre-trained decoder may obtain the current second self-attention sample feature from the first self-attention sample feature and the intermediate predicted text previously output via the pointer network model. The pointer network model can determine the position coordinates of the character to be output at the current step by combining the first self-attention sample feature with the current second self-attention sample feature, and look up the matching character in the first text sample. The found character can be spliced onto the previously output intermediate predicted text to obtain the current intermediate predicted text, which is then fed back into the pre-trained decoder; this autoregressive loop continues until the pointer network model outputs a terminator. When the pointer network model outputs the terminator, the most recently output intermediate predicted text may be output as the predicted reference resolution text for the first text sample.
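The autoregressive loop above can be sketched as follows, with the pointer network abstracted into a `next_position` callback and the terminator modeled as a sentinel index (both assumptions for this sketch, not part of the disclosed model):

```python
END = -1  # sentinel standing in for the pointer network's terminator

def pointer_decode(sample_chars, next_position, max_steps=50):
    # At each step the pointer network (abstracted as `next_position`)
    # sees the intermediate predicted text produced so far and returns
    # the index of the next character to copy from the first text
    # sample, or END to stop.
    predicted = []
    for _ in range(max_steps):
        pos = next_position(predicted)
        if pos == END:
            break
        predicted.append(sample_chars[pos])
    return "".join(predicted)

sample = list("Tom likes it")
plan = [0, 1, 2]  # a scripted "pointer network" that copies "Tom"
def next_position(prefix):
    return plan[len(prefix)] if len(prefix) < len(plan) else END

resolved = pointer_decode(sample, next_position)
```

In the real model, `next_position` would score every position of the first text sample from the first and second self-attention sample features rather than follow a script.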
At step 708, parameters of the text processing model are adjusted based on the predicted reference resolution text and the actual reference resolution text of the first text sample.
After the pointer network model outputs the predicted reference resolution text, a third loss of the predicted reference resolution text relative to the actual reference resolution text of the first text sample can be calculated. The third loss may represent the distance between the predicted and actual reference resolution texts: the smaller the third loss, the closer the predicted reference resolution text is to the actual one. The parameters of the pre-trained encoder, the pre-trained decoder, and the pointer network model may be adjusted according to the third loss; optionally, these parameters may include weight coefficients, bias coefficients, and the like. After the parameters are adjusted, the adjusted text processing model can continue to be used to predict new first text samples until the third loss between the output predicted reference resolution text and the actual reference resolution text is smaller than a threshold, at which point the training of the text processing model is complete.
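The predict / compute-loss / adjust cycle can be sketched as a loop that stops once the third loss falls below a threshold; the scalar toy model, squared-distance loss, learning rate, and threshold below are all assumptions for illustration:

```python
def train_until_threshold(samples, predict, compute_loss, update,
                          threshold=1e-4, max_epochs=1000):
    # Keep predicting and adjusting parameters until the (third)
    # loss drops below the threshold, i.e. is "within expectation".
    loss = float("inf")
    for _ in range(max_epochs):
        for x, y in samples:
            pred = predict(x)
            loss = compute_loss(pred, y)
            if loss < threshold:
                return loss
            update(x, y, pred)
    return loss

state = {"w": 0.0}                        # a single toy parameter
predict = lambda x: state["w"] * x
compute_loss = lambda p, y: (p - y) ** 2  # squared distance as the loss

def update(x, y, pred):
    state["w"] -= 0.1 * 2 * (pred - y) * x  # one gradient step

final_loss = train_until_threshold([(1.0, 2.0)], predict, compute_loss, update)
```

The real training loop adjusts the encoder, decoder, and pointer network jointly, but the stopping logic is the same: iterate until the loss meets the threshold.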
In some embodiments, the parameters of the pre-trained encoder and decoder may also be fine-tuned by the natural language processing model while the text processing model is being trained. After the first self-attention sample feature of the first text sample is obtained through the pre-trained encoder, the parameters of the pre-trained encoder can be fine-tuned using the natural language processing model based on the first self-attention sample feature. The first text sample may be input into the natural language processing model, a third self-attention sample feature of the first text sample may be obtained through the natural language processing model, the loss between the first self-attention sample feature obtained by the encoder and the third self-attention sample feature may be calculated, and the parameters of the encoder may be fine-tuned according to this loss.
Similar to the encoder, after the pre-trained decoder obtains the second self-attention sample feature, the parameters of the pre-trained decoder may be fine-tuned using the natural language processing model based on the second self-attention sample feature. Each time the intermediate predicted text last output by the pointer network model is input into the pre-trained decoder, it is also input into the natural language processing model. After the decoder outputs the current second self-attention sample feature according to the first self-attention sample feature and the previously output intermediate predicted text, a fourth self-attention sample feature of that intermediate predicted text is obtained through the natural language processing model, the loss between the current second self-attention sample feature and the fourth self-attention sample feature is calculated, and the parameters of the decoder are fine-tuned according to this loss. Fine-tuning the encoder and the decoder through the natural language processing model during formal training can further improve the accuracy of the encoder and the decoder, and thus the performance of the whole text processing model.
FIG. 9 is a diagram illustrating the formal training phase of a text processing model in one embodiment. As shown in fig. 9, the first text sample may be input into the text processing model, which may obtain an embedded vector (Embedding) of each character in the first text sample and input the embedded vectors into the pre-trained encoder. The normalization layer in the encoder may normalize the embedded vector of each character in the first text sample and input the normalized embedded vectors into the self-attention layer; the self-attention layer may calculate the self-attention vector of each character from its normalized embedded vector, and the addition layer may add each character's self-attention vector to its normalized embedded vector. The summed vector result is then analyzed by a feedforward layer to obtain the first self-attention sample feature.
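A minimal, projection-free sketch of the scaled dot-product self-attention and residual addition performed inside the encoder block described above (real implementations use learned query/key/value projections; treating the inputs directly as queries, keys, and values is a simplification for illustration):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(vectors):
    # Queries = keys = values = the input vectors themselves
    # (no learned projections, for illustration only).
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, vectors))
                    for i in range(d)])
    return out

def add_residual(attn_out, inputs):
    # The "addition layer": element-wise sum of the self-attention
    # output and the normalized input vectors.
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(attn_out, inputs)]

embedded = [[1.0, 0.0], [0.0, 1.0]]   # toy normalized character embeddings
attn = self_attention(embedded)
summed = add_residual(attn, embedded)
```

The feedforward layer that follows in fig. 9 would then transform `summed` per position to produce the first self-attention sample feature.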
The pointer network model can obtain the position coordinates, in the first text sample, of the character to be output at the current step, according to the first self-attention sample feature output by the pre-trained encoder and the current second self-attention sample feature output by the pre-trained decoder. Using the position coordinates, the character to be output can be found in the first text sample and spliced onto the previously obtained intermediate predicted text, updating it to form the current intermediate predicted text. The current intermediate predicted text can then serve as the decoder's input at the next step.
When the previously obtained intermediate predicted text is available, the embedded vector of each of its characters can be obtained and input into the decoder. The first normalization layer of the decoder may normalize the embedded vectors of the characters of the intermediate predicted text and input the normalized embedded vectors into the first self-attention layer. A first self-attention vector of each character of the intermediate predicted text is calculated by the first self-attention layer and added, by the first addition layer, to that character's normalized embedded vector to obtain a first result vector. The first self-attention sample feature obtained by the pre-trained encoder may be input into a second normalization layer of the decoder, which normalizes the first self-attention sample feature together with the first result vector. The second self-attention layer can calculate a second self-attention vector of each character of the intermediate predicted text from the normalized first self-attention sample feature and first result vector, which is then added to the first result vector to obtain a second result vector. The second result vector is analyzed by a feedforward layer to obtain the current second self-attention sample feature, which is input into the pointer network model.
When the pointer network model outputs a terminator, the most recently obtained intermediate predicted text can be taken as the predicted reference resolution text for the first text sample, and the distance (that is, the loss) between the predicted reference resolution text and the actual reference resolution text of the first text sample can be calculated, so as to adjust the parameters of the encoder, the decoder, and the pointer network model in the text processing model until the distance between the output predicted reference resolution text and the actual reference resolution text meets the expectation, at which point the training of the text processing model is complete.
During formal training, the losses of the first self-attention sample feature and the second self-attention sample feature can be calculated through a BERT model, so that the parameters of the encoder and the decoder can be fine-tuned.
Step 710, obtaining an original text, and generating a first text sequence according to the original text.
At step 712, a first self-attention feature of the first text sequence is obtained by the encoder.
In step 714, the first self-attention feature is input into the decoder, and the decoder obtains a second self-attention feature according to the first self-attention feature.
In step 716, the first self-attention feature and the second self-attention feature are analyzed by the pointer network model, and the position coordinates are output.
Step 718, finding the character matched with the position coordinate in the original text, and obtaining a second text sequence according to the found character.
For descriptions of steps 710 to 718, refer to the related descriptions in the foregoing embodiments; details are not repeated here.
In the embodiment of the application, the encoder and the decoder are pre-trained using the natural language processing model based on the self-attention mechanism and then formally trained with text samples, so that the encoder and the decoder can quickly learn the text processing capability of the natural language processing model. This improves the text processing model's ability to handle the reference resolution task on dialog text while reducing its model complexity, thereby improving the performance of the text processing model.
As shown in fig. 10, in an embodiment, a method for training a text processing model is provided, which is applicable to electronic devices such as a personal computer, a notebook computer, a mobile phone, and a tablet computer, and the embodiment of the present application is not limited thereto. The method may comprise the steps of:
Step 1010, pre-training an encoder in the text processing model through a natural language processing model based on the self-attention mechanism, so that the pre-trained encoder fits the natural language processing model.
In one embodiment, step 1010 includes: taking the natural language processing model based on the self-attention mechanism as a teacher model, taking the encoder in the text processing model as a student model, and performing knowledge distillation training on the encoder through the natural language processing model.
In one embodiment, the step of performing knowledge distillation training on the encoder through the natural language processing model includes: obtaining a first feature of a second text sample through the natural language processing model, and obtaining a second feature of the second text sample through the encoder; calculating a first loss of the second feature relative to the first feature, and adjusting parameters of the encoder according to the first loss.
Step 1020, pre-training a decoder in the text processing model through the natural language processing model, so that the pre-trained decoder fits the natural language processing model.
In one embodiment, step 1020 comprises: taking the natural language processing model as a teacher model, taking the decoder in the text processing model as a student model, and performing knowledge distillation training on the decoder through the natural language processing model.
In one embodiment, the step of performing knowledge distillation training on the decoder through the natural language processing model includes: obtaining a third feature of a third text sample through the pre-trained encoder; inputting the third feature into the decoder so that the decoder obtains, from the third feature, an output text sequence and a fourth feature for the third text sample; splicing the third text sample with the output text sequence and obtaining a fifth feature of the spliced text through the natural language processing model; and calculating a second loss of the fourth feature relative to the fifth feature and adjusting parameters of the decoder according to the second loss.
In one embodiment, after the step of obtaining the fifth feature of the spliced text through the natural language processing model, the method further includes: truncating the fifth feature according to the feature length of the fourth feature to obtain a partial feature matched with the fourth feature after truncation. The step of calculating the second loss of the fourth feature relative to the fifth feature then comprises: calculating the second loss of the fourth feature relative to the partial feature.
Step 1030, in the text processing model, obtaining a first self-attention sample feature of a first text sample through the pre-trained encoder, obtaining a second self-attention sample feature of the first text sample through the pre-trained decoder, and then obtaining the predicted reference resolution text according to the first self-attention sample feature and the second self-attention sample feature.
Step 1040, adjusting parameters of the text processing model based on the predicted reference resolution text and the actual reference resolution text of the first text sample.
In one embodiment, step 1030 comprises: analyzing the first self-attention sample feature and the second self-attention sample feature through a pointer network model and outputting position coordinates; and looking up, in the first text sample, the characters matched with the position coordinates and obtaining the predicted reference resolution text according to the found characters.
In one embodiment, step 1040 comprises: calculating a third loss of the predicted reference resolution text relative to the actual reference resolution text of the first text sample; and adjusting parameters of the pre-trained encoder, the pre-trained decoder, and the pointer network model, respectively, according to the third loss.
In one embodiment, after the step of obtaining the second self-attention sample feature of the first text sample through the pre-trained decoder, the method further includes: fine-tuning the parameters of the pre-trained encoder using the natural language processing model based on the first self-attention sample feature; and fine-tuning the parameters of the pre-trained decoder using the natural language processing model based on the second self-attention sample feature.
It should be noted that, for the description of the training method of the text processing model provided in the embodiment of the present application, reference may be made to the related description about the training of the text processing model in the processing method of the dialog text provided in the foregoing embodiments, and details are not repeated here.
In the embodiment of the application, the encoder and the decoder are pre-trained using the natural language processing model based on the self-attention mechanism and then formally trained with text samples, so that the encoder and the decoder can quickly learn the text processing capability of the natural language processing model. This improves the text processing model's ability to handle the reference resolution task on dialog text while reducing its model complexity, thereby improving the performance of the text processing model.
As shown in fig. 11, in an embodiment, a processing apparatus 1100 for dialog text is provided, which is applicable to the electronic device described above, and the processing apparatus 1100 for dialog text may include a text sequence obtaining module 1110 and a text processing module 1120.
The text sequence obtaining module 1110 is configured to obtain a first text sequence, where the first text sequence includes at least a dialog text of a latest dialog turn.
The text processing module 1120 is configured to obtain a text feature of the first text sequence through the text processing model and obtain a second text sequence according to the text feature, the second text sequence being the reference resolution text corresponding to the dialog text of the latest dialog turn.
The text processing model comprises an encoder and a decoder: the encoder is used to obtain the first self-attention feature of the first text sequence, the decoder is used to obtain the second self-attention feature according to the first self-attention feature, and the encoder and the decoder are obtained through pre-training by a natural language processing model based on a self-attention mechanism.
In the embodiment of the application, one or more rounds of dialog reference resolution tasks can be processed through the text processing model, the object actually referred to by a pronoun in the dialog text can be accurately identified, and the electronic device's comprehension of the dialog text is improved. In addition, because the encoder and the decoder are obtained by pre-training with the natural language processing model based on the self-attention mechanism, the model complexity of the text processing model can be reduced and its performance improved.
In one embodiment, the text sequence obtaining module 1110 is further configured to obtain an original text, and generate a first text sequence according to the original text, where the original text includes a dialog text of a latest dialog turn and N pairs of dialog texts before the latest dialog turn, where N is a natural number.
In one embodiment, the text sequence acquisition module 1110 includes a recognition unit and a character processing unit.
The recognition unit is configured to recognize the first type of characters contained in the original text.
The character processing unit is configured to delete the first type of characters in the original text to obtain the first text sequence, or to replace the first type of characters in the original text with target characters to obtain the first text sequence.
In one embodiment, the text features further include round number features for individual characters in the first text sequence. The apparatus 1100 for processing dialog text includes a text sequence acquiring module 1110 and a text processing module 1120, and further includes an adding module.
The adding module is configured to add a corresponding round number feature to each character contained in the original text, the round number feature being used to indicate the dialog turn to which the character belongs in the original text.
In one embodiment, the text feature further comprises position information for individual characters in the first text sequence. The apparatus 1100 for processing dialog text includes a labeling module in addition to a text sequence acquiring module 1110, a text processing module 1120, and an adding module.
The labeling module is configured to label position information for each character contained in the original text, the position information being used to indicate the position of the character in the dialog text to which it belongs.
In one embodiment, the text processing model further comprises a pointer network model. The text processing module 1120 includes a coordinate output unit and a searching unit.
The coordinate output unit is configured to analyze the first self-attention feature and the second self-attention feature through the pointer network model and output position coordinates.
The searching unit is configured to look up, in the original text, the characters matched with the position coordinates and obtain the second text sequence according to the found characters.
In one embodiment, the coordinate output unit is further configured to input the first self-attention feature and the current second self-attention feature output by the decoder into the pointer network model, analyze them through the pointer network model, and output the current position coordinates.
The searching unit is further configured to look up, in the original text, the character matched with the current position coordinates, splice the found character onto the previous intermediate text sequence to obtain the current intermediate text sequence, use the current intermediate text sequence as the decoder's input at the next step so that the decoder outputs the next second self-attention feature, and, when the pointer network model outputs the terminator, output the most recently obtained intermediate text sequence as the second text sequence.
In one embodiment, the text processing module 1120, in addition to the coordinate output unit and the search unit, further includes an embedded vector acquisition unit, a normalization unit, and a self-attention calculation unit.
The embedded vector acquisition unit is configured to acquire the embedded vector of each character in the first text sequence.
The normalization unit is configured to input the embedded vector of each character into the encoder and normalize the embedded vector of each character through the encoder.
The self-attention calculation unit is configured to calculate the self-attention of each character in the first text sequence from the normalized embedded vectors and generate the first self-attention feature from the calculated self-attention.
In the embodiment of the application, the encoder and the decoder of the text processing model can accurately acquire the self-attention feature of each character in the input dialog text, the pointer network model can accurately obtain the position coordinate of the character to be output in the original text according to the self-attention feature of each character, the searching speed is high, and the result is more accurate. The text processing model can process one or more rounds of dialogue reference resolution tasks, accurately identifies the object actually referred by the pronouns in the dialogue text, and improves the comprehension capability of the electronic equipment on the dialogue text.
In one embodiment, the apparatus 1100 for processing dialog text includes a first pre-training module, a second pre-training module, a sample processing module, and a parameter adjusting module, in addition to a text sequence obtaining module 1110, a text processing module 1120, an adding module, and a labeling module.
And the first pre-training module is used for pre-training the coder in the text processing model through the natural language processing model based on the self-attention mechanism so as to fit the coder obtained by pre-training with the natural language processing model.
In one embodiment, the first pre-training module is further configured to perform knowledge distillation training on the encoder through the natural language processing model by taking the natural language processing model based on the self-attention mechanism as a teacher model and taking the encoder in the text processing model as a student model.
In one embodiment, the first pre-training module includes a first feature obtaining unit and a first loss calculating unit.
The first feature obtaining unit is configured to obtain the first feature of the second text sample through the natural language processing model and obtain the second feature of the second text sample through the encoder.
The first loss calculating unit is configured to calculate the first loss of the second feature relative to the first feature and adjust the parameters of the encoder according to the first loss.
The second pre-training module is configured to pre-train the decoder in the text processing model through the natural language processing model, so that the pre-trained decoder fits the natural language processing model.
In one embodiment, the second pre-training module is further configured to perform knowledge distillation training on the decoder through the natural language processing model by taking the natural language processing model as a teacher model and taking the decoder in the text processing model as a student model.
In one embodiment, the second pre-training module includes a second feature obtaining unit, a splicing unit, and a second loss calculating unit.
The second feature obtaining unit is configured to obtain the third feature of the third text sample through the pre-trained encoder.
The second feature obtaining unit is further configured to input the third feature into the decoder so that the decoder obtains, from the third feature, the output text sequence and the fourth feature for the third text sample.
The splicing unit is configured to splice the third text sample with the output text sequence.
The second feature obtaining unit is further configured to obtain the fifth feature of the spliced text through the natural language processing model.
The second loss calculating unit is configured to calculate the second loss of the fourth feature relative to the fifth feature and adjust the parameters of the decoder according to the second loss.
In an embodiment, the second pre-training module further includes a truncation unit configured to truncate the fifth feature according to the feature length of the fourth feature, so as to obtain the partial feature matched with the fourth feature after truncation.
The second loss calculating unit is further configured to calculate the second loss of the fourth feature relative to the partial feature and adjust the parameters of the decoder according to the second loss.
The sample processing module is configured to obtain, in the text processing model, a first self-attention sample feature of a first text sample through the pre-trained encoder, obtain a second self-attention sample feature of the first text sample through the pre-trained decoder, and obtain the predicted reference resolution text according to the first self-attention sample feature and the second self-attention sample feature.
The parameter adjusting module is configured to adjust the parameters of the text processing model based on the predicted reference resolution text and the actual reference resolution text of the first text sample.
In the embodiment of the application, the encoder and the decoder are pre-trained using the natural language processing model based on the self-attention mechanism and then formally trained with text samples, so that the encoder and the decoder can quickly learn the text processing capability of the natural language processing model. This improves the text processing model's ability to handle the reference resolution task on dialog text while reducing its model complexity, thereby improving the performance of the text processing model.
As shown in fig. 12, in an embodiment, a training apparatus 1200 for a text processing model is provided and is applicable to the electronic device. The training apparatus 1200 for the text processing model may include a first pre-training module 1210, a second pre-training module 1220, a sample processing module 1230, and a parameter adjusting module 1240.
A first pre-training module 1210, configured to pre-train an encoder in the text processing model through a natural language processing model based on a self-attention mechanism, so that the pre-trained encoder is fitted to the natural language processing model.
In one embodiment, the first pre-training module 1210 is further configured to perform knowledge distillation training on the encoder through the natural language processing model, with the natural language processing model based on the self-attention mechanism serving as the teacher model and the encoder in the text processing model serving as the student model.
In one embodiment, the first pre-training module 1210 includes a first feature obtaining unit and a first loss calculating unit.
The first feature obtaining unit is configured to obtain a first feature of a second text sample through the natural language processing model, and to obtain a second feature of the second text sample through the encoder.
The first loss calculation unit is configured to calculate a first loss of the second feature relative to the first feature, and to adjust the parameters of the encoder according to the first loss.
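The first loss measures how far the student encoder's features are from the teacher's features for the same sample. The patent does not fix a concrete loss function; the sketch below assumes a mean-squared-error objective over the two feature matrices (the function name `distillation_loss` and the toy feature values are illustrative only):

```python
import numpy as np

def distillation_loss(teacher_feat, student_feat):
    """First loss: mean-squared error between the teacher's first feature
    and the student encoder's second feature (MSE is an assumed choice)."""
    teacher_feat = np.asarray(teacher_feat, dtype=float)
    student_feat = np.asarray(student_feat, dtype=float)
    return float(np.mean((student_feat - teacher_feat) ** 2))

# Toy 2x2 feature matrices standing in for the two models' outputs
# on the same second text sample.
teacher = np.array([[0.2, 0.5], [0.1, 0.9]])  # first feature (teacher model)
student = np.array([[0.0, 0.4], [0.3, 0.7]])  # second feature (encoder)
loss = distillation_loss(teacher, student)
```

In actual training, the gradient of this scalar with respect to the encoder parameters would be used to adjust them; the sketch only shows the loss computation itself.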
The second pre-training module 1220 is configured to pre-train a decoder in the text processing model through the natural language processing model, so that the pre-trained decoder is fitted to the natural language processing model.
In one embodiment, the second pre-training module 1220 is further configured to perform knowledge distillation training on the decoder through the natural language processing model by using the natural language processing model as a teacher model and using the decoder in the text processing model as a student model.
In one embodiment, the second pre-training module 1220 includes a second feature obtaining unit, a splicing unit, and a second loss calculating unit.
The second feature obtaining unit is configured to obtain a third feature of a third text sample through the pre-trained encoder.
The second feature obtaining unit is further configured to input the third feature into the decoder, so that the decoder obtains, according to the third feature, an output text sequence and a fourth feature for the third text sample.
The splicing unit is configured to splice the third text sample with the output text sequence.
The second feature obtaining unit is further configured to obtain a fifth feature of the spliced text through the natural language processing model.
The second loss calculation unit is configured to calculate a second loss of the fourth feature relative to the fifth feature, and to adjust the parameters of the decoder according to the second loss.
In an embodiment, the second pre-training module 1220 further includes a truncation unit, configured to truncate the fifth feature according to the feature length of the fourth feature, so as to obtain a partial feature that matches the fourth feature after truncation.
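The truncation step can be sketched as follows. Because the spliced text seen by the teacher model is longer than the decoder's output, the teacher's fifth feature is cut along the sequence dimension to the length of the fourth feature before the second loss is computed; the function name, array shapes, and the MSE choice below are illustrative assumptions:

```python
import numpy as np

def truncate_to_match(fifth_feature, fourth_feature):
    """Cut the teacher's fifth feature along the sequence axis so that its
    length equals that of the decoder's fourth feature."""
    target_len = fourth_feature.shape[0]
    return fifth_feature[:target_len]

fourth = np.zeros((3, 4))  # decoder features: 3 output steps, width 4
fifth = np.ones((5, 4))    # teacher features over the longer spliced text
partial = truncate_to_match(fifth, fourth)       # partial feature, shape (3, 4)
second_loss = float(np.mean((fourth - partial) ** 2))
```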
The sample processing module 1230 is configured to, in the text processing model, obtain a first self-attention sample feature of the first text sample through the pre-trained encoder, obtain a second self-attention sample feature of the first text sample through the pre-trained decoder, and obtain the predicted reference resolution text according to the first self-attention sample feature and the second self-attention sample feature.
The parameter adjustment module 1240 is configured to adjust parameters of the text processing model based on the predicted reference resolution text and the actual reference resolution text of the first text sample.
In one embodiment, the text processing model further includes a pointer network model. The sample processing module 1230 is further configured to, after the second self-attention sample feature of the first text sample is obtained by the pre-trained decoder, analyze the first self-attention sample feature and the second self-attention sample feature through the pointer network model, output position coordinates, search the first text sample for the characters matching the position coordinates, and obtain the predicted reference resolution text according to the found characters.
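The position-to-character lookup driven by the pointer network's output can be sketched as follows. It assumes the pointer network emits integer position coordinates into the first text sample and a distinguished end marker; both the marker convention and the function name are illustrative, since the patent leaves the concrete interface unspecified:

```python
def resolve_from_positions(sample_text, positions, end_marker=-1):
    """Map the pointer network's position coordinates back to characters of
    the first text sample, stopping at the end marker."""
    chars = []
    for pos in positions:
        if pos == end_marker:
            break
        chars.append(sample_text[pos])
    return "".join(chars)

history = "weather today"
# Hypothetical pointer output selecting the word "weather" (positions 0-6).
predicted = resolve_from_positions(history, [0, 1, 2, 3, 4, 5, 6, -1])
```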
In one embodiment, the parameter adjusting module 1240 is further configured to calculate a third loss of the predicted reference resolution text relative to the actual reference resolution text of the first text sample, and adjust the parameters of the pre-trained encoder, the pre-trained decoder, and the pointer network model according to the third loss.
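As an illustration of the third loss, one common choice for a pointer network is the average negative log-likelihood of the actual positions under the predicted position distributions. The patent only states that a loss of the predicted reference resolution text relative to the actual text is computed, so the cross-entropy form below is an assumption:

```python
import math

def third_loss(predicted_dists, target_positions):
    """Average negative log-likelihood of the actual positions under the
    pointer network's predicted position distributions (an assumed loss)."""
    total = 0.0
    for dist, pos in zip(predicted_dists, target_positions):
        total += -math.log(dist[pos])
    return total / len(target_positions)

# Two decoding steps over a 2-position sample: the network puts 0.9 on the
# correct position at step 1 and 0.8 on the correct position at step 2.
loss = third_loss([[0.9, 0.1], [0.2, 0.8]], [0, 1])
```

In training, this scalar would be backpropagated through the pointer network, the pre-trained decoder, and the pre-trained encoder to adjust all three jointly.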
In one embodiment, the training apparatus 1200 for the text processing model includes a fine tuning module in addition to the first pre-training module 1210, the second pre-training module 1220, the sample processing module 1230 and the parameter adjusting module 1240.
The fine-tuning module is configured to fine-tune the parameters of the pre-trained encoder by using the natural language processing model, based on the first self-attention sample feature.
The fine-tuning module is further configured to fine-tune the parameters of the pre-trained decoder by using the natural language processing model, based on the second self-attention sample feature.
In the embodiments of the application, the encoder and the decoder are first pre-trained with a natural language processing model based on the self-attention mechanism, and are then formally trained with text samples. The encoder and the decoder can therefore quickly learn the text-processing capability of the natural language processing model, which improves the ability of the text processing model to handle the reference resolution task on dialog text while reducing the model's complexity, thereby improving the performance of the text processing model.
FIG. 13 is a block diagram of an electronic device according to an embodiment. The electronic device may be a mobile phone, tablet computer, smart wearable device, PC, notebook computer, or other device. As shown in fig. 13, the electronic device 1300 may include one or more of the following components: a processor 1310 and a memory 1320 coupled to the processor 1310, where the memory 1320 may store one or more applications, and the one or more applications may be configured to be executed by the one or more processors 1310 to perform the method for processing dialog text described in the embodiments above.
Processor 1310 may include one or more processing cores. The processor 1310 connects various parts of the electronic device 1300 using various interfaces and lines, and performs the functions of the electronic device 1300 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 1320 and by invoking data stored in the memory 1320. Optionally, the processor 1310 may be implemented in hardware in at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA) form. The processor 1310 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 1310 but instead be implemented by a separate communication chip.
The memory 1320 may include Random Access Memory (RAM) or Read-Only Memory (ROM). The memory 1320 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1320 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described above, and the like. The data storage area may store data created during use of the electronic device 1300, and the like.
It can be understood that the electronic device 1300 may include more or fewer components than those shown in the block diagram above, such as a power supply, input buttons, a camera, a speaker, a screen, a Radio Frequency (RF) circuit, a Wireless Fidelity (Wi-Fi) module, a Bluetooth module, and sensors, which are not limited herein.
An embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor implements the method for training a text processing model described in the foregoing embodiments.
The embodiment of the application discloses a computer readable storage medium, which stores a computer program, wherein the computer program realizes the processing method of the dialog text described in the above embodiments when being executed by a processor.
The embodiment of the application discloses a computer readable storage medium, which stores a computer program, wherein the computer program, when executed by a processor, implements the training method of the text processing model as described in the embodiments above.
Embodiments of the present application disclose a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, and the computer program is executable by a processor to implement a method for processing dialog text as described in the embodiments above.
Embodiments of the present application disclose a computer program product comprising a non-transitory computer readable storage medium storing a computer program, and the computer program is executable by a processor to implement a method for training a text processing model as described in the embodiments above.
It can be understood by those skilled in the art that all or part of the processes of the method embodiments described above can be implemented by a computer program. The program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), or the like.
Any reference to memory, storage, database, or other medium as used herein may include non-volatile and/or volatile memory. Suitable non-volatile memory can include ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), and Direct Rambus DRAM (DRDRAM).
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also appreciate that the embodiments described in this specification are all alternative embodiments and that the acts and modules involved are not necessarily required for this application.
In the various embodiments of the present application, it should be understood that the sequence numbers of the processes described above do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The integrated units, if implemented as software functional units and sold or used as a standalone product, may be stored in a computer-accessible memory. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute all or part of the steps of the methods of the embodiments of the present application.
The foregoing describes in detail the method, apparatus, electronic device, and storage medium for processing dialog text disclosed in the embodiments of the present application; specific examples are used herein to explain the principles and implementations of the present application. Meanwhile, a person skilled in the art may, following the idea of the present application, make changes to the specific implementations and the application scope; in summary, the content of this specification should not be construed as limiting the present application.

Claims (27)

1. A method for processing dialog text, comprising:
acquiring a first text sequence, wherein the first text sequence at least comprises a dialog text of a latest dialog round;
acquiring text characteristics of the first text sequence through a text processing model, and acquiring a second text sequence according to the text characteristics, wherein the second text sequence is a reference resolution text corresponding to the dialog text of the latest round of dialog,
the text feature comprises a first self-attention feature and a second self-attention feature, the text processing model comprises an encoder and a decoder, the encoder is used for obtaining the first self-attention feature of the first text sequence, the decoder is used for obtaining the second self-attention feature according to the first self-attention feature, and the encoder and the decoder are obtained through pre-training of a natural language processing model based on a self-attention mechanism.
2. The method of claim 1, wherein obtaining the first text sequence comprises:
the method comprises the steps of obtaining an original text, and generating a first text sequence according to the original text, wherein the original text comprises a conversation text of a latest conversation and an N-wheel conversation text before the latest conversation, and N is a natural number.
3. The method of claim 2, wherein the text processing model further comprises a pointer network model, and wherein obtaining a second text sequence according to the text features comprises:
analyzing the first self-attention feature and the second self-attention feature through the pointer network model, and outputting a position coordinate;
and searching characters matched with the position coordinates in the original text, and obtaining a second text sequence according to the searched characters.
4. The method of claim 3, wherein analyzing the first and second self-attention features and outputting location coordinates via the pointer network model comprises:
inputting the first self-attention feature and a second self-attention feature output by the decoder at this time into the pointer network model;
analyzing the first self-attention feature and the second self-attention feature of the current time through the pointer network model, and outputting the position coordinate obtained at the current time;
searching for the character matched with the position coordinate in the original text, and obtaining a second text sequence according to the searched character, wherein the step of searching comprises the following steps:
searching characters matched with the position coordinates obtained at this time in the original text;
splicing the searched characters to the previous intermediate text sequence to obtain the current intermediate text sequence, and taking the current intermediate text sequence as the input of the decoder for the next time so as to output a second self-attention feature for the next time through the decoder;
and when the pointer network model outputs the terminator, outputting the intermediate text sequence obtained last time as a second text sequence.
5. The method of any of claims 2 to 4, wherein the text features further comprise round number features for each character in the first text sequence;
after the obtaining the original text, the method further comprises:
and adding corresponding round number features in each character contained in the original text, wherein the round number features are used for representing the corresponding conversation round number of the character in the original text.
6. The method of any of claims 2 to 4, wherein the text features further comprise position information of each character in the first text sequence;
after the obtaining the original text, the method further comprises:
and respectively labeling position information for each character contained in the original text, wherein the position information is used for representing the position of the character in the dialog text to which the character belongs.
7. The method of claim 2, wherein generating the first text sequence from the original text comprises:
identifying a first type of character contained in the original text;
deleting the characters of the first type in the original text to obtain a first text sequence,
or replacing the first type of character in the original text with a target character to obtain a first text sequence.
8. The method according to any one of claims 1 to 4, wherein the obtaining the text feature of the first text sequence by the text processing model comprises:
acquiring an embedded vector of each character in the first text sequence;
inputting the embedded vector of each character into an encoder, and carrying out normalization processing on the embedded vector of each character through the encoder;
and respectively calculating the self-attention of each character in the first text sequence according to the embedded vector after normalization processing, and generating a first self-attention feature according to the self-attention obtained through calculation.
9. The method of claim 1, wherein prior to said obtaining the first text sequence, the method further comprises:
pre-training an encoder in a text processing model through a natural language processing model based on a self-attention mechanism, so as to enable the pre-trained encoder to be fitted with the natural language processing model;
pre-training a decoder in the text processing model through the natural language processing model so as to fit the pre-trained decoder with the natural language processing model;
in the text processing model, a first self-attention sample feature of a first text sample is obtained through an encoder obtained through the pre-training, a second self-attention sample feature of the first text sample is obtained through a decoder obtained through the pre-training, and a prediction reference resolution text is obtained according to the first self-attention sample feature and the second self-attention sample feature;
adjusting parameters of the text processing model based on the predicted reference resolution text and the actual reference resolution text of the first text sample.
10. The method of claim 9, wherein pre-training an encoder in a text processing model through a natural language processing model based on a self-attention mechanism comprises:
taking a natural language processing model based on a self-attention mechanism as a teacher model, taking an encoder in a text processing model as a student model, and carrying out knowledge distillation training on the encoder through the natural language processing model;
the pre-training of the decoder in the text processing model by the natural language processing model comprises:
and the natural language processing model is used as a teacher model, a decoder in the text processing model is used as a student model, and knowledge distillation training is carried out on the decoder through the natural language processing model.
11. The method of claim 10, wherein the knowledge-distillation training of the encoder by the natural language processing model comprises:
obtaining a first feature of a second text sample through the natural language processing model and obtaining a second feature of the second text sample through the encoder;
and calculating a first loss of the second characteristic relative to the first characteristic, and adjusting the parameters of the encoder according to the first loss.
12. The method of claim 10 or 11, wherein the knowledge distillation training of the decoder by the natural language processing model comprises:
obtaining a third feature of a third text sample through an encoder obtained through pre-training;
inputting the third feature into a decoder to cause the decoder to obtain an output text sequence and a fourth feature for the third text sample according to the third feature;
splicing the third text sample with the output text sequence, and obtaining a fifth characteristic of the spliced text through the natural language processing model;
and calculating a second loss of the fourth characteristic relative to the fifth characteristic, and adjusting the parameters of the decoder according to the second loss.
13. The method of claim 12, wherein after obtaining the fifth feature of the stitched text through the natural language processing model, the method further comprises:
truncating the fifth feature according to the feature length of the fourth feature to obtain a partial feature that matches the fourth feature after truncation;
the calculating a second loss of the fourth feature relative to the fifth feature comprises:
a second loss of the fourth feature relative to the partial feature is calculated.
14. A method for training a text processing model, comprising:
pre-training an encoder in a text processing model through a natural language processing model based on a self-attention mechanism, so as to enable the pre-trained encoder to be fitted with the natural language processing model;
pre-training a decoder in the text processing model through the natural language processing model so as to fit the pre-trained decoder with the natural language processing model;
in the text processing model, a first self-attention sample feature of a first text sample is obtained through an encoder obtained through the pre-training, a second self-attention sample feature of the first text sample is obtained through a decoder obtained through the pre-training, and a prediction reference resolution text is obtained according to the first self-attention sample feature and the second self-attention sample feature;
adjusting parameters of the text processing model based on the predicted reference resolution text and the actual reference resolution text of the first text sample.
15. The method of claim 14, wherein pre-training an encoder in a text processing model through a natural language processing model based on a self-attention mechanism comprises:
taking a natural language processing model based on a self-attention mechanism as a teacher model, taking an encoder in a text processing model as a student model, and carrying out knowledge distillation training on the encoder through the natural language processing model;
the pre-training of the decoder in the text processing model by the natural language processing model comprises:
and the natural language processing model is used as a teacher model, a decoder in the text processing model is used as a student model, and knowledge distillation training is carried out on the decoder through the natural language processing model.
16. The method of claim 15, wherein the knowledge-distillation training of the encoder by the natural language processing model comprises:
obtaining a first feature of a second text sample through the natural language processing model and obtaining a second feature of the second text sample through the encoder;
and calculating a first loss of the second characteristic relative to the first characteristic, and adjusting the parameters of the encoder according to the first loss.
17. The method of claim 15 or 16, wherein the knowledge distillation training of the decoder by the natural language processing model comprises:
obtaining a third feature of a third text sample through an encoder obtained through pre-training;
inputting the third feature into a decoder to cause the decoder to obtain an output text sequence and a fourth feature for the third text sample according to the third feature;
splicing the third text sample with the output text sequence, and obtaining a fifth characteristic of the spliced text through the natural language processing model;
and calculating a second loss of the fourth characteristic relative to the fifth characteristic, and adjusting the parameters of the decoder according to the second loss.
18. The method of claim 17, wherein after obtaining the fifth feature of the stitched text through the natural language processing model, the method further comprises:
truncating the fifth feature according to the feature length of the fourth feature to obtain a partial feature that matches the fourth feature after truncation;
the calculating a second loss of the fourth feature relative to the fifth feature comprises:
a second loss of the fourth feature relative to the partial feature is calculated.
19. The method of claim 14, wherein the text processing model further comprises a pointer network model, and wherein deriving the predicted reference resolution text from the first self-attention sample feature and the second self-attention sample feature comprises:
analyzing the first self-attention sample characteristic and the second self-attention sample characteristic through the pointer network model, and outputting a position coordinate;
and searching the character matched with the position coordinate in the first text sample, and obtaining a prediction reference resolution text according to the searched character.
20. The method of claim 19, wherein adjusting parameters of the text processing model based on the predicted reference resolution text and the actual reference resolution text of the first text sample comprises:
calculating a third loss of the predicted reference resolution text relative to the actual reference resolution text of the first text sample;
and respectively adjusting parameters of the encoder obtained by pre-training, the decoder obtained by pre-training, and the pointer network model according to the third loss.
21. The method of claim 14, wherein after the decoder obtained by the pre-training obtains second self-attention sample features of the first text sample, the method further comprises:
fine-tuning the pre-trained encoder parameters by using the natural language processing model based on the first self-attention sample feature;
and fine-tuning the parameters of the pre-trained decoder by utilizing the natural language processing model based on the second self-attention sample characteristics.
22. A device for processing dialog text, comprising:
the text sequence acquisition module is used for acquiring a first text sequence, wherein the first text sequence at least comprises a dialog text of the latest dialog round;
a text processing module, configured to obtain a text feature of the first text sequence through a text processing model, and obtain a second text sequence according to the text feature, where the second text sequence is a reference resolution text corresponding to the dialog text of the latest round of dialog,
the text feature comprises a first self-attention feature and a second self-attention feature, the text processing model comprises an encoder and a decoder, the encoder is used for obtaining the first self-attention feature of the first text sequence, the decoder is used for obtaining the second self-attention feature according to the first self-attention feature, and the encoder and the decoder are obtained through pre-training of a natural language processing model based on a self-attention mechanism.
23. An apparatus for training a text processing model, comprising:
the first pre-training module is used for pre-training an encoder in a text processing model through a natural language processing model based on an attention mechanism so as to enable the pre-trained encoder to be fitted with the natural language processing model;
the second pre-training module is used for pre-training a decoder in the text processing model through the natural language processing model so as to enable the pre-trained decoder to be fitted with the natural language processing model;
the sample processing module is used for obtaining a first self-attention sample characteristic of a first text sample through the encoder obtained through the pre-training in the text processing model, obtaining a second self-attention sample characteristic of the first text sample through the decoder obtained through the pre-training, and obtaining a prediction reference resolution text according to the first self-attention sample characteristic and the second self-attention sample characteristic;
a parameter adjustment module to adjust parameters of the text processing model based on the predicted reference resolution text and the actual reference resolution text of the first text sample.
24. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program that, when executed by the processor, causes the processor to carry out the method of any one of claims 1 to 13.
25. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program that, when executed by the processor, causes the processor to carry out the method of any one of claims 14 to 21.
26. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 13.
27. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 14 to 21.
CN202010832715.9A 2020-08-18 2020-08-18 Method and device for processing dialog text, electronic equipment and storage medium Withdrawn CN111967224A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010832715.9A CN111967224A (en) 2020-08-18 2020-08-18 Method and device for processing dialog text, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111967224A (en) 2020-11-20

Family

ID=73388439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010832715.9A Withdrawn CN111967224A (en) 2020-08-18 2020-08-18 Method and device for processing dialog text, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111967224A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852426A (en) * 2019-11-19 2020-02-28 成都晓多科技有限公司 Pre-training model integration acceleration method and device based on knowledge distillation
CN111401036A (en) * 2020-05-28 2020-07-10 支付宝(杭州)信息技术有限公司 Method and device for generating reference resolution text and electronic equipment
CN111414464A (en) * 2019-05-27 2020-07-14 腾讯科技(深圳)有限公司 Question generation method, device, equipment and storage medium

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463942A (en) * 2020-12-11 2021-03-09 深圳市欢太科技有限公司 Text processing method and device, electronic equipment and computer readable storage medium
CN112528674A (en) * 2020-12-14 2021-03-19 网易(杭州)网络有限公司 Text processing method, model training method, device, equipment and storage medium
CN112528674B (en) * 2020-12-14 2023-06-30 网易(杭州)网络有限公司 Text processing method, model training method, device, equipment and storage medium
CN112365385A (en) * 2021-01-18 2021-02-12 深圳市友杰智新科技有限公司 Knowledge distillation method and device based on self attention and computer equipment
CN112786108A (en) * 2021-01-21 2021-05-11 北京百度网讯科技有限公司 Molecular understanding model training method, device, equipment and medium
CN112786108B (en) * 2021-01-21 2023-10-24 北京百度网讯科技有限公司 Training method, device, equipment and medium of molecular understanding model
CN112861548A (en) * 2021-02-10 2021-05-28 百度在线网络技术(北京)有限公司 Natural language generation and model training method, device, equipment and storage medium
CN112949622B (en) * 2021-04-08 2023-06-27 苏州大学 Bimodal character classification method and device for fusing text and image
CN112949622A (en) * 2021-04-08 2021-06-11 苏州大学 Bimodal character classification method and device fusing text and image
CN112989008A (en) * 2021-04-21 2021-06-18 上海汽车集团股份有限公司 Multi-turn dialog rewriting method and device and electronic equipment
CN113298822A (en) * 2021-05-18 2021-08-24 中国科学院深圳先进技术研究院 Point cloud data selection method and device, equipment and storage medium
CN113420121A (en) * 2021-06-24 2021-09-21 中国科学院声学研究所 Text processing model training method, voice text processing method and device
CN113420121B (en) * 2021-06-24 2023-07-28 中国科学院声学研究所 Text processing model training method, voice text processing method and device
WO2023030314A1 (en) * 2021-09-03 2023-03-09 北京有竹居网络技术有限公司 Text processing method, model training method, device, and storage medium
CN113869049A (en) * 2021-12-03 2021-12-31 北京大学 Fact extraction method and device with legal attribute based on legal consultation problem
CN114742049B (en) * 2022-04-11 2023-05-05 平安科技(深圳)有限公司 Method, device and related equipment for rewriting incomplete sentence and training model
CN114742049A (en) * 2022-04-11 2022-07-12 平安科技(深圳)有限公司 Incomplete statement rewriting and model training method, device and related equipment

Similar Documents

Publication Publication Date Title
CN111967224A (en) Method and device for processing dialog text, electronic equipment and storage medium
CN109543030B (en) Method, device, equipment and storage medium for classifying session texts of customer service robot
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
WO2019100319A1 (en) Providing a response in a session
CN111402861B (en) Voice recognition method, device, equipment and storage medium
CN110910903B (en) Speech emotion recognition method, device, equipment and computer readable storage medium
CN111261162B (en) Speech recognition method, speech recognition apparatus, and storage medium
CN107844470B (en) Voice data processing method and equipment thereof
CN111428010A (en) Man-machine intelligent question and answer method and device
CN114245203B (en) Video editing method, device, equipment and medium based on script
EP4113357A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN111597804B (en) Method and related device for training entity recognition model
CN114861889B (en) Deep learning model training method, target object detection method and device
CN111223476B (en) Method and device for extracting voice feature vector, computer equipment and storage medium
CN112463942A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN117521675A (en) Information processing method, device, equipment and storage medium based on large language model
CN113220854A (en) Intelligent dialogue method and device for machine reading understanding
CN114822519A (en) Chinese speech recognition error correction method and device and electronic equipment
CN109002498B (en) Man-machine conversation method, device, equipment and storage medium
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
CN114490967B (en) Training method of dialogue model, dialogue method and device of dialogue robot and electronic equipment
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
CN116415597A (en) Speech translation and simultaneous interpretation method
US20240038223A1 (en) Speech recognition method and apparatus
CN113535970A (en) Information processing method and apparatus, electronic device, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201120