CN115994211B

CN115994211B - Text processing method and device, electronic equipment and storage medium

Info

Publication number: CN115994211B
Application number: CN202211646299.9A
Authority: CN
Inventors: 缪湾湾; 艾鹏; 夏妍; 林锋
Original assignee: Mgjia Beijing Technology Co ltd
Current assignee: Mgjia Beijing Technology Co ltd
Priority date: 2022-12-19
Filing date: 2022-12-19
Publication date: 2024-03-08
Anticipated expiration: 2042-12-19
Also published as: CN115994211A

Abstract

The invention discloses a text processing method, a text processing device, electronic equipment and a storage medium, wherein the method comprises the following steps: determining whether the current dialog scene is a multi-turn dialog scene in response to receiving operation of the voice text and according to the received voice text and the historical voice text; if the current dialogue scene is a multi-round dialogue scene, judging whether missing information exists in voice text content contained in the multi-round dialogue scene; if the missing information exists, inputting the voice text content contained in the multi-round dialogue scene into a pre-trained text processing model, so that the text processing model carries out complement processing on the voice text content with the missing information, and the text processing model comprises a pointer network model.

Description

Text processing method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a text processing method, a text processing device, an electronic device, and a storage medium.

Background

As artificial intelligence continues to develop, intelligent dialog systems have also received increased attention, where multiple rounds of dialog are an important direction, such as through dialog to control intelligent devices. However, in the prior art, the dialogue text with the missing information is directly complemented, and the accuracy of the complemented result is poor, so that the intelligent device cannot make a correct response. Therefore, a text processing method is needed to be proposed to improve the accuracy of the intelligent dialogue interaction result.

Disclosure of Invention

Therefore, the technical problem to be solved by the invention is to overcome the defect that the existing dialogue system cannot make correct response to the dialogue with missing information, so as to provide a text processing method, a text processing device, an electronic device and a storage medium.

According to a first aspect, an embodiment of the present invention discloses a text processing method, the method including: determining whether the current dialog scene is a multi-turn dialog scene in response to receiving operation of the voice text and according to the received voice text and the historical voice text; if the current dialogue scene is a multi-round dialogue scene, judging whether missing information exists in voice text content contained in the multi-round dialogue scene; if the missing information exists, inputting the voice text content contained in the multi-turn dialogue scene into a pre-trained text processing model, so that the text processing model carries out complement processing on the voice text content with the missing information, and the text processing model comprises a pointer network model.

Optionally, the text processing model comprises a CPT pre-training model; if the missing information exists, inputting the voice text content contained in the multi-turn dialogue scene into a pre-trained text processing model, so that the text processing model carries out complement processing on the voice text content with the missing information, and the method comprises the following steps: performing vector conversion on voice text content contained in a multi-turn dialogue scene by using the CPT pre-training model; for the voice text to be rewritten with missing information, utilizing the CPT pre-training model to rewrite the vector corresponding to the voice text to be rewritten by combining the vectors corresponding to all voice text contents to generate a vector set corresponding to the voice text with complete semantics; inputting a first vector in the vector set into the pointer network model, enabling the pointer network model to perform text conversion on the received vector to obtain corresponding characters, and inputting the characters obtained after conversion into the CPT pre-training model; outputting a next vector from the vector set by using the CPT pre-training model and inputting the next vector to the pointer network model according to the received converted text; and repeating the steps of converting the text of the received vector by using the pointer network model to obtain corresponding characters and inputting the characters obtained after conversion into the CPT pre-training model until the voice text to be rewritten with the missing information is formed into a complete voice text.

Optionally, before the modifying the vector corresponding to the to-be-modified voice text by using the CPT pre-training model in combination with the vectors corresponding to all the voice text contents to generate the vector set corresponding to the to-be-modified voice text with complete semantics, the method further includes: performing dialogue scene judgment again according to vectors corresponding to the voice text content contained in the multi-round dialogue scene obtained after conversion; and when the multi-turn dialogue scene is judged, performing complement processing on the content of the to-be-rewritten voice text with the missing information in the multi-turn dialogue scene.

Optionally, the pointer network determines the text corresponding to the received vector by: determining a degree of association of the received vector with a phonetic text vector contained in the multi-turn dialog scene; constructing a probability distribution based on each degree of association; and selecting characters corresponding to the vector with the highest probability from the probability distribution as characters corresponding to the received vector.

Optionally, the rewriting the vector corresponding to the to-be-rewritten voice text with the vector corresponding to all voice text contents by using the CPT pre-training model to generate a vector set corresponding to the voice text with complete semantics includes: determining the position of missing information in the voice text to be rewritten and the type of the missing information according to a preset text expression rule; and finding out vectors corresponding to the missing information from the vectors corresponding to all the voice text contents according to the position of the missing information and the missing information type in the voice text to be rewritten, and combining the vectors corresponding to the voice text to be rewritten to generate a vector set corresponding to the voice text with complete semantics.

According to a second aspect, the embodiment of the invention also discloses a text processing device, which comprises: the multi-round judging module is used for responding to the receiving operation of the voice text and determining whether the current scene is a multi-round dialogue scene according to the received voice text and the historical voice text; the system comprises a missing information judging module, a judging module and a judging module, wherein the missing information judging module is used for judging whether missing information exists in voice text content contained in a multi-round dialogue scene if the current scene is the multi-round dialogue scene; and the text processing module is used for inputting the voice text content contained in the multi-turn dialogue scene into a pre-trained text processing model if the missing information exists, so that the text processing model carries out complement processing on the voice text content with the missing information, and the text processing model comprises a pointer network model.

Optionally, the text processing model comprises a CPT pre-training model; a text processing module, comprising: the vector conversion sub-module is used for carrying out vector conversion on the voice text content contained in the multi-round dialogue scene by utilizing the CPT pre-training model; the text rewriting submodule is used for rewriting the vectors corresponding to the voice text to be rewritten by combining the vectors corresponding to all voice text contents by utilizing the CPT pre-training model to generate a vector set corresponding to the voice text with complete semantics; the vector conversion text sub-module is used for inputting a first vector in the vector set into the pointer network model, so that the pointer network model performs text conversion on the received vector to obtain corresponding text, and inputting the text obtained after conversion into the CPT pre-training model; the character input model submodule is used for outputting a next vector from the vector set by utilizing the CPT pre-training model according to the received converted characters and inputting the next vector into the pointer network model; and the step repeating sub-module is used for repeating the steps of converting the text of the received vector by using the pointer network model to obtain corresponding characters and inputting the characters obtained after conversion into the CPT pre-training model until the voice text to be rewritten with the missing information forms a voice text with complete semantics.

Optionally, the apparatus further comprises: the multi-round judging module is used for judging the dialogue scene again according to the vectors corresponding to the voice text content contained in the multi-round dialogue scene obtained after conversion; and the text completion module is used for responding to completion processing of the content of the voice text to be rewritten, which has the missing information in the multi-round dialogue scene, when the multi-round dialogue scene is judged.

According to a third aspect, an embodiment of the present invention further discloses an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the text processing method according to the first aspect or any of the alternative embodiments of the first aspect.

According to a fourth aspect, an embodiment of the present invention also discloses a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the text processing method according to the first aspect or any of the alternative embodiments of the first aspect.

The technical scheme of the invention has the following advantages:

according to the text processing method provided by the invention, the current dialogue scene is a multi-round dialogue scene and the missing information exists in the voice text contents contained in the multi-round dialogue scene, all the voice text contents in the dialogue scene are input into the pre-trained text processing model, so that the text processing model carries out the completion processing on the voice text contents with the missing information to obtain the voice text contents with complete semantics, namely, the judgment of the multi-round scene and the completion of the missing information of the historical voice text in the corresponding multi-round dialogue scene are combined, the accuracy of the text completion result is improved, and then the interaction effect of the intelligent dialogue is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart showing a specific example of a text processing method according to an embodiment of the present invention;

FIG. 2 is a schematic block diagram of a specific example of a text processing apparatus in an embodiment of the present invention;

fig. 3 is a schematic diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made apparent and fully in view of the accompanying drawings, in which some, but not all embodiments of the invention are shown. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; the two components can be directly connected or indirectly connected through an intermediate medium, or can be communicated inside the two components, or can be connected wirelessly or in a wired way. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.

In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.

With the development of internet of things and artificial intelligence technology, more and more intelligent devices based on voice control are available, a user can realize the control of the intelligent devices through voice interaction with the intelligent devices, but when voice control instructions sent by the user are incomplete, the intelligent devices can not accurately identify user intentions and make accurate reactions, so that the user needs to repeatedly send the voice control instructions to the intelligent devices for a plurality of times, and the accuracy of interaction results of intelligent control is poor, and the intelligent degree is low.

The embodiment of the invention discloses a text processing method, as shown in fig. 1, which comprises the following steps:

step S101, responding to the receiving operation of the voice text and determining whether the current scene is a multi-round dialogue scene according to the received voice text and the historical voice text; for example, in the embodiment of the present application, the dialogue of the user may be in audio form or text form, if the audio form may be converted into text by the automatic speech recognition technology asr, it is required to determine whether there is a history speech text before the current dialogue after receiving the speech text, where the dialogue interval between the history speech and the current speech cannot exceed a certain time, such as the time interval between two dialogues cannot exceed 1 minute, which is just an example, if it is determined that there is a history speech text before the current dialogue, the current dialogue scene is a multi-round dialogue scene, and if it is determined that there is no history speech text before the current dialogue, the original dialogue text output of the user is directly maintained.

Step S102, if the current scene is a multi-round dialogue scene, judging whether missing information exists in voice text content contained in the multi-round dialogue scene; in the embodiment of the present application, if the current dialog scene is determined to be a multi-turn dialog scene, it is determined whether the missing information exists in the voice text content included in the current multi-turn dialog scene, if the missing information does not exist, the voice text content is directly output according to the original dialog text of the user, if the missing information exists, it is indicated that the voice text to be rewritten exists in the multi-turn dialog scene, all the voice text content in the current multi-turn dialog scene needs to be input into a pre-trained text processing model, where the voice file packet received in the current turn in all the voice text content is the voice text content to be rewritten, the voice text content to be rewritten is compared with the voice text content in the previous turn to determine the missing information of the voice text received in the current turn, for example, the voice text content in the previous turn is "beijing-day-raining" why the voice text content in the current turn is always raining ", and the missing information exists in the voice text in the current turn can be determined, so that the missing information is needed to be input into the pre-trained text processing model for full-complement.

Step S103, if missing information exists, inputting the voice text content contained in the multi-turn dialogue scene into a pre-trained text processing model, so that the text processing model carries out complement processing on the voice text content with the missing information, and the text processing model comprises a pointer network model;

in an exemplary embodiment, after determining that missing information exists in voice text content included in a current multi-turn dialogue scene, all voice text content in the current multi-turn dialogue scene is input into a pre-trained text processing model, so that the text processing model carries out complement processing on a voice text to be rewritten according to text content of a previous turn to obtain a voice text with complete semantics. The pre-trained text processing model comprises a pointer network model (pointer network), so that the information source and the range of input information can be controlled, only the voice text content of the previous round can be selected as the information source to complement the part of the voice text to be rewritten, which is lack of information, and the generation difficulty is reduced and the generation result is controllable.

As an alternative embodiment of the present invention, the text processing model includes a CPT pre-training model; if there is missing information, step 103 includes: performing vector conversion on voice text content contained in a multi-turn dialogue scene by using the CPT pre-training model; for the voice text to be rewritten with missing information, utilizing the CPT pre-training model to rewrite the vector corresponding to the voice text to be rewritten by combining the vectors corresponding to all voice text contents to generate a vector set corresponding to the voice text with complete semantics; inputting a first vector in the vector set into the pointer network model, enabling the pointer network model to perform text conversion on the received vector to obtain corresponding characters, and inputting the characters obtained after conversion into the CPT pre-training model; and outputting the next vector from the vector set by utilizing the CPT pre-training model according to the received converted characters, inputting the next vector into the pointer network model, repeatedly utilizing the pointer network model to perform text conversion on the received vector to obtain corresponding characters, and inputting the characters obtained after conversion into the CPT pre-training model until the to-be-rewritten voice text with missing information is formed into a complete voice text.

Illustratively, the pre-trained text processing model in the embodiment of the present application includes a CPT pre-training model and a pointer network model, and all the voice text contents in the current multi-round dialog scene are input into the CPT pre-training model, where the CPT pre-training model is divided into three parts, namely, a vector conversion part S-Enc, a classification judgment part U-Dec, and a text rewrite part G-Dec, where the U-Dec may be used for some sequence labeling and dialog scene classification tasks, and the G-Dec may be used for generating tasks, by way of example only. After inputting all voice text contents in the current multi-turn dialogue scene into a vector conversion part S-Enc in the CPT pre-training model, coding to convert all voice text contents into corresponding vectors, inputting all converted vectors into a text rewrite part G-Dec, wherein the G-Dec rewrites the corresponding vectors of the voice text to be rewritten into corresponding words based on standard text expression rules by combining the corresponding vectors of the previous turn, the text rewrite part G-Dec outputs the first vector contained in the generated complete vector set and then inputs the first vector into a pointer network model, the pointer network model matches the received first vector with all converted vectors to obtain words corresponding to the received vectors, the converted words are input into the G-Dec, the G-Dec is integrated into a transformation coding block transformer block, in the embodiment of the application, the G-Dec is used for outputting a second vector according to the generated complete vector set and inputting the second vector into the pointer network model to convert the vector into the corresponding words, and the steps are repeated until the voice text to be rewritten is formed into the complete voice text.

As an optional implementation manner of the present invention, before the modifying the vector corresponding to the to-be-modified voice text by using the CPT pre-training model in combination with the vectors corresponding to all the voice text contents to generate the vector set corresponding to the to-be-modified voice text with complete semantics, the method further includes: performing dialogue scene judgment again according to vectors corresponding to the voice text content contained in the multi-round dialogue scene obtained after conversion; and when the multi-turn dialogue scene is judged, performing complement processing on the content of the to-be-rewritten voice text with the missing information in the multi-turn dialogue scene.

In an exemplary embodiment of the present application, after all the voice text contents in the current multi-turn dialog scene are input to the vector conversion section S-Enc in the CPT pre-training model, all the vectors are input to the classification judgment section U-Dec to perform multi-turn judgment on the dialog scene again, for example, the classification judgment section U-Dec performs relevance or context analysis on the vectors corresponding to all the voice texts received in the preset duration, determines whether all the voice texts received in the preset duration constitute a multi-turn dialog scene according to the analysis result, and outputs a judgment result, for example, when the probability of multi-turn scene correspondence in two different judgment results is 0.7 and the probability of non-multi-turn scene correspondence is 0.3, thereby determining that the dialog scene is a multi-turn dialog scene. And the classification judgment part U-Dec in the CPT pre-training model is used for judging the multi-round dialogue scene again, so that the accuracy of the voice text completion operation is ensured.

As an optional embodiment of the present invention, the pointer network determines the text corresponding to the received vector by: determining a degree of association of the received vector with a phonetic text vector contained in the multi-turn dialog scene; constructing a probability distribution based on each degree of association; and selecting characters corresponding to the vector with the highest probability from the probability distribution as characters corresponding to the received vector.

In the process of converting the vector into the corresponding word, the text rewriting part is used for determining the calculated association degree (the association degree can be determined by determining the similarity of the two vectors) of the received vector and the voice text vector of all voice text contents to compare and construct an attention matrix, the final probability distribution of each corresponding vector in all voice text contents and the received vector is obtained based on the attention matrix, the word corresponding to the vector with the highest probability is selected as the word corresponding to the received vector, and the previous turn of input is definedAll vectors corresponding to the phonetic text content of (2) areDefining the vector corresponding to the speech text to be rewritten as +.>Defining the vector output by the CPT pre-training model as M ^(l) (i.e., the vector received by the pointer network), the attention vector is obtained by:

wherein l represents the number of layers of S-Enc; l represents the number of layers of G-Dec.

The attention matrix is obtained by connecting the attention vectors as follows:

the training parameter λ is found by:

wherein sigma is an activation function, and the value of the obtained training parameter is limited between 0 and 1;representing a parameter matrix of the network layer, T representing a transpose of the matrix; t represents the step t of converting the received vector into text.

The method is calculated by the following formula:

wherein a' represents the probability of the degree of association between each vector corresponding to the phonetic text content of the previous round and the received vector; a represents the probability of the degree of association between each vector corresponding to the speech text to be rewritten and the received vector.

The probability distribution of each vector corresponding to all the voice text contents and the received vector is calculated by the following formula:

wherein R is _t =ω represents the probability distribution result of the current step of converting the vector into text; r is R _<t Representing the previous steps of converting vectors into characters, wherein i represents the association degree probability of the ith column in a; j represents the probability of the degree of association of the j-th column in a'.

As an optional implementation manner of the present invention, the rewriting the vector corresponding to the speech text to be rewritten with the vector corresponding to all the speech text content by using the CPT pre-training model to generate a vector set corresponding to the speech text with complete semantics includes: determining the position of missing information in the voice text to be rewritten and the type of the missing information according to a preset text expression rule; and finding out vectors corresponding to the missing information from the vectors corresponding to all the voice text contents according to the position of the missing information and the missing information type in the voice text to be rewritten, and combining the vectors corresponding to the voice text to be rewritten to generate a vector set corresponding to the voice text with complete semantics.

For example, after receiving all the converted vectors, the text rewrite portion G-Dec in the text processing model in the embodiment of the present application determines the location of the missing information and the type of the missing information in the to-be-rewritten voice text based on rules such as rules including main predicate and the like in the complete sentence, for example, the voice text content of the previous round is "beijing-to-be-rewritten is rainy", the current to-be-rewritten voice text content is "why it is always rainy", it may be determined that the to-be-rewritten voice text lacks information, after converting all the two words into vectors, the text rewrite portion determines the location of the missing information and the type of the missing information in the to-be-rewritten sentence, for example, determines that the main predicate "beijing" is lacking in the to-be-rewritten "sentence and that the two words of" beijing "should be placed behind" so that the missing "beijing" is required to be found from the vectors corresponding to "beijing-to-be-for-be-a" text, and then forms the set of "beijing-to always rainy" corresponding vector after "corresponding to" beijing "is placed behind" as an example.

The embodiment of the invention also discloses a text processing device, as shown in fig. 2, which comprises: a multi-turn judging module 201, configured to determine whether the current scene is a multi-turn dialogue scene according to the received voice text and the historical voice text in response to the receiving operation of the voice text; the missing information judging module 202 is configured to judge whether missing information exists in the voice text content included in the multi-round dialogue scene if the current scene is the multi-round dialogue scene; the text processing module 203 is configured to input, if missing information exists, the voice text content included in the multi-turn dialog scene into a pre-trained text processing model, so that the text processing model performs complement processing on the voice text content with the missing information, where the text processing model includes a pointer network model.

According to the text processing device provided by the invention, the current dialogue scene is a multi-round dialogue scene and the missing information exists in the voice text contents contained in the multi-round dialogue scene, all the voice text contents in the dialogue scene are input into the pre-trained text processing model, so that the text processing model carries out the completion processing on the voice text contents with the missing information to obtain the voice text contents with complete semantics, namely, the judgment of the multi-round scene and the completion of the missing information of the historical voice text in the corresponding multi-round dialogue scene are combined, the accuracy of the text completion result is improved, and then the interaction effect of the intelligent dialogue is improved.

As an alternative embodiment of the present invention, the text processing model includes a CPT pre-training model; a text processing module, comprising: the vector conversion sub-module is used for carrying out vector conversion on the voice text content contained in the multi-round dialogue scene by utilizing the CPT pre-training model; the text rewriting submodule is used for rewriting the vectors corresponding to the voice text to be rewritten by combining the vectors corresponding to all voice text contents by utilizing the CPT pre-training model to generate a vector set corresponding to the voice text with complete semantics; the vector conversion text sub-module is used for inputting a first vector in the vector set into the pointer network model, so that the pointer network model performs text conversion on the received vector to obtain corresponding text, and inputting the text obtained after conversion into the CPT pre-training model; the character input model submodule is used for outputting a next vector from the vector set by utilizing the CPT pre-training model according to the received converted characters and inputting the next vector into the pointer network model; and the step repeating sub-module is used for repeating the steps of converting the text of the received vector by using the pointer network model to obtain corresponding characters and inputting the characters obtained after conversion into the CPT pre-training model until the voice text to be rewritten with the missing information forms a voice text with complete semantics.

As an alternative embodiment of the present invention, the apparatus further comprises: the multi-round judging module is used for judging the dialogue scene again according to the vectors corresponding to the voice text content contained in the multi-round dialogue scene obtained after conversion; and the text completion module is used for responding to completion processing of the content of the voice text to be rewritten, which has the missing information in the multi-round dialogue scene, when the multi-round dialogue scene is judged.

As an optional embodiment of the present invention, the pointer network determines the text corresponding to the received vector by: a correlation degree determining sub-module, configured to determine a correlation degree between the received vector and a speech text vector included in the multi-turn dialog scene; the probability distribution construction submodule is used for constructing probability distribution based on each association degree; and the text conversion sub-module is used for selecting the text corresponding to the vector with the highest probability from the probability distribution as the text corresponding to the vector which is determined to be received.

As an alternative embodiment of the present invention, the text rewrite sub-module includes: the information deletion determining submodule is used for determining the position of the deleted information and the type of the deleted information in the voice text to be rewritten by combining with a preset text expression rule; and the deletion complement sub-module is used for finding out a vector corresponding to the deletion information from vectors corresponding to all the voice text contents according to the position of the deletion information in the voice text to be rewritten and the type of the deletion information, and combining the vector corresponding to the voice text to be rewritten to generate a vector set corresponding to the voice text with complete semantics.

The embodiment of the present invention further provides an electronic device, as shown in fig. 3, which may include a processor 401 and a memory 402, where the processor 401 and the memory 402 may be connected by a bus or other means, and in fig. 3, the connection is exemplified by a bus.

The processor 401 may be a central processing unit (Central Processing Unit, CPU). The processor 401 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations thereof.

The memory 402 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the text processing method in the embodiment of the invention. The processor 401 executes various functional applications of the processor and data processing, i.e., implements the text processing method in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 402.

Memory 402 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by the processor 401, or the like. In addition, memory 402 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 402 may optionally include memory located remotely from processor 401, such remote memory being connectable to processor 401 through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory 402 and when executed by the processor 401 perform the text processing method in the embodiment shown in fig. 1.

The specific details of the electronic device may be understood correspondingly with respect to the corresponding related descriptions and effects in the embodiment shown in fig. 1, which are not repeated herein.

It will be appreciated by those skilled in the art that implementing all or part of the above-described embodiment method may be implemented by a computer program to instruct related hardware, where the program may be stored in a computer readable storage medium, and the program may include the above-described embodiment method when executed. Wherein the storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a random access Memory (RandomAccessMemory, RAM), a Flash Memory (Flash Memory), a Hard Disk (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope as defined.

Claims

1. A method of text processing, the method comprising:

determining whether the current dialog scene is a multi-turn dialog scene in response to receiving operation of the voice text and according to the received voice text and the historical voice text;

if the current dialogue scene is a multi-round dialogue scene, judging whether missing information exists in voice text content contained in the multi-round dialogue scene;

if the missing information exists, inputting the voice text content contained in the multi-round dialogue scene into a pre-trained text processing model, so that the text processing model carries out complement processing on the voice text content with the missing information, wherein the text processing model comprises a pointer network model and a CPT pre-training model;

if the missing information exists, inputting the voice text content contained in the multi-turn dialogue scene into a pre-trained text processing model, so that the text processing model carries out complement processing on the voice text content with the missing information, and the method comprises the following steps:

performing vector conversion on voice text content contained in a multi-turn dialogue scene by using the CPT pre-training model;

for the voice text to be rewritten with missing information, utilizing the CPT pre-training model to rewrite the vectors corresponding to the voice text to be rewritten by combining the vectors corresponding to all voice text contents, and generating a vector set corresponding to the voice text with complete semantics;

inputting a first vector in the vector set into the pointer network model, enabling the pointer network model to perform text conversion on the received vector to obtain corresponding characters, and inputting the characters obtained after conversion into the CPT pre-training model;

the pointer network model determines the text corresponding to the received vector by the following steps: determining a degree of association of the received vector with a phonetic text vector contained in the multi-turn dialog scene; constructing a probability distribution based on each degree of association; selecting characters corresponding to the vector with the highest probability from the probability distribution as characters corresponding to the received vector;

the constructing a probability distribution based on each degree of association includes: calculating training parameters based on the received vectors and the phonetic text vectors contained in the multi-turn dialog scene; constructing probability distribution based on the training parameters and each association degree;

outputting a next vector from the vector set by using the CPT pre-training model and inputting the next vector to the pointer network model according to the received converted text;

and repeating the steps of converting the text of the received vector by using the pointer network model to obtain corresponding characters and inputting the characters obtained after conversion into the CPT pre-training model until the voice text to be rewritten with the missing information is formed into a complete voice text.

2. The text processing method according to claim 1, wherein, before the method uses the CPT pre-training model to rewrite the vector corresponding to the to-be-rewritten voice text with the vector corresponding to all voice text contents to generate the vector set corresponding to the to-be-rewritten voice text with complete semantics, the method further comprises:

performing dialogue scene judgment again according to vectors corresponding to the voice text content contained in the multi-round dialogue scene obtained after conversion;

and when the multi-turn dialogue scene is judged, performing complement processing on the content of the to-be-rewritten voice text with the missing information in the multi-turn dialogue scene.

3. The text processing method according to claim 1, wherein the rewriting the vector corresponding to the speech text to be rewritten with the vector corresponding to all the speech text contents by using the CPT pre-training model to generate a set of vectors corresponding to the speech text with complete semantics includes:

determining the position of missing information in the voice text to be rewritten and the type of the missing information according to a preset text expression rule;

and finding out vectors corresponding to the missing information from the vectors corresponding to all the voice text contents according to the position of the missing information and the missing information type in the voice text to be rewritten, and combining the vectors corresponding to the voice text to be rewritten to generate a vector set corresponding to the voice text with complete semantics.

4. A text processing apparatus, the apparatus comprising:

the multi-round judging module is used for responding to the receiving operation of the voice text and determining whether the current scene is a multi-round dialogue scene according to the received voice text and the historical voice text;

the system comprises a missing information judging module, a judging module and a judging module, wherein the missing information judging module is used for judging whether missing information exists in voice text content contained in a multi-round dialogue scene if the current scene is the multi-round dialogue scene;

the text processing module is used for inputting the voice text content contained in the multi-turn dialogue scene into a pre-trained text processing model if the missing information exists, so that the text processing model carries out complement processing on the voice text content with the missing information, the text processing model comprises a pointer network model and a CPT pre-training model, and the text processing module comprises:

the vector conversion sub-module is used for carrying out vector conversion on the voice text content contained in the multi-round dialogue scene by utilizing the CPT pre-training model;

the text rewriting submodule is used for rewriting the vectors corresponding to the voice text to be rewritten by combining the vectors corresponding to all voice text contents by utilizing the CPT pre-training model to generate a vector set corresponding to the voice text with complete semantics;

the vector conversion text sub-module is used for inputting a first vector in the vector set into the pointer network model, so that the pointer network model performs text conversion on the received vector to obtain corresponding text, and inputting the text obtained after conversion into the CPT pre-training model;

the character input model sub-module is used for outputting a next vector from the vector set by utilizing the CPT pre-training model according to the received converted characters and inputting the next vector into the pointer network model, and the pointer network model determines the characters corresponding to the received vectors by the following steps: determining a degree of association of the received vector with a phonetic text vector contained in the multi-turn dialog scene; constructing a probability distribution based on each degree of association; selecting characters corresponding to the vector with the highest probability from the probability distribution as characters corresponding to the received vector;

and the step repeating sub-module is used for repeating the steps of converting the text of the received vector by using the pointer network model to obtain corresponding characters and inputting the characters obtained after conversion into the CPT pre-training model until the voice text to be rewritten with the missing information forms a voice text with complete semantics.

5. The text processing apparatus of claim 4, wherein the apparatus further comprises:

the multi-round judging module is used for judging the dialogue scene again according to the vectors corresponding to the voice text content contained in the multi-round dialogue scene obtained after conversion;

and the text completion module is used for responding to completion processing of the content of the voice text to be rewritten, which has the missing information in the multi-round dialogue scene, when the multi-round dialogue scene is judged.

6. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the text processing method of any of claims 1-3.

7. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the text processing method according to any one of claims 1-3.