CN114528850A - Punctuation prediction model training method, punctuation adding method and device - Google Patents


Info

Publication number
CN114528850A
Authority
CN
China
Prior art keywords
punctuation
feature vector
text
feature
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210142401.5A
Other languages
Chinese (zh)
Other versions
CN114528850B (en)
Inventor
李长林
权佳成
曹磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN202210142401.5A priority Critical patent/CN114528850B/en
Publication of CN114528850A publication Critical patent/CN114528850A/en
Application granted granted Critical
Publication of CN114528850B publication Critical patent/CN114528850B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G06F 40/35 - Discourse or dialogue representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/237 - Lexical tools
    • G06F 40/242 - Dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/253 - Grammatical analysis; Style critique
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a training method for a punctuation prediction model, a punctuation adding method, and corresponding devices, which are used to add punctuation to text accurately and efficiently. The training method comprises the following steps: inputting a sample text marked with punctuation information into a feature extraction network of the punctuation prediction model, which outputs a first feature vector representing the number of punctuation marks and a second feature vector representing their positions and types; inputting the first feature vector and the second feature vector into a multitask network of the punctuation prediction model, which outputs first punctuation prediction information; and adjusting the network parameters of each network based on the marked punctuation information and the first punctuation prediction information. The multitask network comprises: a first task layer, which outputs the predicted number of punctuation marks in the sample text based on the first feature vector; a feature fusion layer, which fuses the first feature vector and the second feature vector to obtain a first fusion feature vector; and a second task layer, which outputs the predicted positions and predicted types of the punctuation marks based on the first fusion feature vector.

Description

Punctuation prediction model training method, punctuation adding method and device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method for training a punctuation prediction model, a method and a device for adding punctuation.
Background
At the current stage of voice interaction, apart from a few simple scenarios in which the speech signal can be used for recognition directly, most scenarios require the speech to be converted into text before the corresponding research and analysis can be carried out. However, the text transcribed by existing speech transcription engines does not contain punctuation, while punctuation often plays an important role in expressing human emotion; for example, the same text marked with different punctuation often expresses different emotions. Therefore, adding correct punctuation to the text plays an important role in helping a computer understand the real intention of human beings and achieving better human-computer interaction.
Current punctuation addition schemes mainly add punctuation based on acoustic features and/or text features. Schemes based on acoustic features predict punctuation according to the pauses a person makes while speaking, but in a real Automatic Speech Recognition (ASR) system unnatural pauses degrade the punctuation prediction. In schemes based on text features, text data often come from different sources, so a punctuation prediction model trained on text from scene A is difficult to apply to text from scene B. Schemes based on both acoustic and text features require the training data set to contain both the speech data and the text data transcribed by ASR, which increases the difficulty of obtaining the training data set, increases the complexity of the prediction process, and results in low prediction efficiency.
Disclosure of Invention
The embodiment of the application provides a training method of a punctuation prediction model, a punctuation adding method and a device, which are used for accurately and efficiently adding punctuation to a text and have wide applicability.
In a first aspect, an embodiment of the present application provides a method for training a punctuation prediction model, including:
inputting a sample text marked with punctuation information into a feature extraction network of a punctuation prediction model, and outputting a first feature vector and a second feature vector, wherein the punctuation information comprises the number, the position and the type of punctuation, the first feature vector is used for representing the number of punctuation in the sample text, and the second feature vector is used for representing the position and the type of punctuation in the sample text;
inputting the first feature vector and the second feature vector into a multitask network of the punctuation prediction model, and outputting first punctuation prediction information, wherein the first punctuation prediction information comprises the predicted number, predicted positions and predicted types of punctuation in the sample text; the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for performing punctuation quantity identification processing on the first feature vector to obtain the predicted number of punctuation in the sample text, the feature fusion layer is used for fusing the first feature vector and the second feature vector to obtain a first fusion feature vector, and the second task layer is used for performing punctuation type position identification processing on the first fusion feature vector to obtain the predicted positions and predicted types of punctuation in the sample text;
and adjusting network parameters of each network in the punctuation prediction model based on the marked punctuation information and the first punctuation prediction information.
It can be seen that, in the embodiment of the application, a punctuation prediction model with a multi-task learning architecture is used: the sample text marked with punctuation information is input into the feature extraction network of the punctuation prediction model for feature extraction, yielding a first feature vector representing the number of punctuation marks and a second feature vector representing their positions and types. The first feature vector and the second feature vector are then input into the multitask network of the punctuation prediction model: the first task layer predicts the number of punctuation marks in the sample text based on the first feature vector, the feature fusion layer fuses the first feature vector and the second feature vector, and the second task layer predicts the positions and types of punctuation marks in the sample text based on the resulting first fusion feature vector. Finally, the network parameters of each network in the punctuation prediction model are adjusted based on the marked punctuation information and the punctuation information predicted by the model. The trained punctuation prediction model therefore jointly learns the two tasks of punctuation quantity prediction and punctuation position and type prediction. The fusion feature vector produced by the feature fusion layer combines the feature information required by both tasks, including the association information between the two tasks and the information that is irrelevant between them. The association information allows the punctuation prediction model to fully learn the relation between the two tasks, which improves its prediction accuracy; the irrelevant information is equivalent to noise introduced into the learning of each task, which improves the generalization of each task. The punctuation prediction model therefore has wide applicability, is suitable for text data from various business scenarios and sources, and, using the trained model, punctuation can be accurately added to text.
In a second aspect, an embodiment of the present application provides a punctuation addition method, including:
inputting a text to be processed into a punctuation prediction model, and outputting second punctuation prediction information, wherein the second punctuation prediction information comprises the predicted number, the predicted position and the predicted type of punctuation in the text to be processed, the punctuation prediction model comprises a feature extraction network and a multitask network, the feature extraction network is used for performing feature extraction on the text to be processed to obtain a third feature vector and a fourth feature vector, the third feature vector is used for representing the number of punctuation in the text to be processed, and the fourth feature vector is used for representing the position and the type of punctuation in the text to be processed; the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for performing punctuation quantity identification processing on the input third feature vector to obtain the predicted quantity of punctuation in the text to be processed, the feature fusion layer is used for performing fusion processing on the third feature vector and the fourth feature vector to obtain a second fusion feature vector, and the second task layer is used for performing punctuation type position identification processing on the input second fusion feature vector to obtain the predicted position and the predicted type of the punctuation in the text to be processed;
and adding punctuation to the text to be processed based on the second punctuation prediction information.
In the embodiment of the application, punctuation information in the text to be processed can be obtained simply by inputting the text into the trained punctuation prediction model, after which punctuation addition is completed; the method is thus simple and quick to implement and highly efficient. In addition, the trained punctuation prediction model jointly learns the two tasks of punctuation quantity prediction and punctuation position and type prediction. The fusion feature vector produced by the feature fusion layer of the punctuation prediction model combines the feature information required by both tasks, including the association information between the two tasks and the information that is irrelevant between them; the association information allows the punctuation prediction model to fully learn the relation between the two tasks, which improves its prediction accuracy, while the irrelevant information is equivalent to noise introduced into the learning of each task, which improves the generalization of each task. The punctuation prediction model therefore has wide applicability and is suitable for text data from various business scenarios and sources, and the accuracy of predicting punctuation information in the text to be processed can be improved based on the trained punctuation prediction model.
In a third aspect, an embodiment of the present application provides a training apparatus for a punctuation prediction model, including:
a first input module, used for inputting a sample text marked with punctuation information into a feature extraction network of a punctuation prediction model and outputting a first feature vector and a second feature vector, wherein the punctuation information comprises the number, positions and types of punctuation, the first feature vector is used for representing the number of punctuation in the sample text, and the second feature vector is used for representing the positions and types of punctuation in the sample text;
a second input module for inputting the first feature vector and the second feature vector into a multitask network of the punctuation prediction model and outputting first punctuation prediction information, wherein the first punctuation prediction information comprises a prediction number, a prediction position and a prediction type of punctuation in the sample text, the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for carrying out punctuation quantity identification processing on the first feature vector to obtain the predicted quantity of punctuation in the sample text, the feature fusion layer is used for fusing the first feature vector and the second feature vector to obtain a first fused feature vector, the second task layer is used for carrying out punctuation type position identification processing on the first fusion feature vector to obtain a prediction position and a prediction type of punctuation in the sample text;
and the adjusting module is used for adjusting the network parameters of each network in the punctuation prediction model based on the marked punctuation information and the first punctuation prediction information.
In a fourth aspect, an embodiment of the present application provides a punctuation adding device, including:
a third input module, used for inputting a text to be processed into a punctuation prediction model and outputting second punctuation prediction information, wherein the second punctuation prediction information comprises the predicted number, predicted positions and predicted types of punctuation in the text to be processed; the punctuation prediction model comprises a feature extraction network and a multitask network, wherein the feature extraction network is used for performing feature extraction on the text to be processed to obtain a third feature vector and a fourth feature vector, the third feature vector is used for representing the number of punctuation in the text to be processed, and the fourth feature vector is used for representing the positions and types of punctuation in the text to be processed; the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for performing punctuation quantity identification processing on the input third feature vector to obtain the predicted number of punctuation in the text to be processed, the feature fusion layer is used for performing fusion processing on the third feature vector and the fourth feature vector to obtain a second fusion feature vector, and the second task layer is used for performing punctuation type position identification processing on the input second fusion feature vector to obtain the predicted positions and predicted types of punctuation in the text to be processed;
and the punctuation adding module is used for adding punctuation for the text to be processed based on the second punctuation prediction information.
In a fifth aspect, an embodiment of the present application provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of the first aspect.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium, where instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method according to the first aspect.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic flowchart of a method for training a punctuation prediction model according to an embodiment of the present application;
fig. 2 is a schematic flowchart illustrating a sample data tagging method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a punctuation prediction model according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a punctuation addition method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a punctuation addition method according to another embodiment of the present application;
FIG. 6 is a schematic structural diagram of a training apparatus for a punctuation prediction model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a punctuation adding device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and the like in the description and in the claims are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.
Description of selected concepts:
ASR (Automatic Speech Recognition): a technique that converts speech into text.
Multi-task Learning (MTL): improves generalization by using the domain knowledge contained in the supervisory signals of related tasks, with the goal of using the useful information contained in multiple learning tasks to help each task learn a more accurate learner. On the assumption that all tasks (or at least some of them) are related, jointly learning multiple tasks can lead to better performance than learning each task individually. MTL may be divided into various types according to the nature of the tasks, for example multitask supervised learning, multitask unsupervised learning, multitask semi-supervised learning, multitask active learning, multitask reinforcement learning, multitask online learning, and multitask multi-view learning.
In order to solve the problems of low prediction accuracy and low efficiency in existing punctuation addition schemes, the embodiments of the application provide a method for training a punctuation prediction model based on multi-task learning, and a punctuation addition scheme that is subsequently executed based on the trained punctuation prediction model. Using a punctuation prediction model with a multi-task learning architecture, the sample text marked with punctuation information is input into the feature extraction network of the punctuation prediction model for feature extraction, yielding a first feature vector representing the number of punctuation marks and a second feature vector representing their positions and types. The first feature vector and the second feature vector are then input into the multitask network of the punctuation prediction model: the first task layer predicts the number of punctuation marks in the sample text based on the first feature vector, the feature fusion layer fuses the first feature vector and the second feature vector, and the second task layer predicts the positions and types of punctuation marks in the sample text based on the first fusion feature vector obtained by fusion. Finally, the network parameters of each network in the punctuation prediction model are adjusted based on the marked punctuation information and the punctuation information predicted by the model. The trained punctuation prediction model therefore jointly learns the two tasks of punctuation quantity prediction and punctuation position and type prediction; the fusion feature vector produced by the feature fusion layer combines the feature information required by both tasks, including the association information between the two tasks and the information that is irrelevant between them. In the learning process, the association information allows the model to fully learn the relation between the two tasks, while the irrelevant information is equivalent to noise introduced into the learning of each task, improving the generalization of each task. The prediction accuracy of the punctuation prediction model can thus be improved, and punctuation can be accurately added to text using the trained model. In addition, punctuation information in a text to be processed can be obtained simply by inputting the text into the trained punctuation prediction model, after which punctuation addition is completed; the method is therefore simple to implement and highly efficient.
It should be understood that the punctuation prediction model training method and the punctuation adding method provided in the embodiments of the present application may be executed by an electronic device or software installed in the electronic device, and specifically may be executed by a terminal device or a server device.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a flow chart of a method for training a punctuation prediction model according to an embodiment of the present application is shown, where the method may include the following steps:
s102, inputting the sample text marked with the punctuation information into a feature extraction network of the punctuation prediction model, and outputting a first feature vector and a second feature vector.
The punctuation information comprises the number, the position and the type of the punctuation. In particular, the position of the punctuation is the position of the punctuation in the belonging text, such as after a certain character of the belonging text. Types of punctuation may include, for example, but are not limited to: comma, period, exclamation point, colon, etc.
In the embodiment of the application, the sample text marked with the punctuation information can be obtained by marking the text based on the punctuation information in the text before training the punctuation prediction model. The sample text marked with the punctuation information can be obtained through any appropriate marking processing mode, and can be specifically selected according to actual needs, which is not limited in the embodiment of the application.
In order to accurately and intuitively reflect punctuation information in a sample text, in an alternative implementation manner, as shown in fig. 2, acquiring a sample text marked with punctuation information may specifically be implemented as follows: acquiring a text containing punctuation information; then, based on punctuation information contained in the text, removing punctuation in the text to obtain a sample text; then, the number of punctuations in the text containing the punctuations is added before the first character of the sample text, and annotation information corresponding to each character in the sample text is generated based on the position and the type of the punctuations in the text containing the punctuations, wherein the annotation information is used for indicating whether the punctuations exist after the corresponding characters and the types of the existing punctuations.
For example, take the punctuation-containing text "你好，这里是马上消费金融。请问您是张先生吗？" ("Hello, this is Mashang Consumer Finance. May I ask, are you Mr. Zhang?") as an example. The text contains 3 punctuation marks, namely 1 comma "，", 1 period "。" and 1 question mark "？". Their positions and types can be recorded in the form of a dictionary, for example Punctuation_Dict = {2: "COMMA", 12: "PERIOD", 21: "QUESTION"}, meaning that a comma follows the 2nd character, a period follows the 12th character and a question mark follows the 21st character of the text, as shown in Table 1.
TABLE 1
Punctuation position 2: type COMMA
Punctuation position 12: type PERIOD
Punctuation position 21: type QUESTION
Then the dictionary Punctuation_Dict is traversed and, based on the punctuation positions and types recorded in it, the comma after the 2nd character, the period after the 12th character and the question mark after the 21st character are removed in turn, yielding the punctuation-free sample text "你好这里是马上消费金融请问您是张先生吗". Next, the sample text is converted, character by character, into a format with a single character per line. Finally, the number of punctuation marks in the punctuation-containing text, namely "3", is added before the first character of the sample text, and, based on the punctuation positions and types recorded in the form of a dictionary as described above, labeling information is generated for each character of the sample text as shown below. In the labeling information, "N" indicates that no punctuation follows the corresponding character; in a label of the form "Y_X", "Y" indicates that a punctuation mark follows the corresponding character and "X" indicates the type of that punctuation mark; characters and their labels may be separated by spaces.
3
你 N
好 Y_COMMA
这 N
里 N
是 N
马 N
上 N
消 N
费 N
金 N
融 Y_PERIOD
请 N
问 N
您 N
是 N
张 N
先 N
生 N
吗 Y_QUESTION
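The labeling procedure described above can be illustrated with a short sketch. The following Python code is only an illustrative example and is not part of the patent; the label names follow the example above, and the function and variable names are assumptions.

```python
# Hypothetical sketch of the labeling procedure described above; names are illustrative only.
PUNCT_LABELS = {"，": "Y_COMMA", "。": "Y_PERIOD", "？": "Y_QUESTION", "！": "Y_EXCLAMATION"}

def label_sample(text_with_punct: str):
    """Convert a punctuated text into (punctuation count, per-character labels)."""
    chars, labels = [], []
    for ch in text_with_punct:
        if ch in PUNCT_LABELS:
            # A punctuation mark is attached to the character that precedes it.
            labels[-1] = PUNCT_LABELS[ch]
        else:
            chars.append(ch)
            labels.append("N")  # default: no punctuation follows this character
    count = sum(1 for label in labels if label != "N")
    return count, list(zip(chars, labels))

count, rows = label_sample("你好，这里是马上消费金融。请问您是张先生吗？")
print(count)          # 3
for ch, label in rows:
    print(ch, label)  # 你 N, 好 Y_COMMA, ..., 吗 Y_QUESTION
```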
After the sample text marked with the punctuation information is obtained, the step S102 may be executed to obtain a first feature vector and a second feature vector. The first feature vector is used for representing the number of punctuations in the sample text, and the second feature vector is used for representing the positions and types of the punctuations in the sample text.
In the embodiment of the application, the feature extraction network of the punctuation prediction model can be preset according to experience, and can extract features of the input text. The feature extraction network may have any appropriate network structure, and may be specifically set according to actual needs, which is not limited in the embodiment of the present application.
In an alternative implementation, the feature extraction network may be a neural network, and the network structure of the neural network may be determined based on the number of convolution kernels, the number of network layers, the number of channels in each layer, and the connection manner between each layer and the previous layer, in other words, the network parameters may include, but are not limited to, the number of convolution kernels, the number of network layers, the number of channels in each layer, and the connection manner between each layer and the previous layer.
In order to ensure that the extracted feature vector is more accurate and comprehensive, in another alternative implementation manner, as shown in fig. 3, the feature extraction network may include a first feature extraction layer and a second feature extraction layer, where the first feature extraction layer and the second feature extraction layer have different network structures. The first feature extraction layer may be configured to perform feature extraction on the input text according to a first feature extraction manner, and in S102, perform feature extraction on the input sample text labeled with the punctuation information according to the first feature extraction manner, so as to obtain a first sub-feature vector. The second feature extraction layer may be configured to perform feature extraction on the input text according to a second feature extraction manner, and in S102, perform feature extraction on the input sample text labeled with the punctuation information according to the second feature extraction manner, so as to obtain a second sub-feature vector and a second feature vector. The first feature vector comprises a first sub-feature vector and a second sub-feature vector, and the first feature extraction mode and the second feature extraction mode are different.
For example, the first feature extraction method may be a non-deep learning-based feature extraction method, and the second feature extraction method may be a deep learning-based feature extraction method.
For another example, the first feature extraction layer may encode the number of punctuation marks labeled on the sample text in a one-hot encoding manner to obtain a first sub-feature vector representing the number of punctuation marks; the second feature extraction layer may be a language representation model, such as a Bidirectional Encoder Representations from Transformers (BERT) pre-trained model, which performs feature extraction on the input sample text to obtain a second sub-feature vector representing the number of punctuation marks and a second feature vector representing the positions and types of the punctuation marks.
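As a concrete illustration of this two-branch design, the sketch below combines a one-hot encoding of the labeled punctuation count with a BERT encoder from the Hugging Face transformers library. It is only one possible reading of the description; the model name, the maximum count, and the use of pooler_output and last_hidden_state as the two kinds of features are assumptions, not details stated in the patent.

```python
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

class FeatureExtractionNetwork(torch.nn.Module):
    """Sketch of the feature extraction network: a non-deep-learning branch
    (one-hot count encoding) and a deep-learning branch (BERT). Names and sizes
    are illustrative assumptions."""
    def __init__(self, model_name="bert-base-chinese", max_punct=32):
        super().__init__()
        self.max_punct = max_punct
        self.tokenizer = BertTokenizer.from_pretrained(model_name)
        self.bert = BertModel.from_pretrained(model_name)

    def forward(self, text: str, punct_count: int):
        # First sub-feature vector: one-hot encoding of the labeled punctuation count.
        first_sub = F.one_hot(torch.tensor([punct_count]), num_classes=self.max_punct).float()
        enc = self.tokenizer(text, return_tensors="pt")
        out = self.bert(**enc)
        # Second sub-feature vector: sentence-level summary related to punctuation count.
        second_sub = out.pooler_output            # shape (1, hidden)
        # Second feature vector: per-token states related to punctuation positions/types.
        second_feat = out.last_hidden_state       # shape (1, seq_len, hidden)
        return first_sub, second_sub, second_feat
```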
And S104, inputting the first characteristic vector and the second characteristic vector into a multi-task network of the punctuation prediction model, and outputting first punctuation prediction information.
The first punctuation prediction information comprises the prediction number, the prediction position and the prediction type of punctuation in the sample text.
In the embodiment of the application, the multi-task network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for identifying the number of punctuations of a first feature vector to obtain the predicted number of the punctuations in a sample text; the feature fusion layer is used for performing fusion processing on the first feature vector and the second feature vector to obtain a first fusion feature vector; and the second task layer is used for carrying out punctuation type position identification processing on the first fusion characteristic vector to obtain a prediction position and a prediction type of punctuation in the sample text.
In an alternative implementation, the connection relationship between the feature extraction network and the multitask network, and the connection relationships between the network layers within the multitask network, can be seen in fig. 3. Fig. 3 is a schematic structural diagram of a punctuation prediction model according to an embodiment of the present application; for convenience of description, the processing of a sample text marked with punctuation information is described below with reference to fig. 3.
Specifically, after a sample text marked with punctuation information is input into a feature extraction network, feature extraction is carried out through the feature extraction network, and a first feature vector representing the quantity of punctuation and a second feature vector representing the position and the type of a punctuation are obtained, wherein the first feature vector comprises a first sub-feature vector and a second sub-feature vector.
On one hand, the first feature vector is input into a first task layer, and after punctuation quantity identification processing is carried out on the first task layer, the predicted quantity of punctuation in a sample text is output; on the other hand, the first feature vector and the second feature vector are input into the feature fusion layer together, and the first fusion feature vector is output after fusion processing is performed by the feature fusion layer, so that the first fusion feature vector not only contains the correlation information between the punctuation quantity, the punctuation position and the type of the sample text, but also contains the information which is irrelevant to each other.
Furthermore, the first fusion feature vector is input into the second task layer, and punctuation type position identification processing is performed by the second task layer to obtain the predicted positions and predicted types of punctuation in the sample text. The second task layer can fully learn the relevance between the two tasks from the association information, contained in the first fusion feature vector, between the punctuation quantity and the punctuation positions and types of the sample text; meanwhile, the mutually irrelevant information contained in the first fusion feature vector is equivalent to noise introduced into the learning of the punctuation position and type prediction task, which improves the generalization of the second task layer. Both effects improve the accuracy of the second task layer in predicting punctuation positions and types.
More specifically, when the feature fusion layer performs fusion processing on the first feature vector and the second feature vector, the first feature vector and the second feature vector may be spliced first, so that the punctuation quantity feature, the punctuation position and the type feature of the sample text are included in the feature vectors obtained by splicing; then, the feature fusion layer multiplies the spliced feature vector by the second feature vector, so that the punctuation quantity features and the punctuation position and type features of the sample text are more closely fused together, and a first fusion feature vector is obtained.
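A minimal sketch of such a fusion layer is shown below. The linear projection used to align dimensions before the element-wise multiplication, and the optional Sigmoid gate corresponding to the attention-enhanced variant described later, are assumptions needed to make the shapes work rather than details stated in the patent.

```python
import torch

class FeatureFusionLayer(torch.nn.Module):
    """Sketch of the feature fusion layer: splice (concatenate) the two inputs,
    project back to the dimension of the second feature vector, then multiply
    element-wise with the second feature vector."""
    def __init__(self, first_dim: int, second_dim: int, use_sigmoid_gate: bool = False):
        super().__init__()
        self.proj = torch.nn.Linear(first_dim + second_dim, second_dim)
        self.use_sigmoid_gate = use_sigmoid_gate  # Sigmoid variant described later in the text

    def forward(self, first_feat: torch.Tensor, second_feat: torch.Tensor) -> torch.Tensor:
        # first_feat: (batch, first_dim) sentence-level; second_feat: (batch, seq_len, second_dim)
        expanded = first_feat.unsqueeze(1).expand(-1, second_feat.size(1), -1)
        spliced = torch.cat([expanded, second_feat], dim=-1)  # features of both tasks
        gate = self.proj(spliced)
        if self.use_sigmoid_gate:
            gate = torch.sigmoid(gate)
        return gate * second_feat                             # first fusion feature vector
```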
Further, in order to enable the first task layer to learn the punctuation quantity feature more specifically, as shown in fig. 3, in the punctuation prediction model according to the embodiment of the present application, the multitask network may further include a first attention mechanism layer, where an output end of the first attention mechanism layer is connected to an input end of the first task layer and an input end of the feature fusion layer, respectively. The first attention mechanism layer is used for performing feature enhancement processing on the first sub-feature vector and the second sub-feature vector to obtain a first attention feature vector. Correspondingly, the first task layer is used for carrying out punctuation quantity identification processing on the first attention feature vector to obtain the predicted quantity of the punctuation in the sample text; the feature fusion layer is used for carrying out fusion processing on the first attention feature vector and the second feature vector to obtain a first fusion feature vector.
Specifically, the sample text marked with punctuation information is again taken as an example. The sample text is input into the first feature extraction layer and the second feature extraction layer of the feature extraction network, respectively, to obtain the first sub-feature vector, the second sub-feature vector and the second feature vector. After the first sub-feature vector and the second sub-feature vector are input into the first attention mechanism layer, that layer first splices the two sub-feature vectors, then encodes the spliced feature vector based on an attention mechanism so that the punctuation quantity feature information, which is more important for the punctuation quantity prediction task, is enhanced, and then outputs the first attention feature vector.
On the one hand, the first attention feature vector is input into the first task layer, and the predicted number of punctuation marks in the sample text is obtained after the first task layer performs punctuation quantity identification processing. Since the punctuation quantity features represented by the first attention feature vector are enhanced compared with the original first feature vector, the first task layer can learn the punctuation quantity features of the sample text in a more targeted way, which improves the accuracy of the first task layer in predicting the number of punctuation marks in the text.
On the other hand, the first attention feature vector and the second feature vector are input together into the feature fusion layer. The feature fusion layer first splices the first attention feature vector and the second feature vector, so that the spliced feature vector contains the punctuation quantity features as well as the punctuation position and type features of the sample text; it then applies a Sigmoid function to the spliced feature vector and multiplies the result by the second feature vector, so that the punctuation quantity features and the punctuation position and type features of the sample text are fused together more closely, yielding the first fusion feature vector. The association information and the mutually irrelevant information between the punctuation quantity and the punctuation positions and types contained in this first fusion feature vector are thereby more comprehensive and accurate, which further improves the accuracy of the second task layer in predicting punctuation positions and types.
Further, in order to enable the second task layer to learn the punctuation position and type features in a more targeted way, as shown in fig. 3, in the punctuation prediction model of the embodiment of the present application, the multitask network may further include a second attention mechanism layer, where an input end of the second attention mechanism layer is connected to an output end of the feature fusion layer and an output end of the feature extraction network, respectively, and an output end of the second attention mechanism layer is connected to an input end of the second task layer. The second attention mechanism layer is used for performing feature enhancement processing on the input first fusion feature vector and second feature vector to obtain a second attention feature vector. Correspondingly, the second task layer is used for performing punctuation type position identification processing on the input second attention feature vector to obtain the predicted positions and predicted types of punctuation in the sample text.
Specifically, the second attention mechanism layer first splices the first fusion feature vector and the second feature vector, then encodes the spliced feature vector based on the attention mechanism so that the punctuation position and type feature information, which is more important for the punctuation position and type prediction task, is enhanced, and then outputs the second attention feature vector. The second attention feature vector is then input into the second task layer, and punctuation type position identification processing is performed by the second task layer to obtain the predicted positions and predicted types of punctuation in the sample text. Since the punctuation position and type features contained in the second attention feature vector are enhanced compared with the original first fusion feature vector, the second task layer can learn the punctuation position and type features of the sample text in a more targeted way, which further improves the accuracy of the second task layer in predicting punctuation positions and types.
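Both attention mechanism layers can be sketched in the same way: splice two inputs and re-encode them with self-attention. The use of torch.nn.MultiheadAttention and the number of heads below are assumptions; the patent only specifies splicing followed by attention-based encoding.

```python
import torch

class AttentionEnhanceLayer(torch.nn.Module):
    """Sketch of an attention mechanism layer: splice the two inputs along the
    sequence dimension, then re-encode the spliced sequence with self-attention
    so that the features relevant to the corresponding task are enhanced."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a: (batch, seq_len_a, dim), b: (batch, seq_len_b, dim)
        spliced = torch.cat([a, b], dim=1)
        enhanced, _ = self.attn(spliced, spliced, spliced)  # self-attention as feature enhancement
        return enhanced
```

In this reading, the first attention mechanism layer would be applied to the first and second sub-feature vectors, and the second attention mechanism layer to the first fusion feature vector and the second feature vector.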
And S106, adjusting network parameters of each network in the punctuation prediction model based on the marked punctuation information and the first punctuation prediction information.
For each network in the punctuation prediction model, the network parameters may include, but are not limited to, the number of neurons in each network layer, connection relationships and connection edge weights between neurons in different network layers, offsets corresponding to the neurons in each network layer, and the like.
In the embodiment of the application, the marked punctuation information represents the actual punctuation information of the sample text, and the first punctuation prediction information is obtained by learning and predicting the sample text by the punctuation prediction model, so that the prediction accuracy of the punctuation prediction model can be reflected by the difference between the marked punctuation information and the first punctuation prediction information, and the network parameters of each network in the punctuation prediction model can be adjusted on the basis of the difference, so that the prediction accuracy of the punctuation prediction model is improved.
Considering that the output results of the punctuation prediction model for each task affect the prediction accuracy of the punctuation prediction model, in order to improve the prediction accuracy on this basis, in an optional implementation manner the network parameters of each network in the punctuation prediction model can be adjusted based on the difference between the prediction result output by the punctuation prediction model for each task and the corresponding actual result. Specifically, S106 may be implemented as:
step A1, determining a first loss value corresponding to the sample text based on the marked punctuations quantity and the predicted quantity of the punctuations in the sample text.
Wherein the first loss value is used to characterize a resulting loss value for the number of predicted punctuation, which reflects a difference between the number of annotated punctuation and the number of predicted punctuation.
Step A2, determining a second loss value corresponding to the sample text based on the marked punctuation positions and punctuation types and the predicted positions and predicted types of punctuation in the sample text.
The second loss value is used for representing a loss value generated by the position and the type of the predicted punctuation, and reflects the difference between the marked punctuation position and the type and the predicted punctuation position and the type.
Step A3, determining a predicted loss value corresponding to the sample text based on the first loss value and the second loss value.
The prediction loss value is used for representing a loss value generated by predicting punctuation information of the sample text, and reflects the difference between the marked punctuation information and the first punctuation prediction information; and finally, based on the prediction loss value corresponding to the sample text, adjusting the network parameters of each network in the punctuation prediction model by adopting a back propagation algorithm.
In a specific application, in the process of adjusting the network parameters of each network in the punctuation prediction model with the back propagation algorithm, the loss value corresponding to each network in the punctuation prediction model can be determined based on the prediction loss value corresponding to the sample text, and the network parameters of each network are then adjusted layer by layer with the goal of reducing the prediction loss value corresponding to the sample text.
Further, since the two tasks of predicting the number of punctuations and predicting the positions and types of punctuations have a certain correlation, in order to enable the prediction loss value of the sample text to reflect the difference between the marked punctuation information and the first punctuation prediction information more accurately and objectively, step A3 may be specifically implemented as follows: determining a third loss value corresponding to the sample text based on the first loss value and the second loss value, wherein the third loss value is used for representing the loss generated by jointly predicting the number, positions and types of punctuations; and then performing a weighted summation of the first loss value, the second loss value and the third loss value to obtain the prediction loss value corresponding to the sample text.
For example, loss3 = n·loss1·(1-n)·loss2, where loss3 represents the third loss value, loss1 represents the first loss value, loss2 represents the second loss value, and n represents a preset coefficient, which can be set according to actual needs.
In practical applications, both the first loss value and the second loss value may be determined by using any suitable loss function, which is not limited in the embodiment of the present application. Secondly, the weights corresponding to the first loss value, the second loss value and the third loss value may be set according to actual needs, and may be adjusted in the training process of the punctuation prediction model, which is not limited in the embodiment of the present application.
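Taking the example formula above at face value, the loss combination could be sketched as follows; treating loss1 and loss2 as externally computed task losses and the default values of n and the weights as tunable choices are assumptions.

```python
import torch

def combined_loss(loss1: torch.Tensor, loss2: torch.Tensor,
                  n: float = 0.5, w=(1.0, 1.0, 1.0)) -> torch.Tensor:
    """Sketch: the third loss couples the two task losses as in the example formula
    above, and the three losses are then combined by a weighted sum. The values of
    n and w are assumed; the patent leaves them to be set as needed."""
    loss3 = n * loss1 * (1.0 - n) * loss2   # coupling term, taken verbatim from the example formula
    w1, w2, w3 = w
    return w1 * loss1 + w2 * loss2 + w3 * loss3
```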
It should be noted that the above process describes only a single round of training; in practical applications, multiple rounds may be required, so steps S102 to S106 may be executed repeatedly until a training stop condition is satisfied. The training stop condition may be, for example, that the number of iterations reaches a preset threshold, or that the prediction loss value corresponding to the sample text falls within a preset range; this is not limited in the embodiments of the present application. In addition, when adjusting the network parameters of each network in the punctuation prediction model, a gradient descent algorithm may be used to adjust the parameters in the direction of the negative gradient of each network's parameters.
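A sketch of repeating steps S102 to S106 with a gradient descent optimizer is given below. The optimizer choice, learning rate, stop thresholds and the helper functions count_loss and pos_type_loss are all illustrative assumptions; combined_loss refers to the previous sketch.

```python
import torch

def train(model, dataset, epochs: int = 10, lr: float = 2e-5, loss_threshold: float = 0.05):
    """Sketch of repeating S102-S106 until a stop condition is met (names are assumed)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # gradient descent along the negative gradient
    for epoch in range(epochs):                              # stop condition 1: iteration budget
        total = 0.0
        for sample_text, punct_count, labels in dataset:
            pred_count_logits, pred_pos_type_logits = model(sample_text, punct_count)
            loss1 = count_loss(pred_count_logits, punct_count)       # hypothetical helper: first loss value
            loss2 = pos_type_loss(pred_pos_type_logits, labels)      # hypothetical helper: second loss value
            loss = combined_loss(loss1, loss2)                       # prediction loss value (previous sketch)
            optimizer.zero_grad()
            loss.backward()                                          # back propagation
            optimizer.step()
            total += loss.item()
        if total / max(len(dataset), 1) < loss_threshold:            # stop condition 2: loss within a preset range
            break
```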
According to the training method for a punctuation prediction model provided by the embodiments of the application, the sample text marked with punctuation information is input into the feature extraction network of the punctuation prediction model for feature extraction, yielding a first feature vector representing the number of punctuation marks and a second feature vector representing their positions and types. The first feature vector and the second feature vector are then input into the multitask network of the punctuation prediction model: the first task layer predicts the number of punctuation marks in the sample text based on the first feature vector, the feature fusion layer fuses the first feature vector and the second feature vector, and the second task layer predicts the positions and types of punctuation marks based on the first fusion feature vector obtained by fusion. Finally, the network parameters of each network in the punctuation prediction model are adjusted based on the marked punctuation information and the punctuation information predicted by the model. The trained punctuation prediction model therefore jointly learns the two tasks of punctuation quantity prediction and punctuation position and type prediction. The fusion feature vector produced by the feature fusion layer combines the feature information required by both tasks, including the association information between the two tasks and the information that is irrelevant between them; the association information allows the punctuation prediction model to fully learn the relation between the two tasks, which improves its prediction accuracy, while the irrelevant information is equivalent to noise introduced into the learning of each task, which improves the generalization of each task. The punctuation prediction model therefore has wide applicability and is suitable for text data from various business scenarios and sources, and punctuation can be accurately added to text using the trained model. In addition, punctuation information in a text to be processed can be obtained simply by inputting the text into the trained punctuation prediction model, after which punctuation addition is completed; the method is thus simple and quick to implement and highly efficient.
The embodiment of the application also provides a punctuation adding method, which can automatically add punctuation to a text based on a punctuation prediction model trained by the method shown in fig. 1.
Referring to fig. 4, a schematic flow chart of a punctuation adding method according to an embodiment of the present application is provided, where the method includes the following steps:
s402, inputting the text to be processed into the punctuation prediction model and outputting second punctuation prediction information.
The text to be processed is a text without punctuation, and can be obtained by performing voice conversion processing on a voice signal based on an ASR technology. The second punctuation prediction information comprises the prediction number, the prediction position and the prediction type of punctuation in the text to be processed.
The punctuation prediction model in this embodiment may be obtained by training based on the training method shown in fig. 1, for example, the punctuation prediction model shown in fig. 3. The punctuation prediction model comprises a feature extraction network and a multi-task network, wherein the feature extraction network is used for extracting features of the text to be processed to obtain a third feature vector and a fourth feature vector, the third feature vector is used for representing the number of punctuations in the text to be processed, and the fourth feature vector is used for representing the positions and types of the punctuations in the text to be processed; the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for carrying out punctuation quantity identification processing on a third feature vector to obtain the predicted quantity of punctuation in a text to be processed, the feature fusion layer is used for carrying out fusion processing on the third feature vector and a fourth feature vector to obtain a second fusion feature vector, and the second task layer is used for carrying out punctuation type position identification processing on the second fusion feature vector to obtain the predicted position and the predicted type of the punctuation in the text to be processed.
And S404, adding punctuation for the text to be processed based on the second punctuation prediction information.
After the predicted number, the predicted type and the predicted position of the punctuations in the text to be processed are obtained, the punctuations can be added to the text to be processed based on the information.
Considering that the prediction information output by the punctuation prediction model may be incorrect, which may result in punctuation not being added to the text to be processed or incorrect punctuation being added, thereby affecting subsequent processing tasks, in an optional implementation step S404 may specifically be implemented as: determining the number of punctuations in the text to be processed based on the predicted positions and predicted types in the second punctuation prediction information; detecting whether the determined number of punctuations is consistent with the predicted number in the second punctuation prediction information; and, if they are consistent, adding punctuation to the text to be processed based on the predicted positions and predicted types in the second punctuation prediction information.
For example, suppose the text to be processed is a closing remark meaning roughly "thank you for listening thanks everyone goodbye" (carrying no punctuation). The text to be processed is input into the punctuation prediction model, and the output second punctuation prediction information includes: the predicted number N of punctuations in the text to be processed is 3, and the predicted positions and predicted types of the punctuations, recorded in the form of a list, are P_pos_sty_lis = [["Y_PERIOD", 6], ["Y_COMMA", 8], ["Y_EXCLAMATION", 10]]; that is, the 6th character of the text to be processed is followed by a period ".", the 8th character is followed by a comma ",", and the 10th character is followed by an exclamation mark "!". Based on the predicted positions and predicted types of the punctuations in the text to be processed, the number of punctuations can be determined to be 3, which is consistent with the predicted number in the second punctuation prediction information, so punctuation can be added to the text to be processed, yielding a punctuated text along the lines of "Thank you for listening. Thanks everyone, goodbye!".
In order to further ensure the accuracy of adding punctuation to the text to be processed, in the above S404, if the determined number of punctuations in the text to be processed is inconsistent with the predicted number in the second punctuation prediction information, it may be determined that the second punctuation prediction information is incorrect; the second punctuation prediction information is then sent to an auditing platform, which audits and corrects it, and punctuation is added to the text to be processed based on the corrected second punctuation prediction information returned by the auditing platform.
For example, still taking the same text to be processed as an example, the text to be processed is input into the punctuation prediction model, and the output second punctuation prediction information includes: the predicted number N of punctuations in the text to be processed is 3, while the predicted positions and predicted types of the punctuations, recorded in the form of a list, are P_pos_sty_lis = [["Y_PERIOD", 6], ["Y_COMMA", 8]]; that is, the 6th character of the text to be processed is followed by a period "." and the 8th character is followed by a comma ",". Based on the predicted positions and predicted types of the punctuations in the text to be processed, the number of punctuations can be determined to be 2, which is inconsistent with the predicted number in the second punctuation prediction information, so the second punctuation prediction information can be sent to the auditing platform. The corrected second punctuation prediction information returned by the auditing platform after auditing includes: N is 3 and P_pos_sty_lis = [["Y_PERIOD", 6], ["Y_COMMA", 8], ["Y_EXCLAMATION", 10]]; punctuation can then be added to the text to be processed based on the predicted positions and predicted types in the corrected second punctuation prediction information, yielding the same punctuated text as in the previous example.
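The consistency check and the punctuation insertion illustrated in the two examples above can be expressed in a few lines of code. The sketch below assumes the list format P_pos_sty_lis shown in the examples, a simple mapping from type labels to punctuation marks, and a hypothetical audit_platform callback; none of these names are prescribed by this application.

# Illustrative sketch of step S404 (assumed helper names and label-to-mark mapping).
PUNCT_MARKS = {"Y_PERIOD": ".", "Y_COMMA": ",", "Y_EXCLAMATION": "!"}

def add_punctuation(text, predicted_count, pos_type_list, audit_platform=None):
    # Number of punctuations implied by the predicted positions and types.
    determined_count = len(pos_type_list)

    if determined_count != predicted_count and audit_platform is not None:
        # Inconsistent prediction: ask the auditing platform for a corrected result.
        predicted_count, pos_type_list = audit_platform(text, predicted_count, pos_type_list)

    # Insert marks from the rightmost position first so earlier indices stay valid;
    # a position of n means "after the n-th character".
    chars = list(text)
    for punct_type, position in sorted(pos_type_list, key=lambda x: x[1], reverse=True):
        chars.insert(position, PUNCT_MARKS.get(punct_type, ""))
    return "".join(chars)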
According to the punctuation adding method provided by the embodiment of the application, punctuation information in the text to be processed can be obtained simply by inputting the text to be processed into the trained punctuation prediction model, after which punctuation addition is completed; the method is therefore simple and convenient to implement and highly efficient. In addition, the trained punctuation prediction model has the capability of jointly learning the two tasks of punctuation quantity prediction and punctuation position and type prediction. The fusion feature vector obtained after the fusion processing of the feature fusion layer in the punctuation prediction model fuses the feature information required by the two tasks, which includes both the association information between the two tasks and the information irrelevant to each other between the two tasks. The association information enables the punctuation prediction model to fully learn the association between the two tasks, so that the prediction accuracy of the punctuation prediction model can be improved; the irrelevant information is equivalent to noise introduced into the learning process of each task, so that the generalization effect of the learning of each task can be improved, giving the punctuation prediction model wide applicability to text data of various service scenarios and various sources. Therefore, the accuracy of predicting the punctuation information in the text to be processed can be improved based on the punctuation prediction model obtained by training.
The punctuation prediction method provided by the embodiment of the application is applicable to any scenario with a punctuation addition requirement, such as, but not limited to, text emotion classification, user intention identification, and automatic response based on a question-answer knowledge base. The punctuation prediction method provided by the embodiment of the present application is described in detail below by taking the application scenario of automatic response based on a question-answer knowledge base as an example.
When a user needs to consult a question in a certain field, the question can be put to a question-answering system by voice interaction. The question-answering system converts the received voice signal into text using ASR technology; because the text obtained by the ASR conversion does not carry punctuation, the question consulted by the user cannot be accurately understood from it. The question-answering system can therefore input the converted text into the punctuation prediction model for punctuation prediction to obtain punctuation prediction information of the text, where the obtained punctuation prediction information includes the predicted number, predicted positions and predicted types of punctuations in the text. Next, the question-answering system determines the number of punctuations in the text based on the predicted positions and predicted types of the punctuations in the text; if the determined number of punctuations in the text is consistent with the predicted number, punctuation is added to the text based on the predicted positions and predicted types of the punctuations in the text; if the determined number of punctuations in the text is inconsistent with the predicted number, the punctuation prediction information of the text is sent to an auditing platform for auditing and correction, and punctuation is added to the text based on the corrected punctuation prediction information returned by the auditing platform. The text with punctuation added helps to accurately express the user's real intention. The question-answering system can perform intention analysis and identification based on the punctuated text so as to understand the question consulted by the user, and then recall an answer sentence matching the question from the question-answer knowledge base and return it to the user, thereby answering the question consulted by the user.
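A compact sketch of this question-answering flow is given below. Every helper in it (the ASR stub, the intent stub and the retrieval stub) is a placeholder standing in for a real component, and the punctuation insertion reuses the add_punctuation sketch shown earlier; the whole flow is an illustrative assumption rather than the prescribed implementation.

# Hypothetical end-to-end sketch of the described question-answering scenario.
def asr_transcribe(voice_signal):
    # Stub: a real system would run an ASR engine; the result carries no punctuation.
    return "thank you for listening thanks everyone goodbye"

def recognize_intent(punctuated_text):
    # Stub: a real system would run an intention recognition model.
    return "closing_remark"

def retrieve_answer(knowledge_base, intent):
    # Stub: recall an answer sentence matching the recognized intent.
    return knowledge_base.get(intent, "Sorry, no matching answer was found.")

def answer_user_question(voice_signal, punctuation_model, knowledge_base, audit_platform=None):
    text = asr_transcribe(voice_signal)                      # ASR text without punctuation
    count, pos_type_list = punctuation_model.predict(text)   # second punctuation prediction information
    punctuated = add_punctuation(text, count, pos_type_list, audit_platform)
    intent = recognize_intent(punctuated)                    # intention analysis on the punctuated text
    return retrieve_answer(knowledge_base, intent)           # answer recalled from the knowledge base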
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In addition, corresponding to the method for training the punctuation prediction model shown in fig. 1, the embodiment of the present application further provides a device for training the punctuation prediction model. Fig. 6 is a schematic structural diagram of an apparatus 600 for training a punctuation prediction model according to an embodiment of the present application, the apparatus including:
a first input module 610, configured to input a sample text labeled with punctuation information into a feature extraction network of a punctuation prediction model, and output a first feature vector and a second feature vector, where the punctuation information includes the number, position, and type of punctuation, the first feature vector is used to represent the number of punctuation in the sample text, and the second feature vector is used to represent the position and type of punctuation in the sample text;
a second input module 620, configured to input the first feature vector and the second feature vector into a multitasking network of the punctuation prediction model, output first punctuation prediction information, wherein the first punctuation prediction information comprises a prediction number, a prediction position and a prediction type of punctuation in the sample text, the multi-task network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for carrying out punctuation quantity identification processing on the first feature vector to obtain the predicted quantity of punctuation in the sample text, the feature fusion layer is used for fusing the first feature vector and the second feature vector to obtain a first fused feature vector, the second task layer is used for carrying out punctuation type position identification processing on the first fusion feature vector to obtain a prediction position and a prediction type of punctuation in the sample text;
an adjusting module 630, configured to adjust network parameters of each network in the punctuation prediction model based on the marked punctuation information and the first punctuation prediction information.
According to the training device of the punctuation prediction model, provided by the embodiment of the application, a sample text marked with punctuation information is input into a feature extraction network of the punctuation prediction model for feature extraction, so that a first feature vector representing the quantity of punctuation points and a second feature vector representing the positions and types of the punctuation points are obtained; then, inputting the first feature vector and the second feature vector into a multi-task network of a punctuation prediction model, predicting the number of punctuations in a sample text based on the first feature vector by a first task layer, fusing the first feature vector and the second feature vector by a feature fusion layer, inputting the fused first feature vector and the second feature vector into a second task layer, and predicting the position and the type of the punctuations in the sample text based on the first fused feature vector obtained by fusion by the second task layer; finally, network parameters of each network in the punctuation prediction model are adjusted based on the marked punctuation information and the punctuation information predicted by the punctuation prediction model. Therefore, the trained punctuation prediction model has the capacity of jointly learning the punctuation quantity prediction and the punctuation position and type prediction of the two tasks, the feature information required by the two tasks is fused in the fusion feature vector after the feature fusion layer fusion processing, the feature information comprises the association information between the two tasks and the information irrelevant to each other between the two tasks, and the association information enables the punctuation prediction model to fully learn the association between the two tasks, so that the prediction accuracy of the punctuation prediction model can be improved; the irrelevant information is equivalent to noise introduced in the learning process of each task, so that the generalization effect of learning of each task can be improved, the punctuation prediction model has wide applicability and can be suitable for text data of various service scenes and various sources, and furthermore, punctuation can be accurately added to the text by utilizing the punctuation prediction model obtained by training; in addition, punctuation information in the text to be processed can be obtained by inputting the text to be processed into the punctuation prediction model obtained by training, and then punctuation addition is completed, so that the method is simple, convenient and quick to realize and has high efficiency.
Optionally, the first feature vector includes a first sub-feature vector and a second sub-feature vector, and the first sub-feature vector and the second sub-feature vector are obtained by performing feature extraction on the sample text by the feature extraction network according to different feature extraction manners;
the multitask network further comprises a first attention mechanism layer, wherein the first attention mechanism layer is used for performing feature enhancement processing on the input first sub-feature vector and the input second sub-feature vector to obtain a first attention feature vector;
the first task layer is used for carrying out punctuation quantity identification processing on the first attention feature vector to obtain the predicted quantity of punctuation in the sample text;
the feature fusion layer is used for performing fusion processing on the first attention feature vector and the second feature vector to obtain a first fusion feature vector.
Optionally, the feature extraction network includes a first feature extraction layer and a second feature extraction layer, and the first feature extraction layer and the second feature extraction layer have different network structures;
the first feature extraction layer is used for performing feature extraction on the sample text according to a first feature extraction mode to obtain the first sub-feature vector;
and the second feature extraction network is used for extracting features of the sample text according to a second feature extraction mode to obtain the second sub-feature vector and the second feature vector.
Optionally, the multitask network further includes a second attention mechanism layer, where the second attention mechanism layer is configured to perform feature enhancement processing on the input first fusion feature vector and the input second feature vector to obtain a second attention feature vector;
and the second task layer is used for carrying out punctuation type position identification processing on the second attention feature vector to obtain the predicted position and the predicted type of the punctuation in the sample text.
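One possible way to realize such an attention mechanism layer is sketched below using multi-head cross-attention, in which one feature sequence is enhanced by attending over another; the use of nn.MultiheadAttention and the chosen dimensions are assumptions for illustration only, not the implementation prescribed by this application.

# Illustrative sketch of an attention mechanism layer that enhances one feature with another.
import torch.nn as nn

class AttentionEnhancement(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_feat, context_feat):
        # query_feat attends over context_feat; the output is the enhanced (attention) feature.
        enhanced, _ = self.attn(query_feat, context_feat, context_feat)
        return enhanced

# For example, a first attention mechanism layer could enhance the first sub-feature vector
# with the second sub-feature vector:
#   first_attention_feat = AttentionEnhancement(dim=512)(first_sub_feat, second_sub_feat)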
Optionally, the adjusting module includes:
the first loss value determining submodule is used for determining a first loss value corresponding to the sample text based on the number of marked punctuations and the predicted number of punctuations in the sample text, and the first loss value is used for representing a loss value generated by the number of the predicted punctuations;
a second loss value determining submodule, configured to determine a second loss value corresponding to the sample text based on the labeled punctuation position and punctuation type and the predicted position and predicted type of the punctuation in the sample text, where the second loss value is used to represent a loss value generated by the position and type of the predicted punctuation;
the prediction loss value determining submodule is used for determining a prediction loss value corresponding to the sample text based on the first loss value and the second loss value;
and the adjusting submodule is used for adjusting the network parameters of each network in the punctuation prediction model by adopting a back propagation algorithm based on the prediction loss value.
Optionally, the predicted loss value determination sub-module is configured to:
determining a third loss value corresponding to the sample text based on the first loss value and the second loss value, wherein the third loss value is used for representing the loss values generated by the number, the positions and the types of the predicted punctuations;
and performing weighted summation on the first loss value, the second loss value and the third loss value to obtain a prediction loss value corresponding to the sample text.
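To make the loss combination described above concrete, a minimal sketch is given below; the specific loss functions (cross-entropy for both tasks), the way the third loss value is derived, and the weights are assumptions for illustration only.

# Illustrative sketch of combining the first, second and third loss values (assumed weights).
import torch.nn.functional as F

def prediction_loss(count_logits, count_target, tag_logits, tag_targets,
                    w1=1.0, w2=1.0, w3=0.5):
    # First loss value: loss produced by predicting the number of punctuations.
    loss1 = F.cross_entropy(count_logits, count_target)
    # Second loss value: loss produced by predicting punctuation positions and types,
    # treated as a per-character classification over punctuation type labels.
    loss2 = F.cross_entropy(tag_logits.transpose(1, 2), tag_targets)
    # Third loss value: derived from the first and second loss values (here, their sum).
    loss3 = loss1 + loss2
    # Prediction loss value: weighted summation of the three loss values.
    return w1 * loss1 + w2 * loss2 + w3 * loss3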
Optionally, the training device of the punctuation prediction model further comprises:
the first acquisition module is used for acquiring a text containing punctuation before the first input module inputs the sample text marked with the punctuation information into a feature extraction network of the punctuation prediction model;
the punctuation removal module is used for removing punctuations in the text containing the punctuations based on the positions and types of the punctuations in the text containing the punctuations to obtain a sample text;
the adding module is used for adding the number of punctuations in the text containing the punctuations before the initial character of the sample text;
and the generating module is used for generating marking information corresponding to each character in the sample text based on the position and the type of the punctuation in the text containing the punctuation, wherein the marking information is used for indicating whether the punctuation exists after the corresponding character and the type of the existing punctuation.
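The sample construction just described (strip the punctuation from a text containing punctuation, record the punctuation count ahead of the first character, and tag every character with whether and which punctuation follows it) could be sketched as follows; the label names and the set of recognized marks are assumptions for illustration.

# Illustrative sketch of building a training sample from a text containing punctuation.
PUNCT_TO_LABEL = {".": "Y_PERIOD", ",": "Y_COMMA", "!": "Y_EXCLAMATION", "?": "Y_QUESTION"}

def build_sample(punctuated_text):
    chars, labels = [], []
    for ch in punctuated_text:
        if ch in PUNCT_TO_LABEL:
            if labels:
                # Mark the preceding character: this punctuation type follows it.
                labels[-1] = PUNCT_TO_LABEL[ch]
        else:
            chars.append(ch)
            labels.append("O")   # "O": no punctuation follows this character
    sample_text = "".join(chars)
    # Punctuation count to be attached before the initial character of the sample text.
    punct_count = sum(1 for label in labels if label != "O")
    return punct_count, sample_text, labels

# Example: build_sample("Thanks for listening. Goodbye!")
# -> (2, "Thanks for listening Goodbye", per-character labels containing "Y_PERIOD" and "Y_EXCLAMATION")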
Obviously, the training device of the punctuation prediction model according to the embodiment of the present application may be an execution subject of the training method of the punctuation prediction model shown in fig. 1, and thus the functions of the training method of the punctuation prediction model in fig. 1 can be realized. Since the principle is the same, the description will not be repeated here.
In addition, corresponding to the punctuation adding method shown in fig. 4, an embodiment of the present application further provides a punctuation adding device. Fig. 7 is a schematic structural diagram of a punctuation adding device 700 according to an embodiment of the present application, where the device includes:
a third input module 710, configured to input a text to be processed into a punctuation prediction model, and output second punctuation prediction information, where the second punctuation prediction information includes a predicted number, a predicted position, and a predicted type of punctuation in the text to be processed, the punctuation prediction model includes a feature extraction network and a multitasking network, the feature extraction network is configured to perform feature extraction on the text to be processed, so as to obtain a third feature vector and a fourth feature vector, the third feature vector is used to represent the number of punctuation in the text to be processed, and the fourth feature vector is used to represent the position and the type of punctuation in the text to be processed; the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for performing punctuation quantity identification processing on the input third feature vector to obtain the predicted quantity of punctuation in the text to be processed, the feature fusion layer is used for performing fusion processing on the third feature vector and the fourth feature vector to obtain a second fusion feature vector, and the second task layer is used for performing punctuation type position identification processing on the input second fusion feature vector to obtain the predicted position and the predicted type of the punctuation in the text to be processed;
and a punctuation adding module 720, configured to add punctuation to the to-be-processed text based on the second punctuation prediction information.
The punctuation adding device provided by the embodiment of the application can obtain punctuation information in the text to be processed by inputting the text to be processed into the punctuation prediction model obtained by training, and then completes punctuation addition, so that the punctuation adding device is simple, convenient and fast to realize and high in efficiency; in addition, the trained punctuation prediction model has the capability of jointly learning the punctuation quantity prediction and the punctuation position and type prediction of the two tasks, the fusion feature vectors subjected to fusion processing by the feature fusion layer in the punctuation prediction model are fused with feature information required by the two tasks, the feature information comprises the association information between the two tasks and information irrelevant to each other between the two tasks, and the association information enables the punctuation prediction model to fully learn the association between the two tasks, so that the prediction accuracy of the punctuation prediction model can be improved; the irrelevant information is equivalent to noise introduced in the learning process of each task, so that the generalization effect of learning of each task can be improved, the punctuation prediction model has wide applicability and can be suitable for text data of various service scenes and various sources, and the accuracy of predicting the punctuation information in the text to be processed can be improved based on the punctuation prediction model obtained by training.
Optionally, the punctuation adding module comprises:
the punctuation quantity determining submodule is used for determining the quantity of punctuations in the text to be processed based on the predicted position and the predicted type in the second punctuation prediction information;
the detection submodule is used for detecting whether the determined number of punctuations in the text to be processed is consistent with the predicted number in the second punctuation prediction information;
and the first punctuation adding submodule is used for adding punctuation to the text to be processed based on the prediction position and the prediction type in the second punctuation prediction information when the determined punctuation quantity is consistent with the prediction quantity in the second punctuation prediction information.
Optionally, the punctuation adding module further comprises:
the sending submodule is used for sending the second punctuation prediction information to an auditing platform when the determined number of the punctuations in the text to be processed is inconsistent with the predicted number in the second punctuation prediction information;
and the second punctuation adding submodule is used for adding punctuations for the text to be processed based on the corrected second punctuation prediction information returned by the auditing platform.
Obviously, the punctuation adding device according to the embodiment of the present application may be used as the execution main body of the punctuation adding method shown in fig. 4, and thus the functions of the punctuation adding method in fig. 4 can be realized. Since the principle is the same, the description will not be repeated here.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 8, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a Random-Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 8, but that does not indicate only one bus or one type of bus.
And the memory is used for storing programs. In particular, the program may include program code comprising computer operating instructions. The memory may include both memory and non-volatile storage and provides instructions and data to the processor.
The processor reads a corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form a training device of the punctuation prediction model on a logic level. The processor is used for executing the program stored in the memory and is specifically used for executing the following operations:
inputting a sample text marked with punctuation information into a feature extraction network of a punctuation prediction model, and outputting a first feature vector and a second feature vector, wherein the punctuation information comprises the number, the position and the type of punctuation, the first feature vector is used for representing the number of punctuation in the sample text, and the second feature vector is used for representing the position and the type of punctuation in the sample text;
inputting the first feature vector and the second feature vector into a multitask network of the punctuation prediction model, outputting first punctuation prediction information, wherein the first punctuation prediction information comprises a prediction number, a prediction position and a prediction type of punctuation in the sample text, the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for carrying out punctuation quantity identification processing on the first feature vector to obtain the predicted quantity of punctuation in the sample text, the feature fusion layer is used for fusing the first feature vector and the second feature vector to obtain a first fused feature vector, the second task layer is used for carrying out punctuation type position identification processing on the first fusion feature vector to obtain a prediction position and a prediction type of punctuation in the sample text;
and adjusting network parameters of each network in the punctuation prediction model based on the marked punctuation information and the first punctuation prediction information.
Or the processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the punctuation adding device on the logic level. The processor is used for executing the program stored in the memory and is specifically used for executing the following operations:
inputting a text to be processed into a punctuation prediction model, and outputting second punctuation prediction information, wherein the second punctuation prediction information comprises the predicted number, the predicted position and the predicted type of punctuation in the text to be processed, the punctuation prediction model comprises a feature extraction network and a multitask network, the feature extraction network is used for performing feature extraction on the text to be processed to obtain a third feature vector and a fourth feature vector, the third feature vector is used for representing the number of punctuation in the text to be processed, and the fourth feature vector is used for representing the position and the type of punctuation in the text to be processed; the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for performing punctuation quantity identification processing on the input third feature vector to obtain the predicted quantity of punctuation in the text to be processed, the feature fusion layer is used for performing fusion processing on the third feature vector and the fourth feature vector to obtain a second fusion feature vector, and the second task layer is used for performing punctuation type position identification processing on the input second fusion feature vector to obtain the predicted position and the predicted type of the punctuation in the text to be processed;
and adding punctuation to the text to be processed based on the second punctuation prediction information.
The method performed by the training apparatus for the punctuation prediction model as disclosed in the embodiment shown in fig. 1 of the present application, or the method performed by the punctuation addition apparatus as disclosed in the embodiment shown in fig. 4 of the present application, can be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above methods may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM or an EPROM, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
The electronic device may also execute the method of fig. 1 and implement the functions of the training apparatus for punctuation prediction models in the embodiment shown in fig. 1, which are not described herein again in this application. The electronic device may further execute the method in fig. 4, and implement the function of the punctuation addition device in the embodiment shown in fig. 4, which is not described herein again in this embodiment of the application.
Of course, besides the software implementation, the electronic device of the present application does not exclude other implementations, such as a logic device or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or a logic device.
Embodiments of the present application also provide a computer-readable storage medium storing one or more programs, where the one or more programs include instructions, which when executed by a portable electronic device including a plurality of application programs, enable the portable electronic device to perform the method of the embodiment shown in fig. 1, and are specifically configured to:
inputting a sample text marked with punctuation information into a feature extraction network of a punctuation prediction model, and outputting a first feature vector and a second feature vector, wherein the punctuation information comprises the number, the position and the type of punctuation, the first feature vector is used for representing the number of punctuation in the sample text, and the second feature vector is used for representing the position and the type of punctuation in the sample text;
inputting the first feature vector and the second feature vector into a multitask network of the punctuation prediction model, outputting first punctuation prediction information, wherein the first punctuation prediction information comprises a prediction number, a prediction position and a prediction type of punctuation in the sample text, the multi-task network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for carrying out punctuation quantity identification processing on the first feature vector to obtain the predicted quantity of punctuation in the sample text, the feature fusion layer is used for fusing the first feature vector and the second feature vector to obtain a first fused feature vector, the second task layer is used for carrying out punctuation type position identification processing on the first fusion feature vector to obtain a prediction position and a prediction type of punctuation in the sample text;
and adjusting network parameters of each network in the punctuation prediction model based on the marked punctuation information and the first punctuation prediction information.
Alternatively, the instructions, when executed by a portable electronic device comprising a plurality of application programs, can cause the portable electronic device to perform the method of the embodiment shown in fig. 4, and in particular to perform the following operations:
inputting a text to be processed into a punctuation prediction model, and outputting second punctuation prediction information, wherein the second punctuation prediction information comprises the predicted number, the predicted position and the predicted type of punctuation in the text to be processed, the punctuation prediction model comprises a feature extraction network and a multitask network, the feature extraction network is used for performing feature extraction on the text to be processed to obtain a third feature vector and a fourth feature vector, the third feature vector is used for representing the number of punctuation in the text to be processed, and the fourth feature vector is used for representing the position and the type of punctuation in the text to be processed; the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for performing punctuation quantity identification processing on the input third feature vector to obtain the predicted quantity of punctuation in the text to be processed, the feature fusion layer is used for performing fusion processing on the third feature vector and the fourth feature vector to obtain a second fusion feature vector, and the second task layer is used for performing punctuation type position identification processing on the input second fusion feature vector to obtain the predicted position and the predicted type of the punctuation in the text to be processed;
and adding punctuation to the text to be processed based on the second punctuation prediction information.
In short, the above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Claims (14)

1. A method for training a punctuation prediction model, comprising:
inputting a sample text marked with punctuation information into a feature extraction network of a punctuation prediction model, and outputting a first feature vector and a second feature vector, wherein the punctuation information comprises the number, the position and the type of punctuation, the first feature vector is used for representing the number of punctuation in the sample text, and the second feature vector is used for representing the position and the type of punctuation in the sample text;
inputting the first feature vector and the second feature vector into a multitask network of the punctuation prediction model, outputting first punctuation prediction information, wherein the first punctuation prediction information comprises a prediction number, a prediction position and a prediction type of punctuation in the sample text, the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for carrying out punctuation quantity identification processing on the first feature vector to obtain the predicted quantity of punctuation in the sample text, the feature fusion layer is used for fusing the first feature vector and the second feature vector to obtain a first fused feature vector, the second task layer is used for carrying out punctuation type position identification processing on the first fusion feature vector to obtain a prediction position and a prediction type of punctuation in the sample text;
and adjusting network parameters of each network in the punctuation prediction model based on the marked punctuation information and the first punctuation prediction information.
2. The method according to claim 1, wherein the first feature vector comprises a first sub-feature vector and a second sub-feature vector, and the first sub-feature vector and the second sub-feature vector are obtained by feature extraction of the sample text by the feature extraction network according to different feature extraction manners;
the multitask network further comprises a first attention mechanism layer, wherein the first attention mechanism layer is used for performing feature enhancement processing on the input first sub-feature vector and the input second sub-feature vector to obtain a first attention feature vector;
the first task layer is used for carrying out punctuation quantity identification processing on the first attention feature vector to obtain the predicted quantity of punctuation in the sample text;
the feature fusion layer is used for performing fusion processing on the first attention feature vector and the second feature vector to obtain a first fusion feature vector.
3. The method of claim 2, wherein the feature extraction network comprises a first feature extraction layer and a second feature extraction layer, the first feature extraction layer and the second feature extraction layer having different network structures;
the first feature extraction layer is used for performing feature extraction on the sample text according to a first feature extraction mode to obtain the first sub-feature vector;
and the second feature extraction layer is used for extracting features of the sample text according to a second feature extraction mode to obtain the second sub-feature vector and the second feature vector.
4. The method of claim 1, wherein the multitasking network further comprises a second attention mechanism layer, and the second attention mechanism layer is configured to perform feature enhancement processing on the input first fusion feature vector and the input second feature vector to obtain a second attention feature vector;
and the second task layer is used for carrying out punctuation type position identification processing on the second attention feature vector to obtain the predicted position and the predicted type of the punctuation in the sample text.
5. The method of claim 1, wherein adjusting network parameters of each network in the punctuation prediction model based on the annotated punctuation information and the first punctuation prediction information comprises:
determining a first loss value corresponding to the sample text based on the number of marked punctuations and the predicted number of punctuations in the sample text, wherein the first loss value is used for representing a loss value generated by the number of the predicted punctuations;
determining a second loss value corresponding to the sample text based on the marked punctuation position and punctuation type and the predicted position and predicted type of the punctuation in the sample text, wherein the second loss value is used for representing the loss value generated by the position and type of the predicted punctuation;
determining a prediction loss value corresponding to the sample text based on the first loss value and the second loss value;
and adjusting the network parameters of each network in the punctuation prediction model by adopting a back propagation algorithm based on the prediction loss value.
6. The method of claim 5, wherein determining the predicted loss value corresponding to the sample text based on the first loss value and the second loss value comprises:
determining a third loss value corresponding to the sample text based on the first loss value and the second loss value, wherein the third loss value is used for representing the loss values generated by the number, the positions and the types of the predicted punctuations;
and performing weighted summation on the first loss value, the second loss value and the third loss value to obtain a prediction loss value corresponding to the sample text.
7. The method of any one of claims 1 to 6, wherein prior to entering sample text labeled with punctuation information into a feature extraction network of a punctuation prediction model, the method further comprises:
acquiring a text containing punctuation;
removing punctuations in the text containing the punctuations based on the positions and types of the punctuations in the text containing the punctuations to obtain a sample text;
adding the number of punctuation in the punctuation-containing text before the first character of the sample text;
and generating marking information corresponding to each character in the sample text based on the position and the type of the punctuation in the text containing the punctuation, wherein the marking information is used for indicating whether the punctuation exists after the corresponding character and the type of the existing punctuation.
8. A punctuation adding method is characterized by comprising the following steps:
inputting a text to be processed into a punctuation prediction model, and outputting second punctuation prediction information, wherein the second punctuation prediction information comprises the predicted number, the predicted position and the predicted type of punctuation in the text to be processed, the punctuation prediction model comprises a feature extraction network and a multitask network, the feature extraction network is used for performing feature extraction on the text to be processed to obtain a third feature vector and a fourth feature vector, the third feature vector is used for representing the number of punctuation in the text to be processed, and the fourth feature vector is used for representing the position and the type of punctuation in the text to be processed; the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for performing punctuation quantity identification processing on the input third feature vector to obtain the predicted quantity of punctuation in the text to be processed, the feature fusion layer is used for performing fusion processing on the third feature vector and the fourth feature vector to obtain a second fusion feature vector, and the second task layer is used for performing punctuation type position identification processing on the input second fusion feature vector to obtain the predicted position and the predicted type of the punctuation in the text to be processed;
and adding punctuation to the text to be processed based on the second punctuation prediction information.
9. The method according to claim 8, wherein adding punctuation to the text to be processed based on the second punctuation prediction information comprises:
determining the number of punctuations in the text to be processed based on the predicted position and the predicted type in the second punctuation prediction information;
detecting whether the determined number of punctuations in the text to be processed is consistent with the predicted number in the second punctuation prediction information;
and if they are consistent, adding punctuation to the text to be processed based on the prediction position and the prediction type in the second punctuation prediction information.
10. The method according to claim 9, wherein adding punctuation to the text to be processed based on the second punctuation prediction information further comprises:
if the determined number of punctuations in the text to be processed is inconsistent with the predicted number in the second punctuation prediction information, sending the second punctuation prediction information to an auditing platform;
and adding punctuation to the text to be processed based on the corrected second punctuation prediction information returned by the auditing platform.
11. An apparatus for training a punctuation prediction model, comprising:
the system comprises a first input module, a second input module and a third input module, wherein the first input module is used for inputting a sample text marked with punctuation information into a feature extraction network of a punctuation prediction model and outputting a first feature vector and a second feature vector, the punctuation information comprises the number, the position and the type of punctuation, the first feature vector is used for representing the number of punctuation in the sample text, and the second feature vector is used for representing the position and the type of punctuation in the sample text;
a second input module for inputting the first feature vector and the second feature vector into a multitask network of the punctuation prediction model and outputting first punctuation prediction information, wherein the first punctuation prediction information comprises a prediction number, a prediction position and a prediction type of punctuation in the sample text, the multi-task network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for carrying out punctuation quantity identification processing on the first feature vector to obtain the predicted quantity of punctuation in the sample text, the feature fusion layer is used for fusing the first feature vector and the second feature vector to obtain a first fused feature vector, the second task layer is used for carrying out punctuation type position identification processing on the first fusion feature vector to obtain a prediction position and a prediction type of punctuation in the sample text;
and the adjusting module is used for adjusting the network parameters of each network in the punctuation prediction model based on the marked punctuation information and the first punctuation prediction information.
12. A punctuation adding device, comprising:
the system comprises a third input module, a second input module and a third output module, wherein the third input module is used for inputting a text to be processed into a punctuation prediction model and outputting second punctuation prediction information, the second punctuation prediction information comprises the predicted number, the predicted position and the predicted type of punctuation in the text to be processed, the punctuation prediction model comprises a feature extraction network and a multitask network, the feature extraction network is used for performing feature extraction on the text to be processed to obtain a third feature vector and a fourth feature vector, the third feature vector is used for representing the number of punctuation in the text to be processed, and the fourth feature vector is used for representing the position and the type of punctuation in the text to be processed; the multitask network comprises a first task layer, a second task layer and a feature fusion layer, wherein the first task layer is used for performing punctuation quantity identification processing on the input third feature vector to obtain the predicted quantity of punctuation in the text to be processed, the feature fusion layer is used for performing fusion processing on the third feature vector and the fourth feature vector to obtain a second fusion feature vector, and the second task layer is used for performing punctuation type position identification processing on the input second fusion feature vector to obtain the predicted position and the predicted type of the punctuation in the text to be processed;
and the punctuation adding module is used for adding punctuation for the text to be processed based on the second punctuation prediction information.
13. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 10.
14. A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1-10.
CN202210142401.5A 2022-02-16 2022-02-16 Punctuation prediction model training method, punctuation adding method and punctuation adding device Active CN114528850B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210142401.5A CN114528850B (en) 2022-02-16 2022-02-16 Punctuation prediction model training method, punctuation adding method and punctuation adding device

Publications (2)

Publication Number Publication Date
CN114528850A true CN114528850A (en) 2022-05-24
CN114528850B CN114528850B (en) 2023-08-04

Family

ID=81622959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210142401.5A Active CN114528850B (en) 2022-02-16 2022-02-16 Punctuation prediction model training method, punctuation adding method and punctuation adding device

Country Status (1)

Country Link
CN (1) CN114528850B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200175961A1 (en) * 2018-12-04 2020-06-04 Sorenson Ip Holdings, Llc Training of speech recognition systems
CN109614627A (en) * 2019-01-04 2019-04-12 平安科技(深圳)有限公司 A kind of text punctuate prediction technique, device, computer equipment and storage medium
CN109858038A (en) * 2019-03-01 2019-06-07 科大讯飞股份有限公司 A kind of text punctuate determines method and device
US20200364576A1 (en) * 2019-05-14 2020-11-19 Adobe Inc. Utilizing deep recurrent neural networks with layer-wise attention for punctuation restoration
CN110674629A (en) * 2019-09-27 2020-01-10 上海智臻智能网络科技股份有限公司 Punctuation mark model and its training method, equipment and storage medium
CN111241810A (en) * 2020-01-16 2020-06-05 百度在线网络技术(北京)有限公司 Punctuation prediction method and device
US20210224480A1 (en) * 2020-01-16 2021-07-22 Baidu Online Network Technology (Beijing) Co., Ltd. Method, device and storage medium for predicting punctuation in text
CN112148856A (en) * 2020-09-22 2020-12-29 北京百度网讯科技有限公司 Method and device for establishing punctuation prediction model
CN112685539A (en) * 2020-12-31 2021-04-20 成都网安科技发展有限公司 Text classification model training method and device based on multi-task fusion
CN112966476A (en) * 2021-04-19 2021-06-15 马上消费金融股份有限公司 Text processing method and device, electronic equipment and storage medium
CN113449489A (en) * 2021-07-22 2021-09-28 深圳追一科技有限公司 Punctuation mark marking method, punctuation mark marking device, computer equipment and storage medium
CN113449085A (en) * 2021-09-02 2021-09-28 华南师范大学 Multi-mode emotion classification method and device and electronic equipment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
刘新: "基于深度学习的中文语音转录标点预测研究", 中国优秀硕士学位论文全文数据库信息科技辑, no. 7, pages 136 - 79 *
刘苗苗;郭景峰;陈晶;: "相似性与结构平衡论结合的符号网络边值预测", 工程科学与技术, no. 04, pages 165 - 173 *
苏一凡: "实时场景下的标点标注方法研究", 中国优秀硕士学位论文全文数据库信息科技辑, no. 7, pages 138 - 1500 *
赵晓芳;金志刚;: "融合表情符号与短文本的微博多维情感分类", 哈尔滨工业大学学报, no. 05, pages 119 - 126 *

Also Published As

Publication number Publication date
CN114528850B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN110309283B (en) Answer determination method and device for intelligent question answering
WO2019200923A1 (en) Pinyin-based semantic recognition method and device and human-machine conversation system
CN112951240B (en) Model training method, model training device, voice recognition method, voice recognition device, electronic equipment and storage medium
CN111223476B (en) Method and device for extracting voice feature vector, computer equipment and storage medium
CN113849623B (en) Text visual question-answering method and device
CN115033676A (en) Intention recognition model training and user intention recognition method and device
CN114596845A (en) Training method of voice recognition model, voice recognition method and device
CN112668333A (en) Named entity recognition method and device, and computer-readable storage medium
CN114842849B (en) Voice dialogue detection method and device
JP2021096847A (en) Recommending multimedia based on user utterance
CN114782054A (en) Customer service quality detection method based on deep learning algorithm and related equipment
CN114627868A (en) Intention recognition method and device, model and electronic equipment
CN112908315B (en) Question and answer intention judging method based on sound characteristics and voice recognition
CN118072734A (en) Speech recognition method, device, processor, memory and electronic equipment
CN117496984A (en) Interaction method, device and equipment of target object and readable storage medium
CN114528850B (en) Punctuation prediction model training method, punctuation adding method and punctuation adding device
CN114970559B (en) Intelligent response method and device
CN114254588B (en) Data tag processing method and device
CN116127316A (en) Model training method, text abstract generating method and related equipment
CN112687296B (en) Audio disfluency identification method, device, equipment and readable storage medium
JP2020129051A (en) Paralanguage information estimation model learning device, paralanguage information estimation device and program
CN115599891B (en) Method, device and equipment for determining abnormal dialogue data and readable storage medium
CN114065768B (en) Feature fusion model training and text processing method and device
CN118095269B (en) Dialogue information extraction method, device, equipment, medium and program product
CN114638231B (en) Entity linking method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant