CN112257470A - Model training method and device, computer equipment and readable storage medium


Info

Publication number
CN112257470A
Authority
CN
China
Prior art keywords
sample
original
target
text
activation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011261074.2A
Other languages
Chinese (zh)
Inventor
王星
焦文祥
涂兆鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202011261074.2A priority Critical patent/CN112257470A/en
Publication of CN112257470A publication Critical patent/CN112257470A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application provides a model training method, a model training device, computer equipment and a readable storage medium, wherein the method comprises the following steps: obtaining an original sample set, and training a sample recognition model according to the original sample set to obtain an original recognition model, the original sample set comprising a plurality of original samples; calling the original recognition model to respectively recognize each original sample to obtain an original recognition result of each original sample; dividing the original sample set into a target sample set and a first activation sample set according to the original recognition result of each original sample; calling an activation model to perform activation processing on each target sample to obtain a second activation sample set; and training a recognition processing model according to the first activation sample set and the second activation sample set to obtain a target recognition model, so that the translation accuracy of the trained translation model is effectively improved while the amount of data used to train the translation model is preserved.

Description

Model training method and device, computer equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a model training method and apparatus, a computer device, and a readable storage medium.
Background
Training a translation model is a data-driven task, and a large amount of training data is often needed to obtain a translation model with good performance. However, the presence of noisy data in large-scale training data makes training of the machine translation model difficult, which in turn affects the performance of the translation model.
At present, in order to solve the above problem, noise data is generally removed from a large amount of original training data, and the translation model is then trained on the remaining original training data. This reduces the amount of data available for training the translation model, and the translation accuracy of a translation model trained on a small amount of sample data is low.
Disclosure of Invention
The embodiment of the application provides a model training method and device, computer equipment and a readable storage medium, which can train a translation model by reasonably utilizing the training data, preserving the amount of data for training the translation model while effectively improving the translation accuracy of the trained translation model.
An aspect of the embodiments of the present application provides a model training method, including:
obtaining an original sample set, training a sample recognition model according to the original sample set, and obtaining an original recognition model; the original sample set comprises a plurality of original samples;
calling the original identification model to respectively identify each original sample to obtain an original identification result of each original sample;
dividing the original sample set into a target sample set and a first activation sample set according to an original recognition result of each original sample, wherein the target sample set comprises at least one target sample;
calling an activation model to perform activation processing on each target sample to obtain a second activation sample set; wherein the activation model is trained based on the first set of activation samples;
and training a recognition processing model according to the first activation sample set and the second activation sample set to obtain a target recognition model.
An aspect of an embodiment of the present application provides a model training apparatus, including:
the acquisition module is used for acquiring an original sample set, training a sample recognition model according to the original sample set and obtaining an original recognition model; the original sample set comprises a plurality of original samples;
the calling module is used for calling the original identification model to respectively identify each original sample to obtain an original identification result of each original sample;
a determining module, configured to divide the original sample set into a target sample set and a first active sample set according to an original recognition result of each original sample, where the target sample set includes at least one target sample;
the calling module is further used for calling the activation model to perform activation processing on each target sample to obtain a second activation sample set; wherein the activation model is trained based on the first set of activation samples;
the determining module is further configured to train a recognition processing model according to the first activation sample set and the second activation sample set, so as to obtain a target recognition model.
In one aspect, the present invention provides a computer device, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the model training method described above.
An aspect of the embodiments of the present application provides a computer-readable storage medium, in which program instructions are stored, and when the program instructions are executed, the computer-readable storage medium is used for implementing the model training method described above.
An aspect of the embodiments of the present application provides a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium, and when the computer instructions are executed by a processor of a computer device, the computer instructions perform the model training method described above.
In the embodiment of the application, the computer device activates the target sample set in the original sample set to obtain a second activation sample set, and trains the recognition processing model according to the second activation sample set and the first activation sample set obtained from the original sample set. Without changing the training model or adding extra data, the noise data in the training data does not need to be removed; the training data is used reasonably to train the translation model, so that the translation accuracy of the trained translation model is effectively improved while the amount of data used to train the translation model is preserved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is an architecture diagram of model training provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram illustrating a model training method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram illustrating a model training method according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram illustrating a model training method according to an embodiment of the present disclosure;
FIG. 5 is a graph of NMT model performance when samples under different target proportions are regarded as inactive samples, provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Among them, Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, i.e. the language that people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
The model training method provided by the embodiment of the application relates to artificial intelligence technologies such as natural language processing, and can distinguish a large amount of original data, namely divide it into original inactive data and original active data. The computer equipment then activates the inactive data in the original data, so that the recognition processing model is trained on the activated data together with the original active data to obtain the target recognition model. As a result, the original data does not need to be denoised, the amount of data for training the translation model is preserved, the model training time is shortened, and the translation accuracy of the trained translation model is effectively improved.
In one possible embodiment, the model training architecture is shown in FIG. 1. The model training architecture mainly relates to two models, namely an original recognition model and an activation model. When a translation model is to be trained, a computer device may first obtain a large number of raw samples, where the raw samples include source data X and destination data Y corresponding to the source data, and call a pre-trained recognition model to recognize the raw samples, so as to obtain a plurality of inactive samples and a plurality of active samples. And then training a preset model by the computer equipment according to the plurality of active samples to obtain an activated model. Further, the computer equipment calls an activation model to perform data processing on each inactive sample, namely source end data X of the inactive sample is translated to obtain destination end data Y1, the source end data X and the destination end data Y1 are synthesized into parallel corpora (X, Y1), and the parallel corpora (X, Y1) form an activation sample; according to the method, the activation sample corresponding to each inactive sample can be obtained. And then training the recognition processing model according to the activation samples and the active samples, thereby completing the training of the recognition processing model and obtaining the target recognition model.
In a specific application, the model training method provided in the embodiment of the present application may be applied to the training of any machine translation model, for example, a Neural Machine Translation (NMT) model. In the embodiment of the application, when the model training method is used to train a translation model, the framework of the translation model does not need to be changed. The model training method is illustrated by the following examples. Furthermore, since the model training method provided by the embodiment of the present application mainly involves the processing of sample data, the scheme of the present application may be applied to the training of all artificial intelligence models, for example, the training of an image recognition model, the training of a semantic segmentation model, and the like.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a model training method according to an embodiment of the present disclosure. The model training method may be executed by a computer device, and the model training method described in this embodiment includes the following steps S201 to S205:
s201, obtaining an original sample set, training a sample recognition model according to the original sample set, and obtaining an original recognition model.
The original sample set includes a plurality of original samples, each original sample may be a sample of a text class, a sample of a speech class, and so on, for example, each original sample may include a first source text and a first target text corresponding to the first source text, that is, each original sample includes sample data and a tag of the sample data.
In particular implementations, the computer device may obtain the plurality of raw samples from a predetermined route, for example, from a storage space of the computer device or from a network. And then training a sample recognition model according to a plurality of original samples to obtain an original recognition model.
S202, calling an original identification model to respectively identify each original sample to obtain an original identification result of each original sample.
The original recognition result may include the first predicted text or a phrase probability set corresponding to the first target text.
S203, according to the original recognition result of each original sample, dividing the original sample set into a target sample set and a first activation sample set.
The target sample set may comprise at least one target sample. It will be appreciated that the target samples may be considered samples that do not substantially contribute to, or have a negative impact on, subsequent training of the recognition processing model, i.e., they may be referred to as inactive samples. The first activation sample set may include at least one first activation sample. Each first activation sample may be considered a sample that contributes significantly to the subsequent training of the recognition processing model, i.e., it may be referred to as an active sample. In a specific implementation, the division of the original sample set into a target sample set and a first activation sample set may be determined according to the first predicted text of each original sample or the phrase probability set corresponding to the first target text. In a possible embodiment, the original recognition result includes a phrase probability set corresponding to the first target text; the computer device superimposes the phrase probabilities in the phrase probability set into a target probability, and divides the original sample set into a target sample set and a first activation sample set according to the target probability. In another possible embodiment, the original recognition result includes a first predicted text; the computer device may determine the recognition error of each original sample according to the first predicted text corresponding to that original sample, and divide the original sample set into a target sample set and a first activation sample set according to the recognition error of each original sample.
And S204, calling an activation model to perform activation processing on each target sample to obtain a second activation sample set.
Wherein, the activation model is trained based on the first activation sample set, and the activation model can be a forward activation model or a reverse activation model. In particular implementations, forward and reverse translation may be employed to implement the activation model. After the computer device obtains the first activation sample set, a forward model (i.e., a forward translation model) or a reverse model (i.e., a reverse translation model) can be trained according to each first activation sample, so that the forward activation model or the reverse activation model is obtained.
In a possible embodiment, each target sample comprises a second source text and a second target text, where the second source text is the first source text in the original sample and the second target text is the first target text in the original sample. The computer equipment can extract a second source text or a second target text or the second source text and the second target text from the second source text and the second target text of each target sample to be used as a text to be processed of each target sample, then invokes an activation model to perform activation processing on the text to be processed of each target sample to obtain a second predicted text of each target sample, combines the text to be processed of each target sample and the second predicted text into each second activation sample, and combines each second activation sample into a second activation sample set. In a specific implementation, the following three ways to generate the second activation sample set may be selected according to requirements or experience.
(1) The computer device may extract a second source text from the second source text and the second target text of each target sample as a text to be processed of each target sample, and then the computer device invokes a forward activation model to perform activation processing on the second source text of each target sample to obtain a second predicted text of each target sample, and combines the second source text of each target sample and the second predicted text of each target sample into each second activation sample. Illustratively, the target sample is (x, y), the text to be processed is x (i.e., the second source text mentioned above), the computer device calls the forward activation model to activate x, and obtains a second predicted text y1 of the target sample, and the computer device combines x and y1 into an activation sample (x, y 1).
(2) The computer device can extract a second target text from the second source text and the second target text of each target sample to be used as a text to be processed of each target sample, then the computer device calls a reverse activation model to perform activation processing on the second target text of each target sample to obtain a second predicted text of each target sample, and the second target text of each target sample and the second predicted text of each target sample are combined into each second activation sample. Illustratively, the target sample is (x, y), the text to be processed is y (i.e. the second target text mentioned above), the computer device invokes a reverse activation model to activate y, so as to obtain a second predicted text x1 of the target sample, and the computer device combines x1 and y into an activation sample (x1, y).
(3) The computer equipment can extract a second source text and a second target text from a second source text and a second target text of each target sample to be used as texts of each target sample to be processed, and the computer equipment calls a forward activation model to perform activation processing on the second source text of each target sample to obtain a third predicted text of each target sample; and calling a reverse activation model to perform activation processing on the second target text of each target sample to obtain a fourth predicted text of each target sample. And finally, combining the second source text and the third predicted text of each target sample into each second activation sample by the computer equipment, and combining the second target text and the fourth predicted text of each target sample into each second activation sample, so that the performance of the target recognition model can be improved to the maximum extent by a subsequently generated second activation sample set.
Illustratively, the target sample is (x, y) and the texts to be processed are x and y. The computer device invokes the forward activation model to perform activation processing on x (i.e., the second source text), obtaining y1 (i.e., the third predicted text) of the target sample, and invokes the reverse activation model to perform activation processing on y (i.e., the second target text), obtaining x1 (i.e., the fourth predicted text) of the target sample. According to the user's needs, the computer device determines both x1 and y1 as second predicted texts, combines x1 and y of the target sample into one activation sample (x1, y), and combines x and y1 of the target sample into another activation sample (x, y1).
It should be noted that the above-mentioned process of activating each target sample is a process of re-labeling the second source text or the second target text of the target sample, for example, if the second source text is x, re-labeling the x is performed, and the label may be understood as y 1. The generation of each second activation sample is actually referred to as synthesizing parallel corpora, and the parallel corpora can be (x1, y), (x, y1), and the like.
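As an illustration only, the following Python sketch shows how options (1) to (3) above could be used to synthesize the second activation samples; `forward_model` and `reverse_model` are hypothetical objects exposing a `translate(text)` method, standing in for whatever NMT inference interface is actually used.

```python
from typing import List, Optional, Tuple

def activate_targets(target_samples: List[Tuple[str, str]],
                     forward_model: Optional[object] = None,
                     reverse_model: Optional[object] = None) -> List[Tuple[str, str]]:
    """Build second activation samples from (second source text, second target text) pairs."""
    activated = []
    for x, y in target_samples:
        if forward_model is not None:
            y1 = forward_model.translate(x)   # re-label the source text x
            activated.append((x, y1))         # option (1): synthetic pair (x, y1)
        if reverse_model is not None:
            x1 = reverse_model.translate(y)   # re-label the target text y
            activated.append((x1, y))         # option (2): synthetic pair (x1, y)
        # option (3) corresponds to supplying both models, so both pairs are produced
    return activated
```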
S205, training the recognition processing model according to the first activation sample set and the second activation sample set to obtain a target recognition model.
In one possible embodiment, after obtaining the target recognition model, the user may submit the request at a web page or translation interface provided by the computer device. The computer equipment can respond to the translation request aiming at the text to be translated, call the target recognition model to perform text translation on the text to be translated to obtain the translated text of the text to be translated, and then output the translated text. The target recognition model is used for translating, so that a relatively accurate translation text can be obtained.
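Purely as an illustration, a minimal handler that responds to a translation request by calling the trained target recognition model could look as follows; the `translate` method is an assumed interface, not one defined in this document.

```python
def handle_translation_request(text_to_be_translated: str, target_recognition_model) -> str:
    """Call the trained target recognition model on the text to be translated and
    return the translated text for output (assumed `translate` interface)."""
    return target_recognition_model.translate(text_to_be_translated)
```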
In the embodiment of the application, the computer device activates the target sample set in the original sample set to obtain a second activation sample set, and trains the recognition processing model according to the second activation sample set and the first activation sample set from the original sample set. Without changing the training model or adding extra data, noise data does not need to be removed; the training data is used reasonably to train the translation model, so that the translation accuracy of the trained translation model is effectively improved while the amount of training data is preserved. In addition, different activation models can be used when activating the target sample set, which ensures the accuracy of the obtained second activation sample set and further improves the translation accuracy of the trained translation model.
Referring to fig. 3, fig. 3 is a schematic flowchart of another model training method according to an embodiment of the present disclosure. The model training method described in this embodiment is introduced mainly for the case where the original recognition result includes the first predicted text, and includes the following steps S301 to S307:
s301, obtaining an original sample set, training a sample recognition model according to the original sample set, and obtaining an original recognition model.
Wherein the original sample set comprises a plurality of original samples.
It should be noted that, for a specific implementation manner of step S301, reference may be made to a specific implementation manner of step S201.
And S302, calling an original identification model to respectively identify each original sample to obtain a first predicted text of each original sample.
Each original sample comprises a first source text and a first target text corresponding to the first source text.
In a possible embodiment, the computer device may call the original recognition model to perform recognition processing on the first source text of each original sample, respectively, to obtain the first predicted text of each original sample. In another possible embodiment, the first predicted text includes a first unit text and a second unit text, and the original recognition model includes a forward recognition model and a reverse recognition model. The computer device may respectively train two preset recognition models on a large amount of training data to obtain the forward recognition model and the reverse recognition model. The computer device may then call the forward recognition model to perform recognition processing on the first source text to obtain the first unit text, and call the reverse recognition model to perform recognition processing on the first target text to obtain the second unit text.
S303, determining the identification error of each original sample according to the first predicted text corresponding to each original sample.
In a possible embodiment, the application provides two ways of determining the identification error of each original sample, wherein one way is to determine the identification error of each original sample through the first target text and the first predicted text corresponding to each original sample; the other is by determining the recognition error for each original sample based on the first predicted text, the first source text, and the first target text for each original sample.
First, a first determination method will be explained in detail:
after the first predicted text of each original sample is obtained, a polling priority can be set for the plurality of original samples, the computer device selects a target original sample from the plurality of original samples according to the polling priority, aligns the first target text of the target original sample and the first predicted text of the target original sample, counts overlapped characters of the aligned first target text and the first predicted text, and determines the recognition error of the target original sample according to the overlapped characters and the total quantity of the characters. The total number of characters refers to the total number of characters of the first target text of the target original sample or the first predicted text of the target original sample. In a specific implementation, in order to accurately determine the recognition error of the target original sample, the first target text of the target original sample and the first predicted text of the target original sample may be aligned, so that whether the characters at the same position are the same or not may be determined. The computer device counts the number of the same characters, determines the sample accuracy of the identification processing of the target original sample according to the ratio of the number of the same characters to the total number of the characters, determines the identification error of the target original sample according to the sample accuracy, and stops polling when each original sample is used as the target original sample.
For example, the target original sample includes a first source text "have lunch" and a corresponding two-character first target text meaning "have a meal". The computer device calls the original recognition model to recognize the first source text and obtains a two-character first predicted text meaning "eat melon". The computer device aligns the first target text with the first predicted text and counts one overlapping character (the character for "eat") out of a total of two characters; it determines the sample accuracy of the recognition processing of the target original sample to be 0.5 according to the ratio of the overlapping character count 1 to the total character count 2, and determines the identification error of the target original sample to be 0.5 according to the sample accuracy.
In a possible embodiment, after the first target text of the target original sample and the first predicted text of the target original sample are aligned, if the characters at the same position are the same, the probability at the position is 1; if the characters at the same position are different, the probability at the position is 0, the probabilities at all the positions are added to calculate the average, the accuracy of the target original sample can be determined, and the identification error of the target original sample can be determined according to the accuracy of the target original sample.
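A minimal Python sketch of the position-wise comparison described above; treating the identification error as 1 minus the sample accuracy is an assumption here (in the worked example above both values happen to be 0.5).

```python
def identification_error(first_target_text: str, first_predicted_text: str) -> float:
    """Position-wise character overlap between the aligned target and predicted texts.

    Each position scores 1 if the characters match and 0 otherwise; the sample
    accuracy is the average score, and the identification error is assumed to be
    1 - accuracy.
    """
    total = max(len(first_target_text), len(first_predicted_text))
    if total == 0:
        return 0.0
    matches = sum(1 for a, b in zip(first_target_text, first_predicted_text) if a == b)
    return 1.0 - matches / total
```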
It should be noted that, when each original sample is taken as a target sample, the identification error of each original sample can be determined according to the implementation procedure of the target original sample.
The second determination method will be described in detail:
in one possible embodiment, the computer device may introduce cross entropy to determine recognition error for text recognition after obtaining the first unit text and the second unit text. In particular implementations, the computer device may determine a first cross entropy for each original sample based on the first unit text for each original sample and the first target text for each original sample; and determining a second cross entropy for each original sample based on the second unit text for each original sample and the first source text for each original sample. The first cross entropy and the second cross entropy for each original sample are then superimposed as an identification error for each original sample. Or superposing and averaging the first cross entropy and the second cross entropy of each original sample, and taking the averaged cross entropy as the identification error of each original sample.
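A hedged sketch of the second determination method, assuming the forward and reverse recognition models expose a hypothetical `cross_entropy(input_text, reference_text)` scoring call; the superposition and optional averaging follow the paragraph above.

```python
def sample_identification_error(first_source_text: str,
                                first_target_text: str,
                                forward_model,
                                reverse_model,
                                average: bool = False) -> float:
    """Combine the forward and reverse cross entropies into one identification error.

    `forward_model.cross_entropy(src, tgt)` and `reverse_model.cross_entropy(tgt, src)`
    are assumed scoring interfaces returning the first and second cross entropies.
    """
    first_cross_entropy = forward_model.cross_entropy(first_source_text, first_target_text)
    second_cross_entropy = reverse_model.cross_entropy(first_target_text, first_source_text)
    combined = first_cross_entropy + second_cross_entropy
    return combined / 2.0 if average else combined
```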
S304, according to the identification error of each original sample, at least one target sample meeting a first preset condition is determined from the original samples, and at least one first activation sample not meeting the first preset condition is determined from the original samples.
In a specific implementation, a first preset condition can be set in advance according to requirements or experience, the computer device judges whether the identification error of each original sample meets the first preset condition, and the original sample meeting the first preset condition is determined as a target sample; the original sample not satisfying the first preset condition is determined as the first activation sample.
In one possible embodiment, the first preset condition is that the identification error is greater than an error threshold. The computer device may determine whether the identification error of each original sample is greater than the error threshold, determine the original samples whose identification error is greater than the error threshold as target samples, and determine the original samples whose identification error is less than or equal to the error threshold as first activation samples.
In a possible embodiment, if the first preset condition is to take a target proportion of the original samples with the highest identification errors, the computer device may sort the original samples according to the identification error of each original sample, take the target proportion of samples with the highest identification errors as target samples, and determine the remaining original samples as first activation samples. Illustratively, if there are 100 original samples and the first preset condition is to take the 10% of them with the highest identification errors, the computer device sorts the 100 original samples by identification error and takes the 10 samples with the highest identification errors as target samples; each of the remaining 90 original samples is determined as a first activation sample. A code sketch of this split is given below.
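A minimal sketch of the proportion-based split, assuming the identification errors have already been computed for every original sample; the 10% default mirrors the worked example above and is not a fixed requirement of the method.

```python
from typing import List, Tuple

def split_by_error(original_samples: List[Tuple[str, str]],
                   errors: List[float],
                   target_proportion: float = 0.10):
    """Take the `target_proportion` of samples with the highest identification errors
    as target (inactive) samples; the rest become first activation samples."""
    order = sorted(range(len(original_samples)), key=lambda i: errors[i], reverse=True)
    k = int(len(original_samples) * target_proportion)
    target_indices = set(order[:k])
    target_samples = [original_samples[i] for i in order[:k]]
    first_activation_samples = [original_samples[i]
                                for i in range(len(original_samples))
                                if i not in target_indices]
    return target_samples, first_activation_samples
```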
S305, combining at least one target sample into a target sample set, and combining at least one first active sample into a first active sample set.
S306, calling an activation model to perform activation processing on each target sample to obtain a second activation sample set; wherein the activation model is trained based on a first set of activation samples.
S307, training the recognition processing model according to the first activation sample set and the second activation sample set to obtain a target recognition model.
The specific implementation manners of steps S306 to S307 can refer to the specific implementation manners of steps S204 to S205.
In the embodiment of the application, the computer device determines the identification error of each original sample according to the first predicted text, and divides the original samples according to whether the identification error satisfies the first preset condition, obtaining a reasonable target sample set and first activation sample set. This preserves the amount of data required for model training; after the target sample set is subsequently activated, the recognition processing model is trained on the activated samples together with the first activation samples, which more effectively improves the translation accuracy of the target recognition model.
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating another model training method according to an embodiment of the present disclosure. The model training method described in this embodiment is introduced mainly for the case where the original recognition result includes the phrase probability set corresponding to the first target text, and includes the following steps S401 to S407:
s401, obtaining an original sample set, training a sample recognition model according to the original sample set, and obtaining an original recognition model.
In a specific implementation, the obtained original sample set may be denoted as {[x_n, y_n]}. The computer device trains the sample recognition model on {[x_n, y_n]}, and the essence of the training is to maximize the log-likelihood over {[x_n, y_n]}, so that the obtained original recognition model can subsequently determine the phrase probabilities of each original sample.
S402, calling an original identification model to respectively identify each original sample to obtain a phrase probability set corresponding to the first target text of each original sample.
Each original sample comprises a first source text and a first target text corresponding to the first source text.
In a specific implementation, the computer device invokes the original recognition model to respectively recognize the first source text of each original sample, and determines the phrase probability set corresponding to the first target text of each original sample according to the probabilities of the corresponding phrases in the original recognition model. Illustratively, the original sample has the first target text "I am going to have lunch"; suppose the original recognition model assigns the phrase "have lunch" a probability of 0.2 and the phrase "eat lunch" a probability of 0.1. The computer device calls the original recognition model to recognize the first source text. The phrase "eat" in the first source text corresponds to the phrase "have lunch" in the first target text of the original sample, so the computer device determines the probability of the phrase "have lunch" to be 0.2; similarly, the probabilities of the phrases "I", "am" and "going to" can be determined in turn. The phrase probabilities over the first target text "I am going to have lunch" of the original sample are combined to obtain the phrase probability set corresponding to the first target text.
And S403, superposing the phrase probabilities in the phrase probability set into a target probability.
In a specific implementation, the computer device may directly superimpose the phrase probabilities in the phrase probability set as the target probability. Alternatively, the computer device may compute the target probability from the phrase probability set by maximum likelihood estimation, that is, I(y|x) = ∏_t p(y_t | x, y_<t), where p(y_t | x, y_<t) is the phrase probability and I(y|x) represents the probability of obtaining y given x; the target probability represents the confidence of the source-end sentence x (i.e., the first source text) in the target-end sentence y (i.e., the first target text). In a possible embodiment, after the target probability is determined, it may be normalized so that the influence of the text length can subsequently be accounted for.
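A minimal sketch of computing the target probability from a phrase probability set, worked in log space for numerical stability; the optional length normalization reflects the remark above and is an assumption about how it would be applied.

```python
import math
from typing import Iterable

def target_probability(phrase_probabilities: Iterable[float],
                       normalize_by_length: bool = False) -> float:
    """I(y|x) = prod_t p(y_t | x, y_<t), computed as a sum of log probabilities."""
    probs = list(phrase_probabilities)
    if not probs:
        return 0.0
    log_prob = sum(math.log(max(p, 1e-12)) for p in probs)  # guard against log(0)
    if normalize_by_length:
        log_prob /= len(probs)  # assumed length normalization
    return math.exp(log_prob)
```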
S404, according to the target probability of each original sample, at least one target sample meeting a second preset condition is determined from the original samples, and at least one first activation sample not meeting the second preset condition is determined from the original samples.
The second preset condition can be set according to requirements or experience. Empirically, if a raw sample receives a low target probability, it is unlikely to provide useful information for improving the performance of the target recognition model and can be regarded as a target sample (or inactive sample). Therefore, the target probability can be used as an index to measure the activity of each original sample, and the characteristic differences among the original samples (whether each is active) can be studied further. In a specific implementation, the computer device may rank the original samples according to the target probability of each original sample, then determine whether the target probability of each original sample satisfies the second preset condition, divide the original samples whose target probabilities satisfy the second preset condition into target samples, and divide the original samples whose target probabilities do not satisfy the second preset condition into first activation samples.
In a possible embodiment, the second preset condition is to take a target proportion of the original samples with the lowest target probabilities. The computer device may sort the original samples according to the target probability of each original sample, take the target proportion of samples with the lowest target probabilities as target samples, and determine the remaining original samples as first activation samples. Illustratively, if there are 50 original samples and the second preset condition is to take the 10% of them with the lowest target probabilities, the computer device sorts the 50 original samples by target probability and takes the 5 samples with the lowest target probabilities as target samples; each of the remaining 45 original samples is determined as a first activation sample.
It should be noted that the set target proportion may be determined according to the characteristic difference of the original sample set. Wherein, the characteristic difference of the original sample set refers to the activity degree. For a particular raw sample set, a target proportion may be determined according to a particular validation test. For example, for the original sample set of "english-french", according to a specific verification test, it can be known that the original sample with the target proportion of 10% is the most reasonable target sample, and the performance of the target recognition model can be improved.
In one possible embodiment, the second preset condition is that the target probability is less than a sentence-level threshold. The computer device may determine whether the target probability of each original sample is less than the sentence-level threshold, determine the original samples whose target probability is less than the sentence-level threshold as target samples, and determine the original samples whose target probability is greater than or equal to the sentence-level threshold as first activation samples. The sentence-level threshold may be set empirically.
S405, combining at least one target sample into a target sample set, and combining at least one first active sample into a first active sample set.
S406, calling an activation model to perform activation processing on each target sample to obtain a second activation sample set; wherein the activation model is trained based on the first set of activation samples;
s407, training a recognition processing model according to the first activation sample set and the second activation sample set to obtain a target recognition model.
The specific implementation manners of steps S405 to S407 can refer to the specific implementation manners of steps S305 to S307.
In the embodiment of the application, the computer device sets the second preset condition according to the performance requirement of the trained translation model, and divides each original sample into target samples and first activation samples according to whether the phrase probability set corresponding to the first target text satisfies the second preset condition. This yields an accurate and reasonable target sample set and first activation sample set and preserves the amount of data required for model training; after the target sample set is subsequently activated, the recognition processing model is trained on the activated samples together with the first activation samples, which more effectively improves the translation accuracy of the target recognition model.
To sum up, the model training method of the embodiment of the present application mainly includes the following steps:
(1) First, train an NMT model on the training data S = {(x_n, y_n)} (corresponding to the original sample set above) as the identification model M_identification (corresponding to the original recognition model above).
(2) Use the identification model M_identification to compute the sentence-level probability I(y|x) (corresponding to the target probability above) for each sample (x, y) of the training data, and sort all training data by I(y|x).
(3) Split the training data into S = S_inactive + S_active: the 10% of the training data with the lowest I(y|x) are regarded as inactive samples S_inactive (corresponding to the target samples above), and the remaining 90% are active samples S_active (corresponding to the first activation samples above).
(4) Train another NMT model on the active samples S_active as the activation model M_rejuvenation.
(5) Use the activation model M_rejuvenation to translate the source sentence x (corresponding to the second source text) of each inactive sample to obtain a target translation result y' (corresponding to the second predicted text); the synthesized parallel sentence pairs (x, y') form the activation samples S_rejuvenated (corresponding to the second activation samples).
(6) Combine the activation samples with the original active samples to obtain S' = S_rejuvenated + S_active, and train the final NMT model (the target recognition model) on the data set S'. A code sketch of these steps is given below.
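The sketch below ties steps (1) to (6) together; `train_nmt`, `sentence_probability` and `translate` are hypothetical callables supplied by the caller (standing in for the actual NMT training and inference routines), and the 10% inactive ratio follows the experiments described below.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence x, target sentence y)

def data_activation_pipeline(training_data: List[Pair],
                             train_nmt: Callable[[List[Pair]], object],
                             sentence_probability: Callable[[object, str, str], float],
                             translate: Callable[[object, str], str],
                             inactive_ratio: float = 0.10):
    """Sketch of steps (1)-(6); all helpers are caller-supplied placeholders."""
    # (1) train the identification model on the full training data
    m_identification = train_nmt(training_data)

    # (2) score every sample with the sentence-level probability I(y|x) and sort
    scored = sorted(training_data,
                    key=lambda pair: sentence_probability(m_identification, pair[0], pair[1]))

    # (3) lowest-scoring fraction -> inactive samples; the rest -> active samples
    k = int(len(scored) * inactive_ratio)
    s_inactive, s_active = scored[:k], scored[k:]

    # (4) train the activation (rejuvenation) model on the active samples
    m_rejuvenation = train_nmt(s_active)

    # (5) re-label the source side of each inactive sample to synthesize new pairs
    s_rejuvenated = [(x, translate(m_rejuvenation, x)) for x, _ in s_inactive]

    # (6) train the final NMT model on the combined data set S' = S_rejuvenated + S_active
    return train_nmt(s_rejuvenated + s_active)
```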
Aiming at the model training method provided above, the model training method is verified through the following three groups of experiments.
(1) An experiment is carried out with "English-German" as the original sample set, and samples under different target proportions (second preset conditions) are selected according to the target probability and regarded as inactive samples. The model training method improves the final performance of the NMT model. FIG. 5 shows the performance of the NMT model when samples under different target proportions are regarded as inactive samples, together with the final NMT model performance when randomly sampled samples of the same proportion are activated; clearly, activating the inactive samples is significantly better than activating random samples of the same proportion. It can be seen from FIG. 5 that a target proportion of 10% is reasonable. Meanwhile, the final performance improvement of the model gradually decreases as the proportion of inactive samples increases. Intuitively, this is reasonable: samples with a relatively high target probability can already provide useful information to the NMT model by themselves, so forcibly activating them not only brings no significant improvement but may even harm the final performance. Thus, the experiment verifies that 10% is a relatively suitable target proportion of inactive samples for "English-German".
(2) The model training method was validated on different model frameworks and language pairs. As shown in Table 1, the model training method achieves consistent and significant improvements over strong baseline models on the WMT14 (a machine translation competition) "English-German" and "English-French" language pairs, which demonstrates the effectiveness and versatility of the model training method. It should be noted that these improvements are achieved without changing the model or adding extra data, which makes the model training method robustly applicable to existing NMT models.
TABLE 1
(Table 1 is reproduced as an image in the original publication.)
(3) The proposed inactive sample activation method is compared to past related work, including data diversification and data noise reduction. As shown in table 2, the model training method and the related work can improve the performance of the NMT model alone, but the model training method provided in the embodiment of the present application can be further improved on the basis of other methods, which indicates that the model training method and the related work are complementary. Also, we calculated the overlap ratio between the noise samples identified in the data denoising and our inactive samples, and found that only 32% of the inactive samples were also noise samples. This illustrates that our inactive samples do not consist entirely of noisy data.
TABLE 2
(Table 2 is reproduced as an image in the original publication.)
For the verification of the model training method provided by the embodiment of the application, the embodiment focuses only on the translation data and does not require adjusting the framework of the target recognition model. Experiments show that removing the 10% least active samples hardly affects the performance of the NMT model. Further, we found that the least active samples obtained with different random seeds, model sizes and model frameworks have a high overlap (80%). The presence of inactive samples in a large-scale dataset is therefore pervasive and depends not on the model but on the data distribution itself (i.e., the nature of the data itself). We performed experiments on two standard translation datasets, WMT14 English-German and English-French, and found that data activation consistently and significantly improved the performance of all NMT models.
Further, please refer to fig. 6, which is a schematic structural diagram of a model training apparatus according to an embodiment of the present application. As shown in fig. 6, the model training apparatus may be applied to the computer device in the embodiment corresponding to fig. 2, fig. 3, or fig. 4, specifically, the model training apparatus may be a computer program (including program code) running in the computer device, for example, the model training apparatus is an application software; the model training device can be used for executing corresponding steps in the method provided by the embodiment of the application.
An obtaining module 601, configured to obtain an original sample set, train a sample recognition model according to the original sample set, and obtain an original recognition model; the original sample set comprises a plurality of original samples;
a calling module 602, configured to call the original identification model to perform identification processing on each original sample, so as to obtain an original identification result of each original sample;
a determining module 603, configured to divide the original sample set into a target sample set and a first active sample set according to an original recognition result of each original sample, where the target sample set includes at least one target sample;
the calling module 602 is further configured to call an activation model to perform activation processing on each target sample, so as to obtain a second activation sample set; wherein the activation model is trained based on the first set of activation samples;
the determining module 603 is further configured to train a recognition processing model according to the first activation sample set and the second activation sample set, so as to obtain a target recognition model.
In one possible embodiment, the original recognition result includes a first predicted text; the determining module 603 is specifically configured to:
determining the identification error of each original sample according to the first predicted text corresponding to each original sample;
determining at least one target sample meeting a first preset condition from the original samples according to the identification error of each original sample, and determining at least one first activation sample not meeting the first preset condition from the original samples;
combining the at least one target sample into a set of target samples, and combining the at least one first activation sample into a set of first activation samples.
In a possible embodiment, each original sample includes a first source text and a first target text corresponding to the first source text, and the first predicted text is a text obtained by performing recognition processing on the first source text by the original recognition model; the determining module 603 is specifically configured to:
setting polling priorities for the original samples, and selecting a target original sample from the original samples according to the polling priorities;
aligning a first target text of the target original sample and a first predicted text of the target original sample;
counting overlapped characters of the aligned first target text and the first predicted text, and determining the identification error of the target original sample according to the overlapped characters and the total amount of the characters;
stopping polling when each of the original samples is taken as a target original sample.
In a possible embodiment, each original sample includes a first source text and a first target text corresponding to the first source text, the first predicted text includes a first unit text and a second unit text, the original recognition model includes a forward recognition model and a reverse recognition model, the first unit text is a text recognized by the forward recognition model from the first source text, and the second unit text is a text recognized by the reverse recognition model from the first target text; the determining module 603 is specifically configured to:
determining a first cross entropy of each original sample according to the first unit text of each original sample and the first target text of each original sample;
determining a second cross entropy of each original sample according to the second unit text of each original sample and the first source text of each original sample;
and superimposing the first cross entropy and the second cross entropy of each original sample as the identification error of each original sample.
In a feasible embodiment, each original sample includes a first source text and a first target text corresponding to the first source text, the original recognition result includes a phrase probability set corresponding to the first target text, and the phrase probability set is obtained by performing recognition processing on a phrase in the first target text by the original recognition model; the determining module 603 is specifically configured to:
superposing the phrase probabilities in the phrase probability set into a target probability;
determining at least one target sample meeting a second preset condition from all the original samples according to the target probability of each original sample, and determining at least one first activation sample not meeting the second preset condition from all the original samples;
combining the at least one target sample into a set of target samples, and combining the at least one first activation sample into a set of first activation samples.
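As an illustration only, the sketch below superposes phrase probabilities by multiplying them (equivalently, summing log-probabilities) and splits samples against an assumed second preset condition; the threshold and the direction of the comparison are guesses, not part of this application.

import math

def target_probability(phrase_probs):
    # Superpose phrase probabilities by multiplying them (summing log-probabilities).
    return math.exp(sum(math.log(p) for p in phrase_probs))

def split_by_probability(samples_with_probs, threshold=0.5):
    # Samples whose target probability falls below the (assumed) second preset condition
    # are treated as target samples; the rest as first activation samples.
    targets, activations = [], []
    for sample, probs in samples_with_probs:
        (targets if target_probability(probs) < threshold else activations).append(sample)
    return targets, activations

data = [("sample_a", [0.9, 0.95, 0.9]), ("sample_b", [0.4, 0.5, 0.3])]
print(split_by_probability(data))  # (['sample_b'], ['sample_a'])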
In a possible embodiment, each target sample comprises a second source text and a second target text; the determining module 603 is configured to extract a to-be-processed text of each target sample from a second source text and a second target text of each target sample;
the calling module 602 is configured to call an activation model to perform activation processing on the text to be processed of each target sample, so as to obtain a second predicted text of each target sample;
the determining module 603 is configured to combine the to-be-processed text and the second predicted text of each target sample into each second activation sample;
the determining module 603 is configured to combine the second activation samples into a second activation sample set.
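A minimal sketch of generating second activation samples under these assumptions is shown below; ToyActivationModel and its predict method are hypothetical placeholders for whatever activation model is actually trained on the first activation sample set.

class ToyActivationModel:
    def predict(self, text):
        # Placeholder: a real activation model would generate a translation or paraphrase.
        return text.upper()

def build_second_activation_set(target_samples, activation_model):
    second_activation_set = []
    for sample in target_samples:
        # Extract the text to be processed (here, simply the second source text).
        text_to_process = sample["source"]
        second_predicted_text = activation_model.predict(text_to_process)
        second_activation_set.append({"source": text_to_process, "target": second_predicted_text})
    return second_activation_set

targets = [{"source": "hello world", "target": "bonjour le monde"}]
print(build_second_activation_set(targets, ToyActivationModel()))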
In one possible embodiment, the target recognition model is a model for text translation; the device further comprises an output module 604; wherein:
the determining module 603 is configured to respond to a translation request for a text to be translated;
the calling module 602 is configured to call the target recognition model to perform text translation on the text to be translated, so as to obtain a translation text of the text to be translated;
the output module 604 is configured to output the translation text.
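The short sketch below illustrates, under assumed interfaces, how such a translation request might be served by the trained target recognition model; ToyTargetRecognitionModel and its translate method are placeholders, not an API defined by this application.

class ToyTargetRecognitionModel:
    def translate(self, text):
        # Placeholder for the trained target recognition model's translation step.
        return "<translated:" + text + ">"

def handle_translation_request(text_to_translate, target_model):
    # Respond to a translation request, call the model, and return the translation text.
    return target_model.translate(text_to_translate)

print(handle_translation_request("machine translation", ToyTargetRecognitionModel()))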
It can be understood that the functions of the functional modules of the model training apparatus in this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the description related to fig. 2, fig. 3, or fig. 4 in the foregoing method embodiment, which is not described herein again.
Further, please refer to fig. 7, where fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device in the corresponding embodiment of fig. 2, fig. 3 or fig. 4 may be the computer device shown in fig. 7. As shown in fig. 7, the computer device may include: a processor 701, an input device 702, an output device 703, and a memory 704. The processor 701, the input device 702, the output device 703, and the memory 704 are connected by a bus 705. The memory 704 is used to store a computer program comprising program instructions, and the processor 701 is used to execute the program instructions stored by the memory 704.
In the embodiment of the present application, the processor 701 executes the executable program code in the memory 704 to perform the following operations: obtaining an original sample set, training a sample recognition model according to the original sample set, and obtaining an original recognition model; the original sample set comprises a plurality of original samples; calling the original identification model to respectively identify each original sample to obtain an original identification result of each original sample; dividing the original sample set into a target sample set and a first activation sample set according to an original recognition result of each original sample, wherein the target sample set comprises at least one target sample; calling an activation model to perform activation processing on each target sample to obtain a second activation sample set; wherein the activation model is trained based on the first set of activation samples; and training a recognition processing model according to the first activation sample set and the second activation sample set to obtain a target recognition model.
In one possible embodiment, the original recognition result includes a first predicted text; the processor 701 is specifically configured to: determining the identification error of each original sample according to the first predicted text corresponding to each original sample; determining at least one target sample meeting a first preset condition from the original samples according to the identification error of each original sample, and determining at least one first activation sample not meeting the first preset condition from the original samples; combining the at least one target sample into a set of target samples, and combining the at least one first activation sample into a set of first activation samples.
In a possible embodiment, each original sample includes a first source text and a first target text corresponding to the first source text, and the first predicted text is a text obtained by performing recognition processing on the first source text by the original recognition model; the processor 701 is specifically configured to: setting polling priorities for the original samples, and selecting a target original sample from the original samples according to the polling priorities; aligning a first target text of the target original sample and a first predicted text of the target original sample; counting overlapped characters of the aligned first target text and the first predicted text, and determining the identification error of the target original sample according to the overlapped characters and the total number of characters; stopping polling when each of the original samples is taken as a target original sample.
In a possible embodiment, each original sample includes a first source text and a first target text corresponding to the first source text, the first predicted text includes a first unit text and a second unit text, the original recognition model includes a forward recognition model and a reverse recognition model, the first unit text is a text recognized by the forward recognition model from the first source text, and the second unit text is a text recognized by the reverse recognition model from the first target text; the processor 701 is specifically configured to: determining a first cross entropy of each original sample according to the first unit text of each original sample and the first target text of each original sample; determining a second cross entropy of each original sample according to the second unit text of each original sample and the first source text of each original sample; and summing the first cross entropy and the second cross entropy of each original sample to obtain the identification error of each original sample.
In a feasible embodiment, each original sample includes a first source text and a first target text corresponding to the first source text, the original recognition result includes a phrase probability set corresponding to the first target text, and the phrase probability set is obtained by performing recognition processing on a phrase in the first target text by the original recognition model; the processor 701 is specifically configured to: superposing the phrase probabilities in the phrase probability set into a target probability; determining at least one target sample meeting a second preset condition from all the original samples according to the target probability of each original sample, and determining at least one first activation sample not meeting the second preset condition from all the original samples; combining the at least one target sample into a set of target samples, and combining the at least one first activation sample into a set of first activation samples.
In a possible embodiment, each target sample comprises a second source text and a second target text; the processor 701 is specifically configured to: extracting texts to be processed of the target samples from the second source texts and the second target texts of the target samples; calling the activation model to perform activation processing on the text to be processed of each target sample to obtain a second predicted text of each target sample; combining the text to be processed and the second predicted text of each target sample into each second activation sample; and combining the second activation samples into a second activation sample set.
In one possible embodiment, the target recognition model is a model for text translation; the processor 701 is specifically configured to: responding to a translation request aiming at the text to be translated; calling the target recognition model to perform text translation on the text to be translated to obtain a translation text of the text to be translated; and outputting the translation text.
It should be understood that, in the embodiment of the present application, the processor 701 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 704 may include both read-only memory and random-access memory, and provides instructions and data to the processor 701. A portion of the memory 704 may also include non-volatile random access memory.
The input device 702 may include a keyboard or the like and is used to input a translation request to the processor 701; the output device 703 may include a display or the like.
In a specific implementation, the processor 701, the input device 702, the output device 703, and the memory 704 described in this embodiment may perform the implementations described in the foregoing method embodiments, or the implementation described for the apparatus above; details are not repeated here.
A computer-readable storage medium is provided in an embodiment of the present application, and stores a computer program, where the computer program includes program instructions, and the program instructions, when executed by a processor, can perform the steps performed in all the above embodiments.
Embodiments of the present application further provide a computer program product or a computer program, where the computer program product or the computer program includes computer instructions stored in a computer-readable storage medium; when the computer instructions are executed by a processor of a computer device, the methods in all the embodiments described above are performed.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of model training, comprising:
obtaining an original sample set, training a sample recognition model according to the original sample set, and obtaining an original recognition model; the original sample set comprises a plurality of original samples;
calling the original identification model to respectively identify each original sample to obtain an original identification result of each original sample;
dividing the original sample set into a target sample set and a first activation sample set according to an original recognition result of each original sample, wherein the target sample set comprises at least one target sample;
calling an activation model to perform activation processing on each target sample to obtain a second activation sample set; wherein the activation model is trained based on the first set of activation samples;
and training a recognition processing model according to the first activation sample set and the second activation sample set to obtain a target recognition model.
2. The method of claim 1, wherein the original recognition result comprises a first predicted text; the dividing the original sample set into a target sample set and a first activation sample set according to the original recognition result of each original sample includes:
determining the identification error of each original sample according to the first predicted text corresponding to each original sample;
determining at least one target sample meeting a first preset condition from the original samples according to the identification error of each original sample, and determining at least one first activation sample not meeting the first preset condition from the original samples;
combining the at least one target sample into a set of target samples, and combining the at least one first activation sample into a set of first activation samples.
3. The method according to claim 2, wherein each original sample comprises a first source text and a first target text corresponding to the first source text, and the first predicted text is a text obtained by performing recognition processing on the first source text by the original recognition model;
the determining the identification error of each original sample according to the first predicted text corresponding to each original sample comprises:
setting polling priorities for the original samples, and selecting a target original sample from the original samples according to the polling priorities;
aligning a first target text of the target original sample and a first predicted text of the target original sample;
counting overlapped characters of the aligned first target text and the first predicted text, and determining the identification error of the target original sample according to the overlapped characters and the total number of characters;
stopping polling when each of the original samples is taken as a target original sample.
4. The method according to claim 2, wherein each original sample comprises a first source text and a first target text corresponding to the first source text, the first predicted text comprises a first unit text and a second unit text, the original recognition model comprises a forward recognition model and a reverse recognition model, the first unit text is a text recognized by the forward recognition model from the first source text, and the second unit text is a text recognized by the reverse recognition model from the first target text;
the determining the identification error of each original sample according to the first predicted text corresponding to each original sample comprises:
determining a first cross entropy of each original sample according to the first unit text of each original sample and the first target text of each original sample;
determining a second cross entropy of each original sample according to the second unit text of each original sample and the first source text of each original sample;
and summing the first cross entropy and the second cross entropy of each original sample to obtain the identification error of each original sample.
5. The method according to claim 1, wherein each original sample includes a first source text and a first target text corresponding to the first source text, the original recognition result includes a phrase probability set corresponding to the first target text, and the phrase probability set is obtained by performing recognition processing on a phrase in the first target text by the original recognition model;
the dividing the original sample set into a target sample set and a first activation sample set according to the original recognition result of each original sample includes:
superposing the phrase probabilities in the phrase probability set into a target probability;
determining at least one target sample meeting a second preset condition from all the original samples according to the target probability of each original sample, and determining at least one first activation sample not meeting the second preset condition from all the original samples;
combining the at least one target sample into a set of target samples, and combining the at least one first activation sample into a set of first activation samples.
6. The method of claim 1, wherein each target sample comprises a second source text and a second target text; the calling of the activation model to perform activation processing on each target sample to obtain a second activation sample set includes:
extracting texts to be processed of the target samples from second source texts and second target texts of the target samples;
calling an activation model to perform activation processing on the text to be processed of each target sample to obtain a second predicted text of each target sample;
combining the text to be processed and the second predicted text of each target sample into each second activation sample;
and combining the second activation samples into a second activation sample set.
7. The method of claim 1, wherein the target recognition model is a model for text translation; the method further comprises the following steps:
responding to a translation request aiming at the text to be translated;
calling the target recognition model to perform text translation on the text to be translated to obtain a translation text of the text to be translated;
and outputting the translation text.
8. A model training apparatus, comprising:
the acquisition module is used for acquiring an original sample set, training a sample recognition model according to the original sample set and obtaining an original recognition model; the original sample set comprises a plurality of original samples;
the calling module is used for calling the original identification model to respectively identify each original sample to obtain an original identification result of each original sample;
a determining module, configured to divide the original sample set into a target sample set and a first active sample set according to an original recognition result of each original sample, where the target sample set includes at least one target sample;
the calling module is further used for calling the activation model to perform activation processing on each target sample to obtain a second activation sample set; wherein the activation model is trained based on the first set of activation samples;
the determining module is further configured to train a recognition processing model according to the first activation sample set and the second activation sample set, so as to obtain a target recognition model.
9. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1-7.
10. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method of any one of claims 1-7.
CN202011261074.2A 2020-11-12 2020-11-12 Model training method and device, computer equipment and readable storage medium Pending CN112257470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011261074.2A CN112257470A (en) 2020-11-12 2020-11-12 Model training method and device, computer equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011261074.2A CN112257470A (en) 2020-11-12 2020-11-12 Model training method and device, computer equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112257470A true CN112257470A (en) 2021-01-22

Family

ID=74265635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011261074.2A Pending CN112257470A (en) 2020-11-12 2020-11-12 Model training method and device, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112257470A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033631A (en) * 2021-03-09 2021-06-25 北京百度网讯科技有限公司 Model incremental training method and device
CN115115093A (en) * 2022-05-19 2022-09-27 深圳市腾讯网络信息技术有限公司 Object data processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110349572B (en) Voice keyword recognition method and device, terminal and server
US20190341021A1 (en) Human-Computer Dialogue Method and Apparatus
US12039286B2 (en) Automatic post-editing model for generated natural language text
KR102015068B1 (en) Improving Handwriting Recognition Using Pre-Filter Classification
EP3619708B1 (en) Speaker diarization using an end-to-end model
EP3852000A1 (en) Method and apparatus for processing semantic description of text entity, device and storage medium
CN111428010A (en) Man-machine intelligent question and answer method and device
US10747961B2 (en) Method and device for identifying a sentence
CN114416943B (en) Training method and device for dialogue model, electronic equipment and storage medium
US20220358955A1 (en) Method for detecting voice, method for training, and electronic devices
CN108491812B (en) Method and device for generating face recognition model
CN112257470A (en) Model training method and device, computer equipment and readable storage medium
CN112926306A (en) Text error correction method, device, equipment and storage medium
CN113569740A (en) Video recognition model training method and device and video recognition method and device
CN112148862A (en) Question intention identification method and device, storage medium and electronic equipment
CN111625636B (en) Method, device, equipment and medium for rejecting man-machine conversation
CN117278675A (en) Outbound method, device, equipment and medium based on intention classification
CN109684632B (en) Natural semantic understanding method, device and computing equipment
CN114758649B (en) Voice recognition method, device, equipment and medium
CN113724738B (en) Speech processing method, decision tree model training method, device, equipment and storage medium
CN115082598A (en) Text image generation method, text image training method, text image processing method and electronic equipment
US12027162B2 (en) Noisy student teacher training for robust keyword spotting
CN113205131A (en) Image data processing method and device, road side equipment and cloud control platform
CN113408269A (en) Text emotion analysis method and device
CN108021918B (en) Character recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination