CN113139391A - Translation model training method, device, equipment and storage medium - Google Patents

Translation model training method, device, equipment and storage medium

Info

Publication number
CN113139391A
Authority
CN
China
Prior art keywords
corpus
pseudo
original
target
translation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110454958.8A
Other languages
Chinese (zh)
Other versions
CN113139391B (en)
Inventor
潘骁
王明轩
吴礼蔚
李磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd
Priority to CN202110454958.8A
Publication of CN113139391A
Priority to PCT/CN2022/084963 (WO2022228041A1)
Application granted
Publication of CN113139391B
Legal status: Active (anticipated expiration not listed)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/47 Machine-assisted translation, e.g. using translation memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the disclosure discloses a method, a device, equipment and a storage medium for training a translation model. The method comprises the following steps: acquiring at least one original corpus; aligning and replacing at least one original vocabulary of a source corpus in the original corpus with a target vocabulary with the same meaning to obtain a replacement corpus corresponding to the original corpus; wherein the original vocabulary and the target vocabulary are different in language; and constructing a pseudo parallel corpus based on the original corpus and the replacement corpus, and training a preset basic translation model by using the pseudo parallel corpus to obtain a target translation model. The method improves the translation accuracy of the translation model on other non-universal language pairs.

Description

Translation model training method, device, equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of computers, and in particular relates to a translation model training method, a translation model training device, translation model training equipment and a storage medium.
Background
With the continuous development of computer technology, various kinds of translation software have emerged and have become an important channel for people to obtain information in other languages.
In existing translation software, the language translation model is usually trained on a parallel corpus centered on a certain general language and is used to translate between that general language and other languages (for example, English is often used as the general language). However, such translation software is less accurate when translating other, non-universal language pairs (for example, translations involving German).
Disclosure of Invention
The present disclosure provides a method, a device, equipment and a storage medium for training a translation model, so as to improve the translation accuracy of the translation model in various scenarios.
In a first aspect, an embodiment of the present disclosure provides a method for training a translation model, including:
acquiring at least one original corpus;
aligning and replacing at least one original vocabulary of a source corpus in the original corpus with a target vocabulary with the same meaning to obtain a replacement corpus corresponding to the original corpus; wherein the original vocabulary and the target vocabulary are different in language;
and constructing a pseudo parallel corpus based on the original corpus and the replacement corpus, and training a preset basic translation model by using the pseudo parallel corpus to obtain a target translation model.
In a second aspect, an embodiment of the present disclosure provides a training apparatus for a translation model, including:
the acquisition module is used for acquiring at least one original corpus;
the replacing module is used for aligning and replacing at least one original vocabulary of the source corpus in the original corpus with a target vocabulary with the same meaning to obtain a replacing corpus corresponding to the original corpus; wherein the original vocabulary and the target vocabulary are different in language;
a construction module for constructing pseudo parallel corpora based on the original corpora and the replacement corpora;
and the training module is used for training a preset basic translation model by using the pseudo parallel corpus so as to obtain a target translation model.
In a third aspect, an embodiment of the present disclosure provides a training apparatus for a translation model, including a memory and a processor, where the memory stores a computer program, and the processor implements, when executing the computer program, the steps of the training method for a translation model provided in the first aspect of the embodiment of the present disclosure.
In a fourth aspect, the present disclosure provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the training method for translation models provided in the first aspect of the present disclosure.
According to the technical solution provided by the embodiments of the present disclosure, at least one original corpus is obtained, and at least one original vocabulary of the source corpus in the original corpus is aligned and replaced with a target vocabulary having the same meaning, so as to obtain a replacement corpus corresponding to the original corpus, the original vocabulary and the target vocabulary being in different languages; a pseudo parallel corpus is then constructed based on the original corpus and the replacement corpus, and a preset basic translation model is trained with the pseudo parallel corpus to obtain a target translation model. In other words, after at least one original vocabulary of the source corpus in the original corpus is aligned and replaced with synonymous vocabulary from other arbitrary languages, a large number of pseudo parallel corpora containing those other languages can be constructed. Training the translation model with these pseudo parallel corpora enables it to learn the grammatical structures and vocabulary associations among those other languages, which improves the translation accuracy of the translation model on other non-universal language pairs.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of a method for training a translation model according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a principle of a pseudo-parallel corpus construction process according to an embodiment of the present disclosure;
FIG. 3 is another schematic diagram illustrating a construction process of pseudo-parallel corpus according to an embodiment of the disclosure;
fig. 4 is another schematic flow chart of a method for training a translation model according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating a training process of a translation model according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a training apparatus for translation models according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a training apparatus for a translation model according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
To make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict.
It should be noted that the execution subject of the method embodiments described below may be a training apparatus for a translation model, and the apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of software and hardware. Optionally, the electronic device may be a client, including but not limited to a smart phone, a tablet computer, an e-book reader, a vehicle-mounted terminal, and the like. Of course, the electronic device may also be an independent server or a server cluster, and the embodiment of the present disclosure does not limit the specific form of the electronic device. The method embodiments described below are described by taking as an example that the execution subject is an electronic device.
Fig. 1 is a schematic flow chart of a method for training a translation model according to an embodiment of the present disclosure. The embodiment relates to a specific process of how the electronic device trains the multilingual translation model. As shown in fig. 1, the method may include:
s101, obtaining at least one original corpus.
The original corpus is used as training data for training a basic translation model. The modality of the original corpus may be at least one of image, text, video or audio. Of course, the language included in the original corpus may be a single language or multiple languages, which is not limited in this embodiment.
Optionally, the original corpus may include monolingual corpora and/or parallel corpora. A parallel corpus comprises a paired source corpus and target corpus. The source corpus can be understood as the corpus before translation, and the target corpus as the corpus obtained by translating the source corpus. Taking text corpora as an example, a Chinese-English parallel corpus includes a Chinese document and a corresponding English document; if a Chinese-to-English translation is performed through a translation model, the Chinese document is the source corpus and the English document is the target corpus.
A monolingual corpus can be a source corpus or a target corpus for which no corresponding parallel corpus is available. For example, in the field of traditional Chinese medicine, a large number of Chinese corpora can be obtained, and English corpora can also be obtained, but Chinese-English parallel corpora that correspond to each other are difficult to obtain.
Generally, a corpus database stores a large amount of original corpuses, so that the electronic device can directly obtain at least one original corpus from the corpus database.
S102, at least one original word of the source corpus in the original corpus is aligned and replaced by a target word with the same meaning, and a replacement corpus corresponding to the original corpus is obtained.
The target vocabulary is a synonym of the original vocabulary, and the original vocabulary and the target vocabulary are in different languages. Because multilingual synonym dictionaries are relatively easy to obtain, the electronic device can, based on such a dictionary, replace at least one original vocabulary of the source corpus in the original corpus with synonyms from other arbitrary languages, so as to obtain the replacement corpus corresponding to the original corpus.
In order to enable translation between multiple languages, optionally, the languages of the target vocabularies in the replacement corpus are at least partially different.
The following describes how the above replacement corpus is obtained from a parallel corpus and from a monolingual corpus:
when the original corpus is a parallel corpus, part of vocabularies of the source corpus in the parallel corpus can be aligned and replaced by synonymy vocabularies of other arbitrary languages, and the target corpus remains unchanged. For example, as shown in FIG. 2, assume that the source corpus of the parallel corpus is "I like sing and dance", which is in English, and the target corpus is "J' adore chanter et danser", which is in French. Therefore, the electronic equipment can replace the word 'singing' in the source end corpus to be aligned with the Chinese word 'singing' with the same meaning, and replace the word 'dance' to be aligned with the Chinese word 'dancing' with the same meaning, so that the replacement corpus 'Ilike singing and dancing' corresponding to the source end corpus is formed, the target end corpus is not replaced, and the target end corpus is kept unchanged. Of course, the languages of the target words after alignment and replacement may be the same or different, that is, the word "sing" may be aligned and replaced with german word having the same meaning, and the word "dance" may be aligned and replaced with chinese word having the same meaning.
When the original corpus is a monolingual corpus, the monolingual corpus is the source corpus and has no corresponding translation corpus; in this case, part of the vocabulary in the monolingual corpus can be aligned and replaced with synonymous vocabulary of any other language. The languages of the replacement synonyms are at least partially different, that is, the replacement synonyms may come from the same language or from different languages. For example, as shown in fig. 3, assume that the monolingual corpus is a Chinese sentence meaning "what kind of music do you like". In order to use a large amount of monolingual corpora to train the translation model, the electronic device may replace the Chinese word for "like" with the English word "like" having the same meaning, replace the Chinese word for "which" with the French word "quel" having the same meaning, and replace the Chinese word for "music" with the German word "Musik" having the same meaning, thereby forming a replacement corpus in which the remaining Chinese words are interleaved with "like", "quel" and "Musik".
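The alignment-and-replacement step described above amounts to a dictionary lookup over the tokenized source text. The following Python sketch is only an illustrative assumption of how such a step might look: the dictionary contents, the whitespace tokenization and the random replacement ratio are all hypothetical and are not prescribed by this disclosure.

import random

# Hypothetical multilingual synonym dictionary: source word -> {language: synonym}.
# A real system would use a large aligned lexicon; the entries here are toy data.
SYNONYM_DICT = {
    "sing": {"zh": "唱歌", "de": "singen"},
    "dance": {"zh": "跳舞", "fr": "danser"},
    "music": {"de": "Musik", "fr": "musique"},
}

def replace_aligned_words(sentence, ratio=0.5, seed=None):
    """Align-and-replace: swap some source words with same-meaning words from
    other languages, leaving the rest of the sentence unchanged."""
    rng = random.Random(seed)
    tokens = sentence.split()  # assumes whitespace tokenization
    replaced = []
    for token in tokens:
        entry = SYNONYM_DICT.get(token.lower())
        if entry and rng.random() < ratio:
            language = rng.choice(sorted(entry))  # the target language may differ per word
            replaced.append(entry[language])
        else:
            replaced.append(token)
    return " ".join(replaced)

print(replace_aligned_words("I like sing and dance", ratio=1.0, seed=0))
# Possible output: "I like 唱歌 and danser" (a mixed-language replacement corpus)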
S103, constructing a pseudo parallel corpus based on the original corpus and the replacement corpus, and training a preset basic translation model by using the pseudo parallel corpus to obtain a target translation model.
Specifically, after the replacement corpus corresponding to the original corpus is obtained, a parallel corpus may be constructed from the original corpus and the corresponding replacement corpus. Since this parallel corpus is not a standard corpus but is obtained through synonym alignment and replacement, it is referred to as a pseudo parallel corpus.
The construction process of the pseudo parallel corpus is introduced as follows for different original corpora:
when the original corpus is a parallel corpus, after part of vocabularies of the source corpus in the parallel corpus are aligned and replaced by synonymous vocabularies of other arbitrary languages, and a replacement corpus corresponding to the source corpus is obtained, the replacement corpus corresponding to the source corpus in the parallel corpus can be used as a pseudo source corpus, and a target corpus in the parallel corpus is used as a pseudo target corpus, so that the pseudo parallel corpus is formed. And the corpus of the target end is kept unchanged, so that the translation model can be ensured to learn a correct translation result. With continued reference to fig. 2, the alternative corpus "I like singing and dancing" is used as the pseudo source corpus, and the target corpus "J' adore chanter et danser" in the parallel corpus is continuously used as the pseudo target corpus, thereby forming the pseudo parallel corpus.
When the original corpus is a monolingual corpus, after part of the vocabulary in the monolingual corpus has been aligned and replaced with synonymous vocabulary of other arbitrary languages to obtain the replacement corpus corresponding to the monolingual corpus, the replacement corpus can be used as the pseudo source corpus and the monolingual corpus itself as the pseudo target corpus, forming the pseudo parallel corpus. Continuing to refer to fig. 3, the mixed-language replacement corpus is used as the pseudo source corpus, and the monolingual corpus itself ("what kind of music do you like") is used as the pseudo target corpus, so as to form the pseudo parallel corpus.
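Once the replacement corpora are available, assembling pseudo parallel pairs is a matter of pairing each replacement corpus with the unchanged target corpus (parallel case) or with the original monolingual sentence itself (monolingual case). The sketch below illustrates this pairing; the data layout and the toy replacement function are assumptions made for the example and are not part of this disclosure.

def build_pseudo_parallel(original, replace_fn):
    """Build one (pseudo_source, pseudo_target) pair.

    `original` is either a (source, target) tuple from a parallel corpus or a
    single string from a monolingual corpus; `replace_fn` performs the aligned
    synonym replacement (for example, replace_aligned_words sketched earlier)."""
    if isinstance(original, tuple):           # parallel corpus
        source, target = original
        return replace_fn(source), target     # the target side stays unchanged
    # monolingual corpus: the corpus itself serves as the pseudo target
    return replace_fn(original), original

# Toy stand-in for the replacement step, for demonstration only.
def toy_replace(sentence):
    return sentence.replace("sing", "唱歌").replace("dance", "跳舞")

parallel_pair = build_pseudo_parallel(
    ("I like sing and dance", "J'adore chanter et danser"), toy_replace)
monolingual_pair = build_pseudo_parallel("I like sing and dance", toy_replace)
print(parallel_pair)     # ('I like 唱歌 and 跳舞', "J'adore chanter et danser")
print(monolingual_pair)  # ('I like 唱歌 and 跳舞', 'I like sing and dance')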
After obtaining a large number of pseudo parallel corpora, the electronic device may train the preset basic translation model with the pseudo parallel corpora to obtain the target translation model. For a monolingual corpus, simple alignment and replacement of synonymous words is enough for it to be applied directly to the training of the translation model, without any "back-translation" technique; this greatly shortens the process of training the translation model with monolingual corpora and thus improves the training efficiency of the translation model.
Optionally, the basic translation model may include a Sequence-to-Sequence (seq2seq) model, which is a neural network with an Encoder-Decoder structure: the input is a sequence and the output is also a sequence. The Encoder converts a variable-length input sequence into a fixed-length vector representation, and the Decoder converts that fixed-length vector representation into a variable-length target sequence, thereby mapping an input of indefinite length to an output of indefinite length. Sequence-to-sequence models come in various types, such as a seq2seq model based on a Recurrent Neural Network (RNN) or a seq2seq model based on convolution operations (CONV); the specific type of the basic translation model is not limited in this embodiment.
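To make the Encoder-Decoder split concrete, a minimal RNN-based sequence-to-sequence skeleton is sketched below in PyTorch. It is offered purely as an illustration under assumed dimensions and a GRU backbone; the disclosure itself does not limit the network type of the basic translation model.

import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src_ids):
        # src_ids: (batch, src_len) -> fixed-length sentence representation (batch, hid_dim)
        _, hidden = self.rnn(self.embed(src_ids))
        return hidden.squeeze(0)

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt_ids, enc_state):
        # Teacher forcing: every decoding step is conditioned on the encoder state.
        outputs, _ = self.rnn(self.embed(tgt_ids), enc_state.unsqueeze(0))
        return self.out(outputs)  # (batch, tgt_len, vocab_size)

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.encoder = Encoder(vocab_size)
        self.decoder = Decoder(vocab_size)

    def forward(self, src_ids, tgt_ids):
        return self.decoder(tgt_ids, self.encoder(src_ids))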
By replacing part of the vocabulary of the source corpus in the original corpus with synonymous vocabulary of other arbitrary languages, the constructed pseudo parallel corpus contains other languages that do not appear in the original corpus. Training the basic translation model with this pseudo parallel corpus allows it to learn the grammatical structures and vocabulary associations among those other languages, which enhances the translation accuracy of the translation model on other non-universal language pairs. For example, a translation model is usually trained with parallel corpora centered on English, so its translation quality on non-English language pairs may fall short of expectations. To address this, referring to fig. 2, the embodiment of the present disclosure replaces the English word "sing" in the source corpus of the original corpus with the Chinese word 唱歌, keeps the target corpus unchanged, and trains the basic translation model with the pseudo parallel corpus constructed from the replacement corpus and the target corpus of the parallel corpus. The basic translation model can then learn that the Chinese word 唱歌 and the French word "chanter" have the same meaning, and can learn the grammatical structure and vocabulary associations between the sentence "I like 唱歌 and 跳舞" and the sentence "J'adore chanter et danser", thereby enabling translation between Chinese and French and ensuring the translation accuracy between Chinese and French.
The training method for a translation model provided by the embodiments of the present disclosure obtains at least one original corpus, and aligns and replaces at least one original vocabulary of the source corpus in the original corpus with a target vocabulary having the same meaning, so as to obtain a replacement corpus corresponding to the original corpus, the original vocabulary and the target vocabulary being in different languages; a pseudo parallel corpus is constructed based on the original corpus and the replacement corpus, and a preset basic translation model is trained with the pseudo parallel corpus to obtain a target translation model. In other words, after at least one original vocabulary of the source corpus in the original corpus is aligned and replaced with synonymous vocabulary from other arbitrary languages, a large number of pseudo parallel corpora containing those other languages can be constructed. Training the translation model with these pseudo parallel corpora enables it to learn the grammatical structures and vocabulary associations among those other languages, which improves the translation accuracy of the translation model on other non-universal language pairs.
In one embodiment, optionally, the basic translation model comprises an encoder and a decoder. The encoder performs feature extraction on the input sequence to obtain a feature vector. In order to further improve the accuracy with which the encoder extracts features from the input sequence, on the basis of the foregoing embodiment, optionally, the process in S103 of training the preset basic translation model with the pseudo parallel corpus to obtain the target translation model may be: training the encoder of the basic translation model through a first loss function by using the pseudo parallel corpus.
The first loss function is a contrast learning loss function and is used to update the parameters of the encoder. Because the pseudo parallel corpus is constructed from the original corpus and its corresponding replacement corpus, the replacement corpus can be regarded as a synonymous sentence of the original corpus. Therefore, to improve translation accuracy, when the basic translation model is trained with the pseudo parallel corpus, the contrast learning loss function can be used to train its encoder, so that the high-dimensional representations of synonymous sentences encoded by the encoder are pulled closer together while the high-dimensional representations of unrelated sentences are pushed further apart. That is, after the encoder is trained with the contrast learning loss function, two input corpora that were similar remain similar in the feature space after encoding, and two input corpora that were dissimilar remain dissimilar in the feature space after encoding. In this way, when the basic translation model is trained with pseudo parallel corpora constructed from parallel corpora and/or monolingual corpora, the translation quality of the trained target translation model can be ensured.
Further, optionally, as shown in fig. 4, the process of training the encoder of the base translation model through the first loss function by using the pseudo parallel corpus may be:
s401, constructing positive examples linguistic data and negative examples linguistic data of the pseudo source end linguistic data in the pseudo parallel linguistic data.
The pseudo parallel corpus includes a paired pseudo source corpus and pseudo target corpus; the pseudo source corpus can be understood as the corpus before translation, and the pseudo target corpus as the translated counterpart of the pseudo source corpus. A positive example corpus is a corpus whose matching degree with the pseudo source corpus is greater than a first preset value, i.e., the two corpora are similar; when they are completely similar, the matching degree is 1. A negative example corpus is a corpus whose matching degree with the pseudo source corpus is smaller than a second preset value, i.e., the two corpora are unrelated; when they are completely unrelated, the matching degree is 0. The first preset value is greater than the second preset value.
Optionally, the pseudo source corpus is usually a replacement corpus corresponding to a source corpus in an original corpus, so the positive example corpus may be a pseudo target corpus in a current pseudo parallel corpus, and the negative example corpus is a pseudo target corpus in another pseudo parallel corpus.
Specifically, when the original corpus is a monolingual corpus, the pseudo source corpus in the pseudo parallel corpus may be the replacement corpus corresponding to the monolingual corpus, and the pseudo target corpus may be the monolingual corpus itself. In this case the electronic device may use the monolingual corpus itself as the positive example corpus of the pseudo source corpus, and use the pseudo target corpora in other pseudo parallel corpora as the negative example corpora of the pseudo source corpus.
When the original corpus is a parallel corpus, the pseudo source corpus in the pseudo parallel corpus may be the replacement corpus corresponding to the source corpus in the parallel corpus, and the pseudo target corpus may be the target corpus in the parallel corpus. In this case the electronic device may use the target corpus in the current parallel corpus as the positive example corpus of the pseudo source corpus, and use the pseudo target corpora in other pseudo parallel corpora as the negative example corpora of the pseudo source corpus.
For example, referring to fig. 5, assume that the pseudo source corpus in a pseudo parallel corpus is "I love you" and the pseudo target corpus is "Je t'aime", where the pseudo source corpus is the replacement corpus obtained after the source corpus in the original corpus has been replaced with synonyms of other arbitrary languages. The pseudo target corpus "Je t'aime" may then be used as the positive example corpus of the replacement corpus "I love you", and pseudo target corpora from other pseudo parallel corpora may be selected as negative example corpora of the replacement corpus "I love you" (for example, the English corpus "It's sunny", the French corpus "C'est la vie" and the Chinese corpus meaning "who are you" in fig. 5). In addition, multiple negative example corpora can be selected to train the encoder, thereby improving the training efficiency of the translation model.
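In practice, the positive and negative example corpora can be drawn directly from a training batch: each pseudo source corpus keeps its own pseudo target corpus as the positive example, and the pseudo target corpora of the other pairs in the batch serve as negative examples. The sketch below shows this batching scheme; in-batch negative sampling and the toy sentences are assumptions made for illustration and are not mandated by this disclosure.

def contrastive_tuples(batch):
    """batch: list of (pseudo_source, pseudo_target) pairs.
    Returns (anchor, positive, negatives) tuples, where the negatives are the
    pseudo target corpora of the other pairs in the same batch."""
    tuples = []
    for i, (source, target) in enumerate(batch):
        negatives = [t for j, (_, t) in enumerate(batch) if j != i]
        tuples.append((source, target, negatives))
    return tuples

batch = [
    ("I love you", "Je t'aime"),
    ("It's sunny", "Il fait beau"),
    ("who are you", "Qui es-tu"),
]
print(contrastive_tuples(batch)[0])
# ('I love you', "Je t'aime", ['Il fait beau', 'Qui es-tu'])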
S402, training the encoder through the first loss function by using the pseudo source corpus, the positive example corpus and the negative example corpora.
After the positive example corpus and the negative example corpora of the pseudo source corpus are obtained, the electronic device can use the pseudo source corpus, the positive example corpus and the negative example corpora as training data for the encoder, and repeatedly train the encoder with the contrast learning loss function so as to continuously update the parameters of the encoder until the training target is reached. The training target is to maximize the similarity between the vector representations of the pseudo source corpus and the positive example corpus, and to minimize the similarity between the vector representations of the pseudo source corpus and the negative example corpora.
With continued reference to FIG. 5, the electronic device uses the pseudo target corpus "Je t'aime" as the positive example corpus of the anchor "I love you", and uses the pseudo target corpora sampled from other pseudo parallel corpora, namely the English corpus "It's sunny", the French corpus "C'est la vie" and the Chinese corpus meaning "who are you", as the negative example corpora of the anchor "I love you". The contrast learning loss function Lctl is then used to train the encoder of the basic translation model, so that the encoder pulls the encoded high-dimensional representations of synonymous sentences closer together while pushing the encoded high-dimensional representations of unrelated sentences further apart.
As an optional implementation manner, the process of S402 may include the following steps:
s4021, inputting the pseudo source corpus, the positive corpus and the negative corpus into the encoder to obtain a first vector representation corresponding to the pseudo source corpus, a second vector representation corresponding to the positive corpus and a third vector representation corresponding to the negative corpus.
The encoder is used for extracting a feature vector of an input corpus, so that the pseudo source-end corpus, a positive corpus of the pseudo source-end corpus and a negative corpus of the pseudo source-end corpus are respectively input into the encoder, and the pseudo source-end corpus, the positive corpus and the negative corpus are encoded by the encoder, so that a first vector representation corresponding to the pseudo source-end corpus, a second vector representation corresponding to the positive corpus and a third vector representation corresponding to the negative corpus are obtained.
S4022, determining a first loss value of the first loss function according to the first vector representation and the second vector representation, and updating the parameter of the encoder based on the first loss value until the first loss value of the first loss function meets a convergence condition.
S4023, determining a second loss value of the first loss function according to the first vector representation and the third vector representation, and updating the parameters of the encoder based on the second loss value until the second loss value of the first loss function meets a convergence condition.
Specifically, the first loss function is a contrast learning loss function whose optimization objective is that, when two input corpora are similar, their vector representations produced by the encoder should be similar, and when two input corpora are dissimilar, their vector representations produced by the encoder should be dissimilar. Accordingly, the electronic device can determine a first loss value of the contrast learning loss function based on the first vector representation, the second vector representation, and the matching degree between the pseudo source corpus and the positive example corpus. When the first loss value does not satisfy the convergence condition, the parameters of the encoder are updated, the updated encoder is used as the encoder in S4021, and S4021 is executed again, until the first loss value of the contrast learning loss function satisfies the convergence condition.
Similarly, the electronic device may determine a second loss value of the contrast learning loss function based on the first vector representation, the third vector representation, and the matching degree between the pseudo source corpus and the negative example corpus. When the second loss value does not satisfy the convergence condition, the parameters of the encoder are updated, the updated encoder is used as the encoder in S4021, and S4021 is executed again, until the second loss value of the contrast learning loss function satisfies the convergence condition.
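A common concrete choice for such a contrastive learning loss is an InfoNCE-style objective over the sentence representations: the anchor is pulled toward its positive example and pushed away from the negative examples within a single softmax. The PyTorch sketch below is one possible formulation under that assumption; the disclosure only characterizes the first loss function as a contrast learning loss and does not fix its exact form.

import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss.

    anchor:    (batch, dim)        encoder output for the pseudo source corpus
    positive:  (batch, dim)        encoder output for the positive example corpus
    negatives: (batch, n_neg, dim) encoder outputs for the negative example corpora
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum(-1, keepdim=True)               # (batch, 1)
    neg_sim = torch.bmm(negatives, anchor.unsqueeze(-1)).squeeze(-1)  # (batch, n_neg)

    logits = torch.cat([pos_sim, neg_sim], dim=-1) / temperature
    labels = logits.new_zeros(logits.size(0), dtype=torch.long)       # the positive sits at index 0
    return F.cross_entropy(logits, labels)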
In practical applications, the training of the encoder can be superimposed on the training process of the entire basic translation model, so that the encoder shares parameters across the multi-task training. To this end, on the basis of the foregoing embodiment, optionally, the process in S103 of training the preset basic translation model with the pseudo parallel corpus to obtain the target translation model may include: performing multi-task training on the basic translation model through a first loss function and a second loss function by using the pseudo parallel corpus, so as to obtain the target translation model.
The first loss function is a contrast learning loss function used to update the parameters of the encoder, and the second loss function is used to update the parameters of both the encoder and the decoder; that is, the second loss function is used to train the entire basic translation model. Optionally, the second loss function may be a cross-entropy loss function. The multi-task training includes at least a training task for the encoder and a training task for the encoder and decoder together, with the encoder sharing the parameters updated by both tasks. In one task, an anchor corpus, a positive example corpus and negative example corpora are used to train the encoder through the contrast learning loss function, so that the trained encoder pulls the encoded high-dimensional representations of synonymous sentences closer together while pushing the encoded high-dimensional representations of unrelated sentences further apart. In the other task, the pseudo parallel corpus is used to train the basic translation model through the second loss function, so that the basic translation model learns the grammatical structures and vocabulary associations among the arbitrary other languages contained in the pseudo parallel corpus, thereby enabling mutual translation between multiple languages in zero-resource and unsupervised scenarios. In other words, on top of training the basic translation model with the pseudo parallel corpus and the second loss function, the contrast learning loss function is added for multi-task training, so that the trained target translation model can support translation between multiple languages in any direction while ensuring the accuracy of the translation results.
With reference to fig. 5, the electronic device uses the pseudo source corpus (i.e., the anchor) "I love you" in the pseudo parallel corpus as the input of the basic translation model, uses the pseudo target corpus "Je t'aime" in the pseudo parallel corpus as the expected output, and trains the encoder and decoder of the basic translation model with the second loss function Lmt. Meanwhile, the electronic device inputs the anchor "I love you", its positive example corpus "Je t'aime" and its negative example corpora ("It's sunny", "C'est la vie" and the Chinese corpus meaning "who are you") into the encoder, obtains the corresponding first, second and third vector representations after encoding, and trains the encoder with the contrast learning loss function Lctl based on the first, second and third vector representations, until the convergence conditions of both the contrast learning loss function Lctl and the second loss function Lmt are satisfied.
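The multi-task objective can then be realized by summing the two losses in each update step, so that the encoder receives gradients from both the contrastive task and the translation task while the decoder is driven only by the translation loss. The training-step sketch below reuses the hypothetical Seq2Seq model and contrastive_loss helper from the earlier sketches; the equal weighting of Lctl and Lmt is an assumption, since the disclosure does not specify how the two losses are combined.

import torch.nn.functional as F

def train_step(model, optimizer, src_ids, tgt_in, tgt_out, pos_ids, neg_ids):
    """One multi-task update over a batch of pseudo parallel data.

    src_ids: pseudo source corpus token ids           (batch, src_len)
    tgt_in / tgt_out: decoder input / gold output ids (batch, tgt_len)
    pos_ids: positive example corpus token ids        (batch, pos_len)
    neg_ids: negative example corpora token ids       (batch, n_neg, neg_len)
    """
    anchor = model.encoder(src_ids)                    # shared encoder for both tasks
    positive = model.encoder(pos_ids)
    b, n_neg, neg_len = neg_ids.shape
    negatives = model.encoder(neg_ids.reshape(b * n_neg, neg_len)).reshape(b, n_neg, -1)

    l_ctl = contrastive_loss(anchor, positive, negatives)    # contrastive task (encoder)

    logits = model.decoder(tgt_in, anchor)                    # translation (seq2seq) task
    l_mt = F.cross_entropy(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))

    loss = l_ctl + l_mt                                       # equal weighting assumed
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return l_ctl.item(), l_mt.item()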
In this embodiment, during the training of the basic translation model, a contrast learning loss function is added to train the encoder of the basic translation model, so that the trained encoder pulls the encoded high-dimensional representations of synonymous sentences closer together while pushing the encoded high-dimensional representations of unrelated sentences further apart. Therefore, by training the basic translation model with the pseudo parallel corpus constructed through aligned replacement with synonymous vocabulary of other arbitrary languages, the trained target translation model can support translation between multiple languages in any direction while ensuring the accuracy of the translation results.
Fig. 6 is a schematic structural diagram of a training apparatus for a translation model according to an embodiment of the present disclosure. As shown in fig. 6, the apparatus may include: an acquisition module 601, a replacement module 602, a construction module 603, and a training module 604.
Specifically, the obtaining module 601 is configured to obtain at least one original corpus;
the replacing module 602 is configured to align and replace at least one original vocabulary of a source corpus in the original corpus with a target vocabulary having the same meaning, so as to obtain a replacement corpus corresponding to the original corpus; wherein the original vocabulary and the target vocabulary are different in language;
the constructing module 603 is configured to construct a pseudo parallel corpus based on the original corpus and the replacement corpus;
the training module 604 is configured to train a preset basic translation model using the pseudo parallel corpus to obtain a target translation model.
The training apparatus for a translation model provided by the embodiments of the present disclosure obtains at least one original corpus, and aligns and replaces at least one original vocabulary of the source corpus in the original corpus with a target vocabulary having the same meaning, so as to obtain a replacement corpus corresponding to the original corpus, the original vocabulary and the target vocabulary being in different languages; a pseudo parallel corpus is constructed based on the original corpus and the replacement corpus, and a preset basic translation model is trained with the pseudo parallel corpus to obtain a target translation model. In other words, after at least one original vocabulary of the source corpus in the original corpus is aligned and replaced with synonymous vocabulary from other arbitrary languages, a large number of pseudo parallel corpora containing those other languages can be constructed. Training the translation model with these pseudo parallel corpora enables it to learn the grammatical structures and vocabulary associations among those other languages, which improves the translation accuracy of the translation model on other non-universal language pairs.
Optionally, the original corpus comprises a monolingual corpus and/or a parallel corpus; the monolingual corpus is the source end corpus, and the parallel corpus comprises paired source end corpora and target end corpora.
On the basis of the above embodiment, optionally, the base translation model includes an encoder and a decoder; the training module 604 includes: a first training unit;
specifically, the first training unit is configured to train, using the pseudo parallel corpus, an encoder of the basic translation model through a first loss function; wherein the first loss function is a contrast learning loss function for updating parameters of the encoder.
On the basis of the foregoing embodiment, optionally, the training module 604 further includes: a second training unit;
specifically, the second training unit is configured to perform multi-task training on the basic translation model through a first loss function and a second loss function by using the pseudo parallel corpus to obtain a target translation model; wherein the first loss function is a contrast learning loss function for updating parameters of the encoder, and the second loss function is for updating parameters of the encoder and the decoder.
On the basis of the foregoing embodiment, optionally, the first training unit is specifically configured to construct positive examples corpora and negative examples corpora of the pseudo source corpus in the pseudo parallel corpora; training the encoder through a first loss function by using the pseudo source end corpus, the positive case corpus and the negative case corpus; the training target is to maximize the similarity between the vector representations of the pseudo source corpus and the positive corpus and minimize the similarity between the vector representations of the pseudo source corpus and the negative corpus.
On the basis of the foregoing embodiment, optionally, the first training unit is specifically configured to input the pseudo source corpus, the positive corpus, and the negative corpus into the encoder, so as to obtain a first vector representation corresponding to the pseudo source corpus, a second vector representation corresponding to the positive corpus, and a third vector representation corresponding to the negative corpus; determining a first loss value of a first loss function according to the first vector representation and the second vector representation, and updating parameters of the encoder based on the first loss value until the first loss value of the first loss function satisfies a convergence condition; determining a second loss value of the first loss function according to the first vector representation and the third vector representation, and updating parameters of the encoder based on the second loss value until the second loss value of the first loss function satisfies a convergence condition.
On the basis of the foregoing embodiment, optionally, when the original corpus is a monolingual corpus, the constructing module 603 is specifically configured to use a replacement corpus corresponding to the monolingual corpus as a pseudo source-end corpus and use the monolingual corpus as a pseudo target-end corpus to form a pseudo parallel corpus.
On the basis of the foregoing embodiment, optionally, when the original corpus is a parallel corpus, the constructing module 603 is specifically configured to use a replacement corpus corresponding to a source-end corpus in the parallel corpus as a pseudo source-end corpus, and use a target-end corpus in the parallel corpus as a pseudo target-end corpus to form the pseudo parallel corpus.
Optionally, the positive example corpus is the pseudo target end corpus.
Optionally, the negative examples corpus is a pseudo target end corpus in other pseudo parallel corpora.
Optionally, the language of each target vocabulary in the replacement corpus is at least partially different.
Referring now to FIG. 7, a schematic diagram of an electronic device 700 (i.e., a training device for translation models) suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 700 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage means 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the electronic device 700. The processing means 701, the ROM 702 and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; output devices 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 708 including, for example, a magnetic tape, a hard disk, and the like; and a communication device 709. The communication device 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 709, installed from the storage device 708, or installed from the ROM 702. When executed by the processing device 701, the computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In one embodiment, there is provided a training apparatus for translation models, comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring at least one original corpus;
aligning and replacing at least one original vocabulary of a source corpus in the original corpus with a target vocabulary with the same meaning to obtain a replacement corpus corresponding to the original corpus; wherein the original vocabulary and the target vocabulary are different in language;
and constructing a pseudo parallel corpus based on the original corpus and the replacement corpus, and training a preset basic translation model by using the pseudo parallel corpus to obtain a target translation model.
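As a non-limiting illustration of the above steps, the align-and-replace operation may be sketched in Python as follows, assuming a simple word-level bilingual dictionary is available; the dictionary entries, the whitespace tokenization and the replacement probability are illustrative assumptions, not a definitive implementation of the disclosure.

import random

# Hypothetical word-level dictionary mapping source-language words to
# target-language words with the same meaning (illustrative entries only).
bilingual_dict = {"haus": "maison", "katze": "chat", "gut": "bon"}

def build_replacement_corpus(sentence, dictionary, replace_prob=0.5):
    # Align by whitespace tokenization and replace matched original words
    # with same-meaning target vocabulary from a different language.
    tokens = sentence.lower().split()
    replaced = [
        dictionary[tok] if tok in dictionary and random.random() < replace_prob else tok
        for tok in tokens
    ]
    return " ".join(replaced)

# A pseudo parallel pair built from a monolingual sentence: the replacement
# corpus serves as the pseudo source-end corpus and the original sentence as
# the pseudo target-end corpus.
original = "die katze ist gut"
pseudo_pair = (build_replacement_corpus(original, bilingual_dict), original)
print(pseudo_pair)

In practice the word alignment would typically come from a word aligner or bilingual lexicon rather than simple whitespace matching, and the replacement vocabulary may be drawn from several target languages.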
Optionally, the original corpus comprises a monolingual corpus and/or a parallel corpus; the monolingual corpus is the source-end corpus, and the parallel corpus comprises paired source-end corpora and target-end corpora.
Optionally, the basic translation model comprises an encoder and a decoder.
In one embodiment, the processor, when executing the computer program, further performs the steps of: training an encoder of the basic translation model through a first loss function by using the pseudo parallel corpus; wherein the first loss function is a contrastive learning loss function for updating parameters of the encoder.
In one embodiment, the processor, when executing the computer program, further performs the steps of: using the pseudo parallel corpus to perform multi-task training on the basic translation model through a first loss function and a second loss function so as to obtain a target translation model; wherein the first loss function is a contrastive learning loss function for updating parameters of the encoder, and the second loss function is for updating parameters of the encoder and the decoder.
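A minimal sketch of such a multi-task training step is given below, assuming a PyTorch encoder-decoder model that exposes a translation (cross-entropy) loss and a contrastive learning loss; the method names, the batch layout and the weighting factor ctr_weight are assumptions introduced for illustration rather than the exact configuration of the disclosure.

import torch

def multitask_step(model, batch, optimizer, ctr_weight=1.0):
    # Second loss: translation cross-entropy on the pseudo parallel pair;
    # its gradients update both the encoder and the decoder.
    translation_loss = model.translation_loss(batch["pseudo_source"], batch["pseudo_target"])
    # First loss: contrastive learning loss computed on encoder representations
    # of the pseudo source-end, positive example and negative example corpora;
    # its gradients update the encoder parameters.
    contrastive_loss = model.contrastive_loss(
        batch["pseudo_source"], batch["positive"], batch["negative"])
    loss = translation_loss + ctr_weight * contrastive_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()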
In one embodiment, the processor, when executing the computer program, further performs the steps of: constructing a positive example corpus and a negative example corpus for the pseudo source-end corpus in the pseudo parallel corpus; training the encoder through a first loss function by using the pseudo source-end corpus, the positive example corpus and the negative example corpus; the training objective is to maximize the similarity between the vector representations of the pseudo source-end corpus and the positive example corpus and to minimize the similarity between the vector representations of the pseudo source-end corpus and the negative example corpus.
In one embodiment, the processor, when executing the computer program, further performs the steps of: inputting the pseudo source-end corpus, the positive example corpus and the negative example corpus into the encoder to obtain a first vector representation corresponding to the pseudo source-end corpus, a second vector representation corresponding to the positive example corpus and a third vector representation corresponding to the negative example corpus; determining a first loss value of a first loss function according to the first vector representation and the second vector representation, and updating parameters of the encoder based on the first loss value until the first loss value of the first loss function satisfies a convergence condition; determining a second loss value of the first loss function according to the first vector representation and the third vector representation, and updating parameters of the encoder based on the second loss value until the second loss value of the first loss function satisfies a convergence condition.
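A self-contained sketch of computing the first loss from the three vector representations is shown below; folding the positive and negative objectives into a single softmax over their similarities is one common realization and, like the cosine similarity and temperature, is an assumption for illustration.

import torch
import torch.nn.functional as F

def first_loss(first_vec, second_vec, third_vec, temperature=0.1):
    # first_vec: pseudo source-end corpus, second_vec: positive example corpus,
    # third_vec: negative example corpus; each of shape (batch, hidden).
    pos_sim = F.cosine_similarity(first_vec, second_vec, dim=-1) / temperature
    neg_sim = F.cosine_similarity(first_vec, third_vec, dim=-1) / temperature
    # Maximizing the positive similarity and minimizing the negative similarity
    # amounts to classifying the positive pair against the negative pair.
    logits = torch.stack([pos_sim, neg_sim], dim=-1)
    labels = torch.zeros(first_vec.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

# Toy usage with random stand-ins for the encoder outputs.
h_anchor, h_pos, h_neg = (torch.randn(4, 512) for _ in range(3))
print(first_loss(h_anchor, h_pos, h_neg))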
In one embodiment, when the original corpus is a monolingual corpus, the processor, when executing the computer program, further performs the following step: taking the replacement corpus corresponding to the monolingual corpus as a pseudo source-end corpus and the monolingual corpus as a pseudo target-end corpus to form a pseudo parallel corpus.
In one embodiment, when the original corpus is a parallel corpus, the processor, when executing the computer program, further performs the following step: taking the replacement corpus corresponding to the source-end corpus in the parallel corpus as a pseudo source-end corpus and taking the target-end corpus in the parallel corpus as a pseudo target-end corpus to form a pseudo parallel corpus.
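The two construction cases above can be illustrated together in a short sketch; build_replacement_corpus is the hypothetical helper from the earlier sketch, and representing a parallel corpus as a (source, target) tuple is an assumption for illustration.

def make_pseudo_pair(original_corpus, dictionary):
    if isinstance(original_corpus, tuple):
        # Parallel corpus (source-end, target-end): the replacement corpus of
        # the source side becomes the pseudo source-end corpus, and the original
        # target side is kept as the pseudo target-end corpus.
        source, target = original_corpus
        return build_replacement_corpus(source, dictionary), target
    # Monolingual corpus: the replacement corpus becomes the pseudo source-end
    # corpus and the monolingual sentence itself the pseudo target-end corpus.
    return build_replacement_corpus(original_corpus, dictionary), original_corpus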
Optionally, the positive example corpus is the pseudo target-end corpus.
Optionally, the negative example corpus is a pseudo target-end corpus in other pseudo parallel corpora.
Optionally, the languages of the target vocabularies in the replacement corpus are at least partially different.
In one embodiment, there is also provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring at least one original corpus;
aligning and replacing at least one original vocabulary of a source corpus in the original corpus with a target vocabulary with the same meaning to obtain a replacement corpus corresponding to the original corpus; wherein the original vocabulary and the target vocabulary are different in language;
and constructing a pseudo parallel corpus based on the original corpus and the replacement corpus, and training a preset basic translation model by using the pseudo parallel corpus to obtain a target translation model.
The translation model training apparatus, device and storage medium provided in the above embodiments can execute the translation model training method provided in any embodiment of the present disclosure, and have the corresponding functional modules and beneficial effects for executing the method. For technical details not described in detail in the above embodiments, reference may be made to the translation model training method provided in any embodiment of the present disclosure.
According to one or more embodiments of the present disclosure, there is provided a method for training a translation model, including:
acquiring at least one original corpus;
aligning and replacing at least one original vocabulary of a source corpus in the original corpus with a target vocabulary with the same meaning to obtain a replacement corpus corresponding to the original corpus; wherein the original vocabulary and the target vocabulary are different in language;
and constructing a pseudo parallel corpus based on the original corpus and the replacement corpus, and training a preset basic translation model by using the pseudo parallel corpus to obtain a target translation model.
Optionally, the original corpus comprises a monolingual corpus and/or a parallel corpus; the monolingual corpus is the source-end corpus, and the parallel corpus comprises paired source-end corpora and target-end corpora.
Optionally, the basic translation model comprises an encoder and a decoder.
According to one or more embodiments of the present disclosure, there is provided the above method for training a translation model, further including: training an encoder of the basic translation model through a first loss function by using the pseudo parallel corpus; wherein the first loss function is a contrastive learning loss function for updating parameters of the encoder.
According to one or more embodiments of the present disclosure, there is provided the above method for training a translation model, further including: using the pseudo parallel corpus to perform multi-task training on the basic translation model through a first loss function and a second loss function so as to obtain a target translation model; wherein the first loss function is a contrastive learning loss function for updating parameters of the encoder, and the second loss function is for updating parameters of the encoder and the decoder.
According to one or more embodiments of the present disclosure, there is provided the above method for training a translation model, further including: constructing a positive example corpus and a negative example corpus for the pseudo source-end corpus in the pseudo parallel corpus; training the encoder through a first loss function by using the pseudo source-end corpus, the positive example corpus and the negative example corpus; the training objective is to maximize the similarity between the vector representations of the pseudo source-end corpus and the positive example corpus and to minimize the similarity between the vector representations of the pseudo source-end corpus and the negative example corpus.
According to one or more embodiments of the present disclosure, there is provided the above method for training a translation model, further including: inputting the pseudo source-end corpus, the positive example corpus and the negative example corpus into the encoder to obtain a first vector representation corresponding to the pseudo source-end corpus, a second vector representation corresponding to the positive example corpus and a third vector representation corresponding to the negative example corpus; determining a first loss value of a first loss function according to the first vector representation and the second vector representation, and updating parameters of the encoder based on the first loss value until the first loss value of the first loss function satisfies a convergence condition; determining a second loss value of the first loss function according to the first vector representation and the third vector representation, and updating parameters of the encoder based on the second loss value until the second loss value of the first loss function satisfies a convergence condition.
According to one or more embodiments of the present disclosure, there is provided the above method for training a translation model, further including: when the original corpus is a monolingual corpus, taking the replacement corpus corresponding to the monolingual corpus as a pseudo source-end corpus and the monolingual corpus as a pseudo target-end corpus to form a pseudo parallel corpus.
According to one or more embodiments of the present disclosure, there is provided the above method for training a translation model, further including: when the original corpus is a parallel corpus, taking the replacement corpus corresponding to the source-end corpus in the parallel corpus as a pseudo source-end corpus, and taking the target-end corpus in the parallel corpus as a pseudo target-end corpus to form the pseudo parallel corpus.
Optionally, the positive example corpus is the pseudo target-end corpus.
Optionally, the negative example corpus is a pseudo target-end corpus in other pseudo parallel corpora.
Optionally, the languages of the target vocabularies in the replacement corpus are at least partially different.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure; for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (14)

1. A method for training a translation model, comprising:
acquiring at least one original corpus;
aligning and replacing at least one original vocabulary of a source corpus in the original corpus with a target vocabulary with the same meaning to obtain a replacement corpus corresponding to the original corpus; wherein the original vocabulary and the target vocabulary are different in language;
and constructing a pseudo parallel corpus based on the original corpus and the replacement corpus, and training a preset basic translation model by using the pseudo parallel corpus to obtain a target translation model.
2. The method according to claim 1, wherein the original corpus comprises a monolingual corpus and/or a parallel corpus; the monolingual corpus is the source-end corpus, and the parallel corpus comprises paired source-end corpora and target-end corpora.
3. The method of claim 1, wherein the basic translation model comprises an encoder and a decoder;
the training of the preset basic translation model by using the pseudo parallel corpus to obtain a target translation model comprises the following steps:
training an encoder of the basic translation model through a first loss function by using the pseudo parallel corpus; wherein the first loss function is a contrastive learning loss function for updating parameters of the encoder.
4. The method according to claim 3, wherein the training a preset basic translation model by using the pseudo parallel corpus to obtain a target translation model comprises:
using the pseudo parallel corpus to perform multi-task training on the basic translation model through a first loss function and a second loss function so as to obtain a target translation model; wherein the first loss function is a contrastive learning loss function for updating parameters of the encoder, and the second loss function is for updating parameters of the encoder and the decoder.
5. The method according to claim 3, wherein the training an encoder of the basic translation model through a first loss function by using the pseudo parallel corpus comprises:
constructing a positive example corpus and a negative example corpus for the pseudo source-end corpus in the pseudo parallel corpus;
training the encoder through a first loss function by using the pseudo source-end corpus, the positive example corpus and the negative example corpus; the training objective is to maximize the similarity between the vector representations of the pseudo source-end corpus and the positive example corpus and to minimize the similarity between the vector representations of the pseudo source-end corpus and the negative example corpus.
6. The method of claim 5, wherein the training the encoder through a first loss function by using the pseudo source-end corpus, the positive example corpus and the negative example corpus comprises:
inputting the pseudo source-end corpus, the positive example corpus and the negative example corpus into the encoder to obtain a first vector representation corresponding to the pseudo source-end corpus, a second vector representation corresponding to the positive example corpus and a third vector representation corresponding to the negative example corpus;
determining a first loss value of a first loss function according to the first vector representation and the second vector representation, and updating parameters of the encoder based on the first loss value until the first loss value of the first loss function satisfies a convergence condition;
determining a second loss value of the first loss function according to the first vector representation and the third vector representation, and updating parameters of the encoder based on the second loss value until the second loss value of the first loss function satisfies a convergence condition.
7. The method according to claim 5, wherein when the original corpus is a monolingual corpus, the constructing a pseudo parallel corpus based on the original corpus and the replacement corpus comprises:
taking the replacement corpus corresponding to the monolingual corpus as a pseudo source-end corpus and the monolingual corpus as a pseudo target-end corpus to form a pseudo parallel corpus.
8. The method according to claim 5, wherein when the original corpus is a parallel corpus, the constructing a pseudo parallel corpus based on the original corpus and the replacement corpus comprises:
taking the replacement corpus corresponding to the source-end corpus in the parallel corpus as a pseudo source-end corpus and taking the target-end corpus in the parallel corpus as a pseudo target-end corpus to form a pseudo parallel corpus.
9. The method according to claim 7 or 8, wherein the positive example corpus is the pseudo target-end corpus.
10. The method according to claim 7 or 8, wherein the negative example corpus is a pseudo target-end corpus in other pseudo parallel corpora.
11. The method according to any one of claims 1 to 8, wherein the languages of the target vocabularies in the replacement corpus are at least partially different.
12. An apparatus for training a translation model, comprising:
the acquisition module is used for acquiring at least one original corpus;
the replacing module is used for aligning and replacing at least one original vocabulary of the source corpus in the original corpus with a target vocabulary with the same meaning to obtain a replacing corpus corresponding to the original corpus; wherein the original vocabulary and the target vocabulary are different in language;
a construction module for constructing pseudo parallel corpora based on the original corpora and the replacement corpora;
and the training module is used for training a preset basic translation model by using the pseudo parallel corpus so as to obtain a target translation model.
13. A training device for a translation model, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 11.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 11.
CN202110454958.8A 2021-04-26 2021-04-26 Translation model training method, device, equipment and storage medium Active CN113139391B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110454958.8A CN113139391B (en) 2021-04-26 2021-04-26 Translation model training method, device, equipment and storage medium
PCT/CN2022/084963 WO2022228041A1 (en) 2021-04-26 2022-04-02 Translation model training method, apparatus, and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110454958.8A CN113139391B (en) 2021-04-26 2021-04-26 Translation model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113139391A true CN113139391A (en) 2021-07-20
CN113139391B CN113139391B (en) 2023-06-06

Family

ID=76812391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110454958.8A Active CN113139391B (en) 2021-04-26 2021-04-26 Translation model training method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113139391B (en)
WO (1) WO2022228041A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114722841A (en) * 2022-05-16 2022-07-08 北京百度网讯科技有限公司 Translation method, translation device and computer program product
WO2022228041A1 (en) * 2021-04-26 2022-11-03 北京有竹居网络技术有限公司 Translation model training method, apparatus, and device, and storage medium
CN115618891A (en) * 2022-12-19 2023-01-17 湖南大学 Multimodal machine translation method and system based on contrast learning

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116844523B (en) * 2023-08-31 2023-11-10 深圳市声扬科技有限公司 Voice data generation method and device, electronic equipment and readable storage medium
CN117251555B (en) * 2023-11-17 2024-04-16 深圳须弥云图空间科技有限公司 Language generation model training method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109597988A (en) * 2018-10-31 2019-04-09 清华大学 Cross-lingual lexical sememe prediction method, device and electronic equipment
CN109670190A (en) * 2018-12-25 2019-04-23 北京百度网讯科技有限公司 Translation model construction method and device
CN111046677A (en) * 2019-12-09 2020-04-21 北京字节跳动网络技术有限公司 Method, device, equipment and storage medium for obtaining translation model
CN111709249A (en) * 2020-05-29 2020-09-25 北京百度网讯科技有限公司 Multi-language model training method and device, electronic equipment and storage medium
CN112417902A (en) * 2020-12-04 2021-02-26 北京有竹居网络技术有限公司 Text translation method, device, equipment and storage medium
WO2021037559A1 (en) * 2019-08-23 2021-03-04 Sony Corporation Electronic device, method and computer program
CN112668671A (en) * 2021-03-15 2021-04-16 北京百度网讯科技有限公司 Method and device for acquiring pre-training model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110941966A (en) * 2019-12-10 2020-03-31 北京小米移动软件有限公司 Training method, device and system of machine translation model
CN113139391B (en) * 2021-04-26 2023-06-06 北京有竹居网络技术有限公司 Translation model training method, device, equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109597988A (en) * 2018-10-31 2019-04-09 清华大学 Cross-lingual lexical sememe prediction method, device and electronic equipment
CN109670190A (en) * 2018-12-25 2019-04-23 北京百度网讯科技有限公司 Translation model construction method and device
WO2021037559A1 (en) * 2019-08-23 2021-03-04 Sony Corporation Electronic device, method and computer program
CN111046677A (en) * 2019-12-09 2020-04-21 北京字节跳动网络技术有限公司 Method, device, equipment and storage medium for obtaining translation model
CN111709249A (en) * 2020-05-29 2020-09-25 北京百度网讯科技有限公司 Multi-language model training method and device, electronic equipment and storage medium
CN112417902A (en) * 2020-12-04 2021-02-26 北京有竹居网络技术有限公司 Text translation method, device, equipment and storage medium
CN112668671A (en) * 2021-03-15 2021-04-16 北京百度网讯科技有限公司 Method and device for acquiring pre-training model

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022228041A1 (en) * 2021-04-26 2022-11-03 北京有竹居网络技术有限公司 Translation model training method, apparatus, and device, and storage medium
CN114722841A (en) * 2022-05-16 2022-07-08 北京百度网讯科技有限公司 Translation method, translation device and computer program product
CN115618891A (en) * 2022-12-19 2023-01-17 湖南大学 Multimodal machine translation method and system based on contrast learning

Also Published As

Publication number Publication date
WO2022228041A1 (en) 2022-11-03
CN113139391B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN113139391B (en) Translation model training method, device, equipment and storage medium
JP7112536B2 (en) Method and apparatus for mining entity attention points in text, electronic device, computer-readable storage medium and computer program
CN111027331A (en) Method and apparatus for evaluating translation quality
CN111046677B (en) Method, device, equipment and storage medium for obtaining translation model
CN112633947B (en) Text generation model generation method, text generation method, device and equipment
CN112183120A (en) Speech translation method, device, equipment and storage medium
CN111382261B (en) Abstract generation method and device, electronic equipment and storage medium
CN111563390B (en) Text generation method and device and electronic equipment
CN111489735B (en) Voice recognition model training method and device
CN113204977B (en) Information translation method, device, equipment and storage medium
JP7335300B2 (en) Knowledge pre-trained model training method, apparatus and electronic equipment
JP7140913B2 (en) Video distribution timeliness determination method and device
CN112380876B (en) Translation method, device, equipment and medium based on multilingual machine translation model
WO2022135080A1 (en) Corpus sample determination method and apparatus, electronic device, and storage medium
US11669679B2 (en) Text sequence generating method and apparatus, device and medium
CN111339789A (en) Translation model training method and device, electronic equipment and storage medium
CN111104796B (en) Method and device for translation
CN114822498B (en) Training method of speech translation model, speech translation method, device and equipment
CN112182255A (en) Method and apparatus for storing media files and for retrieving media files
CN111783395A (en) Method and device for outputting text
WO2022121859A1 (en) Spoken language information processing method and apparatus, and electronic device
CN114613351A (en) Rhythm prediction method, device, readable medium and electronic equipment
CN114429629A (en) Image processing method and device, readable storage medium and electronic equipment
CN113591498A (en) Translation processing method, device, equipment and medium
CN112820280A (en) Generation method and device of regular language model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant