WO2023169024A1 - Translation system and training and application methods therefor, and related device - Google Patents

Translation system and training and application methods therefor, and related device

Info

Publication number
WO2023169024A1
Authority
WO
WIPO (PCT)
Prior art keywords
alignment
encoder
decoder
language
vector
Prior art date
Application number
PCT/CN2022/137877
Other languages
French (fr)
Chinese (zh)
Inventor
史佳欣
常建龙
宁可
陈鑫
张恒亨
田奇
Original Assignee
华为云计算技术有限公司
Application filed by 华为云计算技术有限公司
Publication of WO2023169024A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/189 Automatic justification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of machine translation, and in particular to a translation system, its training, application methods and related equipment.
  • Machine translation refers to the translation of sentences in one language into sentences in another language with the same meaning.
  • Machine translation includes rule-based machine translation, statistics-based machine translation and neural network-based machine translation.
  • Machine translation based on neural networks has risen rapidly in recent years. Compared with rule-based and statistics-based machine translation, the model construction of neural network-based machine translation is simpler and the translation is more accurate. Therefore, how to provide a machine translation system based on neural networks is an urgent technical problem that needs to be solved.
  • This application provides a translation system, its training, application methods and related equipment, which can improve the training efficiency of the translation system, save computing resources, and improve the scalability of the translation system.
  • this application provides a training method for a translation system.
  • the translation system includes a first language model and a second language model.
  • the first language model includes a first encoder, a first alignment encoder, a first alignment decoder and a first decoder; the second language model includes a second encoder, a second alignment encoder, a second alignment decoder and a second decoder, and the method includes: obtaining the first encoder and the first decoder of the first language, which are trained based on the corpus of the first language, and obtaining the second encoder and the second decoder of the second language, which are trained based on the corpus of the second language, wherein:
  • the first encoder is used to encode the sentence expressed in the first language and output the corresponding vector.
  • the first decoder is used to decode the input vector and output the sentence expressed in the first language.
  • the second encoder is used to encode the sentence expressed in the second language and output the corresponding vector.
  • the second decoder is used to decode the input vector and output the sentence expressed in the second language;
  • the first language model and the second language model are jointly trained based on the parallel corpus to obtain the trained first alignment encoder, the trained first alignment decoder, the trained second alignment encoder and the trained second alignment decoder, where the parallel corpus includes a collection of synonymous sentences in the first language and the second language;
  • the first alignment encoder is used to convert the vector output by the first encoder into the aligned vector space
  • the second alignment encoder is used to convert the vector output by the second encoder into the aligned vector space
  • the first alignment decoder is used to convert the output of the first alignment encoder into the vector space corresponding to the input of the first decoder, so that the first decoder decodes the output of the first alignment decoder and outputs a sentence in the first language;
  • the second alignment decoder is used to convert the output of the second alignment encoder into the vector space corresponding to the input of the second decoder, so that the second decoder decodes the output of the second alignment decoder and outputs a sentence in the second language.
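  • to make the structure above concrete, the following is a minimal sketch of one per-language model in PyTorch; the use of plain linear layers for the encoder, alignment encoder, alignment decoder and decoder, and all dimensions, are illustrative assumptions rather than the network design described here. Translation from one language to another then chains the source language's encoder and alignment encoder with the target language's alignment decoder and decoder, as described in the following paragraphs.

```python
import torch
import torch.nn as nn

class LanguageModel(nn.Module):
    """One per-language model: encoder, alignment encoder, alignment decoder, decoder.
    Hypothetical sketch: the real encoders/decoders could be deep sequence models;
    plain linear layers stand in here so the data flow is easy to follow."""

    def __init__(self, vocab_size: int, d_model: int = 256, d_align: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.Linear(d_model, d_model)        # sentence -> language-specific vector set
        self.align_encoder = nn.Linear(d_model, d_align)  # language-specific vectors -> aligned vector space
        self.align_decoder = nn.Linear(d_align, d_model)  # aligned vectors -> this decoder's input space
        self.decoder = nn.Linear(d_model, vocab_size)     # decoder-space vectors -> output token logits

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: LongTensor of shape [seq_len] or [batch, seq_len]
        return self.encoder(self.embed(token_ids))

    def decode(self, decoder_space_vectors: torch.Tensor) -> torch.Tensor:
        return self.decoder(decoder_space_vectors)
```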
  • the above-mentioned joint training of the first language model and the second language model based on the parallel corpus includes: encoding, through the first alignment encoder, the vector corresponding to the first sentence output by the first encoder, and encoding, through the second alignment encoder, the vector corresponding to the second sentence output by the second encoder, where the first sentence and the second sentence are sentences with the same semantics expressed in the two languages; updating the parameters of the first alignment encoder and the parameters of the second alignment encoder based on the output of the first alignment encoder and the output of the second alignment encoder; decoding the output of the first alignment encoder through the first alignment decoder, and updating the parameters of the first alignment decoder and the parameters of the first alignment encoder based on the output of the first alignment decoder and the output of the first encoder; decoding the output of the second alignment encoder through the second alignment decoder, and updating the parameters of the second alignment decoder and the parameters of the second alignment encoder based on the output of the second alignment decoder and the output of the second encoder.
  • the first alignment encoder and the second alignment encoder convert the vector corresponding to the first sentence and the vector corresponding to the second sentence into the same vector space, that is, the above-mentioned aligned vector space. Updating the parameters of the first alignment encoder and the parameters of the second alignment encoder based on the output of the first alignment encoder and the output of the second alignment encoder makes the vectors output for sentences with the same semantics, after being encoded by the first alignment encoder and the second alignment encoder respectively, closer to each other.
  • the first alignment decoder converts the output of the first alignment encoder into the vector space corresponding to the input of the first decoder, so the vector output by the first alignment decoder and the vector output by the first encoder both belong to the vector space corresponding to the input of the first decoder. Updating the parameters of the first alignment decoder and the parameters of the first alignment encoder based on the output of the first alignment decoder and the output of the first encoder brings the output of the first alignment decoder closer to the output of the first encoder, so that the sentence obtained after the output of the first alignment decoder is decoded by the first decoder is closer to the sentence input to the first encoder.
  • updating the parameters of the first alignment encoder and the parameters of the second alignment encoder includes: calculating a semantic alignment loss based on the output of the first alignment encoder and the output of the second alignment encoder, and updating the parameters of the first alignment encoder and the parameters of the second alignment encoder according to the semantic alignment loss.
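  • a minimal sketch of one step of this joint training, assuming mean-squared error for both the semantic alignment loss and the autoencoding losses (the text does not fix the exact loss functions at this point), that the two sentences have already been padded to a common length, and that the optimizer holds only the alignment encoder and alignment decoder parameters (`model_a` and `model_b` are two instances of the hypothetical `LanguageModel` sketched above):

```python
import torch
import torch.nn.functional as F

def joint_training_step(model_a, model_b, sent_a_ids, sent_b_ids, optimizer):
    """One joint update on a pair of synonymous sentences (already padded to length L).
    The pre-trained encoders are frozen; gradients reach only the alignment modules,
    so the optimizer is assumed to hold only alignment encoder/decoder parameters."""
    with torch.no_grad():
        x = model_a.encode(sent_a_ids)   # first vector set X (first decoder's input space)
        y = model_b.encode(sent_b_ids)   # second vector set Y (second decoder's input space)

    x_a = model_a.align_encoder(x)       # third vector set X_a (aligned vector space)
    y_a = model_b.align_encoder(y)       # fourth vector set Y_a
    x_d = model_a.align_decoder(x_a)     # fifth vector set X_d (back to first decoder's space)
    y_d = model_b.align_decoder(y_a)     # sixth vector set Y_d

    semantic_alignment_loss = F.mse_loss(x_a, y_a)   # pull aligned vectors of synonymous sentences together
    autoencoding_loss_a = F.mse_loss(x_d, x)         # reconstruct the first encoder's output
    autoencoding_loss_b = F.mse_loss(y_d, y)         # reconstruct the second encoder's output

    loss = semantic_alignment_loss + autoencoding_loss_a + autoencoding_loss_b
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```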
  • the above translation system also includes a third language model
  • the third language model includes a third encoder, a third alignment encoder, a third alignment decoder, and a third decoder
  • the above method also includes: obtaining the third encoder and the third decoder of the third language, which are trained based on the corpus of the third language;
  • the first language model and the second language model are trained through parallel corpora to obtain the trained first language model and second language model, including:
  • the first language model, the second language model and the third language model are jointly trained through parallel corpus to obtain the trained first language model, second language model and third language model.
  • the parallel corpus includes a collection of synonymous sentences in the first language, the second language and the third language.
  • the third alignment encoder is used to convert the vector output by the third encoder into the alignment vector space;
  • the third alignment decoder is used to convert the output of the third alignment encoder into the vector space corresponding to the input of the third decoder, so that the third decoder decodes the output of the third alignment decoder and outputs a sentence in the third language.
  • the first language model, the second language model and the third language model are jointly trained through parallel corpora, including:
  • the vector corresponding to the first sentence output by the first encoder is encoded by the first alignment encoder
  • the vector corresponding to the second sentence output by the second encoder is encoded by the second alignment encoder
  • the vector corresponding to the third sentence output by the third encoder is encoded by the third alignment encoder, where the first sentence, the second sentence and the third sentence are sentences with the same semantics expressed in the three languages.
  • the output of the third alignment encoder is decoded by the third alignment decoder, and the parameters of the third alignment decoder and the parameters of the third alignment encoder are updated based on the output of the third alignment decoder and the output of the third encoder.
  • when a trained translation system that can translate between the first language and the second language already exists and the above-mentioned translation system that can translate between three languages is then trained, it is not necessary to update the parameters of the first language model and the second language model; only the third language model is updated.
  • the third single language model is pre-trained, and then the third alignment encoder and the third alignment decoder are trained; that is, there is no need to train or update the parameters of the first single language model and the second single language model.
  • after the third single language model is trained, the first language model, the second language model and the third language model are trained through the parallel corpus.
  • in this case, only the semantic alignment loss and the third autoencoding loss are calculated; the parameters of the third alignment encoder are updated based on the semantic alignment loss and the third autoencoding loss, and the parameters of the third alignment decoder are updated based on the third autoencoding loss.
  • this application provides a translation method.
  • the translation method includes: encoding the input first source sentence of the first language through a first encoder, and outputting a vector set corresponding to the first source sentence, where the vectors in the vector set corresponding to the first source sentence belong to the vector space corresponding to the input of the first decoder; converting, through the first alignment encoder, the vectors in the vector set corresponding to the first source sentence into a first feature vector set in the aligned vector space; converting, through the second alignment decoder, the vectors in the first feature vector set into vectors in the vector space corresponding to the input of the second decoder; and decoding, through the second decoder, the vectors output by the second alignment decoder, and outputting the translated sentence in the second language corresponding to the first source sentence.
  • the first alignment encoder can convert the input vector into a vector in the aligned vector space
  • the second alignment decoder can convert the input vector into a vector in the vector space corresponding to the input of the second decoder.
  • the above method further includes: encoding the input second source sentence of the second language through the second encoder, and outputting a vector set corresponding to the second source sentence, where the vectors in the vector set corresponding to the second source sentence belong to the vector space corresponding to the input of the second decoder; converting, through the second alignment encoder, the vectors in the vector set corresponding to the second source sentence into a second feature vector set in the aligned vector space; converting, through the first alignment decoder, the vectors in the second feature vector set into vectors in the vector space corresponding to the input of the first decoder; and decoding, through the first decoder, the vectors in the vector space corresponding to the input of the first decoder, and outputting the translated sentence in the first language corresponding to the second source sentence.
  • the second aligned encoder can convert the input vector into a vector in the aligned vector space
  • the first aligned decoder can convert the input vector into a vector in the vector space corresponding to the input of the first decoder.
  • the translation method provided by this application only needs to combine the encoder and alignment encoder corresponding to the source language with the alignment decoder and decoder corresponding to the target language in the above-trained translation system to obtain a translation model that translates the source language into the target language. For example, to translate language A into language B, one only needs to combine the encoder and alignment encoder corresponding to language A with the alignment decoder and decoder corresponding to language B in the trained translation system to obtain a translation model from language A to language B.
  • the above method further includes: encoding the input third source sentence of the third language through a third encoder, and outputting a vector set corresponding to the third source sentence, where the vectors in the vector set corresponding to the third source sentence belong to the vector space corresponding to the input of the third decoder; converting, through the third alignment encoder, the vectors in the vector set corresponding to the third source sentence into a third feature vector set in the aligned vector space; converting, through the first alignment decoder, the vectors in the third feature vector set into vectors in the vector space corresponding to the input of the first decoder; and decoding, through the first decoder, the vectors in the vector space corresponding to the input of the first decoder, and outputting the translated sentence in the first language corresponding to the third source sentence.
  • the above method further includes: encoding the input fourth source sentence of the first language through the first encoder, and outputting a vector set corresponding to the fourth source sentence, where the vectors in the vector set corresponding to the fourth source sentence belong to the vector space corresponding to the input of the first decoder; converting, through the first alignment encoder, the vectors in the vector set corresponding to the fourth source sentence into a fourth feature vector set in the aligned vector space; converting, through the third alignment decoder, the vectors in the fourth feature vector set into vectors in the vector space corresponding to the input of the third decoder; and decoding, through the third decoder, the vectors in the vector space corresponding to the input of the third decoder, and outputting the translated sentence in the third language corresponding to the fourth source sentence.
  • the present application provides a translation system, which includes a first encoder, a first alignment encoder, a second alignment decoder, and a second decoder, wherein,
  • the first encoder is used to encode the input first source sentence of the first language and output a vector set corresponding to the first source sentence, where the vectors in the vector set corresponding to the first source sentence belong to the vector space corresponding to the input of the first decoder;
  • the first alignment encoder is used to convert the vectors in the vector set corresponding to the first source sentence into the first feature vector set of the aligned vector space;
  • a second alignment decoder configured to convert vectors in the first feature vector set into vectors in the vector space corresponding to the input of the second decoder
  • the second decoder is used to decode the vector in the vector space corresponding to the input of the second decoder, and output the translated sentence in the second language corresponding to the first source sentence.
  • the above translation system further includes a second encoder, a second alignment encoder, a first alignment decoder and a first decoder, wherein,
  • the second encoder is used to encode the input second source sentence of the second language and output a vector set corresponding to the second source sentence, where the vectors in the vector set corresponding to the second source sentence belong to the vector space corresponding to the input of the second decoder;
  • the second alignment encoder is used to convert the vectors in the vector set corresponding to the second source sentence into the second feature vector set of the aligned vector space;
  • a first alignment decoder configured to convert vectors in the second feature vector set into vectors in the vector space corresponding to the input of the first decoder
  • the first decoder is used to decode the vector in the vector space corresponding to the input of the first decoder and output the translated sentence in the first language corresponding to the second source sentence.
  • the above translation system further includes a first alignment decoder, a third encoder, and a third alignment encoder, wherein,
  • the third encoder is used to encode the input third source sentence of the third language and output a vector set corresponding to the third source sentence, where the vectors in the vector set corresponding to the third source sentence belong to the vector space corresponding to the input of the third decoder;
  • the third alignment encoder is used to convert the vectors in the vector set corresponding to the third source sentence into the third feature vector set of the aligned vector space;
  • the first alignment decoder is used to convert the vectors in the third feature vector set into vectors in the vector space corresponding to the input of the first decoder;
  • the first decoder is used to decode the vector in the vector space corresponding to the input of the first decoder, and output the translated sentence in the first language corresponding to the third source sentence.
  • the above translation system also includes a third alignment decoder and a third decoder, wherein,
  • the first encoder is also used to encode the input fourth source sentence of the first language and output a vector set corresponding to the fourth source sentence, where the vectors in the vector set corresponding to the fourth source sentence belong to the vector space corresponding to the input of the first decoder;
  • the first alignment encoder is also used to convert the vectors in the vector set corresponding to the fourth source sentence into the fourth feature vector set of the aligned vector space;
  • the third alignment decoder is used to convert the vectors in the fourth feature vector set into vectors in the vector space corresponding to the input of the third decoder;
  • the third decoder is used to decode the vector in the vector space corresponding to the input of the third decoder, and output the translated sentence in the third language corresponding to the fourth source sentence.
  • this application provides a training device for a translation system.
  • the translation system includes a first language model and a second language model, wherein the first language model includes a first encoder, a first alignment encoder, a first alignment decoder and a first decoder, and the second language model includes a second encoder, a second alignment encoder, a second alignment decoder and a second decoder, and the device includes:
  • the acquisition module is used to acquire the first encoder and the first decoder of the first language, which are trained based on the corpus of the first language, and to acquire the second encoder and the second decoder of the second language, which are trained based on the corpus of the second language, wherein the first encoder is used to encode the sentence expressed in the first language and output the corresponding vector; the first decoder is used to decode the input vector and output the sentence expressed in the first language; the second encoder is used to encode the sentence expressed in the second language and output the corresponding vector; and the second decoder is used to decode the input vector and output the sentence expressed in the second language;
  • the processing module is used to jointly train the first language model and the second language model based on the parallel corpus to obtain the trained first alignment encoder, the trained first alignment decoder, the trained second alignment encoder and the trained second alignment decoder, where the parallel corpus includes a set of synonymous sentences in the first language and the second language;
  • the first alignment encoder is used to convert the vector output by the first encoder into the aligned vector space
  • the second alignment encoder is used to convert the vector output by the second encoder into the aligned vector space
  • the first alignment decoder is used to convert the output of the first alignment encoder into the vector space corresponding to the input of the first decoder, so that the first decoder decodes the output of the first alignment decoder and outputs a sentence in the first language;
  • the second alignment decoder is used to convert the output of the second alignment encoder into the vector space corresponding to the input of the second decoder, so that the second decoder decodes the output of the second alignment decoder and outputs a sentence in the second language.
  • the above processing module is specifically configured to: encode, through the first alignment encoder, the vector corresponding to the first sentence output by the first encoder, and encode, through the second alignment encoder, the vector corresponding to the second sentence output by the second encoder, where the first sentence and the second sentence are sentences with the same semantics expressed in the two languages; update the parameters of the first alignment encoder and the parameters of the second alignment encoder based on the output of the first alignment encoder and the output of the second alignment encoder; decode the output of the first alignment encoder through the first alignment decoder, and update the parameters of the first alignment decoder and the parameters of the first alignment encoder based on the output of the first alignment decoder and the output of the first encoder; and decode the output of the second alignment encoder through the second alignment decoder, and update the parameters of the second alignment decoder and the parameters of the second alignment encoder based on the output of the second alignment decoder and the output of the second encoder.
  • updating the parameters of the first alignment encoder and the parameters of the second alignment encoder based on the output of the first alignment encoder and the output of the second alignment encoder includes: calculating the semantic alignment loss based on the output of the first alignment encoder and the output of the second alignment encoder, and updating the parameters of the first alignment encoder and the parameters of the second alignment encoder based on the semantic alignment loss.
  • the above translation system also includes a third language model
  • the third language model includes a third encoder, a third alignment encoder, a third alignment decoder and a third decoder
  • the above acquisition module is also used to obtain the third encoder and the third decoder of the third language, which are trained based on the corpus of the third language;
  • the above-mentioned processing module is also used to jointly train the first language model, the second language model and the third language model through the parallel corpus to obtain the trained first language model, second language model and third language model, where the parallel corpus includes a collection of synonymous sentences in the first language, the second language and the third language;
  • the third alignment encoder is used to convert the vector output by the third encoder into the aligned vector space;
  • the third alignment decoder is used to convert the output of the third alignment encoder into the vector space corresponding to the input of the third decoder, so that the third decoder decodes the output of the third alignment decoder and outputs a sentence in the third language.
  • the above processing module is specifically configured to: encode, through the first alignment encoder, the vector corresponding to the first sentence output by the first encoder; encode, through the second alignment encoder, the vector corresponding to the second sentence output by the second encoder; and encode, through the third alignment encoder, the vector corresponding to the third sentence output by the third encoder, where the first sentence, the second sentence and the third sentence are sentences with the same semantics expressed in the three languages.
  • the present application provides a computing device, including a processor and a memory; the memory is used to store instructions, and the processor is used to execute the instructions; when the processor executes the instructions, the processor executes the training method in the first aspect or any possible implementation of the first aspect.
  • the present application provides a computing device, including a processor and a memory; the memory is used to store instructions, and the processor is used to execute the instructions; when the processor executes the instructions, the processor executes the translation method in the second aspect or any possible implementation of the second aspect.
  • the present application provides a computer-readable storage medium that stores instructions; when the instructions are run on a server, they cause the server to execute the training method in the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer-readable storage medium that stores instructions; when the instructions are run on a server, they cause the server to execute the translation method in the second aspect or any possible implementation of the second aspect.
  • Figure 1 is a schematic diagram of a translation system provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of a first single language model provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of a translation system training process provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a translation model provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of another translation model provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of another translation system provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of another translation system training process provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of another translation model provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of another translation model provided by an embodiment of the present application.
  • Figure 10 is a schematic flowchart of a training method for a translation system provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of a computing device provided by an embodiment of the present application.
  • Figure 12 is a schematic diagram of a computing device cluster provided by an embodiment of the present application.
  • Figure 1 is a schematic diagram of a translation system provided by an embodiment of the present application.
  • Figure 1 takes the translation system that can realize mutual translation of two languages as an example to introduce the training method of the translation system provided by the present application.
  • the translation system includes a first language model and a second language model.
  • the first language model includes a first encoder, a first aligned encoder, a first aligned decoder, and a first decoder;
  • the second language model includes a second encoder, a second alignment encoder, a second alignment decoder, and a second decoder.
  • the training method for the translation system shown in Figure 1 will be introduced in detail below with reference to the accompanying drawings.
  • the first single language model includes the above first encoder and the above first decoder
  • the second single language model includes the above second encoder and the above second decoder
  • the input sequence is a preprocessed sequence corresponding to a sentence in the training data of the first language; each input sequence includes multiple tokens, and the above preprocessing includes word segmentation. The first encoder encodes the input sequence and outputs a vector set P, in which each token corresponds to one vector. The vector set P is then input to the first decoder.
  • the first decoder decodes each vector in the input vector set and outputs the output sequence corresponding to the above input sequence. The loss function value is calculated from the input sequence and the output sequence, and the parameters of the first single language model are updated based on the loss function value; alternatively, the parameters of the first single language model are updated based on the loss function value of a batch of training data. Based on the training data of the first language, the first single language model is trained through multiple training iterations until it reaches the convergence condition, and the trained first single language model is obtained.
  • similarly, the above-mentioned second single language model is trained through multiple training iterations until the second single language model reaches the convergence condition, and the trained second single language model is obtained.
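  • a minimal sketch of this single language (autoencoder-style) pre-training, assuming token-level cross-entropy between the output sequence and the input sequence (the text only states that a loss function value is calculated from the input and output sequences):

```python
import torch
import torch.nn.functional as F

def pretrain_single_language_model(model, dataloader, optimizer, num_epochs: int = 10):
    """Pre-train one single language model: the encoder output is fed directly to the
    decoder, and the model learns to reproduce its own input sequence."""
    for _ in range(num_epochs):
        for token_ids in dataloader:               # token_ids: LongTensor [batch, seq_len]
            vectors = model.encode(token_ids)      # vector set P
            logits = model.decode(vectors)         # [batch, seq_len, vocab_size]
            loss = F.cross_entropy(logits.transpose(1, 2), token_ids)  # compare output with input sequence
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```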
  • a first alignment encoder and a first alignment decoder are added between the first encoder and the first decoder of the first single language model, and a second alignment encoder and a second alignment decoder are added between the second encoder and the second decoder of the second single language model, to obtain the above translation system. Then, the first alignment encoder, the first alignment decoder, the second alignment encoder and the second alignment decoder in the translation system are trained based on the parallel corpus, and a trained translation system capable of mutual translation between the two languages is obtained.
  • the parallel corpus includes a collection of synonymous sentences in the first language and the second language, that is, the parallel corpus includes sentences in the first language and sentences in the second language, and each sentence in the first language corresponds to a translation in the second language with the same semantics.
  • first the first sentence in the first language in the parallel corpus is input into the first encoder
  • the second sentence in the second language in the parallel corpus is input into the second encoder.
  • the first sentence and the second sentence are expressions in two languages of sentences with the same semantics.
  • the first encoder encodes the first statement and outputs the first vector set X corresponding to the first statement.
  • the second encoder encodes the second statement and outputs the second vector set Y corresponding to the second statement.
  • the vectors in the first vector set X belong to the vector space corresponding to the input of the first decoder
  • the vectors in the second vector set Y belong to the vector space corresponding to the input of the second decoder.
  • the input and output of the first encoder and the second encoder can be expressed as the following formula (Formula 1): $\{X_1, X_2, \ldots, X_n\} = \mathrm{Enc}(\{x_1, x_2, \ldots, x_n\})$ and $\{Y_1, Y_2, \ldots, Y_m\} = \mathrm{Enc}(\{y_1, y_2, \ldots, y_m\})$, where Enc represents the encoding operation performed by the first encoder and the second encoder respectively.
  • $\{x_1, x_2, \ldots, x_n\}$ represents the first sequence corresponding to the first sentence, and $\{y_1, y_2, \ldots, y_m\}$ represents the second sequence corresponding to the second sentence; that is, the first sequence includes n tokens and the second sequence includes m tokens. $\{X_1, X_2, \ldots, X_n\}$ represents the first vector set X, and $\{Y_1, Y_2, \ldots, Y_m\}$ represents the second vector set Y.
  • the n tokens in the first sequence correspond one-to-one to the n vectors in the first vector set X, and the m tokens in the second sequence correspond one-to-one to the m vectors in the second vector set Y; that is, $X_1$ is the feature vector corresponding to $x_1$, $X_i$ is the feature vector corresponding to $x_i$, $Y_1$ is the feature vector corresponding to $y_1$, and $Y_j$ is the feature vector corresponding to $y_j$. $X_i$ and $Y_j$ are vectors with the same dimension, i is a positive integer less than or equal to n, and j is a positive integer less than or equal to m.
  • because the first sentence and the second sentence are sentences in different languages and are encoded by different encoders to obtain the corresponding vector sets, the vectors in the first vector set X and the second vector set Y belong to different vector spaces.
  • after the first encoder outputs the first vector set X and the second encoder outputs the second vector set Y, an alignment operation is performed on the first vector set X and the second vector set Y to obtain the vector set $X_L = \{X_1, X_2, \ldots, X_n, X_{n+1}, \ldots, X_L\}$ and the vector set $Y_L = \{Y_1, Y_2, \ldots, Y_m, Y_{m+1}, \ldots, Y_L\}$, so that the vector set X_L and the vector set Y_L have the same number of vectors. For example, a padding operation is performed on the vector set X and the vector set Y respectively, converting both into sets with L feature vectors.
  • the padding operation can be to supplement L−n all-zero vectors after the first vector set X, and to supplement L−m all-zero vectors after the second vector set Y; L is greater than or equal to n, and L is greater than or equal to m.
  • L can be set according to the training data in the parallel corpus. For example, L can be the number of tokens included in the longest sequence in the parallel corpus; L can also be the number of tokens included in the longest sequence that can be encoded or decoded by the first single language model or the second single language model, that is, the longest sequence that a single language model allows to be input at one time.
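  • as a concrete illustration of this alignment (padding) operation, a small sketch assuming each vector set is stored as a tensor of shape [sequence_length, d_model]:

```python
import torch

def pad_to_length(vector_set: torch.Tensor, L: int) -> torch.Tensor:
    """Append L - seq_len all-zero vectors after the vector set so that the vector
    sets of both languages contain exactly L vectors."""
    seq_len, d_model = vector_set.shape
    if seq_len >= L:                      # by construction L >= seq_len; guard kept for safety
        return vector_set[:L]
    padding = torch.zeros(L - seq_len, d_model, dtype=vector_set.dtype)
    return torch.cat([vector_set, padding], dim=0)

# usage: X has n vectors and Y has m vectors; both become sets of L vectors
# X_L = pad_to_length(X, L)
# Y_L = pad_to_length(Y, L)
```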
  • the vector set X L is input to the first alignment encoder
  • the vector set Y L is input to the second alignment encoder.
  • the first alignment encoder converts the vectors in the vector set X_L into the aligned vector space, and the second alignment encoder converts the vectors in the vector set Y_L into the aligned vector space.
  • the first alignment encoder recodes each vector in the vector set X L to obtain the third vector set X a
  • the second alignment encoder recodes each vector in the vector set Y L to obtain the fourth vector set Y a .
  • the vectors in the third vector set X_a and the vectors in the fourth vector set Y_a are all vectors in the aligned vector space.
  • the operations performed by the first alignment encoder and the second alignment encoder can be expressed as follows (Formula 2): $X_a = \mathrm{AliEnc}(X_L)$, $Y_a = \mathrm{AliEnc}(Y_L)$, where AliEnc represents the encoding operation performed by the first alignment encoder and the second alignment encoder respectively, $X_a$ represents the third vector set and $Y_a$ represents the fourth vector set; the vectors in the vector set X_L correspond one-to-one to the vectors in the third vector set X_a, and the vectors in the vector set Y_L correspond one-to-one to the vectors in the fourth vector set Y_a.
  • the semantic alignment loss $L_a$ is calculated according to the third vector set X_a and the fourth vector set Y_a, and the semantic alignment loss is used to update the parameters of the first alignment encoder and the second alignment encoder, where the semantic alignment loss satisfies the following (Formula 3).
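  • Formula 3 itself is not reproduced in this text; purely as an illustrative assumption, a semantic alignment loss of this kind is often a mean distance between corresponding aligned vectors, for example:

```latex
% Hypothetical form only; Formula 3 is not recoverable from this text.
% Mean squared distance between corresponding aligned vectors of the two languages:
L_a = \frac{1}{L} \sum_{i=1}^{L} \left\| X_a^{(i)} - Y_a^{(i)} \right\|_2^2
```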
  • the above-mentioned third vector set X_a is input into the first alignment decoder, and the fourth vector set Y_a is input into the second alignment decoder.
  • the first alignment decoder converts the output of the first alignment encoder into the vector space corresponding to the input of the first decoder.
  • the second alignment decoder converts the output of the second alignment encoder into the vector space corresponding to the input of the second decoder.
  • the first alignment decoder converts each vector in the third vector set X a to obtain the fifth vector set X d
  • the second alignment decoder converts each vector in the fourth vector set Y_a to obtain the sixth vector set Y_d.
  • the operations performed by the first alignment decoder and the second alignment decoder can be expressed as follows (Formula 4): $X_d = \mathrm{AliDec}(X_a)$, $Y_d = \mathrm{AliDec}(Y_a)$, where AliDec represents the decoding operation performed by the first alignment decoder and the second alignment decoder respectively.
  • the purpose of calculating the first autoencoding loss corresponding to the first language model is to update the parameters of the first alignment encoder and the first alignment decoder in the first language model, so that the difference between the vector set output by the first alignment decoder and the vector set output by the first encoder is as small as possible;
  • the purpose of calculating the second autoencoding loss corresponding to the second language model is to update the parameters of the second alignment encoder and the second alignment decoder in the second language model, so that the difference between the vector set output by the second alignment decoder and the vector set output by the second encoder is as small as possible.
  • the first vector set X output by the first encoder of the first language model only includes n vectors
  • the second vector set Y output by the second encoder of the second language model only includes m vectors.
  • therefore, the L vectors in the fifth vector set X_d output by the first alignment decoder and the L vectors in the sixth vector set Y_d output by the second alignment decoder are processed: the vectors at the target positions of the fifth vector set X_d and the vectors at the target positions of the sixth vector set Y_d are deleted, where the target position refers to the position of a vector added during the above alignment operation.
  • for example, if the above alignment operation supplements L−n all-zero vectors after the first vector set X, then the target positions of the above fifth vector set X_d are the last L−n positions.
  • after the vectors at the target positions are deleted, the vector set X_r and the vector set Y_r are obtained.
  • the first autoencoding loss corresponding to the first language model is calculated based on the first vector set X and the vector set X r
  • the second autoencoding loss corresponding to the second language model is calculated based on the second vector set Y and the vector set Y r .
  • the calculation method of the first autoencoding loss $L_{sx}$ corresponding to the first language model and the second autoencoding loss $L_{sy}$ corresponding to the second language model is as follows (Formula 5).
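  • Formula 5 is likewise not reproduced here; purely as an illustrative assumption, an autoencoding loss of this kind is often a mean reconstruction error between the encoder output and the recovered vector set, for example:

```latex
% Hypothetical form only; Formula 5 is not recoverable from this text.
% Reconstruction error between each encoder output and the corresponding recovered vector set:
L_{sx} = \frac{1}{n} \sum_{i=1}^{n} \left\| X^{(i)} - X_r^{(i)} \right\|_2^2 ,
\qquad
L_{sy} = \frac{1}{m} \sum_{j=1}^{m} \left\| Y^{(j)} - Y_r^{(j)} \right\|_2^2
```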
  • the semantic alignment loss and autoencoding loss (including the first autoencoding loss and the second autoencoding loss) corresponding to a sentence of parallel corpus can be calculated by the above method.
  • the first alignment encoder and the first alignment decoder in the first language model, and the second alignment encoder and the second alignment decoder in the second language model, are trained through multiple training iterations until the first language model and the second language model reach the convergence condition, and the trained first language model and the trained second language model are obtained.
  • a trained first language model and a trained second language model are obtained, that is, a trained translation system that can realize mutual translation between two languages is obtained.
  • the above-mentioned first encoder, first alignment encoder, second alignment decoder and second decoder are combined into the first-language-to-second-language translation model shown in Figure 4.
  • the first source sentence expressed in the first language is input to the first encoder.
  • the first encoder encodes the first source sentence and outputs a vector set corresponding to the first source sentence.
  • each vector in the vector set output by the first encoder is a vector in the vector space corresponding to the input of the first decoder.
  • the vector set corresponding to the first source sentence is input to the first alignment encoder, and the first alignment encoder maps each vector in the vector set corresponding to the first source sentence to the alignment vector space to obtain the first feature vector set.
  • the first feature vector set is then input to the second alignment decoder, and the second alignment decoder maps the vectors in the alignment vector space in the first feature vector set to the vector space corresponding to the input of the second decoder.
  • the vectors in the vector space corresponding to the input of the second decoder are input to the second decoder for decoding, and the translated sentence in the second language obtained by translating the first source sentence is output.
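  • the following sketch chains these four modules into a first-language-to-second-language translation path, reusing the hypothetical `LanguageModel` modules from the earlier sketches (greedy token selection via argmax is an illustrative assumption; the text does not specify the decoding strategy):

```python
import torch

@torch.no_grad()
def translate_first_to_second(model_first, model_second, source_token_ids):
    """Translate a first-language sentence into the second language by chaining:
    first encoder -> first alignment encoder -> second alignment decoder -> second decoder."""
    x = model_first.encode(source_token_ids)       # first decoder's input space
    x_a = model_first.align_encoder(x)             # aligned vector space
    y_like = model_second.align_decoder(x_a)       # second decoder's input space
    logits = model_second.decode(y_like)           # logits over the second-language vocabulary
    return logits.argmax(dim=-1)                   # hypothetical greedy token choice
```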
  • the above-mentioned second encoder, second alignment encoder, first alignment decoder and first decoder are combined into the second-language-to-first-language translation model shown in Figure 5.
  • the second source sentence expressed in the second language is input to the second encoder.
  • the second encoder encodes the second source sentence and outputs a vector set corresponding to the second source sentence.
  • each vector in the vector set output by the second encoder is a vector in the vector space corresponding to the input of the second decoder.
  • the vector set corresponding to the second source sentence is input to the second alignment encoder, and the second alignment encoder maps each vector in the vector set corresponding to the second source sentence to the alignment vector space to obtain a second feature vector set.
  • the second feature vector set is then input to the first alignment decoder, and the first alignment decoder maps the vectors in the alignment vector space in the second feature vector set to the vector space corresponding to the input of the first decoder.
  • the vector in the vector space corresponding to the input of the first decoder is input to the first decoder for decoding, and the translated sentence in the first language corresponding to the second source sentence obtained after translating the second source sentence is output.
  • the translation system when it is necessary to train a translation system that can perform mutual translation between three languages, the translation system includes a first language model, a second language model, and a third language model.
  • the first language model includes a first encoder, a first aligned encoder, a first aligned decoder, and a first decoder;
  • the second language model includes a second encoder, a second alignment encoder, a second alignment decoder, and a second decoder;
  • the third language model includes a third encoder, a third aligned encoder, a third aligned decoder and a third decoder.
  • the third single language model includes the above-mentioned third encoder and third decoder.
  • the above-mentioned third single language model is trained through multiple training iterations until the third single language model reaches the convergence condition, and the trained third single language model is obtained.
  • after pre-training the first single language model, the second single language model and the third single language model to obtain the trained first single language model, the trained second single language model and the trained third single language model, a first alignment encoder and a first alignment decoder are added between the first encoder and the first decoder of the first single language model, a second alignment encoder and a second alignment decoder are added between the second encoder and the second decoder of the second single language model, and a third alignment encoder and a third alignment decoder are added between the third encoder and the third decoder of the third single language model, to obtain a translation system that can translate between the three languages.
  • the parallel corpus is a collection of synonymous sentences including the first language, the second language and the third language, that is, the parallel corpus includes sentences in the first language, sentences in the second language and sentences in the third language.
  • each sentence in the first language corresponds to a translation in the second language with the same semantics, and also corresponds to a translation in the third language with the same semantics.
  • the first sentence, the second sentence and the third sentence are expressions of sentences with the same semantics in three languages.
  • the processing process of the first sentence by the first language model is the same as the above-mentioned processing process when training a translation system for mutual translation between two languages.
  • the processing process of the second sentence by the second language model is the same as the above-mentioned processing process when training the translation system for mutual translation between two languages.
  • the semantic alignment loss $L_a$ is calculated according to the third vector set X_a, the fourth vector set Y_a and the eighth vector set Z_a, and the semantic alignment loss is used to update the parameters of the first alignment encoder, the second alignment encoder and the third alignment encoder.
  • the semantic alignment loss satisfies the following (Formula 6).
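  • as with Formula 3, Formula 6 is not reproduced in this text; under the same illustrative assumption of a mean pairwise distance, a three-language semantic alignment loss could take a form such as:

```latex
% Hypothetical three-language extension only; Formula 6 is not recoverable from this text.
% Sum of pairwise distances between the three aligned vector sets:
L_a = \frac{1}{L} \sum_{i=1}^{L} \Big(
        \big\| X_a^{(i)} - Y_a^{(i)} \big\|_2^2
      + \big\| X_a^{(i)} - Z_a^{(i)} \big\|_2^2
      + \big\| Y_a^{(i)} - Z_a^{(i)} \big\|_2^2 \Big)
```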
  • the eighth vector set Z a is obtained through the third alignment encoder
  • the above-mentioned eighth vector set Z a is input to the third alignment decoder
  • the third alignment decoder converts each vector in the eighth vector set Z_a to obtain the ninth vector set Z_d, that is, the vectors in the aligned vector space are mapped to the vector space corresponding to the input of the third decoder.
  • the target position refers to the position of the vector added when performing the alignment operation on the seventh vector set Z.
  • if the above-mentioned alignment operation supplements L−t all-zero vectors after the seventh vector set Z, then the target positions of the above-mentioned ninth vector set Z_d are the last L−t positions.
  • after the vectors at the target positions are deleted, the vector set Z_r is obtained, and the third autoencoding loss corresponding to the third language model is calculated according to the seventh vector set Z and the vector set Z_r.
  • the calculation method of the first autoencoding loss L sx corresponding to the first language model and the second autoencoding loss L sy corresponding to the second language model is as shown in the above (Formula 5).
  • the calculation method of the third autoencoding loss $L_{sz}$ corresponding to the third language model is as follows (Formula 7).
  • the semantic alignment loss and autoencoding loss corresponding to a set of parallel corpora can be calculated by the above method.
  • Autoencoding loss and tertiary autoencoding loss can be calculated by the above method.
  • the first alignment encoder and the first alignment decoder in the first language model, the second alignment encoder and the second alignment decoder in the second language model, and the third alignment encoder and the third alignment decoder in the third language model are trained through multiple training iterations until the first language model, the second language model and the third language model reach the convergence condition, and the trained first language model, the trained second language model and the trained third language model are obtained.
  • alternatively, when a trained translation system that can translate between the first language and the second language already exists, the first language model and the second language model do not need to be updated.
  • in that case, the third single language model is pre-trained, and then the third alignment encoder and the third alignment decoder are trained; that is, there is no need to train the first single language model and the second single language model.
  • after the third single language model is pre-trained, the first language model, the second language model and the third language model are trained through the parallel corpus.
  • the parameters of the first alignment encoder are updated according to the semantic alignment loss and the first autoencoding loss, and the parameters of the first alignment decoder are updated according to the first autoencoding loss; the parameters of the second alignment encoder are updated according to the semantic alignment loss and the second autoencoding loss, and the parameters of the second alignment decoder are updated according to the second autoencoding loss; at the same time, the parameters of the third alignment encoder are updated according to the semantic alignment loss and the third autoencoding loss, and the parameters of the third alignment decoder are updated according to the third autoencoding loss.
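  • a small sketch of how these per-module update rules can fall out of ordinary backpropagation, assuming the hypothetical `LanguageModel` modules and MSE losses from the earlier sketches and sentences padded to a common length: the semantic alignment loss depends only on the alignment encoders, while each autoencoding loss depends on that language's alignment encoder and alignment decoder, so a single summed loss reproduces the described updates when only the alignment modules are passed to the optimizer.

```python
import torch
import torch.nn.functional as F

def three_language_training_step(model_x, model_y, model_z, ids_x, ids_y, ids_z, optimizer):
    """One joint step over a synonymous sentence triple (all padded to length L).
    The semantic alignment loss only reaches the alignment encoders, and each
    autoencoding loss reaches that language's alignment encoder and alignment decoder,
    which reproduces the per-module update rule described in the text."""
    with torch.no_grad():                                  # pre-trained encoders stay frozen
        x, y, z = model_x.encode(ids_x), model_y.encode(ids_y), model_z.encode(ids_z)

    x_a, y_a, z_a = model_x.align_encoder(x), model_y.align_encoder(y), model_z.align_encoder(z)
    x_d, y_d, z_d = model_x.align_decoder(x_a), model_y.align_decoder(y_a), model_z.align_decoder(z_a)

    semantic_alignment = F.mse_loss(x_a, y_a) + F.mse_loss(x_a, z_a) + F.mse_loss(y_a, z_a)
    autoencoding = F.mse_loss(x_d, x) + F.mse_loss(y_d, y) + F.mse_loss(z_d, z)

    loss = semantic_alignment + autoencoding
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

  • to realize the incremental case described earlier (an existing two-language system plus a newly added third language), the optimizer would simply be built only over the parameters of the third alignment encoder and the third alignment decoder, leaving the other alignment modules untouched.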
  • the trained first language model, the trained second language model and the trained third language model are obtained, that is, a trained translation system that can realize mutual translation between the three languages is obtained.
  • the method of realizing mutual translation between the first language and the second language may refer to the descriptions related to Figures 4 and 5 above. If you want to translate the first language into the second language, you can refer to the description related to Figure 4 above to implement the translation from the first language to the second language. If you want to translate the second language into the first language, you can refer to the description related to Figure 5 above to implement the translation from the second language to the first language. It should be understood that a translation system that can realize mutual translation between three languages can also realize mutual translation between the first language and the third language, and the mutual translation between the second language and the third language.
  • the method for realizing mutual translation between the first language and the third language and the mutual translation between the second language and the third language is similar to the above-mentioned method for realizing mutual translation between the first language and the second language.
  • the above-mentioned third encoder, third alignment encoder, first alignment decoder and first decoder are combined into the third-language-to-first-language translation model shown in Figure 8.
  • the third source sentence expressed in the third language is input to the third encoder.
  • the third encoder encodes the above third source sentence and outputs a vector set corresponding to the third source sentence.
  • each vector in the vector set output by the third encoder is a vector in the vector space corresponding to the input of the third decoder.
  • the vector set corresponding to the third source sentence is input to the third alignment encoder, and the third alignment encoder maps each vector in the vector set corresponding to the third source sentence to the alignment vector space to obtain a third feature vector set.
  • the third feature vector set is then input to the first alignment decoder, and the first alignment decoder maps the vectors in the alignment vector space in the third feature vector set to the vector space corresponding to the input of the first decoder.
  • the vector in the vector space corresponding to the input of the first decoder is input to the first decoder for decoding, and the translated sentence in the first language corresponding to the third source sentence obtained after translating the third source sentence is output.
  • the above-mentioned first encoder, first alignment encoder, third alignment decoder and third decoder are combined into the first-language-to-third-language translation model shown in Figure 9.
  • the fourth source sentence expressed in the first language is input to the first encoder.
  • the first encoder encodes the fourth source sentence and outputs a vector set corresponding to the fourth source sentence.
  • each vector in the vector set output by the first encoder is a vector in the vector space corresponding to the input of the first decoder.
  • the vector set corresponding to the fourth source sentence is input to the first alignment encoder, and the first alignment encoder maps each vector in the vector set corresponding to the fourth source sentence to the alignment vector space to obtain a fourth feature vector set.
  • the fourth feature vector set is then input to the third alignment decoder, and the third alignment decoder maps the vectors in the alignment vector space in the fourth feature vector set to the vector space corresponding to the input of the third decoder.
  • the vectors in the vector space corresponding to the input of the third decoder are input to the third decoder for decoding, and the third decoder outputs the translated sentence in the third language corresponding to the fourth source sentence, as sketched below.
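  • The composition described above can be pictured with the following minimal Python/PyTorch sketch. It is an illustration only, not the patent's implementation: the class and argument names are assumptions, and each trained component is assumed to behave as a callable module mapping between the vector spaces described above.

```python
from torch import nn

class ComposedTranslator(nn.Module):
    """Chains a source-side encoder and alignment encoder with a target-side alignment
    decoder and decoder, in the spirit of Figures 8 and 9 (illustrative names only)."""

    def __init__(self, src_encoder, src_align_encoder, tgt_align_decoder, tgt_decoder):
        super().__init__()
        self.src_encoder = src_encoder              # e.g. the third encoder
        self.src_align_encoder = src_align_encoder  # e.g. the third alignment encoder
        self.tgt_align_decoder = tgt_align_decoder  # e.g. the first alignment decoder
        self.tgt_decoder = tgt_decoder              # e.g. the first decoder

    def forward(self, source_sentence):
        src_vectors = self.src_encoder(source_sentence)   # source decoder-input space
        aligned = self.src_align_encoder(src_vectors)     # shared alignment vector space
        tgt_vectors = self.tgt_align_decoder(aligned)     # target decoder-input space
        return self.tgt_decoder(tgt_vectors)              # sentence in the target language

# Third language -> first language (Figure 8):
#   translator = ComposedTranslator(third_encoder, third_align_encoder,
#                                   first_align_decoder, first_decoder)
# First language -> third language (Figure 9):
#   translator = ComposedTranslator(first_encoder, first_align_encoder,
#                                   third_align_decoder, third_decoder)
```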
  • the training method of the translation system is introduced below with reference to Figure 10.
  • the translation system includes a first language model and a second language model, where the first language model includes a first encoder, a first alignment encoder, a first alignment decoder and a first decoder, and the second language model includes a second encoder, a second alignment encoder, a second alignment decoder and a second decoder.
  • the method includes:
  • the first encoder is used to encode the sentence expressed in the first language and output the corresponding vector;
  • the first decoder is used to decode the input vector and output the sentence expressed in the first language;
  • the second encoder is used to encode the sentence expressed in the second language and output the corresponding vector.
  • the second decoder is used to decode the input vector and output the sentence expressed in the second language.
  • the first single language model is trained using the training data of the first language to obtain a trained first encoder and a trained first decoder.
  • the second single-language model is trained using the training data of the second language to obtain a trained second encoder and a trained second decoder.
  • the training process of training the first single-language model using the training data of the first language and training the second single-language model using the training data of the second language can refer to the description related to Figure 2 above, and will not be repeated here.
  • a first alignment encoder and a first alignment decoder are added between the first encoder and the first decoder, and a second alignment encoder and a second alignment decoder are added between the second encoder and the second decoder, to obtain the above-mentioned first language model and second language model.
  • the first language model and the second language model are trained through parallel corpora, and a well-trained translation system that can realize mutual translation between the two languages is obtained.
  • the training process of training the first language model and the second language model through parallel corpora can refer to the description related to Figure 3 above, and will not be described again here.
  • the encoders and decoders included in the first language model and the second language model can be combined as shown in Figure 4 and Figure 5, to realize translation from the first language to the second language or from the second language to the first language.
  • the translation system training method provided by this application can also be used to train a translation system that can realize mutual translation of three or more languages.
  • for the training method of a translation system that can realize mutual translation among three languages, refer to the relevant descriptions of Figures 6 and 7 above, which will not be repeated here.
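  • The training flow of Figure 10 can be summarized with the following hedged Python sketch; all names are illustrative placeholders, and the loss functions and optimizers are assumptions rather than the patent's specific choices.

```python
def pretrain_single_language_model(encoder, decoder, monolingual_corpus, optimizer,
                                   reconstruction_loss):
    """Pretraining of one single-language model (cf. Figure 2): the decoder must
    reconstruct every input sentence from the encoder's output. All names here are
    illustrative placeholders, not the patent's."""
    for batch in monolingual_corpus:
        optimizer.zero_grad()
        loss = reconstruction_loss(decoder(encoder(batch)), batch)
        loss.backward()
        optimizer.step()

# Outline of the flow of Figure 10 (illustrative, under the same assumptions):
#   1. pretrain the first encoder/decoder on first-language data with the function above;
#   2. pretrain the second encoder/decoder on second-language data in the same way;
#   3. insert an alignment encoder and an alignment decoder between each encoder/decoder pair;
#   4. train the alignment modules on the parallel corpus using the semantic alignment loss
#      and the autoencoding losses described in the embodiments.
```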
  • FIG 11 is a schematic diagram of a computing device provided by an embodiment of the present application.
  • the computing device 11 includes: one or more processors 110, a communication interface 111, and a memory 112.
  • the processor 110, the communication interface 111 and memory 112 are connected to each other via bus 113, where,
  • the processor 110 can have a variety of specific implementation forms.
  • the processor 110 can be a central processing unit (CPU) or a graphics processing unit (GPU).
  • the processor 110 can also be a single-core processor or a multi-core processor.
  • the processor 110 may be a combination of a CPU and a hardware chip.
  • the above-mentioned hardware chip can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
  • the above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a general array logic (GAL) or any combination thereof.
  • the processor 110 can also be implemented solely using a logic device with built-in processing logic, such as an FPGA or a digital signal processor (DSP).
  • the communication interface 111 can be a wired interface or a wireless interface, used to communicate with other modules or devices.
  • the wired interface can be an Ethernet interface, a local interconnect network (LIN), etc.
  • the wireless interface can be a cellular network interface, a wireless local area network interface, etc.
  • the communication interface 111 can be specifically used to receive training data, parallel corpus, etc. uploaded by user equipment.
  • the memory 112 may be a non-volatile memory, such as a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM) or a flash memory.
  • the memory 112 may also be a volatile memory, and the volatile memory may be a random access memory (RAM), which is used as an external cache.
  • by way of example rather than limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced synchronous DRAM (ESDRAM), synchronous link DRAM (SLDRAM) and direct rambus RAM.
  • the memory 112 can also be used to store program code and data, so that the processor 110 calls the program code stored in the memory 112 to execute the method of training the translation system or the method of applying the translation system for translation in the above method embodiments. In addition, the computing device 11 may contain more or fewer components than shown in Figure 11, or have its components configured differently.
  • the bus 113 may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX) bus, etc.
  • the bus 113 can be divided into an address bus, a data bus, a control bus, etc.
  • the bus 113 may also include a power bus, a control bus, a status signal bus, etc.
  • for ease of illustration, only one thick line is used to represent the bus in Figure 11, but this does not mean that there is only one bus or only one type of bus.
  • the computing device 11 may also include an input/output interface 114 connected with an input/output device for receiving input information and outputting operation results.
  • this application also provides a computing device cluster as shown in Figure 12, where the computing device cluster includes a plurality of computing devices 11.
  • a communication path is established between each of the above computing devices 11 through a communication network.
  • each computing device 11 runs any one or more of the first encoder, the first alignment encoder, the first alignment decoder, the first decoder, the second encoder, the second alignment encoder, the second alignment decoder and the second decoder.
  • the first computing device 11 runs the first encoder, the first alignment encoder, the first alignment decoder and the first decoder; the second computing device 11 runs the second encoder, the second alignment encoder, the second alignment decoder and the second decoder.
  • Any computing device 11 may be a computer (for example, a server) in a cloud environment, a computer in an edge data center, or a terminal computing device.
  • embodiments of the present application also provide a computer-readable storage medium in which instructions are stored; when the instructions are run on a processor, the method steps in the above method embodiments can be implemented. For the specific implementation of the above method steps performed by the processor of the computer that reads the storage medium, reference may be made to the specific operations shown in Figure 3 of the above method embodiments, which will not be described again here.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired (such as coaxial cable, optical fiber or digital subscriber line) or wireless (such as infrared, radio or microwave) manner.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or a data center that contains one or more sets of available media.
  • the usable media may be magnetic media (e.g., floppy disks, hard disks, tapes), optical media, or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

Provided in the present application are a translation system, training and application methods therefor, and a related device. The translation system comprises a first language model and a second language model, wherein the first language model comprises a first encoder, a first alignment encoder, a first alignment decoder and a first decoder, and the second language model comprises a second encoder, a second alignment encoder, a second alignment decoder and a second decoder. The first language model and the second language model are trained by means of a parallel corpus, so as to obtain a trained first language model and a trained second language model; mutual translation between a first language and a second language can then be realized by combining the encoders and the decoders in the trained first language model and the trained second language model. By using the translation system provided in the present application, the training efficiency of the translation system can be improved, computing resources can be saved, and the scalability of the translation system can be improved.

Description

A translation system and training and application methods therefor, and related device
This application claims priority to Chinese Patent Application No. 202210244066.X, filed with the China Patent Office on March 11, 2022 and entitled "A translation system and training and application methods therefor, and related device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of machine translation, and in particular, to a translation system, training and application methods therefor, and a related device.
Background
Machine translation refers to translating a sentence in one language into a sentence with the same meaning in another language. Machine translation includes rule-based machine translation, statistics-based machine translation and neural network-based machine translation. Neural network-based machine translation has risen rapidly in recent years; compared with rule-based and statistics-based machine translation, its models are simpler to construct and its translations are more accurate. Therefore, how to provide a neural network-based machine translation system is a technical problem that urgently needs to be solved.
Summary of the Invention
This application provides a translation system, training and application methods therefor, and a related device, which can improve the training efficiency of the translation system, save computing resources, and improve the scalability of the translation system.
In a first aspect, this application provides a training method for a translation system. The translation system includes a first language model and a second language model, where the first language model includes a first encoder, a first alignment encoder, a first alignment decoder and a first decoder, and the second language model includes a second encoder, a second alignment encoder, a second alignment decoder and a second decoder. The method includes:
obtaining a first encoder and a first decoder of the first language trained based on a corpus of the first language, and obtaining a second encoder and a second decoder of the second language trained based on a corpus of the second language; where the first encoder is used to encode a sentence expressed in the first language and output a corresponding vector, the first decoder is used to decode an input vector and output a sentence expressed in the first language, the second encoder is used to encode a sentence expressed in the second language and output a corresponding vector, and the second decoder is used to decode an input vector and output a sentence expressed in the second language;
jointly training the first language model and the second language model based on a parallel corpus to obtain a trained first alignment encoder, a trained first alignment decoder, a trained second alignment encoder and a trained second alignment decoder, where the parallel corpus includes a set of synonymous sentences in the first language and the second language;
where the first alignment encoder is used to convert the vector output by the first encoder into an alignment vector space, and the second alignment encoder is used to convert the vector output by the second encoder into the alignment vector space; the first alignment decoder is used to convert the output of the first alignment encoder into the vector space corresponding to the input of the first decoder, so that the first decoder decodes the output of the first alignment decoder and outputs a sentence in the first language; and the second alignment decoder is used to convert the output of the second alignment encoder into the vector space corresponding to the input of the second decoder, so that the second decoder decodes the output of the second alignment decoder and outputs a sentence in the second language.
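As an illustration of the four-component structure just described, the following minimal Python/PyTorch sketch groups one language's encoder, alignment encoder, alignment decoder and decoder into a single container. The class and method names are illustrative assumptions and are not taken from the patent; each component is assumed to be an nn.Module mapping between the vector spaces described above.

```python
from torch import nn

class LanguageModel(nn.Module):
    """Hypothetical container for the four components of one language model."""

    def __init__(self, encoder, align_encoder, align_decoder, decoder):
        super().__init__()
        self.encoder = encoder              # sentence -> vectors in the decoder-input space
        self.align_encoder = align_encoder  # decoder-input space -> shared alignment space
        self.align_decoder = align_decoder  # shared alignment space -> decoder-input space
        self.decoder = decoder              # decoder-input space -> sentence

    def encode_aligned(self, sentence):
        """Map a sentence of this language into the shared alignment vector space."""
        return self.align_encoder(self.encoder(sentence))

    def decode_aligned(self, aligned_vectors):
        """Map vectors from the shared alignment space back to a sentence of this language."""
        return self.decoder(self.align_decoder(aligned_vectors))
```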
By building the translation system provided by this application, when mutual translation among multiple languages needs to be realized, it is only necessary to first train the single-language model corresponding to each language separately; then add an alignment encoder and an alignment decoder between the encoder and the decoder of each single-language model to construct the translation system provided by this application; and finally train the constructed translation system with parallel corpora, so as to obtain a trained translation system that can realize mutual translation among the multiple languages. When translating, it is then only necessary to combine the encoder and alignment encoder corresponding to the source language with the alignment decoder and decoder corresponding to the target language in the trained translation system to obtain a translation model that translates the source language into the target language. For example, to realize mutual translation between language A and language B, the encoder and alignment encoder corresponding to language A can be combined with the alignment decoder and decoder corresponding to language B in the trained translation system to obtain a model that translates language A into language B; and the encoder and alignment encoder corresponding to language B can be combined with the alignment decoder and decoder corresponding to language A to obtain a model that translates language B into language A.
In addition, after a system that can realize mutual translation between the first language and the second language has been trained, if a third language is to be added to realize mutual translation among the three languages, it is only necessary to separately train the single-language model corresponding to the third language and then jointly train the first language model, the second language model and the third language model together to obtain a model that can realize mutual translation among the three languages, without separately training a model for translating the first language into the third language, a model for translating the third language into the first language, a model for translating the second language into the third language, and a model for translating the third language into the second language. Therefore, when the translation model provided by this application is extended, the number of models that need to be trained can be reduced, the training efficiency can be improved, and the scalability of the system can be improved.
In a possible implementation, the above joint training of the first language model and the second language model based on the parallel corpus includes: encoding, by the first alignment encoder, the vector corresponding to a first sentence output by the first encoder, and encoding, by the second alignment encoder, the vector corresponding to a second sentence output by the second encoder, where the first sentence and the second sentence are sentences with the same semantics expressed in the two languages; updating the parameters of the first alignment encoder and the parameters of the second alignment encoder based on the output of the first alignment encoder and the output of the second alignment encoder; decoding the output of the first alignment encoder by the first alignment decoder, and updating the parameters of the first alignment decoder and the parameters of the first alignment encoder based on the output of the first alignment decoder and the output of the first encoder; and decoding the output of the second alignment encoder by the second alignment decoder, and updating the parameters of the second alignment decoder and the parameters of the second alignment encoder based on the output of the second alignment decoder and the output of the second encoder.
The first alignment encoder and the second alignment encoder convert the vector corresponding to the first sentence and the vector corresponding to the second sentence into the same vector space, that is, the above alignment vector space. Updating the parameters of the first alignment encoder and the parameters of the second alignment encoder based on the output of the first alignment encoder and the output of the second alignment encoder can make the vectors output for sentences with the same semantics expressed in the two languages, after being encoded by the first alignment encoder and the second alignment encoder respectively, closer to each other. The first alignment decoder converts the output of the first alignment encoder into the vector space corresponding to the input of the first decoder. Since the vector output by the first alignment decoder and the vector output by the first encoder both belong to the vector space corresponding to the input of the first decoder, updating the parameters of the first alignment decoder and the parameters of the first alignment encoder based on the output of the first alignment decoder and the output of the first encoder can make the output of the first alignment decoder closer to the output of the first encoder, so that the sentence obtained after the output of the first alignment decoder is decoded by the first decoder is closer to the sentence input to the first encoder.
In a possible implementation, updating the parameters of the first alignment encoder and the parameters of the second alignment encoder based on the output of the first alignment encoder and the output of the second alignment encoder includes: calculating a semantic alignment loss based on the output of the first alignment encoder and the output of the second alignment encoder, and updating the parameters of the first alignment encoder and the parameters of the second alignment encoder according to the semantic alignment loss.
Calculating the semantic alignment loss based on the output of the first alignment encoder and the output of the second alignment encoder and updating the parameters of the first alignment encoder and the parameters of the second alignment encoder accordingly can make the vectors output for sentences with the same semantics expressed in the two languages, after being encoded by the first alignment encoder and the second alignment encoder respectively, closer to each other.
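As a concrete reading of the losses just described, the following sketch shows one possible joint training step on a parallel sentence pair, reusing the LanguageModel-style containers sketched earlier. The use of mean squared error for the semantic alignment loss and the autoencoding losses, and the decision to keep the pretrained encoders fixed, are assumptions for illustration only.

```python
import torch
from torch import nn

mse = nn.MSELoss()   # one possible choice of semantic alignment / autoencoding loss

def joint_training_step(lm1, lm2, sentence_1, sentence_2, optimizer):
    """One joint training step on a parallel pair (sentence_1 and sentence_2 share the
    same meaning); lm1/lm2 are LanguageModel-style containers, and `optimizer` is
    assumed to hold only the parameters of the alignment encoders/decoders."""
    optimizer.zero_grad()
    with torch.no_grad():                        # the pretrained encoders are kept fixed
        enc_1 = lm1.encoder(sentence_1)          # output of the first encoder
        enc_2 = lm2.encoder(sentence_2)          # output of the second encoder
    aligned_1 = lm1.align_encoder(enc_1)         # first alignment encoder output
    aligned_2 = lm2.align_encoder(enc_2)         # second alignment encoder output

    # semantic alignment loss: pull the two aligned representations together
    semantic_alignment_loss = mse(aligned_1, aligned_2)
    # autoencoding losses: each alignment decoder must reproduce its encoder's output
    first_autoencoding_loss = mse(lm1.align_decoder(aligned_1), enc_1)
    second_autoencoding_loss = mse(lm2.align_decoder(aligned_2), enc_2)

    total_loss = semantic_alignment_loss + first_autoencoding_loss + second_autoencoding_loss
    total_loss.backward()
    optimizer.step()
```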
In a possible implementation, the above translation system further includes a third language model, where the third language model includes a third encoder, a third alignment encoder, a third alignment decoder and a third decoder; and the above method further includes:
obtaining a third encoder and a third decoder of the third language trained based on a corpus of the third language;
where training the first language model and the second language model through the parallel corpus to obtain the trained first language model and the trained second language model includes:
jointly training the first language model, the second language model and the third language model through the parallel corpus to obtain a trained first language model, a trained second language model and a trained third language model, where the parallel corpus includes a set of synonymous sentences in the first language, the second language and the third language;
where the third alignment encoder is used to convert the vector output by the third encoder into the alignment vector space; and the third alignment decoder is used to convert the output of the third alignment encoder into the vector space corresponding to the input of the third decoder, so that the third decoder decodes the output of the third alignment decoder and outputs a sentence in the third language.
After a system that can realize mutual translation between the first language and the second language has been trained, if a third language is to be added to realize mutual translation among the three languages, it is only necessary to separately train the single-language model corresponding to the third language and then jointly train the first language model, the second language model and the third language model together to obtain a model that can realize mutual translation among the three languages, without separately training a model for translating the first language into the third language, a model for translating the third language into the first language, a model for translating the second language into the third language, and a model for translating the third language into the second language. Therefore, when the translation model provided by this application is extended, the number of models that need to be trained and the amount of data that needs to be processed can be reduced, the training efficiency can be improved, and the scalability of the system can be improved.
In a possible implementation, jointly training the first language model, the second language model and the third language model through the parallel corpus includes:
encoding, by the first alignment encoder, the vector corresponding to the first sentence output by the first encoder, encoding, by the second alignment encoder, the vector corresponding to the second sentence output by the second encoder, and encoding, by the third alignment encoder, the third sentence output by the third encoder, where the first sentence, the second sentence and the third sentence are sentences with the same semantics expressed in the three languages;
updating the parameters of the third alignment encoder based on the output of the first alignment encoder, the output of the second alignment encoder and the output of the third alignment encoder;
decoding the output of the third alignment encoder by the third alignment decoder, and updating the parameters of the third alignment decoder and the parameters of the third alignment encoder based on the output of the third alignment decoder and the output of the third encoder.
On the basis of the trained translation system for mutual translation between two languages, when the above translation system capable of mutual translation among three languages is trained, the parameters of the first language model and the second language model may not be updated; only the third single-language model is pre-trained, and then the third alignment encoder and the third alignment decoder are trained. That is, it is not necessary to train the first single-language model and the second single-language model again or to update their parameters. After the third single-language model has been trained, the first language model, the second language model and the third language model are trained through the parallel corpus: only the semantic alignment loss and the third autoencoding loss are calculated, the parameters of the third alignment encoder are then updated according to the semantic alignment loss and the third autoencoding loss, and the parameters of the third alignment decoder are updated according to the third autoencoding loss. This can reduce the amount of data that needs to be processed when the translation system is extended, improve the training efficiency, and improve the scalability of the translation system.
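The extension described in the preceding paragraph could look like the following sketch, again assuming LanguageModel-style containers and a mean-squared-error loss; only the third alignment encoder and third alignment decoder receive gradient updates, while the first and second language models and the pretrained third encoder/decoder are kept fixed.

```python
import torch
from torch import nn

mse = nn.MSELoss()   # one possible choice for both losses

def extend_with_third_language(lm1, lm2, lm3, trilingual_corpus, optimizer):
    """lm1 and lm2 are the already-trained language models and are not updated;
    `optimizer` is assumed to hold only the parameters of lm3's alignment encoder
    and alignment decoder."""
    for sent_1, sent_2, sent_3 in trilingual_corpus:     # synonymous sentences
        optimizer.zero_grad()
        with torch.no_grad():                            # frozen: lm1, lm2 and lm3's encoder
            aligned_1 = lm1.encode_aligned(sent_1)
            aligned_2 = lm2.encode_aligned(sent_2)
            enc_3 = lm3.encoder(sent_3)
        aligned_3 = lm3.align_encoder(enc_3)

        # semantic alignment loss against the existing languages
        semantic_alignment_loss = mse(aligned_3, aligned_1) + mse(aligned_3, aligned_2)
        # third autoencoding loss: the third alignment decoder reproduces the encoder output
        third_autoencoding_loss = mse(lm3.align_decoder(aligned_3), enc_3)

        (semantic_alignment_loss + third_autoencoding_loss).backward()
        optimizer.step()
```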
In a possible implementation, jointly training the first language model, the second language model and the third language model through the parallel corpus includes:
encoding, by the first alignment encoder, the vector corresponding to the first sentence output by the first encoder, encoding, by the second alignment encoder, the vector corresponding to the second sentence output by the second encoder, and encoding, by the third alignment encoder, the third sentence output by the third encoder, where the first sentence, the second sentence and the third sentence are sentences with the same semantics expressed in the three languages;
updating the parameters of the first alignment encoder, the parameters of the second alignment encoder and the parameters of the third alignment encoder based on the output of the first alignment encoder, the output of the second alignment encoder and the output of the third alignment encoder;
decoding the output of the first alignment encoder by the first alignment decoder, and updating the parameters of the first alignment decoder and the parameters of the first alignment encoder based on the output of the first alignment decoder and the output of the first encoder;
decoding the output of the second alignment encoder by the second alignment decoder, and updating the parameters of the second alignment decoder and the parameters of the second alignment encoder based on the output of the second alignment decoder and the output of the second encoder;
decoding the output of the third alignment encoder by the third alignment decoder, and updating the parameters of the third alignment decoder and the parameters of the third alignment encoder based on the output of the third alignment decoder and the output of the third encoder.
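For this alternative implementation, in which the alignment modules of all three language models are updated, one possible training step is sketched below; the pairwise mean-squared-error alignment loss and the frozen pretrained encoders are illustrative assumptions, and `lms` is a list of LanguageModel-style containers.

```python
import torch
from torch import nn

mse = nn.MSELoss()   # one possible choice for both losses

def three_way_joint_step(lms, sentences, optimizer):
    """`lms` and `sentences` are parallel lists: one language model and one synonymous
    sentence per language; `optimizer` is assumed to hold the alignment encoders and
    alignment decoders of all three models."""
    optimizer.zero_grad()
    with torch.no_grad():                                # the pretrained encoders stay fixed
        encoded = [lm.encoder(s) for lm, s in zip(lms, sentences)]
    aligned = [lm.align_encoder(e) for lm, e in zip(lms, encoded)]

    # semantic alignment loss over every pair of languages
    semantic_alignment_loss = sum(
        mse(aligned[i], aligned[j])
        for i in range(len(lms)) for j in range(i + 1, len(lms)))
    # one autoencoding loss per language: each alignment decoder must reproduce
    # the corresponding encoder output
    autoencoding_loss = sum(
        mse(lm.align_decoder(a), e) for lm, a, e in zip(lms, aligned, encoded))

    (semantic_alignment_loss + autoencoding_loss).backward()
    optimizer.step()
```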
In a second aspect, this application provides a translation method. The translation method includes: encoding, by a first encoder, an input first source sentence in a first language, and outputting a vector set corresponding to the first source sentence, where the vectors in the vector set corresponding to the first source sentence belong to the vector space corresponding to the input of a first decoder; converting, by a first alignment encoder, the vectors in the vector set corresponding to the first source sentence into a first feature vector set in an alignment vector space; converting, by a second alignment decoder, the vectors in the first feature vector set into vectors in the vector space corresponding to the input of a second decoder; and decoding, by the second decoder, the vectors output by the second alignment decoder, and outputting a translated sentence in the second language corresponding to the first source sentence.
The first alignment encoder can convert input vectors into vectors in the alignment vector space, and the second alignment decoder can convert input vectors into vectors in the vector space corresponding to the input of the second decoder. After the first source sentence is converted into vectors by the first encoder, the vectors corresponding to the first source sentence are converted into vectors in the alignment vector space by the first alignment encoder and then input to the second alignment decoder, which converts them into the vector space corresponding to the input of the second decoder, so that they can be decoded by the second decoder to obtain the translated sentence in the second language corresponding to the first source sentence.
In a possible implementation, the above method further includes: encoding, by a second encoder, an input second source sentence in the second language, and outputting a vector set corresponding to the second source sentence, where the vectors in the vector set corresponding to the second source sentence belong to the vector space corresponding to the input of the second decoder; converting, by a second alignment encoder, the vectors in the vector set corresponding to the second source sentence into a second feature vector set in the alignment vector space; converting, by a first alignment decoder, the vectors in the second feature vector set into vectors in the vector space corresponding to the input of the first decoder; and decoding, by the first decoder, the vectors in the vector space corresponding to the input of the first decoder, and outputting a translated sentence in the first language corresponding to the second source sentence.
The second alignment encoder can convert input vectors into vectors in the alignment vector space, and the first alignment decoder can convert input vectors into vectors in the vector space corresponding to the input of the first decoder. After the second source sentence is converted into vectors by the second encoder, the vectors corresponding to the second source sentence are converted into vectors in the alignment vector space by the second alignment encoder and then input to the first alignment decoder, which converts them into the vector space corresponding to the input of the first decoder, so that they can be decoded by the first decoder to obtain the translated sentence in the first language corresponding to the second source sentence.
With the translation method provided by this application, when translating, it is only necessary to combine the encoder and alignment encoder corresponding to the source language with the alignment decoder and decoder corresponding to the target language in the above trained translation system to obtain a translation model that translates the source language into the target language. For example, to realize mutual translation between language A and language B, the encoder and alignment encoder corresponding to language A can be combined with the alignment decoder and decoder corresponding to language B in the trained translation system to obtain a model that translates language A into language B; and the encoder and alignment encoder corresponding to language B can be combined with the alignment decoder and decoder corresponding to language A to obtain a model that translates language B into language A.
In a possible implementation, the above method further includes: encoding, by a third encoder, an input third source sentence in a third language, and outputting a vector set corresponding to the third source sentence, where the vectors in the vector set corresponding to the third source sentence belong to the vector space corresponding to the input of a third decoder; converting, by a third alignment encoder, the vectors in the vector set corresponding to the third source sentence into a third feature vector set in the alignment vector space; converting, by the first alignment decoder, the vectors in the third feature vector set into vectors in the vector space corresponding to the input of the first decoder; and decoding, by the first decoder, the vectors in the vector space corresponding to the input of the first decoder, and outputting a translated sentence in the first language corresponding to the third source sentence.
In a possible implementation, the above method further includes: encoding, by the first encoder, an input fourth source sentence in the first language, and outputting a vector set corresponding to the fourth source sentence, where the vectors in the vector set corresponding to the fourth source sentence belong to the vector space corresponding to the input of the first decoder; converting, by the first alignment encoder, the vectors in the vector set corresponding to the fourth source sentence into a fourth feature vector set in the alignment vector space; converting, by a third alignment decoder, the vectors in the fourth feature vector set into vectors in the vector space corresponding to the input of the third decoder; and decoding, by the third decoder, the vectors in the vector space corresponding to the input of the third decoder, and outputting a translated sentence in the third language corresponding to the fourth source sentence.
In a third aspect, this application provides a translation system. The translation system includes a first encoder, a first alignment encoder, a second alignment decoder and a second decoder, where
the first encoder is configured to encode an input first source sentence in a first language and output a vector set corresponding to the first source sentence, where the vectors in the vector set corresponding to the first source sentence belong to the vector space corresponding to the input of a first decoder;
the first alignment encoder is configured to convert the vectors in the vector set corresponding to the first source sentence into a first feature vector set in an alignment vector space;
the second alignment decoder is configured to convert the vectors in the first feature vector set into vectors in the vector space corresponding to the input of the second decoder; and
the second decoder is configured to decode the vectors in the vector space corresponding to the input of the second decoder and output a translated sentence in the second language corresponding to the first source sentence.
In a possible implementation, the above translation system further includes a second encoder, a second alignment encoder, a first alignment decoder and a first decoder, where
the second encoder is configured to encode an input second source sentence in the second language and output a vector set corresponding to the second source sentence, where the vectors in the vector set corresponding to the second source sentence belong to the vector space corresponding to the input of the second decoder;
the second alignment encoder is configured to convert the vectors in the vector set corresponding to the second source sentence into a second feature vector set in the alignment vector space;
the first alignment decoder is configured to convert the vectors in the second feature vector set into vectors in the vector space corresponding to the input of the first decoder; and
the first decoder is configured to decode the vectors in the vector space corresponding to the input of the first decoder and output a translated sentence in the first language corresponding to the second source sentence.
In a possible implementation, the above translation system further includes a first alignment decoder, a first decoder, a third encoder and a third alignment encoder, where
the third encoder is configured to encode an input third source sentence in a third language and output a vector set corresponding to the third source sentence, where the vectors in the vector set corresponding to the third source sentence belong to the vector space corresponding to the input of a third decoder;
the third alignment encoder is configured to convert the vectors in the vector set corresponding to the third source sentence into a third feature vector set in the alignment vector space;
the first alignment decoder is configured to convert the vectors in the third feature vector set into vectors in the vector space corresponding to the input of the first decoder; and
the first decoder is configured to decode the vectors in the vector space corresponding to the input of the first decoder and output a translated sentence in the first language corresponding to the third source sentence.
In a possible implementation, the above translation system further includes a third alignment decoder and a third decoder, where
the first encoder is further configured to encode an input fourth source sentence in the first language and output a vector set corresponding to the fourth source sentence, where the vectors in the vector set corresponding to the fourth source sentence belong to the vector space corresponding to the input of the first decoder;
the first alignment encoder is further configured to convert the vectors in the vector set corresponding to the fourth source sentence into a fourth feature vector set in the alignment vector space;
the third alignment decoder is configured to convert the vectors in the fourth feature vector set into vectors in the vector space corresponding to the input of the third decoder; and
the third decoder is configured to decode the vectors in the vector space corresponding to the input of the third decoder and output a translated sentence in the third language corresponding to the fourth source sentence.
In a fourth aspect, this application provides a training apparatus for a translation system. The translation system includes a first language model and a second language model, where the first language model includes a first encoder, a first alignment encoder, a first alignment decoder and a first decoder, and the second language model includes a second encoder, a second alignment encoder, a second alignment decoder and a second decoder. The apparatus includes:
an obtaining module, configured to obtain a first encoder and a first decoder of the first language trained based on a corpus of the first language, and obtain a second encoder and a second decoder of the second language trained based on a corpus of the second language; where the first encoder is used to encode a sentence expressed in the first language and output a corresponding vector, the first decoder is used to decode an input vector and output a sentence expressed in the first language, the second encoder is used to encode a sentence expressed in the second language and output a corresponding vector, and the second decoder is used to decode an input vector and output a sentence expressed in the second language; and
a processing module, configured to jointly train the first language model and the second language model based on a parallel corpus to obtain a trained first alignment encoder, a trained first alignment decoder, a trained second alignment encoder and a trained second alignment decoder, where the parallel corpus includes a set of synonymous sentences in the first language and the second language;
where the first alignment encoder is used to convert the vector output by the first encoder into an alignment vector space, and the second alignment encoder is used to convert the vector output by the second encoder into the alignment vector space; the first alignment decoder is used to convert the output of the first alignment encoder into the vector space corresponding to the input of the first decoder, so that the first decoder decodes the output of the first alignment decoder and outputs a sentence in the first language; and the second alignment decoder is used to convert the output of the second alignment encoder into the vector space corresponding to the input of the second decoder, so that the second decoder decodes the output of the second alignment decoder and outputs a sentence in the second language.
In a possible implementation, the above processing module is specifically configured to: encode, by the first alignment encoder, the vector corresponding to a first sentence output by the first encoder, and encode, by the second alignment encoder, the vector corresponding to a second sentence output by the second encoder, where the first sentence and the second sentence are sentences with the same semantics expressed in the two languages; update the parameters of the first alignment encoder and the parameters of the second alignment encoder based on the output of the first alignment encoder and the output of the second alignment encoder; decode the output of the first alignment encoder by the first alignment decoder, and update the parameters of the first alignment decoder and the parameters of the first alignment encoder based on the output of the first alignment decoder and the output of the first encoder; and decode the output of the second alignment encoder by the second alignment decoder, and update the parameters of the second alignment decoder and the parameters of the second alignment encoder based on the output of the second alignment decoder and the output of the second encoder.
In a possible implementation, the above processing module is specifically configured to: calculate a semantic alignment loss based on the output of the first alignment encoder and the output of the second alignment encoder, and update the parameters of the first alignment encoder and the parameters of the second alignment encoder according to the semantic alignment loss.
In a possible implementation, the above translation system further includes a third language model, where the third language model includes a third encoder, a third alignment encoder, a third alignment decoder and a third decoder. The above obtaining module is further configured to obtain a third encoder and a third decoder of the third language trained based on a corpus of the third language. The above processing module is further configured to jointly train the first language model, the second language model and the third language model through the parallel corpus to obtain a trained first language model, a trained second language model and a trained third language model, where the parallel corpus includes a set of synonymous sentences in the first language, the second language and the third language; the third alignment encoder is used to convert the vector output by the third encoder into the alignment vector space; and the third alignment decoder is used to convert the output of the third alignment encoder into the vector space corresponding to the input of the third decoder, so that the third decoder decodes the output of the third alignment decoder and outputs a sentence in the third language.
In a possible implementation, the above processing module is specifically configured to: encode, by the first alignment encoder, the vector corresponding to the first sentence output by the first encoder, encode, by the second alignment encoder, the vector corresponding to the second sentence output by the second encoder, and encode, by the third alignment encoder, the third sentence output by the third encoder, where the first sentence, the second sentence and the third sentence are sentences with the same semantics expressed in the three languages; update the parameters of the third alignment encoder based on the output of the first alignment encoder, the output of the second alignment encoder and the output of the third alignment encoder; and decode the output of the third alignment encoder by the third alignment decoder, and update the parameters of the third alignment decoder and the parameters of the third alignment encoder based on the output of the third alignment decoder and the output of the third encoder.
In a possible implementation, the above processing module is specifically configured to: encode, by the first alignment encoder, the vector corresponding to the first sentence output by the first encoder, encode, by the second alignment encoder, the vector corresponding to the second sentence output by the second encoder, and encode, by the third alignment encoder, the third sentence output by the third encoder, where the first sentence, the second sentence and the third sentence are sentences with the same semantics expressed in the three languages; update the parameters of the first alignment encoder, the parameters of the second alignment encoder and the parameters of the third alignment encoder based on the output of the first alignment encoder, the output of the second alignment encoder and the output of the third alignment encoder; decode the output of the first alignment encoder by the first alignment decoder, and update the parameters of the first alignment decoder and the parameters of the first alignment encoder based on the output of the first alignment decoder and the output of the first encoder; decode the output of the second alignment encoder by the second alignment decoder, and update the parameters of the second alignment decoder and the parameters of the second alignment encoder based on the output of the second alignment decoder and the output of the second encoder; and decode the output of the third alignment encoder by the third alignment decoder, and update the parameters of the third alignment decoder and the parameters of the third alignment encoder based on the output of the third alignment decoder and the output of the third encoder.
In a fifth aspect, this application provides a computing device, including a processor and a memory. The memory is configured to store instructions, and the processor is configured to execute the instructions; when the processor executes the instructions, the processor performs the training method in the first aspect or in any possible implementation of the first aspect.
In a sixth aspect, this application provides a computing device, including a processor and a memory. The memory is configured to store instructions, and the processor is configured to execute the instructions; when the processor executes the instructions, the processor performs the translation method in the second aspect or in any possible implementation of the second aspect.
In a seventh aspect, this application provides a computer-readable storage medium that stores instructions. When the instructions are run on a server, they cause the server to perform the training method in the first aspect or in any possible implementation of the first aspect.
In an eighth aspect, this application provides a computer-readable storage medium that stores instructions. When the instructions are run on a server, they cause the server to perform the translation method in the second aspect or in any possible implementation of the second aspect.
Description of the drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the drawings needed for describing the embodiments. Evidently, the drawings in the following description show some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Figure 1 is a schematic diagram of a translation system provided by an embodiment of this application;
Figure 2 is a schematic diagram of a first single-language model provided by an embodiment of this application;
Figure 3 is a schematic diagram of a training process of a translation system provided by an embodiment of this application;
Figure 4 is a schematic diagram of a translation model provided by an embodiment of this application;
Figure 5 is a schematic diagram of another translation model provided by an embodiment of this application;
Figure 6 is a schematic diagram of another translation system provided by an embodiment of this application;
Figure 7 is a schematic diagram of another training process of a translation system provided by an embodiment of this application;
Figure 8 is a schematic diagram of another translation model provided by an embodiment of this application;
Figure 9 is a schematic diagram of another translation model provided by an embodiment of this application;
Figure 10 is a schematic flowchart of a training method for a translation system provided by an embodiment of this application;
Figure 11 is a schematic diagram of a computing device provided by an embodiment of this application;
Figure 12 is a schematic diagram of a computing device cluster provided by an embodiment of this application.
Detailed description of embodiments
The technical solutions in this application are described below with reference to the accompanying drawings.
As shown in Figure 1, Figure 1 is a schematic diagram of a translation system provided by an embodiment of this application. Figure 1 uses a translation system capable of mutual translation between two languages as an example to introduce the training method of the translation system provided by this application. The translation system includes a first language model and a second language model. The first language model includes a first encoder, a first alignment encoder, a first alignment decoder and a first decoder; the second language model includes a second encoder, a second alignment encoder, a second alignment decoder and a second decoder. The training method for the translation system shown in Figure 1 is described in detail below with reference to the accompanying drawings.
When training this translation system, a first single-language model corresponding to the first language and a second single-language model corresponding to the second language first need to be pre-trained. The first single-language model includes the first encoder and the first decoder, and the second single-language model includes the second encoder and the second decoder.
When pre-training the first single-language model, as shown in Figure 2, training data of the first language is first obtained, and an input sequence (p_1, p_2, …, p_k) of the first language is fed into the first encoder. The first encoder encodes each token in the input sequence and outputs the vector set P = {P_1, P_2, …, P_k} corresponding to the input sequence. The input sequence is the sequence obtained by preprocessing one sentence in the training data of the first language; each input sequence includes multiple tokens, each token corresponds to one vector in the vector set, and the preprocessing includes word segmentation. The vector set P is then input to the first decoder, which decodes each vector in the vector set and outputs the output sequence corresponding to the input sequence (the sequence notation appears only as an image in the original publication).
A loss function value is computed from the input sequence and the output sequence, and the parameters of the first single-language model are updated based on this loss function value; alternatively, the parameters of the first single-language model are updated based on the loss function values of one batch of training data. Based on the training data of the first language, the first single-language model is trained through multiple training iterations until it reaches a convergence condition, yielding the trained first single-language model. The second single-language model is trained in the same way as the first single-language model: based on the training data of the second language, it is trained through multiple training iterations until it reaches a convergence condition, yielding the trained second single-language model.
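As an illustration of the pre-training step just described, the following is a minimal PyTorch-style sketch of a single-language encoder-decoder trained to reconstruct its input sequence. The architecture, hyperparameters and helper names (SingleLanguageModel, pretrain_step) are illustrative assumptions and are not specified in this publication; the same sketch applies unchanged to the second single-language model trained on the second-language corpus.

import torch
import torch.nn as nn

class SingleLanguageModel(nn.Module):
    # Encoder-decoder pair for one language; sizes are illustrative only.
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def encode(self, tokens):
        # (p_1, ..., p_k) -> vector set P = {P_1, ..., P_k}
        return self.encoder(self.embed(tokens))

    def decode(self, memory, tokens):
        # decode the vector set back into token logits (causal masking omitted in this sketch)
        return self.out(self.decoder(self.embed(tokens), memory))

def pretrain_step(model, tokens, optimizer):
    # one reconstruction step: compare the output sequence with the input sequence
    memory = model.encode(tokens)
    logits = model.decode(memory, tokens)
    loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)), tokens.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()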
After the first single-language model and the second single-language model have been pre-trained to obtain the trained first single-language model and the trained second single-language model, a first alignment encoder and a first alignment decoder are added between the first encoder and the first decoder of the first single-language model, and a second alignment encoder and a second alignment decoder are added between the second encoder and the second decoder of the second single-language model, yielding the translation system described above. The first alignment encoder, first alignment decoder, second alignment encoder and second alignment decoder of the translation system are then trained on a parallel corpus, producing a trained translation system capable of mutual translation between the two languages. The parallel corpus is a collection of synonymous sentences in the first language and the second language; that is, the parallel corpus includes sentences in the first language and sentences in the second language, and each sentence in the first language corresponds to a translation in the second language with the same semantics.
As shown in Figure 3, when training the translation system, a first sentence in the first language from the parallel corpus is first input into the first encoder, and a second sentence in the second language from the parallel corpus is input into the second encoder. The first sentence and the second sentence are expressions, in the two languages, of a sentence with the same semantics. The first encoder encodes the first sentence and outputs the first vector set X corresponding to the first sentence; the second encoder encodes the second sentence and outputs the second vector set Y corresponding to the second sentence. The vectors in the first vector set X belong to the vector space corresponding to the input of the first decoder, and the vectors in the second vector set Y belong to the vector space corresponding to the input of the second decoder. The inputs and outputs of the first encoder and the second encoder can be expressed by the following formula:
X = {X_1, X_2, …, X_n} = Enc(x_1, x_2, …, x_n)
Y = {Y_1, Y_2, …, Y_m} = Enc(y_1, y_2, …, y_m)    (Formula 1)
Here Enc denotes the encoding operation performed by the first encoder and the second encoder; {x_1, x_2, …, x_n} denotes the first sequence corresponding to the first sentence, and {y_1, y_2, …, y_m} denotes the second sequence corresponding to the second sentence, i.e. the first sequence includes n tokens and the second sequence includes m tokens; {X_1, X_2, …, X_n} denotes the first vector set X, and {Y_1, Y_2, …, Y_m} denotes the second vector set Y. The n tokens in the first sequence correspond one-to-one to the n vectors in the first vector set X, and the m tokens in the second sequence correspond one-to-one to the m vectors in the second vector set Y; that is, X_1 is the feature vector corresponding to x_1, X_i is the feature vector corresponding to x_i, Y_1 is the feature vector corresponding to y_1, and Y_j is the feature vector corresponding to y_j. X_i and Y_j are vectors of the same dimension, i is a positive integer less than or equal to n, and j is a positive integer less than or equal to m.
It should be noted that, because the first sentence and the second sentence are sentences in different languages and are encoded by different encoders to obtain their respective vector sets, the vectors in the first vector set X and the vectors in the second vector set Y belong to different vector spaces.
After the first encoder outputs the first vector set X and the second encoder outputs the second vector set Y, an alignment operation is performed on X and Y. The alignment operation turns the first vector set X into the vector set X_L = {X_1, X_2, …, X_n, X_{n+1}, …, X_L} and the second vector set Y into the vector set Y_L = {Y_1, Y_2, …, Y_m, Y_{m+1}, …, Y_L}, so that X_L and Y_L contain the same number of vectors. For example, a padding operation is applied to X and Y separately, converting both into sets of L feature vectors. The padding operation may append L-n all-zero vectors after the first vector set X and L-m all-zero vectors after the second vector set Y, where L is greater than or equal to n and greater than or equal to m. L can be set according to the training data in the parallel corpus; for example, L may be the number of tokens in the longest sequence in the parallel corpus, or the number of tokens in the longest sequence that the first or second single-language model can encode or decode, i.e. the number of tokens in the longest sequence that a single-language model allows as input at one time.
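The padding variant of the alignment operation can be written in a few lines; the sketch below assumes the encoder outputs are held as an n x d tensor and that L has already been chosen as described above. The helper name pad_to_length is illustrative.

import torch
import torch.nn.functional as F

def pad_to_length(vectors, L):
    # append (L - n) all-zero vectors after an (n, d) vector set
    n, d = vectors.shape
    assert n <= L, "L must be at least the longest sequence length"
    return F.pad(vectors, (0, 0, 0, L - n))

# X is n x d from the first encoder, Y is m x d from the second encoder
X, Y = torch.randn(7, 512), torch.randn(9, 512)
L = 16
X_L, Y_L = pad_to_length(X, L), pad_to_length(Y, L)   # both are now L x d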
After the first vector set X and the second vector set Y have been aligned, the vector set X_L is input to the first alignment encoder and the vector set Y_L is input to the second alignment encoder. The first alignment encoder converts the vectors in X_L into an aligned vector space, and the second alignment encoder converts the vectors in Y_L into the same aligned vector space, so that the vectors of X_L and the vectors of Y_L are both mapped into the aligned vector space. The first alignment encoder re-encodes each vector in X_L to obtain the third vector set X_a, and the second alignment encoder re-encodes each vector in Y_L to obtain the fourth vector set Y_a. The vectors in the third vector set X_a and the vectors in the fourth vector set Y_a are all vectors in the aligned vector space. The operations performed by the first alignment encoder and the second alignment encoder can be expressed as follows (Formula 2):
X_a = AliEnc(X_L)
Y_a = AliEnc(Y_L)    (Formula 2)
Here AliEnc denotes the encoding operation performed by the first alignment encoder and the second alignment encoder, X_a denotes the third vector set, and Y_a denotes the fourth vector set (the element-wise notation of these sets appears only as images in the original publication). The vectors in the vector set X_L correspond one-to-one to the vectors in the third vector set X_a, and the vectors in the vector set Y_L correspond one-to-one to the vectors in the fourth vector set Y_a.
After the third vector set X_a has been obtained through the first alignment encoder and the fourth vector set Y_a has been obtained through the second alignment encoder, a semantic alignment loss L_a is computed from X_a and Y_a. The semantic alignment loss is used to update the parameters of the first alignment encoder and the second alignment encoder, and it satisfies the following (Formula 3).
(Formula 3: the semantic alignment loss L_a, defined over the corresponding vectors of X_a and Y_a; the expression appears only as an image in the original publication.)
Here Y_i^a is the i-th vector in the fourth vector set Y_a, and X_i^a is the i-th vector in the third vector set X_a.
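Because Formula 3 is reproduced in the publication only as an image, its exact form is not shown here. The sketch below assumes a squared Euclidean distance between corresponding aligned vectors, averaged over the L positions; this is one plausible reading of a loss that pulls X_i^a and Y_i^a together, not the publication's definition.

import torch

def semantic_alignment_loss(X_a, Y_a):
    # X_a, Y_a: L x d outputs of the two alignment encoders
    # assumed form: mean over positions of the squared distance between X_i^a and Y_i^a
    return ((X_a - Y_a) ** 2).sum(dim=-1).mean()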
After the third vector set X_a has been obtained through the first alignment encoder and the fourth vector set Y_a has been obtained through the second alignment encoder, X_a is input to the first alignment decoder and Y_a is input to the second alignment decoder. The first alignment decoder converts the output of the first alignment encoder into the vector space corresponding to the input of the first decoder, and the second alignment decoder converts the output of the second alignment encoder into the vector space corresponding to the input of the second decoder. The first alignment decoder converts each vector in the third vector set X_a to obtain the fifth vector set X_d, and the second alignment decoder converts each vector in the fourth vector set Y_a to obtain the sixth vector set Y_d. The operations performed by the first alignment decoder and the second alignment decoder can be expressed as follows (Formula 4):
X_d = AliDec(X_a)
Y_d = AliDec(Y_a)    (Formula 4)
Here AliDec denotes the decoding operation performed by the first alignment decoder and the second alignment decoder, X_d denotes the fifth vector set, and Y_d denotes the sixth vector set.
After the fifth vector set X_d and the sixth vector set Y_d have been obtained, a first autoencoding loss corresponding to the first language model is computed from the first vector set X and the fifth vector set X_d, and a second autoencoding loss corresponding to the second language model is computed from the second vector set Y and the sixth vector set Y_d. The purpose of computing the first autoencoding loss is to update the parameters of the first alignment encoder and the first alignment decoder in the first language model so that the difference between the vector set output by the first alignment decoder and the vector set output by the first encoder is as small as possible; the purpose of computing the second autoencoding loss is to update the parameters of the second alignment encoder and the second alignment decoder in the second language model so that the difference between the vector set output by the second alignment decoder and the vector set output by the second encoder is as small as possible.
Because the first vector set X output by the first encoder of the first language model includes only n vectors, and the second vector set Y output by the second encoder of the second language model includes only m vectors, the L vectors in the fifth vector set X_d output by the first alignment decoder and in the sixth vector set Y_d output by the second alignment decoder need to be processed: the vectors at the target positions of X_d and the vectors at the target positions of Y_d are deleted. The target positions are the positions of the vectors added during the alignment operation. For example, if the alignment operation appended L-n all-zero vectors after the first vector set X, the target positions of the fifth vector set X_d are the last L-n positions.
After the vectors at the target positions of the fifth vector set X_d and the vectors at the target positions of the sixth vector set Y_d are deleted, the vector set X_r and the vector set Y_r are obtained. The first autoencoding loss corresponding to the first language model is computed from the first vector set X and the vector set X_r, and the second autoencoding loss corresponding to the second language model is computed from the second vector set Y and the vector set Y_r. The first autoencoding loss L_sx corresponding to the first language model and the second autoencoding loss L_sy corresponding to the second language model are computed as shown below (Formula 5).
(Formula 5: the first autoencoding loss L_sx, computed from X and X_r, and the second autoencoding loss L_sy, computed from Y and Y_r; the expressions appear only as images in the original publication.)
Here X_i^r is the i-th vector in the vector set X_r, X_i is the i-th vector in the first vector set X, Y_i^r is the i-th vector in the vector set Y_r, and Y_i is the i-th vector in the second vector set Y.
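Formula 5 likewise appears only as an image. The sketch below assumes a squared error between the encoder output and the alignment-decoder output after the padded target positions have been removed; the helper name autoencoding_loss and the exact form of the error are illustrative assumptions.

import torch

def autoencoding_loss(X, X_d):
    # X: n x d output of the pre-trained encoder
    # X_d: L x d output of the alignment decoder
    n = X.size(0)
    X_r = X_d[:n]                      # drop the vectors at the padded target positions
    return ((X_r - X) ** 2).sum(dim=-1).mean()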
From one group of parallel sentences (including the first sentence and the second sentence above), the semantic alignment loss and the autoencoding losses (including the first autoencoding loss and the second autoencoding loss) corresponding to that group can be computed by the method above. The parameters of the first alignment encoder in the first language model are then updated based on the semantic alignment loss and the first autoencoding loss computed from one group of parallel sentences, and the parameters of the first alignment decoder are updated based on the first autoencoding loss; alternatively, the parameters of the first alignment encoder are updated based on the semantic alignment loss and the first autoencoding loss of one batch of parallel sentences, and the parameters of the first alignment decoder are updated based on the first autoencoding loss of that batch. Similarly, the parameters of the second alignment encoder in the second language model are updated based on the semantic alignment loss and the second autoencoding loss computed from one group of parallel sentences, and the parameters of the second alignment decoder are updated based on the second autoencoding loss; alternatively, these updates are performed based on the corresponding losses of one batch of parallel sentences. The first alignment encoder and first alignment decoder in the first language model, and the second alignment encoder and second alignment decoder in the second language model, are trained through multiple training iterations until the first language model and the second language model reach a convergence condition, yielding the trained first language model and the trained second language model.
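Putting the pieces together, a single joint update over one parallel sentence pair could look like the following sketch, which reuses the helpers pad_to_length, semantic_alignment_loss and autoencoding_loss defined above. The alignment encoders and decoders are represented here by simple per-position linear layers purely to make the optimizer step concrete; the publication does not fix their architecture, and the pre-trained encoders and decoders are assumed to stay frozen.

import torch
import torch.nn as nn

d, L = 512, 16
ali_enc_1, ali_dec_1 = nn.Linear(d, d), nn.Linear(d, d)   # stand-ins for the alignment modules
ali_enc_2, ali_dec_2 = nn.Linear(d, d), nn.Linear(d, d)

params = (list(ali_enc_1.parameters()) + list(ali_dec_1.parameters()) +
          list(ali_enc_2.parameters()) + list(ali_dec_2.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)

def joint_step(X, Y):
    # X: n x d and Y: m x d are outputs of the frozen first and second encoders
    X_a, Y_a = ali_enc_1(pad_to_length(X, L)), ali_enc_2(pad_to_length(Y, L))
    X_d, Y_d = ali_dec_1(X_a), ali_dec_2(Y_a)
    loss = (semantic_alignment_loss(X_a, Y_a)
            + autoencoding_loss(X, X_d) + autoencoding_loss(Y, Y_d))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()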
Training by the above method yields the trained first language model and the trained second language model, that is, a trained translation system capable of mutual translation between the two languages. When translating with this translation system, if the first language is to be translated into the second language, the first encoder, the first alignment encoder, the second alignment decoder and the second decoder are combined into the first-language-to-second-language translation model shown in Figure 4. During translation, a first source sentence expressed in the first language is input to the first encoder, which encodes it and outputs the vector set corresponding to the first source sentence; each vector in this set belongs to the vector space corresponding to the input of the first decoder. The vector set corresponding to the first source sentence is then input to the first alignment encoder, which maps each vector in the set into the aligned vector space to obtain a first feature vector set. The first feature vector set is input to the second alignment decoder, which maps the vectors of the aligned vector space in the first feature vector set into the vector space corresponding to the input of the second decoder. Finally, the resulting vectors in the vector space corresponding to the input of the second decoder are input to the second decoder for decoding, which outputs the translated sentence, in the second language, corresponding to the first source sentence.
If the second language is to be translated into the first language, the second encoder, the second alignment encoder, the first alignment decoder and the first decoder are combined into the second-language-to-first-language translation model shown in Figure 5. During translation, a second source sentence expressed in the second language is input to the second encoder, which encodes it and outputs the vector set corresponding to the second source sentence; each vector in this set belongs to the vector space corresponding to the input of the second decoder. The vector set corresponding to the second source sentence is then input to the second alignment encoder, which maps each vector in the set into the aligned vector space to obtain a second feature vector set. The second feature vector set is input to the first alignment decoder, which maps the vectors of the aligned vector space in the second feature vector set into the vector space corresponding to the input of the first decoder. Finally, the resulting vectors in the vector space corresponding to the input of the first decoder are input to the first decoder for decoding, which outputs the translated sentence, in the first language, corresponding to the second source sentence.
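At inference time the trained modules are simply recombined per translation direction. The sketch below shows the first-to-second direction of Figure 4 as a function over module objects; the call interfaces follow the sketches above and are assumptions, not an API defined by the publication.

import torch

def translate_1_to_2(sentence_1, enc_1, ali_enc_1, ali_dec_2, dec_2):
    # first encoder -> first alignment encoder -> second alignment decoder -> second decoder
    with torch.no_grad():
        vectors = enc_1(sentence_1)        # vector set in the first decoder's input space
        aligned = ali_enc_1(vectors)       # mapped into the shared aligned vector space
        target_space = ali_dec_2(aligned)  # mapped into the second decoder's input space
        return dec_2(target_space)         # decoded into the second language

# The reverse direction of Figure 5 swaps the roles of the two languages:
# a translate_2_to_1 helper would use enc_2, ali_enc_2, ali_dec_1 and dec_1 in the same order.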
The above, with reference to Figures 1 to 5, has described in detail how to train a translation system capable of mutual translation between two languages and how to use the trained system to translate between the two languages. It should be understood that translation systems for mutual translation among more languages can be trained and applied in the same way. Taking a three-language translation system as an example, the training and application of a translation system capable of mutual translation among three languages are described below.
As shown in Figure 6, when a translation system capable of mutual translation among three languages needs to be trained, the translation system includes a first language model, a second language model and a third language model. The first language model includes a first encoder, a first alignment encoder, a first alignment decoder and a first decoder; the second language model includes a second encoder, a second alignment encoder, a second alignment decoder and a second decoder; the third language model includes a third encoder, a third alignment encoder, a third alignment decoder and a third decoder.
When training the three-language translation system, the first single-language model corresponding to the first language, the second single-language model corresponding to the second language and the third single-language model corresponding to the third language first need to be pre-trained. For the training of the first and second single-language models, refer to the description related to Figure 2 above, which is not repeated here. The third single-language model includes the third encoder and the third decoder; it is trained in the same way as the first single-language model, based on the training data of the third language, through multiple training iterations until it reaches a convergence condition, yielding the trained third single-language model.
After the first, second and third single-language models have been pre-trained, a first alignment encoder and a first alignment decoder are added between the first encoder and the first decoder of the first single-language model, a second alignment encoder and a second alignment decoder are added between the second encoder and the second decoder of the second single-language model, and a third alignment encoder and a third alignment decoder are added between the third encoder and the third decoder of the third single-language model, yielding a translation system capable of mutual translation among the three languages. The first alignment encoder and first alignment decoder, the second alignment encoder and second alignment decoder, and the third alignment encoder and third alignment decoder of the translation system are then trained on a parallel corpus, producing a trained translation system capable of mutual translation among the three languages. Here the parallel corpus is a collection of synonymous sentences in the first, second and third languages; that is, it includes sentences in the first language, sentences in the second language and sentences in the third language, and each sentence in the first language corresponds to a translation with the same semantics in the second language and a translation with the same semantics in the third language.
As shown in Figure 7, when training this translation system, a first sentence in the first language from the parallel corpus is first input into the first encoder, a second sentence in the second language is input into the second encoder, and a third sentence in the third language is input into the third encoder. The first sentence, the second sentence and the third sentence are expressions, in the three languages, of a sentence with the same semantics. After the first sentence is input into the first encoder, the first language model processes it in the same way as when training the two-language translation system described above; after the second sentence is input into the second encoder, the second language model likewise processes it in the same way as in the two-language case.
For the third language model, the third sentence is input to the third encoder, which encodes it and outputs the seventh vector set Z = {Z_1, Z_2, …, Z_t} corresponding to the third sentence. An alignment operation is performed on the seventh vector set Z, converting it into a set of L vectors and yielding the vector set Z_L, so that the first vector set X, the second vector set Y and the seventh vector set Z are aligned. The vector set Z_L is then input to the third alignment encoder, which maps the vectors in Z_L into the aligned vector space to obtain the eighth vector set Z_a.
After the eighth vector set Z_a has been obtained through the third alignment encoder, the semantic alignment loss L_a is computed from the third vector set X_a, the fourth vector set Y_a and the eighth vector set Z_a. The semantic alignment loss is used to update the parameters of the first alignment encoder, the second alignment encoder and the third alignment encoder. When the translation system is a system for mutual translation among three languages, the semantic alignment loss satisfies the following (Formula 6).
(Formula 6: the semantic alignment loss L_a for three languages, computed from X_a, Y_a and Z_a; the expression appears only as an image in the original publication.)
Here Z_i^a is the i-th vector in the eighth vector set Z_a.
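Formula 6 is also reproduced only as an image. One plausible multilingual generalisation, consistent with the description that the loss is computed from X_a, Y_a and Z_a and used to update all alignment encoders, is to sum the pairwise distances between every two aligned sets, as sketched below; this is an assumption, not the publication's formula.

import torch

def multiway_alignment_loss(aligned_sets):
    # aligned_sets: list of L x d tensors, e.g. [X_a, Y_a, Z_a]
    loss = 0.0
    for i in range(len(aligned_sets)):
        for j in range(i + 1, len(aligned_sets)):
            loss = loss + ((aligned_sets[i] - aligned_sets[j]) ** 2).sum(dim=-1).mean()
    return loss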
After the eighth vector set Z_a has been obtained through the third alignment encoder, Z_a is input to the third alignment decoder, which converts each vector in Z_a to obtain the ninth vector set Z_d, thereby mapping the vectors of the aligned vector space into the vector space corresponding to the input of the third decoder.
Because the seventh vector set Z output by the third encoder of the third language model includes only t vectors, the L vectors in the ninth vector set Z_d output by the third alignment decoder need to be processed: the vectors at the target positions of Z_d are deleted. The target positions are the positions of the vectors added when the alignment operation was performed on the seventh vector set Z. For example, if the alignment operation appended L-t all-zero vectors after the seventh vector set Z, the target positions of the ninth vector set Z_d are the last L-t positions.
After the vectors at the target positions of the ninth vector set Z_d are deleted, the vector set Z_r is obtained, and the third autoencoding loss corresponding to the third language model is computed from the seventh vector set Z and the vector set Z_r. The first autoencoding loss L_sx corresponding to the first language model and the second autoencoding loss L_sy corresponding to the second language model are computed as shown in Formula 5 above, and the third autoencoding loss L_sz of the third language model is computed as shown below (Formula 7).
(Formula 7: the third autoencoding loss L_sz, computed from Z and Z_r; the expression appears only as an image in the original publication.)
Here Z_i^r is the i-th vector in the vector set Z_r, and Z_i is the i-th vector in the seventh vector set Z.
From one group of parallel sentences (including the first, second and third sentences above), the semantic alignment loss and the autoencoding losses (including the first, second and third autoencoding losses) corresponding to that group can be computed by the method above. The parameters of the first alignment encoder in the first language model are then updated based on the semantic alignment loss and the first autoencoding loss, and the parameters of the first alignment decoder are updated based on the first autoencoding loss; the parameters of the second alignment encoder in the second language model are updated based on the semantic alignment loss and the second autoencoding loss, and the parameters of the second alignment decoder are updated based on the second autoencoding loss; and the parameters of the third alignment encoder in the third language model are updated based on the semantic alignment loss and the third autoencoding loss, and the parameters of the third alignment decoder are updated based on the third autoencoding loss. Each of these updates may be performed per group of parallel sentences or based on the corresponding losses of one batch of parallel sentences.
The first alignment encoder and first alignment decoder in the first language model, the second alignment encoder and second alignment decoder in the second language model, and the third alignment encoder and third alignment decoder in the third language model are trained through multiple training iterations until the first, second and third language models reach a convergence condition, yielding the trained first, second and third language models.
It should be noted that if the translation system capable of mutual translation among three languages is trained on top of an already trained two-language translation system, the parameters of the first language model and the second language model may be left unchanged: only the third single-language model is pre-trained, and then only the third alignment encoder and the third alignment decoder are trained. That is, the first and second single-language models do not need to be trained again; after the third single-language model has been trained, when the first, second and third language models are trained on the parallel corpus, only the semantic alignment loss and the third autoencoding loss are computed by the method above, the parameters of the third alignment encoder are updated based on the semantic alignment loss and the third autoencoding loss, and the parameters of the third alignment decoder are updated based on the third autoencoding loss. It should also be understood that, when extending a trained two-language system to three languages, it is possible to again update the parameters of the first alignment encoder based on the semantic alignment loss and the first autoencoding loss and the parameters of the first alignment decoder based on the first autoencoding loss, to again update the parameters of the second alignment encoder based on the semantic alignment loss and the second autoencoding loss and the parameters of the second alignment decoder based on the second autoencoding loss, and at the same time to update the parameters of the third alignment encoder based on the semantic alignment loss and the third autoencoding loss and the parameters of the third alignment decoder based on the third autoencoding loss.
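The incremental extension just described amounts to freezing everything that belongs to the already trained two-language system and optimizing only the third language's alignment modules. A minimal sketch, reusing the stand-in modules from the joint-training sketch above (all names are illustrative):

import torch
import torch.nn as nn

d = 512
ali_enc_3, ali_dec_3 = nn.Linear(d, d), nn.Linear(d, d)   # stand-ins for the new third-language modules

# Freeze the alignment modules of the first and second language models.
for module in (ali_enc_1, ali_dec_1, ali_enc_2, ali_dec_2):
    for p in module.parameters():
        p.requires_grad_(False)

# Only the third alignment encoder/decoder receive gradient updates, driven by the
# semantic alignment loss and the third autoencoding loss.
optimizer_3 = torch.optim.Adam(
    list(ali_enc_3.parameters()) + list(ali_dec_3.parameters()), lr=1e-4)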
Training by the above method yields the trained first, second and third language models, that is, a trained translation system capable of mutual translation among the three languages. When translating with this system, for mutual translation between the first language and the second language, refer to the descriptions related to Figures 4 and 5 above: to translate the first language into the second language, follow the description related to Figure 4; to translate the second language into the first language, follow the description related to Figure 5. It should be understood that a translation system capable of mutual translation among three languages can also translate between the first language and the third language, and between the second language and the third language.
In the embodiments of this application, the methods for mutual translation between the first and third languages and between the second and third languages are similar to the method for mutual translation between the first and second languages described above. For example, to translate the third language into the first language, the third encoder, the third alignment encoder, the first alignment decoder and the first decoder are combined into the third-language-to-first-language translation model shown in Figure 8. During translation, a third source sentence expressed in the third language is input to the third encoder, which encodes it and outputs the vector set corresponding to the third source sentence; each vector in this set belongs to the vector space corresponding to the input of the third decoder. The vector set corresponding to the third source sentence is then input to the third alignment encoder, which maps each vector in the set into the aligned vector space to obtain a third feature vector set. The third feature vector set is input to the first alignment decoder, which maps the vectors of the aligned vector space in the third feature vector set into the vector space corresponding to the input of the first decoder. Finally, the resulting vectors are input to the first decoder for decoding, which outputs the translated sentence, in the first language, corresponding to the third source sentence.
To translate the first language into the third language, the first encoder, the first alignment encoder, the third alignment decoder and the third decoder are combined into the first-language-to-third-language translation model shown in Figure 9. During translation, a fourth source sentence expressed in the first language is input to the first encoder, which encodes it and outputs the vector set corresponding to the fourth source sentence; each vector in this set belongs to the vector space corresponding to the input of the first decoder. The vector set corresponding to the fourth source sentence is then input to the first alignment encoder, which maps each vector in the set into the aligned vector space to obtain a fourth feature vector set. The fourth feature vector set is input to the third alignment decoder, which maps the vectors of the aligned vector space in the fourth feature vector set into the vector space corresponding to the input of the third decoder. Finally, the resulting vectors are input to the third decoder for decoding, which outputs the translated sentence, in the third language, corresponding to the fourth source sentence.
The above, with reference to Figures 6 to 9, has described in detail how to train a translation system capable of mutual translation among three languages and how to use the trained system to translate among the three languages. It should be understood that translation systems for mutual translation among more languages can also be trained and applied through the same process, and when a translation for a new language is added on top of a trained translation system, the training process can follow the process described above for training a three-language system on top of a trained two-language system; the details are not repeated here.
By constructing the translation system provided by this application, when mutual translation among multiple languages is required, it is only necessary to first train the single-language model for each language separately, then add an alignment encoder and an alignment decoder between the encoder and the decoder of each single-language model to build the translation system provided by this application, and finally train the resulting system on a parallel corpus to obtain a trained translation system capable of mutual translation among the languages. When translating, the encoder and alignment encoder of the source language in the trained system are simply combined with the alignment decoder and decoder of the target language to obtain a translation model from the source language to the target language. For example, to translate between language A and language B, combining the encoder and alignment encoder of language A with the alignment decoder and decoder of language B yields a model that translates language A into language B, and combining the encoder and alignment encoder of language B with the alignment decoder and decoder of language A yields a model that translates language B into language A.
Using the translation system provided by this application, there is no need to train a separate translation model for each language pair and direction; for example, mutual translation between language A and language B does not require separately training a model that translates language A into language B and a model that translates language B into language A. This improves the training efficiency of the translation system and saves computing resources.
In addition, as can be seen from the training of the two-language and three-language systems above, after a system capable of mutual translation between the first language and the second language has been trained, adding a third language and achieving mutual translation among the three languages only requires training the single-language model of the third language separately and then jointly training the first, second and third language models; there is no need to separately train models that translate the first language into the third language, the third language into the first language, the second language into the third language, and the third language into the second language. Therefore, when the translation model provided by this application is extended, the number of models that need to be trained is reduced, training efficiency is improved, and the scalability of the system is improved.
The following describes, with reference to FIG. 10, a training method for the translation system provided in the embodiments of this application. The translation system includes a first language model and a second language model, where the first language model includes a first encoder, a first alignment encoder, a first alignment decoder and a first decoder, and the second language model includes a second encoder, a second alignment encoder, a second alignment decoder and a second decoder. The method includes:
S101. Obtain the first encoder and the first decoder of the first language, trained based on a corpus of the first language, and obtain the second encoder and the second decoder of the second language, trained based on a corpus of the second language.
The first encoder is configured to encode a sentence expressed in the first language and output a corresponding vector, and the first decoder is configured to decode an input vector and output a sentence expressed in the first language; the second encoder is configured to encode a sentence expressed in the second language and output a corresponding vector, and the second decoder is configured to decode an input vector and output a sentence expressed in the second language.
The first single-language model is trained with the training data of the first language to obtain a trained first encoder and a trained first decoder. The second single-language model is trained with the training data of the second language to obtain a trained second encoder and a trained second decoder. For the process of training the first single-language model with the training data of the first language and training the second single-language model with the training data of the second language, refer to the description related to FIG. 2 above; details are not repeated here. After the trained first single-language model and the trained second single-language model are obtained, the above first language model can be constructed on the basis of the first single-language model, and the above second language model can be constructed on the basis of the second single-language model.
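As a rough sketch of this single-language training step, the loop below trains an encoder and a decoder to reconstruct their own input; the reconstruction objective, the function name and the hyperparameters are assumptions made for illustration, and the exact training of FIG. 2 is not reproduced here.

```python
import torch
import torch.nn as nn

def pretrain_single_language(encoder, decoder, batches, vocab_size, lr=1e-4):
    """Hypothetical monolingual training: the decoder reconstructs the
    input sentence from the encoder's output vectors."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for tokens in batches:                  # tokens: LongTensor [batch, seq_len]
        logits = decoder(encoder(tokens))   # assumed shape [batch, seq_len, vocab_size]
        loss = loss_fn(logits.reshape(-1, vocab_size), tokens.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return encoder, decoder
```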
S102. Jointly train the first language model and the second language model based on a parallel corpus to obtain a trained first alignment encoder, a trained first alignment decoder, a trained second alignment encoder and a trained second alignment decoder.
After the trained first single-language model and second single-language model are obtained, a first alignment encoder and a first alignment decoder are added between the first encoder and the first decoder, and a second alignment encoder and a second alignment decoder are added between the second encoder and the second decoder, to obtain the above first language model and second language model. The first language model and the second language model are then trained with a parallel corpus to obtain a trained translation system capable of mutual translation between the two languages. For the process of training the first language model and the second language model with the parallel corpus, refer to the description related to FIG. 3 above; details are not repeated here.
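One joint-training step on a parallel sentence pair might look like the sketch below. Keeping the pretrained encoders fixed, pooling over the sequence dimension, and using mean-squared error for both the semantic alignment loss and the reconstruction losses are assumptions made here for illustration; the actual losses of FIG. 3 may differ.

```python
import torch.nn.functional as F

def joint_training_step(models, pair, optimizer):
    """Simplified joint-training step. `models` is a dict holding the
    pretrained encoders ("enc1", "enc2") and the trainable alignment
    encoders/decoders ("align_enc1", "align_dec1", "align_enc2", "align_dec2");
    `pair` holds two sentences with the same meaning in the two languages."""
    s1, s2 = pair
    h1 = models["enc1"](s1).detach()   # encoder outputs, assumed [batch, seq_len, dim]
    h2 = models["enc2"](s2).detach()   # encoders are kept fixed in this sketch
    z1 = models["align_enc1"](h1)      # map into the alignment vector space
    z2 = models["align_enc2"](h2)
    align_loss = F.mse_loss(z1.mean(dim=1), z2.mean(dim=1))  # pull aligned vectors together
    rec1 = F.mse_loss(models["align_dec1"](z1), h1)  # back to decoder-1 input space
    rec2 = F.mse_loss(models["align_dec2"](z2), h2)  # back to decoder-2 input space
    loss = align_loss + rec1 + rec2    # equal weights are an illustrative choice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```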
After the trained first language model and second language model are obtained, when the first language model and the second language model are used for translation, the encoders and decoders included in the first language model and the second language model can be combined as shown in FIG. 4 and FIG. 5 to translate from the first language into the second language or from the second language into the first language.
The training method for the translation system provided by this application can also be used to train a translation system capable of mutual translation among three or more languages. For the method of training a translation system capable of mutual translation among three languages, refer to the related descriptions of FIG. 6 and FIG. 7 above; details are not repeated here.
For the foregoing method embodiments, for brevity of description, they are all expressed as a series of action combinations. However, those skilled in the art should know that the present invention is not limited by the described order of actions. Other reasonable step combinations that those skilled in the art can conceive of based on the foregoing description also fall within the protection scope of the present invention. In addition, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the present invention.
Referring to FIG. 11, FIG. 11 is a schematic diagram of a computing device provided by an embodiment of this application. The computing device 11 includes one or more processors 110, a communication interface 111 and a memory 112. The processor 110, the communication interface 111 and the memory 112 are connected to each other through a bus 113, where:
For the specific implementation of the operations performed by the processor 110, refer to the specific operations in the method embodiment shown in FIG. 10 above, or to the process of training a translation system for mutual translation between two languages and the process of translating with the trained translation system corresponding to FIG. 1 to FIG. 5, or to the process of training a translation system for mutual translation among three languages and the process of translating with the trained translation system corresponding to FIG. 6 to FIG. 9; details are not repeated here.
The processor 110 may be implemented in various forms. For example, the processor 110 may be a central processing unit (CPU) or a graphics processing unit (GPU), and the processor 110 may be a single-core processor or a multi-core processor. The processor 110 may also be a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL) or any combination thereof. The processor 110 may also be implemented solely by a logic device with built-in processing logic, such as an FPGA or a digital signal processor (DSP).
The communication interface 111 may be a wired interface or a wireless interface, and is used to communicate with other modules or devices. The wired interface may be an Ethernet interface, a local interconnect network (LIN) interface or the like, and the wireless interface may be a cellular network interface, a wireless local area network interface or the like. In this embodiment of this application, the communication interface 111 may be specifically used to receive training data, parallel corpora and the like uploaded by user equipment.
The memory 112 may be a non-volatile memory, for example, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM) or a flash memory. The memory 112 may also be a volatile memory, and the volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, for example, a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM) and a direct rambus random access memory (DR RAM).
The memory 112 may also be used to store program code and data, so that the processor 110 invokes the program code stored in the memory 112 to execute the method for training the translation system or the method for translating with the translation system in the foregoing method embodiments. In addition, the computing device 11 may include more or fewer components than those shown in FIG. 11, or the components may be arranged differently.
The bus 113 may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL) bus, a cache coherent interconnect for accelerators (CCIX) bus, or the like. The bus 113 may be divided into an address bus, a data bus, a control bus and the like. In addition to the data bus, the bus 113 may further include a power bus, a control bus, a status signal bus and the like. However, for clarity of description, only one thick line is shown in FIG. 11, but this does not mean that there is only one bus or one type of bus.
Optionally, the computing device 11 may further include an input/output interface 114. The input/output interface 114 is connected to an input/output device, and is configured to receive input information and output an operation result.
Since the modules in the translation system provided by this application can be deployed in a distributed manner on multiple computing devices 11 in the same environment or in different environments, this application further provides a computing device cluster as shown in FIG. 12, and the computing device cluster includes multiple computing devices 11.
A communication path is established between the above computing devices 11 through a communication network. Any one or more of the first encoder, the first alignment encoder, the first alignment decoder, the first decoder, the second encoder, the second alignment encoder, the second alignment decoder and the second decoder runs on each computing device 11. For example, the first encoder, the first alignment encoder, the first alignment decoder and the first decoder run on the first computing device 11, and the second encoder, the second alignment encoder, the second alignment decoder and the second decoder run on the second computing device 11. Any computing device 11 may be a computer (for example, a server) in a cloud environment, a computer in an edge data center, or a terminal computing device.
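Such a placement can be recorded as a simple mapping from computing devices to the modules they host; the sketch below is only illustrative, and the device and module names are assumptions rather than identifiers from this application.

```python
# Illustrative placement of the translation-system modules onto two computing
# devices; the keys and module names are assumptions.
placement = {
    "computing_device_1": ["first_encoder", "first_alignment_encoder",
                           "first_alignment_decoder", "first_decoder"],
    "computing_device_2": ["second_encoder", "second_alignment_encoder",
                           "second_alignment_decoder", "second_decoder"],
}

def device_for(module_name: str) -> str:
    """Return the computing device that hosts the given module."""
    for device, modules in placement.items():
        if module_name in modules:
            return device
    raise KeyError(module_name)
```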
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores instructions, and when the instructions are run on a processor, the method steps in the foregoing method embodiments can be implemented. For the specific implementation of the processor performing the foregoing method steps, refer to the specific operations shown in FIG. 3 of the foregoing method embodiments; details are not repeated here.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in a certain embodiment, refer to the related descriptions of other embodiments.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented by software, the foregoing embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center in a wired (for example, coaxial cable, optical fiber or digital subscriber line) or wireless (for example, infrared, radio or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or a data center that includes one or more sets of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium or a semiconductor medium. The semiconductor medium may be a solid state drive (SSD).
The steps in the methods of the embodiments of this application may be reordered, combined or deleted according to actual needs; the modules in the system of the embodiments of this application may be divided, combined or deleted according to actual needs.
The embodiments of this application are described in detail above. Specific examples are used in this specification to describe the principles and implementations of this application, and the descriptions of the foregoing embodiments are merely intended to help understand the method and core idea of this application. In addition, a person of ordinary skill in the art may make changes to the specific implementations and application scope according to the idea of this application. In conclusion, the content of this specification shall not be construed as a limitation on this application.

Claims (17)

  1. A training method for a translation system, characterized in that the translation system comprises a first language model and a second language model, the first language model comprises a first encoder, a first alignment encoder, a first alignment decoder and a first decoder, the second language model comprises a second encoder, a second alignment encoder, a second alignment decoder and a second decoder, and the method comprises:
    obtaining the first encoder and the first decoder of the first language trained based on a corpus of the first language, and obtaining the second encoder and the second decoder of the second language trained based on a corpus of the second language; and
    jointly training the first language model and the second language model based on a parallel corpus to obtain a trained first alignment encoder, a trained first alignment decoder, a trained second alignment encoder and a trained second alignment decoder, wherein the parallel corpus comprises a set of synonymous sentences in the first language and the second language;
    wherein the first alignment encoder is configured to convert a vector output by the first encoder into an alignment vector space, and the second alignment encoder is configured to convert a vector output by the second encoder into the alignment vector space; and
    the first alignment decoder is configured to convert an output of the first alignment encoder into a vector space corresponding to an input of the first decoder, so that the first decoder decodes an output of the first alignment decoder and outputs a sentence in the first language; and the second alignment decoder is configured to convert an output of the second alignment encoder into a vector space corresponding to an input of the second decoder, so that the second decoder decodes an output of the second alignment decoder and outputs a sentence in the second language.
  2. The method according to claim 1, characterized in that the jointly training the first language model and the second language model based on the parallel corpus comprises:
    encoding, by the first alignment encoder, a vector corresponding to a first sentence output by the first encoder, and encoding, by the second alignment encoder, a vector corresponding to a second sentence output by the second encoder, wherein the first sentence and the second sentence are sentences with the same semantics expressed in the two languages;
    updating parameters of the first alignment encoder and parameters of the second alignment encoder based on an output of the first alignment encoder and an output of the second alignment encoder;
    decoding, by the first alignment decoder, the output of the first alignment encoder, and updating parameters of the first alignment decoder and the parameters of the first alignment encoder based on an output of the first alignment decoder and an output of the first encoder; and
    decoding, by the second alignment decoder, the output of the second alignment encoder, and updating parameters of the second alignment decoder and the parameters of the second alignment encoder based on an output of the second alignment decoder and an output of the second encoder.
  3. The method according to claim 2, characterized in that the updating parameters of the first alignment encoder and parameters of the second alignment encoder based on the output of the first alignment encoder and the output of the second alignment encoder comprises:
    calculating a semantic alignment loss based on the output of the first alignment encoder and the output of the second alignment encoder, and updating the parameters of the first alignment encoder and the parameters of the second alignment encoder according to the semantic alignment loss.
  4. The method according to any one of claims 1 to 3, characterized in that the translation system further comprises a third language model, the third language model comprises a third encoder, a third alignment encoder, a third alignment decoder and a third decoder, and the method further comprises:
    obtaining the third encoder and the third decoder of the third language trained based on a corpus of the third language;
    wherein the training the first language model and the second language model with a parallel corpus to obtain the trained first language model and the trained second language model comprises:
    jointly training the first language model, the second language model and the third language model with a parallel corpus to obtain the trained first language model, the trained second language model and a trained third language model, wherein the parallel corpus comprises a set of synonymous sentences in the first language, the second language and the third language;
    wherein the third alignment encoder is configured to convert a vector output by the third encoder into the alignment vector space; and the third alignment decoder is configured to convert an output of the third alignment encoder into a vector space corresponding to an input of the third decoder, so that the third decoder decodes an output of the third alignment decoder and outputs a sentence in the third language.
  5. The method according to claim 4, characterized in that the jointly training the first language model, the second language model and the third language model with the parallel corpus comprises:
    encoding, by the first alignment encoder, a vector corresponding to a first sentence output by the first encoder, encoding, by the second alignment encoder, a vector corresponding to a second sentence output by the second encoder, and encoding, by the third alignment encoder, a vector corresponding to a third sentence output by the third encoder, wherein the first sentence, the second sentence and the third sentence are sentences with the same semantics expressed in the three languages;
    updating parameters of the third alignment encoder based on the output of the first alignment encoder, the output of the second alignment encoder and an output of the third alignment encoder; and
    decoding, by the third alignment decoder, the output of the third alignment encoder, and updating parameters of the third alignment decoder and the parameters of the third alignment encoder based on an output of the third alignment decoder and an output of the third encoder.
  6. A translation method, characterized by comprising:
    encoding, by a first encoder, an input first source sentence in a first language, and outputting a vector set corresponding to the first source sentence, wherein vectors in the vector set corresponding to the first source sentence belong to a vector space corresponding to an input of a first decoder;
    converting, by a first alignment encoder, the vectors in the vector set corresponding to the first source sentence into a first feature vector set in an alignment vector space;
    converting, by a second alignment decoder, vectors in the first feature vector set into vectors in a vector space corresponding to an input of a second decoder; and
    decoding, by the second decoder, the vectors output by the second alignment decoder, and outputting a translated sentence in a second language corresponding to the first source sentence.
  7. The method according to claim 6, characterized by further comprising:
    encoding, by a second encoder, an input second source sentence in the second language, and outputting a vector set corresponding to the second source sentence, wherein vectors in the vector set corresponding to the second source sentence belong to the vector space corresponding to the input of the second decoder;
    converting, by a second alignment encoder, the vectors in the vector set corresponding to the input of the second decoder into a second feature vector set in the alignment vector space;
    converting, by a first alignment decoder, vectors in the second feature vector set into vectors in the vector space corresponding to the input of the first decoder; and
    decoding, by the first decoder, the vectors in the vector space corresponding to the input of the first decoder, and outputting a translated sentence in the first language corresponding to the second source sentence.
  8. The method according to claim 6 or 7, characterized by further comprising:
    encoding, by a third encoder, an input third source sentence in a third language, and outputting a vector set corresponding to the third source sentence, wherein vectors in the vector set corresponding to the third source sentence belong to a vector space corresponding to an input of a third decoder;
    converting, by a third alignment encoder, the vectors in the vector set corresponding to the input of the third decoder into a third feature vector set in the alignment vector space;
    converting, by a first alignment decoder, vectors in the third feature vector set into vectors in the vector space corresponding to the input of the first decoder; and
    decoding, by the first decoder, the vectors in the vector space corresponding to the input of the first decoder, and outputting a translated sentence in the first language corresponding to the third source sentence.
  9. The method according to any one of claims 6 to 8, characterized by further comprising:
    encoding, by the first encoder, an input fourth source sentence in the first language, and outputting a vector set corresponding to the fourth source sentence, wherein vectors in the vector set corresponding to the fourth source sentence belong to the vector space corresponding to the input of the first decoder;
    converting, by the first alignment encoder, the vectors in the vector set corresponding to the fourth source sentence into a fourth feature vector set in the alignment vector space;
    converting, by a third alignment decoder, vectors in the fourth feature vector set into vectors in the vector space corresponding to the input of the third decoder; and
    decoding, by the third decoder, the vectors in the vector space corresponding to the input of the third decoder, and outputting a translated sentence in the third language corresponding to the fourth source sentence.
  10. A translation system, characterized in that the translation system comprises a first encoder, a first alignment encoder, a second alignment decoder and a second decoder, wherein:
    the first encoder is configured to encode an input first source sentence in a first language and output a vector set corresponding to the first source sentence, wherein vectors in the vector set corresponding to the first source sentence belong to a vector space corresponding to an input of a first decoder;
    the first alignment encoder is configured to convert the vectors in the vector set corresponding to the first source sentence into a first feature vector set in an alignment vector space;
    the second alignment decoder is configured to convert vectors in the first feature vector set into vectors in a vector space corresponding to an input of the second decoder; and
    the second decoder is configured to decode the vectors in the vector space corresponding to the input of the second decoder and output a translated sentence in a second language corresponding to the first source sentence.
  11. The system according to claim 10, characterized in that the system further comprises a second encoder, a second alignment encoder, a first alignment decoder and a first decoder, wherein:
    the second encoder is configured to encode an input second source sentence in the second language and output a vector set corresponding to the second source sentence, wherein vectors in the vector set corresponding to the second source sentence belong to the vector space corresponding to the input of the second decoder;
    the second alignment encoder is configured to convert the vectors in the vector set corresponding to the input of the second decoder into a second feature vector set in the alignment vector space;
    the first alignment decoder is configured to convert vectors in the second feature vector set into vectors in the vector space corresponding to the input of the first decoder; and
    the first decoder is configured to decode the vectors in the vector space corresponding to the first language and output a translated sentence in the first language corresponding to the second source sentence.
  12. The system according to claim 10 or 11, characterized in that the system further comprises a first alignment decoder, a first decoder, a third encoder and a third alignment encoder, wherein:
    the third encoder is configured to encode an input third source sentence in a third language and output a vector set corresponding to the third source sentence, wherein vectors in the vector set corresponding to the third source sentence belong to a vector space corresponding to an input of a third decoder;
    the third alignment encoder is configured to convert the vectors in the vector set corresponding to the input of the third decoder into a third feature vector set in the alignment vector space;
    the first alignment decoder is configured to convert vectors in the third feature vector set into vectors in the vector space corresponding to the input of the first decoder; and
    the first decoder is configured to decode the vectors in the vector space corresponding to the input of the first decoder and output a translated sentence in the first language corresponding to the third source sentence.
  13. The system according to any one of claims 10 to 12, characterized in that the system further comprises a third alignment decoder and a third decoder, wherein:
    the first encoder is further configured to encode an input fourth source sentence in the first language and output a vector set corresponding to the fourth source sentence, wherein vectors in the vector set corresponding to the fourth source sentence belong to the vector space corresponding to the input of the first decoder;
    the first alignment encoder is further configured to convert the vectors in the vector set corresponding to the fourth source sentence into a fourth feature vector set in the alignment vector space;
    the third alignment decoder is configured to convert vectors in the fourth feature vector set into vectors in the vector space corresponding to the input of the third decoder; and
    the third decoder is configured to decode the vectors in the vector space corresponding to the input of the third decoder and output a translated sentence in the third language corresponding to the fourth source sentence.
  14. A computing device, characterized by comprising a processor and a memory, wherein the memory is configured to store instructions, the processor is configured to execute the instructions, and when executing the instructions, the processor performs the method according to any one of claims 1 to 5.
  15. A computing device, characterized by comprising a processor and a memory, wherein the memory is configured to store instructions, the processor is configured to execute the instructions, and when executing the instructions, the processor performs the method according to any one of claims 6 to 9.
  16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor performs the method according to any one of claims 1 to 5.
  17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor performs the method according to any one of claims 6 to 9.
PCT/CN2022/137877 2022-03-11 2022-12-09 Translation system and training and application methods therefor, and related device WO2023169024A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210244066.XA CN116796760A (en) 2022-03-11 2022-03-11 Translation system, training method and application method thereof and related equipment
CN202210244066.X 2022-03-11

Publications (1)

Publication Number Publication Date
WO2023169024A1 true WO2023169024A1 (en) 2023-09-14

Family

ID=87937158

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137877 WO2023169024A1 (en) 2022-03-11 2022-12-09 Translation system and training and application methods therefor, and related device

Country Status (2)

Country Link
CN (1) CN116796760A (en)
WO (1) WO2023169024A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956045A (en) * 2018-09-26 2020-04-03 北京三星通信技术研究有限公司 Machine translation method, training method, corresponding device and electronic equipment
KR20200075615A (en) * 2018-12-18 2020-06-26 삼성전자주식회사 Method and apparatus for machine translation
CN113688637A (en) * 2020-05-19 2021-11-23 阿里巴巴集团控股有限公司 Model training method and device, computing equipment and readable storage medium
CN113297841A (en) * 2021-05-24 2021-08-24 哈尔滨工业大学 Neural machine translation method based on pre-training double-word vectors

Also Published As

Publication number Publication date
CN116796760A (en) 2023-09-22

Similar Documents

Publication Publication Date Title
KR102382499B1 (en) Translation method, target information determination method, related apparatus and storage medium
CN110287461B (en) Text conversion method, device and storage medium
CN108170686B (en) Text translation method and device
CN109902312B (en) Translation method and device, and training method and device of translation model
WO2022127613A1 (en) Translation model training method, translation method, and device
CN116030792B (en) Method, apparatus, electronic device and readable medium for converting voice tone
CN114020950B (en) Training method, device, equipment and storage medium for image retrieval model
CN111738020A (en) Translation model training method and device
WO2022135028A1 (en) Method for connecting tvm and related device
WO2023169024A1 (en) Translation system and training and application methods therefor, and related device
CN114492426A (en) Sub-word segmentation method, model training method, device and electronic equipment
WO2017045142A1 (en) Decoding method and decoding device for ldpc truncated code
WO2023185896A1 (en) Text generation method and apparatus, and computer device and storage medium
CN111178097B (en) Method and device for generating Zhongtai bilingual corpus based on multistage translation model
US20230153550A1 (en) Machine Translation Method and Apparatus, Device and Storage Medium
US20200089774A1 (en) Machine Translation Method and Apparatus, and Storage Medium
CN111475635A (en) Semantic completion method and device and electronic equipment
CN113689866B (en) Training method and device of voice conversion model, electronic equipment and medium
CN114282551B (en) Translation method, translation device, electronic equipment and storage medium
CN113408303B (en) Training and translation method and device for translation model
US20220083745A1 (en) Method, apparatus and electronic device for determining word representation vector
CN113591493B (en) Translation model training method and translation model device
CN108874786A (en) Machine translation method and device
CN111597829B (en) Translation method and device, storage medium and electronic equipment
CN110263352B (en) Method and device for training deep neural machine translation model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22930646

Country of ref document: EP

Kind code of ref document: A1