CN114692652A - Translation model training method and device, and translation method and device


Info

Publication number
CN114692652A
CN114692652A
Authority
CN
China
Prior art keywords
vector
coding
training
decoding
translation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011642541.6A
Other languages
Chinese (zh)
Inventor
李长亮
郭馨泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Digital Entertainment Co Ltd
Original Assignee
Beijing Kingsoft Digital Entertainment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Digital Entertainment Co Ltd
Priority to CN202011642541.6A
Publication of CN114692652A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/42 - Data-driven translation
    • G06F40/47 - Machine-assisted translation, e.g. using translation memory
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/12 - Use of codes for handling textual entities
    • G06F40/126 - Character encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a translation model training method and device and a translation method and device, wherein the translation model comprises a first encoder, a second encoder and a decoder. The translation model training method comprises the following steps: receiving a training sample, wherein the training sample comprises a training sentence, a target sentence corresponding to the training sentence and a target object corresponding to the training sentence, and the target object comprises a target prefix and/or a target suffix; inputting the target object into the first encoder for encoding processing to obtain a first encoding vector; inputting the training sentence and the first encoding vector into the second encoder for encoding processing to obtain a second encoding vector; inputting the target sentence and the second encoding vector into the decoder for decoding processing to obtain a decoding vector, and calculating a loss value according to the decoding vector; and adjusting parameters of the translation model according to the loss value, and continuing to train the translation model until a training stop condition is reached.

Description

Translation model training method and device, and translation method and device
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a translation model training method and apparatus, a translation method and apparatus, a computing device, and a computer-readable storage medium.
Background
With the advance of artificial intelligence technology, neural networks are increasingly widely applied; for example, neural machine translation models are built to convert a sentence to be translated into a target sentence.
Neural machine translation models typically adopt an encoder-decoder architecture to model variable-length input sentences. The encoder "understands" the source-language sentence and forms a floating-point vector of a specific dimension, from which the decoder generates the target-language translation word by word. In the prior art, to produce a translation with a specified prefix or suffix, a traditional neural machine translation model usually resorts to engineering means that force the first or last few vectors of the decoding stage to fixed values. Although this can generate translations with the fixed prefix or suffix, it tends to make the model's translations stilted and unnatural.
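To make the drawback concrete, here is a minimal sketch of the kind of engineering workaround described above (not the method of this application): during greedy decoding the first positions are simply overwritten with the required prefix tokens, so the freely generated remainder may not connect fluently. The interfaces `predict_next` and `eos_id` are hypothetical, not from the patent or any specific library.

```python
# Hypothetical sketch of the prior-art workaround: forcing a fixed prefix
# during greedy decoding. `model.predict_next` and `model.eos_id` are
# assumed interfaces used only for illustration.
def decode_with_forced_prefix(model, src_tokens, prefix_ids, max_len=50):
    out = []
    for step in range(max_len):
        if step < len(prefix_ids):
            token = prefix_ids[step]                     # force the fixed prefix
        else:
            token = model.predict_next(src_tokens, out)  # normal greedy step
        out.append(token)
        if step >= len(prefix_ids) and token == model.eos_id:
            break
    return out
```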
Therefore, how to make the training of a translation model with a specified prefix or suffix more effective, and its translation results more natural, has become an urgent problem for those skilled in the art.
Disclosure of Invention
In view of this, embodiments of the present application provide a translation model training method and apparatus, a translation method and apparatus, a computing device, and a computer-readable storage medium, so as to solve technical defects in the prior art.
According to a first aspect of embodiments of the present application, there is provided a translation model training method, where the translation model includes a first encoder, a second encoder, and a decoder;
the translation model training method comprises the following steps:
receiving a training sample, wherein the training sample comprises a training sentence, a target sentence corresponding to the training sentence and a target object corresponding to the training sentence, and the target object comprises a target prefix and/or a target suffix;
inputting the target object into the first encoder to perform encoding processing to obtain a first encoding vector;
inputting the training sentence and the first coding vector into the second encoder for coding processing to obtain a second coding vector;
inputting the target sentence and the second coding vector into the decoder for decoding processing to obtain a decoding vector, and calculating a loss value according to the decoding vector;
and adjusting parameters of the translation model according to the loss value, and continuing to train the translation model until a training stopping condition is reached.
Optionally, the first encoder includes a first embedded layer and x object encoding layers, where x is a positive integer greater than or equal to 1;
the inputting the target object into the first encoder for encoding processing to obtain a first encoding vector includes:
inputting the target object into the first embedding layer for embedding processing to obtain a target object vector, and inputting the target object vector into a first object coding layer for coding processing to obtain a first object coding vector;
inputting the first object coding vector into a second object coding layer for coding to obtain a second object coding vector;
until the (x-1)th object coding vector is input into the x-th object coding layer for coding processing to obtain a first coding vector.
Optionally, the second encoder includes a second embedded layer and y syntax encoding layers, where y is a positive integer greater than or equal to 1;
the inputting the training sentence and the first coding vector into the second encoder for encoding to obtain a second coding vector includes:
inputting the training sentences to the second embedding layer for embedding processing to obtain training sentence vectors, and inputting the training sentence vectors and the first coding vectors to a first sentence coding layer for coding processing to obtain first sentence coding vectors;
inputting the first statement coding vector into a second statement coding layer for coding to obtain a second statement coding vector;
until the (y-1)th statement coding vector is input into the y-th statement coding layer for coding processing to obtain a second coding vector.
Optionally, the decoder includes a third embedded layer and z decoded layers, where z is a positive integer greater than or equal to 1;
the inputting the target sentence and the second encoding vector into the decoder for decoding processing to obtain a decoding vector comprises:
inputting the target statement into the third embedding layer for embedding processing to obtain a target statement vector, and inputting the target statement vector and the second coding vector into a first decoding layer for decoding processing to obtain a first decoding vector;
inputting the first decoding vector into a second decoding layer for decoding processing to obtain a second decoding vector;
until the (z-1)th decoding vector is input into the z-th decoding layer for decoding processing to obtain a decoding vector.
Optionally, the calculating a loss value according to the decoding vector includes:
comparing the decoding vector with a preset standard vector to obtain a loss value of the decoding vector.
Optionally, adjusting parameters of the translation model according to the loss value includes:
and reversely propagating the loss value to sequentially update the decoding parameters of the decoder, the coding parameters of the second coder and the coding parameters of the first coder.
Optionally, the continuing to train the translation model until reaching a training stop condition includes:
and when the loss value is smaller than the target value, stopping training the translation model.
According to a second aspect of embodiments of the present application, there is provided a translation method including:
obtaining a statement to be translated and a specified object, wherein the object comprises a prefix and/or a suffix;
inputting the object into a first encoder of a translation model for encoding to obtain a first encoding vector, wherein the translation model is obtained by training through the translation model training method;
inputting the first coding vector and the statement to be translated into a second encoder of the translation model for coding to obtain a second coding vector;
and inputting the second coding vector into a decoder of the translation model for decoding to obtain a target statement.
According to a third aspect of embodiments of the present application, there is provided a training apparatus for a translation model, where the translation model includes a first encoder, a second encoder, and a decoder;
the training device of the translation model comprises:
the training device comprises a receiving module and a processing module, wherein the receiving module is configured to receive a training sample, the training sample comprises a training sentence, a target sentence corresponding to the training sentence and a target object corresponding to the training sentence, and the target object comprises a target prefix and/or a target suffix.
a first encoding module configured to input the target object into the first encoder for encoding processing to obtain a first encoding vector;
a second encoding module configured to input the training sentence and the first encoding vector into the second encoder for encoding processing to obtain a second encoding vector;
a decoding module configured to input the target sentence and the second encoding vector into the decoder for decoding processing to obtain a decoding vector, and calculate a loss value according to the decoding vector;
and a training module configured to adjust parameters of the translation model according to the loss value and continue to train the translation model until a training stop condition is reached.
According to a fourth aspect of embodiments of the present application, there is provided a translation apparatus including:
an obtaining module configured to obtain a sentence to be translated and a specified object, wherein the object comprises a prefix and/or a suffix;
a first encoding module configured to input the object into a first encoder of a translation model for encoding to obtain a first encoding vector, wherein the translation model is trained by the above translation model training method;
a second encoding module configured to input the first encoding vector and the sentence to be translated into a second encoder of the translation model for encoding to obtain a second encoding vector;
and a decoding module configured to input the second encoding vector into a decoder of the translation model for decoding to obtain a target sentence.
According to a fifth aspect of embodiments of the present application, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the translation model training method or the steps of the translation method when executing the instructions.
According to a sixth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the translation model training method or the steps of the translation method.
According to a seventh aspect of embodiments of the present application, there is provided a chip storing computer instructions that, when executed by the chip, implement the translation model training method or the steps of the translation method.
In the embodiment of the application, a training sample is received, wherein the training sample comprises a training sentence, a target sentence corresponding to the training sentence and a target object corresponding to the training sentence; the target object is input into the first encoder for encoding processing to obtain a first encoding vector; the training sentence and the first encoding vector are input into the second encoder for encoding processing to obtain a second encoding vector; the target sentence and the second encoding vector are input into the decoder for decoding processing to obtain a decoding vector, and a loss value is calculated according to the decoding vector; and parameters of the translation model are adjusted according to the loss value, and training continues until a training stop condition is reached. By improving the translation model and adding a first encoder dedicated to encoding the object, the translation model is helped to better translate results with a specified object: the fluency of the translation result is ensured while the model does not lose the specified object, so that the translation result is more natural.
Drawings
FIG. 1 is a schematic structural diagram of a translation model provided by an embodiment of the present application;
FIG. 2 is a block diagram of a computing device provided by an embodiment of the present application;
FIG. 3 is a flowchart of a translation model training method provided by an embodiment of the present application;
fig. 4A is a schematic structural diagram of a first encoder in a translation model provided in an embodiment of the present application;
FIG. 4B is a schematic structural diagram of a second encoder in the translation model according to the embodiment of the present application;
fig. 4C is a schematic structural diagram of a decoder in the translation model provided in the embodiment of the present application;
FIG. 5 is a flowchart of a translation method provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a training apparatus for translation models according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a translation apparatus according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments of the present application to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application. The word "if," as used herein, may be interpreted as "responsive to a determination," depending on the context.
First, for ease of understanding and without limitation, terms used in one or more embodiments of the present application are explained with respect to the machine translation task in the field of natural language processing:
Encoding (encoder): converting the words of a sentence to be translated into an encoding vector.
Decoding (decoder): converting an encoding vector into the words of the translated sentence.
Training sentence: a sentence to be translated that is used for training the translation model.
Target sentence: the translated sentence obtained by translating the sentence to be translated.
Prefix: the first several words or characters of a sentence; for example, "I'm at" is the prefix shared by "I'm at home" and "I'm at rest", and "I like drawing" is a target prefix in the examples below.
Suffix: the last several words or characters of a sentence; for example, "she is" is the suffix of "What a lovely baby she is", and "too" is a target suffix in the examples below.
In the present application, a translation model training method and apparatus, a translation method and apparatus, a computing device, and a computer-readable storage medium are provided, which are described in detail in the following embodiments one by one.
Fig. 1 shows a schematic structural diagram of a translation model provided in an embodiment of the present application, where the translation model includes a first encoder 102, a second encoder 104, and a decoder 106.
The first encoder 102 includes an input port and an output port. The input port is used for inputting the target object when the translation model is trained, or the specified object when the translation model performs translation; the output port is used for outputting the first encoding vector obtained by encoding the target object or the specified object.
The second encoder 104 includes two input ports and one output port. The first input port of the second encoder is used for inputting the training sentence when the translation model is trained, or the sentence to be translated when the translation model performs translation; the second input port of the second encoder is used for inputting the first encoding vector; and the output port is used for outputting the second encoding vector.
The decoder 106 also includes two input ports and one output port. The first input port of the decoder is used for inputting the target sentence, the second input port of the decoder is used for inputting the second encoding vector, and the output port of the decoder is used for outputting the decoding vector.
FIG. 2 shows a block diagram of a computing device 200 according to an embodiment of the present application. The components of the computing device 200 include, but are not limited to, a memory 210 and a processor 220. The processor 220 is coupled to the memory 210 via a bus 230 and the database 250 is used to store data.
Computing device 200 also includes access device 240, which enables computing device 200 to communicate via one or more networks 260. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 240 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present application, the above-described components of computing device 200 and other components not shown in FIG. 2 may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device structure shown in FIG. 2 is for purposes of example only and is not limiting as to the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 200 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 200 may also be a mobile or stationary server.
The processor 220 may execute the steps of the translation model training method shown in FIG. 3. FIG. 3 shows a flowchart of a translation model training method according to an embodiment of the present application, applied to the translation model shown in FIG. 1 and including steps 302 to 310.
Step 302: receiving a training sample, wherein the training sample comprises a training sentence, a target sentence corresponding to the training sentence and a target object corresponding to the training sentence, and the target object comprises a target prefix and/or a target suffix.
Specifically, the training samples are samples used for training the translation model. Each training sample includes a training sentence, a target sentence corresponding to the training sentence, and a target object corresponding to the training sentence. The training sentence is a sentence to be translated; the target sentence is the sentence obtained by a standard translation of the training sentence; and the target object is a prefix and/or suffix that is fixed in the translation of the training sentence, i.e., the target object may be a target prefix, a target suffix, or a combination of the two.
In practical applications, the languages of the training sentence and the target sentence may be any natural languages such as Chinese, English, French, Italian, Japanese and German; the present application does not limit the languages of the training sentence and the target sentence. In addition, the target object and the target sentence are in the same language.
In the embodiment provided by the application, the obtained training sentence is "I like painting and singing", the target sentence corresponding to the training sentence is "I like drawing and singing", and the target prefix corresponding to the training sentence is "I like drawing"; or the obtained training sentence is "I like painting and singing", the target sentence corresponding to the training sentence is "I like drawing and I like singing", and the target suffix corresponding to the training sentence is "I like singing"; or the obtained training sentence is "I like painting and singing", the target sentence corresponding to the training sentence is "I like drawing, I like singing, too", and the target object corresponding to the training sentence comprises the target prefix "I like drawing" and the target suffix "too", that is, the target object is "I like drawing … too".
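For illustration only, the third example above could be represented as a simple record; the field names are hypothetical, not from the patent.

```python
# Hypothetical representation of one training sample: a training sentence,
# its target sentence, and a target object consisting of a target prefix
# plus a target suffix. Field names are illustrative.
sample = {
    "training_sentence": "I like painting and singing",  # sentence to be translated
    "target_sentence": "I like drawing, I like singing, too",
    "target_prefix": "I like drawing",
    "target_suffix": "too",
}
```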
Step 304: and inputting the target object into the first encoder to perform encoding processing to obtain a first encoding vector.
Specifically, the first encoder is dedicated to encoding the target object, and the first encoding vector carries the information of the target object, so that the translation model can be assisted to produce a more natural and fluent translation result under the condition of a specified prefix and/or suffix.
In the embodiment provided by the application, when a target object corresponding to a training statement is a target prefix "I like drawing", the target prefix "I like drawing" is input into a first encoder for encoding processing; when the target object corresponding to the training sentence is the target suffix 'I like singing', inputting the target suffix 'I like singing' into a first encoder for encoding processing; when the target object corresponding to the training sentence is the target prefix "I like drawing" and the target suffix "too", the "I like drawing … … too" is input to the first encoder for encoding processing.
Further, the first encoder includes a first embedding layer and x object coding layers, where x is a positive integer greater than or equal to 1; specifically, the first embedding layer, the first object coding layer, ..., and the x-th object coding layer are connected in sequence. Therefore, the specific process of inputting the target object into the first encoder for encoding to obtain the first encoding vector may be:
inputting the target object into the first embedding layer for embedding processing to obtain a target object vector, and inputting the target object vector into a first object coding layer for coding processing to obtain a first object coding vector;
inputting the first object coding vector into a second object coding layer for coding to obtain a second object coding vector;
until the (x-1)th object coding vector is input into the x-th object coding layer for coding processing to obtain the first coding vector.
Specifically, embedding processing refers to representing an object with a low-dimensional vector, for example representing a word with a vector. Embedded vectors have the property that objects whose vectors are close in distance tend to have close meanings. Because embedding can encode an object with a low-dimensional vector while preserving its meaning, it is well suited to deep learning.
In practical application, referring to fig. 4A, the target object is input into the first embedding layer 402 of the first encoder and embedded to generate a target object vector; the target object vector is input into the first object encoding layer 404 for encoding processing to obtain a first object encoding vector; the first object encoding vector is input into the second object encoding layer 406 for encoding processing to obtain a second object encoding vector; and so on, until the (x-1)th object encoding vector is input into the x-th object encoding layer 408 for encoding processing, and the first encoding vector is output.
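A minimal PyTorch-style sketch of the structure in FIG. 4A, assuming Transformer-style object encoding layers; the class name, layer type and hyperparameters are assumptions, since the patent does not fix them.

```python
import torch
import torch.nn as nn

class FirstEncoder(nn.Module):
    """First embedding layer followed by x object encoding layers."""
    def __init__(self, vocab_size, d_model=512, nhead=8, x=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)  # first embedding layer 402
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.object_layers = nn.TransformerEncoder(layer, num_layers=x)  # layers 404..408

    def forward(self, target_object_ids):
        obj_vec = self.embedding(target_object_ids)  # target object vector
        return self.object_layers(obj_vec)           # first encoding vector
```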
Step 306: and inputting the training sentence and the first coding vector into the second coder for coding to obtain a second coding vector.
Specifically, the second encoder is configured to perform encoding processing on the training sentence.
In the embodiment provided by the application, the training sentence is "I like painting and singing". When the target object corresponding to the training sentence is the target prefix "I like drawing", the first encoding vector corresponding to the target prefix "I like drawing" and the training sentence "I like painting and singing" are input into the second encoder for encoding; when the target object corresponding to the training sentence is the target suffix "I like singing", the first encoding vector corresponding to the target suffix "I like singing" and the training sentence "I like painting and singing" are input into the second encoder for encoding; when the target object corresponding to the training sentence comprises the target prefix "I like drawing" and the target suffix "too", that is, the target object is "I like drawing … too", the first encoding vector corresponding to "I like drawing … too" and the training sentence "I like painting and singing" are input into the second encoder for encoding.
Further, the second encoder includes a second embedding layer and y sentence coding layers, where y is a positive integer greater than or equal to 1; specifically, the second embedding layer, the first sentence coding layer, ..., and the y-th sentence coding layer are connected in sequence. Therefore, the specific process of inputting the training sentence and the first encoding vector into the second encoder for encoding processing to obtain a second encoding vector may be:
inputting the training sentences to the second embedding layer for embedding processing to obtain training sentence vectors, and inputting the training sentence vectors and the first coding vectors to a first sentence coding layer for coding processing to obtain first sentence coding vectors;
inputting the first sentence coding vector into a second sentence coding layer for coding to obtain a second sentence coding vector;
until the (y-1)th sentence coding vector is input into the y-th sentence coding layer for coding processing to obtain the second coding vector.
In practical application, referring to fig. 4B, the training sentence is input into the second embedding layer 410 of the second encoder and embedded to generate a training sentence vector; the training sentence vector and the first encoding vector are input into the first sentence encoding layer 412 for encoding processing to obtain a first sentence encoding vector; the first sentence encoding vector is input into the second sentence encoding layer 414 for encoding processing to obtain a second sentence encoding vector; and so on, until the (y-1)th sentence encoding vector is input into the y-th sentence encoding layer 416 for encoding processing, and the second encoding vector is output.
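A PyTorch-style sketch of the structure in FIG. 4B. The patent feeds the first encoding vector into the first sentence encoding layer but does not specify how the two inputs are combined; the cross-attention fusion below, applied in every layer, is an assumption, and all names and sizes are illustrative.

```python
import torch.nn as nn

class SentenceEncodingLayer(nn.Module):
    """One sentence encoding layer: self-attention over the training
    sentence vector plus cross-attention to the first encoding vector
    (the fusion mechanism is an assumption, not fixed by the patent)."""
    def __init__(self, d_model=512, nhead=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, sent_vec, first_enc):
        h, _ = self.self_attn(sent_vec, sent_vec, sent_vec)
        sent_vec = self.norm1(sent_vec + h)
        h, _ = self.cross_attn(sent_vec, first_enc, first_enc)  # inject object info
        sent_vec = self.norm2(sent_vec + h)
        return self.norm3(sent_vec + self.ffn(sent_vec))

class SecondEncoder(nn.Module):
    """Second embedding layer followed by y sentence encoding layers."""
    def __init__(self, vocab_size, d_model=512, y=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)  # second embedding layer 410
        self.layers = nn.ModuleList(SentenceEncodingLayer(d_model) for _ in range(y))

    def forward(self, sentence_ids, first_enc):
        sent_vec = self.embedding(sentence_ids)  # training sentence vector
        for layer in self.layers:                # layers 412..416
            sent_vec = layer(sent_vec, first_enc)
        return sent_vec                          # second encoding vector
```

Passing the first encoding vector to every layer (rather than only the first, as drawn in fig. 4B) is a design choice of this sketch, not something the patent prescribes.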
Step 308: and inputting the target statement and the second coding vector into the decoder for decoding processing to obtain a decoding vector, and calculating a loss value according to the decoding vector.
There are many loss functions for calculating the loss value, such as cross entropy loss function, L1 norm loss function, maximum loss function, mean square error loss function, logarithmic loss function, etc., and in the present application, the selection of the loss function for calculating the loss value is not limited.
In the embodiment provided by the application, the obtained training sentence is "I like painting and singing". When the target sentence corresponding to the training sentence is "I like drawing and singing" and the target prefix corresponding to the training sentence is "I like drawing", the target sentence "I like drawing and singing" and the second encoding vector corresponding to the training sentence and the target prefix "I like drawing" are input into the decoder for decoding; when the target sentence corresponding to the training sentence is "I like drawing and I like singing" and the target suffix corresponding to the training sentence is "I like singing", the target sentence "I like drawing and I like singing" and the second encoding vector corresponding to the training sentence and the target suffix "I like singing" are input into the decoder for decoding; when the target sentence corresponding to the training sentence is "I like drawing, I like singing, too" and the target object corresponding to the training sentence is "I like drawing … too" (comprising the target prefix "I like drawing" and the target suffix "too"), the target sentence "I like drawing, I like singing, too" and the second encoding vector corresponding to the training sentence and the target object "I like drawing … too" are input into the decoder for decoding processing.
Further, the decoder includes a third embedding layer and z decoding layers, where z is a positive integer greater than or equal to 1; specifically, the third embedding layer, the first decoding layer, ..., and the z-th decoding layer are connected in sequence. Therefore, the specific process of inputting the target sentence and the second encoding vector into the decoder for decoding to obtain a decoding vector may be:
inputting the target statement into the third embedding layer for embedding processing to obtain a target statement vector, and inputting the target statement vector and the second coding vector into a first decoding layer for decoding processing to obtain a first decoding vector;
inputting the first decoding vector into a second decoding layer for decoding processing to obtain a second decoding vector;
until the (z-1)th decoding vector is input into the z-th decoding layer for decoding processing to obtain the decoding vector.
In practical application, referring to fig. 4C, the target sentence is input into the third embedding layer 418 of the decoder and embedded to generate a target sentence vector; the target sentence vector and the second encoding vector are input into the first decoding layer 420 for decoding processing to obtain a first decoding vector; the first decoding vector is input into the second decoding layer 422 for decoding processing to obtain a second decoding vector; and so on, until the (z-1)th decoding vector is input into the z-th decoding layer 424 for decoding processing, and the decoding vector is output.
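A matching sketch of the decoder in FIG. 4C, assuming standard Transformer decoding layers that attend to the second encoding vector as memory; in the patent the second encoding vector is shown entering the first decoding layer, so feeding it to every layer is an assumption of this sketch.

```python
import torch.nn as nn

class TranslationDecoder(nn.Module):
    """Third embedding layer followed by z decoding layers (fig. 4C)."""
    def __init__(self, vocab_size, d_model=512, nhead=8, z=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)  # third embedding layer 418
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoding_layers = nn.TransformerDecoder(layer, num_layers=z)  # layers 420..424

    def forward(self, target_sentence_ids, second_enc):
        tgt_vec = self.embedding(target_sentence_ids)  # target sentence vector
        # each decoding layer attends to the second encoding vector as memory
        return self.decoding_layers(tgt_vec, second_enc)  # decoding vector
```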
Optionally, calculating a loss value according to the decoding vector includes: comparing the decoding vector with a preset standard vector to obtain a loss value of the decoding vector.
In this embodiment, when calculating the loss value of the decoding vector, the loss value is calculated by comparing the obtained decoding vector with a preset standard vector, rather than by directly comparing the decoding vector with the target sentence vector. This avoids the overfitting that arises when the loss value is calculated directly against the target sentence vector, optimizes the performance of the translation model on other sentences, and makes the translation effect more natural and fluent.
Step 310: and adjusting parameters of the translation model according to the loss value, and continuing to train the translation model until a training stopping condition is reached.
Specifically, when adjusting the parameters of the translation model, the parameters need to be adjusted according to the loss value; the specific implementation process is as follows: back-propagating the loss value to sequentially update the decoding parameters of the decoder, the encoding parameters of the second encoder, and the encoding parameters of the first encoder.
In practical applications, first, the decoding parameters of the decoder are updated: for a decoder comprising z decoding layers, the z-th decoding layer, the (z-1)th decoding layer, ..., the 2nd decoding layer and the 1st decoding layer are updated in sequence; secondly, the encoding parameters of the second encoder are updated: for an encoder comprising y sentence coding layers, the y-th sentence coding layer, the (y-1)th sentence coding layer, ..., the 2nd sentence coding layer and the 1st sentence coding layer are updated in sequence; finally, the encoding parameters of the first encoder are updated: for an encoder comprising x object coding layers, the x-th object coding layer, the (x-1)th object coding layer, ..., the 2nd object coding layer and the 1st object coding layer are updated in sequence.
Optionally, continuing to train the translation model until a training stop condition is reached includes: stopping training the translation model when the loss value is smaller than the target value.
In this embodiment, a target value is preset, and when the loss value of the translation model is lower than the preset target value, it indicates that the translation model reaches a translation standard or a certain translation level, that is, a training stop condition is reached, and therefore, training of the translation model is stopped.
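Putting steps 302 to 310 together, here is a minimal training-loop sketch built on the module sketches above. Mean squared error stands in for the unspecified loss function (one of the options the text lists), and `samples`, which yields (object ids, sentence ids, target ids, standard vector) tuples, is an assumed data loader; vocabulary size, learning rate and target value are illustrative.

```python
import itertools
import torch
import torch.nn.functional as F

# Modules assembled from the sketches above; sizes are illustrative.
first_encoder = FirstEncoder(vocab_size=32000)
second_encoder = SecondEncoder(vocab_size=32000)
decoder = TranslationDecoder(vocab_size=32000)

params = itertools.chain(first_encoder.parameters(),
                         second_encoder.parameters(),
                         decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)
target_value = 0.01  # preset target value for the training stop condition

for object_ids, sentence_ids, target_ids, standard_vec in samples:  # assumed loader
    first_enc = first_encoder(object_ids)                 # step 304
    second_enc = second_encoder(sentence_ids, first_enc)  # step 306
    decoding_vec = decoder(target_ids, second_enc)        # step 308
    loss = F.mse_loss(decoding_vec, standard_vec)         # vs. preset standard vector
    optimizer.zero_grad()
    loss.backward()  # gradients flow decoder -> second encoder -> first encoder
    optimizer.step()
    if loss.item() < target_value:                        # step 310 stop condition
        break
```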
In the translation model training method provided by this specification, a training sample is received, where the training sample comprises a training sentence, a target sentence corresponding to the training sentence, and a target object corresponding to the training sentence, and the target object comprises a target prefix and/or a target suffix; the target object is input into the first encoder for encoding processing to obtain a first encoding vector; the training sentence and the first encoding vector are input into the second encoder for encoding processing to obtain a second encoding vector; the target sentence and the second encoding vector are input into the decoder for decoding processing to obtain a decoding vector, and a loss value is calculated according to the decoding vector; and parameters of the translation model are adjusted according to the loss value, and training continues until a training stop condition is reached. By improving the translation model and adding a first encoder dedicated to encoding the object, the translation model is helped to better translate results with a specified object: fluency of the translation result is ensured while the model does not lose the specified object, so that the translation result is more natural.
Fig. 5 shows a flowchart of a translation method according to an embodiment of the present application, including steps 502 to 508.
Step 502: and acquiring a statement to be translated and a specified object, wherein the object comprises a prefix and/or a suffix.
In the embodiment provided by the application, the obtained sentence to be translated is "I love China and I love the world", and the specified prefix is "I love China". Or the obtained sentence to be translated is "I love China and I love the world", and the specified suffix is "too".
Step 504: and inputting the object into a first encoder of a translation model for encoding to obtain a first encoding vector, wherein the translation model is obtained by training through the translation model training method.
In the embodiment provided by the application, the specified prefix "I love China" is input into a first encoder of a translation model for encoding, and a first encoding vector corresponding to the specified prefix "I love China" is obtained. Or inputting the specified suffix "too" into a first encoder of a translation model for encoding, and obtaining a first encoding vector corresponding to the specified suffix "too".
Step 506: and inputting the first coding vector and the statement to be translated into a second coder of a translation model for coding to obtain a second coding vector.
In the embodiment provided by the application, the first encoding vector corresponding to "I love China" and the sentence to be translated "I love China and I love the world" are input into a second encoder of the translation model for encoding, so as to obtain a second encoding vector. Or the first encoding vector corresponding to "too" and the sentence to be translated "I love China and I love the world" are input into the second encoder of the translation model for encoding to obtain a second encoding vector.
Step 508: and inputting the second coding vector into a decoder of the translation model for decoding to obtain a target statement.
In the embodiment provided by the application, when the specified prefix is "I love China", the second encoding vector is input into a decoder of the translation model for decoding to obtain the target sentence "I love China and I love the world" corresponding to the sentence to be translated. When the specified object is the specified suffix "too", the second encoding vector is input into the decoder of the translation model for decoding to obtain the target sentence "I love China, I love the world, too" corresponding to the sentence to be translated.
The translation method provided by this specification obtains a sentence to be translated and a specified object, wherein the object comprises a prefix and/or a suffix; inputs the object into a first encoder of a translation model for encoding to obtain a first encoding vector, wherein the translation model is trained through the above translation model training method; inputs the first encoding vector and the sentence to be translated into a second encoder of the translation model for encoding to obtain a second encoding vector; and inputs the second encoding vector into a decoder of the translation model for decoding to obtain a target sentence. By inputting the sentence to be translated and the specified object into the pre-trained translation model, a more fluent and natural target sentence carrying the specified object is obtained.
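A minimal sketch of steps 502 to 508, reusing the module sketches from the training method above. Greedy decoding and the helpers `tokenize`, `detokenize`, `output_proj` (a projection from the decoding vector to vocabulary logits), `bos_id` and `eos_id` are assumptions; the patent does not specify a search strategy or tokenization.

```python
import torch

# Assumed helpers: tokenize/detokenize map between text and id tensors,
# bos_id/eos_id mark sentence boundaries, and output_proj maps the decoding
# vector to vocabulary logits. None of these names come from the patent.
prefix_ids = tokenize("I love China")                       # specified object (step 502)
source_ids = tokenize("I love China and I love the world")  # sentence to be translated

first_enc = first_encoder(prefix_ids)               # step 504: encode the specified object
second_enc = second_encoder(source_ids, first_enc)  # step 506: encode the sentence

generated = [bos_id]                                # step 508: greedy decoding (assumed)
for _ in range(50):
    decoding_vec = decoder(torch.tensor([generated]), second_enc)
    next_id = output_proj(decoding_vec)[0, -1].argmax().item()
    generated.append(next_id)
    if next_id == eos_id:
        break
target_sentence = detokenize(generated[1:])
```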
Corresponding to the above embodiment of the translation model training method, the present application further provides an embodiment of a translation model training apparatus, and fig. 6 shows a schematic structural diagram of the translation model training apparatus according to an embodiment of the present application, where the translation model includes a first encoder, a second encoder, and a decoder. As shown in fig. 6, the apparatus includes:
a receiving module 602 configured to receive a training sample, where the training sample includes a training sentence, a target sentence corresponding to the training sentence, and a target object corresponding to the training sentence, and the target object includes a target prefix and/or a target suffix;
a first encoding module 604, configured to input the target object into the first encoder for encoding processing to obtain a first encoding vector;
a second encoding module 606, configured to input the training sentence and the first encoding vector into the second encoder for encoding processing to obtain a second encoding vector;
a decoding module 608 configured to input the target sentence and the second encoding vector into the decoder for decoding processing to obtain a decoding vector, and calculate a loss value according to the decoding vector;
a training module 610 configured to adjust parameters of the translation model according to the loss value, and continue training the translation model until a training stop condition is reached.
Optionally, the first encoder includes a first embedded layer and x object encoding layers, where x is a positive integer greater than or equal to 1;
the first encoding module 604 is further configured to input the target object to the first embedding layer for embedding processing to obtain a target object vector, input the target object vector to the first object encoding layer for encoding processing to obtain a first object encoding vector, input the first object encoding vector to the second object encoding layer for encoding processing to obtain a second object encoding vector, until the x-1 th object encoding vector is input to the x-th object encoding layer for encoding processing to obtain the first encoding vector.
Optionally, the second encoder includes a second embedded layer and y statement encoding layers, where y is a positive integer greater than or equal to 1;
the second encoding module 606 is further configured to input the training sentence into the second embedding layer for embedding processing to obtain a training sentence vector, input the training sentence vector and the first encoding vector into the first sentence encoding layer for encoding processing to obtain a first sentence encoding vector, input the first sentence vector into the second sentence encoding layer for encoding processing to obtain a second sentence encoding vector, and perform encoding processing until the y-1 th sentence encoding vector is input into the y-th sentence encoding layer for encoding processing to obtain the second encoding vector.
Optionally, the decoder includes a third embedded layer and z decoded layers, where z is a positive integer greater than or equal to 1;
the decoding module 608 is further configured to input the target statement into the third embedding layer for embedding processing to obtain a target statement vector, input the target statement vector and the second coding vector into the first decoding layer for decoding processing to obtain a first decoding vector, input the first decoding vector into the second decoding layer for decoding processing to obtain a second decoding vector, and perform decoding processing until the z-1 decoding vector is input into the z-th decoding layer for decoding processing to obtain a decoding vector.
The decoding module 608 is further configured to compare the decoded vector with a preset standard vector to obtain a loss value of the decoded vector.
The training module 610 is further configured to back-propagate the loss value to sequentially update the decoding parameters of the decoder, the encoding parameters of the second encoder, and the encoding parameters of the first encoder.
The training module 610 is further configured to stop training the translation model when the loss value is less than a target value.
In the translation model training apparatus provided by this specification, the receiving module 602 is configured to receive a training sample, where the training sample comprises a training sentence, a target sentence corresponding to the training sentence, and a target object corresponding to the training sentence, and the target object comprises a target prefix and/or a target suffix; the first encoding module 604 is configured to input the target object into the first encoder for encoding processing to obtain a first encoding vector; the second encoding module 606 is configured to input the training sentence and the first encoding vector into the second encoder for encoding processing to obtain a second encoding vector; the decoding module 608 is configured to input the target sentence and the second encoding vector into the decoder for decoding processing to obtain a decoding vector, and calculate a loss value according to the decoding vector; and the training module 610 is configured to adjust parameters of the translation model according to the loss value and continue training the translation model until a training stop condition is reached. By improving the translation model and adding a first encoder dedicated to encoding the target object, the translation model is helped to better translate results with a specified target object: fluency of the translation result is ensured while the model does not lose the specified target object, so that the translation result is more natural.
The above is a schematic scheme of a training apparatus for translation models according to this embodiment. It should be noted that the technical solution of the translation model training device and the technical solution of the translation model training method belong to the same concept, and details of the technical solution of the translation model training device, which are not described in detail, can be referred to the description of the technical solution of the translation model training method.
It should be noted that the components in the device claims should be understood as functional blocks which are necessary to implement the steps of the program flow or the steps of the method, and each functional block is not actually defined by functional division or separation. The device claims defined by such a set of functional modules are to be understood as a functional module framework for implementing the solution mainly by means of a computer program as described in the specification, and not as a physical device for implementing the solution mainly by means of hardware.
Corresponding to the above embodiment of the translation method, the present application further provides an embodiment of a translation apparatus, and fig. 7 shows a schematic structural diagram of the translation apparatus according to an embodiment of the present application. As shown in fig. 7, the apparatus includes:
an obtaining module 702 configured to obtain a sentence to be translated and a specified object, wherein the object includes a prefix and/or a suffix;
a first encoding module 704, configured to input the object into a first encoder of a translation model for encoding, so as to obtain a first encoding vector, where the translation model is obtained by the above-mentioned translation model training method;
a second encoding module 706, configured to input the first encoding vector and the sentence to be translated into a second encoder of the translation model for encoding, so as to obtain a second encoding vector;
a decoding module 708 configured to input the second encoding vector into a decoder of the translation model for decoding to obtain a target sentence.
In the translation apparatus provided by this specification, the obtaining module 702 is configured to obtain a sentence to be translated and a specified object, wherein the object comprises a prefix and/or a suffix; the first encoding module 704 is configured to input the object into a first encoder of a translation model for encoding to obtain a first encoding vector, wherein the translation model is trained through the above translation model training method; the second encoding module 706 is configured to input the first encoding vector and the sentence to be translated into a second encoder of the translation model for encoding to obtain a second encoding vector; and the decoding module 708 is configured to input the second encoding vector into a decoder of the translation model for decoding to obtain a target sentence. By inputting the sentence to be translated and the specified object into the pre-trained translation model, a more fluent and natural target sentence carrying the specified object is obtained.
The above is a schematic scheme of a translation apparatus of the present embodiment. It should be noted that the technical solution of the translation apparatus and the technical solution of the translation method described above belong to the same concept, and details that are not described in detail in the technical solution of the translation apparatus can be referred to the description of the technical solution of the translation method described above.
It should be noted that the components in the device claims should be understood as functional blocks which are necessary to implement the steps of the program flow or the steps of the method, and each functional block is not actually defined by functional division or separation. The device claims defined by such a set of functional modules are to be understood as a functional module framework for implementing the solution mainly by means of a computer program as described in the specification, and not as a physical device for implementing the solution mainly by means of hardware.
There is also provided in an embodiment of the present application a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the translation model training method or the steps of the translation method when executing the instructions.
An embodiment of the present application further provides a computer-readable storage medium, which stores computer instructions, when executed by a processor, for implementing the translation model training method or the steps of the translation method.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the translation model training method or the translation method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the translation model training method or the translation method.
The embodiment of the application discloses a chip, which stores computer instructions, and the instructions are executed by a processor to realize the translation model training method or the steps of the translation method.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier wave signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. The description is not exhaustive and does not limit the application to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, thereby enabling others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (12)

1. A translation model training method, wherein the translation model comprises a first encoder, a second encoder and a decoder;
the translation model training method comprises the following steps:
receiving a training sample, wherein the training sample comprises a training sentence, a target sentence corresponding to the training sentence and a target object corresponding to the training sentence, and the target object comprises a target prefix and/or a target suffix;
inputting the target object into the first encoder for encoding processing to obtain a first coding vector;
inputting the training sentence and the first coding vector into the second encoder for encoding processing to obtain a second coding vector;
inputting the target sentence and the second coding vector into the decoder for decoding processing to obtain a decoding vector, and calculating a loss value according to the decoding vector;
and adjusting parameters of the translation model according to the loss value, and continuing to train the translation model until a training stopping condition is reached.
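Purely by way of illustration, one training step of the method of claim 1 may be sketched in PyTorch as follows. The class name DualEncoderModel, the toy vocabulary and dimension sizes, the use of Transformer layers, and the fusion of the first coding vector by sequence-level concatenation are all assumptions of this sketch and not part of the claimed method.

    import torch
    import torch.nn as nn

    class DualEncoderModel(nn.Module):
        # Hypothetical dual-encoder translation model; names and sizes are assumptions.
        def __init__(self, vocab=1000, d=64):
            super().__init__()
            self.emb = nn.Embedding(vocab, d)
            self.first_encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2)
            self.second_encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2)
            self.decoder = nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d, nhead=4, batch_first=True), num_layers=2)
            self.out = nn.Linear(d, vocab)

        def forward(self, src_ids, obj_ids, tgt_ids):
            first_vec = self.first_encoder(self.emb(obj_ids))         # first coding vector
            fused = torch.cat([first_vec, self.emb(src_ids)], dim=1)  # one plausible fusion
            second_vec = self.second_encoder(fused)                   # second coding vector
            dec_vec = self.decoder(self.emb(tgt_ids), second_vec)     # decoding vector
            return self.out(dec_vec)

    model = DualEncoderModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    src, obj, tgt = (torch.randint(0, 1000, (2, 8)) for _ in range(3))  # toy batch
    logits = model(src, obj, tgt)
    loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), tgt.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # adjust the parameters of the translation model by the loss value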
2. The translation model training method according to claim 1, wherein the first encoder includes a first embedding layer and x object coding layers, x being a positive integer greater than or equal to 1;
the inputting the target object into the first encoder for encoding processing to obtain a first coding vector includes:
inputting the target object into the first embedding layer for embedding processing to obtain a target object vector, and inputting the target object vector into a first object coding layer for coding processing to obtain a first object coding vector;
inputting the first object coding vector into a second object coding layer for coding to obtain a second object coding vector;
until the (x-1)-th object coding vector is input into the x-th object coding layer for encoding processing to obtain the first coding vector.
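As an illustrative aid only, the layer-by-layer flow of claim 2 may be sketched as follows, with the x object coding layers assumed to be Transformer encoder layers and all sizes chosen arbitrarily.

    import torch
    import torch.nn as nn

    x, d = 3, 64  # number of object coding layers and model width (assumptions)
    first_embedding = nn.Embedding(1000, d)
    object_layers = nn.ModuleList(
        nn.TransformerEncoderLayer(d, nhead=4, batch_first=True) for _ in range(x))

    object_ids = torch.randint(0, 1000, (1, 5))  # a toy target prefix/suffix
    h = first_embedding(object_ids)              # target object vector
    for layer in object_layers:                  # 1st, 2nd, ..., x-th object coding layer
        h = layer(h)                             # each layer encodes the previous layer's output
    first_coding_vector = h                      # output of the x-th layer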
3. The translation model training method according to claim 1, wherein the second encoder includes a second embedding layer and y sentence coding layers, y being a positive integer greater than or equal to 1;
the inputting the training sentence and the first coding vector into the second encoder for encoding to obtain a second coding vector includes:
inputting the training sentence into the second embedding layer for embedding processing to obtain a training sentence vector, and inputting the training sentence vector and the first coding vector into a first sentence coding layer for encoding processing to obtain a first sentence coding vector;
inputting the first sentence coding vector into a second sentence coding layer for encoding processing to obtain a second sentence coding vector;
until the (y-1)-th sentence coding vector is input into the y-th sentence coding layer for encoding processing to obtain the second coding vector.
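Again purely by way of illustration, one plausible reading of claim 3 feeds the first coding vector into the first sentence coding layer by concatenation along the sequence axis; the claim itself does not fix the fusion mechanism, so this choice is an assumption of the sketch.

    import torch
    import torch.nn as nn

    y, d = 3, 64  # number of sentence coding layers and model width (assumptions)
    second_embedding = nn.Embedding(1000, d)
    sentence_layers = nn.ModuleList(
        nn.TransformerEncoderLayer(d, nhead=4, batch_first=True) for _ in range(y))

    first_coding_vector = torch.randn(1, 5, d)   # stand-in for the first encoder's output
    src_ids = torch.randint(0, 1000, (1, 8))     # a toy training sentence
    h = second_embedding(src_ids)                # training sentence vector
    h = sentence_layers[0](torch.cat([first_coding_vector, h], dim=1))  # first sentence coding layer
    for layer in sentence_layers[1:]:            # 2nd, ..., y-th sentence coding layer
        h = layer(h)
    second_coding_vector = h                     # output of the y-th layer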
4. The translation model training method of claim 1, wherein the decoder comprises a third embedding layer and z decoding layers, z being a positive integer greater than or equal to 1;
the inputting the target sentence and the second coding vector into the decoder for decoding processing to obtain a decoding vector comprises:
inputting the target sentence into the third embedding layer for embedding processing to obtain a target sentence vector, and inputting the target sentence vector and the second coding vector into a first decoding layer for decoding processing to obtain a first decoding vector;
inputting the first decoding vector into a second decoding layer for decoding processing to obtain a second decoding vector;
until the (z-1)-th decoding vector is input into the z-th decoding layer for decoding processing to obtain the decoding vector.
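The decoding flow of claim 4 may be sketched as below. Note one assumption: the claim feeds the second coding vector only to the first decoding layer, whereas a standard Transformer decoder layer (used here for brevity) lets every layer attend to it.

    import torch
    import torch.nn as nn

    z, d = 3, 64  # number of decoding layers and model width (assumptions)
    third_embedding = nn.Embedding(1000, d)
    decoding_layers = nn.ModuleList(
        nn.TransformerDecoderLayer(d, nhead=4, batch_first=True) for _ in range(z))

    second_coding_vector = torch.randn(1, 13, d)  # stand-in for the second encoder's output
    tgt_ids = torch.randint(0, 1000, (1, 7))      # a toy target sentence
    h = third_embedding(tgt_ids)                  # target sentence vector
    for layer in decoding_layers:                 # 1st, 2nd, ..., z-th decoding layer
        h = layer(h, second_coding_vector)        # cross-attends to the second coding vector
    decoding_vector = h                           # output of the z-th layer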
5. The translation model training method of claim 1, wherein calculating a loss value from the decoded vector comprises:
comparing the decoding vector with a preset standard vector to obtain the loss value of the decoding vector.
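Claim 5 does not name a particular comparison; mean-squared error against a hypothetical standard vector is one minimal sketch.

    import torch

    decoding_vector = torch.randn(1, 7, 64)   # from the decoder
    standard_vector = torch.randn(1, 7, 64)   # hypothetical preset reference vector
    loss_value = torch.nn.functional.mse_loss(decoding_vector, standard_vector)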
6. The translation model training method of claim 1, wherein adjusting parameters of the translation model based on the loss values comprises:
back-propagating the loss value to sequentially update the decoding parameters of the decoder, the coding parameters of the second encoder, and the coding parameters of the first encoder.
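In automatic-differentiation frameworks, the update order of claim 6 falls out of back-propagation itself: gradients are computed from the decoder backwards through the second encoder to the first encoder. A toy sketch with three stand-in modules:

    import torch
    import torch.nn as nn

    first_enc, second_enc, dec = (nn.Linear(8, 8) for _ in range(3))  # stand-ins only
    params = [*first_enc.parameters(), *second_enc.parameters(), *dec.parameters()]
    optimizer = torch.optim.Adam(params)

    x = torch.randn(2, 8)
    loss = dec(second_enc(first_enc(x))).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()   # gradients flow dec -> second_enc -> first_enc
    optimizer.step()  # all three parameter groups are then updated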
7. The translation model training method of claim 1, wherein the continuing to train the translation model until a training stopping condition is reached comprises:
stopping the training of the translation model when the loss value is smaller than a preset target value.
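A minimal sketch of the stopping rule of claim 7, where target_value and the simulated loss curve are assumptions of this illustration:

    import random

    target_value = 0.05
    loss, step = 1.0, 0
    while loss >= target_value:            # training-stop condition of claim 7
        loss *= random.uniform(0.8, 0.99)  # stand-in for one real training step
        step += 1
    print(f"stopped after {step} steps, loss={loss:.4f}")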
8. A translation method, characterized by comprising:
acquiring a sentence to be translated and a specified object, wherein the object comprises a prefix and/or a suffix;
inputting the object into a first encoder of a translation model for encoding processing to obtain a first coding vector, wherein the translation model is obtained by training according to the training method of any one of claims 1 to 7;
inputting the first coding vector and the sentence to be translated into a second encoder of the translation model for encoding processing to obtain a second coding vector;
and inputting the second coding vector into a decoder of the translation model for decoding processing to obtain a target sentence.
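Purely as an illustration of the translation method of claim 8, the following greedy-decoding sketch reuses a DualEncoderModel-style model as sketched after claim 1; the BOS/EOS token ids and the maximum length are assumptions.

    import torch

    @torch.no_grad()
    def translate(model, src_ids, obj_ids, bos=1, eos=2, max_len=32):
        out = torch.tensor([[bos]])
        for _ in range(max_len):
            logits = model(src_ids, obj_ids, out)  # first encoder -> second encoder -> decoder
            next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
            out = torch.cat([out, next_id], dim=1)
            if next_id.item() == eos:              # stop at end-of-sentence
                break
        return out[:, 1:]                          # token ids of the target sentence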
9. A training device of a translation model, wherein the translation model comprises a first encoder, a second encoder and a decoder;
the training device of the translation model comprises:
the training device comprises a receiving module and a processing module, wherein the receiving module is configured to receive a training sample, the training sample comprises a training sentence, a target sentence corresponding to the training sentence and a target object corresponding to the training sentence, and the target object comprises a target prefix and/or a target suffix.
The first encoding module is configured to input the target object into the first encoder to perform encoding processing to obtain a first encoding vector;
a second coding module configured to input the training sentence and the first coding vector into the second coder for coding to obtain a second coding vector;
a decoding module configured to input the target sentence and the second encoding vector into the decoder for decoding processing to obtain a decoding vector, and calculate a loss value according to the decoding vector;
and the training module is configured to adjust parameters of the translation model according to the loss value and continue to train the translation model until a training stopping condition is reached.
10. A translation device, characterized by comprising:
an acquisition module configured to acquire a sentence to be translated and a specified object, wherein the object comprises a prefix and/or a suffix;
a first encoding module configured to input the object into a first encoder of a translation model for encoding processing to obtain a first coding vector, wherein the translation model is obtained by training according to the training method of any one of claims 1 to 7;
a second encoding module configured to input the first coding vector and the sentence to be translated into a second encoder of the translation model for encoding processing to obtain a second coding vector;
and a decoding module configured to input the second coding vector into a decoder of the translation model for decoding processing to obtain a target sentence.
11. A computing device, comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method of any one of claims 1 to 7 or claim 8.
12. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 7 or claim 8.
CN202011642541.6A 2020-12-31 2020-12-31 Translation model training method and device, and translation method and device Pending CN114692652A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011642541.6A CN114692652A (en) 2020-12-31 2020-12-31 Translation model training method and device, and translation method and device


Publications (1)

Publication Number Publication Date
CN114692652A 2022-07-01

Family

ID=82136086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011642541.6A Pending CN114692652A (en) 2020-12-31 2020-12-31 Translation model training method and device, and translation method and device

Country Status (1)

Country Link
CN (1) CN114692652A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933809A (en) * 2019-03-15 2019-06-25 北京金山数字娱乐科技有限公司 A kind of interpretation method and device, the training method of translation model and device
CN110110337A (en) * 2019-05-08 2019-08-09 网易有道信息技术(北京)有限公司 Translation model training method, medium, device and calculating equipment
US20200082271A1 (en) * 2017-11-30 2020-03-12 Tencent Technology (Shenzhen) Company Limited Summary generation method, summary generation model training method, and computer device
CN111326157A (en) * 2020-01-20 2020-06-23 北京字节跳动网络技术有限公司 Text generation method and device, electronic equipment and computer readable medium
CN111339789A (en) * 2020-02-20 2020-06-26 北京字节跳动网络技术有限公司 Translation model training method and device, electronic equipment and storage medium
CN111738020A (en) * 2020-08-24 2020-10-02 北京金山数字娱乐科技有限公司 Translation model training method and device
CN111931518A (en) * 2020-10-15 2020-11-13 北京金山数字娱乐科技有限公司 Translation model training method and device


Similar Documents

Publication Publication Date Title
CN111222347B (en) Sentence translation model training method and device and sentence translation method and device
CN111738020B (en) Translation model training method and device
CN110503945B (en) Training method and device of voice processing model
CN111931518A (en) Translation model training method and device
CN108170686B (en) Text translation method and device
CN109933809B (en) Translation method and device, and training method and device of translation model
CN109977428A (en) A kind of method and device that answer obtains
CN111223498A (en) Intelligent emotion recognition method and device and computer readable storage medium
CN109710953B (en) Translation method and device, computing equipment, storage medium and chip
CN113590761B (en) Training method of text processing model, text processing method and related equipment
CN111144140B (en) Zhongtai bilingual corpus generation method and device based on zero-order learning
CN111241853B (en) Session translation method, device, storage medium and terminal equipment
CN111539228B (en) Vector model training method and device and similarity determining method and device
CN114282555A (en) Translation model training method and device, and translation method and device
CN113239710A (en) Multi-language machine translation method and device, electronic equipment and storage medium
CN113268989A (en) Polyphone processing method and device
CN111178097B (en) Method and device for generating Zhongtai bilingual corpus based on multistage translation model
CN113449529A (en) Translation model training method and device, and translation method and device
CN114692652A (en) Translation model training method and device, and translation method and device
CN114638238A (en) Training method and device of neural network model
CN112328777B (en) Answer detection method and device
CN114841175A (en) Machine translation method, device, equipment and storage medium
CN114997395A (en) Training method of text generation model, method for generating text and respective devices
CN113553837A (en) Reading understanding model training method and device and text analysis method and device
CN113486647A (en) Semantic parsing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination