CN113887253A - Method, apparatus, and medium for machine translation - Google Patents

Method, apparatus, and medium for machine translation

Info

Publication number
CN113887253A
Authority
CN
China
Prior art keywords
bilingual
phrase
source language
bilingual phrase
phrases
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111325941.9A
Other languages
Chinese (zh)
Inventor
王明轩
蒋庆男
孙泽维
曹军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202111325941.9A priority Critical patent/CN113887253A/en
Publication of CN113887253A publication Critical patent/CN113887253A/en
Priority to PCT/CN2022/123788 priority patent/WO2023082900A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure relates to methods, devices, and media for machine translation. A method for machine translation according to some embodiments of the present disclosure comprises: acquiring original training data comprising a source language sentence for training and a target language sentence for training; concatenating a bilingual phrase prompt associated with the source language sentence for training to the source language sentence for training to generate new training data having the bilingual phrase prompt, wherein the bilingual phrase prompt comprises one or more bilingual phrases, each bilingual phrase comprising a source language phrase and a corresponding target language phrase; and pre-training a machine translation model with at least the new training data having the bilingual phrase prompt.

Description

Method, apparatus, and medium for machine translation
Technical Field
The present disclosure relates to methods, devices, and media for machine translation.
Background
Pre-trained models (PTMs) have significantly advanced natural language processing. In recent years, prompt-based learning has become an attractive approach for adapting PTMs to specific tasks. With either manually created prompts or automatically created prompts, a PTM can achieve good performance on many downstream tasks without fine-tuning. Unlike fine-tuning and feature-based adaptation, prompt-based learning does not require additional training for the downstream task; it formulates the downstream task as a language-model filling task with prompts. In general, in prompt-based learning, predicting a specific task with a pre-trained language model includes three stages: (i) constructing a prompt with some unfilled slots based on the input; (ii) filling the unfilled slots with the pre-trained model; and (iii) deriving the final prediction from the filled slots.
The prompt format depends on the pre-trained model and the downstream task. There are two main types of prompts: cloze-style prompts, in which the unfilled slots sit at predefined positions, and prefix prompts, in which filling the slot is a generation process that continues from a prefix. Cloze-style prompts are typically used for natural language understanding tasks, while prefix prompts are mainly used for natural language generation tasks.
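As a purely illustrative example, which is not part of the original disclosure, the two prompt styles can be contrasted with the sketch below; the template strings and the [MASK] placeholder are assumptions made only for illustration.

```python
# Illustrative sketch of the two prompt styles discussed above.
# The concrete templates are assumptions, not drawn from this disclosure.

# Cloze-style prompt: the unfilled slot sits at a predefined position and
# the pre-trained model fills in the masked token.
cloze_prompt = "The movie was wonderful. Overall, it was a [MASK] film."

# Prefix prompt: the "slot" is completed by open-ended generation that
# continues from the prefix, e.g. for translation.
prefix_prompt = "English: I love this movie. German:"
```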
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
According to some embodiments of the present disclosure, there is provided a method for machine translation, comprising: acquiring original training data comprising a source language sentence for training and a target language sentence for training; concatenating a bilingual phrase prompt associated with the source language sentence for training to the source language sentence for training to generate new training data having the bilingual phrase prompt, wherein the bilingual phrase prompt comprises one or more bilingual phrases, each bilingual phrase comprising a source language phrase and a corresponding target language phrase; and pre-training a machine translation model with at least the new training data having the bilingual phrase prompt.
According to some embodiments of the present disclosure, there is provided an apparatus for machine translation, comprising: an original training data acquisition unit configured to acquire original training data including a source language sentence for training and a target language sentence for training; a new training data generating unit configured to concatenate a bilingual phrase prompt associated with the source language sentence for training to the source language sentence for training to generate new training data having the bilingual phrase prompt, wherein the bilingual phrase prompt comprises one or more bilingual phrases, each bilingual phrase comprising a source language phrase and a corresponding target language phrase; and a pre-training unit configured to pre-train a machine translation model with at least the new training data having the bilingual phrase prompt.
According to some embodiments of the present disclosure, there is provided an electronic device including: a memory; and a processor coupled to the memory, the memory having instructions stored therein that, when executed by the processor, cause the processor to perform a method according to an embodiment of the disclosure.
According to some embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to embodiments of the present disclosure.
Other features, aspects, and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments thereof, which is to be read in connection with the accompanying drawings.
Drawings
Preferred embodiments of the present disclosure are described below with reference to the accompanying drawings. The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure. It is to be understood that the drawings in the following description are directed to only some embodiments of the disclosure and are not limiting of the disclosure. In the drawings:
FIG. 1 is a schematic diagram illustrating a method for machine translation in comparison to an existing Vanilla method, according to an embodiment of the present disclosure;
FIG. 2 is a flow diagram illustrating a method for machine translation, according to an embodiment of the present disclosure;
FIG. 3 is a flow diagram illustrating a method of retrieving bilingual phrases from a bilingual phrase database according to an embodiment of the present disclosure;
FIG. 4 is a flow diagram illustrating a process for translating a source language sentence into a target language sentence based on a bilingual phrase database;
FIG. 5 illustrates an example of an English-German translation in the medical field according to an embodiment of the present disclosure;
FIG. 6 is a block diagram illustrating an apparatus for machine translation, according to an embodiment of the present disclosure;
FIG. 7 is a block diagram illustrating an electronic device according to an embodiment of the present disclosure; and
FIG. 8 is a block diagram illustrating an example structure of a computer system employable in embodiments of the present disclosure.
It should be understood that the dimensions of the various features shown in the drawings are not necessarily drawn to scale for ease of illustration. The same or similar reference numbers are used throughout the drawings to refer to the same or like parts. Thus, once an item is defined in one drawing, it may not be further discussed in subsequent drawings.
Detailed Description
Technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, but it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. The following description of the embodiments is merely exemplary in nature and is in no way intended to limit the disclosure, its application, or uses. It is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect. Unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments should be construed as merely illustrative, and not limiting the scope of the present disclosure.
The term "comprising" and variations thereof as used in this disclosure are open-ended terms that include at least the listed elements/features but do not exclude other elements/features, i.e., "including but not limited to". Likewise, the term "including" and variations thereof as used in this disclosure are open-ended terms that include at least the listed elements/features but do not exclude other elements/features. Thus, "including" is synonymous with "comprising". The term "based on" means "based at least in part on".
Reference throughout this specification to "one embodiment," "some embodiments," or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. For example, the term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and the term "some embodiments" means "at least some embodiments". Moreover, the appearances of the phrases "in one embodiment," "in some embodiments," or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but they may be.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules, or units, and are not used for limiting the order of, or the interdependence between, the functions performed by these devices, modules, or units. Unless otherwise specified, the terms "first", "second", etc. are not intended to imply that the objects so described must be in a given order, whether temporally, spatially, in ranking, or in any other manner.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will recognize that they should be understood as "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings, but the present disclosure is not limited to these specific embodiments. These particular embodiments may be combined with each other below, and details of the same or similar concepts or processes may not be repeated in some embodiments. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner as would be apparent to one of ordinary skill in the art from this disclosure in one or more embodiments.
There are two major difficulties in applying prompt-based learning to machine translation. First, it is difficult to create effective prompts for machine translation. Brown et al., in "Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877-1901", disclose that large pre-trained language models can perform downstream tasks from a few in-context examples used as prompts. Liu et al., in "What makes good in-context examples for GPT-3? arXiv preprint arXiv:2101.06804", disclose that the performance of downstream tasks depends heavily on the choice of in-context examples. However, sentence-level translation examples are sparse, and it is difficult to find sentence-level translation examples related to the input sentence with which to build an effective prompt. Second, the pre-training task of the language model is not designed jointly with prompt-based machine translation prediction. The potential of prompt-based learning is limited by the inconsistency between pre-training and prediction.
Embodiments of the present disclosure can effectively address the difficulties encountered when applying prompt-based learning to machine translation. Embodiments of the present disclosure utilize bilingual phrase prompts and use prompt-aware pre-training (PAPT) for prompt-based machine translation. Experiments show that, without any additional training, the BLEU score of machine translation in a specific domain can be improved by 6.2 points and the precision of lexically constrained machine translation can be improved by 11.5%.
First, embodiments of the present disclosure build bilingual phrase prompts for machine translation. Bilingual phrase prompts are concatenated from phrase-level translation examples (i.e., bilingual phrases) to mitigate the sparseness of sentence-level translation examples. By retrieving relevant bilingual phrases from a pre-constructed bilingual phrase database, embodiments of the present disclosure are able to construct input-related prompts that can provide useful knowledge for translation generation. Then, to mitigate the inconsistency between pre-training and prompt-based prediction, the model is made aware of prompts during pre-training. To this end, a prompt-aware pre-training task is designed, which may be a sequence-to-sequence generation task. Embodiments of the present disclosure pre-train a prompt-aware model for machine translation, thereby mitigating the inconsistency between pre-training and prompt-based prediction.
FIG. 1 is a schematic diagram illustrating a method for machine translation according to an embodiment of the present disclosure in comparison with the existing Vanilla method. In FIG. 1, the input is the source language sentence in a training sample, the output is the target language sentence in the training sample, and the prompt is a bilingual phrase prompt. The existing Vanilla method trains only on each training sample consisting of a source language sentence and a target language sentence, without using bilingual phrase prompts. PAPT is the prompt-aware pre-training method according to embodiments of the present disclosure. In PAPT, training uses not only each training sample consisting of a source language sentence and a target language sentence, but also, for each training sample, a new training sample consisting of the source language sentence with a bilingual phrase prompt concatenated to it, together with the target language sentence. In the examples used to illustrate embodiments of the present disclosure, concatenating a bilingual phrase prompt to a source language sentence refers to prefixing the bilingual phrase prompt to the source language sentence. However, embodiments of the present disclosure are not so limited; for example, the bilingual phrase prompt may be used as a suffix instead of a prefix of the source language sentence.
Fig. 2 is a flow diagram illustrating a method 200 for machine translation, according to an embodiment of the present disclosure. In step S210, original training data including a source language sentence for training and a target language sentence for training is acquired. The original training data may be general-domain translation data, such as the WMT14 EN-DE data set.
In step S220, bilingual phrase cues associated with the source language sentence used for training are concatenated to the source language sentence used for training to generate new training data having bilingual phrase cues.
The bilingual phrase prompt includes one or more bilingual phrases, each including a source language phrase and a corresponding target language phrase. Bilingual phrase cues can provide useful knowledge for machine translation.
Multiple bilingual phrases may be separated by a first token (e.g., <r>). The source language phrase and the corresponding target language phrase within a bilingual phrase may be separated by a second token (e.g., <q>). The bilingual phrase prompt associated with the source language sentence for training and the source language sentence for training may be separated by a third token (e.g., <p>).
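The following sketch, which is not part of the original disclosure, shows one plausible way to assemble a prompted input from these tokens; the helper function name and the example phrases are purely illustrative assumptions.

```python
def build_prompted_source(source_sentence, bilingual_phrases,
                          phrase_sep="<r>", pair_sep="<q>", prompt_sep="<p>"):
    """Concatenate a bilingual phrase prompt to a source language sentence.

    bilingual_phrases: list of (source_phrase, target_phrase) pairs.
    Returns the prompted source string used as the model input.
    """
    # Each bilingual phrase becomes "src <q> tgt"; phrases are joined by <r>.
    prompt = f" {phrase_sep} ".join(
        f"{src} {pair_sep} {tgt}" for src, tgt in bilingual_phrases
    )
    # The prompt is prefixed to the source sentence, separated by <p>.
    return f"{prompt} {prompt_sep} {source_sentence}"

# Hypothetical example (German-English phrases chosen only for illustration):
prompted = build_prompted_source(
    "Der Patient erhielt eine hohe Dosis.",
    [("hohe Dosis", "high dose"), ("Patient", "patient")],
)
# -> "hohe Dosis <q> high dose <r> Patient <q> patient <p> Der Patient erhielt eine hohe Dosis."
```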
Bilingual phrase prompts may be constructed by retrieving bilingual phrases from a pre-constructed bilingual phrase database. For a source language sentence used for training, bilingual phrase cues associated with the source language sentence used for training may be constructed by retrieving bilingual phrases associated with the source language sentence used for training from a pre-constructed bilingual phrase database. For a source language sentence to be translated, bilingual phrase cues associated with the source language sentence to be translated may be constructed by retrieving bilingual phrases associated with the source language sentence to be translated from a pre-constructed bilingual phrase database.
The bilingual phrase database may be pre-constructed and may be offline. The bilingual phrase database may be constructed by extracting bilingual phrases from parallel translation data and computing contextual representations of the source language phrases using multilingual BERT, as described by Devlin et al. in "BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. of NAACL-HLT, pages 4171-4186". The contextual representation of a source language phrase and the corresponding bilingual phrase are stored as a key-value pair in the bilingual phrase database: the contextual representation of the source language phrase is the key, and the corresponding bilingual phrase is the value. The bilingual phrase database is thus a collection of key-value pairs created from parallel translation data.
The bilingual phrases may be extracted as follows: first, word alignments may be extracted using the awesome-align method described by Dou et al. in "Word alignment by fine-tuning embeddings on parallel corpora. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2112-2128"; the bilingual phrases may then be extracted from the word alignments using the phrase extraction algorithm described by Koehn et al. in "Statistical Machine Translation". The contextual representation of a phrase may be computed by mean-pooling the hidden states of the words in the phrase. Sub-word segmentation may be performed using joint byte pair encoding with 32k merge operations, as described by Sennrich et al. in "Neural machine translation of rare words with subword units. In Proc. of ACL, pages 1715-1725".
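One plausible way to compute the contextual phrase representation described above is sketched below, using the Hugging Face transformers interface to multilingual BERT; the checkpoint name, the assumption that the phrase's token span is already known from word alignment, and the mean pooling over sub-word tokens are illustrative choices rather than details fixed by the disclosure.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

def phrase_representation(sentence, phrase_token_span):
    """Mean-pool the encoder hidden states over the tokens of a phrase.

    phrase_token_span: (start, end) sub-word token indices of the phrase
    within the tokenized sentence (assumed known, e.g. from word alignment).
    Returns a vector usable as a key of the bilingual phrase database.
    """
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state[0]  # (seq_len, dim)
    start, end = phrase_token_span
    return hidden[start:end].mean(dim=0)
```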
FIG. 3 is a flow diagram illustrating a method 300 of retrieving bilingual phrases from a bilingual phrase database according to an embodiment of the present disclosure. In step S310, the source language phrases in the bilingual phrase database are loaded into a dictionary tree (trie). In step S320, the source language phrases in the source language sentence that exist in the dictionary tree are extracted. In step S330, a contextual representation of the source language phrase in the source language sentence is calculated. In step S340, a bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase in the source language sentence.
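A minimal sketch of the dictionary-tree matching in steps S310 and S320 is given below; the trie layout and the whitespace tokenization are simplifying assumptions made only for illustration.

```python
def build_trie(source_phrases):
    """Step S310: load source language phrases (as token sequences) into a trie."""
    root = {}
    for phrase in source_phrases:
        node = root
        for token in phrase.split():
            node = node.setdefault(token, {})
        node["<end>"] = phrase  # marks a complete phrase
    return root

def match_phrases(sentence_tokens, trie):
    """Step S320: extract phrases in the sentence that exist in the trie
    (longest match starting at each position)."""
    matches = []
    for i in range(len(sentence_tokens)):
        node, longest = trie, None
        for j in range(i, len(sentence_tokens)):
            node = node.get(sentence_tokens[j])
            if node is None:
                break
            if "<end>" in node:
                longest = (i, j + 1, node["<end>"])
        if longest is not None:
            matches.append(longest)
    return matches
```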
In general, the most similar bilingual phrase may be retrieved from the bilingual phrase database based on the L2 distance between contextual representations of source language phrases. However, in the case where the bilingual phrase database is constructed based on the original training data, the second most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase, so as to avoid retrieving the bilingual phrase extracted from the current training sample and overfitting to the retrieved bilingual phrase.
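A minimal retrieval sketch under these rules is shown below, assuming the contextual keys are held in a FAISS index aligned with a list of bilingual phrase values; the skip-first behaviour implements the "second most similar" rule applied when the database was built from the training data itself.

```python
import numpy as np
import faiss  # assumed available; see also the IVFPQ index sketch further below

def retrieve_bilingual_phrase(index, values, query_vec, built_from_training_data):
    """Retrieve a bilingual phrase by L2 distance over contextual keys.

    index:  a FAISS index built over the stored phrase representations.
    values: list of bilingual phrases, aligned with the index order.
    """
    k = 2 if built_from_training_data else 1
    _, ids = index.search(query_vec.reshape(1, -1).astype(np.float32), k)
    # When the database was built from the training data, skip the top hit,
    # which would just be the phrase extracted from the current sample.
    return values[ids[0][-1]]
```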
Once the bilingual phrases are retrieved, they may be used to construct a bilingual phrase prompt, and the constructed bilingual phrase prompt is concatenated to the source language sentence for training to generate new training data having the bilingual phrase prompt. The new training data includes the source language sentence with the bilingual phrase prompt concatenated to it, together with the target language sentence.
Returning to FIG. 2, in step S230, the machine translation model is pre-trained with at least the new training data having bilingual phrase prompts. In some embodiments, the machine translation model may be pre-trained with both the original training data and the new training data having bilingual phrase prompts, to obtain a model with better prediction accuracy.
One or more rounds (e.g., 10 rounds) of pre-training of the machine translation model may be performed to obtain a model with better prediction accuracy. Cross-entropy loss may be employed in the pre-training process. The machine translation model may be an encoder-decoder model, and the encoder-decoder model may be pre-trained for prompt-based learning in machine translation. The encoder-decoder model may use the Transformer architecture, as described by Vaswani et al. in "Attention is all you need. In Proc. of NeurIPS, pages 5998-6008". The machine translation model may be implemented based on the fairseq toolkit described by Ott et al. in "fairseq: A fast, extensible toolkit for sequence modeling. In Proc. of NAACL-HLT: Demonstrations, pages 48-53". For efficient bilingual phrase retrieval, an IVFPQ index may be constructed with FAISS, as described by Johnson et al. in "Billion-scale similarity search with GPUs".
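For the IVFPQ index mentioned above, a construction sketch following the standard FAISS API might look as follows; the dimensionality assumption and the clustering/quantization parameters are illustrative defaults, not values taken from the disclosure.

```python
import faiss
import numpy as np

def build_ivfpq_index(keys, nlist=4096, m=64, nbits=8):
    """Build an IVFPQ index over the contextual phrase representations.

    keys: float32 array of shape (num_phrases, dim) holding the keys of
    the bilingual phrase database (dim must be divisible by m).
    """
    dim = keys.shape[1]
    quantizer = faiss.IndexFlatL2(dim)                 # coarse quantizer (L2)
    index = faiss.IndexIVFPQ(quantizer, dim, nlist, m, nbits)
    index.train(keys)                                  # learn clusters and PQ codebooks
    index.add(keys)                                    # add all phrase keys
    index.nprobe = 32                                  # probe several clusters at search time
    return index
```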
In some embodiments of the present disclosure, the method 200 for machine translation may further include step S240. In step S240, a source language sentence to be translated is received and translated into a target language sentence to be output using the pre-trained machine translation model. In some embodiments of the present disclosure, the pre-trained machine translation model translates a source language sentence having a bilingual phrase prompt. In this case, a bilingual phrase prompt associated with the source language sentence to be translated is concatenated to the source language sentence to be translated to generate the source language sentence to be translated having the bilingual phrase prompt, and the source language sentence to be translated having the bilingual phrase prompt is then input into the pre-trained machine translation model. In some embodiments of the present disclosure, the pre-trained machine translation model translates a source language sentence without a bilingual phrase prompt, in which case the source language sentence to be translated is input into the pre-trained machine translation model directly.
In some embodiments of the present disclosure, translation intervention is performed by manually creating bilingual phrase prompts. For example, a lexical constraint may be expressed as a bilingual phrase prompt to intervene in the vocabulary selection during translation. Suppose the lexical constraint is that the word x in the input sentence should be translated into the target word y in the output sentence. The lexical constraint may then be expressed as the bilingual phrase prompt "x <q> y", and the input sentence with this prompt is translated. Lexical constraints specified in this way are soft lexical constraints. The benefit of soft lexical constraints over hard lexical constraints is that the morphology of the phrase need not be specified.
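As a worked example of this prompt format (the specific words and the translation direction are hypothetical):

```python
# Soft lexical constraint: ask for "Hypertonie" to be rendered as "hypertension".
constraint_prompt = "Hypertonie <q> hypertension"
source = "Der Patient leidet an Hypertonie."

# The prompted input fed to the pre-trained model (prefix form, as above):
prompted_input = f"{constraint_prompt} <p> {source}"
```

Because the constraint is soft, the model remains free to choose the surface form of "hypertension" that best fits the output sentence.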
In some embodiments of the present disclosure, the bilingual phrase prompt associated with the source language sentence to be translated is constructed by retrieving bilingual phrases from the bilingual phrase database. FIG. 4 is a flow diagram illustrating a process 400 for translating a source language sentence into a target language sentence based on the bilingual phrase database. First, a source language phrase 402 in the source language sentence 401 to be translated is extracted, and a bilingual phrase 405 is retrieved from the bilingual phrase database 404 to construct a bilingual phrase prompt 406. The most similar bilingual phrase 405 may be retrieved from the bilingual phrase database 404 by computing a contextual representation 403 of the source language phrase 402 and searching based on the contextual representation 403. The constructed bilingual phrase prompt 406 is then concatenated to the source language sentence 401 to be translated to generate a source language sentence 407 having the bilingual phrase prompt. Finally, the source language sentence 407 having the bilingual phrase prompt is input into the pre-trained machine translation model 408 and translated into a target language sentence 409 to be output.
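Putting the earlier sketches together, process 400 might be wired up as follows; the helper functions reuse the illustrative definitions given above, the mapping from word positions to sub-word token positions is glossed over, and the model.translate call is an assumed serving interface rather than an API from the disclosure.

```python
def translate_with_prompts(source_sentence, trie, index, values, model,
                           built_from_training_data=False):
    """End-to-end sketch of process 400 (retrieval-based prompting)."""
    tokens = source_sentence.split()
    bilingual_phrases = []
    for start, end, _phrase in match_phrases(tokens, trie):              # 401 -> 402
        # NOTE: word indices are reused as token indices only for brevity.
        query = phrase_representation(source_sentence, (start, end))     # 402 -> 403
        pair = retrieve_bilingual_phrase(index, values, query.numpy(),
                                         built_from_training_data)       # 404 -> 405
        bilingual_phrases.append(pair)
    prompted = build_prompted_source(source_sentence, bilingual_phrases)  # 406 -> 407
    return model.translate(prompted)                                      # 408 -> 409
```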
Table 1 compares the BLEU scores of the PAPT scheme of embodiments of the present disclosure with those of the existing Vanilla scheme. In Table 1, the database size represents the number of bilingual phrases in the database. PAPT (no prompt) indicates that no bilingual phrase prompt is concatenated to the source language sentence to be translated during the translation stage. PAPT (with prompt) indicates that a bilingual phrase prompt is concatenated to the source language sentence to be translated during the translation stage. Table 2 shows the sizes of the data sets used for training, development, and testing, respectively.
As shown in Table 1, the PAPT (no prompt) scheme has performance comparable to the Vanilla scheme when translating English to German and vice versa in a specific domain. Without any additional training, the PAPT (with prompt) scheme is 6.7 and 5.6 points higher than the Vanilla scheme in average BLEU score for the two directions, respectively. The results indicate that bilingual phrase prompts are helpful for machine translation in a specific domain.
TABLE 1
[Table 1 is provided as an image in the original publication.]
TABLE 2
[Table 2 is provided as an image in the original publication.]
Table 3 compares the performance of the PAPT scheme of embodiments of the present disclosure with the existing Vanilla scheme for English-German translation with lexical constraints. The test sets used in Table 3 were extracted from the Wiktionary and IATE (Inter-Active Terminology for Europe) terminology databases by Susanto et al. in "Lexically constrained neural machine translation with Levenshtein transformer. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3536-3543". As shown in Table 3, compared to Vanilla, the PAPT scheme of embodiments of the present disclosure achieves precision improvements of 10.7% and 12.3% on Wiktionary and IATE, respectively, without additional training. Here, the precision represents the rate at which the target phrase appears in the translation output. The overall translation performance measured by the BLEU score is also slightly improved. This indicates that PAPT can effectively incorporate lexical constraints into the translation process. Moreover, by concatenating multiple bilingual phrases as a prompt, PAPT can conveniently incorporate multiple lexical constraints.
TABLE 3
[Table 3 is provided as an image in the original publication.]
Fig. 5 illustrates an example of English-German translation in the medical field according to an embodiment of the present disclosure. As can be seen from Fig. 5, with different bilingual phrase prompts, PAPT outputs different target language sentences. This shows that the translation process can be intervened in by modifying the bilingual phrase prompt.
Fig. 6 is a block diagram illustrating an apparatus 600 for machine translation, according to an embodiment of the present disclosure. As shown in fig. 6, the apparatus 600 includes an original training data acquisition unit 601, a new training data generating unit 602, and a pre-training unit 603. The original training data acquisition unit 601 is configured to acquire original training data including a source language sentence for training and a target language sentence for training. The new training data generating unit 602 is configured to concatenate a bilingual phrase prompt associated with the source language sentence for training to the source language sentence for training to generate new training data having the bilingual phrase prompt. The bilingual phrase prompt includes one or more bilingual phrases, each including a source language phrase and a corresponding target language phrase. The pre-training unit 603 is configured to pre-train the machine translation model with at least the new training data having the bilingual phrase prompt.
In some embodiments of the present disclosure, the apparatus 600 may further include a translation unit 604, which is configured to receive a source language sentence to be translated and translate the source language sentence to be translated into a target language sentence to be output using the pre-trained machine translation model.
Since the specific implementation of the operation performed by each unit in fig. 6 has been described in detail in the foregoing, the detailed description is omitted here.
As described above, the present disclosure identifies the difficulties in applying prompt-based learning to machine translation and presents an effective approach to address these difficulties. The experimental data show that the technical solution of the present disclosure can effectively improve machine translation in a specific domain and lexically constrained machine translation without additional training.
It should be noted that the above units are only logic modules divided according to the specific functions implemented by the units, and are not used for limiting the specific implementation manner, and may be implemented in software, hardware or a combination of software and hardware, for example. In actual implementation, the above units may be implemented as separate physical entities, or may also be implemented by a single entity (e.g., a processor (CPU or DSP, etc.), an integrated circuit, etc.). Furthermore, the various elements described above are shown in dashed lines in the figures to indicate that these elements may not actually be present, but that the operations/functions that they implement may be implemented by the processing circuitry itself.
Further, although not shown, the apparatus may also include a memory that can store various information generated in operation by the apparatus, the respective units included in the apparatus, programs and data for operation, data to be transmitted by the communication unit, and the like. The memory may be volatile memory and/or non-volatile memory. For example, memory may include, but is not limited to, Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), flash memory. Of course, the memory may also be located outside the device. Optionally, although not shown, the apparatus may also comprise a communication unit, which may be used for communicating with other devices. In one example, the communication unit may be implemented in a suitable manner as known in the art, e.g., including communication components such as antenna arrays and/or radio frequency links, various types of interfaces, communication units, and so forth. And will not be described in detail herein. Further, the device may also include other components not shown, such as radio frequency links, baseband processing units, network interfaces, processors, controllers, and so forth. And will not be described in detail herein.
Some embodiments of the present disclosure also provide an electronic device. Fig. 7 is a block diagram illustrating an electronic device in accordance with some embodiments of the present disclosure. For example, in some embodiments, the electronic device 700 may be any of various types of devices, such as, but not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet), a PMP (portable multimedia player), or a vehicle-mounted terminal (e.g., a car navigation terminal), or a fixed terminal such as a digital TV or a desktop computer. For example, the electronic device 700 may include a display panel for displaying data and/or execution results utilized in accordance with aspects of the present disclosure. For example, the display panel may have various shapes, such as a rectangular panel, an elliptical panel, or a polygonal panel. In addition, the display panel can be not only a flat panel, but also a curved panel, or even a spherical panel.
As shown in fig. 7, the electronic apparatus 700 of this embodiment includes: a memory 701 and a processor 702 coupled to the memory 701. It should be noted that the components of the electronic device 700 shown in fig. 7 are exemplary only, and not limiting, and the electronic device 700 may have other components according to the actual application. The processor 702 may control other components in the electronic device 700 to perform desired functions.
In some embodiments, memory 701 is used to store one or more computer-readable instructions. The processor 702 is configured to execute computer readable instructions, which when executed by the processor 702 implement a method according to any of the embodiments described above. For specific implementation and related explanation of each step of the method, reference may be made to the above-mentioned embodiments, and repeated details are not described herein.
For example, the processor 702 and the memory 701 may be in communication with each other, directly or indirectly. For example, the processor 702 and the memory 701 may communicate over a network. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks. The processor 702 and the memory 701 may also communicate with each other via a system bus, which is not limited by the present disclosure.
For example, the processor 702 may be embodied as various suitable processors, processing devices, and the like, such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The Central Processing Unit (CPU) may have an X86 or ARM architecture, for example. For example, the memory 701 may include any combination of various forms of computer-readable storage media, such as volatile memory and/or nonvolatile memory. The memory 701 may include, for example, a system memory storing, for example, an operating system, application programs, a Boot Loader, databases, and other programs. Various application programs, various data, and the like can also be stored in the storage medium.
In addition, according to some embodiments of the present disclosure, in the case of implementation by software and/or firmware, various operations/processes according to the present disclosure may be realized by installing a program constituting the software from a storage medium or a network into a computer system having a dedicated hardware structure, for example, the computer system 800 shown in fig. 8, which is capable of performing various functions, including functions such as those described above, when the various programs are installed. Fig. 8 is a block diagram illustrating an example structure of a computer system employable in embodiments of the present disclosure.
In fig. 8, a Central Processing Unit (CPU)801 executes various processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 to a Random Access Memory (RAM) 803. In the RAM 803, data necessary when the CPU 801 executes various processes and the like is also stored as necessary. The central processing unit is merely exemplary and may be other types of processors such as the various processors described above. The ROM 802, RAM 803, and storage 808 can be various forms of computer-readable storage media, as described below. It is noted that although ROM 802, RAM 803, and storage 808 are shown separately in fig. 8, one or more of them may be combined or located in the same or different memory or storage modules.
The CPU 801, the ROM 802, and the RAM 803 are connected to each other via a bus 804. An input/output interface 805 is also connected to the bus 804.
The following components are connected to the input/output interface 805: an input portion 806, such as a touch screen, a touch pad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, a gyroscope, or the like; an output section 807 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; a storage portion 808 including a hard disk, a magnetic tape, and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, and the like. The communication section 809 allows communication processing to be performed via a network such as the internet. It will be readily appreciated that while the various devices or modules in computer system 800 are shown in fig. 8 as communicating via bus 804, they may also communicate via a network or otherwise, wherein a network may include a wireless network, a wired network, and/or any combination of wireless and wired networks.
A drive 810 is also connected to the input/output interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed in the storage portion 808 as necessary.
In the case where the above-described series of processes is realized by software, a program constituting the software may be installed from a network such as the internet or a storage medium such as the removable medium 811.
According to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 809, or installed from the storage means 808, or installed from the ROM 802. The computer program, when executed by the CPU 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that in the context of this disclosure, a computer-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
In some embodiments, there is also provided a computer program comprising: instructions which, when executed by a processor, cause the processor to perform the method of any of the embodiments described above. For example, the instructions may be embodied as computer program code.
In embodiments of the present disclosure, computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules, components or units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware. Wherein the designation of a module, component or unit does not in some way constitute a limitation on the module, component or unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
According to some embodiments of the present disclosure, there is provided a method for machine translation, comprising: acquiring original training data comprising a source language sentence for training and a target language sentence for training; concatenating a bilingual phrase prompt associated with the source language sentence for training to the source language sentence for training to generate new training data having the bilingual phrase prompt, wherein the bilingual phrase prompt comprises one or more bilingual phrases, each bilingual phrase comprising a source language phrase and a corresponding target language phrase; and pre-training a machine translation model with at least the new training data having the bilingual phrase prompt.
According to some embodiments of the present disclosure, the one or more bilingual phrases are separated by first markers, the source language phrase and the corresponding target language phrase are separated by second markers, and the bilingual phrase prompt and the source language sentence used for training are separated by third markers.
According to some embodiments of the present disclosure, the bilingual phrase prompt is constructed by retrieving bilingual phrases from a bilingual phrase database.
According to some embodiments of the present disclosure, a contextual representation of a source language phrase of each bilingual phrase and the bilingual phrase are stored as a key-value pair in the bilingual phrase database.
According to some embodiments of the present disclosure, retrieving bilingual phrases from a bilingual phrase database includes: loading source language phrases in the bilingual phrase database into a dictionary tree; extracting source language phrases in the source language sentence that exist in the dictionary tree; calculating a contextual representation of a source language phrase in the source language sentence; and retrieving the bilingual phrase from the bilingual phrase database based on the contextual representation of the source language phrase in the source language sentence.
According to some embodiments of the present disclosure, in a case where the bilingual phrase database is not constructed based on the original training data, the most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase, and in a case where the bilingual phrase database is constructed based on the original training data, the second most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase.
According to some embodiments of the disclosure, the machine translation model is an encoder-decoder model.
According to some embodiments of the present disclosure, pre-training the machine translation model with at least the new training data having the bilingual phrase prompt comprises: pre-training the machine translation model using both the original training data and the new training data having the bilingual phrase prompt.
According to some embodiments of the present disclosure, a source language sentence to be translated is received and translated into a target language sentence to be output using a pre-trained machine translation model.
According to some embodiments of the present disclosure, a bilingual phrase prompt associated with the source language sentence to be translated is concatenated to the source language sentence to be translated to generate the source language sentence to be translated having the bilingual phrase prompt; and the source language sentence to be translated having the bilingual phrase prompt is input into the pre-trained machine translation model.
According to some embodiments of the present disclosure, the bilingual phrase prompt associated with the source language sentence to be translated is created manually.
According to some embodiments of the present disclosure, a source language phrase in the source language sentence to be translated is extracted, and a bilingual phrase is retrieved from the bilingual phrase database to construct the bilingual phrase prompt associated with the source language sentence to be translated.
According to some embodiments of the present disclosure, a contextual representation of the source language phrase in the source language sentence to be translated is computed, and retrieving a bilingual phrase from the bilingual phrase database comprises retrieving the most similar bilingual phrase from the bilingual phrase database based on the contextual representation of the source language phrase in the source language sentence to be translated.
According to some embodiments of the present disclosure, there is provided an electronic device including: a memory; and a processor coupled to the memory, the memory having instructions stored therein that, when executed by the processor, cause the processor to perform a method according to an embodiment of the disclosure.
According to some embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to embodiments of the present disclosure.
According to some embodiments of the present disclosure, there is provided an apparatus for machine translation, comprising: an original training data acquisition unit configured to acquire original training data including a source language sentence for training and a target language sentence for training; a new training data generating unit configured to concatenate a bilingual phrase prompt associated with the source language sentence for training to the source language sentence for training to generate new training data having the bilingual phrase prompt, wherein the bilingual phrase prompt comprises one or more bilingual phrases, each bilingual phrase comprising a source language phrase and a corresponding target language phrase; and a pre-training unit configured to pre-train a machine translation model with at least the new training data having the bilingual phrase prompt.
According to some embodiments of the present disclosure, there is provided a computer program comprising: instructions that when executed by a processor cause the processor to perform a method according to an embodiment of the present disclosure.
According to some embodiments of the present disclosure, there is provided a computer program product comprising instructions which, when executed by a processor, implement a method according to embodiments of the present disclosure.
The foregoing description is only exemplary of some embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosure may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (16)

1. A method for machine translation, comprising:
acquiring original training data comprising a source language sentence for training and a target language sentence for training;
concatenating a bilingual phrase prompt associated with the source language sentence for training to the source language sentence for training to generate new training data having the bilingual phrase prompt, wherein the bilingual phrase prompt comprises one or more bilingual phrases, each bilingual phrase comprising a source language phrase and a corresponding target language phrase; and
pre-training a machine translation model with at least the new training data having the bilingual phrase prompt.
2. The method of claim 1, wherein the one or more bilingual phrases are separated by a first marker, the source language phrase and the corresponding target language phrase are separated by a second marker, and the bilingual phrase prompt and the source language sentence used for training are separated by a third marker.
3. The method of claim 1, further comprising:
the bilingual phrase prompt is constructed by retrieving bilingual phrases from a bilingual phrase database.
4. The method of claim 3, further comprising:
the contextual representation of the source language phrase of each bilingual phrase and the bilingual phrase are stored as key-value pairs in the bilingual phrase database.
5. The method of claim 3, wherein retrieving bilingual phrases from the bilingual phrase database comprises:
loading source language phrases in the bilingual phrase database into a dictionary tree (trie);
extracting, from the source language sentence, source language phrases that exist in the dictionary tree;
calculating a contextual representation of each extracted source language phrase in the source language sentence; and
retrieving bilingual phrases from the bilingual phrase database based on the contextual representations of the source language phrases in the source language sentence.
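A minimal sketch of the dictionary-tree lookup in claim 5, assuming whitespace tokenization; the disclosure does not fix a tokenization scheme, so this is illustrative only.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_phrase_end = False

def build_trie(source_phrases):
    # Load every source-side phrase of the database into the trie.
    root = TrieNode()
    for phrase in source_phrases:
        node = root
        for token in phrase.split():
            node = node.children.setdefault(token, TrieNode())
        node.is_phrase_end = True
    return root

def extract_phrases(sentence, root):
    # Return every span of the sentence that matches a phrase in the trie.
    tokens = sentence.split()
    found = []
    for i in range(len(tokens)):
        node = root
        for j in range(i, len(tokens)):
            node = node.children.get(tokens[j])
            if node is None:
                break
            if node.is_phrase_end:
                found.append(" ".join(tokens[i:j + 1]))
    return found
```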
6. The method of claim 3, wherein:
in a case where the bilingual phrase database is not constructed based on the original training data, the most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase; and
in a case where the bilingual phrase database is constructed based on the original training data, the second most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase.
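A sketch of the retrieval rule in claim 6, reusing the BilingualPhraseDatabase sketch above. Cosine similarity is an assumed similarity measure, not one mandated by the claims; when the database was built from the original training data, the top hit would be the query phrase's own entry, so the second most similar entry is returned instead.

```python
import numpy as np

def retrieve(db, query, built_from_training_data):
    # Rank database entries by cosine similarity to the query representation.
    keys = db.keys / np.linalg.norm(db.keys, axis=1, keepdims=True)
    q = np.asarray(query, dtype=np.float32)
    q = q / np.linalg.norm(q)
    scores = keys @ q
    rank = 1 if built_from_training_data else 0  # skip the trivial self-match
    index = int(np.argsort(-scores)[rank])
    return db.values[index]
```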
7. The method of claim 1, wherein the machine translation model is an encoder-decoder model.
8. The method of claim 1, wherein pre-training the machine translation model with at least the new training data having the bilingual phrase cues comprises:
pre-training the machine translation model using both the original training data and the new training data having the bilingual phrase cues.
9. The method of claim 1, further comprising:
receiving a source language sentence to be translated and translating the source language sentence into a target language sentence for output using the pre-trained machine translation model.
10. The method of claim 9, further comprising:
concatenating bilingual phrase cues associated with the source language sentence to be translated to generate a source language sentence to be translated having the bilingual phrase cues; and
inputting the source language sentence to be translated having the bilingual phrase cues into the pre-trained machine translation model.
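For illustration of the inference path of claims 9 and 10, reusing build_cued_source from the earlier sketch; `model.translate` stands in for whatever interface the pre-trained encoder-decoder model exposes and is an assumption, not an API defined by the disclosure.

```python
def translate_with_cues(model, source_sentence, bilingual_phrases):
    # Concatenate the cues onto the sentence, then decode with the pre-trained model.
    cued_source = build_cued_source(source_sentence, bilingual_phrases)
    return model.translate(cued_source)
```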
11. The method of claim 10, wherein the bilingual phrase cues associated with the source language sentence to be translated are created manually.
12. The method of claim 10, further comprising:
extracting source language phrases from the source language sentence to be translated; and
retrieving bilingual phrases from a bilingual phrase database to construct the bilingual phrase cues associated with the source language sentence to be translated.
13. The method of claim 12, further comprising:
calculating contextual representations of the source language phrases in the source language sentence to be translated,
wherein retrieving bilingual phrases from the bilingual phrase database comprises retrieving the most similar bilingual phrases from the bilingual phrase database based on the contextual representations of the source language phrases in the source language sentence to be translated.
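A sketch tying claims 12 and 13 together, reusing extract_phrases and retrieve from the earlier sketches; `encode_in_context` is a placeholder for whatever encoder computes the contextual representation of a phrase within the sentence and is an assumption.

```python
def build_cues_for_inference(sentence, trie_root, db, encode_in_context):
    # Extract candidate source phrases, then retrieve the most similar
    # bilingual phrase from the database for each of them.
    cues = []
    for phrase in extract_phrases(sentence, trie_root):
        query = encode_in_context(sentence, phrase)
        cues.append(retrieve(db, query, built_from_training_data=False))
    return cues
```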
14. An electronic device, comprising:
a memory; and
a processor coupled to the memory, the memory having stored therein instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1-13.
15. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1-13.
16. An apparatus for machine translation, comprising:
an original training data acquisition unit configured to acquire original training data including a source language sentence for training and a target language sentence for training;
a new training data generating unit configured to concatenate bilingual phrase cues associated with the source language sentence for training to generate new training data having bilingual phrase cues, wherein the bilingual phrase cues comprise one or more bilingual phrases, each bilingual phrase comprising a source language phrase and a corresponding target language phrase; and
a pre-training unit configured to pre-train a machine translation model with at least the new training data having the bilingual phrase cues.
CN202111325941.9A 2021-11-10 2021-11-10 Method, apparatus, and medium for machine translation Pending CN113887253A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111325941.9A CN113887253A (en) 2021-11-10 2021-11-10 Method, apparatus, and medium for machine translation
PCT/CN2022/123788 WO2023082900A1 (en) 2021-11-10 2022-10-08 Method for machine translation, device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111325941.9A CN113887253A (en) 2021-11-10 2021-11-10 Method, apparatus, and medium for machine translation

Publications (1)

Publication Number Publication Date
CN113887253A (en) 2022-01-04

Family

ID=79017160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111325941.9A Pending CN113887253A (en) 2021-11-10 2021-11-10 Method, apparatus, and medium for machine translation

Country Status (2)

Country Link
CN (1) CN113887253A (en)
WO (1) WO2023082900A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114444523A (en) * 2022-02-10 2022-05-06 北京间微科技有限责任公司 Portable off-line machine translation intelligent box
WO2023082900A1 (en) * 2021-11-10 2023-05-19 北京有竹居网络技术有限公司 Method for machine translation, device, and medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372187B (en) * 2016-08-31 2019-12-17 中译语通科技股份有限公司 Cross-language retrieval method for big data
CN111401079A (en) * 2018-12-14 2020-07-10 波音公司 Training method and device of neural network machine translation model and storage medium
CN112084295A (en) * 2019-05-27 2020-12-15 微软技术许可有限责任公司 Cross-language task training
CN111460838B (en) * 2020-04-23 2023-09-22 腾讯科技(深圳)有限公司 Pre-training method, device and storage medium of intelligent translation model
CN113297841A (en) * 2021-05-24 2021-08-24 哈尔滨工业大学 Neural machine translation method based on pre-training double-word vectors
CN113887253A (en) * 2021-11-10 2022-01-04 北京有竹居网络技术有限公司 Method, apparatus, and medium for machine translation

Also Published As

Publication number Publication date
WO2023082900A1 (en) 2023-05-19

Similar Documents

Publication Publication Date Title
EP3832519A1 (en) Method and apparatus for evaluating translation quality
CN109522552B (en) Normalization method and device of medical information, medium and electronic equipment
CN107861954B (en) Information output method and device based on artificial intelligence
US10540585B2 (en) Training sequence generation neural networks using quality scores
WO2023082900A1 (en) Method for machine translation, device, and medium
KR20090106937A (en) Correction System for spelling error and method thereof
CN110517767B (en) Auxiliary diagnosis method, auxiliary diagnosis device, electronic equipment and storage medium
CN111339789B (en) Translation model training method and device, electronic equipment and storage medium
CN109710951B (en) Auxiliary translation method, device, equipment and storage medium based on translation history
US20210110111A1 (en) Methods and systems for providing universal portability in machine learning
CN115983294B (en) Translation model training method, translation method and translation equipment
CN111159770A (en) Text data desensitization method, device, medium and electronic equipment
CN112530404A (en) Voice synthesis method, voice synthesis device and intelligent equipment
CN111104796B (en) Method and device for translation
CN112100355A (en) Intelligent interaction method, device and equipment
CN115620726A (en) Voice text generation method, and training method and device of voice text generation model
CN113591498B (en) Translation processing method, device, equipment and medium
CN114020774A (en) Method, device and equipment for processing multiple rounds of question-answering sentences and storage medium
CN116821327A (en) Text data processing method, apparatus, device, readable storage medium and product
CN114881008A (en) Text generation method and device, electronic equipment and medium
CN110347813B (en) Corpus processing method and device, storage medium and electronic equipment
CN112541062A (en) Parallel corpus alignment method and device, storage medium and electronic equipment
CN112148751A (en) Method and device for querying data
CN116757204B (en) Medical name mapping method, training device, medium and equipment
US9146919B2 (en) Bootstrapping named entity canonicalizers from English using alignment models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination