CN112966530A - Self-adaptive method, system, medium and computer equipment in machine translation field - Google Patents

Self-adaptive method, system, medium and computer equipment in machine translation field Download PDF

Info

Publication number
CN112966530A
CN112966530A (application CN202110375078.1A; granted as CN112966530B)
Authority
CN
China
Prior art keywords
domain
field
machine translation
model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110375078.1A
Other languages
Chinese (zh)
Other versions
CN112966530B (en)
Inventor
贝超
程国艮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Global Tone Communication Technology Co ltd
Original Assignee
Global Tone Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Global Tone Communication Technology Co ltd filed Critical Global Tone Communication Technology Co ltd
Priority to CN202110375078.1A priority Critical patent/CN112966530B/en
Publication of CN112966530A publication Critical patent/CN112966530A/en
Application granted granted Critical
Publication of CN112966530B publication Critical patent/CN112966530B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/42 - Data-driven translation
    • G06F40/49 - Data-driven translation using very large corpora, e.g. the web
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of machine translation, and discloses an adaptive method, system, medium and computer device for domain machine translation, comprising the following steps: augmenting the corpus with a semi-supervised method, performing adaptive training of a domain model according to the quantity and quality of corpora in different domains, and performing machine translation with the trained domain model. The invention provides a complete solution to the main problems in the practical application of domain machine translation; it makes effective use of the domain corpora the user provides and delivers a better domain machine translation model. When domain corpora are scarce, the invention constructs them with a semi-supervised method; different training modes are selected according to the user's requirements, and a better domain neural machine translation model is obtained through upsampling. By means of incremental training, the method avoids the phenomenon where the domain model overfits and cannot cover most users' usage scenarios, and it can build a domain model quickly.

Description

Self-adaptive method, system, medium and computer equipment in machine translation field
Technical Field
The invention belongs to the technical field of machine translation, and particularly relates to a self-adaptive method, a self-adaptive system, a self-adaptive medium and computer equipment in the field of machine translation.
Background
Machine translation automatically translates a sentence in a source language into a sentence in a target language using computer algorithms. It is a research direction within artificial intelligence with substantial scientific and practical value. With the deepening of globalization and the rapid development of the internet, machine translation plays an increasingly important role in political, economic, social and cultural exchange at home and abroad.
As the usability of neural machine translation has improved substantially, users' demand for machine translation has grown. General users have no specialized requirements and do not need high accuracy; general-domain machine translation meets their needs. Users in professional domains, however, have both a large demand for machine translation and high requirements for accuracy and domain expertise that a general-domain system cannot satisfy.
Domain neural machine translation has been discussed extensively in academia, but its industrial-scale application still faces many unsolved problems. Academic work can be tuned to a given test set, yet a domain test set contains only a few thousand sentences and cannot represent the sentences users need translated across all scenarios. In practice, therefore, domain machine translation models often leave users with a poor experience.
Training a neural network model from scratch spends a great deal of time on corpus processing and model training. In practice, however, users continually produce new domain corpora, and the model cannot be retrained from scratch every time; this demands rapid domain adaptation.
In addition, the user's corpus is small, hard-pressed to cover all usage scenarios, and of uncertain quality. The difficulty lies in exploiting the corpus the user provides while customizing the model to the user.
Through the above analysis, the problems and defects of the prior art are as follows: existing machine translation systems and methods cannot be applied to professional domains and cannot perform domain adaptation, and their translations are inaccurate with poor user experience. The difficulty in solving these problems is that available domain corpora are few or even absent, while a neural machine translation model is data-driven: with little data, a usable domain model cannot be trained, or training fails outright.
The significance of solving these problems is as follows: the invention can train a usable domain machine translation model under reasonable conditions according to the user's requirements and circumstances, solving the problem that a domain neural machine translation model cannot be trained when domain corpora are lacking.
Disclosure of Invention
In view of the problems in the prior art, the invention provides an adaptive method, system, medium and computer device for the machine translation field.
The invention is realized as follows: a machine translation domain adaptation method comprising: augmenting the corpus with a semi-supervised method, performing adaptive training of the domain model according to the quantity and quality of corpora in different domains, and performing machine translation with the trained domain model.
Further, the machine translation domain adaptation method comprises the following steps:
step one, producing a pseudo-parallel domain corpus with a semi-supervised method to augment the corpus;
step two, constructing a domain model and judging whether time is sufficient; if so, performing full training of the constructed domain model, and if not, performing incremental training;
step three, performing domain-adaptive machine translation with the trained domain model.
Further, producing the pseudo-parallel domain corpus with the semi-supervised method comprises:
collecting in-domain monolingual data, translating it with a reverse-direction machine translation model, and pairing the resulting translations with the original text to form an in-domain pseudo-parallel corpus.
Further, in step two, the full training of the constructed domain model comprises:
(1) preprocessing the training set, then training the constructed domain model with the general test set as the development set;
(2) training the constructed domain model a second time with the same training set, using the domain test set as the development set.
Further, in step (1), preprocessing the training set comprises: upsampling the domain corpus so that the ratio of general corpus sentences to domain corpus sentences is between 5:1 and 10:1.
Further, in step two, the incremental training of the constructed domain model comprises: assessing the state of the domain corpus and, based on the result, training the constructed domain model with in-domain corpora.
Further, training the constructed domain model with in-domain corpora based on the result comprises:
if the domain corpus is plentiful and of good quality: performing incremental training of the domain model with the domain corpus, starting from the original general model;
if the domain corpus is small or of low quality: mixing the domain corpus with the general corpus and upsampling so that the ratio of general to domain corpus is about 5:1, then performing incremental training from the general model.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of: augmenting the corpus with a semi-supervised method, performing adaptive training of the domain model according to the quantity and quality of corpora in different domains, and performing machine translation with the trained domain model.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of: augmenting the corpus with a semi-supervised method, performing adaptive training of the domain model according to the quantity and quality of corpora in different domains, and performing machine translation with the trained domain model.
Another object of the present invention is to provide a machine translation domain adaptation system implementing the above method, the system comprising:
a corpus augmentation module for producing a pseudo-parallel domain corpus with a semi-supervised method and augmenting the corpus;
a model building module for building the domain model;
a training module for incremental or full training of the domain model according to the quantity and quality of corpora in different domains;
a translation module for performing domain-adaptive machine translation with the trained domain model.
Combining all the technical schemes above, the advantages and positive effects of the invention are as follows. The invention selects an appropriate training mode for different quantities and qualities of domain corpora. When the domain corpus is small, it can be rapidly expanded in a semi-supervised manner. With sufficient time, full training gives the domain model better quality. When the domain model must be trained quickly: if the corpus quality and quantity are both good, incremental training can run directly on the domain corpus; if the corpus quality is poor or the quantity small, the domain and general corpora can be mixed, the proportion of the domain corpus raised by upsampling, and incremental training performed afterwards.
The invention provides a complete solution to the main problems in the practical application of domain machine translation; it makes effective use of the domain corpora the user provides and delivers a better domain machine translation model.
When domain corpora are scarce, a semi-supervised method is chosen to construct them. Different training modes are selected according to the user's requirements, and a better domain neural machine translation model is finally obtained through upsampling. By means of incremental training, the method avoids the phenomenon where the domain model overfits and cannot cover most users' usage scenarios, and it can build a domain model quickly.
The invention has been applied to a machine translation engine in the financial domain; its effect, shown in Table 1, is a clear improvement over the general-domain model.
Table 1. Financial-domain BLEU values
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a method for adaptive domain machine translation according to an embodiment of the present invention.
Fig. 2 is a flowchart of a method for adapting to the field of machine translation according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a structure of a machine translation domain adaptive system according to an embodiment of the present invention;
in the figure: 1. a corpus augmentation module; 2. a model building module; 3. a training module; 4. a translation module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a method, system, medium and computer device for domain adaptation in machine translation, described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the machine translation domain adaptation method provided by an embodiment of the present invention comprises: augmenting the corpus with a semi-supervised method, performing adaptive training of the domain model according to the quantity and quality of corpora in different domains, and performing machine translation with the trained domain model.
As shown in fig. 2, the machine translation domain adaptation method provided by the embodiment of the present invention comprises the following steps:
S101, producing a pseudo-parallel domain corpus with a semi-supervised method to augment the corpus;
S102, constructing a domain model and judging whether time is sufficient; if so, performing full training of the constructed domain model, and if not, performing incremental training;
S103, performing domain-adaptive machine translation with the trained domain model.
Producing the pseudo-parallel domain corpus with the semi-supervised method, as provided by the embodiment of the invention, comprises the following steps:
collecting in-domain monolingual data, translating it with a reverse-direction machine translation model, and pairing the resulting translations with the original text to form an in-domain pseudo-parallel corpus.
In step S102, the full training of the constructed domain model provided by the embodiment of the present invention comprises:
(1) preprocessing the training set, then training the constructed domain model with the general test set as the development set;
(2) training the constructed domain model a second time with the same training set, using the domain test set as the development set.
In step (1), the training set preprocessing provided by the embodiment of the present invention comprises: upsampling the domain corpus so that the ratio of general corpus sentences to domain corpus sentences is between 5:1 and 10:1.
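The upsampling can be sketched as follows; the list-of-sentences representation and the function name are illustrative assumptions, while the 5:1 to 10:1 target band comes from the text:

```python
import math

def upsample_domain(general, domain, max_ratio=10):
    """Replicate the domain corpus so that the general-to-domain
    sentence ratio drops to max_ratio or below; with max_ratio=10 the
    resulting ratio lands inside the 5:1-10:1 band whenever the
    general corpus is at least five times larger than the domain one."""
    if not domain:
        return []
    factor = max(1, math.ceil(len(general) / (max_ratio * len(domain))))
    return domain * factor

general = ["general sentence"] * 1000
domain = ["domain sentence"] * 20
upsampled = upsample_domain(general, domain)
ratio = len(general) / len(upsampled)  # 1000 / 100 = 10.0, inside the band
```

A higher-quality domain corpus would warrant a lower `max_ratio` (a larger domain share), matching the text's note that better corpora get a larger proportion.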
In step S102, the incremental training of the constructed domain model provided in the embodiment of the present invention comprises:
assessing the state of the domain corpus and, based on the result, training the constructed domain model with in-domain corpora.
Training the constructed domain model with in-domain corpora based on the result, as provided by the embodiment of the invention, comprises the following cases:
if the domain corpus is plentiful and of good quality: performing incremental training of the domain model with the domain corpus, starting from the original general model;
if the domain corpus is small or of low quality: mixing the domain corpus with the general corpus and upsampling so that the ratio of general to domain corpus is about 5:1, then performing incremental training from the general model.
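The mode selection described above can be summarized as a small decision function; the boolean inputs and mode names are illustrative labels, not terminology from the patent:

```python
def choose_training_mode(time_sufficient, corpus_plentiful, quality_good):
    """Map the corpus/time conditions above to a training strategy."""
    if time_sufficient:
        # Two-stage full training (general dev set, then domain dev set).
        return "full-training"
    if corpus_plentiful and quality_good:
        # Fine-tune the general model on the domain corpus alone.
        return "incremental-domain-only"
    # Mix general and domain corpora at about 5:1, then fine-tune.
    return "incremental-mixed"

mode = choose_training_mode(time_sufficient=False,
                            corpus_plentiful=True,
                            quality_good=False)
```

In this scheme, time pressure overrides corpus conditions: full training is only attempted when time allows, and corpus quantity/quality decide only between the two incremental variants.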
As shown in fig. 3, the machine translation domain adaptation system provided by the embodiment of the present invention comprises:
a corpus augmentation module 1 for producing a pseudo-parallel domain corpus with a semi-supervised method and augmenting the corpus;
a model building module 2 for building the domain model;
a training module 3 for incremental or full training of the domain model according to the quantity and quality of corpora in different domains;
a translation module 4 for performing domain-adaptive machine translation with the trained domain model.
The technical effects of the present invention will be further described with reference to specific embodiments.
Example 1:
the invention provides a neural network-based field machine translation self-adaption method and system. The whole process is shown in FIG. 1.
1. Aiming at the problem of the amount of the linguistic data, the invention uses a semi-supervised method to produce the linguistic data in the pseudo-parallel field:
because the linguistic data in the bilingual field are few, especially in certain small languages, and the quality is difficult to guarantee, the quantity and the quality of the linguistic data can be better guaranteed by collecting the monolingual in the field. And then, translating by using a machine translation model in the opposite direction, wherein the obtained translated text and the original text form a field pseudo-parallel corpus.
2. On how to build a domain model quickly:
a) Full training
According to the quantity and quality of the domain corpus, the domain corpus is upsampled so that the ratio of general to domain corpus is about 5:1 to 10:1; the higher the quality of the domain corpus, the larger its proportion. The basic steps are:
i. With the model structure unchanged, train with the general test set as the development set until stopping.
ii. With the domain test set as the development set, train the model with the same training set until stopping.
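The two stages above, training to convergence against the general development set and then again with the same training set against the domain development set, amount to running an early-stopping loop twice. A hedged sketch, where `step_fn` and `evaluate` are placeholders for one round of NMT training and a development-set BLEU evaluation:

```python
def train_until_converged(step_fn, evaluate, patience=3):
    """Run training rounds while the development-set score improves;
    stop after `patience` consecutive rounds without improvement."""
    best, stale = float("-inf"), 0
    while stale < patience:
        step_fn()            # one training round (placeholder)
        score = evaluate()   # score on the chosen development set
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
    return best

# Toy illustration: the dev score improves for five rounds, then plateaus.
scores = iter([21.0, 23.5, 24.1, 24.8, 25.2] + [25.2] * 10)
best_bleu = train_until_converged(lambda: None, lambda: next(scores))
```

Stage one would call this with the general test set as the development set; stage two would repeat it on the same training data with the domain test set as the development set.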
b) Incremental training
When time is limited and a domain model is wanted quickly, incremental training can be performed. Incremental training starts from the original model (generally the general model) and continues training it with in-domain corpora.
i. If the domain corpus is plentiful and of good quality: perform incremental training with the domain corpus, starting from the original general model.
ii. If the domain corpus is small or of low quality: mix the domain corpus with the general corpus, bring the general-to-domain ratio to about 5:1 by upsampling, and then perform incremental training from the general model.
This domain machine translation training method selects an appropriate training mode for different quantities and qualities of domain corpora. When the domain corpus is small, it can be rapidly expanded in a semi-supervised manner. With sufficient time, full training gives the domain model better quality. When the domain model must be trained quickly: if the corpus quality and quantity are both good, incremental training can run directly on the domain corpus; if the corpus quality is poor or the quantity small, the domain and general corpora can be mixed, the proportion of the domain corpus raised by upsampling, and incremental training performed afterwards.
Example 2:
A domain model in the English-to-Chinese direction is trained.
1. Semi-supervised augmentation of the domain corpus:
Collect in-domain Chinese monolingual data; after cleaning, translate it into English with a general Chinese-to-English machine translation model; clean the resulting translations; the translation-original pairs form the English-to-Chinese domain pseudo-bilingual corpus.
2. Training a field model:
a) If time allows, full training is performed.
According to the quantity and quality of the domain corpus, the domain corpus is upsampled so that the ratio of general to domain corpus is about 5:1 to 10:1; the higher the quality of the domain corpus, the larger its proportion. The basic steps are:
i. With the model structure unchanged, train with the general test set as the development set until stopping.
ii. With the domain test set as the development set, train the model with the same training set until stopping.
b) If time is short, incremental training is performed.
When time is limited and a domain model is wanted quickly, incremental training can be performed. Incremental training starts from the original model (generally the general model) and continues training it with in-domain corpora.
i. If the domain corpus is plentiful and of good quality: perform incremental training with the domain corpus, starting from the original general model.
ii. If the domain corpus is small or of low quality: mix the domain corpus with the general corpus, bring the general-to-domain ratio to about 5:1 by upsampling, and then perform incremental training from the general model.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only a specific embodiment of the present invention and is not intended to limit its protection scope; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the appended claims.

Claims (8)

1. A machine translation domain adaptation method, characterized in that the method augments the corpus in a semi-supervised manner, performs adaptive training of a domain model according to the quantity and quality of corpora in different domains, and performs machine translation with the trained domain model;
the self-adaptive method of the machine translation field comprises the following steps:
step one, producing a pseudo-parallel field corpus by using a semi-supervised method, and augmenting the corpus;
step two, constructing a domain model and, if time is sufficient, performing full training of the constructed domain model; if time is not sufficient, performing incremental training of the constructed domain model;
and step three, performing the machine translation of the domain self-adaption by using the trained domain model.
2. The machine translation domain adaptive method of claim 1, wherein producing the pseudo-parallel domain corpus with the semi-supervised method comprises: collecting in-domain monolingual data, translating it with a reverse-direction machine translation model, and pairing the resulting translations with the original text to form an in-domain pseudo-parallel corpus.
3. The machine translation domain adaptive method of claim 1, wherein in step two, the training the constructed domain model comprises:
(1) carrying out training set pretreatment; training the constructed domain model by taking the universal test set as a development set;
(2) and (4) taking the field test set as a development set, and performing secondary training on the constructed field model by using the same training set.
4. The machine translation domain adaptive method of claim 3, wherein in step (1), the training set preprocessing comprises: upsampling the domain corpus so that the ratio of general corpus sentences to domain corpus sentences is between 5:1 and 10:1.
5. The machine translation domain adaptive method of claim 1, wherein in step two, the incrementally training the constructed domain model comprises: and judging the condition of the domain linguistic data, and training the constructed domain model by utilizing the linguistic data in the domain based on a judgment result.
6. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of: performing adaptive training of a domain model based on the quantity and quality of different domain corpora through semi-supervised augmentation corpora, and performing machine translation by using the trained domain model; the method specifically comprises the following steps:
step one, producing a pseudo-parallel field corpus by using a semi-supervised method, and augmenting the corpus;
step two, constructing a domain model and, if time is sufficient, performing full training of the constructed domain model; if time is not sufficient, performing incremental training of the constructed domain model;
and step three, performing the machine translation of the domain self-adaption by using the trained domain model.
7. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of: performing adaptive training of a domain model based on the quantity and quality of different domain corpora through semi-supervised augmentation corpora, and performing machine translation by using the trained domain model; the method specifically comprises the following steps:
step one, producing a pseudo-parallel field corpus by using a semi-supervised method, and augmenting the corpus;
step two, constructing a domain model and, if time is sufficient, performing full training of the constructed domain model; if time is not sufficient, performing incremental training of the constructed domain model;
and step three, performing the machine translation of the domain self-adaption by using the trained domain model.
8. A machine translation domain adaptive system for implementing the machine translation domain adaptive method according to any one of claims 1 to 5, wherein the machine translation domain adaptive system comprises:
the corpus augmentation module is used for producing the corpus in the pseudo-parallel field by using a semi-supervised method and augmenting the corpus;
the model building module is used for building a domain model;
the training module is used for carrying out incremental or full training of the domain model based on the quantity and quality of different domain linguistic data;
and the translation module is used for performing the machine translation of the domain self-adaption by utilizing the trained domain model.
CN202110375078.1A 2021-04-08 2021-04-08 Self-adaptive method, system, medium and computer equipment in machine translation field Active CN112966530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110375078.1A CN112966530B (en) 2021-04-08 2021-04-08 Self-adaptive method, system, medium and computer equipment in machine translation field


Publications (2)

Publication Number Publication Date
CN112966530A true CN112966530A (en) 2021-06-15
CN112966530B CN112966530B (en) 2022-07-22

Family

ID=76281494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110375078.1A Active CN112966530B (en) 2021-04-08 2021-04-08 Self-adaptive method, system, medium and computer equipment in machine translation field

Country Status (1)

Country Link
CN (1) CN112966530B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038159A (en) * 2017-03-09 2017-08-11 Tsinghua University A neural machine translation method based on unsupervised domain adaptation
CN109190768A (en) * 2018-08-09 2019-01-11 Beijing Zhongguancun Kejin Technology Co., Ltd. A data-augmentation corpus training method for neural networks
CN110263349A (en) * 2019-03-08 2019-09-20 Tencent Technology (Shenzhen) Co., Ltd. Corpus evaluation model training method, apparatus, storage medium and computer device
CN110728154A (en) * 2019-08-28 2020-01-24 Unisound Intelligent Technology Co., Ltd. Construction method of a semi-supervised general neural machine translation model
CN110889295A (en) * 2019-09-12 2020-03-17 Huawei Technologies Co., Ltd. Machine translation model, and method, system and device for determining pseudo-professional parallel corpora
CN111414770A (en) * 2020-02-24 2020-07-14 Inner Mongolia University of Technology A semi-supervised Mongolian neural machine translation method based on co-training
CN111859995A (en) * 2020-06-16 2020-10-30 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method and apparatus for a machine translation model, electronic device and storage medium
US10878201B1 (en) * 2017-07-27 2020-12-29 Lilt, Inc. Apparatus and method for an adaptive neural machine translation system


Also Published As

Publication number Publication date
CN112966530B (en) 2022-07-22

Similar Documents

Publication Publication Date Title
US10679148B2 (en) Implicit bridging of machine learning tasks
CN111079406B (en) Natural language processing model training method, task execution method, equipment and system
US20130185049A1 (en) Predicting Pronouns for Pro-Drop Style Languages for Natural Language Translation
Kenny Human and machine translation
WO2020242567A1 (en) Cross-lingual task training
CN116595999B (en) Machine translation model training method and device
CN116187282B (en) Training method of text review model, text review method and device
Rigau et al. Meaning: A roadmap to knowledge technologies
KR20220033652A (en) Method of building training data of machine translation
CN110287498B (en) Hierarchical translation method, device and storage medium
Steuer et al. On the linguistic and pedagogical quality of automatic question generation via neural machine translation
Mo Design and Implementation of an Interactive English Translation System Based on the Information‐Assisted Processing Function of the Internet of Things
Jiang et al. Chat with illustration
CN112966530B (en) Self-adaptive method, system, medium and computer equipment in machine translation field
Wang The development of translation technology in the era of big data
CN117111952A (en) Code complement method and device based on generation type artificial intelligence and medium
Zhu et al. Improving low-resource named entity recognition via label-aware data augmentation and curriculum denoising
CN115809658A (en) Parallel corpus generation method and device and unsupervised synonymy transcription method and device
Jooste et al. Philipp Koehn: Neural Machine Translation: Cambridge University Press, 30 Jun 2020, www.cambridge.org/9781108497329, DOI: 10.1017/9781108608480
CN116151347A (en) Training method and device for pre-training language model and electronic equipment
US12019990B2 (en) Representation learning method and device based on natural language and knowledge graph
Miao et al. Improved Quality Estimation of Machine Translation with Pre-trained Language Representation
US20210192364A1 (en) Representation learning method and device based on natural language and knowledge graph
Yang et al. Analysis of AI MT based on fuzzy algorithm
Cao et al. Design and Application of Corpus in Computational Linguistics based on Multimedia Virtual Technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant