CN109670190A - Translation model construction method and device - Google Patents

Translation model construction method and device

Info

Publication number: CN109670190A
Authority: CN (China)
Prior art keywords: translation, language, model, translation model, example corpus
Legal status: Granted
Application number: CN201811590009.7A
Other languages: Chinese (zh)
Other versions: CN109670190B
Inventors: 朱晓宁, 张睿卿, 何中军, 吴华, 王海峰
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811590009.7A
Publication of CN109670190A
Application granted
Publication of CN109670190B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/40: Processing or translation of natural language
    • G06F 40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application proposes a translation model construction method and device. The method includes: when the number of translation word pairs in a first positive example corpus set is less than a threshold, randomly generating a negative example corpus set according to the translation word pairs in the acquired first positive example corpus set, where the translation word pairs in the first positive example corpus set and the negative example corpus set each include a source-language word and a corresponding target-language word; performing machine learning on the first positive example corpus set and the negative example corpus set to generate a classification model; and pruning a preset translation model by using the classification model, to generate a translation model corresponding to the source language and the target language. When the bilingual corpus of the source language and the target language is scarce, the method obtains a classification model from the translation word pairs of the source language and the target language and uses it to filter the translation model of the source language and the target language obtained through a reference language, which greatly reduces the noise of the translation model and improves its translation quality.

Description

Translation model construction method and device
Technical field
This application relates to the field of machine translation technology, and in particular to a translation model construction method and device.
Background
When a translation model is constructed, it is usually trained on a large-scale bilingual corpus to improve its translation quality. For language pairs involving low-resource languages, however, it is difficult to obtain a large-scale bilingual corpus, and a translation model trained on a small-scale bilingual corpus tends to be of relatively low quality.
Summary of the invention
The application proposes a translation model construction method and device, to solve the problem that a translation model trained on a small-scale bilingual corpus has relatively low translation quality.
An embodiment of one aspect of the application proposes a translation model construction method, including:
when the number of translation word pairs in a first positive example corpus set is less than a threshold, randomly generating a negative example corpus set according to the translation word pairs in the acquired first positive example corpus set, where the translation word pairs in the first positive example corpus set and the negative example corpus set each include a source-language word and a corresponding target-language word;
performing machine learning on the first positive example corpus set and the negative example corpus set to generate a classification model;
pruning a preset translation model by using the classification model, to generate a translation model corresponding to the source language and the target language;
where the preset translation model is a translation model obtained by fusing a first translation model and a second translation model, the first translation model is trained on a second positive example corpus set including the source language and a reference language, and the second translation model is trained on a third positive example corpus set including the reference language and the target language.
With the translation model construction method of the embodiment of the application, when the number of translation word pairs in the first positive example corpus set is less than the threshold, a negative example corpus set is randomly generated according to the translation word pairs in the acquired first positive example corpus set, where the translation word pairs in both sets each include a source-language word and a corresponding target-language word; machine learning is performed on the first positive example corpus set and the negative example corpus set to generate a classification model; and the preset translation model, obtained by fusing the first translation model (trained on the second positive example corpus set of the source language and the reference language) and the second translation model (trained on the third positive example corpus set of the reference language and the target language), is pruned by the classification model to generate the translation model corresponding to the source language and the target language. Thus, when the bilingual corpus of the source language and the target language is scarce, a classification model obtained from the translation word pairs of the source language and the target language is used to filter the translation model of the source language and the target language obtained through the reference language, which greatly reduces the noise of the translation model and improves its translation quality.
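The steps above can be summarised by the following minimal sketch (Python; every function, parameter and variable name here is an illustrative assumption rather than an API defined by the application):
```python
def build_translation_model(direct_pairs, src_ref_pairs, ref_tgt_pairs, threshold,
                            make_negatives, train_classifier, train_mt, fuse, prune):
    """direct_pairs: (source word, target word) mutual-translation pairs,
    i.e. the first positive example corpus set."""
    if len(direct_pairs) >= threshold:
        # Enough direct bilingual data: train the source-target model directly.
        return train_mt(direct_pairs)
    negatives = make_negatives(direct_pairs)                # randomly generated negative set
    classifier = train_classifier(direct_pairs, negatives)  # classification model
    preset = fuse(train_mt(src_ref_pairs), train_mt(ref_tgt_pairs))  # fusion through the reference language
    return prune(preset, classifier)                        # pruning removes noisy word pairs
```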
An embodiment of another aspect of the application proposes a translation model construction device, including:
a first generation module, configured to, when the number of translation word pairs in a first positive example corpus set is less than a threshold, randomly generate a negative example corpus set according to the translation word pairs in the acquired first positive example corpus set, where the translation word pairs in the first positive example corpus set and the negative example corpus set each include a source-language word and a corresponding target-language word;
a second generation module, configured to perform machine learning on the first positive example corpus set and the negative example corpus set to generate a classification model;
a third generation module, configured to prune a preset translation model by using the classification model, to generate a translation model corresponding to the source language and the target language;
where the preset translation model is a translation model obtained by fusing a first translation model and a second translation model, the first translation model is trained on a second positive example corpus set including the source language and a reference language, and the second translation model is trained on a third positive example corpus set including the reference language and the target language.
With the translation model construction device of the embodiment of the application, when the number of translation word pairs in the first positive example corpus set is less than the threshold, a negative example corpus set is randomly generated according to the translation word pairs in the acquired first positive example corpus set, where the translation word pairs in both sets each include a source-language word and a corresponding target-language word; machine learning is performed on the first positive example corpus set and the negative example corpus set to generate a classification model; and the preset translation model, obtained by fusing the first translation model and the second translation model, is pruned by the classification model to generate the translation model corresponding to the source language and the target language. Thus, when the bilingual corpus of the source language and the target language is scarce, a classification model obtained from the translation word pairs of the source language and the target language is used to filter the translation model of the source language and the target language obtained through the reference language, which greatly reduces the noise of the translation model and improves its translation quality.
An embodiment of another aspect of the application proposes a computer device, including a processor and a memory;
where the processor runs a program corresponding to executable program code stored in the memory by reading the executable program code, so as to implement the translation model construction method described in the above aspect embodiment.
An embodiment of another aspect of the application proposes a non-transitory computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the translation model construction method described in the above aspect embodiment is implemented.
Additional aspects and advantages of the application will be set forth in part in the following description, and in part will become apparent from the following description or be learned by practice of the application.
Brief description of the drawings
The above and/or additional aspects and advantages of the application will become apparent and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a translation model construction method provided by an embodiment of the application;
Fig. 2 is a schematic flowchart of another translation model construction method provided by an embodiment of the application;
Fig. 3 is a schematic flowchart of another translation model construction method provided by an embodiment of the application;
Fig. 4 is a schematic flowchart of another translation model construction method provided by an embodiment of the application;
Fig. 5 is a schematic structural diagram of a translation model construction device provided by an embodiment of the application;
Fig. 6 is a block diagram of an exemplary computer device suitable for implementing embodiments of the application.
Specific embodiments
Embodiments of the application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the application, and should not be construed as limiting the application.
The translation model construction method and device of the embodiments of the application are described below with reference to the accompanying drawings.
For the problem in the related art that, for a low-resource language pair with little corpus, a translation model trained directly on the bilingual corpus has relatively low quality, an embodiment of the application proposes a translation model construction method.
In the translation model construction method proposed by the embodiment of the application, when the bilingual corpus of the source language and the target language is scarce, a classification model is obtained from the translation word pairs of the source language and the target language, and the translation model of the source language and the target language obtained through the reference language is filtered by the classification model, which greatly reduces the noise of the translation model and improves its translation quality.
Fig. 1 is a schematic flowchart of a translation model construction method provided by an embodiment of the application.
The translation model construction method of the embodiment of the application may be performed by the translation model construction device provided by the embodiment of the application, so that a classification model obtained from the translation word pairs of the source language and the target language is used to filter the translation model of the source language and the target language obtained through the reference language, thereby improving the translation quality of the translation model.
As shown in Fig. 1, the translation model construction method includes the following steps.
Step 101: when the number of translation word pairs in the first positive example corpus set is less than a threshold, randomly generate a negative example corpus set according to the translation word pairs in the acquired first positive example corpus set.
When a translation model is constructed, it is usually trained on a large-scale bilingual corpus. For a language pair involving a low-resource language, however, such as Chinese and Japanese, the Chinese-Japanese corpus is small, and a Chinese-Japanese translation model trained on a small amount of Chinese-Japanese corpus has relatively low translation quality.
In the application, when the corpus of the source language and the target language is scarce, a preset translation model of the source language and the target language can be obtained through a reference language, and the preset translation model is then filtered with the corpus of the source language and the target language to obtain the translation model of the source language and the target language.
Specifically, when the number of translation word pairs in the first positive example corpus set is less than the threshold, a negative example corpus set can be randomly generated according to the translation word pairs in the first positive example corpus set. The translation word pairs in the first positive example corpus set and the negative example corpus set each include a source-language word and a corresponding target-language word; that is, they are pairs of source-language and target-language words.
In this embodiment, the translation word pairs in the positive example corpus set are correct mutual-translation word pairs, while the translation word pairs in the negative example corpus set are not mutual-translation word pairs.
When the negative example corpus set is randomly generated according to the translation word pairs in the first positive example corpus set, the source-language words of the translation word pairs in the first positive example corpus set can be randomly exchanged, or the target-language words can be randomly exchanged, to generate the negative example corpus set.
For example, suppose the source language is Chinese, the target language is Japanese, and the translation word pairs in the first positive example corpus set include {(riverbank: 河原), (bank: 銀行), (deposit: 預ける)}. Exchanging the Japanese words of the pairs for "riverbank" and "bank" yields the translation word pairs (riverbank: 銀行) and (bank: 河原), which can serve as translation word pairs in the negative example corpus set.
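A minimal sketch of this random exchange, assuming the first positive example corpus set is given as a list of (source word, target word) tuples; discarding re-paired combinations that coincide with a positive pair is an added safeguard the embodiment does not spell out:
```python
import random

def make_negatives(positive_pairs, seed=None):
    """Randomly re-pair source and target words to build the negative example corpus set."""
    rng = random.Random(seed)
    positives = set(positive_pairs)
    shuffled_targets = [tgt for _, tgt in positive_pairs]
    rng.shuffle(shuffled_targets)
    negatives = []
    for (src, _), tgt in zip(positive_pairs, shuffled_targets):
        if (src, tgt) not in positives:  # keep only pairs that are not mutual translations
            negatives.append((src, tgt))
    return negatives
```
For the example above, make_negatives([("riverbank", "河原"), ("bank", "銀行"), ("deposit", "預ける")]) may return pairs such as ("riverbank", "銀行") and ("bank", "河原").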
Step 102: perform machine learning on the first positive example corpus set and the negative example corpus set to generate a classification model.
In this embodiment, machine learning is performed on the translation word pairs in the first positive example corpus set and the negative example corpus set to generate a classification model. The classification model is used to judge whether a translation word pair of the source language and the target language is a legal word pair, i.e., a word pair in a mutual-translation relationship.
When machine learning is performed, classification training may be carried out with a classification algorithm such as a support vector machine classification algorithm, a decision tree algorithm, a Bayesian algorithm or a K-nearest-neighbour algorithm, to obtain the classification model.
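A sketch of this training step with scikit-learn's support vector classifier (one of the algorithms named above); it assumes each translation word pair has already been turned into a fixed-length numeric feature vector, as in the Fig. 3 embodiment:
```python
import numpy as np
from sklearn.svm import SVC

def train_pair_classifier(positive_features, negative_features):
    """positive_features / negative_features: one numeric feature vector per translation word pair."""
    X = np.vstack([positive_features, negative_features])
    y = np.concatenate([np.ones(len(positive_features)),
                        np.zeros(len(negative_features))])
    clf = SVC(probability=True)  # probability=True so pruning can use P(legal word pair)
    clf.fit(X, y)
    return clf
```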
Step 103: prune the preset translation model by using the classification model, to generate a translation model corresponding to the source language and the target language.
In this embodiment, the preset translation model is the translation model of the source language and the target language obtained through a reference language, where the reference language can be regarded as a bridge between the source language and the target language.
Specifically, the preset translation model is obtained by fusing a first translation model of the source language and the reference language with a second translation model of the reference language and the target language, where the first translation model is trained on a second positive example corpus set including the source language and the reference language, and the second translation model is trained on a third positive example corpus set including the reference language and the target language.
It can be understood that the translation word pairs included in the second positive example corpus set are translation word pairs of the source language and the reference language, and the translation word pairs included in the third positive example corpus set are translation word pairs of the reference language and the target language.
Taking the construction of a Chinese-Japanese translation model as an example, if the number of acquired Chinese-Japanese translation word pairs is less than the threshold but large-scale Chinese-English and English-Japanese translation word pairs are available, the Chinese-English translation model trained on the Chinese-English translation word pairs and the English-Japanese translation model trained on the English-Japanese translation word pairs can be fused to obtain the preset Chinese-Japanese translation model.
The reference language may be ambiguous. For example, the Chinese words for "deposit" and "bank" may both be translated as the English word "bank", and the English word "bank" may be translated as the Japanese words 河原, 銀行 or 預ける. Therefore, when the Chinese-English translation model and the English-Japanese translation model are fused, noise may appear. For example, the fusion may produce "bank": "bank": 河原, i.e. the Chinese-Japanese translation word pair (bank: 河原), even though there is no mutual-translation relationship between the Chinese word for "bank" and the Japanese word 河原.
Although the number of translation word pairs of the source language and the target language is less than the threshold, these pairs are of relatively high mutual-translation quality. To improve the translation quality of the translation model, in this embodiment the classification model trained on the positive example corpus set and the negative example corpus set of the source language and the target language is used to prune the preset translation model, so as to filter out the noise in the preset translation model and generate the translation model corresponding to the source language and the target language. For example, the classification model can filter out the translation word pairs in the bilingual database of the preset translation model that have no mutual-translation relationship.
Since the noise in the preset translation model has been filtered with the classification model, the translation quality of the translation model of the source language and the target language can be improved.
The classification model generated by performing machine learning on the first positive example corpus set and the randomly generated negative example corpus set can be used to determine the probability that a translation word pair of the source language and the target language is a legal word pair, and the preset translation model is then pruned according to the probability. This is described below with reference to Fig. 2, which is a schematic flowchart of another translation model construction method provided by an embodiment of the application.
As shown in Fig. 2, pruning the preset translation model by using the classification model includes the following steps.
Step 201: input each translation word pair in the bilingual database of the preset translation model into the classification model, to determine the probability that each translation word pair is a legal word pair.
In this embodiment, the bilingual database of the preset translation model includes translation word pairs of the source language and the target language.
When the preset translation model is pruned, each translation word pair in the bilingual database can be input into the classification model to obtain the probability that the pair is a legal word pair, where a legal word pair refers to a translation word pair in a mutual-translation relationship.
Step 202: prune the preset translation model according to the acquired legal word pairs.
In this embodiment, the probability that each translation word pair is a legal word pair is compared with a preset threshold probability; translation word pairs whose probability is greater than the threshold probability are retained as legal word pairs, and the other translation word pairs in the bilingual database are deleted, thereby completing the pruning of the preset translation model and generating the translation model corresponding to the source language and the target language.
In the embodiment of the application, the classification model determines the probability that each translation word pair in the bilingual database of the preset translation model is a legal word pair, the legal word pairs are determined according to the probabilities, and the preset translation model is pruned according to the legal word pairs, so that the translation model obtained by fusion is filtered and the translation quality of the translation model is improved.
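A sketch of steps 201 and 202 under the same assumptions (a trained classifier exposing a predict_proba method, a feature-extraction function, and an assumed threshold value):
```python
def prune_preset_model(bilingual_db, classifier, extract_features, prob_threshold=0.5):
    """bilingual_db: iterable of (source word, target word) pairs from the preset translation model."""
    kept = []
    for src, tgt in bilingual_db:
        # Probability that this pair is a legal word pair (mutual-translation relationship).
        p_legal = classifier.predict_proba([extract_features(src, tgt)])[0][1]
        if p_legal > prob_threshold:
            kept.append((src, tgt))  # retain legal word pairs
    return kept                      # all other pairs are deleted
```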
In an embodiment of the application, the classification model can be obtained by performing machine learning on features extracted from the translation word pairs, and the preset translation model is pruned by using the classification model to recognise the features of the translation word pairs in the bilingual database of the preset translation model. This is described below with reference to Fig. 3, which is a schematic flowchart of another translation model construction method provided by an embodiment of the application.
As shown in Fig. 3, the translation model construction method includes the following steps.
Step 301: when the number of translation word pairs in the first positive example corpus set is less than a threshold, randomly generate a negative example corpus set according to the translation word pairs in the acquired first positive example corpus set.
In this embodiment, step 301 is similar to step 101 above and is therefore not repeated here.
Step 302: parse each translation word pair in the first positive example corpus set and the negative example corpus set, to determine the feature set of each translation word pair.
In this embodiment, by extracting the features of each translation word pair, the translation word pairs in the positive example corpus set can be distinguished from the translation word pairs in the negative example corpus set by means of these features.
Before machine learning is performed on the feature sets of the translation word pairs in the first positive example corpus set and the negative example corpus set to generate the classification model, each translation word pair in the two sets is first analysed to obtain its features, and the collection of these features forms the feature set of the translation word pair. The feature set includes one or more of features such as the source-language phrase length, the target-language phrase length, the length ratio of the translation word pair, and the mutual-translation probability value of the translation word pair.
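A sketch of such a feature set; the exact feature definitions are not fixed by the embodiment, so the choices below (lengths measured in characters, per-word probabilities summarised by their mean) are assumptions:
```python
def extract_features(src_phrase, tgt_phrase, mutual_prob=0.0, word_probs=()):
    """Return a numeric feature vector for one translation word pair."""
    src_len = len(src_phrase)                 # source-language phrase length
    tgt_len = len(tgt_phrase)                 # target-language phrase length
    length_ratio = src_len / max(tgt_len, 1)  # length ratio of the pair
    mean_word_prob = sum(word_probs) / len(word_probs) if word_probs else 0.0
    return [src_len, tgt_len, length_ratio, mutual_prob, mean_word_prob]
```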
Step 303: perform machine learning on the feature sets of the translation word pairs in the first positive example corpus set and the negative example corpus set, to generate a classification model.
In this embodiment, the feature sets of the translation word pairs in the first positive example corpus set and the negative example corpus set are trained with a classification algorithm to generate the classification model.
Step 304: parse each translation word pair in the bilingual database of the preset translation model, to determine the feature set of each translation word pair.
In this embodiment, each translation word pair in the bilingual database of the preset translation model is parsed to determine its features, and the collection of these features is its feature set.
Step 305: prune the preset translation model by recognising, with the classification model, the feature sets of the translation word pairs in the bilingual database of the preset translation model, to generate a translation model corresponding to the source language and the target language.
In this embodiment, the feature set of each translation word pair in the bilingual database of the preset translation model can be input into the classification model, which outputs the probability that the translation word pair is a legal word pair. The translation word pairs in the bilingual database that are legal word pairs can be determined according to the probabilities and retained, and the other translation word pairs in the bilingual database can be deleted, thereby completing the pruning of the preset translation model and generating the translation model corresponding to the source language and the target language.
In an embodiment of the application, before the negative example corpus set is randomly generated according to the translation word pairs in the acquired first positive example corpus set, the first positive example corpus set of the source language and the target language can first be obtained from a corpus, and the negative example corpus set is then generated according to the first positive example corpus set. The corpus may include a variety of bilingual phrase pairs, for example Chinese-English phrase pairs, English-Japanese phrase pairs, Chinese-Japanese phrase pairs, Chinese-Russian phrase pairs, and so on.
Specifically, when a user initiates a translation model building request, the translation model construction device can acquire the request, where the building request includes a source language type and a target language type. For example, if the building request includes the source language type Chinese and the target language type Japanese, the translation model to be constructed is a Chinese-Japanese translation model.
Then, according to the source language type and the target language type, phrase pairs of the source language type and the target language type can be obtained from the corpus; source-language words and target-language words are extracted from the source-language phrases and the target-language phrases to obtain translation word pairs including the source language and the target language; and the multiple translation word pairs form the first positive example corpus set. After the first positive example corpus set is obtained, the negative example corpus set can be randomly generated according to its translation word pairs.
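A sketch of this retrieval step, assuming the building request carries the two language types and the corpus is indexed by language pair (the word-alignment step that turns phrase pairs into word pairs is omitted):
```python
def build_first_positive_set(request, corpus):
    """request: e.g. {"source_lang": "zh", "target_lang": "ja"};
    corpus: maps a (lang_a, lang_b) key to a list of aligned phrase pairs."""
    src_lang, tgt_lang = request["source_lang"], request["target_lang"]
    phrase_pairs = corpus.get((src_lang, tgt_lang), [])
    # Word pairs would be extracted from each aligned phrase pair here;
    # in this sketch the phrases stand in for the extracted words.
    return list(phrase_pairs)
```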
In an embodiment of the application, before the preset translation model is pruned by using the classification model, the reference language can first be determined, and the preset translation model is obtained through the reference language. This is described below with reference to Fig. 4, which is a schematic flowchart of another translation model construction method provided by an embodiment of the application.
As shown in Fig. 4, before the preset translation model is pruned by using the classification model, the translation model construction method further includes the following steps.
Step 401: determine a target reference language according to the number of each type of first-type translation word pairs in the corpus and the number of the corresponding second-type translation word pairs.
Since the number of translation word pairs of the source language and the target language is less than the threshold, a translation model of the source language and the target language trained directly on the translation word pairs including the source language and the target language would have relatively low translation quality.
In the embodiment of the application, the translation model of the source language and the target language can therefore be constructed by means of an intermediate language type. When the target reference language serving as the intermediate language type is determined, first-type translation word pairs and corresponding second-type translation word pairs are obtained from the corpus, and the target reference language is determined according to the number of each type of first-type translation word pairs in the corpus and the number of the corresponding second-type translation word pairs.
It can be understood that the method of obtaining the first-type translation word pairs and the second-type translation word pairs from the corpus is similar to the method of obtaining the first positive example corpus set from the corpus, and is therefore not repeated here.
The first-type translation word pairs and the corresponding second-type translation word pairs include the same reference language: a first-type translation word pair includes a source-language word and a corresponding reference-language word, and a second-type translation word pair includes a reference-language word and a corresponding target-language word. For example, if the first-type translation word pairs are Chinese-English translation word pairs, the corresponding second-type translation word pairs are English-Japanese translation word pairs; if the first-type translation word pairs are Chinese-Russian translation word pairs, the corresponding second-type translation word pairs are Russian-Japanese translation word pairs; and so on.
To improve the translation quality of the translation model, the number of each type of first-type translation word pairs and the number of the corresponding second-type translation word pairs can be compared with a preset number, and the reference language included in the first-type translation word pairs and the corresponding second-type translation word pairs whose numbers both exceed the preset number is taken as the target reference language.
It should be noted that, if more than one combination of source-reference and reference-target translation word pairs meets the quantity requirement, the reference language included in the first-type translation word pairs and the corresponding second-type translation word pairs whose total number (the sum of the number of first-type translation word pairs and the number of the corresponding second-type translation word pairs) is the largest can be chosen as the target reference language.
For example, suppose that in the corpus the number of Chinese-English translation word pairs and the number of the corresponding English-Japanese translation word pairs both exceed the preset number, and the number of Chinese-Russian translation word pairs and the number of the corresponding Russian-Japanese translation word pairs also exceed the preset number. If the sum of the numbers of Chinese-English and English-Japanese translation word pairs is greater than the sum of the numbers of Chinese-Russian and Russian-Japanese translation word pairs, then English, the reference language included in the Chinese-English and English-Japanese translation word pairs, is taken as the target reference language.
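A sketch of this selection rule, assuming the corpus statistics are available as counts per language pair; the counts in the usage comment are made-up illustrations:
```python
def choose_target_reference_language(pair_counts, source_lang, target_lang, preset_number):
    """pair_counts: maps a (lang_a, lang_b) key to the number of translation word pairs."""
    best_ref, best_total = None, -1
    for (lang_a, lang_b), first_count in pair_counts.items():
        if lang_a != source_lang or lang_b == target_lang:
            continue
        ref = lang_b
        second_count = pair_counts.get((ref, target_lang), 0)
        if (first_count > preset_number and second_count > preset_number
                and first_count + second_count > best_total):
            best_ref, best_total = ref, first_count + second_count
    return best_ref

# e.g. pair_counts = {("zh", "en"): 9_000_000, ("en", "ja"): 8_000_000,
#                     ("zh", "ru"): 2_000_000, ("ru", "ja"): 1_500_000}
# with preset_number = 1_000_000 selects "en" (English) as the target reference language.
```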
Step 402: train the first translation model on the second positive example corpus set composed of the first-type translation word pairs including the reference language, and train the second translation model on the third positive example corpus set composed of the second-type translation word pairs including the reference language.
In this embodiment, the second positive example corpus set composed of the first-type translation word pairs including the target reference language is used to train the first translation model, and the third positive example corpus set composed of the second-type translation word pairs including the target reference language is used to train the second translation model, where the first translation model is the translation model of the source language and the target reference language, and the second translation model is the translation model of the target reference language and the target language.
In this embodiment, the first translation model and the second translation model are trained on large-scale translation word pairs of the source language and the reference language and large-scale translation word pairs of the reference language and the target language, respectively, and therefore have relatively high translation quality.
After the first translation model and the second translation model are obtained, they are fused to obtain the preset translation model of the source language and the target language. During the fusion, the translation word pairs in the bilingual database of the first translation model are combined with the corresponding translation word pairs in the bilingual database of the second translation model to obtain the bilingual database of the preset translation model.
For example, Table 1 shows the translation word pairs in the bilingual database of the Chinese-English translation model, Table 2 shows the translation word pairs in the bilingual database of the English-Japanese translation model, and Table 3 shows the translation word pairs obtained by fusing Table 1 and Table 2.
Table 1
Chinese | English
riverbank | riverside
riverbank | bank
bank | bank
deposit | bank
deposit | deposit
Table 2
English | Japanese
riverside | 河原
bank | 河原
bank | 銀行
bank | 預ける
deposit | 預ける
Table 3
Chinese (English) | Japanese
riverbank (riverside) | 河原
riverbank (bank) | 河原
riverbank (bank) | 銀行
riverbank (bank) | 預ける
bank (bank) | 河原
bank (bank) | 銀行
bank (bank) | 預ける
deposit (bank) | 河原
deposit (bank) | 銀行
deposit (bank) | 預ける
deposit (deposit) | 預ける
As can be seen from Table 3, because the English word "bank" is ambiguous, the Chinese-Japanese translation word pairs obtained after fusion include translation word pairs that have no mutual-translation relationship, such as (riverbank: 銀行), (riverbank: 預ける), (bank: 河原), (bank: 預ける), (deposit: 河原) and (deposit: 銀行).
Therefore, after the preset translation model is obtained by fusing the first translation model and the second translation model, the preset translation model can be pruned to filter out the illegal translation word pairs, i.e., the translation word pairs without a mutual-translation relationship, so as to improve the translation quality of the translation model constructed through the reference language.
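A sketch of the fusion itself, under the assumption that each translation model exposes its bilingual database as a list of word pairs; it makes explicit why the ambiguity of the reference word produces the noisy pairs listed above, which the classification model then prunes:
```python
def fuse_via_reference(src_ref_pairs, ref_tgt_pairs):
    """Combine the two bilingual databases through the reference language (Tables 1-3)."""
    ref_to_targets = {}
    for ref, tgt in ref_tgt_pairs:
        ref_to_targets.setdefault(ref, []).append(tgt)
    fused = []
    for src, ref in src_ref_pairs:
        # Every target word reachable through the reference word is paired with the source word,
        # so an ambiguous reference word such as "bank" yields pairs with no mutual-translation relationship.
        for tgt in ref_to_targets.get(ref, []):
            fused.append((src, tgt, ref))  # the reference word is kept for traceability
    return fused
```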
To implement the above embodiments, an embodiment of the application also proposes a translation model construction device. Fig. 5 is a schematic structural diagram of a translation model construction device provided by an embodiment of the application.
As shown in Fig. 5, the translation model construction device includes a first generation module 510, a second generation module 520 and a third generation module 530.
The first generation module 510 is configured to, when the number of translation word pairs in a first positive example corpus set is less than a threshold, randomly generate a negative example corpus set according to the translation word pairs in the acquired first positive example corpus set, where the translation word pairs in the first positive example corpus set and the negative example corpus set each include a source-language word and a corresponding target-language word.
The second generation module 520 is configured to perform machine learning on the first positive example corpus set and the negative example corpus set to generate a classification model.
The third generation module 530 is configured to prune a preset translation model by using the classification model, to generate a translation model corresponding to the source language and the target language.
The preset translation model is a translation model obtained by fusing a first translation model and a second translation model, where the first translation model is trained on a second positive example corpus set including the source language and a reference language, and the second translation model is trained on a third positive example corpus set including the reference language and the target language.
In a possible implementation of the embodiment of the application, the third generation module 530 is specifically configured to:
input each translation word pair in the bilingual database of the preset translation model into the classification model, to determine the probability that each translation word pair is a legal word pair; and
prune the preset translation model according to the acquired legal word pairs.
In a possible implementation of the embodiment of the application, the device may further include:
a first determination module, configured to parse each translation word pair in the first positive example corpus set and the negative example corpus set, to determine the feature set of each translation word pair.
The first determination module is further configured to parse each translation word pair in the bilingual database of the preset translation model, to determine the feature set of each translation word pair.
In a possible implementation of the embodiment of the application, the feature set of each translation word pair includes at least one of the following features: the source-language phrase length, the target-language phrase length, the length ratio of the translation word pair, the mutual-translation probability value of the translation word pair, and the translation probability of each word in the translation word pair.
In a possible implementation of the embodiment of the application, the first generation module 510 is specifically configured to randomly exchange the target-language words of the translation word pairs in the first positive example corpus set to generate the negative example corpus set.
In a possible implementation of the embodiment of the application, the device further includes:
a first acquisition module, configured to acquire a translation model building request, where the building request includes a source language type and a target language type; and
a second acquisition module, configured to obtain the first positive example corpus set from a corpus according to the source language type and the target language type.
In a possible implementation of the embodiment of the application, the device may further include:
a second determination module, configured to determine a target reference language according to the number of each type of first-type translation word pairs in the corpus and the number of the corresponding second-type translation word pairs,
where the first-type translation word pairs and the corresponding second-type translation word pairs include the same reference language, a first-type translation word pair includes a source-language word and a corresponding reference-language word, and a second-type translation word pair includes a reference-language word and a corresponding target-language word; and
a training module, configured to train the first translation model on the second positive example corpus set composed of the first-type translation word pairs including the reference language, and train the second translation model on the third positive example corpus set composed of the second-type translation word pairs including the reference language.
It should be noted that the foregoing explanation of the translation model construction method embodiments also applies to the translation model construction device of this embodiment, and is therefore not repeated here.
With the translation model construction device of the embodiment of the application, when the number of translation word pairs in the first positive example corpus set is less than the threshold, a negative example corpus set is randomly generated according to the translation word pairs in the acquired first positive example corpus set, where the translation word pairs in both sets each include a source-language word and a corresponding target-language word; machine learning is performed on the first positive example corpus set and the negative example corpus set to generate a classification model; and the preset translation model, obtained by fusing the first translation model (trained on the second positive example corpus set of the source language and the reference language) and the second translation model (trained on the third positive example corpus set of the reference language and the target language), is pruned by the classification model to generate the translation model corresponding to the source language and the target language. Thus, when the bilingual corpus of the source language and the target language is scarce, a classification model obtained from the translation word pairs of the source language and the target language is used to filter the translation model of the source language and the target language obtained through the reference language, which greatly reduces the noise of the translation model and improves its translation quality.
To implement the above embodiments, an embodiment of the application also proposes a computer device, including a processor and a memory,
where the processor runs a program corresponding to executable program code stored in the memory by reading the executable program code, so as to implement the translation model construction method described in the above embodiments.
Fig. 6 shows a block diagram of an exemplary computer device suitable for implementing embodiments of the application. The computer device 12 shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the application.
As shown in Fig. 6, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MAC) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnection (PCI) bus.
The computer device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer device 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 6, commonly referred to as a "hard drive"). Although not shown in Fig. 6, a disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a compact disc read-only memory (CD-ROM), a digital versatile disc read-only memory (DVD-ROM) or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the application.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 42 usually perform the functions and/or methods in the embodiments described in the application.
The computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may be carried out through an input/output (I/O) interface 22. Moreover, the computer device 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network, for example the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems, etc.
The processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example implementing the method mentioned in the foregoing embodiments.
To implement the above embodiments, an embodiment of the application also proposes a non-transitory computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the translation model construction method described in the above embodiments is implemented.
In the description of this specification, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the application, "plurality" means at least two, such as two or three, unless specifically defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code including one or more executable instructions for implementing a custom logic function or steps of a process, and the scope of the preferred embodiments of the application includes other implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of the application belong.
The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, device or apparatus (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, device or apparatus and execute the instructions). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, device or apparatus. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection portion (electronic device) having one or more wirings, a portable computer disk cartridge (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fibre-optic device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
Those skilled in the art can understand that all or part of the steps carried by the method of the above embodiments can be completed by instructing relevant hardware through a program, the program can be stored in a computer-readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the application may be integrated in one processing module, or each unit may exist physically alone, or two or more units may be integrated in one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk or an optical disc, etc. Although the embodiments of the application have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the application, and those skilled in the art can change, modify, replace and vary the above embodiments within the scope of the application.

Claims (10)

1. A translation model construction method, characterized by comprising:
when the number of translation word pairs in a first positive example corpus set is less than a threshold, randomly generating a negative example corpus set according to the translation word pairs in the acquired first positive example corpus set, wherein the translation word pairs in the first positive example corpus set and the negative example corpus set each comprise a source-language word and a corresponding target-language word;
performing machine learning on the first positive example corpus set and the negative example corpus set to generate a classification model;
pruning a preset translation model by using the classification model, to generate a translation model corresponding to the source language and the target language;
wherein the preset translation model is a translation model obtained by fusing a first translation model and a second translation model, the first translation model is trained on a second positive example corpus set comprising the source language and a reference language, and the second translation model is trained on a third positive example corpus set comprising the reference language and the target language.
2. The method according to claim 1, wherein pruning the preset translation model by using the classification model comprises:
inputting each translation word pair in the bilingual database of the preset translation model into the classification model respectively, to determine the probability that each translation word pair is a legal word pair; and
pruning the preset translation model according to the legal word pairs obtained.
3. The method according to claim 1, further comprising, before performing machine learning on the first positive example corpus and the negative example corpus:
parsing each translation word pair in the first positive example corpus and the negative example corpus to determine a feature set of each translation word pair;
and further comprising, before pruning the preset translation model by using the classification model:
parsing each translation word pair in the bilingual database of the preset translation model to determine a feature set of each translation word pair.
4. The method according to claim 3, wherein the feature set of each translation word pair comprises at least one of the following features: source-language phrase length, target-language phrase length, length ratio of the translation word pair, and mutual translation probability value.
5. The method according to any one of claims 1-4, wherein randomly generating the negative example corpus according to the translation word pairs in the acquired first positive example corpus comprises:
randomly exchanging the target-language items of the translation word pairs in the first positive example corpus to generate the negative example corpus.
6. The method according to any one of claims 1-4, further comprising, before randomly generating the negative example corpus according to the translation word pairs in the acquired first positive example corpus:
acquiring a translation model construction request, wherein the construction request comprises a source language type and a target language type; and
acquiring the first positive example corpus from a corpus database according to the source language type and the target language type.
7. The method according to claim 6, further comprising, before pruning the preset translation model by using the classification model:
determining a target reference language according to the number of translation word pairs of each first class and the number of translation word pairs of the corresponding second class in the corpus database;
wherein the first-class translation word pairs and the corresponding second-class translation word pairs contain the same reference language, each first-class translation word pair comprises the source language and the corresponding reference language, and each second-class translation word pair comprises the reference language and the corresponding target language; and
training on the second positive example corpus composed of the first-class translation word pairs containing the reference language to obtain the first translation model, and training on the third positive example corpus composed of the second-class translation word pairs containing the reference language to obtain the second translation model.
8. A translation model construction apparatus, characterized by comprising:
a first generation module, configured to, when the number of translation word pairs in a first positive example corpus is smaller than a threshold, randomly generate a negative example corpus according to the translation word pairs in the acquired first positive example corpus, wherein each translation word pair in the first positive example corpus and in the negative example corpus comprises a source-language item and a corresponding target-language item;
a second generation module, configured to perform machine learning on the first positive example corpus and the negative example corpus to generate a classification model; and
a third generation module, configured to prune a preset translation model by using the classification model, to generate a translation model corresponding to the source language and the target language;
wherein the preset translation model is a translation model obtained by fusing a first translation model and a second translation model, the first translation model being obtained by training on a second positive example corpus comprising the source language and a reference language, and the second translation model being obtained by training on a third positive example corpus comprising the reference language and the target language.
9. A computer device, characterized by comprising a processor and a memory;
wherein the processor, by reading executable program code stored in the memory, runs a program corresponding to the executable program code, so as to implement the translation model construction method according to any one of claims 1-7.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the translation model construction method according to any one of claims 1-7.
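
The following non-limiting sketches illustrate, in Python, one possible reading of the claimed steps. Every function name, parameter name, threshold, and library choice below is an illustrative assumption made for this description and is not recited in the claims. The first sketch covers the random generation of the negative example corpus by exchanging the target-language items of the positive word pairs (claims 1 and 5):

```python
import random

def make_negative_corpus(positive_pairs, seed=0):
    """Build the negative example corpus by randomly exchanging the
    target-language items of the positive translation word pairs.  A plain
    shuffle may leave a few pairs unchanged; dropping those is an optional
    refinement the claims do not require."""
    rng = random.Random(seed)
    targets = [tgt for _, tgt in positive_pairs]
    rng.shuffle(targets)
    return [(src, tgt) for (src, _), tgt in zip(positive_pairs, targets)]
```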
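
A feature set in the spirit of claims 3 and 4 can then be computed for each translation word pair; passing 0.0 probabilities for randomly generated negative pairs is an assumption, since the claims do not say how missing probability values are handled:

```python
def featurize(src_phrase, tgt_phrase, p_src_to_tgt=0.0, p_tgt_to_src=0.0):
    """Features suggested by claims 3-4: source-language phrase length,
    target-language phrase length, the length ratio of the pair, and the
    mutual translation probabilities (taken from the phrase table when
    available)."""
    src_len = len(src_phrase.split())
    tgt_len = len(tgt_phrase.split())
    return [src_len,
            tgt_len,
            src_len / max(tgt_len, 1),   # length ratio of the word pair
            p_src_to_tgt,                # forward translation probability
            p_tgt_to_src]                # backward translation probability
```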
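
The "machine learning ... to generate a classification model" step of claim 1 could then be sketched as follows; the gradient-boosted classifier from scikit-learn is an illustrative choice, as the claim does not fix the learning algorithm:

```python
from sklearn.ensemble import GradientBoostingClassifier

def train_pair_classifier(positive_features, negative_features):
    """Train the classification model of claim 1 on feature vectors of the
    first positive example corpus (label 1, legal pairs) and of the randomly
    generated negative example corpus (label 0)."""
    X = positive_features + negative_features
    y = [1] * len(positive_features) + [0] * len(negative_features)
    return GradientBoostingClassifier().fit(X, y)
```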
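
Pruning the preset translation model (the fused source-reference-target model) with the classification model, as in claims 1 and 2, might look like the following; the fixed probability cut-off is a hypothetical choice, since claim 2 only requires that pruning be driven by the probability that each pair is a legal word pair:

```python
def prune_phrase_table(phrase_table, pair_probability, keep_threshold=0.5):
    """phrase_table: iterable of (source_phrase, target_phrase, entry) tuples
    from the bilingual database of the preset translation model.
    pair_probability: callable returning the classifier's probability that a
    pair is legal.  Entries scored below keep_threshold are pruned away."""
    return [(src, tgt, entry)
            for src, tgt, entry in phrase_table
            if pair_probability(src, tgt) >= keep_threshold]
```

In practice pair_probability could simply wrap the trained classifier, for example lambda src, tgt: clf.predict_proba([featurize(src, tgt)])[0][1].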
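
Finally, claim 7 determines the target reference (pivot) language from the counts of first-class (source-reference) and second-class (reference-target) word pairs. The claim leaves the exact selection rule open; choosing the language that maximises the smaller of the two counts is one plausible criterion and is shown here purely as an assumption:

```python
from collections import Counter

def choose_reference_language(first_class_pairs, second_class_pairs):
    """first_class_pairs: iterable of (source_item, reference_item, ref_lang);
    second_class_pairs: iterable of (reference_item, target_item, ref_lang).
    Returns the candidate reference language best covered in both directions."""
    n_first = Counter(lang for _, _, lang in first_class_pairs)
    n_second = Counter(lang for _, _, lang in second_class_pairs)
    candidates = set(n_first) & set(n_second)
    if not candidates:
        raise ValueError("no reference language covers both directions")
    return max(candidates, key=lambda lang: min(n_first[lang], n_second[lang]))
```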
CN201811590009.7A 2018-12-25 2018-12-25 Translation model construction method and device Active CN109670190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811590009.7A CN109670190B (en) 2018-12-25 2018-12-25 Translation model construction method and device


Publications (2)

Publication Number Publication Date
CN109670190A (en) 2019-04-23
CN109670190B CN109670190B (en) 2023-05-16

Family

ID=66146043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811590009.7A Active CN109670190B (en) 2018-12-25 2018-12-25 Translation model construction method and device

Country Status (1)

Country Link
CN (1) CN109670190B (en)



Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120197826A1 (en) * 2011-01-28 2012-08-02 Fujitsu Limited Information matching apparatus, method of matching information, and computer readable storage medium having stored information matching program
CN102789451A (en) * 2011-05-16 2012-11-21 北京百度网讯科技有限公司 Individualized machine translation system, method and translation model training method
CN103123634A (en) * 2011-11-21 2013-05-29 北京百度网讯科技有限公司 Copyright resource identification method and copyright resource identification device
CN103544147A (en) * 2013-11-06 2014-01-29 北京百度网讯科技有限公司 Translation model training method and device
CN104505090A (en) * 2014-12-15 2015-04-08 北京国双科技有限公司 Method and device for voice recognizing sensitive words
CN106202059A (en) * 2015-05-25 2016-12-07 松下电器(美国)知识产权公司 Machine translation method and machine translation apparatus
CN104915337A (en) * 2015-06-18 2015-09-16 中国科学院自动化研究所 Translation text integrity evaluation method based on bilingual text structure information
CN105068997A (en) * 2015-07-15 2015-11-18 清华大学 Parallel corpus construction method and device
CN106294684A (en) * 2016-08-06 2017-01-04 上海高欣计算机***有限公司 Text classification method based on word vectors, and terminal device
CN107885760A (en) * 2016-12-21 2018-04-06 桂林电子科技大学 Knowledge graph representation learning method based on multiple kinds of semantics
CN107590237A (en) * 2017-09-11 2018-01-16 桂林电子科技大学 Knowledge graph representation learning method based on the dynamic translation principle
CN108845994A (en) * 2018-06-07 2018-11-20 南京大学 Neural machine translation system utilizing external information, and training method for the translation system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONG ZHANG et al.: "Negative expression translation in Japanese and Chinese machine translation", 2008 International Conference on Natural Language Processing and Knowledge Engineering *
何中军 et al.: "A New Method of Phrase Segmentation in Statistical Machine Translation" (统计机器翻译中短语切分的新方法), Proceedings of the Third Students' Workshop on Computational Linguistics (第三届学生计算语言学研讨会论文集) *
过冰: "Rule- and Statistics-Based Semantic Recognition in Intelligent Spoken Dialogue ***" (智能语音对话***中基于规则和统计的语义识别), China Master's Theses Full-text Database, Information Science and Technology Series (中国优秀硕士学位论文全文数据库信息科技辑) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046677A (en) * 2019-12-09 2020-04-21 北京字节跳动网络技术有限公司 Method, device, equipment and storage medium for obtaining translation model
CN111259676A (en) * 2020-01-10 2020-06-09 苏州交驰人工智能研究院有限公司 Translation model training method and device, electronic equipment and storage medium
CN111898389A (en) * 2020-08-17 2020-11-06 腾讯科技(深圳)有限公司 Information determination method and device, computer equipment and storage medium
CN111898389B (en) * 2020-08-17 2023-09-19 腾讯科技(深圳)有限公司 Information determination method, information determination device, computer equipment and storage medium
CN113139391A (en) * 2021-04-26 2021-07-20 北京有竹居网络技术有限公司 Translation model training method, device, equipment and storage medium
CN113139391B (en) * 2021-04-26 2023-06-06 北京有竹居网络技术有限公司 Translation model training method, device, equipment and storage medium
CN113591492A (en) * 2021-06-30 2021-11-02 北京百度网讯科技有限公司 Corpus generation method and device, electronic equipment and storage medium
CN113988089A (en) * 2021-10-18 2022-01-28 浙江香侬慧语科技有限责任公司 Machine translation method, device and medium based on K-nearest neighbors

Also Published As

Publication number Publication date
CN109670190B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN109670190A (en) Translation model construction method and device
CN109684648B (en) Multi-feature fusion automatic translation method for ancient and modern Chinese
CN110196894A Language model training method and prediction method
CN107463553B (en) Text semantic extraction, representation and modeling method and system for elementary mathematic problems
CN106021227B Chinese chunk analysis method based on state transition and neural networks
CN109344413A (en) Translation processing method and device
US8818790B2 (en) Syntactic analysis and hierarchical phrase model based machine translation system and method
WO2020233269A1 (en) Method and apparatus for reconstructing 3d model from 2d image, device and storage medium
CN108804423B (en) Medical text feature extraction and automatic matching method and system
CN112069826B (en) Vertical domain entity disambiguation method fusing topic model and convolutional neural network
CN105988990A (en) Device and method for resolving zero anaphora in Chinese language, as well as training method
CN103116578A (en) Translation method integrating syntactic tree and statistical machine translation technology and translation device
CN110032734B Training method and device for synonym expansion and generative adversarial network model
CN110515823A (en) Program code complexity evaluation methodology and device
CN110210041A Alignment method, device and equipment for mutually translated sentence pairs
Dandapat et al. Improved named entity recognition using machine translation-based cross-lingual information
de Sousa Neto et al. Htr-flor++ a handwritten text recognition system based on a pipeline of optical and language models
Shterionov et al. A roadmap to neural automatic post-editing: an empirical approach
CN106569994B Address analysis method and device
CN110020163A (en) Searching method, device, computer equipment and storage medium based on human-computer interaction
Cho et al. Kosp2e: Korean speech to english translation corpus
Dandapat et al. Using example-based MT to support statistical MT when translating homogeneous data in a resource-poor setting
CN110245361A Phrase pair extraction method and device, electronic equipment, and readable storage medium
CN101866336A (en) Methods, devices and systems for obtaining evaluation unit and establishing syntactic path dictionary
Hassanat et al. Rule- and dictionary-based solution for variations in written Arabic names in social networks, big data, accounting systems and large databases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant