CN109446537A - Translation evaluation method and device for machine translation - Google Patents
Translation evaluation method and device for machine translation
- Publication number
- CN109446537A (application number CN201811306229.2A)
- Authority
- CN
- China
- Prior art keywords
- corpus
- translation
- model
- target word
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a translation evaluation method and device for machine translation. The method comprises: obtaining several corpus samples from a corpus, together with the concatenation result of the context word vectors contained in each corpus sample; initializing the word vectors of the words of different parts of speech contained in the corpus samples; using the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model; obtaining the target word of each corpus sample and translating it using the trained CBOW model; and obtaining the translation produced by a model under evaluation for the target word, and evaluating the accuracy of that model's translation according to the similarity between its translation and the translation of the trained CBOW model. With the embodiments of the invention, the accuracy of translation results can be evaluated automatically.
Description
Technical field
The present invention relates to a translation evaluation method and device, and more particularly to a translation evaluation method and device for machine translation.
Background art
With the development of modern society, the demand for conversion between languages keeps growing. In practical applications, traditional machine translation is rule-based: it relies on grammatical matching relations grounded in syntactic and semantic theory and obtains the translation result by analyzing the context. However, because rules cannot cover all sentences, traditional machine translation mostly amounts to literal translation or the conversion of syntactic sentence patterns.
With the continuous development of artificial intelligence technology, neural-network-based representation learning has begun to show strong results in many fields. In particular, on many tasks in image recognition and speech recognition, methods based on representation learning have surpassed traditional methods based on statistical learning. Modern machine translation methods are based on a bilingual corpus: using a bilingual corpus that contains many sentence patterns, the system retrieves example sentences similar to the input sentence according to the sentence patterns in the corpus, and then converts the source language into the target language with reference to the bilingual sentence patterns.
Natural language is an abstract expression of human intelligence and is difficult to represent with existing data structures. In natural language processing, the basic unit of data is the character or the word. A word such as "apple" can denote a fruit, and it can also denote "Apple Inc.". Two different words that both mean "microphone" denote the same object, but no connection between them can be established from their surface forms. At present, most translation systems can correctly render the general meaning of a sentence. However, the usage of words and sentences differs markedly between languages, and translation results often contain word-order errors, wrong word choices, and misused words. For long sentences in particular, machine translation does not reach good accuracy, so the prior art has the technical problem that translation results still require manual evaluation.
Summary of the invention
The technical problem to be solved by the present invention is to provide a translation evaluation method and device for machine translation, so as to solve the prior-art problem that translation results still require manual evaluation.
The present invention solves the above technical problem through the following technical solutions:
An embodiment of the invention provides a translation evaluation method for machine translation, the method comprising:
obtaining several corpus samples from a corpus, together with the concatenation result of the context word vectors contained in each corpus sample; and initializing the word vectors of the words of different parts of speech contained in the corpus samples;
using the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model;
obtaining the target word of each corpus sample, and translating it using the trained CBOW model;
obtaining the translation produced by a model under evaluation for the target word, and evaluating the accuracy of that model's translation according to the similarity between its translation and the translation of the trained CBOW model.
Optionally, initializing the word vectors of the words of different parts of speech contained in the corpus samples comprises:
initializing the word vectors of the words of each part of speech using value ranges that do not overlap one another.
Optionally, before the concatenation result and the word vectors are used as the input of the CBOW model to obtain the trained CBOW model, the method further comprises:
removing the punctuation marks in each corpus sample other than set punctuation marks, where the set punctuation marks include one or a combination of: punctuation marks that express the tone of the corpus sample, and punctuation marks that end the corpus sample.
Optionally, obtaining the target word of each corpus sample comprises:
obtaining the target word of each corpus sample using a formula (the formula itself is not reproduced in this text), where
P(w | c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with the natural constant e as its base; x is the input layer of the CBOW model; ∑ denotes summation; V is the corpus; and (·)^T denotes the transpose.
Optionally, each corpus sample is a single sentence.
An embodiment of the invention provides a translation evaluation device for machine translation, the device comprising:
an acquisition module, configured to obtain several corpus samples from a corpus, together with the concatenation result of the context word vectors contained in each corpus sample, and to initialize the word vectors of the words of different parts of speech contained in the corpus samples;
to use the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model;
to obtain the target word of each corpus sample and translate it using the trained CBOW model; and
to obtain the translation produced by a model under evaluation for the target word, and to evaluate the accuracy of that model's translation according to the similarity between its translation and the translation of the trained CBOW model.
Optionally, the acquisition module is configured to:
initialize the word vectors of the words of each part of speech contained in the corpus samples using value ranges that do not overlap one another.
Optionally, the device further comprises a removal module, configured to remove the punctuation marks in each corpus sample other than set punctuation marks, where the set punctuation marks include one or a combination of: punctuation marks that express the tone of the corpus sample, and punctuation marks that end the corpus sample.
Optionally, the acquisition module is configured to:
obtain the target word of each corpus sample using a formula (the formula itself is not reproduced in this text), where
P(w | c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with the natural constant e as its base; x is the input layer of the CBOW model; ∑ denotes summation; V is the corpus; and (·)^T denotes the transpose.
Optionally, each corpus sample is a single sentence.
Compared with the prior art, the present invention has the following advantages:
Because context word order plays an important role in translation, using the concatenation result of the context word vectors contained in each corpus sample yields a more accurate translation model. The model trained according to the embodiments of the invention can then be used to check the translation results of prior-art models. Whereas the prior art requires manual evaluation, the embodiments of the invention can evaluate the accuracy of translation results automatically.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a translation evaluation method for machine translation provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a CBOW model provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a translation evaluation device for machine translation provided by an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the scope of protection of the present invention is not limited to the following embodiments.
The embodiments of the invention provide a translation evaluation method and device for machine translation. The translation evaluation method for machine translation provided by an embodiment of the invention is introduced first.
Fig. 1 is a schematic flowchart of a translation evaluation method for machine translation provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises:
S101: obtaining several corpus samples from a corpus, together with the concatenation result of the context word vectors contained in each corpus sample; and initializing the word vectors of the words of different parts of speech contained in the corpus samples.
Specifically, the word vectors of the words of each part of speech contained in the corpus samples can be initialized using value ranges that do not overlap one another. Each corpus sample is a single sentence.
Illustratively, a language model can be learned from a large-scale corpus. Because the quality of the language model directly affects the judgment of sentence correctness, choosing a suitable corpus is important. For a Chinese corpus, Chinese Wikipedia entries can be chosen for modeling.
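The two operations in S101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the part-of-speech tags, the numeric ranges in `POS_RANGES`, the vector dimension, and the helper names are all hypothetical. The text only requires that the ranges for different parts of speech do not overlap and that context word vectors are concatenated.

```python
import random

# Hypothetical tags and ranges (assumption): the only requirement taken from
# the text is that ranges for different parts of speech do not overlap.
POS_RANGES = {
    "noun": (0.0, 0.1),
    "verb": (0.2, 0.3),
    "other": (0.4, 0.5),
}

def init_word_vector(pos, dim, rng):
    """Draw every component uniformly from the range reserved for this part of speech."""
    low, high = POS_RANGES[pos]
    return [rng.uniform(low, high) for _ in range(dim)]

def concat_context(context_words, vectors):
    """Concatenate the context word vectors in sentence order, so word order is kept."""
    x = []
    for word in context_words:
        x.extend(vectors[word])
    return x

rng = random.Random(0)  # fixed seed so the sketch is reproducible
tagged = {"the": "other", "red": "other", "apple": "noun", "eat": "verb"}
vectors = {word: init_word_vector(pos, 4, rng) for word, pos in tagged.items()}
x = concat_context(["the", "red", "apple"], vectors)  # 3 words * 4 dimensions
```

Unlike averaging the context vectors, concatenation produces a different input when the same words appear in a different order, which matches the text's emphasis on word order.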
S102: using the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model.
Fig. 2 is a schematic structural diagram of a CBOW model provided by an embodiment of the present invention. As shown in Fig. 2, the CBOW (Continuous Bag of Words) model comprises an input layer x and an output layer y. The input layer receives different phrases, and the output layer outputs them after translation.
S103: obtaining the target word of each corpus sample, and translating it using the trained CBOW model.
Specifically, the target word of each corpus sample can be obtained using a formula (the formula itself is not reproduced in this text), where
P(w | c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with the natural constant e as its base; x is the input layer of the CBOW model; ∑ denotes summation; V is the corpus; and (·)^T denotes the transpose.
(w, c) is an n-gram w_{i-(n-1)/2}, ..., w_{i+(n-1)/2} selected from the corpus. n is generally chosen to be odd, which ensures that the same number of context words lies on each side of the target word.
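The n-gram selection and the probability P(w | c) can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the per-word "output vector" scored against the input layer x is an assumption based on the usual CBOW softmax, since the patent's own formula is not available in this text.

```python
import math

def ngram_windows(words, n):
    """Yield (target, context) pairs from n-word windows. n must be odd so the
    target has (n - 1) // 2 context words on each side."""
    if n % 2 == 0:
        raise ValueError("n should be odd")
    half = (n - 1) // 2
    for i in range(half, len(words) - half):
        context = words[i - half:i] + words[i + 1:i + half + 1]
        yield words[i], context

def softmax_probabilities(x, output_vectors):
    """Return P(w | c) for every candidate word w as a softmax over the dot
    product of w's output vector (assumed notation) with the input vector x."""
    scores = {w: sum(a * b for a, b in zip(v, x)) for w, v in output_vectors.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

pairs = list(ngram_windows(["a", "b", "c", "d", "e"], 3))
probs = softmax_probabilities([2.0, 0.0], {"left": [1.0, 0.0], "right": [0.0, 1.0]})
```

With n = 3, each target word gets exactly one context word on either side, and sentences shorter than n yield no pairs.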
The optimization objective of the model can be expressed by a formula (not reproduced in this text), where
D is the corpus.
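The formula for P(w | c) and the optimization objective do not survive in this text. For orientation only, the symbol definitions given above (exp, x, ∑, V, transpose, D) are consistent with the standard CBOW softmax and log-likelihood objective shown below. This is an assumption, not a reproduction of the patent's formulas, and the output-side word vector e'(w) is assumed notation that does not appear in the text.

```latex
P(w \mid c) = \frac{\exp\left(e'(w)^{T} x\right)}{\sum_{w' \in V} \exp\left(e'(w')^{T} x\right)},
\qquad
\max \sum_{(w, c) \in D} \log P(w \mid c)
```

Under this reading, training maximizes the log-probability of each target word w given its context c over all (w, c) pairs drawn from D.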
S104: obtaining the translation produced by the model under evaluation for the target word, and evaluating the accuracy of that model's translation according to the similarity between its translation and the translation of the trained CBOW model.
In practical applications, a translation is judged multiple times using a sliding window. For example, with a window size of 5, the 1st, 2nd, ... words of the translation are taken in turn as the center word of the window. Each judgment yields a similarity value, and the average of these similarity values is then computed. The resulting average is the score of the translation; the higher the score, the more correct the translation.
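The sliding-window scoring described above can be sketched as follows. The text does not specify the similarity measure or how windows at the sentence boundaries are handled, so both are assumptions here: the similarity is passed in as a callable, and windows are simply truncated at the edges. The names `score_translation` and `toy_similarity` are hypothetical.

```python
def score_translation(words, window_similarity, window_size=5):
    """Average the per-window similarity, taking each word in turn as the
    center of a window of up to window_size words (truncated at the edges)."""
    half = window_size // 2
    scores = []
    for center in range(len(words)):
        start = max(0, center - half)
        window = words[start:center + half + 1]
        scores.append(window_similarity(window, words[center]))
    return sum(scores) / len(scores) if scores else 0.0

# Toy similarity for illustration only: 1.0 if the center word is in a small
# reference vocabulary, 0.0 otherwise. A real system would compare against the
# output of the trained CBOW model instead.
REFERENCE = {"the", "cat", "sat"}

def toy_similarity(window, center_word):
    return 1.0 if center_word in REFERENCE else 0.0

score = score_translation(["the", "cat", "sat", "down"], toy_similarity)
```

Passing the similarity as a function keeps the windowing logic separate from the model, so the same loop can be reused with whatever similarity the trained model provides.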
With the embodiment shown in Fig. 1, because context word order plays an important role in translation, using the concatenation result of the context word vectors contained in each corpus sample yields a more accurate translation model. The model trained according to the embodiment of the present invention can then be used to check the translation results of prior-art models. Whereas the prior art requires manual evaluation, the embodiment of the present invention can evaluate the accuracy of translation results automatically.
Specifically, in one specific implementation of the embodiment of the present invention, before step S102 the method further comprises:
removing the punctuation marks in each corpus sample other than set punctuation marks, where the set punctuation marks include one or a combination of: punctuation marks that express the tone of the corpus sample, and punctuation marks that end the corpus sample.
Before the model is trained, special characters are removed when the corpus is processed, and the punctuation marks that are useful to the model are retained, for example the full stop, the exclamation mark, and the question mark.
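The punctuation filtering step can be sketched as follows. It is a minimal example under assumptions: the retained set `KEPT_PUNCTUATION` is a hypothetical choice covering the full stop, exclamation mark, and question mark in both ASCII and full-width forms, and only characters in Unicode punctuation categories are treated as removable.

```python
import unicodedata

# Assumed set of retained marks: sentence-ending and tone-bearing punctuation.
KEPT_PUNCTUATION = set("。！？.!?")

def strip_punctuation(sentence):
    """Remove every punctuation character except those in KEPT_PUNCTUATION."""
    kept = []
    for ch in sentence:
        # Unicode general categories starting with "P" are punctuation.
        is_punctuation = unicodedata.category(ch).startswith("P")
        if not is_punctuation or ch in KEPT_PUNCTUATION:
            kept.append(ch)
    return "".join(kept)

cleaned = strip_punctuation("你好，世界！")
```

Other special characters (for example currency or math symbols) fall outside the punctuation categories and would need their own filter; this sketch leaves them untouched.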
By improving the language model, the present invention adds sentence information such as word order, part of speech, and punctuation, which improves the expressive power of the language model so that it can represent more complex sentences. Combined with machine translation, the improved language model can be used to judge the correctness of machine translation output and thereby improve the accuracy of machine translation.
Corresponding to the embodiment shown in Fig. 1, an embodiment of the invention also provides a translation evaluation device for machine translation.
Fig. 3 is a schematic structural diagram of a translation evaluation device for machine translation provided by an embodiment of the present invention. As shown in Fig. 3, the device comprises:
an acquisition module 301, configured to obtain several corpus samples from a corpus, together with the concatenation result of the context word vectors contained in each corpus sample, and to initialize the word vectors of the words of different parts of speech contained in the corpus samples;
to use the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model;
to obtain the target word of each corpus sample and translate it using the trained CBOW model; and
to obtain the translation produced by a model under evaluation for the target word, and to evaluate the accuracy of that model's translation according to the similarity between its translation and the translation of the trained CBOW model.
With the embodiment shown in Fig. 1, because context word order plays an important role in translation, using the concatenation result of the context word vectors contained in each corpus sample yields a more accurate translation model. The model trained according to the embodiment of the present invention can then be used to check the translation results of prior-art models. Whereas the prior art requires manual evaluation, the embodiment of the present invention can evaluate the accuracy of translation results automatically.
In one specific implementation of the embodiment of the present invention, the acquisition module 301 is configured to:
initialize the word vectors of the words of each part of speech contained in the corpus samples using value ranges that do not overlap one another.
In one specific implementation of the embodiment of the present invention, the device further comprises a removal module, configured to remove the punctuation marks in each corpus sample other than set punctuation marks, where the set punctuation marks include one or a combination of: punctuation marks that express the tone of the corpus sample, and punctuation marks that end the corpus sample.
In one specific implementation of the embodiment of the present invention, the acquisition module 301 is configured to obtain the target word of each corpus sample using a formula (the formula itself is not reproduced in this text), where
P(w | c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with the natural constant e as its base; x is the input layer of the CBOW model; ∑ denotes summation; V is the corpus; and (·)^T denotes the transpose.
In one specific implementation of the embodiment of the present invention, each corpus sample is a single sentence.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A translation evaluation method for machine translation, characterized in that the method comprises:
obtaining several corpus samples from a corpus, together with the concatenation result of the context word vectors contained in each corpus sample; and initializing the word vectors of the words of different parts of speech contained in the corpus samples;
using the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model;
obtaining the target word of each corpus sample, and translating it using the trained CBOW model;
obtaining the translation produced by a model under evaluation for the target word, and evaluating the accuracy of that model's translation according to the similarity between its translation and the translation of the trained CBOW model.
2. The translation evaluation method for machine translation according to claim 1, characterized in that initializing the word vectors of the words of different parts of speech contained in the corpus samples comprises:
initializing the word vectors of the words of each part of speech contained in the corpus samples using value ranges that do not overlap one another.
3. The translation evaluation method for machine translation according to claim 1, characterized in that, before the concatenation result and the word vectors are used as the input of the CBOW model to obtain the trained CBOW model, the method further comprises:
removing the punctuation marks in each corpus sample other than set punctuation marks, where the set punctuation marks include one or a combination of: punctuation marks that express the tone of the corpus sample, and punctuation marks that end the corpus sample.
4. The translation evaluation method for machine translation according to claim 1, characterized in that obtaining the target word of each corpus sample comprises:
obtaining the target word of each corpus sample using a formula (the formula itself is not reproduced in this text), where
P(w | c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with the natural constant e as its base; x is the input layer of the CBOW model; ∑ denotes summation; V is the corpus; and (·)^T denotes the transpose.
5. The translation evaluation method for machine translation according to claim 1, characterized in that each corpus sample is a single sentence.
6. A translation evaluation device for machine translation, characterized in that the device comprises:
an acquisition module, configured to obtain several corpus samples from a corpus, together with the concatenation result of the context word vectors contained in each corpus sample, and to initialize the word vectors of the words of different parts of speech contained in the corpus samples;
to use the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model;
to obtain the target word of each corpus sample and translate it using the trained CBOW model; and
to obtain the translation produced by a model under evaluation for the target word, and to evaluate the accuracy of that model's translation according to the similarity between its translation and the translation of the trained CBOW model.
7. The translation evaluation device for machine translation according to claim 6, characterized in that the acquisition module is configured to:
initialize the word vectors of the words of each part of speech contained in the corpus samples using value ranges that do not overlap one another.
8. The translation evaluation device for machine translation according to claim 6, characterized in that the device further comprises a removal module, configured to remove the punctuation marks in each corpus sample other than set punctuation marks, where the set punctuation marks include one or a combination of: punctuation marks that express the tone of the corpus sample, and punctuation marks that end the corpus sample.
9. The translation evaluation device for machine translation according to claim 6, characterized in that the acquisition module is configured to:
obtain the target word of each corpus sample using a formula (the formula itself is not reproduced in this text), where
P(w | c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with the natural constant e as its base; x is the input layer of the CBOW model; ∑ denotes summation; V is the corpus; and (·)^T denotes the transpose.
10. The translation evaluation device for machine translation according to claim 6, characterized in that each corpus sample is a single sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811306229.2A CN109446537B (en) | 2018-11-05 | 2018-11-05 | Translation evaluation method and device for machine translation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811306229.2A CN109446537B (en) | 2018-11-05 | 2018-11-05 | Translation evaluation method and device for machine translation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109446537A true CN109446537A (en) | 2019-03-08 |
CN109446537B CN109446537B (en) | 2022-11-25 |
Family
ID=65550840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811306229.2A Active CN109446537B (en) | 2018-11-05 | 2018-11-05 | Translation evaluation method and device for machine translation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109446537B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111274827A (en) * | 2020-01-20 | 2020-06-12 | 南京新一代人工智能研究院有限公司 | Suffix translation method based on multi-target learning of word bag |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110246177A1 (en) * | 2010-04-06 | 2011-10-06 | Samsung Electronics Co. Ltd. | Syntactic analysis and hierarchical phrase model based machine translation system and method |
CN105808530A (en) * | 2016-03-23 | 2016-07-27 | 苏州大学 | Translation method and device in statistical machine translation |
US20160350288A1 (en) * | 2015-05-29 | 2016-12-01 | Oracle International Corporation | Multilingual embeddings for natural language processing |
-
2018
- 2018-11-05 CN CN201811306229.2A patent/CN109446537B/en active Active
Non-Patent Citations (2)
Title |
---|
姚亮等: "基于语义分布相似度的翻译模型领域自适应研究", 《山东大学学报(理学版)》 * |
樊文婷等: "融合先验信息的蒙汉神经网络机器翻译模型", 《中文信息学报》 * |
Also Published As
Publication number | Publication date |
---|---|
CN109446537B (en) | 2022-11-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110750959B (en) | Text information processing method, model training method and related device | |
CN110598203A (en) | Military imagination document entity information extraction method and device combined with dictionary | |
CN108549637A (en) | Method for recognizing semantics, device based on phonetic and interactive system | |
CN112784696B (en) | Lip language identification method, device, equipment and storage medium based on image identification | |
CN105138507A (en) | Pattern self-learning based Chinese open relationship extraction method | |
CN110767213A (en) | Rhythm prediction method and device | |
CN111599340A (en) | Polyphone pronunciation prediction method and device and computer readable storage medium | |
CN109949799B (en) | Semantic parsing method and system | |
CN105404621A (en) | Method and system for blind people to read Chinese character | |
CN110276069A (en) | A kind of Chinese braille mistake automatic testing method, system and storage medium | |
CN110334187A (en) | Burmese sentiment analysis method and device based on transfer learning | |
CN113255331B (en) | Text error correction method, device and storage medium | |
CN113268576B (en) | Deep learning-based department semantic information extraction method and device | |
CN110377882A (en) | For determining the method, apparatus, system and storage medium of the phonetic of text | |
CN115064154A (en) | Method and device for generating mixed language voice recognition model | |
CN113779992A (en) | Method for realizing BcBERT-SW-BilSTM-CRF model based on vocabulary enhancement and pre-training | |
CN109446537A (en) | A kind of translation evaluation method and device for machine translation | |
CN116822530A (en) | Knowledge graph-based question-answer pair generation method | |
CN114357975A (en) | Multilingual term recognition and bilingual term alignment method | |
CN109960782A (en) | A kind of Tibetan language segmenting method and device based on deep neural network | |
CN113886521A (en) | Text relation automatic labeling method based on similar vocabulary | |
CN114330375A (en) | Term translation method and system based on fixed paradigm | |
CN109657207B (en) | Formatting processing method and processing device for clauses | |
CN107423293A (en) | The method and apparatus of data translation | |
CN112199927A (en) | Ancient book mark point filling method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||