CN107273363B - Language text translation method and system

Info

Publication number: CN107273363B
Application number: CN201710335652.4A
Authority: CN
Inventors: 刘洋; 张嘉成; 孙茂松; 栾焕博; 许静芳
Original assignee: Tsinghua University; Beijing Sogou Technology Development Co Ltd
Current assignee: Tsinghua University; Beijing Sogou Technology Development Co Ltd
Priority date: 2017-05-12
Filing date: 2017-05-12
Publication date: 2019-11-22
Anticipated expiration: 2037-05-12
Also published as: CN107273363A

Abstract

The present invention provides a language text translation method and system. The method comprises: determining, according to a preset translation candidate set determination rule, the translation candidate set corresponding to a source language text, where the translation candidate set includes multiple translated texts of the source language text and the source language text is the language text to be translated; determining a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, where the first probability distribution indicates the probability that each translated text conforms to the prior knowledge model and the second probability distribution indicates the probability that each translated text conforms to the translation model; and determining the translated text of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution. The invention can incorporate arbitrary prior knowledge into the translation model, thereby improving the accuracy and reliability of machine translation.

Description

Language text translation method and system

Technical field

The present invention relates to the field of machine translation, and in particular to a language text translation method and system.

Background

With the advance of internationalization, communication between speakers of different languages is growing steadily, and translation has become a vital tool in that communication. Because it is convenient, simple, and free, machine translation largely meets people's translation needs and improves the efficiency of international communication, which in turn leads people to demand greater correctness from machine translation.

Machine translation can be broadly divided into rule-based machine translation and corpus-based machine translation. The key problem in corpus-based machine translation is building a complete corpus, that is, a set of high-quality training samples, because the quality of the training samples directly affects translation accuracy. Building high-quality training samples is not easy, however: sample data are limited and cannot characterize the distribution of the original data well, and even when there are enough samples, erroneous samples (noisy data) cannot be avoided. A neural network trained on such samples struggles to reflect the original model accurately and may even violate prior knowledge. Introducing prior knowledge therefore becomes particularly important. A translation rule such as "do not translate the same content repeatedly, and do not omit content" is an example of prior knowledge. Many studies have shown that incorporating prior knowledge into a neural network model as a constraint can improve the network's performance.

Attention-based neural machine translation (Attention-based NMT) is a branch of corpus-based machine translation and the method currently used in mainstream translation systems. Its basic idea is to use an end-to-end nonlinear neural network to map source language text directly to target language text within an "encoder-decoder" framework: given a source language sentence, an encoder first maps it to a continuous, dense vector, and a decoder then converts that vector into a target language sentence. With this method, however, it is difficult to incorporate prior knowledge into the neural network.

Some existing techniques do incorporate prior knowledge into neural networks. For example, some represent prior knowledge with additional neural network modules, while others add constraint terms to the training objective. Although these techniques can improve translation quality significantly, the former also requires the relationships between different pieces of prior knowledge to be modeled, and the latter can only add a small number of simple constraint terms. As a result, these techniques cannot be used to incorporate arbitrary, complex prior knowledge into a neural machine translation model.

Therefore, how to provide a translation method that can incorporate arbitrary prior knowledge into a neural machine translation model is a problem in urgent need of a solution.

Summary of the invention

To solve the problem in the prior art that arbitrary prior knowledge cannot be incorporated into a neural network translation model, the present invention provides a language text translation method and system.

In one aspect, the present invention provides a language text translation method, comprising:

determining, according to a preset translation candidate set determination rule, the translation candidate set corresponding to a source language text, where the translation candidate set includes multiple translated texts of the source language text, and the source language text is the language text to be translated;

determining a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, where the first probability distribution indicates the probability that each translated text conforms to the prior knowledge model, and the second probability distribution indicates the probability that each translated text conforms to the translation model;

determining the translated text of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution.

In another aspect, the present invention provides a language text translation system, comprising:

a translation candidate set module, configured to determine, according to a preset translation candidate set determination rule, the translation candidate set corresponding to a source language text, where the translation candidate set includes multiple translated texts of the source language text, and the source language text is the language text to be translated;

a training module, configured to determine a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, where the first probability distribution indicates the probability that each translated text conforms to the prior knowledge model, and the second probability distribution indicates the probability that each translated text conforms to the translation model;

a translation module, configured to determine the translated text of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution.

The language text translation method and system provided by the present invention compute the probability distributions of the prior knowledge model and of the translation model over the translation candidate set, and use the difference between the two distributions as part of the training objective. The machine translation model can thereby learn arbitrary prior knowledge, which improves the accuracy and reliability of machine translation results.

Brief description of the drawings

Fig. 1 is a flow diagram of the language text translation method provided by an embodiment of the present invention;

Fig. 2 is a structural diagram of the language text translation system provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative work fall within the protection scope of the present invention.

Fig. 1 is a flow diagram of the language text translation method provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:

Step 101: determine, according to a preset translation candidate set determination rule, the translation candidate set corresponding to a source language text, where the translation candidate set includes multiple translated texts of the source language text, and the source language text is the language text to be translated;

Step 102: determine a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, where the first probability distribution indicates the probability that each translated text conforms to the prior knowledge model, and the second probability distribution indicates the probability that each translated text conforms to the translation model;

Step 103: determine the translated text of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution.

Specifically, the preset translation candidate set determination rule reflects the fact that translation is a sequence generation task: the source language text x contains multiple characters or words, and when the translation candidate set is generated, each previously generated character or word serves as input for generating the next one. The size of the true translation candidate set grows exponentially with the length of the source language text x, so it cannot be computed efficiently. In practice, multiple translated texts of the source language text, i.e. the translation candidate set S(x), are obtained by random sampling or beam search. This can be done with existing techniques and is not described further here;
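As an illustration of how a small candidate set might be produced by beam search, the following minimal sketch uses a hypothetical toy next-token table (`NEXT_TOKEN_PROBS`) in place of a real translation model. The table, the function name `beam_search`, and the token strings are all assumptions made for this example and are not taken from the patent.

```python
import math

# Hypothetical toy "model": probability of the next token given the previous one.
# A real translation model would also condition on the source text x.
NEXT_TOKEN_PROBS = {
    "<s>": {"many": 0.7, "several": 0.3},
    "many": {"airports": 0.9, "</s>": 0.1},
    "several": {"airports": 0.8, "</s>": 0.2},
    "airports": {"closed": 0.6, "shut": 0.4},
    "closed": {"</s>": 1.0},
    "shut": {"</s>": 1.0},
}


def beam_search(beam_size=2, max_len=6):
    """Return up to beam_size finished candidates, most probable first."""
    beams = [(0.0, ["<s>"])]  # (log-probability, tokens generated so far)
    finished = []
    for _ in range(max_len):
        expanded = []
        for score, tokens in beams:
            # Each previously generated token is the input for the next one.
            for token, prob in NEXT_TOKEN_PROBS[tokens[-1]].items():
                candidate = (score + math.log(prob), tokens + [token])
                if token == "</s>":
                    finished.append(candidate)
                else:
                    expanded.append(candidate)
        if not expanded:
            break
        # Keep only the beam_size most probable partial translations.
        beams = sorted(expanded, key=lambda c: c[0], reverse=True)[:beam_size]
    finished.sort(key=lambda c: c[0], reverse=True)
    # Strip the start and end markers before returning the text.
    return [" ".join(tokens[1:-1]) for _, tokens in finished[:beam_size]]


candidate_set = beam_search(beam_size=2)
print(candidate_set)
```

The beam size bounds the number of partial translations kept at each step, which is what keeps the search tractable even though the full candidate space is exponential.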

Then, the first probability distribution is determined according to the translation candidate set S(x) and the preset prior knowledge model Q(y|x; γ), and the second probability distribution is determined according to the translation candidate set S(x) and the preset translation model P(y|x; θ). Finally, the translated text y of the source language text is determined from the translation candidate set based on the first probability distribution and the second probability distribution.

For clarity, let the source language text x be the input and the translated text y the output, so that they form a sentence pair (x, y). In practice, the same character or word has different meanings in different contexts, and the source language text x is composed of multiple characters or words in a particular order. The ambiguity of characters and words and the uncertainty of their order mean that one source language text may correspond to multiple translated texts (y1, y2, y3, and so on). Among these, the one with the highest probability is the best translated text; to distinguish it from the other translated texts, it is called the target language text.

For example, different feature functions φ(x, y) yield different preset prior knowledge models Q(y|x; γ). The first probability distribution can be determined according to the following formula:
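The formula itself is not reproduced in this text. A form consistent with the symbol definitions that follow, and with the worked example later in this description (where feature values of 4, 3, and 2 yield probabilities proportional to e⁴, e³, and e²), is sketched below. The name \tilde{Q} for the first probability distribution is notation assumed here, not taken from the source:

```latex
\tilde{Q}(y \mid x; \gamma) =
  \frac{\exp\left(\gamma \cdot \phi(x, y)\right)}
       {\sum_{y' \in S(x)} \exp\left(\gamma \cdot \phi(x, y')\right)}
```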

where x denotes the source language text, y the target language text, y' a translated text, and γ the preset parameter of the prior knowledge model.

The feature function φ(x, y) represents the correspondence between source language text and translated text in a prior knowledge database. Based on a specific feature function, the prior knowledge model scores each translated text y1, y2, and y3, that is, it computes the probability that each translated text conforms to the prior knowledge model. The better a translated text conforms to the prior knowledge model, the higher its probability.

The translation model P(y|x; θ) is the scoring model commonly used in machine translation. It can be obtained by training on a parallel corpus, represents the correspondence between source language text x and translated text y in that corpus, and is used to compute the probability that each translated text conforms to the translation model. It belongs to the prior art and is not described further here.

According to the translation candidate set S(x) and the translation model P(y|x; θ), the second probability distribution can be determined by the following formula:
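The formula itself is not reproduced in this text. One form consistent with the symbol definitions that follow, in which α sharpens or flattens the distribution over the candidate set, is sketched below. Both the exponent form and the name \tilde{P} for the second probability distribution are assumptions made here, not the patent's own equation:

```latex
\tilde{P}(y \mid x; \theta) =
  \frac{P(y \mid x; \theta)^{\alpha}}
       {\sum_{y' \in S(x)} P(y' \mid x; \theta)^{\alpha}}
```

Under this assumed form, α > 1 concentrates probability on the highest-scoring candidates and α < 1 spreads it more evenly.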

where x denotes the source language text, y the target language text, y' a translated text, and θ the parameters of the translation model; α is a preset hyperparameter that controls the sharpness of the second probability distribution.

The language text translation method provided by the embodiment of the present invention uses the prior knowledge model and the translation model together to score the multiple translated texts from two perspectives, encouraging translated texts that better conform to the prior knowledge model to also receive higher probability under the translation model. The target language text is then determined from the translation candidate set, which improves both the performance of the translation model and the accuracy of the translation result.

On the basis of the above embodiment, in the language text translation method, determining the translated text of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution comprises:

determining a probability difference parameter value based on the first probability distribution and the second probability distribution, where the probability difference parameter indicates the difference between the first probability distribution and the second probability distribution;

determining the translated text of the source language text from the translation candidate set based on the probability difference parameter value.

Specifically, first, the translation candidate set S(x) corresponding to the source language text x is determined according to the preset translation candidate set determination rule. Then, the first probability distribution and the second probability distribution are determined based on the translation candidate set, the translation model, and the prior knowledge model. Next, the probability difference parameter value between the first probability distribution and the second probability distribution is determined. Finally, the translated text y of the source language text x is determined from the translation candidate set S(x) based on the probability difference parameter value.

For example, after logging in to the translation system, a user enters the source language text x, a Chinese sentence meaning "many airports were forced to close", in the Chinese input field of the Chinese-English translation window. The translation candidate set S(x) determined from x contains two translated texts: y1 is "Many Airports were closed to close" and y2 is "Many airports were forced to close down";

According to the prior knowledge model, the first probability distribution is determined, where Q(y1|x) = 0.2, i.e. the probability that the sentence pair (x, y1) conforms to the prior knowledge model is 0.2, and Q(y2|x) = 0.8, i.e. the probability that the sentence pair (x, y2) conforms to the prior knowledge model is 0.8;

According to the translation model, the second probability distribution is determined, where P(y1|x) = 0.6, i.e. the probability that the sentence pair (x, y1) conforms to the translation model is 0.6, and P(y2|x) = 0.4, i.e. the probability that the sentence pair (x, y2) conforms to the translation model is 0.4;

From the first probability distribution and the second probability distribution, the difference parameter value between the two can be determined. Based on this value, the translation model is adjusted and the two translated texts are scored again, giving P(y1|x) = 0.3 and P(y2|x) = 0.7;

Accordingly, the translated text y determined for the source language text x ("many airports were forced to close") is "Many airports were forced to close down".
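To make the effect of the adjustment concrete, the sketch below measures how far the translation model's distribution is from the prior knowledge distribution before and after adjustment, using the numbers from this example. The Kullback-Leibler divergence is used as the difference measure, and its direction, KL(P || Q), follows the notation used later in this description. The function name `kl_divergence` is an assumption made for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same candidates."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


q_prior = [0.2, 0.8]    # Q(y1|x), Q(y2|x) from the prior knowledge model
p_before = [0.6, 0.4]   # P(y1|x), P(y2|x) before the translation model is adjusted
p_after = [0.3, 0.7]    # P(y1|x), P(y2|x) after adjustment

kl_before = kl_divergence(p_before, q_prior)
kl_after = kl_divergence(p_after, q_prior)
print(f"before: {kl_before:.4f}, after: {kl_after:.4f}")
```

The divergence is smaller after adjustment, which matches the description: the adjusted translation model agrees more closely with the prior knowledge model.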

As the above embodiment shows, the language text translation method provided by the embodiment of the present invention uses the difference parameter value between the first probability distribution and the second probability distribution and scores the multiple translated texts again with the translation model. This raises the probability, within the translation model's distribution, of translated texts that conform to prior knowledge, and thus yields a more accurate translated text of the source language text.

On the basis of the above embodiment, the difference parameter value between the first probability distribution and the second probability distribution is the KL (Kullback-Leibler) distance, which can be determined by the following formula:
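The formula itself is not reproduced in this text. Writing \tilde{P} and \tilde{Q} (names assumed here) for the second and first probability distributions over S(x), the standard definition of the Kullback-Leibler divergence gives the sketch below. The direction KL(P || Q) follows the notation used in the worked example later in this description and should be read as an assumption about the missing equation:

```latex
\mathrm{KL}\left(\tilde{P} \,\|\, \tilde{Q}\right) =
  \sum_{y \in S(x)} \tilde{P}(y \mid x; \theta)
  \log \frac{\tilde{P}(y \mid x; \theta)}{\tilde{Q}(y \mid x; \gamma)}
```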

On the basis of the above embodiments, in the language text translation method, determining the translated text of the source language text from the translation candidate set based on the probability difference parameter value comprises:

determining a training objective based on the difference parameter value, where the training objective is used to make the translation model approach the prior knowledge model;

determining the translated text of the source language text from the translation candidate set based on the training objective and a preset reranking model.

Specifically, first, the translation candidate set S(x) corresponding to the source language text x is determined according to the preset translation candidate set determination rule. Then, the first probability distribution and the second probability distribution are determined based on the translation candidate set, the translation model, and the prior knowledge model. Next, the probability difference parameter value between the first probability distribution and the second probability distribution is determined, and the training objective J(θ, γ) is determined based on that value so that the translation model approaches the prior knowledge model. Finally, the translated text y of the source language text x is determined from the translation candidate set S(x) based on the training objective J(θ, γ) and the preset reranking model.

In general, when translated texts are scored, the log-likelihood of the translation model P(y|x; θ) is used as the standard training criterion; that is, the traditional training objective is the log-likelihood function L(θ) = log P(y|x; θ).

The difference parameter value between the first probability distribution and the second probability distribution is determined and added to the traditional training objective, giving the new training objective J(θ, γ). Under this objective, the optimal parameters θ and γ encourage the translated text that best conforms to prior knowledge to have the highest probability in the second probability distribution of the translation model, so that the translation model is more inclined to select a translated text that conforms to prior knowledge from the translation candidate set S(x) as the target language text y of the source language text x.

Optionally, if the difference parameter value is the KL distance, the training objective can be determined according to the following formula:
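The formula itself is not reproduced in this text. A form consistent with the surrounding description (the traditional log-likelihood objective combined with a KL term, weighted by λ₁ and λ₂ and summed over the N training sentence pairs) is sketched below. Here \tilde{P} and \tilde{Q} are names assumed for the second and first probability distributions over S(x), and the sign and direction of the KL term are assumptions rather than the patent's own equation:

```latex
J(\theta, \gamma) =
  \lambda_1 \sum_{n=1}^{N} \log P\left(y^{(n)} \mid x^{(n)}; \theta\right)
  - \lambda_2 \sum_{n=1}^{N}
    \mathrm{KL}\left(\tilde{P}(y \mid x^{(n)}; \theta) \,\|\, \tilde{Q}(y \mid x^{(n)}; \gamma)\right)
```

Maximizing an objective of this shape rewards high likelihood on the training pairs while penalizing disagreement between the two distributions.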

where λ₁ and λ₂ are preset hyperparameters that balance the training objective, and N is the number of sentence pairs in the training data.

The optimal parameters θ and γ are obtained from the new training objective, and the following reranking model is used to determine the translated text of the source language text from the translation candidate set:

$$\hat{y} = \operatorname*{argmax}_{y \in S(x)} \left\{ \log P(y \mid x; \theta) + \gamma \cdot \phi(x, y) \right\}$$

For example, suppose the source language text x is a sentence meaning "Bush held talks with Sharon", and the translation candidate set S(x) determined from x contains three translated texts: y1 is "Bush held a talk with Sharon", y2 is "Bush held a talk with Bush", and y3 is "Bush had lunch with Sharon".

Suppose the feature function φ(x, y) counts the word pairs that occur in the source language text x and the translated text y of a sentence pair, and the set of word pairs is {(Bush, Bush), (hold, held), (talks, talk), (Sharon, Sharon)}, where the first element of each pair is shown as an English gloss of the source-language word. In the first translated text y1, all 4 word pairs occur, so φ(x, y1) = 4; similarly, φ(x, y2) = 3 and φ(x, y3) = 2.
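The word-pair count described above can be sketched as follows. The table `WORD_PAIRS`, the function name `phi`, and the lowercase English glosses standing in for source-language words are all hypothetical choices made for this illustration; a real system would use source-language tokens and a proper bilingual lexicon.

```python
# Hypothetical word-pair table. Source-side words are written as English
# glosses here; in a real system they would be source-language words.
WORD_PAIRS = [
    ("bush", "Bush"),
    ("hold", "held"),
    ("talks", "talk"),
    ("sharon", "Sharon"),
]


def phi(source_words, translation):
    """Count word pairs whose source word is in x and whose target word is in y."""
    target_words = set(translation.split())
    return sum(
        1
        for src, tgt in WORD_PAIRS
        if src in source_words and tgt in target_words
    )


source_words = {"bush", "sharon", "hold", "talks"}  # glossed tokens of x
candidates = [
    "Bush held a talk with Sharon",
    "Bush held a talk with Bush",
    "Bush had lunch with Sharon",
]
for y in candidates:
    print(phi(source_words, y), y)
```

Because the count is over distinct pairs, repeating "Bush" in the second candidate does not compensate for the missing "Sharon".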

The first probability distribution can be determined according to the prior knowledge model, where the probability of translated text y1 is Q(y1|x) = e⁴/(e²+e³+e⁴).

Similarly, Q(y2|x) = e³/(e²+e³+e⁴) and Q(y3|x) = e²/(e²+e³+e⁴). This gives Q(y1|x) = 0.67, Q(y2|x) = 0.24, and Q(y3|x) = 0.09.
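These values can be checked with a short sketch that normalizes exp(γ·φ) over the candidate set. Taking γ = 1 is an assumption inferred from the exponents 4, 3, and 2 in this example, and the function name `prior_distribution` is introduced here only for illustration.

```python
import math


def prior_distribution(feature_values, gamma=1.0):
    """Normalize exp(gamma * phi) over the candidate set."""
    scores = [gamma * value for value in feature_values]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - top) for score in scores]
    total = sum(weights)
    return [weight / total for weight in weights]


q = prior_distribution([4, 3, 2])  # phi(x, y1), phi(x, y2), phi(x, y3)
print([round(value, 2) for value in q])  # [0.67, 0.24, 0.09]
```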

These probabilities show that translated text y1 best conforms to the prior knowledge model, and it is in fact the correct translation. Translated text y2 clearly violates the prior knowledge "do not translate the same content repeatedly, and do not omit content", so its probability is lower. Translated text y3 deviates from the meaning of the source language text, so its probability is also lower.

Suppose the second probability distribution obtained from the translation model before adjustment is as follows: P(y1|x) = 0.4, P(y2|x) = 0.5, and P(y3|x) = 0.1. In that case the translation model would output "Bush held a talk with Bush".

At this point, if the preset hyperparameters λ₁ and λ₂ are both 1, the KL distance KL(P || Q) between the two probability distributions is calculated by the formula, and the new training objective J(θ, γ) is determined based on the KL distance;

Based on the training objective and the reranking model, the translation model is adjusted. After training, P(y1|x) = 0.6, P(y2|x) = 0.31, and P(y3|x) = 0.09. The new training objective thus raises the probability of translated text y1 and lowers the probabilities of translated texts y2 and y3, so that translated texts that better conform to prior knowledge have higher probability in the translation model's distribution; that is, the translation model approaches the prior knowledge model.

Therefore, the target language text y that is finally output is "Bush held a talk with Sharon".
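The final selection can be sketched with the reranking score log P(y|x; θ) + γ·φ(x, y) applied to the numbers from this example. Taking γ = 1 and storing candidates as dictionaries with keys `text`, `p`, and `phi` are assumptions made for this illustration.

```python
import math


def rerank(candidates, gamma=1.0):
    """Return the candidate text maximizing log P(y|x) + gamma * phi(x, y)."""
    best = max(candidates, key=lambda c: math.log(c["p"]) + gamma * c["phi"])
    return best["text"]


# Probabilities after training and feature values from the example.
candidates = [
    {"text": "Bush held a talk with Sharon", "p": 0.60, "phi": 4},
    {"text": "Bush held a talk with Bush", "p": 0.31, "phi": 3},
    {"text": "Bush had lunch with Sharon", "p": 0.09, "phi": 2},
]
print(rerank(candidates))
```

With γ = 0 the score reduces to the translation model probability alone, so the feature term only changes the outcome when the two models disagree.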

As the above embodiment shows, the language text translation method provided by the embodiment of the present invention adds the KL distance between the probability distribution under the prior knowledge model and the probability distribution under the translation model to the traditional training objective. This encourages translated texts that better conform to the prior knowledge model to also have higher probability under the translation model, and thus yields better-optimized translation model parameters. The target language text is then determined from the translation candidate set, which improves both the performance of the translation model and the accuracy of the translation result.

Fig. 2 is a structural diagram of the language text translation system provided by an embodiment of the present invention. As shown in Fig. 2, the system comprises a translation candidate set module 21, a training module 22, and a translation module 23. The translation candidate set module 21 is configured to determine, according to a preset translation candidate set determination rule, the translation candidate set corresponding to a source language text, where the translation candidate set includes multiple translated texts of the source language text, and the source language text is the language text to be translated. The training module 22 is configured to determine a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, where the first probability distribution indicates the probability that each translated text conforms to the prior knowledge model, and the second probability distribution indicates the probability that each translated text conforms to the translation model. The translation module 23 is configured to determine the translated text of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution.

It should be noted that the language text translation system is used to implement the above method embodiments. For its specific functions, refer to the above method embodiments; they are not described again here.

On the basis of the above embodiment, the translation module 23 in the system is specifically configured to determine a probability difference parameter value based on the first probability distribution and the second probability distribution, where the probability difference parameter indicates the difference between the first probability distribution and the second probability distribution, and to determine the translated text of the source language text from the translation candidate set based on the probability difference parameter value. Optionally, the probability difference parameter is the KL distance.

On the basis of the above embodiments, the translation module 23 in the system is specifically configured to determine a training objective based on the difference parameter value, where the training objective is used to make the translation model approach the prior knowledge model, and to determine the translated text of the source language text from the translation candidate set based on the training objective and a preset reranking model.

With the language text translation method and system provided by the present invention, prior knowledge is incorporated into the translation model during the training stage, which improves the performance of the translation model and in turn applies prior knowledge in the translation process. Arbitrary prior knowledge can thus be applied to machine translation without adding extra network modules, which ultimately improves the accuracy and reliability of translation results.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A language text translation method, characterized by comprising:

determining, according to a preset translation candidate set determination rule, a translation candidate set corresponding to a source language text, the translation candidate set including multiple translated texts of the source language text, and the source language text being the language text to be translated;

determining a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, the first probability distribution indicating the probability that the translated text conforms to the prior knowledge model, and the second probability distribution indicating the probability that the translated text conforms to the translation model;

determining the translated text of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution;

wherein determining the translated text of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution comprises:

determining a probability difference parameter value based on the first probability distribution and the second probability distribution, the probability difference parameter indicating the difference between the first probability distribution and the second probability distribution;

determining the translated text of the source language text from the translation candidate set based on the probability difference parameter value;

wherein determining the translated text of the source language text from the translation candidate set based on the probability difference parameter value comprises:

determining a training objective based on the difference parameter value, the training objective being used to make the translation model approach the prior knowledge model;

determining the translated text of the source language text from the translation candidate set based on the training objective and a preset reranking model.

2. The method according to claim 1, characterized in that the probability difference parameter is the KL distance.

3. A language text translation system, characterized by comprising:

a translation candidate set module, configured to determine, according to a preset translation candidate set determination rule, a translation candidate set corresponding to a source language text, the translation candidate set including multiple translated texts of the source language text, and the source language text being the language text to be translated;

a training module, configured to determine a first probability distribution and a second probability distribution based on the translation candidate set, a preset translation model, and a preset prior knowledge model, the first probability distribution indicating the probability that the translated text conforms to the prior knowledge model, and the second probability distribution indicating the probability that the translated text conforms to the translation model;

a translation module, configured to determine the translated text of the source language text from the translation candidate set based on the first probability distribution and the second probability distribution;

the translation module being specifically configured to:

4. The system according to claim 3, characterized in that the probability difference parameter is the KL distance.