CN103810159A - Machine translation data processing method, system and terminal - Google Patents


Info

Publication number
CN103810159A
Authority
CN
China
Prior art keywords
translation
module
constant
score value
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210459144.4A
Other languages
Chinese (zh)
Other versions
CN103810159B (en)
Inventor
廖剑
卢小康
吴克文
张永刚
郑文彬
林锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Alibaba Zhirong Digital Technology Co.,Ltd.
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201210459144.4A
Publication of CN103810159A
Application granted
Publication of CN103810159B
Legal status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a machine translation data processing method comprising the following steps: obtaining the translation results output by at least two translation modules; determining the weighting values of the translation modules; calculating the basic scores of the translation results; calculating the final scores of the translation results based on their basic scores and the weighting values of the corresponding translation modules; and selecting some or all of the translation results for output according to their final scores. The invention also provides a machine translation data processing system that implements the method, and a terminal provided with the system. The method, system and terminal improve translation query efficiency and system response speed.

Description

Machine translation data processing method, system and terminal
Technical field
The application relates to the technical field of computer-aided translation, and in particular to a machine translation data processing method, system and terminal.
Background art
With the rapid development of science, technology and the Internet, computer and network technologies have penetrated every aspect of our work and life. Computer-aided translation technology has also appeared in the translation field, for example the common *** Translate, Baidu Translate and Youdao Translate.
A common method of computer-aided translation is memory translation based on a corpus: the sentence to be translated is decomposed into several words, the decomposed words are translated with the help of stored translation examples, and the translated pieces are finally combined. For example, if the sentence to be translated is "he has bought a book", it can be decomposed into "he", "bought" and "a book". The system then looks up corresponding translation examples, for instance "she is reading a book: she is reading a book" and "he has bought a computer: he bought a computer" (each pair being a source sentence and its stored translation). From these, the decomposed words or phrases "he", "bought" and "a book" can be translated, and the pieces are combined to give the translation result "he bought a book".
This approach can divide a sentence into very small units and can therefore improve translation quality. However, because exact matching is required, a large amount of example phrase data must be maintained in the system or in a database to guarantee the matching rate, which occupies a large amount of storage space. Matching identical words against this large body of example phrase data also takes considerable query time, which slows the system's response. When many sentences are submitted for translation concurrently, the system may even crash. In addition, relying on a single translation approach may reduce the accuracy of the translation result; the user then often has to modify the sentence manually and query repeatedly until the expected result is obtained, which further increases the system load.
Summary of the invention
The application provides a machine translation data processing method, system and terminal that address the problems of low translation query efficiency and slow system response.
To solve the above problems, the application discloses a machine translation data processing method comprising the following steps:
Obtaining the translation results output by at least two translation modules;
Determining the weighting value of each translation module;
Calculating the basic score of each translation result;
Calculating the final score of each translation result based on its basic score and the weighting value of the corresponding translation module;
Selecting some or all of the translation results for output according to their final scores.
Further, the at least two translation modules have different translation rules, and the translation results are those obtained when the at least two translation modules each translate the same source statement to be translated according to their own translation rules.
Further, determining the weighting value of the translation module comprises:
Inputting training samples into each translation module to obtain output results;
Scoring the output results to obtain the score of each output result;
Determining the error rate of each translation module based on the scores of its output results;
Determining the weighting value of each translation module according to its error rate.
Further, the error rate of a translation module is the sum of the initial weights of the training samples whose output results score below a standard value;
The weighting value of the translation module is calculated from the error rate using the following formula:
log(1/(error rate/(1 - error rate))).
Further, calculating the basic score of the translation result comprises:
Determining the final weight of each training sample in each translation module according to the error rate;
Determining the similarity between the source statement to be translated that corresponds to the translation result and each training sample;
Calculating the basic score of the translation result produced by a translation module according to the final weight of each training sample in that translation module and the similarity of each training sample to the source statement to be translated.
Further, determining the final weight of each training sample in each translation module according to the error rate comprises:
If the score of the output result obtained when a training sample is translated by the translation module is less than the standard value, the final weight of that training sample in that translation module is: initial weight × error rate/(1 - error rate); if the score of the output result is greater than the standard value, the final weight of the training sample is its initial weight.
Further, before obtaining the translation results output by the at least two translation modules, the method also comprises the following steps:
Receiving the source statement to be translated and scanning it to obtain the constants and non-constants in the source statement;
Replacing the non-constants of the source statement with predetermined variables to obtain a standard statement, and recording the correspondence between the predetermined variables and the non-constants;
Searching a template base for a target template that matches the standard statement and, if one is found, translating the standard statement according to the target template to obtain an initial translation result;
Replacing each predetermined variable in the initial translation result with its corresponding non-constant, according to the recorded correspondence;
Translating the non-constants in the initial translation result after replacement to obtain the translation result.
Further, scanning to obtain the constants and non-constants in the source statement comprises:
Scanning the character strings contained in the source statement;
Matching the constants contained in a constant database against each character string; if a match is found, the character string is treated as a constant, and otherwise as a non-constant.
Further, translating the non-constants in the initial translation result after replacement comprises:
Translating the non-constants with a predetermined translation module; or
Querying a special translation database to determine whether a non-constant is a special named entity; if so, translating that non-constant based on the special translation database, and if not, translating it with the predetermined translation module.
The application also discloses a machine translation data processing system, comprising:
A translation result acquisition module, for obtaining the translation results output by at least two translation modules;
A translation module weighting score determination module, for determining the weighting value of each translation module;
A translation result basic score determination module, for calculating the basic score of each translation result;
A final score computing module, for calculating the final score of each translation result based on its basic score and the weighting value of the corresponding translation module;
A result output module, for selecting some or all of the translation results for output according to the final score of each translation result.
Further, the translation module weighting score determination module comprises:
A training unit, for inputting training samples into each translation module to obtain output results;
A score evaluation unit, for scoring the output results to obtain the score of each output result;
An error rate computing module, for determining the error rate of each translation module based on the scores of its output results;
A weighting value computing unit, for determining the weighting value of each translation module according to its error rate.
Further, the translation result basic score determination module comprises:
A final weight calculation unit, for determining the final weight of each training sample in each translation module according to the error rate;
A similarity calculation unit, for determining the similarity between the source statement to be translated that corresponds to the translation result and each training sample;
A basic score computing unit, for calculating the basic score of the translation result produced by a translation module according to the weight of each training sample in that translation module and its similarity to the source statement to be translated.
Further, the system also comprises:
A data reception module, for receiving the source statement to be translated and scanning it to obtain the constants and non-constants in the source statement;
A standard statement determination module, for replacing the non-constants of the source statement with predetermined variables to obtain a standard statement, and recording the correspondence between the predetermined variables and the non-constants;
A template translation module, for searching a template base for a target template that matches the standard statement and, if one is found, translating the standard statement according to the target template to obtain an initial translation result;
A replacement module, for replacing each predetermined variable in the initial translation result with its corresponding non-constant, according to the recorded correspondence;
A translation module, for translating the non-constants in the initial translation result after replacement to obtain the translation result.
Further, the data reception module comprises:
A character string scanning unit, for scanning the character strings contained in the source statement;
A string matching unit, for matching the constants contained in a constant database against each character string; if a match is found, the character string is treated as a constant, and otherwise as a non-constant.
Further, the system also comprises:
A special translation database, used to determine whether a non-constant is a special named entity; if so, the non-constant is translated based on the special translation database, and if not, the translation module is triggered to translate it.
The application also discloses a terminal comprising the machine translation data processing system described above.
Further, the terminal has a distributed architecture comprising a front-end server and at least one back-end server, and the machine translation data processing system is deployed on the at least one back-end server; the front-end server receives translation requests containing source statements to be translated and distributes them to the machine translation data processing system on the at least one back-end server.
Further, the number of back-end servers is greater than 1, and the terminal also comprises:
A load balancing module, for managing the distribution of the translation requests received by the front-end server, which forwards each request to the corresponding back-end server so that the load on the back-end servers is balanced.
Compared with the prior art, the application has the following advantages:
The machine translation data processing method, system and terminal of the application translate the source statement using a combination of multiple (at least two) translation modules, which improves translation query efficiency and system response speed, avoids occupying a large amount of system space, and helps guarantee the accuracy of the translation result. In particular, because the translation modules are trained in advance to determine their weighting values, the multiple translation results can be scored and ranked to obtain a more accurate translation result.
Second, the source statement is converted into a standard statement and matched against the templates in the template base. The constants in the standard statement are translated according to the template, which also fixes the sentence pattern of the final translation; this part is static translation. Only the non-constants actually require dynamic translation, so the real translation workload is small, which improves translation speed and system response time and reduces the use of system resources. Because the template ensures that the sentence pattern of the translation result is accurate, translation quality also improves, avoiding the extra system load caused by users repeating queries when a translation result is inaccurate.
In addition, a special translation database is provided for special fields, so that special or non-standard non-constants can be translated by direct lookup, which reduces the workload of subsequent dynamic translation, improves system response time and helps guarantee translation quality.
Further, the machine translation data processing system and terminal can be implemented with a distributed architecture: the machine translation data processing system is deployed on multiple back-end servers, and the front-end server distributes translation requests to the back-end servers after receiving them, keeping the load on the back-end servers balanced. Increasing the number of back-end servers also raises the load capacity of the system, shortens system response time and improves processing efficiency.
Of course, a product implementing the application need not achieve all of the above advantages at the same time.
Brief description of the drawings
Fig. 1 is a flowchart of Embodiment 1 of the machine translation data processing method of the application;
Fig. 2 is a flowchart of Embodiment 2 of the machine translation data processing method of the application;
Fig. 3 is a flowchart of Embodiment 3 of the machine translation data processing method of the application;
Fig. 4 is a structural diagram of Embodiment 1 of the machine translation data processing system of the application;
Fig. 5 is a structural diagram of Embodiment 2 of the machine translation data processing system of the application;
Fig. 6 is a structural diagram of Embodiment 3 of the machine translation data processing system of the application;
Fig. 7 is a structural diagram of Embodiment 4 of the machine translation data processing system of the application;
Fig. 8 is an example system architecture diagram of the terminal of the application that has the machine translation data processing system.
Detailed description
To make the above objects, features and advantages of the application clearer, the application is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, Embodiment 1 of the machine translation data processing method of the application is shown. The application uses at least two translation modules to translate, obtains a translation result from each, and processes the translation results accordingly, comprising the following steps:
Step 101: obtain the translation results output by at least two translation modules.
It will be appreciated that "at least two" may be two, three or any other number greater than or equal to 2.
A translation module in the application is a device or functional module that can translate a source statement. Each translation module has its own translation rules and approach; that is, the translation rules of the at least two translation modules differ from one another, so these modules may produce different translation results for the same source statement to be translated. The translation results referred to above are those obtained when the at least two translation modules each translate the same source statement according to their own translation rules. For example, if 5 translation modules are used for a source statement to be translated, 5 translation results are obtained.
In addition, if a single translation module translates the source statement by first splitting it and then translating the parts, different ways of splitting may lead to different translation results. For example, if 5 translation modules are used for a source statement and each translation module splits the source statement in three ways, each translation module yields 3 translation results and the 5 translation modules yield 15 translation results in total.
Step 102: determine the weighting value of each translation module.
Step 103: calculate the basic score of each translation result.
It will be appreciated that the weighting value in step 102 and the basic score of the translation result in step 103 can be obtained by using a translation module combination that has been trained in advance. A translation module combination is formed from multiple translation modules as described above. The trained combination defines the rule for computing the basic score of the translation result produced by each translation module and the weighting value of each translation module. Once translation results are obtained, the basic score of each is calculated according to the computation rule, and the final score of each is calculated using the corresponding weighting value.
The translation module combination can be trained as follows:
Suppose the training data is a set of N parallel corpus pairs, each pair being one training sample. The training process is as follows:
Input the training samples into each translation module to obtain output results;
Score the output results to obtain the score of each output result;
Determine the error rate of each translation module based on the scores of its output results;
Determine the weighting value of each translation module according to its error rate.
The specific formulas are:
Each training sample has an initial weight, and the error rate Et of each translation module is the sum of the initial weights of the training samples whose output results score below the standard value;
The weighting value of each translation module is: log(1/(Et/(1-Et))).
Preferably, all training samples have the same initial weight; for example, with N training samples in total, the initial weight of each training sample is 1/N.
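The error rate and weighting value described above can be sketched in Python as follows. This is a minimal illustration rather than the applicant's implementation: the function names, the sample scores and the standard value of 60 are hypothetical, and the natural logarithm is assumed because the text does not state the base of the log.

```python
import math

def module_error_rate(scores, standard_value, initial_weights=None):
    """Sum the initial weights of samples whose output score is below the standard value."""
    n = len(scores)
    if initial_weights is None:
        # Equal initial weights of 1/N, as in the preferred case in the text.
        initial_weights = [1.0 / n] * n
    return sum(w for s, w in zip(scores, initial_weights) if s < standard_value)

def module_weighting_value(error_rate):
    """log(1 / (Et / (1 - Et))), written equivalently as log((1 - Et) / Et)."""
    if not 0.0 < error_rate < 1.0:
        # Assumption: the formula is undefined at Et = 0 or Et = 1, so reject those.
        raise ValueError("error rate must lie strictly between 0 and 1")
    return math.log((1.0 - error_rate) / error_rate)

# Hypothetical scores of one module's outputs on four training samples.
et = module_error_rate([50, 70, 90, 80], standard_value=60)
weight = module_weighting_value(et)
```

With these made-up scores one sample out of four falls below the standard value, so Et is 0.25 and the weighting value is log(3). A module with a lower error rate receives a larger weighting value.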
The basic score of a translation result can be determined as follows: the translation performance of each translation module on statements of different structures is judged from the output results for the training samples, and a fixed score is then set for each statement structure translated by each translation module. For example, for statements of structure A, the basic score of translation results produced by translation module a is 40 points and that of translation results produced by translation module b is 80 points. The structure of the source statement corresponding to each translation result is then analysed to determine the structure type to which it belongs, which gives the basic score of each translation result.
Preferably, to make the calculation of the basic score more objective and accurate, the basic score can instead be calculated from the training samples used when training the translation module combination and the error rate of each translation module. The detailed process is as follows:
Determine the final weight of each training sample in each translation module according to the error rate;
Determine the similarity between the source statement to be translated that corresponds to the translation result and each training sample;
Calculate the basic score of the translation result produced by a translation module according to the final weight of each training sample in that translation module and the similarity of each training sample to the source statement to be translated.
The final weight of each training sample in each translation module is determined from the error rate as follows:
For a single translation module, if the score of the output result obtained when a training sample is translated by this translation module is less than the standard value, the final weight of that training sample in this translation module is changed to (1/N) × Et/(1-Et); if the score of the output result is greater than the standard value, the final weight of the training sample remains the initial weight 1/N. Finally, the final weights of the training samples in this translation module are adjusted so that they sum to 1; the specific adjustment process may be determined manually or obtained through model training, and the application does not limit this.
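The final-weight rule can be sketched as follows. The text leaves the adjustment step open, so dividing each weight by the total is an assumption made here for illustration, as is treating a score exactly equal to the standard value like a score above it. The function name and sample values are hypothetical.

```python
def final_sample_weights(scores, standard_value, error_rate):
    """Final weight of each training sample within one translation module."""
    n = len(scores)
    initial = 1.0 / n
    factor = error_rate / (1.0 - error_rate)
    # Samples scoring below the standard value are rescaled by Et / (1 - Et);
    # the others keep the initial weight 1/N (assumption: equality counts as a pass).
    raw = [initial * factor if s < standard_value else initial for s in scores]
    # Assumption: adjust the weights to sum to 1 by dividing by their total.
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical scores for four training samples, with Et = 0.25.
weights = final_sample_weights([50, 70, 90, 80], standard_value=60, error_rate=0.25)
```

When Et is below 0.5 the factor Et/(1-Et) is below 1, so samples the module translated poorly end up with smaller final weights than samples it translated well.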
As for the similarity between the source statement to be translated and each training sample: if a translation module splits the source statement in several ways, the same source statement may have different similarities to the same training sample within the same translation module depending on the split, which changes the basic score of the translation result. For this reason a single translation module can also produce multiple results, which increases the accuracy of the translation result.
Because each translation module is good at translating different sentence structures, the sentence structures that each translation module is good at are determined through the advance training (each training sample corresponds to a different sentence structure, and the sentence structures of training samples with larger final weights can be regarded as those the translation module is good at). The source statement to be translated is compared with each training sample, and structural similarity is combined with the weights in the calculation, so that the basic score of the translation result produced by the translation module can be determined fairly accurately.
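The text does not give an explicit formula for combining final weights with similarity, nor does it name a similarity measure. The sketch below therefore rests on two stated assumptions: the basic score is taken as the weight-weighted sum of similarities scaled to 100, and token-set Jaccard overlap stands in for the structural similarity. All names and data are hypothetical.

```python
def jaccard(a_tokens, b_tokens):
    """Token-set overlap, used here only as a stand-in similarity measure."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def basic_score(source_tokens, training_sources, final_weights, scale=100.0):
    """Assumed combination: scale * sum(final_weight_i * similarity_i)."""
    similarities = [jaccard(source_tokens, t) for t in training_sources]
    return scale * sum(w * s for w, s in zip(final_weights, similarities))

# Hypothetical training-sample source sides and their final weights in one module.
training_sources = [
    ["he", "bought", "a", "computer"],
    ["she", "is", "reading", "a", "book"],
]
score = basic_score(["he", "bought", "a", "book"], training_sources, [0.75, 0.25])
```

Under this assumed formula, a source statement that resembles the samples a module handles well (those with large final weights) receives a higher basic score for that module's translation result.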
Step 104: calculate the final score of each translation result based on its basic score and the weighting value of the corresponding translation module.
The final score can be calculated directly by multiplying the basic score by the weighting value. For example, for source statement X to be translated, a total of three translation modules a, b and c translate it to give three translation results Xa, Xb and Xc. If the basic scores are 70, 80 and 90 respectively and the weighting values of the three translation modules are 0.3, 0.5 and 0.9 respectively, the final scores of the three translation results are 21, 40 and 81 respectively. Other ways of calculating the final score may of course be used, and the application does not limit this.
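The multiplication in the example above can be written as a short sketch. The function name and the dictionary layout are illustrative choices, not taken from the application.

```python
def final_scores(basic_scores, weighting_values):
    """Multiply each result's basic score by the weighting value of its module."""
    return {name: basic_scores[name] * weighting_values[name] for name in basic_scores}

# The figures from the example in the text: modules a, b and c.
basic = {"a": 70, "b": 80, "c": 90}
weighting = {"a": 0.3, "b": 0.5, "c": 0.9}
scores = final_scores(basic, weighting)
# scores is approximately {"a": 21, "b": 40, "c": 81}, up to floating-point rounding.
```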
Step 105: select some or all of the translation results for output according to the final score of each translation result.
After the final score of each translation result has been calculated, the translation results can be sorted by final score and some or all of them output, for example the three translation results with the highest final scores, or simply the translation result with the highest score.
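Selecting the output by rank can be sketched as below. The pair layout (translation, final score) and the `top_k` parameter are hypothetical conveniences for the illustration.

```python
def select_results(scored_results, top_k=None):
    """Sort (translation, final_score) pairs by score, highest first, and keep top_k."""
    ranked = sorted(scored_results, key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

# Final scores from the example in the text.
results = [("Xa", 21.0), ("Xb", 40.0), ("Xc", 81.0)]
best = select_results(results, top_k=1)
everything = select_results(results)
```

Passing `top_k=3` corresponds to outputting the three highest-scoring results, and `top_k=1` to outputting only the highest-scoring one.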
In this embodiment, the source statement is translated by a combination of multiple translation modules, and the translation result of each translation module is scored according to predetermined weights. This both maintains translation speed and processing efficiency and makes the translation result as accurate as possible, avoiding the extra system load caused by users re-entering queries when a translation result is inaccurate.
Referring to Fig. 2, Embodiment 2 of the machine translation data processing method of the application is shown. Before step 101 of Embodiment 1 is carried out, the method may also include a process of translating the source statement to be translated, comprising the following steps:
Step 201: receive the source statement to be translated and scan it to obtain the constants and non-constants in the source statement.
It will be appreciated that the source statement can be segmented into words, and each word can then be classified as a constant or a non-constant by matching it against the constants in a pre-stored constant database. Other approaches may of course be used.
Preferably, scanning to obtain the constants and non-constants in the source statement can be implemented as follows:
Scan the character strings contained in the source statement;
Match the constants contained in the constant database against each character string; if a match is found, treat the character string as a constant, and otherwise as a non-constant.
For example, suppose the source statement is "a kind of equipment of vision protection". Matching the constants in the constant database against each character string of this source statement gives "a kind of" and "of" as constants, and "vision protection" and "equipment" as non-constants.
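The matching step can be sketched as follows, assuming the source statement has already been segmented into character strings. The segments below are English glosses standing in for the original source-language strings, listed in an assumed source-language order; the function name and the set-based constant database are hypothetical.

```python
def classify_segments(segments, constant_db):
    """Split segments into constants (found in the database) and non-constants."""
    constants, non_constants = [], []
    for segment in segments:
        if segment in constant_db:
            constants.append(segment)
        else:
            non_constants.append(segment)
    return constants, non_constants

# Glosses of the example statement's segments (assumed segmentation and order).
constant_db = {"a kind of", "of"}
segments = ["a kind of", "vision protection", "of", "equipment"]
constants, non_constants = classify_segments(segments, constant_db)
```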
Step 202: replace the non-constants of the source statement with predetermined variables to obtain a standard statement, and record the correspondence between the predetermined variables and the non-constants.
The predetermined variables use a standard format; for example, in the order in which the non-constants appear in the source statement, they are denoted X1, X2, X3, ..., Xn. Taking the source statement above as an example, "vision protection" can be replaced with X1 and "equipment" with X2 (the numbering follows the word order of the original-language statement), giving the standard statement "X2 of a kind of X1". The correspondence between predetermined variables and non-constants is also recorded: "X1 = vision protection, X2 = equipment".
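A minimal sketch of the replacement step follows. It works on a list of segments rather than a joined string, and the segments are again English glosses in an assumed source-language order, so the joined form differs from the English rendering of the standard statement quoted above. The function name is hypothetical.

```python
def to_standard_statement(segments, constant_db):
    """Replace non-constants with X1, X2, ... in order of appearance and record the mapping."""
    standard, mapping = [], {}
    for segment in segments:
        if segment in constant_db:
            standard.append(segment)
        else:
            variable = f"X{len(mapping) + 1}"
            mapping[variable] = segment
            standard.append(variable)
    return standard, mapping

constant_db = {"a kind of", "of"}
segments = ["a kind of", "vision protection", "of", "equipment"]
standard, mapping = to_standard_statement(segments, constant_db)
```

The recorded mapping matches the correspondence given in the text: X1 for "vision protection" and X2 for "equipment".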
Step 203: search the template base for a target template that matches the standard statement and, if one is found, translate the standard statement according to the target template to obtain an initial translation result.
Here, if the standard statement "X2 of a kind of X1" matches a target template in the template base, the initial translation result "a kind of X2 for X1" is obtained from the translation associated with the target template.
Step 204: replace each predetermined variable in the initial translation result with its corresponding non-constant, according to the recorded correspondence.
For example, if the correspondence between predetermined variables and non-constants is X1 = vision protection and X2 = equipment, the initial translation result after replacement is "a kind of equipment for vision protection", in which the non-constants are still in the source language.
Step 205: translate the non-constants in the initial translation result after replacement to obtain the translation result.
Here a translation module can be used to translate the non-constants, which are then inserted at the corresponding positions to give the translation result. For example, translating "equipment" gives "equipment" and translating "vision protection" gives "protecting eyesight"; inserting these at the corresponding positions of the initial translation result gives the translation result "a kind of equipment for protecting eyesight".
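Steps 203 to 205 can be sketched together as below. For brevity the sketch merges steps 204 and 205 by translating each non-constant at the moment its variable is filled in, which is a simplification of the two-step description. The dictionary-based template base and the lookup table that stands in for a translation module are hypothetical; the strings are the English renderings quoted in the text.

```python
import re

# Hypothetical template base: standard statement -> translated template.
TEMPLATE_BASE = {"X2 of a kind of X1": "a kind of X2 for X1"}

# Hypothetical stand-in for a translation module that handles non-constants.
NON_CONSTANT_TRANSLATIONS = {
    "vision protection": "protecting eyesight",
    "equipment": "equipment",
}

def translate_with_template(standard_statement, mapping, template_base, translate):
    """Look up the target template, then fill each variable with its translated non-constant."""
    target = template_base.get(standard_statement)
    if target is None:
        return None  # no matching template; another translation path would be needed

    def fill(match):
        # match.group(0) is a variable such as "X1"; look up and translate its non-constant.
        return translate(mapping[match.group(0)])

    # \bX\d+\b matches whole variables only, so X1 is never confused with X10.
    return re.sub(r"\bX\d+\b", fill, target)

mapping = {"X1": "vision protection", "X2": "equipment"}
result = translate_with_template(
    "X2 of a kind of X1", mapping, TEMPLATE_BASE, NON_CONSTANT_TRANSLATIONS.__getitem__
)
```

Filling variables through a single regular-expression pass, rather than repeated string replacement, avoids accidentally rewriting text that an earlier substitution has already inserted.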
It will be appreciated that, in the foregoing embodiment, the constants and non-constants in a source statement are determined by matching against the constants in the constant database. The character strings of the same source statement may therefore be split at different positions and matched to different constants in the constant database, in which case the resulting non-constants change accordingly. After replacement, the same source statement may yield a single standard statement or several standard statements.
For a source statement that yields several standard statements, several translation results can be obtained by the foregoing method. These can all be output together for the user to choose from, or they can be scored and ranked so that one comparatively accurate translation result is selected for output.
It will be appreciated that, besides using a predetermined translation module to translate non-constants, the system can also be provided with a special translation database that stores translation data specific to a particular field. For example, in the e-commerce field, named entities such as product words, brand words, product models and company names can be stored in the special translation database. Regular named entities such as times can also be stored there.
Referring to Fig. 3, which shows embodiment three of the machine translation data processing method of the present application, translating the non-constants in step 205 of embodiment two may further comprise the following step:
Step 301: query the special translation database to determine whether the non-constant is a special named entity. If so, translate the non-constant according to the special translation database; if not, translate it with the predetermined translation module.
When the non-constant is a named entity such as a product word, brand word or model, it can be translated directly from the data in the special translation database. If the non-constant is a regular named entity, such as a time, the part other than the numerals is translated according to the translation rule for times in the special translation database, and the numerals of the non-constant are then filled back in. The numerals themselves may be translated or left unchanged.
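A minimal sketch of step 301 follows. The dictionary contents, the time-unit rule and the function name `translate_non_constant` are hypothetical, chosen only to show the lookup order: special named entity first, then a regular (time-like) entity whose numerals are kept, then the predetermined translation module as a fallback.

```python
import re

# Hypothetical special translation database for an e-commerce domain.
SPECIAL_ENTITIES = {"阿里巴巴": "Alibaba"}

# Hypothetical translation rule for time expressions: only the unit is translated.
TIME_UNITS = {"分钟": "minutes", "小时": "hours"}


def translate_non_constant(text, fallback_translate):
    """Translate one non-constant, preferring the special database."""
    # Case 1: a special named entity stored verbatim in the database.
    if text in SPECIAL_ENTITIES:
        return SPECIAL_ENTITIES[text]
    # Case 2: a regular entity such as a time; translate the non-numeric
    # part by rule and fill the numerals back in unchanged.
    match = re.fullmatch(r"(\d+)(\D+)", text)
    if match and match.group(2) in TIME_UNITS:
        number, unit = match.groups()
        return f"{number} {TIME_UNITS[unit]}"
    # Case 3: not a special named entity; use the predetermined module.
    return fallback_translate(text)
```

Here `fallback_translate` stands for whatever predetermined translation module the system uses; it is passed in so that the sketch stays independent of any particular engine.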
It will be appreciated that the methods of embodiments two and three can be applied on their own or combined with common existing translation methods, such as translation memory. Specifically, after a source statement to be translated is received, it is first translated by the translation memory method; if that method cannot produce a translation, processing continues with the method of the foregoing embodiments. Because the method of the foregoing embodiments is available, the translation memory need not store a large amount of instance data. This reduces the use of system storage space while still allowing translation requests to be answered quickly, which improves system processing speed and ensures the accuracy of the translation result.
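The combination with translation memory amounts to a lookup followed by a fallback. The sketch below assumes the memory is a plain dictionary of previously translated statements and that `fallback_translate` is any callable implementing the method of the foregoing embodiments; both names are hypothetical.

```python
def translate_with_fallback(source, translation_memory, fallback_translate):
    """Try the translation memory first, then fall back to another method.

    Returns the translation and a label saying which path produced it.
    """
    stored = translation_memory.get(source)
    if stored is not None:
        return stored, "memory"
    # The memory holds only common statements, so everything else is
    # handled by the template-based method.
    return fallback_translate(source), "fallback"
```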
The machine translation data processing method of the present application translates the source statement with a combination of multiple (at least two) translation modules, which can improve system response speed while ensuring the accuracy of the translation result. In particular, because the translation modules are trained in advance to determine their weighting values, the multiple translation results can be scored and ranked to obtain a comparatively accurate translation result.
Secondly, the source statement is converted into a standard statement and matched against the templates in the template base. The constants in the standard statement are translated, and the final sentence pattern of the translation is determined, according to the template; this part is static translation. Only the non-constants require dynamic translation, so the actual translation workload is small, which improves translation speed and system response time and reduces the use of system resources. In addition, because the template ensures that the sentence pattern of the translation result is accurate, translation quality improves, and the extra system load caused by users repeating queries after inaccurate results is avoided.
In addition, a special translation database set up for a particular field allows special non-constants or non-standard statements to be translated by direct lookup, which reduces the workload of subsequent dynamic translation, improves system response time and also helps ensure translation quality.
Referring to Fig. 4, which shows embodiment one of the machine translation data processing system of the present application, the system comprises a translation result acquisition module 60, a translation module weighting value determination module 62, a translation result basic score determination module 61, a final score calculation module 63 and a result output module 65.
The translation result acquisition module 60 obtains the translation results output by at least two translation modules.
The translation module weighting value determination module 62 determines the weighting value of each translation module.
The translation result basic score determination module 61 calculates the basic score of each translation result.
The final score calculation module 63 calculates the final score of each translation result from its basic score and the weighting value of the corresponding translation module.
The result output module 65 selects some or all of the translation results for output according to their final scores.
Preferably, the translation module weighting value determination module 62 comprises a training unit, a score evaluation unit, an error rate calculation module and a weighting value calculation unit. The training unit inputs training samples into each translation module to obtain output results. The score evaluation unit scores the output results to obtain their scores. The error rate calculation module determines the error rate of each translation module from the scores of that module's output results. The weighting value calculation unit determines the weighting value of each translation module from its error rate.
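As a hedged sketch of what these units compute, the code below follows the definitions given in the claims: the error rate of a translation module is the sum of the initial weights of the training samples whose output score is below the standard value, and the weighting value is log(1 / (error rate / (1 - error rate))). Uniform initial weights and the natural logarithm are assumptions made here for illustration.

```python
import math


def module_error_rate(scores, standard, initial_weights):
    """Sum the initial weights of samples whose output scored below standard."""
    return sum(w for s, w in zip(scores, initial_weights) if s < standard)


def module_weighting_value(error_rate):
    """Weighting value log(1 / (e / (1 - e))) for an error rate e in (0, 1)."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error rate must lie strictly between 0 and 1")
    return math.log(1.0 / (error_rate / (1.0 - error_rate)))


# Example: four training samples with uniform initial weights (an assumption).
scores = [0.9, 0.4, 0.8, 0.7]
weights = [0.25, 0.25, 0.25, 0.25]
error = module_error_rate(scores, standard=0.5, initial_weights=weights)
weighting_value = module_weighting_value(error)
```

With this formula a module with an error rate below 0.5 receives a positive weighting value, and the value grows as the error rate falls.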
Preferably, the translation result basic score determination module 61 comprises a final weight calculation unit, a similarity calculation unit and a basic score calculation unit. The final weight calculation unit determines the final weight of each training sample in each translation module from the error rate. The similarity calculation unit determines the similarity between the source statement to be translated that corresponds to a translation result and each training sample. The basic score calculation unit calculates the basic score of a translation result produced by a translation module from the final weight of each training sample in that translation module and the similarity of each training sample to the source statement to be translated.
It will be appreciated that the modules of embodiment one can be integrated into a predetermined model. The weights or formulas that each module needs are obtained by training in advance and can then be used directly at run time.
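Once the weighting values are available, run-time selection reduces to scoring and sorting. The text says only that the final score is based on the basic score and the weighting value of the corresponding translation module; the product used below is an assumption, and `rank_results` is a hypothetical name.

```python
def rank_results(candidates, top_n=None):
    """Rank candidate translations by final score, highest first.

    candidates: list of (translation, basic_score, module_weighting_value).
    top_n: how many results to output; None outputs all of them.
    """
    # Assumed combination rule: final score = basic score * weighting value.
    scored = [(text, base * weight) for text, base, weight in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_n is None else scored[:top_n]
```

Passing `top_n=1` corresponds to outputting the single best result, while `top_n=None` corresponds to outputting all results for the user to choose from.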
Referring to Fig. 5, which shows embodiment two of the machine translation data processing system of the present application, the system further comprises, on the basis of embodiment one, a data reception module 10, a standard statement determination module 20, a template translation module 30, a replacement module 40 and translation modules 50.
The data reception module 10 receives the source statement to be translated and scans it to obtain its constants and non-constants. Preferably, the data reception module comprises a character string scanning unit and a character string matching unit. The character string scanning unit scans the character strings contained in the source statement. The character string matching unit matches the constants in the constant database against each character string; a character string that matches is treated as a constant, and one that does not is treated as a non-constant.
The standard statement determination module 20 replaces the non-constants in the source statement with predetermined variables to obtain the standard statement, and records the correspondence between the predetermined variables and the non-constants.
The template translation module 30 searches the template base for a target template that matches the standard statement and, if one is found, translates the standard statement according to the target template to obtain an initial translation result.
The replacement module 40 replaces each predetermined variable in the initial translation result with its corresponding non-constant, according to the correspondence between the predetermined variables and the non-constants.
The translation module 50 translates the non-constants in the initial translation result after replacement to obtain the translation result. There are at least two translation modules 50; there may be two, three or more. The translation result acquisition module 60 obtains the corresponding translation results from the translation modules 50.
Referring to Fig. 6, which shows embodiment three of the machine translation data processing system of the present application, the system further comprises a special translation database 67, used to determine whether a non-constant is a special named entity. If so, the non-constant is translated according to the special translation database 67; if not, the translation module 50 is triggered to translate it.
Preferably, referring to Fig. 7, which shows embodiment four of the machine translation data processing system of the present application, the system can further include a memory translation module 70 in addition to the foregoing modules. The memory translation module 70 translates common statements from translation memory; statements that it cannot translate are passed to the subsequent modules for translation. This combination reduces the amount of instance data stored in the memory module while still translating source statements accurately, so it reduces the use of system storage space and ensures the accuracy of the translation result.
Preferably, the present application also relates to a terminal comprising the machine translation data processing system of the foregoing embodiments one, two or three. The terminal can be a client or a server. If the terminal is a server, a standalone structure can be used: the machine translation data processing system is placed in the server, and the same server also receives translation requests. In this case the system can be placed in the server directly, with all modules physically deployed as a whole.
Preferably, the modules of the machine translation data processing system can instead be physically deployed within the server as a distributed structure. Specifically, the system can be divided into three physical layers in the server. The first layer is the data receiving layer, which receives the translation requests sent by clients and performs functions such as splitting, merging and distributing data. The second layer is the translation logic layer and the third layer is the database (the modules of the foregoing machine translation data processing system are placed in the second and third layers respectively); together they translate the data distributed by the data receiving layer. The translation logic layer can contain multiple independent translation nodes, and the data receiving layer can distribute data directly to a specific translation node. When the data volume is small, the second and third layers can be merged in physical deployment; when the data volume is large, the two are deployed separately. The three layers can be deployed in the server in a row-and-column structure and exchange data by sending requests to one another.
When the distributed structure is implemented within a single server as described above, a load balancing module can also be built into the server to distribute data evenly among the translation nodes of the second layer. With this layered physical arrangement, translation tasks are spread over multiple translation nodes, which improves efficiency. Computing resources can be adjusted dynamically as needed to meet the requirements of large-volume, real-time translation tasks, giving good scalability and elasticity. Distributed computing also avoids the risk of the whole system collapsing because a single node fails, giving good fault tolerance.
In addition, it will be appreciated that, besides a distributed structure within one server, a distributed structure of multiple servers can be used. That is, the terminal can comprise a front-end server and at least one back-end server, with the machine translation data processing system placed in the at least one back-end server. The front-end server receives translation requests containing source statements to be translated and distributes them to the machine translation data processing system in the at least one back-end server. The number of back-end servers can be set according to actual needs.
Referring to Fig. 8, which shows terminal embodiment one, the terminal comprises a front-end server, multiple back-end servers and a load balancing module.
The load balancing module can be placed in the front-end server or provided separately. It allocates the translation requests received by the front-end server, each containing a source statement to be translated, according to the real-time status of each back-end server, and the front-end server forwards each request to the allocated back-end server.
Preferably, the back-end servers can also be categorized, for example with each responsible for translation between a different pair of languages, so that the load balancing module can distribute requests according to the languages each back-end server handles. For example, back-end server 1 handles English-Chinese translation, back-end server 2 handles German-Chinese translation, and so on. The load balancing module then distributes each request according to the translation requirement stated in it.
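A minimal sketch of such a dispatch rule is shown below. It assumes that the real-time status of a back-end server can be summarized by a count of active requests and that each server handles exactly one language pair; the class `BackendServer` and the function `pick_server` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BackendServer:
    name: str
    language_pair: tuple      # e.g. ("en", "zh")
    active_requests: int = 0  # assumed proxy for real-time status


def pick_server(servers, language_pair):
    """Choose the least-loaded back-end server for a language pair."""
    eligible = [s for s in servers if s.language_pair == language_pair]
    if not eligible:
        raise LookupError(f"no back-end server handles {language_pair}")
    chosen = min(eligible, key=lambda s: s.active_requests)
    chosen.active_requests += 1  # the request is now assigned to this server
    return chosen
```

Adding a back-end server for a busy language pair only requires appending another `BackendServer` to the list, which mirrors the way the number of back-end servers can be increased to raise capacity.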
The distributed structure keeps the load on the back-end servers balanced. Because the number of back-end servers can be increased, the load capacity of the system improves, system response is faster and processing efficiency is higher.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and for the parts that are the same or similar the embodiments can be referred to one another. The system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The machine translation data processing method, system and the terminal that above the application are provided are described in detail, applied principle and the embodiment of specific case to the application herein and set forth, the explanation of above embodiment is just for helping to understand the application's method and core concept thereof; , for one of ordinary skill in the art, according to the application's thought, all will change in specific embodiments and applications, in sum, this description should not be construed as the restriction to the application meanwhile.

Claims (18)

1. A machine translation data processing method, characterized by comprising the following steps:
Obtaining the translation results output by at least two translation modules;
Determining the weighting value of each translation module;
Calculating the basic score of each translation result;
Calculating the final score of each translation result based on the basic score of the translation result and the weighting value of the corresponding translation module;
Selecting some or all of the translation results for output according to the final scores of the translation results.
2. The machine translation data processing method of claim 1, characterized in that the at least two translation modules have different translation rules, and the translation results are the results obtained when the at least two translation modules each translate the same source statement to be translated according to their own translation rules.
3. The machine translation data processing method of claim 1, characterized in that determining the weighting value of each translation module comprises:
Inputting training samples into each translation module to obtain output results;
Scoring the output results to obtain the scores of the output results;
Determining the error rate of each translation module based on the scores of the output results corresponding to that translation module;
Determining the weighting value of each translation module according to the error rate.
4. The machine translation data processing method of claim 3, characterized in that the error rate of a translation module is the sum of the initial weights of the training samples whose output result scores are less than a standard value;
The weighting value of the translation module is determined from the error rate by the following formula:
log(1 / (error rate / (1 - error rate))).
5. The machine translation data processing method of claim 3, characterized in that calculating the basic score of each translation result comprises:
Determining the final weight of each training sample in each translation module according to the error rate;
Determining the similarity between the source statement to be translated that corresponds to the translation result and each training sample;
Calculating the basic score of the translation result produced by a translation module according to the final weight of each training sample in that translation module and the similarity of each training sample to the source statement to be translated.
6. The machine translation data processing method of claim 5, characterized in that determining the final weight of each training sample in each translation module according to the error rate comprises:
If the score of the output result obtained when a training sample is translated by a translation module is less than the standard value, the final weight of that training sample in that translation module is: initial weight × error rate / (1 - error rate); if the score of the output result is greater than the standard value, the final weight of that training sample is the initial weight.
7. The machine translation data processing method of any one of claims 1 to 6, characterized by further comprising the following steps before obtaining the translation results output by at least two translation modules:
Receiving a source statement to be translated, and scanning it to obtain the constants and non-constants in the source statement;
Replacing the non-constants in the source statement with predetermined variables to obtain a standard statement, and recording the correspondence between the predetermined variables and the non-constants;
Searching a template base for a target template that matches the standard statement and, if one is found, translating the standard statement according to the target template to obtain an initial translation result;
Replacing each predetermined variable in the initial translation result with its corresponding non-constant, according to the correspondence between the predetermined variables and the non-constants;
Translating the non-constants in the initial translation result after replacement to obtain a translation result.
8. The machine translation data processing method of claim 7, characterized in that scanning to obtain the constants and non-constants in the source statement comprises:
Scanning the character strings contained in the source statement;
Matching the constants contained in a constant database against each character string; if a character string matches, treating it as a constant, and if it does not match, treating it as a non-constant.
9. The machine translation data processing method of claim 7, characterized in that translating the non-constants in the initial translation result after replacement comprises:
Translating the non-constants with a predetermined translation module; or
Querying a special translation database to determine whether a non-constant is a special named entity; if so, translating the non-constant that is a special named entity according to the special translation database; if not, translating the non-constant that is not a special named entity with the predetermined translation module.
10. A machine translation data processing system, characterized by comprising:
A translation result acquisition module, for obtaining the translation results output by at least two translation modules;
A translation module weighting value determination module, for determining the weighting value of each translation module;
A translation result basic score determination module, for calculating the basic score of each translation result;
A final score calculation module, for calculating the final score of each translation result based on the basic score of the translation result and the weighting value of the corresponding translation module;
A result output module, for selecting some or all of the translation results for output according to the final score of each translation result.
11. The machine translation data processing system of claim 10, characterized in that the translation module weighting value determination module comprises:
A training unit, for inputting training samples into each translation module to obtain output results;
A score evaluation unit, for scoring the output results to obtain the scores of the output results;
An error rate calculation module, for determining the error rate of each translation module based on the scores of the output results corresponding to that translation module;
A weighting value calculation unit, for determining the weighting value of each translation module according to the error rate.
12. The machine translation data processing system of claim 11, characterized in that the translation result basic score determination module comprises:
A final weight calculation unit, for determining the final weight of each training sample in each translation module according to the error rate;
A similarity calculation unit, for determining the similarity between the source statement to be translated that corresponds to the translation result and each training sample;
A basic score calculation unit, for calculating the basic score of the translation result produced by a translation module according to the weight of each training sample in that translation module and the similarity of each training sample to the source statement to be translated.
13. The machine translation data processing system of any one of claims 10 to 12, characterized in that the system further comprises:
A data reception module, for receiving a source statement to be translated and scanning it to obtain the constants and non-constants in the source statement;
A standard statement determination module, for replacing the non-constants in the source statement with predetermined variables to obtain a standard statement, and recording the correspondence between the predetermined variables and the non-constants;
A template translation module, for searching a template base for a target template that matches the standard statement and, if one is found, translating the standard statement according to the target template to obtain an initial translation result;
A replacement module, for replacing each predetermined variable in the initial translation result with its corresponding non-constant, according to the correspondence between the predetermined variables and the non-constants;
A translation module, for translating the non-constants in the initial translation result after replacement to obtain a translation result.
14. The machine translation data processing system of claim 13, characterized in that the data reception module comprises:
A character string scanning unit, for scanning the character strings contained in the source statement;
A character string matching unit, for matching the constants contained in a constant database against each character string; if a character string matches, it is treated as a constant, and if it does not match, it is treated as a non-constant.
15. The machine translation data processing system of claim 13, characterized in that the system further comprises:
A special translation database, used to determine whether a non-constant is a special named entity; if so, the non-constant that is a special named entity is translated according to the special translation database; if not, the translation module is triggered to translate the non-constant that is not a special named entity.
16. A terminal, characterized by comprising the machine translation data processing system of any one of claims 10 to 15.
17. The terminal of claim 16, characterized in that the terminal has a distributed structure comprising a front-end server and at least one back-end server, and the machine translation data processing system is placed in the at least one back-end server; the front-end server receives translation requests containing source statements to be translated and distributes them to the machine translation data processing system in the at least one back-end server.
18. The terminal of claim 17, characterized in that the number of back-end servers is greater than 1, and the terminal further comprises:
A load balancing module, for allocating the translation requests received by the front-end server, which are distributed to the corresponding back-end servers by the front-end server, so that the load on the back-end servers is balanced.
CN201210459144.4A 2012-11-14 2012-11-14 Machine translation data processing method, system and terminal Active CN103810159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210459144.4A CN103810159B (en) 2012-11-14 2012-11-14 Machine translation data processing method, system and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210459144.4A CN103810159B (en) 2012-11-14 2012-11-14 Machine translation data processing method, system and terminal

Publications (2)

Publication Number Publication Date
CN103810159A true CN103810159A (en) 2014-05-21
CN103810159B CN103810159B (en) 2017-03-01

Family

ID=50706946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210459144.4A Active CN103810159B (en) 2012-11-14 2012-11-14 Machine translation data processing method, system and terminal

Country Status (1)

Country Link
CN (1) CN103810159B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104503960A (en) * 2015-01-07 2015-04-08 渤海大学 Text data processing method for English translation
CN109213902A (en) * 2017-07-03 2019-01-15 上海全土豆文化传播有限公司 Information processing and methods of exhibiting and device
CN109617974A (en) * 2018-12-21 2019-04-12 珠海金山办公软件有限公司 A kind of request processing method, device and server
CN109766561A (en) * 2019-01-17 2019-05-17 陕西译喵网络科技有限公司 A kind of interpretation method based on translation quality, device, terminal and storage medium
CN111144111A (en) * 2019-12-30 2020-05-12 北京世纪好未来教育科技有限公司 Translation method, device, equipment and storage medium
CN111553174A (en) * 2020-04-02 2020-08-18 腾讯科技(深圳)有限公司 Sentence translation method and device based on artificial intelligence
CN114048759A (en) * 2021-11-16 2022-02-15 北京百度网讯科技有限公司 Model training method, data processing method, device, equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271461A (en) * 2007-03-19 2008-09-24 株式会社东芝 Cross-language retrieval request conversion and cross-language information retrieval method and system
US20100023315A1 (en) * 2008-07-25 2010-01-28 Microsoft Corporation Random walk restarts in minimum error rate training
CN101714136A (en) * 2008-10-06 2010-05-26 株式会社东芝 Method and device for adapting a machine translation system based on language database to new field
CN101788978A (en) * 2009-12-30 2010-07-28 中国科学院自动化研究所 Chinese and foreign spoken language automatic translation method combining Chinese pinyin and character

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104503960A (en) * 2015-01-07 2015-04-08 渤海大学 Text data processing method for English translation
CN109213902A (en) * 2017-07-03 2019-01-15 上海全土豆文化传播有限公司 Information processing and display method and device
CN109617974A (en) * 2018-12-21 2019-04-12 珠海金山办公软件有限公司 Request processing method, device and server
CN109766561A (en) * 2019-01-17 2019-05-17 陕西译喵网络科技有限公司 Interpretation method based on translation quality, device, terminal and storage medium
CN111144111A (en) * 2019-12-30 2020-05-12 北京世纪好未来教育科技有限公司 Translation method, device, equipment and storage medium
CN111553174A (en) * 2020-04-02 2020-08-18 腾讯科技(深圳)有限公司 Sentence translation method and device based on artificial intelligence
CN114048759A (en) * 2021-11-16 2022-02-15 北京百度网讯科技有限公司 Model training method, data processing method, device, equipment and medium

Also Published As

Publication number Publication date
CN103810159B (en) 2017-03-01

Similar Documents

Publication Publication Date Title
CN103810159A (en) Machine translation data processing method, system and terminal
CN105335519B (en) Model generation method and device and recommendation method and device
Wang et al. Improved multi-order distributed HOSVD with its incremental computing for smart city services
CN105247507B (en) Method, system and storage medium for determining the influence score of a brand
US10191946B2 (en) Answering natural language table queries through semantic table representation
KR101991320B1 (en) Method for extending ontology using resources represented by the ontology
WO2019190689A1 (en) Syntax Based Source Code Search
EP3065068A1 (en) Method and apparatus for determining semantic matching degree
WO2012085518A1 (en) Method and apparatus for processing electronic data
CN105243149B (en) Semantic-based web query recommendation method and system
CN103544623A (en) Web service recommendation method based on user preference feature modeling
JP6363682B2 (en) Method for selecting an image that matches content based on the metadata of the image and content
CN104067273A (en) Grouping search results into a profile page
WO2023045187A1 (en) Semantic search method and apparatus based on neural network, device, and storage medium
US20140095490A1 (en) Ranking supervised hashing
Langarica Córdoba et al. An observer–based scheme for decentralized stabilization of large‐scale systems with application to power systems
CN102156733A (en) Search engine and method based on service oriented architecture
WO2022166689A1 (en) Information retrieval method and related system, and storage medium
Jiang et al. Arnoldi-based model reduction for fractional order linear systems
CN103744970B (en) Method and device for determining the descriptor of a picture
CN104021117A (en) Language processing method and electronic device
Zhang et al. Cellular neural network modelling of soft tissue dynamics for surgical simulation
CN110083674B (en) Intellectual property information processing method and device
US10789510B2 (en) Dynamic minibatch sizes
Liu et al. Ace-Sniper: Cloud-Edge Collaborative Scheduling Framework With DNN Inference Latency Modeling on Heterogeneous Devices

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1195148; Country of ref document: HK)

GR01 Patent grant
REG Reference to a national code (Ref country code: HK; Ref legal event code: GR; Ref document number: 1195148; Country of ref document: HK)

TR01 Transfer of patent right

Effective date of registration: 2022-11-09
Address after: Room 211, Floor 2, Building 5, No. 699, Wangshang Road, Changhe Street, Binjiang District, Hangzhou City, Zhejiang Province
Patentee after: Hangzhou Alibaba Zhirong Digital Technology Co.,Ltd.
Address before: Box 847, four, Grand Cayman capital, Cayman Islands, UK
Patentee before: ALIBABA GROUP HOLDING Ltd.