CN109344409B

CN109344409B - Translation robot selection method

Info

Publication number: CN109344409B
Application number: CN201811091026.6A
Authority: CN
Inventors: 何征宇; 何恩培; 郑丽华; 王莲
Original assignee: Transn Iol Technology Co ltd
Current assignee: Transn Iol Technology Co ltd
Priority date: 2018-09-19
Filing date: 2018-09-19
Publication date: 2023-10-27
Anticipated expiration: 2038-09-19
Also published as: CN109344409A

Abstract

The invention provides a method for selecting translation robots. A suitable number of robots is selected from a group of translation robots according to the attributes of the data to be translated. The attributes of each translation robot can also be updated from its translation results, which gives a more accurate basis for the next selection. With this method, the more translation robots there are, the more historical records are available as a reference for selection, and the better the selection works. Translators can also control how many robots are selected by making the matching condition stricter or looser, according to the required translation precision and market demand.

Description

Translation robot selection method

Technical Field

The invention belongs to the technical field of translation, and in particular relates to a method for selecting translation robots.

Background

In the translation field there are many different translation tools, each with its own strengths. To obtain an accurate result for a given piece of data to be translated, a translator usually runs several translation tools at once to obtain several candidate translations, scores each candidate with a language model or scoring algorithm, and keeps the highest-scoring candidate as the final translation. Such prior art includes the machine translation result selection method disclosed in application No. 2012103205447, among others.

However, this related art selects translation tools blindly: every tool is tried regardless of what the data to be translated are, and every result from every tool is scored. Although scoring does identify a relatively good translation result or translation tool, the process is complex to implement. When there is a large amount of data to be translated and many tools are available, it becomes time-consuming and laborious and lowers translation efficiency. Translation tools are meant to assist human translators and improve efficiency, so having more tools should bring more convenience, accuracy, and efficiency. Under the prior-art scheme, however, the more translation tools there are, the higher the execution cost.

Translators therefore need a method for adaptively selecting translation tools, such that using the selected tools improves both translation efficiency and accuracy.

Disclosure of Invention

To solve these technical problems, the invention provides a method for selecting translation robots that selects a suitable number of robots from a group of translation robots according to the attributes of the data to be translated. During translation, the attributes of each translation robot can also be updated from its translation results, giving a more accurate basis for the next selection.

In a first aspect of the present invention, there is provided a method of selecting a translation robot, the method comprising:

(1) Analyzing the attribute of the data to be translated, and determining a domain keyword data set corresponding to the data to be translated;

(2) Selecting a translation robot matched with the domain keyword data set from a candidate translation robot group;

characterized in that:

analyzing the attributes of the data to be translated comprises:

randomly extracting a first predetermined proportion of corpus from the data to be translated, and performing natural-language-processing-based word segmentation on the extracted corpus; and determining at least one item of domain keyword data from the word segmentation result;

step (2) comprises:

analyzing the historical translation result records of each candidate robot, and randomly extracting a second predetermined proportion of corpus from those records;

and computing a matching degree score for the at least one item of domain keyword data in the second predetermined proportion of corpus, and selecting the translation robots that meet a matching condition based on the matching degree score.

As an important improvement of the invention, the data to be translated are not segmented in full when their attributes are analyzed; random extraction is used instead, which greatly reduces the workload.

Reducing the workload, however, presupposes that the segmentation result remains representative and accurate, and purely untargeted random extraction cannot guarantee that accuracy.

Therefore, randomly extracting the first predetermined proportion of corpus from the data to be translated includes:

(21) randomly extracting a third predetermined proportion of corpus from the beginning of the data to be translated; and/or

(22) randomly extracting a fourth predetermined proportion of corpus from the end of the data to be translated.

As a further improvement, this random extraction is targeted: it is performed on the beginning and/or the end of the data to be translated, which is one of the innovative aspects of the invention.
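A minimal Python sketch of the head/tail extraction in steps (21) and (22) follows. The function name `sample_head_tail`, the sentence-level granularity, and the `region_ratio` parameter (how much of the document counts as its "beginning" or "end") are hypothetical choices that the text does not specify; `head_ratio` and `tail_ratio` stand in for the third and fourth predetermined proportions, measured against the whole document.

```python
import random


def sample_head_tail(sentences, head_ratio, tail_ratio, region_ratio=0.2, seed=None):
    """Randomly sample sentences from the beginning and the end of a document.

    head_ratio / tail_ratio: stand-ins for the third and fourth predetermined
    proportions, measured against the whole document.
    region_ratio: HYPOTHETICAL share of the document treated as its
    "beginning" (and, separately, as its "end").
    """
    rng = random.Random(seed)
    n = len(sentences)
    region = max(1, int(n * region_ratio))
    head_pool = list(range(0, min(region, n)))      # indices in the beginning
    tail_pool = list(range(max(0, n - region), n))  # indices in the end
    k_head = min(len(head_pool), round(n * head_ratio))
    k_tail = min(len(tail_pool), round(n * tail_ratio))
    # For short documents the two pools can overlap, so the union may be smaller.
    picked = set(rng.sample(head_pool, k_head)) | set(rng.sample(tail_pool, k_tail))
    # Return the sample in document order.
    return [sentences[i] for i in sorted(picked)]


doc = [f"sentence {i}" for i in range(100)]
sample = sample_head_tail(doc, head_ratio=0.05, tail_ratio=0.05, seed=42)
print(len(sample))  # 10: five from the first 20 sentences, five from the last 20
```

Only the sampled sentences, rather than the whole document, would then be passed to word segmentation.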

Word segmentation of the first predetermined proportion of corpus yields at least one item of domain keyword data.

Next, the invention selects, from among several candidate translation robots, the robot best suited to translating the current data to be translated.

Unlike the prior art, which submits the data to be translated to several translation robots and then scores and selects among their outputs, the invention selects the most suitable translation robot before translation.

Specifically, the invention makes full use of the existing historical translation result records of the candidate translation robots.

The number of historical records differs between candidate translation robots: some robots may have many records, while others may have none.

For candidate translation robots with many historical records, the invention randomly extracts a second predetermined proportion of corpus from those records.

As another improvement of the invention, when extracting the second predetermined proportion of corpus, the invention preferentially selects the historical records closest to the current time.

As another improvement of the invention, when extracting the second predetermined proportion of corpus, the invention randomly selects historical records from several different time periods.
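The two preferences for sampling a robot's history (the most recent records first, plus records from different time periods) could be combined as in the Python sketch below. `HistoryRecord`, `sample_history`, `recent_share`, and `n_periods` are hypothetical: the text does not say how the quota is split between recent and older records or how time periods are defined, so here half of the quota goes to the newest records and the rest is drawn round-robin from time-ordered buckets of nearly equal size.

```python
import random
from dataclasses import dataclass


@dataclass
class HistoryRecord:  # HYPOTHETICAL record layout
    timestamp: float  # when the translation was produced
    text: str         # the robot's translated output


def sample_history(records, ratio, recent_share=0.5, n_periods=4, seed=None):
    """Pick about ratio * len(records) history records.

    ASSUMPTIONS: `recent_share` of the quota goes to the newest records;
    the rest is drawn at random, round-robin, from `n_periods` time-ordered
    buckets of (almost) equal size so that different periods are covered.
    """
    if not records:
        return []
    rng = random.Random(seed)
    ordered = sorted(records, key=lambda r: r.timestamp, reverse=True)  # newest first
    quota = min(len(ordered), max(1, round(ratio * len(ordered))))
    n_recent = min(quota, round(quota * recent_share))
    chosen = ordered[:n_recent]
    rest = ordered[n_recent:]
    size = -(-len(rest) // n_periods)  # ceiling division; 0 if rest is empty
    periods = [rest[i:i + size] for i in range(0, len(rest), size)] if size else []
    for bucket in periods:
        rng.shuffle(bucket)
    # Take one random record per period in turn until the quota is filled.
    while len(chosen) < quota and any(periods):
        for bucket in periods:
            if bucket and len(chosen) < quota:
                chosen.append(bucket.pop())
    return chosen


recs = [HistoryRecord(float(t), f"translation {t}") for t in range(100)]
picked = sample_history(recs, ratio=0.2, seed=0)
print(len(picked))  # 20: the 10 newest records plus 10 spread over older periods
```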

On this basis, matching selection begins. It is mainly based on the matching degree score of the at least one item of domain keyword data in the second predetermined proportion of corpus. The score can combine several dimensions of the domain keyword data, such as how often and where they occur in that corpus; a specific scoring scheme is given in the Detailed Description.

Candidate translation robots without historical records are selected directly as translation robots that meet the matching condition.
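The selection rule of step (2), including the direct selection of robots without history, might look like the following sketch. `select_robots`, `score_fn`, and the toy `count_score` are hypothetical names; taking each robot's best keyword score is an assumption, since the text does not say how the scores of several keywords are combined.

```python
import math
from typing import Callable, Dict, List, Sequence


def select_robots(
    histories: Dict[str, Sequence[str]],
    keywords: Sequence[str],
    score_fn: Callable[[str, Sequence[str]], float],
    threshold: float,
) -> List[str]:
    """Return the robots that meet the matching condition.

    histories maps a robot id to its sampled history corpus (possibly empty).
    ASSUMPTION: a robot matches if its best keyword score exceeds `threshold`.
    """
    selected = []
    for robot, corpus in histories.items():
        if not corpus:
            selected.append(robot)  # no history: selected directly
            continue
        best = max((score_fn(k, corpus) for k in keywords), default=-math.inf)
        if best > threshold:
            selected.append(robot)
    return selected


def count_score(word, corpus):  # toy score for illustration only
    return sum(line.split().count(word) for line in corpus)


hist = {"A": ["patent claim patent"], "B": ["soup recipe"], "C": []}
print(select_robots(hist, ["patent"], count_score, threshold=1))  # ['A', 'C']
```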

In another aspect, the invention provides a system for selecting and updating translation robots, which implements the foregoing selection method and updates the historical records of the translation robots based on the translation results of the selected robots.
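The update side of the system can be sketched as a store that appends each selected robot's translation result to its history, so the next selection has more material to sample. `RobotHistoryStore` and its methods are hypothetical names; the text only states that the history is updated from the translation results.

```python
import time
from collections import defaultdict


class RobotHistoryStore:
    """HYPOTHETICAL in-memory store of translation result records per robot."""

    def __init__(self):
        self._records = defaultdict(list)  # robot id -> [(timestamp, text), ...]

    def update(self, robot_id, translated_text, timestamp=None):
        """Append a selected robot's translation result to its history."""
        ts = time.time() if timestamp is None else timestamp
        self._records[robot_id].append((ts, translated_text))

    def history(self, robot_id):
        # .get() avoids creating an empty entry for unknown robots
        return list(self._records.get(robot_id, []))

    def has_history(self, robot_id):
        return bool(self._records.get(robot_id))


store = RobotHistoryStore()
print(store.has_history("robot-1"))  # False: would be selected directly
store.update("robot-1", "The patent claims a selection method.", timestamp=1.0)
print(store.has_history("robot-1"))  # True: will be scored next time
```

Once a robot without history has been used and updated, it no longer falls under the "no history, select directly" rule and is scored like the other candidates.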

With this selection method, the invention makes full use of the functional characteristics and translation directions of the existing translation robots to select the most suitable robot quickly. More importantly, matching and selection are completed before translation rather than after it, as in the prior art, which greatly improves efficiency. The more translation robots there are, the more historical records are available as a reference for selection, and the better the selection works. Translators can also control how many robots are selected by making the matching condition stricter or looser, according to the required translation precision and market demand.

Drawings

FIG. 1 is a flow chart of the translation robot selection method of the present invention.

FIG. 2 shows a specific implementation of determining the domain keyword dataset.

Detailed Description

Referring to FIG. 1, the method includes two main parts:

(1) analyzing the attributes of the data to be translated, and determining the domain keyword dataset corresponding to the data to be translated;

(2) selecting, from the candidate translation robot group, the translation robots that match the domain keyword dataset.

Referring to FIG. 2, a specific embodiment of step (1) in FIG. 1 includes:

(11) randomly extracting a first predetermined proportion of corpus from the data to be translated, and performing natural-language-processing-based word segmentation on the corpus;

(12) determining at least one item of domain keyword data from the word segmentation result.

In a specific implementation, the domain keyword data are the translations, into the corresponding language, of the words in the word segmentation result set.
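As a rough illustration of steps (11) and (12), the sketch below uses a regular-expression tokenizer as a stand-in for NLP word segmentation (it is not one; Chinese text, for instance, needs a real segmenter), keeps the most frequent words found in a hypothetical source-to-target `glossary`, and returns their translations as the domain keyword data. The glossary, `top_k`, and the frequency ranking are all assumptions.

```python
import re
from collections import Counter


def domain_keywords(sample_text, glossary, top_k=5):
    """Return target-language translations of the most frequent glossary words.

    ASSUMPTIONS: `glossary` (source word -> target-language word) is a
    hypothetical domain glossary, and the regex tokenizer is only a stand-in
    for NLP word segmentation.
    """
    tokens = [t.lower() for t in re.findall(r"\w+", sample_text)]
    counts = Counter(t for t in tokens if t in glossary)
    return [glossary[word] for word, _ in counts.most_common(top_k)]


glossary = {"patent": "brevet", "claim": "revendication"}
text = "The patent claim cites an earlier patent."
print(domain_keywords(text, glossary))  # ['brevet', 'revendication']
```

Returning target-language words lets the keywords be matched directly against the robots' historical translation results, which are in the target language.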

In a specific implementation, step (2) analyzes the historical translation result records of each candidate translation robot and randomly extracts a second predetermined proportion of corpus from those records.

On this basis, matching selection begins; it is mainly based on the matching degree score of the at least one item of domain keyword data in the second predetermined proportion of corpus. The score can combine several dimensions of the domain keyword data, such as the frequency, distribution position, and letter case of their occurrences in that corpus.

For example, if a domain keyword occurs frequently and in capitalized form in the second predetermined proportion of corpus, its score is higher. The score can be computed with the following formulas:

M1 = exp(X) + lg(Y);

M2 = exp(X) + lg(Z);

where X describes the letter case: X is 1 for uppercase and 0 for lowercase; Y is the frequency with which the domain keyword occurs in the second predetermined proportion of corpus, expressed as a percentage;

Z is the distribution position: the sum of the representative values determined by where, and how many times, the domain keyword occurs in the second predetermined proportion of corpus. The representative values are taken as follows:

if the keyword occurs z1 times in the first 1/10 or the last 1/10 of that corpus, the representative value is exp(z1^10);

if it occurs z2 times in the part between 1/10 and 1/5, the representative value is exp(z2^5);

if it occurs z3 times in the part between 1/5 and 9/10, the representative value is exp(z3^9);

so that Z = exp(z1^10) + exp(z2^5) + exp(z3^9).

The matching condition may be: M2 is greater than e+3;

the matching condition may also be: M1 is greater than e+9.

Here e is the base of the natural logarithm, exp() denotes the exponential function with base e, lg() denotes the base-10 logarithm, and z^num denotes z raised to the power num.
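These scoring rules can be turned into a Python sketch such as the one below. Several details are assumptions not fixed by the text: the corpus is a token list and a position is measured as token index divided by corpus length; region boundaries use half-open intervals; X is 1 if any occurrence of the keyword starts with an uppercase letter; and a region with no occurrences still contributes exp(0) = 1, following the sum literally. Because exp(z^10) already overflows a double-precision float at z = 2, lg(Z) is computed in the log domain (log-sum-exp). All function names are hypothetical.

```python
import math

E = math.e


def region_exponent(frac):
    """Exponent applied to the occurrence count for relative position frac in [0, 1)."""
    if frac < 0.1 or frac >= 0.9:
        return 10  # first or last 1/10 -> exp(z1^10)
    if frac < 0.2:
        return 5   # between 1/10 and 1/5 -> exp(z2^5)
    return 9       # between 1/5 and 9/10 -> exp(z3^9)


def log10_sum_exp(exponents):
    """lg(sum(exp(a) for a in exponents)), computed without overflow."""
    m = max(exponents)
    return (m + math.log(sum(math.exp(a - m) for a in exponents))) / math.log(10)


def keyword_scores(keyword, tokens):
    """Return (M1, M2) for one domain keyword in a tokenized history corpus."""
    n = len(tokens)
    hits = [i for i, t in enumerate(tokens) if t.lower() == keyword.lower()]
    z = {10: 0, 5: 0, 9: 0}  # exponent -> z1 / z2 / z3 occurrence counts
    for i in hits:
        z[region_exponent(i / n)] += 1
    # ASSUMPTION: X = 1 if any occurrence starts with an uppercase letter.
    x = 1 if any(tokens[i][:1].isupper() for i in hits) else 0
    y = 100.0 * len(hits) / n if n else 0.0  # frequency as a percentage
    lg_y = math.log10(y) if y > 0 else -math.inf
    # Literal reading: Z = exp(z1^10) + exp(z2^5) + exp(z3^9), zero counts included.
    lg_z = log10_sum_exp([z[10] ** 10, z[5] ** 5, z[9] ** 9])
    return math.exp(x) + lg_y, math.exp(x) + lg_z


def meets_m2(m2):
    return m2 > E + 3


def meets_m1(m1):
    return m1 > E + 9


tokens = ["Patent"] + ["x"] * 9 + ["patent"] + ["x"] * 8 + ["patent"]
m1, m2 = keyword_scores("patent", tokens)
print(round(m1, 2), round(m2, 1), meets_m2(m2), meets_m1(m1))
```

In this 20-token example the keyword appears as "Patent" at position 0 and as "patent" at positions 10 and 19, so z1 = 2, z2 = 0, z3 = 1 and X = 1. Then lg(Z) is about 1024/ln 10 ≈ 444.7, so M2 easily exceeds e+3, while M1 = e + lg(15) ≈ 3.89. If Y is read as a percentage (at most 100), lg(Y) ≤ 2, so M1 cannot exceed e+2 and the e+9 condition is never met under this reading; Y may be intended on a different scale.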

The formulas for M1 and M2 give priority to letter case rather than treating frequency as the only index, which makes them more accurate than prior art that considers only high frequency.

If a candidate translation robot has no historical records, it may embody the newest translation technology, and it should therefore be given priority.

In practice, the translator can set the matching condition according to the number of candidate robots, the requirements of the clients who supplied the material to be translated, and similar factors. For example, a stricter matching condition yields fewer translation robots but better ensures the accuracy of the translation result.
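One way a translator could apply this is to derive the threshold from a desired maximum number of robots. The helper below is a hypothetical illustration, not part of the described method: given each robot's best score (`None` for robots without history, which are always kept), it returns the smallest threshold for which at most `max_robots` scored robots pass a strict `score > threshold` test.

```python
import math


def threshold_for_at_most(scores, max_robots):
    """Smallest threshold letting at most `max_robots` scored robots pass.

    scores maps robot id -> best matching score, or None for a robot
    without history (always selected, so not counted against the cap).
    """
    ranked = sorted((s for s in scores.values() if s is not None), reverse=True)
    if len(ranked) <= max_robots:
        return -math.inf          # every scored robot can pass
    return ranked[max_robots]     # only scores strictly above this pass


scores = {"A": 5.0, "B": 9.0, "C": 7.0, "D": None}
t = threshold_for_at_most(scores, 2)
chosen = [r for r, s in scores.items() if s is None or s > t]
print(t, chosen)  # 5.0 ['B', 'C', 'D']
```

Because the test is strict, robots tied at the cut-off score are excluded, so fewer than `max_robots` scored robots may pass.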

Claims

1. A method of selecting a translation robot, the method comprising the steps of:

S1: randomly extracting a first predetermined proportion of corpus from the data to be translated, and performing natural-language-processing-based word segmentation on the corpus; and determining at least one item of domain keyword data from the word segmentation result;

S2: analyzing the historical translation result records of each candidate robot, and randomly extracting a second predetermined proportion of corpus from those records;

computing matching degree scores of the at least one item of domain keyword data in the second predetermined proportion of corpus, and selecting, from the candidate robots, the translation robots that meet a matching condition based on the matching degree scores;

wherein, in step S1, randomly extracting the first predetermined proportion of corpus from the data to be translated includes:

randomly extracting a third predetermined proportion of corpus from the beginning of the data to be translated;

randomly extracting a fourth predetermined proportion of corpus from the end of the data to be translated;

in step S2, when the second predetermined proportion of corpus is randomly extracted, the historical records closest to the current time are selected preferentially;

the matching degree score is determined from the frequency, distribution position, and letter case of the occurrences of the domain keyword data in the second predetermined proportion of corpus;

the score is computed with the following formulas:

M1 = exp(X) + lg(Y); M2 = exp(X) + lg(Z);

wherein X describes the letter case, X being 1 for uppercase and 0 for lowercase;

Y is the frequency with which the domain keyword occurs in the second predetermined proportion of corpus, expressed as a percentage;

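Z is the distribution position, namely the sum of the representative values for the positions and numbers of occurrences of the domain keyword in the second predetermined proportion of corpus: for z1 occurrences in the first 1/10 or the last 1/10, the representative value is exp(z1^10);

for z2 occurrences in the part between 1/10 and 1/5, the representative value is exp(z2^5);

for z3 occurrences in the part between 1/5 and 9/10, the representative value is exp(z3^9);

e is the base of the natural logarithm, exp() denotes the exponential function with base e, and z^num denotes z raised to the power num.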

2. The method for selecting a translation robot according to claim 1, wherein the matching condition is: the M2 score is greater than e+3.

3. The method for selecting a translation robot according to claim 1, wherein the matching condition is: M1 is greater than e+9.

4. A selection system of translation robots comprising functional modules for performing the steps of the method of claim 1 or 2.

5. An updating system for translation robots, for use with the translation robot selection system of claim 4 to update the attributes of the translation robots, the attributes including the robots' translation result records.