CN109344409B - Translation robot selection method - Google Patents
- Publication number
- CN109344409B
- Authority
- CN
- China
- Prior art keywords
- translation
- corpus
- robot
- data
- translated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a method for selecting translation robots that can select an appropriate number of robots from a group of translation robots according to the attributes of the data to be translated. The attributes of a translation robot can also be updated based on its translation results during translation, which provides a more accurate reference for the next selection. With this method, the more translation robots there are, the more reference history is available for selection and the better the selection works. In addition, translation personnel can control how many translation robots are selected by raising or lowering the matching condition, according to the actual translation accuracy requirement and the demands of the translation market.
Description
Technical Field
The invention belongs to the technical field of translation, and particularly relates to a selection method of a translation robot.
Background
In the translation field there are many different translation tools, each with a different emphasis. To ensure an accurate result for a given piece of data to be translated, a translator usually runs several translation tools at once to obtain several candidate translation results, scores each candidate with a corresponding language model or scoring algorithm, and selects the highest-scoring candidate as the final translation result. Such prior art includes the machine translation result selection method disclosed in application No. 2012103205447, among others.
However, this prior art selects translation tools blindly: whatever the data to be translated is, every translation tool is tried and every result from every tool is scored. Although scoring can identify a relatively optimal translation result or translation tool, the process is complex to implement. Especially when there is a large amount of data to be translated and many translation tools are available, the whole process is time-consuming and laborious, and translation efficiency drops. Translation tools are meant to assist manual translation and improve efficiency, so having more tools should bring more convenience, accuracy and efficiency; under the prior-art scheme, however, the more translation tools there are, the higher the execution cost.
In view of this, translators urgently need a method for adaptively selecting translation tools, so that translation efficiency and accuracy improve when the selected tools are used.
Disclosure of Invention
In order to solve the technical problems, the invention provides a method for selecting translation robots, which can select a proper number of robots from a translation robot group for translation according to the attribute of data to be translated; and the attribute of the translation robot can be updated based on the translation result in the translation process, so that a more accurate reference basis is provided for the next selection of the translation robot.
In a first aspect of the present invention, there is provided a method of selecting a translation robot, the method comprising:
(1) Analyzing the attribute of the data to be translated, and determining a domain keyword data set corresponding to the data to be translated;
(2) Selecting a translation robot matched with the domain keyword data set from a candidate translation robot group;
characterized in that
analyzing the attribute of the data to be translated comprises:
randomly extracting a corpus of a first predetermined proportion from the data to be translated, and performing word segmentation based on natural language processing on the corpus; determining at least one item of domain keyword data based on the result of the word segmentation;
step (2) comprises:
analyzing the historical translation result records of each candidate robot, and randomly extracting a corpus of a second predetermined proportion from those records;
and analyzing the matching degree score of the at least one item of domain keyword data in the corpus of the second predetermined proportion, and selecting the translation robots that meet the matching condition based on the matching degree score.
As an important improvement of the invention, when analyzing the attribute of the data to be translated, word segmentation is not performed directly on all of the data; instead, a random extraction method is adopted, which greatly reduces the workload.
Of course, the workload may only be reduced if the word segmentation result remains representative and accurate; simple, untargeted random extraction cannot ensure that accuracy.
Therefore, randomly extracting the corpus of the first predetermined proportion from the data to be translated includes:
(21) randomly extracting a corpus of a third predetermined proportion from the beginning of the data to be translated,
and/or
(22) randomly extracting a corpus of a fourth predetermined proportion from the end of the data to be translated.
As a further improvement of the invention, the random extraction is highly targeted: it must start from the beginning of the data to be translated and/or from the end of the data to be translated, which is one of the innovative aspects of the invention.
At least one item of domain keyword data can be obtained from the randomly extracted corpus of the first predetermined proportion.
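The head-and-tail extraction described above can be sketched in Python as follows. This is only an illustrative reading, not the patented implementation: the function name `sample_head_tail`, the idea of treating the first and last 20% of the segments as the "beginning" and "end", and the parameter names `third_ratio` and `fourth_ratio` are all assumptions introduced for the example.

```python
import random


def sample_head_tail(segments, head_frac=0.2, tail_frac=0.2,
                     third_ratio=0.5, fourth_ratio=0.5, seed=None):
    """Randomly sample segments from the beginning and end of a document.

    segments     -- list of sentences or paragraphs of the data to be translated
    head_frac    -- assumed share of the document treated as its "beginning"
    tail_frac    -- assumed share of the document treated as its "end"
    third_ratio  -- share of the beginning region that is sampled
    fourth_ratio -- share of the end region that is sampled
    """
    rng = random.Random(seed)
    n = len(segments)
    head = segments[:int(n * head_frac)]
    tail = segments[n - int(n * tail_frac):]
    picked_head = rng.sample(head, int(len(head) * third_ratio))
    picked_tail = rng.sample(tail, int(len(tail) * fourth_ratio))
    return picked_head + picked_tail


# Example: 100 segments, 10 drawn from the first 20 and 10 from the last 20.
document = [f"segment {i}" for i in range(100)]
corpus = sample_head_tail(document, seed=0)
print(len(corpus))
```

Passing a `seed` makes the draw reproducible, which may help when comparing selections across runs.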
Next, the invention selects, from among a plurality of candidate translation robots, the translation robot most suitable for translating the current data to be translated.
Unlike the prior art, which submits the data to be translated to several translation robots and then scores and selects among the results, the invention selects the most suitable translation robot before translation.
Specifically, the invention fully utilizes the existing translation result history record of the candidate translation robot.
Of course, the number of historical translation result records differs between candidate translation robots: some robots may have many records, and some may have none.
For candidate translation robots with many historical records, the invention randomly extracts a corpus of a second predetermined proportion from the historical translation result records;
as another improvement of the invention, when extracting the corpus of the second predetermined proportion, the historical records nearest to the current time node are preferably selected;
as another improvement of the invention, when extracting the corpus of the second predetermined proportion, a plurality of historical records from different time periods are randomly selected.
On this basis, the invention performs matching selection, which is based mainly on the matching degree score of the at least one item of domain keyword data in the corpus of the second predetermined proportion. The matching degree score can take several dimensions into account, such as how frequently the domain keyword data occurs in the corpus of the second predetermined proportion and where it occurs; a specific scoring method is described in the Detailed Description section.
For candidate translation robots without historical records, the invention directly selects them as translation robots meeting the matching condition.
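The two ways of drawing the corpus of the second predetermined proportion from a robot's history can be sketched as below. The `HistoryRecord` type, the function `sample_history` and its parameters are hypothetical names chosen for illustration. The sketch also assumes that "preferring the nearest records" means taking the most recent ones outright, and it simplifies "different time periods" to a uniform random draw over the whole history.

```python
import random
from dataclasses import dataclass


@dataclass
class HistoryRecord:
    timestamp: float  # e.g. seconds since the epoch (assumed representation)
    text: str         # a stored translation result


def sample_history(records, second_ratio=0.1, prefer_recent=True, seed=None):
    """Extract a share of a robot's historical translation results.

    Returns an empty list for a robot without history; the caller can then
    select that robot directly, as the description states.
    """
    if not records:
        return []
    k = max(1, int(len(records) * second_ratio))
    if prefer_recent:
        # Assumed reading: keep the k records closest to the current time.
        newest_first = sorted(records, key=lambda r: r.timestamp, reverse=True)
        return newest_first[:k]
    # Simplified alternative: a uniform random draw across all time periods.
    return random.Random(seed).sample(records, k)


history = [HistoryRecord(float(i), f"translation {i}") for i in range(20)]
recent = sample_history(history, second_ratio=0.25)
print([r.timestamp for r in recent])
```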
In another aspect of the present invention, a system for selecting and updating a translation robot is provided, which is used for implementing the foregoing method for selecting a translation robot, and updating a history of the translation robot based on a translation result of the selected robot.
Through this selection method, the invention can make full use of the functional characteristics and translation directions of existing translation robots to rapidly select the most suitable robot. More importantly, the invention completes matching selection before translation begins, instead of translating first and selecting afterwards as in the prior art, which greatly improves efficiency. With the method of the invention, the more translation robots there are, the more reference history is available for selection and the better the selection works. In addition, translation personnel can control how many translation robots are selected by raising or lowering the matching condition, according to the actual translation accuracy requirement and the demands of the translation market.
Drawings
FIG. 1 is a flow chart of a selection method of the translation robot of the present invention
FIG. 2 is a specific implementation of determining a domain keyword dataset
Detailed Description
Referring to fig. 1, the method includes two major parts:
(1) Analyzing the attribute of the data to be translated, and determining the domain keyword data set corresponding to the data to be translated;
(2) Selecting a translation robot matched with the domain keyword data set from the candidate translation robot group.
Referring to fig. 2, a specific embodiment of the implementation of step (1) in fig. 1 includes:
(11) Randomly extracting a first predetermined proportion of corpus from the data to be translated, and performing word segmentation based on natural language processing on the corpus;
(12) Determining at least one item of domain keyword data based on the word segmentation result.
In a specific implementation, the domain keyword data are the translations, in the corresponding language, of the words in the word segmentation result set.
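Steps (11) and (12) can be illustrated with a small Python sketch. The patent does not specify the segmenter or how keywords are chosen from the segmentation result, so everything here is an assumption: a regular expression stands in for a real NLP word segmenter, keywords are ranked by raw frequency, and `glossary` is a hypothetical source-to-target dictionary used to obtain the translated words.

```python
import re
from collections import Counter

# Assumed stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


def segment(text):
    """Stand-in word segmentation: split on runs of ASCII letters."""
    return re.findall(r"[A-Za-z]+", text)


def domain_keywords(corpus, glossary, top_k=3):
    """Return target-language translations of the most frequent words.

    corpus   -- sampled segments of the data to be translated
    glossary -- hypothetical dict mapping source words to their translations
    """
    counts = Counter(
        word.lower()
        for seg in corpus
        for word in segment(seg)
        if word.lower() not in STOPWORDS
    )
    keywords = []
    for word, _ in counts.most_common():
        if word in glossary:
            keywords.append(glossary[word])
        if len(keywords) == top_k:
            break
    return keywords


sample = ["The patent claims a patent right.", "Patent law and claims."]
print(domain_keywords(sample, {"patent": "专利", "claims": "权利要求"}))
```

Words that have no entry in the glossary are skipped, so the result may contain fewer than `top_k` keywords.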
In a specific implementation, step (2) analyzes the historical translation result records of each candidate translation robot and randomly extracts a corpus of a second predetermined proportion from those records;
it then analyzes the matching degree score of the at least one item of domain keyword data in the corpus of the second predetermined proportion, and selects the translation robots that meet the matching condition based on the matching degree score.
On this basis, matching selection begins. It is based mainly on the matching degree score of the at least one item of domain keyword data in the corpus of the second predetermined proportion. The matching degree score can take several dimensions into account, such as how frequently the domain keyword data occurs in the corpus of the second predetermined proportion, where it occurs, and its letter case.
As an example, if a domain keyword appears more frequently and in upper case in the corpus of the second predetermined proportion, the score of that word is higher. The score can be measured by the following formulas:
M1 = exp(X) + lg(Y);
M2 = exp(X) + lg(Z);
where X describes the case state: X is 1 for upper case and 0 for lower case; Y is the frequency with which a given domain keyword occurs in the corpus of the second predetermined proportion, expressed as a percentage;
Z is the distribution position: the sum of the representative values determined by where the domain keyword appears in the corpus of the second predetermined proportion and how many times it appears there. The representative values are assigned according to the following standard:
if the keyword appears z1 times in the first 1/10 or the last 1/10 of the corpus of the second predetermined proportion, the representative value is exp(z1^10);
if it appears z2 times in the {1/10, 1/5} part, the representative value is exp(z2^5);
if it appears z3 times in the {1/5, 9/10} part, the representative value is exp(z3^9).
The matching condition may be: M2 is greater than e+3;
the matching condition may also be: M1 is greater than e+9.
Here e is the base of the natural logarithm, exp() denotes exponentiation with base e, and z^num denotes z raised to the power num.
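As a worked illustration of the formulas above, the following Python sketch evaluates M1 and M2. It rests on stated assumptions: lg is taken to be the base-10 logarithm, a position range with zero occurrences is assumed to contribute nothing to Z, and an undefined logarithm (Y or Z equal to zero) is reported as negative infinity. The function names are hypothetical. Because exp(z1^10) already exceeds double-precision range for z1 = 2, lg(Z) is computed in log space rather than by forming Z itself.

```python
import math


def m1_score(is_upper, freq_percent):
    """M1 = exp(X) + lg(Y), with Y the keyword frequency in percent."""
    if freq_percent <= 0:
        return float("-inf")  # assumption: lg(0) treated as "no match"
    x = 1 if is_upper else 0
    return math.exp(x) + math.log10(freq_percent)


def lg_position_sum(z1, z2, z3):
    """lg(Z) for Z = exp(z1^10) + exp(z2^5) + exp(z3^9), computed stably."""
    # Assumption: ranges with zero occurrences add no representative value.
    exponents = [z ** p for z, p in ((z1, 10), (z2, 5), (z3, 9)) if z > 0]
    if not exponents:
        return float("-inf")
    top = max(exponents)
    # log-sum-exp: ln(sum(exp(a))) = top + ln(sum(exp(a - top)))
    ln_z = top + math.log(sum(math.exp(a - top) for a in exponents))
    return ln_z / math.log(10)


def m2_score(is_upper, z1, z2, z3):
    """M2 = exp(X) + lg(Z)."""
    x = 1 if is_upper else 0
    return math.exp(x) + lg_position_sum(z1, z2, z3)


# Upper-case keyword at 10% frequency: M1 = e + lg(10) = e + 1.
print(m1_score(True, 10))
# Upper-case keyword seen twice in the {1/10, 1/5} part: Z = exp(2^5).
print(m2_score(True, 0, 2, 0) > math.e + 3)
```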
The above formulas give full weight to letter case rather than taking frequency as the only index, so they are more accurate than prior art that considers only high frequency.
Of course, if a candidate translation robot has no historical records, it may embody the most recently introduced translation technology and should be given priority.
In practical applications, the translator can set the matching condition according to the number of candidate robots, the requirements of the client for the material to be translated, and so on. For example, raising the matching condition yields fewer translation robots while still ensuring the accuracy of the translation results.
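The effect of the matching condition on the number of selected robots can be sketched as follows. The function `select_robots`, the robot names and the score values are invented for illustration, and a score of `None` is an assumed way of marking a robot that has no history and is therefore selected directly.

```python
import math


def select_robots(scores_by_robot, threshold):
    """Return the robots whose matching degree score exceeds the threshold.

    scores_by_robot -- dict of robot name -> score, or None when the robot
                       has no history (such robots are always selected)
    """
    return [
        name
        for name, score in scores_by_robot.items()
        if score is None or score > threshold
    ]


# Hypothetical scores for four candidate robots; "C" has no history.
scores = {"A": 5.0, "B": 7.5, "C": None, "D": 3.0}

print(select_robots(scores, 4.0))         # a lower condition keeps more robots
print(select_robots(scores, math.e + 3))  # the e+3 condition from the text
print(select_robots(scores, 8.0))         # a higher condition keeps fewer
```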
Claims (5)
1. A method of selecting a translation robot, the method comprising the steps of:
S1: randomly extracting a corpus of a first predetermined proportion from the data to be translated, and performing word segmentation based on natural language processing on the corpus; determining at least one item of domain keyword data based on the result of the word segmentation;
S2: analyzing the historical translation result records of candidate robots, and randomly extracting a corpus of a second predetermined proportion from the historical translation result records;
analyzing the matching degree score of the at least one item of domain keyword data in the corpus of the second predetermined proportion, and selecting, from the candidate robots, the translation robots that meet a matching condition based on the matching degree score;
wherein randomly extracting the corpus of the first predetermined proportion from the data to be translated in step S1 includes:
randomly extracting a corpus of a third predetermined proportion from the beginning of the data to be translated;
randomly extracting a corpus of a fourth predetermined proportion from the end of the data to be translated;
in step S2, when the corpus of the second predetermined proportion is randomly extracted, the historical records nearest to the current time node are preferentially selected;
the matching degree score is determined based on the frequency, distribution position and letter case with which the domain keyword data occurs in the corpus of the second predetermined proportion;
the score is measured using the following formulas:
M1 = exp(X) + lg(Y); M2 = exp(X) + lg(Z);
wherein X describes the case state: X is 1 for upper case and 0 for lower case;
Y is the frequency with which a given domain keyword occurs in the corpus of the second predetermined proportion, expressed as a percentage;
Z is the distribution position: the sum of the representative values determined by where the domain keyword appears in the corpus of the second predetermined proportion and how many times it appears there, the representative values being assigned according to the following standard:
if the keyword appears z1 times in the first 1/10 or the last 1/10 of the corpus of the second predetermined proportion, the representative value is exp(z1^10);
if it appears z2 times in the {1/10, 1/5} part, the representative value is exp(z2^5);
if it appears z3 times in the {1/5, 9/10} part, the representative value is exp(z3^9);
e is the base of the natural logarithm, exp() denotes exponentiation with base e, and z^num denotes z raised to the power num.
2. The method for selecting a translation robot according to claim 1, wherein the matching condition is: the M2 score is greater than e+3.
3. The method for selecting a translation robot according to claim 1, wherein the matching condition is: M1 is greater than e+9.
4. A selection system of translation robots comprising functional modules for performing the steps of the method of claim 1 or 2.
5. An updating system of a translation robot for use with the selection system of a translation robot of claim 4 for updating attributes of the translation robot, the translation robot attributes including a translation result record thereof.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811091026.6A CN109344409B (en) | 2018-09-19 | 2018-09-19 | Translation robot selection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811091026.6A CN109344409B (en) | 2018-09-19 | 2018-09-19 | Translation robot selection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109344409A (en) | 2019-02-15 |
CN109344409B (en) | 2023-10-27 |
Family
ID=65306035
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811091026.6A Active CN109344409B (en) | 2018-09-19 | 2018-09-19 | Translation robot selection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109344409B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111080092B (en) * | 2019-11-29 | 2023-04-18 | 北京云聚智慧科技有限公司 | Data annotation management method and device, electronic equipment and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101763403A (en) * | 2009-12-31 | 2010-06-30 | 哈尔滨工业大学 | Query translation method facing multi-lingual information retrieval system |
CN103064970A (en) * | 2012-12-31 | 2013-04-24 | 武汉传神信息技术有限公司 | Search method for optimizing translators |
CN103678285A (en) * | 2012-08-31 | 2014-03-26 | 富士通株式会社 | Machine translation method and machine translation system |
CN105138521A (en) * | 2015-08-27 | 2015-12-09 | 武汉传神信息技术有限公司 | General translator recommendation method for risk project in translation industry |
Also Published As
Publication number | Publication date |
---|---|
CN109344409A (en) | 2019-02-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||