CN109344409B - Translation robot selection method - Google Patents

Info

Publication number
CN109344409B
Authority
CN
China
Prior art keywords
translation
corpus
robot
data
translated
Legal status
Active
Application number
CN201811091026.6A
Other languages
Chinese (zh)
Other versions
CN109344409A
Inventor
何征宇
何恩培
郑丽华
王莲
Current Assignee
Transn Iol Technology Co ltd
Original Assignee
Transn Iol Technology Co ltd
Priority date
2018-09-19
Filing date
2018-09-19
Publication date
2023-10-27
Application filed by Transn Iol Technology Co ltd
Priority to CN201811091026.6A
Publication of CN109344409A
Application granted
Publication of CN109344409B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method for selecting translation robots that selects an appropriate number of robots from a group of translation robots according to the attributes of the data to be translated; the attributes of each translation robot can be updated based on its translation results, which provides a more accurate reference for the next selection. With this method, the more translation robots there are, the more historical records are available as a reference for selection, and the better the selection works. In addition, translators can control the number of selected translation robots by raising or lowering the matching threshold according to actual translation-accuracy requirements and market demand.

Description

Translation robot selection method
Technical Field
The invention belongs to the technical field of translation, and particularly relates to a selection method of a translation robot.
Background
In the translation field there are many different translation tools, each with a different emphasis. To ensure accurate results for a given piece of data to be translated, a translator usually runs several translation tools at the same time to obtain several candidate translation results, calculates a score for each candidate with a language model or scoring algorithm, and selects the highest-scoring candidate as the final translation result. Such prior art includes the machine translation result selection method disclosed in application No. 2012103205447.
However, the related art selects translation tools blindly: every tool is tried regardless of what the data to be translated is, and every result from every tool is scored. Although scoring can identify a relatively optimal translation result or tool, the process is complex to implement; when the volume of data to be translated is large and many translation tools are available, it becomes time-consuming and laborious and lowers translation efficiency. Translation tools are meant to assist human translation and improve efficiency, so having more tools should make translation more convenient, accurate and efficient; under the prior-art scheme, however, the more translation tools there are, the higher the execution cost.
In view of this, translators have a strong need for a method that adaptively selects translation tools, so that using the selected tools improves both translation efficiency and accuracy.
Disclosure of Invention
To solve these technical problems, the invention provides a method for selecting translation robots that selects an appropriate number of robots from a group of translation robots according to the attributes of the data to be translated; the attributes of each translation robot can be updated based on its translation results, which provides a more accurate reference for the next selection.
In a first aspect of the present invention, there is provided a method of selecting a translation robot, the method comprising:
(1) Analyzing the attributes of the data to be translated, and determining a domain keyword data set corresponding to the data to be translated;
(2) Selecting, from a group of candidate translation robots, a translation robot matching the domain keyword data set;
characterized in that
analyzing the attributes of the data to be translated comprises:
randomly extracting a corpus of a first predetermined proportion from the data to be translated, and performing natural-language-processing-based word segmentation on the corpus; and determining at least one item of domain keyword data based on the word segmentation result;
and step (2) comprises:
analyzing the historical translation result records of the candidate robots, and randomly extracting a corpus of a second predetermined proportion from the historical translation result records;
and analyzing the matching degree score of the at least one item of domain keyword data in the corpus of the second predetermined proportion, and selecting the translation robots that meet the matching condition based on the matching degree score.
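The first half of the method (random extraction, word segmentation, keyword determination) can be illustrated with the minimal sketch below. It is not the patented implementation: the regex-based `segment` function only stands in for a real NLP word segmenter (Chinese text would need a dedicated segmenter), sentences are assumed to be the sampling unit, and the rule that the `top_k` most frequent non-stopword tokens become the domain keywords is an assumption, since the text does not say how keywords are derived from the segmentation result. All function and parameter names are hypothetical.

```python
import random
import re
from collections import Counter
from typing import Optional


def segment(text: str) -> list[str]:
    """Stand-in for NLP word segmentation: split on runs of word characters."""
    return re.findall(r"\w+", text)


def extract_domain_keywords(sentences: list[str], first_ratio: float, top_k: int = 5,
                            stopwords: frozenset[str] = frozenset(),
                            seed: Optional[int] = None) -> list[str]:
    rng = random.Random(seed)
    # Randomly extract a first_ratio share of the sentences (at least one if any exist).
    k = min(len(sentences), max(1, round(first_ratio * len(sentences))))
    sample = rng.sample(sentences, k)
    # Segment the sample and count tokens case-insensitively, ignoring stopwords.
    counts = Counter(
        tok.lower() for s in sample for tok in segment(s) if tok.lower() not in stopwords
    )
    # Hypothetical rule: the top_k most frequent tokens are the domain keywords.
    return [word for word, _ in counts.most_common(top_k)]
```

Sampling a proportion instead of segmenting the whole document is what keeps the workload low; the keyword rule is the part most likely to differ in a real system.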
As an important improvement, when analyzing the attributes of the data to be translated, the invention does not perform word segmentation on all of the data to be translated but extracts a random sample, which greatly reduces the workload.
Reducing the workload in this way is only acceptable if the segmentation result remains representative and accurate, and an untargeted random extraction cannot guarantee this.
Therefore, randomly extracting the corpus of the first predetermined proportion from the data to be translated comprises:
(21) Randomly extracting a corpus of a third predetermined proportion from the beginning of the data to be translated;
and/or
(22) Randomly extracting a corpus of a fourth predetermined proportion from the end of the data to be translated.
As a further improvement, the random extraction is targeted: it must be performed from the beginning of the data to be translated and/or from its end, which is one of the innovative aspects of the invention.
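Steps (21) and (22) can be sketched as follows, under assumptions the text leaves open: the document is a sequence of units (e.g. sentences), the "beginning" and "end" are the first and last `window` fraction of those units, and the third and fourth proportions are taken relative to the whole document. The function name and the `window`, `use_head` and `use_tail` parameters are hypothetical; the two flags express the "and/or" choice.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_head_tail(units: Sequence[T], third_ratio: float, fourth_ratio: float,
                     window: float = 0.2, use_head: bool = True, use_tail: bool = True,
                     seed: Optional[int] = None) -> list[T]:
    """Randomly sample units from the beginning and/or end of a document."""
    if not 0 < window <= 0.5:
        raise ValueError("window must be in (0, 0.5]")
    rng = random.Random(seed)
    n = len(units)
    w = round(window * n)  # size of the head window and of the tail window
    chosen: set[int] = set()  # a set, so any head/tail overlap from rounding is not counted twice
    if use_head:
        # Step (21): third_ratio of the document, drawn from the head window.
        chosen.update(rng.sample(range(0, w), min(w, round(third_ratio * n))))
    if use_tail:
        # Step (22): fourth_ratio of the document, drawn from the tail window.
        chosen.update(rng.sample(range(n - w, n), min(w, round(fourth_ratio * n))))
    # Return the sampled units in their original document order.
    return [units[i] for i in sorted(chosen)]
```

For a 100-sentence document with both ratios at 0.05 and the default window, ten sentences are returned: five from the first twenty and five from the last twenty.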
Word segmentation of the randomly extracted corpus of the first predetermined proportion yields at least one item of domain keyword data.
Next, the translation robot most suitable for translating the current data to be translated is selected from among the candidate translation robots.
Unlike the prior art, which submits the data to be translated to several translation robots and then scores and selects among their results, the present invention selects the most suitable translation robot before translation.
Specifically, the invention makes full use of the existing historical translation result records of the candidate translation robots.
The number of such records differs between candidate robots: some have many, and some have none.
For candidate translation robots with many history records, a corpus of a second predetermined proportion is randomly extracted from their historical translation result records.
As a further improvement, the records closest to the current time are preferred when extracting the corpus of the second predetermined proportion.
As another improvement, records from several different time periods are randomly selected when extracting the corpus of the second predetermined proportion.
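The two history-sampling variants can be sketched as below. This is an illustration under assumptions: each record carries a timestamp, "different time periods" are modelled as `periods` consecutive buckets of equal record count rather than calendar intervals, and the same second proportion is drawn from each bucket. `HistoryRecord`, `sample_recent`, `sample_across_periods` and `periods` are hypothetical names.

```python
import random
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class HistoryRecord:
    timestamp: datetime  # when the robot produced this translation
    text: str            # the translated output


def sample_recent(records: list[HistoryRecord], second_ratio: float) -> list[HistoryRecord]:
    """Take the second_ratio share of records closest to the current time."""
    if not records:
        return []
    k = max(1, round(second_ratio * len(records)))
    return sorted(records, key=lambda r: r.timestamp, reverse=True)[:k]


def sample_across_periods(records: list[HistoryRecord], second_ratio: float,
                          periods: int = 3, seed: Optional[int] = None) -> list[HistoryRecord]:
    """Split the time-ordered history into consecutive buckets and draw randomly from each."""
    if not records:
        return []
    rng = random.Random(seed)
    ordered = sorted(records, key=lambda r: r.timestamp)
    size = -(-len(ordered) // periods)  # ceiling division
    picked: list[HistoryRecord] = []
    for start in range(0, len(ordered), size):
        bucket = ordered[start:start + size]
        k = min(len(bucket), max(1, round(second_ratio * len(bucket))))
        picked.extend(rng.sample(bucket, k))
    return picked
```

The recent-first variant favours a robot's current behaviour, while the per-period variant keeps older strengths represented in the sample.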
On this basis, matching selection is performed, mainly according to the matching degree score of the at least one item of domain keyword data in the corpus of the second predetermined proportion. The matching degree score can combine several dimensions of the domain keyword data in that corpus, such as its frequency of occurrence and its distribution position; a specific scoring method is given in the Detailed Description.
Candidate translation robots without history records are selected directly as translation robots meeting the matching condition.
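The selection loop, including the rule that robots without history are selected directly, might look like the sketch below. The scoring function is passed in, so any matching degree score can be plugged in. How the scores of several keywords are combined for one robot is not specified in the text; taking the best-scoring keyword is an assumption here, as are the names `select_robots`, `histories`, `score` and `threshold`.

```python
from typing import Callable, Sequence


def select_robots(histories: dict[str, list[str]], keywords: Sequence[str],
                  score: Callable[[str, list[str]], float], threshold: float) -> list[str]:
    """Return the names of candidate robots that meet the matching condition.

    `histories` maps a robot name to its sampled history corpus (hypothetical layout).
    """
    selected = []
    for name, corpus in histories.items():
        if not corpus:
            # No history records: the robot is selected directly.
            selected.append(name)
            continue
        # Assumption: a robot matches if its best-scoring keyword clears the threshold.
        best = max((score(kw, corpus) for kw in keywords), default=float("-inf"))
        if best > threshold:
            selected.append(name)
    return selected
```

For example, with a simple occurrence-count score and a threshold of 1, a robot whose history mentions "contract" twice is selected, a robot whose history never mentions it is not, and a robot with an empty history is selected anyway.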
In another aspect of the present invention, a system for selecting and updating translation robots is provided, which implements the foregoing selection method and updates the history records of the translation robots based on the translation results of the selected robots.
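The updating side of such a system can be pictured as a store of time-stamped translation result records per robot, as in the sketch below. The class and method names are hypothetical and the in-memory dictionary is only an illustration; the point is that each new result is time-stamped, so that recent-first sampling works, and that a newly registered robot has no history and is therefore selected directly.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class RobotRegistry:
    """Hypothetical store of each translation robot's translation result records."""
    records: dict[str, list[tuple[datetime, str]]] = field(default_factory=dict)

    def register(self, name: str) -> None:
        # A newly added robot starts with an empty history.
        self.records.setdefault(name, [])

    def update(self, name: str, translation: str, when: Optional[datetime] = None) -> None:
        # Append the selected robot's latest translation result with a timestamp,
        # so later sampling can prefer the records closest to the current time.
        self.records.setdefault(name, []).append((when or datetime.now(), translation))

    def has_history(self, name: str) -> bool:
        return bool(self.records.get(name))
```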
With this selection method, the invention makes full use of the functional characteristics and translation directions of existing translation robots to quickly select the most suitable robot. More importantly, matching selection is completed before translation, rather than translating first and selecting afterwards as in the prior art, which greatly improves efficiency. The more translation robots there are, the more historical records are available as a reference and the better the selection works; in addition, translators can control the number of selected translation robots by raising or lowering the matching threshold according to actual translation-accuracy requirements and market demand.
Drawings
FIG. 1 is a flow chart of the translation robot selection method of the present invention.
FIG. 2 shows a specific implementation of determining the domain keyword data set.
Detailed Description
Referring to FIG. 1, the method includes two main parts:
(1) Analyzing the attributes of the data to be translated, and determining the domain keyword data set corresponding to the data to be translated;
(2) Selecting a translation robot matching the domain keyword data set from the group of candidate translation robots.
Referring to FIG. 2, a specific embodiment of step (1) in FIG. 1 includes:
(11) Randomly extracting a corpus of a first predetermined proportion from the data to be translated, and performing natural-language-processing-based word segmentation on the corpus;
(12) Determining at least one item of domain keyword data based on the word segmentation result.
In a specific implementation, the domain keyword data are the translations, into the corresponding target language, of the words in the word segmentation result set.
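Because the candidate robots' history records are translations, the keywords are expressed in the target language before they are matched. A minimal sketch, assuming a plain bilingual glossary (a hypothetical `glossary` dictionary; a real system might use a terminology base or a translation engine instead): untranslatable words are dropped and duplicates are removed in first-seen order.

```python
from typing import Iterable


def to_target_language(words: Iterable[str], glossary: dict[str, str]) -> list[str]:
    """Map segmented source-language words to target-language domain keywords."""
    keywords: list[str] = []
    for word in words:
        translated = glossary.get(word)
        # Skip words the glossary cannot translate (e.g. function words) and repeats.
        if translated is not None and translated not in keywords:
            keywords.append(translated)
    return keywords
```

For instance, segmented words 专利, 机器人, 的, 机器人 with a glossary covering only the first two give the keywords `["patent", "robot"]`.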
In a specific implementation, step (2) analyzes the historical translation result records of each candidate translation robot and randomly extracts a corpus of a second predetermined proportion from them;
it then analyzes the matching degree score of the at least one item of domain keyword data in the corpus of the second predetermined proportion, and selects the translation robots meeting the matching condition based on that score.
The matching degree score can combine several dimensions of the domain keyword data in the corpus of the second predetermined proportion, such as frequency of occurrence, distribution position and letter case.
As an example, the more frequently a domain keyword appears in the corpus of the second predetermined proportion, and if it appears capitalized, the higher its score. The score can be measured by the following formulas:
$$M_1 = e^{X} + \lg Y$$
$$M_2 = e^{X} + \lg Z$$
where X describes the case state: X = 1 for upper case and X = 0 for lower case; Y is the frequency of occurrence of the domain keyword in the corpus of the second predetermined proportion, expressed as a percentage;
Z describes the distribution position: it is the sum of the representative values for the positions at which the domain keyword appears in the corpus of the second predetermined proportion, given the corresponding numbers of occurrences. The representative values are taken as follows:
if the keyword appears $z_1$ times in the first 1/10 or the last 1/10 of the corpus, the representative value is $e^{z_1^{10}}$;
if it appears $z_2$ times in the part from 1/10 to 1/5, the representative value is $e^{z_2^{5}}$;
if it appears $z_3$ times in the part from 1/5 to 9/10, the representative value is $e^{z_3^{9}}$;
so that
$$Z = e^{z_1^{10}} + e^{z_2^{5}} + e^{z_3^{9}}.$$
The matching condition may be: $M_2 > e + 3$;
the matching condition may also be: $M_1 > e + 9$.
Here e is the base of the natural logarithm, exp(·) denotes the exponential function with base e, $z^{num}$ denotes z raised to the power num, and lg denotes the base-10 logarithm.
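The scoring formulas can be sketched as follows for a single keyword. Several readings are assumptions, not statements from the text: X is taken as 1 if any occurrence starts with an upper-case letter, Y is the keyword count as a percentage of all tokens, the position bands are half-open intervals, and a keyword that does not occur gets a score of minus infinity. `match_scores` and `meets_condition` are hypothetical names. Since $e^{z_1^{10}}$ already overflows a double-precision float when $z_1 = 2$, lg Z is computed with the log-sum-exp identity instead of forming Z directly.

```python
import math
import re


def match_scores(corpus: str, keyword: str) -> tuple[float, float]:
    """Return (M1, M2) for one domain keyword in a sampled history corpus."""
    tokens = re.findall(r"\w+", corpus)
    n = len(tokens)
    hits = [i for i, tok in enumerate(tokens) if tok.lower() == keyword.lower()]
    if not hits:
        # Keyword absent: treat as no match (lg of zero is undefined).
        return float("-inf"), float("-inf")
    # X: assumed to be 1 if any occurrence starts with an upper-case letter.
    x = 1 if any(tokens[i][0].isupper() for i in hits) else 0
    # Y: frequency of the keyword as a percentage of all tokens.
    y = 100.0 * len(hits) / n
    # z1, z2, z3: occurrences in the three position bands.
    z1 = z2 = z3 = 0
    for i in hits:
        pos = i / n
        if pos < 0.1 or pos >= 0.9:
            z1 += 1
        elif pos < 0.2:
            z2 += 1
        else:
            z3 += 1
    # lg Z with Z = e^(z1^10) + e^(z2^5) + e^(z3^9), via log-sum-exp to avoid overflow.
    exponents = [z1 ** 10, z2 ** 5, z3 ** 9]
    top = max(exponents)
    ln_z = top + math.log(sum(math.exp(v - top) for v in exponents))
    lg_z = ln_z / math.log(10)
    m1 = math.exp(x) + math.log10(y)
    m2 = math.exp(x) + lg_z
    return m1, m2


def meets_condition(m1: float, m2: float, use_m2: bool = True) -> bool:
    """Matching condition: M2 > e + 3, or alternatively M1 > e + 9."""
    return m2 > math.e + 3 if use_m2 else m1 > math.e + 9
```

For a 20-token corpus in which "Robot" appears capitalized as the first and last token, $z_1 = 2$, X = 1 and Y = 10, so $M_1 = e + 1$ and $M_2 = e + 1024/\ln 10$, which satisfies the $M_2$ condition. If Y is read as a percentage of at most 100, lg Y is at most 2, so under this reading $M_1$ cannot exceed e + 2; the sketch keeps the stated thresholds unchanged.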
The above formulas give weight to letter case instead of using frequency as the only indicator, and are therefore more accurate than prior-art approaches that consider only high frequency.
If a candidate translation robot has no history, it may represent a newly introduced translation technology and should be given priority.
In practical applications, the translator can set the matching condition according to the number of candidate robots, the requirements of the client for the material to be translated, and similar factors; for example, raising the matching threshold yields fewer translation robots but ensures the accuracy of the translation result.
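The effect of raising the threshold can be checked with a small sketch that counts, for each candidate threshold, how many scored robots would be selected. The robot names and scores are hypothetical; robots without history are not included because they are selected regardless of the threshold.

```python
import math


def count_selected(best_scores: dict[str, float], thresholds: list[float]) -> list[int]:
    """For each threshold, count the robots whose best score exceeds it."""
    return [sum(1 for s in best_scores.values() if s > t) for t in thresholds]


scores = {"bot_a": 5.0, "bot_b": 7.5, "bot_c": 9.1}  # hypothetical best scores
print(count_selected(scores, [math.e + 3, math.e + 6]))  # [2, 1]
```

A stricter threshold can only keep or reduce the number of selected robots, which is the trade-off between coverage and accuracy described above.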

Claims (5)

1. A method of selecting a translation robot, the method comprising the steps of:
S1: randomly extracting a corpus of a first predetermined proportion from the data to be translated, and performing natural-language-processing-based word segmentation on the corpus; and determining at least one item of domain keyword data based on the word segmentation result;
S2: analyzing the historical translation result records of the candidate robots, and randomly extracting a corpus of a second predetermined proportion from the historical translation result records;
analyzing the matching degree score of the at least one item of domain keyword data in the corpus of the second predetermined proportion, and selecting, from the candidate robots, the translation robots meeting the matching condition based on the matching degree score;
wherein randomly extracting the corpus of the first predetermined proportion from the data to be translated in step S1 comprises:
randomly extracting a corpus of a third predetermined proportion from the beginning of the data to be translated;
randomly extracting a corpus of a fourth predetermined proportion from the end of the data to be translated;
in step S2, when the corpus of the second predetermined proportion is randomly extracted, the history records closest to the current time are preferentially selected;
the matching degree score is determined based on the frequency, distribution position and case of the occurrences of the domain keyword data in the corpus of the second predetermined proportion;
the score is measured using the following formulas:
$$M_1 = e^{X} + \lg Y; \qquad M_2 = e^{X} + \lg Z;$$
wherein X describes the case state, X being 1 for upper case and 0 for lower case;
Y is the frequency of occurrence of the domain keyword in the corpus of the second predetermined proportion, expressed as a percentage;
Z is the distribution position, namely the sum of the representative values for the positions at which the domain keyword appears in the corpus of the second predetermined proportion and the corresponding numbers of occurrences, the representative values being taken as follows:
if the keyword appears $z_1$ times in the first 1/10 or the last 1/10 of the corpus of the second predetermined proportion, the representative value is $e^{z_1^{10}}$;
if it appears $z_2$ times in the part from 1/10 to 1/5, the representative value is $e^{z_2^{5}}$;
if it appears $z_3$ times in the part from 1/5 to 9/10, the representative value is $e^{z_3^{9}}$;
e is the base of the natural logarithm, exp(·) denotes the exponential with base e, and $z^{num}$ denotes z raised to the power num.
2. The method for selecting a translation robot according to claim 1, wherein the matching condition is: the M2 score is greater than e+3.
3. The method for selecting a translation robot according to claim 1, wherein the matching condition is: the M1 score is greater than e+9.
4. A selection system of translation robots comprising functional modules for performing the steps of the method of claim 1 or 2.
5. An updating system of a translation robot, for use with the selection system of translation robots of claim 4, for updating the attributes of the translation robot, the attributes including its translation result records.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811091026.6A | 2018-09-19 | 2018-09-19 | Translation robot selection method

Publications (2)

Publication Number | Publication Date
CN109344409A | 2019-02-15
CN109344409B | 2023-10-27

Family

ID=65306035

Country Status (1)

Country Link
CN (1) CN109344409B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080092B (en) * 2019-11-29 2023-04-18 北京云聚智慧科技有限公司 Data annotation management method and device, electronic equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763403A (en) * 2009-12-31 2010-06-30 哈尔滨工业大学 Query translation method facing multi-lingual information retrieval system
CN103064970A (en) * 2012-12-31 2013-04-24 武汉传神信息技术有限公司 Search method for optimizing translators
CN103678285A (en) * 2012-08-31 2014-03-26 富士通株式会社 Machine translation method and machine translation system
CN105138521A (en) * 2015-08-27 2015-12-09 武汉传神信息技术有限公司 General translator recommendation method for risk project in translation industry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant