CN103714053A - Japanese verb identification method for machine translation - Google Patents

Publication number
CN103714053A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310569693.1A
Other languages
Chinese (zh)
Other versions
CN103714053B (en)
Inventor
张孝飞
胡月卿
马伟
金善花
孟翔
李彦刚
王强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhong Xian Electronic Technology Development Co., Ltd.
Original Assignee
Beijing Zhongxian Electronic Technology Development Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongxian Electronic Technology Development Center
Priority to CN201310569693.1A
Publication of CN103714053A
Application granted
Publication of CN103714053B
Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a Japanese verb identification method for machine translation and belongs to the field of natural language processing. By analyzing the conjugation rules of Japanese verbs, the method combines rules with a dictionary so that each verb in a text is identified as a complete word and restored to its base form. The method works with ordinary general-purpose dictionaries and is adaptable and robust. It improves the accuracy of lexical analysis and the quality of bilingual word alignment in machine translation, and thereby improves overall translation quality.

Description

A Japanese verb identification method for machine translation
Technical field
The invention belongs to the field of natural language processing and relates to a method for automatically identifying Japanese verbs, specifically a Japanese verb identification method for machine translation that combines rules with a dictionary.
Background
As scientific, technological, and cultural exchanges between China and Japan become more frequent, overcoming the barriers to understanding and converting between the two languages has become a key concern. Translating Japanese information into readable, intelligible Chinese promptly and accurately has theoretical value and, even more, practical necessity and urgency. In existing statistical machine translation systems, the parallel corpus must be word-segmented before training, and the quality of that segmentation directly affects translation quality. Because Japanese verbs have a large number of conjugated forms and dictionaries do not list them all, dictionary-based segmentation of Japanese verbs has never achieved satisfactory results. How to segment and identify verbs correctly, improve word alignment, and thereby raise overall machine translation quality is a problem that urgently needs solving.
In 2006, Taku Kudo of Japan developed MeCab, an open-source morphological analysis tool. MeCab is dictionary-driven: it correctly identifies Japanese verbs that the dictionary lists (base-form entries), but when it meets a conjugated form that the dictionary does not list, it splits the verb into two or more words and then tags each piece with a part of speech. Because this approach fails to segment the verb as one complete word, using it for segmentation preprocessing in statistical machine translation degrades bilingual word alignment, hampers the estimation of translation model probabilities, and lowers translation quality.
In China, the article "A morpheme-based Japanese word segmentation method and its application in an OCR system" (Microcomputer Information, a Chinese core journal, vol. 22, no. 1-3, 2006) proposed a morpheme-based Japanese segmentation method. Its main idea is to split each verb into two parts, a morpheme and a suffix, according to the features and conjugation rules of Japanese verbs, to store the two parts in two separate dictionaries, and then to identify Japanese verbs from them. The method was designed for OCR and aims to raise the OCR recognition rate; the recognized text needs no translation or other processing afterwards. Its weaknesses are that it likewise fails to cut out a conjugated word as one complete unit, and that it must process two dictionaries separately, so extracting the morpheme information costs both time and effort.
Summary of the invention
The method searches for candidate verbs mainly according to where verbs appear in a Japanese sentence and according to verb ending signs. Once a candidate verb is found, it is restored to its base form, and the restored form is verified by dictionary lookup. If the restored form has a corresponding entry in the dictionary, the restoration has succeeded and the word can be tagged with its part of speech. If the form produced by the restoration rules has no matching entry, the candidate verb is segmented and restored a second time; if it still has no corresponding entry, the text is left unchanged and no further processing is applied.
Features of Japanese verbs:
Japanese verbs mainly appear after particles, compound particles, and conjunctions;
the set of characters that can end a Japanese verb is limited;
the conjugated forms of Japanese verbs follow regular patterns.
Based on these features, the invention proposes a Japanese verb identification method that combines rules with a dictionary. The method comprises the following steps:
Step A: retrieve and mark special words that contain a left-adjacency sign (a character or character string) or an ending sign (a character), so that they do not take part in the subsequent verb identification.
The special words fall into two classes, special verbs and special non-verbs. A special verb is a Japanese verb whose own characters include a left-adjacency sign (character or character string) used in candidate lookup; a special non-verb is a non-verb that contains a verb ending sign character.
Step B: after the special words have been retrieved, search for candidate verbs.
Step C: restore each candidate verb found to its base form and verify by dictionary lookup whether it is correct.
Step D: tag the part of speech of each candidate verb that was restored successfully and has a corresponding dictionary entry.
Step B further comprises the following steps:
Step B1: retrieve the left-adjacency sign (character or character string) used for candidate verb lookup.
The left-adjacency signs (characters or character strings) comprise particles, compound particles, and conjunctions.
The ending signs (characters) comprise godan verb ending signs, ichidan verb ending signs, and conjugated-form ending signs.
Step B2: search for a candidate verb ending sign (character) within the corresponding range after the left-adjacency sign (character or character string).
Step B3: cut out the span from the character immediately after the left-adjacency sign (character or character string) through the candidate verb ending sign (character) as the candidate verb to be restored.
In summary, let $S$ be the input text string, $L$ the set of left-adjacency signs (characters or character strings) for verbs, and $E$ the set of ending sign characters. For any input text, a string that may contain a verb has the following form:
$$S = \cdots\, l\, c_1 c_2 \cdots c_k\, e\, \cdots, \qquad l \in L,\ e \in E$$
After the left-adjacency sign $l$ and the ending sign $e$ are found, the span from the character following $l$ through $e$, that is $c_1 c_2 \cdots c_k\, e$, is cut out as the candidate verb to be restored.
Step C further comprises the following steps:
C1: apply the forward maximum string matching algorithm to the candidate verb found, and retrieve the suffix (P) of the candidate verb to be restored.
C2: restore the retrieved suffix (P) using its corresponding restoration rule.
C3: compare the restored form with the corresponding dictionary entry to verify that the identification is correct.
C4: if the restored form has no corresponding dictionary entry, apply secondary segmentation and secondary restoration to the candidate verb. If the result can then be restored and found in the dictionary, the restoration has succeeded; otherwise the candidate is not processed further.
Secondary segmentation and secondary restoration rest on the consideration that the candidate verb found may be a combination of two or three words. The candidate is split into single words according to the Japanese conjunctive-form rules and the conjunctive sign characters, and each word is then restored by the restoration rules.
In summary, the core algorithm for restoring a candidate verb is forward maximum string matching: when the candidate verb ends in a suffix $P_i$ covered by restoration rule $i$, $P_i$ is extracted and restored by that rule. The restored form is then compared with the corresponding dictionary entry to verify the identification.
The beneficial effects of the invention are as follows. Earlier Japanese verb identification methods all fail to segment a conjugated verb form as one complete word, which hampers bilingual word alignment in statistical machine translation and lowers translation quality. The method adopted by the invention, which combines rules with a dictionary, segments and identifies conjugated Japanese verb forms as complete words even when the dictionary does not list them. This improves bilingual word alignment during segmentation preprocessing and helps raise the quality of statistics-based machine translation.
Brief description of the drawing
The figure is the core processing flowchart of the invention.
Detailed description
The method of the invention is described further below with a specific embodiment of Japanese verb identification.
Embodiment
This embodiment identifies all verbs in Japanese patent documents. The conjugated forms involved include the base form, past tense, passive, causative, perfect, and others.
As shown in the figure, the Japanese verb identification method of the invention comprises the following steps:
Special word retrieval and marking
Special words are retrieved and marked using a special-word list that we compiled; marked words do not take part in the subsequent Japanese verb identification.
The following Japanese sentences are input:
① 認証スイッチが そ Entries order と ID(designation) によってそれぞれ制御を行う。
② 気温が上がることは、太陽熱が地面を暖め、地面が空気を暖めるからである。
The retrieval result is as follows:
① 認証スイッチが そ Entries order と ID(designation) によってそれぞれ+++adv制御を行う。
② 気温が上がる+++vことは、太陽熱が地面を暖め、地面が空気を暖めるからである。
In sentence ①, "それぞれ" is a non-verb, but it contains the verb ending sign character "れ"; if it were not retrieved in advance it would be identified as a verb, causing a recognition error. In sentence ②, "上がる" is a special verb: its characters include the left-adjacency sign "が" used in verb lookup, so if it were not retrieved in advance the subsequent lookup rules would split "上がる" into three parts, "上/が/る", again causing a recognition error. We therefore retrieve and mark such special words in advance so that they do not take part in the subsequent verb identification.
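The pre-marking step can be sketched as follows. This is a minimal Python illustration rather than the patented implementation: the two-entry `SPECIAL_WORDS` list and the function name `mark_special_words` are assumptions made for the example, and the `+++tag` suffix simply mirrors the marked output shown above.

```python
import re

# Hypothetical special-word list (surface form -> tag); a real list would be far larger.
SPECIAL_WORDS = {
    "それぞれ": "adv",  # non-verb that ends in the verb ending sign "れ"
    "上がる": "v",      # verb that contains the left-adjacency sign "が"
}

def mark_special_words(sentence, special_words=SPECIAL_WORDS):
    """Append '+++tag' to every special word, trying longer words first."""
    alternatives = sorted(special_words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in alternatives))
    return pattern.sub(
        lambda m: f"{m.group(0)}+++{special_words[m.group(0)]}", sentence
    )

print(mark_special_words("気温が上がることは"))  # 気温が上がる+++vことは
```

A later lookup stage would then need to skip any span that already carries a `+++` mark; that bookkeeping is left out of the sketch.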
Candidate verb lookup
After special word retrieval and marking are finished, candidate verbs are searched for according to the left-adjacency sign (character or character string), the search range for the ending sign character, and the ending sign character itself.
The following Japanese sentence is input:
さらに、この箱形体は、その上側から下側に順に設けられた。
The lookup result is shown in the table below:
Table 1. Example of the candidate verb lookup algorithm
Japanese character (string) | に | た | 設けられた
Sequence number | 16 | 11 | FIRST-CHAR 11
In the sequence numbers above, 16 denotes the left-adjacency sign character numbered 16 (in this embodiment, "に"), and 11 denotes the verb ending sign character numbered 11 (in this embodiment, "た"). The lookup first finds left-adjacency sign 16 and then searches for an ending sign character within the range (13, 3) after it (written as (3,13) in rule 16 of the lookup rules), finding ending sign 11. The range (13, 3) is the range in which the verb ending sign character may occur: counting left to right from the left-adjacency sign, the search is confined to the 3rd through 13th characters. The search runs from back to front, starting at the 13th character and moving toward the 3rd. Once the ending sign character numbered 11 ("た") is found, the span from the first character after the left-adjacency sign character or string (FIRST-CHAR) through character 11 ("た") is joined together; this is the candidate verb to be restored that we extract.
The candidate verb lookup rules are as follows:
1. を*->FIND(OR,(8,2),"り"|"き"|"ぎ"|"し"|"ち"|"ひ"|"び"|"み")
……
5. において*->FIND(OR,(6,16),"た"|"だ")
……
16. に*->FIND(OR,(3,13),"た"|"だ")
……
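The lookup procedure (find the left-adjacency sign, scan the range from back to front for an ending sign, cut out the span) can be sketched in Python as below. The function name `find_candidates` and its parameters are hypothetical, and returning one candidate per occurrence of the left-adjacency sign is an assumption: the text does not say how several occurrences of the same sign in one sentence are handled.

```python
def find_candidates(text, left_sign, lo, hi, endings):
    """Return every candidate verb that follows an occurrence of left_sign.

    Offsets lo..hi are 1-based positions counted from FIRST-CHAR, the first
    character after the left-adjacency sign; they are scanned back to front.
    """
    candidates = []
    pos = text.find(left_sign)
    while pos != -1:
        first = pos + len(left_sign)              # index of FIRST-CHAR
        last_offset = min(hi, len(text) - first)  # do not run past the text
        for offset in range(last_offset, lo - 1, -1):
            if text[first + offset - 1] in endings:
                candidates.append(text[first:first + offset])
                break
        pos = text.find(left_sign, pos + 1)
    return candidates

# Rule 16: に*->FIND(OR,(3,13),"た"|"だ")
print(find_candidates("下側に順に設けられた。", "に", 3, 13, {"た", "だ"}))
# ['順に設けられた', '設けられた']
```

In this sketch the first candidate still contains "順に"; it is the later restoration and dictionary check that would reject it and keep "設けられた".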
Candidate verb restoration
The following Japanese sentence is input:
① "さらに、この箱形体は、その上側から下側に順に設けられた。", in which the candidate verb to be restored that has been found is "設けられた".
Table 2. Example of the restoration algorithm for a candidate verb to be restored
Candidate verb to be restored | Restoration process | Candidate verb after restoration
設けられた | られた (P129) → る (I129) | 設ける
For the candidate verb "設けられた", forward maximum matching finds its suffix P129, namely "られた". Restoration rule 129, the rule associated with P129, then restores "られた" to I129, namely "る". Rule 129 is "*られた->INFLEX(-,る)". In other words, the suffix of the candidate verb is found first ("られた" in "設けられた"), the suffix is restored to "る", and the new form "設ける" is obtained. Finally, a dictionary lookup checks whether the entry "設ける" exists; the dictionary contains "設ける", so the identification is correct.
The candidate verb restoration rules are as follows:
1. *ぼう->INFLEX(-,ぶ)
……
129. *られた->INFLEX(-,る)
……
174. *われる->INFLEX(-,う)
……
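A minimal Python sketch of restoration followed by dictionary verification is given below. It reads the matching step as "try the longest rule suffix first", which is one interpretation of the maximum matching described above and not necessarily the exact procedure of the patent. The rule table holds only the three rules listed; `DICTIONARY`, `restore`, and `verify` are hypothetical names introduced for the example.

```python
# Suffix -> base-form ending, copied from rules 1, 129 and 174 above.
RESTORE_RULES = {"ぼう": "ぶ", "られた": "る", "われる": "う"}

# Hypothetical stand-in for a general-purpose dictionary of base forms.
DICTIONARY = {"設ける", "行う"}

def restore(candidate, rules=RESTORE_RULES):
    """Replace the longest rule suffix of candidate; None if no rule applies."""
    for length in range(len(candidate) - 1, 0, -1):  # keep at least one stem character
        suffix = candidate[-length:]
        if suffix in rules:
            return candidate[:-length] + rules[suffix]
    return None

def verify(candidate, dictionary=DICTIONARY):
    """Return the restored base form if the dictionary lists it, else None."""
    base = restore(candidate)
    return base if base in dictionary else None

print(verify("設けられた"))      # 設ける
print(verify("順に設けられた"))  # None
```

The second call shows why verification matters: the over-long candidate restores to "順に設ける", which no dictionary lists, so it is rejected.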
Above-mentioned example has been described the situation that entry information after reduction is found consistent entry in dictionary, if corresponding entry do not found in the entry after reduction in dictionary, at this moment we can carry out cutting again and reduction processing again to it
Now input Japanese as follows:
RAIDは、データをビット/バイト単位、あるいはブロック単位で複数台の記録装置に分散して保存する+++V方法で、処理をオーバーラップすることによりパフォーマンスを高め、高速化を実現している。
The candidate verb found by the lookup rules above is "分散して保存する". Restoring it with the restoration rules above gives the new form "分散して保存", but because this is a combination of two verbs, no such dictionary entry can be found. Candidates of this kind are split according to the secondary segmentation rules for candidate verbs.
The secondary segmentation rules for candidate verbs are as follows:
ん*->FIND(OR,(6,3),"て")
……
ん*->FIND(OR,(6,3),"い"|"き"|"ぎ"|"し"|"じ"|"ち"|"み"|"り"|"れ"|"え"|"じ"|"け"|"げ"|"せ"|"ぜ"|"ね"|"べ"|"め"|"ば")
……
The rules have priority from first to last. "ん" stands for any candidate verb left-adjacency sign, and "OR" stands for "outside, right": the conjunctive sign character is searched for to the right, outside "ん". Applying the rule "ん*->FIND(OR,(6,3),"て")", the conjunctive sign "て" of the entry "分散して保存する" is found within the corresponding range (6, 3), and the word is split into two words, "分散して" and "保存する". Each is then restored by the candidate verb restoration rules above, and the restoration is verified by dictionary lookup. If the words can be restored and the restored entries are found in the dictionary, the restoration has succeeded; if no corresponding entry is found, the text is left unchanged and not processed further.
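The secondary pass can be sketched as follows, under simplifying assumptions: only the first rule (the conjunctive sign "て") is applied, the (6, 3) search range is ignored, and the restoration function and dictionary are toy stand-ins supplied by the caller. All names here are hypothetical.

```python
def split_at_links(candidate, link_chars=("て",)):
    """Split after each linking character that is not the final character."""
    parts, start = [], 0
    for i, ch in enumerate(candidate[:-1]):
        if ch in link_chars:
            parts.append(candidate[start:i + 1])
            start = i + 1
    parts.append(candidate[start:])
    return parts

def restore_compound(candidate, restore, dictionary):
    """Restore each part; return the base forms, or None to leave the text unchanged."""
    parts = split_at_links(candidate)
    if len(parts) < 2:
        return None
    bases = [restore(part) for part in parts]
    return bases if all(base in dictionary for base in bases) else None

# Hypothetical restoration function and dictionary, for illustration only.
def toy_restore(word):
    return word[:-2] + "する" if word.endswith("して") else word

toy_dictionary = {"分散する", "保存する"}
print(restore_compound("分散して保存する", toy_restore, toy_dictionary))
# ['分散する', '保存する']
```

Returning `None` models the final fallback in the text: when any part fails the dictionary check, the original span is kept as it is.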
Part-of-speech tagging
If the restored candidate verb has a corresponding dictionary entry, it is tagged with its part of speech according to the restoration rule that was applied and the entry information recorded in the dictionary.
The part-of-speech tags used by the method are as follows:
Table 3. Part-of-speech tags
Part of speech | Adverb | Adjective | Noun | Verb | Pronoun | Conjunction
Symbol | adv | adj | n | v | pron | col
The tags for verb conjugated forms used by the invention are as follows:
Table 4. Tags for Japanese verb conjugated forms
Form | Symbol | Form | Symbol
Base form | ori | Causative form | cau
Negative form | no | Passive form | pas
Hypothetical form | if | Perfect | over
Past tense | past | End form | te
ている form | ing | Continuative form | con
ます form | masu | Potential form | can
In addition, conjugated forms of Japanese verbs can also occur as combinations of the forms in the table above; the composite tags are not listed individually here.
For example: "さらに、この箱形体は、その上側から下側に順に設けられた。"
The tagging result is as follows:
さらに、この箱形体は、その上側から下側に順に設けられた+++V(paspast)。
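The composite tag in this result can be read as the Table 4 symbols for passive and past tense written one after the other. A small Python sketch of that composition follows; the `SUFFIX_FORMS` mapping, the function name `tag_verb`, and the fallback to `ori` for an unlisted suffix are all assumptions made for illustration.

```python
# Hypothetical mapping from a restoration-rule suffix to Table 4 form symbols.
SUFFIX_FORMS = {"られた": ("pas", "past")}

def tag_verb(surface, suffix, suffix_forms=SUFFIX_FORMS):
    """Return surface followed by '+++V(...)', concatenating the form symbols."""
    forms = "".join(suffix_forms.get(suffix, ("ori",)))  # assumed default: base form
    return f"{surface}+++V({forms})"

print(tag_verb("設けられた", "られた"))  # 設けられた+++V(paspast)
```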
With the method above, a complete verb (in its base form or a conjugated form) can be segmented and identified even when the dictionary does not list the conjugated form.

Claims (10)

1. A Japanese verb identification method for machine translation, characterized by comprising the following steps:
Step A: retrieving and marking special words that contain a left-adjacency sign or an ending sign used in candidate verb lookup, so that they do not take part in the subsequent verb identification, wherein the left-adjacency sign is a character or character string and the ending sign is a character;
Step B: retrieving the left-adjacency sign and the candidate verb ending sign, and searching for candidate verbs;
Step C: restoring each candidate verb found and verifying by dictionary lookup whether it is correct;
Step D: tagging the part of speech of each candidate verb that, after restoration, has a corresponding entry in the dictionary;
wherein step B further comprises the following steps:
Step B1: retrieving the left-adjacency sign of the candidate verb;
Step B2: searching for the ending sign character of the candidate verb within a specified range after the left-adjacency sign;
Step B3: cutting out the span from the character following the left-adjacency sign through the candidate verb ending sign character as the candidate verb to be restored;
and step C further comprises the following steps:
C1: applying the forward maximum string matching algorithm to the candidate verb found, and retrieving the suffix of the candidate verb to be restored;
C2: restoring the retrieved suffix of the candidate verb using its corresponding restoration rule;
C3: comparing the restored form with the corresponding dictionary entry to verify that the identification is correct;
C4: if the restored form has no corresponding dictionary entry, segmenting and restoring the candidate verb again; if the result can then be restored and found in the dictionary, the restoration has succeeded; otherwise the candidate is not processed further.
2. The method according to claim 1, wherein the special words in step A comprise special verbs and special non-verbs.
3. The method according to claim 2, wherein a special verb is a Japanese verb that contains a left-adjacency sign used in lookup, and a special non-verb is a non-verb that contains a verb ending sign character.
4. The method according to claim 1, wherein the left-adjacency sign in step B1 is a particle, particle combination, or conjunction that signals, in a Japanese sentence, that a verb is about to appear.
5. The method according to claim 1, wherein the ending sign character in step B2 is the last character of the base form or of any conjugated form of a Japanese verb.
6. The method according to claim 1, wherein the specified range in step B2 is the range, summarized from the conjugation rules of Japanese verbs, in which each ending sign is most likely to occur.
7. The method according to claim 1, wherein the suffix of the candidate verb to be restored in step C1 is the conjugated part of the Japanese verb.
8. The method according to claim 1, wherein the secondary segmentation and secondary restoration in step C4 are as follows: the candidate is split into single words according to the Japanese conjunctive-form rules and the conjunctive sign characters, and each word is then restored by the restoration rules.
9. The method according to claim 1, wherein if the restored candidate verb has a corresponding entry in the dictionary, it is tagged with its part of speech.
10. The method according to claim 9, wherein the part-of-speech tags for adverb, adjective, noun, verb, pronoun, and conjunction are adv, adj, n, v, pron, and col respectively.
CN201310569693.1A 2013-11-13 2013-11-13 Japanese verb identification method for machine translation Active CN103714053B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310569693.1A CN103714053B (en) 2013-11-13 2013-11-13 Japanese verb identification method for machine translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310569693.1A CN103714053B (en) 2013-11-13 2013-11-13 Japanese verb identification method for machine translation

Publications (2)

Publication Number Publication Date
CN103714053A true CN103714053A (en) 2014-04-09
CN103714053B CN103714053B (en) 2017-05-10

Family

ID=50407044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310569693.1A Active CN103714053B (en) 2013-11-13 2013-11-13 Japanese verb identification method for machine translation

Country Status (1)

Country Link
CN (1) CN103714053B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268132A (en) * 2014-09-11 2015-01-07 北京交通大学 Machine translation method and system
CN104268133B (en) * 2014-09-11 2018-02-13 北京交通大学 machine translation method and system
CN108073566A (en) * 2016-11-16 2018-05-25 北京搜狗科技发展有限公司 Segmenting method and device, the device for participle
CN110781667A (en) * 2019-10-25 2020-02-11 北京中献电子技术开发有限公司 Japanese verb identification and part-of-speech tagging method for neural network machine translation
CN110991151A (en) * 2019-11-22 2020-04-10 北京云中融信网络科技有限公司 File processing method and device, electronic equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1652106A (en) * 2004-02-04 2005-08-10 北京赛迪翻译技术有限公司 Machine translation method and apparatus based on language knowledge base
CN1702650A (en) * 2004-05-28 2005-11-30 株式会社东芝 Apparatus and method for translating Japanese into Chinese and computer program product
WO2012079245A1 (en) * 2010-12-17 2012-06-21 北京交通大学 Device for acquiring knowledge and method thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
姜尚仆等: "基于规则和统计的日语分词和词性标注的研究", 《中文信息学报》 *
王晶: "日语词法分析及在跨语言信息检索中的应用研究", 《中国优秀硕士学位论文全文数据库》 *
隋福民: "面向机器翻译的日语形态素解析", 《万方学位论文数据库》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268132A (en) * 2014-09-11 2015-01-07 北京交通大学 Machine translation method and system
CN104268132B (en) * 2014-09-11 2017-04-26 北京交通大学 machine translation method and system
CN104268133B (en) * 2014-09-11 2018-02-13 北京交通大学 machine translation method and system
CN108073566A (en) * 2016-11-16 2018-05-25 北京搜狗科技发展有限公司 Segmenting method and device, the device for participle
CN108073566B (en) * 2016-11-16 2022-01-18 北京搜狗科技发展有限公司 Word segmentation method and device and word segmentation device
CN110781667A (en) * 2019-10-25 2020-02-11 北京中献电子技术开发有限公司 Japanese verb identification and part-of-speech tagging method for neural network machine translation
CN110781667B (en) * 2019-10-25 2021-10-08 北京中献电子技术开发有限公司 Japanese verb identification and part-of-speech tagging method for neural network machine translation
CN110991151A (en) * 2019-11-22 2020-04-10 北京云中融信网络科技有限公司 File processing method and device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN103714053B (en) 2017-05-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 100088 No. 1 Madian South Village, Beijing, Haidian District

Patentee after: Beijing Zhong Xian Electronic Technology Development Co., Ltd.

Address before: 100088 No. 1 Madian South Village, Beijing, Haidian District

Patentee before: Beijing Zhongxian Electronic Technology Development Center