CN105843802A - Corpus intervention module and method in translation - Google Patents


Info

Publication number
CN105843802A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610202189.1A
Other languages
Chinese (zh)
Inventor
白晓文
陈春纬
刘庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changan University
Original Assignee
Changan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-03-31
Filing date
2016-03-31
Publication date
2016-08-10
Application filed by Changan University
Priority to CN201610202189.1A
Publication of CN105843802A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a corpus intervention module and method in translation, which enable corpus retrieval and comparison during translation and allow matched corpus entries to be easily brought into the translation, so that translation time is shortened and consistency of expression in the translation is improved. The technical scheme is as follows: a corpus reading module selectively reads a historical corpus and a corpus prepared for the translation activity; a translation material reading module opens the material to be translated and segments it into sentences; a corpus and translation material retrieval matching module searches the segmented material sentence by sentence for the maximum corpus match, finally obtaining the position of each matched corpus entry in the text and its gloss; a matched corpus display module displays the matched entries and their translations distinctly; and finally a matched corpus intervention translation module copies the translation of a matched entry and pastes it at a selected position in the translation, thereby intervening in the translation.

Description

Corpus intervention module and method in translation
Technical field
The invention belongs to the field of computational linguistics and translation technology, and specifically relates to a corpus intervention module and method in translation.
Background art
The word corpus comes from Latin, where it originally meant "collection" or "collected works"; its plural is corpora or corpuses. A corpus is "a collection of works, or the complete body of texts on a given subject" (OED), or "a collection of written or spoken material that provides a basis for linguistic analysis" (OED). A corpus is "a collection of language material selected and ordered according to explicit linguistic criteria, intended to serve as a sample of the language" (Sinclair, 1986:185-203). A corpus is a large text collection assembled for a specific purpose according to explicit design criteria (Atkins and Clear, 1992:1-16). Renouf holds that a corpus is "a text collection made up of a large amount of collected written or spoken language, stored and processed by computer and used for linguistic research" (Renouf, 1987:1). Leech points out that a large collection of machine-readable electronic text is the basis for obtaining the "required frequency data" in probabilistic research methods: "to obtain the required frequency data, we must analyse enough natural English (or other language) text to make realistic predictions based on observed frequency. We therefore need machine-readable electronic text collections, that is, machine-readable corpora" (Leech, 1987:2).
In summary, a corpus has the following basic features:
1) The design and construction of a corpus are carried out under systematic theoretical-linguistic principles, and corpus development has a clear and specific research goal. For example, the main purpose of the BROWN corpus of the early 1960s was grammatical analysis of American English, and the subsequent LOB corpus collected contemporaneous British English following essentially the same design principles as the BROWN corpus, with the aim of comparative and grammatical analysis of American and British English.
2) The composition and sampling of a corpus follow explicit linguistic principles, and material is collected by random sampling rather than simply piled up. The collected material must be naturally occurring language data (naturally-occurred data).
3) As a sample of natural language in use, a corpus must be representative (representativeness). Chomsky once criticised corpora for attempting to represent a vast, practically unlimited body of real language with a very small sample, so that the result is necessarily skewed and unrepresentative: "a natural corpus will be so severely skewed that a description of it would be no more than a mere list" (Chomsky, 1962:159). This criticism is valuable for any research based on probabilistic and statistical methods (McEnery, 1996:5).
Li Wenzhong holds that corpus text consists of running text or continuous stretches of discourse, rather than isolated sentences and words. In corpus research, the grammatical relations and usage of a search word are observed on a large scale by analysing the context in which it occurs.
Current corpus research is largely theoretical and serves linguistic research rather than concrete practical applications. The corpora chosen are research corpora, most of which cannot be used directly in translation practice. How a corpus intervenes in translation, that is, how a corpus can help with translation in concrete practice, is not specifically addressed. The translation industry currently has no reasonably mature intervention tool of this kind; translators usually consult reference material manually, which is inefficient.
Summary of the invention
To solve the problems of the prior art, the present invention proposes a corpus intervention module and method in translation that enable corpus retrieval and comparison during translation and allow matched corpus entries to be easily brought into the translation, thereby reducing translation time and improving consistency of expression in the translation.
To achieve the above object, the technical solution adopted by the present invention is:
A corpus intervention module in translation, comprising:
Corpus reading module: for selectively reading a historical corpus and a corpus prepared for the translation activity;
Translation material reading module: for opening and reading the material to be translated, and segmenting the material into sentences;
Corpus and translation material retrieval matching module: for searching the read and sentence-segmented material sentence by sentence, starting from the first word, for the maximum corpus match, finally obtaining the position of each matched corpus entry in the text and its gloss;
Matched corpus display module: for displaying the matched corpus entries and their translations in a visually distinct way;
Matched corpus intervention translation module: for copying the translation of a matched corpus entry and pasting it at a selected position in the translation, thereby intervening in the translation.
A corpus intervention method in translation, comprising the following steps:
1) the translation material reading module opens and reads the material to be translated and segments it into sentences, while the corpus reading module selectively reads a historical corpus and a corpus prepared for the translation activity;
2) the corpus and translation material retrieval matching module searches the read and sentence-segmented material sentence by sentence, starting from the first word, for the maximum corpus match, finally obtaining the position of each matched corpus entry in the text and its gloss; the matched corpus display module then displays the matched entries and their translations distinctly;
3) the matched corpus intervention translation module copies the translation of a matched corpus entry and pastes it at a selected position in the translation, thereby realizing corpus intervention in the translation.
In step 1), the translation material reading module calls the COM interface of Word to obtain the text of WordPad and Word documents, and calls the COM interface of Excel to obtain the text of Excel spreadsheets.
In step 1), the translation material reading module defines sentence terminators according to punctuation rules and cuts the material to be translated into sentences; a terminator is treated as the end of a sentence.
The translation material reading module must determine whether an English period is an abbreviation period. Abbreviations are stored in a dictionary; the word before the period, together with the period, is looked up in the dictionary, and if it is found, the period is an abbreviation period and is not treated as a sentence terminator.
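The sentence segmentation with the abbreviation check can be sketched as follows. This is a minimal illustration in Python, not the implementation from the patent: the function name `split_sentences`, the contents of `ABBREVIATIONS`, and the choice of `.`, `!` and `?` as terminators are assumptions made for the example.

```python
# Hypothetical abbreviation dictionary; the patent only states that the
# dictionary contains abbreviations, not which ones.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "prof.", "fig.", "etc."}

# Assumed sentence terminators: period, exclamation mark, question mark.
TERMINATORS = ".!?"


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Cut text into sentences at terminators, skipping abbreviation periods."""
    sentences = []
    start = 0  # index where the current sentence begins
    for i, ch in enumerate(text):
        if ch not in TERMINATORS:
            continue
        if ch == ".":
            # Walk back to the start of the word that precedes the period.
            j = i
            while j > start and not text[j - 1].isspace():
                j -= 1
            # Look up "word + period" in the abbreviation dictionary.
            if text[j:i + 1].lower() in abbreviations:
                continue  # abbreviation period, not a sentence terminator
        sentence = text[start:i + 1].strip()
        if sentence:
            sentences.append(sentence)
        start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # trailing text without a terminator
    return sentences
```

With this sketch, `split_sentences("Dr. Smith arrived. Is he late? No!")` keeps `Dr.` inside the first sentence because `dr.` is found in the dictionary. A sentence that really ends in a listed abbreviation (for example `etc.`) is not split, which is a limitation of a purely dictionary-based check.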
In step 1), the corpus reading module saves the entries read from the historical corpus and from the corpus prepared for the translation activity as a list, sorted alphabetically.
In step 2), the specific steps by which the corpus and translation material retrieval matching module matches the material to be translated include:
2.1) take one word into the word group, and search the corpus list for the word group;
2.2) if a full match is found, save the information of the corpus entry, then return to step 2.1) to search for a larger match;
2.3) if a sub-match is found, that is, the word group is part of a corpus entry, go to step 2.1) and continue searching;
2.4) if no match is found, clear the word group and go to step 2.1), starting after the last matched word group, until all of the material to be translated has been searched.
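Steps 2.1) to 2.4) amount to a greedy longest-match scan over the words of a sentence. The Python sketch below is one possible reading of those steps, not the patented implementation. It assumes that the corpus is given as a dictionary from entry to gloss, that a "sub-match" means the word group is a prefix of some entry, that matching is case-insensitive, and that positions are word indices; the name `find_matches` is hypothetical.

```python
def find_matches(words, corpus):
    """Return (start, end, matched_text, gloss) for each maximum match.

    words  : list of words of one sentence
    corpus : dict mapping a corpus entry (string) to its gloss
    start/end are word indices, end exclusive (an assumption of this sketch).
    """
    # Normalise entries to tuples of lowercase words.
    entries = {tuple(k.lower().split()): v for k, v in corpus.items()}
    # Every leading part of an entry counts as a possible sub-match.
    prefixes = {e[:n] for e in entries for n in range(1, len(e) + 1)}

    matches = []
    i = 0
    while i < len(words):
        group = []   # the growing word group of step 2.1)
        best = None  # the largest full match found so far
        j = i
        while j < len(words):
            group.append(words[j].lower())   # 2.1) take one more word
            key = tuple(group)
            if key not in prefixes:          # 2.4) no match: stop growing
                break
            if key in entries:               # 2.2) full match: remember it
                best = (i, j + 1, " ".join(words[i:j + 1]), entries[key])
            j += 1                           # 2.2)/2.3) look for a larger match
        if best:
            matches.append(best)
            i = best[1]   # restart after the last matched word group
        else:
            i += 1        # nothing matched here; restart at the next word
    return matches
```

For example, with the entries `machine` and `machine translation` both present, the phrase `machine translation` is reported once as the longer entry, which is the "maximum corpus match" behaviour the steps describe.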
In step 2), the matched corpus display module shows the translation of the marked matched corpus entry in a floating window or as a symbol-delimited annotation, and the translation can be edited.
Compared with the prior art, the present invention uses the corpus reading module to selectively read a historical corpus and a corpus prepared for the translation activity; uses the translation material reading module to open and read the material to be translated and segment it into sentences; uses the corpus and translation material retrieval matching module to search the segmented material sentence by sentence, starting from the first word, for the maximum corpus match, finally obtaining the position of each matched corpus entry in the text and its gloss; uses the matched corpus display module to display the matched entries and their translations distinctly; and finally uses the matched corpus intervention translation module to copy the translation of a matched entry and paste it at a selected position in the translation, thereby intervening in the translation. Corpus retrieval and comparison are thus available during translation, and matched corpus entries can be easily brought into the translation, which reduces translation time and improves consistency of expression in the translation.
Further, the translation material reading module defines sentence terminators according to punctuation rules, cuts the material to be translated into sentences, and treats a terminator as the end of a sentence. For an English period it must determine whether the period belongs to an abbreviation: abbreviations are stored in a dictionary, the word before the period together with the period is looked up, and if it is found the period is an abbreviation period and is not treated as a sentence terminator. This further improves the accuracy of sentence segmentation and therefore translation efficiency.
Further, the corpus reading module can selectively read the historical corpus and the corpus prepared specifically for the current translation activity; the corpus prepared for the current translation activity can be read as the primary source, with the historical corpus read as an auxiliary reference. The entries read are saved as a list and sorted alphabetically, which improves the efficiency of corpus match searching and thereby reduces translation time.
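One way an alphabetically sorted list can speed up match searching is binary search: a word group can be classified as a full match, a sub-match, or no match without scanning the whole list. The patent does not say how the sorted list is searched, so the following Python sketch is only an assumption; `build_corpus_list` and `lookup` are hypothetical names, and entries are lowercased for comparison.

```python
from bisect import bisect_left


def build_corpus_list(pairs):
    """Save (entry, gloss) pairs as a list sorted alphabetically by entry."""
    return sorted((entry.lower(), gloss) for entry, gloss in pairs)


def lookup(corpus_list, phrase):
    """Classify phrase as ('full', gloss), ('prefix', None) or ('none', None)."""
    phrase = phrase.lower()
    # Binary search for the first entry that is >= phrase.
    i = bisect_left(corpus_list, (phrase,))
    if i < len(corpus_list):
        entry, gloss = corpus_list[i]
        if entry == phrase:
            return "full", gloss
        # A longer entry that continues the phrase sorts right after it,
        # because the space character sorts before letters and digits.
        if entry.startswith(phrase + " "):
            return "prefix", None
    return "none", None
```

Each lookup takes a logarithmic number of comparisons in the size of the list, whereas a linear scan takes time proportional to the list size. A `full` result does not rule out a longer entry, so a caller following the maximum-match principle would still try the next larger word group.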
Further, the corpus and translation material retrieval matching module follows the principle of maximum corpus matching when matching the material to be translated, so the material is matched against the corpus more effectively, further improving the efficiency of the invention.
Detailed description of the embodiments
The present invention is further explained below with reference to specific embodiments.
The present invention consists of five modules:
Module one: corpus reading module: selectively reads the historical corpus and the corpus prepared specifically for the current translation activity; the corpus prepared for the current translation activity can also be read as the primary source, with the historical corpus read as an auxiliary reference. The entries read are saved as a list and sorted alphabetically, which improves the efficiency of corpus match searching;
Module two: translation material reading module: opens the material to be translated and segments it into sentences as it is opened. The English text is cut into individual sentences according to punctuation and rules. Sentence terminators are defined, such as the English period, exclamation mark, and question mark, and a terminator is treated as the end of a sentence. For an English period, the module must also check whether it belongs to an abbreviation: abbreviations are stored in a dictionary, the word before the period together with the period is looked up in the dictionary, and if it is found the period is an abbreviation period and is not treated as a sentence terminator;
Module three: corpus and translation material retrieval matching module: searches the read and sentence-segmented translation material sentence by sentence, starting from the first word, for the maximum corpus match, finally obtaining the position of each matched corpus entry in the text and the entry itself (entry + gloss). Specifically: (1) take one word into the word group and search the corpus list for the word group; (2) if a full match is found, save the information of the entry (position + entry + gloss) and return to step (1) to search for a larger match; (3) if a sub-match is found (the word group is part of a corpus entry), go to step (1); (4) if no match is found, clear the word group and go to step (1), starting after the last matched word group, until all of the translation material has been searched;
Module four: matched corpus display module: every corpus entry marked with a label is a matched entry. When the sentence is being translated, several display modes are available:
1) Display mode one: the matched entry is shown in color (the color is configurable, and two colors can be set to distinguish entries from the corpus prepared for the current translation activity from entries from the historical corpus). When the mouse is placed on the entry, a text box containing the translation of the entry appears next to the pointer; when the mouse moves onto the text box, the translation can be selected and copied; when the mouse leaves the text box, the text box closes;
2) Display mode two: the matched entry is shown in color (the color is configurable, and two colors can be set to distinguish entries from the corpus prepared for the current translation activity from entries from the historical corpus), and the translation of the entry is displayed directly after the entry, marked with a configured symbol;
3) Display mode three: the matched entry is shown in color (the color is configurable, and two colors can be set to distinguish entries from the corpus prepared for the current translation activity from entries from the historical corpus), and the translation of the entry floats above the entry; when the mouse moves onto the translation, it can be edited, for example its content can be copied or changed;
Module five: matched corpus intervention translation module: in any display mode, the translation of a corpus entry can be copied and then pasted at a selected position in the translation, thereby intervening in the translation.
The complete steps of the method of the invention:
In the tool interface, open the text to be translated (the format can be Word, Excel, Notepad, WordPad, etc.). Plain text files are read directly with a general file reading module; for WordPad and Word documents the COM interface of Word is called to obtain the text, and for Excel files the COM interface of Excel is called to obtain the text in the spreadsheet. Then click "Corpus intervention" (the corpus is either the historical corpus or a corpus specific to this project), select the corpus file as prompted (the corpus list is displayed in two columns: entries on the left, glosses on the right), and call the corpus and translation material retrieval matching module to obtain the information on matched entries;
Two display modes are available for matches: 1) The gloss is displayed directly after the matched entry, enclosed in special symbols such as []. The positions of the matched entries in the text to be translated are obtained from the corpus and translation material retrieval matching module; so that each insertion does not shift the positions of the entries not yet processed, the glosses are inserted into the text from back to front;
2) The gloss floats over the entry. When the mouse moves onto an entry and stays there longer than a set time (3 seconds by default), the mouse position is obtained and the sentence at that position is retrieved; the sentence is passed through the corpus and translation material retrieval matching module to obtain the corpus entries matched in that sentence, which are shown in a floating window that pops up at the mouse position;
The gloss displayed by either method can be copied directly and pasted at the desired position in the translation, completing the intervention of the corpus in the translation.
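The back-to-front insertion used by the first display mode can be illustrated with a short Python sketch. It assumes that match positions are character offsets into the text and that glosses are wrapped in `[` and `]`; the function name `annotate` is hypothetical and the code is not taken from the patent.

```python
def annotate(text, matches, left="[", right="]"):
    """Insert each gloss right after its matched entry.

    matches: iterable of (start, end, gloss), where start/end are character
    offsets of the matched entry in text (end exclusive).
    """
    # Process matches from the end of the text towards the beginning, so
    # that an insertion never shifts the offsets of matches still to come.
    for start, end, gloss in sorted(matches, reverse=True):
        text = text[:end] + left + gloss + right + text[end:]
    return text
```

If the glosses were inserted from front to back instead, every insertion would lengthen the text and all later offsets would have to be recomputed, which is the effect the back-to-front order avoids.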
The present invention enables corpus retrieval and comparison during translation, and matched corpus entries can be easily brought into the translation, which reduces translation time and improves consistency of expression in the translation.

Claims (8)

1. A corpus intervention module in translation, characterized by comprising:
a corpus reading module: for selectively reading a historical corpus and a corpus prepared for the translation activity;
a translation material reading module: for opening and reading the material to be translated, and segmenting the material into sentences;
a corpus and translation material retrieval matching module: for searching the read and sentence-segmented material sentence by sentence, starting from the first word, for the maximum corpus match, finally obtaining the position of each matched corpus entry in the text and its gloss;
a matched corpus display module: for displaying the matched corpus entries and their translations in a visually distinct way;
a matched corpus intervention translation module: for copying the translation of a matched corpus entry and pasting it at a selected position in the translation, thereby intervening in the translation.
2. A corpus intervention method in translation, characterized by comprising the following steps:
1) the translation material reading module opens and reads the material to be translated and segments it into sentences, while the corpus reading module selectively reads a historical corpus and a corpus prepared for the translation activity;
2) the corpus and translation material retrieval matching module searches the read and sentence-segmented material sentence by sentence, starting from the first word, for the maximum corpus match, finally obtaining the position of each matched corpus entry in the text and its gloss; the matched corpus display module then displays the matched entries and their translations distinctly;
3) the matched corpus intervention translation module copies the translation of a matched corpus entry and pastes it at a selected position in the translation, thereby realizing corpus intervention in the translation.
3. The corpus intervention method in translation according to claim 2, characterized in that in step 1) the translation material reading module calls the COM interface of Word to obtain the text of WordPad and Word documents, and calls the COM interface of Excel to obtain the text of Excel spreadsheets.
4. The corpus intervention method in translation according to claim 3, characterized in that in step 1) the translation material reading module defines sentence terminators according to punctuation rules and cuts the material to be translated into sentences, a terminator being treated as the end of a sentence.
5. The corpus intervention method in translation according to claim 4, characterized in that the translation material reading module determines whether an English period is an abbreviation period: abbreviations are stored in a dictionary, the word before the period together with the period is looked up in the dictionary, and if it is found the period is an abbreviation period and is not treated as a sentence terminator.
6. The corpus intervention method in translation according to claim 2, characterized in that in step 1) the corpus reading module saves the entries read from the historical corpus and from the corpus prepared for the translation activity as a list, sorted alphabetically.
7. The corpus intervention method in translation according to claim 2, characterized in that in step 2) the specific steps by which the corpus and translation material retrieval matching module matches the material to be translated include:
2.1) take one word into the word group, and search the corpus list for the word group;
2.2) if a full match is found, save the information of the corpus entry, then return to step 2.1) to search for a larger match;
2.3) if a sub-match is found, that is, the word group is part of a corpus entry, go to step 2.1) and continue searching;
2.4) if no match is found, clear the word group and go to step 2.1), starting after the last matched word group, until all of the material to be translated has been searched.
8. The corpus intervention method in translation according to claim 2, characterized in that in step 2) the matched corpus display module shows the translation of the marked matched corpus entry in a floating window or as a symbol-delimited annotation, and the translation can be edited.
CN201610202189.1A 2016-03-31 2016-03-31 Corpus intervention module and method in translation Pending CN105843802A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610202189.1A CN105843802A (en) 2016-03-31 2016-03-31 Corpus intervention module and method in translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610202189.1A CN105843802A (en) 2016-03-31 2016-03-31 Corpus intervention module and method in translation

Publications (1)

Publication Number Publication Date
CN105843802A true CN105843802A (en) 2016-08-10

Family

ID=56596566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610202189.1A Pending CN105843802A (en) 2016-03-31 2016-03-31 Corpus intervention module and method in translation

Country Status (1)

Country Link
CN (1) CN105843802A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109683773A (en) * 2017-10-19 2019-04-26 北京国双科技有限公司 Corpus labeling method and device
CN110046261A (en) * 2019-04-22 2019-07-23 山东建筑大学 A kind of construction method of the multi-modal bilingual teaching mode of architectural engineering
CN110263149A (en) * 2019-05-29 2019-09-20 科大讯飞股份有限公司 A kind of textual presentation method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996166A (en) * 2009-08-14 2011-03-30 张龙哺 Bilingual sentence pair modular recording method and translation method and translation system thereof
CN102831109A (en) * 2012-08-08 2012-12-19 中国专利信息中心 Machine translating device based on intelligent matching and method thereof
CN105159892A (en) * 2015-08-28 2015-12-16 长安大学 Corpus extractor and corpus extraction method
CN105183723A (en) * 2015-09-17 2015-12-23 成都优译信息技术有限公司 Associating method for translation software and language material searching


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
哈乐: "基于实例的汉阿语言机器翻译***的研究与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 *



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160810
