CN106126508A - A corpus management method - Google Patents

A corpus management method

Info

Publication number
CN106126508A
Authority
CN
China
Prior art keywords
corpus data
corpus
cloud
word segmentation
management method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610459092.9A
Other languages
Chinese (zh)
Inventor
张井
陈件
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai One Mdt Infotech Ltd
Original Assignee
Shanghai One Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-06-22
Filing date
2016-06-22
Publication date
2016-11-16
Application filed by Shanghai One Mdt Infotech Ltd
Priority to CN201610459092.9A
Publication of CN106126508A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a corpus management method comprising the steps of: creating a cloud corpus, building a word-segmentation inverted index for the corpus data to be imported, and storing the data in the cloud corpus; exposing at least two account interfaces on the cloud corpus; and having the cloud database read and write data through all of the account interfaces, learn in real time from the data read and written to add new corpus entries, build the word-segmentation inverted index for those entries, and add them to the cloud corpus. The invention helps manage massive corpora and helps avoid the problem of several people translating the same content repeatedly.

Description

A corpus management method
Technical field
The present invention relates to the field of translation technology, and in particular to a corpus management method.
Background art
Over years of development and production, the translation industry has accumulated corpora on a very large scale. Conventional desktop CAT (computer-assisted translation) tools are limited by the processing capacity of a desktop PC and cannot manage corpora of this massive scale. Moreover, when a team is jointly responsible for translating a file, the same content is likely to be translated more than once, which wastes time.
How to manage corpora of massive scale and avoid duplicated translation by several people, thereby reducing the translation workload, has become a problem that those skilled in the art urgently need to solve.
It should be noted that the above introduction to the technical background is intended only to facilitate a clear and complete description of the technical solution of this application and to aid the understanding of those skilled in the art. These solutions should not be regarded as known to those skilled in the art merely because they are set forth in the background section of this application.
Summary of the invention
In view of the above drawbacks of the prior art, the technical problem to be solved by the invention is to provide a corpus management method that helps manage corpora of massive scale and helps avoid several people translating the same content repeatedly.
To achieve the above object, the invention provides a corpus management method comprising the steps of:
Creating a cloud corpus, building a word-segmentation inverted index for the corpus data to be imported, and storing the data in the cloud corpus;
Exposing at least two account interfaces on the cloud corpus;
Having the cloud database read and write data through all of the account interfaces, learn in real time from the data read and written to add new corpus entries, build the word-segmentation inverted index for those entries, and add them to the cloud corpus.
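The three steps can be illustrated with a minimal in-memory sketch. It assumes a single process standing in for the cloud database, whitespace tokenization standing in for word segmentation, and hypothetical names (`SharedCorpus`, `write`, `search`); the patent does not specify an API. What it shows is the point of step 2: a sentence pair written through one account interface is immediately retrievable through another.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    source: str
    target: str
    account: str  # account interface that wrote the pair


class SharedCorpus:
    """Hypothetical in-memory stand-in for the cloud corpus (steps 1 to 3)."""

    def __init__(self, accounts):
        if len(set(accounts)) < 2:
            raise ValueError("step 2 exposes at least two account interfaces")
        self.accounts = set(accounts)
        self.pairs = []   # stored sentence pairs; list position = pair id
        self.index = {}   # token -> set of pair ids (inverted index)
        self._lock = threading.Lock()  # several accounts may write concurrently

    def _check(self, account):
        if account not in self.accounts:
            raise PermissionError(f"unknown account interface: {account}")

    def write(self, account, source, target):
        """Store a pair and index both sides (whitespace tokens, an assumption)."""
        self._check(account)
        with self._lock:
            pair_id = len(self.pairs)
            self.pairs.append(SentencePair(source, target, account))
            for token in source.lower().split() + target.lower().split():
                self.index.setdefault(token, set()).add(pair_id)
            return pair_id

    def search(self, account, token):
        """Return every stored pair whose source or target contains the token."""
        self._check(account)
        with self._lock:
            ids = sorted(self.index.get(token.lower(), set()))
            return [self.pairs[i] for i in ids]


corpus = SharedCorpus(["trados_user", "memoq_user"])
# The Chinese target is given pre-segmented with spaces for this sketch.
corpus.write("trados_user", "Renmin University of China", "中国 人民 大学")
shared = corpus.search("memoq_user", "university")
```

The lock is there because the method's value comes from many accounts writing into one store at once; a real cloud deployment would delegate that to the database.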
Preferably, the data read and written are used by a training engine for machine translation training, so that new corpus entries are learned and added in real time. The read and written data could also be stored in the cloud corpus directly, as sentence pairs with their word-segmentation inverted index; however, machine translation training through the training engine can achieve better training results in certain specialized domains, making the added corpus entries more practical and accurate.
Preferably, each account interface can use the new corpus entries added through that interface, together with the training engine, to train a machine translation model dedicated to that account interface. The model is retrained repeatedly, so that new corpus entries a user stores feed into the next round of training. This creates a positive feedback loop in which the quality of the machine translation engine keeps rising and increasingly matches the user's habits.
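The per-account feedback loop can be sketched as follows. The patent does not describe the training engine, so `train_fn` is a hypothetical placeholder callable, and the batch threshold that triggers retraining is an assumption; the sketch only shows how entries contributed through each account are kept apart and fed into that account's next training round.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, translation)


class PerAccountRetrainer:
    """Hypothetical scheduler that retrains one model per account interface."""

    def __init__(self, train_fn: Callable[[str, List[Pair]], object], batch_size: int = 3):
        self.train_fn = train_fn        # placeholder for the training engine
        self.batch_size = batch_size    # assumed retraining threshold
        self.history: Dict[str, List[Pair]] = defaultdict(list)
        self.pending: Dict[str, int] = defaultdict(int)
        self.models: Dict[str, object] = {}
        self.versions: Dict[str, int] = defaultdict(int)

    def add_pair(self, account: str, source: str, target: str) -> None:
        self.history[account].append((source, target))
        self.pending[account] += 1
        if self.pending[account] >= self.batch_size:
            # Retrain this account's model on everything it has contributed,
            # so each round starts from a larger corpus than the previous one.
            self.models[account] = self.train_fn(account, list(self.history[account]))
            self.versions[account] += 1
            self.pending[account] = 0


# Stand-in "training": a lookup table, not a real machine translation model.
retrainer = PerAccountRetrainer(lambda account, pairs: dict(pairs), batch_size=2)
retrainer.add_pair("team_a", "hello", "你好")
retrainer.add_pair("team_a", "thanks", "谢谢")
retrainer.add_pair("team_b", "goodbye", "再见")
```

Here `team_a` has reached the threshold and has a first model version, while `team_b` is still accumulating entries.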
Preferably, after new corpus data are uploaded to the cloud corpus, each sentence pair is first extracted by a document parsing function; for each sentence pair, the source text and the translation are segmented into words separately, the corresponding word-segmentation inverted index data are built, and the result is stored in the cloud corpus. This unifies the storage format of the corpus and makes later retrieval easier.
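A sketch of building separate inverted indexes for the source and target sides of each sentence pair. Word segmentation is replaced here by a deliberately simple tokenizer (runs of Latin letters or digits, plus one token per Chinese character). That substitution is an assumption, and a production system would use a real Chinese word segmenter. The function names and sample pairs are illustrative.

```python
import re
from collections import defaultdict

# Assumed tokenizer: runs of Latin letters/digits, or single CJK characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def tokenize(text):
    return [token.lower() for token in TOKEN_RE.findall(text)]


def build_inverted_index(pairs):
    """Map each token to the ids of the sentence pairs containing it.

    Source and target sides are segmented and indexed separately,
    as the method describes; sentence ids start at 1.
    """
    source_index = defaultdict(set)
    target_index = defaultdict(set)
    for pair_id, (source, target) in enumerate(pairs, start=1):
        for token in tokenize(source):
            source_index[token].add(pair_id)
        for token in tokenize(target):
            target_index[token].add(pair_id)
    return dict(source_index), dict(target_index)


pairs = [
    ("Renmin University of China", "中国人民大学"),
    ("Peking University", "北京大学"),
]
src_index, tgt_index = build_inverted_index(pairs)
```

Keeping the two sides in separate indexes lets a query in either language be matched against the corresponding side of the stored pairs.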
Preferably, each sentence pair is stored together with its corresponding word-segmentation inverted index data, so that users can retrieve results more comprehensively when searching.
Preferably, when newly read or written data, or the corresponding sentence pair and word-segmentation inverted index data, are already contained in the cloud corpus, the cloud corpus automatically feeds back and displays the related data. Because retrieval and feedback happen automatically, users do not need to search by hand, which makes translation more efficient and keeps the translations consistent and of high quality.
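Automatic feedback on data that is already in the corpus amounts to a lookup performed on every write. The sketch below assumes that "already contained" means an exact match after collapsing whitespace and ignoring case, which is only one possible reading. The class and method names are hypothetical.

```python
from collections import defaultdict


def normalize(text):
    """Collapse runs of whitespace and ignore case (assumed matching rule)."""
    return " ".join(text.split()).lower()


class ExactMatchMemory:
    """Hypothetical translation memory that reports existing matches on write."""

    def __init__(self):
        self._by_source = defaultdict(list)  # normalized source -> translations

    def lookup(self, source):
        return list(self._by_source.get(normalize(source), []))

    def write(self, source, target):
        """Store a pair and return the translations that already existed.

        A non-empty result is what the corpus would feed back to the user.
        """
        existing = self.lookup(source)
        if target not in existing:
            self._by_source[normalize(source)].append(target)
        return existing


memory = ExactMatchMemory()
first = memory.write("Hello  World", "你好，世界")
second = memory.write("hello world", "你好，世界")
```

The second write differs only in spacing and case, so it receives the earlier translation as feedback and does not create a duplicate entry.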
Preferably, when the cloud corpus receives a retrieval request, it segments the content of the request into words, queries the inverted index with the segmentation results, and returns the stored word-segmentation inverted index data with the highest similarity as the retrieval result. Returning the highest-similarity result is convenient for the user; the retrieval result may of course also include several results of relatively high similarity for the user to choose from.
Preferably, the similarity is calculated from the hit frequency in the word-segmentation inverted index: the higher the frequency, the higher the similarity. For example, suppose a user enters [People's University]. The platform segments the keywords into [people] and [university] and queries the inverted index with these results, hitting two postings: <people, 1> and <university, 1-2> (that is, sentences 1 and 2). Counting hits per sentence, sentence 1 occurs twice and sentence 2 once, so the sentence with the highest similarity is sentence 1: Renmin University of China.
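The worked example can be reproduced by counting, for each sentence id, how many query tokens hit it in the inverted index. The posting lists are taken from the example. The function name and the `top_k` parameter are illustrative, and returning several ranked results reflects the option of offering more than one high-similarity match.

```python
from collections import Counter


def rank_by_hit_frequency(query_tokens, inverted_index, top_k=3):
    """Rank sentence ids by how many query tokens hit them (higher = more similar)."""
    hits = Counter()
    for token in query_tokens:
        for sentence_id in inverted_index.get(token, ()):
            hits[sentence_id] += 1
    return hits.most_common(top_k)


# Posting lists from the example: <people, 1> and <university, 1-2>.
index = {"people": [1], "university": [1, 2]}
ranked = rank_by_hit_frequency(["people", "university"], index)
print(ranked)  # [(1, 2), (2, 1)]: sentence 1 has two hits, sentence 2 one
```

This is a raw hit count. It does not weight rare tokens more heavily or normalize for sentence length, which matches the frequency rule as stated.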
Preferably, the account interfaces include accounts of at least one of the following software tools or platforms: Trados, Visual, Transmate and memoQ. Supporting accounts from several translation tools and platforms allows users of different platforms to collaborate on translations in real time.
The beneficial effects of the invention are as follows. The corpus is kept in a cloud corpus, whose larger storage capacity allows the invention to manage massive corpora. In addition, the invention allows at least two different account interfaces to access a single, unified cloud corpus that can learn from users' reads and writes and add new corpus entries. Team members can therefore write translated sentence pairs into the cloud corpus in real time during translation, sharing the translation memory in real time. This reduces the translation workload by preventing several people from translating the same content, speeds up translation, and improves translation quality by keeping translations consistent through exact matching against the translation memory.
The specific embodiments of this application are disclosed in detail with reference to the following description and drawings, which indicate the ways in which the principles of this application may be employed. It should be understood that the embodiments of this application are not thereby limited in scope. Within the spirit and terms of the appended claims, the embodiments of this application include many changes, modifications and equivalents.
Features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments.
It should be emphasized that the term "comprises/includes", as used herein, refers to the presence of features, integers, steps or components, but does not exclude the presence or addition of one or more other features, integers, steps or components.
Brief description of the drawings
The accompanying drawings are included to provide a further understanding of the embodiments of this application. They constitute a part of the specification, illustrate the embodiments of this application, and together with the text serve to explain the principles of this application. Obviously, the drawings in the following description show only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a flowchart of an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the technical solutions in this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative work shall fall within the scope of protection of this application.
Fig. 1 is a flowchart of an embodiment of the present invention. Referring to Fig. 1, a corpus management method comprises the steps of:
S1: creating a cloud corpus, building a word-segmentation inverted index for the corpus data to be imported, and storing the data in the cloud corpus;
S2: exposing at least two account interfaces on the cloud corpus;
S3: having the cloud database read and write data through all of the account interfaces, learn in real time from the data read and written to add new corpus entries, build the word-segmentation inverted index for those entries, and add them to the cloud corpus.
In the present invention, corpus data are typically provided as TMX (Translation Memory eXchange) files.
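Because the corpus typically arrives as TMX, the document-parsing step that yields sentence pairs can be sketched with Python's standard `xml.etree.ElementTree`. The sketch assumes TMX 1.4-style `<tu>`/`<tuv xml:lang=...>`/`<seg>` units (falling back to the older `lang` attribute) and exact, case-insensitive matching of language codes, so `zh-CN` will not match `zh`. Inline markup inside `<seg>` is simply flattened to text. The function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Extract (source, target) sentence pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Flatten any inline markup inside <seg> into plain text (a simplification).
                segments[lang] = "".join(seg.itertext()).strip()
        source = segments.get(source_lang.lower())
        target = segments.get(target_lang.lower())
        if source and target:  # skip units missing either language
            pairs.append((source, target))
    return pairs


SAMPLE = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" o-tmf="none" creationtool="demo" creationtoolversion="1"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Renmin University of China</seg></tuv>
      <tuv xml:lang="zh-CN"><seg>中国人民大学</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = read_tmx_pairs(SAMPLE, "en", "zh-CN")
```

The resulting pairs are what the segmentation and inverted-index step would consume next.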
In this embodiment, preferably, the data read and written are used by a training engine for machine translation training, so that new corpus entries are learned and added in real time. The read and written data could also be stored in the cloud corpus directly, as sentence pairs with their word-segmentation inverted index; however, machine translation training through the training engine can achieve better training results in certain specialized domains, making the added corpus entries more practical and accurate.
In this embodiment, preferably, each account interface can use the new corpus entries added through that interface, together with the training engine, to train a machine translation model dedicated to that account interface. The model is retrained repeatedly, so that new corpus entries a user stores feed into the next round of training. This creates a positive feedback loop in which the quality of the machine translation engine keeps rising and increasingly matches the user's habits.
In this embodiment, preferably, after new corpus data are uploaded to the cloud corpus, each sentence pair is first extracted by a document parsing function; for each sentence pair, the source text and the translation are segmented into words separately, the corresponding word-segmentation inverted index data are built, and the result is stored in the cloud corpus. This unifies the storage format of the corpus and makes later retrieval easier.
In this embodiment, preferably, each sentence pair is stored together with its corresponding word-segmentation inverted index data, so that users can retrieve results more comprehensively when searching.
In this embodiment, preferably, when newly read or written data, or the corresponding sentence pair and word-segmentation inverted index data, are already contained in the cloud corpus, the cloud corpus automatically feeds back and displays the related data. Because retrieval and feedback happen automatically, users do not need to search by hand, which makes translation more efficient and keeps the translations consistent and of high quality.
In this embodiment, preferably, when the cloud corpus receives a retrieval request, it segments the content of the request into words, queries the inverted index with the segmentation results, and returns the stored word-segmentation inverted index data with the highest similarity as the retrieval result. Returning the highest-similarity result is convenient for the user; the retrieval result may of course also include several results of relatively high similarity for the user to choose from.
In this embodiment, preferably, the similarity is calculated from the hit frequency in the word-segmentation inverted index: the higher the frequency, the higher the similarity. For example, suppose a user enters [People's University]. The platform segments the keywords into [people] and [university] and queries the inverted index with these results, hitting two postings: <people, 1> and <university, 1-2> (that is, sentences 1 and 2). Counting hits per sentence, sentence 1 occurs twice and sentence 2 once, so the sentence with the highest similarity is sentence 1: Renmin University of China.
In this embodiment, preferably, the account interfaces include accounts of at least one of the following software tools or platforms: Trados, Visual, Transmate and memoQ. Supporting accounts from several translation tools and platforms allows users of different platforms to collaborate on translations in real time.
Trados, as referred to in the invention, is translation memory software; Visual is a programming tool or platform; Transmate is translation software; and memoQ is a translation tool.
The preferred embodiments of the present invention have been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative work. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning or limited experimentation on the basis of the prior art according to the concept of the invention shall fall within the scope of protection defined by the claims.

Claims (9)

1. A corpus management method, characterized in that it comprises the steps of:
creating a cloud corpus, building a word-segmentation inverted index for the corpus data to be imported, and storing the data in the cloud corpus;
exposing at least two account interfaces on the cloud corpus;
having the cloud database read and write data through all of the account interfaces, learn in real time from the data read and written to add new corpus entries, build the word-segmentation inverted index for those entries, and add them to the cloud corpus.
2. The corpus management method according to claim 1, characterized in that the data read and written are used by a training engine for machine translation training, so that new corpus entries are learned and added in real time.
3. The corpus management method according to claim 2, characterized in that each account interface can use the new corpus entries added through that account interface, together with the training engine, to train a machine translation model dedicated to that account interface.
4. The corpus management method according to claim 1, characterized in that, after the new corpus data are uploaded to the cloud corpus, each sentence pair is first extracted by a document parsing function; for each sentence pair, the source text and the translation are segmented into words separately, the corresponding word-segmentation inverted index data are then built, and the result is stored in the cloud corpus.
5. The corpus management method according to claim 3, characterized in that each sentence pair is stored together with its corresponding word-segmentation inverted index data.
6. The corpus management method according to claim 1, characterized in that, when newly read or written data, or the corresponding sentence pair and word-segmentation inverted index data, are already contained in the cloud corpus, the cloud corpus automatically feeds back and displays the related data.
7. The corpus management method according to claim 1, characterized in that, when the cloud corpus receives a retrieval request, it segments the content of the request into words, queries the inverted index with the segmentation results, and returns the stored word-segmentation inverted index data with the highest similarity as the retrieval result.
8. The corpus management method according to claim 6, characterized in that the similarity is calculated from the hit frequency in the word-segmentation inverted index: the higher the frequency, the higher the similarity.
9. The corpus management method according to claim 1, characterized in that the account interfaces include accounts of at least one of the following software tools or platforms: Trados, Visual, Transmate and memoQ.
CN201610459092.9A 2016-06-22 2016-06-22 A corpus management method Pending CN106126508A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610459092.9A CN106126508A (en) 2016-06-22 2016-06-22 A corpus management method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610459092.9A CN106126508A (en) 2016-06-22 2016-06-22 A corpus management method

Publications (1)

Publication Number Publication Date
CN106126508A (en) 2016-11-16

Family

ID=57269160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610459092.9A Pending CN106126508A (en) 2016-06-22 2016-06-22 A corpus management method

Country Status (1)

Country Link
CN (1) CN106126508A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1916905A (en) * 2006-09-04 2007-02-21 北京航空航天大学 Method for carrying out retrieval hint based on inverted list
CN102270198A (en) * 2011-08-16 2011-12-07 上海交通大学出版社有限公司 Computer assisted translation system
CN103020044A (en) * 2012-12-03 2013-04-03 江苏乐买到网络科技有限公司 Machine-aided webpage translation method and system thereof
CN103218354A (en) * 2013-03-28 2013-07-24 曾立人 On-line translation memory exchange method and system
CN105224683A (en) * 2015-10-28 2016-01-06 北京护航科技有限公司 A natural language analysis intelligent interaction method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
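张春祥, 高雪瑶: Translation Knowledge Acquisition Based on Phrase Evaluation (《基于短语评价的翻译知识获取》), 28 February 2012 *
王继辉: Translation World 2015: Special Collection of Chinese Applied Translation Papers, Vol. 1 (《译界2015中国应用翻译论文专辑 第1辑》), 28 February 2016 *
蒋亚民: Scientific Methods of Training (《训练的科学方法》), 31 May 2012 *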

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649280A (en) * 2017-02-13 2017-05-10 长沙军鸽软件有限公司 Method for creating shared corpus
CN106649280B (en) * 2017-02-13 2019-07-09 长沙军鸽软件有限公司 A method of creating shared corpus

Similar Documents

Publication Publication Date Title
CN104504001B (en) Cursor construction method for massive distributed relational databases
US20180144065A1 (en) Method for Generating Visual Representations of Data Based on Controlled Natural Language Queries and System Thereof
CN103955538B (en) HBase data persistence and query methods and HBase system
RU2005139141A (en) ACCESS TO COMPLEX DATA
CN108108426A (en) Method and device for understanding natural-language questions, and electronic equipment
CN111061828B (en) Digital library knowledge retrieval method and device
US20160070707A1 (en) Keyword search on databases
CN106570173A (en) High-dimensional sparse text data clustering method based on Spark
CN115309885A (en) Knowledge graph construction, retrieval and visualization method and system for scientific and technological service
Baralis et al. Learning from summaries: Supporting e-learning activities by means of document summarization
CN110119404A (en) An intelligent access system and method based on natural language understanding
CN106126508A (en) A corpus management method
EP3306540A1 (en) System and method for content affinity analytics
Leung The Journeys of Books: Rare Books and Manuscripts Provenance Metadata in a Digital Age
CN106649833A (en) Knowledge database construction method based on knowledge module and query method and system
Tzitzikas et al. CIDOC-CRM and machine learning: a survey and future research
CN111831812B (en) Reading comprehension data set automatic generation method and device based on knowledge graph
Haw et al. XMapDB-Sim: Performance evalaution on model-based XML to Relational Database mapping choices
Krajewski Note-keeping: History, theory, practice of a counter-measurement against forgetting
Lönnqvist The research processes of humanities scholars
US11550780B2 (en) Pre-constructed query recommendations for data analytics
Klein et al. Digging into data white paper: Trading consequences
Sohail et al. From ER model to star model: a systematic transformation approach
CN112259099B (en) Task processing method and device based on voice interaction and storage medium
Zou et al. Research and application of association rule mining algorithm based on multidimensional sets

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161116
