CN102591858B - Method and apparatus for machine translation - Google Patents

Method and apparatus for machine translation

Info

Publication number
CN102591858B
Authority
CN
China
Prior art keywords
language
text
english
information file
decoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201110357938.5A
Other languages
Chinese (zh)
Other versions
CN102591858A (en)
Inventor
忻尚明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhao Huajie
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method and an apparatus for machine translation. The method comprises: inputting text in a first language; selecting the decoding information file corresponding to the first language and a second language; and decoding the first-language text according to the selected decoding information file to obtain text in the second language. Starting from the input first-language text, the method and apparatus can translate between any two languages without knowledge of the grammatical rules of either language, and can be used without a network.

Description

Method and apparatus for machine translation
Technical field
The present invention relates to the field of machine translation, and in particular to a method and apparatus for machine translation.
Background
Current handheld consumer electronics, such as electronic dictionaries and digital learning machines, are limited by hardware conditions such as memory capacity and CPU speed, so the full-sentence translation they offer is rule-based. Full-sentence translation in the electronic dictionaries now on the market mainly covers English-Chinese translation and works without a network. Smartphones, such as Apple's products, also offer multilingual full-sentence translation, but only with a network connection: the text to be translated is first sent to a server, the server translates it, and the result is then sent back.
Take English-Chinese translation as an example. To translate the English sentence "They built too many buildings last year." into Chinese, the sentence is first analyzed according to English grammar: "They" is the subject, "built" is the predicate, "too many" is an adjectival phrase modifying the countable noun "buildings", "buildings" is the object, and "last year" is the adverbial. This is a typical subject + predicate + object + adverbial structure. It corresponds to the Chinese structure subject + adverbial + predicate + object, so the corresponding Chinese sentence, meaning "They built a lot of buildings last year", is generated. Rule-based methods can be used when hardware is limited, since they occupy little memory and run on slow CPUs. For grammatical sentences they give reasonably good translations, but they also have many shortcomings.
First, the number of languages that a rule-based method can translate between is limited. To develop a translation program between language x and language y, the rules of both languages must be summarized. For widely used languages such as Chinese and English this is feasible. For languages with complex rules, such as French, or less commonly taught languages, such as Arabic, summarizing the rules is practically impossible without a huge investment of manpower and money, yet these are exactly the languages for which translation is needed.
Second, the spoken form of many languages does not follow the rules of the language, yet it is widely used in daily life. In Chinese, for example, some speakers say "you go first" with the words in a nonstandard order, and some Internet slang replaces the word for "what" with a similar-sounding phrase that literally means "god horse". Rule-based methods are helpless in such cases.
With the rapid development of microelectronics, storage capacity keeps growing and CPUs keep getting faster; small-capacity memories and slow CPUs have even been withdrawn from the market. Running only rule-based translation on present hardware leaves most of its capability unused.
Summary of the invention
The object of the invention is to overcome the defects of the prior art, in which translation is either rule-based or requires a network, by providing a method and an apparatus for machine translation that can translate between any two languages without knowledge of the grammatical rules of either language and that can be used without a network.
To achieve this object, one aspect of the invention provides a method of machine translation.
A method of machine translation according to an embodiment of the invention comprises:
inputting text in a first language;
selecting the decoding information file corresponding to the first language and a second language;
decoding the first-language text according to the selected decoding information file to obtain text in the second language.
In the above technical solution, preferably, the input first-language text is normalized.
In the above technical solution, preferably, when neither the first language nor the second language is English, the decoding information file corresponding to the first language and English is selected and used to decode the first-language text into English text; the decoding information file corresponding to English and the second language is then selected and used to decode the resulting English text into second-language text.
To achieve the above object, another aspect of the invention provides an apparatus for machine translation.
An apparatus for machine translation according to an embodiment of the invention comprises:
an input module, configured to input text in a first language and send it to the selection module and the decoding module;
a selection module, configured to select, according to the received first-language text and the name of the second language into which it is to be translated, the decoding information file corresponding to the first language and the second language, and to send the first-language text and the decoding information file to the decoding module;
a decoding module, configured to decode the first-language text according to the received first-language text and decoding information file to obtain text in the second language.
In the above technical solution, the input module is further configured to send the input first-language text to a normalization module;
the normalization module is configured to normalize the received first-language text and send the normalized first-language text to the decoding module;
the decoding module is further configured to decode the received normalized first-language text to obtain text in the second language.
In the above technical solution, the selection module is further configured, when neither the first language nor the second language is English, to select the decoding information file corresponding to the first language and English and the decoding information file corresponding to English and the second language.
In the above technical solution, the decoding module is further configured, when neither the first language nor the second language is English, to decode the input first-language text according to the decoding information file corresponding to the first language and English to obtain English text,
and then to decode the resulting English text according to the decoding information file corresponding to English and the second language to obtain text in the second language.
The method and apparatus of this embodiment can translate between any two languages without knowledge of the grammatical rules of either language and can be used without a network, so that a user can translate a first language into a second language quickly and conveniently.
Other features and advantages of the invention will be set forth in the description that follows, and will in part become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims and the accompanying drawings.
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The accompanying drawings provide a further understanding of the invention and form part of the description. Together with the embodiments of the invention they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is a flow chart of an embodiment of the machine translation method of the invention.
Fig. 2 is a structural diagram of embodiment one of the machine translation apparatus of the invention.
Fig. 3 is a structural diagram of embodiment two of the machine translation apparatus of the invention.
Detailed description
Preferred embodiments of the invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here only illustrate and explain the invention and do not limit it.
The method and apparatus for machine translation of the invention normalize the input first-language text, select the decoding information file corresponding to the first language and the second language, and decode the normalized text with the selected decoding information file to obtain second-language text. A decoding program and a decoding information library, which contains decoding information files for multiple language pairs, are stored in advance in the NAND flash of a handheld electronic device such as an electronic dictionary or mobile phone, so the user can translate text between two languages in real time without the Internet or any other network.
Method embodiment
As shown in Fig. 1, an embodiment of the invention provides a method of machine translation, comprising:
Step 101: input text in a first language;
Step 102: load the decoding program TextTranslateProgram.bin from the file system into SDRAM and run it;
Step 103: select the decoding information file corresponding to the first language and the second language; for example, if English-Chinese translation is selected, open the decoding information file EngChiTT.bin;
Step 104: accept the input first-language text and normalize it according to the first language; for example, the English input "What's your father?" becomes "What is your father?" after normalization. Different languages have different normalization routines;
Step 105: the decoding program TextTranslateProgram.bin analyzes the normalized text, tries all combinations of the words in different orders, and generates second-language text according to the selected decoding information file EngChiTT.bin; the candidate with the highest probability is the translation result. For the example above, the following combinations are tried, each followed here by an English gloss of the Chinese candidate it yields: "What is your father?" ("What is your father?"), "What your is father?" ("What you be father?"), "What father is your?" ("Any father is you?"). According to the selected decoding information file, the candidate glossed as "What is your father?" has the highest probability and is the translation result.
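Steps 104 and 105 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the decoding program described above: the contraction table, the toy lexicon, the toy bigram scores and the function names `normalize`, `score` and `decode` are all hypothetical, and the additive scores merely stand in for the probabilities that a real decoding information file would supply.

```python
from itertools import permutations

# Hypothetical contraction table; the text only gives "What's" -> "What is".
CONTRACTIONS = {"what's": ["what", "is"]}

def normalize(text):
    """Normalize English input: lower-case, drop '?', expand contractions."""
    words = []
    for token in text.lower().replace("?", " ").split():
        words.extend(CONTRACTIONS.get(token, [token]))
    return words

# Toy stand-ins (assumed) for what a decoding information file would hold.
LEXICON = {"what": "什么", "is": "是", "your": "你的", "father": "父亲"}
BIGRAM_SCORE = {("你的", "父亲"): 5, ("父亲", "是"): 4, ("是", "什么"): 4}

def score(target_words):
    """Sum toy bigram scores; unseen word pairs contribute nothing."""
    pairs = zip(target_words, target_words[1:])
    return sum(BIGRAM_SCORE.get(pair, 0) for pair in pairs)

def decode(source_words):
    """Try every ordering of the source words and keep the best-scoring one."""
    candidates = (
        [LEXICON.get(word, word) for word in order]
        for order in permutations(source_words)
    )
    return max(candidates, key=score)

best = decode(normalize("What's your father?"))
print(" ".join(best))
```

Trying every ordering grows factorially with sentence length (24 orderings for 4 words, 3,628,800 for 10), so a sketch like this is only practical for very short inputs.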
In this method embodiment, all languages share one decoding program, named TextTranslateProgram.bin, while each pair of languages has its own decoding information file. For example, EngChiTT.bin is the English-Chinese translation information library and EngAraTT.bin is the English-Arabic translation information library. These are stored as files in the NAND flash of the electronic device.
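The choice of decoding information files, including the detour through English when neither language is English, might look like the sketch below. It assumes that file names follow the pattern of the two examples given (EngChiTT.bin, EngAraTT.bin), with English first and one file serving both directions; the language codes table and the function names `info_file` and `select_files` are hypothetical.

```python
# Hypothetical code table; only "Eng", "Chi" and "Ara" appear in the text.
CODES = {"english": "Eng", "chinese": "Chi", "arabic": "Ara"}

def info_file(lang_a, lang_b):
    """File name for a pair that includes English (assumed naming pattern)."""
    other = lang_b if lang_a == "english" else lang_a
    return f"Eng{CODES[other]}TT.bin"

def select_files(first, second):
    """Return the decoding information files to apply, in order."""
    if first == second:
        raise ValueError("first and second language must differ")
    if "english" in (first, second):
        return [info_file(first, second)]
    # Neither language is English: decode into English, then out of it.
    return [info_file(first, "english"), info_file("english", second)]

print(select_files("english", "chinese"))
print(select_files("chinese", "arabic"))
```

With this layout, n languages need only n - 1 files rather than one file per pair, at the cost of two decoding passes for pairs that do not include English.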
The decoding information file used in step 103 is built as follows:
Step 1: build a well-aligned bilingual parallel corpus;
Step 2: preprocessing, mainly word segmentation;
Chinese word segmentation: for example, after the sentence below is processed by a word segmentation tool, its words are separated by spaces (the sentence is shown here in English translation, first as input and then as segmented output).
China announced yesterday that it would set up a "leading group" made up of high-ranking officials.
China announced yesterday that it would set up a "leading group" made up of high-ranking officials.
Many languages, such as Thai, Korean and Japanese, require this kind of segmentation so that words are separated by spaces.
In languages such as English, French and German, words are already separated by spaces, so segmentation is comparatively simple:
But scientists are not sure why this happens.
But scientists cannot determine why this phenomenon occurs.
I will not be in to work again today.
I still cannot come in to work today.
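The text does not say which word segmentation tool is used. As one simple possibility, greedy forward maximum matching against a word list can be sketched as below; the vocabulary, the sample sentence and the function name `segment` are hypothetical and chosen only to resemble the example above.

```python
def segment(text, vocabulary, max_len=4):
    """Greedy forward maximum matching: take the longest known word at each
    position, falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical toy vocabulary and sentence.
VOCABULARY = {"中国", "昨天", "宣布", "成立", "领导", "小组", "领导小组"}
segmented = segment("中国昨天宣布成立领导小组", VOCABULARY)
print(" ".join(segmented))
```

Because the longest match wins, "领导小组" is kept as one word even though "领导" and "小组" are also in the vocabulary.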
Step 3: train on the corpus.
Training mainly yields two models, a language model and a translation model. The former captures the regularities that make a sentence well formed in a language; the latter captures how well phrases in the two languages translate into each other.
The language model is explained in detail below.
Let S denote a sequence of words w1, w2, ..., wn in a particular order; in other words, S is a meaningful sentence made up of words arranged in a particular order.
We want to know the probability that S occurs in text, written P(S). By the chain rule of conditional probability, the probability of the sequence S equals the product of the conditional probabilities of its words, so P(S) expands to:
P(S) = P(w1) P(w2 | w1) P(w3 | w1 w2) ... P(wn | w1 w2 ... wn-1)
Here P(w1) is the probability that the first word w1 occurs, P(w2 | w1) is the probability that the second word occurs given the first, and so on. Clearly, the probability of word wn depends on all the words before it, and the number of possibilities is too large to compute. We therefore assume that the probability of any word wi depends only on the preceding word wi-1, which makes the problem very simple. The probability of S then becomes:
P(S) = P(w1) P(w2 | w1) P(w3 | w2) ... P(wi | wi-1) ... P(wn | wn-1)
A simple example: the probability that "fly" follows "bird" is high, whereas the probability that "fly" follows "apple" is very low.
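The simplified formula can be illustrated with a small Python sketch that estimates P(wi | wi-1) from counts in a corpus. The four-sentence corpus, the start marker `<s>` (which turns P(w1) into P(w1 | <s>)) and the function names are assumptions made for illustration; the estimate is a plain relative frequency with no smoothing, so any unseen word pair gives probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count each word pair and how often each word is followed by another."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams

def sentence_probability(sentence, contexts, bigrams):
    """P(S) as the product of P(w_i | w_{i-1}), by relative frequency."""
    words = ["<s>"] + sentence.split()
    probability = 1.0
    for previous, word in zip(words, words[1:]):
        if contexts[previous] == 0:
            return 0.0
        probability *= bigrams[(previous, word)] / contexts[previous]
    return probability

# Hypothetical toy corpus.
corpus = ["birds fly", "birds fly south", "birds sing", "apples fall"]
contexts, bigrams = train_bigram(corpus)
p_birds = sentence_probability("birds fly", contexts, bigrams)
p_apples = sentence_probability("apples fly", contexts, bigrams)
print(p_birds, p_apples)
```

In this toy corpus "birds fly" scores 3/4 × 2/3 = 0.5, while "apples fly" scores zero because "fly" never follows "apples".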
The translation model can be illustrated as follows:
The probability that "Scientists" is translated as the Chinese word for "scientist" is high, whereas the probability that it is translated as the Chinese word for "I" is very low.
Step 4: process the language model and the translation model.
To meet the requirement that a handheld product perform full-sentence translation without a network, the two models above undergo substantial processing, chiefly the following:
1. Probabilities are converted from decimals such as 0.008436 to natural numbers such as 8436, because computers process natural numbers much faster than decimals;
2. Translation options are pruned. Each possible translation is called a translation option. For example, the Chinese word meaning "majority" (shown as "Most" in the entries below) has many translation options; only the 10 with the highest probability are kept, which saves storage and speeds up translation;
Most ||| Few ||| 0.0909091 0.0107527 0.00460829 0.0032468 2.718
Most ||| Many ||| 0.00502513 0.0033975 0.0138249 0.0097403 2.718
Most ||| Most of the ||| 0.175 0.0526016 0.0322581 0.0104055 2.718
Most ||| Most of ||| 0.126582 0.0526016 0.0460829 0.020113 2.718
Most ||| Most ||| 0.155235 0.104951 0.198157 0.172078 2.718
Most ||| The many ||| 0.5 0.00113405 0.00460829 0.000590319 2.718
Most ||| The ||| 0.000198827 0.0001495 0.00921659 0.0227273 2.718
Most ||| a majority vote ||| 1 0.132058 0.00460829 0.000347707 2.718
Most ||| a majority ||| 0.333333 0.250922 0.00460829 0.0214187 2.718
Most ||| majority of people ||| 0.142857 0.0838781 0.00460829 0.000335134 2.718
Most ||| majority of ||| 0.16 0.250922 0.0184332 0.0318867 2.718
Most ||| majority vote ||| 0.333333 0.132058 0.00460829 0.00358409 2.718
Most ||| majority ||| 0.476744 0.250922 0.18894 0.220779 2.718
Most ||| many ||| 0.00192308 0.0021186 0.0184332 0.025974 2.718
Most ||| most of the ||| 0.0769231 0.0162867 0.0138249 0.0162953 2.718
Most ||| most of ||| 0.0495868 0.0162867 0.0276498 0.0314977 2.718
Most ||| most people ||| 0.0212766 0.0163901 0.00460829 0.00349974 2.718
Most ||| most ||| 0.0773543 0.0323209 0.317972 0.269481 2.718
Most ||| of ||| 0.000408052 0.0002526 0.0414747 0.116883 2.718
Most ||| poor ||| 0.00274725 0.0015326 0.00460829 0.0064935 2.718
Most ||| the majority ||| 0.5 0.250922 0.0276498 0.11422 2.718
Most ||| the poor ||| 0.012987 0.0015326 0.00460829 0.00335941 2.718
Most ||| vote ||| 0.0119048 0.0131926 0.00460829 0.01
3. Sorting, so that entries can be looked up quickly during translation;
4. Implementation of the decoding algorithm: a suitable algorithm finds the highest-probability candidate in the target language, which gives the most appropriate translation result.
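Items 1 to 3 above can be sketched together in Python. The sketch rests on several assumptions: a scale factor of 1,000,000 (chosen because it maps 0.008436 to 8436 as in the example), ranking by the first score of each entry, and binary search over sorted source words as the lookup method. The names `to_integer`, `build_table` and `lookup` are hypothetical, and the actual binary layout of a decoding information file is not described in the text.

```python
from bisect import bisect_left

SCALE = 1_000_000  # assumed scale factor: 0.008436 -> 8436

def to_integer(probability):
    """Item 1: store a probability as a natural number."""
    return round(probability * SCALE)

def build_table(entries, keep=10):
    """Items 2 and 3: keep the `keep` best options per source word and
    sort the source words so they can be found by binary search."""
    by_source = {}
    for source, target, probability in entries:
        by_source.setdefault(source, []).append((to_integer(probability), target))
    keys = sorted(by_source)
    options = [sorted(by_source[key], reverse=True)[:keep] for key in keys]
    return keys, options

def lookup(keys, options, source):
    """Binary search for a source word; returns [] when it is absent."""
    i = bisect_left(keys, source)
    if i < len(keys) and keys[i] == source:
        return options[i]
    return []

# A few entries from the listing above, using the first score of each.
ENTRIES = [
    ("Most", "Few", 0.0909091),
    ("Most", "majority", 0.476744),
    ("Most", "Most", 0.155235),
    ("Most", "of", 0.000408052),
]
keys, options = build_table(ENTRIES, keep=2)
print(lookup(keys, options, "Most"))
```

With `keep=2` only the two highest-scoring options survive; the text keeps the top 10.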
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiment can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiment. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
Apparatus embodiments
Apparatus embodiment one
This embodiment provides an apparatus for machine translation, comprising:
an input module, a selection module and a decoding module.
The input module is configured to input text in a first language and send it to the selection module and the decoding module;
the selection module is configured to select, according to the received first-language text and the name of the second language into which it is to be translated, the decoding information file corresponding to the first language and the second language, and to send the first-language text and the decoding information file to the decoding module;
the decoding module is configured to decode the first-language text according to the received first-language text and decoding information file to obtain text in the second language.
Apparatus embodiment two
As shown in Fig. 2, this embodiment provides an apparatus for machine translation, comprising:
an input module, a normalization module, a selection module and a decoding module.
The input module is configured to input text in a first language and send it to the normalization module;
the normalization module is configured to normalize the received first-language text and send the normalized first-language text to the selection module and the decoding module;
the selection module is configured to select, according to the received normalized first-language text, the decoding information file corresponding to the first language and the second language, and to send the decoding information file to the decoding module;
the decoding module is configured to decode the normalized first-language text according to the received normalized text and decoding information file to obtain text in the second language.
Apparatus embodiment three
As shown in Fig. 3, this embodiment provides an apparatus for machine translation, comprising:
an input module, a normalization module, a selection module and a decoding module.
The input module is configured to input text in a first language and send it to the normalization module;
the normalization module is configured to normalize the received first-language text and send the normalized first-language text to the selection module and the decoding module;
the selection module is configured to select, according to the received normalized first-language text, the decoding information file corresponding to the first language and English and the decoding information file corresponding to English and the second language, and to send both decoding information files to the decoding module;
the decoding module is configured to decode the normalized first-language text according to the received normalized text and the decoding information file corresponding to the first language and English to obtain English text, and then to decode the resulting English text according to the decoding information file corresponding to English and the second language to obtain text in the second language.
In the machine translation apparatus of this embodiment, the selection module selects the decoding information file corresponding to the first language and the second language according to the first-language text entered through the input module, the decoding module decodes the first-language text, and the output module outputs the second-language text. With this apparatus a user can translate between two languages on a handheld electronic device quickly and conveniently, with high accuracy, without needing to know the rules of either language in advance and without a network.
Finally, it should be noted that the above are only preferred embodiments of the invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, a person skilled in the art can still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (5)

1. the method for a machine translation, it is characterised in that including:
The text of input first language;
Selecting the decoded information file that first language is corresponding with second language, described decoded information file is first language and the information bank of second language intertranslation;
The text of input first language is carried out normalization process;
Analyze normalized text, attempt all combinations of the different order of any word, according to the decoded information file selected, the normalization of first language is processed text and be decoded, obtain the text of second language。
2. the method for machine translation according to claim 1, it is characterised in that also include:
When described first language and second language are not English, the text of first language is decoded by the decoded information file selecting first language corresponding with English, obtains the text of English language;
The decoded information file that reselection English is corresponding with second language, is decoded the text of the English language obtained, and obtains the text of second language。
3. the device of a machine translation, it is characterised in that including:
Input module, for inputting the text of first language, and is sent to selection module and decoder module;
Selecting module, according to the decoded information file that the text selecting first language of the first language received is corresponding with second language, and described decoded information file is sent to decoder module, described decoded information file is first language and the information bank of second language intertranslation;
Decoder module, for the text of described first language being decoded according to the first language text received and decoding message file, obtains the text of second language;
Also include:
Described input module is additionally operable to, and the text of input first language is sent to normalization processing module;
Normalization processing module, is used for, and the text of the described first language received is carried out normalization process, and the normalization process text of first language is sent to decoder module;
Described decoder module is additionally operable to, and analyzes normalized text, attempts all combinations of the different order of any word, the normalization process text of the first language received is decoded, obtains the text of second language。
4. the device of machine translation according to claim 3, it is characterised in that:
Described selection module is additionally operable to, and when described first language and second language are not English, selects the first language decoded information file corresponding with English and the English decoded information file corresponding with second language。
5. the device of machine translation according to claim 3, it is characterised in that:
Described decoder module is additionally operable to, and when described first language and second language are not English, according to the decoded information file that first language is corresponding with English, the first language text of input are decoded, obtain the text of English language;
Text further according to the English decoded information file corresponding with the second language English language to obtaining is decoded, and obtains the text of second language。
CN201110357938.5A 2011-11-11 2011-11-11 A kind of method and apparatus of machine translation Expired - Fee Related CN102591858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110357938.5A CN102591858B (en) 2011-11-11 2011-11-11 A kind of method and apparatus of machine translation

Publications (2)

Publication Number Publication Date
CN102591858A CN102591858A (en) 2012-07-18
CN102591858B true CN102591858B (en) 2016-06-22

Family

ID=46480527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110357938.5A Expired - Fee Related CN102591858B (en) 2011-11-11 2011-11-11 A kind of method and apparatus of machine translation

Country Status (1)

Country Link
CN (1) CN102591858B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079028A (en) * 2007-05-29 2007-11-28 中国科学院计算技术研究所 On-line translation model selection method of statistic machine translation
CN101382937A (en) * 2008-07-01 2009-03-11 深圳先进技术研究院 Multimedia resource processing method based on speech recognition and on-line teaching system thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4330285B2 (en) * 2001-04-16 2009-09-16 沖電気工業株式会社 Machine translation dictionary registration device, machine translation dictionary registration method, machine translation device, machine translation method, and recording medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20160518

Address after: 523000 Guangdong city of Dongguan province Dongcheng District east-courser Jinghu Chunxiao Road 2 Road 20 B room 702 ladder

Applicant after: Zhang Shenglin

Address before: 523000 Guangdong city of Dongguan province Dongcheng District Wentang village industrial zone two soap

Applicant before: Dongguan Comet Electronics Holding Ltd.

C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20171227

Address after: 523000 Dongguan City, Guangdong, Dongguan, Dongcheng District, the main mountain, high field Fang Yun mansion building

Patentee after: Zhao Huajie

Address before: 523000 Guangdong city of Dongguan province Dongcheng District east-courser Jinghu Chunxiao Road 2 Road 20 B room 702 ladder

Patentee before: Zhang Shenglin

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160622