Summary of the invention
An object of the invention is to overcome the defects of prior-art translation methods, which are rule-based and require a network to perform translation, by providing a machine translation method and device that can translate between any two languages without knowledge of the grammatical rules of either language, and that can be used when no network is available.
To achieve the above object, according to one aspect of the invention, a machine translation method is provided.
A machine translation method according to an embodiment of the present invention includes:
inputting a text in a first language;
selecting the decoded information file corresponding to the first language and a second language;
decoding the text in the first language according to the selected decoded information file, to obtain a text in the second language.
In the above technical solution, preferably, the input text in the first language is normalized.
In the above technical solution, preferably, when neither the first language nor the second language is English, the decoded information file corresponding to the first language and English is selected and the text in the first language is decoded to obtain an English text; the decoded information file corresponding to English and the second language is then selected, and the English text so obtained is decoded to obtain the text in the second language.
To achieve the above object, according to another aspect of the invention, a machine translation device is provided.
A machine translation device according to an embodiment of the present invention includes:
an input module, configured to input a text in a first language and send it to a selection module and a decoder module;
a selection module, configured to select, according to the received text in the first language and the name of the second language into which translation is required, the decoded information file corresponding to the first language and the second language, and to send the text in the first language and the decoded information file to the decoder module;
a decoder module, configured to decode the text in the first language according to the received text and the decoded information file, to obtain a text in the second language.
In the above technical solution, the input module is further configured to send the input text in the first language to a normalization processing module;
the normalization processing module is configured to normalize the received text in the first language and send the normalized text to the decoder module;
the decoder module is further configured to decode the received normalized text in the first language, to obtain the text in the second language.
In the above technical solution, the selection module is further configured to select, when neither the first language nor the second language is English, the decoded information file corresponding to the first language and English and the decoded information file corresponding to English and the second language.
In the above technical solution, the decoder module is further configured to, when neither the first language nor the second language is English, decode the input text in the first language according to the decoded information file corresponding to the first language and English, to obtain an English text;
and then to decode the English text so obtained according to the decoded information file corresponding to English and the second language, to obtain the text in the second language.
The machine translation method and device of the present embodiment can translate between any two languages without knowledge of the grammatical rules of either language and can be used when no network is available, so that a user can translate a first language into a second language conveniently and efficiently.
Other features and advantages of the present invention will be set forth in the following description and will in part be apparent from the description, or may be learned by practicing the invention. The objects and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description, the claims and the accompanying drawings.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Detailed description of the invention
The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein serve only to illustrate and explain the present invention and are not intended to limit it.
In the machine translation method and device of the present invention, the input text in the first language is normalized, the decoded information file corresponding to the first language and the second language is selected, and the normalized text is decoded using the selected decoded information file to obtain the text in the second language. Because the decoding program and a decoded information library containing decoded information files for multiple language pairs are stored in advance in the NandFlash of a hand-held electronic device such as an electronic dictionary or mobile phone, the user can translate text between two languages in real time without the Internet or any other network.
Method embodiment
As shown in Fig. 1, according to an embodiment of the present invention, a machine translation method is provided, including:
Step 101: a text in the first language is input;
Step 102: the decoding program TextTranslateProgram.bin is loaded from the file system into SDRAM and run;
Step 103: the decoded information file corresponding to the first language and the second language is selected; for example, if English-Chinese translation is selected, the decoded information file EngChiTT.bin is opened;
Step 104: the input text in the first language is received and normalized according to the type of the first language; for example, the English input "What's your father?" becomes "What is your father?" after normalization. Different languages have different normalization routines;
Step 105: the decoding program TextTranslateProgram.bin analyzes the normalized text, tries every combination of word orders, and generates second-language text for each combination according to the selected decoded information file EngChiTT.bin; the candidate with the highest probability is the translation result. For the example above, the following combinations are tried (the second-language candidates are given here in English gloss): "What is your father?" yields "What is your father?"; "What your is father?" yields "What you be father?"; "What father is your?" yields "Any father is you?". According to the selected decoded information file, "What is your father?" is finally determined to have the highest probability and is the translation result.
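Step 105 can be illustrated with a minimal sketch. It is not the actual TextTranslateProgram.bin: the lexicon, the word-pair probabilities, the floor value UNSEEN and the upper-case placeholder words standing in for second-language words are all hypothetical, and the internal format of the decoded information file is not specified in this description.

```python
from itertools import permutations

# Hypothetical toy data standing in for one decoded information file.
# Second-language words are written as upper-case placeholders.
LEXICON = {"what": "WHAT", "is": "IS", "your": "YOUR", "father": "FATHER"}
BIGRAM = {
    ("<s>", "YOUR"): 0.5,
    ("YOUR", "FATHER"): 0.6,
    ("FATHER", "IS"): 0.4,
    ("IS", "WHAT"): 0.5,
}
UNSEEN = 0.001  # assumed floor probability for word pairs not in the table


def score(target_words):
    """Probability of a second-language word sequence under the toy table."""
    prob = 1.0
    previous = "<s>"  # sentence-start marker
    for word in target_words:
        prob *= BIGRAM.get((previous, word), UNSEEN)
        previous = word
    return prob


def decode(source_words):
    """Try every ordering of the source words; keep the most probable output."""
    best_words, best_prob = None, -1.0
    for order in permutations(source_words):
        candidate = [LEXICON[word] for word in order]
        prob = score(candidate)
        if prob > best_prob:
            best_words, best_prob = candidate, prob
    return best_words, best_prob


best, prob = decode(["what", "is", "your", "father"])
print(" ".join(best))
```

The number of orderings grows factorially with sentence length, so exhaustive enumeration of this kind is practical only for short inputs.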
In this method embodiment, all languages share a single decoding program, named TextTranslateProgram.bin. Each pair of languages, however, has its own decoded information file; for example, EngChiTT.bin is the English-Chinese translation information library and EngAraTT.bin is the English-Arabic translation information library. These are stored as files in the NandFlash of the electronic device.
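The naming scheme can be sketched as follows. Only the file names EngChiTT.bin and EngAraTT.bin appear in this description; the code table, the rule that one file serves both directions, and the ordering rule (English first, other codes alphabetical) are assumptions made for illustration.

```python
# Language codes taken from the two file names in the text; any further codes
# would be hypothetical.
LANGUAGE_CODES = {"english": "Eng", "chinese": "Chi", "arabic": "Ara"}


def decoded_info_filename(first_language, second_language):
    """Return the assumed file name for a pair of languages."""
    first = LANGUAGE_CODES[first_language.lower()]
    second = LANGUAGE_CODES[second_language.lower()]
    if first == second:
        raise ValueError("the two languages must differ")
    # Assumption: English is placed first, remaining codes are alphabetical.
    pair = sorted([first, second], key=lambda code: (code != "Eng", code))
    return f"{pair[0]}{pair[1]}TT.bin"


print(decoded_info_filename("English", "Chinese"))
```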
Wherein the method for the decoded information file build process in step 103 is as follows:
The first step, sets up the corpus of bilingual good intertranslation;
Second step, pretreatment, mainly carry out participle;
Chinese word segmentation: for example, after the sentence below is processed by a word segmentation tool, its words are separated from one another by spaces (the sentence is given here in English translation, in which the unsegmented and segmented forms read the same).
China announced yesterday the establishment of a "leading group" made up of high-ranking officials.
Many languages, such as Thai, Korean and Japanese, require this kind of segmentation so that words are separated by spaces.
In languages such as English, French and German, words are already separated by spaces, so segmentation is comparatively simple.
But scientists are not sure why this happens.
But scientists cannot ascertain why this phenomenon occurs.
I will not be in to work again today.
I still cannot go to work today.
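The pre-processing step can be sketched as follows. The description does not name the word segmentation tool; forward maximum matching is substituted here as one common dictionary-based technique, and the vocabulary and input string are hypothetical.

```python
def segment_spaced(text):
    """Languages that already separate words with spaces: split on whitespace."""
    return text.split()


def segment_unspaced(text, vocabulary):
    """Forward maximum matching: take the longest known word at each position."""
    max_len = max(len(word) for word in vocabulary)
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            # A single character is accepted as a fallback for unknown text.
            if length == 1 or piece in vocabulary:
                words.append(piece)
                i += length
                break
    return words


VOCABULARY = {"but", "scientists", "are", "not", "sure"}  # hypothetical
tokens = segment_unspaced("butscientistsarenotsure", VOCABULARY)
print(" ".join(tokens))
```

Greedy matching can pick a wrong boundary when a longer dictionary word overlaps the correct one, which is one reason practical segmenters usually add statistical scoring.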
Third step: the corpus is used for training and learning.
Two models are mainly obtained: a language model and a translation model. The former captures the regularities that make a sentence well-formed in a language; the latter captures good translation correspondences between the two languages.
The language model is explained in detail below.
Let S denote a sequence of words w1, w2, ..., wn arranged in a particular order; in other words, S is a meaningful sentence composed of words arranged in a particular order.
We want to know the probability that S occurs in text, denoted P(S). By the chain rule of conditional probability, the probability of the sequence S equals the product of the conditional probabilities of its words, so P(S) expands to:
P(S) = P(w1) P(w2 | w1) P(w3 | w1 w2) ... P(wn | w1 w2 ... wn-1)
where P(w1) is the probability that the first word w1 occurs, P(w2 | w1) is the probability that the second word occurs given the first word, and so on. Clearly, the probability of the word wn depends on all the words before it. Computationally, there are too many possibilities for this to be feasible. It is therefore assumed that the probability of any word wi depends only on the word wi-1 immediately before it, which makes the problem very simple. The probability of S then becomes:
P(S) = P(w1) P(w2 | w1) P(w3 | w2) ... P(wi | wi-1) ...
A simple example: the probability that the word "bird" is followed by "fly" is high, whereas the probability that "apple" is followed by "fly" is very low.
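The simplified model can be illustrated with a minimal sketch that estimates P(wi | wi-1) by counting adjacent word pairs. The three-sentence corpus is hypothetical, P(w1) is modeled as the probability of w1 following a sentence-start marker "<s>" (an assumption), and no smoothing is applied, so unseen pairs receive probability zero.

```python
from collections import Counter

# Hypothetical toy corpus, already segmented into words.
CORPUS = [
    ["bird", "fly", "high"],
    ["bird", "fly", "south"],
    ["apple", "fall", "down"],
]

history_counts = Counter()  # how often each word occurs as the previous word
pair_counts = Counter()     # how often each (previous, current) pair occurs

for sentence in CORPUS:
    tokens = ["<s>"] + sentence
    for previous, current in zip(tokens, tokens[1:]):
        history_counts[previous] += 1
        pair_counts[(previous, current)] += 1


def word_prob(current, previous):
    """Estimate P(current | previous) by relative frequency."""
    if history_counts[previous] == 0:
        return 0.0
    return pair_counts[(previous, current)] / history_counts[previous]


def sentence_prob(sentence):
    """P(S) = P(w1) P(w2 | w1) P(w3 | w2) ... under the simplifying assumption."""
    prob = 1.0
    previous = "<s>"
    for word in sentence:
        prob *= word_prob(word, previous)
        previous = word
    return prob


print(word_prob("fly", "bird"), word_prob("fly", "apple"))
```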
The translation model is illustrated as follows:
The probability that "Scientists" is translated as the Chinese word for "scientist" is high, whereas the probability that it is translated as the Chinese word for "I" is very low.
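A minimal sketch of such a table estimates each translation probability as a relative frequency over aligned word pairs. The aligned pairs and their counts are hypothetical, second-language words are written as English glosses, and the description does not state how the alignments are actually obtained during training.

```python
from collections import Counter, defaultdict

# Hypothetical aligned (source word, target word) observations.
ALIGNED_PAIRS = [("scientists", "scientist")] * 9 + [("scientists", "I")]


def train_translation_table(pairs):
    """Estimate P(target | source) as count(source, target) / count(source)."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    table = {}
    for source, target_counts in counts.items():
        total = sum(target_counts.values())
        table[source] = {t: c / total for t, c in target_counts.items()}
    return table


table = train_translation_table(ALIGNED_PAIRS)
print(table["scientists"])
```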
Fourth step: the language model and the translation model are processed.
To meet the requirement that a hand-held product perform full-sentence translation without a network, the two models above undergo substantial processing, mainly as follows:
1. Probabilities are converted from decimal form, e.g. 0.008436, to natural numbers, e.g. 8436, because a computer processes natural numbers much faster than decimals;
2. Translation options are reduced. Each possible translation is called a translation option. For example, the Chinese word for "majority" (shown as "Most" in the entries below) has many translation options; only the 10 with the highest probability are retained, which saves storage space and improves translation speed;
Most ||| Few ||| 0.0909091 0.0107527 0.00460829 0.0032468 2.718
Most ||| Many ||| 0.00502513 0.0033975 0.0138249 0.0097403 2.718
Most ||| Most of the ||| 0.175 0.0526016 0.0322581 0.0104055 2.718
Most ||| Most of ||| 0.126582 0.0526016 0.0460829 0.020113 2.718
Most ||| Most ||| 0.155235 0.104951 0.198157 0.172078 2.718
Most ||| The many ||| 0.5 0.00113405 0.00460829 0.000590319 2.718
Most ||| The ||| 0.000198827 0.0001495 0.00921659 0.0227273 2.718
Most ||| a majority vote ||| 1 0.132058 0.00460829 0.000347707 2.718
Most ||| a majority ||| 0.333333 0.250922 0.00460829 0.0214187 2.718
Most ||| majority of people ||| 0.142857 0.0838781 0.00460829 0.000335134 2.718
Most ||| majority of ||| 0.16 0.250922 0.0184332 0.0318867 2.718
Most ||| majority vote ||| 0.333333 0.132058 0.00460829 0.00358409 2.718
Most ||| majority ||| 0.476744 0.250922 0.18894 0.220779 2.718
Most ||| many ||| 0.00192308 0.0021186 0.0184332 0.025974 2.718
Most ||| most of the ||| 0.0769231 0.0162867 0.0138249 0.0162953 2.718
Most ||| most of ||| 0.0495868 0.0162867 0.0276498 0.0314977 2.718
Most ||| most people ||| 0.0212766 0.0163901 0.00460829 0.00349974 2.718
Most ||| most ||| 0.0773543 0.0323209 0.317972 0.269481 2.718
Most ||| of ||| 0.000408052 0.0002526 0.0414747 0.116883 2.718
Most ||| poor ||| 0.00274725 0.0015326 0.00460829 0.0064935 2.718
Most ||| the majority ||| 0.5 0.250922 0.0276498 0.11422 2.718
Most ||| the poor ||| 0.012987 0.0015326 0.00460829 0.00335941 2.718
Most ||| vote ||| 0.0119048 0.0131926 0.00460829 0.01
3. Sorting, so that entries can be looked up quickly during translation;
4. Implementation of the decoder algorithm: using a suitable algorithm, the candidate with the highest probability in the target language is found, yielding the most appropriate translation result.
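Items 1 to 3 above can be sketched together as follows. Several points are assumptions rather than details given in this description: the scale factor of 1,000,000, the choice of the first score column as the ranking probability, fields separated by "|||" with space-separated scores, and binary search as the lookup method. The sample lines are taken from the entries listed under item 2.

```python
import bisect

SCALE = 1_000_000  # assumption: keep six decimal places, so 0.008436 -> 8436


def to_integer(probability):
    """Item 1: convert a decimal probability to a natural number."""
    return round(probability * SCALE)


def parse_entry(line):
    """Split 'source ||| target ||| scores' into its three fields."""
    source, target, scores = (field.strip() for field in line.split("|||"))
    return source, target, [float(value) for value in scores.split()]


def build_table(lines, keep=10, score_index=0):
    """Items 2 and 3: keep the best options per source word, sorted by source."""
    options = {}
    for line in lines:
        source, target, scores = parse_entry(line)
        options.setdefault(source, []).append((to_integer(scores[score_index]), target))
    keys = sorted(options)
    entries = [sorted(options[key], reverse=True)[:keep] for key in keys]
    return keys, entries


def lookup(keys, entries, source):
    """Binary search over the sorted source words."""
    i = bisect.bisect_left(keys, source)
    if i < len(keys) and keys[i] == source:
        return entries[i]
    return []


LINES = [
    "Most ||| Few ||| 0.0909091 0.0107527 0.00460829 0.0032468 2.718",
    "Most ||| majority ||| 0.476744 0.250922 0.18894 0.220779 2.718",
    "Most ||| most ||| 0.0773543 0.0323209 0.317972 0.269481 2.718",
]
keys, entries = build_table(LINES, keep=2)
print(lookup(keys, entries, "Most"))
```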
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiment may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiment. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
Device embodiment
Device embodiment one
According to the present embodiment, a machine translation device is provided. The device includes:
an input module, a selection module and a decoder module.
The input module is configured to input a text in a first language and send it to the selection module and the decoder module;
the selection module is configured to select, according to the received text in the first language and the name of the second language into which translation is required, the decoded information file corresponding to the first language and the second language, and to send the text in the first language and the decoded information file to the decoder module;
the decoder module is configured to decode the text in the first language according to the received text and the decoded information file, to obtain a text in the second language.
Device embodiment two
As shown in Fig. 2, according to the present embodiment, a machine translation device is provided. The device includes:
an input module, a normalization processing module, a selection module and a decoder module.
The input module is configured to input a text in a first language and send it to the normalization processing module;
the normalization processing module is configured to normalize the received text in the first language and send the normalized text to the selection module and the decoder module;
the selection module is configured to select, according to the received normalized text in the first language, the decoded information file corresponding to the first language and the second language, and to send the decoded information file to the decoder module;
the decoder module is configured to decode the normalized text in the first language according to the received normalized text and the decoded information file, to obtain a text in the second language.
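The data flow between the modules of this embodiment can be sketched as follows. The function names, the contraction table and the language codes passed as arguments are hypothetical, and the decoder is a placeholder that only records which file and text it received; no real decoding is performed.

```python
# Hypothetical contraction table; only the "What's" example appears in the text.
ENGLISH_CONTRACTIONS = {"What's": "What is", "what's": "what is"}


def normalize_english(text):
    """Expand known contractions and collapse repeated whitespace."""
    for short, full in ENGLISH_CONTRACTIONS.items():
        text = text.replace(short, full)
    return " ".join(text.split())


# Each language has its own normalization routine.
NORMALIZERS = {"Eng": normalize_english}


def normalization_module(text, language):
    """Apply the routine for this language; default to whitespace clean-up."""
    routine = NORMALIZERS.get(language, lambda t: " ".join(t.split()))
    return routine(text)


def selection_module(first_language, second_language):
    """Assumed naming scheme, following the EngChiTT.bin example."""
    return f"{first_language}{second_language}TT.bin"


def decoder_module(normalized_text, info_file):
    """Placeholder for the real decoder: reports its inputs only."""
    return f"<{info_file}: {normalized_text}>"


def translate(text, first_language, second_language):
    """Input -> normalization -> selection -> decoding."""
    normalized = normalization_module(text, first_language)
    info_file = selection_module(first_language, second_language)
    return decoder_module(normalized, info_file)


print(translate("What's  your father?", "Eng", "Chi"))
```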
Device embodiment three
As shown in Fig. 3, according to the present embodiment, a machine translation device is provided. The device includes:
an input module, a normalization processing module, a selection module and a decoder module.
The input module is configured to input a text in a first language and send it to the normalization processing module;
the normalization processing module is configured to normalize the received text in the first language and send the normalized text to the selection module and the decoder module;
the selection module is configured to select, according to the received normalized text in the first language, the decoded information file corresponding to the first language and English and the decoded information file corresponding to English and the second language, and to send the two decoded information files to the decoder module;
the decoder module is configured to decode the normalized text in the first language according to the received normalized text and the decoded information file corresponding to the first language and English, to obtain an English text, and then to decode the English text so obtained according to the decoded information file corresponding to English and the second language, to obtain a text in the second language.
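The two-stage decoding of this embodiment can be sketched as follows. The toy word tables, the language codes "Tha" and "Kor" and the placeholder words are hypothetical and stand in for real decoded information files.

```python
# Hypothetical word-for-word tables standing in for decoded information files.
TABLES = {
    ("Tha", "Eng"): {"tha_father": "father"},
    ("Eng", "Kor"): {"father": "kor_father"},
}
calls = []  # records which language pairs were decoded, in order


def decode(text, source_language, target_language):
    """Toy decoder: look up each word in the table for this language pair."""
    calls.append((source_language, target_language))
    table = TABLES[(source_language, target_language)]
    return " ".join(table[word] for word in text.split())


def translate(text, first_language, second_language):
    """Decode directly if English is involved, otherwise pivot through English."""
    if "Eng" in (first_language, second_language):
        return decode(text, first_language, second_language)
    english_text = decode(text, first_language, "Eng")
    return decode(english_text, "Eng", second_language)


result = translate("tha_father", "Tha", "Kor")
print(result)
```

With English as the pivot, one file per non-English language is enough to connect every pair of languages, at the cost of passing any error from the first decoding step on to the second.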
In the machine translation device of the present embodiment, based on the text in the first language input through the input module, the selection module selects the decoded information file corresponding to the first language and the second language, the decoder module decodes the text in the first language, and an output module obtains the text in the second language. With this device, a user can translate between two languages on a hand-held electronic device conveniently, quickly and with high accuracy, without knowing the rules of either language in advance and without a network.
Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person skilled in the art may still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.