CN1077545A - Device for language reproduction - Google Patents

Info

Publication number
CN1077545A
Authority
CN
China
Prior art keywords
word
code
index
dictionary
index code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 92102017
Other languages
Chinese (zh)
Other versions
CN1040702C (en)
Inventor
罗进财
林启轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd filed Critical Matsushita Electric Industrial Co Ltd
Priority to CN92102017A priority Critical patent/CN1040702C/en
Publication of CN1077545A publication Critical patent/CN1077545A/en
Application granted granted Critical
Publication of CN1040702C publication Critical patent/CN1040702C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The user enters a pronunciation symbol sequence of arbitrary length through input part 11. Index code handling part 12 converts the entered pronunciation symbol sequence into an index code for retrieval. Using the converted index code as the search key, code group search part 14 consults index storage part 15 and retrieves the code group in the dictionary that corresponds to the index code. Dictionary 16 is organized into code groups that correspond one to one with the pronunciations held in index storage part 15; each code group stores the index codes of all words that begin with the corresponding pronunciation, together with the kanji codes of the characters of each word. Using the entered index code as the search key, converter section 17 consults dictionary 16, retrieves the corresponding word from the code group, and passes it to the output part.

Description

Device for language reproduction
The present invention relates to a device for language reproduction that can be used effectively in Chinese and Japanese input systems and word processors, and in particular to one that can quickly find the corresponding word and related information in a compact dictionary from an entered index code.
A typical device for language reproduction uses entered information such as a pronunciation or a radical as the search key and converts it into the character sequence of the corresponding word found in a dictionary.
One existing device for language reproduction is the Chinese-language Chinese character converter described in Japanese Laid-Open Patent Publication No. Sho 59-121425, which uses Chinese phonetic symbols as the search key to find the corresponding word in a dictionary. Fig. 4(a) shows the system block diagram of that invention, and Fig. 4(b) shows the structure of the dictionary in the corresponding embodiment.
Chinese pronunciation is written with three kinds of phonetic symbols: Zhuyin and Pinyin 2, used in Taiwan, and Pinyin 1, used in mainland China. The existing example is described here using the mainland system (Pinyin 1).
In Chinese, one character corresponds in principle to one syllable. A syllable consists of an initial, a final, and a tone, and the final can be further divided into a medial and a main final, so that a syllable is structured as follows:
Initial + medial + main final + tone
In Fig. 4(a), 31 is a separation device that separates the entered data into Roman-letter data and tone data. 33 is a dictionary that stores, for each word, the Roman-letter sequence, Chinese character sequence, tone, and frequency items in the layout shown in Fig. 4(b). 32 is a collation device that takes from dictionary 33 all homophones corresponding to the Roman-letter sequence supplied by separation device 31. 34 is a comparison device that compares the Chinese character sequences obtained by collation device 32 with the tone data from separation device 31 and outputs the matching Chinese character sequence; when no tone data is given, it outputs the candidates in descending order of usage frequency so that the desired Chinese character sequence can be selected.
In the existing device constituted as above, a user who wants to enter "China" first types its pronunciation "zhong 1 guo 2" on the keyboard. Separation device 31 splits this into the Roman-letter sequence (zhong guo) and the tone data (1, 2). Collation device 32 then uses (zhong guo) as the search key and looks up the words in dictionary 33 one by one. Dictionary 33 contains both "China" and "middle fruit" under this key, but only "China" has the tone data (1, 2), so comparison device 34 selects and outputs "China".
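The prior-art lookup just described can be sketched roughly as follows. This is only an illustration under stated assumptions: the table rows, the tone of the second entry, and the frequency numbers are invented placeholders, and the function names `separate` and `convert` are hypothetical rather than taken from the cited publication.

```python
# Hypothetical rows in the spirit of Fig. 4(b):
# (Roman-letter sequence, word, tone data, usage frequency).
# The second row's tones and both frequencies are placeholder values.
DICTIONARY = [
    ("zhong guo", "China", (1, 2), 90),
    ("zhong guo", "middle fruit", (1, 3), 5),
]

def separate(raw):
    """Role of separation device 31: split letters from tone digits."""
    tokens = raw.split()
    letters = " ".join(t for t in tokens if not t.isdigit())
    tones = tuple(int(t) for t in tokens if t.isdigit())
    return letters, tones

def convert(raw):
    """Roles of devices 32 and 34: collect homophones, then filter or rank."""
    letters, tones = separate(raw)
    homophones = [row for row in DICTIONARY if row[0] == letters]
    if tones:
        # Tone data present: keep only words whose tones match.
        return [word for _, word, t, _ in homophones if t == tones]
    # No tone data: offer all homophones, most frequent first.
    ranked = sorted(homophones, key=lambda row: row[3], reverse=True)
    return [word for _, word, _, _ in ranked]

print(convert("zhong 1 guo 2"))  # ['China']
print(convert("zhong guo"))      # ['China', 'middle fruit']
```

Note that every row carries its own tone and frequency fields, which is the storage overhead the invention sets out to remove.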
As Fig. 4(b) shows, the dictionary of the existing example stores not only the pronunciation symbols and the corresponding words but also, for reference during conversion, the tone data and the usage frequency of each word. Storing these data separately wastes storage space. In addition, words that share part of their character sequence, such as "China" and "Chinese language", are stored as separate entries, which wastes further space.
In view of the above, the present invention places the usage frequency of a word inside its index code and exploits the property that a word with fewer characters is often contained in a word with more characters (that is, a short word is contained in a long word): using separators, several words that share part of their character sequence are joined and stored in the dictionary as a single word. This dictionary structure saves storage space.
To solve the above problem, the object of the invention is to provide a device for language reproduction characterized by comprising: a dictionary in which the index codes of words and the kanji codes of the corresponding character sequences are stored in order and divided into code groups, and in which information such as the usage frequency of each word and the separators that divide a longer word containing shorter words into its individual words is placed in the index code or the kanji code respectively; a code group search device that, when the conversion key is pressed, uses the leading part of the entered index code as the search key to retrieve the corresponding code group in the dictionary; and a conversion device that uses the entered index code as the search key to retrieve the corresponding word from the retrieved code group or, when the re-conversion key is pressed, retrieves the longer words that contain the corresponding word together with the usage frequencies of those words.
With this configuration, when the user enters the index code of a word, the code group search part retrieves the code group in the dictionary that corresponds to that index code. The converter section then uses the entered index code as the search key to retrieve the corresponding word and its usage frequency from that code group. By pressing the re-conversion key, the user can also choose among multiple candidate words whose index codes share the same leading part.
Brief description of the drawings
Fig. 1 is a block diagram showing the configuration of a device for language reproduction according to one embodiment of the invention.
Fig. 2 is a flowchart showing the processing procedure of one embodiment of the invention.
Fig. 3 is a flowchart showing the processing procedure of one embodiment of the invention.
Fig. 4(a) is a block diagram showing the configuration of an existing device for language reproduction.
Fig. 4(b) is an explanatory diagram showing the dictionary structure in the same existing example.
Fig. 5 is an explanatory diagram of a Chinese phonetic symbol encoding used in the embodiment of the invention; Fig. 5(a) shows the phonetic symbol encoding of the first byte of the index code, and Fig. 5(b) shows that of the second byte.
Fig. 6 is an explanatory diagram of the ordering of the Chinese phonetic symbol codes in the embodiment of the invention.
Fig. 7 is a flowchart showing the operation of the index code handling part of the invention.
Fig. 8 is an explanatory diagram of the dictionary structure in the same embodiment of the invention.
Explanation of reference symbols
11: input part; 12: index code handling part; 13: storage part; 131 (C), 132 (R), 135 (H): registers; 133 (A), 134 (B), 136 (BC): buffers; 1361: word field; 1362: usage frequency field; 14: code group search part; 15: index storage part; 16, 33: dictionaries; 17: converter section; 18: output part; 31: separation device; 32: collation device; 34: comparison device; 35: output device.
Fig. 5 shows one index code layout used in an embodiment of the invention. The initial and the tone of each Chinese pronunciation are placed together in the first byte as shown in Fig. 5(a), and the final and the medial are placed together in the second byte as shown in Fig. 5(b); in this way the pronunciation of one character is converted into an index code. Taking the pronunciation
[outer 1]
as an example, Fig. 5 shows at once that the corresponding index code is 3306H. Because the layout has a regular structure, phonetic symbols can be converted to index codes very easily. Information such as the word frequency can also be stored in the high 2 bits of the second byte for use during conversion.
Fig. 1 is the system block diagram of an embodiment of the device for language reproduction of the invention. In Fig. 1, 11 is an input part through which a pronunciation symbol sequence of arbitrary length can be entered. 12 is an index code handling part that converts the entered pronunciation symbol sequence into an index code for retrieval. The conversion performed by index code handling part 12 is shown in the flowchart of Fig. 7: each Chinese phonetic symbol is assigned an order value as shown in Fig. 6, and simple tests and calculations then convert the entered phonetic symbols into an index code. The conversion is described in detail here using the pronunciation [outer 1] as an example. Referring to the order values of Fig. 6, the initial
[outer 2]
is No. 10 in the initial order and the first tone is No. 0 in the tone order, so the value of the first byte is calculated as follows.
01H + 10 × 5 + 0 = 33H
The final
[outer 3]
is No. 0 in the final order and the medial
[outer 4]
is No. 2 in the medial order, so the value of the second byte is as follows.
04H + 0 × 4 + 2 = 06H
In summary, the index code of the pronunciation [outer 1] is 3306H.
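The two byte formulas in this worked example can be written as a small function. This is a minimal sketch: the function name and its parameters are hypothetical, the order numbers must come from the Fig. 6 table (not reproduced here), and it covers only syllables that have all the components numbered as in the worked example.

```python
def encode_syllable(initial_order: int, tone_order: int,
                    final_order: int, medial_order: int) -> int:
    """Pack one syllable into a 2-byte index code using the two formulas above."""
    first_byte = 0x01 + initial_order * 5 + tone_order   # initial and tone
    second_byte = 0x04 + final_order * 4 + medial_order  # final and medial
    return (first_byte << 8) | second_byte

# Order values quoted in the text: initial No. 10, tone No. 0,
# final No. 0, medial No. 2.
code = encode_syllable(initial_order=10, tone_order=0,
                       final_order=0, medial_order=2)
print(f"{code:04X}H")  # prints 3306H
```

With these multipliers the low 6 bits of the second byte suffice for the final and medial in this example, which is consistent with the text's remark that the high 2 bits of that byte are free for other information.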
In Fig. 1, code group search part 14 uses the converted index code as the search key and, by consulting index storage part 15, retrieves the code group in the dictionary that corresponds to the index code. The detailed structure of dictionary 16 is shown in Fig. 8. It is organized into code groups that correspond one to one with the pronunciations in index storage part 15; each code group stores the index codes of all words that begin with the corresponding pronunciation and the kanji codes of the characters of each word, and dictionary 16 is arranged in search code order. The index code carries a multi-level word usage frequency. The kanji code carries separators that mark the shorter words, sharing the same characters, that are contained in a longer word.
In this embodiment, the usage frequency of a word is set in the unused high 2 bits of the last byte of the word's index code. There are four levels, most common, common, fairly common, and uncommon, and the corresponding bits are set to binary 11, 10, 01, and 00 respectively. Taking the word
[outer 5]
(welcome) as an example, index code handling part 12 produces the index code "3326H 6b31H". This is a most-common word of 2 characters, so bits b0 and b1 of its last byte, that is, the fourth byte, are set to 1 to mark it as most common. The bit layout of the index code is then as shown in Table 1,
[table 1]
and the code becomes "3326H 6bF1H".
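Setting the frequency level in the high 2 bits of the last byte is a simple bit operation. The sketch below is an assumption-laden illustration: the helper names and the English level labels are hypothetical, while the bit patterns 11, 10, 01, 00 and the sample code come from the text.

```python
# Level names are informal glosses; the bit patterns follow the text.
FREQUENCY_BITS = {
    "most common": 0b11,
    "common": 0b10,
    "fairly common": 0b01,
    "uncommon": 0b00,
}

def set_frequency(index_code: bytes, level: str) -> bytes:
    """Write the frequency level into the high 2 bits of the last byte."""
    last = (index_code[-1] & 0x3F) | (FREQUENCY_BITS[level] << 6)
    return index_code[:-1] + bytes([last])

def get_frequency(index_code: bytes) -> int:
    """Read the 2-bit frequency level back from the last byte."""
    return index_code[-1] >> 6

welcome = bytes.fromhex("33266b31")          # "3326H 6b31H" from the text
tagged = set_frequency(welcome, "most common")
print(tagged.hex())                          # prints 33266bf1
```

Because the level lives in otherwise unused bits, no separate frequency field is needed per dictionary entry.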
Chinese has the property that a word with fewer characters may be contained in a word with more characters. For example, "legislature" contains the word "legislation", and "president of the legislature" contains both of these words. By exploiting this word-formation property, using separators to mark off each word and storing only the longest word in the dictionary, the storage capacity of the dictionary can be reduced while the proper word is still retrieved at high speed. A long word such as "president of the legislature" is divided by the "~" separator as follows.
[outer 6]
legislation ~ institute ~ head
Following the index code conversion scheme, this word is stored in dictionary 16, as shown in Fig. 8, in the form
"2701 1284 6da7 496c legislation~institute~head"
Each character occupies 2 bytes in the kanji code, while the "~" symbol occupies only 1 byte. Because the frequencies of "legislation", "legislature", and "president of the legislature" are set to common, common, and fairly common respectively, the bit layout of each index code is stored as shown in Table 2.
[table 2]
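One way to picture how a single stored entry yields several words is the sketch below. It assumes, consistently with the example values, that the frequency of the word ending at a given character sits in the high 2 bits of that character's second index byte. The letters A to D are stand-ins for the four characters of the entry, and `expand_entry` is a hypothetical helper, not a name from the patent.

```python
def expand_entry(index_code: bytes, text: str, sep: str = "~"):
    """List (word, character count, frequency bits) for each packed word."""
    words = []
    prefix = ""
    for piece in text.split(sep):
        prefix += piece
        # Each character contributes 2 bytes to the index code.
        code = index_code[: 2 * len(prefix)]
        # Frequency of the word that ends here: high 2 bits of its last byte.
        words.append((prefix, len(prefix), code[-1] >> 6))
    return words

entry_code = bytes.fromhex("270112846da7496c")   # "2701 1284 6da7 496c"
for word, length, freq in expand_entry(entry_code, "AB~C~D"):
    print(word, length, format(freq, "02b"))
# AB 2 10
# ABC 3 10
# ABCD 4 01
```

Three words are recovered from one entry of 8 index bytes and 4 characters plus 2 separator bytes, instead of three separate entries.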
In Fig. 1, 13 is a storage part made up of registers and buffers. C register 131 stores the number of characters of the word sequence to be retrieved. R register 132 stores the result of the index code conversion and the usage frequency of the retrieved word. A buffer 133 stores the index code converted from the entered pronunciation symbol sequence; to make it easy to pick a short word out of a long word, the kanji code of the retrieved word is also written to A buffer 133. B buffer 134 stores the index code of the dictionary word currently being compared. H register 135 stores the number of candidate words retrieved. BC buffer 136 is divided into word field 1361 and usage frequency field 1362, which store all candidate words and their usage frequencies respectively. 15 is an index storage part that stores the index codes of the 1335 character pronunciations of Chinese together with the address of the code group in dictionary 16 that corresponds to each pronunciation. 14 is a code group search part that uses the index code produced by index code handling part 12 as the search key, finds the corresponding code group by comparison against index storage part 15, and stores all words in that code group that begin with the corresponding pronunciation in BC buffer 136 as conversion candidates.
17 is a converter section that consults dictionary 16 to convert the index code into the corresponding Chinese character sequence or, after the user presses the re-conversion key, displays the candidate words stored in BC buffer 136 one at a time for the user to choose from. 18 is an output part that outputs the conversion result.
The conversion operation of the device for language reproduction of the above embodiment is described below with reference to the processing flows of Fig. 2 and Fig. 3.
First, the pronunciation symbol sequence entered from input part 11 is stored in B buffer 134. When the key pressed is judged to be the execute key, processing enters S6, where index code handling part 12 converts the entered pronunciation symbol sequence into an index code, stores it in A buffer 133, and sets the length of the index code in C register 131. If the re-conversion key is pressed, the candidate word conversion processing from S27 is carried out.
As soon as S6 finishes, processing enters S7, where code group search part 14 consults the index table of index storage part 15 and reads out the corresponding code group in dictionary 16. Processing then passes to converter section 17. First, in S8, H register 135, which holds the number of matching words, is set to 0. In S9, the index code of each word in the code group is stored in B buffer 134 and compared with the index code held in A buffer 133. Because usage frequency is set in the even-numbered bytes of the index code being compared, the high 2 bits of the even-numbered bytes of the index code in B buffer 134 must be masked before the comparison. If the comparison shows that the index code in A buffer 133 is wholly contained in the index code in B buffer 134, S13 to S16 check in order whether the first corresponding word has been stored in BC buffer 136. If the first corresponding word is present, S17 to S20 are carried out: H register 135 is incremented, and the related words that begin with the first pronunciation of the entered pronunciation symbol sequence are read into BC buffer 136 one by one, repeating S20 to S22 until the longest related word has been read. When S13 judges that A and B are identical, or S15 judges that a separation length stored in BC buffer 136 equals the value of C register 131, S18 removes the separators from the kanji code of the corresponding word and stores it in word field 1361 of item H of BC buffer 136, and S19 stores the usage frequency information in usage frequency field 1362 of item H of BC buffer 136.
S23 then judges whether there is a word that matches the index code. If there is, S24 outputs the word stored in item 0 of BC buffer 136 to output part 18. At that point, S26 sorts the candidate word group related to the matching word, first by word length and then by usage frequency. When the re-conversion key is pressed, S27 to S30 step through the candidate word group one word at a time and display it on the screen, according to the value held in C register 131 and the total number of candidates H.
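The masked comparison at the heart of this flow can be sketched as follows. It is a simplified illustration, not the flowchart itself: the helper names are hypothetical, the early exit assumes the code group is sorted by masked index code, and the third entry in `GROUP` is an invented placeholder that only serves to trigger that exit. The other two index codes are the ones quoted in the description.

```python
def mask_even_bytes(code: bytes) -> bytes:
    """Clear the high 2 bits (frequency) of every even-numbered byte.

    Bytes are numbered from 1 as in the text, so these are the odd
    0-based positions.
    """
    return bytes((b & 0x3F) if i % 2 else b for i, b in enumerate(code))

def search_group(group, input_code: bytes):
    """Return the stored index codes whose masked form starts with input_code."""
    hits = []
    for entry in group:
        masked = mask_even_bytes(entry)
        if masked[: len(input_code)] == input_code:
            hits.append(entry)       # stored code contains the input
        elif masked > input_code:
            break                    # sorted group: nothing further can match
    return hits

GROUP = [
    bytes.fromhex("270103016c45"),      # index code quoted in the description
    bytes.fromhex("270112846da7496c"),  # longest word of the worked example
    bytes.fromhex("27012004"),          # hypothetical larger code
]
hits = search_group(GROUP, bytes.fromhex("270112046d27"))
print([h.hex() for h in hits])  # ['270112846da7496c']
```

Masking before comparing lets the frequency bits share bytes with the phonetic code without disturbing the lookup.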
Taking the Chinese pronunciation symbol sequence
[outer 7]
(legislature) as an example, the operation of the device for language reproduction of the embodiment is described in detail below with reference to Fig. 2 and Fig. 3.
For ease of explanation, the values of C register 131, R register 132, A buffer 133, B buffer 134, H register 135, and BC buffer 136 are written C, R, A, B, H, and BC respectively.
When this pronunciation symbol sequence is entered, the sequence from input part 11 is stored in B buffer 134 and then converted by index code handling part 12 into an index code for retrieval, as shown in Fig. 7. The result of the conversion is "2701 1204 6d27", which is stored in A buffer 133. Code group search part 14 then consults index storage part 15, reads out the code group in dictionary 16 that corresponds to the index code, and stores the index codes of the words in that code group in B buffer 134 one at a time for comparison with the index code in A buffer 133. As shown in Fig. 8, when the index code "2701 0301 6C45" (Libya) of the first word is read into B buffer 134, the high 2 bits of its even-numbered bytes are masked and it is compared with A. B does not contain A and the code value of B is smaller than that of A, so the wanted word lies further on and the next word is read. The index codes of the following two words, "unable to do what one wishes" and "upright life", likewise do not contain A. When the high 2 bits of the even-numbered bytes of the next index code, "2701 1284 6da7 496c", are masked and the comparison is made, the first 6 bytes match, that is, B contains A, so the kanji code "legislation~institute~head" belonging to the index code in B is stored in BC buffer 136. The separation lengths of the kanji code stored in BC buffer 136 are 4 and 6, with a maximum length of 8; compared with the value of C register 131, the setting 6 equals C, so a word of three characters is found to be contained. The character sequence after the 6th byte of this 8-byte word is therefore removed, and the character sequence "legislature" is stored in word field 1361 of item 0 of BC buffer 136. Because "legislature" is a common word, the high 2 bits "10" of the 6th byte are stored in the usage frequency field of item 0 of BC buffer 136, and H register 135 is incremented by 1. Since this word has length 6 and a longer word of maximum length 8 exists, 8 is set in C register 131, and the longer word "president of the legislature" and its usage frequency (fairly common) are stored in item 1 of BC buffer 136. The index code of the next word, "cube", is larger than the entered index code, so the operation of converter section 17 ends. Output part 18 can then output the correct result, the word "legislature" and its usage frequency, from item 0 of BC buffer 136.
If the user then presses the re-conversion key, the candidates are displayed one by one in order of word length and, among words of the same length, in descending order of frequency. In the example above, when the re-conversion key is pressed, converter section 17 outputs the candidate word "president of the legislature" and its frequency from BC buffer 136 to output part 18. In this way the required word can be obtained without entering the whole pronunciation symbol sequence of a long word, which improves input speed.
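The candidate ordering described here amounts to a two-key sort. The sketch below assumes that shorter candidates are offered first (the example shows the three-character word before the four-character one) and uses placeholder strings and frequency values; `order_candidates` is a hypothetical name.

```python
def order_candidates(candidates):
    """Sort (word, frequency_bits) pairs by length, then by frequency descending."""
    return sorted(candidates, key=lambda c: (len(c[0]), -c[1]))

# Placeholder candidates: each letter stands for one character,
# and the number is the 2-bit frequency level (3 = most common).
candidates = [("ABCD", 1), ("ABC", 2), ("ABE", 3)]
for word, freq in order_candidates(candidates):
    print(word, freq)
# ABE 3
# ABC 2
# ABCD 1
```

Each press of the re-conversion key would then simply advance to the next element of the sorted list.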
The invention is not limited to the above embodiment and can be implemented with suitable modifications within a scope that does not change its gist. For example, the entered symbols are not limited to phonetic symbols; Simplified Cangjie symbols may also be used. The rule for converting entered symbols into index codes is not restricted either, and may be modified as defined by an index code correspondence table. The dictionary contents may also be changed as needed. For example, if the bit positions used for word usage frequency in the above embodiment are exchanged with the position used for the separator, usage frequency can be given more levels. In other words, the separator indicating that a long word contains a short word is set in 1 of the bits originally used for usage frequency, and in the byte originally used for the separator, the 7 bits other than the high bit, which is excluded to avoid confusion with Chinese kanji codes, are all used for usage frequency. Up to 128 levels can then be defined. In that case, only the comparison means of the converter section needs to be modified. Furthermore, the search target of the candidate-selection function of the converter section is not limited to the dictionary: learning files, abbreviation dictionaries, special-purpose dictionaries, and the like can also be used, provided that the words of the corresponding code group, which the code group search part can locate in the dictionary, are stored in the register. All of the above modes belong to the invention.
According to the device for language reproduction of the invention described above, setting the usage frequency in unused bits of the index code, and setting in the kanji code the separators that divide a long word containing short words into its individual words, saves dictionary storage capacity compared with the existing method and also improves retrieval speed.
Moreover, the related words that begin with the first pronunciation of the entered pronunciation sequence are taken as candidates and stored in the register as a candidate word group ordered by word length and, within the same length, by descending usage frequency, so when the user presses the re-conversion key the words beginning with that pronunciation can be displayed one by one for selection. And because the longest word is what is stored, the required word sequence need not be entered in full, which improves the speed of input and conversion. The device is highly practical for Chinese character conversion in Chinese, Japanese, and similar languages.

Claims (1)

1. A device for language reproduction, characterized in that it comprises: a dictionary in which the index codes of words and the kanji codes of the corresponding character sequences are stored in order and divided into code groups, and in which information such as the usage frequency of each word and the separators that divide a long word containing short words into its individual words is placed in the index code or the kanji code respectively; a code group search device that, when the conversion key is pressed, uses the leading part of the entered index code as the search key to retrieve the corresponding code group in the dictionary; and a conversion device that uses the entered index code as the search key to retrieve the corresponding word from the retrieved code group or, when the re-conversion key is pressed, retrieves the longer words that contain the corresponding word together with the usage frequencies of those words.
CN92102017A 1992-03-24 1992-03-24 Device for language reproduction Expired - Fee Related CN1040702C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN92102017A CN1040702C (en) 1992-03-24 1992-03-24 Device for language reproduction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN92102017A CN1040702C (en) 1992-03-24 1992-03-24 Device for language reproduction

Publications (2)

Publication Number Publication Date
CN1077545A true CN1077545A (en) 1993-10-20
CN1040702C CN1040702C (en) 1998-11-11

Family

ID=4939421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN92102017A Expired - Fee Related CN1040702C (en) 1992-03-24 1992-03-24 Device for language reproduction

Country Status (1)

Country Link
CN (1) CN1040702C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9573765B2 (en) 2011-08-23 2017-02-21 Siemens Aktiengesellschaft Belt-conveying installation, method for operating the same, and use thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62174867A (en) * 1985-10-16 1987-07-31 Nec Corp Chinese character input device
JPH0656609B2 (en) * 1985-10-18 1994-07-27 日本電気株式会社 Chinese input device
JPH0721797B2 (en) * 1987-02-13 1995-03-08 シャープ株式会社 Chinese input device

Also Published As

Publication number Publication date
CN1040702C (en) 1998-11-11

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C15 Extension of patent right duration from 15 to 20 years for appl. with date before 31.12.1992 and still valid on 11.12.2001 (patent law change 1993)
OR01 Other related matters
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee