CN1081523A - Dual spelling Chinese words coding method and keyboard thereof - Google Patents

Dual spelling Chinese words coding method and keyboard thereof

Info

Publication number
CN1081523A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 92105929
Other languages
Chinese (zh)
Inventor
梁晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual

Abstract

A double-spelling Chinese coding method for the field of Chinese information processing, which mainly solves the problems of unified coding and keyboard input of Chinese information.
The main technical feature of the invention is as follows: according to the combination rules of Chinese speech and the requirements of keyboard input, toned Chinese syllables are decomposed into double-spelling initials and finals, which are then merged into one group of double-spelling initial code elements and one group of double-spelling final code elements, thereby realizing double-spelling coding of toned Chinese syllables and their various written forms. The method can be used in all large, medium, small and micro computer Chinese information processing systems, telex machines, typewriters, Chinese terminals and Chinese communication systems.

Description

Dual spelling Chinese words coding method and keyboard thereof
The invention belongs to the Chinese information processing technical field.
For computers to be widely used in the Chinese-speaking world, the key technical problem of Chinese human-computer dialogue must be solved. This covers several aspects of Chinese information processing: keyboard entry of Chinese speech, speech recognition, speech synthesis, keyboard entry of text, character recognition and character printing. All of these involve language coding, which has not yet been solved in a unified way. Although coding schemes for Chinese characters are fairly mature, no scheme has yet been accepted as a national standard and widely adopted.
At present nearly a thousand Chinese character coding schemes have appeared, falling broadly into three types: shape codes, phonetic codes, and mixed shape-phonetic codes. Shape codes take the graphic features of Chinese characters as the basis of coding; the most typical is the "optimized Five-stroke character coding method and keyboard thereof (invention patent CN85100837)". However, because the structure of Chinese characters is complex, shape coding methods are also complex. The operator must learn to split characters into components and assemble codes, which adds to the learning burden and mental workload, so that computer applications cannot escape the inherent complexity of Chinese characters. This is unfavorable to the reform of Chinese characters and the modernization of Chinese. In practice, the five-stroke method is used only in professional fields such as typing, printing and statistics; ordinary users find it hard to learn and master, and even computer professionals shy away from it.
Mixed shape-phonetic codes are based primarily either on shape or on sound and use the combined features as the coding basis. Some of them still carry the difficulty of the shape component, so they are not ideal either.
Phonetic codes should be the ideal choice, because only a phonetic code is the essential feature code of a language and matches the speech-based way people think. Spoken and written language have in common only their shared pronunciation. Moreover, written language is merely a symbolic record of spoken language: writing can change, whereas speech is more stable. One inherent advantage of Chinese is that its spoken form is fairly simple, and this should be exploited. As for coding rules, only phonetic codes have a ready-made basis for deriving codes, and the derivation is easy to learn. In addition, Hanyu Pinyin has been popularized for many years, which provides a social foundation.
Shape coding cannot be used for speech coding, whereas speech coding can be extended to text coding. Therefore, only a coding method built on speech can become a unified Chinese coding method.
Existing phonetic coding methods are aimed mainly at Chinese characters. They generally do not treat Chinese speech itself as a coding target, but only as the basis and intermediary for coding Chinese characters. The various coding methods based directly on the "Scheme for the Chinese Phonetic Alphabet" need no fewer than 3 keystrokes to code one complete speech syllable, and waste nearly half of the main coding space. For example, the "Chinese characters spelling computer keyboard (CN85102628)" requires the initial, final and tone to be entered one by one to form a complete syllable. To reduce the number of keystrokes and shorten the code length, most schemes drop the tone and go straight to the next coding level, the so-called "initial-final double spelling". This departs from actual speech and increases the number of homophone codes, making further disambiguation harder. The widely adopted pinyin code, the "diphthong coding input system", also fails to solve this problem.
The "two and half Chinese spelling symbols with all information compiling methods (CN86106542)" merges tone information into shape-code information, and has no clear advantage either. The "double-pass key Chinese keypad and double-pass key double-spelling four-tone Chinese character input (CN88104949.2)" changes the existing keystroke mode by dividing each key position into two levels, light and heavy. Although it can enter a complete syllable with 2 keystrokes, it increases system cost and the difficulty of keyboard operation, and is hard to popularize.
In short, no scheme yet allows a complete syllable to be entered with 2 keystrokes on a QWERTY keyboard, that is, no scheme realizes true Chinese double spelling.
The object of the invention is to solve the unified coding problem in Chinese information processing, to realize true double-spelling coding of Chinese speech, and to provide an efficient, simple and practical double-spelling Chinese coding method that can then be extended into a Chinese character coding method, making Chinese information processing more convenient.
Chinese speech consists of three elements, initial sound, rhyme and tone, which correspond to the initials, finals and tones of Hanyu Pinyin. There are 23 initials (the absence of an initial is treated as a "zero initial"), 35 finals and 5 tones, forming nearly 1300 Chinese speech syllables. How to code these 1300 syllables is the key to speech coding.
If the 1300 syllables were laid out directly on a large keyboard, one key per syllable could be achieved. Arranged in a square, this would require at least 36 rows × 36 columns. For ease of lookup, each syllable must be related to the row and column of its key position, ideally by double-spelling the row and column, which is close to the requirement for double spelling on a standard small keyboard.
Realizing double spelling on a standard small keyboard means mapping the rows and columns of the large keyboard onto two keystrokes on the small keyboard. The main key area of a standard small keyboard generally has 4 rows and 10 columns, so this mapping poses no problem in terms of quantity: there are 1600 combinations, more than the 1300 needed. The key issue is to give the mapping strong regularity, so that it is easy to memorize and use.
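The capacity argument above can be checked with a few lines of arithmetic. This is only a sketch of the counting; the names ROWS, COLS and fits are illustrative and do not come from the patent.

```python
# Figures taken from the text: 4 rows x 10 columns of main keys, about 1300 syllables.
ROWS, COLS = 4, 10
SYLLABLES = 1300

keys = ROWS * COLS           # number of main key positions
two_key_codes = keys * keys  # ordered pairs: first keystroke, second keystroke


def fits(n_items, n_codes):
    """Return True when every item can be given its own two-key code."""
    return n_items <= n_codes


print(keys, two_key_codes, fits(SYLLABLES, two_key_codes))
```

The count only shows that enough codes exist; it says nothing about how regular or memorable a particular assignment is, which is the harder problem the text goes on to address.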
The invention first set double spelling as the goal and then, by analyzing the Chinese syllable table, found certain regularities that make double spelling possible.
The Chinese syllable table without tone distinctions contains 415 toneless syllables. The 23 initials plus 1 "zero initial" can form 840 combinations with the 35 finals, so 425 combinations are invalid. These invalid combinations form vacancies in the syllable table, most of which fall into regular rows and blocks. For example, the vacancies of j, q, x and those of g, k, h are exactly complementary. This shows that initials and finals are selective in how they combine. Initials can therefore be grouped according to whether they combine with the same finals, roughly as follows: b, p, m; d, t; n, l; g, k, h; j, q, x, y; zh, ch, sh; z, c, s; f, r, w and the zero initial. Apart from a few special cases with n and l, each initial combines into at most 20 toneless syllables, and r, f, w and the zero initial yield only about 10 each. j, q and x combine with finals beginning with ü, which exactly fill the corresponding vacancies of z, c and s. Based on these regularities, toned Chinese syllables can be decomposed into double-spelling initials and finals and merged into one group of double-spelling initial code elements and one group of double-spelling final code elements, giving an initial-final double-spelling toneless syllable table (see Fig. 1 to Fig. 4). The specific rules are as follows:
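The grouping idea, initials that combine with the same finals belong together, can be sketched on a toy table. The table below is a small subset of standard pinyin combinations chosen for illustration; it is not the syllable table of Fig. 1 to Fig. 4, and the function name group_initials is hypothetical.

```python
from collections import defaultdict

# Toy subset of standard pinyin (an assumption for illustration, not the patent's table).
FINALS = ["a", "e", "i", "u", "ü"]
COMBINES = {
    "g": {"a", "e", "u"},
    "k": {"a", "e", "u"},
    "h": {"a", "e", "u"},
    "j": {"i", "ü"},
    "q": {"i", "ü"},
    "x": {"i", "ü"},
}


def group_initials(combines):
    """Group initials that combine with exactly the same set of finals."""
    groups = defaultdict(list)
    for initial, finals in combines.items():
        groups[frozenset(finals)].append(initial)
    return {finals: sorted(initials) for finals, initials in groups.items()}


groups = group_initials(COMBINES)
gkh = frozenset({"a", "e", "u"})
jqx = frozenset({"i", "ü"})

# Two groups are complementary when their finals never overlap
# and together cover every final in the table.
complementary = not (gkh & jqx) and (gkh | jqx) == set(FINALS)

# Counts quoted in the text: 24 initials (including the zero initial) x 35 finals.
total_cells = 24 * 35
empty_cells = total_cells - 415

print(groups[gkh], groups[jqx], complementary, total_cells, empty_cells)
```

Because complementary groups never compete for the same cell, they can share key positions without collisions, which is the property the merging rules rely on.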
Initials that combine with finals beginning with i or u are separated out and marked with an apostrophe ('). Initials that combine with finals beginning with ü are then merged into the initials above: jü is merged with z' and written z'j; qü is merged with c' and written c'q; xü is merged with s' and written s'x. The initial y that combines with ü and üe is merged with f and written fy; the initial y that combines with üan and üen is merged with ch' and written ch'y. r' is merged with p' and written p'r'. Since fu is the only syllable of f', it is merged directly into f. This yields about 40 double-spelling initial code elements.
Finals that generally cannot combine with the same double-spelling initial code element are placed in one group, namely: ang, iang, uan; an, üan, ian, ua; ai, ia, u; a, iu, ui; ong, o, iong, uo; e, ei, i; en, in, un; eng, er, ün, ing, üeng; and ou, üe, ie, uang; 10 groups in total. Each group is further divided into 4 by first tone, second tone, third tone and fourth tone; the neutral tone is merged into the first tone (it may also be merged into another tone). This forms 40 double-spelling final code elements.
The double-spelling initial code elements and final code elements together make up the double-spelling code elements, which can spell the great majority of toned Chinese syllables. A few syllables overlap. Because they are rarely used and cover few characters, they can simply be merged; where necessary they can instead be placed on idle double-spelling code positions and memorized specially. Among all toned Chinese syllables in GB2312-80, the following overlap under double spelling: diǎ-dǔ, gě-gěi, hē-hēi, liáng-luáng, liǎng-luǎn, liàng-luàn, lao-lü, lǒu-lüě, lòu-lue, ne-nei, nǎo-nǔ, nào-nù, nóu-nùe, shé-shéi, yō-yōng, zé-zéi, zhe-zhei, pì-rì, pìn-rùn, 22 in total. They arise from the merging of initials or finals and account for 1.7% of the 1271 double-spelling syllables; for general applications they need not be separated or specially memorized. If necessary, the merged syllable group can be kept while one of its syllables is also placed on another idle code position: for example, gei is coded as g'ei, hei as h'ei, lü as len, luan as b'uan, nuan as m'uan, and so on. The substitute is chosen to correspond to the initial or final of the idle code position, so that it is as easy as possible to memorize.
Of course, the choice of double-spelling code elements is not unique, and other arrangements are possible. The number of code elements can also be increased or decreased slightly: increasing it simplifies classification and eases memorization, while decreasing it shrinks the symbol space and raises code element utilization. The overall principle is to realize double spelling, to be easy to classify and memorize, and to be convenient to use on a standard small keyboard. For example, initials and finals that southern speakers find hard to distinguish can be merged according to pronunciation differences, further reducing the number of code elements.
Once the double-spelling code elements are determined, the goal of double spelling is achieved. Double spelling here means double spelling of Chinese speech syllables: a binary decomposition and binary recombination of the syllable. Its characteristic is that there is only one level of splitting and joining, which makes it very convenient to use. The double-spelling coding space has 1600 positions, of which valid syllables occupy at least 1271, a utilization of 79.4%, the highest among speech codes. The spare space can be used for expansion, so that common digits, letters, punctuation and other symbols can also be brought into the double-spelling code range.
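The utilization figure, together with the share of overlapping syllables quoted earlier, follows directly from the counts in the text. A short check, with variable names chosen here for illustration:

```python
KEYS = 40          # main key positions (4 rows x 10 columns)
USED = 1271        # double-spelling syllables, as quoted in the text
OVERLAPPING = 22   # overlapping syllables, as quoted in the text

code_space = KEYS * KEYS
utilization = USED / code_space
overlap_share = OVERLAPPING / USED
spare = code_space - USED  # positions left for digits, letters, punctuation

print(f"{utilization:.1%} used, {overlap_share:.1%} overlapping, {spare} spare")
```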
In Chinese information processing, the double-spelling code can serve as a unified internal code for computers. Keyboard input of speech and of phonetic scripts can use the double-spelling code directly; speech recognition can decompose speech according to the double-spelling principle, and speech synthesis can work in reverse; recognition and printing of phonetic scripts can be classified according to the double-spelling principle; and the processing of Chinese characters can be regarded as a deeper level of speech information. More importantly, the double-spelling code can act as a common code for all these technologies, so that they can adapt to and convert between one another, creating unified coding conditions for the coordinated development of Chinese computing and the integrated application of each technology.
In addition, the double-spelling code can be used in other fields, such as Chinese shorthand, Chinese braille and Chinese finger spelling. Chinese double spelling is also one possible direction for the reform of Chinese writing; at the least it can be applied first on computers, making Chinese human-computer dialogue more convenient.
The main advantage of the Chinese double-spelling code lies in keyboard input of Chinese information, including keyboard entry of Chinese speech, of Hanyu Pinyin and other phonetic scripts, and of Chinese characters. Here we treat the speech syllable, the feature shared by all the linguistic information they contain, as the sole object of discussion, and leave their individual features to the next step. The statistical data for these speech syllables all come from written Chinese and are converted from statistics on Chinese characters. We take the pinyin of the Chinese characters in GB2312-80 as the coding object, and use the "Modern Chinese everyday words word frequency dictionary (phonetic-order section)" (Yuhang Publishing House, first edition, June 1990) as the main statistical basis.
The double-spelling code elements we have now determined total 80, which fit exactly onto the main key area of a standard small keyboard, that is, 4 rows by 10 columns. Each key position represents both one double-spelling initial code element and one double-spelling final code element; the two are distinguished by the order of keystrokes when a syllable is entered. A layout of 4 rows and 10 columns is the limit for ten-finger operation; beyond this number the operating mode becomes that of a large keyboard, which is unfavorable to fast typing and touch typing. Although keyboard designs in existing coding techniques use various layouts such as 26 or 36 keys, they are in essence still the 4-row, 10-column pattern: even where the top row of number keys is not used directly, it is still often used for selecting among on-screen prompts. To ease touch typing, number keys should be used as little as possible, and this should be borne in mind in the design.
The double-spelling code elements must be assigned to key positions. The ten fingers differ in keystroke speed, generally ordered from fast to slow as index finger, middle finger, ring finger, little finger and thumb, and for most people the right hand is faster than the left. The code elements also differ in probability of use in Chinese information processing, so high-frequency code elements should be placed on the fastest key positions. Fig. 1 to Fig. 4 give the probability of use of each syllable without regard to tone. The sum of the probabilities of all syllables sharing an initial code element is the probability of use of that initial code element; likewise, the sum of the probabilities of all syllables sharing a final code element is the probability of use of that final code element. This is the main basis for designing the key positions.
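The summation rule can be sketched as follows. The syllable probabilities are invented placeholders, not values from Fig. 1 to Fig. 4, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical probabilities of use for a few (initial, final) syllables.
SYLLABLE_PROB = {
    ("d", "e"): 0.040,
    ("d", "a"): 0.010,
    ("sh", "i"): 0.030,
    ("y", "i"): 0.025,
}


def element_probabilities(syllable_prob):
    """Sum syllable probabilities per initial and per final code element."""
    initial_p = defaultdict(float)
    final_p = defaultdict(float)
    for (initial, final), p in syllable_prob.items():
        initial_p[initial] += p
        final_p[final] += p
    return dict(initial_p), dict(final_p)


def rank_by_probability(probs):
    """Order code elements from most to least used (ties broken by name)."""
    return sorted(probs, key=lambda name: (-probs[name], name))


initial_p, final_p = element_probabilities(SYLLABLE_PROB)
print(rank_by_probability(initial_p))
print(rank_by_probability(final_p))
```

A layout procedure would then walk the ranked list and hand the most frequent code elements to the fastest fingers; that assignment step is left out here because the text does not specify it beyond Fig. 5.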
The double-spelling final code elements are highly regular, so their key positions should be arranged first. There are 10 groups of toneless final code elements, each with the 4 main tones (first, second, third and fourth), which correspond exactly to the 4 rows and 10 columns of key positions. The final code elements should therefore be arranged so that each column holds the same final group and each row holds the same tone, which makes them easy to memorize. We determine the horizontal arrangement from the probability distribution of the toneless final code element groups and the similarity between them, and the vertical arrangement from the probability distribution of the tones and their mutual relationship. This arrangement is shown in the dashed frames at the top and left of the keyboard layout in Fig. 5.
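The row-by-tone, column-by-group rule amounts to a simple grid index. In the sketch below the group names G0 to G9 and the row order of the tones are placeholders, since the actual order is given only in Fig. 5.

```python
# Assumed orderings for illustration; the real ones are defined by Fig. 5.
TONES = [1, 2, 3, 4]                     # row index -> tone
GROUPS = ["G%d" % i for i in range(10)]  # column index -> final group


def final_key_position(group, tone):
    """Return (row, column): the row is chosen by tone, the column by final group."""
    return TONES.index(tone), GROUPS.index(group)


layout = {(g, t): final_key_position(g, t) for g in GROUPS for t in TONES}

print(len(layout), final_key_position("G3", 2))
```

Every one of the 40 final code elements lands on a distinct key, and a typist only needs to remember ten columns and four rows rather than 40 separate positions.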
Arranging the key positions of the double-spelling initial code elements is more difficult. They could be arranged purely by their probability of use and grouping, and by their order in the pinyin alphabet or the initial table, which would be easy to memorize. However, for the sake of internationalization and to benefit from existing English software, it is more necessary to remain compatible with the layout of the standard small keyboard.
Among the 26 letter keys of the standard small keyboard, most of those whose letters are close to an initial code element and have a similar probability ranking can be retained, and the rest can be assigned separately. Fig. 5 shows our preferred scheme: the center of each key diagram shows the original symbol, the upper left shows the double-spelling initial code element, and the lower left gives the probability of use of that initial code element. Since the 10 number keys are occupied, digits can be entered on the dedicated numeric keypad on the right of an extended standard keyboard. Chinese numerals, being frequently used, can be placed on the top row of number keys and entered after keying S; except for digit 0, which corresponds to "ten", each corresponds to the meaning of its digit.
Because the first 40 high-frequency syllables together account for as much as 24.5% of use, it is necessary to define first-level short codes. Each of these high-frequency syllables is paired with a high-frequency single-character word, so that a first-level short code is also a first-level short-code word. To simplify the correspondence and memorization, a high-frequency single-character word containing the initial code element is chosen as the first-level short code; it is shown in the lower right corner of each key diagram in Fig. 5. A first-level short code is entered by pressing that key followed by a space.
Apart from first-level short codes, every entry pairs a double-spelling initial code element with a double-spelling final code element, so it must be considered how the pairing distributes consecutive keystrokes between the left and right hands. The final code elements are divided between the left and right hands, the probability of use of the first-level short codes is subtracted, and the probability of use of the initial code elements that pair with each half is calculated. Comparison shows that some initial code elements suit left-hand key positions and others suit right-hand key positions. To retain the layout of the standard small keyboard, the horizontal order of final code elements shown in Fig. 5 was adopted rather than its reverse.
The double-spelling keyboard plays a pivotal role in the invention. Only the reference scheme of Fig. 5 can be given here; the final layout must be determined through years of practical use by many people and a large body of statistics. In particular, statistics on spoken Chinese would have to be taken into account to meet the widest range of needs, because keyboard input of spoken Chinese places the highest demand on input speed, and relevant statistics are scarce.
The double-spelling Chinese keyboard can be used to enter Chinese information in several forms, in short: keyboard input of speech, of phonetic script, and of Chinese characters.
Keyboard input of Chinese speech is an input mode that is linked to and complements speech recognition. Speech consists of a string of syllables, so it can be recorded simply by entering syllables one after another. The double-spelling code is a syllable code: provided codes and syllables correspond one to one, with overlapping codes eliminated and coding ambiguity avoided, speech can be entered directly on the double-spelling keyboard. Because every syllable's double-spelling code has the same length of two keys, the separator between syllables can be omitted and the computer can split syllables automatically, so a very high input speed can be reached. A typical full-time typist types at 400 keystrokes per minute, so speech can be entered at 200 syllables per minute, close to the speed of spoken Chinese. The method can thus be used for real-time recording of spoken Chinese, bringing computerization to Chinese shorthand. Entered syllables can be displayed, stored and printed as a string of single pinyin syllables, or output as speech. With an automatic word segmentation system they can be converted into pinyin words; words can also be segmented and chosen manually, up to conversion into a Chinese character file.
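Because every syllable takes exactly two keys, splitting a keystroke stream into syllables needs no separator. A minimal sketch, with the key characters standing in for arbitrary key positions and both function names being illustrative:

```python
def split_syllables(keystrokes):
    """Split a separator-free keystroke string into fixed-length two-key codes."""
    if len(keystrokes) % 2:
        raise ValueError("incomplete syllable: odd number of keystrokes")
    return [keystrokes[i:i + 2] for i in range(0, len(keystrokes), 2)]


def syllables_per_minute(keystrokes_per_minute, keys_per_syllable=2):
    """Convert a typing rate into a syllable rate for fixed-length codes."""
    return keystrokes_per_minute / keys_per_syllable


print(split_syllables("abcdef"))
print(syllables_per_minute(400))
```

The same stream typed with a variable-length code would need either a separator key or a lookup table to find syllable boundaries, which is where the fixed length saves keystrokes.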
Hanyu Pinyin and other phonetic scripts for Chinese can be entered directly with the double-spelling code and keyboard. "Other phonetic scripts" here means Chinese shorthand, Chinese braille, the International Phonetic Alphabet transcription of Chinese, and similar. Most of them separate words with spaces; code lengths vary and must be delimited by a space. If a two-syllable code of 4 keys is taken as the rule, with codes shorter than 4 keys likewise followed by a space and codes longer than 4 keys cut to 4, keystrokes can also be reduced. This kind of input may be called phonetic input, and is used mainly for translation between written forms, Chinese speech teaching, Chinese programming and similar fields. If the double-spelling code were developed into a double-spelling script, phonetic input would become the main form of Chinese input. Phonetic input is a deeper level of speech input: it adds the word-forming property of syllables and carries richer linguistic information.
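One reading of the 4-key rule for phonetic words is sketched below: a word shorter than 4 keys is closed with a space, and a longer word keeps only its first 4 keys. The function name word_code and the sample key strings are hypothetical, and the exact handling of long words is an interpretation of the text.

```python
def word_code(syllable_codes, length=4):
    """Build the key sequence for one phonetic word under the 4-key rule."""
    joined = "".join(syllable_codes)
    if len(joined) < length:
        return joined + " "   # short word: terminate with a space
    return joined[:length]    # full or long word: exactly `length` keys, no space


print(repr(word_code(["ab"])))
print(repr(word_code(["ab", "cd"])))
print(repr(word_code(["ab", "cd", "ef"])))
```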
The widest application of the double-spelling code is the coding and keyboard entry of Chinese characters. Chinese characters are the written form of Chinese; they have a long history and a vast number of users, more than any other script in the world. The speed of Chinese character keyboard input matters worldwide, and even a small improvement saves a great deal of labor and material. The double-spelling input method for Chinese characters is therefore also a focus of the invention.
The 6763 Chinese characters in the national standard GB2312-80 have 1302 syllables in total, of which 36 are neutral-tone syllables and 1266 are toned syllables. After the neutral tone is merged into the first tone, 1250 syllables remain, an average of 5.3 characters per syllable. The number of characters per syllable is very uneven, but distinguishing tones is always better than "initial-final double spelling" without tones, and the number of duplicate codes falls markedly. Characters can be grouped by double-spelling code and a sequence number added to distinguish homophones. The characters are ordered with high-frequency characters first, prompted by sequence number on screen, and selected with the number keys; if no selection is made, the highest-frequency character is entered automatically. This is the most basic character-by-character input mode.
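The character-by-character mode can be sketched as a frequency-ordered candidate list with an optional numeric choice. The code "sx" and the frequencies below are invented for illustration; only the selection logic reflects the text.

```python
# Hypothetical data: one double-spelling code with three same-tone homophones.
HOMOPHONES = {"sx": [("事", 0.004), ("是", 0.020), ("市", 0.002)]}


def candidates(code):
    """Return the characters for a code, highest frequency first."""
    entries = HOMOPHONES.get(code, [])
    return [ch for ch, _ in sorted(entries, key=lambda e: -e[1])]


def select(code, choice=None):
    """Pick by 1-based sequence number, or the top candidate if none is given."""
    ranked = candidates(code)
    if not ranked:
        raise KeyError(code)
    index = 0 if choice is None else choice - 1
    if not 0 <= index < len(ranked):
        raise IndexError("no candidate with that sequence number")
    return ranked[index]


print(candidates("sx"), select("sx"), select("sx", 3))
```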
The double-spelling code allows 40 first-level short-code characters to be defined. If they are chosen by character frequency, the first 40 characters have a cumulative frequency of 24.5%; if they are chosen to match the sound of the double-spelling initial code elements, the cumulative frequency of the 40 first-level short-code characters can reach 21%. A first-level short code takes one keystroke on the main key area plus a space, for a code length of 2. First-level short codes can also be chosen by double-spelling final code element.
The double-spelling code allows at least 1250 single-syllable second-level short codes to be defined. If the highest-frequency character of each syllable is taken as the short code, the cumulative character frequency can reach 60%. Such a character is entered by typing the 2 keys of its double-spelling syllable code followed by a space, for a code length of 3; if the code length is fixed at 2, it can also be recognized automatically.
Few double-spelling syllables have more than 10 characters, and where they do, the frequency of the extra characters is very low. Under general conditions, therefore, most Chinese characters can be entered by adding just one sequence number. For the few syllables with more than 10 characters, a page-turning key can be provided to continue selecting, or each character can be mapped one to one onto the 40 key positions and selected with a single keystroke, for a code length of 3. If the code length is fixed at 2, a flag key for entering selection mode must be added, and the code length becomes 4.
The minimum average dynamic code length of this coding method is: first-level short codes 0.42, second-level short codes 1.2, and remaining characters 0.76, for a total of 2.38, which ordinary coding methods can hardly match.
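The text gives only the three terms of the average. One breakdown that reproduces them, offered here as an assumption rather than the inventor's stated derivation, uses the 21% and 60% shares quoted earlier at code length 2 and the remaining 19% at code length 4:

```python
# (tier, share of character use, code length); shares and lengths are an assumed reading.
TIERS = [
    ("first-level short code", 0.21, 2),
    ("second-level short code", 0.60, 2),
    ("remaining characters", 0.19, 4),
]

terms = {name: share * length for name, share, length in TIERS}
average_length = sum(terms.values())
total_share = sum(share for _, share, _ in TIERS)

for name, value in terms.items():
    print(f"{name}: {value:.2f}")
print(f"average code length: {average_length:.2f}")
```

Under this reading the average is simply the frequency-weighted mean of the code lengths, so any shift of characters from the 4-key tier into a short-code tier lowers it directly.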
The double-spelling code can also be designed as an associative Chinese character input mode, in which each character is treated as a constituent unit of a sentence and the range of candidate following characters is narrowed according to the rules of sentence composition. This is close to the way people think and gives the computer some ability to anticipate text input. The design principle of this double-spelling associative input method is as follows: first- and second-level short codes are assigned to high-frequency single-character words in order of word frequency; homophonous high-frequency characters are assigned to different short codes as far as possible; and the remaining characters are ordered by frequency of combined use and placed among the options that appear after a short code is entered.
An even better associative input mode combines double-spelling keyboard input with associative prompting. Each time a character is entered, the frequencies of the characters that may follow are constrained by that character: the candidate space shrinks from all characters to a subset, the frequency ordering changes, and the range of choice narrows. When the next syllable is entered, far fewer options appear. If associations can be drawn from all of the sentence entered so far, the range of choice after entering one syllable may even shrink to 1, so that entering a Chinese character becomes equivalent to entering a double-spelling syllable. This is an intelligent double-spelling Chinese character input method; although it takes up a large amount of computer memory, it is highly worthwhile.
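The re-ranking idea can be sketched with pair counts: candidates for the newly typed syllable are reordered by how often each follows the previous character. The code "sx", the base order and the counts are invented placeholders; a real system would draw them from corpus statistics, which the text does not provide.

```python
# Hypothetical data: base order by standalone frequency, plus a few pair counts.
BASE_ORDER = {"sx": ["是", "事", "市"]}
PAIR_COUNT = {("城", "市"): 50, ("故", "事"): 40}


def ranked_candidates(code, previous=None):
    """Order homophones for `code`, preferring those that often follow `previous`."""
    base = BASE_ORDER.get(code, [])
    if previous is None:
        return list(base)
    # sorted() is stable, so candidates with equal counts keep their base order.
    return sorted(base, key=lambda ch: -PAIR_COUNT.get((previous, ch), 0))


print(ranked_candidates("sx"))
print(ranked_candidates("sx", previous="城"))
print(ranked_candidates("sx", previous="故"))
```

Using only the immediately preceding character is the simplest case; the text's stronger claim, narrowing the choice to 1 from the whole sentence so far, would need a richer context model than this sketch.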
Other coding methods can also be used for further differentiation on top of the syllable coding of Chinese characters, for example using glyph information to distinguish homophones. In principle, any coding method built on pinyin can be used in the double-spelling mode, and will always achieve a better coding result.
In the prior art, word-based coding has become a development trend, and double-spelling coding is no exception. Once tones are distinguished, the number of homophonic words drops markedly; in particular, for words of 2 or more characters, the number of homophones and their frequency of use are small enough to be negligible.
In Modern Chinese, single-character words have a word frequency of 57.53% and account for 39.07% of the total character count; 2-character words have a word frequency of 39.25% and account for 53.33% of the total character count; multi-character words have a word frequency of 3.22% and account for 7.6% of the total character count. The cumulative word frequency of the top 1848 high-frequency words reaches 75%; among them are 863 single-character words, 16 2-character words and 7 4-character words. Evidently, single-character words, which have many homophones and are also used very frequently, are relatively hard to handle and are the difficult point of word-based coding.
Another difficulty of word-based coding is that words differ in their number of syllables, so their code lengths differ as well, and it is hard to accommodate them in a single scheme.
Single-character words can continue to use the Chinese character coding method described above, but without a fixed code length: a level-1 brevity code is 2 codes, a level-2 brevity code is 3 codes, and the rest are 4 codes, each including a space as the end mark. A 2-character word is coded simply by entering the double-spelling syllable code of each of its 2 characters, and it ends automatically without the space bar. A multi-character word can be entered directly with the double-spelling initial code elements of its first 4 characters; if these fall short of 4 codes, the final code element of the 3rd character is appended. When duplicate codes occur, a sequence number is added to finish the entry.
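The length rules in the preceding paragraph can be sketched as a small encoder. This is a hedged illustration, not the patent's implementation: each syllable is modeled as a hypothetical 2-character string (initial element followed by final element), the placeholder letters stand in for real key assignments, and the parameters `keys_for_single` and `sequence_number` are names introduced here for illustration.

```python
SPACE = " "


def encode_word(syllables, keys_for_single=None, sequence_number=None):
    """Return the key sequence for a word.

    syllables: list of 2-character strings, one per Chinese character,
        each an initial element followed by a final element.
    keys_for_single: for a single-character word, the 1 to 3 keys that
        precede the terminating space (hypothetical parameter).
    sequence_number: optional number appended when codes collide.
    """
    n = len(syllables)
    if n == 0:
        raise ValueError("empty word")
    if n == 1:
        keys = keys_for_single if keys_for_single is not None else syllables[0]
        if not 1 <= len(keys) <= 3:
            raise ValueError("a single-character word takes 1 to 3 keys")
        code = keys + SPACE  # 2, 3 or 4 codes including the space
    elif n == 2:
        code = syllables[0] + syllables[1]  # 4 keys, ends automatically
    else:
        # Initial elements of the first 4 characters only.
        code = "".join(s[0] for s in syllables[:4])
        if len(code) < 4:
            code += syllables[2][1]  # final element of the 3rd character
    if sequence_number is not None:
        code += str(sequence_number)
    return code


print(repr(encode_word(["ab"], keys_for_single="a")))
print(repr(encode_word(["ab", "cd"])))
print(repr(encode_word(["ab", "cd", "ef"])))
```

A 3-character word yields three initial elements plus one final element, so every word of 2 or more characters comes out at 4 keys before any sequence number.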
The coding of single-character words can also be realized through the coding of 2-character words; the "two-syllable coding input system" is one such scheme. Its drawback is that a single-character word usually needs 4 or 5 codes, but it does allow touch typing. In addition, double-spelling coding can realize another fast input scheme, which is unprecedented:
Both the level-1 and the level-2 brevity codes are designed as brevity codes for high-frequency words. The level-1 brevity codes are mainly high-frequency single-character words; the level-2 brevity codes are mainly high-frequency 2-character words that contain as many high-frequency single-character words as possible, with high-frequency homophones distributed among different brevity codes. All brevity codes are chosen by the double-spelling syllable of the first character of the high-frequency word. Entering just 2 keys displays one high-frequency 2-character word together with the 2 high-frequency characters it contains; other 2-character words whose first character has the same syllable can be displayed at the same time, ordered and selected on the high-frequency-first principle. The user can select the whole prompted word, its first character, its last character, or the 2-character word in reversed order. This selection is completed with a single keystroke: the selected word corresponds to the column of the main key position pressed, and the rows of the main key positions correspond to different selection modes; for example, the 3rd row selects the whole word, and the remaining rows select the first character, the last character and the reversed order respectively. Because high-frequency single-character words combine into words very readily, the level-1 brevity codes can also be given the selection of words with the same first syllable, in order to reduce the number of selections. If touch typing is required, selection can be dispensed with, and words outside the brevity codes are called up by the two-syllable code of the 2-character word before word selection is carried out. This requires either ending the level-2 brevity codes with the space bar, or distinguishing two different states, single-syllable and two-syllable; alternatively, a fixed length of 2 codes can be set, with a space inserted in the middle during two-syllable input to tell the two apart, which reduces the average dynamic code length overall. Compared with the two-syllable system, this adds a two-syllable working state for brevity codes, and the two can coexist. This way of realizing two-syllable input through single syllables can also be used in other coding methods.
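The one-keystroke selection described above, where the column picks the prompted word and the row picks the selection mode, might look like the following sketch. Only "3rd row selects the whole word" is given as an example in the text; the other three row assignments in `MODE_BY_ROW`, the 1-based numbering, and the sample candidate list are assumptions made for illustration.

```python
# Row -> selection mode. Only row 3 = whole word comes from the text;
# the assignments for rows 1, 2 and 4 are assumptions.
MODE_BY_ROW = {1: "first", 2: "last", 3: "whole", 4: "reversed"}


def select(prompted_words, row, column):
    """Pick text from the prompted 2-character words with one main key.

    prompted_words: words shown on screen, one per column (1-based).
    row: main-key row (1 to 4), which chooses the selection mode.
    column: main-key column, which chooses the word.
    """
    if not 1 <= column <= len(prompted_words):
        raise ValueError("no prompted word in that column")
    word = prompted_words[column - 1]
    mode = MODE_BY_ROW[row]
    if mode == "whole":
        return word
    if mode == "first":
        return word[0]
    if mode == "last":
        return word[-1]
    return word[::-1]  # the 2-character word in reversed order


prompted = ["中国", "中间"]  # hypothetical prompt list
print(select(prompted, row=3, column=1))
```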
For two-syllable and multi-syllable words, the coding can use only some of the code elements of the double-spelling syllables, for example coding multi-syllable words with the double-spelling initial code elements alone. Sometimes a two-syllable word can likewise take only the initial code element of its first syllable and the initial code element of its last syllable, adding the final code element of the last syllable when necessary.
In short, double-spelling coding is a brand-new phonetic input method. Determining a speech sound originally required 3 code elements (initial, final and tone); this is reduced to 2, which saves one group of code elements or, alternatively, makes the differentiation of speech sounds more precise. It raises pinyin coding to a new level and establishes a new status for it. Compared with the prior art, the advantages of the double-spelling coding method can be summarized as follows:
1. Two keys enter one speech syllable, which reduces keystrokes; or, for the same two-key input, the number of homophone selections is reduced.
2. There are 40 level-1 brevity codes and 1250 level-2 brevity codes, which together account for 80% of the frequency of use of all GB Chinese characters; this shortens the average dynamic code length of Chinese character coding and improves input speed.
3. With word-based coding, generally only 4 keys are needed to enter a two-syllable word or 2 characters within a word, and further selection is rarely needed.
4. If the level-2 brevity codes are assigned by two-syllable word, they can hold most common words and common characters, with a cumulative word frequency of 75%; this shortens the average dynamic code length for characters and words and improves input speed.
5. The coding rules are simple and do not require decomposing characters into components, so they are easy to learn and remember.
6. The coding has good compatibility and can be used across all areas of Chinese information processing.
7. It suits professional use, saving substantial manpower and material, and it can also be popularized widely, letting ordinary users reach faster input speeds.
8. It can be used for keyboard input of Chinese speech; professional input speed can approach the speed of spoken Chinese, which makes the computerization of Chinese shorthand possible.
9. It provides a path, compatible with keyboard input, for Chinese speech recognition and speech synthesis, which helps the coordinated development and integrated application of these technologies.
10. It meets the needs of the times for Chinese character reform, offering the phoneticization of Chinese characters a way forward suited to the information age.
11. With continued improvement and refinement, it could be made mandatory as a national standard and reach the highest level of language coding.
The preferred embodiment of the present invention is as follows:
1. Adopt the double-spelling Chinese coding keyboard, with the double-spelling initial code elements and final code elements shown in Fig. 5.
2. The double-spelling initial code elements and final code elements serve as the computer's internal codes, are placed in area 10 of GB2312-80, and are common to all areas of Chinese information processing; where necessary, all Chinese speech syllables can be placed in these vacant positions to save memory.
3. A Chinese speech syllable is coded with a fixed length of 2 codes, composed of one double-spelling initial code element and one double-spelling final code element, and can be used directly for keyboard input of Chinese speech.
4. Hanyu Pinyin text uses double-spelling coding and is entered in its written format, with the space bar separating words.
5. A double-spelling Chinese character coding input system is set up, with conventional Chinese character input systems such as region-position code and telegraph code provided alongside it.
6. Chinese characters are coded by word, with two input states, ordinary and fast, both compatible with speech input and pinyin input.
7. The ordinary double-spelling Chinese character coding method is contained in the fast one as a special application form of it. Together they constitute the double-spelling Chinese character coding input method.
8. The level-1 brevity codes are high-frequency single-character words, as shown in Fig. 5; each is entered by pressing the corresponding key position and a space.
9. The level-2 brevity codes are high-frequency 2-character words. The double-spelling syllable code of the first character corresponds to the level-2 brevity code syllable; the last characters cover as many high-frequency single-character words as possible, and homophonic high-frequency single-character words are distributed among different level-2 brevity codes. In the ordinary input state, only the first character is entered.
10. Codes have a fixed length of 2 and are distinguished and recognized automatically.
11. Characters and words outside the brevity codes are arranged in the window corresponding to the brevity code. If a main key position is pressed after a brevity code has been entered, that brevity code is entered automatically; otherwise, if a space is entered, the selection state is entered.
12. In the selection state, if the system is in the ordinary input state, the screen prompts the single characters homophonic with the brevity code, ordered on the high-frequency-first principle, and the choice is entered with the corresponding main key position. If the system is in the fast input state, one more double-spelling syllable can be entered; the screen prompts the 2-character words determined by the two syllables, and when there is only one it is entered automatically; otherwise an audible prompt is given, the words are ordered on the high-frequency-first principle, and the choice is entered with dedicated number keys.
13. Multi-character words are split into single-character words and 2-character words and entered separately. Alternatively, a separate multi-character-word state can be set up, in which the double-spelling initial code element of each character is entered, and a code shorter than 4 is filled to 4 codes with the double-spelling final code element of the last character.
14. Speech syllables, pinyin and Chinese characters can be converted into one another. Chinese characters can be converted directly into pinyin and into speech syllables, and pinyin can be converted directly into speech syllables. Speech syllables become pinyin simply by removing the spaces. When pinyin is converted into Chinese characters, one-to-one correspondences are converted by the computer itself; the rest are displayed word by word by the computer, with homophones prompted, and are selected from the keyboard. The same word occurring throughout a file can be prompted once by the computer and selected and converted in one pass. This method can also be used to enter Chinese characters.
15. Specific programs can be written to convert automatically among speech syllables, pinyin and Chinese characters.
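Item 3 of the embodiment, combined with the 4-row by 10-column keyboard set out in the claims, can be sketched as a decoder for a pair of keystrokes. This is an illustration under assumptions: the claims state that final code elements in the same column share a final group and those in the same row share a tone, but which row carries which tone, the zero-based key numbering, and the function names below are choices made here, since the actual assignments of Fig. 5 are not reproduced in the text.

```python
ROWS, COLS = 4, 10  # 40 main key positions


def key_index(row, col):
    """Number the main keys 0..39, row by row (numbering is an assumption)."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("key position outside the 4 x 10 main block")
    return row * COLS + col


def decode_syllable(first_key, second_key):
    """Interpret two keystrokes, each a (row, col) pair, as one toned syllable.

    The first key is read as an initial code element and the second as a
    final code element; the same physical key plays both roles depending
    on its position in the two-key sequence.
    """
    initial_id = key_index(*first_key)
    row, col = second_key
    key_index(row, col)  # validates the second key position
    return {
        "initial_element": initial_id,
        "final_group": col,  # same column -> same final group
        "tone": row + 1,     # same row -> same tone (row order assumed)
    }


print(decode_syllable((0, 3), (2, 7)))
print(ROWS * COLS * ROWS * COLS)  # number of distinct two-key sequences
```

Reading the role of a key from its position in the sequence is what lets 40 keys carry 40 initial code elements and 40 final code elements without any shift state.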
Description of the drawings:
Fig. 1 to Fig. 3: tables of the toneless syllables of Chinese speech in double spelling.
Fig. 4: overall diagram of Fig. 1 and Fig. 3.
Fig. 5: layout of the double-spelling Chinese coding keyboard.

Claims (18)

1. A double-spelling Chinese coding method for the field of Chinese information processing, characterized in that: according to the combination rules of Chinese speech and the requirements of keyboard input, the toned syllables of Chinese speech are decomposed into double-spelling phonetic elements, which are then merged into one group of double-spelling initial code elements and one group of double-spelling final code elements, realizing double-spelling coding of the toned syllables of Chinese speech and of their various written forms.
2. A computer internal code system for double-spelling Chinese coding, characterized in that: one double-spelling initial code element and one double-spelling final code element spell out a toned syllable of Chinese speech, and are used directly for keyboard entry, speech recognition and speech synthesis of Chinese speech, for keyboard entry, character recognition and character printing of Hanyu Pinyin-type scripts, and in fields of Chinese information processing such as Chinese-language programming.
3. A double-spelling code element arrangement for a double-spelling Chinese keyboard, characterized in that: it consists of 40 main key positions in 4 rows and 10 columns; each key position represents both a double-spelling initial code element and a double-spelling final code element, the two being distinguished by order of entry, so that two keystrokes enter one toned syllable.
4. The Chinese coding method according to claim 1, characterized in that: initials are divided into different double-spelling initials according to whether they combine with finals beginning with the medial i, u or ü; finals that basically never combine with the same double-spelling initial are merged into double-spelling final groups. The double-spelling initial code elements and final code elements are then determined according to the limited number of main key positions on a general-purpose small keyboard.
5. The double-spelling code element arrangement for a double-spelling Chinese keyboard according to claims 1 and 3, characterized in that: according to the compatibility relations among the double-spelling Chinese code elements, and with reference to each code element's probability of use in Chinese information processing and to keyboard fingering, the code elements are arranged on the corresponding main key positions of a general-purpose small keyboard, aiming at fast keystrokes and ease of learning, operation and memorization.
6. The Chinese coding method according to claim 5, characterized in that: initials that combine with finals beginning with i or u are marked with an added "'" to distinguish them; initials that combine with finals beginning with ü are merged into the foregoing initials, so that jü and z' are merged and written z'j; qü and c' are merged and written c'q; xü and s' are merged and written s'x; the initial y that can combine with ü and üe is merged with f and written fy; the initial y that can combine with üan and üen is merged with ch' and written ch'y; p' and r' are merged and written p'r'; f' has only the one syllable fu and is merged into f; this forms 40 double-spelling initial code elements. Finals that generally cannot combine with the same double-spelling initial code element are put into one group, namely ang, iang, uan; an, üan, ian, ua; ai, ia, u; a, iu, ui, ong, o, iong, uo; e, ei, i; en, in, un; eng, er, ün, ing, ueng and ou, üe, ie, uang; 10 groups in total. Each group is further divided into 4 by tone (level, rising, falling-rising and falling), with the neutral tone merged into the level tone. This forms 40 double-spelling final code elements.
7. The double-spelling code element arrangement for a double-spelling Chinese keyboard according to claims 5 and 6, characterized in that: double-spelling final code elements in the same column share the same final group, and those in the same row share the same tone; the double-spelling initial code elements are compatible with the alphanumeric keys of a general-purpose small keyboard.
8. The double-spelling Chinese coding input method according to claims 6 and 7, characterized in that: toned syllables of Chinese speech have a fixed length of 2 codes and can be used for keyboard input of Chinese speech.
9. The double-spelling Chinese coding input method according to claims 6 and 7, characterized in that: Hanyu Pinyin-type scripts are entered directly, word by word, in their written format.
10. The double-spelling Chinese coding input method according to claims 6 and 7, characterized in that: the pronunciation of a Chinese character is decomposed into a double-spelling initial and a double-spelling final, supplemented by a method for distinguishing identical pronunciations, in order to code Chinese characters, words or sentences and combinations thereof, thereby forming the double-spelling Chinese character coding input method.
11. The double-spelling Chinese character coding input method according to claim 10, characterized in that: high-frequency characters containing the double-spelling code element are taken as level-1 brevity codes; high-frequency characters with the double-spelling coded syllable are taken as level-2 brevity codes.
12. The double-spelling Chinese character coding input method according to claim 10, characterized in that: high-frequency words containing the double-spelling code element are taken as level-1 brevity codes; high-frequency words containing the double-spelling coded syllable are taken as level-2 brevity codes. These may be single-character words, or 2-character words that contain single-character words, or the two may be provided as two different states.
13. The double-spelling Chinese character coding input method according to claim 11 or 12, characterized in that: characters or words outside the brevity codes are prompted by sequence number within the range of the homophonic code, on the high-frequency-first principle, and selected for input.
14. The double-spelling Chinese character coding input method according to claim 11 or 12, characterized in that: characters or words outside the brevity codes are selected on screen and input by way of associative prompting.
15. The double-spelling Chinese character coding input method according to claim 11 or 12, characterized in that: characters or words outside the brevity codes are prompted on screen, selected and input using two-syllable codes or the associated two-syllable codes of single characters.
16. The double-spelling Chinese character coding input method according to claim 11 or 12, characterized in that: codes have a fixed length of 4; a code shorter than 4 is padded with a space, and of a code longer than 4 only 4 codes are taken.
17. The double-spelling Chinese character coding input method according to claim 11 or 12, characterized in that: codes have a fixed length of 2; a code shorter than 2 is padded with a space; for a code longer than 2, a space added as the 3rd code enters the screen-prompt state or the further coding selection state.
18. The method of coding and inputting Chinese information according to any one of claims 1-17, usable in all large, medium, small and micro computer Chinese information processing systems, telex machines, typewriters, Chinese terminals and Chinese communication systems.
CN 92105929 1992-07-20 1992-07-20 Dual spelling Chinese words coding method and keyboard thereof Pending CN1081523A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 92105929 CN1081523A (en) 1992-07-20 1992-07-20 Dual spelling Chinese words coding method and keyboard thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 92105929 CN1081523A (en) 1992-07-20 1992-07-20 Dual spelling Chinese words coding method and keyboard thereof

Publications (1)

Publication Number Publication Date
CN1081523A true CN1081523A (en) 1994-02-02

Family

ID=4941666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 92105929 Pending CN1081523A (en) 1992-07-20 1992-07-20 Dual spelling Chinese words coding method and keyboard thereof

Country Status (1)

Country Link
CN (1) CN1081523A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1307513C (en) * 2003-08-28 2007-03-28 富士通株式会社 Chinese character input method and apparatus

Similar Documents

Publication Publication Date Title
US4379288A (en) Means for encoding ideographic characters
KR100962956B1 (en) Input method for optimizing digitize operation code for the world characters information and information processing system thereof
US5131766A (en) Method for encoding chinese alphabetic characters
CN85100837A (en) Optimize the Five-stroke Method compiling method and keyboard thereof
CN1262473A (en) Chinese-caracter input method by phonetic letters with numeral key pad
CN100476826C (en) Chinese character ordering searching method and device and one information system
CN1097766C (en) Chinese-character 5-key input method
CN1136496C (en) Simplified spelling-touching screen mouse chinese character input method
CN101071337B (en) Phonetic alphabet letter-digit Chinese character input method and keyboard and screen display method
CN1081523A (en) Dual spelling Chinese words coding method and keyboard thereof
CN1834870A (en) Japanese character inputting method and system thereof
CN1035083C (en) Word-oriented Chinese character typing device
GB2071018A (en) Improvements in method and apparatus for information processing
CN1072785A (en) Irrational rank-numeral synthetic coding method and keyboard thereof
CN1054219C (en) Substitution type Chinese phonetic character, word input coding method and keyboard thereof
CN1196057C (en) One-code two-form quick Chinese digital coding input method
CN1257445C (en) Chinese-character 'Pronunciation-meaning code' input method
CN1173250C (en) Computer input method using intelligent code
CN1027839C (en) Chinese character encoding input method
CN1056457C (en) Hanyupinying writing inputing method for computer
CN1025540C (en) Double-combination encoding method by use of initial consonants and vowels of Chinese syllables
CN1244855C (en) Digital standard coding input technology for Chinese character in Chinese information processing
CN1032559C (en) Language input gradient acceleration method
CN1063946A (en) " Chinese communication scheme " abbreviated spelling computer input method and keyboard
CN1272693C (en) Artificial phonetic digital input method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C01 Deemed withdrawal of patent application (patent law 1993)
WD01 Invention patent application deemed withdrawn after publication