CN102591472B - Method and device for inputting Chinese characters - Google Patents
- Publication number
- CN102591472B
- Authority
- CN
- China
- Prior art keywords
- entry
- weight
- input
- pinyin string
- skew
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Document Processing Apparatus (AREA)
Abstract
The invention provides a method and a device for inputting Chinese characters, which can solve the prior-art problem of a low preferred rate (the accuracy of the first candidate word). The method comprises: obtaining a pinyin string; obtaining the current input context, which includes the candidate words corresponding to the pinyin string and the previous word of the current input; acquiring, for each candidate entry corresponding to the pinyin string, the weight of the entry, the historical input count of the entry, and the offset weight of the entry, where the offset weight of an entry is the offset weight between the entry and the previous word of the current input; calculating a sorting weight for each entry from the weight of the entry, the offset weight of the entry, and the historical input count; sorting the entries corresponding to the pinyin string in descending order of sorting weight to obtain the final ranking of the input method's candidate words; and inputting the entry corresponding to the pinyin string according to the ranking result. The method can improve the preferred rate of the input method and make input easier for users.
Description
Technical field
The present invention relates to character input technology, and in particular to a Chinese character input method and device.
Background technology
People use an input method to enter the information they want to express, and input habits differ: some people prefer one style of input, while others prefer to enter text one word at a time. The same pinyin string usually corresponds to multiple Chinese character strings, each of which may be a phrase or a sentence. At present, an input method mainly ranks characters, phrases, and English words in descending order according to how frequently they occur in everyday use and according to the user's input history. For example, when the pinyin string "xian'cheng" is input, the input method first looks up in the dictionary all entries corresponding to the pinyin "xian'cheng": "virtuous one-tenth", "ready-made", "county town", "thread", "XianCheng", "orange", and so on. If the weight of "county town" is higher than the weights of "ready-made" and all the other entries, "county town" appears before "ready-made" in the candidate list. If the user then selects the entry "ready-made", the next time the pinyin string "xian'cheng" is input, the history-first principle places the candidate "ready-made" before "county town", because the user has input "ready-made" before.
The prior-art input method is described below with reference to Figure 1.
Step 101: obtain a pinyin string.
Step 102: obtain from the dictionary the weight of each entry corresponding to the pinyin string and the number of times that entry has been input historically.
Step 103: for each entry, calculate the weight order of the entry from its weight and its historical input count.
Step 104: sort the entries corresponding to the pinyin string in descending order of weight order to obtain the final ranking of the input method's candidate words, and input the entry corresponding to the pinyin string according to the ranking result.
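Steps 101 to 104 can be sketched as follows. The source does not give the prior-art scoring formula, so this sketch assumes a simple linear combination of dictionary weight and historical input count; the function name `rank_prior_art`, the constant `a`, and all numeric weights are hypothetical and chosen only to reproduce the "county town" / "ready-made" behaviour described in the background.

```python
from typing import Dict, List


def rank_prior_art(weights: Dict[str, float],
                   history: Dict[str, int],
                   a: float = 1.0) -> List[str]:
    """Rank candidates by dictionary weight plus historical input count.

    Assumption: score = weight + a * count. The source only states that
    the weight order is computed from these two quantities.
    """
    def score(entry: str) -> float:
        return weights[entry] + a * history.get(entry, 0)

    # Step 104: sort candidates in descending order of their score.
    return sorted(weights, key=score, reverse=True)


# Hypothetical dictionary weights for the pinyin string "xian'cheng".
weights = {"county town": 50.0, "ready-made": 49.5, "thread": 40.0}

before = rank_prior_art(weights, history={})
# The user has selected "ready-made" once, so its count becomes 1.
after = rank_prior_art(weights, history={"ready-made": 1})
print(before[0], "->", after[0])
```

With these assumed numbers the first candidate changes from "county town" to "ready-made" after a single selection, but the previous word typed by the user plays no part in the score.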
Although current input technology meets users' input needs reasonably well, the present inventors found that entries in a context are related to one another, and that making full use of the contextual relationships of the current input can better help the user; current input methods do not use the relationships within the input context. For example, "dark knight" is a single term, so the entries "dark" and "knight" are related. Nevertheless, after the user inputs "dark" and then inputs the pinyin string "qi'shi", the first-selected word is "in fact". The dynamic ordering method used in the prior art does not fully exploit the current input scene, that is, the relationship between the context of the user's input and the pinyin string being input, so the initial selection of the first-selected word is not high. The first-selected word is the first of the candidate words corresponding to the input pinyin string; it lets the user input Chinese characters conveniently and quickly. The accuracy of the first-selected word is called the initial selection, which is an important index for evaluating the quality of an input method.
Summary of the invention
Embodiments of the invention provide a Chinese character input method and device that can solve the problem of low initial selection in the prior art.
Embodiments of the invention provide a Chinese character input method, comprising:
Obtaining a pinyin string;
Obtaining the context of the current input, the context comprising the candidate word entries corresponding to the pinyin string and the previous word of the current input;
Obtaining, for each candidate word entry corresponding to the pinyin string, the weight of the entry, the number of times the entry has been input historically, and the skew weight of the entry, where the skew weight of the entry is the skew weight between the entry and the previous word of the current input, and a skew weight is the weighted sum of the number of times two entries occur together in a large body of data and the number of times the user has input the two entries consecutively;
For each entry, calculating the weight order of the entry from the weight of the entry, the skew weight of the entry, and the historical input count;
Sorting the entries corresponding to the pinyin string in descending order of weight order to obtain the final ranking of the input method's candidate words, and inputting the entry corresponding to the pinyin string according to the ranking result.
The word preceding the current input is the previous word of the current input.
The formula that calculates weight order is as follows:
$$\mathrm{Weight}_{paixu}(W) = \mathrm{Weight}(W) + a \times \mathrm{Time}(W) + b \times \mathrm{Weight}_{pianyi}(W) \tag{1}$$
where a and b are constants; W denotes the current entry; $\mathrm{Weight}_{paixu}(W)$ is the weight order of entry W; $\mathrm{Weight}(W)$ is the weight of entry W; $\mathrm{Time}(W)$ is the number of times entry W has been input historically; and $\mathrm{Weight}_{pianyi}(W)$ is the skew weight of entry W.
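Formula (1) translates directly into a small function. This is a minimal sketch: the function name `weight_order` and the default values `a = 1.0` and `b = 1.0` are assumptions, since the source only states that a and b are constants.

```python
def weight_order(weight: float, times: int, skew_weight: float,
                 a: float = 1.0, b: float = 1.0) -> float:
    """Formula (1): Weight_paixu(W) = Weight(W) + a*Time(W) + b*Weight_pianyi(W).

    weight      -- dictionary weight of the entry, Weight(W)
    times       -- number of times the entry was input historically, Time(W)
    skew_weight -- skew weight between the entry and the previous word
    a, b        -- constants; the defaults of 1.0 are assumed, not from the source
    """
    return weight + a * times + b * skew_weight


# An entry with weight 38, no input history, and skew weight 6.
print(weight_order(38, 0, 6))
```

Keeping a and b as parameters makes it easy to tune how strongly the user's history and the preceding word influence the ranking relative to the dictionary weight.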
The present invention also provides a Chinese character input device, comprising: a pinyin acquisition module, a context acquisition module, an enquiry module, a computing module, and a load module;
The pinyin acquisition module is used to obtain a pinyin string;
The context acquisition module is used to obtain the context of the current input, the context comprising the candidate word entries corresponding to the pinyin string and the previous word of the current input;
The enquiry module is used to obtain, for each candidate word entry corresponding to the pinyin string, the weight of the entry, the number of times the entry has been input historically, and the skew weight of the entry, where the skew weight of the entry is the skew weight between the entry and the previous word of the current input, and a skew weight is the weighted sum of the number of times two entries occur together in a large body of data and the number of times the user has input the two entries consecutively;
The computing module is used to calculate the weight order of each entry from the weight of the entry, the skew weight of the entry, and the historical input count of the entry;
The load module is used to sort the entries corresponding to the pinyin string in descending order of weight order to obtain the final ranking of the input method's candidate words, and to input the entry corresponding to the pinyin string according to the ranking result.
According to the present invention, dynamically adjusting the candidate words based on context improves the initial selection of the input method and makes input more convenient for the user. In particular, incorporating the user's input habits into the skew weight further improves the ease of use of the input method.
Brief description of the drawings
Fig. 1 shows the Chinese character input method of prior art;
Fig. 2 shows the Chinese character input method of the embodiment of the present invention;
Fig. 3 shows the Chinese input unit of the embodiment of the present invention.
Embodiment
To help those of ordinary skill in the art understand and implement the present invention, embodiments of the invention are now described with reference to the accompanying drawings.
Embodiment 1
This embodiment provides a Chinese character input method. According to the present invention, to improve the initial selection of the input method's first-selected word, the candidate words of the input method are combined with the word preceding the current input (for example, the previous word). The relationship between the candidate words corresponding to the input pinyin string and the preceding word is therefore taken into account, the candidate words are then re-sorted, and the first-selected word is determined from the sorted order, which improves the hit rate of the first-selected word. The Chinese character input method of the present invention is described below with reference to Figure 2.
$$\mathrm{Weight}_{paixu}(W) = \mathrm{Weight}(W) + a \times \mathrm{Time}(W) + b \times \mathrm{Weight}_{pianyi}(W) \tag{1}$$
where a and b are constants; W denotes the current entry; $\mathrm{Weight}_{paixu}(W)$ is the weight order of entry W; $\mathrm{Weight}(W)$ is the weight of entry W; $\mathrm{Time}(W)$ is the number of times entry W has been input historically; and $\mathrm{Weight}_{pianyi}(W)$ is the skew weight of entry W. A skew weight is the weighted sum of the number of times two entries occur together in a large body of data and the number of times the user has input the two entries consecutively. For example, if the entries "on" and "lock" occur together 200 times in a corpus, and the user has input the entries "on" and "lock" in that order once, the skew weight between "on" and "lock" is 200 + 1 = 201.
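The skew weight described above can be sketched as a weighted sum of two counters. Several details here are assumptions rather than statements from the source: "occurring together" is modelled as two entries appearing back to back in a segmented sequence, the helper names `count_adjacent_pairs` and `skew_weight` are hypothetical, and both mixing weights default to 1, which is what the 200 + 1 = 201 example implies.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def count_adjacent_pairs(sequences: Iterable[List[str]]) -> Counter:
    """Count how often each ordered pair of entries appears back to back."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts


def skew_weight(pair: Pair, corpus_counts: Counter, user_counts: Counter,
                w_corpus: float = 1.0, w_user: float = 1.0) -> float:
    """Weighted sum of the corpus co-occurrence count and the user's count."""
    return w_corpus * corpus_counts[pair] + w_user * user_counts[pair]


# Corpus statistics from the example: "on" and "lock" occur together 200 times.
corpus_counts = Counter({("on", "lock"): 200})
# The user has typed "on" followed by "lock" once.
user_counts = count_adjacent_pairs([["on", "lock"]])

result = skew_weight(("on", "lock"), corpus_counts, user_counts)
print(result)
```

Raising `w_user` above `w_corpus` would let a user's own habits outweigh corpus statistics more quickly; the source does not say which weights are used in practice.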
Embodiment 2
As shown in Figure 3, this embodiment discloses a Chinese character input device, comprising: a pinyin acquisition module, a context acquisition module, an enquiry module, a computing module, and a load module. The pinyin acquisition module is used to obtain a pinyin string. The context acquisition module is used to obtain the context of the current input, the context comprising the candidate words corresponding to the pinyin string and the previous word of the current input. The enquiry module is used to obtain the weight of each candidate word corresponding to the pinyin string, the number of times the entry has been input historically, and the skew weight of the entry. The computing module is used to obtain the weight order of each entry from the weight of the entry, the historical input count of the entry, and the skew weight of the entry. The load module is used to sort the entries in descending order of weight order to obtain the final ranking of the input method's candidate words, and to input the entry corresponding to the pinyin string according to the ranking result.
For the working principle of each module in this embodiment, refer to the description of Embodiment 1.
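One way to picture how the five modules cooperate is a single class with one method per module. This is only an illustrative sketch: the class name `ChineseInputDevice`, the method names, and the in-memory dictionaries are all hypothetical, and the constants `a` and `b` default to 1.0 as an assumption. The sample weights and skew weight are taken from the "shao'zhi" example in this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Context:
    candidates: List[str]          # entries matching the pinyin string
    previous_word: Optional[str]   # word input just before the current input


class ChineseInputDevice:
    """Toy in-memory pipeline mirroring the five modules (illustrative only)."""

    def __init__(self,
                 dictionary: Dict[str, List[str]],
                 weights: Dict[str, float],
                 history: Dict[str, int],
                 skew: Dict[Tuple[Optional[str], str], float],
                 a: float = 1.0, b: float = 1.0) -> None:
        self.dictionary, self.weights = dictionary, weights
        self.history, self.skew = history, skew
        self.a, self.b = a, b

    def acquire_pinyin(self, raw: str) -> str:
        # Pinyin acquisition module: normalise the typed string.
        return raw.strip().lower()

    def acquire_context(self, pinyin: str,
                        previous_word: Optional[str]) -> Context:
        # Context acquisition module: candidates plus the previous word.
        return Context(list(self.dictionary.get(pinyin, [])), previous_word)

    def query(self, entry: str, ctx: Context) -> Tuple[float, int, float]:
        # Enquiry module: weight, historical count, and skew weight
        # between the entry and the previous word (0 if unrelated).
        skew = self.skew.get((ctx.previous_word, entry), 0.0)
        return self.weights[entry], self.history.get(entry, 0), skew

    def compute(self, entry: str, ctx: Context) -> float:
        # Computing module: formula (1).
        weight, times, skew = self.query(entry, ctx)
        return weight + self.a * times + self.b * skew

    def rank(self, raw: str, previous_word: Optional[str] = None) -> List[str]:
        # Load module: descending sort by weight order.
        ctx = self.acquire_context(self.acquire_pinyin(raw), previous_word)
        return sorted(ctx.candidates,
                      key=lambda e: self.compute(e, ctx), reverse=True)


device = ChineseInputDevice(
    dictionary={"shao'zhi": ["fire", "burn paper as sacrificial offerings"]},
    weights={"fire": 39, "burn paper as sacrificial offerings": 38},
    history={},
    skew={("visit a grave", "burn paper as sacrificial offerings"): 6},
)
print(device.rank("shao'zhi"))
print(device.rank("shao'zhi", previous_word="visit a grave"))
```

Without a previous word the dictionary order is kept; once "visit a grave" is supplied as context, the related entry moves to the front.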
Embodiment 3
This embodiment describes the Chinese character input method of the present invention through a concrete example.
The entries "visit a grave" and "burn paper as sacrificial offerings" often occur together, and the entries in the dictionary corresponding to the pinyin string "shao'zhi" are "fire", "burn paper as sacrificial offerings", "burn to", "burn it", and "slightly know". The weights of these entries are:
Entry | Weight |
---|---|
Fire | 39 |
Burn paper as sacrificial offerings | 38 |
Burn to | 36 |
Burn it | 34 |
Slightly know | 32 |
According to formula (1), when the user has no historical input count for these entries and there is no previous entry in the input context, the weight order of each entry is simply its weight in the dictionary.
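This reduction can be written out explicitly. With no input history, Time(W) = 0, and with no previous entry the skew weight is taken to be 0, so the last two terms of formula (1) vanish whatever the values of the constants a and b:

$$\mathrm{Weight}_{paixu}(W) = \mathrm{Weight}(W) + a \times 0 + b \times 0 = \mathrm{Weight}(W)$$

For the entry "fire", for instance, the weight order is then 39, the same as its dictionary weight.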
After the user of the input method has input the entry "visit a grave" and then inputs the pinyin string "shao'zhi", only the entry "burn paper as sacrificial offerings", among the entries corresponding to "shao'zhi", is related to "visit a grave". A lookup shows that the skew weight of "burn paper as sacrificial offerings" in the context of "visit a grave" is 6, while the skew weight of every other entry is 0. The values below are consistent with b = 1 and no historical input, for example 38 + 6 = 44 for "burn paper as sacrificial offerings". According to formula (1), the weight orders are therefore as shown in the following table:
Entry | Weight order |
---|---|
Burn paper as sacrificial offerings | 44 |
Fire | 39 |
Burn to | 36 |
Burn it | 34 |
Slightly know | 32 |
According to the table above, the candidate words for the pinyin string "shao'zhi" are ordered as: burn paper as sacrificial offerings, fire, burn to, burn it, slightly know.
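The example in this embodiment can be reproduced with a few lines of code. The weights and the skew weight of 6 come from the text; the function name `rank_candidates` is hypothetical, and `b = 1` is an assumption inferred from 38 + 6 = 44. With no input history, the value of `a` does not affect the result.

```python
from typing import Dict, List, Tuple


def rank_candidates(weights: Dict[str, float],
                    history: Dict[str, int],
                    skew: Dict[str, float],
                    a: float = 1.0, b: float = 1.0) -> List[Tuple[str, float]]:
    """Apply formula (1) to every candidate and sort in descending order."""
    scores = {
        entry: w + a * history.get(entry, 0) + b * skew.get(entry, 0)
        for entry, w in weights.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Dictionary weights for the pinyin string "shao'zhi".
weights = {
    "fire": 39,
    "burn paper as sacrificial offerings": 38,
    "burn to": 36,
    "burn it": 34,
    "slightly know": 32,
}
# Skew weights relative to the previous word "visit a grave".
skew_after_visit_a_grave = {"burn paper as sacrificial offerings": 6}

no_context = rank_candidates(weights, history={}, skew={})
with_context = rank_candidates(weights, history={},
                               skew=skew_after_visit_a_grave)
for entry, score in with_context:
    print(f"{entry}: {score}")
```

Here `no_context` keeps the dictionary order with "fire" first, while `with_context` moves "burn paper as sacrificial offerings" to the top with a weight order of 44.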
According to the present invention, dynamically adjusting the candidate words based on context improves the initial selection of the input method and makes input more convenient for the user. In particular, incorporating the user's input habits into the skew weight further improves the ease of use of the input method.
Although the present invention has been described by way of embodiments, those of ordinary skill in the art will appreciate that the invention admits many variations and modifications without departing from its spirit and substance, and the scope of the invention is defined by the appended claims.
Claims (2)
1. A Chinese character input method, characterized by comprising:
Obtaining a pinyin string;
Obtaining the context of the current input, the context comprising the candidate word entries corresponding to the pinyin string and the previous word of the current input;
Obtaining, for each candidate word entry corresponding to the pinyin string, the weight of the entry, the number of times the entry has been input historically, and the skew weight of the entry, where the skew weight of the entry is the skew weight between the entry and the previous word of the current input, and a skew weight is the weighted sum of the number of times two entries occur together in a large body of data and the number of times the user has input the two entries consecutively;
For each entry, calculating the weight order of the entry from the weight of the entry, the skew weight of the entry, and the historical input count, the formula for calculating the weight order being as follows:
$$\mathrm{Weight}_{paixu}(W) = \mathrm{Weight}(W) + a \times \mathrm{Time}(W) + b \times \mathrm{Weight}_{pianyi}(W) \tag{1}$$
where a and b are constants, W denotes the current entry, $\mathrm{Weight}_{paixu}(W)$ is the weight order of entry W, $\mathrm{Weight}(W)$ is the weight of entry W, $\mathrm{Time}(W)$ is the number of times entry W has been input historically, and $\mathrm{Weight}_{pianyi}(W)$ is the skew weight of entry W;
Sorting the candidate words corresponding to the pinyin string in descending order of weight order to obtain the final ranking of the input method's candidate words, and inputting the entry corresponding to the pinyin string according to the ranking result.
2. A Chinese character input device, characterized by comprising: a pinyin acquisition module, a context acquisition module, an enquiry module, a computing module, and a load module;
The pinyin acquisition module is used to obtain a pinyin string;
The context acquisition module is used to obtain the context of the current input, the context comprising the candidate words corresponding to the pinyin string and the previous word of the current input;
The enquiry module is used to obtain, for each candidate word corresponding to the pinyin string, the weight of the entry, the number of times the entry has been input historically, and the skew weight of the entry, where the skew weight of the entry is the skew weight between the entry and the previous word of the current input, and a skew weight is the weighted sum of the number of times two entries occur together in a large body of data and the number of times the user has input the two entries consecutively;
The computing module is used to calculate the weight order of each entry from the weight of the entry, the skew weight of the entry, and the historical input count of the entry, the formula for calculating the weight order being as follows:
$$\mathrm{Weight}_{paixu}(W) = \mathrm{Weight}(W) + a \times \mathrm{Time}(W) + b \times \mathrm{Weight}_{pianyi}(W) \tag{1}$$
where a and b are constants, W denotes the current entry, $\mathrm{Weight}_{paixu}(W)$ is the weight order of entry W, $\mathrm{Weight}(W)$ is the weight of entry W, $\mathrm{Time}(W)$ is the number of times entry W has been input historically, and $\mathrm{Weight}_{pianyi}(W)$ is the skew weight of entry W;
The load module is used to sort the entries corresponding to the pinyin string in descending order of weight order to obtain the final ranking of the input method's candidate words, and to input the entry corresponding to the pinyin string according to the ranking result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110020045.1A CN102591472B (en) | 2011-01-13 | 2011-01-13 | Method and device for inputting Chinese characters |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110020045.1A CN102591472B (en) | 2011-01-13 | 2011-01-13 | Method and device for inputting Chinese characters |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102591472A | 2012-07-18 |
CN102591472B | 2014-06-18 |
Family
ID=46480268
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110020045.1A Active CN102591472B (en) | 2011-01-13 | 2011-01-13 | Method and device for inputting Chinese characters |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102591472B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20230414. Address after: Room 501-502, 5/F, Sina Headquarters Scientific Research Building, Block N-1 and N-2, Zhongguancun Software Park, Dongbei Wangxi Road, Haidian District, Beijing, 100193. Patentee after: Sina Technology (China) Co.,Ltd. Address before: 100080, International Building, No. 58 West Fourth Ring Road, Haidian District, Beijing, 20 floor. Patentee before: Sina.com Technology (China) Co.,Ltd. |