CN102591472B - Method and device for inputting Chinese characters - Google Patents


Info

Publication number
CN102591472B
Authority
CN
China
Prior art keywords
entry
weight
input
pinyin string
skew
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110020045.1A
Other languages
Chinese (zh)
Other versions
CN102591472A (en)
Inventor
蔡衡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sina Technology China Co Ltd
Original Assignee
Sina Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sina Technology China Co Ltd
Priority to CN201110020045.1A
Publication of CN102591472A
Application granted
Publication of CN102591472B
Legal status: Active
Anticipated expiration


Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention provides a method and a device for inputting Chinese characters that address the low first-choice rate of the prior art. The method comprises: obtaining a pinyin string; obtaining the context of the current input, which includes the candidate entries corresponding to the pinyin string and the previous word of the current input; obtaining, for each candidate entry, the weight of the entry, its historical input count, and its offset weight, where the offset weight of an entry is the offset weight between that entry and the previous word of the current input; calculating a sorting weight for each entry from its weight, its offset weight, and its historical input count; sorting the entries corresponding to the pinyin string in descending order of sorting weight to obtain the final ranking of the input method's candidate words; and inputting the entry corresponding to the pinyin string according to the ranking. The method improves the first-choice rate of the input method and makes input easier for users.

Description

Chinese character input method and device
Technical field
The present invention relates to character input technology, and in particular to a Chinese character input method and device.
Background art
People use an input method to enter the information they want to express. Some prefer to enter a whole sentence at a time, while others prefer to enter text word by word. A single pinyin string usually corresponds to several Chinese character strings, each of which may be a phrase or a sentence. Current input methods rank characters, phrases, and English words in descending order, mainly according to how frequently they occur in everyday use and according to the user's input history. For example, when the pinyin string "xian'cheng" is entered, the input method first looks up all entries for "xian'cheng" in the dictionary: "virtuous one-tenth", "ready-made", "county town", "thread", "XianCheng", "orange", and so on. The weight of "county town" is higher than that of "ready-made" and of every other entry, so "county town" appears before "ready-made" in the candidate list. If the user then selects "ready-made", that entry acquires an input history; under the history-priority principle, the next time "xian'cheng" is entered, the candidate "ready-made" is ranked before "county town".
The prior-art input method is described below with reference to Fig. 1.
Step 101: obtain a pinyin string.
Step 102: obtain from the dictionary the weight of each entry corresponding to the pinyin string and the number of times that entry has been input historically.
Step 103: for each entry, calculate its sorting weight from the entry's weight and its historical input count.
Step 104: sort the entries corresponding to the pinyin string in descending order of sorting weight to obtain the final ranking of the input method's candidate words, and input the entry corresponding to the pinyin string according to the ranking.
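As a rough illustration of steps 101 to 104, the sketch below ranks candidates by dictionary weight plus input history. The source does not give the prior-art formula, so the linear combination `weight + a * history_count`, the `Entry` class, the function `prior_art_rank`, and every numeric value are assumptions chosen only to reproduce the "county town" / "ready-made" behaviour described in the background.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    word: str
    weight: float           # dictionary weight of the entry (hypothetical value)
    history_count: int = 0  # number of times the user has input this entry


def prior_art_rank(entries: List[Entry], a: float = 10.0) -> List[str]:
    """Steps 103-104: score by weight + a * history_count, sort descending.

    The linear form and the constant a = 10 are assumptions for illustration.
    """
    ranked = sorted(entries,
                    key=lambda e: e.weight + a * e.history_count,
                    reverse=True)
    return [e.word for e in ranked]


# Hypothetical dictionary content for the pinyin string "xian'cheng".
candidates = [
    Entry("county town", 50.0),
    Entry("ready-made", 45.0),
    Entry("thread", 40.0),
]

before = prior_art_rank(candidates)   # no input history yet
candidates[1].history_count += 1      # the user selects "ready-made" once
after = prior_art_rank(candidates)    # history now promotes "ready-made"
```

Note that nothing in this ranking depends on the word entered before the pinyin string, which is the limitation the rest of the description addresses.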
Although current input technology meets users' needs to some extent, the inventor has found that the entries in a context are related to one another, and that making full use of the context of the current input can better assist the user. Current input methods do not use this contextual relation. For example, "dark knight" is a single phrase, so the entries "dark" and "knight" are related. Yet after the user inputs "dark" and then enters the pinyin string "qi'shi", the first-choice word is "in fact". The dynamic ranking used in the prior art does not fully exploit the current input scenario, that is, the relation between the context of the user's input and the pinyin string being entered, so the first-choice rate of the input method is low. The first-choice word is the first of the candidate words corresponding to the entered pinyin string; it lets the user input Chinese characters quickly and conveniently. The accuracy of the first-choice word is called the first-choice rate, which is an important indicator of input method quality.
Summary of the invention
Embodiments of the invention provide a Chinese character input method and device that can solve the problem of the low first-choice rate in the prior art.
Embodiments of the invention provide a Chinese character input method, comprising:
obtaining a pinyin string;
obtaining the context of the current input, the context comprising the candidate entries corresponding to the pinyin string and the previous word of the current input;
obtaining, for each candidate entry corresponding to the pinyin string, the weight of the entry, its historical input count, and its offset weight, where the offset weight of an entry is the offset weight between that entry and the previous word of the current input, and an offset weight is the weighted sum of the number of times the two entries co-occur in a large body of data and the number of times the user has input the two entries consecutively;
for each entry, calculating the sorting weight of the entry from its weight, its offset weight, and its historical input count;
sorting the entries corresponding to the pinyin string in descending order of sorting weight to obtain the final ranking of the input method's candidate words, and inputting the entry corresponding to the pinyin string according to the ranking.
The preceding word of the current input is the word immediately before the current input.
The sorting weight is calculated with the following formula:
$$\mathrm{Weight}_{\mathrm{paixu}}(W) = \mathrm{Weight}(W) + a \times \mathrm{Time}(W) + b \times \mathrm{Weight}_{\mathrm{pianyi}}(W) \tag{1}$$
where $a$ and $b$ are constants; $W$ denotes the current entry; $\mathrm{Weight}_{\mathrm{paixu}}(W)$ is the sorting weight of entry $W$; $\mathrm{Weight}(W)$ is the weight of entry $W$; $\mathrm{Time}(W)$ is the historical input count of entry $W$; and $\mathrm{Weight}_{\mathrm{pianyi}}(W)$ is the offset weight of entry $W$.
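The two quantities defined above can be sketched as small functions. This is a minimal illustration, not the patented implementation: the function names, the parameter names, and the coefficients `corpus_coeff` and `user_coeff` are assumptions, since the source only says that the offset weight is a weighted sum of the two counts. The counts 200 and 1 are the ones the description itself uses for the pair "on" / "lock"; the remaining numbers are hypothetical.

```python
def offset_weight(corpus_count: int, user_count: int,
                  corpus_coeff: float = 1.0, user_coeff: float = 1.0) -> float:
    """Weighted sum of corpus co-occurrences and consecutive user inputs.

    The coefficient names and their default value of 1 are assumptions.
    """
    return corpus_coeff * corpus_count + user_coeff * user_count


def sorting_weight(weight: float, history_count: int, offset: float,
                   a: float = 1.0, b: float = 1.0) -> float:
    """Formula (1): Weight(W) + a * Time(W) + b * Weight_pianyi(W)."""
    return weight + a * history_count + b * offset


# "on" and "lock": 200 co-occurrences in the corpus, 1 consecutive user input.
pair_offset = offset_weight(corpus_count=200, user_count=1)

# Hypothetical entry weight, history count, and constants a and b.
score = sorting_weight(weight=30.0, history_count=2, offset=pair_offset,
                       a=0.5, b=0.5)
```

Keeping `a` and `b` as parameters makes it easy to tune how strongly input history and context each influence the ranking.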
The present invention also provides a Chinese character input device, comprising a pinyin acquisition module, a context acquisition module, a query module, a computing module, and an input module.
The pinyin acquisition module is configured to obtain a pinyin string.
The context acquisition module is configured to obtain the context of the current input, the context comprising the candidate entries corresponding to the pinyin string and the previous word of the current input.
The query module is configured to obtain, for each candidate entry corresponding to the pinyin string, the weight of the entry, its historical input count, and its offset weight, where the offset weight of an entry is the offset weight between that entry and the previous word of the current input, and an offset weight is the weighted sum of the number of times the two entries co-occur in a large body of data and the number of times the user has input the two entries consecutively.
The computing module is configured to calculate the sorting weight of each entry from its weight, its offset weight, and its historical input count.
The input module is configured to sort the entries corresponding to the pinyin string in descending order of sorting weight to obtain the final ranking of the input method's candidate words, and to input the entry corresponding to the pinyin string according to the ranking.
According to the invention, dynamically adjusting the candidate words based on context improves the first-choice rate of the input method and makes input easier for users. In particular, incorporating the user's input habits into the offset weight further improves the usability of the input method.
Brief description of the drawings
Fig. 1 shows the Chinese character input method of prior art;
Fig. 2 shows the Chinese character input method of the embodiment of the present invention;
Fig. 3 shows the Chinese character input device of the embodiment of the present invention.
Detailed description of the embodiments
To help those of ordinary skill in the art understand and implement the invention, embodiments of the invention are described below with reference to the accompanying drawings.
Embodiment 1
This embodiment provides a Chinese character input method. According to the invention, to improve the first-choice rate of the input method, the candidate words are considered together with the preceding words of the current input (for example, the previous word): the relation between the candidate words corresponding to the entered pinyin string and the preceding words is taken into account, the candidate words are then re-sorted, and the first-choice word is determined from the resulting order. This improves the hit rate of the first-choice word. The Chinese character input method of the invention is described below with reference to Fig. 2.
Step 201: obtain a pinyin string.
Step 202: obtain the context of the current input, which comprises the candidate entries corresponding to the pinyin string and the preceding word of the current input; preferably, the preceding word is the word immediately before the current input.
Step 203: obtain, for each candidate entry corresponding to the pinyin string, the weight of the entry, its historical input count, and its offset weight, where the offset weight of an entry is the offset weight between that entry and the previous word of the current input. These data may be stored in a dictionary or in another form, such as a table. The dictionary comprises the entries corresponding to each pinyin string, the weight of each entry, its historical input count, and its offset weight.
Step 204: for each entry, calculate the sorting weight of the entry from its weight, its offset weight, and its historical input count. The sorting weight is calculated with the following formula:
$$\mathrm{Weight}_{\mathrm{paixu}}(W) = \mathrm{Weight}(W) + a \times \mathrm{Time}(W) + b \times \mathrm{Weight}_{\mathrm{pianyi}}(W) \tag{1}$$
where $a$ and $b$ are constants, $W$ denotes the current entry, $\mathrm{Weight}_{\mathrm{paixu}}(W)$ is the sorting weight of entry $W$, $\mathrm{Weight}(W)$ is the weight of entry $W$, $\mathrm{Time}(W)$ is the historical input count of entry $W$, and $\mathrm{Weight}_{\mathrm{pianyi}}(W)$ is the offset weight of entry $W$. An offset weight is the weighted sum of the number of times two entries co-occur in a large body of data and the number of times the user has input the two entries consecutively. For example, if the entries "on" and "lock" co-occur 200 times in a corpus and the user has input "on" followed by "lock" once, then, with both counts given a coefficient of 1, the offset weight between "on" and "lock" is 200 + 1 = 201.
Step 205: sort the entries corresponding to the pinyin string in descending order of sorting weight to obtain the final ranking of the input method's candidate words, and input the entry corresponding to the pinyin string according to the ranking.
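Steps 201 to 205 can be sketched end to end as follows. This is an illustrative sketch under stated assumptions rather than the patented implementation: the in-memory tables `DICTIONARY` and `OFFSETS`, the function `rank_candidates`, the constants `a = b = 1`, and all numeric values are hypothetical. The example reuses the "dark" / "knight" / "in fact" case from the background section.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dictionary: pinyin string -> (entry, weight, historical count).
DICTIONARY: Dict[str, List[Tuple[str, float, int]]] = {
    "qi'shi": [("in fact", 40.0, 0), ("knight", 35.0, 0)],
}

# Hypothetical offset weights keyed by (previous word, entry).
OFFSETS: Dict[Tuple[str, str], float] = {
    ("dark", "knight"): 12.0,
}


def rank_candidates(pinyin: str, previous_word: Optional[str],
                    a: float = 1.0, b: float = 1.0) -> List[str]:
    """Return the candidates for `pinyin`, best first."""
    scored = []
    # Steps 202-203: look up each candidate and its offset weight with
    # respect to the previous word (0 if the pair is unknown or absent).
    for word, weight, count in DICTIONARY.get(pinyin, []):
        offset = 0.0
        if previous_word is not None:
            offset = OFFSETS.get((previous_word, word), 0.0)
        # Step 204: formula (1).
        scored.append((weight + a * count + b * offset, word))
    # Step 205: descending sort by sorting weight.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored]


without_context = rank_candidates("qi'shi", None)
with_context = rank_candidates("qi'shi", "dark")
```

With no previous word the ranking falls back to dictionary weight and history, so "in fact" stays first; once "dark" is known as context, the offset weight lifts "knight" to first choice.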
Embodiment 2
As shown in Fig. 3, this embodiment discloses a Chinese character input device comprising a pinyin acquisition module, a context acquisition module, a query module, a computing module, and an input module. The pinyin acquisition module is configured to obtain a pinyin string. The context acquisition module is configured to obtain the context of the current input, the context comprising the candidate words corresponding to the pinyin string and the previous word of the current input. The query module is configured to obtain the weight of each candidate word corresponding to the pinyin string, its historical input count, and its offset weight. The computing module is configured to obtain the sorting weight of each entry from its weight, its historical input count, and its offset weight. The input module is configured to sort the entries in descending order of sorting weight to obtain the final ranking of the input method's candidate words, and to input the entry corresponding to the pinyin string according to the ranking.
For the working principle of each module of this embodiment, refer to the description of Embodiment 1.
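One way to picture the five-module decomposition is as a small class whose fields and methods map onto the modules. The sketch below is only an assumption about how such a device could be organised in software: the names `Context`, `InputDevice`, `get_pinyin`, `get_context`, `query`, `compute`, and `run`, together with the stub data for the pinyin string "suo", are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Context:
    candidates: List[str]          # candidate words for the pinyin string
    previous_word: Optional[str]   # previous word of the current input


@dataclass
class InputDevice:
    get_pinyin: Callable[[], str]                 # pinyin acquisition module
    get_context: Callable[[str], Context]         # context acquisition module
    # Query module: (entry, previous word) -> (weight, history count, offset).
    query: Callable[[str, Optional[str]], Tuple[float, int, float]]
    a: float = 1.0
    b: float = 1.0

    def compute(self, weight: float, count: int, offset: float) -> float:
        """Computing module: formula (1)."""
        return weight + self.a * count + self.b * offset

    def run(self) -> List[str]:
        """Input module: rank the candidates in descending sorting weight."""
        pinyin = self.get_pinyin()
        ctx = self.get_context(pinyin)
        scored = [(self.compute(*self.query(w, ctx.previous_word)), w)
                  for w in ctx.candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [w for _, w in scored]


# Hypothetical stub data wired into the modules.
WEIGHTS = {"place": 33.0, "lock": 30.0}
OFFSETS = {("on", "lock"): 201.0}

device = InputDevice(
    get_pinyin=lambda: "suo",
    get_context=lambda pinyin: Context(["place", "lock"], "on"),
    query=lambda w, prev: (WEIGHTS[w], 0, OFFSETS.get((prev, w), 0.0)),
)
ranking = device.run()
```

Passing the acquisition and query modules in as callables keeps the ranking logic independent of where the dictionary and the offset table are actually stored.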
Embodiment 3
This embodiment describes the Chinese character input method of the invention through a concrete example.
The entries "visit a grave" and "burn paper as sacrificial offerings" often occur together. The entries in the dictionary corresponding to the pinyin string "shao'zhi" are "fire", "burn paper as sacrificial offerings", "burn to", "burn it", and "slightly know". Their weights are:
Entry                                  Weight
Fire                                   39
Burn paper as sacrificial offerings    38
Burn to                                36
Burn it                                34
Slightly know                          32
According to formula (1), when the user has no input history and the input context contains no previous entry, the sorting weight of each entry is simply its weight in the dictionary.
After the user has input the entry "visit a grave" and then enters the pinyin string "shao'zhi", only the entry "burn paper as sacrificial offerings" among the entries corresponding to "shao'zhi" is related to "visit a grave". A lookup shows that the offset weight of "burn paper as sacrificial offerings" in the context of "visit a grave" is 6, and the offset weight of every other entry is 0. Formula (1) then raises the sorting weight of this entry to 38 + 6 = 44 (a value that corresponds to b = 1) and leaves the others unchanged, as shown in the table below:
Entry                                  Sorting weight
Burn paper as sacrificial offerings    44
Fire                                   39
Burn to                                36
Burn it                                34
Slightly know                          32
According to the table above, the candidate order for the pinyin string "shao'zhi" is: burn paper as sacrificial offerings, fire, burn to, burn it, slightly know.
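The worked example can be checked with a few lines of code. The entry weights and the offset weight of 6 are taken from the tables in this embodiment; the table names `WEIGHTS` and `OFFSETS`, the function `rank`, and the choice `a = b = 1` with a historical input count of 0 are assumptions, made because they reproduce the tabulated value 44.

```python
from typing import List, Optional, Tuple

# Dictionary weights for the pinyin string "shao'zhi" (from the first table).
WEIGHTS = {
    "fire": 39,
    "burn paper as sacrificial offerings": 38,
    "burn to": 36,
    "burn it": 34,
    "slightly know": 32,
}

# Offset weights keyed by (previous word, entry); all other pairs are 0.
OFFSETS = {("visit a grave", "burn paper as sacrificial offerings"): 6}


def rank(previous_word: Optional[str] = None,
         a: int = 1, b: int = 1) -> List[Tuple[str, int]]:
    """Apply formula (1) with a historical input count of 0 for every entry."""
    history_count = 0
    scores = {
        word: weight + a * history_count
              + b * OFFSETS.get((previous_word, word), 0)
        for word, weight in WEIGHTS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


no_context = rank()                    # sorting weight equals dictionary weight
after_grave = rank("visit a grave")    # offset weight 6 is added to one entry
```

Here `no_context` reproduces the first table and `after_grave` the second, with "burn paper as sacrificial offerings" moving from second place (38) to first place (44).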
According to the invention, dynamically adjusting the candidate words based on context improves the first-choice rate of the input method and makes input easier for users. In particular, incorporating the user's input habits into the offset weight further improves the usability of the input method.
Although the invention has been described by way of embodiments, those of ordinary skill in the art will appreciate that many variations and modifications are possible without departing from the spirit and substance of the invention, and that the scope of the invention is defined by the appended claims.

Claims (2)

1. A Chinese character input method, characterized by comprising:
obtaining a pinyin string;
obtaining the context of the current input, the context comprising the candidate entries corresponding to the pinyin string and the previous word of the current input;
obtaining, for each candidate entry corresponding to the pinyin string, the weight of the entry, its historical input count, and its offset weight, where the offset weight of an entry is the offset weight between that entry and the previous word of the current input, and an offset weight is the weighted sum of the number of times the two entries co-occur in a large body of data and the number of times the user has input the two entries consecutively;
for each entry, calculating the sorting weight of the entry from its weight, its offset weight, and its historical input count, using the following formula:
$$\mathrm{Weight}_{\mathrm{paixu}}(W) = \mathrm{Weight}(W) + a \times \mathrm{Time}(W) + b \times \mathrm{Weight}_{\mathrm{pianyi}}(W) \tag{1}$$
where $a$ and $b$ are constants, $W$ denotes the current entry, $\mathrm{Weight}_{\mathrm{paixu}}(W)$ is the sorting weight of entry $W$, $\mathrm{Weight}(W)$ is the weight of entry $W$, $\mathrm{Time}(W)$ is the historical input count of entry $W$, and $\mathrm{Weight}_{\mathrm{pianyi}}(W)$ is the offset weight of entry $W$;
sorting the candidate words corresponding to the pinyin string in descending order of sorting weight to obtain the final ranking of the input method's candidate words, and inputting the entry corresponding to the pinyin string according to the ranking.
2. A Chinese character input device, characterized by comprising a pinyin acquisition module, a context acquisition module, a query module, a computing module, and an input module, wherein:
the pinyin acquisition module is configured to obtain a pinyin string;
the context acquisition module is configured to obtain the context of the current input, the context comprising the candidate words corresponding to the pinyin string and the previous word of the current input;
the query module is configured to obtain, for each candidate word corresponding to the pinyin string, the weight of the entry, its historical input count, and its offset weight, where the offset weight of an entry is the offset weight between that entry and the previous word of the current input, and an offset weight is the weighted sum of the number of times the two entries co-occur in a large body of data and the number of times the user has input the two entries consecutively;
the computing module is configured to calculate the sorting weight of each entry from its weight, its offset weight, and its historical input count, using the following formula:
$$\mathrm{Weight}_{\mathrm{paixu}}(W) = \mathrm{Weight}(W) + a \times \mathrm{Time}(W) + b \times \mathrm{Weight}_{\mathrm{pianyi}}(W) \tag{1}$$
where $a$ and $b$ are constants, $W$ denotes the current entry, $\mathrm{Weight}_{\mathrm{paixu}}(W)$ is the sorting weight of entry $W$, $\mathrm{Weight}(W)$ is the weight of entry $W$, $\mathrm{Time}(W)$ is the historical input count of entry $W$, and $\mathrm{Weight}_{\mathrm{pianyi}}(W)$ is the offset weight of entry $W$;
the input module is configured to sort the entries corresponding to the pinyin string in descending order of sorting weight to obtain the final ranking of the input method's candidate words, and to input the entry corresponding to the pinyin string according to the ranking.
CN201110020045.1A 2011-01-13 2011-01-13 Method and device for inputting Chinese characters Active CN102591472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110020045.1A CN102591472B (en) 2011-01-13 2011-01-13 Method and device for inputting Chinese characters

Publications (2)

Publication Number Publication Date
CN102591472A CN102591472A (en) 2012-07-18
CN102591472B true CN102591472B (en) 2014-06-18

Family

ID=46480268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110020045.1A Active CN102591472B (en) 2011-01-13 2011-01-13 Method and device for inputting Chinese characters

Country Status (1)

Country Link
CN (1) CN102591472B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699766A (en) * 2015-02-15 2015-06-10 浙江理工大学 Implicit attribute mining method integrating word correlation and context deduction

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104823135B (en) * 2012-08-31 2018-01-30 微软技术许可有限责任公司 Personal language model for Input Method Editor
CN103870001B (en) * 2012-12-11 2018-07-10 百度国际科技(深圳)有限公司 A kind of method and electronic device for generating candidates of input method
US9665246B2 (en) * 2013-04-16 2017-05-30 Google Inc. Consistent text suggestion output
US8825474B1 (en) * 2013-04-16 2014-09-02 Google Inc. Text suggestion output using past interaction data
CN104376324A (en) * 2013-08-12 2015-02-25 索尼公司 State detection method and device based on signal processing
CN105653061B (en) * 2015-12-29 2020-03-31 北京京东尚科信息技术有限公司 Entry retrieval and wrong word detection method and system for pinyin input method
CN106557178B (en) * 2016-11-29 2021-03-09 百度国际科技(深圳)有限公司 Method and device for updating entries of input method
CN106896935A (en) * 2017-02-22 2017-06-27 李晓明 Input method
CN106873801A (en) * 2017-02-28 2017-06-20 百度在线网络技术(北京)有限公司 Method and apparatus for generating the combination of the entry in input method dictionary
CN106909232A (en) * 2017-02-28 2017-06-30 百度在线网络技术(北京)有限公司 Method and apparatus for showing candidate entry
CN107273537A (en) * 2017-06-30 2017-10-20 深圳创维数字技术有限公司 One kind search words recommending method, set top box and storage medium
CN109032375B (en) * 2018-06-29 2022-07-19 北京百度网讯科技有限公司 Candidate text sorting method, device, equipment and storage medium
CN110333787A (en) * 2019-04-28 2019-10-15 华为技术有限公司 The method and apparatus for inputting character
CN111309158A (en) * 2020-01-19 2020-06-19 李彦志 Voice-form dual-mode Chinese input method, system, equipment and computer readable storage medium
CN111984132B (en) * 2020-07-07 2021-07-27 北京语言大学 Method and system for inputting information according to context environment
CN113360004A (en) * 2021-07-01 2021-09-07 北京华宇信息技术有限公司 Input method candidate word recommendation method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2202612B1 (en) * 2005-10-14 2018-12-19 BlackBerry Limited Automatic language selection for improving text accuracy
CN101324806B (en) * 2007-06-14 2010-06-23 台达电子工业股份有限公司 Input system and method for mobile search
CN101290632B (en) * 2008-05-30 2011-09-14 北京搜狗科技发展有限公司 Input method for user words participating in intelligent word-making and input method system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699766A (en) * 2015-02-15 2015-06-10 浙江理工大学 Implicit attribute mining method integrating word correlation and context deduction
CN104699766B (en) * 2015-02-15 2018-01-02 浙江理工大学 A kind of implicit attribute method for digging for merging word association relation and context of co-text deduction

Also Published As

Publication number Publication date
CN102591472A (en) 2012-07-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230414

Address after: Room 501-502, 5/F, Sina Headquarters Scientific Research Building, Block N-1 and N-2, Zhongguancun Software Park, Dongbei Wangxi Road, Haidian District, Beijing, 100193

Patentee after: Sina Technology (China) Co.,Ltd.

Address before: 100080, International Building, No. 58 West Fourth Ring Road, Haidian District, Beijing, 20 floor

Patentee before: Sina.com Technology (China) Co.,Ltd.
