Summary of the invention
The embodiments of the present application provide a search suggestion method and system, to solve the problem in the prior art that search suggestions and spelling-error suggestions are inaccurate or offer too few options.
Embodiment one of the present application provides a method for building a suggestion dictionary, which specifically comprises:
generating an original double-array dictionary from an original dictionary by means of a double-array algorithm;
generating a search suggestion dictionary based on the original double-array dictionary; and/or
generating a spelling-error suggestion dictionary based on the original double-array dictionary.
In the method provided by embodiment one, generating the search suggestion dictionary based on the original double-array dictionary specifically comprises:
generating an unsorted search suggestion dictionary based on the original double-array dictionary;
scoring the candidate suggestion words in the unsorted search suggestion dictionary according to a first scoring standard, to obtain a score for each candidate suggestion word;
sorting the candidate suggestion words by score from high to low, to obtain the search suggestion dictionary.
Sorting the candidate suggestion words by score from high to low to obtain the search suggestion dictionary specifically comprises:
sorting the candidate suggestion words by score from high to low;
selecting, from the candidate suggestion words, those whose score is greater than a preset score;
using the candidate suggestion words whose score is greater than the preset score as the search suggestion dictionary.
In the method of embodiment one, generating the spelling-error suggestion dictionary based on the original double-array dictionary specifically comprises:
generating, based on the original double-array dictionary, an unsorted spelling-error suggestion dictionary parallel to the original double-array dictionary;
scoring the candidate suggestion words in the unsorted spelling-error suggestion dictionary according to a second scoring standard, to obtain a score for each candidate suggestion word;
sorting the candidate suggestion words by score from high to low, to obtain the spelling-error suggestion dictionary.
Sorting the candidate suggestion words by score from high to low to obtain the spelling-error suggestion dictionary specifically comprises:
sorting the candidate suggestion words by score from high to low;
selecting, from the candidate suggestion words, those whose score is greater than a preset score;
using the candidate suggestion words whose score is greater than the preset score as the spelling-error suggestion dictionary.
After the search suggestion dictionary is generated based on the original double-array dictionary, the method further comprises:
updating the search suggestion dictionary based on search information entered by the user, to obtain an updated search suggestion dictionary.
After the spelling-error suggestion dictionary is generated based on the original double-array dictionary, the method further comprises:
updating the spelling-error suggestion dictionary based on search information entered by the user, to obtain an updated spelling-error suggestion dictionary.
Embodiment two of the present application provides a search suggestion method, which specifically comprises:
obtaining search information entered by a user;
judging whether the search information is valid;
when the search information is valid, returning, based on the search suggestion dictionary generated by the method of embodiment one, the search suggestion words corresponding to the search information;
when the search information is invalid, returning, based on the spelling-error suggestion dictionary generated by the method of embodiment one, at least two spelling-error suggestions corresponding to the search information.
Embodiment three of the present application provides a search suggestion system, which specifically comprises:
an obtaining unit, configured to obtain search information entered by a user;
a judging unit, configured to judge whether the search information is valid;
a search suggestion word returning unit, configured to return, when the search information is valid, the search suggestion words corresponding to the search information, based on the search suggestion dictionary generated by the method of embodiment one;
an error suggestion returning unit, configured to return, when the search information is invalid, at least two spelling-error suggestions corresponding to the search information, based on the spelling-error suggestion dictionary generated by the method of embodiment one.
The technical solutions provided by one or more of the above embodiments of the present application have the following beneficial effects or advantages:
because the suggestion dictionaries are built on a double-array structure, search suggestion lookups are highly efficient;
the search suggestion words and spelling-error suggestions are sorted, and at least two search suggestion words or at least two spelling-error suggestions can be displayed, which makes it easy for the user to choose;
the system records the search words entered by users and their frequencies, automatically updates the back-end data at regular intervals, and re-sorts the search suggestion words and spelling-error suggestions, so that the suggestions presented correspond more closely to the user's search information.
Detailed description of the embodiments
The embodiments of the present application provide a search suggestion method and system, to solve the problem in the prior art that search suggestions and spelling-error suggestions are inaccurate or offer too few options.
The above technical solutions are described in detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1
As shown in Figure 1, embodiment one of the present application provides a method for building a suggestion dictionary, which specifically comprises:
Step 101: generating an original double-array dictionary from an original dictionary by means of a double-array algorithm;
Step 102: generating a search suggestion dictionary based on the original double-array dictionary; and/or
Step 103: generating a spelling-error suggestion dictionary based on the original double-array dictionary.
In step 101, the original double-array dictionary is generated from the original dictionary by the double-array algorithm. The construction process is as follows:
First, the whole dictionary is traversed and all words are converted into a Trie (dictionary tree) whose unit is the byte.
The double array is then built, mainly in the following steps:
1. Initialize the active node list and add the first-layer nodes to it. Initialize every element of the base[] and check[] arrays to 0. base[i] = check[i] = 0 indicates that position i is empty; a negative base[i] indicates a state at which a word can end; in particular, base[i] < 0 with abs(base[i]) = i indicates that the node is a leaf node.
2. If the active list is not empty, choose as the current node the node with the largest number of direct child nodes in the Trie; otherwise, construction is finished.
3. Visit this node, determine its base value and the index positions of its direct child nodes, and set the value of each direct child node in the check array to the index of the current node.
4. Add the direct child nodes of the current node to the active list and repeat step 2.
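The construction steps above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function names, the choice of state 1 as the root, the byte code offset (byte value + 1), and the first-fit search for a base value are all assumptions, and the active list here starts from the root rather than from the first-layer nodes. The negative-base convention for word-ending states and leaves follows step 1.

```python
def build_trie(words):
    """Byte-level trie as nested dicts; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for byte in word.encode("utf-8"):
            node = node.setdefault(byte, {})
        node[None] = True
    return root


def children(node):
    """Direct child nodes as (byte, subtree) pairs."""
    return [(b, child) for b, child in node.items() if b is not None]


def build_double_array(words):
    """Return (base, check). Index 0 is unused; the root is state 1 (assumed)."""
    root = build_trie(words)
    base, check = [0, 0], [0, 1]  # check[1] = 1 marks the root as occupied

    def ensure(size):
        while len(base) <= size:
            base.append(0)
            check.append(0)

    active = [(1, root)]
    while active:
        # Step 2: take the active node with the most direct children.
        active.sort(key=lambda item: len(children(item[1])))
        state, node = active.pop()
        kids = children(node)
        if not kids:
            base[state] = -state  # leaf: base < 0 and abs(base) == index
            continue
        # Step 3: first-fit search for a base value whose child slots are free.
        b = 1
        while True:
            ensure(b + 256)
            if all(check[b + byte + 1] == 0 for byte, _ in kids):
                break
            b += 1
        base[state] = -b if None in node else b  # negative: a word can end here
        for byte, child in kids:
            target = b + byte + 1
            check[target] = state  # the child records its parent's index
            active.append((target, child))  # Step 4
    return base, check


def contains(base, check, word):
    """Exact-match lookup by following base/check transitions."""
    state = 1
    for byte in word.encode("utf-8"):
        target = abs(base[state]) + byte + 1
        if target >= len(check) or check[target] != state:
            return False
        state = target
    return base[state] < 0
```

Each lookup costs one array access per input byte, which is the property the later efficiency claim relies on.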
As shown in Figure 2, in the method provided by embodiment one, step 102, generating the search suggestion dictionary based on the original double-array dictionary, specifically comprises:
Step s102a: generating an unsorted search suggestion dictionary based on the original double-array dictionary. Complete suggestion words are provided according to the word prefix entered by the user. Because many words share the same prefix, the words sharing a prefix must be sorted from high to low according to the scoring standard, so an unsorted search suggestion dictionary is generated first.
Step s102b: scoring the candidate suggestion words in the unsorted search suggestion dictionary according to the first scoring standard, to obtain a score for each candidate suggestion word.
The first scoring standard comprises the following factors:
1. the frequency nCorpus of the word in the corpus, with weight CorpusWeight;
2. the frequency nHour with which the word was entered by users within one hour, with weight HourWeight;
3. the frequency nDay with which the word was entered by users within one day, with weight DayWeight;
4. the frequency nMonth with which the word was entered by users within one month, with weight MonthWeight.
The scoring formula is:
Score1 = CorpusWeight * nCorpus + HourWeight * nHour + DayWeight * nDay + MonthWeight * nMonth.
Step s102c: sorting the candidate suggestion words by score from high to low, to obtain the search suggestion dictionary.
Step s102c specifically comprises:
sorting the candidate suggestion words by score from high to low;
selecting, from the candidate suggestion words, those whose score is greater than a preset score;
using the candidate suggestion words whose score is greater than the preset score as the search suggestion dictionary.
After the scores of the candidate suggestion words are obtained, and supposing the preset score is X, the candidate suggestion words whose score is greater than X are extracted and used as the suggestion words in the search suggestion dictionary.
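Steps s102a to s102c can be illustrated with the following sketch. The weight values, the WordStats container, and the function names are hypothetical; the text above does not specify concrete weights or the preset score X. A plain dictionary scan stands in for the double-array prefix lookup.

```python
from dataclasses import dataclass


@dataclass
class WordStats:
    n_corpus: int  # frequency of the word in the corpus
    n_hour: int    # times entered by users within one hour
    n_day: int     # times entered by users within one day
    n_month: int   # times entered by users within one month


# Hypothetical weights; the source names them but gives no values.
WEIGHTS = {"corpus": 1.0, "hour": 8.0, "day": 4.0, "month": 2.0}


def score(stats, w=WEIGHTS):
    """Score1 = CorpusWeight*nCorpus + HourWeight*nHour + DayWeight*nDay + MonthWeight*nMonth."""
    return (w["corpus"] * stats.n_corpus + w["hour"] * stats.n_hour
            + w["day"] * stats.n_day + w["month"] * stats.n_month)


def build_suggestions(prefix, stats_by_word, threshold):
    """Collect the words sharing a prefix, rank them by score, keep those above the threshold X."""
    candidates = [(score(stats), word)
                  for word, stats in stats_by_word.items()
                  if word.startswith(prefix)]
    candidates.sort(key=lambda c: c[0], reverse=True)  # high to low
    return [word for s, word in candidates if s > threshold]
```

For example, with the weights above a word with nCorpus = 100, nHour = 2, nDay = 10, and nMonth = 50 scores 100 + 16 + 40 + 100 = 256.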
As shown in Figure 3, in the method of embodiment one, step 103 generates the spelling-error suggestion dictionary based on the original double-array dictionary. The spelling-error suggestion dictionary works on the same principle as the search suggestion dictionary: it is built on the double-array dictionary, and its entries map pinyin to suggestion words. A Chinese character-to-pronunciation ID dictionary is also generated from the pronunciation ID corresponding to the internal code of each Chinese character. It is implemented by allocating an integer array of length Length = 65536 (1 << 16), in which the index is the internal code of a Chinese character and the element value is its pronunciation ID. The method of generating the spelling-error suggestion dictionary specifically comprises:
Step s103a: generating, based on the original double-array dictionary, an unsorted spelling-error suggestion dictionary parallel to the original double-array dictionary.
Step s103b: scoring the candidate suggestion words in the unsorted spelling-error suggestion dictionary according to the second scoring standard, to obtain a score for each candidate suggestion word. The second scoring standard is identical to the first scoring standard; the score of each word comprises the following factors:
1. the frequency nCorpus of the word in the corpus, with weight CorpusWeight;
2. the frequency nHour with which the word was entered by users within one hour, with weight HourWeight;
3. the frequency nDay with which the word was entered by users within one day, with weight DayWeight;
4. the frequency nMonth with which the word was entered by users within one month, with weight MonthWeight.
The scoring formula is:
Score2 = CorpusWeight * nCorpus + HourWeight * nHour + DayWeight * nDay + MonthWeight * nMonth.
Step s103c: sorting the candidate suggestion words by score from high to low, to obtain the spelling-error suggestion dictionary.
The pronunciation ID string in the spelling-error suggestion dictionary represents a pronunciation as a sequence of IDs. The benefit is that lookups are unambiguous. For example, if the user enters "Xian", the pinyin alone does not determine the corresponding Chinese characters: it could be a single character pronounced "xian" (such as the characters meaning "first", "line", or "existing"), or it could be the two characters of "Xi'an". To avoid this kind of problem, the pinyin of each single character is converted to a corresponding ID, for example "xi=1, an=2, xian=3". After the pinyin of every word has been converted to a pronunciation ID string, the double array is built, and a third array parallel to it, the spelling-error suggestion dictionary, is generated at the same time to store the spelling-error suggestion information.
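The character-to-pronunciation ID array and the ID strings described above can be sketched as follows. The sketch assumes that the internal code of a character is its Unicode code point in the Basic Multilingual Plane (the source does not say which encoding is meant), uses only the three example IDs given in the text, and ignores characters with more than one reading. All names are hypothetical.

```python
# Example IDs taken from the text: xi=1, an=2, xian=3.
SYLLABLE_IDS = {"xi": 1, "an": 2, "xian": 3}

LENGTH = 1 << 16                 # 65536 entries, indexed by internal code
char_to_pron_id = [0] * LENGTH   # 0 means "no pronunciation recorded"

# A tiny hand-written sample; a real table would cover every character.
SAMPLE_READINGS = {"西": "xi", "安": "an", "先": "xian", "线": "xian", "现": "xian"}
for ch, syllable in SAMPLE_READINGS.items():
    # Assumption: internal code == Unicode code point, which is below 65536 here.
    char_to_pron_id[ord(ch)] = SYLLABLE_IDS[syllable]


def pronunciation_ids(word):
    """Convert a word to its pronunciation ID string (one ID per character)."""
    return tuple(char_to_pron_id[ord(ch)] for ch in word)
```

With IDs, the two-character word "西安" becomes (1, 2) while the single character "先" becomes (3,), so the two readings of the letter sequence "xian" no longer collide.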
Sorting the candidate suggestion words by score from high to low to obtain the spelling-error suggestion dictionary specifically comprises:
sorting the candidate suggestion words by score from high to low;
selecting, from the candidate suggestion words, those whose score is greater than a preset score;
using the candidate suggestion words whose score is greater than the preset score as the spelling-error suggestion dictionary.
If the user enters the misspelled query "百渡" (literally "hundred cross", pronounced "bai du"), and after scoring and sorting the suggestion words from high to low are 百度 (Baidu), 摆渡 (ferry) and 拜读 (to have the honour to read), the system presents the suggestions 1. 百度; 2. 摆渡; 3. 拜读 for the user to choose from. This prevents a query from failing because of the user's input error.
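The example above can be sketched as a lookup keyed by pronunciation. In this illustration, tuples of pinyin syllables stand in for the pronunciation ID string, a plain Python dict stands in for the double array and its parallel array, and the scores and the threshold are made-up values; none of these names come from the source.

```python
from collections import defaultdict


def build_spelling_dict(entries, threshold):
    """entries: iterable of (word, pinyin_syllables, score).

    Groups words that share a pronunciation, drops those at or below the
    preset score, and orders each group by score from high to low.
    """
    groups = defaultdict(list)
    for word, syllables, word_score in entries:
        if word_score > threshold:
            groups[tuple(syllables)].append((word_score, word))
    return {
        key: [word for _, word in sorted(items, key=lambda it: it[0], reverse=True)]
        for key, items in groups.items()
    }


# Made-up scores for illustration only.
ENTRIES = [
    ("摆渡", ("bai", "du"), 60.0),
    ("百度", ("bai", "du"), 95.0),
    ("拜读", ("bai", "du"), 30.0),
    ("白兔", ("bai", "tu"), 5.0),
]
spelling_dict = build_spelling_dict(ENTRIES, threshold=10.0)

# The mistyped query "百渡" is pronounced "bai du", so it maps to this key.
suggestions = spelling_dict.get(("bai", "du"), [])
```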
After the search suggestion dictionary is generated based on the original double-array dictionary, the method further comprises:
updating the search suggestion dictionary based on search information entered by the user, to obtain an updated search suggestion dictionary.
After the spelling-error suggestion dictionary is generated based on the original double-array dictionary, the method further comprises:
updating the spelling-error suggestion dictionary based on search information entered by the user, to obtain an updated spelling-error suggestion dictionary.
The information entered by users is recorded in memory and written to a file every ten minutes. The record includes the frequency with which each word was entered within one hour, within one day, within one month, and so on. Every hour the system automatically updates all lexicon files related to search suggestions, regenerates the search suggestion dictionary, and replaces the former in-memory data area. Likewise, every hour the system automatically updates all lexicon files related to spelling errors, regenerates the spelling-error suggestion dictionary, and replaces the former in-memory data area.
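The frequency bookkeeping described above might look like the following sketch. The InputLog class is hypothetical, a month is assumed to be 30 days, and the ten-minute file flush and hourly rebuild are left out; only the in-memory counting of nHour, nDay, and nMonth is shown.

```python
import time

HOUR = 3600
DAY = 24 * HOUR
MONTH = 30 * DAY  # assumption: the source does not define the length of a month


class InputLog:
    """In-memory record of when each search word was entered."""

    def __init__(self):
        self.events = {}  # word -> list of timestamps in seconds

    def record(self, word, now=None):
        stamp = time.time() if now is None else now
        self.events.setdefault(word, []).append(stamp)

    def counts(self, word, now=None):
        """Return (nHour, nDay, nMonth) for the word at time `now`."""
        now = time.time() if now is None else now
        stamps = self.events.get(word, [])
        return tuple(sum(1 for t in stamps if now - t <= window)
                     for window in (HOUR, DAY, MONTH))
```

A periodic job could then feed these counts into the scoring formula and rebuild both dictionaries before swapping them into the in-memory data area.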
Embodiment 2
As shown in Figure 4, embodiment two of the present application provides a search suggestion method, which specifically comprises:
Step 201: obtaining search information entered by a user;
Step 202: judging whether the search information is valid;
Step 203: when the search information is valid, returning, based on the generated search suggestion dictionary, the search suggestion words corresponding to the search information;
Step 204: when the search information is invalid, returning, based on the generated spelling-error suggestion dictionary, at least two spelling-error suggestions corresponding to the search information.
The methods of generating the search suggestion dictionary and the spelling-error suggestion dictionary were described in embodiment 1 and are not repeated here.
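Steps 201 to 204 can be sketched as a single dispatch function. The source does not define how validity is judged, so the is_valid and pron_key callables here are hypothetical parameters, and plain dicts stand in for the two generated dictionaries. The sketch also does not enforce the "at least two" requirement of step 204.

```python
def suggest(query, search_dict, spelling_dict, is_valid, pron_key):
    """Return ranked suggestions for a query.

    search_dict:   prefix -> ranked list of search suggestion words
    spelling_dict: pronunciation key -> ranked list of spelling-error suggestions
    is_valid:      callable deciding whether the query is valid (assumed)
    pron_key:      callable mapping a query to its pronunciation key (assumed)
    """
    if is_valid(query):                                # step 202
        return search_dict.get(query, [])              # step 203
    return spelling_dict.get(pron_key(query), [])      # step 204


# Illustrative data only.
SEARCH_DICT = {"bai": ["baidu", "baike"]}
SPELLING_DICT = {("bai", "du"): ["百度", "摆渡", "拜读"]}
PRONUNCIATIONS = {"百渡": ("bai", "du")}


def demo(query):
    return suggest(
        query,
        SEARCH_DICT,
        SPELLING_DICT,
        is_valid=lambda q: q in SEARCH_DICT,
        pron_key=lambda q: PRONUNCIATIONS.get(q, ()),
    )
```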
Embodiment 3
As shown in Figure 5, embodiment three of the present application provides a search suggestion system, comprising:
an obtaining unit 301, configured to obtain search information entered by a user;
a judging unit 302, configured to judge whether the search information is valid;
a search suggestion word returning unit 303, configured to return, when the search information is valid, the search suggestion words corresponding to the search information, based on the generated search suggestion dictionary;
an error suggestion returning unit 304, configured to return, when the search information is invalid, at least two spelling-error suggestions corresponding to the search information, based on the generated spelling-error suggestion dictionary.
Likewise, the methods of generating the search suggestion dictionary and the spelling-error suggestion dictionary were described in embodiment 1 and are not repeated here.
The technical solutions provided by one or more of the above embodiments of the present application have the following beneficial effects or advantages:
because the suggestion dictionaries are built on a double-array structure, query efficiency is improved when search suggestions are looked up;
the search suggestion words and spelling-error suggestions are sorted, and at least two search suggestion words or at least two spelling-error suggestions can be displayed, which makes it easy for the user to choose;
the system records the search words entered by users and their frequencies, automatically updates the back-end data at regular intervals, and re-sorts the search suggestion words and spelling-error suggestions, so that the suggestions presented correspond more closely to the user's search information.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.