Detailed description
The following specific embodiments are sufficient to enable any person skilled in the art to understand the technical content of one or more embodiments of this specification and to implement it accordingly. From the description, claims and accompanying drawings disclosed in this specification, a person skilled in the art can readily understand the related objects and advantages of one or more embodiments of this specification.
Application context
In many application scenarios, error correction needs to be performed on text entered by a user.
For example, when typing English, a user may enter "long" as "lomg", possibly because the characters "n" and "m" are close together on the keyboard.
As another example, when typing Chinese, a user may enter "the Mi month passes" as "meter Yue Chuan".
The examples above arise when typing English or when using a pinyin (phonetic) input method. Errors can also occur with a handwriting input method or another shape-based input method (such as the five-stroke input method).
For example, a user who wants to enter "white wine" may mistakenly enter "white to spill". As another example, a user who wants to enter "service charge" may mistakenly enter "reading takes".
For a possible input error, there are usually two opportunities for correction.
The first opportunity is during input.
For example, an input method can correct errors automatically while text is being entered. Some current English input methods provide automatic correction for English input, and some pinyin input methods can likewise correct erroneous Chinese input automatically.
The second opportunity is after input.
For example, some websites or applications provide a search service in which the user can enter a keyword (for example, "patent") or even a natural-language sentence (for example, "what is a patent") in a search box. Some websites or applications provide a search service within a specific domain. For example, some allow the user to search for medical knowledge by entering in the search box a keyword (such as "tenesmus") or a sentence (for example, "what to do about a high fever") that names a disease or symptom. In these cases the user's input may contain errors, such as entering "tenesmus" as "inner urgency is thick and heavy".
As another example, some websites or applications provide an intelligent assistant or an intelligent customer service. The user can type the question or feedback into the input box, and the website or application provides an answer automatically. For example, the user can enter a keyword (for example, "express delivery") or a sentence (for example, "express delivery not received"). Such a service can likewise be aimed at a specific domain. For a service focused on the financial industry, for example, the user can enter a keyword (for example, "trade and investment promotion fund") or a sentence (for example, "how to buy trade and investment promotion fund"). Here too the user's input may contain errors, such as entering "trade and investment promotion fund" as "trade and investment promotion base is complete".
In these cases, providing the correct service may require correcting the entered text after the user has entered it.
Correcting entered text, whether during or after input, requires an error correction algorithm. Typical error correction algorithms are described below.
One common error correction algorithm for Chinese input corrects errors based on identical or similar pronunciation. Such an algorithm usually compares the word entered by the user with the words in a text library and replaces the entered word with a word from the library. For example, "meter Yue Chuan" can be corrected to "the Mi month passes" because the latter appears in a conventional dictionary.
One common error correction algorithm for English input is based on edit distance. For example, because the letters "o" and "p" are close together on the keyboard, "word" is easily mistyped as "wprd". Since the edit distance between the two is 1 and "word" is in a conventional dictionary, "wprd" can be corrected to "word" according to the edit distance. For Chinese, the text can first be converted to pinyin and then corrected according to the edit distance between the pinyin codes. The drawback of this method is that it can only handle errors between similar pronunciations, such as confusion of front and back nasal sounds.
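The edit-distance correction described above can be illustrated with a minimal sketch. It assumes the standard Levenshtein distance (insertions, deletions and substitutions each cost 1) and a non-empty word list; the function names `edit_distance` and `correct_by_edit_distance` and the `max_distance` parameter are illustrative and do not come from this specification.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))          # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def correct_by_edit_distance(word, dictionary, max_distance=1):
    """Return the closest dictionary word within max_distance, else the input."""
    best = min(dictionary, key=lambda w: edit_distance(word, w))
    return best if edit_distance(word, best) <= max_distance else word


print(edit_distance("wprd", "word"))                        # 1
print(correct_by_edit_distance("wprd", ["word", "long"]))   # word
```

Ties between equally close dictionary words are resolved here by list order, which is only one possible policy.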
Another, newer kind of error correction algorithm is based on big data or a machine learning model and often relies on a language model. By processing massive amounts of data, this method discovers regularities in human language usage and uses them to correct errors. For example, based on language habits learned from big data, "type of motor car is on road" can be corrected to "running car is on road". However, such an algorithm depends on massive data, and the model is relatively complex.
The scheme introduced in the embodiments of this specification is suitable for correcting text after input and can also be applied in an input method. It should be understood that, when applied in an input method, the scheme likewise corrects the entered text after its input has been completed, rather than during input.
Obtaining the target dictionary
In the embodiments of this specification, a target dictionary usually needs to be built or obtained in some other way.
Preferably, the target dictionary consists of words associated with a specific scenario. The specific scenario may include a specific business domain (such as financial business or software business) or a specific application (such as a payment application). In the financial industry, for example, words such as fund names, fund company names, stock names, film names and financial terms may make up the target dictionary. If the input of such words contains an error, the user's intent may not be recognized accurately. Building a proprietary dictionary of words specific to a business, domain or technical scenario, and using only that dictionary in the subsequent comparison, reduces the amount of data required, improves processing efficiency and improves accuracy.
In some embodiments, the target dictionary is a general dictionary that includes all common words.
The target dictionary can be built by manual collection as needed, and it can be expanded continuously over time.
Alternatively, the target dictionary can be obtained in other ways. For example, a target dictionary that was built for other reasons can be used. A database of keywords for a specific domain, such as a database of fund-industry terms, may already exist for other purposes, in which case the target dictionary can be obtained from that database.
For example, in an application scenario in which the user needs to enter a fund name, all fund names can be taken from a fund name database as the target dictionary. In the application scenario of an intelligent customer service for a specific service, common words associated with that service can be collected as the target dictionary. In the input scenario of a specific mobile application (for example, an input method for a specific mobile phone app), common words associated with that application can be collected as the target dictionary.
Preprocessing the text
First, a word to be corrected can be received from the user. For example, a text input can be received from the user, and the text input may contain one or more words to be corrected.
It should be pointed out that what is received in the embodiments of this specification is a word that has already been entered, not a word awaiting confirmation or selection during an input method's own correction. In an existing stroke input method, for example, the user writes strokes with a stylus in the stroke input area and then confirms the Chinese character corresponding to those strokes. The embodiments of this specification, by contrast, operate on text whose input has already been completed.
After a text input is received from the user, the text can first be preprocessed. For example, when the text entered by the user contains multiple Chinese characters or multiple words, word segmentation can first be performed on it to divide it into individual words or characters.
For example, suppose the user has entered a sentence meaning "today's amount of increase of trade and investment promotion base is complete". Word segmentation can first divide it into words such as "trade and investment promotion base is complete", "today", "" and "amount of increase".
Text error correction can then be performed on each word.
If the user enters only a single Chinese character or a single word, this step can be omitted.
Representation of Chinese characters
When a Chinese character is processed, its raster font glyph can first be determined. A raster font is also known as a "bitmap font", in which each glyph is represented by a two-dimensional pixel array. It should be understood that a pixel in this specification does not refer to a screen pixel occupied when the character is displayed on a screen, but to a pixel of the character's representation in the raster font.
Figs. 1A and 1B show the Chinese characters "gold" and "complete", respectively, represented in a raster font. As shown in Figs. 1A and 1B, the glyph of each character is represented as a 16*16 two-dimensional pixel array in which each pixel takes a binary value (0 or 1): for example, 1 for a pixel that forms part of the glyph and 0 for a pixel that does not, or the reverse. Accordingly, the characters "gold" and "complete" can each be stored in a database as a two-dimensional array, as a binary sequence, or in a similar form.
For example, the character "gold" can be represented by the following binary sequence:
The character "complete" can be represented by the following binary sequence:
Raster fonts come in various sizes; the matrix can be, for example, 8*8, 9*9, 10*10, 12*12, 14*14, 16*16, 18*18, 24*24, 36*36, 48*48, 72*72 or 96*96.
Compared with vector fonts, the notable advantages of raster fonts are that all glyphs usually have the same size and are already aligned, which makes them easy to process, and that their data volume is small and the computational complexity is low. A 16*16 raster font library is used as the example below.
A Chinese character can be converted to a raster font glyph, for example, as follows:
First, the character can be converted to its GBK encoding, in any manner known to those skilled in the art.
Then, the byte information of the GBK-encoded character can be obtained. A character in this encoding consists of two bytes in the range A1A1 to FEFE (the GB2312-compatible part of GBK), where areas A1 to A9 are the symbol areas and areas B0 to F7 are the Chinese character areas; each area usually holds 94 characters (Chinese text characters or Chinese symbols).
Then, the area-position code and the offset of the character can be determined from the byte information. The area-position code consists of an area code and a position code; the area code is usually stored in the first byte of the character's byte information and the position code in the second byte. The offset can be calculated by the following formula:
offset = (94 * (area code - 0xA1) + (position code - 0xA1)) * 32
Then, the position of the character's glyph in a dot-matrix font library (such as the HZK16 font library) can be found from the offset, and the binary data of the character obtained. In general, each bit of the binary data indicates whether the corresponding pixel is occupied by a stroke of the character: for example, "1" indicates that the pixel is occupied by a stroke and "0" that it is not, or vice versa.
After conversion to binary data, the binary data of two Chinese characters can be compared, for example bit by bit, to determine the similarity of the two characters, as described in detail below. Those skilled in the art will appreciate that representations other than binary data can be used for Chinese characters.
Other schemes known to those skilled in the art can also be used to convert a received Chinese character to a raster font glyph.
Similarly, each Chinese character in a sentence or word can be converted to a raster font glyph character by character.
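The lookup steps described above (GBK bytes, offset, glyph data) can be sketched as follows. This is a minimal sketch under stated assumptions: the font is an already opened binary file object in the HZK16 layout, each 16-pixel row is stored as two bytes with the most significant bit as the leftmost pixel, and the names `glyph_offset` and `read_glyph` are illustrative only.

```python
import io

GLYPH_BYTES = 32  # 16 rows x 2 bytes per row for one 16*16 glyph


def glyph_offset(ch: str) -> int:
    """Byte offset of a character's glyph, following the offset formula above."""
    area, pos = ch.encode("gbk")     # two bytes: area code, position code
    if area < 0xA1 or pos < 0xA1:
        raise ValueError("character is outside the A1A1-FEFE range")
    return (94 * (area - 0xA1) + (pos - 0xA1)) * GLYPH_BYTES


def read_glyph(font, ch: str):
    """Read one glyph from an open binary font file as 16 rows of 16 bits."""
    font.seek(glyph_offset(ch))
    raw = font.read(GLYPH_BYTES)
    if len(raw) != GLYPH_BYTES:
        raise ValueError("font data is too short for this character")
    rows = []
    for r in range(16):
        row16 = (raw[2 * r] << 8) | raw[2 * r + 1]   # two bytes form one row
        # Assumed bit order: most significant bit is the leftmost pixel.
        rows.append([(row16 >> (15 - c)) & 1 for c in range(16)])
    return rows


# Stand-in font data: zeros up to the first Chinese-character slot (B0A1),
# followed by one glyph whose rows have only the two outermost pixels set.
fake_font = io.BytesIO(bytes(45120) + bytes([0x80, 0x01] * 16))
glyph = read_glyph(fake_font, "\u554a")   # U+554A encodes to B0 A1 in GBK
print(glyph[0])
```

With a real font file, the same function would be called on a file opened in binary mode; the file name and location are not specified here.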
Example algorithm one
Calculating the glyph similarity of Chinese characters
Because of the characteristics of raster fonts described above, the raster font glyphs of two Chinese characters can easily be compared to determine the glyph similarity of the two characters.
It should be noted that Chinese characters as used here include both Chinese text characters and Chinese symbols (such as Chinese punctuation marks).
In one embodiment, the similarity of two Chinese characters is calculated as the ratio of the number of pixels whose values are identical in the two-dimensional pixel arrays of the two glyphs to the total number of pixels. Two corresponding pixels are considered to have the same value if both are occupied by a stroke or both are unoccupied; otherwise their values are considered different.
For example, for two Chinese characters C and C' both represented in a 16*16 raster font, suppose that n pixels have identical values in the two-dimensional pixel arrays of their glyphs. The similarity char_sim of C and C' can then be calculated as:
char_sim(C, C') = n / (16*16) = n / 256        (Formula 1)
The value of char_sim lies in [0, 1] (greater than or equal to 0 and less than or equal to 1), and the raster font glyphs of the two characters are identical when char_sim is 1.
For example, Figs. 1A and 1B show the two-dimensional pixel arrays of the characters "gold" and "complete". Of the 16*16 = 256 pixels, 238 have identical values and 18 have different values, so by Formula 1 the similarity of the two characters is:
char_sim(gold, complete) = 238/256 ≈ 0.9297
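Formula 1 can be sketched in a few lines. The sketch assumes each glyph is a list of rows of 0/1 pixel values; the two glyphs below are synthetic stand-ins that differ in exactly 18 pixels, not the real bitmaps of Figs. 1A and 1B, and the function name `char_sim` is taken from the formula.

```python
def char_sim(glyph_a, glyph_b):
    """Formula 1: share of pixel positions whose values agree."""
    if len(glyph_a) != len(glyph_b) or any(
            len(ra) != len(rb) for ra, rb in zip(glyph_a, glyph_b)):
        raise ValueError("glyphs must have the same dimensions")
    total = sum(len(row) for row in glyph_a)
    same = sum(pa == pb
               for ra, rb in zip(glyph_a, glyph_b)
               for pa, pb in zip(ra, rb))
    return same / total


# Synthetic 16*16 glyphs: glyph_b flips 18 pixels of glyph_a.
glyph_a = [[0] * 16 for _ in range(16)]
glyph_b = [row[:] for row in glyph_a]
glyph_b[0] = [1] * 16            # 16 differing pixels in row 0
glyph_b[1][0] = glyph_b[1][1] = 1  # 2 more differing pixels in row 1

print(round(char_sim(glyph_a, glyph_b), 4))   # 0.9297
```

Because a pixel pair counts as identical whenever both values are equal, two unoccupied pixels contribute to the similarity in the same way as two occupied ones.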
It will be appreciated that although this specification uses Chinese characters as the example, the method of this specification can also be applied to other languages or characters, such as English or digits, as long as the characters can be represented in a raster font.
Calculating the glyph similarity of words
In this way, not only the similarity between two Chinese characters but also the glyph similarity between two words can be calculated. The glyph similarity between two words can be calculated from the raster font glyphs of the Chinese characters of the two words, as explained in detail below.
"Word" here refers collectively to words and phrases. A word is the smallest linguistic unit that can be used independently, and a phrase is a whole made up of multiple words. Words can be divided into single-morpheme words and compound words, where a single-morpheme word consists of one morpheme. The similarity of two single-morpheme words can be calculated with the single-character algorithm above. For compound words, the following approach can be used.
First, the glyph similarity of each pair of corresponding Chinese characters in the two words is calculated by the method described above.
Then, the glyph similarity between the two words is calculated from the similarities of the corresponding character pairs.
In one embodiment, the similarity of two words is the minimum of the similarities of all corresponding character pairs. For example, suppose word W consists of the characters {C1, C2, C3} in order and word W' of the characters {C1', C2', C3'} in order. The similarity word_sim between W and W' can then be calculated as:
word_sim(W, W') = min(char_sim(C1, C1'), char_sim(C2, C2'), char_sim(C3, C3'))        (Formula 2)
where min() takes the minimum of its inputs. The value of word_sim lies in [0, 1], and the two words are identical when word_sim is 1.
By this formula, the similarity between W and W' is represented by the similarity between the least similar pair of corresponding characters in the two words. For example, suppose the two words being compared are "trade and investment promotion base is complete" (for example, entered by the user) and "trade and investment promotion fund" (for example, from the target dictionary). Their similarity is then given by the similarity of the least similar pair of characters ("complete" and "gold"). With the similarity of "complete" and "gold" at about 0.9297, as shown above, the similarity between the two words is also about 0.9297.
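Formula 2 can be sketched as below. The sketch assumes the per-character similarity is supplied as a function; here a toy stand-in returns 1.0 for identical characters and 0.9297 otherwise. Each Latin letter is a placeholder for one Chinese character ("G" for "gold", "Q" for "complete"), and all names are illustrative.

```python
def word_sim_min(word_a, word_b, char_sim):
    """Formula 2: similarity of the least similar pair of corresponding characters."""
    if len(word_a) != len(word_b):
        raise ValueError("words must contain the same number of characters")
    return min(char_sim(a, b) for a, b in zip(word_a, word_b))


def toy_char_sim(a, b):
    # Stand-in for a glyph comparison: identical characters score 1.0,
    # any other pair is given the "gold"/"complete" value from the text.
    return 1.0 if a == b else 0.9297


# "ZSJQ" stands for the entered word, "ZSJG" for the dictionary word.
print(word_sim_min("ZSJQ", "ZSJG", toy_char_sim))   # 0.9297
```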
Other algorithms can also be used to calculate word similarity. For example, the similarity word_sim between W and W' can also be calculated as:
word_sim(W, W') = average(char_sim(C1, C1'), char_sim(C2, C2'), char_sim(C3, C3'))        (Formula 3)
where average() takes the mean of its inputs. Equivalently:
word_sim(W, W') = sum(char_sim(C1, C1'), char_sim(C2, C2'), char_sim(C3, C3')) / char_count        (Formula 4)
where sum() takes the sum of its inputs and char_count is the total number of characters in the word.
Instead of the similarity of the least similar pair of characters, these formulas use the average similarity of the corresponding characters of W and W' to represent the word similarity. For example, with the similarity of "complete" and "gold" at 0.9297, the similarity between "trade and investment promotion base is complete" and "trade and investment promotion fund" is about (1 + 1 + 1 + 0.9297) / 4 ≈ 0.9824.
Compared with the previous example, this example may have an advantage when a word contains several Chinese characters that need correction. For example, suppose words W, W' and W'' each contain four Chinese characters, that W' differs from W in two characters whose similarity is lower, and that W'' differs from W in one character whose similarity is slightly higher. Calculating the similarity of W' and of W'' to W with the averaging formula of this example may then be more accurate.
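The difference between the minimum-based and average-based formulas can be seen in a small sketch. The per-character similarity values below are invented purely for illustration (they are not from this specification), and the function names are hypothetical.

```python
def word_sim_min(char_sims):
    """Formula 2 applied to a list of per-character similarities."""
    return min(char_sims)


def word_sim_avg(char_sims):
    """Formulas 3 and 4 applied to a list of per-character similarities."""
    return sum(char_sims) / len(char_sims)


# Illustrative values only: one candidate differs in two characters,
# the other in a single character.
two_diffs = [1.0, 0.95, 0.95, 1.0]
one_diff = [1.0, 1.0, 1.0, 0.93]

print(word_sim_min(two_diffs), word_sim_min(one_diff))   # 0.95 0.93
print(round(word_sim_avg(two_diffs), 4), round(word_sim_avg(one_diff), 4))

# The worked example from the text: three identical characters plus one
# pair with similarity 0.9297.
print(round(word_sim_avg([1, 1, 1, 0.9297]), 4))   # 0.9824
```

With these values the minimum ranks the two-difference candidate higher while the average ranks the one-difference candidate higher, so the choice of formula can change which target word is selected.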
Other ways of calculating word similarity from character similarity are conceivable.
Selecting the candidate target word
To correct the entered word to a word in the target dictionary, the similarity between the entered word and each word in the target dictionary can be calculated. The word in the target dictionary that has the greatest similarity to the word to be corrected can then be taken as the candidate target word.
For example, the words in the target dictionary that contain the same number of Chinese characters as the entered word can be selected first. Suppose the entered word is "trade and investment promotion fund", which has four Chinese characters: the words with four Chinese characters are first found in the target dictionary (for example, a financial-industry dictionary), and the similarity of each of them to the word to be corrected is then calculated one by one.
The word with the greatest similarity can then be selected as the candidate target word. Continuing the example, suppose the entered word (the word to be corrected) is "trade and investment promotion base is complete" and the word in the target dictionary with the greatest similarity to it is "trade and investment promotion fund"; "trade and investment promotion fund" is then selected as the candidate target word. When similarity is calculated with Formula 2 above, the similarity of the two words is about 0.9297. When it is calculated with Formula 3 or 4 above, the similarity is about 0.9824.
Glyph similarity threshold filtering
After the candidate target word is determined, the similarity between the candidate target word and the word to be corrected is compared with a predefined threshold range. For example, the predefined threshold range may be greater than a predefined minimum threshold and less than 1.
If the similarity is greater than or equal to the predefined minimum threshold, the word to be corrected is replaced with the candidate target word. When the similarity is 1, that is, when the word to be corrected is identical to the candidate target word, the word to be corrected may either be left as it is or be replaced with the candidate target word. If the similarity is less than the predefined threshold, the word to be corrected is not replaced with the candidate target word.
When Formula 2 above is used to calculate similarity, the predefined threshold can usually lie in the range (0.5, 1). Preferably, the predefined threshold is 0.8; more preferably, it is 0.9.
When Formula 3 or 4 above is used to calculate similarity, the predefined threshold can usually lie in the range (0.5, 1). Preferably, the predefined threshold is 0.9; more preferably, it is 0.95.
Continuing the example, suppose similarity is calculated with Formula 2 above and the predefined threshold is 0.9. The similarity between "trade and investment promotion fund" and "trade and investment promotion base is complete" is about 0.9297, which is greater than the predefined threshold, so "trade and investment promotion fund" can be determined to be the target input word corresponding to "trade and investment promotion base is complete".
Alternatively, suppose similarity is calculated with Formula 3 or 4 above and the predefined threshold is 0.95. The similarity between the two words is then about 0.9824, which is again greater than the predefined threshold, so "trade and investment promotion fund" can likewise be determined to be the target input word corresponding to "trade and investment promotion base is complete".
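The threshold rule described above can be sketched as a single decision function. The function name `apply_threshold` and its default threshold are illustrative; letters again stand in for Chinese characters.

```python
def apply_threshold(word, candidate, similarity, min_threshold=0.9):
    """Replace word with candidate only if similarity reaches the minimum threshold.

    A similarity of 1 means the two words are already identical, so returning
    the candidate in that case leaves the text unchanged.
    """
    if candidate is not None and similarity >= min_threshold:
        return candidate
    return word


# Formula 2 value with threshold 0.9, then Formula 3/4 value with threshold 0.95.
print(apply_threshold("ZSJQ", "ZSJG", 0.9297, min_threshold=0.9))    # ZSJG
print(apply_threshold("ZSJQ", "ZSJG", 0.9824, min_threshold=0.95))   # ZSJG
# The Formula 2 value would not pass the stricter 0.95 threshold.
print(apply_threshold("ZSJQ", "ZSJG", 0.9297, min_threshold=0.95))   # ZSJQ
```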
Example algorithm two
In example algorithm one above, both the character glyph similarity and the word glyph similarity are first normalized (to the range [0, 1]). In an alternative algorithm two, normalization can be omitted and the threshold scaled instead when threshold filtering is performed.
For example, for two Chinese characters C and C' represented in a 16*16 raster font, suppose that n pixels have identical values in the two-dimensional pixel arrays of their glyphs. The similarity char_sim' of C and C' can then be calculated as:
char_sim'(C, C') = n        (Formula 5)
The value of char_sim' lies in [0, 256] (greater than or equal to 0 and less than or equal to 256), and the raster font glyphs of the two characters are identical when char_sim' is 256. In this way, the similarity between the characters "gold" and "complete" is determined to be 238.
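Formula 5 lends itself to a compact bitwise sketch. It assumes each glyph is held as packed bytes (32 bytes for a 16*16 glyph) and counts matching pixels by XOR-ing the bytes and counting the differing bits; this packing and the name `char_sim_raw` are assumptions for illustration, and the glyph data is synthetic.

```python
def char_sim_raw(bits_a: bytes, bits_b: bytes) -> int:
    """Formula 5: number of pixel positions whose values agree (not normalized)."""
    if len(bits_a) != len(bits_b):
        raise ValueError("glyphs must have the same size")
    # XOR leaves a 1 bit exactly where the two glyphs differ.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(bits_a, bits_b))
    return len(bits_a) * 8 - differing


# Synthetic 16*16 glyphs (32 bytes each) that differ in 8 + 8 + 2 = 18 pixels.
glyph_a = bytes(32)
glyph_b = bytes([0xFF, 0xFF, 0xC0]) + bytes(29)

print(char_sim_raw(glyph_a, glyph_b))   # 238
print(char_sim_raw(glyph_a, glyph_a))   # 256
```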
It will be appreciated that when the raster font glyphs use a different size, the value range changes accordingly: [0, 64] for an 8*8 raster font, [0, 144] for a 12*12 raster font, and so on.
For two Chinese characters that use the same total number of pixels (for example, 16*16 = 256 for a 16*16 matrix), counting only the pixels whose values are identical in the two glyphs' pixel arrays is enough to give the relative similarity of the two characters. For example, for two words W and W' made up of Chinese characters represented in a 16*16 raster font, the similarity can be calculated as:
word_sim'(W, W') = min(char_sim'(C1, C1'), char_sim'(C2, C2'), char_sim'(C3, C3'))        (Formula 6)
where min() takes the minimum of its inputs. The value of word_sim' lies in [0, 256] (greater than or equal to 0 and less than or equal to 256), and the two words are identical when word_sim' is 256. In this way, the similarity between "trade and investment promotion fund" and "trade and investment promotion base is complete" is determined to be 238.
As with char_sim', this value range changes with the glyph size: [0, 64] for an 8*8 raster font, [0, 144] for a 12*12 raster font, and so on.
Alternatively, other algorithms can be used to calculate word similarity. For example, the similarity word_sim' between W and W' can also be calculated as:
word_sim'(W, W') = average(char_sim'(C1, C1'), char_sim'(C2, C2'), char_sim'(C3, C3'))        (Formula 7)
where average() takes the mean of its inputs. Equivalently:
word_sim'(W, W') = sum(char_sim'(C1, C1'), char_sim'(C2, C2'), char_sim'(C3, C3')) / char_count        (Formula 8)
where sum() takes the sum of its inputs and char_count is the total number of characters in the word.
Here too the value of word_sim' lies in [0, 256] (greater than or equal to 0 and less than or equal to 256), and the two words are identical when word_sim' is 256. In this way, the similarity between "trade and investment promotion fund" and "trade and investment promotion base is complete" is determined to be (256 + 256 + 256 + 238) / 4 = 251.5.
Again, the value range changes with the glyph size: [0, 64] for an 8*8 raster font, [0, 144] for a 12*12 raster font, and so on.
When similarity is determined with the algorithm above, the predefined threshold differs from that of algorithm one. For example, the threshold can be the predefined threshold of algorithm one multiplied by the number of pixels in the character glyph: multiplied by 256 for Chinese characters using 16*16 glyphs, by 64 for 8*8 glyphs, by 144 for 12*12 glyphs, and so on.
For example, for Chinese characters using 16*16 glyphs, when Formula 5 above (or Formula 6 for words) is used to calculate similarity, the predefined threshold can usually lie in the range (128, 256). Preferably, the predefined threshold is 204.8; more preferably, it is 230.4.
For Chinese characters using 16*16 glyphs, when Formula 7 or 8 above is used to calculate similarity, the predefined threshold can usually lie in the range (128, 256). Preferably, the predefined threshold is 230.4; more preferably, it is 243.2.
Because in most text processing all Chinese characters are represented with glyphs of the same number of pixels, using algorithm two can reduce the amount of computation when calculating similarity.
In essence, algorithm one and algorithm two are the same.
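The equivalence of the two algorithms can be checked with a short sketch: comparing n/256 with a normalized threshold gives the same decision as comparing n with that threshold multiplied by 256. The function names below are illustrative, and the exhaustive check is limited to 16*16 glyphs.

```python
def scaled_threshold(normalized_threshold, width=16, height=16):
    """Algorithm-two threshold: algorithm-one threshold times the pixel count."""
    return normalized_threshold * width * height


def passes_normalized(n_same, threshold, pixels=256):
    """Algorithm one: normalize the pixel count, then compare."""
    return n_same / pixels >= threshold


def passes_raw(n_same, threshold, pixels=256):
    """Algorithm two: keep the raw pixel count and scale the threshold instead."""
    return n_same >= threshold * pixels


print(scaled_threshold(0.8), scaled_threshold(0.9), scaled_threshold(0.95))

# Both algorithms make the same accept/reject decision for every possible
# count of matching pixels in a 16*16 glyph.
agree = all(passes_normalized(n, t) == passes_raw(n, t)
            for n in range(257) for t in (0.8, 0.9, 0.95))
print(agree)   # True
```

Algorithm two saves one division per comparison, which is the reduction in computation mentioned above.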
Whichever algorithm is used, the error correction scheme of the embodiments of this specification requires no semantic analysis of massive data; the processing is simple and does not require a large amount of computation.
Replacing the word to be corrected
Referring to Fig. x, an example is shown of an intelligent customer service in which the word "trade and investment promotion base is complete" is replaced with the target input word "trade and investment promotion fund".
As shown in the figure, suppose the user enters "how to buy trade and investment promotion base is complete". The answer to "how to buy trade and investment promotion fund" can then be presented directly. For example, the semantics of "how to buy trade and investment promotion fund" can be parsed by an intelligent algorithm and the answer to the question looked up in a database. The answer can then be presented to the user.
Preferably, a prompt that the input text has been replaced is presented to the user at this point. For example, the user can be shown: "Found answers for 'how to buy trade and investment promotion fund'. Click here to search for 'how to buy trade and investment promotion base is complete' instead." In this display, "trade and investment promotion fund" and/or "trade and investment promotion base is complete" can be highlighted so that the user can see the change clearly.
Alternatively, no prompt that the input text has been replaced need be presented to the user.
In another approach, instead of directly presenting the answer to "how to buy trade and investment promotion fund", the user is asked whether text error correction should be performed. For example, "Do you want to find the answer to 'how to buy trade and investment promotion fund'?" can be presented to the user, the user can be requested to select "Yes" or "No", and subsequent processing can be performed according to the user's selection. For example, if the user selects "Yes", the answer to "how to buy trade and investment promotion fund" is presented; if the user selects "No", the answer to "how to buy trade and investment promotion base is complete" is presented.
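The three ways of handling a replacement described above (replace silently, replace and show a prompt, or ask the user first) can be sketched as follows. This is only an illustrative sketch: the names CorrectionMode and resolve_query, the message wording, and the use of the "lomg"/"long" pair in the usage note are assumptions made for the example and are not part of this specification.

```python
from enum import Enum
from typing import Optional, Tuple


class CorrectionMode(Enum):
    SILENT = "silent"   # replace without telling the user
    NOTIFY = "notify"   # replace and show a prompt linking back to the original
    ASK = "ask"         # ask the user before replacing


def resolve_query(
    original: str,
    corrected: str,
    mode: CorrectionMode,
    user_confirms: Optional[bool] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (query_to_answer, message_for_user) for a corrected input."""
    if corrected == original:
        return original, None          # nothing was replaced
    if mode is CorrectionMode.SILENT:
        return corrected, None
    if mode is CorrectionMode.NOTIFY:
        message = (
            f"Answers to '{corrected}' have been found for you; "
            f"click here to search for '{original}'."
        )
        return corrected, message
    # ASK mode: no query is answered until the user has made a choice.
    if user_confirms is None:
        return None, f"Do you want to find the answer to '{corrected}'?"
    return (corrected if user_confirms else original), None
```

For instance, resolve_query("lomg", "long", CorrectionMode.ASK) returns no query and a question for the user, and calling it again with user_confirms=False keeps the original input.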
It will be appreciated that the above is merely an example of automatically performing text error correction. Those skilled in the art may perform text error correction in other ways.
Referring to Fig. 2, an example flowchart of a method 200 for text error correction according to an embodiment of this specification is shown.
Method 200 may include: in step 202, a word to be corrected can be received. For example, a text input from a user can be received. Word segmentation can then be performed on the text input to divide it into multiple words. The multiple words may include single-morpheme words (single Chinese characters) or compound words.
Method 200 may further include: in step 204, for each target word in a target dictionary, the font similarity between the word to be corrected and that target word can be calculated. The target dictionary can be obtained in the manner described above. Preferably, the target dictionary consists of words associated with a specific scenario. The font similarity can be calculated, for example, based on the dot-matrix fonts of the Chinese characters in the word to be corrected and the dot-matrix fonts of the Chinese characters in the target word. For the method of calculating the font similarity between the word to be corrected and a target word, see the description of Fig. 3 below.
Method 200 may further include: in step 206, the target word in the target dictionary that has the greatest font similarity to the word to be corrected can be determined as the candidate target word.
Method 200 may further include: in step 208, if the font similarity between the word to be corrected and the candidate target word is within a threshold range, the word to be corrected can be replaced with the candidate target word.
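Steps 204 to 208 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name correct_word is hypothetical, the "threshold range" is interpreted as "at or above a lower bound", and toy_similarity is a simple position-wise character match that merely stands in for the font similarity described for Fig. 3.

```python
from typing import Callable, Iterable, Optional


def correct_word(
    word: str,
    target_dictionary: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.9,
) -> str:
    """Return the best-matching target word, or `word` itself if none qualifies."""
    best_word: Optional[str] = None
    best_score = float("-inf")
    for target in target_dictionary:        # step 204: score every target word
        score = similarity(word, target)
        if score > best_score:              # step 206: keep the most similar one
            best_word, best_score = target, score
    # Step 208 (assumed reading): replace only if the score reaches the threshold.
    if best_word is not None and best_score >= threshold:
        return best_word
    return word


def toy_similarity(a: str, b: str) -> float:
    """Stand-in for font similarity: share of positions with equal characters."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


corrected = correct_word("lomg", ["long", "lamp"], toy_similarity, threshold=0.7)
```

With the toy measure, "lomg" scores 0.75 against "long" and 0.5 against "lamp", so it is replaced by "long" at a threshold of 0.7 and left unchanged at the default threshold of 0.9.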
Referring to Fig. 3, an example flowchart of a method 300 for calculating the font similarity between the word to be corrected and a target word is shown. For the method of calculating the font similarity of two words, reference can also be made to the description above.
Specifically, method 300 may include: in step 302, the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word can be calculated. For an example of calculating the font similarity of two Chinese characters, see the description of Fig. 4 below.
Method 300 may further include: in step 304, the font similarity between the word to be corrected and the target word can be determined based on the average and/or the minimum of the font similarities between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word.
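Steps 302 and 304 can be sketched as follows. The sketch assumes that "corresponding" characters are the characters at the same position and that words of different lengths are treated as not similar; the function name word_similarity and the mode parameter are hypothetical, and the per-character measure is passed in as a parameter.

```python
from statistics import mean
from typing import Callable


def word_similarity(
    word: str,
    target: str,
    char_similarity: Callable[[str, str], float],
    mode: str = "mean",
) -> float:
    """Combine per-character similarities into a word-level similarity."""
    # Assumption: only words with the same number of characters are compared.
    if len(word) != len(target) or not word:
        return 0.0
    # Step 302: similarity of each character with its counterpart.
    scores = [char_similarity(a, b) for a, b in zip(word, target)]
    # Step 304: aggregate with the minimum or the average.
    return min(scores) if mode == "min" else mean(scores)
```

The minimum is the stricter choice, since a single dissimilar character pulls the whole word down, whereas the average tolerates one weak match in a long word.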
Referring to Fig. 4, an example flowchart of a method 400 for calculating the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word is shown. For the specific method of calculating the font similarity of two Chinese characters, reference can also be made to the corresponding description above.
Specifically, method 400 may include: in step 402, the dot-matrix font of each Chinese character in the word to be corrected can be determined. This can be done as follows.
First, each Chinese character in the word to be corrected can be expressed in GBK encoding. The byte information of the GBK-encoded Chinese character can then be obtained. Next, the area-position code and the offset of the Chinese character can be determined from the byte information. In this way, the position of the character's glyph in the dot-matrix font library can be located from the offset, and the binary data of the Chinese character can be obtained.
Method 400 may further include: in step 404, the dot-matrix font of each Chinese character in the word to be corrected can be compared with the dot-matrix font of the corresponding Chinese character in the target word. In general, the two dot-matrix fonts have the same number of pixels.
Method 400 may further include: in step 406, the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word can be determined based on the comparison.
Specifically, in one example, the number of pixels that have the same pixel value in the dot-matrix font of each Chinese character in the word to be corrected and in the dot-matrix font of the corresponding Chinese character in the target word can first be determined. The font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word can then be determined based on that number. Specifically, for each Chinese character in the word to be corrected, the binary data of that character can be compared bit by bit with the binary data of the corresponding Chinese character in the target word, so as to determine the font similarity between the two characters.
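The bit-by-bit comparison can be sketched as follows, assuming each glyph is available as a bytes object in which every bit is one pixel. The function name count_matching_pixels is hypothetical.

```python
def count_matching_pixels(glyph_a: bytes, glyph_b: bytes) -> int:
    """Count the pixels (bits) that have the same value in both glyphs."""
    if len(glyph_a) != len(glyph_b):
        raise ValueError("glyphs must have the same number of pixels")
    # XOR sets a bit exactly where the two glyphs differ.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(glyph_a, glyph_b))
    return len(glyph_a) * 8 - differing
```

Two identical 32-byte glyphs give 256 matching pixels; each differing pixel lowers the count by one.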
Alternatively, in another example, the number of pixels that have the same pixel value in the dot-matrix font of each Chinese character in the word to be corrected and in the dot-matrix font of the corresponding Chinese character in the target word can first be determined. The ratio of that number to the total number of pixels of all Chinese characters in the word to be corrected can then be determined. Finally, the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word can be determined based on that ratio.
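The ratio-based variant can be sketched as follows. The text takes the denominator to be the total pixel count of all Chinese characters in the word to be corrected; the sketch therefore exposes the denominator as a parameter and, as an assumption, defaults it to the pixel count of the single glyph being compared. The function name match_ratio is hypothetical.

```python
from typing import Optional


def match_ratio(
    glyph_a: bytes,
    glyph_b: bytes,
    total_pixels: Optional[int] = None,
) -> float:
    """Ratio of matching pixels to a total pixel count."""
    if len(glyph_a) != len(glyph_b) or not glyph_a:
        raise ValueError("glyphs must be non-empty and equally sized")
    # Each byte holds 8 pixels; XOR marks the pixels that differ.
    matching = sum(8 - bin(x ^ y).count("1") for x, y in zip(glyph_a, glyph_b))
    if total_pixels is None:
        total_pixels = len(glyph_a) * 8     # assumed default: this glyph only
    return matching / total_pixels
```

Passing the total pixel count of the whole word as total_pixels follows the wording above; the per-character ratios of a word then sum to the share of matching pixels in the whole word.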
Referring to Fig. 5, an example flowchart of a method 500 for providing an intelligent customer service according to an embodiment of this specification is shown.
Method 500 may include: in step 502, an intelligent customer service question provided by a user can be received. The question may be, for example, "How to buy trade and investment promotion base is complete?", "How much is this month's bill?", "Where is the express delivery?", "What is the weather like today?", and so on.
Method 500 may further include: in step 504, word segmentation can be performed on the intelligent customer service question to obtain the multiple words it contains. Word segmentation can be performed in any manner known in the art and is not described further here. For example, "How to buy trade and investment promotion base is complete?" can be segmented into words such as "how", "buy", "trade and investment promotion base is complete" and "?", and "How much is this month's bill" can be segmented into words such as "this month", "bill", "have" and "how much".
Method 500 may further include: in step 506, text error correction can be performed on the multiple words using the method for text error correction described in the embodiments of this specification, with a dictionary associated with the intelligent customer service used as the target dictionary. For example, in the example of "How to buy trade and investment promotion base is complete?", text error correction can be performed separately on "how", "buy" and "trade and investment promotion base is complete", so that "trade and investment promotion base is complete" is replaced with "trade and investment promotion fund" and the question is corrected to "How to buy trade and investment promotion fund?". The advantage of performing text error correction word by word is that the target words in the target dictionary can be fully exploited, which yields more accurate text error correction.
Alternatively, the text error correction described in the embodiments of this specification can be performed character by character, without the word segmentation described above. For example, for the question "How to buy trade and investment promotion base is complete?", text error correction can be performed separately on "such as", "how", "purchase", "buying", "trick", "quotient", "base" and "complete", where each quoted item stands for one Chinese character of the question. However, this approach may fail to replace "complete" with "gold".
Method 500 may further include: in step 508, the intelligent customer service can be provided for the question after text error correction. For example, the intelligent customer service can be provided for the corrected question "How to buy trade and investment promotion fund?". For example, the semantics of "how to buy trade and investment promotion fund" can be parsed by an intelligent algorithm, and the answer to that question can be retrieved from a database. The answer can then be presented to the user; for example, the specific steps for buying trade and investment promotion fund can be presented.
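Steps 504 to 508 can be sketched end to end as follows. This is only a sketch: the segmenter, the per-word corrector and the answer store are passed in as parameters, a plain dictionary lookup stands in for semantic parsing and database search, and the toy data in the usage part (whitespace segmentation, "fumd"/"fund") is made up for illustration. The function name answer_question and the joiner parameter are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def answer_question(
    question: str,
    segment: Callable[[str], List[str]],
    correct: Callable[[str], str],
    answers: Dict[str, str],
    joiner: str = "",
) -> Tuple[str, Optional[str]]:
    """Segment, correct word by word, then look up the corrected question."""
    words = segment(question)                          # step 504
    corrected_words = [correct(w) for w in words]      # step 506
    corrected_question = joiner.join(corrected_words)
    # Step 508: a dictionary lookup stands in for semantic parsing and search.
    return corrected_question, answers.get(corrected_question)


# Toy usage with made-up data; whitespace splitting replaces a real segmenter.
toy_fixes = {"fumd": "fund"}
toy_answers = {"how to buy fund": "Open the fund page and follow the steps."}
question, answer = answer_question(
    "how to buy fumd",
    segment=str.split,
    correct=lambda w: toy_fixes.get(w, w),
    answers=toy_answers,
    joiner=" ",
)
```

For Chinese text the words would be joined without a separator, which is why joiner defaults to the empty string.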
In this case, a prompt indicating that the input text has been replaced can preferably be presented to the user. For example, the following can be displayed to the user: "Answers to 'How to buy trade and investment promotion fund?' have been found for you; click here to search for answers to 'How to buy trade and investment promotion base is complete?'." In this display, "trade and investment promotion fund" and/or "trade and investment promotion base is complete" can be highlighted so that the replacement is clear to the user. Alternatively, the prompt indicating that the input text has been replaced need not be presented to the user.
In another approach, instead of directly presenting the answer to "how to buy trade and investment promotion fund", the user is asked whether text error correction should be performed. For example, "Do you want to find the answer to 'How to buy trade and investment promotion fund?'" can be presented to the user, the user can be requested to select "Yes" or "No", and subsequent processing can be performed according to the user's selection. For example, if the user selects "Yes", text error correction is performed and the answer to "how to buy trade and investment promotion fund" is presented; if the user selects "No", text error correction is not performed and the answer to "how to buy trade and investment promotion base is complete" is still presented.
Referring to Fig. 6, an example flowchart of a method 600 for providing a vertical search service according to an embodiment of this specification is shown.
Similarly to method 500, method 600 may include: in step 602, a vertical search query provided by a user can be received. A vertical search query may be, for example, a search within a specific field. For example, for vertical search in the shopping field, the query may be the name of a specific product, such as "Samsung Gai Dongshi mobile phone price". As another example, for vertical search in the music field, the query may be "peninsula Zhou Jielun can".
Method 600 may further include: in step 604, word segmentation can be performed on the vertical search query to obtain the multiple words it contains. As before, word segmentation can be performed in any manner known in the art and is not described further here. For example, "Samsung Gai Dongshi mobile phone price" can be segmented into words such as "Samsung", "Gai Dongshi", "mobile phone" and "price".
Method 600 may further include: in step 606, text error correction can be performed on the multiple words using the method for text error correction described in the embodiments of this specification, with a dictionary associated with the vertical search service used as the target dictionary. For example, in the shopping-field vertical search example of "Samsung Gai Dongshi mobile phone price", the target dictionary may include the names of various products and associated words such as "price" and "parameter". Through text error correction, "Gai Dongshi" can be replaced with "Gai Leshi".
Alternatively, the text error correction described in the embodiments of this specification can be performed character by character, without the word segmentation described above.
Method 600 may further include: in step 608, search results can be provided for the vertical search query after text error correction. For example, search results can be provided for the corrected vertical search query "Samsung Gai Leshi mobile phone price". For example, the price of the Samsung Gai Leshi mobile phone can be provided to the user.
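The use of a field-specific target dictionary in step 606 can be sketched as follows. This is only a sketch: the names correct_vertical_query and toy_corrector, the toy dictionaries and the misspelled word "phome" are all made up for illustration, and toy_corrector (which accepts a target that differs in at most one character position) merely stands in for the font-similarity-based correction of method 200.

```python
from typing import Callable, Dict, List


def correct_vertical_query(
    words: List[str],
    domain: str,
    dictionaries: Dict[str, List[str]],
    correct_word: Callable[[str, List[str]], str],
) -> List[str]:
    """Correct each word against the target dictionary of the given field."""
    target_dictionary = dictionaries.get(domain, [])
    return [correct_word(w, target_dictionary) for w in words]


def toy_corrector(word: str, dictionary: List[str]) -> str:
    """Stand-in corrector: accept a target differing in at most one position."""
    for target in dictionary:
        same_length = len(target) == len(word)
        if same_length and sum(a != b for a, b in zip(word, target)) <= 1:
            return target
    return word


toy_dictionaries = {"shopping": ["phone", "price"], "music": ["album"]}
shopping = correct_vertical_query(
    ["phome", "price"], "shopping", toy_dictionaries, toy_corrector
)
music = correct_vertical_query(
    ["phome", "price"], "music", toy_dictionaries, toy_corrector
)
```

The same input is corrected against the shopping dictionary but left unchanged against the music dictionary, which illustrates why the target dictionary is chosen per vertical search service.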
In this case, a prompt indicating that the input text has been replaced can preferably be presented to the user. Alternatively, the prompt need not be presented to the user.
Likewise, before text error correction is performed, the user can first be asked whether text error correction should be carried out. Text error correction is then performed on the multiple words only if the user's confirmation is received.
It will be appreciated that the scheme disclosed in the embodiments of this specification is not limited to the intelligent customer service and the vertical search service, and can also be applied to various other application scenarios.
Moreover, this application also discloses a computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, cause the processor to perform the methods of the embodiments described herein.
In addition, this application also discloses a system that includes means for implementing the methods of the embodiments described herein.
It will be appreciated that the methods according to one or more embodiments of this specification can be implemented in software, firmware, or a combination thereof.
It should be understood that the embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the device and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, refer to the corresponding parts of the method embodiments.
It should be understood that specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired result. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
It should be understood that describing an element in the singular herein, or showing only one such element in the drawings, does not mean that the number of that element is limited to one. In addition, modules or elements described or shown herein as separate may be combined into a single module or element, and a module or element described or shown herein as single may be split into multiple modules or elements.
It should also be understood that the terms and expressions used herein are for description only, and the one or more embodiments of this specification should not be limited to these terms and expressions. The use of these terms and expressions is not intended to exclude any equivalents of the features shown and described (or portions thereof), and it should be recognized that various possible modifications are also included within the scope of the claims. Other modifications, variations and alternatives are also possible. Accordingly, the claims should be regarded as covering all such equivalents.
Likewise, it should be pointed out that although the description refers to the present specific embodiments, those of ordinary skill in the art should appreciate that the above embodiments are intended only to illustrate one or more embodiments of this specification, and that various equivalent changes or substitutions can be made without departing from the spirit of the invention. Therefore, any changes and modifications to the above embodiments that remain within the essential spirit of the invention fall within the scope of the claims.