CN110147549A - Method and system for performing text error correction - Google Patents

Publication number
CN110147549A
Authority
CN
China
Prior art keywords
word
error correction
chinese character
target
font
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910318541.1A
Other languages
Chinese (zh)
Inventor
陈召群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910318541.1A
Publication of CN110147549A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Document Processing Apparatus (AREA)

Abstract

This application relates to a method for text error correction. The method comprises: receiving a word to be corrected; for each target word in a target dictionary, calculating the font similarity between the word to be corrected and that target word; determining the target word in the target dictionary that has the greatest font similarity to the word to be corrected as the candidate target word; and, if the font similarity between the word to be corrected and the candidate target word is within a predefined threshold range, replacing the word to be corrected with the candidate target word. The application further relates to an intelligent customer service method, a vertical search method, and related systems and computer storage media. The application can achieve text error correction with a relatively small amount of computation.

Description

Method and system for performing text error correction
Technical field
One or more embodiments of this specification relate to methods and systems for performing text error correction.
Background
Text error correction is becoming increasingly important for correcting errors in text entered by users. For example, a user performing a search may enter an incorrect keyword; to return correct search results, text error correction usually needs to be performed automatically on the entered keyword so that the user receives the intended results.
However, current text error correction schemes have various deficiencies, and a scheme that can perform text error correction effectively is needed.
Summary of the invention
To overcome the deficiencies of the prior art, one or more embodiments of this specification provide technical solutions for text error correction.
One or more embodiments of this specification achieve the above purpose through the following technical solutions.
In one aspect, a method for text error correction is disclosed, the method comprising: receiving a word to be corrected; for each target word in a target dictionary, calculating the font similarity between the word to be corrected and the target word; determining the target word in the target dictionary that has the greatest font similarity to the word to be corrected as the candidate target word; and, if the font similarity between the word to be corrected and the candidate target word is within a predefined threshold range, replacing the word to be corrected with the candidate target word.
Preferably, calculating the font similarity between the word to be corrected and the target word comprises: calculating the font similarity based on the raster fonts of the Chinese characters in the word to be corrected and the raster fonts of the Chinese characters in the target word.
Preferably, calculating the font similarity between the word to be corrected and the target word comprises: calculating the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word; and determining the font similarity between the word to be corrected and the target word based on the average and/or the minimum of those per-character font similarities.
Preferably, calculating the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word comprises: determining the raster font of each Chinese character in the word to be corrected; comparing the raster font of each Chinese character in the word to be corrected with the raster font of the corresponding Chinese character in the target word; and determining, based on the comparison, the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word.
Preferably, the raster font of each Chinese character in the word to be corrected has the same number of pixels as the raster font of the corresponding Chinese character in the target word.
Preferably, determining, based on the comparison, the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word comprises: determining the number of pixels having identical pixel values between the raster font of each Chinese character in the word to be corrected and the raster font of the corresponding Chinese character in the target word; and determining the font similarity between the two characters based on that number of pixels.
Preferably, determining, based on the comparison, the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word comprises: determining the number of pixels having identical pixel values between the raster font of each Chinese character in the word to be corrected and the raster font of the corresponding Chinese character in the target word; determining the ratio of that number of pixels to the total number of pixels of all Chinese characters in the word to be corrected; and determining the font similarity between the two characters based on the ratio.
Preferably, determining the raster font of each Chinese character in the word to be corrected comprises:
for each Chinese character in the word to be corrected, representing the Chinese character in GBK encoding; obtaining the byte information of the Chinese character represented in GBK encoding; determining the region-position code and the offset of the Chinese character based on the byte information; and finding, based on the offset, the position of the character's glyph in a dot-matrix font library, so as to obtain the binary data of the Chinese character.
Preferably, the method further comprises: for each Chinese character in the word to be corrected, comparing the binary data of the Chinese character bit by bit with the binary data of the corresponding Chinese character in the target word, to determine the font similarity between the two characters.
Preferably, the method further comprises: obtaining the target dictionary, the target dictionary consisting of words associated with a specific scenario.
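The steps of this first aspect can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the names `correct_word`, `toy_similarity`, `low`, and `high` are assumptions, the threshold values are arbitrary, and `toy_similarity` is a stand-in that compares characters for equality rather than comparing glyph bitmaps.

```python
from typing import Callable, Iterable, Optional, Tuple


def correct_word(
    word: str,
    target_dictionary: Iterable[str],
    similarity: Callable[[str, str], float],
    low: float = 0.9,   # assumed lower bound of the predefined threshold range
    high: float = 1.0,  # assumed upper bound of the predefined threshold range
) -> str:
    """Return the most similar dictionary word if its score is in range, else the input."""
    best: Optional[Tuple[float, str]] = None
    for target in target_dictionary:
        score = similarity(word, target)
        if best is None or score > best[0]:
            best = (score, target)  # candidate target word so far
    if best is not None and low <= best[0] <= high:
        return best[1]
    return word


def toy_similarity(a: str, b: str) -> float:
    """Stand-in score: fraction of positions with equal characters (not glyph based)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    print(correct_word("lomg", ["long", "word"], toy_similarity, low=0.7))
```

In a real system the `similarity` argument would be a font-similarity function of the kind described later in this specification; passing it in as a parameter keeps the selection and threshold logic independent of how similarity is computed.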
In another aspect, a method for providing an intelligent customer service is disclosed, the method comprising: receiving an intelligent customer service question provided by a user; performing word segmentation on the question to obtain the multiple words it contains; performing text error correction on the multiple words using the method for text error correction described herein, with a dictionary associated with the intelligent customer service used as the target dictionary; and providing the intelligent customer service for the error-corrected question.
Preferably, the method further comprises: before performing text error correction on the multiple words, requesting the user to confirm whether to perform text error correction; and performing text error correction on the multiple words only if the user's confirmation is received.
Preferably, the method further comprises: informing the user that the intelligent customer service is provided based on the error-corrected intelligent customer service question.
In another aspect, a method for providing a vertical search service is disclosed, the method comprising: receiving a vertical search query provided by a user; performing word segmentation on the vertical search query to obtain the multiple words it contains; performing text error correction on the multiple words using the method for text error correction described herein, with a dictionary associated with the vertical search service used as the target dictionary; and providing search results for the error-corrected vertical search query.
Preferably, the method further comprises: before performing text error correction on the multiple words, requesting the user to confirm whether to perform text error correction; and performing text error correction on the multiple words only if the user's confirmation is received.
Preferably, the method further comprises: informing the user that the intelligent customer service is provided based on the error-corrected intelligent customer service question.
In yet another aspect, a system is disclosed, the system comprising means for performing the above methods.
In another aspect, a computer-readable storage medium storing instructions is disclosed; the instructions, when executed by a computer, cause the computer to perform the above methods.
Compared with the prior art, one or more embodiments of this specification can provide accurate text error correction without requiring large amounts of data or extensive processing.
Of course, a technical solution implementing this application need not achieve all of the above technical effects at the same time.
Brief description of the drawings
The above summary and the following detailed description are better understood when read in conjunction with the drawings. Note that the drawings serve only as examples of the claimed invention. In the drawings, identical reference numerals denote identical or similar elements.
Figures 1A-1B show examples of the raster fonts of two Chinese characters.
Fig. 2 shows an example flowchart of a method for text error correction according to an embodiment of this specification.
Fig. 3 shows an example flowchart of a method for calculating the font similarity of two words.
Fig. 4 shows an example flowchart of a method for calculating the font similarity of two Chinese characters.
Fig. 5 shows an example flowchart of a method for providing an intelligent customer service according to an embodiment of this specification.
Fig. 6 shows an example flowchart of a method for providing a vertical search service according to an embodiment of this specification.
Specific embodiment
The following detailed description is sufficient to enable any person skilled in the art to understand the technical content of one or more embodiments of this specification and to implement it accordingly. From the description, claims, and drawings disclosed herein, a person skilled in the art can readily understand the purposes and advantages of the one or more embodiments.
Application scenarios
In many application scenarios, text error correction needs to be performed on text entered by a user.
For example, when entering English, a user may type "long" as "lomg", possibly because the characters "n" and "m" are close together on the keyboard.
As another example, when entering Chinese, a user may type "芈月传" (rendered in translation as "the Mi month passes") as the homophonous "米月传" ("meter Yue Chuan").
The above are examples of errors made when entering English or when using a pinyin input method. Errors can also occur when using handwriting input or other shape-based input methods (such as the Wubi input method).
For example, a user who wants to enter "白酒" ("white wine") may mistakenly enter "白洒" ("white to spill"). As another example, a user who wants to enter "service charge" may mistakenly enter "reading takes".
For possible input errors, there are typically two opportunities for error correction.
One opportunity is during input.
For example, an input method may correct errors automatically as text is entered. Some English input methods currently provide automatic error correction for English input, and some pinyin input methods can likewise automatically correct erroneous Chinese input.
The other opportunity is after input.
For example, some websites or applications provide a search service in which the user can enter a keyword (for example, "patent") or even a natural-language sentence (for example, "what is a patent") in a search box. Some websites or applications also provide search services in a specific field. For example, some allow users to search for medical knowledge by entering in the search box a keyword naming a disease or symptom (such as "tenesmus") or a sentence (for example, "what to do about a high fever"). In these cases the user's input may contain errors, such as entering "里急后重" ("tenesmus") as "里急厚重" ("inner urgency is thick and heavy").
As another example, some websites or applications provide an intelligent assistant or intelligent customer service. The user enters the question to be asked or the feedback to be given in an input box, and the website or application answers automatically. For example, the user may enter a keyword (for example, "express delivery") or a sentence (for example, "have not received the express delivery"). Such services can likewise target a specific field. For a service focused on the financial industry, for example, the user may enter a keyword (for example, "招商基金", "trade and investment promotion fund") or a sentence (for example, "how to buy the trade and investment promotion fund"). Here too the user's input may contain errors, such as entering "招商基金" as "招商基全" ("trade and investment promotion base is complete").
In these cases, providing the correct service may require error correction of the entered text after the user has entered it.
Whether error correction is performed during input or after input, an error correction algorithm is needed. Typical error correction algorithms are described below.
A common error correction algorithm for Chinese input corrects errors based on identical or similar pronunciation. Such an algorithm typically compares the word entered by the user with the words in a text library and replaces the entered word with a word from the library. For example, "米月传" can be corrected to "芈月传", because the latter appears in a conventional dictionary.
A common error correction algorithm for English input is based on edit distance. For example, because the letters "o" and "p" are adjacent on the keyboard, "word" is easily mistyped as "wprd". Since the edit distance between the two is 1 and "word" is in a conventional dictionary, "wprd" can be corrected to "word" on the basis of edit distance. For Chinese, the text can first be converted to pinyin and then corrected according to the edit distance between the pinyin codes. The drawback of this approach is that it can handle only errors with similar pronunciation, such as confusing front and back nasal finals.
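The edit-distance comparison mentioned above can be illustrated with the standard Levenshtein dynamic-programming recurrence. The source does not say which edit-distance variant is meant, so the choice of Levenshtein distance and the function name `edit_distance` are assumptions made for this sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute ca with cb (or keep it)
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("wprd", "word"))
```

For "wprd" and "word" a single substitution ("p" to "o") suffices, which matches the distance of 1 stated in the text.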
Another, newer class of error correction algorithm is based on big data or machine learning models and often relies on a language model. By processing massive amounts of data, this approach discovers regularities in human language use and applies them to error correction. For example, based on language habits learned from big data, "type of motor car is on road" can be corrected to "the car is driving on the road". However, this kind of algorithm depends on massive amounts of data, and the model is relatively complex.
The scheme introduced in the embodiments of this specification is suited to correcting text after it has been entered, and it can also be applied within an input method. It should be understood that, when applied within an input method, it likewise corrects the text after entry is complete rather than during entry.
Obtaining the target dictionary
In the embodiments of this specification, a target dictionary usually needs to be built or otherwise obtained.
Preferably, the target dictionary consists of words associated with a specific scenario. The specific scenario may include a specific business field (such as financial business or software operations) or a specific application (such as a payment application). In the financial industry, for example, words such as fund names, fund company names, stock names, film names, and financial terms may make up the target dictionary. If the input of such words contains errors, the user's intent may not be recognized accurately. Building a proprietary dictionary of terms specific to a business, field, or technical scenario, and using only that dictionary in the subsequent comparison, reduces the amount of data required, improves processing efficiency, and improves accuracy.
In some embodiments, the target dictionary is a general-purpose dictionary that includes all commonly used words.
The target dictionary can be built by manual collection as needed, and it can be expanded continuously over time.
Alternatively, the target dictionary can be obtained in other ways. For example, a target dictionary that was built for other reasons can be used: a keyword database for a specific field, such as a database of fund industry terms, may already exist for other purposes, and the target dictionary can then be obtained from that database.
For example, in an application scenario where the user needs to enter a fund name, all fund names can be selected from a fund name database to form the target dictionary. In an intelligent customer service scenario for a specific service, common words associated with that service can be collected to form the target dictionary. In an input scenario for a specific mobile application (for example, an input method for a specific mobile app), common words associated with that application can be collected to form the target dictionary.
Preprocessing of the text
First, a word to be corrected can be received from the user. For example, a text input can be received from the user, and the text input may include one or more words to be corrected.
Note that what is received in the embodiments of this specification is a word that has already been entered, not a word awaiting confirmation or selection during error correction within an input method. For example, in an existing stroke input method, the user enters strokes with a stylus in the stroke input area and then confirms the Chinese character corresponding to the entered strokes. However,
After the text input is received from the user, the text can first be preprocessed. For example, where the text entered by the user includes multiple Chinese characters or multiple words, word segmentation can first be performed on the text so as to divide it into individual words or Chinese characters.
For example, suppose the user enters "招商基全今天的涨幅" ("today's amount of increase of the trade and investment promotion base is complete"). Word segmentation is performed on it first, dividing it into the words "招商基全" ("trade and investment promotion base is complete"), "今天" ("today"), "的" (a possessive particle), and "涨幅" ("amount of increase").
Text error correction can then be performed on each word.
If the user enters only a single Chinese character or a single word, this step can be omitted.
Representation of Chinese characters
When a Chinese character is processed, its raster font can be determined first. A raster font is also called a bitmap font; in it, each glyph is represented by a two-dimensional array of pixels. Note that a pixel in this specification does not mean a screen pixel occupied when the character is displayed, but a pixel of the character's representation in the raster font.
Figures 1A-1B show the Chinese characters 金 ("gold") and 全 ("complete") as represented in a raster font. As shown in Fig. 1A and Fig. 1B, the glyph of each character is represented as a 16*16 two-dimensional array in which each pixel takes a binary value (0 or 1): for example, 1 denotes a pixel that forms part of the glyph and 0 a pixel that does not, or vice versa. Accordingly, the characters 金 and 全 can each be stored in a database as a two-dimensional array, or represented as a binary sequence, and so on.
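The two representations mentioned above, a two-dimensional array and a flat binary sequence, carry the same information and can be converted into each other. The following sketch shows one way to do that; the helper names (`to_bit_string`, `from_bit_string`, `render`) and the tiny 3*3 demo glyph are assumptions for illustration, not glyph data from the patent.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixel values


def to_bit_string(bitmap: Bitmap) -> str:
    """Flatten a two-dimensional pixel array row by row into a binary sequence."""
    return "".join(str(pixel) for row in bitmap for pixel in row)


def from_bit_string(bits: str, width: int) -> Bitmap:
    """Rebuild the two-dimensional pixel array from a binary sequence."""
    if width <= 0 or len(bits) % width != 0:
        raise ValueError("bit string length must be a multiple of the width")
    return [[int(b) for b in bits[i:i + width]] for i in range(0, len(bits), width)]


def render(bitmap: Bitmap, on: str = "#", off: str = ".") -> str:
    """Draw the bitmap as text, one line per pixel row."""
    return "\n".join("".join(on if p else off for p in row) for row in bitmap)


if __name__ == "__main__":
    demo = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]  # toy 3*3 glyph, not a real character
    print(to_bit_string(demo))
    print(render(demo))
```

The same functions work unchanged for a 16*16 glyph by passing `width=16`.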
For example, the character 金 ("gold") can be represented by one binary sequence and the character 全 ("complete") by another (the binary sequences themselves are not reproduced in this text).
Raster fonts can come in various sizes; for example, the matrix size can be 8*8, 9*9, 10*10, 12*12, 14*14, 16*16, 18*18, 24*24, 36*36, 48*48, 72*72, 96*96, and so on.
Compared with vector fonts, the prominent advantages of raster fonts are that every character usually has the same size and is already aligned, which makes processing convenient, and that the data volume is small and the computational complexity low. In what follows, a 16*16 raster font library is used as the example.
A Chinese character can be converted to a raster font, for example, in the following way:
First, the Chinese character is converted to GBK encoding, which can be done in any way known to those skilled in the art.
Next, the byte information of the GBK-encoded character is obtained. A Chinese character in GBK encoding is encoded in two bytes. The range used here is A1A1 to FEFE, in which A1 to A9 are the symbol regions and B0 to F7 are the Chinese character regions, and each region usually holds 94 characters (Chinese characters or Chinese symbols). This range is the GB2312-compatible part of GBK.
Then the region-position code and the offset of the character are determined from the byte information. The region-position code consists of a region code and a position code; the region code is normally stored in the first byte of the character's byte information and the position code in the second byte. The offset can be calculated with the following formula:
offset = (94 * (region code - 0xA1) + (position code - 0xA1)) * 32
Here 32 is the number of bytes that one 16*16 glyph occupies (256 bits / 8).
Finally, the offset is used to locate the character's glyph in a dot-matrix font library (such as the HZK16 font library) and thereby obtain the binary data of the character. Typically, each bit of the binary data indicates whether the corresponding pixel is occupied by a stroke of the character: for example, a 1 indicates that the pixel is occupied by a stroke and a 0 that it is not, or vice versa.
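The lookup steps above can be sketched as follows. This is a hedged illustration, not the patent's code: the function names and the `font_path` parameter are hypothetical, and the bit layout assumed in `unpack_glyph` (two bytes per pixel row, most significant bit leftmost) is the layout commonly described for HZK16 files and should be checked against the font library actually used.

```python
from typing import List

GLYPH_BYTES = 32  # one 16*16 glyph: 16 rows * 2 bytes per row


def glyph_offset(char: str) -> int:
    """Byte offset of a character's glyph, following the offset formula in the text."""
    encoded = char.encode("gbk")
    if len(encoded) != 2:
        raise ValueError("expected a character encoded in two GBK bytes")
    region_byte, position_byte = encoded
    if not (0xA1 <= region_byte <= 0xFE and 0xA1 <= position_byte <= 0xFE):
        raise ValueError("character is outside the A1A1-FEFE range")
    return (94 * (region_byte - 0xA1) + (position_byte - 0xA1)) * GLYPH_BYTES


def unpack_glyph(data: bytes) -> List[List[int]]:
    """Expand 32 glyph bytes into a 16*16 array of 0/1 pixels (assumed bit layout)."""
    if len(data) != GLYPH_BYTES:
        raise ValueError("a 16*16 glyph needs exactly 32 bytes")
    rows = []
    for r in range(16):
        row_bits = (data[2 * r] << 8) | data[2 * r + 1]
        rows.append([(row_bits >> (15 - c)) & 1 for c in range(16)])
    return rows


def read_glyph(font_path: str, char: str) -> List[List[int]]:
    """Read one glyph from a 16*16 dot-matrix font file (font_path is hypothetical)."""
    with open(font_path, "rb") as font_file:
        font_file.seek(glyph_offset(char))
        return unpack_glyph(font_file.read(GLYPH_BYTES))
```

As a check on the formula, the character 啊 has the GBK bytes B0 A1, so its offset is (94 * 15 + 0) * 32 = 45120.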
After conversion to binary data, a method such as bit-by-bit comparison can be used to compare the binary data of two Chinese characters and so determine their similarity, as described in detail below. A person skilled in the art will appreciate that Chinese characters can also be represented in ways other than binary data.
Other schemes known to those skilled in the art can also be used to convert a received Chinese character to a raster font.
Similarly, each Chinese character in a sentence or word can be converted to a raster font character by character.
Exemplary algorithm one
Calculating the font similarity of Chinese characters
Because of the characteristics of raster fonts described above, the raster fonts of two Chinese characters can easily be compared to determine the font similarity of the two characters.
Note that Chinese characters in this specification may include both Chinese characters proper and Chinese symbols (such as Chinese punctuation marks).
In one embodiment, the similarity of two Chinese characters is calculated as the ratio of the number of pixels that have identical pixel values in the two-dimensional arrays of their raster fonts to the total number of pixels. Two corresponding pixels that are both occupied by a stroke are considered to have the same pixel value; the example below, in which 238 of 256 pixels match, indicates that two corresponding pixels that are both unoccupied are also counted as identical. Otherwise the two pixel values are considered different.
For example, for two Chinese characters C and C' that are both represented in a 16*16 raster font, suppose that n pixels have identical pixel values in the two-dimensional arrays of the raster fonts of C and C'. The similarity char_sim of C and C' can then be calculated as follows:
char_sim(C, C') = n / (16 * 16) = n / 256    (Formula 1)
The value of char_sim lies in the range [0, 1] (greater than or equal to 0 and less than or equal to 1), and the raster fonts of the two characters are identical when char_sim equals 1.
For example, Figures 1A-1B show the two-dimensional arrays of 金 ("gold") and 全 ("complete"). Of the 16*16 = 256 pixels in total, 238 have identical pixel values and 18 have different pixel values, so by Formula 1 above the similarity of the two characters is:
char_sim(金, 全) = 238 / 256 ≈ 0.9297
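Formula 1 translates directly into code. In the sketch below, the function name `char_sim` follows the formula, while the type alias, the input validation, and the synthetic demo bitmaps are assumptions; the demo bitmaps are not the real glyphs of the two characters and were merely built to differ in 18 pixels so that the figure from the text is reproduced.

```python
from typing import Sequence

Bitmap = Sequence[Sequence[int]]  # rows of 0/1 pixel values


def char_sim(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixel positions at which two equally sized bitmaps agree (Formula 1)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("both bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("bitmaps must contain at least one pixel")
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


if __name__ == "__main__":
    blank = [[0] * 16 for _ in range(16)]
    other = [row[:] for row in blank]
    for i in range(18):  # flip 18 of the 256 pixels
        other[i // 16][i % 16] = 1
    print(round(char_sim(blank, other), 4))
```

Because agreement is counted at every position, pixels that are empty in both glyphs contribute to the score just as pixels that are filled in both do.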
It will be appreciated that although this specification use Chinese character as an example, but the method for this specification can be applied to English, number The other Languages such as word or character, as long as the character can be represented as raster font.
Calculate the font similarity of word
By the above-mentioned means, can not only calculate the similarity between two Chinese characters, can also calculate between two words Font similarity.Font similarity between two words can be the raster font of each Chinese character based on two words come It calculates, as being explained in detail below.
Word is being collectively referred to as word and phrase (also known as phrase).Wherein word is the smallest linguistic unit that can independently use, And phrase is the entirety being made of multiple words.Word can be divided into single-morpheme word and compound word, and wherein compound word is by a morpheme structure At word, the similarity for calculating two single-morpheme words for calculating the algorithm of individual character similarity above can be used.For synthesis Following manner can be used to calculate in word.
Firstly, calculating the font similarity of the corresponding Chinese character of each of two words using method described above.
The font calculated between two words subsequently, based on the similarity of corresponding Chinese character each of between two words is similar Degree.
In one embodiment, the similarity of two words can be taken as the minimum of the similarities of all corresponding Chinese characters in the words. For example, assume that word W consists of the Chinese characters {C1, C2, C3} in order and word W' consists of the Chinese characters {C1', C2', C3'} in order. The similarity word_sim between W and W' can then be calculated as follows:
word_sim(W, W') = min(char_sim(C1, C1'), char_sim(C2, C2'), char_sim(C3, C3'))    Formula 2
where min() takes the minimum of its input values. The value range of word_sim is [0, 1], and the two words are identical when word_sim is 1.
By the above formula, the similarity word_sim between W and W' is represented by the similarity of the least similar pair of corresponding Chinese characters in W and W'. For example, assume the two words to be compared are "trade and investment promotion base is complete" (for example, a word entered by the user) and "trade and investment promotion fund" (for example, a word in the target dictionary). The similarity of the two words is then given by the similarity of their least similar pair of characters ("complete" and "gold"). As shown above, the similarity of "complete" and "gold" is about 0.9297, so the similarity between the words "trade and investment promotion base is complete" and "trade and investment promotion fund" is also about 0.9297.
Other algorithms can also be used to calculate word similarity. For example, the similarity word_sim between W and W' can also be calculated as follows:
word_sim(W, W') = average(char_sim(C1, C1'), char_sim(C2, C2'), char_sim(C3, C3'))    Formula 3
where average() takes the average of its input values. The formula above can equivalently be expressed as:
word_sim(W, W') = sum(char_sim(C1, C1'), char_sim(C2, C2'), char_sim(C3, C3')) / char_count    Formula 4
where sum() takes the sum of its input values and char_count is the total number of characters in the word.
As can be seen, instead of representing the similarity of the words by the similarity of the least similar pair of Chinese characters in W and W', the above formula represents it by the average of the similarities of the corresponding Chinese characters in W and W'. For example, assuming the similarity of "complete" and "gold" is 0.9297, the similarity between the words "trade and investment promotion base is complete" and "trade and investment promotion fund" is about (1+1+1+0.9297)/4 ≈ 0.9824.
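The two ways of aggregating character similarities can be compared side by side. The sketch below assumes the per-character similarities have already been computed; the function names are illustrative and not taken from the specification.

```python
def word_sim_min(char_sims):
    """Formula 2: the lowest per-character similarity."""
    return min(char_sims)

def word_sim_avg(char_sims):
    """Formulas 3 and 4: the mean per-character similarity."""
    return sum(char_sims) / len(char_sims)

# Four-character example from the text: three identical characters
# and one pair with 238 of 256 pixels in common.
sims = [1.0, 1.0, 1.0, 238 / 256]
print(round(word_sim_min(sims), 4))  # 0.9297
print(round(word_sim_avg(sims), 4))  # 0.9824
```

The averaged score is always at least as high as the minimum, which is why the text later pairs it with a stricter threshold.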
Compared with the previous example, this example may have an advantage when a word contains multiple Chinese characters that need to be corrected. For example, assume that words W, W' and W'' each contain four Chinese characters, that W' differs from W in two Chinese characters with relatively low similarity, and that W'' differs from W in one Chinese character with slightly higher similarity. In this case, calculating the similarity of W' and W'' to W with the averaging formula of this example may be more accurate.
Other ways of calculating word similarity from Chinese character similarity are also conceivable.
Select candidate target word
To correct an input word to a word in the target dictionary, the similarity between the input word and each word in the target dictionary can be calculated. The word in the target dictionary with the greatest similarity to the word to be corrected can then be taken as the candidate target word.
For example, the words in the target dictionary that contain the same number of Chinese characters as the input word can be selected first. For example, assume the input word is "trade and investment promotion fund", which has 4 Chinese characters; the words with four Chinese characters in the target dictionary (for example, a financial-industry dictionary) are found first. The similarity between each of these words and the word to be corrected can then be calculated one by one.
Then, the word with the greatest similarity can be selected as the candidate target word. Continuing the example, assume the input word (i.e. the word to be corrected) is "trade and investment promotion base is complete" and the word in the target dictionary with the greatest similarity to it is "trade and investment promotion fund"; the word "trade and investment promotion fund" is then selected as the candidate target word. When Formula 2 above is used to calculate the similarity, the similarity of the two words is about 0.9297. When Formula 3 or 4 above is used, it is about 0.9824.
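The selection step can be sketched as a length filter followed by a maximum search. Everything below is illustrative: the similarity table is a toy stand-in for real raster-font comparison, the dictionary entries are hypothetical, and the Chinese strings are assumed originals of the translated example words ("complete" and "gold" being the one differing pair).

```python
# Toy per-character similarity table (hypothetical). The single entry
# reuses the 238/256 figure from the text.
PAIR_SIM = {frozenset("全金"): 238 / 256}

def char_sim(a, b):
    if a == b:
        return 1.0
    return PAIR_SIM.get(frozenset((a, b)), 0.0)

def word_sim(w1, w2):
    # Formula 2: minimum similarity over corresponding characters.
    return min(char_sim(a, b) for a, b in zip(w1, w2))

def select_candidate(word, lexicon):
    """Return (candidate, similarity) among entries of the same length."""
    same_length = [w for w in lexicon if len(w) == len(word)]
    if not same_length:
        return None, 0.0
    score, best = max((word_sim(word, w), w) for w in same_length)
    return best, score

lexicon = ["招商基金", "货币基金", "余额宝"]  # hypothetical finance dictionary
best, score = select_candidate("招商基全", lexicon)
print(best, round(score, 4))  # 招商基金 0.9297
```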
Font similarity threshold filtering
After the candidate target word is determined, the similarity between the candidate target word and the word to be corrected is compared with a predefined threshold range. For example, the predefined threshold range may be greater than a predefined minimum threshold and less than 1.
If the similarity is greater than or equal to the predefined minimum threshold, the word to be corrected is replaced with the candidate target word. If the similarity is 1, i.e. the word to be corrected is identical to the candidate target word, the word to be corrected may either be left as it is or be replaced with the candidate target word. If the similarity is less than the predefined threshold, the word to be corrected is not replaced with the candidate target word.
When Formula 2 above is used to calculate similarity, the predefined threshold can typically lie in the range (0.5, 1). Preferably, the predefined threshold is 0.8. More preferably, the predefined threshold is 0.9.
When Formula 3 or 4 above is used to calculate similarity, the predefined threshold can typically lie in the range (0.5, 1). Preferably, the predefined threshold is 0.9. More preferably, the predefined threshold is 0.95.
Continuing the example, assume that similarity is calculated with Formula 2 above and the predefined threshold is 0.9. The similarity between the words "trade and investment promotion fund" and "trade and investment promotion base is complete" is about 0.9297, which is greater than the predefined threshold, so "trade and investment promotion fund" can be determined to be the target input word corresponding to "trade and investment promotion base is complete".
Alternatively, assume that similarity is calculated with Formula 3 or Formula 4 above and the predefined threshold is 0.95. The similarity between the words "trade and investment promotion fund" and "trade and investment promotion base is complete" is 0.9824, which is greater than the predefined threshold, so "trade and investment promotion fund" can likewise be determined to be the target input word corresponding to "trade and investment promotion base is complete".
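The threshold decision is small enough to express as one function. This is a sketch under the assumption that the threshold is inclusive at the lower end, as the replacement rule above states; the function name and the `replace_identical` flag are illustrative.

```python
def should_replace(similarity, min_threshold, replace_identical=False):
    """Decide whether the candidate target word replaces the input word."""
    if similarity >= 1.0:
        # Identical words: replacing or not yields the same text,
        # so the choice is left to the caller.
        return replace_identical
    return similarity >= min_threshold

print(should_replace(0.9297, 0.9))   # True  (Formula 2 example)
print(should_replace(0.9824, 0.95))  # True  (Formula 3/4 example)
print(should_replace(0.85, 0.9))     # False
```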
Exemplary algorithm two
In example algorithm one above, both the Chinese character font similarity and the word font similarity are first normalized (that is, their range is normalized to [0, 1]). In an alternative algorithm two, normalization can be skipped and the threshold scaled instead when threshold filtering is performed.
For example, for two Chinese characters C and C' represented with 16*16 raster fonts, assume that n pixels have the same pixel value in the two-dimensional arrays of the raster fonts of C and C'. The similarity char_sim' of C and C' can then be calculated as follows:
char_sim'(C, C') = n    Formula 5
As can be seen, the value range of char_sim' is [0, 256] (greater than or equal to 0 and less than or equal to 256), and the raster fonts of the two Chinese characters are identical when char_sim' is 256. In this way, the similarity between the Chinese characters "gold" and "complete" is determined to be 238.
It will be appreciated that when the raster fonts of the Chinese characters use a different size, the above value range changes accordingly. For example, the value range is [0, 64] for an 8*8 raster font and [0, 144] for a 12*12 raster font, and so on.
For two Chinese characters with the same total number of pixels (for example, a 16*16 matrix has 16*16 = 256 pixels in total), counting the pixels with identical values in the two-dimensional arrays of their raster fonts is enough to obtain the relative similarity of the two characters. For example, for two words W and W' composed of Chinese characters represented with 16*16 raster fonts, the similarity can be calculated with the following formula:
word_sim'(W, W') = min(char_sim'(C1, C1'), char_sim'(C2, C2'), char_sim'(C3, C3'))    Formula 6
where min() takes the minimum of its input values. As can be seen, the value range of word_sim' is [0, 256] (greater than or equal to 0 and less than or equal to 256), and the two words are identical when word_sim' is 256. In this way, the similarity between the words "trade and investment promotion fund" and "trade and investment promotion base is complete" is determined to be 238.
It will be appreciated that when the raster fonts of the Chinese characters use a different size, the above value range changes accordingly. For example, the value range is [0, 64] for an 8*8 raster font and [0, 144] for a 12*12 raster font, and so on.
Alternatively, other algorithms can be used to calculate word similarity. For example, the similarity word_sim' between W and W' can also be calculated as follows:
word_sim'(W, W') = average(char_sim'(C1, C1'), char_sim'(C2, C2'), char_sim'(C3, C3'))    Formula 7
where average() takes the average of its input values. The formula above can equivalently be expressed as:
word_sim'(W, W') = sum(char_sim'(C1, C1'), char_sim'(C2, C2'), char_sim'(C3, C3')) / char_count    Formula 8
where sum() takes the sum of its input values and char_count is the total number of characters in the word.
Similarly, the value range of word_sim' is [0, 256] (greater than or equal to 0 and less than or equal to 256), and the two words are identical when word_sim' is 256. In this way, the similarity between the words "trade and investment promotion fund" and "trade and investment promotion base is complete" is determined to be (256+256+256+238)/4 = 251.5.
It will be appreciated that when the raster fonts of the Chinese characters use a different size, the above value range changes accordingly. For example, the value range is [0, 64] for an 8*8 raster font and [0, 144] for a 12*12 raster font, and so on.
When similarity is determined with the above algorithm, the predefined threshold differs from the threshold of algorithm one. For example, the threshold used can be the predefined threshold of algorithm one multiplied by the number of pixels in the type matrix of a Chinese character. For Chinese characters using a 16*16 type matrix, the threshold can be the predefined threshold of algorithm one multiplied by 256; for an 8*8 type matrix, multiplied by 64; for a 12*12 type matrix, multiplied by 144; and so on.
For example, for Chinese characters using a 16*16 type matrix, when Formula 5 above is used to calculate similarity, the predefined threshold can typically lie in the range (128, 256). Preferably, the predefined threshold is 204.8. More preferably, the predefined threshold is 230.4.
For Chinese characters using a 16*16 type matrix, when Formula 7 or 8 above is used to calculate similarity, the predefined threshold can typically lie in the range (128, 256). Preferably, the predefined threshold is 230.4. More preferably, the predefined threshold is 243.2.
Since in most cases all Chinese characters are represented with type matrices of the same number of pixels during text processing, using algorithm two can reduce the amount of computation when calculating similarity.
In essence, algorithm one and algorithm two are equivalent.
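The equivalence can be checked directly: comparing a normalized similarity with a threshold gives the same decision as comparing the raw pixel count with the threshold multiplied by the pixel total. The sketch below assumes a 16*16 type matrix; the function names are illustrative.

```python
PIXELS = 16 * 16  # total pixels in a 16*16 type matrix

def passes_normalized(same_pixels, threshold):
    """Algorithm one: compare the normalized similarity with the threshold."""
    return same_pixels / PIXELS >= threshold

def passes_scaled(same_pixels, threshold):
    """Algorithm two: compare the raw count with the scaled threshold."""
    return same_pixels >= threshold * PIXELS

print(0.9 * PIXELS)                 # 230.4
print(passes_normalized(238, 0.9))  # True
print(passes_scaled(238, 0.9))      # True
```

Algorithm two saves one division per character comparison, since the threshold is scaled once rather than every similarity being normalized.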
As can be seen, whichever algorithm is used, the error correction scheme of the embodiments of this specification does not require semantic analysis of large amounts of data, and the processing is simple and does not require extensive computation.
Replacing the word to be corrected
Drawing x shows an example of performing intelligent customer service by replacing the word "trade and investment promotion base is complete" with the target input word "trade and investment promotion fund".
As shown in the drawing, assume the user enters "how to buy trade and investment promotion base is complete". The answer to "how to buy trade and investment promotion fund" can then be presented directly. For example, the semantics of "how to buy trade and investment promotion fund" can be parsed by an intelligent algorithm and the answer to the question retrieved from a database. The answer can then be presented to the user.
In this case, a prompt that the input text has been replaced is preferably presented to the user. For example, the following can be shown to the user: "Found the answer to 'how to buy trade and investment promotion fund' for you. Click here to search for 'how to buy trade and investment promotion base is complete' instead." In this display, "trade and investment promotion fund" and/or "trade and investment promotion base is complete" can be highlighted to make the replacement clear to the user.
Alternatively, the prompt that the input text has been replaced may be omitted.
In another mode, instead of directly presenting the answer to "how to buy trade and investment promotion fund", the user is asked whether text error correction should be performed. For example, "Do you want to find the answer to 'how to buy trade and investment promotion fund'?" can be presented to the user together with a request to select "Yes" or "No", and subsequent processing is performed according to the user's choice. If the user selects "Yes", the answer to "how to buy trade and investment promotion fund" is presented; if the user selects "No", the answer to "how to buy trade and investment promotion base is complete" is presented.
It will be appreciated that the above is only an example of performing text error correction automatically. Those skilled in the art can perform text error correction in other ways.
Fig. 2 shows an example flowchart of a method 200 for text error correction according to an embodiment of this specification.
Method 200 may include: in step 202, a word to be corrected may be received. For example, a text input may be received from a user. Word segmentation may then be performed on the text input to divide it into multiple words, which may include single-morpheme words (single Chinese characters) or compound words.
Method 200 may further include: in step 204, for each target word in the target dictionary, the font similarity between the word to be corrected and the target word may be calculated. The target dictionary may be obtained in the manner described above. Preferably, the target dictionary consists of words associated with a specific scenario. The font similarity may, for example, be calculated based on the raster fonts of the Chinese characters in the word to be corrected and the raster fonts of the Chinese characters in the target word. For the method of calculating the font similarity between the word to be corrected and a target word, see the description of Fig. 3 below.
Method 200 may further include: in step 206, the target word in the target dictionary that has the greatest font similarity to the word to be corrected may be determined as the candidate target word.
Method 200 may further include: in step 208, if the font similarity between the word to be corrected and the candidate target word is within the threshold range, the word to be corrected may be replaced with the candidate target word.
Fig. 3 shows an example flowchart of a method 300 for calculating the font similarity between the word to be corrected and a target word. For the method of calculating the font similarity of two words, see the description above.
Specifically, method 300 may include: in step 302, the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character of the target word may be calculated. For an example of calculating the font similarity of two Chinese characters, see the description of Fig. 4 below.
Method 300 may further include: in step 304, the font similarity between the word to be corrected and the target word may be determined based on the average and/or minimum of the font similarities between each Chinese character in the word to be corrected and the corresponding Chinese character of the target word.
Fig. 4 shows an example flowchart of a method 400 for calculating the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character of the target word. For the specific method of calculating the font similarity of two Chinese characters, see the corresponding description above.
Specifically, method 400 may include: in step 402, the raster font of each Chinese character in the word to be corrected may be determined. This can be done as follows.
First, each Chinese character in the word to be corrected can be expressed in GBK encoding. The byte information of the GBK-encoded character can then be obtained, and the region-position code and offset of the character can be determined from the byte information. In this way, the position of the character's type matrix in the dot-matrix font library can be found based on the offset, yielding the binary data of the character.
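The lookup steps above can be sketched as follows. The specification does not give the offset formula, so this sketch assumes the layout commonly used by 16*16 dot-matrix font files (94 positions per region, 32 bytes per glyph); the function names and the `bytes_per_glyph` parameter are illustrative. Characters that GBK places outside the 94*94 region-position grid are rejected.

```python
def region_position(ch):
    """Return (region, position) derived from a character's two GBK bytes."""
    raw = ch.encode("gbk")
    if len(raw) != 2 or raw[0] < 0xA1 or raw[1] < 0xA1:
        raise ValueError("character is outside the 94*94 region-position grid")
    return raw[0] - 0xA0, raw[1] - 0xA0

def glyph_offset(ch, bytes_per_glyph=32):
    """Byte offset of the glyph in the font file (assumed layout)."""
    region, position = region_position(ch)
    return ((region - 1) * 94 + (position - 1)) * bytes_per_glyph

# The character encoded as bytes B0 A1 sits at region 16, position 1.
print(region_position("啊"))  # (16, 1)
print(glyph_offset("啊"))     # 45120
```

With a font file of that layout, the binary data of the character would be the `bytes_per_glyph` bytes read starting at the returned offset.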
Method 400 may further include: in step 404, the raster font of each Chinese character in the word to be corrected may be compared with the raster font of the corresponding Chinese character of the target word. In general, the raster font of each Chinese character in the word to be corrected and the raster font of the corresponding Chinese character of the target word have the same number of pixels.
Method 400 may further include: in step 406, the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character of the target word is determined based on the comparison.
Specifically, in one example, the number of pixels with identical pixel values between the raster font of each Chinese character in the word to be corrected and the raster font of the corresponding Chinese character of the target word can be determined first. The font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character of the target word can then be determined based on that number. Specifically, for each Chinese character in the word to be corrected, the binary data of the character can be compared bit by bit with the binary data of the corresponding Chinese character in the target word to determine the font similarity between the two characters.
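When each glyph is held as packed binary data (one bit per pixel, so 32 bytes for a 16*16 type matrix), the bit-by-bit comparison can be done with XOR and a bit count. This is a sketch under that packing assumption; the function name and the sample data are illustrative, not real glyphs.

```python
def same_pixel_count(glyph_a: bytes, glyph_b: bytes) -> int:
    """Count bit positions where two packed glyph bitmaps agree."""
    if len(glyph_a) != len(glyph_b):
        raise ValueError("glyphs must have the same size")
    # XOR leaves a 1 bit exactly where the two glyphs differ.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(glyph_a, glyph_b))
    return len(glyph_a) * 8 - differing

# Synthetic 32-byte glyphs differing in 8 + 8 + 2 = 18 bits.
a = bytes(32)
b = bytes([0xFF, 0xFF, 0x03]) + bytes(29)
print(same_pixel_count(a, b))  # 238
```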
Alternatively, in another example, the number of pixels with identical pixel values between the raster font of each Chinese character in the word to be corrected and the raster font of the corresponding Chinese character of the target word can be determined first. The ratio of that number to the total number of pixels in the raster font can then be determined. Finally, the font similarity between each Chinese character in the word to be corrected and the corresponding Chinese character of the target word can be determined based on that ratio.
Fig. 5 shows an example flowchart of a method 500 for providing an intelligent customer service according to an embodiment of this specification.
Method 500 may include: in step 502, an intelligent customer service question provided by the user may be received. The question may, for example, be "How to buy trade and investment promotion base is complete?", "How much is this month's bill?", "Where is the express delivery?", "How is the weather today?", and so on.
Method 500 may further include: in step 504, word segmentation may be performed on the intelligent customer service question to obtain the multiple words it contains. Word segmentation can be performed in any manner known in the art and is not described further here. For example, "How to buy trade and investment promotion base is complete?" can be segmented into the words "how", "purchase", "trade and investment promotion base is complete", and "?", and "How much is this month's bill" can be segmented into words such as "this month", "bill", "have", and "how many".
Method 500 may further include: in step 506, text error correction may be performed on the multiple words using the method for text error correction described in the embodiments of this specification, with a dictionary associated with the intelligent customer service used as the target dictionary. For example, in the "How to buy trade and investment promotion base is complete?" example, text error correction can be performed separately on "how", "purchase", and "trade and investment promotion base is complete", so that "trade and investment promotion base is complete" is replaced with "trade and investment promotion fund" and the question is corrected to "How to buy trade and investment promotion fund?". The advantage of performing text error correction per word is that the target words in the target dictionary can be fully exploited, yielding more accurate text error correction.
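Steps 504 and 506 can be sketched as applying a word-level corrector to each segmented word and rejoining the result. In this illustration the segmentation is supplied by hand, the corrector is a stub standing in for the font-similarity method, and the Chinese strings are assumed originals of the translated example; none of these names come from the specification.

```python
def correct_question(words, correct_word):
    """Correct each segmented word; return the new text and the replacements."""
    corrected, replaced = [], []
    for word in words:
        fixed = correct_word(word)
        if fixed != word:
            replaced.append((word, fixed))
        corrected.append(fixed)
    return "".join(corrected), replaced

# Stub corrector (hypothetical): a real one would pick the candidate
# target word by font similarity and apply the threshold.
KNOWN_FIXES = {"招商基全": "招商基金"}

def stub_corrector(word):
    return KNOWN_FIXES.get(word, word)

segmented = ["如何", "购买", "招商基全", "？"]
text, replaced = correct_question(segmented, stub_corrector)
print(text)      # 如何购买招商基金？
print(replaced)  # [('招商基全', '招商基金')]
```

Returning the list of replacements alongside the corrected text makes it straightforward to tell the user which words were changed.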
Alternatively, the text error correction described in the embodiments of this specification can be performed character by character, without the word segmentation described above. For example, for the question "How to buy trade and investment promotion base is complete?", text error correction can be performed separately on "such as", "how", "purchase", "buying", "trick", "quotient", "base", and "complete". However, this approach may fail to replace "complete" with "gold".
Method 500 may further include: in step 508, the intelligent customer service may be provided for the question after text error correction. For example, the service can be provided for the corrected question "How to buy trade and investment promotion fund?". The semantics of "how to buy trade and investment promotion fund" can be parsed by an intelligent algorithm and the answer to the question retrieved from a database. The answer can then be presented to the user, for example the specific steps for buying the trade and investment promotion fund.
In this case, a prompt that the input text has been replaced is preferably presented to the user. For example, the following can be shown to the user: "Found the answer to 'How to buy trade and investment promotion fund?' for you. Click here to search for the answer to 'How to buy trade and investment promotion base is complete?' instead." In this display, "trade and investment promotion fund" and/or "trade and investment promotion base is complete" can be highlighted to make the replacement clear to the user. Alternatively, the prompt that the input text has been replaced may be omitted.
In another mode, instead of directly presenting the answer to "how to buy trade and investment promotion fund", the user is asked whether text error correction should be performed. For example, "Do you want to find the answer to 'How to buy trade and investment promotion fund?'" can be presented to the user together with a request to select "Yes" or "No", and subsequent processing is performed according to the user's choice. If the user selects "Yes", text error correction is performed and the answer to "how to buy trade and investment promotion fund" is presented; if the user selects "No", text error correction is not performed and the answer to "how to buy trade and investment promotion base is complete" is presented.
Fig. 6 shows an example flowchart of a method 600 for providing a vertical search service according to an embodiment of this specification.
Similar to method 500, method 600 may include: in step 602, a vertical search query provided by the user may be received. A vertical search query may, for example, be a search in a specific field. For vertical search in the shopping field, the query may be the name of a specific product, such as "Samsung Gai Dongshi mobile phone price". For vertical search in the music field, the query may, for example, be "peninsula Zhou Jielun can".
Method 600 may further include: in step 604, word segmentation may be performed on the vertical search query to obtain the multiple words it contains. As before, word segmentation can be performed in any manner known in the art and is not described further here. For example, "Samsung Gai Dongshi mobile phone price" can be segmented into words such as "Samsung", "Gai Dongshi", "mobile phone", and "price".
Method 600 may further include: in step 606, text error correction may be performed on the multiple words using the method for text error correction described in the embodiments of this specification, with a dictionary associated with the vertical search service used as the target dictionary. For example, in the shopping-field vertical search example "Samsung Gai Dongshi mobile phone price", the target dictionary may include the names of various products and associated words such as "price" and "parameter". Through text error correction, "Gai Dongshi" can be replaced with "Gai Leshi".
Alternatively, the text error correction described in the embodiments of this specification can be performed character by character, without the word segmentation described above.
Method 600 may further include: in step 608, search results may be provided for the vertical search query after text error correction. For example, search results can be provided for the corrected query "Samsung Gai Leshi mobile phone price", such as the price of the Samsung Gai Leshi mobile phone.
In this case, a prompt that the input text has been replaced is preferably presented to the user. Alternatively, the prompt may be omitted.
Likewise, before text error correction is performed, the user can first be asked whether text error correction should be carried out. Text error correction is then performed on the multiple words only if the user's confirmation is received.
It will be appreciated that the scheme disclosed in the embodiments of this specification can be used not only for intelligent customer service and vertical search services but also in various other application scenarios.
Moreover, this application also discloses a computer-readable storage medium comprising computer-executable instructions stored thereon which, when executed by a processor, cause the processor to perform the methods of the embodiments described herein.
In addition, this application also discloses a system comprising means for implementing the methods of the embodiments described herein.
It will be appreciated that the methods according to one or more embodiments of this specification can be implemented in software, firmware, or a combination thereof.
It should be understood that the embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, see the description of the method embodiments.
It should be understood that specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired result. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
It should be understood that describing an element in the singular herein, or showing only one such element in the drawings, does not limit the number of that element to one. In addition, modules or elements described or shown herein as separate can be combined into a single module or element, and a module or element described or shown herein as single can be split into multiple modules or elements.
It should also be understood that the terms and expressions used herein are for description only, and the one or more embodiments of this specification should not be limited to them. The use of these terms and expressions is not intended to exclude any equivalents of the features shown and described (or parts thereof), and it should be recognized that various possible modifications also fall within the scope of the claims. Other modifications, variations, and alternatives are also possible. Accordingly, the claims should be regarded as covering all such equivalents.
Likewise, it should be noted that although the description refers to the current specific embodiments, those of ordinary skill in the art should appreciate that the above embodiments are intended only to illustrate one or more embodiments of this specification, and that various equivalent changes or substitutions can be made without departing from the spirit of the invention. Therefore, variations and modifications of the above embodiments that remain within the essential spirit of the invention all fall within the scope of the claims.

Claims (20)

1. A method for text error correction, characterized in that the method comprises:
receiving a word to be corrected;
for each target word in a target dictionary, calculating a glyph similarity between the word to be corrected and the target word;
determining the target word in the target dictionary that has the maximum glyph similarity with the word to be corrected as a candidate target word; and
if the glyph similarity between the word to be corrected and the candidate target word is within a predefined threshold range, replacing the word to be corrected with the candidate target word.
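The steps of claim 1 can be sketched as a simple loop over the dictionary. This is a minimal illustration, not the patent's implementation: the function name `correct_word`, the default threshold range, the sample dictionary, and `toy_similarity` (a stand-in that only counts identical characters and is not a glyph similarity) are all assumptions made for the example.

```python
from typing import Callable, Iterable, Tuple


def toy_similarity(a: str, b: str) -> float:
    """Placeholder score (assumption): share of positions with the same character."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def correct_word(
    word: str,
    target_dictionary: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold_range: Tuple[float, float] = (0.8, 1.0),  # hypothetical default
) -> str:
    """Return the most similar target word if its score is in range, else `word`."""
    best_word, best_score = None, float("-inf")
    # Score every target word and keep the one with the maximum similarity.
    for target in target_dictionary:
        score = similarity(word, target)
        if score > best_score:
            best_word, best_score = target, score
    low, high = threshold_range
    # Replace only when the best score falls inside the predefined range.
    if best_word is not None and low <= best_score <= high:
        return best_word
    return word


dictionary = ["账户", "余额", "转账"]  # illustrative target dictionary
print(correct_word("帐户", dictionary, toy_similarity, (0.5, 1.0)))
```

In practice the `similarity` argument would be a glyph-based measure such as the ones described in the dependent claims; the threshold range controls how aggressively words are replaced.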
2. The method of claim 1, characterized in that calculating the glyph similarity between the word to be corrected and the target word comprises:
calculating the glyph similarity based on the dot-matrix fonts of the Chinese characters in the word to be corrected and the dot-matrix fonts of the Chinese characters in the target word.
3. The method of claim 1, characterized in that calculating the glyph similarity between the word to be corrected and the target word comprises:
calculating a glyph similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word; and
determining the glyph similarity between the word to be corrected and the target word based on the average and/or the minimum of the glyph similarities between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word.
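The aggregation in claim 3 can be sketched as follows. The function name `word_similarity`, the `mode` parameter, and the choice to score words of different lengths as 0.0 are assumptions for illustration; the claim itself only states that the average and/or minimum of the per-character similarities is used.

```python
from statistics import mean
from typing import Callable


def word_similarity(
    word: str,
    target: str,
    char_similarity: Callable[[str, str], float],
    mode: str = "mean",
) -> float:
    """Combine per-character similarities into one word-level score."""
    if mode not in ("mean", "min"):
        raise ValueError("mode must be 'mean' or 'min'")
    # Assumption: characters correspond by position, so lengths must match.
    if len(word) != len(target) or not word:
        return 0.0
    scores = [char_similarity(a, b) for a, b in zip(word, target)]
    return min(scores) if mode == "min" else mean(scores)


def demo_char_similarity(a: str, b: str) -> float:
    """Hypothetical per-character score used only for this example."""
    return 1.0 if a == b else 0.25


print(word_similarity("帐户", "账户", demo_char_similarity, "mean"))
print(word_similarity("帐户", "账户", demo_char_similarity, "min"))
```

The minimum is the stricter choice: a single dissimilar character pulls the whole word's score down, whereas the average lets other characters compensate.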
4. The method of claim 3, characterized in that calculating the glyph similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word comprises:
determining the dot-matrix font of each Chinese character in the word to be corrected;
comparing the dot-matrix font of each Chinese character in the word to be corrected with the dot-matrix font of the corresponding Chinese character in the target word; and
determining, based on the comparison, the glyph similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word.
5. The method of claim 4, characterized in that the dot-matrix font of each Chinese character in the word to be corrected and the dot-matrix font of the corresponding Chinese character in the target word have the same number of pixels.
6. The method of claim 4, characterized in that determining, based on the comparison, the glyph similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word comprises:
determining the number of pixels having identical pixel values between the dot-matrix font of each Chinese character in the word to be corrected and the dot-matrix font of the corresponding Chinese character in the target word; and
determining, based on the number of pixels having identical pixel values, the glyph similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word.
7. The method of claim 4, characterized in that determining, based on the comparison, the glyph similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word comprises:
determining the number of pixels having identical pixel values between the dot-matrix font of each Chinese character in the word to be corrected and the dot-matrix font of the corresponding Chinese character in the target word;
determining the ratio of the number of pixels having identical pixel values to the total number of pixels of all Chinese characters in the word to be corrected; and
determining, based on the ratio, the glyph similarity between each Chinese character in the word to be corrected and the corresponding Chinese character in the target word.
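The pixel comparison in claims 4 to 7 can be illustrated with small grids. This is a sketch under assumptions: glyphs are represented as equal-sized lists of strings of "0" and "1", the names `pixel_match_count` and `pixel_match_ratio` are hypothetical, and the ratio here is normalized by the pixel count of a single glyph, whereas claim 7 words the denominator as the total pixel count of all Chinese characters in the word.

```python
from typing import Sequence


def pixel_match_count(glyph_a: Sequence[str], glyph_b: Sequence[str]) -> int:
    """Count pixels whose values are identical in two same-sized glyph grids."""
    same_shape = len(glyph_a) == len(glyph_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(glyph_a, glyph_b)
    )
    if not same_shape:
        raise ValueError("glyphs must have the same number of pixels")
    return sum(
        pixel_a == pixel_b
        for row_a, row_b in zip(glyph_a, glyph_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def pixel_match_ratio(glyph_a: Sequence[str], glyph_b: Sequence[str]) -> float:
    """Share of identical pixels, normalized by one glyph's pixel count."""
    total = sum(len(row) for row in glyph_a)
    if total == 0:
        raise ValueError("glyphs must not be empty")
    return pixel_match_count(glyph_a, glyph_b) / total


# Two illustrative 3x3 grids that differ only in the center pixel.
grid_a = ["010", "111", "010"]
grid_b = ["010", "101", "010"]
print(pixel_match_count(grid_a, grid_b), pixel_match_ratio(grid_a, grid_b))
```

The shape check reflects claim 5, which requires both dot-matrix fonts to have the same number of pixels before they are compared.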
8. The method of claim 4, characterized in that determining the dot-matrix font of each Chinese character in the word to be corrected comprises:
for each Chinese character in the word to be corrected, representing the Chinese character in GBK encoding;
obtaining the byte information of the Chinese character represented in GBK encoding;
determining the zone-position code and the offset of the Chinese character based on the byte information; and
locating, based on the offset of the Chinese character, the position of the character's glyph in a dot-matrix font library, so as to obtain the binary data of the Chinese character.
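The lookup in claim 8 can be sketched as below. The claim does not name a font file or glyph size, so this example assumes a 16x16 dot-matrix font library laid out in GB2312 zone-position order with 32 bytes per glyph (the layout commonly used by HZK16-style files). The names `GLYPH_BYTES`, `glyph_offset`, and `read_glyph` are hypothetical, and characters outside the GB2312 zone-position range are rejected rather than mapped.

```python
GLYPH_BYTES = 32        # assumption: 16x16 pixels, 1 bit per pixel
POSITIONS_PER_ZONE = 94  # GB2312 arranges characters in a 94x94 grid


def glyph_offset(char: str) -> int:
    """Byte offset of a character's glyph in the assumed font library layout."""
    data = char.encode("gbk")
    if len(data) != 2:
        raise ValueError("expected a double-byte GBK character")
    high, low = data
    # Only the GB2312-compatible byte range has a zone-position code.
    if not (0xA1 <= high <= 0xF7 and 0xA1 <= low <= 0xFE):
        raise ValueError("character is outside the zone-position range")
    zone = high - 0xA0      # zone number, starting at 1
    position = low - 0xA0   # position within the zone, starting at 1
    return ((zone - 1) * POSITIONS_PER_ZONE + (position - 1)) * GLYPH_BYTES


def read_glyph(char: str, font_path: str) -> bytes:
    """Read the glyph's binary data from a font library file (path is hypothetical)."""
    with open(font_path, "rb") as font_file:
        font_file.seek(glyph_offset(char))
        data = font_file.read(GLYPH_BYTES)
    if len(data) != GLYPH_BYTES:
        raise ValueError("font library is too short for this character")
    return data


print(glyph_offset("啊"))
```

For example, "啊" is encoded in GBK as the bytes 0xB0 0xA1, which gives zone 16 and position 1, so its glyph would start at byte 45120 under the assumed layout.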
9. The method of claim 8, characterized in that the method further comprises:
for each Chinese character in the word to be corrected, performing a bit-by-bit comparison of the binary data of the Chinese character with the binary data of the corresponding Chinese character in the target word, to determine the glyph similarity between the Chinese character and the corresponding Chinese character in the target word.
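A comparison of two glyphs' binary data, as in claim 9, can be sketched with XOR and a bit count. The function name `bitwise_similarity` and the choice to report the share of matching bits are assumptions for illustration; the claim only states that the binary data of the two characters is compared to determine the similarity.

```python
def bitwise_similarity(data_a: bytes, data_b: bytes) -> float:
    """Share of bits that are identical in two equal-length byte strings."""
    if len(data_a) != len(data_b) or not data_a:
        raise ValueError("binary data must be non-empty and of equal length")
    # XOR leaves a 1 bit wherever the two glyphs differ.
    differing_bits = sum(bin(a ^ b).count("1") for a, b in zip(data_a, data_b))
    total_bits = 8 * len(data_a)
    return (total_bits - differing_bits) / total_bits


# Two illustrative 2-byte patterns that differ in 4 of 16 bits.
print(bitwise_similarity(b"\xff\x00", b"\xff\x0f"))
```

Working directly on the bytes avoids unpacking each glyph into a pixel grid first, which matters when every input word is scored against a whole dictionary.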
10. The method of claim 1, characterized in that the method further comprises:
obtaining the target dictionary, the target dictionary being composed of words associated with a specific scenario.
11. a kind of for providing the method for intelligent customer service service, which is characterized in that the described method includes:
The intelligent customer service problem provided by the user is provided;
Word segmentation processing is executed to the intelligent customer service problem, to obtain the multiple words for including in the intelligent customer service problem;
Text is executed to the multiple word using the method such as of any of claims 1-10 for text error correction Error correction, wherein using dictionary associated with intelligent customer service service as target dictionary;
Intelligent customer service service is provided for the intelligent customer service problem through text error correction.
12. The method of claim 11, characterized in that the method further comprises:
before performing text error correction on the plurality of words, requesting the user to confirm whether text error correction is to be performed; and
performing text error correction on the plurality of words only if the user's confirmation to perform text error correction is received.
13. The method of claim 11, characterized in that the method further comprises:
informing the user that the intelligent customer service is provided based on the intelligent customer service question after text error correction.
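The flow of claims 11 to 13 (segment, optionally confirm, correct, and report whether a corrected question was used) can be sketched as below. Everything here is illustrative: `correct_question`, the fixed-width `segment_pairs` segmenter, the `fixes` lookup, and the `confirm` callback are hypothetical stand-ins, and a real system would use a proper Chinese word segmenter and the glyph-based correction of claims 1-10.

```python
from typing import Callable, List, Optional, Tuple


def correct_question(
    question: str,
    segment: Callable[[str], List[str]],
    correct_word: Callable[[str], str],
    confirm: Optional[Callable[[str], bool]] = None,
) -> Tuple[str, bool]:
    """Return the (possibly corrected) question and whether it was changed."""
    words = segment(question)
    # Claim 12: correct only if the user confirms (when a callback is given).
    if confirm is not None and not confirm(question):
        return question, False
    # Assumption: segmentation keeps every character in order, so the
    # corrected words can simply be joined back together.
    corrected = "".join(correct_word(word) for word in words)
    # Claim 13: the flag lets the caller tell the user a corrected
    # question was used.
    return corrected, corrected != question


def segment_pairs(text: str) -> List[str]:
    """Toy segmenter (assumption): split the text into two-character chunks."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]


fixes = {"帐户": "账户"}  # illustrative correction table
result = correct_question("查询帐户余额", segment_pairs, lambda w: fixes.get(w, w))
print(result)
```

The same structure would apply to the vertical search method, with the search query in place of the customer service question and a search-specific dictionary behind `correct_word`.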
14. a kind of for providing the method for vertical search service, which is characterized in that the described method includes:
It receives and is inquired by the vertical search that the user provides;
To the vertical search query execution word segmentation processing, to obtain the multiple words for including in the vertical search inquiry;
Text is executed to the multiple word using the method such as of any of claims 1-10 for text error correction Error correction, wherein using dictionary associated with vertical search service as target dictionary;
Search result is provided for the vertical search inquiry through text error correction.
15. The method of claim 14, characterized in that the method further comprises:
before performing text error correction on the plurality of words, requesting the user to confirm whether text error correction is to be performed; and
performing text error correction on the plurality of words only if the user's confirmation to perform text error correction is received.
16. The method of claim 14, characterized in that the method further comprises:
informing the user that the search results are provided based on the vertical search query after text error correction.
17. A system for text error correction, characterized in that the system comprises:
means for receiving a word to be corrected;
means for calculating, for each target word in a target dictionary, a glyph similarity between the word to be corrected and the target word;
means for determining the target word in the target dictionary that has the maximum glyph similarity with the word to be corrected as a candidate target word; and
means for replacing the word to be corrected with the candidate target word if the glyph similarity between the word to be corrected and the candidate target word is within a predefined threshold range.
18. A system for providing an intelligent customer service, characterized in that the system comprises:
means for receiving an intelligent customer service question provided by a user;
means for performing word segmentation on the intelligent customer service question to obtain a plurality of words contained in the intelligent customer service question;
means for performing text error correction on the plurality of words using the method for text error correction of any one of claims 1-10, wherein a dictionary associated with the intelligent customer service is used as the target dictionary; and
means for providing the intelligent customer service for the intelligent customer service question after text error correction.
19. A system for providing a vertical search service, characterized in that the system comprises:
means for receiving a vertical search query provided by a user;
means for performing word segmentation on the vertical search query to obtain a plurality of words contained in the vertical search query;
means for performing text error correction on the plurality of words using the method for text error correction of any one of claims 1-10, wherein a dictionary associated with the vertical search service is used as the target dictionary; and
means for providing search results for the vertical search query after text error correction.
20. A computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform the method of any one of claims 1-10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910318541.1A CN110147549A (en) 2019-04-19 2019-04-19 For executing the method and system of text error correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910318541.1A CN110147549A (en) 2019-04-19 2019-04-19 For executing the method and system of text error correction

Publications (1)

Publication Number Publication Date
CN110147549A true CN110147549A (en) 2019-08-20

Family

ID=67589661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910318541.1A Pending CN110147549A (en) 2019-04-19 2019-04-19 For executing the method and system of text error correction

Country Status (1)

Country Link
CN (1) CN110147549A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674276A (en) * 2019-09-23 2020-01-10 深圳前海微众银行股份有限公司 Robot self-learning method, robot terminal, device and readable storage medium
CN110705536A (en) * 2019-09-24 2020-01-17 北京字节跳动网络技术有限公司 Chinese character recognition error correction method and device, computer readable medium and electronic equipment
CN111967246A (en) * 2020-07-30 2020-11-20 湖南大学 Error correction method for shopping bill recognition result
CN112395864A (en) * 2020-11-26 2021-02-23 北京世纪好未来教育科技有限公司 Text error correction model training method, text error correction method and related device
CN112396049A (en) * 2020-11-19 2021-02-23 平安普惠企业管理有限公司 Text error correction method and device, computer equipment and storage medium
CN112733529A (en) * 2019-10-28 2021-04-30 阿里巴巴集团控股有限公司 Text error correction method and device
CN112883718A (en) * 2021-04-27 2021-06-01 恒生电子股份有限公司 Spelling error correction method and device based on Chinese character sound-shape similarity and electronic equipment
CN113536776A (en) * 2021-06-22 2021-10-22 深圳价值在线信息科技股份有限公司 Confusion statement generation method, terminal device and computer-readable storage medium
CN114328831A (en) * 2021-12-24 2022-04-12 江苏银承网络科技股份有限公司 Bill information identification and error correction method and device
WO2023005293A1 (en) * 2021-07-30 2023-02-02 平安科技(深圳)有限公司 Text error correction method, apparatus, and device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101964048A (en) * 2010-07-19 2011-02-02 安徽科大讯飞信息科技股份有限公司 Character recognition method and system
US20140104175A1 (en) * 2012-10-16 2014-04-17 Google Inc. Feature-based autocorrection
CN103853702A (en) * 2012-12-06 2014-06-11 富士通株式会社 Device and method for correcting idiom error in linguistic data
CN108154167A (en) * 2017-12-04 2018-06-12 昆明理工大学 A kind of Chinese character pattern similarity calculating method
CN108280051A (en) * 2018-01-22 2018-07-13 清华大学 Detection method, device and the equipment of error character in a kind of text data
CN108717412A (en) * 2018-06-12 2018-10-30 北京览群智数据科技有限责任公司 Chinese check and correction error correction method based on Chinese word segmentation and system

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674276A (en) * 2019-09-23 2020-01-10 深圳前海微众银行股份有限公司 Robot self-learning method, robot terminal, device and readable storage medium
CN110705536A (en) * 2019-09-24 2020-01-17 北京字节跳动网络技术有限公司 Chinese character recognition error correction method and device, computer readable medium and electronic equipment
CN112733529A (en) * 2019-10-28 2021-04-30 阿里巴巴集团控股有限公司 Text error correction method and device
CN112733529B (en) * 2019-10-28 2023-09-29 阿里巴巴集团控股有限公司 Text error correction method and device
CN111967246A (en) * 2020-07-30 2020-11-20 湖南大学 Error correction method for shopping bill recognition result
CN112396049A (en) * 2020-11-19 2021-02-23 平安普惠企业管理有限公司 Text error correction method and device, computer equipment and storage medium
CN112395864A (en) * 2020-11-26 2021-02-23 北京世纪好未来教育科技有限公司 Text error correction model training method, text error correction method and related device
CN112395864B (en) * 2020-11-26 2021-04-06 北京世纪好未来教育科技有限公司 Text error correction model training method, text error correction method and related device
CN112883718A (en) * 2021-04-27 2021-06-01 恒生电子股份有限公司 Spelling error correction method and device based on Chinese character sound-shape similarity and electronic equipment
CN113536776A (en) * 2021-06-22 2021-10-22 深圳价值在线信息科技股份有限公司 Confusion statement generation method, terminal device and computer-readable storage medium
WO2023005293A1 (en) * 2021-07-30 2023-02-02 平安科技(深圳)有限公司 Text error correction method, apparatus, and device, and storage medium
CN114328831A (en) * 2021-12-24 2022-04-12 江苏银承网络科技股份有限公司 Bill information identification and error correction method and device

Similar Documents

Publication Publication Date Title
CN110147549A (en) For executing the method and system of text error correction
RU2661750C1 (en) Symbols recognition with the use of artificial intelligence
CN108959246A (en) Answer selection method, device and electronic equipment based on improved attention mechanism
CN107729309A (en) A kind of method and device of the Chinese semantic analysis based on deep learning
CN113011186B (en) Named entity recognition method, named entity recognition device, named entity recognition equipment and computer readable storage medium
CN111783471B (en) Semantic recognition method, device, equipment and storage medium for natural language
CN111414561B (en) Method and device for presenting information
CN111459977B (en) Conversion of natural language queries
CN110750624A (en) Information output method and device
CN110704586A (en) Information processing method and system
CN111985243B (en) Emotion model training method, emotion analysis device and storage medium
CN114596566B (en) Text recognition method and related device
CN113158656B (en) Ironic content recognition method, ironic content recognition device, electronic device, and storage medium
KR20200139008A (en) User intention-analysis based contract recommendation and autocomplete service using deep learning
CN113673432A (en) Handwriting recognition method, touch display device, computer device and storage medium
CN112464927B (en) Information extraction method, device and system
CN112926471A (en) Method and device for identifying image content of business document
WO2022256144A1 (en) Application-specific optical character recognition customization
CN114461806A (en) Training method and device of advertisement recognition model and advertisement shielding method
CN112561530A (en) Transaction flow processing method and system based on multi-model fusion
CN116909435A (en) Data processing method and device, electronic equipment and storage medium
CN117131155A (en) Multi-category identification method, device, electronic equipment and storage medium
Strauß et al. System Description of CITlab's Recognition & Retrieval Engine for ICDAR2017 Competition on Information Extraction in Historical Handwritten Records
CN116225956A (en) Automated testing method, apparatus, computer device and storage medium
CN113837157B (en) Topic type identification method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200921

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200921

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190820