CN1116343A - Chinese wrongly writen character automatic correcting method and device - Google Patents


Info

Publication number
CN1116343A
Authority
CN
China
Prior art keywords
word
literal
scoring
wrongly written
candidate character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 94109394
Other languages
Chinese (zh)
Other versions
CN1056933C (en)
Inventor
张照煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial Technology Research Institute ITRI
Original Assignee
Industrial Technology Research Institute ITRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial Technology Research Institute ITRI filed Critical Industrial Technology Research Institute ITRI
Priority to CN94109394A priority Critical patent/CN1056933C/en
Publication of CN1116343A publication Critical patent/CN1116343A/en
Application granted granted Critical
Publication of CN1056933C publication Critical patent/CN1056933C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Landscapes

  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

The method compares the selected candidate character string with the original characters in the text to find wrongly written or mispronounced characters and to provide the correct characters. The invention has practical value in Chinese word processing.

Description

Chinese wrongly writen character automatic correcting method and device
The present invention relates to a method and device for automatically correcting wrongly written Chinese characters, and in particular to a method and device that combine comprehensive approximate word collection replacement with language model scoring: characters close in shape, sound, meaning, or input code are substituted to produce candidate character strings, and the highest-scoring candidate character string is found so as to obtain the correct characters.
A "wrongly written character" originally meant a Chinese character rendered incorrectly because strokes were added, omitted, or altered, or because a radical was misplaced, while a "malapropism" meant using some other character in place of the intended one. Some people now use "wrongly written character" to cover both; both are referred to below as "wrongly written or mispronounced characters".
The number of wrongly written or mispronounced characters strongly affects the quality of a document. Traditional manual proofreading, repeated pass after pass, is time-consuming and laborious and still misses errors: even newspapers, magazines, and books that were proofread many times before publication commonly contain many wrong characters. In recent years, with the spread of computers, documents entered into a computer no longer contain characters malformed by stroke errors, but errors caused by the input process have appeared in their place. The need to use computers to detect and correct wrongly written or mispronounced characters automatically is therefore pressing.
"Detecting wrongly written or mispronounced characters" means locating them in a document; "correcting wrongly written or mispronounced characters" means finding the correct character that corresponds to each of them. Known techniques, such as commercial Chinese proofreading systems, only detect and do not correct, whereas the present invention both detects and corrects.
Wrongly written or mispronounced characters in a computer document, whether they arise while the text is composed or while it is entered and edited, can be divided into the following four classes, or are caused jointly by two or more of them:
(1) homophones or near-homophones, whose pronunciation is identical or close,
Example 1: " Hang ” Trace suspicious (shape)
Example 2: press " step " with regard to class (portion)
(2) characters of similar shape,
Example 3: tea " Pot " (Pot)
Example 4: defend " mattress " (bacterium)
(3) characters of similar meaning,
Example 5: previously " do not study carefully " (fault)
Example 6: name is " symbol " Real (pair) not
(4) input operation errors, that is, wrongly written characters caused by similar input codes, or missing characters, superfluous characters, or transposed adjacent characters produced by editing mistakes,
Example 7: " Shito ” System (Xi; the Cangjie input codes are VIF and HVIF, respectively)
Example 8: " clod of earth " " bank " (frustration), Xi is used to " being used to " ()
Based on this analysis, the characters that people commonly confuse because they are close in shape, sound, meaning, or input code are compiled into a comprehensive approximate word collection database. The database is used to replace characters in the original document and so produce candidate character strings; this forms the basis of the present invention.
The comprehensive Chinese language model score consists of a base language model score and a "non-original-word deduction of points".
The base language model score can use known statistical scoring, such as a character continuation table, a word continuation table, an inter-word character continuation table, a part-of-speech continuation table, a word-class continuation table, or scoring based on the word lengths and word frequencies of a dictionary, expressed as probability values or score values. The "non-original-word deduction of points" applies a graded or ungraded (stepless) deduction to approximate characters that are not the original character.
Using the comprehensive language model score, the highest-scoring candidate character string is found and compared with the characters in the original document. This automatically locates the wrongly written or mispronounced characters in the document and supplies the corresponding correct characters, which is of great practical value.
In the prior art, the automatic detection method proposed in Taiwan patent application No. 81104438, "Chinese wrongly written character Auto-Sensing method and arrangement for detecting", mainly comprises two steps: (1) a pseudo word-segmentation step, which consults a dictionary to find and take out the single characters that cannot form a multi-character word with adjacent characters; and (2) a judgment step, which decides whether each extracted single character is correct according to its character frequency and its continuation strength with the preceding and following characters. This method has two shortcomings: (1) the false alarm rate is too high, with on average only one truly wrong character in every 40 characters flagged as wrong; and (2) it does not provide the corresponding correct character.
There are also Taiwan patent applications No. 80102492, "wrongly written character that improves Chinese discrimination power is more executed", and No. 80107315, "document identification correcting device". These correct wrong characters among the multiple candidate recognition results produced by a character recognition device and are unrelated to the present invention.
There are also U.S. patents such as Nos. 4,689,768 (1987), 4,783,758 (1988), 4,903,206 (1988), 4,829,472 (1989), and 5,148,367 (1992), which address spelling checking and correction for Western languages such as English. Because the characteristics of those languages differ greatly from those of Chinese, they are unrelated to the present invention.
Earlier Chinese document proofreading systems related to the present invention all rely on detecting, after word segmentation, the frequency of single characters and their continuation strength with the preceding and following characters. They therefore suffer from a high false alarm rate and from the inability to provide the corresponding correct character. To overcome these shortcomings, the present invention provides a method and device for automatically detecting and correcting wrongly written or mispronounced Chinese characters.
A first object of the present invention is to provide a novel method and device for automatically detecting and correcting wrongly written or mispronounced Chinese characters.
A second object of the present invention is to provide the correct character corresponding to each detected wrongly written or mispronounced character, for use in correction.
A further object of the present invention is to reduce the false alarm rate of detection and so improve the efficiency of automatic proofreading.
To achieve the above objects, the method of the present invention, by which a computer automatically detects and corrects wrongly written or mispronounced characters in a Chinese document, comprises the following steps:
A comprehensive approximate word collection replacement step, in which the characters in the document are replaced with the characters of a comprehensive approximate word collection of characters close in shape, sound, meaning, or input code, and combined into a plurality of candidate character strings;
A language model scoring step, in which each candidate character string is scored with a statistical language model and the highest-scoring candidate character string is found; and
A wrongly written or mispronounced character determining step, in which the highest-scoring candidate character string is compared character by character with the characters in the document, and the characters that differ are flagged as wrongly written or mispronounced characters.
Likewise, the device of the present invention, by which a computer automatically detects and corrects wrongly written or mispronounced characters in a Chinese document, comprises:
A comprehensive approximate word collection replacement device, for replacing the characters in the document with characters close in shape, sound, meaning, or input code, so as to combine them into a plurality of candidate character strings;
A language model scoring device, for scoring each candidate character string and finding the highest-scoring candidate character string; and
A wrongly written or mispronounced character judgment device, for comparing the highest-scoring candidate character string character by character with the characters in the document and flagging the characters that differ as wrongly written or mispronounced characters.
To show the device and method of the present invention clearly, they are described in detail below with reference to the drawings:
Fig. 1 is a block diagram of an embodiment of the device of the present invention for detecting and correcting wrongly written or mispronounced Chinese characters.
Fig. 2 is a flow chart of the method of the present invention for detecting and correcting wrongly written or mispronounced Chinese characters.
Fig. 3 shows part of the comprehensive approximate word collection database of the embodiment of the invention.
Fig. 4 is an input example sentence containing four wrongly written or mispronounced characters.
Fig. 5 is the result of applying approximate word collection replacement to this input example sentence.
Fig. 6 shows the five highest-scoring candidate character strings for this input example sentence after language model scoring.
Fig. 7 is the output result after the embodiment of the invention processes this example sentence.
The composition of an embodiment of the device of the present invention for detecting and correcting wrongly written or mispronounced Chinese characters is shown in Fig. 1.
The device mainly comprises: an input device 100, a comprehensive approximate word collection replacement device 120, a language model scoring device 140, and a wrongly written or mispronounced character judgment device 170, 180.
The input device 100 inputs the Chinese document 110 provided by the user. It may include a segmenting device that, before replacement, first divides the text of the document into a plurality of processing units at punctuation marks.
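The segmentation performed by the input device can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_units` is hypothetical, and treating both ASCII and fullwidth forms of the comma, period, question mark, exclamation mark, semicolon, and colon as delimiters is an assumption.

```python
import re

# Punctuation marks that close a processing unit. Including the fullwidth
# forms is an assumption; the source names the marks only in translation.
DELIMITERS = ",.?!;:，。？！；："


def split_into_units(text):
    """Split text into processing units, dropping the punctuation itself."""
    pattern = "[" + re.escape(DELIMITERS) + "]"
    return [unit for unit in re.split(pattern, text) if unit.strip()]


# Latin letters stand in for Chinese characters in this toy input.
units = split_into_units("abc，def。gh！")
print(units)  # ['abc', 'def', 'gh']
```

Each returned unit would then go through replacement, scoring, and comparison on its own, which keeps the number of candidate character strings per unit manageable.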
The comprehensive approximate word collection replacement device 120 replaces each character in the document 110 with characters close to it in shape, sound, meaning, or input code, so as to combine them into a plurality of candidate character strings. It comprises: (a) a comprehensive approximate word collection database device, in which each character of the Chinese character set is associated with the original character and one or more characters close to it in shape, sound, meaning, or input code, and in which the approximate characters of each character may also be divided into a plurality of grades; and (b) a replacement device, which replaces a character with its approximate characters from the comprehensive approximate word collection database device.
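One possible in-memory layout for such a database is sketched below in Python. The class `ApproxEntry`, the dictionary `APPROX_DB`, the function `alternatives`, and the convention that grade 1 is the closest grade are all assumptions made for illustration; the letter codes S, P, M, and I follow the shape, sound, meaning, and input-code labels used in the description of Fig. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApproxEntry:
    char: str       # the approximate character
    kind: str       # "S" shape, "P" sound, "M" meaning, "I" input code
    grade: int = 1  # optional grade; 1 = closest (assumed convention)


# Hypothetical toy database: original character -> its approximate characters.
# Latin letters stand in for Chinese characters.
APPROX_DB = {
    "a": [ApproxEntry("b", "S"), ApproxEntry("c", "P", grade=2)],
    "d": [ApproxEntry("e", "I")],
}


def alternatives(ch, max_grade=None):
    """Return the original character followed by its approximate characters."""
    entries = APPROX_DB.get(ch, [])
    if max_grade is not None:
        entries = [e for e in entries if e.grade <= max_grade]
    return [ch] + [e.char for e in entries]


print(alternatives("a"))               # ['a', 'b', 'c']
print(alternatives("a", max_grade=1))  # ['a', 'b']
```

Keeping the original character first in the returned list matches the convention used later in the description, where index 1 denotes the original character.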
The language model scoring device 140 scores each candidate character string and finds the highest-scoring candidate character string. It comprises: (a) a language model statistical database, which records the frequency of occurrence of each linguistic unit and the frequency with which linguistic units follow one another, and which may also include a Chinese lexicon recording the part of speech of each word; (b) a scoring device, which evaluates the score of a character string according to the linguistic units it contains and the language model statistical database, and which deducts points for characters that do not come from the original document; and (c) a highest-scoring candidate character string search device, which determines the candidate character string with the highest score. The present embodiment searches for the highest-scoring candidate character string by dynamic programming.
The wrongly written or mispronounced character judgment device 170, 180 compares the highest-scoring candidate character string character by character with the characters in the document and flags the characters that differ as wrongly written or mispronounced characters. It comprises: (a) a comparison device 170, which compares the highest-scoring candidate character string character by character with the characters of the document; and (b) a flagging device 180, which flags the characters that differ in the comparison as wrongly written or mispronounced characters, judges the corresponding character in the highest-scoring candidate character string to be the correct character for each flagged character, and outputs the result as the flagged document 190.
The processing flow of the method of the present invention for detecting and correcting wrongly written or mispronounced Chinese characters is shown in Fig. 2. The method, by which a computer automatically detects wrongly written or mispronounced characters in a Chinese document, comprises the following steps. Input step 200: a Chinese document 110 is input, and its text may first be divided into a plurality of processing units at punctuation marks such as ",", ".", "?", "!", ";", and ":". Steps 230 to 290 are carried out on the character string of each processing unit, and the method ends (end step 220) once every processing unit has been processed. Comprehensive approximate word collection replacement step 230: the characters in the document 110 are replaced with the characters of the comprehensive approximate word collection 130 of characters close in shape (S), sound (P), meaning (M), or input code (I), and combined into a plurality of candidate character strings; for each character, the comprehensive approximate word collection contains the original character and one or more characters close to it in shape, sound, meaning, or input code, and the approximate characters of each character may also be divided into a plurality of grades. Language model scoring step 240: each candidate character string is scored with a statistical language model 250, the scoring deducting points for characters that do not come from the original document, and the highest-scoring candidate character string is searched for by Viterbi dynamic programming (260). Wrongly written or mispronounced character determining step 270: the highest-scoring candidate character string is compared character by character with the characters in the document, the characters that differ are flagged as wrongly written or mispronounced characters (280), and the corresponding character in the highest-scoring candidate character string is judged to be the correct character for each flagged character. Finally, the flagged result is output (290) as a flagged document (190).
An example is now given to illustrate how the invention is carried out.
Suppose the "comprehensive approximate word collection" is:
One:
People: go into S
Power: Li P Reed P cutter S sword S
Oneself: S S in the sixth of the twelve Earthly Branches second S
Do: sweet P universe P thousand S
Shoot a retrievable arrow: dagger-axe S
Smelting: control S
Study carefully: M is with regard to P for fault
Sharp: Li M clever S cuts the S S that stops and declares S power P
Anxious: loft P disease M
Yarn: be I
Venerate: venerate S Only S whetstone S root S and lick S Paper S Arrived S to S
The region between the heart and the diaphragm: educate the blind S of S
The replacement step takes the character string of one punctuation-delimited segment as its processing unit. Let the original sentence be
S = C1, C2, ..., Cn
Replacing each Chinese character Cj through the comprehensive approximate word collection produces the candidate character strings
P(i1, i2, ..., in) = c1(i1), c2(i2), ..., cn(in)
where cj(ij) is the ij-th of the mj alternatives for the j-th character (the original character together with its approximate characters), 1 <= ij <= mj (ij = 1 denotes the original character), and 1 <= j <= n.
In total, m1 * m2 * ... * mn candidate character strings are formed.
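The enumeration of candidate character strings can be sketched in Python with `itertools.product`. The function names and the toy collection below are hypothetical; Latin letters stand in for Chinese characters.

```python
from itertools import product
from math import prod


def build_options(sentence, approx):
    """For each position j, list c_j(1), ..., c_j(m_j); index 1 is the original."""
    return [[ch] + [a for a in approx.get(ch, []) if a != ch] for ch in sentence]


def candidate_strings(sentence, approx):
    """Return (number of candidates, iterator over all candidate strings)."""
    options = build_options(sentence, approx)
    total = prod(len(opts) for opts in options)  # m1 * m2 * ... * mn
    return total, ("".join(chars) for chars in product(*options))


# Hypothetical toy collection: "a" has one approximate character, "c" has two.
toy_approx = {"a": ["x"], "c": ["y", "z"]}
total, candidates = candidate_strings("abc", toy_approx)
candidate_list = list(candidates)
print(total)           # 6
print(candidate_list)  # ['abc', 'aby', 'abz', 'xbc', 'xby', 'xbz']
```

The first string produced is always the original sentence, because every position lists its original character first. The count grows multiplicatively with sentence length, which is why segmenting at punctuation and searching with dynamic programming matter in practice.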
Each candidate character string is scored with the language model, the score including a "non-original-word deduction of points", and the highest-scoring candidate character string is found.
The comprehensive Chinese language model score comprises a base language model score and the "non-original-word deduction of points".
The base language model score can use known statistical scoring, such as a character continuation table, a word continuation table, an inter-word character continuation table, a part-of-speech continuation table, a word-class continuation table, or scoring based on the word lengths and word frequencies of a dictionary, expressed as probability values or score values. The "non-original-word deduction of points" applies a graded or ungraded (stepless) deduction to approximate characters that are not the original character.
The base language model used in the present embodiment is a joint score of the inter-word character continuation table and the word frequencies in the dictionary. The "non-original-word deduction of points" is obtained by weighting the number of approximate (non-original) characters used in the candidate character string P(i1, i2, ..., in):
Penalty(P(i1, i2, ..., in)) = W * (number of positions j with ij != 1)
FinalScore = BaseScore + Penalty
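The penalty formula can be illustrated with a short Python sketch. The function names are hypothetical, and the default weight W = -2 is an inference rather than a stated value: it is consistent with the scoring example later in the description, where the top candidate is scored 189-8=181 and four wrongly written or mispronounced characters are corrected.

```python
def penalty(original, candidate, weight=-2.0):
    """W times the number of positions where a non-original character is used."""
    if len(original) != len(candidate):
        raise ValueError("candidate must have the same length as the original")
    replaced = sum(1 for o, c in zip(original, candidate) if o != c)
    return weight * replaced


def final_score(base_score, original, candidate, weight=-2.0):
    """FinalScore = BaseScore + Penalty, with a negative weight as a deduction."""
    return base_score + penalty(original, candidate, weight)


# Toy strings with four replaced positions (Latin letters as stand-ins).
print(final_score(189, "abcdefgh", "xbydzfwh"))  # 181.0
```

With a negative weight, a candidate is preferred over the original sentence only when its base language model score is higher by more than the total deduction, which is what holds down the false alarm rate.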
The highest-scoring candidate character string can be found either by exhaustive search or by Viterbi-style dynamic programming search.
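A Viterbi-style search can be sketched in Python under a simplifying assumption: the base score is the sum of scores of adjacent character pairs taken from a hypothetical table `bigram`, with a fixed score `unseen` for pairs not in the table. The embodiment's actual base model (inter-word character continuation table plus dictionary word frequencies) is richer than this; the sketch only shows how dynamic programming avoids enumerating every candidate character string.

```python
def viterbi_best(sentence, approx, bigram, weight=-2.0, unseen=-5.0):
    """Return (score, best candidate) under an additive character-pair score."""
    if not sentence:
        return 0.0, ""
    options = [[ch] + approx.get(ch, []) for ch in sentence]
    # best[c] = (score, path) of the best partial candidate ending in c.
    best = {c: (0.0 if c == sentence[0] else weight, c) for c in options[0]}
    for pos in range(1, len(sentence)):
        nxt = {}
        for c in options[pos]:
            # Deduct points when c is not the original character here.
            pen = 0.0 if c == sentence[pos] else weight
            nxt[c] = max(
                (prev_score + bigram.get((p, c), unseen) + pen, prev_path + c)
                for p, (prev_score, prev_path) in best.items()
            )
        best = nxt
    return max(best.values())


# Hypothetical toy model: "x" is a plausible mistake for "c".
toy_approx = {"x": ["c"], "c": ["x"]}
toy_bigram = {("a", "b"): 3, ("b", "c"): 4, ("c", "d"): 3,
              ("b", "x"): 0.5, ("x", "d"): 0.5}
print(viterbi_best("abxd", toy_approx, toy_bigram))  # (8.0, 'abcd')
print(viterbi_best("abcd", toy_approx, toy_bigram))  # (10.0, 'abcd')
```

Because only the best partial path per ending character is kept at each position, the work grows with the sum of products of adjacent alternative counts rather than with the product of all of them.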
Suppose the highest-scoring candidate character string is P(k1, k2, ..., kn). Comparing it with the characters S = C1, C2, ..., Cn of the original document automatically locates the wrongly written or mispronounced characters and provides the corresponding correct characters: if cj(kj) is not equal to Cj, then Cj is flagged as a wrongly written or mispronounced character and cj(kj) is given as its correct character.
The output step:
The result for each processing unit is output, including the flagged wrongly written or mispronounced characters and the corresponding correct characters.
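The character-by-character comparison can be sketched in Python as follows. The function name `flag_errors` and the 1-based position in the returned tuples are choices made for illustration only.

```python
def flag_errors(original, best_candidate):
    """Return (position j, flagged character Cj, suggested correction cj(kj))."""
    if len(original) != len(best_candidate):
        raise ValueError("strings must have the same length")
    return [
        (j, o, b)
        for j, (o, b) in enumerate(zip(original, best_candidate), start=1)
        if o != b
    ]


# Latin letters stand in for Chinese characters.
print(flag_errors("abxd", "abcd"))  # [(3, 'x', 'c')]
```

An empty result means the original sentence itself was the highest-scoring candidate character string, so nothing is flagged.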
The operation of the present invention is now illustrated with a specific example sentence:
(1) Input and segmentation step
" message that tea Pot Shito System name is not inconsistent Real is Difference and walking not.”
S = C1 C2 ... C15
(2) approximate word collection replacement step
The message that tea Pot Shito System name is not inconsistent Real is Difference and walking not
Bitter edible plant Pot is Fu Pins of Lv Tibia Seoul
Apply free and unfettered
This gives a total of 2 * 2 * 2 * 2 * 3 * 3 * 2 * 2 = 576 candidate character strings.
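The count can be checked with a few lines of Python. The list below is an illustrative reading of the example: eight of the fifteen characters have alternatives (six with two choices, two with three), and the remaining seven keep only the original character. The order of the entries is not meant to reflect the positions in the sentence.

```python
from math import prod

# Alternatives per character position in the 15-character example sentence.
options_per_position = [2, 2, 2, 2, 3, 3, 2, 2] + [1] * 7

total_candidates = prod(options_per_position)
print(len(options_per_position), total_candidates)  # 15 576
```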
(3) language model scoring step
The five candidate character strings with the highest comprehensive Chinese language model scores are:
Ranking  Scoring      Candidate character string
1        189-8=181    Tea Pot Xi System name not secondary Real message not Tibia and walk
2        184-6=178    tea Pot Xi System name not secondary Real message not Tibia and walk
3        182-6=176    tea Pot Xi System name be not inconsistent Real disappear Tibia not and walk
4        177-4=173    message that tea Pot Xi System name is not inconsistent Real not Tibia and
5        181-10=171   tea Pot Xi System name not secondary Real De Pins cease not Tibia and walk
(4) compare step
Former sentence: the message that tea Pot Shito System name is not inconsistent Real is Difference and walk best result not: tea Pot Xi System name is the message Tibia and walk not Difference and to walk Pot be secondary Tibia of message that XX X X (5) output step tea Pot Shito System name is not inconsistent Real not of secondary Real not
All four wrongly written or mispronounced characters of the original sentence are successfully detected and corrected.
The effect of the present invention is evaluated as follows. Let:
A = total number of Chinese characters in the input document
B = number of characters that the proofreading method flags as wrongly written or mispronounced
C = number of characters that the proofreading method detects and corrects properly
D = number of flagged characters that are truly wrongly written or mispronounced
E = number of truly wrongly written or mispronounced characters in the input document
Then:
Flagging rate B-rate = B/A
Accuracy rate P-rate = D/B
Recall rate D-rate = D/E
Correction rate C-rate = C/E
The indices of the existing Chinese proofreading system are (see CCL Research Journal, 1992.8):
B-rate=5.2% (too high)
P-rate=2.5% (too low)
D-rate=73.8% (acceptable)
C-rate=0% (none)
The results of extensive experiments with the embodiment of the invention are as follows (B, C, and D are the indices of the embodiment; B' and D' are the indices of a simulation of the known Chinese proofreading system):
Test data               A        B    C    D    B'      D'   D and D'
International Politics  37,114   13   6    6    2,987   10   4
International economy   87,890   51   17   17   4,721   15   12
Internal politics       121,863  73   34   34   8,362   29   27
International economy   110,079  66   48   48   5,526   47   45
Total                   356,946  203  105  105  21,596  101  88
If D' is taken to be 73.8% of E, then E = 137, and the indices of the present invention are calculated as:
Flagging rate B-rate = B/A = 203/356946 = 0.056%
Accuracy rate P-rate = D/B = 105/203 = 51.72%
Recall rate D-rate = D/E = 105/137 = 76.64%
Correction rate C-rate = C/E = 105/137 = 76.64%
The B-rate, P-rate, and C-rate of the present invention are all far better than those of the known technique, and the D-rate is roughly equal, which shows that the present invention is of great practical value.
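The arithmetic behind these indices can be reproduced with a short Python sketch. The variable names are illustrative; the inputs are the totals from the experiment table, and E is estimated, as in the text, by assuming that D' is 73.8% of E.

```python
# Totals from the experiment table.
A, B, C, D = 356946, 203, 105, 105
D_prime = 101

# Estimate the number of truly wrong characters from the known system's recall.
E = round(D_prime / 0.738)

rates = {
    "B-rate": 100 * B / A,  # flagging rate, percent
    "P-rate": 100 * D / B,  # accuracy rate, percent
    "D-rate": 100 * D / E,  # recall rate, percent
    "C-rate": 100 * C / E,  # correction rate, percent
}
print(E)  # 137
for name, value in rates.items():
    print(f"{name} = {value:.4f}%")
```

The computed B-rate is about 0.0569%, which the text reports as 0.056%; the other three values agree with the stated 51.72% and 76.64% after rounding to two decimals.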

Claims (16)

1. A method for automatically detecting and correcting wrongly written or mispronounced Chinese characters, by which a computer automatically detects and corrects wrongly written or mispronounced characters in a Chinese document, characterized in that it comprises the following steps:
a comprehensive approximate word collection replacement step of replacing the characters in the document with the characters of a comprehensive approximate word collection of characters close in shape, sound, meaning, or input code, and combining them into a plurality of candidate character strings;
a language model scoring step of scoring each candidate character string with a statistical language model and finding the highest-scoring candidate character string; and
a wrongly written or mispronounced character determining step of comparing the highest-scoring candidate character string character by character with the characters in the document, and flagging the characters that differ as wrongly written or mispronounced characters.
2. the method for claim 1 is characterized in that, the comprehensive approximate word collection in the described comprehensive approximate word collection replacement step is made up of one or more fonts, word sound, the meaning of word or the literal close with input code that each literal comprises former word.
3. method as claimed in claim 2 is characterized in that, described comprehensive approximate word concentrates the approximate word of each literal to be divided into a plurality of grades.
4. the method for claim 1 is characterized in that, in the described comprehensive approximate word collection replacement step, earlier the literal in the described document is divided into a plurality of processing units according to the labeling symbol before the replacement.
5. the method for claim 1 is characterized in that, described language model scoring step is deducted points to the literal scoring of non-former document.
6. the method for claim 1 is characterized in that, described wrongly written or mispronounced characters determining step judges that the corresponding literal in the highest candidate character string of described scoring is the correct word of this wrongly written or mispronounced characters when indicating wrongly written or mispronounced characters.
7. A device for automatically detecting and correcting wrongly written or mispronounced Chinese characters, by which a computer automatically detects and corrects wrongly written or mispronounced characters in a Chinese document, characterized in that it comprises:
a comprehensive approximate word collection replacement device, for replacing the characters in the document with characters close in shape, sound, meaning, or input code, so as to combine them into a plurality of candidate character strings;
a language model scoring device, for scoring each candidate character string and finding the highest-scoring candidate character string; and
a wrongly written or mispronounced character judgment device, for comparing the highest-scoring candidate character string character by character with the characters in the document and flagging the characters that differ as wrongly written or mispronounced characters.
8. The device of claim 7, characterized in that the comprehensive approximate word collection replacement device comprises a segmenting device for first dividing the characters in the document into a plurality of processing units at punctuation marks before replacement.
9. The device of claim 7, characterized in that the comprehensive approximate word collection replacement device comprises:
a comprehensive approximate word collection database device, in which each character of the Chinese character set is associated with the original character and one or more characters close to it in shape, sound, meaning, or input code; and
a replacement device, which replaces a character with its approximate characters from the comprehensive approximate word collection database device.
10. The device of claim 9, characterized in that the approximate characters of each character in the comprehensive approximate word collection database device are divided into a plurality of grades.
11. The device of claim 7, characterized in that the language model scoring device comprises:
a language model statistical database, which records the frequency of occurrence of each linguistic unit and the frequency with which linguistic units follow one another;
a scoring device, which evaluates the score of a character string according to the linguistic units it contains and the language model statistical database; and
a highest-scoring candidate character string search device, which determines the candidate character string with the highest score.
12. The device of claim 11, characterized in that the scoring device deducts points for characters that do not come from the original document.
13. The device of claim 11, characterized in that the language model statistical database of the language model scoring device includes a Chinese lexicon recording the part of speech of each word.
14. The device of claim 11, characterized in that the language model scoring device searches for the highest-scoring candidate character string by dynamic programming.
15. The device of claim 7, characterized in that the wrongly written or mispronounced character judgment device comprises:
a comparison device, which compares the highest-scoring candidate character string character by character with the characters in the document; and
a flagging device, which flags the characters that differ in the comparison as wrongly written or mispronounced characters.
16. The device of claim 7, characterized in that, when a wrongly written or mispronounced character is flagged, the wrongly written or mispronounced character judgment device judges the corresponding character in the highest-scoring candidate character string to be the correct character for it.
CN94109394A 1994-08-05 1994-08-05 Chinese wrongly writen character automatic correcting method and device Expired - Lifetime CN1056933C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN94109394A CN1056933C (en) 1994-08-05 1994-08-05 Chinese wrongly writen character automatic correcting method and device

Publications (2)

Publication Number Publication Date
CN1116343A CN1116343A (en) 1996-02-07
CN1056933C CN1056933C (en) 2000-09-27

Family

ID=5033910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN94109394A Expired - Lifetime CN1056933C (en) 1994-08-05 1994-08-05 Chinese wrongly writen character automatic correcting method and device

Country Status (1)

Country Link
CN (1) CN1056933C (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101847140B (en) * 2009-03-23 2012-04-18 中国科学院计算技术研究所 Wrongly-written or mispronounced character processing method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1082218A (en) * 1993-06-26 1994-02-16 李金龙 A kind of Chinese is the method for check and correction automatically

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102375807A (en) * 2010-08-27 2012-03-14 汉王科技股份有限公司 Method and device for proofing characters
CN102375807B (en) * 2010-08-27 2014-01-15 汉王科技股份有限公司 Method and device for proofing characters
CN102456001A (en) * 2010-10-27 2012-05-16 北京四维图新科技股份有限公司 Method and device for checking wrongly written characters
CN102456001B (en) * 2010-10-27 2014-11-26 北京四维图新科技股份有限公司 Method and device for checking wrongly written characters
CN103034625A (en) * 2011-10-05 2013-04-10 王铭樟 System and method for detecting and correcting mismatched Chinese character
CN103488488A (en) * 2013-09-26 2014-01-01 贝壳网际(北京)安全技术有限公司 Text input check method, device ad mobile terminal
CN106250354A (en) * 2015-06-09 2016-12-21 富士通株式会社 Process the information processor of document, information processing method and program
CN106250354B (en) * 2015-06-09 2020-09-18 富士通株式会社 Information processing apparatus, information processing method, and program for processing document
CN108121455A (en) * 2016-11-29 2018-06-05 渡鸦科技(北京)有限责任公司 Identify method and device for correcting
WO2023184633A1 (en) * 2022-03-31 2023-10-05 上海蜜度信息技术有限公司 Chinese spelling error correction method and system, storage medium, and terminal

Also Published As

Publication number Publication date
CN1056933C (en) 2000-09-27

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CX01 Expiry of patent term

Expiration termination date: 20140805

Granted publication date: 20000927