CN105512110A - A method for building a wrong-word knowledge base based on fuzzy matching and statistics - Google Patents
- Publication number: CN105512110A
- Application number: CN201510934356.7A
- Authority: CN (China)
- Prior art keywords: word, string, combinatorial, word string, binary
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G: PHYSICS > G06: COMPUTING; CALCULATING OR COUNTING > G06F: ELECTRIC DIGITAL DATA PROCESSING > G06F40/00: Handling natural language data > G06F40/20: Natural language analysis
  - G06F40/232: Orthographic correction, e.g. spell checking or vowelisation
  - G06F40/237: Lexical tools > G06F40/242: Dictionaries
  - G06F40/258: Heading extraction; Automatic titling; Numbering
  - G06F40/279: Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510934356.7A (CN105512110B, zh) | 2015-12-15 | 2015-12-15 | A method for building a wrong-word knowledge base based on fuzzy matching and statistics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510934356.7A (CN105512110B, zh) | 2015-12-15 | 2015-12-15 | A method for building a wrong-word knowledge base based on fuzzy matching and statistics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105512110A (zh) | 2016-04-20 |
CN105512110B (zh) | 2018-04-06 |
Family ID: 55720103
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510934356.7A (Active; CN105512110B, zh) | 2015-12-15 | 2015-12-15 | A method for building a wrong-word knowledge base based on fuzzy matching and statistics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105512110B (zh) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
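CN106528532A (zh) * | 2016-11-07 | 2017-03-22 | 上海智臻智能网络科技股份有限公司 | Text error-correction method, device, and terminal |
CN107180084A (zh) * | 2017-05-05 | 2017-09-19 | 上海木爷机器人技术有限公司 | Lexicon update method and device |
CN108280051A (zh) * | 2018-01-22 | 2018-07-13 | 清华大学 | Method, apparatus, and device for detecting erroneous characters in text data |
CN108564086A (zh) * | 2018-03-17 | 2018-09-21 | 深圳市极客思索科技有限公司 | Method and device for recognizing and verifying character strings |
CN108717412A (zh) * | 2018-06-12 | 2018-10-30 | 北京览群智数据科技有限责任公司 | Chinese proofreading and error-correction method and system based on Chinese word segmentation |
JP2018185601A (ja) * | 2017-04-25 | 2018-11-22 | 富士ゼロックス株式会社 | Information processing apparatus and information processing program |
CN108984515A (zh) * | 2018-05-22 | 2018-12-11 | 广州视源电子科技股份有限公司 | Typo detection method and device, computer-readable storage medium, and terminal device |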
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060129381A1 (en) * | 1998-06-04 | 2006-06-15 | Yumi Wakita | Language transference rule producing apparatus, language transferring apparatus method, and program recording medium |
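JP2007073054A (ja) * | 2005-09-08 | 2007-03-22 | Fujitsu Ltd | Bilingual phrase presentation program, bilingual phrase presentation method, and bilingual phrase presentation device |
CN101639826A (zh) * | 2009-09-01 | 2010-02-03 | 西北大学 | A text hiding method based on Chinese sentence-pattern template transformation |
CN101655982A (zh) * | 2009-09-04 | 2010-02-24 | 上海交通大学 | Image registration method based on improved Harris corner points |
CN101950306A (zh) * | 2010-09-29 | 2011-01-19 | 北京新媒传信科技有限公司 | String filtering method for new-word discovery |
CN104915264A (zh) * | 2015-05-29 | 2015-09-16 | 北京搜狗科技发展有限公司 | An input error-correction method and device |
CN104991889A (zh) * | 2015-06-26 | 2015-10-21 | 江苏科技大学 | An automatic proofreading method for non-multi-character word errors based on fuzzy word segmentation |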
- 2015-12-15: Application CN201510934356.7A filed in CN; patent CN105512110B, status Active
Non-Patent Citations (5)
Title |
---|
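Andrew Pargellis et al.: "Metrics for Measuring Domain Independence of Semantic Classes", Proc. of European Speech Processing * |
刘亮亮 et al.: "Automatic discovery of text errors in domain question-answering systems", 中文信息学报 * |
施恒利 et al.: "Research on methods for constructing seed confusion sets of Chinese characters", 计算机科学 * |
马金山 et al.: "Finding errors in Chinese text using a trigram model and dependency parsing", 情报学报 * |
骆卫华 et al.: "Research on automatic proofreading technology for Chinese text", 计算机研究与发展 * |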
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
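CN106528532A (zh) * | 2016-11-07 | 2017-03-22 | 上海智臻智能网络科技股份有限公司 | Text error-correction method, device, and terminal |
CN106528532B (zh) * | 2016-11-07 | 2019-03-12 | 上海智臻智能网络科技股份有限公司 | Text error-correction method, device, and terminal |
JP2018185601A (ja) * | 2017-04-25 | 2018-11-22 | 富士ゼロックス株式会社 | Information processing apparatus and information processing program |
JP7027696B2 (ja) | 2017-04-25 | 2022-03-02 | 富士フイルムビジネスイノベーション株式会社 | Information processing apparatus and information processing program |
CN107180084A (zh) * | 2017-05-05 | 2017-09-19 | 上海木爷机器人技术有限公司 | Lexicon update method and device |
CN107180084B (zh) * | 2017-05-05 | 2020-04-21 | 上海木木聚枞机器人科技有限公司 | Lexicon update method and device |
CN108280051A (zh) * | 2018-01-22 | 2018-07-13 | 清华大学 | Method, apparatus, and device for detecting erroneous characters in text data |
CN108564086A (zh) * | 2018-03-17 | 2018-09-21 | 深圳市极客思索科技有限公司 | Method and device for recognizing and verifying character strings |
CN108564086B (zh) * | 2018-03-17 | 2024-05-10 | 上海柯渡医学科技股份有限公司 | Method and device for recognizing and verifying character strings |
CN108984515A (zh) * | 2018-05-22 | 2018-12-11 | 广州视源电子科技股份有限公司 | Typo detection method and device, computer-readable storage medium, and terminal device |
CN108984515B (zh) * | 2018-05-22 | 2022-09-06 | 广州视源电子科技股份有限公司 | Typo detection method and device, computer-readable storage medium, and terminal device |
CN108717412A (zh) * | 2018-06-12 | 2018-10-30 | 北京览群智数据科技有限责任公司 | Chinese proofreading and error-correction method and system based on Chinese word segmentation |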
Also Published As
Publication number | Publication date |
---|---|
CN105512110B (zh) | 2018-04-06 |
Similar Documents
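Publication | Title |
---|---|
CN105512110A (zh) | A method for building a wrong-word knowledge base based on fuzzy matching and statistics |
CN105045778B (zh) | An automatic proofreading method for Chinese homophone errors |
WO2021114745A1 (zh) | An affix-aware named entity recognition method for social media |
McCauley et al. | Learning simple statistics for language comprehension and production: The CAPPUCCINO model |
CN107463607B (zh) | A method for acquiring and organizing hypernym-hyponym relations between domain entities by combining word vectors and bootstrapping |
Al Tamimi et al. | AARI: automatic Arabic readability index. |
CN110489760A (zh) | Automatic text proofreading method and device based on deep neural networks |
CN105138514B (zh) | A dictionary-based forward maximum-matching Chinese word segmentation method that extends the match one character at a time |
CN104991889A (zh) | An automatic proofreading method for non-multi-character word errors based on fuzzy word segmentation |
CN107039034A (zh) | A prosody prediction method and system |
CN103823794A (zh) | An automatic question-generation method for interrogative short-answer questions in English reading comprehension tests |
CN109918670A (zh) | An article duplicate-checking method and system |
CN106528524A (zh) | A word segmentation method based on the MMseg algorithm and the pointwise mutual information algorithm |
CN103631858A (zh) | A similarity calculation method for science and technology projects |
CN107688630A (zh) | A semantics-based, weakly supervised method for expanding multi-emotion lexicons for microblogs |
CN112364623A (zh) | A Bi-LSTM-CRF-based three-in-one character-tagging method for Chinese lexical analysis |
TW201403354A (zh) | System and method for constructing a mathematical model of Chinese text readability using data dimensionality reduction and nonlinear algorithms |
CN109213998A (zh) | Chinese typo detection method and system |
CN104933032A (zh) | A blog keyword extraction method based on complex networks |
CN105159917A (zh) | A generalization method for converting unstructured information in electronic medical records into structured information |
CN114969294A (zh) | A method for expanding phonetically similar sensitive words |
Cavalli-Sforza et al. | Arabic readability research: current state and future directions |
CN104881400A (zh) | A semantic relatedness calculation method based on an associative network |
Forsyth | Automatic readability prediction for modern standard Arabic |
CN106202037A (zh) | A chunk-based method for constructing Vietnamese phrase trees |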
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventors after: Liu Liangliang, Liu Haibo, Wu Jiankang, Gu Dezhi, Zhang Zaiyue, Zhang Xiaoru. Inventors before: Liu Haibo, Liu Liangliang, Wu Jiankang, Gu Dezhi, Zhang Zaiyue, Zhang Xiaoru. |
GR01 | Patent grant | |
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 2016-04-20. Assignee: JIANGSU KEDA HUIFENG SCIENCE AND TECHNOLOGY Co.,Ltd. Assignor: JIANGSU University OF SCIENCE AND TECHNOLOGY. Contract record no.: X2020980007325. Denomination of invention: A method of building wrong word knowledge base based on fuzzy matching and statistics. Granted publication date: 2018-04-06. License type: Common License. Record date: 2020-10-29. |
EC01 | Cancellation of recordation of patent licensing contract | Assignee: JIANGSU KEDA HUIFENG SCIENCE AND TECHNOLOGY Co.,Ltd. Assignor: JIANGSU University OF SCIENCE AND TECHNOLOGY. Contract record no.: X2020980007325. Date of cancellation: 2020-12-23. |
TR01 | Transfer of patent right | Effective date of registration: 2022-12-30. Patentee before: JIANGSU University OF SCIENCE AND TECHNOLOGY (address: 212003, No. 2, Mengxi Road, Zhenjiang, Jiangsu). Patentee after: Jingchuang United (Beijing) Intellectual Property Service Co.,Ltd. (address: Room 02A-084, Building C (Second Floor), No. 28, Xinxi Road, Haidian District, Beijing 100085). |
TR01 | Transfer of patent right | Effective date of registration: 2022-12-30. Patentee before: Jingchuang United (Beijing) Intellectual Property Service Co.,Ltd. (address: Room 02A-084, Building C (Second Floor), No. 28, Xinxi Road, Haidian District, Beijing 100085). Patentee after: China Southern Power Grid Internet Service Co.,Ltd. (address: Room 606-609, Compound Office Complex Building, No. 757, Dongfeng East Road, Yuexiu District, Guangzhou, Guangdong Province, 510699). |