CN109213998A - Chinese misspelling detection method and system - Google Patents
- Publication number
- CN109213998A
- Authority
- CN
- China
- Prior art keywords
- word
- language model
- frequency
- error detection
- chinese
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Concepts (machine-extracted)
ID | Name | Domain | Query match | Sections | Count |
---|---|---|---|---|---|
238000001514 | detection method | Methods | 0.000 | title, claims, abstract, description | 76 |
238000012545 | processing | Methods | 0.000 | claims, abstract, description | 64 |
238000000034 | method | Methods | 0.000 | claims, abstract, description | 23 |
230000011218 | segmentation | Effects | 0.000 | claims, abstract, description | 22 |
230000008569 | process | Effects | 0.000 | claims, description | 5 |
239000000463 | material | Substances | 0.000 | description | 4 |
230000008901 | benefit | Effects | 0.000 | description | 3 |
238000012937 | correction | Methods | 0.000 | description | 3 |
230000006870 | function | Effects | 0.000 | description | 2 |
235000012054 | meals | Nutrition | 0.000 | description | 2 |
238000012986 | modification | Methods | 0.000 | description | 2 |
230000004048 | modification | Effects | 0.000 | description | 2 |
VYZAMTAEIAYCRO-UHFFFAOYSA-N | Chromium ([Cr]) | Chemical compound | 0.000 | description | 1 |
101001072091 | Homo sapiens ProSAAS | Proteins | 0.000 | description | 1 |
206010028916 | Neologism | Diseases | 0.000 | description | 1 |
102100036366 | ProSAAS | Human genes | 0.000 | description | 1 |
238000004458 | analysis method | Methods | 0.000 | description | 1 |
238000004364 | calculation method | Methods | 0.000 | description | 1 |
238000011161 | development | Methods | 0.000 | description | 1 |
230000018109 | developmental process | Effects | 0.000 | description | 1 |
238000000605 | extraction | Methods | 0.000 | description | 1 |
239000004744 | fabric | Substances | 0.000 | description | 1 |
238000007689 | inspection | Methods | 0.000 | description | 1 |
238000009434 | installation | Methods | 0.000 | description | 1 |
230000002045 | lasting effect | Effects | 0.000 | description | 1 |
238000010801 | machine learning | Methods | 0.000 | description | 1 |
238000004519 | manufacturing process | Methods | 0.000 | description | 1 |
238000007781 | pre-processing | Methods | 0.000 | description | 1 |
238000003672 | processing method | Methods | 0.000 | description | 1 |
238000012360 | testing method | Methods | 0.000 | description | 1 |
238000012549 | training | Methods | 0.000 | description | 1 |
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810942637.0A CN109213998B (zh) | 2018-08-17 | 2018-08-17 | Chinese misspelling detection method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810942637.0A CN109213998B (zh) | 2018-08-17 | 2018-08-17 | Chinese misspelling detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109213998A (zh) | 2019-01-15 |
CN109213998B (zh) | 2023-06-23 |
Family ID: 64989219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810942637.0A Active CN109213998B (zh) | 2018-08-17 | 2018-08-17 | Chinese misspelling detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109213998B (zh) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
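CN111291552A (zh) * | 2020-05-09 | 2020-06-16 | 支付宝(杭州)信息技术有限公司 | Method and *** for correcting text content |
CN111709228A (zh) * | 2020-06-22 | 2020-09-25 | 中国标准化研究院 | Automatic identification method for repeated-character and repeated-word errors |
CN111737982A (zh) * | 2020-06-29 | 2020-10-02 | 武汉虹信技术服务有限责任公司 | Deep-learning-based method for detecting misspelled characters in Chinese text |
CN112183071A (zh) * | 2019-06-14 | 2021-01-05 | 上海流利说信息技术有限公司 | Text error correction method, device, storage medium and electronic equipment |
CN112966506A (zh) * | 2021-03-23 | 2021-06-15 | 北京有竹居网络技术有限公司 | Text processing method, device, equipment and storage medium |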
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003331214A (ja) * | 2002-05-15 | 2003-11-21 | Nippon Telegr & Teleph Corp <Ntt> | Character recognition error correction method, device and program |
CN102156551A (zh) * | 2011-03-30 | 2011-08-17 | 北京搜狗科技发展有限公司 | Error correction method and *** for character and word input |
CN102789504A (zh) * | 2012-07-19 | 2012-11-21 | 姜赢 | Chinese grammar correction method and *** based on XML rules |
CN104915264A (zh) * | 2015-05-29 | 2015-09-16 | 北京搜狗科技发展有限公司 | Input error correction method and device |
CN105279149A (zh) * | 2015-10-21 | 2016-01-27 | 上海应用技术学院 | Automatic correction method for Chinese text |
- 2018-08-17: application CN201810942637.0A filed in CN (patent CN109213998B (zh), status: Active)
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
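CN112183071A (zh) * | 2019-06-14 | 2021-01-05 | 上海流利说信息技术有限公司 | Text error correction method, device, storage medium and electronic equipment |
CN112183071B (zh) * | 2019-06-14 | 2022-12-13 | 上海流利说信息技术有限公司 | Text error correction method, device, storage medium and electronic equipment |
CN111291552A (zh) * | 2020-05-09 | 2020-06-16 | 支付宝(杭州)信息技术有限公司 | Method and *** for correcting text content |
CN111709228A (zh) * | 2020-06-22 | 2020-09-25 | 中国标准化研究院 | Automatic identification method for repeated-character and repeated-word errors |
CN111709228B (zh) * | 2020-06-22 | 2023-11-21 | 中国标准化研究院 | Automatic identification method for repeated-character and repeated-word errors |
CN111737982A (zh) * | 2020-06-29 | 2020-10-02 | 武汉虹信技术服务有限责任公司 | Deep-learning-based method for detecting misspelled characters in Chinese text |
CN112966506A (zh) * | 2021-03-23 | 2021-06-15 | 北京有竹居网络技术有限公司 | Text processing method, device, equipment and storage medium |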
Also Published As
Publication number | Publication date |
---|---|
CN109213998B (zh) | 2023-06-23 |
Similar Documents
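Publication | Title |
---|---|
CN111104794B (zh) | Text similarity matching method based on topic words |
CN104636466B (zh) | Entity attribute extraction method and *** for open web pages |
CN109213998A (zh) | Chinese misspelling detection method and system |
CN103399901B (zh) | Keyword extraction method |
CN104063387B (zh) | Device and method for extracting keywords from text |
CN112035730B (zh) | Semantic retrieval method, device and electronic equipment |
Ahmed et al. | Language identification from text using n-gram based cumulative frequency addition |
CN113495900A (zh) | Method and device for obtaining structured query language statements from natural language |
CN106570180A (zh) | Voice search method and device based on artificial intelligence |
CN107180026B (zh) | Event phrase learning method and device based on word-embedding semantic mapping |
CN108984661A (zh) | Method and device for entity alignment in a knowledge graph |
CN104199965A (zh) | Semantic information retrieval method |
CN108509490B (zh) | Network hot-topic discovery method and *** |
CN109766547B (zh) | Sentence similarity calculation method |
CN111027323A (zh) | Entity mention recognition method based on a topic model and semantic analysis |
CN113360647B (zh) | Clustering-based traceability analysis method for 5G mobile service complaints |
CN109522396B (zh) | Knowledge processing method and *** for the national defense science and technology field |
CN112380848B (zh) | Text generation method, device, equipment and storage medium |
CN113934814B (zh) | Automatic scoring method for subjective questions on classical Chinese poetry and prose |
Sembok et al. | Arabic word stemming algorithms and retrieval effectiveness |
CN101369285B (zh) | Spelling correction method for query words in a Chinese search engine |
CN110705285B (zh) | Method, device, server and readable storage medium for constructing a topic-word lexicon for government affairs texts |
Ahmad et al. | Pipilika n-gram viewer: an efficient large scale n-gram model for Bengali |
KR101351555B1 (ko) | Semantic-based classification and extraction system for text mining of large-scale data |
CN107818078B (zh) | Semantic association and matching method for Chinese natural language dialogue |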
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20210621; Address after: No.18-d2561, Jianshe Road, Kaixuan Street, Liangxiang, Fangshan District, Beijing; Applicant after: Beijing Yuyun Technology Co.,Ltd.; Address before: 100068 620, 5th floor, building 1, yard 36, Majiabao West Road, Fengtai District, Beijing; Applicant before: HUIZHI RONGDA (BEIJING) INFORMATION TECHNOLOGY Co.,Ltd. |
TA01 | Transfer of patent application right | Effective date of registration: 20230517; Address after: Room 301AB, No. 10, Lane 198, Zhangheng Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 200120; Applicant after: SHANGHAI MDATA INFORMATION TECHNOLOGY Co.,Ltd.; Address before: No.18-d2561, Jianshe Road, Kaixuan Street, Liangxiang, Fangshan District, Beijing; Applicant before: Beijing Yuyun Technology Co.,Ltd. |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Address after: Room 301AB, No. 10, Lane 198, Zhangheng Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai 201204; Patentee after: Shanghai Mido Technology Co.,Ltd.; Address before: Room 301AB, No. 10, Lane 198, Zhangheng Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 200120; Patentee before: SHANGHAI MDATA INFORMATION TECHNOLOGY Co.,Ltd. |
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Chinese misspelling detection method and system; Granted publication date: 20230623; Pledgee: Bank of Communications Ltd. Shanghai New District Branch; Pledgor: Shanghai Mido Technology Co.,Ltd.; Registration number: Y2024310000145 |
TR01 | Transfer of patent right | Effective date of registration: 20240412; Address after: Room 301, 3rd Floor, Building 3, No. 20 Yong'an Road, Shilong Economic Development Zone, Mentougou District, Beijing, 102308; Patentee after: Beijing Midu Information Technology Co.,Ltd.; Country or region after: China; Address before: Room 301AB, No. 10, Lane 198, Zhangheng Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai 201204; Patentee before: Shanghai Mido Technology Co.,Ltd.; Country or region before: China |