CN110298020B - Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment - Google Patents

Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment

Info

Publication number
CN110298020B
CN110298020B CN201910462812.0A CN201910462812A CN110298020B CN 110298020 B CN110298020 B CN 110298020B CN 201910462812 A CN201910462812 A CN 201910462812A CN 110298020 B CN110298020 B CN 110298020B
Authority
CN
China
Prior art keywords
text
child node
cheating
mapping
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910462812.0A
Other languages
Chinese (zh)
Other versions
CN110298020A (en)
Inventor
袁晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910462812.0A
Publication of CN110298020A
Application granted
Publication of CN110298020B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a text anti-cheating variant reduction method, comprising: taking the text as a root node text; directly taking the root node text as a child node text, and expanding each character in the root node text according to each mapping in one of N mapping relations to generate child node texts; taking each child node text as a new child node text, and expanding each character in each child node text according to each mapping in one of the mapping relations not yet used to generate new child node texts; repeating this step until every mapping in every mapping relation has been traversed for all characters in all child node texts; and scoring the smoothness of the child node texts and selecting the child node text with the highest smoothness score as the restored text. The method reduces the difficulty of subsequent keyword matching or model recognition, so that the cheating text can finally be deleted.

Description

Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment
Technical Field
The invention relates to the field of text anti-cheating, in particular to a text anti-cheating variant reduction method, a text anti-cheating variant reduction device, a text anti-cheating method and a text anti-cheating device.
Background
With the continuous development of the Internet, the number of Internet users has grown year by year, bringing large Internet companies traffic dividends in various forms. Behind this thriving market, however, a parasitic market of cheating promotion has grown up: various cheating promotion posts (namely soft articles or soft advertisements) are published under product lines such as communities and feed streams in order to promote certain goods or services. These posts seriously harm the user experience of the products and, to a certain extent, draw away potential advertisers for free, so that the companies lose revenue. In today's Internet environment there are a large number of black-market and gray-market teams that send offending text content in bulk, for purposes including diverting traffic to pornographic websites, selling counterfeit medicine, and network fraud. The cheating text they send is among the hardest to identify. Text anti-cheating refers to identifying advertisement cheating content in text. In the prior art, there are two methods for identifying cheating text: 1. keyword matching; 2. machine learning model recognition.
The keyword matching method is as follows: a dictionary is read, and the text to be predicted is matched against its keywords; if a match succeeds, the text to be predicted is considered to contain cheating content. Variant content is enumerated and written into the keyword dictionary. The keyword matching method has the following defects:
first, although the total number of characters is limited, the number of combinations of several variant characters is extremely large (if the character "micro" is assumed to have 20 variants and the character "letter" 15 variants, the keyword formed by these two characters has 20×15=300 variants), so it is almost impossible to enumerate the variant combinations of all characters;
second, enumerated variant character combinations are more prone to false positives, that is, to flagging legitimate text by mistake.
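The arithmetic behind the first defect can be sketched in a few lines of Python. This is only an illustration under the assumption that each character's variants are chosen independently; the function name and the four-character case are hypothetical and not taken from the patent.

```python
from math import prod

def count_variant_combinations(variants_per_char):
    """Number of distinct spellings of a keyword when each character can be
    written in the given number of ways, chosen independently."""
    return prod(variants_per_char)

# The two-character example from the text: 20 variants x 15 variants.
two_char = count_variant_combinations([20, 15])

# Hypothetical four-character keyword with 20 variants per character.
four_char = count_variant_combinations([20, 20, 20, 20])

print(two_char, four_char)
```

The count grows multiplicatively with keyword length, which is why a dictionary of enumerated variants cannot keep up.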
The machine learning model identification method comprises the following steps: and (3) establishing a corpus, manually labeling the cheating text, training an NLP classification model through machine learning, predicting the text to be judged through the model, outputting the cheating probability, and considering that the text contains the cheating content when the cheating probability is higher than a certain threshold value.
The defects of machine learning model recognition are as follows: the effect of the model depends on the training samples in the corpus and, just as with the keyword matching method, the corpus cannot cover the variant combinations of all characters. The model can only identify variant cheating text contained in the corpus and cannot identify variant cheating text outside its scope. Existing NLP classification models mainly address the classification of natural language, but in variant text the contextual relations among words have been deliberately destroyed, so the text can no longer be regarded as natural language and the NLP model performs poorly.
Disclosure of Invention
The method and the device for eliminating the variant in the text effectively remove the variant in the text, restore the real information of the text, facilitate the subsequent execution of other strategies such as keyword matching, model recognition and the like, and improve the generalization capability of anti-cheating strategies.
To achieve the above object, in a first aspect of the present invention, there is provided a text anti-cheating variant reduction method, the method comprising:
taking the text as a root node text;
directly taking the root node text as a child node text, and expanding each character in the root node text according to each mapping in one of N mapping relations to generate the child node text;
taking each child node text as a new child node text, and expanding each character in each child node text according to each mapping in one of the mapping relations not yet used to generate a new child node text;
repeating the steps until each mapping in each mapping relation of all characters in all child node texts is traversed;
and scoring the smoothness of all the child node texts generated by the last expansion in the traversed child node texts, and selecting the child node text with the highest smoothness score as a restored text.
Optionally, the N mapping relations include a shape-near word mapping, a homophone mapping, a harmonic word mapping, and an interference character mapping.
Optionally, the method for expanding the shape-near word mapping relation comprises the following steps:
capturing all characters in the root node text or the child node text by adopting image processing software, and carrying out image recognition on the captured characters;
expanding Chinese characters similar in shape to each recognized character into child node text as shape-near words.
Optionally, the method for expanding the shape-near word mapping relation comprises the following steps:
capturing all characters in the root node text or the child node text by adopting image processing software, and carrying out image recognition on the captured characters;
capturing the decomposition data and the radical data of each character by adopting image processing software;
with the radical removed from each character, if the remaining part of the character can form a Chinese character, expanding the remaining part into child node text as a shape-near word.
Optionally, the method for expanding the homophone mapping relation comprises the following steps:
converting each Chinese character in the root node text or the child node text into pinyin;
expanding each Chinese character whose pinyin is the same as that pinyin into child node text as a homophone.
Optionally, the method for expanding the harmonic word mapping relation comprises the following steps:
converting each Chinese character in the root node text or the child node text into pinyin;
expanding each Chinese character whose pinyin is similar to that pinyin into child node text as a harmonic word.
Optionally, the method for expanding the interference character mapping relation comprises the following steps:
expanding meaningless characters into child node text as null characters.
Optionally, the steps further include: scoring the smoothness of each child node text, and deleting the M lowest-scoring child node texts according to the obtained smoothness scores.
In another aspect of the present invention, there is also provided a text anti-cheating method, the method including:
reducing the variants in the text by adopting the method described above to obtain a restored text;
performing keyword matching or model recognition on the restored text;
labeling the text successfully matched with the keywords as a cheating text, or labeling the text successfully identified by the model as the cheating text.
In a third aspect of the present invention, there is also provided a text anti-cheating variant reduction apparatus, including:
at least one processor;
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method described above by executing the instructions stored by the memory.
The fourth aspect of the present invention also provides a text anti-cheating device, including:
at least one processor;
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method described above by executing the instructions stored by the memory.
In a fifth aspect of the invention, there is also provided a machine-readable storage medium having stored thereon instructions which, when executed by a controller, enable the controller to perform the method as described hereinbefore.
The technical scheme of the invention has at least the following effects:
the method effectively removes the variants in the text, restores the real information of the text, facilitates the subsequent execution of other strategies such as keyword matching, model recognition and the like, and improves the generalization capability of the anti-cheating strategy.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain, without limitation, the embodiments of the invention. In the drawings:
FIG. 1 is a flow chart of a text anti-cheating variant reduction method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a text anti-cheating method provided by an embodiment of the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
In the embodiments of the present invention, unless otherwise indicated, terms of orientation such as "upper, lower, top, bottom" are used generally with respect to the orientation shown in the drawings or with respect to the positional relationship of the various components with respect to one another in the vertical or gravitational direction.
FIG. 1 is a flow chart of a text anti-cheating variant reduction method provided by an embodiment of the present invention; as shown in fig. 1, the present invention provides a text anti-cheating variant reduction method, which includes:
S11) taking the text as a root node text. When the root node text is considered to belong to cheating text, the root node text is taken as the root of the variant reduction tree.
S12) directly taking the root node text as a child node text, and expanding each character in the root node text according to each mapping in one of N mapping relations to generate the child node text. The root node text typically contains multiple characters, so the expansion proceeds character by character: for example, starting from the first character, each mapping in the N-th mapping relation is applied to the first character to expand child node texts. For example, a character can be expanded into "you", "Ni", "woolen", etc. by using the homophone or harmonic word mapping relation, and the expanded child node texts then take "you", "Ni" or "woolen" as their first character respectively. The second character, the third character, and so on of the root node text can likewise be expanded into child node texts by using each mapping in the N-th mapping relation in the manner described above. The unreplaced text is also kept as one child node, i.e. the root node text itself is also expanded as one child node text.
S13) taking each child node text as a new child node text, and expanding each character in each child node text according to each mapping in one of the mapping relations not yet used to generate a new child node text. Once step S12) has generated a plurality of child node texts with the N-th mapping relation, each character of each of those child node texts is expanded with the (N-1)-th mapping relation. For example, the child node texts generated by expansion in step S12) include "hello", "nigood", "woolen" and so on. The (N-1)-th mapping relation, which may be the shape-near word mapping relation, is then adopted: "hello" is expanded into child node texts containing shape-near words of its characters, and "nigood" is likewise expanded into "female", "female good", "Nifemale", etc. The unreplaced text is also kept as one child node, i.e. each child node text itself is also expanded as a new child node text. With the (N-1)-th mapping relation, each of the child node texts generates further child node texts, much as a tree puts out branches and each branch divides into further branches. Variant restoration of the root node text and the child node texts therefore forms a variant reduction tree with many branches.
S14) repeating step S13) until each mapping in each mapping relation of all characters in all child node texts is traversed. Repeating the steps yields a variant reduction tree with a plurality of branches. The more characters the root node text has, the more child node texts are generated and the more branches the variant reduction tree has. If the traversal is carried out completely, a very large number of child node texts is formed.
S15) carrying out smoothness scoring on all the child node texts generated by the last expansion among the traversed child node texts, and selecting the child node text with the highest smoothness score as the restored text. That is, the ends of the branches, namely the child node texts obtained by the last expansion, are scored for smoothness. The scoring step is prior art in the field and is not described in detail here.
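Steps S11) to S15) can be sketched as a layer-by-layer expansion followed by a score-based selection. The following Python is a minimal sketch under stated assumptions, not the patented implementation: each mapping relation is modelled as a dictionary from a character to its replacement strings, every character may either stay unchanged or take any mapped value (so the unreplaced text is always among the children), and the smoothness score is a toy phrase counter standing in for whatever scoring model is actually used. The names `expand`, `restore`, `INTERFERENCE`, `HOMOPHONE` and `toy_smoothness`, and the sample characters, are all hypothetical.

```python
from itertools import product

def expand(texts, mapping):
    """Apply one mapping relation to every text in the current layer."""
    children = []
    for text in texts:
        # Each character may stay as it is or be replaced by any mapped value.
        options = [[ch] + mapping.get(ch, []) for ch in text]
        for combo in product(*options):
            children.append("".join(combo))
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(children))

def restore(root, mappings, score):
    """Expand the root once per mapping relation, then pick the best leaf."""
    layer = [root]
    for mapping in mappings:
        layer = expand(layer, mapping)
    return max(layer, key=score)

# Toy mapping relations (assumed data, for illustration only).
INTERFERENCE = {"-": [""]}     # interference character -> empty string
HOMOPHONE = {"威": ["微"]}      # both read "wei" with the first tone

# Toy smoothness score: counts known phrases; a real system would use a
# language model or similar scorer.
KNOWN_PHRASES = {"微信"}

def toy_smoothness(text):
    return sum(text.count(phrase) for phrase in KNOWN_PHRASES)

restored = restore("威-信", [INTERFERENCE, HOMOPHONE], toy_smoothness)
print(restored)
```

Because every layer multiplies the number of candidates, a full traversal is only practical for short texts or small mapping tables.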
In step S12), the N mapping relationships include a shape-near word map, a homophone map, a harmonic word map, and an interference character map. The mapping relation can be increased and decreased according to actual conditions.
Specifically, the method for expanding the shape-near word mapping relation comprises the following steps:
capturing all characters in the root node text or the child node text by adopting image processing software, and carrying out image recognition on the captured characters; according to one embodiment, the image recognition is performed by means of OCR recognition.
Chinese characters similar in shape to each recognized character are expanded into child node text as shape-near words. For example, given the OCR result for the pixel image of the character "micro", the results whose similarity exceeds a certain threshold might include "micro" itself together with visually similar characters (rendered here as "bar", "sign", "badge", "bear" and others). The character "micro" itself is removed, the remaining characters are recorded as shape-near words, and the recorded shape-near words are used as expandable characters.
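The thresholding idea can be illustrated without an OCR engine. The sketch below is an assumption-laden stand-in: it compares tiny hand-drawn bitmaps with a Jaccard similarity over their set pixels, whereas the text describes OCR on rendered character images. The glyphs, the similarity measure, the threshold value and the function names are all hypothetical.

```python
# Toy 3x3 bitmaps ("#" = ink). Real input would be rendered character images.
GLYPHS = {
    "O": ["###", "#.#", "###"],
    "0": ["###", "###", "###"],
    "I": [".#.", ".#.", ".#."],
}

def ink(bitmap):
    """Set of (row, column) positions that carry ink."""
    return {(r, c) for r, row in enumerate(bitmap)
            for c, px in enumerate(row) if px == "#"}

def jaccard(a, b):
    """Shared ink divided by total ink of the two bitmaps."""
    on_a, on_b = ink(a), ink(b)
    union = on_a | on_b
    return len(on_a & on_b) / len(union) if union else 1.0

def shape_near(target, glyphs, threshold):
    """Characters whose bitmap similarity to the target reaches the threshold,
    with the target itself removed, as in the example in the text."""
    return [name for name, bitmap in glyphs.items()
            if name != target and jaccard(glyphs[target], bitmap) >= threshold]

print(shape_near("O", GLYPHS, threshold=0.8))
```

The resulting list plays the role of the recorded shape-near words that are later used as expandable characters.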
The method steps of the shape-near word mapping relation expansion can further comprise:
capturing all characters in the root node text or the child node text by adopting image processing software, and carrying out image recognition on the captured characters;
capturing the decomposition data and the radical data of each character by adopting image processing software;
with the radical removed from each character, if the remaining part of the character can form a Chinese character, expanding the remaining part into child node text as a shape-near word.
In particular, many Chinese characters are composed of radicals and of other Chinese characters. For example, "greeting" (贺) can be broken down into the two characters "add" (加) and "shellfish" (贝), so "greeting" can be expanded into a child node text with "add" and, at the same time, into a child node text with "shellfish".
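A table-driven sketch of this decomposition follows. It assumes a hand-made decomposition table and a set of components that can stand alone as characters; the text itself obtains this data with image processing software. The table contents beyond the 贺 example, and the names `DECOMPOSITION`, `STANDALONE` and `component_variants`, are hypothetical.

```python
# Hypothetical decomposition data: character -> its component parts.
DECOMPOSITION = {
    "贺": ["加", "贝"],   # the example from the text
    "湖": ["氵", "胡"],   # water radical + a standalone character
}

# Components that are Chinese characters in their own right (assumed list).
STANDALONE = {"加", "贝", "胡"}

def component_variants(ch):
    """Parts of ch that form a character by themselves and can therefore
    be used as shape-near expansions."""
    return [part for part in DECOMPOSITION.get(ch, []) if part in STANDALONE]

print(component_variants("贺"))
print(component_variants("湖"))
```

A radical that cannot stand alone, such as the water radical in the second entry, is simply skipped.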
Specifically, the method for expanding homophone mapping relation comprises the following steps:
converting each Chinese character in the root node text or the child node text into pinyin;
expanding each Chinese character whose pinyin is the same as that pinyin into child node text as a homophone. For example, Chinese characters with the same pinyin as "sad" may include "enemy", "frame", "thick", and the like. Using the homophone mapping relation, "sad" can therefore be expanded into child node texts with "enemy", "frame" and "thick".
Specifically, the method for expanding the harmonic word mapping relation comprises the following steps:
converting each Chinese character in the root node text or the child node text into pinyin;
expanding each Chinese character whose pinyin is similar to that pinyin into child node text as a harmonic word.
For example, Chinese characters whose pronunciation is close to that of "river" may include "will", "Jiang", "prize", and so forth. Using the harmonic word mapping relation, "river" can therefore be expanded into child node texts with "will", "Jiang" and "prize".
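The homophone and harmonic word lookups can both be sketched over a small pinyin table. The sketch assumes a hand-written table with tone numbers and one reading per character, and it treats "similar pinyin" as "same syllable, different tone"; the patent does not define similarity that precisely, and a real system would take readings from a pinyin library or dictionary. The table and the function names are hypothetical.

```python
# Hypothetical pinyin table: one reading per character, tone as a digit.
PINYIN = {
    "江": "jiang1", "姜": "jiang1", "将": "jiang1",
    "蒋": "jiang3", "奖": "jiang3", "讲": "jiang3",
    "河": "he2",
}

def syllable(reading):
    """Reading with the tone digit stripped, e.g. 'jiang1' -> 'jiang'."""
    return reading.rstrip("12345")

def homophones(ch):
    """Other characters with exactly the same pinyin, tone included."""
    reading = PINYIN.get(ch)
    return [c for c, r in PINYIN.items() if c != ch and r == reading]

def harmonic_words(ch):
    """Other characters with the same syllable but a different tone
    (an assumed definition of 'similar pinyin')."""
    reading = PINYIN.get(ch)
    if reading is None:
        return []
    return [c for c, r in PINYIN.items()
            if r != reading and syllable(r) == syllable(reading)]

print(homophones("江"))
print(harmonic_words("江"))
```

Either list can serve directly as the replacement values for one character in a mapping relation.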
Specifically, the method for expanding the mapping relation of the interference characters comprises the following steps:
a meaningless character, such as "-", "\", "one", or "kana", is expanded into child node text as a null character, that is, it is removed from the text.
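Mapping interference characters to the null character amounts to deleting them. A minimal sketch, assuming a small hand-picked set of interference characters (the set and the function name are hypothetical):

```python
# Hypothetical set of interference characters.
INTERFERENCE_CHARS = set("-\\_*~. ")

def strip_interference(text, interference=INTERFERENCE_CHARS):
    """Replace every interference character with the empty string."""
    return "".join(ch for ch in text if ch not in interference)

print(strip_interference("w-e\\c*h a.t"))
```

Inside the variant reduction tree the unstripped text would also be kept as a candidate, since a character such as "-" is sometimes meaningful.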
Step S13) further includes: scoring the smoothness of each child node text, and deleting the M lowest-scoring child node texts according to the obtained smoothness scores. For example, M is 5. If the complete variant reduction tree were constructed directly, the number of nodes would be very large, so an appropriate pruning operation can be performed during actual execution: for example, nodes are scored for smoothness and some of them are discarded, which narrows the search range and improves system performance.
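The pruning step can be sketched as dropping the M lowest-scoring candidates of a layer before the next expansion. The scoring function is passed in as a parameter because the text treats smoothness scoring as prior art; the function name `prune` and the sample scores below are hypothetical.

```python
def prune(texts, score, m):
    """Return texts without the m lowest-scoring entries, original order kept."""
    if m <= 0:
        return list(texts)
    # Indices ordered from lowest to highest score (stable for ties).
    ranked = sorted(range(len(texts)), key=lambda i: score(texts[i]))
    dropped = set(ranked[:m])
    return [t for i, t in enumerate(texts) if i not in dropped]

# Made-up smoothness scores for four candidate texts.
SAMPLE_SCORES = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.2}

kept = prune(["a", "b", "c", "d"], SAMPLE_SCORES.get, m=2)
print(kept)
```

Applying this after each expansion bounds how fast the layer grows, at the risk of discarding a candidate that would only have scored well after a later mapping was applied.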
Fig. 2 is a flowchart of a text anti-cheating method according to an embodiment of the present invention, and a second aspect of the present invention further provides a text anti-cheating method, where the method includes:
S1) reducing the variants in the text by the method described above to obtain a restored text;
S2) carrying out keyword matching or model recognition on the text restored in step S1);
S3) marking the text successfully matched with a keyword in step S2) as cheating text, or marking the text successfully identified by the model as cheating text.
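Steps S1) to S3) can be sketched with keyword matching as the recognition step. In this sketch the restoration function is a deliberately simple stand-in (a fixed character table) rather than the tree search of steps S11) to S15), and the keyword list is invented for illustration; `toy_restore`, `label_text` and `KEYWORDS` are hypothetical names.

```python
# Hypothetical stand-in for variant restoration: a fixed character table
# instead of a tree search with smoothness scoring.
TOY_TABLE = str.maketrans({"威": "微", "-": None, "*": None})

def toy_restore(text):
    return text.translate(TOY_TABLE)

def label_text(text, restore, keywords):
    """S1) restore the text, S2) match keywords, S3) label the text."""
    restored = restore(text)
    hit = any(keyword in restored for keyword in keywords)
    return "cheating" if hit else "normal"

KEYWORDS = ["加微信"]  # hypothetical keyword dictionary ("add WeChat")

print(label_text("加威-信", toy_restore, KEYWORDS))
print(label_text("今天天气很好", toy_restore, KEYWORDS))
```

Matching the raw variant text against the same dictionary would miss the first example, which is the gap the restoration step is meant to close.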
The third aspect of the present invention also provides a text anti-cheating variant reduction apparatus, comprising:
at least one processor;
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method as described above by executing the instructions stored by the memory.
The fourth aspect of the present invention also provides a text anti-cheating apparatus, comprising:
at least one processor;
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method as described above by executing the instructions stored by the memory.
The fifth aspect of the invention also provides a machine-readable storage medium having stored thereon instructions which, when executed by a controller, enable the controller to perform a method as described above.
According to the technical scheme, the root node text is expanded step by step into a plurality of child node texts through a plurality of mapping relations, the child node texts are scored for smoothness, some of the child node texts are deleted, and finally the text with the highest smoothness is obtained as the restored text. Keyword matching or model recognition can then be performed on the restored text, so that text matching the keywords or identified by the model can finally be deleted.
The alternative embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the embodiments of the present invention are not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solutions of the embodiments of the present invention within the scope of the technical concept of the embodiments of the present invention, and all the simple modifications belong to the protection scope of the embodiments of the present invention.
In addition, the specific features described in the above embodiments may be combined in any suitable manner without contradiction. In order to avoid unnecessary repetition, the various possible combinations of embodiments of the invention are not described in detail.
Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a single-chip microcomputer, chip or processor to perform all or part of the steps of the methods according to the embodiments of the invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In addition, any combination of the various embodiments of the present invention may be made, so long as it does not deviate from the idea of the embodiments of the present invention, and it should also be regarded as what is disclosed in the embodiments of the present invention.

Claims (11)

1. A text anti-cheating variant reduction method, the method comprising:
S11) taking the text as a root node text;
S12) directly taking the root node text as a child node text, and simultaneously expanding each character in the root node text according to each mapping in one of N mapping relations to generate the child node text, wherein the N mapping relations comprise a shape-near word mapping, a homophone mapping, a harmonic word mapping and an interference character mapping;
S13) taking each child node text as a new child node text, and expanding each character in each child node text according to each mapping in one of the mapping relations not yet used to generate a new child node text;
S14) repeating step S13) until each mapping in each mapping relation of all characters in all child node texts is traversed;
S15) carrying out smoothness scoring on all the child node texts generated by the last expansion among the traversed child node texts, and selecting the child node text with the highest smoothness score as the restored text.
2. The text anti-cheating variant reduction method according to claim 1, wherein the method steps of the shape-near word mapping relation expansion are as follows:
capturing all characters in the root node text or the child node text by adopting image processing software, and carrying out image recognition on the captured characters;
expanding Chinese characters similar in shape to each recognized character into child node text as shape-near words.
3. The text anti-cheating variant reduction method according to claim 1, wherein the method steps of the shape-near word mapping relation expansion are as follows:
capturing all characters in the root node text or the child node text by adopting image processing software, and carrying out image recognition on the captured characters;
capturing the decomposition data and the radical data of each character by adopting image processing software;
with the radical removed from each character, if the remaining part of the character can form a Chinese character, expanding the remaining part into child node text as a shape-near word.
4. The text anti-cheating variant reduction method according to claim 1, wherein the method steps of the homophone mapping relation expansion are as follows:
converting each Chinese character in the root node text or the child node text into pinyin;
expanding each Chinese character whose pinyin is the same as that pinyin into child node text as a homophone.
5. The text anti-cheating variant reduction method according to claim 1, wherein the method steps of the harmonic word mapping relation expansion are as follows:
converting each Chinese character in the root node text or the child node text into pinyin;
expanding each Chinese character whose pinyin is similar to that pinyin into child node text as a harmonic word.
6. The text anti-cheating variant reduction method according to claim 1, wherein the method step of expanding the interference character mapping relation is as follows:
expanding meaningless characters into child node text as null characters.
7. The text anti-cheating variant reduction method according to claim 1, wherein step S13) further comprises: scoring the smoothness of each child node text, and deleting the M lowest-scoring child node texts according to the obtained smoothness scores.
8. A method of text anti-cheating, the method comprising:
S1) reducing the variants in the text by the method of any one of claims 1-7 to obtain a restored text;
S2) carrying out keyword matching or model recognition on the text restored in step S1);
S3) marking the text successfully matched with a keyword in step S2) as cheating text, or marking the text successfully identified by the model as cheating text.
9. A text anti-cheating variant reduction apparatus, comprising:
at least one processor;
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method of any one of claims 1-7 by executing the instructions stored by the memory.
10. A text anti-cheating device, comprising:
at least one processor;
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method of claim 8 by executing the instructions stored by the memory.
11. A machine-readable storage medium having stored thereon instructions which, when executed by a controller, cause the controller to perform the method of any of claims 1-7.
CN201910462812.0A 2019-05-30 2019-05-30 Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment Active CN110298020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910462812.0A CN110298020B (en) 2019-05-30 2019-05-30 Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910462812.0A CN110298020B (en) 2019-05-30 2019-05-30 Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment

Publications (2)

Publication Number Publication Date
CN110298020A CN110298020A (en) 2019-10-01
CN110298020B true CN110298020B (en) 2023-05-16

Family

ID=68027532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910462812.0A Active CN110298020B (en) 2019-05-30 2019-05-30 Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment

Country Status (1)

Country Link
CN (1) CN110298020B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591464B (en) * 2021-07-28 2022-06-10 百度在线网络技术(北京)有限公司 Variant text detection method, model training method, device and electronic equipment

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101324883A (en) * 2008-07-31 2008-12-17 电子科技大学 Method for extracting variation key word
CN101976253A (en) * 2010-10-27 2011-02-16 重庆邮电大学 Chinese variation text matching recognition method
CN106021371A (en) * 2016-05-11 2016-10-12 苏州大学 Event recognition method and system
CN108804413A (en) * 2018-04-28 2018-11-13 百度在线网络技术(北京)有限公司 The recognition methods of text cheating and device
CN109241523A (en) * 2018-08-10 2019-01-18 北京百度网讯科技有限公司 Recognition methods, device and the equipment of variant cheating field
CN109408824A (en) * 2018-11-05 2019-03-01 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN109522550A (en) * 2018-11-08 2019-03-26 和美(深圳)信息技术股份有限公司 Text information error correction method, device, computer equipment and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8364470B2 (en) * 2008-01-15 2013-01-29 International Business Machines Corporation Text analysis method for finding acronyms
US8473501B2 (en) * 2009-08-25 2013-06-25 Ontochem Gmbh Methods, computer systems, software and storage media for handling many data elements for search and annotation
CN102053993B (en) * 2009-11-10 2014-04-09 阿里巴巴集团控股有限公司 Text filtering method and text filtering system

Non-Patent Citations (4)

Title
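Analysis of Textual Variation by Latent Tree Structures; Teemu Roos et al.; 2011 IEEE 11th International Conference on Data Mining; 20120123; 567-576 *
A Chinese variant word recognition algorithm based on association rules; 赵俊杰; Journal of Chongqing University of Technology (Natural Science); 20180315; Vol. 32 (No. 03); 178-185 *
Research on an automatic seed set expansion algorithm for combating search engine spam; 韩博; China Master's Theses Full-text Database, Information Science and Technology; 20100715 (No. 07); I138-1111 *
Research on machine-learning-based recognition of objectionable short texts; 韩伟; China Master's Theses Full-text Database, Information Science and Technology; 20181215 (No. 12); I138-2019 *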

Also Published As

Publication number Publication date
CN110298020A (en) 2019-10-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant