CN113408270B - Variant text recognition method and device and electronic equipment - Google Patents

Variant text recognition method and device and electronic equipment

Info

Publication number
CN113408270B
Authority
CN
China
Prior art keywords
text
character
target
variant
recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110651589.1A
Other languages
Chinese (zh)
Other versions
CN113408270A (en
Inventor
刘舟
徐键滨
吴梓辉
雷紫娟
董馨远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Sanqi Jichuang Network Technology Co ltd
Original Assignee
Guangzhou Sanqi Jichuang Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Sanqi Jichuang Network Technology Co ltd filed Critical Guangzhou Sanqi Jichuang Network Technology Co ltd
Priority to CN202110651589.1A priority Critical patent/CN113408270B/en
Publication of CN113408270A publication Critical patent/CN113408270A/en
Application granted granted Critical
Publication of CN113408270B publication Critical patent/CN113408270B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses a method and a device for identifying variant text, and an electronic device. The method comprises: acquiring a text to be recognized; acquiring the position of a first target character in the text to be recognized; detecting, according to that position, a first text character separated from the first target character by a preset character interval; deleting the first target character and the first text character from the text to be recognized when the first text character is detected to be a numeric string; determining a target text from the text to be recognized after the deletion; performing variant character conversion on the target text and then guide word matching; and, if a guide word is matched, marking the text to be recognized as variant text.

Description

Variant text recognition method and device and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for identifying a variant text, and an electronic device.
Background
While a networked application is running, users can post messages to communicate with other users. However, among the large volume of posted messages there is inevitably malicious chat text posted by malicious users. Such chat text usually uses distorted (variant) characters to evade recognition, for example "← → wei er ← → 199 ← → 2638 ← → 723 ← →" and "jia dimension → xinI0230 burn 66183". For this reason, the related art matches the text against guide words such as "WeChat", "telephone", "mail", and "plus", and against regular expressions for numbers, telephone numbers, and URL links, and identifies variant text according to the matching result. However, the accuracy of this scheme is low. For example, "great wesson is too high, the war force 2056421" may be recognized as variant text, so the final recognition result is inaccurate, which affects the subsequent selection of text to be banned.
Disclosure of Invention
The present application aims to solve at least one of the technical problems in the prior art, and provides a method, an apparatus and an electronic device for identifying a variant text, so as to improve the identification accuracy and the identification efficiency of the variant text.
In a first aspect, an embodiment of the present application provides a method for identifying a variant text, including:
acquiring a text to be identified;
acquiring the position of a first target character from the text to be recognized, detecting a first text character with a preset character interval between the first text character and the first target character in the text to be recognized according to the position of the first target character, and deleting the first target character and the first text character from the text to be recognized when the first text character is detected to be a numeric string;
determining the target text according to the text to be recognized after the first target character and the first text character are deleted, and performing guide word matching after performing variant character conversion on the target text;
and if a guide word is matched, marking the text to be recognized as variant text.
The position of the first target character is acquired, the numeric string separated from the first target character by the preset character interval is located according to that position, and both are deleted to obtain the target text on which variant recognition is performed. This reduces the likelihood, present in the related art, that regular text containing a numeric string is mistaken for variant text, and so improves recognition accuracy. In addition, deleting part of the text to be recognized shortens the text that the subsequent variant recognition must process, which saves recognition time and improves recognition efficiency.
Further, determining the target text according to the text to be recognized in which the first target character and the first text character are deleted includes:
marking the text to be recognized, in which the first target character and the first text character are deleted, as a residual text;
and acquiring the position of a second target character from the residual text, detecting a second text character adjacent to the second target character in the residual text according to the position of the second target character, deleting the second target character and the second text character from the residual text when detecting that the second text character and the second target character form a preset character, and determining the target text.
Further, before performing variant word conversion on the target text, the method further includes:
matching the target text according to each preset variant character;
if the preset variant characters are matched, marking the text to be recognized as a variant text;
and if the preset variant characters are not matched, carrying out variant character conversion on the target text.
Further, if the preset variant character is matched, marking the text to be recognized as a variant text, including:
if the preset variant characters are matched, obtaining a target score of the target text according to the preset variant characters matched with the text to be recognized, and marking the text to be recognized as a variant text when the target score is larger than a preset threshold value.
Further, the method also comprises the following steps:
and when the target score is smaller than or equal to a preset threshold value, carrying out variant word conversion on the target text.
Further, performing variant word conversion on the target text, including:
and performing sound code conversion on the target text.
Further, the method also comprises the following steps:
and mapping the target text after the sound code conversion according to a preset mapping table.
In a second aspect, in an embodiment of the present application, there is further provided an apparatus for recognizing a variant text, including:
the text acquisition module is used for acquiring a text to be identified;
the text processing module is used for acquiring the position of a first target character from the text to be recognized, detecting a first text character with a preset character interval between the first text character and the first target character in the text to be recognized according to the position of the first target character, and deleting the first target character and the first text character from the text to be recognized when the first text character is detected to be a numeric string;
the text matching module is used for determining the target text according to the text to be recognized after the first target character and the first text character are deleted, and performing guide word matching after the target text is subjected to variant character conversion;
and the text recognition module is used for marking the text to be recognized as a variant text if a guide word is matched.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for identifying a variant text as described in the above embodiments when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium storing computer-executable instructions for causing a computer to execute the method for identifying variant texts according to the foregoing embodiment.
Drawings
The present application is further described below with reference to the figures and embodiments:
FIG. 1 is a diagram of the application environment of a method for identifying variant text in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for variant text recognition, according to one embodiment;
FIG. 3 is a block diagram of an apparatus for recognizing a variant text in one embodiment;
FIG. 4 is a block diagram of a computer device in one embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present application, preferred embodiments of which are illustrated in the accompanying drawings. The drawings supplement the written description graphically so that each technical feature and the overall technical solution of the present application can be understood intuitively; they are not to be construed as limiting the scope of the present application.
While a networked application such as an online game is running, users can post messages to communicate with other users. However, among the large volume of posted messages there are inevitably solicitation messages posted by malicious users, which use bait such as generous rewards to lure users away to other platforms. On the one hand, such malicious users flood the chat for long periods, which seriously degrades the experience of normal users; on the other hand, if users are drawn away to other platforms, user churn increases and platform traffic decreases. The conventional approach is therefore to obtain each user's chat text, extract keywords from it, and compare the keywords, so that users to be banned can be selected according to the comparison result.
However, many malicious chat texts currently use distorted (variant) characters to evade detection, such as "← → wei er ← → 199 ← → 2638 ← → 723 ← →" and "jia dimension → xinI0230 burn 66183". For this reason, the related art matches the text against guide words such as "WeChat", "telephone", "mail", and "plus", and against regular expressions for numbers, telephone numbers, and URL links, and identifies variant text according to the matching result. However, the accuracy of this scheme is low, and regular text that contains a number string may be misrecognized as variant text. For example, "wesson in great guy is too high, the battle force 2056421W" may be recognized as variant text because of the combination of the sensitive word "wesson" and the number string "2056421", or even because of the number string "2056421" alone. The final recognition result is then inaccurate, which affects the subsequent selection of text to be banned.
To solve the above technical problem, in an embodiment a method for identifying variant text is provided; the embodiment is described by taking the application of the method to a server in a variant text recognition system as an example. Fig. 1 is a diagram of the application environment of the method in one embodiment. Referring to fig. 1, the system includes a terminal 110 and a server 120, which are connected through a network. The terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be a mobile phone, a tablet computer, a notebook computer, a wearable device, or the like. The server 120 may be implemented as an independent server or a server cluster composed of a plurality of servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN, and big data and artificial intelligence platforms. The terminal 110 runs a client, and the client and the server correspond to the same application program. The client sends the text to be recognized to the server. After acquiring the position of the first target character in the text to be recognized, the server detects the first text character separated from the first target character by the preset character interval. If the first text character is a numeric string, the server deletes the first target character and the first text character from the text to be recognized, determines the target text from the text to be recognized after the deletion, and performs variant recognition on the target text.
When the target text is recognized as variant text, the text to be recognized is marked as variant text so that it can subsequently be banned, and the user who posted it can be banned as well.
The position of the first target character is acquired, the numeric string separated from the first target character by the preset character interval is located according to that position, and both are deleted to obtain the target text on which variant recognition is performed. This reduces the likelihood, present in the related art, that regular text containing a numeric string is mistaken for variant text, and so improves recognition accuracy. In addition, deleting part of the text to be recognized shortens the text that the subsequent variant recognition must process, which saves recognition time and improves recognition efficiency.
The method for identifying the variant text provided by the embodiments of the present application will be described and explained in detail by several specific embodiments.
In one embodiment, as shown in fig. 2, a method for identifying variant text is provided. The embodiment is mainly illustrated by applying the method to computer equipment. The computer device may specifically be the server 120 in fig. 1 described above.
Referring to fig. 2, the method for identifying the variant text specifically includes the following steps:
S11, acquiring a text to be recognized.
In an embodiment, to identify variant text, the server may obtain from the client the chat messages sent by the user, or the chat messages received by the user, as the text to be recognized. Taking an online game client as an example, a chat message that the user sends in a chat channel of the game or in a private chat with another user, for example "great guo wesson is too high, war 2056421 is unmanned, and we can also refuel!", is taken as the text to be recognized. Alternatively, a pushed message that the user receives in the client, such as "need assistance plus boos1857 the jun sheep of the war", is taken as the text to be recognized.
S12, acquiring the position of a first target character from the text to be recognized, detecting the first text character with a preset character interval between the first text character and the first target character in the text to be recognized according to the position of the first target character, and deleting the first target character and the first text character from the text to be recognized when the first text character is detected to be a numeric string.
Text to be recognized that contains a number string is not always variant text. In games in particular, statements of character attributes are common, such as "I have 2056421 battle", "have 100 angry", "have 600 credits", and "have risen to level 30". Therefore, to avoid judging regular text that contains a numeric string as variant text, in an embodiment a target character library containing a plurality of first target characters is preset. The first target characters may include text characters that, across the historical chat texts, appear together with and adjacent to a numeric string N times. For example, if the historical texts include "I have 2056421 war power", "total 2056421 war power", and "he has 2056421 war power", the character "war power" appears together with a number string multiple times and can be used as a first target character. The first target characters may also include characters preset as required. Taking a game client as an example, keywords such as "hour", "minute", "score", "anger", "level", "integral", and "battle" combined with numbers are normal in game communication, so these keywords can be preset as first target characters.
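As a rough illustration of how such a target character library could be mined from chat history, the following sketch counts the fixed-width character windows that sit directly next to a digit string. The function name `mine_first_targets`, the window width of two characters, the "at least N times" reading of the threshold, and the Chinese sample strings (stand-ins for "I have 2056421 war power" and similar) are all assumptions made for this example and are not taken from the patent.

```python
import re
from collections import Counter

def mine_first_targets(history, n_min=3, width=2):
    """Return the character windows that touch a digit string at least n_min times."""
    counts = Counter()
    for text in history:
        for m in re.finditer(r"\d+", text):
            after = text[m.end():m.end() + width]               # window right after the number
            before = text[max(0, m.start() - width):m.start()]  # window right before the number
            for gram in (after, before):
                # Ignore windows cut short by the text boundary or containing digits.
                if len(gram) == width and not re.search(r"\d", gram):
                    counts[gram] += 1
    return {gram for gram, c in counts.items() if c >= n_min}

# Assumed sample history; "战力" stands in for "war power".
history = ["我有2056421战力", "共2056421战力", "他有2056421战力"]
first_targets = mine_first_targets(history, n_min=3)
```

A real system would more likely segment the text into words first; the fixed-width window is only the simplest thing that shows the counting idea.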
In one embodiment, the preset character interval may be derived from a large amount of experimental data or set by a person skilled in the art. Preferably, when detecting the first text character separated from the first target character by the preset character interval, the text character adjacent to the first target character is detected as the first text character. Here, text characters include words, symbols, letters, and number strings made up of consecutive digits. Given grammatical habits, a preposition or a letter denoting a unit may sit between the first target character and the numeric string, as in "he has 2056421W battle" or "his anger value is 100"; therefore a text character that is one character away from the first target character may also be detected as the first text character. That is, the preset character interval may be 0, 1, or both. When it is 0, the first text character adjacent to the first target character is detected; when it is 1, the first text character one character away from the first target character is detected; when it is both 0 and 1, both are detected.
When the first text character is detected to be a numeric string, the first target character and the first text character are deleted from the text to be recognized. For example, if the first target character is "battle" and the text to be recognized is "great wesson is too high, and the battle 2029W", then "battle 2029" is deleted from the text to be recognized, yielding "great wesson is too high, W".
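Step S12 can be sketched with regular expressions as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the list `FIRST_TARGETS`, the function name `strip_benign_numbers`, and the Chinese sample strings are hypothetical, and the character interval is modeled as "at most `max_gap` non-digit characters between the target and the digit string". Only the target and the digit string are removed; a unit letter sitting between or after them (the "W" in the example above) is kept.

```python
import re

# Hypothetical first-target library: "battle power", "anger", "points", "level".
FIRST_TARGETS = ["战力", "怒气", "积分", "级"]

def strip_benign_numbers(text, targets=FIRST_TARGETS, max_gap=1):
    """Delete each target word together with a digit string at most max_gap characters away."""
    gap = r"(\D{0,%d}?)" % max_gap  # captured so the in-between character survives
    for target in targets:
        t = re.escape(target)
        text = re.sub(t + gap + r"\d+", r"\1", text)  # digit string after the target
        text = re.sub(r"\d+" + gap + t, r"\1", text)  # digit string before the target
    return text

cleaned = strip_benign_numbers("大佬威信太高了,战力2029W")
```

Text without any first target character, such as a message that only contains a contact number, passes through unchanged and is left for the later guide word matching.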
S13, determining a target text according to the text to be recognized after the first target character and the first text character are deleted, and performing leading word matching after performing variant character conversion on the target text.
In one embodiment, the text to be recognized after the first target character and the first text character are deleted can be used directly as the target text. Variant character conversion is then performed on the target text using a preset variant character conversion word list, which stores variant character conversion relations such as Chinese-English conversion, character-number conversion, and symbol/letter-character conversion, for example "wechat" - "Wechat", "one" - "1", and "jia/+" - "plus". After the conversion, guide word matching is performed on the converted target text based on a preset guide word list. For example, the guide words may be "WeChat", "QQ", and so on.
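The conversion-then-match step can be sketched as a table lookup followed by a substring search. The table contents, the guide word list, and the function names below are illustrative assumptions; the Chinese values are assumed renderings of "WeChat" and "plus", and case folding is an added simplification that the text does not mention.

```python
# Hypothetical conversion word list (variant -> canonical form) and guide word list.
VARIANT_TABLE = {"wechat": "微信", "weixin": "微信", "jia": "加", "+": "加", "一": "1"}
GUIDE_WORDS = ["微信", "qq"]

def convert_variants(text, table=VARIANT_TABLE):
    """Replace every variant spelling with its canonical form (case-insensitive)."""
    out = text.lower()
    # Longer keys first, so a short key cannot break up a longer one.
    for variant in sorted(table, key=len, reverse=True):
        out = out.replace(variant, table[variant])
    return out

def matches_guide_word(text, guide_words=GUIDE_WORDS):
    """True if any guide word occurs in the (already converted) text."""
    lowered = text.lower()
    return any(word in lowered for word in guide_words)

converted = convert_variants("jia WeChat 12345")
is_variant = matches_guide_word(converted)
```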
Besides number strings, some words contain sensitive characters yet belong to regular text rather than variant text. For example, words such as "add", "little", and "smile" contain sensitive characters such as "add" and "little" that are usually used as guide words for recognizing variant text, so text to be recognized that contains these words may be misrecognized as variant text. For this purpose, in an embodiment, after the first target character and the first text character are deleted from the text to be recognized, the resulting text is marked as a residual text. The position of a second target character is then acquired from the residual text, and a second text character adjacent to the second target character is detected in the residual text according to that position. When the second text character and the second target character are detected to form a preset character, the second target character and the second text character are deleted from the residual text, and the target text is determined.
The second target character may be a character that is often listed as a guide word, such as "micro" or "plus"; it may be derived from a large amount of experimental data or set by a person skilled in the art. The second target character is different from the first target character. A preset character contains a second target character and can be preset according to actual conditions, for example common words containing a second target character such as "smile", "small", and "breeze". When a preset character formed by a second target character and a second text character is detected, the second target character and the second text character are deleted from the residual text to obtain the target text.
It can be understood that if the first text character detected in the text to be recognized is not a numeric string, the text to be recognized is directly marked as the residual text. Likewise, if the characters formed by the second text character and the second target character in the residual text are not a preset character, the residual text is directly determined to be the target text.
Detecting whether the second text character and the adjacent second target character form a preset character applies a second filter to the text to be recognized, which further reduces the likelihood that regular text is mistaken for variant text and further improves recognition efficiency.
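The second filter described above can be sketched as a single left-to-right scan that drops any preset everyday word containing a second target character. The lists and the function name are assumptions for illustration; the Chinese entries are assumed renderings of "micro", "plus", "smile", "small", "breeze", and "refuel".

```python
SECOND_TARGETS = ["微", "加"]                    # characters that often begin a guide word
BENIGN_WORDS = ["微笑", "微小", "微风", "加油"]  # preset everyday words containing them

def strip_benign_words(residual_text, targets=SECOND_TARGETS, benign=BENIGN_WORDS):
    """Remove preset words that contain a second target character; keep everything else."""
    kept = []
    i = 0
    while i < len(residual_text):
        # Look for a preset word that starts at position i and contains a second target.
        hit = next(
            (w for w in benign
             if residual_text.startswith(w, i) and any(t in w for t in targets)),
            None,
        )
        if hit:
            i += len(hit)   # skip the whole preset word
        else:
            kept.append(residual_text[i])
            i += 1
    return "".join(kept)

target_text = strip_benign_words("大家加油,微笑面对")
```

A second target character that is not part of any preset word is left in place, so a genuine guide word built from it can still be matched later.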
S14, if the guide word is matched, marking the text to be recognized as a variant text.
In one embodiment, if a guide word is matched, the text to be recognized is judged to be variant text and is blocked. If no guide word is matched, the text to be recognized is judged to be normal text and is published through the client.
The position of the first target character is acquired, the numeric string separated from the first target character by the preset character interval is located according to that position, and both are deleted to obtain the target text on which variant recognition is performed. This reduces the likelihood, present in the related art, that regular text containing a numeric string is mistaken for variant text, and so improves recognition accuracy. In addition, deleting part of the text to be recognized shortens the text that the subsequent variant recognition must process, which saves recognition time and improves recognition efficiency.
When the variant text is identified for the target text, multiple variant conversion processes need to be performed first, and then the guide word matching needs to be performed, so that if the text is too long, the identification efficiency of the variant text is low. In order to improve the recognition efficiency of the variant text, in an embodiment, before the variant word conversion is performed on the target text, the method further includes: matching the target text according to each preset variant character; if the preset variant characters are matched, marking the text to be recognized as a variant text; and if the preset variant characters are not matched, carrying out variant character conversion on the target text.
In an embodiment, the server is pre-stored with a variant character library, where the variant character library is pre-stored with a plurality of preset variant characters, and the preset variant characters may be chinese characters, arabic numerals, symbols, english, or the like. The number of preset variant characters stored in the variant character library is less than the number of characters recorded in the variant word conversion word list. The predetermined variant characters may be obtained from a large amount of experimental data, or may be set by one skilled in the art.
After the target text is obtained, it is matched against each preset variant character. If a preset variant character is matched in the target text, the text is judged to be variant text; if no preset variant character is matched, variant character conversion is performed on the target text.
Before the variant character conversion processing is carried out, the feature matching is carried out according to the preset variant characters, so that the variant text can be identified according to the preset variant characters before the variant character conversion processing is carried out, the subsequent processing time of the text is saved, and the efficiency is improved.
In order to further improve the accuracy of the result of the variant text recognition, in an embodiment, if a preset variant character is matched, the text to be recognized is marked as a variant text, which includes: and if the preset variant characters are matched, acquiring a target score of the target text according to the preset variant characters matched with the text to be recognized, and marking the text to be recognized as the variant text when the target score is greater than a preset threshold value. And when the target score is less than or equal to a preset threshold value, carrying out variant word conversion on the target text.
In an embodiment, when the target score of the target text is obtained according to the preset variant characters matched in the target text, the score of each matched preset variant character can be looked up in a preset score mapping table. The score mapping table stores the correspondence between each preset variant character and its score; for example, for the preset variant characters "WeChat" and "add me", the correspondences are "WeChat" - "30 points" and "add me" - "20 points". According to the score mapping table, the scores of the preset variant characters matched in the target text are weighted to obtain the target score of the target text.
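A minimal sketch of table-based scoring follows. The two score values come from the example above; the Chinese keys are assumed renderings of "WeChat" and "add me", the threshold of 40 is a made-up value, and the weighting is simplified to a plain sum with every weight equal to 1.

```python
SCORE_TABLE = {"微信": 30, "加我": 20}  # "WeChat" -> 30 points, "add me" -> 20 points
THRESHOLD = 40                           # assumed preset threshold

def target_score(text, table=SCORE_TABLE):
    """Sum the scores of all preset variant characters found in the text."""
    return sum(points for word, points in table.items() if word in text)

def flagged_by_score(text, threshold=THRESHOLD):
    """True: mark as variant text now. False: continue with variant character conversion."""
    return target_score(text) > threshold

score = target_score("加我微信abc")
```

With these values, a text that matches only one entry stays at or below the threshold and falls through to the conversion step rather than being marked immediately.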
In an embodiment, the target score of the target text can instead be determined from the number of preset variant characters matched in the target text. For example, the server presets a correspondence between the number of matched preset variant characters and the target score, such as "number of preset variant characters: 1" - "20 points". The target score for each number can be set according to actual requirements, provided that the target score is in direct proportion to the number of matched preset variant characters, that is, the more preset variant characters are matched, the higher the target score.
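The count-based alternative can be sketched in a few lines. The 20 points per match follows the example above; the preset list, its Chinese and "vx" entries, and the function name are assumptions for illustration.

```python
PRESET_VARIANTS = ["微信", "加我", "vx"]  # assumed preset variant characters
POINTS_PER_MATCH = 20                     # one matched preset character -> 20 points

def count_based_score(text, presets=PRESET_VARIANTS, points=POINTS_PER_MATCH):
    """Score proportional to how many distinct preset variant characters occur in the text."""
    matched = sum(1 for p in presets if p in text)
    return matched * points

count_score = count_based_score("加我vx")
```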
Before variant character conversion is carried out, variant text can thus be identified from the scores of the preset variant characters. Identifying and scoring the preset variant characters in the target text implements a rule-based variant text recognition strategy, which improves the accuracy of the algorithm and of variant text recognition.
In an embodiment, when variant character conversion is performed on the target text, sound code conversion may be applied to the Chinese characters in the target text: Chinese characters that sound the same as numerals are converted into Arabic numerals, and guide word matching is then performed on the converted target text. For example, "jiawei letter 12 three 45678" becomes "jiawei letter 12345678" after sound code conversion.
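One way to sketch sound code conversion is a character-level translation table. The table below is an assumption: it covers the standard Chinese numerals, their formal (financial) forms, and the spoken form 幺 for 1, whereas a real system would presumably use a larger homophone list. The names `SOUND_TO_DIGIT` and `sound_code_convert` are hypothetical.

```python
# Hypothetical homophone table: Chinese characters read as digits.
SOUND_TO_DIGIT = {
    "零": "0", "一": "1", "二": "2", "三": "3", "四": "4",
    "五": "5", "六": "6", "七": "7", "八": "8", "九": "9",
    # Formal (financial) numerals.
    "壹": "1", "贰": "2", "叁": "3", "肆": "4", "伍": "5",
    "陆": "6", "柒": "7", "捌": "8", "玖": "9",
    # Spoken variant of "one".
    "幺": "1",
}

_TRANSLATION = str.maketrans(SOUND_TO_DIGIT)


def sound_code_convert(text: str) -> str:
    """Replace every character in the table with its Arabic digit."""
    return text.translate(_TRANSLATION)


print(sound_code_convert("12三45678"))
```

Because `str.translate` works one character at a time, multi-character homophones would need a different approach, for example a pinyin lookup.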
To further improve the accuracy of variant character conversion, in an embodiment, after sound code conversion is performed on the Chinese characters in the target text, the converted target text can be mapped according to a preset mapping table. The server holds the preset mapping table, which stores in advance the mapping relations between symbols and letters, Chinese characters, or numbers, so that symbols in the text that resemble letters or suspicious sensitive characters are mapped to the corresponding letters or Chinese characters. For example, after "Jia me wei alpha" Java: this $12359" is mapped according to the preset mapping table, the result is "plus I Wei (& alpha ]: YS12359".
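The mapping step can be sketched with a lookup table from look-alike symbols to plain letters or digits. Every entry in `SYMBOL_MAP` is an assumption chosen for illustration (the "$" to "S" pair is only suggested by the "YS12359" output in the example above), and `map_symbols` is a hypothetical name.

```python
# Hypothetical preset mapping table: look-alike symbols -> letters or digits.
SYMBOL_MAP = {
    "$": "S",
    "¥": "Y",
    "@": "a",
    "α": "a",
    "①": "1",
    "②": "2",
}

_SYMBOL_TRANSLATION = str.maketrans(SYMBOL_MAP)


def map_symbols(text: str) -> str:
    """Map each symbol in the table to its plain letter or digit."""
    return text.translate(_SYMBOL_TRANSLATION)


print(map_symbols("¥$12359"))
```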
After the sound-code-converted target text is mapped according to the preset mapping table, encoding-decoding processing can further be applied to the mapped text, and the resulting final text is used for guide word matching. For example, "it is necessary to assist boos1857 The battle jungle" is encoded and decoded to obtain "needing to assist the boost 1857 to battle jungle".
In one embodiment, as shown in FIG. 3, a variant text recognition apparatus is provided, including:
a text acquisition module 101, configured to acquire a text to be recognized;
a text processing module 102, configured to obtain the position of a first target character from the text to be recognized, detect, according to the position of the first target character, a first text character separated from the first target character by a preset character interval in the text to be recognized, and delete the first target character and the first text character from the text to be recognized when the first text character is detected to be a numeric string;
a text matching module 103, configured to determine a target text according to the text to be recognized after the first target character and the first text character are deleted, perform variant character conversion on the target text, and then perform guide word matching; and
a text recognition module 104, configured to mark the text to be recognized as a variant text if a guide word is matched.
In an embodiment, the text matching module 103 is further configured to: marking the text to be recognized, in which the first target character and the first text character are deleted, as a residual text; and acquiring the position of a second target character from the residual text, detecting a second text character adjacent to the second target character in the residual text according to the position of the second target character, deleting the second target character and the second text character from the residual text when detecting that the second text character and the second target character form a preset character, and determining the target text.
In an embodiment, before performing variant character conversion on the target text, the text matching module 103 is further configured to: match the target text against each preset variant character; if a preset variant character is matched, mark the text to be recognized as a variant text; and if no preset variant character is matched, perform variant character conversion on the target text.
In one embodiment, the text matching module 103 is further configured to: if a preset variant character is matched, obtain a target score of the target text according to the preset variant characters matched in the text to be recognized, and mark the text to be recognized as a variant text when the target score is greater than a preset threshold.
In one embodiment, the text matching module 103 is further configured to: when the target score is less than or equal to the preset threshold, perform variant character conversion on the target text.
In one embodiment, the text matching module 103 is further configured to: perform sound code conversion on the target text.
In one embodiment, the text matching module 103 is further configured to: map the sound-code-converted target text according to a preset mapping table.
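The cooperation of modules 102 to 104 can be sketched as a small pipeline. This is an illustration under several assumptions: the "preset character interval" is read as the number of characters between the target character and the digit string that follows it, the target character library and guide words are made-up values, the second-target-character step is omitted, and the function names are hypothetical.

```python
import re


def delete_target_and_digits(text, target_chars, interval=0):
    """Module 102 (sketch): drop a target character and the digit string
    that follows it at the preset interval, keeping the characters between."""
    for target in target_chars:
        pattern = re.compile(
            re.escape(target) + "(.{%d})" % interval + r"\d+", re.DOTALL
        )
        text = pattern.sub(r"\1", text)
    return text


def recognize(text, target_chars, guide_words, convert=lambda t: t, interval=0):
    """Modules 103 and 104 (sketch): build the target text, apply variant
    character conversion, and report whether any guide word is matched."""
    remaining = delete_target_and_digits(text, target_chars, interval)
    target_text = convert(remaining)
    return any(word in target_text for word in guide_words)


# "Lv" stands in for an entry of the preset target character library.
print(recognize("Lv35 add me", ["Lv"], ["add me"]))
```

Deleting benign number contexts first keeps digits such as a level number from being matched later, while the `convert` hook is where sound code conversion and symbol mapping would be plugged in.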
In one embodiment, a computer device is provided. As shown in FIG. 4, it comprises a processor, a memory, a network interface, an input device, and a display screen connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the variant text recognition method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the variant text recognition method. Those skilled in the art will appreciate that the architecture shown in FIG. 4 is merely a block diagram of some of the structures related to the disclosed solution and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than those shown, combine certain components, or arrange the components differently.
In one embodiment, the variant text recognition apparatus provided in the present application may be implemented in the form of a computer program, which is executable on a computer device as shown in FIG. 4. The memory of the computer device may store the program modules constituting the variant text recognition apparatus. The computer program constituted by these program modules causes the processor to execute the steps of the variant text recognition method of the embodiments of the present application described in this specification.
In one embodiment, there is provided an electronic device including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to perform the steps of the method for recognition of variant text described above. Here, the steps of the method for recognizing a variant text may be the steps in the method for recognizing a variant text of the above-described embodiments.
In one embodiment, a computer-readable storage medium is provided, which stores computer-executable instructions for causing a computer to perform the steps of the above method for identifying variant text. Here, the steps of the method for identifying a variant text may be steps in the method for identifying a variant text of each of the above embodiments.
The foregoing is a preferred embodiment of the present application. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present application, and these modifications and refinements are also regarded as falling within the protection scope of the present application.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer readable storage medium and executed by a computer to implement the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims (9)

1. A method for identifying variant text in a game, comprising:
acquiring a text to be recognized;
acquiring the position of a first target character from the text to be recognized, detecting a first text character with a preset character interval between the first text character and the first target character in the text to be recognized according to the position of the first target character, and deleting the first target character and the first text character from the text to be recognized when the first text character is detected to be a numeric string; the first target character is a text character in a preset target character library;
determining a target text according to the text to be recognized after the first target character and the first text character are deleted, and performing guide word matching after performing variant character conversion on the target text;
if the guide word is matched, marking the text to be recognized as a variant text;
wherein determining the target text according to the text to be recognized after the first target character and the first text character are deleted comprises: marking the text to be recognized, from which the first target character and the first text character have been deleted, as a residual text; and acquiring the position of a second target character from the residual text, detecting a second text character adjacent to the second target character in the residual text according to the position of the second target character, deleting the second target character and the second text character from the residual text when detecting that the second text character and the second target character form a preset character, and determining the target text.
2. The method for identifying variant texts according to claim 1, further comprising, before performing variant character conversion on the target text:
matching the target text according to each preset variant character;
if the preset variant characters are matched, marking the text to be recognized as a variant text;
and if the preset variant character is not matched, carrying out variant character conversion on the target text.
3. The method for recognizing the variant text according to claim 2, wherein if the preset variant character is matched, the text to be recognized is marked as the variant text, and the method comprises:
if the preset variant characters are matched, obtaining a target score of the target text according to the preset variant characters matched with the text to be recognized, and marking the text to be recognized as a variant text when the target score is larger than a preset threshold value.
4. The method for recognizing a variant text according to claim 3, further comprising:
and when the target score is smaller than or equal to the preset threshold value, carrying out variant character conversion on the target text.
5. The method for identifying variant texts according to any one of claims 1-4, wherein the variant character conversion of the target text comprises:
and performing sound code conversion on the target text.
6. The method for identifying variant text as claimed in claim 5, further comprising:
and mapping the target text after the sound code conversion according to a preset mapping table.
7. An apparatus for recognizing a variant text in a game, comprising:
the text acquisition module is used for acquiring a text to be identified;
the text processing module is used for acquiring the position of a first target character from the text to be recognized, detecting a first text character with a preset character interval between the first text character and the first target character in the text to be recognized according to the position of the first target character, and deleting the first target character and the first text character from the text to be recognized when the first text character is detected to be a numeric string;
the text matching module is used for determining a target text according to the text to be recognized after the first target character and the first text character are deleted, and performing guide word matching after the target text is subjected to variant character conversion; wherein determining the target text according to the text to be recognized after the first target character and the first text character are deleted comprises: marking the text to be recognized, from which the first target character and the first text character have been deleted, as a residual text; acquiring the position of a second target character from the residual text, detecting a second text character adjacent to the second target character in the residual text according to the position of the second target character, deleting the second target character and the second text character from the residual text when detecting that the second text character and the second target character form a preset character, and determining the target text;
and the text recognition module is used for marking the text to be recognized as a variant text if the guide word is matched.
8. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the method for recognition of variant text according to any of claims 1 to 6 when executing the program.
9. A computer-readable storage medium, in which a computer program is stored which is adapted to be loaded and executed by a processor to cause a computer device having said processor to carry out the method of any one of claims 1 to 6.
CN202110651589.1A 2021-06-10 2021-06-10 Variant text recognition method and device and electronic equipment Active CN113408270B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110651589.1A CN113408270B (en) 2021-06-10 2021-06-10 Variant text recognition method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113408270A CN113408270A (en) 2021-09-17
CN113408270B true CN113408270B (en) 2023-02-10

Family

ID=77683456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110651589.1A Active CN113408270B (en) 2021-06-10 2021-06-10 Variant text recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113408270B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241523A (en) * 2018-08-10 2019-01-18 北京百度网讯科技有限公司 Recognition methods, device and the equipment of variant cheating field
CN109657228A (en) * 2018-10-31 2019-04-19 北京三快在线科技有限公司 It is a kind of sensitivity text determine method and device
CN112199948A (en) * 2020-09-28 2021-01-08 中国互联网金融协会 Text content identification and illegal advertisement identification method and device and electronic equipment
CN112287684A (en) * 2020-10-30 2021-01-29 中国科学院自动化研究所 Short text auditing method and device integrating variant word recognition

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108885699B (en) * 2018-07-11 2020-06-26 深圳前海达闼云端智能科技有限公司 Character recognition method, device, storage medium and electronic equipment
CN109657738B (en) * 2018-10-25 2024-04-30 平安科技(深圳)有限公司 Character recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113408270A (en) 2021-09-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant