CN109359274B

CN109359274B - Method, device and equipment for identifying character strings generated in batch

Info

Publication number: CN109359274B
Application number: CN201811074092.2A
Authority: CN
Inventors: 江大鹏
Original assignee: ANT Financial Hang Zhou Network Technology Co Ltd
Current assignee: ANT Financial Hang Zhou Network Technology Co Ltd
Priority date: 2018-09-14
Filing date: 2018-09-14
Publication date: 2023-05-02
Anticipated expiration: 2038-09-14
Also published as: CN109359274A

Abstract

The specification discloses a method, a device and equipment for identifying character strings generated in batches. The method comprises the following steps: receiving character strings to be identified which are generated in batches; dividing the character string to be identified to obtain at least one sub-character string of the character string to be identified; determining the occurrence probability of at least one sub-character string of the character string to be identified, and determining the randomness degree of the character string to be identified according to the occurrence probability of the sub-character string; judging whether the character string to be recognized is a randomly generated character string or not according to the randomness degree of the character string to be recognized.

Description

Method, device and equipment for identifying character strings generated in batch

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for identifying a character string generated in batch.

Background

With the development and spread of internet technology, a growing share of the character strings on network platforms are generated automatically and in batches by machines. Take batch-registered accounts as an example: such accounts can use the various functions of a platform, but because they are not operated by ordinary users, they bring a large amount of junk content to the platform and can even cause a loss of resources. For example, in news and information applications, paid posters (the so-called "water army") use many accounts to publish similar opinions within a short time, steering public opinion and degrading the experience of normal users. As another example, bargain hunters on e-commerce sites (the so-called "wool party") use batch-registered accounts to collect the site's subsidies, which seriously wastes marketing funds and greatly reduces the marketing effect.

In the prior art, such accounts are identified with supervised classification algorithms such as LR (logistic regression) and SVM (support vector machine). These algorithms require a large number of accounts to be manually labeled as ordinary accounts or random accounts to obtain training data; a classification model is trained on that data and then used to classify input accounts, so the labor cost is high. Moreover, because short character strings contain too little information, the classification model performs poorly on them and cannot recognize them well.

Disclosure of Invention

The embodiments of this specification provide a method, an apparatus, and a device for identifying character strings generated in batches. They address two problems: manually labeling a large number of accounts is labor-intensive, and classification models perform poorly on short character strings.

In order to solve the above technical problems, the embodiments of the present specification are implemented as follows:

the embodiment of the specification provides a method for identifying character strings generated in batches, which comprises the following steps:

receiving character strings to be identified which are generated in batches;

dividing the character string to be identified to obtain at least one sub-character string of the character string to be identified;

determining the occurrence probability of at least one sub-character string of the character string to be identified, and determining the randomness degree of the character string to be identified according to the occurrence probability of the sub-character string;

judging whether the character string to be recognized is a randomly generated character string or not according to the randomness degree of the character string to be recognized.

An embodiment of this specification provides an apparatus for identifying character strings generated in batches, comprising a receiving module, a segmentation module, a determining module and a judging module;

the receiving module is used for receiving the character strings to be identified which are generated in batches;

the segmentation module is used for segmenting the character string to be identified to obtain at least one sub-character string of the character string to be identified;

the determining module is used for determining the occurrence probability of at least one sub-character string of the character string to be identified, and determining the randomness degree of the character string to be identified according to the occurrence probability of the sub-character string;

the judging module is used for judging whether the character string to be identified is a randomly generated character string according to the randomness degree of the character string to be identified.

The device for identifying character strings generated in batches provided in the embodiments of this specification comprises a memory and a processor, wherein the memory stores a program, and the program is configured to be executed by the processor to perform the method for identifying character strings generated in batches.

At least one of the technical solutions adopted in the embodiments of this specification can achieve the following beneficial effects: the randomness degree of a character string is determined from the occurrence probabilities of its substrings, and on that basis it is judged whether the character string is randomly generated, so no large amount of training data needs to be labeled manually at any point and labor cost is saved; sample character string data can be selected specifically for the type of character string to be identified; and recognition of short character strings is improved.

Drawings

In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some of the embodiments described in the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for identifying a batch of character strings according to an embodiment of the present disclosure;

FIG. 2 is another flow chart of a method for recognizing a batch-generated character string according to an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of an apparatus for identifying character strings generated in batches according to an embodiment of the present disclosure.

Detailed Description

The embodiment of the specification provides a method, a device and equipment for identifying character strings generated in batches.

In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.

Fig. 1 is a schematic flow chart of a method for identifying a batch of character strings according to an embodiment of the present disclosure, where the schematic flow chart includes:

Step 105, receiving character strings to be identified that are generated in batches;

In the embodiments of this specification, the account names of major network platforms are taken as an example; an account name is a character string formed by concatenating characters. An account name generated automatically by a machine is a random concatenation of characters, such as "iehfdjksyneyg", whereas most account names registered by ordinary users are character strings that carry some meaning, such as "ilekkobe". The randomness degree of a machine-generated character string is therefore far greater than that of an account name registered by an ordinary user.

As shown in step 220 of FIG. 2, a character string (a batch-generated character string to be identified) is input. In this embodiment, the character string input in step 220 is received; for example, the character string "ak, ti odoe dgza" is received.

Step 110, segmenting the character string to be identified to obtain at least one sub-character string of the character string to be identified;

Preferably, the character string to be identified "ak, ti odoe dgza" received in step 105 is first preprocessed: characters that cannot be used in an account name, such as spaces and punctuation marks, are removed, giving the preprocessed character string "aktiodoedgza". The preprocessed character string is then segmented to obtain at least one substring, as shown in step 225 of FIG. 2.
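
The preprocessing step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `preprocess` and the assumption that an account name may contain only ASCII letters and digits are hypothetical, since the source only says that spaces and punctuation marks are removed.

```python
import re

# Assumption: only ASCII letters and digits are valid in an account name.
# The source names spaces and punctuation as examples of removed characters.
_DISALLOWED = re.compile(r"[^A-Za-z0-9]")


def preprocess(raw: str) -> str:
    """Remove every character outside the assumed allowed set."""
    return _DISALLOWED.sub("", raw)


cleaned = preprocess("ak, ti odoe dgza")
print(cleaned)  # aktiodoedgza
```

A platform with a different account-name alphabet would only need to change the character class in `_DISALLOWED`.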

In this embodiment, the preprocessed character string is segmented at intervals of a preset character length; for example, it is segmented once every two characters and/or once every three characters, so as to obtain at least one substring.

In this embodiment, if N = 2 is taken for the N-gram model, the preprocessed character string "aktiodoedgza" is segmented into the substrings "ak", "ti", "od", "oe", "dg" and "za"; if N = 3 is taken, it is segmented into the substrings "akt", "iod", "oed" and "gza".
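
The fixed-interval segmentation in this example can be sketched as below. The helper name `split_fixed` is hypothetical, and dropping a trailing piece shorter than N is an assumption: the source does not say how a remainder is handled, and the example string divides evenly for both N = 2 and N = 3.

```python
def split_fixed(text: str, n: int) -> list[str]:
    """Cut text into consecutive, non-overlapping pieces of length n.

    Assumption: a trailing piece shorter than n is dropped.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(0, len(text) - n + 1, n)]


print(split_fixed("aktiodoedgza", 2))  # ['ak', 'ti', 'od', 'oe', 'dg', 'za']
print(split_fixed("aktiodoedgza", 3))  # ['akt', 'iod', 'oed', 'gza']
```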

Step 115, determining the occurrence probability of at least one sub-character string of the character string to be recognized, and determining the randomness degree of the character string to be recognized according to the occurrence probability of the sub-character string;

In this embodiment, a probability dictionary is first used to look up the occurrence probabilities of the substrings "ak", "ti", "od", "oe", "dg" and "za" of the character string to be identified "ak, ti odoe dgza"; the probability dictionary contains the correspondence between sample substrings and the probabilities of those sample substrings. The occurrence probability P of the character string to be identified is then calculated from the occurrence probabilities of its substrings, and the randomness degree R of the character string is determined from P, as shown in step 230 of FIG. 2.

Specifically, where the probabilities that the substrings "ak", "ti", "od", "oe", "dg" and "za" occur individually are obtained as 0.79, 0.59, 0.63, 0.71, 0.56 and 0.68, respectively, their geometric mean, 0.66, is taken as the occurrence probability P of the character string to be identified "ak, ti odoe dgza". The randomness degree is R = 1 - P, so R is 0.34.

Alternatively, where the probabilities that at least two adjacent substrings among "ak", "ti", "od", "oe", "dg" and "za" occur together are obtained, the geometric mean of those probabilities is taken as the occurrence probability P of the character string to be identified. Taking pairs of adjacent substrings as an example, the probabilities that "ak" and "ti", "ti" and "od", "od" and "oe", "oe" and "dg", and "dg" and "za" occur together are obtained, and the geometric mean of 0.69, 0.63, 0.71 and 0.66 is calculated as 0.68 and taken as P. With R = 1 - P, the randomness degree R is 0.32.

Alternatively, where both the probabilities that the substrings of "ak, ti odoe dgza" occur individually and the probabilities that two adjacent substrings occur together are obtained, the arithmetic mean of the geometric mean of the individual probabilities and the geometric mean of the adjacent-pair probabilities is taken as the occurrence probability P, which is 0.67. From this probability of 0.67, the randomness degree R of the character string to be identified is determined to be 0.33.
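
The calculation of P and R can be sketched as follows, using the six individual substring probabilities quoted in the text. The function names `geometric_mean` and `randomness` are hypothetical, and computing the geometric mean in log space is an implementation choice rather than something the source prescribes.

```python
import math


def geometric_mean(values):
    """Geometric mean, computed in log space to avoid underflow on long inputs."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def randomness(single=None, pairs=None):
    """Return (P, R) from individual and/or adjacent-pair probabilities.

    With one list, P is its geometric mean. With both lists, P is the
    arithmetic mean of the two geometric means. In every case R = 1 - P.
    """
    means = [geometric_mean(v) for v in (single, pairs) if v]
    if not means:
        raise ValueError("at least one probability list is required")
    p = sum(means) / len(means)
    return p, 1.0 - p


single = [0.79, 0.59, 0.63, 0.71, 0.56, 0.68]  # values quoted in the text
p, r = randomness(single=single)
print(round(p, 2), round(r, 2))  # 0.66 0.34
```

Passing both `single` and `pairs` corresponds to the third variant, in which the two geometric means are averaged arithmetically.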

It should be noted that the probability dictionary is obtained before it is used to look up the occurrence probabilities of the substrings "ak", "ti", "od", "oe", "dg" and "za" of the character string to be identified "ak, ti odoe dgza". In this embodiment, the type of the sample character string data is the same as the type of the batch-generated character string to be identified. Therefore, English magazines, English web pages, or other readily available English articles are taken as the sample character string data, as shown in step 205 of FIG. 2. Further, the sample character string data is segmented to obtain a plurality of sample substrings; as shown in step 210 of FIG. 2, the number of times each sample substring occurs individually and/or the number of times at least two adjacent sample substrings occur together is counted; the probabilities that the sample substrings occur individually and/or the probabilities that at least two adjacent sample substrings occur together are then calculated to obtain the probability dictionary, as shown in step 215 of FIG. 2. The probability dictionary contains a plurality of sample substrings together with the probabilities that they occur individually, and/or at least two adjacent sample substrings together with the probabilities that they occur together.
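
Building the probability dictionary from sample data might look like the sketch below. The source does not state how counts are converted into probabilities, so dividing each count by the total count (relative frequency) is an assumption, as are the names `split_fixed` and `build_probability_dictionary` and the toy sample strings.

```python
from collections import Counter


def split_fixed(text, n):
    """Consecutive, non-overlapping pieces of length n (short remainder dropped)."""
    return [text[i:i + n] for i in range(0, len(text) - n + 1, n)]


def build_probability_dictionary(samples, n=2):
    """Count individual sample substrings and adjacent pairs, then normalize.

    Assumption: probability = count / total count (relative frequency).
    """
    single_counts = Counter()
    pair_counts = Counter()
    for sample in samples:
        pieces = split_fixed(sample, n)
        single_counts.update(pieces)
        pair_counts.update(zip(pieces, pieces[1:]))  # adjacent substring pairs
    single_total = sum(single_counts.values())
    pair_total = sum(pair_counts.values())
    single = {s: c / single_total for s, c in single_counts.items()}
    pairs = {p: c / pair_total for p, c in pair_counts.items()}
    return single, pairs


# Toy, already-preprocessed sample strings (hypothetical data).
single, pairs = build_probability_dictionary(["thethe", "then"], n=2)
print(single["th"])  # 0.4
```

In practice the samples would be drawn from text of the same type as the strings to be identified, as the paragraph above requires.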

Step 120, judging whether the character string to be identified is a randomly generated character string according to the randomness degree of the character string to be identified.

In this embodiment, as shown in step 235 of FIG. 2, the randomness degree R is compared with a preset random threshold. As shown in step 240 of FIG. 2, if the randomness degree R of the character string "ak, ti odoe dgza" is greater than the preset random threshold, the character string is determined to be a randomly generated character string. The preset random threshold = 1 - the preset probability threshold, where the preset probability threshold is: the median of the probabilities that the sample substrings in the probability dictionary occur individually; or the median of the probabilities that at least two adjacent sample substrings in the probability dictionary occur together; or the arithmetic mean of those two medians. Taking a preset probability threshold of 0.7 as an example, the preset random threshold is 0.3. The randomness degree R of the character string to be identified "ak, ti odoe dgza" obtained in step 115 is greater than the preset random threshold of 0.3, so the character string is a randomly generated character string. As shown in step 245 of FIG. 2, if the randomness degree R of a character string is not greater than the preset random threshold, the character string is a normal character string.

Further, in this embodiment, focused prevention and control measures are applied to the randomly generated character string "ak, ti odoe dgza"; specifically, the permissions of the character string "ak, ti odoe dgza" are restricted, additional verification is required for it, or it is prohibited from logging in to the network platform.

Compared with the prior art, the technical solutions adopted in the embodiments of this specification can achieve the following beneficial effects: the randomness degree of a character string is determined from the occurrence probabilities of its substrings, and on that basis it is judged whether the character string is randomly generated, so no large amount of training data needs to be labeled manually at any point and labor cost is saved; sample character string data can be selected specifically for the type of character string to be identified; and recognition of short character strings is improved.
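
The threshold rule can be sketched as follows. The function names and the three-entry dictionary are hypothetical; the dictionary is chosen only so that its median equals the 0.7 probability threshold used in the example.

```python
from statistics import median


def random_threshold(single_probs=None, pair_probs=None):
    """Preset random threshold = 1 - preset probability threshold.

    The probability threshold is the median of whichever dictionary is
    supplied, or the arithmetic mean of both medians when both are given.
    """
    medians = [median(d.values()) for d in (single_probs, pair_probs) if d]
    if not medians:
        raise ValueError("at least one probability dictionary is required")
    return 1.0 - sum(medians) / len(medians)


def is_randomly_generated(r, threshold):
    """A string is judged random only when R is strictly greater than the threshold."""
    return r > threshold


# Hypothetical dictionary whose median is 0.7.
threshold = random_threshold(single_probs={"ab": 0.6, "cd": 0.7, "ef": 0.8})
print(is_randomly_generated(0.34, threshold))  # True
```

The comparison is strict, matching the text: a string whose R is not greater than the threshold is treated as a normal character string.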

FIG. 3 is a schematic structural diagram of an apparatus for identifying character strings generated in batches according to an embodiment of the present disclosure. The apparatus includes a receiving module 305, a segmentation module 310, a determining module 315 and a judging module 320;

the receiving module 305 is configured to receive a batch of generated character strings to be identified;

the segmentation module 310 is configured to segment the character string to be identified to obtain at least one sub-character string of the character string to be identified;

the determining module 315 is configured to determine a probability of occurrence of at least one sub-string of the character string to be identified, and determine a degree of randomness of the character string to be identified according to the probability of occurrence of the sub-string;

the judging module 320 is configured to judge whether the character string to be recognized is a randomly generated character string according to the randomness degree of the character string to be recognized.

Preferably, the determining module 315 is specifically configured to look up the occurrence probabilities of the substrings of the character string to be identified in a probability dictionary, where the probability dictionary contains the correspondence between sample substrings and the probabilities of those sample substrings, and to determine the randomness degree of the character string to be identified according to the occurrence probabilities of the substrings.

Preferably, the apparatus further comprises: the probability dictionary obtaining module is used for dividing the sample character string data to obtain a plurality of sample sub-character strings; counting the number of times that a plurality of sample substrings occur independently and/or the number of times that at least two adjacent sample substrings occur simultaneously; calculating the probability of the independent occurrence of the plurality of sample substrings and/or the probability of the simultaneous occurrence of the adjacent at least two sample substrings to obtain a probability dictionary; the probability dictionary comprises a plurality of sample substrings and the probability that the sample substrings appear independently and/or comprises at least two adjacent sample substrings and the probability that the sample substrings appear simultaneously.

Preferably, the type of the sample character string data is the same as the type of the character string to be identified generated in batch.

Preferably, the determining module 315 is further specifically configured to determine the probability of occurrence of the character string to be identified according to the probability of occurrence of the substring; and determining the randomness degree of the character strings to be identified according to the occurrence probability of the character strings to be identified.

More preferably, the determining module 315 is further specifically configured to: where the probabilities that the substrings of the character string to be identified occur individually are obtained, take the geometric mean of those probabilities as the occurrence probability P of the character string to be identified; or, where the probabilities that at least two adjacent substrings of the character string to be identified occur together are obtained, take the geometric mean of those probabilities as P; or, where both kinds of probability are obtained, take the arithmetic mean of the geometric mean of the individual probabilities and the geometric mean of the adjacent-substring probabilities as P.

Further, the determining module 315 is further specifically configured to determine the randomness degree R of the character string to be identified as R = 1 - P, where P is the occurrence probability of the character string to be identified.

Preferably, the judging module 320 is specifically configured to determine that the character string to be identified is a randomly generated character string when its randomness degree R is greater than a preset random threshold.

Preferably, the preset random threshold = 1 - the preset probability threshold, where the preset probability threshold is: the median of the probabilities that the sample substrings in the probability dictionary occur individually; or the median of the probabilities that at least two adjacent sample substrings in the probability dictionary occur together; or the arithmetic mean of those two medians.

Preferably, the apparatus further comprises a focused prevention and control module, configured to apply focused prevention and control to the randomly generated character string when the character string to be identified is determined to be a randomly generated character string; the focused prevention and control includes at least one of restricting permissions, strengthening verification, and prohibiting login.
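
As a rough illustration of how the four modules fit together, the sketch below maps each module to one method of a single class. The class name, method names, the ASCII-alphanumeric filter, and the `floor` probability used for substrings missing from the dictionary are all assumptions; the source does not say how unseen substrings are handled. The dictionary values are the individual probabilities quoted earlier in the description.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class BatchStringIdentifier:
    probabilities: dict      # sample substring -> probability
    random_threshold: float  # preset random threshold
    n: int = 2               # segment length
    floor: float = 1e-6      # assumption: probability for unseen substrings

    def receive(self, raw):
        """Receiving module: accept the string and strip disallowed characters."""
        return re.sub(r"[^A-Za-z0-9]", "", raw)

    def segment(self, text):
        """Segmentation module: non-overlapping pieces of length n."""
        return [text[i:i + self.n] for i in range(0, len(text) - self.n + 1, self.n)]

    def determine(self, pieces):
        """Determining module: R = 1 - geometric mean of substring probabilities."""
        if not pieces:
            raise ValueError("string is too short to segment")
        logs = [math.log(self.probabilities.get(p, self.floor)) for p in pieces]
        return 1.0 - math.exp(sum(logs) / len(logs))

    def judge(self, r):
        """Judging module: random when R exceeds the preset random threshold."""
        return r > self.random_threshold

    def identify(self, raw):
        return self.judge(self.determine(self.segment(self.receive(raw))))


probs = {"ak": 0.79, "ti": 0.59, "od": 0.63, "oe": 0.71, "dg": 0.56, "za": 0.68}
identifier = BatchStringIdentifier(probabilities=probs, random_threshold=0.3)
print(identifier.identify("ak, ti odoe dgza"))  # True
```

A focused prevention and control module would act on the boolean result, for example by restricting permissions or requiring extra verification.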

An embodiment of this specification also provides a device for identifying character strings generated in batches, comprising a memory and a processor, the memory storing a program configured to be executed by the processor to perform the following steps: receiving character strings to be identified that are generated in batches; segmenting the character string to be identified to obtain at least one substring of the character string to be identified; determining the occurrence probability of at least one substring of the character string to be identified, and determining the randomness degree of the character string to be identified according to the occurrence probability of the substring; and judging whether the character string to be identified is a randomly generated character string according to its randomness degree.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises an element.

The foregoing is merely an example of the present specification and is not intended to limit the present specification. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims

1. A method of identifying a batch of character strings, the method comprising:

receiving character strings to be identified which are generated in batches;

determining the occurrence probability of at least one sub-character string of the character string to be identified, and determining the randomness degree of the character string to be identified according to the occurrence probability of the sub-character string; under the condition that the probability of the independent occurrence of the sub-character strings of the character string to be identified is obtained, taking the geometric average value of the probability of the independent occurrence of the sub-character strings as the probability P of the occurrence of the character string to be identified; or

under the condition that the probability that at least two adjacent sub-character strings of the character string to be identified appear simultaneously is obtained, taking the geometric average value of the probability that the at least two adjacent sub-character strings appear simultaneously as the probability P of the occurrence of the character string to be identified; or

under the condition that the probability of the independent occurrence of the sub-character strings of the character string to be identified and the probability of the simultaneous occurrence of at least two adjacent sub-character strings of the character string to be identified are obtained, taking the arithmetic average value of the geometric average value of the probability of the independent occurrence of the sub-character strings and the geometric average value of the probability of the simultaneous occurrence of the at least two adjacent sub-character strings as the probability P of the occurrence of the character string to be identified;

2. The method for recognizing character strings generated in batch according to claim 1, wherein the determining the probability of occurrence of at least one sub-character string of the character string to be recognized, determining the degree of randomness of the character string to be recognized according to the probability of occurrence of the sub-character string, comprises:

matching the occurrence probability of the sub-character strings of the character strings to be identified by using a probability dictionary, wherein the probability dictionary comprises the corresponding relation between the sample sub-character strings and the probability of the sample sub-character strings;

and determining the randomness degree of the character string to be identified according to the occurrence probability of the sub-character string.

3. The method of claim 2, wherein before matching the probabilities of occurrence of substrings of the character string to be recognized using a probability dictionary, the method further comprises:

dividing sample character string data to obtain a plurality of sample substrings;

counting the number of times that a plurality of sample substrings occur independently and/or the number of times that at least two adjacent sample substrings occur simultaneously;

calculating the probability of the independent occurrence of the plurality of sample substrings and/or the probability of the simultaneous occurrence of the adjacent at least two sample substrings to obtain a probability dictionary;

the probability dictionary comprises a plurality of sample substrings and the probability that the sample substrings appear independently and/or comprises at least two adjacent sample substrings and the probability that the sample substrings appear simultaneously.
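A minimal sketch of how such a probability dictionary might be built is given below. It assumes, purely for illustration, that each sample string is divided into single characters and that a probability is estimated as a count divided by the total count; the claims fix neither choice, and the name `build_probability_dictionary` is hypothetical.

```python
from collections import Counter


def build_probability_dictionary(samples):
    """Return (single_probs, pair_probs) estimated from sample strings."""
    single_counts = Counter()
    pair_counts = Counter()
    for s in samples:
        parts = list(s)  # assumption: one character per sub-string
        single_counts.update(parts)                 # sole occurrences
        pair_counts.update(zip(parts, parts[1:]))   # adjacent pairs
    total_single = sum(single_counts.values())
    total_pair = sum(pair_counts.values())
    # Assumption: probability = relative frequency in the sample data.
    single_probs = {k: c / total_single for k, c in single_counts.items()}
    pair_probs = {a + b: c / total_pair for (a, b), c in pair_counts.items()}
    return single_probs, pair_probs
```

For example, `build_probability_dictionary(["ab", "ab", "ac"])` yields a sole-occurrence probability of 0.5 for `"a"` and a joint-occurrence probability of 2/3 for `"ab"`.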

4. The method for identifying character strings generated in batch according to claim 3, wherein the type of the sample character string data is the same as the type of the character strings to be identified generated in batch.

5. The method for recognizing character strings generated in batch according to claim 2, wherein the determining the degree of randomness of the character strings to be recognized according to the probability of occurrence of the sub-character strings comprises:

determining the occurrence probability of the character string to be identified according to the occurrence probability of the sub character string;

and determining the randomness degree of the character strings to be identified according to the occurrence probability of the character strings to be identified.

6. The method for recognizing character strings generated in batch according to claim 1, wherein the determining the randomness degree of the character string to be recognized according to the probability of occurrence of the character string to be recognized comprises: determining the randomness degree R of the character string to be recognized as R = 1 - P, where P is the probability of occurrence of the character string to be recognized.

7. The method for recognizing character strings generated in batch according to claim 6, wherein the judging whether the character string to be recognized is a randomly generated character string according to the degree of randomness of the character string to be recognized comprises:

determining that the character string to be identified is a randomly generated character string in a case where the randomness degree R of the character string to be identified is greater than a preset random threshold.
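By way of a hedged example, the randomness degree and the threshold comparison of claims 6 and 7 might be expressed as follows. The function names are hypothetical, and the threshold value used in the usage note is an arbitrary illustration rather than a value from the specification.

```python
def randomness_degree(p):
    """Randomness degree R = 1 - P for a string with probability P."""
    return 1.0 - p


def is_randomly_generated(p, random_threshold):
    """True when R is strictly greater than the preset random threshold."""
    return randomness_degree(p) > random_threshold
```

With an illustrative threshold of 0.9, a string with P = 0.01 gives R = 0.99 and is flagged, while a string with P = 0.3 gives R = 0.7 and is not.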

8. The method for identifying character strings generated in batch according to claim 7, wherein:

the preset random threshold = 1 - the preset probability threshold;

the preset probability threshold is the median of the probabilities of the single occurrence of a plurality of sample substrings in the probability dictionary; or the median of the probability of simultaneous occurrence of at least two adjacent sample substrings in the probability dictionary; or the median of the probabilities of the independent occurrence of a plurality of sample substrings in the probability dictionary and the median of the probabilities of the simultaneous occurrence of at least two adjacent sample substrings in the probability dictionary.
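The threshold derivation might be sketched as below, using `statistics.median` from the Python standard library. The claim does not say how the two medians are combined when both kinds of probability are present; taking their arithmetic mean, by analogy with claim 1, is an assumption of this sketch, as is the name `preset_random_threshold`.

```python
from statistics import median


def preset_random_threshold(single_probs=None, pair_probs=None):
    """Random threshold = 1 - probability threshold.

    single_probs / pair_probs: dictionaries mapping sample sub-strings
    (or adjacent pairs) to their probabilities.
    """
    medians = []
    if single_probs:
        medians.append(median(single_probs.values()))
    if pair_probs:
        medians.append(median(pair_probs.values()))
    if not medians:
        raise ValueError("empty probability dictionary")
    # Assumption: when both medians exist, combine them by arithmetic mean.
    probability_threshold = sum(medians) / len(medians)
    return 1.0 - probability_threshold
```

Using the median makes the threshold depend on the sample data itself, so a dictionary built from a different type of string yields a different cut-off.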

9. The method of identifying a batch of character strings according to claim 7, further comprising:

under the condition that the character string to be identified is a randomly generated character string, performing key prevention and control on the randomly generated character string;

wherein the key prevention and control includes at least one of restricting permissions, enforcing authentication, and disabling login.

10. An apparatus for identifying character strings generated in batch, the apparatus comprising: a receiving module, a dividing module, a determining module and a judging module;

the judging module is used for judging whether the character string to be identified is a randomly generated character string according to the randomness degree of the character string to be identified;

the determining module is further specifically configured to: in a case where the probabilities that the sub-character strings of the character string to be recognized occur alone are obtained, take the geometric mean of those probabilities as the probability P of occurrence of the character string to be recognized; or, in a case where the probabilities that at least two adjacent sub-character strings of the character string to be recognized occur together are obtained, take the geometric mean of those probabilities as the probability P of occurrence of the character string to be recognized; or, in a case where both the probabilities that the sub-character strings occur alone and the probabilities that at least two adjacent sub-character strings occur together are obtained, take the arithmetic mean of the geometric mean of the probabilities of sole occurrence and the geometric mean of the probabilities of joint occurrence as the probability P of occurrence of the character string to be recognized.

11. The apparatus for recognizing character strings generated in batch according to claim 10, wherein the determining module is specifically configured to match probabilities of occurrence of sub-character strings of the character strings to be recognized using a probability dictionary, the probability dictionary containing correspondence between sample sub-character strings and probabilities of sample sub-character strings; and determining the randomness degree of the character string to be identified according to the occurrence probability of the sub-character string.

12. The apparatus for identifying a batch of character strings as in claim 11, further comprising: the probability dictionary obtaining module is used for dividing the sample character string data to obtain a plurality of sample sub-character strings; counting the number of times that a plurality of sample substrings occur independently and/or the number of times that at least two adjacent sample substrings occur simultaneously; calculating the probability of the independent occurrence of the plurality of sample substrings and/or the probability of the simultaneous occurrence of the adjacent at least two sample substrings to obtain a probability dictionary; the probability dictionary comprises a plurality of sample substrings and the probability that the sample substrings appear independently and/or comprises at least two adjacent sample substrings and the probability that the sample substrings appear simultaneously.

13. The apparatus for recognizing character strings generated in batch according to claim 12, wherein the type of the sample character string data is the same as the type of the character strings to be recognized generated in batch.

14. The apparatus for recognizing character strings generated in batch according to claim 11, wherein the determining module is further specifically configured to determine the probability of occurrence of the character string to be recognized according to the probability of occurrence of the sub-character string; and determining the randomness degree of the character strings to be identified according to the occurrence probability of the character strings to be identified.

15. The apparatus for recognizing character strings generated in batch according to claim 10, wherein the determining module is further specifically configured to determine the degree of randomness R of the character string to be recognized as R = 1 - P, where P is the probability of occurrence of the character string to be recognized.

16. The apparatus for recognizing character strings generated in batch according to claim 15, wherein the judging module is specifically configured to determine that the character string to be recognized is a randomly generated character string in a case where the degree of randomness R of the character string to be recognized is greater than a preset random threshold.

17. The apparatus for identifying character strings generated in batch according to claim 16, wherein the preset random threshold = 1 - the preset probability threshold; the preset probability threshold is the median of the probabilities of the single occurrence of a plurality of sample substrings in the probability dictionary; or the median of the probability of simultaneous occurrence of at least two adjacent sample substrings in the probability dictionary; or the median of the probabilities of the independent occurrence of a plurality of sample substrings in the probability dictionary and the median of the probabilities of the simultaneous occurrence of at least two adjacent sample substrings in the probability dictionary.

18. The apparatus for identifying character strings generated in batch according to claim 16, further comprising: a key prevention and control module configured to perform key prevention and control on the randomly generated character string in a case where the character string to be identified is determined to be a randomly generated character string; wherein the key prevention and control includes at least one of restricting permissions, enforcing authentication, and disabling login.

19. A device for identifying character strings generated in batch, comprising: a memory and a processor, the memory storing a program that is configured to be executed by the processor to perform the method for identifying character strings generated in batch according to any one of claims 1-9.