CN105389722B - Malicious order identification method and device

Info

Publication number: CN105389722B
Application number: CN201510808956.9A
Authority: CN
Inventors: 于亮; 马利超; 韩爱君
Original assignee: Xiaomi Inc
Current assignee: Xiaomi Inc
Priority date: 2015-11-20
Filing date: 2015-11-20
Publication date: 2019-12-13
Anticipated expiration: 2035-11-20
Also published as: CN105389722A

Abstract

The disclosure relates to a malicious order identification method and device, and belongs to the field of electronic technology application. The method comprises the following steps: performing word segmentation on an address to be recognized according to a preset first word segmentation algorithm to obtain words to be recognized in the address to be recognized; counting the number of words belonging to a malicious word library in the words to be recognized, wherein the malicious word library is pre-established and records at least one malicious word for identifying a malicious address; judging whether the address to be identified is a malicious address or not according to the number of the words belonging to the malicious word library; and when the address to be identified is a malicious address, determining that an order corresponding to the address to be identified is a malicious order. The method and the device improve the accuracy of malicious order identification and solve the problem of low accuracy when malicious orders are identified through similarity in the related art. The present disclosure is directed to identifying malicious orders.

Description

Malicious order identification method and device

Technical Field

The present disclosure relates to the field of electronic technology application, and in particular, to a malicious order identification method and apparatus.

Background

With the rapid development of e-commerce technology, various marketing methods have emerged. One popular method is the flash sale or large promotion, in which, for example, goods are priced low and opened for purchase at a specified point in time. In this case, some malicious users may appear who acquire a large quantity of goods in violation of the activity rules and resell them at a high price. The behavior of these malicious users severely affects the interests of other users with real purchasing intent.

In the related art, when a malicious user places a large batch of orders on an e-commerce platform, a large amount of repeated information may appear in that user's order information stored in the platform database, such as address information, contact phone numbers, the name of the consignee, or the Internet Protocol (IP) address of the terminal used when placing the order. Therefore, in the related art, malicious users are generally identified by performing similarity evaluation on order information. For example, the similarity of the address information in each order may be calculated, and an order whose address information similarity exceeds a certain threshold is determined to be a candidate order; if the number of candidate orders also exceeds a certain threshold, the candidate orders may be determined to be malicious orders.

Disclosure of Invention

In order to solve the problems in the related art, the present disclosure provides a malicious order identification method and apparatus. The technical scheme is as follows:

According to a first aspect of embodiments of the present disclosure, there is provided a malicious order identification method, including:

performing word segmentation on an address to be recognized according to a preset first word segmentation algorithm to obtain words to be recognized in the address to be recognized;

counting the number of words belonging to a malicious word library in the words to be recognized, wherein the malicious word library is pre-established and records at least one malicious word for identifying a malicious address;

judging whether the address to be identified is a malicious address or not according to the number of the words belonging to the malicious word library;

and when the address to be identified is a malicious address, determining that an order corresponding to the address to be identified is a malicious order.

Optionally, judging whether the address to be identified is a malicious address according to the number of the words belonging to the malicious word library includes:

counting the number n of the words to be recognized in the address to be recognized;

calculating a maliciousness score S1 of the address to be recognized through an address maliciousness scoring formula according to the number m of the words belonging to the malicious word library and the number n of the words to be recognized in the address to be recognized, wherein the address maliciousness scoring formula is: S1 = m/n;

judging whether the maliciousness score S1 of the address to be identified is greater than a preset threshold t;

when the maliciousness score S1 is greater than the preset threshold t, determining that the address to be identified is a malicious address;

when the maliciousness score S1 is not greater than the preset threshold t, determining that the address to be identified is not a malicious address.
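The optional judging steps above can be restated compactly as follows. This only rewrites the rule already given in the text and introduces no new symbols; note that the comparison is strict, so an address with S1 exactly equal to t is not treated as malicious.

```latex
S_1 = \frac{m}{n}, \qquad
\text{address is malicious} \iff S_1 > t
```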

Optionally, the method further includes: acquiring a word set to be recognized;

for any word in the word set to be recognized, calculating a maliciousness score S2 of the word according to a word maliciousness scoring formula, wherein the word maliciousness scoring formula is as follows:

wherein abs() represents the absolute value of the content in parentheses, k1 is a preset commonality constant, k2 is a preset length characteristic value, tf is the word frequency of the word in the database, df is the document frequency of the word in the database, and L is the number of addresses with different character counts among the addresses in the database that contain the word;

judging whether the maliciousness score S2 of the word is smaller than a preset evaluation threshold k3;

when the maliciousness score S2 of the word is smaller than the preset evaluation threshold k3, determining that the word is a malicious word;

and establishing the malicious word library according to the malicious words in the word set to be recognized.

Optionally, acquiring the word set to be recognized includes:

segmenting each address in the database according to a preset second word segmentation algorithm;

and forming the words obtained by the segmentation into the word set to be recognized.

Optionally, the second word segmentation algorithm includes the first word segmentation algorithm, and the second word segmentation algorithm includes at least one of a two-character segmentation algorithm and a three-character segmentation algorithm.

According to a second aspect of the embodiments of the present disclosure, there is provided a malicious order identification apparatus, the apparatus including:

a word segmentation module configured to segment the address to be recognized according to a preset first word segmentation algorithm to obtain the words to be recognized in the address to be recognized;

a counting module configured to count the number of words belonging to a malicious word library in the words to be recognized, wherein the malicious word library is pre-established and records at least one malicious word for identifying a malicious address;

a first judging module configured to judge whether the address to be identified is a malicious address according to the number of the words belonging to the malicious word library;

a first determining module configured to determine that an order corresponding to the address to be identified is a malicious order when the address to be identified is a malicious address.

Optionally, the first determining module is configured to:

Optionally, the apparatus further comprises:

an acquisition module configured to acquire a word set to be recognized;

a calculating module configured to calculate, for any word in the word set to be recognized, a maliciousness score S2 of the word according to a word maliciousness scoring formula, wherein the word maliciousness scoring formula is:

a second judging module configured to judge whether the maliciousness score S2 of the word is smaller than a preset evaluation threshold k3;

a second determining module configured to determine that the word is a malicious word when the maliciousness score S2 of the word is smaller than the preset evaluation threshold k3;

and an establishing module configured to establish the malicious word library according to the malicious words in the word set to be recognized.

Optionally, the acquisition module is configured to:

According to a third aspect of the embodiments of the present disclosure, there is provided a malicious order identification apparatus, the apparatus including:

a processor;

a memory for storing executable instructions of the processor;

wherein the processor is configured to:

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:

According to the malicious order identification method and device provided by the embodiment of the disclosure, the address to be identified can be segmented according to a preset first word segmentation algorithm to obtain the words to be recognized in the address; the number of words belonging to a malicious word library among the words to be recognized is counted, wherein the malicious word library is pre-established and records at least one malicious word for identifying a malicious address; whether the address to be identified is a malicious address is judged according to the number of the words belonging to the malicious word library; and when the address to be identified is a malicious address, the order corresponding to the address is determined to be a malicious order. Because the address in an order is checked against the pre-established malicious word library to determine whether the order is a malicious order, the accuracy of malicious order identification is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

In order to more clearly illustrate the embodiments of the present disclosure, the drawings needed in the description of the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present disclosure, and other drawings can be obtained from them by those skilled in the art without inventive effort.

FIG. 1-1 is a schematic diagram of an implementation environment in which a malicious order identification method provided by some embodiments of the present disclosure is involved;

FIG. 1-2 is a flow diagram illustrating a malicious order identification method according to an exemplary embodiment;

FIG. 2-1 is a flow diagram illustrating another malicious order identification method in accordance with an exemplary embodiment;

FIG. 2-2 is a flow diagram illustrating a method of building a malicious word corpus in accordance with an exemplary embodiment;

FIG. 3-1 is a block diagram illustrating a malicious order identification apparatus in accordance with an exemplary embodiment;

FIG. 3-2 is a block diagram illustrating another malicious order identification apparatus according to an example embodiment;

FIG. 4 is a block diagram illustrating yet another malicious order identification apparatus according to an example embodiment.

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the present disclosure more clear, the present disclosure will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. All other embodiments, which can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort, shall fall within the scope of protection of the present disclosure.

Fig. 1-1 is a schematic diagram of an implementation environment related to a malicious order identification method provided in an embodiment of the present disclosure. The implementation environment may include: a server 110 and at least one terminal 120. The server 110 may be a server, a server cluster composed of several servers, or a cloud computing service center. The terminal 120 can be a smartphone, a computer, a multimedia player, an e-reader, a wearable device, etc. The connection between the server 110 and the terminal 120 may be established through a wired network or a wireless network.

The user may fill in a shopping order on the shopping platform through the terminal 120, and the server 110 corresponding to the shopping platform may store the shopping order filled in by the user, analyze the order, and determine whether the order is a malicious order.

Fig. 1-2 is a flow chart illustrating a malicious order identification method according to an example embodiment. The method may be applied to the server 110 shown in Fig. 1-1 and, as shown in Fig. 1-2, includes:

In step 101, according to a preset first word segmentation algorithm, performing word segmentation on an address to be recognized to obtain a word to be recognized in the address to be recognized.

In step 102, the number of words belonging to a malicious word library among the words to be recognized is counted, wherein the malicious word library is pre-established and records at least one malicious word for identifying a malicious address.

In step 103, it is determined whether the address to be identified is a malicious address according to the number of words belonging to the malicious word library.

In step 104, when the address to be identified is a malicious address, it is determined that the order corresponding to the address to be identified is a malicious order.

In summary, according to the malicious order identification method provided by the embodiment of the present disclosure, an address to be identified may be segmented according to a preset first word segmentation algorithm to obtain the words to be recognized in the address; the number of words belonging to a malicious word library among the words to be recognized is counted; whether the address to be identified is a malicious address is judged according to the number of the words belonging to the malicious word library; and when the address to be identified is a malicious address, the order corresponding to the address is determined to be a malicious order. Because the address in an order is checked against the pre-established malicious word library to determine whether the order is a malicious order, the method identifies malicious orders with high accuracy.

calculating a maliciousness score S1 of the address to be recognized through an address maliciousness scoring formula according to the number m of the words belonging to the malicious word library and the number n of the words to be recognized in the address to be recognized, wherein the address maliciousness scoring formula is: S1 = m/n;

judging whether the maliciousness score S1 of the address to be identified is greater than a preset threshold t;

when the maliciousness score S1 is greater than the preset threshold t, determining that the address to be identified is a malicious address;

when the maliciousness score S1 is not greater than the preset threshold t, determining that the address to be identified is not a malicious address.

Optionally, the method further includes: acquiring a word set to be recognized;

for any word in the word set to be recognized, calculating a maliciousness score S2 of the word according to a word maliciousness scoring formula, wherein the word maliciousness scoring formula is as follows:

Optionally, the obtaining of the set of words to be recognized includes:

Fig. 2-1 is a flow chart illustrating another malicious order identification method that may be applied to the server 110 shown in fig. 1-1, according to an example embodiment, and includes:

In step 201, the address to be recognized is segmented according to a preset first word segmentation algorithm to obtain the words to be recognized in the address to be recognized. Step 202 is performed.

In the embodiment of the present disclosure, when the server needs to determine whether a certain order is a malicious order, the address included in the order may be used as the address to be recognized and segmented according to a preset first word segmentation algorithm, which may be a two-character segmentation algorithm or a three-character segmentation algorithm. The two-character and three-character segmentation algorithms move through the address to be recognized in sequence and take every two or every three consecutive single characters as a word to be recognized. During two-character or three-character segmentation, a run of consecutive digits or of consecutive letters may be treated as a single character. For example, the words obtained by three-character segmentation of "science and technology road 20A seat" may be: "science and technology road", "technology road 20", "road 20A", and "20A seat", where the consecutive digits "20" and the letter "A" are each treated as a single character.
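The segmentation described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the names `split_units` and `segment` are hypothetical, the regular expression only covers ASCII digits and letters, and the sample string "科技路20A座" is an assumed rendering of the "science and technology road 20A seat" example.

```python
import re

# A run of ASCII digits or a run of ASCII letters counts as one "single character";
# every other non-space character stands alone (an assumption for this sketch).
_UNIT = re.compile(r"[0-9]+|[A-Za-z]+|\S")


def split_units(address: str) -> list:
    """Split an address into single-character units."""
    return _UNIT.findall(address)


def segment(address: str, size: int) -> list:
    """Take every `size` consecutive units as one word (size is 2 or 3)."""
    units = split_units(address)
    return ["".join(units[i:i + size]) for i in range(len(units) - size + 1)]


# Assumed sample address; three-character segmentation yields four words.
words = segment("科技路20A座", 3)
print(words)
```

With this sample the digits "20" and the letter "A" each occupy one position in the window, which matches the four-word result described in the text.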

Further, in order to improve the accuracy of address recognition, before performing word segmentation on the address to be recognized, stop words in the address to be recognized may be removed, where the stop words may be some preset characters that have a small influence on the address to be recognized, such as punctuation marks, spaces, and other special characters.
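A sketch of this preprocessing step is shown below. The text only names punctuation marks, spaces, and other special characters, so the exact contents of `STOP_CHARS`, the function name `strip_stop_chars`, and the sample address are assumptions made for illustration.

```python
import string

# Hypothetical stop-character set: ASCII punctuation, whitespace, and a few
# full-width marks. The real preset list is not given in the text.
STOP_CHARS = set(string.punctuation) | set(string.whitespace) | set(",。、;:()#")


def strip_stop_chars(address: str) -> str:
    """Drop every stop character before the address is segmented."""
    return "".join(ch for ch in address if ch not in STOP_CHARS)


# Assumed sample address, for illustration only.
cleaned = strip_stop_chars("Tech Road, No. 20 (Block A)")
print(cleaned)
```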

For example, assuming that the address to be recognized in a certain order is "street belly natural countryside original new micro obscure rare few days No. 2", segmenting the address according to the two-character segmentation algorithm may yield words to be recognized such as: street belly, natural countryside, original new, new micro, micro obscure, obscure rare, few days, No. 2.

In step 202, the number m of words belonging to the malicious word library among the words to be recognized is counted. Step 203 is performed.

In the embodiment of the present disclosure, a malicious word library may be pre-established in the server, where the malicious word library records at least one malicious word for identifying a malicious address. After the server determines the words to be recognized, it can check whether each word to be recognized is a malicious word stored in the malicious word library and count the number m of words to be recognized that belong to the malicious word library.
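The count m can be sketched as a set-membership check. The function name `count_malicious` and the sample data are hypothetical. The sketch counts every occurrence of a word, which is an assumption, since the text does not say whether a repeated word is counted once or each time.

```python
def count_malicious(words, library):
    """Return m: how many words to be recognized appear in the library."""
    return sum(1 for word in words if word in library)


# Hypothetical library and segmented address, for illustration only.
library = {"ab", "bc", "xy"}
words = ["ab", "bc", "cd", "ab"]
m = count_malicious(words, library)
print(m)
```

Storing the library as a set keeps each lookup at constant expected cost, so the count is linear in the number of words to be recognized.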

Fig. 2-2 is a flow diagram illustrating a method of building a malicious word library according to an example embodiment, as shown in fig. 2-2, the method including:

In step 2021, each address in the database is segmented according to a preset second segmentation algorithm. Step 2022 is performed.

In the embodiment of the present disclosure, when the number of orders stored in the server database reaches a certain threshold, the address included in each order in the database may be segmented according to a preset second word segmentation algorithm in order to establish the malicious word library. To make the words to be recognized comparable with the words stored in the malicious word library, the preset second word segmentation algorithm needs to include the first word segmentation algorithm. The second word segmentation algorithm may be at least one of a two-character segmentation algorithm and a three-character segmentation algorithm; that is, it may be a two-character segmentation algorithm, a three-character segmentation algorithm, or a combination of the two. If the second word segmentation algorithm comprises only one segmentation algorithm, the first word segmentation algorithm is the same as the second; if the second word segmentation algorithm comprises at least two segmentation algorithms, the first word segmentation algorithm may be the same as the second, or may be any one of the segmentation algorithms comprised in the second.
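The relationship between the two algorithms can be expressed as a subset check. In this sketch each algorithm is modelled as the set of window sizes it uses (2 for two-character, 3 for three-character); this modelling and the name `check_algorithms` are assumptions, not part of the source.

```python
ALLOWED_SIZES = {2, 3}  # two-character and three-character segmentation


def check_algorithms(first_sizes, second_sizes):
    """True if the first algorithm is non-empty, is contained in the second,
    and the second uses only two- and/or three-character segmentation."""
    return (
        bool(first_sizes)
        and first_sizes <= second_sizes
        and second_sizes <= ALLOWED_SIZES
    )


print(check_algorithms({2}, {2, 3}))  # first is one of the two algorithms in second
print(check_algorithms({3}, {2}))     # first is not included in second
```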

For example, assume that the addresses in the orders stored in the database include "street belly natural countryside original new micro obscure rare few days No. 2" and "nineteen new science and technology crafty round pustule day No. 2 A". The result of segmenting the first address according to the two-character segmentation algorithm is a set of words such as {street belly, natural countryside, countryside original, original new, new micro, micro obscure, obscure rare, rare few, few days, day 2, No. 2}; that is, the address can be divided into 12 words by the two-character segmentation algorithm. The result of segmenting the second address is {nineteen, nine new, new science, science and technology, technology crafty, crafty round, round pustule, pustule day, day 2, No. 2, No. A}; that is, the address can be divided into 11 words by the two-character segmentation algorithm.

In step 2022, the segmented words are grouped into a set of words to be recognized. Step 2023 is performed.

After each address in the database is segmented, the words obtained by segmenting all the addresses in the database can be combined into the word set to be recognized. For example, after the addresses "street belly natural countryside original new micro obscure rare few days No. 2" and "nineteen new science and technology crafty round pustule day No. 2 A" are segmented, the word set to be recognized composed from the segmentation results may be: {street belly, natural countryside, countryside original, original new, new micro, micro obscure, obscure rare, rare few, few days, day 2, No. 2, nineteen, nine new, new science, science and technology, technology crafty, crafty round, round pustule, pustule day, No. A}.
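Composing the word set to be recognized amounts to taking the union of the segmentation results of all stored addresses. The sketch below assumes each address has already been split into single-character units; `ngrams`, `build_word_set`, and the sample data are hypothetical names and values.

```python
def ngrams(units, size):
    """Every `size` consecutive units joined into one word."""
    return ["".join(units[i:i + size]) for i in range(len(units) - size + 1)]


def build_word_set(addresses, sizes=(2,)):
    """Union of all words produced by segmenting every stored address."""
    word_set = set()
    for units in addresses:
        for size in sizes:
            word_set.update(ngrams(units, size))
    return word_set


# Hypothetical addresses, already split into single-character units.
database = [["a", "b", "c", "2"], ["x", "b", "c"]]
word_set = build_word_set(database)
print(sorted(word_set))
```

A word shared by several addresses, such as "bc" here, appears once in the set; its frequency is tracked separately when the word is scored.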

In step 2023, for any word in the set of words to be recognized, a maliciousness score S2 of the word is calculated according to a word maliciousness scoring formula. Step 2024 is performed.

The word maliciousness scoring formula is as follows:

wherein abs() represents the absolute value of the content in parentheses; k1 is a preset commonality constant; k2 is a preset length characteristic value; tf is the word frequency of the word in the database, that is, the number of times the word appears in all addresses stored in the database; df is the document frequency of the word in the database, that is, the number of addresses in the database that contain the word; and L is the number of addresses with different character counts among the addresses in the database that contain the word.
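The three statistics defined above can be gathered in one pass over the database, as in the following sketch. It does not compute S2 itself, because the scoring formula is not reproduced in this text. The function name `word_statistics`, the input layout of (character count, words) pairs, the sample character counts 13 and 12, and the reading of L as the number of distinct character counts among the containing addresses are all assumptions.

```python
from collections import Counter, defaultdict


def word_statistics(addresses):
    """Map each word to (tf, df, L).

    `addresses` is a list of (character_count, words) pairs, one per stored address.
    """
    tf = Counter()
    df = Counter()
    lengths = defaultdict(set)
    for char_count, words in addresses:
        tf.update(words)                   # tf: every occurrence in every address
        for word in set(words):
            df[word] += 1                  # df: one count per containing address
            lengths[word].add(char_count)  # distinct lengths of containing addresses
    return {word: (tf[word], df[word], len(lengths[word])) for word in tf}


# Hypothetical two-address database; only a few words are shown per address.
database = [
    (13, ["street belly", "few days", "No. 2"]),
    (12, ["nineteen", "crafty round", "No. 2"]),
]
stats = word_statistics(database)
print(stats["No. 2"], stats["street belly"])
```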

In the embodiment of the present disclosure, the preset commonality constant k1 and the preset length characteristic value k2 may be set according to the actual situation of the addresses stored in the database. Alternatively, they may be determined by parameter training in a machine learning manner, using already labeled corpora together with the above word maliciousness scoring formula.

For example, assume that the preset commonality constant k1 is 10 and the preset length characteristic value k2 is 2. The word "street belly" in the word set to be recognized occurs once in all the addresses stored in the database, so its word frequency is tf = 1. The database contains two addresses, of which one contains the word "street belly", so its document frequency is df = 1. Because only one address contains the word, the number of addresses with different character counts among the addresses containing it is L = 1. According to the word maliciousness scoring formula, the maliciousness score of the word "street belly" can then be calculated as S2 = 0.5.

Further, for the word "No. 2" in the word set to be recognized, the word frequency of the word in the database is tf = 2 and the document frequency is df = 2. The number of addresses containing the word "No. 2" in the database is 2, and these 2 addresses contain different numbers of characters, so L = 2. According to the word maliciousness scoring formula, the maliciousness score of the word "No. 2" can be calculated as S2 = 1.

It should be noted that, if the word set to be recognized is determined by using the three-character segmentation algorithm, each word to be recognized contains more single characters and therefore has a larger influence on the address type. In that case, the length of the word itself can be used as a parameter in the word maliciousness scoring formula, so as to improve the accuracy of the formula.

In step 2024, it is determined whether the maliciousness score S2 of the word is smaller than a preset evaluation threshold k3.

When the maliciousness score S2 of the word is smaller than the preset evaluation threshold k3, step 2025 is executed; when the maliciousness score S2 of the word is not smaller than the preset evaluation threshold k3, step 2026 is executed. In the embodiment of the present disclosure, the preset evaluation threshold k3 may be set according to the actual situation of the addresses stored in the database, or may be determined by parameter training in a machine learning manner, using already labeled corpora together with the above word maliciousness scoring formula.

For example, assuming that the preset evaluation threshold k3 is 1, the server may determine whether the maliciousness score S2 of each word in the set of words to be recognized is smaller than 1. Since the maliciousness score of the word "street belly" is 0.5, which is smaller than the preset evaluation threshold 1, the server may execute step 2025; since the maliciousness score of the word "No. 2" is 1, which is not smaller than the preset evaluation threshold 1, the server may execute step 2026.
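The comparison against k3 can be sketched as a simple filter over precomputed scores. The two scores and the threshold are the values from the example above; the function name `malicious_words` is hypothetical, and the computation of S2 itself is outside this sketch.

```python
def malicious_words(scores, k3):
    """Words whose maliciousness score S2 is strictly smaller than k3."""
    return {word for word, s2 in scores.items() if s2 < k3}


# Scores from the example above; k3 = 1 as assumed there.
scores = {"street belly": 0.5, "No. 2": 1.0}
library = malicious_words(scores, k3=1.0)
print(library)
```

Because the comparison is strict, a word whose score equals k3, such as "No. 2" here, is not added to the library.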

In step 2025, the word is determined to be a malicious word. Step 2027 is performed.

When the maliciousness score S2 of the word is smaller than the preset evaluation threshold k3, the word is determined to be a malicious word. For example, the server may determine the word "street belly" to be a malicious word.

In step 2026, it is determined that the word is not a malicious word.

When the maliciousness score S2 of the word is not smaller than the preset evaluation threshold k3, it is determined that the word is not a malicious word. For example, the server may determine that the word "No. 2" is not a malicious word.

In step 2027, the malicious word library is established according to the malicious words in the set of words to be recognized.

After the server finishes evaluating the maliciousness of all the words in the set of words to be recognized, a malicious word library can be established from the words determined to be malicious words. For example, assume that, according to the method described in steps 2023 to 2026 above, for the set of words to be recognized {street belly, natural countryside, countryside original, original new, new micro, micro obscure, obscure rare, rare few, few days, day 2, No. 2, nineteen, nine new, new science, science and technology, technology crafty, crafty round, round pustule, pustule day, No. A}, the malicious words determined by the server are: street belly, original new, new micro, micro obscure, obscure rare, rare few, technology crafty, crafty round, and round pustule. The malicious word library established by the server from these malicious words may then be: {street belly, original new, new micro, micro obscure, obscure rare, rare few, technology crafty, crafty round, round pustule}.

For the above malicious word library, the server can determine which of the words to be recognized belong to the malicious word library and count them; in this example, the number of words to be recognized that belong to the malicious word library is m = 7.

In step 203, the number n of the words to be recognized in the address to be recognized is counted. Step 204 is performed.

For example, for the words to be recognized in the address "street belly natural countryside original new micro obscure rare few days No. 2", the server can determine that the number of words to be recognized in the address is n = 12.

In step 204, according to the number m of the words belonging to the malicious word library and the number n of the words to be recognized in the address to be recognized, a maliciousness score S1 of the address to be recognized is calculated through an address maliciousness scoring formula. Step 205 is performed.

The address maliciousness scoring formula is: S1 = m/n; that is, the server can determine the maliciousness score of the address according to the proportion of malicious words in the address to be identified. For example, for the address to be identified "street belly natural countryside original new micro obscure rare few days No. 2", since the number m of words belonging to the malicious word library in the address is 7 and the number n of words to be recognized is 12, the maliciousness score of the address to be identified is: S1 = 7/12 ≈ 0.583.

In step 205, it is determined whether the maliciousness score S1 of the address to be identified is greater than a preset threshold t.

When the maliciousness score S1 is greater than the preset threshold t, step 206 is executed; when the maliciousness score S1 is not greater than the preset threshold t, step 207 is executed. The preset threshold t may be set according to the actual condition of the addresses stored in the database; in general, it may be set to 0.5. For example, since the maliciousness score S1 of the address to be identified "street belly natural countryside original new micro obscure rare few days No. 2" is 0.583, which is greater than the preset threshold of 0.5, the server may execute step 206.
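The scoring and comparison in steps 204 and 205 can be sketched as follows, using m = 7, n = 12 and t = 0.5 from the example. The function names are hypothetical, and returning 0 for an address with no words is an added guard that the text does not discuss.

```python
def address_score(m, n):
    """S1 = m / n; an address with no words scores 0 (an added guard)."""
    return m / n if n else 0.0


def is_malicious_address(m, n, t=0.5):
    """Malicious only when S1 is strictly greater than the threshold t."""
    return address_score(m, n) > t


s1 = address_score(7, 12)
print(round(s1, 3), is_malicious_address(7, 12))
```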

In step 206, the address to be identified is determined to be a malicious address. Step 208 is performed.

When the maliciousness score S1 is greater than the preset threshold t, it is determined that the address to be identified is a malicious address. For example, the server may determine the address to be identified "street belly natural countryside original new micro obscure rare few days No. 2" to be a malicious address.

In step 207, it is determined that the address to be identified is not a malicious address.

When the maliciousness score S1 is not greater than the preset threshold t, it is determined that the address to be identified is not a malicious address. Namely, when the proportion of the malicious words in the address to be recognized is not greater than the preset threshold t, the server can determine that the address to be recognized is not a malicious address.
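The decision in steps 205 to 207 is a strict comparison against the threshold, which a short Python sketch makes explicit. The names `DEFAULT_THRESHOLD_T` and `is_malicious_address` are illustrative assumptions; only the comparison and the typical value 0.5 come from the description.

```python
DEFAULT_THRESHOLD_T = 0.5  # typical value mentioned in the description


def is_malicious_address(s1, t=DEFAULT_THRESHOLD_T):
    """Steps 205-207: malicious only when S1 is strictly greater than t."""
    return s1 > t


print(is_malicious_address(7 / 12))  # score above t, so step 206 applies
print(is_malicious_address(0.5))     # score equal to t is "not greater", so step 207 applies
```

Because the test is "greater than" rather than "greater than or equal to", an address whose score equals the threshold is treated as not malicious.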

In step 208, the order corresponding to the address to be identified is determined to be a malicious order.

After the server determines that the address to be identified is a malicious address, the server can determine that the order corresponding to the address is a malicious order, and further determine that the user corresponding to the malicious order is a malicious user. The server may then perform a preset operation on the malicious user, such as logging off the account of the malicious user or canceling the order of the malicious user, so as to protect the interests of other normal users.
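One way to picture step 208 and the subsequent preset operation is the following Python sketch. Everything named here (`Order`, `PresetOperation`, `handle_malicious_address`, the `deactivated_users` set) is a hypothetical stand-in; the description only states that the order is marked malicious and that the account is logged off or the order is canceled.

```python
from dataclasses import dataclass
from enum import Enum


class PresetOperation(Enum):
    CANCEL_ORDER = "cancel_order"
    LOG_OFF_ACCOUNT = "log_off_account"


@dataclass
class Order:
    order_id: str
    user_id: str
    address: str
    malicious: bool = False
    cancelled: bool = False


def handle_malicious_address(order, deactivated_users,
                             operation=PresetOperation.CANCEL_ORDER):
    """Mark the order as malicious and apply one preset operation to its user."""
    order.malicious = True
    if operation is PresetOperation.CANCEL_ORDER:
        order.cancelled = True
    else:
        # Hypothetical bookkeeping: record the account as logged off.
        deactivated_users.add(order.user_id)
    return order


deactivated = set()
order = handle_malicious_address(Order("o-1", "u-1", "example address"), deactivated)
print(order.malicious, order.cancelled, deactivated)
```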

In summary, according to the malicious order identification method provided by the embodiment of the present disclosure, word segmentation is performed on an address to be identified according to a preset first word segmentation algorithm to obtain the words to be identified in the address; the number of those words that belong to a malicious word library is counted; whether the address to be identified is a malicious address is judged according to the number of words belonging to the malicious word library; and when the address to be identified is a malicious address, the order corresponding to the address is determined to be a malicious order. Because the method identifies the address in an order through a pre-established malicious word library and then determines whether the order is a malicious order, it can also effectively identify irregular malicious orders that have low similarity to one another, which improves the accuracy of malicious order identification.

It should be noted that the order of the steps of the malicious order identification method provided by the embodiment of the present disclosure may be appropriately adjusted, and steps may also be added or removed according to the situation. Any variation of the method that can be easily conceived by those skilled in the art within the technical scope of the present disclosure is covered by the protection scope of the present disclosure, and thus a detailed description thereof is omitted.

Fig. 3-1 is a block diagram illustrating a malicious order identification apparatus according to an exemplary embodiment. As shown in Fig. 3-1, the apparatus includes:

The word segmentation module 301 is configured to perform word segmentation on the address to be recognized according to a preset first word segmentation algorithm, so as to obtain a word to be recognized in the address to be recognized.

The counting module 302 is configured to count the number of words belonging to a malicious word library in the to-be-recognized words, where the malicious word library is pre-established and records at least one malicious word for identifying a malicious address.

The first judging module 303 is configured to judge whether the address to be identified is a malicious address according to the number of the words belonging to the malicious word library.

The first determining module 304 is configured to determine that the order corresponding to the address to be identified is a malicious order when the address to be identified is a malicious address.

In summary, the malicious order recognition device provided by the embodiment of the present disclosure may perform word segmentation on an address to be recognized according to a preset first word segmentation algorithm, so as to obtain the words to be recognized in the address; count the number of those words that belong to a malicious word library; judge whether the address to be identified is a malicious address according to the number of words belonging to the malicious word library; and, when the address to be identified is a malicious address, determine that the order corresponding to the address is a malicious order. The device identifies the address in an order through the pre-established malicious word library and then determines whether the order is a malicious order, so its identification accuracy is high.
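The division of work among modules 301 to 304 can be sketched as a small Python class. This is an assumption-laden illustration: the class name `MaliciousOrderRecognizer` and its method names are invented here, the first word segmentation algorithm is passed in as a callable because this passage does not define it, and the m/n ratio with threshold 0.5 is borrowed from the method embodiment.

```python
from typing import Callable, Iterable, List, Set


class MaliciousOrderRecognizer:
    """Hypothetical composition of modules 301-304."""

    def __init__(self, segment: Callable[[str], List[str]],
                 malicious_words: Iterable[str], threshold: float = 0.5):
        self.segment = segment                      # stand-in for the first word segmentation algorithm
        self.malicious_words: Set[str] = set(malicious_words)
        self.threshold = threshold

    def segment_address(self, address: str) -> List[str]:
        # Word segmentation module 301.
        return self.segment(address)

    def count_malicious(self, words: List[str]) -> int:
        # Counting module 302.
        return sum(1 for w in words if w in self.malicious_words)

    def is_malicious_address(self, address: str) -> bool:
        # Judging step (module 303), using the ratio m / n from the method embodiment.
        words = self.segment_address(address)
        if not words:
            return False  # assumption: an empty address is not flagged
        return self.count_malicious(words) / len(words) > self.threshold

    def is_malicious_order(self, order_address: str) -> bool:
        # Determining step (module 304): the order follows its address.
        return self.is_malicious_address(order_address)


# Whitespace splitting is only a placeholder segmenter for this example.
recognizer = MaliciousOrderRecognizer(str.split, {"aa", "bb", "cc"})
print(recognizer.is_malicious_order("aa bb cc dd"))
```

Passing the segmenter in as a parameter keeps the sketch independent of any particular segmentation algorithm.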

Fig. 3-2 is a block diagram illustrating another malicious order identification apparatus according to an exemplary embodiment. As shown in Fig. 3-2, the apparatus includes:

An obtaining module 305 configured to obtain a set of words to be recognized.

A calculating module 306, configured to calculate, for any term in the set of terms to be recognized, a maliciousness score S2 of the term according to a term maliciousness score formula, where the term maliciousness score formula is:

wherein abs represents an absolute value, k1 is a preset common constant, k2 is a preset length characteristic value, tf is the word frequency of the word in the database, df is the document frequency of the word in the database, and L is the number of addresses with different numbers of characters among the addresses in the database that contain the word.

A second judging module 307 configured to judge whether the maliciousness score S2 of any word is smaller than a preset evaluation threshold k3.

A second determining module 308 configured to determine that any word is a malicious word when the maliciousness score S2 of that word is smaller than the preset evaluation threshold k3.

The establishing module 309 is configured to establish the malicious word library according to the malicious words in the set of words to be recognized.
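The selection performed by modules 305 to 309 can be sketched in Python as a filter over the candidate words. The word maliciousness scoring formula is not reproduced in this passage, so the sketch takes the scoring function as a parameter; `build_malicious_word_library`, `score_word`, and the sample scores are hypothetical.

```python
def build_malicious_word_library(candidate_words, score_word, k3):
    """Keep every candidate word whose maliciousness score S2 is smaller than k3.

    score_word is a caller-supplied stand-in for the word maliciousness
    scoring formula, which is not shown here.
    """
    return {word for word in candidate_words if score_word(word) < k3}


# Made-up scores for illustration only; they are not derived from the formula.
sample_scores = {"w1": 0.2, "w2": 1.5, "w3": 0.9}
library = build_malicious_word_library(sample_scores, sample_scores.get, k3=1.0)
print(sorted(library))
```

Note that, unlike the address score S1, a word counts as malicious when its score is below the threshold k3.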

Optionally, the first judging module 303 is configured to:

Optionally, the obtaining module 305 is configured to:

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs its operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 4 is a block diagram illustrating yet another malicious order identification apparatus 400 according to an exemplary embodiment. For example, the apparatus 400 may be provided as a server. Referring to Fig. 4, the apparatus 400 includes a processing component 422, which further includes one or more processors, and memory resources, represented by memory 432, for storing instructions executable by the processing component 422, such as application programs. The application programs stored in the memory 432 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 422 is configured to execute the instructions to perform the above-described malicious order identification method, the method comprising:

Performing word segmentation on the address to be recognized according to a preset first word segmentation algorithm to obtain words to be recognized in the address to be recognized;

and when the address to be identified is a malicious address, determining that the order corresponding to the address to be identified is a malicious order.

Optionally, the method further includes: acquiring a word set to be recognized;

Optionally, the obtaining of the set of words to be recognized includes:

The apparatus 400 may also include a power component 426 configured to perform power management of the apparatus 400, a wired or wireless network interface 450 configured to connect the apparatus 400 to a network, and an input/output (I/O) interface 458. The apparatus 400 may operate based on an operating system stored in the memory 432, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A malicious order identification method, applied to a server corresponding to a shopping platform, the method comprising:

removing stop words in the address to be identified;

when the address to be identified is a malicious address, determining that an order corresponding to the address to be identified is a malicious order;

Executing a preset operation on a malicious user corresponding to the malicious order, wherein the preset operation is to log off an account of the malicious user or cancel the order of the malicious user;

the method further comprises the following steps:

acquiring a word set to be recognized;

Wherein abs () represents an absolute value of the content in parentheses, k1 is a preset common constant, k2 is a preset length characteristic value, tf is the word frequency of any word in the database, df is the document frequency of any word in the database, L is the number of addresses with different numbers of characters in the addresses containing any word in the database, and the preset common constant and the preset length characteristic value are obtained by machine learning training according to a labeled corpus and the word maliciousness scoring formula;

Judging whether the maliciousness score S2 of any word is smaller than a preset evaluation threshold k3, wherein the preset evaluation threshold k3 is obtained by training in a machine learning mode according to the labeled corpus and the word maliciousness score formula;

establishing the malicious word library according to the malicious words in the word set to be recognized;

the acquiring of the word set to be recognized comprises:

2. The method according to claim 1, wherein the determining whether the address to be recognized is a malicious address according to the number of words belonging to the malicious word library comprises:

judging whether the maliciousness degree score S1 of the address to be identified is larger than a preset threshold t, wherein the preset threshold t is preset according to the address in the database;

3. The method of claim 1, wherein the second word segmentation algorithm comprises the first word segmentation algorithm, and wherein the second word segmentation algorithm comprises at least one of a two-word segmentation algorithm and a three-word segmentation algorithm.

4. A malicious order recognition device, applied to a server corresponding to a shopping platform, wherein the server is configured to execute a preset operation on a malicious user corresponding to a malicious order after determining that an order corresponding to an address to be recognized is a malicious order, the preset operation being to log off an account of the malicious user or to cancel the order of the malicious user;

the device comprises:

The word segmentation module is configured to remove stop words in the address to be recognized, and perform word segmentation on the address to be recognized according to a preset first word segmentation algorithm to obtain words to be recognized in the address to be recognized;

The first determining module is configured to determine that an order corresponding to the address to be identified is a malicious order when the address to be identified is a malicious address;

the device further comprises:

the second judging module is configured to judge whether the maliciousness score S2 of any word is smaller than a preset evaluation threshold k3, wherein the preset evaluation threshold k3 is obtained by training in a machine learning mode according to the labeled corpus and the word maliciousness scoring formula;

the establishing module is configured to establish the malicious word library according to the malicious words in the word set to be recognized;

The acquisition module configured to:

5. The apparatus of claim 4, wherein the first determining module is configured to:

6. The apparatus of claim 4, wherein the second word segmentation algorithm comprises the first word segmentation algorithm, and wherein the second word segmentation algorithm comprises at least one of a two-word segmentation algorithm and a three-word segmentation algorithm.

7. A malicious order recognition device, applied to a server corresponding to a shopping platform, wherein the server is configured to execute a preset operation on a malicious user corresponding to a malicious order after determining that an order corresponding to an address to be recognized is a malicious order, the preset operation being to log off an account of the malicious user or to cancel the order of the malicious user, the device comprising:

A processor;

a memory for storing executable instructions of the processor;

wherein the processor is configured to:

Removing stop words in the address to be recognized;

The processor is further configured to:

Acquiring a word set to be recognized;

the acquiring of the word set to be recognized comprises: