CN113592036A - Flow cheating behavior identification method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN113592036A
Authority
CN
China
Prior art keywords
user
time period
users
behavior
click
Prior art date
Legal status
Pending
Application number
CN202110981015.0A
Other languages
Chinese (zh)
Inventor
孔梦醒
赵旭玲
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202110981015.0A
Publication of CN113592036A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of the invention provide a method and a device for identifying traffic cheating behavior, a storage medium, and an electronic device. The method comprises the following steps: acquiring click data of users accessing web pages; extracting, from the user click data, the click data for accessing a first link within a first time period; extracting each user's click data from the click data for accessing the first link within the first time period; extracting each user's click behavior features from that user's click data; and determining, according to the click behavior features of each user, whether clustering behavior exists among the users, and if so, adding the users exhibiting clustering behavior to a traffic cheating group set. The embodiments of the invention thereby enable the identification of group traffic cheating behavior.

Description

Flow cheating behavior identification method and device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of internet access, and in particular to a method and a device for identifying traffic cheating behavior, a readable storage medium, and an electronic device.
Background
In the current era of information explosion, traffic is of crucial value on the internet. Fake traffic is common, and group traffic cheating, in which traffic is generated by large numbers of people working manually or by machine simulation, is now occurring.
Current methods for identifying traffic cheating behavior all identify individual cheating devices, which has the following disadvantages:
first, group traffic cheating behavior cannot be identified;
second, identification is based on empirical data and rules and therefore carries a certain error; it can only identify traffic cheating over a short time span, and cannot identify periodic traffic cheating over an arbitrarily selected period such as a month, a quarter, or a year.
Disclosure of Invention
The embodiments of the invention provide a method and a device for identifying traffic cheating behavior, a readable storage medium, and an electronic device, so as to identify group traffic cheating behavior.
The technical solution of the embodiments of the invention is implemented as follows:
a traffic cheating behavior identification method comprises the following steps:
acquiring click data of users accessing web pages;
extracting, from the user click data, the click data for accessing a first link within a first time period;
extracting each user's click data from the click data for accessing the first link within the first time period;
extracting each user's click behavior features from that user's click data;
and determining, according to the click behavior features of each user, whether clustering behavior exists among the users, and if so, adding the users exhibiting clustering behavior to a traffic cheating group set.
Determining whether clustering behavior exists among the users comprises:
calculating the click behavior similarity between every two users according to the click behavior features of each user, and if a click behavior similarity is greater than a preset similarity threshold, determining that the two corresponding users exhibit clustering behavior.
The user click data includes: user identification information, the identifier of the web page link clicked by the user, and the click time;
extracting each user's click behavior features from that user's click data comprises: dividing the first time period into at least one sub-time period;
for each sub-time period, acquiring the click data of each user who accessed the first link in that sub-time period;
for each sub-time period, pairing every user who accessed the first link in that sub-time period with every other such user (complete pairwise pairing);
for each user pair in each sub-time period, counting, from the click data of the two users in the current sub-time period, the number of days on which both users accessed the first link, recorded as the first-type days, and the number of days on which exactly one of the two users accessed the first link, recorded as the second-type days;
and taking the first-type days and the second-type days as the click behavior features of the two users in the pair for the current sub-time period.
Determining whether clustering behavior exists among the users comprises:
for each user pair in each sub-time period, calculating the sum of the pair's first-type days and second-type days and dividing the first-type days by this sum to obtain an initial click behavior similarity between the two users; dividing the first-type days by the total number of days in the current sub-time period to obtain a weight; multiplying the weight by the initial similarity to obtain the click behavior similarity between the two users; and if this similarity is greater than a preset similarity threshold, determining that clustering behavior exists between the two users in the pair;
adding the users exhibiting clustering behavior to the traffic cheating group set comprises:
adding the two users in the pair to the traffic cheating group set of the current sub-time period.
After the two users in the pair are added to the traffic cheating group set of the current sub-time period, the method further comprises:
once the traffic cheating group sets of all sub-time periods in the first time period have been obtained, selecting the users that appear in every traffic cheating group set;
removing those users from each traffic cheating group set to obtain an updated traffic cheating group set for each sub-time period;
and for each such user, calculating the click behavior similarity between the user and each traffic cheating group set, selecting the set with the greatest similarity, and adding the user to that set.
Calculating the click behavior similarity between the user and each traffic cheating group set comprises:
for each traffic cheating group set, calculating the click behavior similarity between the user and each member of the set, and taking the minimum of these similarities as the click behavior similarity between the user and the set.
Dividing the first time period into at least one sub-time period comprises:
dividing the first time period into at least one sub-time period according to a preset initial sub-time period length, or according to the current sub-time period length and an adjustment step;
and after the two users in the pair are added to the traffic cheating group set of the current sub-time period, the method further comprises:
once the traffic cheating group sets of all current sub-time periods have been obtained, determining whether the current sub-time period length has reached a preset maximum sub-time period length;
if not, returning to the step of dividing the first time period into at least one sub-time period according to the current sub-time period length and the adjustment step;
and if so, selecting, from the traffic cheating group sets detected in each round, the optimal one or more traffic cheating group sets as the final detection result, on the principle that the more members the detected traffic cheating group sets have, the better the detection result.
Dividing the first time period into at least one sub-time period comprises:
dividing the first time period into a plurality of sub-time periods such that every two adjacent sub-time periods overlap by a preset second duration.
After extracting each user's click data from the click data for accessing the first link within the first time period, and before extracting each user's click behavior features from that user's click data, the method further comprises:
for each user, determining from the user's click data the longest duration for which the user continuously accessed the first link within the first time period; if this longest duration is less than a preset first duration, determining that the user is not a member of a traffic cheating group and deleting the user's click data from the click data of the users who accessed the first link within the first time period.
After the users exhibiting clustering behavior are added to the traffic cheating group set, the method further comprises:
for each user exhibiting clustering behavior, determining whether the user meets the following conditions, and if so, deleting the user from the traffic cheating group set:
the user's dwell time on the web page of the first link is greater than a preset time threshold, and/or the user's access depth on the first link is greater than a preset depth threshold, and/or the number of item types purchased by the user via the first link is greater than a preset number.
A traffic cheating behavior identification device comprises:
a click behavior feature extraction module, configured to acquire click data of users accessing web pages; extract, from the user click data, the click data for accessing a first link within a first time period; extract each user's click data from the click data for accessing the first link within the first time period; and extract each user's click behavior features from that user's click data;
and an identification module, configured to determine, according to the click behavior features of each user, whether clustering behavior exists among the users, and if so, add the users exhibiting clustering behavior to a traffic cheating group set.
A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of any of the traffic cheating behavior identification methods described above.
An electronic device comprising the non-transitory computer-readable storage medium described above and a processor that can access the non-transitory computer-readable storage medium.
In the embodiments of the invention, the click data for accessing the first link within the first time period is extracted from the user click data, each user's click behavior features are extracted from that user's click data, whether clustering behavior exists among the users is determined according to these features, and if so, the users exhibiting clustering behavior are added to the traffic cheating group set. Group traffic cheating behavior can thereby be identified.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of a traffic cheating behavior identification method according to an embodiment of the present invention;
Fig. 2 is an application example of the present invention;
Fig. 3 is a flowchart of a traffic cheating behavior identification method according to another embodiment of the present invention;
Fig. 4 is an application example of the sliding time window of the present invention;
Fig. 5 is a schematic structural diagram of a traffic cheating behavior identification device according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
The technical solution of the present invention will be described in detail with specific examples. Several of the following embodiments may be combined with each other and some details of the same or similar concepts or processes may not be repeated in some embodiments.
An embodiment of the invention provides a traffic cheating behavior identification method comprising: acquiring click data of users accessing web pages; extracting, from the user click data, the click data for accessing a first link within a first time period; extracting each user's click data from the click data for accessing the first link within the first time period; extracting each user's click behavior features from that user's click data; and determining, according to the click behavior features of each user, whether clustering behavior exists among the users, and if so, adding the users exhibiting clustering behavior to a traffic cheating group set. The embodiment thereby enables the identification of group traffic cheating behavior.
Fig. 1 is a flowchart of a traffic cheating behavior identification method according to an embodiment of the present invention. The method includes the following steps:
Step 101: Acquire click data of users accessing web pages.
In practical applications, the click data of users accessing web pages can be obtained from a webmaster grouping statistics table, a wake-up UUID (Universally Unique Identifier) distribution table, a device browsing information table, and the like.
The user click data includes at least: user identification information, the identifier of the web page link clicked by the user, and the click time. The web page link identifier is a URL (Uniform Resource Locator).
The user identification information is a user ID, a user device ID, a browser ID, a login account, or any combination thereof.
The user click data may also include user transaction information, such as the types of items purchased by the user, the user's GMV (Gross Merchandise Volume), or a combination thereof.
Step 102: Extract, from the acquired user click data, the click data for accessing the first link within a first time period.
The first time period can be chosen as needed.
Step 103: Extract each user's click data from the click data for accessing the first link within the first time period.
Step 104: Extract each user's click behavior features from that user's click data.
Step 105: Determine, according to the click behavior features of each user, whether clustering behavior exists among the users; if so, add the users exhibiting clustering behavior to the traffic cheating group set.
In this embodiment, the click data for accessing the first link within the first time period is extracted from the user click data, each user's click behavior features are extracted from that user's click data, whether clustering behavior exists among the users is determined according to these features, and if so, the users exhibiting clustering behavior are added to the traffic cheating group set. Group traffic cheating behavior is thereby identified.
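Steps 102 and 103 amount to a filter followed by a group-by. The Python sketch below illustrates them under several assumptions. The record schema (`ClickRecord` with `user_id`, `url`, `clicked_at`) and the function name `clicks_per_user` are hypothetical. The first time period is treated as a half-open interval, and the first link is matched by exact URL equality.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClickRecord:
    """Hypothetical row of user click data: who clicked which link, and when."""
    user_id: str
    url: str
    clicked_at: datetime


def clicks_per_user(records, first_link, start, end):
    """Keep clicks on `first_link` with start <= clicked_at < end (step 102),
    then group the surviving clicks by user (step 103)."""
    per_user = defaultdict(list)
    for record in records:
        if record.url == first_link and start <= record.clicked_at < end:
            per_user[record.user_id].append(record)
    return dict(per_user)


if __name__ == "__main__":
    link = "https://example.com/item"  # illustrative URL, not from the source
    records = [
        ClickRecord("u1", link, datetime(2021, 8, 1, 10)),
        ClickRecord("u2", "https://example.com/other", datetime(2021, 8, 1, 11)),
        ClickRecord("u1", link, datetime(2021, 9, 2)),
    ]
    grouped = clicks_per_user(records, link, datetime(2021, 8, 1), datetime(2021, 9, 1))
    print({user: len(clicks) for user, clicks in grouped.items()})  # {'u1': 1}
```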
In an optional embodiment, determining in step 105 whether clustering behavior exists among the users comprises: calculating the click behavior similarity between every two users according to the click behavior features of each user, and if a click behavior similarity is greater than a preset similarity threshold, determining that the two corresponding users exhibit clustering behavior.
In this embodiment, clustering behavior between users is judged by calculating the click behavior similarity of every pair of users.
In an optional embodiment, extracting each user's click behavior features from that user's click data in step 104 comprises: dividing the first time period into at least one sub-time period; for each sub-time period, acquiring the click data of each user who accessed the first link in that sub-time period; for each sub-time period, performing complete pairwise pairing of all users who accessed the first link in that sub-time period; for each user pair in each sub-time period, counting, from the click data of the two users in the current sub-time period, the number of days on which both users accessed the first link, recorded as the first-type days, and the number of days on which exactly one of the two users accessed the first link, recorded as the second-type days; and taking the first-type days and the second-type days as the click behavior features of the two users in the pair for the current sub-time period.
Here, complete pairwise pairing means that each user who accessed the first link during the sub-time period is paired with every other such user. For example, if 4 users (a, b, c, and d) accessed the first link in the sub-time period, the pairs are ab, ac, ad, bc, bd, and cd. When m users accessed the first link in a sub-time period, the number of pairs is $1 + 2 + 3 + \cdots + (m-1) = m(m-1)/2$.
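Complete pairwise pairing is the set of 2-element combinations of the users, so a minimal Python sketch can rely on the standard library's `itertools.combinations`. The function name `complete_pairs` is illustrative only.

```python
from itertools import combinations


def complete_pairs(users):
    """Return every unordered pair of distinct users exactly once."""
    # Sorting only makes the output order deterministic.
    return list(combinations(sorted(users), 2))


if __name__ == "__main__":
    pairs = complete_pairs(["a", "b", "c", "d"])
    print(["".join(p) for p in pairs])  # ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
    m = 13  # e.g. users a to m
    assert len(complete_pairs(range(m))) == m * (m - 1) // 2
```

Because the pair count grows as m(m-1)/2, the number of users that reach this step largely determines the cost of the similarity calculations that follow.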
In the above embodiment, for each user pair, the number of days in the current sub-time period on which both users accessed the first link is counted as the first-type days. The number of days on which exactly one of the two users accessed the first link is counted as the second-type days. These two counts serve as the click behavior features of the two users in the pair for the current sub-time period, which realizes the extraction of users' click behavior features.
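Suppose each user's accesses within a sub-time period are represented as a set of day indices. This representation is an assumption; the source only says that click data includes the click time. Under it, both features reduce to set operations: the first-type days are the size of the intersection, and the second-type days are the size of the symmetric difference. A minimal sketch:

```python
def pair_day_counts(days_a, days_b):
    """Click behaviour features of one user pair in one sub-time period.

    days_a, days_b: sets of day indices on which each user accessed the first link.
    Returns (first_type_days, second_type_days):
      first-type days  = days on which both users accessed the link,
      second-type days = days on which exactly one of them accessed it.
    """
    days_a, days_b = set(days_a), set(days_b)
    first_type = len(days_a & days_b)   # intersection
    second_type = len(days_a ^ days_b)  # symmetric difference
    return first_type, second_type


if __name__ == "__main__":
    # Hypothetical 4-day sub-time period, days numbered 0..3.
    print(pair_day_counts({0, 1, 2}, {1, 2, 3}))  # (2, 2)
```

Days on which neither user accessed the link appear in neither count, which is why the similarity calculation below needs the sub-time period's total day count.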
Consider the case where the two users in a pair do not access the first link together on many days of the sub-time period. With an unsuitable similarity calculation, the two users could still obtain a high click behavior similarity, which does not reflect reality. For this case, the embodiment of the invention provides the following click behavior similarity calculation. In an optional embodiment, determining in step 105 whether clustering behavior exists among the users comprises: for each user pair in each sub-time period, calculating the sum of the pair's first-type days and second-type days and dividing the first-type days by this sum to obtain an initial click behavior similarity between the two users; dividing the first-type days by the total number of days in the current sub-time period to obtain a weight; multiplying the weight by the initial similarity to obtain the click behavior similarity between the two users; and if this similarity is greater than a preset similarity threshold, determining that clustering behavior exists between the two users in the pair;
and, in step 105, adding the users exhibiting clustering behavior to the traffic cheating group set comprises: adding the two users in the pair to the traffic cheating group set of the current sub-time period.
In this embodiment, the quotient of the first-type days and the total number of days in the current sub-time period is used as the weight. This reduces cases in which two users who rarely access the first link on the same days still receive a high click behavior similarity, and ultimately reduces misjudgment of group traffic cheating behavior.
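Let $d_1$ be the first-type days, $d_2$ the second-type days, and $T$ the total number of days in the sub-time period. The similarity described above is then $s = \frac{d_1}{T} \cdot \frac{d_1}{d_1 + d_2}$. The Python sketch below computes it. Returning 0 when $d_1 + d_2 = 0$ is an assumption, since the text does not cover pairs with no access days.

```python
def click_similarity(first_type_days, second_type_days, total_days):
    """Weighted click behaviour similarity of one user pair in one sub-time period."""
    union_days = first_type_days + second_type_days
    if union_days == 0 or total_days <= 0:
        return 0.0  # assumption: no shared evidence means no similarity
    initial = first_type_days / union_days  # share of active days that were shared
    weight = first_type_days / total_days   # penalises pairs that rarely co-occur
    return weight * initial


if __name__ == "__main__":
    # Hypothetical 4-day sub-time period.
    print(click_similarity(1, 0, 4))  # 0.25
    print(click_similarity(3, 1, 4))  # 0.5625
```

Without the weight, the first pair would score 1.0 even though the two users accessed the link together on only one of four days. The weight lowers that score to 0.25, which addresses the situation this embodiment targets.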
For example, suppose 4 users (a, b, c, and d) accessed the first link in a sub-time period, giving the user pairs ab, ac, ad, bc, bd, and cd. If the click behavior similarities of pairs ab and bd are greater than the preset similarity threshold, users a, b, and d are added to the traffic cheating group set of that sub-time period.
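The pairing, the two day counts, and the weighted similarity can be combined to build the group set for one sub-time period, as in the following sketch. The access pattern in the example is invented for illustration and is not the data of Fig. 2. It was chosen so that only pairs ab and bd exceed an assumed threshold of 0.5, reproducing the {a, b, d} outcome of the example above.

```python
from itertools import combinations


def group_set_for_subperiod(access_days, total_days, threshold):
    """access_days: dict mapping user -> set of day indices with access to the first link.

    Returns the traffic cheating group set of this sub-time period: every user that
    belongs to at least one pair whose weighted similarity exceeds `threshold`.
    """
    members = set()
    for u, v in combinations(sorted(access_days), 2):
        a, b = access_days[u], access_days[v]
        first, second = len(a & b), len(a ^ b)
        if first == 0:
            continue  # no co-access day, so the similarity is 0
        similarity = (first / total_days) * (first / (first + second))
        if similarity > threshold:
            members.update((u, v))
    return members


if __name__ == "__main__":
    days = {"a": {0, 1, 2}, "b": {0, 1, 2, 3}, "c": {0}, "d": {1, 2, 3}}
    print(sorted(group_set_for_subperiod(days, total_days=4, threshold=0.5)))
    # ['a', 'b', 'd']
```

Here ab and bd both score 0.5625, while ad scores 0.25 and the pairs involving c score below 0.1.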
In an optional embodiment, after the two users in the pair are added to the traffic cheating group set of the current sub-time period, the method further comprises: once the traffic cheating group sets of all sub-time periods in the first time period have been obtained, selecting the users that appear in every traffic cheating group set; removing those users from each traffic cheating group set to obtain an updated traffic cheating group set for each sub-time period; and for each such user, calculating the click behavior similarity between the user and each traffic cheating group set, selecting the set with the greatest similarity, and adding the user to that set.
In this embodiment, users that appear in all traffic cheating group sets are finally assigned to a single traffic cheating group set according to their click behavior similarity to each set. This further improves the accuracy of identifying traffic cheating group sets and makes the sets easier to manage.
In an optional embodiment, calculating the click behavior similarity between the user and each traffic cheating group set comprises: for each traffic cheating group set, calculating the click behavior similarity between the user and each member of the set, and taking the minimum of these similarities as the click behavior similarity between the user and the set.
The above embodiment gives a concrete scheme for determining the click behavior similarity between a user and a traffic cheating group set.
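The reassignment of users that appear in every sub-time period's set can be sketched as follows. `pair_similarity` is a hypothetical callback that returns the click behaviour similarity of two users. The set-level similarity is the minimum over the set's members, as described above. Two choices in the sketch are assumptions not stated in the text. First, each user is compared with the sets as they stand after removal, before any other shared user is added back. Second, a set left empty by the removal is skipped, because it has no members to compare with.

```python
def reassign_common_users(group_sets, pair_similarity):
    """Move each user found in every group set into the single most similar set.

    group_sets: list of per-sub-time-period sets (not modified).
    pair_similarity: callable (user, user) -> float.
    """
    common = set.intersection(*group_sets)        # users present in every set
    updated = [set(g) - common for g in group_sets]
    snapshot = [frozenset(g) for g in updated]    # members to compare against
    for user in sorted(common):
        scores = [
            # user-to-set similarity = minimum pairwise similarity to its members
            (min(pair_similarity(user, member) for member in members), index)
            for index, members in enumerate(snapshot)
            if members  # skip sets emptied by the removal
        ]
        if scores:
            _, best_index = max(scores)  # ties go to the later set
            updated[best_index].add(user)
    return updated


if __name__ == "__main__":
    close = {("a", "x"), ("b", "y")}  # hypothetical highly similar pairs
    sim = lambda u, v: 0.9 if (u, v) in close else 0.1
    result = reassign_common_users([{"a", "b", "x"}, {"a", "b", "y"}], sim)
    print([sorted(g) for g in result])  # [['a', 'x'], ['b', 'y']]
```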
Fig. 2 is an application example of the present invention. In this example, the first time period is divided into 4 sub-time periods of 4 days each. Fig. 2 shows whether each of the users a to m accessed the first link on each day of the first time period: a value of 1 means the user accessed the first link on that day, and a value of 0 means the user did not.
The click behavior similarity between every two users is calculated for each sub-time period, and users whose click behavior similarity exceeds the preset similarity threshold are added to the traffic cheating group set of the corresponding sub-time period, giving:
traffic cheating group set for sub-time period 1: $X_1 = \{a, b, f, h, k\}$;
traffic cheating group set for sub-time period 2: $X_2 = \{a, b, d, e, h\}$;
traffic cheating group set for sub-time period 3: $X_3 = \{a, b, i, l\}$;
traffic cheating group set for sub-time period 4: $X_4 = \{a, b, g, j, l\}$.
The users that appear in every traffic cheating group set are then $\Phi = X_1 \cap X_2 \cap X_3 \cap X_4 = \{a, b\}$.
The sets are then updated: $X_1$ becomes $X_1 - \Phi = \{f, h, k\}$, $X_2$ becomes $X_2 - \Phi = \{d, e, h\}$, $X_3$ becomes $X_3 - \Phi = \{i, l\}$, and $X_4$ becomes $X_4 - \Phi = \{g, j, l\}$.
For user a, the click behavior similarity between a and each of $X_1$, $X_2$, $X_3$, and $X_4$ is calculated, and the set $X_n$ ($n = 1, 2, 3, 4$) with the greatest similarity is selected as a's traffic cheating group set. For example, to calculate the similarity between a and $X_1$, the click behavior similarities between a and each of f, h, and k in $X_1$ are calculated, and the smallest of them is taken as the similarity between a and $X_1$.
The same method is used to select one of $X_1$, $X_2$, $X_3$, and $X_4$ as b's traffic cheating group set.
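The set arithmetic of this example can be checked directly with Python's built-in set operations. The member lists below are copied from X1 to X4 above. The similarities needed for the final assignment of a and b depend on data in Fig. 2 and are not reproduced here.

```python
X = [
    {"a", "b", "f", "h", "k"},  # X1, sub-time period 1
    {"a", "b", "d", "e", "h"},  # X2, sub-time period 2
    {"a", "b", "i", "l"},       # X3, sub-time period 3
    {"a", "b", "g", "j", "l"},  # X4, sub-time period 4
]

phi = set.intersection(*X)        # users present in every set
updated = [xn - phi for xn in X]  # remove them from each set

print(sorted(phi))                     # ['a', 'b']
print([sorted(xn) for xn in updated])
# [['f', 'h', 'k'], ['d', 'e', 'h'], ['i', 'l'], ['g', 'j', 'l']]
```

User l still appears in both the updated X3 and the updated X4. It is not moved, because only users present in all four sets are reassigned.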
Because the length of the sub-time periods may affect the accuracy of identifying group traffic cheating behavior, the embodiment of the invention provides the following optimization:
in an optional embodiment, dividing the first time period into at least one sub-time period comprises: dividing the first time period into at least one sub-time period according to a preset initial sub-time period length, or according to the current sub-time period length and an adjustment step;
and after the two users in the pair are added to the traffic cheating group set of the current sub-time period, the method further comprises: once the traffic cheating group sets of all current sub-time periods have been obtained, determining whether the current sub-time period length has reached a preset maximum sub-time period length; if not, returning to the step of dividing the first time period into at least one sub-time period according to the current sub-time period length and the adjustment step; and if so, selecting, from the traffic cheating group sets detected in each round, the optimal one or more traffic cheating group sets as the final detection result, on the principle that the more members the detected sets have, the better the detection result.
In this embodiment, varying the sub-time period length yields multiple rounds of traffic cheating group sets, one round each time the first time period is divided. The optimal traffic cheating group sets are then selected from these rounds as the final detection result, on the principle that more members means a better detection result. This improves the accuracy of identifying group traffic cheating behavior.
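The length sweep can be sketched as a loop that re-runs detection for each candidate sub-time period length and keeps the best round. The sketch makes three assumptions. `detect` is a hypothetical callback that divides the first time period using the given length and returns one group set per sub-time period. Each round is scored by its total number of distinct members, which is one reading of the "more members is better" principle rather than a rule the text spells out. The last length is capped at the maximum.

```python
def sweep_subperiod_lengths(initial_len, step, max_len, detect):
    """Run `detect` for lengths initial_len, initial_len + step, ... up to max_len.

    detect: callable(sub_len) -> list of sets, one traffic cheating group set per
    sub-time period. Returns (best_len, best_sets) for the round with most members.
    """
    if step <= 0 or initial_len > max_len:
        raise ValueError("need step > 0 and initial_len <= max_len")
    rounds = []
    length = initial_len
    while True:
        rounds.append((length, detect(length)))
        if length >= max_len:  # maximum sub-time period length reached
            break
        length = min(length + step, max_len)

    def member_count(round_):
        return len(set().union(*round_[1]))  # distinct members across the round

    return max(rounds, key=member_count)  # the first round wins ties


if __name__ == "__main__":
    fake = {3: [{"a"}], 4: [{"a", "b"}, {"c"}], 5: [{"a", "b"}]}  # hypothetical results
    best_len, _ = sweep_subperiod_lengths(3, 1, 5, fake.__getitem__)
    print(best_len)  # 4
```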
In an optional embodiment, dividing the first time period into at least one sub-time period comprises: dividing the first time period into a plurality of sub-time periods such that every two adjacent sub-time periods overlap by a preset second duration.
Considering that users whose continuous access to the first link is short are unlikely to be traffic cheating group members, and in order to reduce the workload and speed up the identification of group traffic cheating behavior, an embodiment of the invention provides the following optimization:
In an optional embodiment, after step 103 and before step 104, the method further comprises: for each user, determining from that user's click data the maximum duration for which the user continuously accessed the first link within the first time period; if this maximum duration is less than a preset first duration, determining that the user is not a traffic cheating group member and deleting the user's click data from the click data of the users accessing the first link within the first time period.
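The continuity check in this embodiment can be sketched in Python as below. It is a minimal illustration rather than the patented implementation, and it assumes that "continuous access" means the longest run of consecutive calendar days with at least one click on the first link; the function names `longest_consecutive_days` and `filter_short_users` are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List


def longest_consecutive_days(click_dates: Iterable[date]) -> int:
    """Length of the longest run of consecutive days with at least one click.

    Assumption: continuity is measured in whole calendar days.
    """
    best = run = 0
    prev = None
    for d in sorted(set(click_dates)):  # several clicks on one day count once
        if prev is not None and d - prev == timedelta(days=1):
            run += 1  # the run continues from the previous day
        else:
            run = 1   # a gap (or the first day) starts a new run
        best = max(best, run)
        prev = d
    return best


def filter_short_users(clicks_by_user: Dict[str, List[date]],
                       first_duration_days: int) -> Dict[str, List[date]]:
    """Drop users whose longest continuous access is shorter than the preset first duration."""
    return {
        user: dates
        for user, dates in clicks_by_user.items()
        if longest_consecutive_days(dates) >= first_duration_days
    }
```

For instance, a user who clicked on 1, 2, 3 and 5 March has a longest run of 3 days and survives a first duration of 3 days, while a user who clicked only on 1 and 3 March is removed.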
In an optional embodiment, after the users with clustering behavior are added to the traffic cheating group set in step 105, the method further comprises: for each user with clustering behavior, determining whether the user meets the following conditions and, if so, deleting the user from the traffic cheating group set: the user's dwell time on the web page of the first link is greater than a preset time threshold, and/or the user's access depth for the first link is greater than a preset depth threshold, and/or the number of item types purchased by the user through the first link is greater than a preset number.
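The optional pruning rule can be expressed as a small filter. This sketch assumes per-user statistics are already available in a hypothetical `FirstLinkStats` record and that exceeding any single threshold is enough to remove a user; the source allows the three conditions to be used individually or in combination, so other combinations are equally valid.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class FirstLinkStats:
    """Per-user statistics for the first link (field names are hypothetical)."""
    dwell_seconds: float  # dwell time on the first link's web page
    access_depth: int     # access depth for the first link
    item_types: int       # number of distinct item types purchased via the first link


def prune_group(group: Set[str], stats: Dict[str, FirstLinkStats],
                max_dwell: float, max_depth: int, max_item_types: int) -> Set[str]:
    """Remove members whose behaviour looks like genuine browsing or buying.

    Assumption: exceeding any one of the three thresholds removes the user.
    Users without statistics are kept.
    """
    def looks_genuine(user: str) -> bool:
        s = stats.get(user)
        if s is None:
            return False
        return (s.dwell_seconds > max_dwell
                or s.access_depth > max_depth
                or s.item_types > max_item_types)

    return {user for user in group if not looks_genuine(user)}
```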
In this embodiment, whether a user really is a traffic cheating group member is further confirmed by analyzing the user's dwell time on the web page of the first link, and/or the user's access depth for the first link, and/or the types of items the user purchased through the first link. This avoids misjudging traffic cheating behavior and improves identification accuracy.
Fig. 3 is a flowchart of a traffic cheating behavior identification method according to another embodiment of the present invention. The method includes the following steps:
Step 301: acquire click data of users accessing web pages.
In practical applications, the user click data can be acquired from sources such as a webmaster grouping statistics table, a wake-up UUID distribution table and a device browsing information table.
The user click data includes at least: user identification information, the identifier of the web page link clicked by the user, and the user's click time. The web page link identifier is a URL (Uniform Resource Locator).
The user identification information may be a user ID, a user device ID, a browser ID, a login account, or any combination thereof.
The user click data may further include user transaction information, such as the types of items purchased by the user, the user's GMV (Gross Merchandise Volume), or a combination thereof.
In practical applications, if some fields of the user click data are missing, the missing values can be filled using methods such as the mean of neighboring values, inference based on a Bayesian formalism, or decision tree induction. In addition, noise can be filtered from each field of the user click data, for example by filtering on the user click time or by filtering out records whose webmaster ID is null.
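As an illustration of the neighboring-mean option only (the Bayesian and decision-tree approaches are not shown), the sketch below fills gaps in a numeric field with the mean of the closest known values on either side, and drops records with a null webmaster ID. The field name `webmaster_id` and both function names are hypothetical.

```python
from typing import Dict, List, Optional


def fill_with_neighbor_mean(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing value (None) with the mean of its nearest known neighbours.

    Uses the closest known value on each side; at the edges only one side exists.
    If no value is known at all, the gaps are left as None.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        neighbours = [values[j] for j in (left, right) if j is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


def drop_null_webmaster(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Noise filter: discard records whose (hypothetical) 'webmaster_id' is missing or None."""
    return [r for r in records if r.get("webmaster_id") is not None]
```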
Step 302: select a first time period; extract, from the user click data acquired in step 301, the click data for accessing the first link within the first time period; and set the initial window length, initial sliding step, window length adjustment step, sliding step adjustment step, maximum window length and maximum sliding step of the sliding time window.
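The extraction in this step can be sketched as a filter-and-group pass over click records. The `ClickRecord` type, its field names, the function name and the day-level granularity of click times are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ClickRecord:
    """One user click; fields mirror the minimum click data listed in step 301."""
    user_id: str      # user identification information
    url: str          # identifier (URL) of the clicked web page link
    click_date: date  # click time, truncated to a day (assumption)


def first_link_clicks_by_user(records: Iterable[ClickRecord], first_link: str,
                              start: date, end: date) -> Dict[str, List[date]]:
    """Keep clicks on `first_link` within [start, end] and group their dates per user."""
    by_user: Dict[str, List[date]] = defaultdict(list)
    for r in records:
        if r.url == first_link and start <= r.click_date <= end:
            by_user[r.user_id].append(r.click_date)
    return {user: sorted(dates) for user, dates in by_user.items()}
```

Grouping by user here yields the per-user click data that the later steps filter, pair and compare.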
In practical applications, for example in a traffic activation scenario, all users to be activated may be assigned to multiple traffic activation executors. Each executor activates users by placing links likely to interest them on one or more web pages to attract clicks. In this scenario, the first link may be a link placed by a particular traffic activation executor.
In practical applications, considering that users whose continuous access to the first link is short are unlikely to be traffic cheating group members, after the click data for accessing the first link within the first time period is extracted in step 302, the click data of users whose maximum continuous access duration for the first link is less than the first duration is filtered out.
For example, if a user's maximum continuous access to the first link does not exceed 30 days, the user is considered not to be a traffic cheating group member, and the user's click data is filtered out and does not take part in the subsequent process.
Step 303: slide the time window successively over the first time period according to the current window length and the current sliding step.
In practical applications, if no user clicks the first link during the first q days (q is a positive integer) of the first time period, the sliding time window may start from day q+1. For example, as shown in Fig. 4, assume the first time period is the year 2020 and no user clicks the first link during the first 58 days of 2020; the sliding time windows then start from day 59. In Fig. 4, each time window is 120 days long and the sliding step is 10 days.
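Step 303 and the Fig. 4 example can be sketched as follows, assuming days are numbered from 1 within the first time period and that only full-length windows are generated (the source does not say how a trailing partial window is handled). Because adjacent windows start `step` days apart, they overlap by `window_len - step` days, which corresponds to the overlapping sub-time periods described earlier. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def sliding_windows(click_days: Iterable[int], period_days: int,
                    window_len: int, step: int) -> List[Tuple[int, int]]:
    """Return inclusive (first_day, last_day) windows over days 1..period_days.

    click_days: 1-based day indices on which at least one user clicked the first link.
    The first window starts at the first day with a click, so the leading q days
    without clicks are skipped. Assumption: only full-length windows are produced.
    """
    if window_len <= 0 or step <= 0:
        raise ValueError("window_len and step must be positive")
    days = [d for d in click_days if 1 <= d <= period_days]
    if not days:
        return []
    start = min(days)
    windows = []
    while start + window_len - 1 <= period_days:
        windows.append((start, start + window_len - 1))
        start += step
    return windows
```

With the Fig. 4 settings (2020 has 366 days, first click on day 59, 120-day windows, 10-day step), `sliding_windows([59, 200, 300], 366, 120, 10)` yields 19 windows, from (59, 178) to (239, 358).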
Step 304: for each time window, select, from all click data for accessing the first link within the first time period, the click data for accessing the first link within that time window.
Step 305: fully pair all users who accessed the first link within the time window, and for each user pair, calculate the click behavior similarity of the two users from the number of days on which both users accessed the first link within the time window and the number of days on which only one of them accessed it.
Here, fully pairing the users who accessed the first link within the time window means that every user is paired with every other user. For example, if four users a, b, c and d accessed the first link within the time window, the pairs are ab, ac, ad, bc, bd and cd. When m users accessed the first link within a time window, the number of user pairs is 1 + 2 + 3 + ... + (m-1) = m(m-1)/2.
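Full pairwise pairing corresponds directly to 2-combinations of the user set. A minimal sketch using the standard `itertools.combinations`, with a hypothetical helper name:

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def all_user_pairs(users: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair every user with every other user exactly once (unordered pairs)."""
    unique = sorted(set(users))  # deterministic order; duplicate IDs count once
    return list(combinations(unique, 2))
```

For users a, b, c and d this returns the six pairs listed above. Since the count grows as m(m-1)/2, pre-filtering users with short continuous access noticeably reduces the work in this step.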
The click behavior similarity is calculated by formula (1):
$$\delta = \frac{\sum_{k=1}^{N}\alpha_k}{N} \cdot \frac{\sum_{k=1}^{N}\alpha_k}{\sum_{k=1}^{N}\alpha_k + \sum_{k=1}^{N}\beta_k} \qquad (1)$$
where $\delta$ is the click behavior similarity, N is the number of days in each time window, and k denotes the k-th day of the current time window;
$\alpha_k$: if both users of the current user pair accessed the first link on day k, $\alpha_k = 1$; otherwise $\alpha_k = 0$;
$\beta_k$: if on day k the first link was accessed by exactly one user of the current user pair (either user a or user b), $\beta_k = 1$; otherwise $\beta_k = 0$.
That is, $\sum_{k=1}^{N}\alpha_k$ is the total number of days within the current time window on which both users of the current pair accessed the first link, and $\sum_{k=1}^{N}\beta_k$ is the total number of days within the current time window on which exactly one of the two users accessed it.
For example, let N = 120 days. If users a and b of a user pair both accessed the first link on 80 days, neither accessed it on 10 days, and exactly one of them (either user a or user b) accessed it on 30 days, the click behavior similarity of users a and b is:
$$\delta = \frac{80}{120} \cdot \frac{80}{80 + 30} = \frac{16}{33} \approx 0.485$$
In addition, the factor $\frac{\sum_{k=1}^{N}\alpha_k}{N}$ in formula (1) acts as a weight. The ratio $\frac{\sum\alpha_k}{\sum\alpha_k + \sum\beta_k}$ on its own ignores days on which neither user accessed the first link, so the weight is needed to lower the click behavior similarity of a user pair when neither of its two users accessed the first link on a large number of days.
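Formula (1) can be checked numerically with a short function. It assumes each user's accesses within the current window are given as a set of day indices; `click_similarity` is a hypothetical name, and returning zero when the users share no day is a choice made here to avoid dividing by zero.

```python
from typing import Set


def click_similarity(days_a: Set[int], days_b: Set[int], n_days: int) -> float:
    """Click behavior similarity delta of formula (1) for one user pair in one window.

    days_a, days_b: day indices (within the current window) on which each user
    accessed the first link; n_days: window length N.
    """
    both = len(days_a & days_b)      # sum of alpha_k: days on which both users accessed
    only_one = len(days_a ^ days_b)  # sum of beta_k: days on which exactly one user accessed
    if both == 0:
        return 0.0                   # also avoids 0/0 when neither user accessed at all
    initial = both / (both + only_one)  # initial similarity
    weight = both / n_days              # lowers the score when few days are shared
    return weight * initial
```

For example, if user a accessed the first link on days 1 to 95 and user b on days 1 to 80 and 96 to 110 of a 120-day window, the pair has 80 shared days, 30 single-user days and 10 days with no access, and the function returns 16/33, matching the worked example.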
Step 306: according to the click behavior similarity of each user pair within the time window, select the user pairs whose click behavior similarity is greater than a preset similarity threshold, and add all users in the selected pairs to the traffic cheating group set of that time window.
For example, four users a, b, c and d accessed the first link within the time window, giving the user pairs ab, ac, ad, bc, bd and cd. If the click behavior similarities of pairs ab and bd are greater than the preset similarity threshold, users a, b and d are added to the traffic cheating group set of that time window.
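Step 306 reduces to taking the union of all users in pairs above the threshold. In this sketch the pair similarities are assumed to be precomputed (for example with formula (1)), and the function name is hypothetical.

```python
from typing import Dict, Set, Tuple


def window_group_set(pair_similarity: Dict[Tuple[str, str], float],
                     threshold: float) -> Set[str]:
    """Union of all users in pairs whose similarity exceeds the threshold (step 306)."""
    group: Set[str] = set()
    for (u, v), sim in pair_similarity.items():
        if sim > threshold:  # strictly greater, as in the text
            group.update((u, v))
    return group
```

With made-up similarities in which only ab and bd exceed a threshold of 0.5, the function returns {'a', 'b', 'd'}, as in the example above.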
Step 307: from the traffic cheating group sets of the time windows within the first time period, select the users who appear in the traffic cheating group sets of more than one time window.
Step 308: delete these users from the traffic cheating group set of each time window to obtain an updated traffic cheating group set for each time window.
Step 309: for each such user, calculate the click behavior similarity between the user and the traffic cheating group set of each time window, select the maximum, and add the user to the traffic cheating group set with the maximum click behavior similarity.
The click behavior similarity between a user and the traffic cheating group set of a time window is calculated as follows: calculate the click behavior similarity between the user and each member of that set, and take the minimum as the click behavior similarity between the user and the set.
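Steps 307 to 309 can be sketched as below. Two interpretations are assumed because the text leaves them open: a user is treated as shared when it appears in more than one window's set, and each shared user is compared with the sets as they stand right after step 308. The pairwise similarity function `pair_sim` is a hypothetical callback, since the similarity between two users depends on the window in which it is computed.

```python
from collections import Counter
from typing import Callable, Dict, Set

# pair_sim(window, u, v): click behavior similarity of users u and v in `window`
PairSim = Callable[[str, str, str], float]


def reassign_shared_users(groups: Dict[str, Set[str]], pair_sim: PairSim) -> Dict[str, Set[str]]:
    """Steps 307-309: give every user shared by several windows' sets a single set."""
    counts = Counter(u for members in groups.values() for u in members)
    shared = {u for u, c in counts.items() if c > 1}                          # step 307
    base = {w: frozenset(members - shared) for w, members in groups.items()}  # step 308
    result = {w: set(members) for w, members in base.items()}
    for u in sorted(shared):                                                  # step 309
        # similarity to a set = minimum similarity to any of its members
        scores = {w: min(pair_sim(w, u, v) for v in members)
                  for w, members in base.items() if members}
        if scores:  # if every updated set is empty, the user stays unassigned
            best = max(scores, key=scores.get)
            result[best].add(u)
    return result
```

Ties are broken in favour of the window that comes first in the dictionary.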
Step 310: determine whether the current window length and sliding step of the time window have reached the maximum window length and the maximum sliding step, respectively; if so, go to step 312; otherwise, go to step 311.
Step 311: adjust the window length and/or the sliding step of the sliding time window by the preset window length adjustment step and/or sliding step adjustment step, and return to step 303.
Step 312: on the principle that the more traffic cheating group members are detected across the time windows, the better the detection result, select the optimal group of traffic cheating group sets from the multiple groups detected, and use it as the final detection result.
One group of traffic cheating group sets is computed for each combination of window length and sliding step.
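The outer loop of steps 303 to 312 can be sketched as a parameter sweep. The callback `detect` is a hypothetical stand-in for steps 303 to 309 run with one (window length, sliding step) combination; sweeping a full grid of both parameters and scoring by the number of distinct members are assumptions, since the source only says that more detected members mean a better result.

```python
from typing import Callable, Dict, Optional, Set, Tuple

GroupSets = Dict[int, Set[str]]             # window start day -> traffic cheating group set
Detector = Callable[[int, int], GroupSets]  # (window_len, slide) -> group sets


def select_best_parameters(detect: Detector,
                           init_len: int, max_len: int, len_adjust: int,
                           init_slide: int, max_slide: int, slide_adjust: int
                           ) -> Tuple[Optional[Tuple[int, int]], GroupSets]:
    """Run detection for every (window length, sliding step) and keep the best group sets."""
    if len_adjust <= 0 or slide_adjust <= 0:
        raise ValueError("adjustment steps must be positive")
    best_params: Optional[Tuple[int, int]] = None
    best_sets: GroupSets = {}
    best_score = -1
    window_len = init_len
    while window_len <= max_len:
        slide = init_slide
        while slide <= max_slide:
            sets = detect(window_len, slide)
            score = len(set().union(*sets.values()))  # distinct members across windows
            if score > best_score:
                best_params, best_sets, best_score = (window_len, slide), sets, score
            slide += slide_adjust
        window_len += len_adjust
    return best_params, best_sets
```

Only strictly better scores replace the current best, so among equally good combinations the first one tried is kept.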
In practical applications, the embodiment of the invention can be executed periodically so that the traffic cheating group set is updated with the latest user click data. Because identification is based on real-time click behavior, accuracy is improved, periodic group traffic cheating can be identified at any time, and the method is widely applicable.
Fig. 5 is a schematic structural diagram of a traffic cheating behavior identification device according to an embodiment of the present invention. The device mainly includes:
a click behavior feature extraction module 51, configured to: acquire click data of users accessing web pages; extract from it the user click data for accessing a first link within a first time period; extract each user's click data from the click data for accessing the first link within the first time period; and extract each user's click behavior features from that user's click data.
An identification module 52, configured to determine, according to the click behavior features of each user extracted by the click behavior feature extraction module 51, whether clustering behavior exists among the users, and if so, add the users with clustering behavior to the traffic cheating group set.
In an optional embodiment, the identification module 52 determining whether clustering behavior exists among users comprises: calculating the click behavior similarity between every two users according to each user's click behavior features, and if the click behavior similarity is greater than a preset similarity threshold, determining that clustering behavior exists between the two corresponding users.
In an optional embodiment, the user click data acquired by the click behavior feature extraction module 51 includes: user identification information, the identifier of the web page link clicked by the user, and the user's click time;
the click behavior feature extraction module 51 extracting each user's click behavior features from that user's click data comprises: dividing the first time period into at least one sub-time period; for each sub-time period, acquiring the click data of each user who accessed the first link within that sub-time period; for each sub-time period, fully pairing all users who accessed the first link within it; for each user pair that accessed the first link within each sub-time period, counting, from the two users' click data for the current sub-time period, the number of days on which both users accessed the first link (the first-type days) and the number of days on which exactly one of the two users accessed it (the second-type days); and taking the first-type days and second-type days as the click behavior features of the two users in the pair for the current sub-time period.
In an optional embodiment, the identification module 52 determining whether clustering behavior exists among users comprises: for each user pair that accessed the first link within each sub-time period, calculating the sum of the pair's first-type and second-type days, and dividing the first-type days by this sum to obtain an initial click behavior similarity of the two users; dividing the first-type days by the total number of days of the current sub-time period to obtain a weight; multiplying the weight by the initial similarity to obtain the click behavior similarity of the two users; and if this similarity is greater than a preset similarity threshold, determining that clustering behavior exists between the two users;
the identification module 52 adding users with clustering behavior to the traffic cheating group set comprises: adding the two users of the pair to the traffic cheating group set of the current sub-time period.
In an optional embodiment, after adding the two users of a pair to the traffic cheating group set of the current sub-time period, the identification module 52 is further configured to: once the traffic cheating group sets of all sub-time periods within the first time period have been obtained, select the users who appear in more than one of these traffic cheating group sets; delete these users from each traffic cheating group set to obtain an updated traffic cheating group set for each sub-time period; and for each such user, calculate the click behavior similarity between the user and each traffic cheating group set, select the set with the maximum click behavior similarity, and add the user to it.
In an optional embodiment, the identification module 52 calculating the click behavior similarity between the user and each traffic cheating group set comprises: for each traffic cheating group set, calculating the click behavior similarity between the user and each member of the set, and taking the minimum as the click behavior similarity between the user and the set.
In an optional embodiment, the click behavior feature extraction module 51 dividing the first time period into at least one sub-time period comprises: dividing the first time period into at least one sub-time period according to a preset initial sub-time-period length, or according to the current sub-time-period length and an adjustment step;
and after adding the two users of a pair to the traffic cheating group set of the current sub-time period, the identification module 52 is further configured to: once the traffic cheating group sets of all current sub-time periods have been obtained, determine whether the current sub-time-period length has reached a preset maximum sub-time-period length; if not, notify the click behavior feature extraction module 51 to return to dividing the first time period into at least one sub-time period according to the current sub-time-period length and the adjustment step; if so, select, from the traffic cheating group sets detected in each round, the optimal traffic cheating group set(s) as the final detection result, on the principle that the more members the detected traffic cheating group sets contain, the better the detection result.
In an optional embodiment, the click behavior feature extraction module 51 dividing the first time period into at least one sub-time period comprises: dividing the first time period into a plurality of sub-time periods such that every two adjacent sub-time periods overlap by a preset second duration.
In an optional embodiment, after extracting each user's click data from the click data for accessing the first link within the first time period, and before extracting each user's click behavior features from that user's click data, the click behavior feature extraction module 51 is further configured to: for each user, determine from the user's click data the maximum duration for which the user continuously accessed the first link within the first time period; if this maximum duration is less than a preset first duration, determine that the user is not a traffic cheating group member and delete the user's click data from the click data of the users accessing the first link within the first time period.
In an optional embodiment, after adding the users with clustering behavior to the traffic cheating group set, the identification module 52 is further configured to: for each user with clustering behavior, determine whether the user meets the following conditions and, if so, delete the user from the traffic cheating group set: the user's dwell time on the web page of the first link is greater than a preset time threshold, and/or the user's access depth for the first link is greater than a preset depth threshold, and/or the number of item types purchased by the user through the first link is greater than a preset number.
Embodiments of the present application also provide a computer-readable storage medium storing instructions which, when executed by a processor, perform the steps of the traffic cheating behavior identification method described above. In practical applications, the computer-readable medium may be included in the devices/apparatuses/systems of the above embodiments, or may exist separately without being assembled into them.
According to embodiments disclosed herein, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example and without limitation: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, without limiting the scope of the present disclosure. In the embodiments disclosed herein, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
As shown in Fig. 6, an embodiment of the present invention further provides an electronic device. Fig. 6 is a schematic structural diagram of this electronic device. Specifically:
the electronic device may include a processor 61 with one or more processing cores, a memory 62 comprising one or more computer-readable storage media, and a computer program stored in the memory and executable on the processor. The traffic cheating behavior identification method described above is implemented when the program in the memory 62 is executed.
Specifically, in practical applications, the electronic device may further include a power supply 63, an input/output unit 64, and the like. Those skilled in the art will appreciate that the configuration of the electronic device shown in fig. 6 is not intended to be limiting of the electronic device and may include more or fewer components than shown, or some components in combination, or a different arrangement of components. Wherein:
the processor 61 is a control center of the electronic device, connects various parts of the entire electronic device by various interfaces and lines, performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 62 and calling data stored in the memory 62, thereby performing overall monitoring of the electronic device.
The memory 62 may be used to store software programs and modules, that is, the computer-readable storage medium described above. The processor 61 performs various functional applications and data processing by running the software programs and modules stored in the memory 62. The memory 62 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required by at least one function, and the like, and the data storage area may store data created through use of the server, and the like. Further, the memory 62 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 62 may also include a memory controller to provide the processor 61 with access to the memory 62.
The electronic device further comprises a power supply 63 for supplying power to the various components, which can be logically connected to the processor 61 via a power management system, so as to implement functions of managing charging, discharging, and power consumption via the power management system. The power supply 63 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The electronic device may also include an input/output unit 64, which can be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. The input/output unit 64 can also be used to display information input by or provided to the user, as well as various graphical user interfaces, which may be composed of graphics, text, icons, video, and any combination thereof.
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments disclosed herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or integrated in various ways, even if such combinations are not explicitly recited in the present application. In particular, the features recited in the various embodiments and/or claims of the present application may be combined and/or integrated in various ways without departing from the spirit and teachings of the present application, and all such combinations fall within the scope of the present disclosure.
The principles and embodiments of the present invention have been explained herein using specific examples, which are provided only to help understand the method and core idea of the invention and are not intended to limit the present application. Those skilled in the art will appreciate that changes may be made to the embodiments and their broader aspects without departing from the principles, spirit and scope of the invention, and all such modifications, equivalent substitutions and improvements within the scope of the invention are intended to be protected by the claims.

Claims (13)

1. A flow cheating behavior identification method is characterized by comprising the following steps:
acquiring user click data of a webpage accessed by a user;
extracting user click data for accessing a first link within a first time period from the user click data;
extracting user click data of each user from user click data accessing the first link within the first time period;
extracting the click behavior characteristics of each user from the user click data of each user;
determining, according to the click behavior features of each user, whether clustering behavior exists among the users, and if so, adding the users with clustering behavior to a traffic cheating group set.
2. The method of claim 1, wherein determining whether clustering behavior exists among users comprises:
calculating the click behavior similarity between every two users according to the click behavior features of each user, and if the click behavior similarity is greater than a preset similarity threshold, determining that clustering behavior exists between the two users corresponding to that similarity.
3. The method of claim 1, wherein the user click data comprises: user identification information, a webpage link identification clicked by a user and user click time;
extracting the click behavior features of each user from the user click data of each user comprises: dividing the first time period into at least one sub-time period;
for each sub-time period, acquiring user click data of each user accessing the first link in the sub-time period;
for each sub-time period, fully pairing all users who accessed the first link within the sub-time period;
for each user pair that accessed the first link within each sub-time period, counting, according to the click data of the two users of the pair within the current sub-time period, the number of days on which both users accessed the first link within the current sub-time period as first-type days, and the number of days on which exactly one of the two users accessed the first link within the current sub-time period as second-type days;
taking the first-type days and the second-type days as the click behavior features of the two users of the pair within the current sub-time period.
4. The method of claim 3, wherein determining whether clustering behavior exists among users comprises:
for each user pair that accessed the first link within each sub-time period, calculating the sum of the first-type days and second-type days of the pair, and dividing the first-type days by the sum to obtain an initial click behavior similarity of the two users of the pair; dividing the first-type days by the total number of days of the current sub-time period to obtain a weight; multiplying the weight by the initial click behavior similarity to obtain the click behavior similarity of the two users of the pair; and if the click behavior similarity is greater than a preset similarity threshold, determining that clustering behavior exists between the two users of the pair;
adding the users with clustering behavior to the traffic cheating group set comprises:
adding the two users of the pair to the traffic cheating group set of the current sub-time period.
5. The method of claim 4, wherein after adding the two users of the pair to the traffic cheating group set of the current sub-time period, the method further comprises:
once the traffic cheating group sets of all sub-time periods within the first time period have been obtained, selecting the users who appear in more than one of the traffic cheating group sets;
deleting the selected users from each traffic cheating group set to obtain an updated traffic cheating group set for each sub-time period;
for each selected user, calculating the click behavior similarity between the user and each traffic cheating group set, selecting the traffic cheating group set with the maximum click behavior similarity, and adding the user to the selected set.
6. The method of claim 5, wherein calculating the click behavior similarity between the user and each traffic cheating group set comprises:
for each traffic cheating group set, calculating the click behavior similarity between the user and each user in the set, and taking the minimum as the click behavior similarity between the user and the set.
7. The method of claim 4, wherein dividing the first time period into at least one sub-time period comprises:
dividing the first time period into at least one sub-time period according to a preset initial sub-time-period length, or according to the current sub-time-period length and an adjustment step;
and wherein, after the two users in the user pair are added to the traffic cheating group set of the current sub-time period, the method further comprises:
after the traffic cheating group sets of all the current sub-time periods are obtained, determining whether the current sub-time-period length has reached a preset maximum sub-time-period length;
if not, returning to the step of dividing the first time period into at least one sub-time period according to the current sub-time-period length and the adjustment step;
if so, selecting, from the traffic cheating group sets detected in each round, the optimal one or more sets as the final detection result, on the principle that the more members the detected sets contain, the better the detection result.
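The length scan of claim 7 can be sketched as a loop over sub-time-period lengths. The sketch assumes a hypothetical `detect(length)` callback that runs the per-sub-period detection for a given length. It scores each round by the number of distinct members across its sets, which is one possible reading of "the more members, the better". Following the claim's order, the length is compared with the maximum after each round. The last round may therefore use a length above the maximum if the step does not divide the range evenly.

```python
from typing import Callable, List, Set


def scan_sub_period_lengths(
    detect: Callable[[int], List[Set[str]]],
    initial_length: int,
    step: int,
    max_length: int,
) -> List[Set[str]]:
    """Re-run detection with growing sub-time-period lengths.

    detect(length) is a hypothetical callback that splits the first time
    period into sub-periods of `length` days and returns the group sets.
    Each round is scored by its distinct members (an assumption).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    best_sets: List[Set[str]] = []
    best_score = -1
    length = initial_length
    while True:
        sets = detect(length)
        score = len(set().union(*sets))  # distinct members this round
        if score > best_score:           # strict: ties keep the earlier round
            best_sets, best_score = sets, score
        if length >= max_length:         # maximum sub-period length reached
            break
        length += step
    return best_sets
```

Because the comparison is strict, a tie keeps the result from the earlier, shorter sub-period length.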
8. The method of claim 3, wherein dividing the first time period into at least one sub-time period comprises:
dividing the first time period into a plurality of sub-time periods such that every two adjacent sub-time periods overlap by a preset second time length.
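Claim 8's overlapping sub-time periods behave like a sliding window whose stride is the sub-period length minus the overlap. The following sketch assumes whole-day granularity, inclusive end dates, and truncation of the final window at the end of the first time period. The function name is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def overlapping_sub_periods(
    start: date, end: date, length_days: int, overlap_days: int
) -> List[Tuple[date, date]]:
    """Split [start, end] (inclusive) into sub-periods of length_days days
    in which each pair of adjacent sub-periods shares overlap_days days.
    The last sub-period is truncated at `end` (an assumption)."""
    if start > end:
        raise ValueError("start must not be after end")
    if length_days <= 0 or not 0 <= overlap_days < length_days:
        raise ValueError("need length_days > 0 and 0 <= overlap_days < length_days")
    stride = timedelta(days=length_days - overlap_days)
    periods = []
    current = start
    while True:
        sub_end = min(current + timedelta(days=length_days - 1), end)
        periods.append((current, sub_end))
        if sub_end >= end:
            break
        current += stride
    return periods


for period in overlapping_sub_periods(date(2021, 8, 1), date(2021, 8, 10), 4, 1):
    print(period)
```

For 1 to 10 August with 4-day sub-periods and a 1-day overlap, this yields 1 to 4, 4 to 7 and 7 to 10 August.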
9. The method of claim 1, further comprising, after extracting the user click data of each user from the user click data for accessing the first link within the first time period, and before extracting the click behavior features of each user from that user's click data:
for each user, determining from the user's click data the maximum length of time for which the user continuously accessed the first link within the first time period; if this maximum is less than a preset first time length, determining that the user is not a member of a traffic cheating group, and deleting the user's click data from the user click data of the users accessing the first link within the first time period.
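The following is a sketch of the claim 9 pre-filter. It assumes that time is measured in whole days and that "continuous access" means access on consecutive calendar days; the claim does not fix the unit. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List


def longest_consecutive_days(access_days: Iterable[date]) -> int:
    """Length of the longest run of consecutive calendar days with access."""
    best = run = 0
    previous = None
    for day in sorted(set(access_days)):
        if previous is not None and day - previous == timedelta(days=1):
            run += 1
        else:
            run = 1
        best = max(best, run)
        previous = day
    return best


def drop_short_lived_users(
    access_days_by_user: Dict[str, List[date]], first_length_days: int
) -> Dict[str, List[date]]:
    """Keep only users whose longest continuous access reaches the preset
    first time length; the others are treated as non-members and dropped."""
    return {
        user: days
        for user, days in access_days_by_user.items()
        if longest_consecutive_days(days) >= first_length_days
    }
```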
10. The method of claim 1, further comprising, after adding the users with the clustering behavior to the traffic cheating group set:
for each user with the clustering behavior, determining whether the user meets the following conditions, and if so, deleting the user from the traffic cheating group set:
the user's dwell time on the webpage of the first link is greater than a preset time threshold, and/or the user's access depth for the first link is greater than a preset depth threshold, and/or the number of item types purchased by the user via the first link is greater than a preset number.
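The following is a sketch of the claim 10 post-filter. The `EngagementStats` fields and their units are assumptions made for illustration. So is the decision to delete a user when any single condition holds, which is one reading of "and/or".

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class EngagementStats:
    """Hypothetical per-user engagement metrics for the first link."""
    dwell_seconds: float        # dwell time on the first link's webpage
    access_depth: int           # access depth, e.g. pages viewed (assumption)
    purchased_item_types: int   # number of item types bought via the link


def prune_genuine_users(
    group: Set[str],
    stats: Dict[str, EngagementStats],
    dwell_threshold: float,
    depth_threshold: int,
    item_type_threshold: int,
) -> Set[str]:
    """Delete users whose engagement looks genuine. Exceeding any one
    threshold triggers deletion (the 'or' reading of 'and/or')."""
    kept = set()
    for user in group:
        s = stats[user]
        looks_genuine = (
            s.dwell_seconds > dwell_threshold
            or s.access_depth > depth_threshold
            or s.purchased_item_types > item_type_threshold
        )
        if not looks_genuine:
            kept.add(user)
    return kept
```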
11. A traffic cheating behavior identification apparatus, comprising:
a click behavior feature extraction module, configured to acquire user click data of webpages accessed by users; extract, from the user click data, the user click data for accessing a first link within a first time period; extract the user click data of each user from the user click data for accessing the first link within the first time period; and extract the click behavior features of each user from that user's click data; and
an identification module, configured to determine, according to the click behavior features of each user, whether a clustering behavior exists among the users, and if so, add the users with the clustering behavior to a traffic cheating group set.
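The extraction part of the claim 11 feature module can be sketched as a filter-and-group pass over click logs. The `ClickRecord` layout is hypothetical. Using each user's set of access days as the click behavior feature is an assumption, chosen to match the day counts used in claim 4.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class ClickRecord:
    """Hypothetical shape of one click log entry."""
    user_id: str
    link: str
    day: date


def extract_access_days(
    clicks: Iterable[ClickRecord], first_link: str, start: date, end: date
) -> Dict[str, Set[date]]:
    """Keep clicks on first_link within [start, end] and group them per
    user; the set of access days serves as the feature (an assumption)."""
    features: Dict[str, Set[date]] = defaultdict(set)
    for click in clicks:
        if click.link == first_link and start <= click.day <= end:
            features[click.user_id].add(click.day)
    return dict(features)
```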
12. A non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the traffic cheating behavior identification method of any one of claims 1 to 10.
13. An electronic device, comprising the non-transitory computer readable storage medium of claim 12 and a processor capable of accessing the non-transitory computer readable storage medium.
CN202110981015.0A 2021-08-25 2021-08-25 Flow cheating behavior identification method and device, storage medium and electronic equipment Pending CN113592036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110981015.0A CN113592036A (en) 2021-08-25 2021-08-25 Flow cheating behavior identification method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113592036A (en) 2021-11-02

Family

ID=78239703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110981015.0A Pending CN113592036A (en) 2021-08-25 2021-08-25 Flow cheating behavior identification method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113592036A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8533825B1 (en) * 2010-02-04 2013-09-10 Adometry, Inc. System, method and computer program product for collusion detection
CN104765874A (en) * 2015-04-24 2015-07-08 百度在线网络技术(北京)有限公司 Method and device for detecting click-cheating
WO2016169193A1 (en) * 2015-04-24 2016-10-27 百度在线网络技术(北京)有限公司 Method and apparatus for detecting cheated clicks
JP2019003629A (en) * 2017-06-16 2019-01-10 Line株式会社 Cheating application identification method and system
CN110213209A (en) * 2018-05-11 2019-09-06 腾讯科技(深圳)有限公司 Cheat detection method, device and storage medium for clicks on pushed information
CN108898505A (en) * 2018-05-28 2018-11-27 武汉斗鱼网络科技有限公司 Cheating gang recognition method, corresponding medium and electronic device
CN110659954A (en) * 2019-08-29 2020-01-07 北京三快在线科技有限公司 Cheating identification method and device, electronic equipment and readable storage medium
CN112766995A (en) * 2019-10-21 2021-05-07 招商证券股份有限公司 Article recommendation method and device, terminal device and storage medium
CN112800419A (en) * 2019-11-13 2021-05-14 北京数安鑫云信息技术有限公司 Method, apparatus, medium and device for identifying IP group
CN112989295A (en) * 2019-12-16 2021-06-18 北京沃东天骏信息技术有限公司 User identification method and device
CN112163096A (en) * 2020-09-18 2021-01-01 中国建设银行股份有限公司 Malicious group determination method and device, electronic equipment and storage medium
CN112488765A (en) * 2020-12-08 2021-03-12 深圳市欢太科技有限公司 Advertisement anti-cheating method, advertisement anti-cheating device, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HAYDEN CHEERS; YUQING LIN; SHAMUS P. SMITH: "Academic Source Code Plagiarism Detection by Measuring Program Behavioral Similarity", IEEE ACCESS, 29 March 2021 (2021-03-29) *
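孙勇; 谭文安; 金婷; 周亮广: "Collaborative Cheating Group Identification Method Based on Online Clustering", Journal of Computer Research and Development (计算机研究与发展), no. 06, 15 June 2018 (2018-06-15) *
李彤岩; 李兴明: "Research on an Alarm Preprocessing Method Based on a Dual-Constraint Sliding Time Window", Application Research of Computers (计算机应用研究), no. 02, 15 February 2013 (2013-02-15) *
陈霞; 闵华清; 宋恒杰: "Automatic Identification of Cheating Users on Crowdsourcing Platforms", Computer Engineering (计算机工程), no. 08, 9 March 2016 (2016-03-09) *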

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114818933A (en) * 2021-12-23 2022-07-29 金数信息科技(苏州)有限公司 Method and device for monitoring artificial flow cheating based on Epsilon greedy algorithm
CN114818933B (en) * 2021-12-23 2024-05-28 金数信息科技(苏州)有限公司 Method and device for monitoring artificial flow cheating based on Epsilon greedy algorithm
CN114511134A (en) * 2021-12-30 2022-05-17 北京字跳网络技术有限公司 Risk control strategy generation method, device, storage medium and program product

Similar Documents

Publication Publication Date Title
CN107613022B (en) Content pushing method and device and computer equipment
WO2019153604A1 (en) Device and method for creating human/machine identification model, and computer readable storage medium
CN108833458B (en) Application recommendation method, device, medium and equipment
CN103116582B (en) A kind of information retrieval method and related system and device
JP2019533205A (en) User keyword extraction apparatus, method, and computer-readable storage medium
CN110971659A (en) Recommendation message pushing method and device and storage medium
US20180060426A1 (en) Systems and methods for issue management
CN108259638B (en) Intelligent sorting method for personal group list, intelligent terminal and storage medium
CN108345601B (en) Search result ordering method and device
CN112380859A (en) Public opinion information recommendation method and device, electronic equipment and computer storage medium
CN113592036A (en) Flow cheating behavior identification method and device, storage medium and electronic equipment
US11887013B2 (en) System and method for facilitating model-based classification of transactions
WO2013082297A2 (en) Classifying attribute data intervals
CN108921587B (en) Data processing method and device and server
CN111563198B (en) Material recall method, device, equipment and storage medium
CN112561332B (en) Model management method, device, electronic equipment, storage medium and program product
CN111460384A (en) Policy evaluation method, device and equipment
US20190220924A1 (en) Method and device for determining key variable in model
CN111444438B (en) Method, device, equipment and storage medium for determining quasi-recall rate of recall strategy
CN110825868A (en) Topic popularity based text pushing method, terminal device and storage medium
CN111291082B (en) Data aggregation processing method, device, equipment and storage medium
CN113763066A (en) Method and apparatus for analyzing information
CN110929169A (en) Position recommendation method based on improved Canopy clustering collaborative filtering algorithm
CN110737432A (en) script aided design method and device based on root list
CN112529181B (en) Method and apparatus for model distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination