CN106657007A - Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model - Google Patents

Info

Publication number
CN106657007A
Authority
CN
China
Prior art keywords
behavior
frequency
user
threshold values
cookie
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611019839.5A
Other languages
Chinese (zh)
Inventor
曹杰
冯雨晖
宿晓坤
杨睿
李学超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING HONGMA MEDIA CULTURE DEVELOPMENT CO LTD
Original Assignee
BEIJING HONGMA MEDIA CULTURE DEVELOPMENT CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING HONGMA MEDIA CULTURE DEVELOPMENT CO LTD
Priority to CN201611019839.5A
Publication of CN106657007A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1466 Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention provides a method for identifying abnormal batch ticket-booking behavior based on a DBSCAN model. After the number of registrations in a predetermined time period is found to exceed a recognition threshold for the number of registrations in a reference time period, the user IP, Cookie and access agent environment (Agent) in the online ticket-booking behavior records of at least one marked, highly concentrated registered-account cluster are hashed into a globally unique coded string IP+Cookie+Agent, forming a unique user identifier. The user IPs with abnormal behavior attributes among the online ticket-booking behavior attributes are then identified and stored in a blacklist for isolation. By determining the various thresholds used to identify scalpers, the method provides a data basis for judging scalpers. User behavior features are recorded in real time, which provides a real-time basis for intercepting scalpers. A blacklist can be built and used to block scalpers in advance, making the allocation of resources more reasonable and fair.

Description

Method for identifying abnormal batch ticket-booking behavior based on a DBSCAN model
Technical field
The present invention relates to the field of abnormal-behavior identification technology, and in particular to a method for identifying abnormal batch ticket-booking behavior based on a DBSCAN model.
Background art
Because tickets for live performances are expensive and scarce, they attract large numbers of scalpers who grab tickets in bulk (abnormal online ticket-booking behavior) and resell them at inflated prices. Scalpers harm users' interests and greatly reduce the user experience and user stickiness of online ticketing platforms. To grab tickets, scalpers often register many accounts in batches by machine and use those accounts to issue high-frequency, high-volume requests, placing orders as fast as possible to occupy resources. Scalpers therefore generally grab tickets by program. At present, scalpers are identified by collecting statistics on users' access sources, access frequency and access periods, finding the access anomalies that differ from most users, and judging those to be scalpers; a scalper blacklist is then built. A scalper is not necessarily defined as a real user. It can also be a resource that a scalper uses to grab tickets, and such a resource is likewise added to the scalper blacklist. As a result, there are IP blacklists, Cookie blacklists, account blacklists and so on.
The current way of identifying scalpers is mainly to monitor access logs: the logs are parsed, and the access frequency and access time intervals of the IPs, Cookies, devices and accounts in them are calculated to identify abnormal access. This stops scalpers to some extent. In using this technique, however, the inventors found the following. First, identification along a single dimension cannot uniquely distinguish user devices and easily misjudges normal users. Take IP as an example: scalpers and normal users in the same building or residential compound share the same egress IP, so identification by IP easily misjudges normal users. Second, frequency-based identification catches scalpers only to a limited extent; once scalpers lengthen their access intervals and lower their access frequency, they become hard to detect. Scalpers can also simulate different clients and grab tickets through multiple channels. To grab tickets quickly, scalpers take shortcuts and do not operate like normal users, so their behavior trajectories lack key steps. Current identification methods based on abnormal traffic access therefore no longer meet the need to identify scalpers.
Summary of the invention
To solve the above technical problem, the present invention provides a method for identifying abnormal batch ticket-booking behavior based on a DBSCAN model. The method can pick out abnormal online batch ticket-booking behavior from normal ticket-booking behavior, isolate it, and reduce the probability of misidentification, making the allocation of resources more reasonable and fair.
The present invention provides a method for identifying abnormal batch ticket-booking behavior based on a DBSCAN model, comprising:
after detecting that the number of registrations in a predetermined time period is higher than a recognition threshold for the number of registrations in a reference time period, obtaining at least one highly concentrated registered-account cluster that is marked after a density-based clustering algorithm scans all registration behaviors in the pre-identification time period;
hashing the user IP, Cookie and access agent environment (Agent) in the online ticket-booking behavior records of the at least one marked, highly concentrated registered-account cluster into a globally unique coded string IP+Cookie+Agent, forming a unique user identifier;
extracting the online ticket-booking behavior attributes from the historical online ticket-booking behavior records and the real-time online ticket-booking behavior records of the user identifier;
identifying the user IPs with abnormal behavior attributes among the online ticket-booking behavior attributes, and storing those user IPs in a blacklist for isolation.
Further, hashing the user IP, Cookie and access agent environment (Agent) in the online ticket-booking behavior records into a globally unique coded string IP+Cookie+Agent to form a unique user identifier comprises:
hashing, by means of a hash function, the user IP, Cookie and access agent environment (Agent) in the online ticket-booking behavior records into a globally unique coded string IP+Cookie+Agent, forming a unique user identifier.
Further, identifying the user IPs with abnormal behavior attributes among the online ticket-booking behavior attributes and storing those user IPs in a blacklist for isolation comprises:
identifying the frequency thresholds and the blacklist among the online ticket-booking behavior attributes, the frequency thresholds including but not limited to one or more of: the access frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent access frequency, or the frequency with which an IP+cookie+agent accesses different URLs;
identifying the user IPs with abnormal behavior by means of the frequency thresholds and the blacklist, and storing the identified user IPs in the blacklist for isolation.
Further, extracting the online ticket-booking behavior attributes from the historical online ticket-booking behavior records and the real-time online ticket-booking behavior records of the user identifier comprises:
extracting the historical frequency thresholds and the historical blacklist from the historical behavior records of the user identifier;
extracting, from the historical transaction behavior records of the user identifier, the threshold for potentially abnormal purchasing behavior and the blacklist of abnormal registered users who exceed that threshold;
collecting in real time the current user access frequency and access path from the current access behavior records of the user identifier.
Further, extracting the historical frequency thresholds from the historical behavior records of the user identifier comprises the following steps:
loading the log file contents of the historical behavior records into the big-data warehouse Hive, creating a log-file formatting data table in Hive, and formatting the log file contents into the data table;
calculating the access frequencies in the data table and storing the results in the big-data warehouse, the access frequencies including but not limited to one or more of: the access frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent access frequency, or the frequency with which an IP+cookie+agent accesses different URLs;
observing the frequency distribution using a histogram, determining the historical frequency thresholds by custom definition, and storing the historical frequency thresholds.
Further, extracting the historical blacklist from the historical behavior records of the user identifier comprises the following steps:
collecting, onto the distributed storage system HDFS, the previous day's user access log files from the different servers in the nginx proxy server cluster;
loading the log file contents into the big-data warehouse Hive, creating a log-file formatting data table in Hive, and formatting the log file contents into the data table;
calculating in Hive the access frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent access frequency, and the frequency with which an IP+cookie+agent accesses different URLs; storing the results in the big-data warehouse; observing the frequency distribution using a histogram and determining the historical frequency thresholds by custom definition;
identifying abnormal clients on the basis of the determined historical frequency thresholds and the frequency calculation results, and storing them in a blacklist table.
Further, extracting the threshold for potentially abnormal purchasing behavior from the historical transaction behavior records of the user identifier comprises the following steps:
importing the historical transaction behavior records into a data warehouse;
calculating, for each user IP, the number of tickets purchased per single show, the number of items purchased, and the average number of tickets purchased;
observing, using a histogram, the distributions of the number of tickets purchased per single show, the number of items purchased, and the average number of tickets purchased, determining the threshold for potentially abnormal purchasing behavior by analysis according to custom rules, and storing that threshold.
Further, extracting, from the historical transaction behavior records of the user identifier, the blacklist of abnormal registered users who exceed the threshold for abnormal purchasing behavior comprises the following steps:
importing all of the previous day's transaction records and at least one year of transaction records into a data warehouse;
calculating, for each user IP within one year, the number of tickets purchased per single show, the number of items purchased, and the average number of tickets purchased; observing the distributions of these values using a histogram, and determining the threshold for potentially abnormal purchasing behavior by analysis according to custom rules;
identifying, on the basis of the determined threshold for potentially abnormal purchasing behavior and the frequency calculation results, the abnormal registered users who exceed the threshold, and storing them in a blacklist table.
Further, collecting in real time the current user access frequency and abnormal access path from the current access behavior records of the user identifier comprises:
reading the nginx access log file in real time and sending it to a log processing system;
the log processing system receiving in real time the logs sent by the log collection system and, with one second as a calculation window, calculating the IP access frequency, the frequency with which each IP accesses URLs, the IP+cookie+agent access frequency, the frequency with which an IP+cookie+agent accesses URLs, and the abnormal access path, and storing the results in a cache.
Further, the recognition threshold is calculated as follows:
where α is the data surge ratio, P1 is the number of registrations in the pre-identification time period, n denotes the single consecutive base period immediately preceding the pre-identification time period, n+m denotes the multiple consecutive base periods preceding the pre-identification time period, Pi is the number of registrations in a reference time period, Pmax is the maximum number of registrations over the multiple consecutive base periods preceding the pre-identification time period, and Pmin is the minimum number of registrations over those base periods.
In the present invention, after the number of registrations in a predetermined time period is detected to exceed the recognition threshold for the number of registrations in a reference time period, at least one highly concentrated registered-account cluster is obtained, marked after a density-based clustering algorithm scans all registration behaviors in the pre-identification time period; the user IP, Cookie and access agent environment (Agent) in the online ticket-booking behavior records of the marked cluster are hashed into a globally unique coded string IP+Cookie+Agent, forming a unique user identifier; the online ticket-booking behavior attributes are extracted from the historical and real-time online ticket-booking behavior records of the user identifier; and the user IPs with abnormal behavior attributes are identified and stored in a blacklist for isolation. Determining the various thresholds for identifying scalpers provides a data basis for judging scalpers. Recording users' behavior features (frequency and trajectory) in real time provides a real-time basis for intercepting scalpers as they act. A blacklist can be built and used to intercept scalpers in advance, making the allocation of resources more reasonable and fair.
Brief description of the drawings
Fig. 1 is a flowchart of Embodiment 1 of the method for identifying abnormal batch ticket-booking behavior based on a DBSCAN model provided by the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described here. Furthermore, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion: for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
Embodiment 1
Embodiment 1 of the present invention provides a method for identifying abnormal batch ticket-booking behavior based on a DBSCAN model which, as shown in Fig. 1, comprises steps S110 to S140.
In step S110, after it is detected that the number of registrations in a predetermined time period is higher than a recognition threshold for the number of registrations in a reference time period, at least one highly concentrated registered-account cluster is obtained, marked after a density-based clustering algorithm scans all registration behaviors in the pre-identification time period.
In step S120, the user IP, Cookie and access agent environment (Agent) in the online ticket-booking behavior records of the at least one marked, highly concentrated registered-account cluster are hashed into a globally unique coded string IP+Cookie+Agent, forming a unique user identifier.
In step S130, the online ticket-booking behavior attributes are extracted from the historical online ticket-booking behavior records and the real-time online ticket-booking behavior records of the user identifier.
In step S140, the user IPs with abnormal behavior attributes among the online ticket-booking behavior attributes are identified, and those user IPs are stored in a blacklist for isolation.
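As a rough illustration of the density-based scan in step S110, the following minimal sketch implements DBSCAN over one-dimensional registration timestamps in plain Python. The text does not state which features are clustered or which parameter values are used, so the function name `dbscan_1d`, the choice of registration time (in seconds) as the only feature, and the parameters `eps` and `min_pts` are all assumptions made for illustration.

```python
from typing import List, Optional

NOISE = -1  # label for points that belong to no dense cluster


def dbscan_1d(points: List[float], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet

    def neighbors(i: int) -> List[int]:
        # Indices of all points within eps of points[i], including i itself.
        return [j for j, p in enumerate(points) if abs(p - points[i]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: reachable but not core
            if labels[j] is not None:
                continue               # already assigned to a cluster
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(nbrs)
    return [label if label is not None else NOISE for label in labels]
```

With hypothetical timestamps such as two tight bursts of registrations separated by an isolated one, each burst would come back as its own cluster and the isolated registration as noise; the accounts in each cluster are the candidates for the "highly concentrated registered-account cluster" described above. The neighbor search here is quadratic, which is acceptable only for a sketch.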
Further, hashing the user IP, Cookie and access agent environment (Agent) in the online ticket-booking behavior records into a globally unique coded string IP+Cookie+Agent to form a unique user identifier comprises:
hashing, by means of a hash function, the user IP, Cookie and access agent environment (Agent) in the online ticket-booking behavior records into a globally unique coded string IP+Cookie+Agent, forming a unique user identifier.
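The hashing step can be sketched as follows. The text only says "a hash function", so the use of SHA-256, the function name `user_identifier`, and the unit-separator character used to join the three fields are assumptions for illustration, not the method's stated choices.

```python
import hashlib


def user_identifier(ip: str, cookie: str, agent: str) -> str:
    """Hash IP + Cookie + Agent into one fixed-length identifier string."""
    # Join with a separator that should not occur in the fields, so that
    # ("1.2.3.4", "ab", "c") and ("1.2.3.4", "a", "bc") hash differently.
    raw = "\x1f".join((ip, cookie, agent))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Combining the three fields is what lets two clients behind the same egress IP be told apart, which addresses the single-dimension problem raised in the background section. A cryptographic hash makes collisions negligibly unlikely in practice, although no hash is unique in the strict sense.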
Further, identifying the user IPs with abnormal behavior attributes among the online ticket-booking behavior attributes and storing those user IPs in a blacklist for isolation comprises:
identifying the frequency thresholds and the blacklist among the online ticket-booking behavior attributes, the frequency thresholds including but not limited to one or more of: the access frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent access frequency, or the frequency with which an IP+cookie+agent accesses different URLs;
identifying the user IPs with abnormal behavior by means of the frequency thresholds and the blacklist, and storing the identified user IPs in the blacklist for isolation.
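One way to read the two sub-steps above is as a comparison of each IP's measured frequencies against the stored thresholds, with the result merged into the blacklist. The sketch below assumes that reading; the function name `flag_abnormal_ips`, the metric names, and the rule that exceeding any single threshold is enough are hypothetical.

```python
from typing import Dict, Set


def flag_abnormal_ips(freq_by_ip: Dict[str, Dict[str, int]],
                      thresholds: Dict[str, int],
                      blacklist: Set[str]) -> Set[str]:
    """Return the blacklist extended with every IP that exceeds any threshold."""
    updated = set(blacklist)  # copy, so the caller's set is left unchanged
    for ip, metrics in freq_by_ip.items():
        if ip in updated:
            continue  # already isolated
        # A metric missing for this IP is treated as zero.
        if any(metrics.get(name, 0) > limit for name, limit in thresholds.items()):
            updated.add(ip)
    return updated
```

Returning a new set rather than mutating the input keeps the sketch easy to test; a production system would more likely write to a shared blacklist table or cache.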
Further, extracting the online ticket-booking behavior attributes from the historical online ticket-booking behavior records and the real-time online ticket-booking behavior records of the user identifier comprises:
extracting the historical frequency thresholds and the historical blacklist from the historical behavior records of the user identifier;
extracting, from the historical transaction behavior records of the user identifier, the threshold for potentially abnormal purchasing behavior and the blacklist of abnormal registered users who exceed that threshold;
collecting in real time the current user access frequency and access path from the current access behavior records of the user identifier.
Further, extracting the historical frequency thresholds from the historical behavior records of the user identifier comprises the following steps:
loading the log file contents of the historical behavior records into the big-data warehouse Hive, creating a log-file formatting data table in Hive, and formatting the log file contents into the data table;
calculating the access frequencies in the data table and storing the results in the big-data warehouse, the access frequencies including but not limited to one or more of: the access frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent access frequency, or the frequency with which an IP+cookie+agent accesses different URLs;
observing the frequency distribution using a histogram, determining the historical frequency thresholds by custom definition, and storing the historical frequency thresholds.
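The frequency calculation and histogram inspection described in these steps run in Hive in the text; the sketch below shows the same arithmetic in plain Python for small in-memory data. It makes two assumptions that should be read as such: "frequency of accessing different URLs" is interpreted as the number of distinct URLs visited, and a nearest-rank percentile stands in for the manually chosen ("custom") threshold. All function and key names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, str, str]  # (ip, cookie, agent, url)


def visit_frequencies(records: Iterable[Record]) -> Dict[str, Counter]:
    """Compute the four frequency measures named in the text."""
    ip_visits: Counter = Counter()
    uid_visits: Counter = Counter()
    ip_urls: Dict[str, Set[str]] = {}
    uid_urls: Dict[Tuple[str, str, str], Set[str]] = {}
    for ip, cookie, agent, url in records:
        uid = (ip, cookie, agent)
        ip_visits[ip] += 1
        uid_visits[uid] += 1
        ip_urls.setdefault(ip, set()).add(url)
        uid_urls.setdefault(uid, set()).add(url)
    return {
        "ip_visits": ip_visits,
        "ip_distinct_urls": Counter({k: len(v) for k, v in ip_urls.items()}),
        "uid_visits": uid_visits,
        "uid_distinct_urls": Counter({k: len(v) for k, v in uid_urls.items()}),
    }


def histogram(values: Iterable[int], bin_width: int) -> Dict[int, int]:
    """Map each bin's lower edge to the number of values falling in that bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


def percentile_threshold(values: Iterable[int], pct: int) -> int:
    """Nearest-rank percentile, a stand-in for a hand-picked cut-off."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values to derive a threshold from")
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]
```

In practice an analyst would look at the histogram of, say, `ip_visits` values, see where the long tail begins, and store that cut-off as the historical frequency threshold; the percentile function only automates one plausible version of that judgment.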
Further, extracting the historical blacklist from the historical behavior records of the user identifier comprises the following steps:
collecting, onto the distributed storage system HDFS, the previous day's user access log files from the different servers in the nginx proxy server cluster;
loading the log file contents into the big-data warehouse Hive, creating a log-file formatting data table in Hive, and formatting the log file contents into the data table;
calculating in Hive the access frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent access frequency, and the frequency with which an IP+cookie+agent accesses different URLs; storing the results in the big-data warehouse; observing the frequency distribution using a histogram and determining the historical frequency thresholds by custom definition;
identifying abnormal clients on the basis of the determined historical frequency thresholds and the frequency calculation results, and storing them in a blacklist table.
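The log-formatting step above turns raw nginx access log lines into table rows. A minimal sketch of that parsing is given below, assuming nginx's predefined "combined" log format with one extra quoted field appended for the cookie. The extra cookie field, the pattern name `LOG_PATTERN` and the function name `parse_line` are assumptions; the actual log format used by the method is not given in the text.

```python
import re
from typing import Dict, Optional

# combined format: ip - user [time] "request" status bytes "referer" "agent"
# followed here by an assumed optional quoted cookie field.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
    r'(?: "(?P<cookie>[^"]*)")?'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one access log line, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    row = m.groupdict()
    row["cookie"] = row["cookie"] or ""  # cookie field is optional
    return row
```

Each parsed row carries the IP, cookie, agent and URL needed for the four frequency measures, so rows like these could feed either a Hive table load or an in-memory counter.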
Further, extracting the threshold for potentially abnormal purchasing behavior from the historical transaction behavior records of the user identifier comprises the following steps:
importing the historical transaction behavior records into a data warehouse;
calculating, for each user IP, the number of tickets purchased per single show, the number of items purchased, and the average number of tickets purchased;
observing, using a histogram, the distributions of the number of tickets purchased per single show, the number of items purchased, and the average number of tickets purchased, determining the threshold for potentially abnormal purchasing behavior by analysis according to custom rules, and storing that threshold.
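The three per-IP purchase statistics named above can be sketched as follows. The record layout `(ip, show_id, tickets)`, the function name `purchase_stats`, and the choice to summarize "tickets per single show" by its maximum and to average over shows are all assumptions; the text does not define these quantities precisely.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Order = Tuple[str, str, int]  # (ip, show_id, tickets in this order)


def purchase_stats(orders: Iterable[Order]) -> Dict[str, Dict[str, float]]:
    """Aggregate ticket purchases per user IP."""
    per_show: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for ip, show_id, tickets in orders:
        per_show[ip][show_id] += tickets  # total tickets this IP bought per show
    stats: Dict[str, Dict[str, float]] = {}
    for ip, shows in per_show.items():
        total = sum(shows.values())
        stats[ip] = {
            "max_tickets_per_show": max(shows.values()),
            "shows_bought": len(shows),
            "avg_tickets_per_show": total / len(shows),
        }
    return stats
```

The distributions of these three values across all IPs are what the histogram step would then inspect to pick the purchasing-behavior threshold.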
Further, extracting, from the historical transaction behavior records of the user identifier, the blacklist of abnormal registered users who exceed the threshold for abnormal purchasing behavior comprises the following steps:
importing all of the previous day's transaction records and at least one year of transaction records into a data warehouse;
calculating, for each user IP within one year, the number of tickets purchased per single show, the number of items purchased, and the average number of tickets purchased; observing the distributions of these values using a histogram, and determining the threshold for potentially abnormal purchasing behavior by analysis according to custom rules;
identifying, on the basis of the determined threshold for potentially abnormal purchasing behavior and the frequency calculation results, the abnormal registered users who exceed the threshold, and storing them in a blacklist table.
Further, collecting in real time the current user access frequency and abnormal access path from the current access behavior records of the user identifier comprises:
reading the nginx access log file in real time and sending it to a log processing system;
the log processing system receiving in real time the logs sent by the log collection system and, with one second as a calculation window, calculating the IP access frequency, the frequency with which each IP accesses URLs, the IP+cookie+agent access frequency, the frequency with which an IP+cookie+agent accesses URLs, and the abnormal access path, and storing the results in a cache.
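The one-second calculation window can be sketched as a tumbling window keyed by the whole-second part of each event's timestamp. The event layout, the function name `per_second_counts`, and the use of an in-memory dictionary in place of the cache are assumptions; the sketch also leaves out the abnormal-access-path calculation, which the text does not describe in enough detail to illustrate.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[float, str, str, str, str]  # (unix_time, ip, cookie, agent, url)


def per_second_counts(events: Iterable[Event]) -> Dict[int, Dict[str, Counter]]:
    """Count accesses per one-second window for the four measures in the text."""
    windows: Dict[int, Dict[str, Counter]] = defaultdict(
        lambda: {"ip": Counter(), "ip_url": Counter(),
                 "uid": Counter(), "uid_url": Counter()}
    )
    for ts, ip, cookie, agent, url in events:
        w = windows[int(ts)]           # window key: the second the event falls in
        uid = (ip, cookie, agent)
        w["ip"][ip] += 1               # IP access frequency
        w["ip_url"][(ip, url)] += 1    # per-IP frequency of each URL
        w["uid"][uid] += 1             # IP+cookie+agent access frequency
        w["uid_url"][(uid, url)] += 1  # per-identifier frequency of each URL
    return dict(windows)
```

A real-time system would process events as a stream and expire old windows; batching them in a list as done here is only for clarity. The per-window counters are what would be compared against the stored frequency thresholds to intercept a client while it is still active.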
Further, the recognition threshold is calculated as follows:
where α is the data surge ratio, P1 is the number of registrations in the pre-identification time period, n denotes the single consecutive base period immediately preceding the pre-identification time period, n+m denotes the multiple consecutive base periods preceding the pre-identification time period, Pi is the number of registrations in a reference time period, Pmax is the maximum number of registrations over the multiple consecutive base periods preceding the pre-identification time period, and Pmin is the minimum number of registrations over those base periods.
In the embodiment of the present invention, after the number of registrations in a predetermined time period is detected to exceed the recognition threshold for the number of registrations in a reference time period, at least one highly concentrated registered-account cluster is obtained, marked after a density-based clustering algorithm scans all registration behaviors in the pre-identification time period; the user IP, Cookie and access agent environment (Agent) in the online ticket-booking behavior records of the marked cluster are hashed into a globally unique coded string IP+Cookie+Agent, forming a unique user identifier; the online ticket-booking behavior attributes are extracted from the historical and real-time online ticket-booking behavior records of the user identifier; and the user IPs with abnormal behavior attributes are identified and stored in a blacklist for isolation. Determining the various thresholds for identifying scalpers provides a data basis for judging scalpers. Recording users' behavior features (frequency and trajectory) in real time provides a real-time basis for intercepting scalpers as they act. A blacklist can be built and used to intercept scalpers in advance, making the allocation of resources more reasonable and fair.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
It should be pointed out that, as implementation requires, each step or component described in this application may be split into more steps or components, and two or more steps or components, or parts of their operations, may be combined into new steps or components to achieve the purpose of the present invention.
Above-mentioned the method according to the invention can be realized in hardware, firmware, or be implemented as being storable in recording medium Software or computer code in (such as CD ROM, RAM, floppy disk, hard disk or magneto-optic disk), or it is implemented through network download Original storage is in long-range recording medium or nonvolatile machine readable media and will be stored in the meter in local recording medium Calculation machine code, so as to method described here can be stored in using all-purpose computer, application specific processor or programmable or special With the such software processing in the recording medium of hardware (such as ASIC or FPGA).It is appreciated that computer, processor, micro- Processor controller or programmable hardware include can storing or receive software or computer code storage assembly (for example, RAM, ROM, flash memory etc.), when the software or computer code are by computer, processor or hardware access and when performing, realize here The processing method of description.Additionally, when all-purpose computer accesses the code of the process being shown in which for realization, the execution of code All-purpose computer is converted to into the special-purpose computer for performing the process being shown in which.
The above, the only specific embodiment of the present invention, but protection scope of the present invention is not limited thereto, any Those familiar with the art the invention discloses technical scope in, change or replacement can be readily occurred in, all should contain Cover within protection scope of the present invention.Therefore, protection scope of the present invention should be defined by the scope of the claims.

Claims (10)

1. A method for identifying abnormal batch ticket-booking behavior based on a DBSCAN model, characterized by comprising:
after detecting that the number of registrations in a predetermined time period exceeds a recognition threshold derived from the number of registrations in reference time periods, obtaining at least one highly concentrated cluster of registered accounts, marked by scanning all registration behaviors in the pre-identification time period with a density-based clustering algorithm;
hashing the user IP, Cookie, and access agent environment (Agents) in the online ticket-booking behavior records of the marked at least one highly concentrated cluster of registered accounts into a globally unique encoded string IP+Cookie+Agent, forming a unique user identifier;
extracting online ticket-booking behavior attributes from the historical online ticket-booking behavior records and the real-time online ticket-booking behavior records of the user identifier;
identifying the user IPs having abnormal behavior attributes among the online ticket-booking behavior attributes, and storing those user IPs in a blacklist for isolation.
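The density-based scan in claim 1 can be illustrated with a minimal pure-Python DBSCAN over registration timestamps. This is only a sketch under stated assumptions: the claim does not say which features are clustered, so clustering on a single timestamp axis, the function name `dbscan_1d`, and the parameters `eps` and `min_pts` are all hypothetical choices made for illustration.

```python
from typing import List, Optional

NOISE = -1  # label for points that belong to no dense cluster


def dbscan_1d(times: List[float], eps: float, min_pts: int) -> List[int]:
    """Label each registration timestamp with a cluster id, or NOISE.

    Assumption: registrations are clustered on time alone; a real system
    would likely use more features (IP, device, e-mail pattern, ...).
    """
    n = len(times)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        # All points within eps of point i (including i itself).
        return [j for j in range(n) if abs(times[i] - times[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb)
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


# Two bursts of registrations and one isolated registration (seconds).
registration_times = [0, 1, 2, 3, 100, 500, 501, 502, 503, 504]
labels = dbscan_1d(registration_times, eps=2, min_pts=3)
```

Accounts that share a non-noise label form one "highly concentrated" cluster in the sense of the claim; the isolated registration is left as noise.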
2. The method of claim 1, characterized in that hashing the user IP, Cookie, and access agent environment (Agents) in the online ticket-booking behavior records into a globally unique encoded string IP+Cookie+Agent to form a unique user identifier comprises:
hashing, by means of a hash function, the user IP, Cookie, and access agent environment (Agents) in the online ticket-booking behavior records into a globally unique encoded string IP+Cookie+Agent, forming a unique user identifier.
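As an illustration of the hashing step, the sketch below derives one identifier string from the three fields. The claim does not name a hash function, so SHA-256, the length-prefixed encoding, and the function name `user_fingerprint` are assumptions. Note also that a hash is unique only up to collisions, which are negligible in practice for SHA-256.

```python
import hashlib


def user_fingerprint(ip: str, cookie: str, agent: str) -> str:
    """Hash IP + Cookie + Agent into a single hexadecimal identifier."""
    # Length-prefix each field so that ("ab", "c") and ("a", "bc")
    # cannot produce the same concatenated payload.
    payload = "".join(f"{len(part)}:{part}" for part in (ip, cookie, agent))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


uid = user_fingerprint("203.0.113.7", "sid=abc123", "Mozilla/5.0")
```

The same three inputs always yield the same identifier, so records from the historical logs and from the real-time stream can be joined on it.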
3. The method of claim 1 or 2, characterized in that identifying the user IPs having abnormal behavior attributes among the online ticket-booking behavior attributes, and storing those user IPs in a blacklist for isolation, comprises:
identifying the frequency thresholds and the blacklist in the online ticket-booking behavior attributes, the frequency thresholds including but not limited to one or more of: the access frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent access frequency, or the frequency with which an IP+cookie+agent accesses different URLs;
identifying the user IPs exhibiting abnormal behavior by means of the frequency thresholds and the blacklist, and storing the identified user IPs in the blacklist for isolation.
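A minimal sketch of the threshold-and-blacklist check follows. The metric names, the dictionary layout, and the rule "flag an IP if any configured metric exceeds its threshold" are illustrative assumptions; the claim only says that thresholds and the blacklist are used together.

```python
from typing import Dict, Set


def flag_abnormal_ips(
    metrics: Dict[str, Dict[str, int]],
    thresholds: Dict[str, int],
    blacklist: Set[str],
) -> Set[str]:
    """Return the blacklist extended with every IP that exceeds a threshold."""
    flagged = set(blacklist)  # previously blacklisted IPs stay isolated
    for name, per_ip in metrics.items():
        limit = thresholds.get(name)
        if limit is None:
            continue  # no threshold configured for this metric
        flagged.update(ip for ip, value in per_ip.items() if value > limit)
    return flagged


metrics = {
    "ip_hits": {"10.0.0.1": 900, "10.0.0.2": 12},
    "ip_distinct_urls": {"10.0.0.2": 400, "10.0.0.3": 5},
}
thresholds = {"ip_hits": 500, "ip_distinct_urls": 300}
new_blacklist = flag_abnormal_ips(metrics, thresholds, {"10.0.0.9"})
```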
4. The method of any one of claims 1 to 3, characterized in that extracting the online ticket-booking behavior attributes from the historical online ticket-booking behavior records and the real-time online ticket-booking behavior records of the user identifier comprises:
extracting the historical frequency thresholds and the historical blacklist from the historical behavior records of the user identifier;
extracting, from the historical transaction behavior records of the user identifier, the potential abnormal purchase behavior threshold and the blacklist of abnormal registered users exceeding the abnormal purchase behavior threshold;
collecting in real time the current user's access frequency and path from the current access behavior records of the user identifier.
5. The method of claim 4, characterized in that extracting the historical frequency thresholds from the historical behavior records of the user identifier comprises the following steps:
loading the log file contents of the historical behavior records into the big-data warehouse Hive, creating a log-file-format data table in Hive, and formatting the log file contents into the data table;
calculating access frequencies in the data table and storing the results in the big-data warehouse, the access frequencies including but not limited to one or more of: the access frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent access frequency, or the frequency with which an IP+cookie+agent accesses different URLs;
using a histogram to observe the frequency distribution, determining the historical frequency thresholds in a user-defined manner, and storing the historical frequency thresholds.
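The histogram step can be sketched as follows. The claim leaves the threshold to manual, user-defined judgment after inspecting the distribution; the nearest-rank percentile used here is only one assumed way to propose a candidate value, and the function names and bin width are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def histogram(values: List[int], bin_width: int) -> Dict[int, int]:
    """Count values per fixed-width bin, keyed by the bin's lower edge."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


def percentile_threshold(values: List[int], pct: int) -> int:
    """Nearest-rank percentile (pct in 1..100) as a candidate threshold."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


# Hypothetical per-IP daily access counts.
daily_hits = [1, 2, 2, 3, 3, 3, 4, 50, 120, 300]
bins = histogram(daily_hits, bin_width=10)
candidate = percentile_threshold(daily_hits, pct=90)
```

Printing `bins` shows most IPs in the lowest bin and a sparse tail, which is the shape an analyst would look for before fixing the threshold.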
6. The method of claim 4, characterized in that extracting the historical blacklist from the historical behavior records of the user identifier comprises the following steps:
gathering the previous day's user access log files from the different servers in the nginx proxy server cluster into the distributed storage system HDFS;
loading the log file contents into the big-data warehouse Hive, creating a log-file-format data table in Hive, and formatting the log file contents into the data table;
calculating in Hive the access frequency of different IPs, the frequency with which each IP accesses different URLs, the IP+cookie+agent access frequency, and the frequency with which an IP+cookie+agent accesses different URLs; storing the results in the big-data warehouse; and using a histogram to observe the frequency distribution and determining the historical frequency thresholds in a user-defined manner;
identifying abnormal clients based on the determined historical frequency thresholds and the frequency calculation results, and storing them in a blacklist table.
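The four frequencies named in this claim are computed in Hive in the patent; the sketch below shows the same aggregation in plain Python for illustration only. It assumes a hypothetical tab-separated log line of the form `ip<TAB>cookie<TAB>agent<TAB>url` (the default nginx log format does not look like this), and it interprets "frequency of accessing different URLs" as the number of distinct URLs, which is one possible reading.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def access_frequencies(lines: Iterable[str]) -> Dict[str, dict]:
    """Aggregate hits and distinct URLs per IP and per (ip, cookie, agent)."""
    ip_hits: Counter = Counter()
    key_hits: Counter = Counter()
    ip_urls = defaultdict(set)
    key_urls = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed lines
        ip, cookie, agent, url = fields
        key = (ip, cookie, agent)
        ip_hits[ip] += 1
        key_hits[key] += 1
        ip_urls[ip].add(url)
        key_urls[key].add(url)
    return {
        "ip_hits": dict(ip_hits),
        "ip_distinct_urls": {ip: len(urls) for ip, urls in ip_urls.items()},
        "key_hits": dict(key_hits),
        "key_distinct_urls": {k: len(urls) for k, urls in key_urls.items()},
    }


sample_log = [
    "10.0.0.1\tc1\tUA-1\t/show/1",
    "10.0.0.1\tc1\tUA-1\t/show/2",
    "10.0.0.1\tc2\tUA-1\t/show/1",
    "10.0.0.2\tc3\tUA-2\t/show/1",
    "broken line",
]
freq = access_frequencies(sample_log)
```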
7. The method of claim 4, characterized in that extracting the potential abnormal purchase behavior threshold from the historical transaction behavior records of the user identifier comprises the following steps:
importing the historical transaction behavior records into a data warehouse;
calculating, for each user IP, the number of tickets booked per single session, the number of items purchased, and the average number of tickets booked;
using a histogram to observe the distributions of the number of tickets booked per single session, the number of items purchased, and the average number of tickets booked, determining the potential abnormal purchase behavior threshold by analysis according to user-defined rules, and storing the potential abnormal purchase behavior threshold.
8. The method of claim 4, characterized in that extracting, from the historical transaction behavior records of the user identifier, the blacklist of abnormal registered users exceeding the abnormal purchase behavior threshold comprises the following steps:
importing all transaction records of the previous day and the transaction records of at least one year into a data warehouse;
calculating, for each user IP within one year, the number of tickets booked per single session, the number of items purchased, and the average number of tickets booked; using a histogram to observe the distributions of these three quantities; and determining the potential abnormal purchase behavior threshold by analysis according to user-defined rules;
identifying, based on the determined potential abnormal purchase behavior threshold and the frequency calculation results, the abnormal registered users exceeding the abnormal purchase behavior threshold, and storing them in a blacklist table.
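The per-IP purchase statistics and the subsequent flagging can be sketched as below. The record layout `(ip, session_id, item_id, tickets)`, the choice of the largest per-session total as the "single-session" figure, the average taken per order, and the limit values are all assumptions made for illustration; the claims do not define these details.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Order = Tuple[str, str, str, int]  # (ip, session_id, item_id, tickets)


def purchase_stats(orders: Iterable[Order]) -> Dict[str, Dict[str, float]]:
    """Per IP: max tickets in one session, distinct items, mean per order."""
    per_session = defaultdict(lambda: defaultdict(int))
    items = defaultdict(set)
    totals = defaultdict(int)
    counts = defaultdict(int)
    for ip, session_id, item_id, tickets in orders:
        per_session[ip][session_id] += tickets
        items[ip].add(item_id)
        totals[ip] += tickets
        counts[ip] += 1
    return {
        ip: {
            "max_per_session": max(per_session[ip].values()),
            "items": len(items[ip]),
            "avg_per_order": totals[ip] / counts[ip],
        }
        for ip in totals
    }


def exceeds(stats: Dict[str, Dict[str, float]], limits: Dict[str, float]) -> Set[str]:
    """IPs whose statistics exceed any of the given limits."""
    return {
        ip
        for ip, s in stats.items()
        if any(s[name] > limit for name, limit in limits.items())
    }


orders = [
    ("1.1.1.1", "s1", "A", 2),
    ("1.1.1.1", "s1", "A", 4),
    ("1.1.1.1", "s2", "B", 1),
    ("2.2.2.2", "s1", "A", 1),
]
stats = purchase_stats(orders)
abnormal = exceeds(stats, {"max_per_session": 4})
```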
9. The method of claim 4, characterized in that collecting in real time the current user's access frequency and abnormal access paths from the current access behavior records of the user identifier comprises:
reading the nginx access log file in real time and sending it to a log processing system;
the log processing system receiving in real time the logs sent by the log collection system and, using one second as a calculation window, calculating the IP access frequency, the frequency with which each IP accesses URLs, the IP+cookie+agent access frequency, the frequency with which an IP+cookie+agent accesses URLs, and abnormal access paths, and storing the results in a cache.
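A one-second calculation window can be sketched as a tumbling-window counter. The class name, the in-memory dictionary standing in for the cache, and the key layout are hypothetical; detection of abnormal access paths is not shown because the claim does not describe how a path is judged abnormal.

```python
import math
from collections import Counter
from typing import Dict, List


class SecondWindowCounter:
    """Counts accesses per one-second tumbling window."""

    def __init__(self) -> None:
        # window start (whole seconds) -> Counter of access keys;
        # this dict stands in for the cache mentioned in the claim.
        self.cache: Dict[int, Counter] = {}

    def add(self, ts: float, ip: str, cookie: str, agent: str, url: str) -> None:
        window = math.floor(ts)
        counts = self.cache.setdefault(window, Counter())
        counts[("ip", ip)] += 1
        counts[("ip_url", ip, url)] += 1
        counts[("key", ip, cookie, agent)] += 1
        counts[("key_url", ip, cookie, agent, url)] += 1

    def ips_over(self, window: int, limit: int) -> List[str]:
        """IPs whose hit count in the given window exceeds limit."""
        counts = self.cache.get(window, Counter())
        return sorted(k[1] for k, n in counts.items() if k[0] == "ip" and n > limit)


counter = SecondWindowCounter()
for ts in (10.1, 10.5, 10.9, 11.0):
    counter.add(ts, "10.0.0.1", "c1", "UA-1", "/buy")
counter.add(10.2, "10.0.0.2", "c2", "UA-2", "/buy")
```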
10. The method of claim 1, characterized in that the recognition threshold is calculated as:
$$P_1 > \alpha \left( \sum_{i=n}^{n+m} P_i - P_{\max} - P_{\min} \right)$$
where $\alpha$ is the data surge ratio, $P_1$ is the number of registrations in the pre-identification time period, $n$ is one consecutive base before the pre-identification time period, $n+m$ is the multiple consecutive bases before the pre-identification time period, $P_i$ is the number of registrations in a reference time period, $P_{\max}$ is the maximum number of registrations among the multiple consecutive bases before the pre-identification time period, and $P_{\min}$ is the minimum number of registrations among the multiple consecutive bases before the pre-identification time period.
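The inequality in claim 10 can be evaluated directly, as in the sketch below. It follows the formula exactly as printed in the claim (a sum with the maximum and minimum removed, with no averaging); the function name, the sample counts, and the requirement of at least three reference periods are assumptions added for illustration.

```python
from typing import Sequence


def registration_surge(p1: float, reference_counts: Sequence[float], alpha: float) -> bool:
    """True when p1 exceeds alpha * (sum(P_i) - P_max - P_min)."""
    if len(reference_counts) < 3:
        # Assumption: with fewer than three periods nothing remains
        # after dropping the maximum and the minimum.
        raise ValueError("need at least three reference periods")
    baseline = sum(reference_counts) - max(reference_counts) - min(reference_counts)
    return p1 > alpha * baseline


# Hypothetical registration counts for five reference periods.
reference = [100, 120, 80, 110, 90]   # sum 500, max 120, min 80 -> baseline 300
surge = registration_surge(500, reference, alpha=1.5)    # 500 > 450
no_surge = registration_surge(400, reference, alpha=1.5) # 400 > 450 is false
```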
CN201611019839.5A 2016-11-18 2016-11-18 Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model Pending CN106657007A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611019839.5A CN106657007A (en) 2016-11-18 2016-11-18 Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611019839.5A CN106657007A (en) 2016-11-18 2016-11-18 Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model

Publications (1)

Publication Number Publication Date
CN106657007A true CN106657007A (en) 2017-05-10

Family

ID=58808057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611019839.5A Pending CN106657007A (en) 2016-11-18 2016-11-18 Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model

Country Status (1)

Country Link
CN (1) CN106657007A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI638319B (en) * 2017-08-25 2018-10-11 拓元股份有限公司 Internet ticketing system
CN108900478A (en) * 2018-06-11 2018-11-27 阿里巴巴集团控股有限公司 The detection method and device of unusual fluctuation attack, safety protection equipment
CN109685536A (en) * 2017-10-18 2019-04-26 北京京东尚科信息技术有限公司 Method and apparatus for output information
CN109949069A (en) * 2019-01-28 2019-06-28 平安科技(深圳)有限公司 Suspicious user screening technique, device, computer equipment and storage medium
CN110322573A (en) * 2018-03-30 2019-10-11 北京红马传媒文化发展有限公司 User authentication method, user authentication device and electronic equipment
CN110322028A (en) * 2018-03-29 2019-10-11 北京红马传媒文化发展有限公司 Method for managing resource, device and electronic equipment
CN110675228A (en) * 2019-09-27 2020-01-10 支付宝(杭州)信息技术有限公司 User ticket buying behavior detection method and device
CN111723655A (en) * 2020-05-12 2020-09-29 五八有限公司 Face image processing method, device, server, terminal, equipment and medium
CN111860644A (en) * 2020-07-20 2020-10-30 北京百度网讯科技有限公司 Abnormal account identification method, device, equipment and storage medium
CN111899856A (en) * 2020-07-25 2020-11-06 广州海鹚网络科技有限公司 Risk control method, device, equipment and storage medium for hospital registration
CN111984634A (en) * 2019-05-22 2020-11-24 ***通信集团山西有限公司 Alarm transaction extraction method, device, equipment and computer storage medium
CN112364347A (en) * 2020-11-19 2021-02-12 全知科技(杭州)有限责任公司 High-performance computing method for identifying high-frequency data access and operation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105791255A (en) * 2014-12-23 2016-07-20 阿里巴巴集团控股有限公司 Method and system for identifying computer risks based on account clustering
CN105808988A (en) * 2014-12-31 2016-07-27 阿里巴巴集团控股有限公司 Method and device for identifying exceptional account
CN105956911A (en) * 2016-05-23 2016-09-21 北京小米移动软件有限公司 Purchase request processing method and device

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI638319B (en) * 2017-08-25 2018-10-11 拓元股份有限公司 Internet ticketing system
CN109685536A (en) * 2017-10-18 2019-04-26 北京京东尚科信息技术有限公司 Method and apparatus for output information
CN110322028A (en) * 2018-03-29 2019-10-11 北京红马传媒文化发展有限公司 Method for managing resource, device and electronic equipment
CN110322573A (en) * 2018-03-30 2019-10-11 北京红马传媒文化发展有限公司 User authentication method, user authentication device and electronic equipment
CN108900478A (en) * 2018-06-11 2018-11-27 阿里巴巴集团控股有限公司 The detection method and device of unusual fluctuation attack, safety protection equipment
CN109949069A (en) * 2019-01-28 2019-06-28 平安科技(深圳)有限公司 Suspicious user screening technique, device, computer equipment and storage medium
WO2020155508A1 (en) * 2019-01-28 2020-08-06 平安科技(深圳)有限公司 Suspicious user screening method and apparatus, computer device and storage medium
CN111984634A (en) * 2019-05-22 2020-11-24 ***通信集团山西有限公司 Alarm transaction extraction method, device, equipment and computer storage medium
CN111984634B (en) * 2019-05-22 2023-07-21 ***通信集团山西有限公司 Alarm transaction extraction method, device, equipment and computer storage medium
CN110675228A (en) * 2019-09-27 2020-01-10 支付宝(杭州)信息技术有限公司 User ticket buying behavior detection method and device
CN111723655A (en) * 2020-05-12 2020-09-29 五八有限公司 Face image processing method, device, server, terminal, equipment and medium
CN111723655B (en) * 2020-05-12 2024-03-08 五八有限公司 Face image processing method, device, server, terminal, equipment and medium
CN111860644A (en) * 2020-07-20 2020-10-30 北京百度网讯科技有限公司 Abnormal account identification method, device, equipment and storage medium
CN111899856A (en) * 2020-07-25 2020-11-06 广州海鹚网络科技有限公司 Risk control method, device, equipment and storage medium for hospital registration
CN112364347A (en) * 2020-11-19 2021-02-12 全知科技(杭州)有限责任公司 High-performance computing method for identifying high-frequency data access and operation

Similar Documents

Publication Publication Date Title
CN106657007A (en) Method for recognizing abnormal batch ticket booking behavior based on DBSCAN model
CN106453357A (en) Network ticket buying abnormal behavior recognition method and system and equipment
CN103795612B (en) Rubbish and illegal information detecting method in instant messaging
CN103793484B (en) The fraud identifying system based on machine learning in classification information website
CN108809745A (en) A kind of user's anomaly detection method, apparatus and system
CN108170580A (en) A kind of rule-based log alarming method, apparatus and system
CN102629904A (en) Detection and determination method of network navy
CN110648172B (en) Identity recognition method and system integrating multiple mobile devices
WO2022247955A1 (en) Abnormal account identification method, apparatus and device, and storage medium
CN108647730A (en) A kind of data partition method and system based on historical behavior co-occurrence
CN107977855B (en) Method and device for managing user information
CN113902534A (en) Interactive risk group identification method based on stock community relation map
Liu et al. SDHM: A hybrid model for spammer detection in Weibo
CN114140248A (en) AI artificial intelligence technology-based abnormal transaction identification method
CN110457601B (en) Social account identification method and device, storage medium and electronic device
CN114187036A (en) Internet advertisement intelligent recommendation management system based on behavior characteristic recognition
CN109919667A (en) A kind of method and apparatus of the IP of enterprise for identification
CN117875501A (en) Social media user behavior prediction system and method based on big data
CN115378619A (en) Sensitive data access method, electronic equipment and computer readable storage medium
CN112199388A (en) Strange call identification method and device, electronic equipment and storage medium
CN116402546A (en) Store risk attribution method and device, equipment, medium and product thereof
CN107705135A (en) A kind of method that potential commercial value is evaluated based on company's storage contact data
Amat Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management
CN114385899A (en) User group accurate identification system and method based on big data analysis
CN113743838A (en) Target user identification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170510