CN109600344B - Method and device for identifying risk group and electronic equipment - Google Patents

Publication number
CN109600344B
Authority
CN
China
Prior art keywords
communication
group
number combination
combination
numbers
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710937630.5A
Other languages
Chinese (zh)
Other versions
CN109600344A (en)
Inventor
刘站奇
李健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710937630.5A
Publication of CN109600344A
Application granted
Publication of CN109600344B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/50: Network services
    • H04L67/535: Tracking the activity of the user
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W12/00: Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12: Detection or prevention of fraud

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Computer Security & Cryptography (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiments of the invention disclose a method and a device for identifying a risk group, and electronic equipment. The method comprises the following steps: acquiring historical behavior data corresponding to communication numbers; adding at least two communication numbers that exhibit similar network behavior at least once in the historical behavior data to a number combination; calculating the association weight corresponding to each number combination; clustering all communication numbers in the number combinations according to the association weights respectively corresponding to the number combinations, to obtain at least one group; and identifying risk groups based on the number of risk numbers in each group. In the embodiments of the invention, groups are determined by jointly considering data of multiple dimensions, such as the IP address, the request time, and number characteristics, and risk groups are then identified according to the number of risk numbers in each group, so that both group determination and risk group identification are more accurate.

Description

Method and device for identifying risk group and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of data analysis, in particular to a method and a device for identifying risk groups and electronic equipment.
Background
At present, internet platforms attract potential users through marketing campaigns, and some lawbreakers take part in these campaigns by registering member accounts in bulk. Such participants may be referred to as risk groups, also known as "wool parties". To avoid wasting the resources needed to launch a marketing campaign, it is often necessary to identify risk groups.
In the related art, a risk group is identified by counting the number of registered members along a single dimension (such as the same IP (Internet Protocol) address or the same time period). For example, if the number of members registered under the same IP address exceeds a first threshold, the members registered under that IP address are determined to belong to the same risk group. As another example, if the number of members registered in the same time period exceeds a second threshold, the members registered in that time period are determined to belong to the same risk group.
The method provided by the related art identifies risk groups with low accuracy.
Disclosure of Invention
The embodiments of the invention provide a method and a device for identifying a risk group, and electronic equipment, to address the low accuracy of risk group identification in the related art. The technical solution is as follows:
In a first aspect, there is provided a method of identifying a risk group, the method comprising:
obtaining historical behavior data corresponding to communication numbers, wherein the historical behavior data comprises a plurality of historical behavior records, and each historical behavior record comprises: an Internet Protocol (IP) address used by a communication number when performing a network behavior, and a request moment at which the communication number performs the network behavior;
adding at least two communication numbers that exhibit similar network behavior at least once in the historical behavior data to a number combination, wherein similar network behaviors are network behaviors that use the same IP address and whose request moments fall within the same time period;
calculating the association weight corresponding to each number combination, wherein the association weight corresponding to a number combination characterizes the degree of association between the communication numbers in the number combination;
clustering all communication numbers in the number combinations according to the association weights respectively corresponding to the number combinations, to obtain at least one group;
identifying a risk group based on the number of risk numbers in the group.
Optionally, the respectively calculating the association weight corresponding to each number combination includes:
respectively calculating characteristic values corresponding to the number combinations, wherein the characteristic values comprise at least one of a first characteristic value, a second characteristic value and a third characteristic value; the first characteristic value corresponding to the number combination is used for representing the times of the similar network behaviors of the communication numbers in the number combination, the second characteristic value corresponding to the number combination is used for representing the similarity between the communication numbers in the number combination, and the third characteristic value corresponding to the number combination is used for representing the type information of the IP address used by the communication numbers in the number combination when the similar network behaviors are executed each time;
and determining the associated weight corresponding to each number combination according to the characteristic value corresponding to each number combination.
Optionally, when the feature value includes the second feature value, the calculating the feature value corresponding to each combination of numbers respectively includes:
for each number combination, acquiring number characteristic values respectively corresponding to the communication numbers in the number combination, wherein the number characteristic values comprise at least one of a call characteristic value, a binding characteristic value, and an activity characteristic value; the call characteristic value is obtained by quantifying the call behavior corresponding to the communication number, the binding characteristic value is obtained by quantifying the binding behavior corresponding to the communication number, and the activity characteristic value is obtained by quantifying the activity of the application program bound to the communication number;
and calculating a second characteristic value corresponding to the number combination according to the number characteristic values respectively corresponding to the communication numbers in the number combination.
Optionally, when the feature value includes the third feature value, the calculating the feature value corresponding to each combination of numbers respectively includes:
for each number combination, acquiring the type of the IP address used by the communication numbers in the number combination each time the similar network behavior is performed;
and determining the third characteristic value corresponding to the number combination according to the number of times IP addresses of a specified type are used.
Optionally, when the number combination includes two communication numbers, clustering all the communication numbers in each number combination according to the association weights respectively corresponding to each number combination to obtain at least one group, including:
constructing a group characteristic graph, wherein one node in the group characteristic graph represents one communication number included in each number combination, and a connecting line between two connected nodes in the group characteristic graph represents the association weight corresponding to the number combination formed by the communication numbers respectively corresponding to the two nodes;
adding different labels to each node in the group characteristic graph;
executing at least one round of updating process on the label of each node in the group characteristic graph, and updating the label of each node according to the labels of other nodes connected with the node in each round of updating process;
and when the execution of the at least one round of updating process is completed, adding the communication numbers corresponding to the nodes with the same label in the group feature map to the same group.
Optionally, when the number of communication numbers in a number combination is greater than 2, the clustering all the communication numbers in each number combination according to the association weight respectively corresponding to each number combination to obtain at least one group includes:
when the association weight corresponding to one number combination is greater than a first threshold, adding the communication numbers in that number combination to the same group;
and/or,
when the association weights respectively corresponding to a plurality of number combinations are each greater than a second threshold, and the number of communication numbers shared by any two of the plurality of number combinations is greater than a third threshold, adding the communication numbers in the plurality of number combinations to the same group.
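The two threshold rules above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function name `cluster_by_thresholds`, the union-find bookkeeping, and the weights used in the usage note are all hypothetical. The second rule is simplified to a pairwise check, so two combinations are merged whenever both exceed the second threshold and share more than the third threshold of numbers, and merges then chain transitively.

```python
from itertools import combinations


def cluster_by_thresholds(weighted_combos, first, second, third):
    """Group numbers using the two threshold rules (illustrative sketch).

    weighted_combos maps a tuple of communication numbers to its
    association weight. Numbers that satisfy neither rule stay ungrouped.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union_all(numbers):
        root = find(numbers[0])
        for other in numbers[1:]:
            other_root = find(other)
            if other_root != root:
                parent[other_root] = root

    # Rule 1: one combination whose weight exceeds the first threshold.
    for combo, weight in weighted_combos.items():
        if weight > first:
            union_all(combo)

    # Rule 2 (pairwise simplification, an assumption of this sketch):
    # both weights exceed the second threshold and the two combinations
    # share more than `third` communication numbers.
    strong = [c for c, w in weighted_combos.items() if w > second]
    for a, b in combinations(strong, 2):
        if len(set(a) & set(b)) > third:
            union_all(a + b)

    groups = {}
    for number in parent:
        groups.setdefault(find(number), set()).add(number)
    return sorted(sorted(g) for g in groups.values())
```

For instance, with the hypothetical weights 0.9 for (13600000001, 13600000002, 13600000003) and 0.7 for (13600000001, 13600000002, 13600000004), a first threshold of 0.8, a second threshold of 0.5, and a third threshold of 1, the two combinations share two numbers and all four numbers end up in one group.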
Optionally, after determining whether the group is a risk group according to the number of risk numbers in the group, the method further includes:
and adding the communication numbers in the risk group that are not yet recorded in the blacklist to the blacklist.
In a second aspect, there is provided an apparatus for identifying a risk group, the apparatus comprising:
a data obtaining module, configured to obtain historical behavior data corresponding to a communication number, where the historical behavior data includes a plurality of historical behavior records, and each historical behavior record includes: an Internet Protocol (IP) address used by the communication number when the network behavior is executed and a request moment when the communication number executes the network behavior;
the combination extraction module is used for adding at least two communication numbers that exhibit similar network behavior at least once in the historical behavior data to a number combination, wherein similar network behaviors are network behaviors that use the same IP address and whose request moments fall within the same time period;
the weight calculation module is used for calculating the association weight corresponding to each number combination respectively, wherein the association weight corresponding to the number combination is used for representing the association degree between the communication numbers in the number combination;
the clustering module is used for clustering all the communication numbers in each number combination according to the associated weight corresponding to each number combination to obtain at least one group;
and the group determination module is used for identifying the risk group according to the number of the risk numbers in the group.
Optionally, the weight calculating module includes:
a first calculating unit, configured to calculate feature values corresponding to the number combinations, where the feature values include at least one of a first feature value, a second feature value, and a third feature value; the first characteristic value corresponding to the number combination is used for representing the times of the similar network behaviors of the communication numbers in the number combination, the second characteristic value corresponding to the number combination is used for representing the similarity between the communication numbers in the number combination, and the third characteristic value corresponding to the number combination is used for representing the type information of the IP address used by the communication numbers in the number combination when the similar network behaviors are executed each time;
and the second calculating unit is used for determining the association weight corresponding to each number combination according to the characteristic value corresponding to each number combination.
Optionally, when the feature value includes the second feature value, the first calculating unit is configured to:
for each number combination, acquiring number characteristic values respectively corresponding to the communication numbers in the number combination, wherein the number characteristic values comprise at least one of a call characteristic value, a binding characteristic value, and an activity characteristic value; the call characteristic value is obtained by quantifying the call behavior corresponding to the communication number, the binding characteristic value is obtained by quantifying the binding behavior corresponding to the communication number, and the activity characteristic value is obtained by quantifying the activity of the application program bound to the communication number;
and calculating a second characteristic value corresponding to the number combination according to the number characteristic values respectively corresponding to the communication numbers in the number combination.
Optionally, when the feature value includes the third feature value, the first calculating unit is configured to:
for each number combination, acquiring the type of the IP address used by the communication numbers in the number combination each time the similar network behavior is performed;
and determining the third characteristic value corresponding to the number combination according to the number of times IP addresses of a specified type are used.
Optionally, when the combination of numbers includes two communication numbers, the clustering module includes:
a feature graph constructing unit, configured to construct a group feature graph, where a node in the group feature graph represents one communication number included in each number combination, and a connection line between two nodes connected in the group feature graph represents an association weight corresponding to a number combination composed of communication numbers respectively corresponding to the two nodes;
the label adding unit is used for adding different labels to each node in the group characteristic graph;
the label updating unit is used for executing at least one round of updating process on the label of each node in the group characteristic diagram, and in each round of updating process, the label of each node in the group characteristic diagram is updated according to the labels of other nodes connected with the node;
and the first clustering unit is used for adding the communication numbers corresponding to the nodes with the same label in the group characteristic diagram to the same group when the execution of the at least one round of updating process is completed.
Optionally, when the number of communication numbers in a number combination is greater than 2, the clustering module includes:
the second clustering unit is used for adding the communication numbers in one number combination to the same group when the association weight corresponding to that number combination is greater than a first threshold;
and/or,
the third clustering unit is used for adding the communication numbers in a plurality of number combinations to the same group when the association weights respectively corresponding to the plurality of number combinations are each greater than a second threshold and the number of communication numbers shared by any two of the plurality of number combinations is greater than a third threshold.
Optionally, the apparatus further comprises:
and the number adding module is used for adding the communication numbers in the risk group that are not yet recorded in the blacklist to the blacklist.
In a third aspect, there is provided an electronic device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the method of identifying a risk group according to the first aspect.
In a fourth aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the method of identifying a risk group as described in the first aspect.
In a fifth aspect, there is provided a computer program product for performing the method of identifying a risk group of the first aspect when the computer program product is executed.
The technical scheme provided by the embodiment of the invention can bring the following beneficial effects:
The association weights between communication numbers that perform network behaviors in the same time period using the same IP address are calculated, and clustering is performed according to the association weights to determine the groups.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below illustrate only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow diagram of a method of identifying a risk group provided by one embodiment of the invention;
FIG. 2 is a schematic view relating to the embodiment shown in FIG. 1;
FIG. 3 is a flow chart of a method of identifying a risk group provided by another embodiment of the present invention;
FIG. 4 is a schematic diagram of a group characteristic graph provided by one embodiment of the present invention;
FIG. 5 is a schematic view relating to the embodiment shown in FIG. 3;
FIG. 6 is a block diagram of an apparatus for identifying risk groups provided by one embodiment of the present invention;
FIG. 7 is a block diagram of an electronic device provided by an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The embodiments of the invention provide a method, a device, and electronic equipment for identifying risk groups. Groups are determined by calculating the association weights between communication numbers that perform network behaviors in the same time period using the same IP address, and clustering according to those association weights. The groups are thus determined by jointly considering data of multiple dimensions, such as the IP address, the request time, and number characteristics, and risk groups are then identified according to the number of risk numbers in each group.
In the method provided by the embodiment of the invention, the execution main body of each step can be an electronic device with data analysis and processing capabilities. Optionally, the electronic device is a server. The server may be one server, a server cluster composed of a plurality of servers, or a cloud computing service center.
Referring to FIG. 1, a flow diagram of a method for identifying risk groups is shown according to one embodiment of the present invention. The method may comprise the steps of:
Step 101, obtaining historical behavior data corresponding to the communication number.
The historical behavior data includes a plurality of historical behavior records. The historical behavior data may be collected by an electronic device that establishes a communication connection with the mobile terminal. For example, if it is required to identify whether a risk group or a risk number exists in all communication numbers corresponding to the designated application, the electronic device may be a background server corresponding to the designated application. Optionally, the historical behavior data is behavior data within a preset time period, and the preset time period may be set according to actual requirements, for example, the historical behavior data is behavior data within the last 7 days.
Each historical behavior record includes: an IP address used by the communication number when performing the network behavior, and a request time at which the communication number performs the network behavior. The network behavior is performed by the mobile terminal corresponding to the communication number. Optionally, the network behavior comprises at least one of a registration behavior and a transaction behavior. The registration behavior refers to registering an account of the application program with a communication number, and the transaction behavior refers to completing a transaction through the registered account, for example, claiming a coupon or a red packet.
In the embodiments of the invention, a communication number is an identification number allocated by an operator to identify a user and a mobile terminal. Usually, the communication number is a mobile phone number. In other possible examples, the communication number is a number in an instant messaging application.
The historical behavior data may be as shown in TABLE-1 below.
TABLE-1
Time          IP address     Mobile phone number
2017-06-01    10.10.10.10    13600000001
2017-06-01    10.10.10.10    13600000002
2017-06-01    10.10.10.10    13600000003
2017-06-02    11.11.11.11    13600000001
2017-06-02    11.11.11.11    13600000002
2017-06-02    11.11.11.11    13600000004
Step 102, at least two communication numbers that exhibit similar network behavior at least once in the historical behavior data are added to one number combination.
Similar network behaviors are network behaviors that use the same IP address and whose request times fall within the same time period.
The electronic device detects communication numbers with similar network behaviors in the historical behavior data, and then extracts number combinations from them. A number combination may include all or only some of the communication numbers that share a similar network behavior; the embodiments of the invention do not limit the number of communication numbers in a number combination.
Taking TABLE-1 above as an example, when a number combination includes all the communication numbers that share a similar network behavior, the number combinations extracted by the electronic device include (13600000001, 13600000002, 13600000003) and (13600000001, 13600000002, 13600000004). When a number combination includes only some of those communication numbers, the number combinations extracted by the electronic device include (13600000001, 13600000002), (13600000001, 13600000003), (13600000002, 13600000003), and (13600000001, 13600000004).
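The extraction of number combinations described in step 102 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the function names are hypothetical, and the sketch assumes that the "same time period" is the calendar date shown in the Time column of TABLE-1.

```python
from collections import defaultdict
from itertools import combinations

# Records from TABLE-1: (time period, IP address, communication number).
RECORDS = [
    ("2017-06-01", "10.10.10.10", "13600000001"),
    ("2017-06-01", "10.10.10.10", "13600000002"),
    ("2017-06-01", "10.10.10.10", "13600000003"),
    ("2017-06-02", "11.11.11.11", "13600000001"),
    ("2017-06-02", "11.11.11.11", "13600000002"),
    ("2017-06-02", "11.11.11.11", "13600000004"),
]


def bucket_by_period_and_ip(records):
    """Collect the numbers that used the same IP address in the same period."""
    buckets = defaultdict(set)
    for period, ip, number in records:
        buckets[(period, ip)].add(number)
    return buckets


def full_combinations(records):
    """One combination per bucket, holding all numbers in that bucket."""
    return sorted(
        tuple(sorted(numbers))
        for numbers in bucket_by_period_and_ip(records).values()
        if len(numbers) >= 2
    )


def pair_combinations(records):
    """Every distinct pair of numbers that shares at least one bucket."""
    pairs = set()
    for numbers in bucket_by_period_and_ip(records).values():
        pairs.update(combinations(sorted(numbers), 2))
    return sorted(pairs)
```

Calling `full_combinations(RECORDS)` yields the two three-number combinations listed above. `pair_combinations(RECORDS)` enumerates pairs exhaustively, so in addition to the four pairs listed above it also yields (13600000002, 13600000004), since those two numbers share the IP address 11.11.11.11 on 2017-06-02.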
Step 103, respectively calculating the association weight corresponding to each number combination.
The association weight corresponding to the number combination is used for characterizing the association degree between the communication numbers in the number combination. The association weight corresponding to the number combination and the association degree between the communication numbers in the number combination are in positive correlation. That is, the greater the association weight corresponding to the number combination, the stronger the association degree between the communication numbers in the number combination; the smaller the association weight corresponding to a number combination is, the weaker the association degree between the communication numbers in the number combination is. The specific process of calculating the association weight corresponding to each combination of numbers will be described in the following embodiments.
Step 104, clustering all the communication numbers in the number combinations according to the association weights respectively corresponding to the number combinations, to obtain at least one group.
Clustering is the process of dividing a collection of physical or abstract objects into classes composed of similar objects. In the embodiments of the present invention, clustering is the process of dividing all the communication numbers included in the number combinations into a plurality of groups, where each group includes a plurality of communication numbers with a high degree of association between them. The algorithm used to cluster the communication numbers included in the number combinations may be a Label Propagation Algorithm (LPA), an improved label propagation algorithm (SLPA), or the HANP algorithm, which is not limited in the embodiments of the present invention.
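As an illustration of how label propagation can cluster communication numbers from pairwise association weights, the following Python sketch may help. It is a simplified, deterministic variant written for this example and is not necessarily the procedure used in the embodiments: nodes are visited in sorted order, labels are updated in place, ties are broken by the smallest label, and the function names and the `max_rounds` parameter are assumptions.

```python
from collections import defaultdict


def label_propagation(weighted_edges, max_rounds=20):
    """Cluster nodes by weighted label propagation (illustrative sketch).

    weighted_edges maps a pair (u, v) of communication numbers to the
    association weight of that pair. Returns a dict node -> final label.
    """
    neighbors = defaultdict(dict)
    for (u, v), weight in weighted_edges.items():
        neighbors[u][v] = weight
        neighbors[v][u] = weight

    # Start with a different label on every node (the node's own id).
    labels = {node: node for node in neighbors}

    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):
            # Sum the edge weights behind each label held by a neighbor.
            score = defaultdict(float)
            for other, weight in neighbors[node].items():
                score[labels[other]] += weight
            # Adopt the heaviest label; break ties by the smallest label.
            best = min(score, key=lambda label: (-score[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def groups_from_labels(labels):
    """Put nodes that ended up with the same label into the same group."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(sorted(group) for group in groups.values())
```

Practical label propagation implementations usually randomize the visiting order and the tie-breaking; the fixed order here is chosen only so that the sketch gives reproducible results.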
Step 105, identifying risk groups according to the number of risk numbers in the groups.
The risk number refers to a communication number recorded in the blacklist. Alternatively, the risk number refers to a communication number with little or no call behavior, and the communication number may be referred to as a "cat pool number". Optionally, the risk number refers to a communication number whose IP addresses used when performing the network behavior are all risk IPs.
Optionally, if the number of risk numbers in a group exceeds a preset threshold, the group is determined to be a risk group. The preset threshold may be determined according to the number of communication numbers included in the group. Optionally, the electronic device sets the preset threshold to 60% of the number of communication numbers included in the group. In other possible examples, the preset threshold may be set manually. If the number of risk numbers in the group is less than or equal to the preset threshold, the group is determined to be a safe group. Once a risk group has been determined, the internet platform can bar the communication numbers in the risk group from participating when it later launches a marketing campaign, which avoids wasting the resources the campaign requires.
In other possible examples, a group is determined to be a risk group if the number of security numbers in the group is less than a specified threshold. A security number may be a communication number that has been recorded in a white list.
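The threshold check in step 105, together with the optional blacklist update described earlier, can be sketched in Python as follows. This is an assumption-laden sketch: the function names and the `percent` parameter are hypothetical, risk numbers are taken to be the numbers already in the blacklist, and the 60% figure is the optional example value given above. Integer arithmetic is used so that the comparison does not depend on floating-point rounding.

```python
def is_risk_group(group, blacklist, percent=60):
    """Return True when risk numbers exceed `percent` % of the group size."""
    risk_count = sum(1 for number in group if number in blacklist)
    return risk_count * 100 > percent * len(group)


def update_blacklist(group, blacklist, percent=60):
    """Return a blacklist extended with all numbers of a risk group.

    If the group is not a risk group, the blacklist is returned unchanged
    (as a new set).
    """
    if is_risk_group(group, blacklist, percent):
        return set(blacklist) | set(group)
    return set(blacklist)
```

For a group of four numbers, the threshold is 2.4, so three blacklisted numbers make the group a risk group while two do not.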
Referring also to FIG. 2, a schematic diagram relating to the embodiment shown in FIG. 1 is shown. The electronic device acquires the historical behavior data of the last 7 days, analyzes the mobile phone numbers that performed network behaviors in the last 7 days and the IP addresses they used according to a preset algorithm (such as a label propagation algorithm), and determines the groups. The electronic device may then determine, according to the risk numbers recorded in the blacklist, whether all the mobile phone numbers included in a group are risk numbers, and update the blacklist that records risk numbers and the blacklist that records risk IPs for later queries. Here, k is zero or a positive integer.
In summary, in the method provided in the embodiment of the present invention, the association weights between communication numbers that perform network behaviors in the same time period using the same IP address are calculated, and clustering is performed according to the association weights to determine the groups.
Referring to FIG. 3, a flow diagram illustrating a method for identifying risk groups according to one embodiment of the invention is shown. The method may include several steps as follows.
Step 301, obtaining historical behavior data corresponding to the communication number.
The historical behavior data includes a plurality of historical behavior records. Each historical behavior record of the plurality of historical behavior records comprises: an IP address used by the communication number when the network behavior is executed, and a request time at which the communication number executes the network behavior.
Step 302, adding at least two communication numbers that have at least one similar network behavior in the historical behavior data to one number combination.
Similar network behaviors are network behaviors that use the same IP address and whose request times fall within the same time period.
Step 303, respectively calculating feature values corresponding to each number combination, where the feature values include at least one of a first feature value, a second feature value, and a third feature value.
The first characteristic value corresponding to the number combination is used for representing the times of similar network behaviors of the communication numbers in the number combination.
Taking Table-1 as an example, if the number of similar network behaviors of the number combination (13600000001, 13600000002) is 2, the first characteristic value corresponding to the number combination (13600000001, 13600000002) is 2; if the number of similar network behaviors of the number combination (13600000001, 13600000003) is 1, the first characteristic value corresponding to the number combination (13600000001, 13600000003) is 1; if the number of similar network behaviors of the number combination (13600000002, 13600000003) is 1, the first characteristic value corresponding to the number combination (13600000002, 13600000003) is 1; if the number of similar network behaviors of the number combination (13600000001, 13600000004) is 1, the first characteristic value corresponding to the number combination (13600000001, 13600000004) is 1.
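Steps 301 to 303 up to this point (grouping records by IP address and time period, forming number combinations, and counting similar network behaviors) might be sketched as below. The record layout `(number, ip, request_time)`, the one-hour `period_seconds` window, the function name, and the sample records are all assumptions made for illustration; the records are chosen only so that the counts agree with the values quoted above.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_similar_behaviors(records, period_seconds=3600):
    """Map each pair of numbers to its count of similar network behaviors."""
    buckets = defaultdict(set)
    for number, ip, request_time in records:
        # Same IP address and same time period -> same bucket.
        buckets[(ip, request_time // period_seconds)].add(number)
    counts = Counter()
    for numbers in buckets.values():
        for pair in combinations(sorted(numbers), 2):
            counts[pair] += 1  # one similar behavior per shared bucket
    return counts


# Hypothetical records: (communication number, IP address, request time in seconds).
records = [
    ("13600000001", "10.0.0.1", 100),
    ("13600000002", "10.0.0.1", 200),
    ("13600000003", "10.0.0.1", 300),
    ("13600000001", "10.0.0.2", 3700),
    ("13600000002", "10.0.0.2", 3800),
    ("13600000001", "10.0.0.3", 7300),
    ("13600000004", "10.0.0.3", 7400),
]
first_values = count_similar_behaviors(records)
print(first_values[("13600000001", "13600000002")])
```

Each key of the returned counter is a two-number combination, and its count plays the role of the first characteristic value.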
And the second characteristic value corresponding to the number combination is used for representing the similarity between the communication numbers in the number combination. The second characteristic value corresponding to the number combination has positive correlation with the similarity between the communication numbers in the number combination. That is, the greater the second characteristic value corresponding to the number combination is, the higher the similarity between the communication numbers in the number combination is; the smaller the second characteristic value corresponding to the number combination is, the lower the similarity between the communication numbers in the number combination is.
Optionally, the second feature value corresponding to the number combination may be calculated by the following sub-steps:
Step 303a, for each number combination, obtaining number characteristic values respectively corresponding to the communication numbers in the number combination, wherein the number characteristic values comprise at least one of a call characteristic value, a binding characteristic value and an active characteristic value.
The call characteristic value is obtained by quantifying the call behavior corresponding to the communication number. The call behavior corresponding to the communication number includes the number of calls, the call duration, and the like. The call characteristic value corresponding to the communication number is positively correlated with the number of calls and the call duration of the communication number. The greater the number of calls corresponding to the communication number, the higher the call characteristic value; the smaller the number of calls, the lower the call characteristic value. The longer the call duration corresponding to the communication number, the higher the call characteristic value; the shorter the call duration, the lower the call characteristic value.
Specifically, for each number combination, the electronic device obtains data such as the number of calls and the call duration corresponding to the communication numbers in the number combination, and quantifies these data to obtain the call characteristic values corresponding to the communication numbers in the number combination. For example, the number 13600000001 has 6 calls and a total call duration of 31 minutes, and the electronic device quantifies the call characteristic value for the number 13600000001 as 0.8. For another example, the number 13600000002 has 2 calls and a total call duration of 3 minutes, and the electronic device quantifies the call characteristic value for the number 13600000002 as 0.1.
The binding characteristic value is obtained by quantifying the binding behavior corresponding to the communication number. The binding behavior corresponding to the communication number includes whether the communication number is bound to an application program, the number of application programs bound to the communication number, and the like. The binding characteristic value corresponding to a communication number that is not bound to any application program should be smaller than the binding characteristic value corresponding to a communication number that is bound to an application program. For communication numbers that are bound to application programs, the binding characteristic value is positively correlated with the number of bound application programs: the more application programs are bound to the communication number, the higher the binding characteristic value; the fewer application programs are bound, the lower the binding characteristic value.
Specifically, for each number combination, the electronic device obtains data such as whether the communication numbers in the number combination are bound to application programs and the number of bound application programs, and quantifies these data to obtain the binding characteristic values corresponding to the communication numbers in the number combination. For example, the number 13600000001 is bound to 13 application programs, and the electronic device quantifies the binding characteristic value for the number 13600000001 as 0.7. For another example, the number 13600000002 is bound to 2 application programs, and the electronic device quantifies the binding characteristic value for the number 13600000002 as 0.1.
The active characteristic value is obtained according to the activity degree quantification corresponding to the application program bound by the communication number. The activeness corresponding to the application program bound by the communication number is in positive correlation with the active characteristic value. That is, the greater the activity corresponding to the application program bound by the communication number is, the greater the activity characteristic value corresponding to the communication number is, the smaller the activity corresponding to the application program bound by the communication number is, and the smaller the activity characteristic value corresponding to the communication number is.
The activity corresponding to the application program bound by the communication number can be measured by the number of times that the user logs in the client of the application program, and the greater the number of times that the user logs in the client of the application program, the greater the activity corresponding to the application program bound by the communication number. When the application program is a social application program, the activity may also be measured by the number of sessions between the user and other users through the client of the application program, and the greater the number of sessions between the user and other users through the client of the application program, the greater the activity corresponding to the application program bound to the communication number. When the application program is a shopping application program, the activity can also be measured by the number of times that the user purchases through the client of the application program, and the more the number of times that the user purchases through the client of the application program is, the greater the activity corresponding to the application program bound by the communication number is. The embodiment of the invention does not limit the way of measuring the activity corresponding to the application program bound by the communication number.
Specifically, for each number combination, the electronic device obtains data such as the activity of the application programs bound to the communication numbers in the number combination, and quantifies these data to obtain the active characteristic values corresponding to the communication numbers in the number combination. For example, the electronic device quantifies the active characteristic value for the number 13600000001 as 0.9. For another example, the electronic device quantifies the active characteristic value for the number 13600000002 as 0.2.
Step 303b, calculating a second characteristic value corresponding to the number combination according to the number characteristic values respectively corresponding to the communication numbers in the number combination.
Optionally, the electronic device calculates the similarity between the communication numbers in the number combination according to the number characteristic values respectively corresponding to those communication numbers, so as to obtain the second characteristic value corresponding to the number combination. The algorithm used for calculating the similarity between the communication numbers in the number combination may be the Euclidean distance (Euclidean metric), the Jaccard distance, cosine similarity, and the like, which is not limited in the embodiment of the present invention.
Taking the algorithm as cosine similarity and the number combination (13600000001,13600000002) as an example, the call characteristic value, the binding characteristic value and the active characteristic value corresponding to the number 13600000001 are 0.8, 0.7 and 0.9 respectively, the call characteristic value, the binding characteristic value and the active characteristic value corresponding to the number 13600000002 are 0.1, 0.1 and 0.2 respectively, and then the second characteristic value corresponding to the number combination (13600000001,13600000002) is:
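$$\cos\theta = \frac{0.8 \times 0.1 + 0.7 \times 0.1 + 0.9 \times 0.2}{\sqrt{0.8^2 + 0.7^2 + 0.9^2}\,\sqrt{0.1^2 + 0.1^2 + 0.2^2}}$$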
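The cosine-similarity calculation for this example can be sketched in Python as follows. The function name and the zero-vector guard are assumptions; the two feature vectors are the (call, binding, active) characteristic values given in the text. Any normalization or rescaling applied in the original figure is not recoverable from the text, so the value this sketch returns should not be read as the patent's own result.

```python
import math


def cosine_similarity(u, v):
    """Standard cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat an all-zero feature vector as dissimilar
    return dot / (norm_u * norm_v)


features_1 = (0.8, 0.7, 0.9)  # number 13600000001: call, binding, active
features_2 = (0.1, 0.1, 0.2)  # number 13600000002: call, binding, active
second_value = cosine_similarity(features_1, features_2)
print(round(second_value, 3))
```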
The third characteristic value corresponding to the number combination is used for representing the type information of the IP address used by the communication numbers in the number combination each time a similar network behavior is performed. Optionally, the type information of the IP address includes whether the IP address is an IP address of a specified type, the usage number of IP addresses of the specified type, and the like. The IP address of the specified type may be set in advance; for example, the IP address of the specified type is a risk IP. The risk IP may be a proxy IP or a black-involved IP.
Alternatively, the third characteristic value corresponding to the number combination may be calculated by the following substeps.
Step 303c, for each number combination, obtaining the type of the IP address used by the communication number in the number combination when each time the similar network behavior is executed;
and step 303d, determining a third characteristic value corresponding to the number combination according to the use number of the IP addresses of the specified type.
Optionally, the electronic device directly takes the usage number of IP addresses of the specified type as the third characteristic value corresponding to the number combination. For example, when the number combination (13600000001, 13600000002) performs similar network behaviors, the IP addresses used are IP address 1 and IP address 2, where IP address 1 is a proxy IP and IP address 2 is a black-involved IP, so both IP address 1 and IP address 2 are risk IPs; the electronic device therefore determines the third characteristic value corresponding to the number combination (13600000001, 13600000002) to be 2.
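As a minimal sketch of steps 303c and 303d, the third characteristic value can be computed by counting the risk IPs among the addresses a number combination used for similar network behaviors. The function name and the sample addresses are hypothetical, and counting distinct addresses (rather than individual uses) is an assumption, since the text does not say which is meant.

```python
def third_feature_value(used_ips, risk_ips):
    """Count the distinct IP addresses of the specified (risk) type that were used."""
    return sum(1 for ip in set(used_ips) if ip in risk_ips)


# Hypothetical data: both addresses used by the combination are recorded risk IPs.
used_ips = ["IP address 1", "IP address 2"]
risk_ips = {"IP address 1", "IP address 2", "IP address 9"}
print(third_feature_value(used_ips, risk_ips))
```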
Step 304, determining the association weight corresponding to each number combination according to the characteristic values corresponding to each number combination.
Optionally, the first characteristic value, the second characteristic value, and the third characteristic value corresponding to each number combination are summed to obtain the association weight corresponding to that number combination. Taking the number combination (13600000001, 13600000002) as an example, if the first characteristic value, the second characteristic value and the third characteristic value corresponding to the number combination (13600000001, 13600000002) are 2, 0.46 and 2, respectively, then the association weight corresponding to the number combination (13600000001, 13600000002) is 2 + 0.46 + 2 = 4.46.
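The summation in step 304 is simple enough to state directly; the short sketch below only makes the arithmetic of the example explicit. The function name is hypothetical, and the unweighted sum is the optional scheme described in the text, not the only possible one.

```python
def association_weight(first_value, second_value, third_value):
    """Unweighted sum of the three characteristic values of one number combination."""
    return first_value + second_value + third_value


weight = association_weight(2, 0.46, 2)
print(round(weight, 2))
```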
Step 305, constructing a group feature graph when the number combination comprises two communication numbers.
One node in the group feature graph represents one communication number included in the number combinations. A connecting line between two connected nodes in the group feature graph represents the association weight corresponding to the number combination consisting of the communication numbers respectively corresponding to the two nodes.
Referring to FIG. 4, a schematic diagram of a group feature graph provided by one embodiment of the present invention is shown. Nodes 1 to 6 represent mobile phone numbers 1 to 6 respectively, the connecting line between node 1 and node 2 indicates that the association weight corresponding to the number combination composed of mobile phone number 1 and mobile phone number 2 is 4.46, and the meaning of the connecting lines between the other nodes can be understood in the same way.
Step 306, adding a different label to each node in the group feature graph.
Taking the group feature graph shown in fig. 4 as an example, the labels added to the nodes 1 to 6 by the electronic device are the groups 1 to 6, respectively.
Step 307, performing at least one round of an updating process on the labels of the nodes in the group feature graph, where in each round, for each node of the group feature graph, the label of the node is updated according to the labels of the other nodes connected to it.
The number of rounds of the updating process may be determined according to the number of nodes included in the group feature graph. The more nodes the group feature graph includes, the more rounds of the updating process are performed; the fewer nodes it includes, the fewer rounds are performed. The other nodes connected to a node are the nodes joined to it by a connecting line.
The process of step 307 above may be referred to as "label propagation". Optionally, for each node of the group feature graph, the label of the node is updated according to the association weights corresponding to the number combinations composed of the communication number of the node and the communication numbers of its adjacent nodes. Specifically, the electronic device selects any node as the starting point of label propagation, obtains the association weights corresponding to the number combinations composed of the communication number of that node and the communication number of each adjacent node, and updates the label of the node to the label of the other node included in the number combination with the largest association weight.
Taking the group feature graph shown in FIG. 4 as an example, the association weight corresponding to the number combination of the communication numbers corresponding to node 1 and node 2 is 4.46, the association weight corresponding to the number combination of the communication numbers corresponding to node 1 and node 3 is 2.15, and the association weight corresponding to the number combination of the communication numbers corresponding to node 1 and node 4 is 1.75. Since the association weight corresponding to the number combination composed of the communication numbers corresponding to node 1 and node 2 is the largest, the electronic device updates the label of node 1 from group 1 to group 2.
Step 308, when the at least one round of the updating process is completed, adding the communication numbers corresponding to the nodes with the same label in the group feature graph to the same group.
Optionally, when the label of each node in the group feature graph no longer changes, the at least one round of the updating process is considered complete, and the electronic device then adds the communication numbers corresponding to the nodes having the same label to the same group.
Taking the group feature graph shown in FIG. 4 as an example, when the at least one round of the updating process is completed, the labels of node 1, node 2, node 4 and node 5 are all group 5, and the labels of node 3 and node 6 are both group 6. The electronic device adds mobile phone number 1, mobile phone number 2, mobile phone number 4 and mobile phone number 5, corresponding to node 1, node 2, node 4 and node 5, to one group, and adds mobile phone number 3 and mobile phone number 6, corresponding to node 3 and node 6, to another group.
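Steps 305 to 308 can be sketched as a small label propagation routine. This is an illustrative sketch under stated assumptions, not the patented implementation: the function name, the in-place (asynchronous) update order, the tie-breaking rule, and the `max_rounds` cap are all assumptions. Only the weights 4.46, 2.15 and 1.75 on the edges from node 1 come from the text; the remaining edge weights are hypothetical, and the label values the sketch ends with need not match the label names used in the FIG. 4 example.

```python
from collections import defaultdict


def propagate_labels(edges, max_rounds=20):
    """Cluster nodes by repeatedly adopting the label of the heaviest neighbor."""
    adj = defaultdict(dict)
    for (a, b), weight in edges.items():
        adj[a][b] = weight
        adj[b][a] = weight
    labels = {node: node for node in adj}  # step 306: a distinct label per node
    for _ in range(max_rounds):  # step 307: rounds of label updates
        changed = False
        for node in sorted(adj):
            # Neighbor joined by the largest association weight (ties: larger id).
            best = max(adj[node], key=lambda nb: (adj[node][nb], nb))
            if labels[node] != labels[best]:
                labels[node] = labels[best]
                changed = True
        if not changed:  # labels are stable, so the update process is complete
            break
    groups = defaultdict(list)
    for node, label in sorted(labels.items()):
        groups[label].append(node)  # step 308: same label -> same group
    return sorted(groups.values())


edges = {
    (1, 2): 4.46, (1, 3): 2.15, (1, 4): 1.75,  # weights quoted in the text
    (4, 5): 1.2, (3, 6): 3.5,                  # hypothetical weights
}
print(propagate_labels(edges))
```

With these weights the routine places nodes 1, 2, 4 and 5 in one group and nodes 3 and 6 in another. The round cap is a safeguard, since this simple update rule is not guaranteed to settle on every graph.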
Step 309, identifying risk groups according to the number of risk numbers in the groups.
The risk number refers to a communication number recorded in the blacklist.
Step 310, adding the communication numbers in the risk group that are not recorded in the blacklist to the blacklist.
Because the group is composed of communication numbers with high association degree, when most of the communication numbers in the group are risk numbers, the communication numbers in the group are all considered to be risk numbers, and the electronic equipment also adds the communication numbers which are not recorded by the blacklist in the risk group to the blacklist.
Optionally, the electronic device determines an IP address used by the communication numbers in the group to perform similar network behaviors as a risk IP, and adds the risk IP to a blacklist for recording the risk IP.
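The blacklist update of step 310 might look like the following sketch, which keeps the two blacklists as in-memory sets. The function name, the set-based storage, and the sample data are assumptions; a real system would presumably persist the blacklists elsewhere.

```python
def update_blacklists(risk_group, similar_behavior_ips, number_blacklist, ip_blacklist):
    """Add unrecorded numbers of a risk group, and the IPs they shared, to the blacklists."""
    newly_added = set(risk_group) - number_blacklist
    number_blacklist |= newly_added           # updates the caller's set in place
    ip_blacklist |= set(similar_behavior_ips)  # IPs used for similar network behaviors
    return newly_added


# Hypothetical state before the update.
number_blacklist = {"13600000001", "13600000002"}
ip_blacklist = set()
added = update_blacklists(
    ["13600000001", "13600000002", "13600000004"],
    ["10.0.0.1", "10.0.0.2"],
    number_blacklist,
    ip_blacklist,
)
print(sorted(added))
```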
Referring to FIG. 5, a schematic diagram relating to the embodiment shown in FIG. 3 is shown. Mobile phone number 1, mobile phone number 2 and mobile phone number 3 all use IP address 1 to perform network behaviors, and mobile phone number 1, mobile phone number 2 and mobile phone number 4 all use IP address 2 to perform network behaviors. The number combination (mobile phone number 1, mobile phone number 2) is extracted according to the historical behavior records. Then the number of similar network behaviors of mobile phone number 1 and mobile phone number 2 (the first characteristic value), the similarity between mobile phone number 1 and mobile phone number 2 (the second characteristic value), and the IP features respectively corresponding to IP address 1 and IP address 2 (the third characteristic value) are calculated. The first, second and third characteristic values are combined to obtain the association weight corresponding to the number combination (mobile phone number 1, mobile phone number 2), and a preset algorithm (such as a label propagation algorithm) is then used for clustering. Finally, a risk group is determined according to the risk numbers recorded in the blacklist; the communication numbers in the risk group that are not yet marked as risk numbers are added to the blacklist, and the IP addresses used by the risk numbers to perform similar network behaviors are added to the blacklist for recording risk IPs.
In addition, when the number of the communication numbers included in the number combination is greater than 2, clustering all the communication numbers in each number combination according to the association weight respectively corresponding to each number combination to obtain at least one group may include the following two possible implementation manners.
In one possible implementation manner, if the association weight corresponding to one number combination is greater than a first threshold, the communication numbers included in that number combination are added to the same group. The first threshold may be determined according to the accuracy required when determining groups. If the accuracy requirement is high, the first threshold is large; if the accuracy requirement is low, the first threshold is small. For example, the first threshold is 7; that is, if the association weight corresponding to a certain number combination is greater than 7, the electronic device adds the communication numbers included in that number combination to the same group.
In another possible implementation manner, if the association weights respectively corresponding to the plurality of number combinations are greater than the second threshold, and the number of the same communication numbers of any two number combinations in the plurality of number combinations is greater than the third threshold, the communication numbers included in the plurality of number combinations are added to the same group. The second threshold and the third threshold may also be actually determined according to the accuracy requirements for determining the population. For example, if the second threshold is 5, the third threshold is 6, the association weights corresponding to the combination 1 and the combination 2 are 5.16 and 5.27, respectively, and the combination 1 and the combination 2 have 9 identical communication numbers, the electronic device adds the communication number included in the combination 1 and the communication number included in the combination 2 to the same group.
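The two threshold-based implementations above can be sketched as follows. The function names and the placeholder identifiers are hypothetical, the second function handles only a pair of combinations for brevity, and the default thresholds (7, 5 and 6) are simply the example values from the text.

```python
def group_by_single_threshold(combinations_with_weights, first_threshold=7.0):
    """First implementation: every combination above the threshold forms a group."""
    return [set(numbers) for numbers, weight in combinations_with_weights
            if weight > first_threshold]


def merge_overlapping(combo_a, weight_a, combo_b, weight_b,
                      second_threshold=5.0, third_threshold=6):
    """Second implementation for two combinations: merge them or return None."""
    shared = set(combo_a) & set(combo_b)
    if (weight_a > second_threshold and weight_b > second_threshold
            and len(shared) > third_threshold):
        return set(combo_a) | set(combo_b)
    return None


# Hypothetical combinations sharing 9 identifiers, with the weights from the example.
shared_numbers = [f"n{i}" for i in range(9)]
combination_1 = shared_numbers + ["a"]
combination_2 = shared_numbers + ["b"]
merged = merge_overlapping(combination_1, 5.16, combination_2, 5.27)
print(len(merged))
```

Here both weights exceed 5 and the 9 shared identifiers exceed 6, so the two combinations are merged into a single group.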
In summary, in the method provided in the embodiment of the present invention, the association weights between the communication numbers performing the network behavior at the same time interval and using the same IP address are calculated, and clustering is performed according to the association weights, so as to determine the group.
In addition, the communication numbers in the risk group that are not yet marked as risk numbers are added to the blacklist, which can improve the accuracy of subsequently determining risk groups.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Referring to fig. 6, a block diagram of an apparatus for identifying risk groups according to an embodiment of the present invention is shown. The apparatus has functions of implementing the above method examples, and the functions may be implemented by hardware or by hardware executing corresponding software. The apparatus may include: a data acquisition module 601, a combination extraction module 602, a weight calculation module 603, a clustering module 604, and a population determination module 605.
A data obtaining module 601, configured to obtain historical behavior data corresponding to a communication number, where the historical behavior data includes a plurality of historical behavior records, and each historical behavior record includes: an IP address used by the communication number when performing a network action and a request time when the communication number performs the network action.
A combination extracting module 602, configured to add at least two communication numbers having at least one time of similar network behaviors in the historical behavior data to one number combination, where the similar network behaviors refer to network behaviors that use the same IP address and are in the same time period at the request time.
A weight calculating module 603, configured to calculate association weights corresponding to the number combinations respectively, where the association weights corresponding to the number combinations are used to represent association degrees between communication numbers in the number combinations.
A clustering module 604, configured to cluster all the communication numbers in each number combination according to the association weight corresponding to each number combination, so as to obtain at least one group.
A group determination module 605 for identifying a risk group according to the number of risk numbers in the group.
In an optional embodiment provided based on the embodiment shown in fig. 6, the weight calculating module 603 includes: a first calculation unit and a second calculation unit (not shown in the figure).
A first calculating unit, configured to calculate feature values corresponding to the number combinations, where the feature values include at least one of a first feature value, a second feature value, and a third feature value; the first characteristic value corresponding to the number combination is used for representing the times of the similar network behaviors of the communication numbers in the number combination, the second characteristic value corresponding to the number combination is used for representing the similarity between the communication numbers in the number combination, and the third characteristic value corresponding to the number combination is used for representing the type information of the IP address used by the communication numbers in the number combination when the similar network behaviors are executed each time.
And the second calculating unit is used for determining the association weight corresponding to each number combination according to the characteristic value corresponding to each number combination.
In another optional embodiment provided on the basis of the embodiment shown in fig. 6, when the feature value includes the second feature value, the first calculation unit is configured to:
for each number combination, acquiring number characteristic values respectively corresponding to communication numbers in the number combination, wherein the number characteristic values comprise at least one of call characteristic values, binding characteristic values and active characteristic values; the call characteristic value is obtained according to the call behavior quantification corresponding to the communication number, the binding characteristic value is obtained according to the binding behavior quantification corresponding to the communication number, and the activity characteristic value is obtained according to the activity quantification corresponding to the application program bound by the communication number;
and calculating a second characteristic value corresponding to the number combination according to the number characteristic values respectively corresponding to the communication numbers in the number combination.
In another optional embodiment provided on the basis of the embodiment shown in fig. 6, when the feature value includes the third feature value, the first calculation unit is configured to:
for each number combination, acquiring the type of an IP address used by the communication number in the number combination when the similar network behavior is executed each time;
and determining a third characteristic value corresponding to the number combination according to the using number of the IP addresses of the specified type.
In another alternative embodiment provided based on the embodiment shown in fig. 6, when the combination of numbers includes two communication numbers, the clustering module 604 includes: the device comprises a feature map building unit, a label adding unit, a label updating unit and a first clustering unit (not shown in the figure).
And the characteristic graph constructing unit is used for constructing a group characteristic graph, wherein one node in the group characteristic graph represents one communication number included in each number combination, and a connecting line between two connected nodes in the group characteristic graph represents the association weight corresponding to the number combination formed by the communication numbers respectively corresponding to the two nodes.
And the label adding unit is used for adding different labels to each node in the group feature graph.
And the label updating unit is used for executing at least one round of updating process on the label of each node in the group characteristic diagram, and in each round of updating process, updating the label of each node in the group characteristic diagram according to the labels of other nodes connected with the node.
And the first clustering unit is used for adding the communication numbers corresponding to the nodes with the same label in the group characteristic diagram to the same group when the execution of the at least one round of updating process is completed.
In another optional embodiment provided based on the embodiment shown in fig. 6, when the number of communication numbers in the number combination is greater than 2, the clustering module 604 includes: a second clustering unit and/or a third clustering unit (not shown in the figure).
And the second clustering unit is used for adding the communication numbers included in one number combination to the same group when the association weight corresponding to the one number combination is greater than the first threshold.
And the third clustering unit is used for adding the communication numbers included in the plurality of number combinations to the same group if the association weights respectively corresponding to the plurality of number combinations are greater than the second threshold and the number of identical communication numbers shared by any two number combinations in the plurality of number combinations is greater than the third threshold.
In another optional embodiment provided based on the embodiment shown in fig. 6, the apparatus further comprises: a number adding module (not shown in the figure).
And the number adding module is used for adding the communication numbers which are not recorded by the blacklist in the risk group into the blacklist.
In summary, the apparatus provided in the embodiment of the present invention calculates the association weights between communication numbers that perform network behaviors in the same time period using the same IP address, and clusters according to the association weights to determine groups. Because the groups are determined by comprehensively considering data of multiple dimensions, such as the IP address, the request time, and the number features, and risk groups are then identified according to the number of risk numbers in each group, both the accuracy of determining groups and the accuracy of identifying risk groups are higher.
Referring to fig. 7, a block diagram of an electronic device 700 according to another embodiment of the invention is shown. The electronic device 700 is configured to implement the method for identifying a risk group provided in the above-described embodiments.
The electronic device 700 includes a central processing unit (CPU) 701, a system memory 704 including a random access memory (RAM) 702 and a read-only memory (ROM) 703, and a system bus 705 connecting the system memory 704 and the central processing unit 701. The electronic device 700 also includes a basic input/output system (I/O system) 706 that facilitates the transfer of information between components within the computer, and a mass storage device 707 for storing an operating system 713, application programs 714, and other program modules 715.
The basic input/output system 706 includes a display 708 for displaying information and an input device 709, such as a mouse or keyboard, for a user to input information. The display 708 and the input device 709 are both connected to the central processing unit 701 through an input/output controller 710 connected to the system bus 705. The basic input/output system 706 may also include the input/output controller 710 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 710 may also provide output to a display screen, a printer, or another type of output device.
The mass storage device 707 is connected to the central processing unit 701 through a mass storage controller (not shown) connected to the system bus 705. The mass storage device 707 and its associated computer-readable media provide non-volatile storage for the electronic device 700. That is, the mass storage device 707 may include a computer-readable medium (not shown), such as a hard disk or CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 704 and the mass storage device 707 described above may be collectively referred to as memory.
According to various embodiments of the present invention, the electronic device 700 may also run through a remote computer that is connected over a network such as the Internet. That is, the electronic device 700 may be connected to the network 712 through the network interface unit 711 connected to the system bus 705, or may be connected to another type of network or a remote computer system (not shown) using the network interface unit 711.
The memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the above-described method of identifying a risk group.
In an exemplary embodiment, a computer readable storage medium is further provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which is loaded and executed by a processor of an electronic device to implement the method of identifying a risk group in the above method embodiments.
Alternatively, the computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. As used herein, the terms "first," "second," and the like do not denote any order, quantity, or importance, but rather are used to distinguish one element from another.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The present invention is not limited to the above exemplary embodiments, and any modifications, equivalent replacements, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method of identifying a risk group, the method comprising:
obtaining historical behavior data corresponding to a communication number, wherein the historical behavior data comprises a plurality of historical behavior records, and each historical behavior record comprises: an Internet Protocol (IP) address used by the communication number when executing the network behavior and a request time when the communication number executes the network behavior;
adding at least two communication numbers that have exhibited similar network behaviors at least once in the historical behavior data to a number combination, wherein the similar network behaviors refer to network behaviors that use the same IP address and whose request times fall within the same time period;
respectively calculating characteristic values corresponding to the number combinations, wherein the characteristic values comprise at least one of a first characteristic value, a second characteristic value and a third characteristic value; the first characteristic value corresponding to the number combination is used for representing the number of times the communication numbers in the number combination have exhibited the similar network behaviors, the second characteristic value corresponding to the number combination is used for representing the similarity between the communication numbers in the number combination, and the third characteristic value corresponding to the number combination is used for representing the type information of the IP address used by the communication numbers in the number combination each time the similar network behaviors are executed;
determining an association weight corresponding to each number combination according to the characteristic value corresponding to each number combination, wherein the association weight corresponding to each number combination is used for representing the association degree between the communication numbers in the number combination;
clustering all communication numbers in each number combination according to the association weight respectively corresponding to each number combination to obtain at least one group;
identifying a risk group based on the number of risk numbers in the group.
2. The method according to claim 1, wherein when the characteristic values include the second characteristic value, the respectively calculating characteristic values corresponding to the number combinations comprises:
for each number combination, acquiring number characteristic values respectively corresponding to the communication numbers in the number combination, wherein the number characteristic values comprise at least one of a call characteristic value, a binding characteristic value and an activity characteristic value; the call characteristic value is obtained by quantifying the call behavior corresponding to the communication number, the binding characteristic value is obtained by quantifying the binding behavior corresponding to the communication number, and the activity characteristic value is obtained by quantifying the activity of the application program bound to the communication number;
and calculating a second characteristic value corresponding to the number combination according to the number characteristic values respectively corresponding to the communication numbers in the number combination.
3. The method according to claim 1, wherein when the characteristic values include the third characteristic value, the respectively calculating characteristic values corresponding to the number combinations comprises:
for each number combination, acquiring the type of the IP address used by the communication numbers in the number combination each time the similar network behavior is executed;
and determining a third characteristic value corresponding to the number combination according to the number of times IP addresses of a specified type are used.
4. The method according to any one of claims 1 to 3, wherein when the number combination includes two communication numbers, the clustering all the communication numbers in each number combination according to the association weight respectively corresponding to each number combination to obtain at least one group comprises:
constructing a group characteristic graph, wherein one node in the group characteristic graph represents one communication number included in each number combination, and a connecting line between two connected nodes in the group characteristic graph represents the association weight corresponding to the number combination formed by the communication numbers respectively corresponding to the two nodes;
adding different labels to each node in the group characteristic graph;
executing at least one round of updating process on the label of each node in the group characteristic graph, and updating the label of each node according to the labels of other nodes connected with the node in each round of updating process;
and when the execution of the at least one round of updating process is completed, adding the communication numbers corresponding to the nodes with the same label in the group characteristic graph to the same group.
5. The method according to any one of claims 1 to 3, wherein when the number of the communication numbers in the number combination is greater than 2, the clustering all the communication numbers in each number combination according to the association weight respectively corresponding to each number combination to obtain at least one group comprises:
if the association weight corresponding to one number combination is larger than a first threshold, adding the communication numbers in the one number combination to the same group;
and/or,
if the association weights respectively corresponding to a plurality of number combinations are all larger than a second threshold and the number of communication numbers shared by any two number combinations in the plurality of number combinations is larger than a third threshold, adding the communication numbers in the plurality of number combinations to the same group.
6. The method according to any one of claims 1 to 3, wherein after the identifying a risk group based on the number of risk numbers in the group, the method further comprises:
adding, to a blacklist, the communication numbers in the risk group that are not recorded in the blacklist.
7. An apparatus for identifying a risk group, the apparatus comprising:
a data obtaining module, configured to obtain historical behavior data corresponding to a communication number, where the historical behavior data includes a plurality of historical behavior records, and each historical behavior record includes: an IP address used by the communication number when executing the network behavior and a request moment of the communication number for executing the network behavior;
a combination extraction module, configured to add at least two communication numbers that have exhibited similar network behaviors at least once in the historical behavior data to a number combination, wherein the similar network behaviors refer to network behaviors that use the same IP address and whose request times fall within the same time period;
a first calculation unit, configured to respectively calculate characteristic values corresponding to the number combinations, wherein the characteristic values comprise at least one of a first characteristic value, a second characteristic value and a third characteristic value; the first characteristic value corresponding to the number combination is used for representing the number of times the communication numbers in the number combination have exhibited the similar network behaviors, the second characteristic value corresponding to the number combination is used for representing the similarity between the communication numbers in the number combination, and the third characteristic value corresponding to the number combination is used for representing the type information of the IP address used by the communication numbers in the number combination each time the similar network behaviors are executed;
a second calculation unit, configured to determine, according to the characteristic values respectively corresponding to the number combinations, association weights corresponding to the number combinations;
a clustering module, configured to cluster all the communication numbers in each number combination according to the association weight corresponding to each number combination to obtain at least one group;
and the group determination module is used for identifying the risk group according to the number of the risk numbers in the group.
8. An electronic device, comprising a processor and a memory, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and wherein the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the method of identifying a risk group according to any one of claims 1 to 6.
9. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of identifying a risk group according to any one of claims 1 to 6.
CN201710937630.5A 2017-09-30 2017-09-30 Method and device for identifying risk group and electronic equipment Active CN109600344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710937630.5A CN109600344B (en) 2017-09-30 2017-09-30 Method and device for identifying risk group and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710937630.5A CN109600344B (en) 2017-09-30 2017-09-30 Method and device for identifying risk group and electronic equipment

Publications (2)

Publication Number Publication Date
CN109600344A CN109600344A (en) 2019-04-09
CN109600344B true CN109600344B (en) 2021-03-23

Family

ID=65956849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710937630.5A Active CN109600344B (en) 2017-09-30 2017-09-30 Method and device for identifying risk group and electronic equipment

Country Status (1)

Country Link
CN (1) CN109600344B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110225036B (en) * 2019-06-12 2022-03-22 北京奇艺世纪科技有限公司 Account detection method, device, server and storage medium
CN110166635B (en) * 2019-07-11 2021-06-08 中国联合网络通信集团有限公司 Suspicious terminal identification method and suspicious terminal identification system
CN112351441B (en) * 2019-08-06 2023-08-15 ***通信集团广东有限公司 Data processing method and device and electronic equipment
CN111245815B (en) * 2020-01-07 2022-09-09 同盾控股有限公司 Data processing method and device, storage medium and electronic equipment
CN111931047B (en) * 2020-07-31 2022-06-21 中国平安人寿保险股份有限公司 Artificial intelligence-based black product account detection method and related device
CN112615966B (en) * 2020-12-14 2023-04-14 南方电网海南数字电网研究院有限公司 Cat pool terminal identification method
CN113641970B (en) * 2021-08-16 2022-08-26 深圳竹云科技有限公司 Risk detection method and device and computing equipment
CN114221807B (en) * 2021-12-14 2024-07-05 平安付科技服务有限公司 Access request processing method, device, monitoring equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102413013A (en) * 2011-11-21 2012-04-11 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting abnormal network behavior
CN103577991A (en) * 2012-08-03 2014-02-12 阿里巴巴集团控股有限公司 User identification method and device
CN104933570A (en) * 2014-03-20 2015-09-23 阿里巴巴集团控股有限公司 User detection method and device
CN106157326A (en) * 2015-04-07 2016-11-23 中国科学院深圳先进技术研究院 Group abnormality behavioral value method and system
CN106339615A (en) * 2016-08-29 2017-01-18 北京红马传媒文化发展有限公司 Abnormal registration behavior recognition method, system and equipment
CN106919953A (en) * 2017-02-23 2017-07-04 北京工业大学 A kind of abnormal trip Stock discrimination method based on track traffic data analysis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9367872B1 (en) * 2014-12-22 2016-06-14 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures

Also Published As

Publication number Publication date
CN109600344A (en) 2019-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant