CN111586001B - Abnormal user identification method and device, electronic equipment and storage medium - Google Patents

Abnormal user identification method and device, electronic equipment and storage medium

Publication number
CN111586001B
Authority
CN
China
Prior art keywords
user
users
similarity
abnormal
central
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010351557.5A
Other languages
Chinese (zh)
Other versions
CN111586001A (en
Inventor
王浩然
邵传贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Application filed by Migu Cultural Technology Co Ltd, China Mobile Communications Group Co Ltd
Priority to CN202010351557.5A
Publication of CN111586001A
Application granted
Publication of CN111586001B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0876 Network architectures or network communication protocols for network security for authentication of entities based on the identity of the terminal or configuration, e.g. MAC address, hardware or software configuration or device fingerprint
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Power Engineering (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides an abnormal user identification method, an abnormal user identification device, electronic equipment and a storage medium. The method comprises: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and the users other than the central user among the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users other than the re-determined central user among the target users; a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step, namely determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.

Description

Abnormal user identification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of network security, and in particular, to a method and an apparatus for identifying an abnormal user, an electronic device, and a storage medium.
Background
Abnormal login refers to login behavior that is significantly different from the user's daily habits. Since abnormal login is a common phenomenon of network intrusion behavior, a user with the abnormal login behavior is likely to be an implementer of the network intrusion behavior, and therefore, the identification of the abnormal login user is of great significance in the field of network security.
In the prior art, an abnormal login user is usually discovered according to the number of logins of the user, the IP address used during login and the device used during login. However, when a network attacker logs in from dispersed IP addresses or simulated devices, the prior-art abnormal login user identification methods have difficulty finding the abnormal user.
In summary, the prior-art methods for identifying abnormal login users have difficulty discovering hidden abnormal users, and their efficiency in discovering abnormal users is low.
Disclosure of Invention
The embodiment of the invention provides an abnormal user identification method, an abnormal user identification device, electronic equipment and a storage medium, which are used for solving the defects that a hidden abnormal user is difficult to find by an abnormal login user identification method in the prior art and the efficiency of finding the abnormal user is low.
An embodiment of a first aspect of the present invention provides a method for identifying an abnormal user, including:
an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and users except the central user in the target users;
a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users;
a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change;
and an abnormal user identification step, namely determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
In the above technical solution, the similarity between the target users is, for any one of the target users, the sum of the similarities between that user and all the other target users.
In the above technical solution, before the step of initially grouping, the method further includes:
calculating a similarity between any one of the target users and one of the target users other than the any user, the similarity including one or more of the following similarities: similarity in time dimension, similarity on user login platform and similarity on user login device;
and summing the similarity between any user in the target users and all the users except any user in the target users to obtain the sum of the similarity between any user in the target users and all the users except any user in the target users.
In the above technical solution, the determining the central user from the target users according to the similarity between the target users includes:
determining a first user according to the maximum similarity between the target users;
determining the first user as a central user;
determining a second user according to a minimum similarity value between a non-central user and the central user in the target users and a preset first threshold;
and determining the second user as a new central user, then returning to the step of determining a second user according to the minimum similarity value between the non-central users and the central users among the target users and the preset first threshold, and repeating until the number of central users reaches a preset number threshold.
In the above technical solution, the determining the second user according to the minimum value of the similarity between the non-central user of the target user and the central user and a preset first threshold includes:
calculating a minimum similarity value between a non-central user of the target user and the central user;
when the sum of the minimum similarity values between the first n non-central users of the target users and the central user is smaller than the first threshold, and the sum of the minimum similarity values between the first n + 1 non-central users of the target users and the central user is not smaller than the first threshold, determining the (n + 1)-th non-central user as the second user;
wherein the first threshold is a random value between 0 and a first similarity sum, the first similarity sum being the sum of the minimum similarity values between all non-central users among the target users and the central user; n is a natural number.
In the above technical solution, the calculating a similarity between any one of the target users and one of the target users other than the any user includes:
calculating the difference degree between a third user and a fourth user according to the login record of the third user and the login record of the fourth user;
the third user is any user in the target users, and the fourth user is one user except any user in the target users;
and calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user.
In the above technical solution, the difference between the third user and the fourth user is calculated from the login record of the third user and the login record of the fourth user using the following formula:
[Formula image BDA0002471989920000031: d(i, j) expressed in terms of H_i and H_j]
wherein d(i, j) represents the degree of difference between the third user and the fourth user;
when the similarity includes a similarity in the time dimension, the parameter H_i represents the record of the third user i logging in within a first time period, and H_j represents the record of the fourth user j logging in within the first time period;
when the similarity includes a similarity on the user login platform, the parameter H_i represents the record of the third user i logging in on a first platform, and H_j represents the record of the fourth user j logging in on the first platform;
when the similarity includes a similarity on the user login device, the parameter H_i represents the record of the third user i logging in on a first device, and H_j represents the record of the fourth user j logging in on the first device;
the similarity between the third user and the fourth user is calculated from the difference between the third user and the fourth user using the following formula:
[Formula image BDA0002471989920000041: sim(i, j) expressed in terms of d(i, j)]
wherein sim(i, j) represents the similarity between the third user and the fourth user.
An embodiment of a second aspect of the present invention provides an abnormal user identification apparatus, including:
the preliminary grouping module is used for determining a central user from the target users according to the similarity between the target users and initially grouping the target users according to the similarity between the central user and the users except the central user in the target users;
a grouping adjustment module, configured to re-determine the central user according to a similarity between users in the initial grouping, and re-group the target users according to a similarity between the re-determined central user and a user other than the re-determined central user among the target users;
a grouping determining module for repeatedly executing the grouping adjusting step until the users contained in each group do not change any more;
and the abnormal user identification module is used for determining abnormal groups according to the number of the known abnormal users contained in each group and determining the users in the abnormal groups as the abnormal users.
An embodiment of a third aspect of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the abnormal user identification method according to the embodiment of the first aspect are implemented.
An embodiment of a fourth aspect of the present invention provides a non-transitory computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the abnormal user identification method according to the embodiment of the first aspect.
According to the abnormal user identification method, the abnormal user identification device, the electronic equipment and the storage medium, clustering of users is achieved through similarity among the users, and based on the characteristic that the abnormal users have similar behavior tracks, the groups where the abnormal users are located are found out by utilizing the found abnormal users, so that more hidden abnormal users are found, and the abnormal user identification method and the abnormal user identification device have the advantages of being high in identification efficiency and strong in identification capacity.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of an abnormal user identification method according to an embodiment of the present invention;
Fig. 2 is a structural diagram of an abnormal user identification apparatus according to an embodiment of the present invention;
Fig. 3 illustrates a physical structure diagram of an electronic device.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an abnormal user identification method according to an embodiment of the present invention, and as shown in fig. 1, the abnormal user identification method according to the embodiment of the present invention includes:
Step 101, determining a central user from the target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and the users other than the central user among the target users.
The basic idea of the abnormal user identification method provided by the embodiment of the invention is that, because abnormal users have similar behavior tracks, more hidden abnormal users can be found by using the abnormal users that have already been found. Based on this idea, data of a plurality of target users is first collected. Some of the target users have already been identified as abnormal users, for example by a related-art method that identifies abnormal users based on the number of logins, the IP address used at login or the device used at login. However, the identities of most of the target users have not been determined; they may be normal users or hidden abnormal users.
Before the similarity of the target users is calculated, the data of the target users is collected. The collected data includes the logs of the target users. A log generally contains a large number of behavior records of the target user, such as the login time, the login platform, and the device used during login. The data of a target user may also include identity label information; for example, a user that has already been identified as abnormal may be marked as such in the identity label information.
These target users form a user set. For any target user in the set, the similarity with each other single target user in the set can be calculated, and based on these pairwise similarities a similarity index between the user and the other target users as a whole can be further calculated. The similarity index reflects the overall similarity relationship between one user and the other target users. In the embodiment of the present invention, the similarity index is the sum of similarities. In other embodiments of the present invention, the similarity index may take other forms, such as the sum of squares of the similarities.
In the embodiment of the invention, the similarity between one target user and other single target users in the set is added, so that the sum of the similarity between the target user and all other target users except the target user in the set is obtained. In the embodiment of the present invention, the method for calculating the similarity between users is not limited, and a method for calculating the similarity known to those skilled in the art may be used, or a method for calculating the similarity described in another embodiment of the present invention may be used. In the embodiment of the present invention, the time point of the similarity calculation is not limited, and the similarity calculation may be completed in advance before the step is executed, or may be completed in real time during the step.
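As a minimal sketch of the similarity index described above, the following Python assumes the pairwise similarities are already available as a square symmetric matrix. The function name `similarity_sums` and the sample values are illustrative only and do not come from the patent text.

```python
def similarity_sums(sim):
    """For each user i, sum the similarities to every other user j != i.

    `sim` is assumed to be a square list of lists where sim[i][j] is the
    similarity between user i and user j (hypothetical layout).
    """
    n = len(sim)
    return [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]


# Illustrative pairwise similarities for three users.
sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
sums = similarity_sums(sim)  # one similarity index per user
```

Replacing `sim[i][j]` with `sim[i][j] ** 2` inside the sum would give the sum-of-squares variant that the text mentions as an alternative index.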
A plurality of grouped center users may be determined from the plurality of target users based on a sum of similarities between any one of the plurality of target users and all other target users.
The number of the plurality of groups can be determined according to actual conditions, such as the number of target users and the like.
When determining the central users of the groups, the target user whose sum of similarities with all other target users is the highest is first selected as the central user C1 of the first group. The resulting C1 may be added to the group center set.
A target user with a relatively large similarity to C1 is then selected as the center C2 of another group. If the target user with the greatest similarity to C1 were always selected, outlier users could introduce noise, so a user with a relatively large similarity to C1 is selected as C2 instead. The selection proceeds as follows: take a random value greater than 0 and less than the sum of the similarities between the central user C1 and all other target users; calculate the similarity between C1 and each target user other than C1, and accumulate these similarities one by one; when the sum of the similarities of the first n target users is smaller than the random value and the sum of the similarities of the first n + 1 target users is larger than the random value, the (n + 1)-th target user is C2. The resulting C2 may be added to the group center set. Here n is a positive integer.
Then, for each of the target users other than C1 and C2, the similarity to the group center set is calculated, and these similarities are summed. (When C2 was selected, the sum of the similarities between all target users and C1 was calculated, which can be regarded as the sum of the similarities between all target users and a group center set containing only C1.) At this point the group center set contains two users, C1 and C2, but because the set is treated as a whole, only one value is needed for the similarity between a target user and the set. That value is the minimum of the similarities between the target user and each central user in the group center set. That is, to calculate the similarity between a target user and the group center set, the similarity between the target user and C1 and the similarity between the target user and C2 are calculated separately, and the smaller of the two is taken as the similarity between the target user and the group center set.
After the similarities between the target users other than C1 and C2 and the group center set are calculated, another random value is generated that is greater than 0 and less than the sum of these similarities. The similarities between the target users other than C1 and C2 and the group center set are then accumulated one by one; when the sum of the similarities of the first n target users is smaller than this random value and the sum of the similarities of the first n + 1 target users is larger than this random value, the (n + 1)-th target user is C3. The resulting C3 may be added to the group center set.
New group centers continue to be selected in this way until the number of group centers reaches K, where K is the number of groups to be formed.
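The center selection procedure above can be sketched roughly as follows. This is an illustration under assumptions, not the patent's implementation: users are indexed 0..n-1, `sim` is a precomputed similarity matrix, and the names `pick_by_cumulative_sum` and `select_centers` are hypothetical. The crossing test uses "not smaller than", following the wording of the summary section.

```python
import random


def pick_by_cumulative_sum(weights, threshold):
    """Return the index of the first item whose running total reaches threshold."""
    running = 0.0
    for index, weight in enumerate(weights):
        running += weight
        if running >= threshold:
            return index
    return len(weights) - 1  # guard against floating-point rounding


def select_centers(sim, k, rng=None):
    """Pick k central users from the similarity matrix `sim`."""
    n = len(sim)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of users")
    rng = rng or random.Random()
    # C1: the user with the largest sum of similarities to all other users.
    sums = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    centers = [max(range(n), key=sums.__getitem__)]
    while len(centers) < k:
        candidates = [u for u in range(n) if u not in centers]
        # Similarity to the center set = minimum similarity to any center.
        weights = [min(sim[u][c] for c in centers) for u in candidates]
        threshold = rng.uniform(0.0, sum(weights))
        centers.append(candidates[pick_by_cumulative_sum(weights, threshold)])
    return centers


sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
centers = select_centers(sim, 2, random.Random(0))
```

Passing a seeded `random.Random` makes the selection repeatable, which is convenient when comparing runs; the first center is always deterministic because it depends only on the similarity sums.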
After the central users of the groups are obtained, the similarity between each target user that has not been determined as a central user and the central user of each group is calculated, and the target user is assigned to the group whose central user has the highest similarity to it. The grouping obtained here is the initial grouping.
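A minimal sketch of this initial assignment, assuming users are indexed 0..n-1 and the central users have already been chosen. The function name `assign_to_centers` and the sample matrix are illustrative assumptions.

```python
def assign_to_centers(sim, centers):
    """Assign every non-central user to the group of its most similar center.

    Returns a dict mapping each central user to the list of users in its group
    (the central user itself is listed first).
    """
    groups = {c: [c] for c in centers}
    for user in range(len(sim)):
        if user in groups:
            continue  # central users stay in their own group
        best = max(centers, key=lambda c: sim[user][c])
        groups[best].append(user)
    return groups


sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
groups = assign_to_centers(sim, [0, 3])
```

With central users 0 and 3, user 1 joins the group of user 0 and user 2 joins the group of user 3.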
Step 102, re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users other than the re-determined central user.
Step 103, repeatedly executing step 102 until the users contained in each group no longer change.
After the target users are assigned to the groups, a target user may not be the best match for the group in which it is placed, so the users contained in the groups need to be adjusted.
In the adjustment, the central user of each group is first re-determined. In the embodiment of the invention, for each target user in a group, the sum of the similarities between that user and all other target users in the group is calculated, and the target user with the largest sum is determined as the new central user of the group.
After the central user of each group is re-determined, the similarity between each target user that is not a central user and the new central user of each group is calculated, and the target user is assigned to the group whose central user has the highest similarity to it.
This adjustment of the target users contained in the groups is performed iteratively until the users contained in each group no longer change.
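The adjustment loop of steps 102 and 103 could be sketched as below. The helper names (`recenter`, `regroup`, `refine`) and the `max_rounds` safety cap are assumptions added for illustration; the text itself only specifies iterating until membership stops changing.

```python
def recenter(sim, members):
    """New center: the member with the largest similarity sum within its group."""
    return max(members, key=lambda u: sum(sim[u][v] for v in members if v != u))


def regroup(sim, centers):
    """Assign every non-central user to its most similar central user."""
    groups = {c: [c] for c in centers}
    for user in range(len(sim)):
        if user not in groups:
            groups[max(centers, key=lambda c: sim[user][c])].append(user)
    return groups


def refine(sim, groups, max_rounds=100):
    """Repeat recentering and regrouping until group membership is stable."""
    for _ in range(max_rounds):  # max_rounds is a hypothetical safety cap
        centers = [recenter(sim, members) for members in groups.values()]
        new_groups = regroup(sim, centers)
        before = {frozenset(m) for m in groups.values()}
        after = {frozenset(m) for m in new_groups.values()}
        groups = new_groups
        if before == after:
            break
    return groups


sim = [
    [1.0, 0.9, 0.2, 0.1, 0.1],
    [0.9, 1.0, 0.3, 0.2, 0.1],
    [0.2, 0.3, 1.0, 0.8, 0.7],
    [0.1, 0.2, 0.8, 1.0, 0.9],
    [0.1, 0.1, 0.7, 0.9, 1.0],
]
# A deliberately poor starting grouping: user 2 sits in the wrong group.
final_groups = refine(sim, {0: [0, 1, 2], 4: [4, 3]})
```

In this example user 2 moves to the group of users 3 and 4 in the first round, and the second round confirms that membership no longer changes.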
Step 104, determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.
As mentioned above, some users have already been identified as abnormal users when the basic user information is acquired. Therefore, once the users contained in each group are finally determined, the number of known abnormal users in each group can be obtained.
Because abnormal users have similar behavior tracks, a group in which the number of known abnormal users is higher than the abnormality determination threshold can be determined as an abnormal user group, and the users in that group are determined as abnormal users. As this step shows, the method provided by the embodiment of the present invention can find hidden abnormal users in the abnormal user group that other abnormal identification methods fail to identify.
For a group, once the number of known abnormal users or their proportion in the group is known, the number or proportion is compared with the abnormality determination threshold. If it is higher than the threshold, the group is an abnormal group and the users in the group are abnormal users.
The abnormality determination threshold may be a specific numerical value or a proportional value, and the specific value is determined according to the actual application condition.
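The threshold comparison in step 104 can be illustrated with a short sketch. The function name `flag_abnormal_groups`, the `use_ratio` switch, the user identifiers and the threshold value 0.4 are all hypothetical; the text leaves the actual threshold to the application.

```python
def flag_abnormal_groups(groups, known_abnormal, threshold, use_ratio=True):
    """Return all users belonging to groups whose known-abnormal count
    (or proportion, when use_ratio is True) is higher than the threshold."""
    flagged = set()
    for members in groups.values():
        hits = sum(1 for user in members if user in known_abnormal)
        score = hits / len(members) if use_ratio else hits
        if score > threshold:
            flagged.update(members)
    return flagged


groups = {"g1": ["a", "b", "c", "d"], "g2": ["e", "f", "g"]}
known_abnormal = {"a", "b", "e"}
abnormal_users = flag_abnormal_groups(groups, known_abnormal, threshold=0.4)
```

Here group g1 has a known-abnormal proportion of 0.5 and is flagged, so users c and d are reported as hidden abnormal users; group g2 has a proportion of about 0.33 and is left alone.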
The abnormal user identification method provided by the embodiment of the invention realizes user clustering through the similarity between target users, and based on the characteristic that the abnormal users have similar behavior tracks, the discovered abnormal users are utilized to find out the groups where the abnormal users are located, so that more hidden abnormal users are discovered, and the abnormal user identification method has the advantages of high identification efficiency and strong identification capability.
Based on any one of the above embodiments, in an embodiment of the present invention, before step 101, the method further includes:
calculating a similarity between any one of the target users and one of the target users other than the any user, the similarity including one or more of the following similarities: similarity in time dimension, similarity on user login platform and similarity on user login device;
and summing the similarity between any user in the target users and all the users except any user in the target users to obtain the sum of the similarity between any user in the target users and all the users except any user in the target users.
In the embodiment of the present invention, the similarity index is a sum of similarities. When the similarity between one user and other single users in the user set is calculated, the similarity is calculated from the time dimension, the platform dimension and the equipment dimension respectively. The specific description is as follows.
(1) Similarity in time dimension
The login time of a user can be mapped to a set of discrete time periods of uniform length; every time period has the same fixed length. The specific length can be set as needed: the shorter the time period, the finer the time resolution, but the sparser the data.
In an embodiment of the invention, with the day as the statistical unit, the login time is divided into a sequence of 48 consecutive time periods, i.e. each time period represents half an hour. A login time set is thus obtained for each user; for user i, the login time set over the time sequence is {S_i1, S_i2, …, S_i48}, where S_it is the login record of user i in time period t: if the user logs in during that period, the corresponding value of the vector is 1, and otherwise it is 0. It should be noted that login in the embodiment of the present invention is not limited to the time point of the login operation, but also includes all the usage time of the user after login and before logout. Similarly, the login time set of user j over the same time sequence can be obtained: {S_j1, S_j2, …, S_j48}. Accordingly, the similarity simt of user i and user j in the time dimension can be calculated:
[Formula images BDA0002471989920000091 and BDA0002471989920000092: the difference d between user i and user j, and the similarity simt obtained from it]
wherein S_it represents the record of user i logging in during time period t, and S_jt represents the record of user j logging in during time period t.
It can be seen from the above formula that the smaller the difference d between user i and user j, the higher the similarity between the users.
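The sketch below illustrates the 48-slot login vector and one possible similarity calculation. The patent's formulas are given only as images, so the specific choices here are assumptions: the difference d is taken as the Euclidean distance between the two vectors, and the similarity as 1 / (1 + d). These choices agree with the statements that a smaller d gives a higher similarity and that identical vectors give a similarity of 1, but they are not confirmed by the text. The session format (start and end minute of the day) is also hypothetical.

```python
import math

SLOTS_PER_DAY = 48  # half-hour time periods, as in the embodiment


def login_vector(sessions):
    """Build the 0/1 vector S_i1..S_i48 from (start_minute, end_minute) sessions.

    A slot is 1 if the user was logged in at any point during that half hour,
    which covers usage time between login and logout.
    """
    vec = [0] * SLOTS_PER_DAY
    for start, end in sessions:
        for slot in range(start // 30, (end - 1) // 30 + 1):
            vec[slot] = 1
    return vec


def difference(a, b):
    """ASSUMED form of d: Euclidean distance between two 0/1 vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """ASSUMED form of sim: 1 / (1 + d), so d = 0 gives similarity 1."""
    return 1.0 / (1.0 + difference(a, b))


user_i = login_vector([(0, 60)])   # active 00:00-01:00, slots 0 and 1
user_j = login_vector([(30, 90)])  # active 00:30-01:30, slots 1 and 2
simt = similarity(user_i, user_j)
```

The same two functions apply unchanged to the platform and device vectors, since those are also 0/1 vectors of equal length.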
(2) Similarity in platform dimensions
For the platforms on which a user logs in, P_ip is used to identify the login behavior of user i on platform p. When user i has login behavior on platform p, the value of P_ip is 1; when there is no login behavior, the value is 0. The login platform vector of user i over the m platforms is thus obtained: (P_i1, P_i2, …, P_im). Similarly, the login platform vector of user j over the same m platforms can be obtained: (P_j1, P_j2, …, P_jm).
From this, the similarity simp of user i and user j in the platform dimension can be calculated:
[Formula images BDA0002471989920000101 and BDA0002471989920000102: the difference d between user i and user j in the platform dimension, and the similarity simp obtained from it]
wherein P_ip represents the record of user i logging in on platform p, and P_jp represents the record of user j logging in on platform p.
(3) Similarity in the device dimension
For the devices on which users log in, E_i,s denotes the login behavior of user i on device s: E_i,s is 1 when user i has login behavior on device s, and 0 when there is no login. This yields the login device vector (E_i1, E_i2, …, E_iw) of user i over the w devices. Similarly, the login device vector (E_j1, E_j2, …, E_jw) of user j over the same w devices can be obtained.
Accordingly, the similarity sime of user i and user j in the device dimension can be calculated:
Figure BDA0002471989920000103
Figure BDA0002471989920000104
where E_is represents the record of user i logging in on device s, and E_js represents the record of user j logging in on device s.
It should be noted that, because there are many types of devices, the values in the device dimension are rather scattered. In the embodiment of the present invention, mainly the devices used by a large number of users in the existing network are taken into account. For users who only use devices with few users, all entries of the vector are 0. For users whose vectors are all 0, sim = 1.
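A minimal sketch of restricting the device dimension to widely used devices is shown below. The cut-off `min_users`, the function names, and the 1 - d similarity (the patent's formula is only available as an image) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def popular_devices(user_devices: Dict[str, Set[str]], min_users: int) -> List[str]:
    """Keep only devices used by at least `min_users` users (hypothetical cut-off)."""
    counts = Counter(d for devices in user_devices.values() for d in devices)
    return sorted(d for d, c in counts.items() if c >= min_users)


def device_vector(devices: Set[str], vocabulary: Sequence[str]) -> List[int]:
    """0/1 login record over the retained devices; rare devices are dropped."""
    return [1 if d in devices else 0 for d in vocabulary]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed similarity 1 - d, with d the share of differing entries."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not a:
        return 1.0  # nothing to compare, so the records cannot differ
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


user_devices = {"u1": {"A", "B"}, "u2": {"A"}, "u3": {"C"}, "u4": {"D"}}
vocabulary = popular_devices(user_devices, min_users=2)   # only "A" is shared
e3 = device_vector(user_devices["u3"], vocabulary)        # all zeros
e4 = device_vector(user_devices["u4"], vocabulary)        # all zeros
```

Under this assumed formula, two users whose vectors are all 0 have no differing entry, which matches the statement that their similarity is 1.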
After the similarities in the time dimension, the platform dimension, and the device dimension have been calculated, the similarity between users can be calculated. In the embodiment of the present invention, the similarity sim between users is obtained by averaging the similarities of the three dimensions, i.e., sim = 1/3 × (simt + simp + sime). In other embodiments of the present invention, weights may be set for the similarities of the three dimensions according to actual needs in order to calculate the similarity between users.
As described above, the similarity between one user and another user can be calculated; summing the similarities between one user and all other users gives that user's similarity sum.
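The averaging and summation described above can be sketched as follows. The function names are hypothetical, and normalizing the weights so that they sum to 1 is an assumption; the text only says that weights may be set.

```python
from typing import List, Sequence, Tuple


def combined_similarity(
    simt: float,
    simp: float,
    sime: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted mean of the three per-dimension similarities.

    Equal weights reproduce sim = 1/3 * (simt + simp + sime).
    """
    wt, wp, we = weights
    total = wt + wp + we
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return (wt * simt + wp * simp + we * sime) / total


def similarity_sums(sim: Sequence[Sequence[float]]) -> List[float]:
    """For each user, sum the similarities with all other users (self excluded)."""
    n = len(sim)
    return [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]


pairwise = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
sums = similarity_sums(pairwise)
most_central = max(range(len(sums)), key=sums.__getitem__)
```

Here `most_central` is the index of the user with the largest similarity sum, which is the quantity the method uses when picking a central user.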
The abnormal user identification method provided by the embodiment of the invention calculates the similarity between users in three dimensions (time, device, and platform) and uses this similarity to cluster the users. Because abnormal users have similar behavior tracks, the groups in which abnormal users are located can be found from the abnormal users already discovered, so that more hidden abnormal users are discovered. The method therefore has the advantages of high identification efficiency and strong identification capability.
Based on any one of the above embodiments, in an embodiment of the present invention, between step 101 and step 102, the method further includes:
and reducing the dimension of the information of the target user in the initial grouping.
In the embodiment of the invention, the dimension reduction of the information of the target user in the initial grouping is realized in the time dimension.
Since the target users are sparse in the time dimension, after the initial grouping is obtained, a subset of the time period set is selected for the users in the initial grouping.
When the subset is selected, the information entropy of all users in the initial grouping is first calculated for each time period. The information entropy is calculated as follows:
Figure BDA0002471989920000111
where e_ti represents the information entropy of all target users in the initial grouping in the i-th time period t; P(u_j) represents the probability that target user u_j logs in during the i-th time period t; and n is the number of all target users in the initial grouping. P(u_j) is calculated as follows:
total number of logins by the target user / total number of time periods.
Then, after the entropy values of all time periods have been calculated for all target users in the initial grouping, the time periods whose entropy value is greater than a threshold a are selected as the login time periods of the initial grouping.
Calculating the login time periods of the group set effectively reduces the number of time dimensions; for example, the 48 time periods per day in the preceding example may be reduced to 24 time periods.
The dimension reduction of the target user information is beneficial to reducing the calculation amount of subsequent operation.
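The entropy-based selection of time periods can be sketched as below. The entropy formula is only available as an image in this text, so the sketch assumes the usual form e_t = -Σ_j P(u_j) · log2 P(u_j) over the n users of the group, with 0 · log 0 taken as 0. The logarithm base, the function names, and the layout of the probability table are assumptions, and the probabilities are taken as given inputs.

```python
import math
from typing import List, Sequence


def period_entropy(probs: Sequence[float]) -> float:
    """Assumed entropy of one time period over all users in the group."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_periods(prob_by_user: Sequence[Sequence[float]], threshold_a: float) -> List[int]:
    """Return the indices of time periods whose entropy exceeds threshold a.

    prob_by_user[j][t] is the login probability of user j in period t.
    """
    n_periods = len(prob_by_user[0])
    kept = []
    for t in range(n_periods):
        e_t = period_entropy([row[t] for row in prob_by_user])
        if e_t > threshold_a:
            kept.append(t)
    return kept


# Two users, three periods: period 0 is uncertain, period 1 is never used,
# period 2 is always used (and therefore carries no information).
probabilities = [
    [0.5, 0.0, 1.0],
    [0.5, 0.0, 1.0],
]
login_periods = select_periods(probabilities, threshold_a=0.5)
```

Only the retained period indices need to be kept in each user's time vector for the later similarity calculations.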
In other embodiments of the present invention, after the login time periods of the group set have been calculated, the similarity between users in the group set is recalculated over the new login time periods. Users whose similarity is greater than a threshold b are kept, and users whose similarity is less than or equal to the threshold b are removed from the group set as discrete users. The removed discrete users can be treated as suspected abnormal login users, and other methods in the prior art can be used to detect whether they are abnormal users.
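A sketch of removing discrete users follows. The text does not state which similarity is compared with threshold b, so the sketch assumes each user's mean similarity to the other members of the group over the retained periods, with similarity taken as 1 minus the share of differing entries. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed similarity: 1 minus the share of entries where the records differ."""
    if not a:
        return 1.0
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split_discrete_users(
    vectors: Dict[str, Sequence[int]],
    kept_periods: Sequence[int],
    threshold_b: float,
) -> Tuple[List[str], List[str]]:
    """Split a group into (kept users, discrete users) on the reduced time vectors."""
    reduced = {u: [v[t] for t in kept_periods] for u, v in vectors.items()}
    kept, discrete = [], []
    for user, vec in reduced.items():
        others = [similarity(vec, w) for o, w in reduced.items() if o != user]
        score = sum(others) / len(others) if others else 1.0
        (kept if score > threshold_b else discrete).append(user)
    return kept, discrete


group = {
    "a": [1, 1, 0, 0],
    "b": [1, 1, 0, 1],
    "c": [0, 0, 1, 0],
}
kept_users, discrete_users = split_discrete_users(group, kept_periods=[0, 1, 2], threshold_b=0.4)
```

The users in `discrete_users` would then be handed to a separate detector, as the text suggests.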
The abnormal user identification method provided by the embodiment of the invention reduces the amount of calculation by reducing the dimension of the user information in the initial grouping, and improves the real-time performance of identification while preserving the effectiveness of identifying hidden abnormal users.
Based on any of the foregoing embodiments, fig. 2 is a structural diagram of an abnormal user identification apparatus according to an embodiment of the present invention, and as shown in fig. 2, the abnormal user identification apparatus according to the embodiment of the present invention includes:
a preliminary grouping module 201, configured to determine a central user from target users according to a similarity between the target users, and perform preliminary grouping on the target users according to a similarity between the central user and a user other than the central user in the target users;
a grouping adjustment module 202, configured to re-determine the central user according to the similarity between the users in the initial grouping, and re-group the target users according to the similarity between the re-determined central user and the users other than the re-determined central user in the target users;
a grouping determining module 203 for repeatedly executing the grouping adjusting step until the users included in each group do not change any more;
and the abnormal user identification module 204 is used for determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
The abnormal user identification device provided by the embodiment of the invention realizes user clustering through the similarity between target users, and finds out the group where the abnormal user is located by utilizing the discovered abnormal user based on the characteristic that the abnormal user has similar behavior tracks, thereby discovering more hidden abnormal users, and having the advantages of high identification efficiency and strong identification capability.
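The cooperation of the four modules can be sketched as a medoid-style loop over a precomputed similarity matrix. Two points are assumptions rather than statements of the patent: the new central user of a group is taken to be the member with the largest similarity sum inside that group, and a group is treated as abnormal when it contains at least `min_known` known abnormal users. The function names and `max_rounds` are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def assign(sim: Sequence[Sequence[float]], centers: Sequence[int]) -> Dict[int, List[int]]:
    """Put every non-central user into the group of its most similar central user."""
    groups: Dict[int, List[int]] = {c: [c] for c in centers}
    for u in range(len(sim)):
        if u in groups:
            continue
        best = max(centers, key=lambda c: sim[u][c])
        groups[best].append(u)
    return groups


def recenter(sim: Sequence[Sequence[float]], members: Sequence[int]) -> int:
    """Assumed rule: the member most similar to the rest of its group becomes central."""
    return max(members, key=lambda u: sum(sim[u][v] for v in members if v != u))


def cluster(sim: Sequence[Sequence[float]], centers: Sequence[int], max_rounds: int = 100) -> Dict[int, List[int]]:
    """Repeat the grouping adjustment until the membership of every group is stable."""
    groups = assign(sim, centers)
    for _ in range(max_rounds):
        new_groups = assign(sim, [recenter(sim, m) for m in groups.values()])
        stable = {frozenset(m) for m in new_groups.values()} == {frozenset(m) for m in groups.values()}
        groups = new_groups
        if stable:
            break
    return groups


def abnormal_users(groups: Dict[int, List[int]], known_abnormal: Set[int], min_known: int) -> Set[int]:
    """Flag every member of a group holding at least `min_known` known abnormal users."""
    flagged: Set[int] = set()
    for members in groups.values():
        if len(known_abnormal.intersection(members)) >= min_known:
            flagged.update(members)
    return flagged


sim = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.9, 0.2, 0.1],
    [0.8, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.9],
    [0.2, 0.1, 0.1, 0.9, 1.0],
]
groups = cluster(sim, centers=[0, 3])
flagged = abnormal_users(groups, known_abnormal={3}, min_known=1)
```

In this toy matrix users 0 to 2 and users 3 to 4 form two groups, so knowing that user 3 is abnormal also flags user 4.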
Fig. 3 illustrates a physical structure diagram of an electronic device. As shown in fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330, and a communication bus 340, wherein the processor 310, the communication interface 320, and the memory 330 communicate with each other via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform the following method: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and users except the central user in the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users; a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step, namely determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.
It should be noted that, when being implemented specifically, the electronic device in this embodiment may be a server, a PC, or other devices, as long as the structure includes the processor 310, the communication interface 320, the memory 330, and the communication bus 340 shown in fig. 3, where the processor 310, the communication interface 320, and the memory 330 complete mutual communication through the communication bus 340, and the processor 310 may call the logic instruction in the memory 330 to execute the above method. The embodiment does not limit the specific implementation form of the electronic device.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Further, embodiments of the present invention disclose a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above method embodiments, for example: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and users except the central user in the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users; a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step, namely determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method provided by the foregoing embodiments, for example: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and users except the central user in the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users; a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step, namely determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. An abnormal user identification method is characterized by comprising the following steps:
an initial grouping step, determining a central user from the target users according to the similarity between the target users, calculating the similarity between the target users which are not determined as the central users and the central users of each group, and distributing the target users which are not determined as the central users to the group in which the central user with the highest similarity is located, wherein the group is used as an initial group, and the similarity comprises one or more of the following similarities: similarity in time dimension, similarity on user login platform and similarity on user login device; the plurality of target users comprise abnormal users which are identified, normal users and hidden abnormal users;
a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users;
a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group do not change any more;
an abnormal user identification step, namely determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users;
and the similarity between the target users is the sum of the similarities between any user in the target users and all users except the user in the target users.
2. The abnormal user identification method according to claim 1, further comprising, before the initial grouping step:
calculating the similarity between any user in the target users and one user except the any user in the target users;
summing the similarity between any user in the target users and all users except any user in the target users to obtain the sum of the similarity between any user in the target users and all users except any user in the target users.
3. The abnormal user identification method according to claim 1, wherein the determining a center user from the target users according to the similarity between the target users comprises:
determining a first user according to the maximum similarity between the target users;
determining the first user as a central user;
determining a second user according to a minimum similarity between a non-central user and the central user in the target users and a preset first threshold;
and determining the second user as a new central user, and returning to the step of determining the second user according to the minimum value of the similarity between the non-central user and the central user in the target users and a preset first threshold value, and repeating the steps until the number of the central users reaches a preset number threshold value.
4. The abnormal user identification method according to claim 3, wherein the determining the second user according to the minimum value of the similarity between the non-central user of the target user and the central user and a preset first threshold value comprises:
calculating the minimum similarity between the non-central user of the target user and the central user;
when the sum of the minimum similarity values between the first n non-central users of the target users and the central user is smaller than a first threshold value, and the sum of the minimum similarity values between the first n +1 non-central users of the target users and the central user is not smaller than the first threshold value, determining the n +1 non-central user as the second user;
wherein the first threshold is a random value between 0 and a first similarity sum, and the first similarity sum is a similarity minimum sum between all non-central users of the target users and the central user; n is a positive integer.
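The central-user selection of claims 3 and 4 can be read as a roulette-wheel draw, sketched below under stated assumptions. The first central user is the one with the largest similarity sum. Each further central user is found by accumulating, over the non-central users in index order, their minimum similarity to the existing central users until the running sum is no longer smaller than a random threshold. The function name and the `rng` parameter are hypothetical, and the sketch also allows the very first candidate to be chosen, whereas the claim states that n is a positive integer.

```python
import random
from typing import List, Optional, Sequence


def pick_centers(
    sim: Sequence[Sequence[float]],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Select k central users from a pairwise similarity matrix."""
    rng = rng or random.Random()
    n = len(sim)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of users")
    # First central user: largest sum of similarities with all other users.
    sums = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    centers = [max(range(n), key=sums.__getitem__)]
    while len(centers) < k:
        candidates = [u for u in range(n) if u not in centers]
        # Minimum similarity between each non-central user and the central users.
        min_sim = [min(sim[u][c] for c in centers) for u in candidates]
        # First threshold: random value between 0 and the sum of those minima.
        threshold = rng.uniform(0.0, sum(min_sim))
        running = 0.0
        chosen = candidates[-1]  # fallback in case rounding keeps the sum below the threshold
        for u, s in zip(candidates, min_sim):
            running += s
            if running >= threshold:
                chosen = u
                break
        centers.append(chosen)
    return centers


sim = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.9, 0.2, 0.1],
    [0.8, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.9],
    [0.2, 0.1, 0.1, 0.9, 1.0],
]
centers = pick_centers(sim, k=2, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the draw repeatable, which is convenient when comparing runs.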
5. The abnormal user identification method according to claim 2, wherein the calculating of the similarity between any one of the target users and one of the target users other than the any user comprises:
calculating the difference degree between a third user and a fourth user according to the login record of the third user and the login record of the fourth user;
the third user is any user in the target users, and the fourth user is one user except any user in the target users;
and calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user.
6. The abnormal user identification method according to claim 5, wherein the calculating the degree of difference between the third user and the fourth user according to the log-in record of the third user and the log-in record of the fourth user adopts the following formula:
Figure 914313DEST_PATH_IMAGE001
wherein,
Figure 192848DEST_PATH_IMAGE002
representing a degree of difference between the third user and the fourth user;
when the similarity includes a similarity in a time dimension, the parameter
Figure 606512DEST_PATH_IMAGE003
represents a record of the third user i logging in during a first time period;
Figure 991619DEST_PATH_IMAGE004
represents a record of the fourth user j logging in during the first time period;
when the similarity includes a similarity on a user login platform, the parameter
Figure 465325DEST_PATH_IMAGE003
represents a record of the third user i logging in on a first platform;
Figure 434418DEST_PATH_IMAGE004
represents a record of the fourth user j logging in on the first platform;
when the similarity includes a similarity on a user login device, the parameter
Figure 120615DEST_PATH_IMAGE003
represents a record of the third user i logging in on a first device;
Figure 124343DEST_PATH_IMAGE004
represents a record of the fourth user j logging in on the first device;
the calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user adopts the following formula:
Figure 768951DEST_PATH_IMAGE005
wherein,
Figure 225340DEST_PATH_IMAGE006
representing a similarity between the third user and the fourth user.
7. An abnormal user identification apparatus, comprising:
a preliminary grouping module, configured to determine a central user from the target users according to similarities between the target users, calculate similarities between the target users that are not determined as the central users and the central users of each group, and allocate the target users that are not determined as the central users to a group in which the central user with the highest similarity is located, where the group is used as an initial group, and the similarities include one or more of the following similarities: similarity in time dimension, similarity on user login platform and similarity on user login device; the plurality of target users comprise abnormal users which are identified, normal users and hidden abnormal users;
a grouping adjustment module, configured to re-determine the central user according to a similarity between users in the initial grouping, and re-group the target users according to a similarity between the re-determined central user and a user other than the re-determined central user among the target users;
a grouping determining module for repeatedly executing the grouping adjusting step until the users contained in each group do not change any more;
the abnormal user identification module is used for determining abnormal groups according to the number of the known abnormal users contained in each group and determining the users in the abnormal groups as the abnormal users;
and the similarity between the target users is the sum of the similarities between any one of the target users and all the users except the any one of the target users.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method for abnormal user identification according to any of claims 1 to 6.
9. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for abnormal user identification according to any one of claims 1 to 6.
CN202010351557.5A 2020-04-28 2020-04-28 Abnormal user identification method and device, electronic equipment and storage medium Active CN111586001B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010351557.5A CN111586001B (en) 2020-04-28 2020-04-28 Abnormal user identification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111586001A CN111586001A (en) 2020-08-25
CN111586001B true CN111586001B (en) 2022-11-22

Family

ID=72120084

Country Status (1)

Country Link
CN (1) CN111586001B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163096A (en) * 2020-09-18 2021-01-01 中国建设银行股份有限公司 Malicious group determination method and device, electronic equipment and storage medium
CN112488175B (en) * 2020-11-26 2023-06-23 中孚安全技术有限公司 Abnormal user detection method based on behavior aggregation characteristics, terminal and storage medium
CN113521749B (en) * 2021-07-15 2024-02-13 珠海金山数字网络科技有限公司 Abnormal account detection model training method and abnormal account detection method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3003779A1 (en) * 2017-05-05 2018-11-05 Servicenow, Inc. Identifying clusters for service management operations
CN109873832A (en) * 2019-03-15 2019-06-11 北京三快在线科技有限公司 Method for recognizing flux, device, electronic equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107579956B (en) * 2017-08-07 2021-05-11 奇安信科技集团股份有限公司 User behavior detection method and device
CN107730271B (en) * 2017-09-20 2021-09-17 北京奇艺世纪科技有限公司 Similar user determination method and device based on virtual interaction object and electronic equipment
CN110876072B (en) * 2018-08-31 2022-02-08 武汉斗鱼网络科技有限公司 Batch registered user identification method, storage medium, electronic device and system
CN109873812B (en) * 2019-01-28 2020-06-23 腾讯科技(深圳)有限公司 Anomaly detection method and device and computer equipment
CN110225036B (en) * 2019-06-12 2022-03-22 北京奇艺世纪科技有限公司 Account detection method, device, server and storage medium
CN110309424A (en) * 2019-07-04 2019-10-08 东北大学 A kind of socialization recommended method based on Rough clustering
CN110532429B (en) * 2019-09-04 2021-05-11 重庆邮电大学 Online user group classification method and device based on clustering and association rules
CN110706092B (en) * 2019-09-23 2021-05-18 前海飞算科技(深圳)有限公司 Risk user identification method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant