CN114037514A - Method, device, equipment and storage medium for detecting fraud risk of user group - Google Patents


Info

Publication number
CN114037514A
Authority
CN
China
Prior art keywords
community
user
fraud
detected
user node
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111321581.5A
Other languages
Chinese (zh)
Inventor
陈丽娜
咸瑞
化国伟
彭超
陈小军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhenai Jieyun Information Technology Co ltd
Original Assignee
Shenzhen Zhenai Jieyun Information Technology Co ltd
Application filed by Shenzhen Zhenai Jieyun Information Technology Co ltd
Priority to CN202111321581.5A
Publication of CN114037514A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention provides a method, device, equipment and storage medium for detecting the fraud risk of a user group. The method comprises: acquiring a user information set of a target user group and constructing an information relationship network graph of the target user group based on the user information set; dividing the information relationship network graph into communities using the Louvain algorithm; screening out communities to be detected from the resulting communities based on preset screening conditions; performing fraudulent-user verification on each user node to be detected in each community to be detected, so as to screen out fraud risk communities from the communities to be detected; calculating the risk value of each non-fraudulent user node in each fraud risk community; and visually displaying the fraud risk communities and the risk values on a web page. Because the target user group is divided into communities and screened before group fraud risk detection is performed, the volume of data processed during detection is reduced, detection efficiency is improved, and the detection results are visualized.

Description

Method, device, equipment and storage medium for detecting fraud risk of user group
Technical Field
The present invention relates to the field of communications technologies, and in particular to a method, apparatus, device and storage medium for detecting the fraud risk of a user group.
Background
With the development of the internet, internet technologies have been increasingly combined with service industries, giving rise to a variety of internet services, among which internet financial services have grown particularly quickly. With the spread of the mobile internet, internet finance has brought great convenience to users' lives. As internet finance continues to develop, however, fraudsters have gradually shifted from individual fraud to organized group fraud. Existing methods for detecting user fraud risk can only handle individual users or small batches of users; when the user group to be checked is large, these methods are generally slow and struggle to meet the need for rapid fraud risk detection over large batches of users. The prior art therefore still needs improvement and development.
Disclosure of Invention
The main objective of the invention is to solve the technical problems that conventional user fraud risk detection methods are difficult to apply to fraud risk detection for a user group, and that processing efficiency is low because of the large volume of data that must be processed.
The first aspect of the present invention provides a method for detecting a fraud risk of a user group, including:
acquiring a user information set of a target user group, and constructing an information relationship network graph of the target user group based on the user information set;
dividing the information relationship network graph into communities using the Louvain algorithm;
screening out communities to be detected from the plurality of divided communities based on preset screening conditions;
performing fraudulent-user verification on each user node to be detected in each community to be detected, so as to screen out, from the communities to be detected, fraud risk communities containing fraudulent user nodes;
calculating a risk value of each non-fraudulent user node in the fraud risk community;
and visually displaying the fraud risk community and the risk value in a webpage form.
In an optional implementation of the first aspect of the present invention, calculating the risk value of each non-fraudulent user node in the fraud risk community includes:
for each user node to be detected, determining whether it is a fraudulent user node; if so, setting the risk value of that fraudulent user node to 1; if not, determining that it is a non-fraudulent user node;
obtaining at least one fraudulent user node in the fraud risk community that has a direct link relationship with the non-fraudulent user node;
and calculating the association value between the non-fraudulent user node and each of the at least one fraudulent user node, and summing these association values to obtain the fraud risk value of the non-fraudulent user node.
In an optional implementation of the first aspect of the present invention, the association value is calculated as:
Figure BDA0003345744030000021
wherein d is a damping coefficient, r(a) is the association value of any non-fraudulent user node a, PR(T) is the PageRank value of any fraudulent user node T in the fraud risk community that has a direct link relationship with node a, and L(T) is the number of links of the fraudulent user node T.
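The association formula itself survives only as an image reference in this copy. The sketch below shows one way the described computation could look. It assumes (this is an assumption, not the published formula) that the association value contributed by a directly linked fraudulent node T is the PageRank-style term d·PR(T)/L(T), with PR computed by ordinary power iteration over the fraud risk community treated as an undirected graph. The function names `pagerank` and `risk_values` are hypothetical.

```python
from typing import Dict, Set


def pagerank(adj: Dict[str, Set[str]], d: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power-iteration PageRank on a symmetric adjacency dict {node: set_of_neighbours}."""
    nodes = list(adj)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Each node gets a uniform share (1 - d) / n plus d * PR(u) / L(u) from every neighbour u
        new = {v: (1 - d) / n + d * sum(pr[u] / len(adj[u]) for u in adj[v])
               for v in nodes}
        converged = sum(abs(new[v] - pr[v]) for v in nodes) < tol
        pr = new
        if converged:
            break
    return pr


def risk_values(adj: Dict[str, Set[str]], fraud: Set[str],
                d: float = 0.85) -> Dict[str, float]:
    """Risk 1 for fraudulent nodes; otherwise the sum of association values over directly
    linked fraudulent nodes T, ASSUMED here to be d * PR(T) / L(T) each."""
    pr = pagerank(adj, d)
    risk = {}
    for v in adj:
        if v in fraud:
            risk[v] = 1.0
        else:
            risk[v] = sum(d * pr[t] / len(adj[t]) for t in adj[v] if t in fraud)
    return risk


if __name__ == "__main__":
    # Hypothetical fraud risk community: T is fraudulent, A and B link to T, C links only to A
    community = {"T": {"A", "B"}, "A": {"T", "C"}, "B": {"T"}, "C": {"A"}}
    print(risk_values(community, fraud={"T"}))
```

In this example A and B each sit next to the single fraudulent node T and receive the same risk value, while C, which has no direct link to T, gets 0. The one-level indirect variant described under S500 would give C a non-zero value as well.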
In an optional implementation of the first aspect of the present invention, screening out communities to be detected from the plurality of divided communities based on preset screening conditions includes:
setting a plurality of community screening conditions based on the number of user nodes to be detected, the types of node relationship edges, the number of node relationship edges, and the weights of the node relationship edges;
and, for any community among the plurality of communities, determining that the community is a community to be detected if it meets at least one community screening condition.
In an optional implementation of the first aspect of the present invention, performing fraudulent-user verification on each user node to be detected in each community to be detected, so as to screen out fraud risk communities containing fraudulent user nodes from the communities to be detected, includes:
acquiring the user information of the user nodes to be detected in each community to be detected;
for each user node to be detected, extracting its verification status from its user information and determining whether the verification status is the restricted state; if so, determining that the user node is a fraudulent user node;
and, for each community to be detected, determining that it is a fraud risk community if it contains a fraudulent user node.
In an optional implementation of the first aspect of the present invention, the node relationship edge types include: shared device, shared payment account, shared WiFi, shared GPS location, shared login IP, shared registration IP, and shared registration mobile phone number.
In an optional implementation of the first aspect of the present invention, the user information set includes at least one piece of user information, and the user information includes login information, transaction information, and usage behavior information.
A second aspect of the present invention provides a device for detecting a fraud risk of a user group, including:
a construction module, configured to acquire a user information set of a target user group and construct an information relationship network graph of the target user group based on the user information set;
a division module, configured to divide the information relationship network graph into communities using the Louvain algorithm;
a screening module, configured to screen out communities to be detected from the plurality of divided communities based on preset screening conditions;
a verification module, configured to perform fraudulent-user verification on each user node to be detected in each community to be detected, so as to screen out, from the communities to be detected, fraud risk communities containing fraudulent user nodes;
a calculation module, configured to calculate the risk value of each non-fraudulent user node in the fraud risk community;
and a display module, configured to visually display the fraud risk community and the risk values on a web page.
A third aspect of the present invention provides equipment for detecting the fraud risk of a user group, comprising a memory storing instructions and at least one processor, the memory and the at least one processor being interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the fraud risk detection equipment to perform the method for detecting the fraud risk of a user group according to any one of the above.
A fourth aspect of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for detecting the fraud risk of a user group according to any one of the above.
In this embodiment, when the method for detecting the fraud risk of a user group detects the fraud risk of a target user group, it uses knowledge graph technology: an information relationship network graph of the target user group is built from the user information set, the Louvain algorithm divides this graph into communities, preset screening conditions select from the divided communities those communities to be detected that are meaningful for detection, and subsequent fraud risk detection is performed only on those communities. This reduces the amount of data to be processed and thereby improves fraud risk detection efficiency.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a method for detecting a fraud risk of a user group according to the present invention;
FIG. 2 is a schematic diagram of an embodiment of an apparatus for detecting fraud risk of a user group according to the present invention;
FIG. 3 is a schematic diagram of an embodiment of a device for detecting fraud risk of a user group according to the present invention.
Detailed Description
The embodiment of the invention provides a method, a device, equipment and a storage medium for detecting fraud risk of a user group.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, the specific flow of an embodiment of the present invention is described below. Referring to FIG. 1, a first aspect of the present invention provides a method for detecting the fraud risk of a user group, including:
S100: acquire a user information set of a target user group, and construct an information relationship network graph of the target user group based on the user information set. The user information set includes at least one piece of user information, and the user information includes login information, transaction information, and usage behavior information. The login information specifically includes the login user name, login password, login terminal name and model, and login IP address; the transaction information includes, for example, the transaction account number, transaction amount, transaction time, transaction location, and transaction type; the usage behavior information includes, for example, shared devices, shared WiFi, shared login IPs, and shared GPS locations. When the information relationship network graph is constructed, node connections between users are established from their login information, transaction information, and usage behavior information: if at least one item of the login information, transaction information, or usage behavior information of two compared users is the same, a node connection is established between the two users.
In step S100, an exemplary process of constructing the information relationship network graph of the target user group from the user information set is as follows: user data and relationship data (that is, login information, transaction information, usage behavior information, and so on) are acquired from the user information; the user data and relationship data tables are converted into a suitable data format; the converted data are imported into a Neo4j database; and the information relationship network graph is built in Neo4j. Neo4j is a high-performance NoSQL graph database that stores structured data in a network (graph) rather than in tables. It is a disk-based, embedded Java persistence engine with full transactional support, and it can also be regarded as a high-performance graph engine with all the features of a mature database, in which programmers work with a flexible, object-oriented network structure rather than strict, static tables.
In the information relationship network graph built in Neo4j, each node represents a user, and if a relationship exists between two users, their nodes are connected by a relationship edge. For example, if user A connects to a WiFi network at a certain location at a first time and user B connects to the same WiFi network at that location at a second time, users A and B can be determined to have a shared-WiFi relationship; A and B are then connected by a relationship edge, and the shared-WiFi relationship information is attached to that edge.
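The connection rule of S100 (link two users whenever at least one item of their login, transaction or usage behavior information matches) can be sketched without a graph database. The sketch assumes each user's information has already been flattened into a dict of `{relation_type: value}`; that record layout and the function name `build_relationship_edges` are hypothetical, and real data (for example several login IPs per user) would need lists of values. Indexing users by `(relation_type, value)` avoids comparing every pair of users.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def build_relationship_edges(
    users: Dict[str, Dict[str, str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Return {(u, v): shared relation types} for every pair of users sharing a value."""
    # Group users by identical (relation_type, value), e.g. ("wifi", "cafe-ap-01")
    index = defaultdict(list)
    for user_id, attrs in users.items():
        for rel_type, value in attrs.items():
            if value is not None:
                index[(rel_type, value)].append(user_id)
    # Every pair inside a group gets an edge labelled with that relation type
    edges = defaultdict(set)
    for (rel_type, _value), members in index.items():
        for u, v in combinations(sorted(members), 2):
            edges[(u, v)].add(rel_type)
    return dict(edges)


if __name__ == "__main__":
    users = {
        "A": {"wifi": "w1", "device": "d1"},
        "B": {"wifi": "w1", "device": "d2"},
        "C": {"device": "d1"},
        "D": {"wifi": "w2"},
    }
    print(build_relationship_edges(users))
    # -> {('A', 'B'): {'wifi'}, ('A', 'C'): {'device'}}
```

The resulting edge dict (users as nodes, shared relation types as edge labels) is the kind of data that would then be imported into the graph database.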
S200: divide the information relationship network graph into communities using the Louvain algorithm.
The Louvain algorithm is a modularity-based community detection algorithm. It is efficient and effective, can discover hierarchical community structures, and its optimization goal is to maximize the modularity of the community structure of the whole graph. Using the Louvain algorithm to divide the information relationship network graph into communities therefore improves division efficiency.
In step S200, dividing the information relationship network graph into communities using the Louvain algorithm specifically includes:
Step one: pre-calculate the modularity for each user node to be detected in the information relationship network graph.
Modularity is a measure for evaluating the quality of a community division of a network. Physically, it is the difference between the number of edges inside communities and the number expected if edges were placed at random; its value lies in [-1/2, 1). Modularity is defined as follows:
$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$$
$$\delta(u, v) = \begin{cases} 1, & u = v \\ 0, & \text{otherwise} \end{cases}$$
where $A_{ij}$ is the weight of the edge between node $i$ and node $j$ (if the network is unweighted, all edge weights can be taken as 1); $k_i = \sum_j A_{ij}$ is the sum of the weights of all edges attached to node $i$ (its degree); $c_i$ is the community to which node $i$ belongs; and $m = \frac{1}{2}\sum_{ij} A_{ij}$ is the sum of all edge weights (the number of edges).
In the term $A_{ij} - \frac{k_i k_j}{2m} = A_{ij} - k_i \cdot \frac{k_j}{2m}$, the probability that an edge end attaches to node $j$ is $\frac{k_j}{2m}$ and node $i$ has $k_i$ edge ends, so under random connection the expected number of edges between nodes $i$ and $j$ is $k_i \cdot \frac{k_j}{2m}$.
The formula definition of modularity can be simplified as follows:
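$$Q = \frac{1}{2m}\sum_{c}\left[\Sigma_{in} - \frac{(\Sigma_{tot})^2}{2m}\right]$$
where $\Sigma_{in}$ is the sum of the weights of the edges inside community $c$, with each internal edge counted from both endpoints (that is, $\Sigma_{in} = \sum_{i,j \in c} A_{ij}$), and $\Sigma_{tot}$ is the sum of the weights of the edges attached to nodes in community $c$ (that is, $\Sigma_{tot} = \sum_{i \in c} k_i$).
The above equation can be further simplified as:
$$Q = \sum_{c}\left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2\right]$$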
Modularity can thus be understood community by community as the normalized weight of the edges inside the community minus the normalized, squared total weight of all edges attached to the community's nodes; for an undirected graph, this compares the internal degree of the community with the total degree of its nodes. Modularity-based community detection algorithms aim to maximize the modularity Q.
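To make the two forms of Q concrete, the following sketch computes modularity both from the pairwise definition and from the per-community form, for an undirected weighted graph without self-loops given as `{(i, j): weight}` with each pair listed once. The function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Tuple

Edges = Dict[Tuple[int, int], float]


def modularity_pairwise(edges: Edges, community: Dict[int, int]) -> float:
    """Q = 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    A = defaultdict(float)
    k = defaultdict(float)
    for (i, j), w in edges.items():
        A[(i, j)] += w
        A[(j, i)] += w
        k[i] += w
        k[j] += w
    m = sum(edges.values())
    total = 0.0
    for i in k:
        for j in k:
            if community[i] == community[j]:
                total += A.get((i, j), 0.0) - k[i] * k[j] / (2 * m)
    return total / (2 * m)


def modularity_by_community(edges: Edges, community: Dict[int, int]) -> float:
    """Q = sum_c [Sigma_in / (2m) - (Sigma_tot / (2m))^2]."""
    m = sum(edges.values())
    sigma_in = defaultdict(float)   # sum of A_ij over i, j in c: each internal edge counted twice
    sigma_tot = defaultdict(float)  # sum of degrees k_i of the nodes in c
    for (i, j), w in edges.items():
        sigma_tot[community[i]] += w
        sigma_tot[community[j]] += w
        if community[i] == community[j]:
            sigma_in[community[i]] += 2 * w
    return sum(sigma_in[c] / (2 * m) - (sigma_tot[c] / (2 * m)) ** 2 for c in sigma_tot)


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3)
    edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (3, 4): 1, (3, 5): 1, (4, 5): 1, (2, 3): 1}
    split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    # Both forms give 5/14, roughly 0.357
    print(modularity_pairwise(edges, split), modularity_by_community(edges, split))
```

Putting every node into a single community gives Q = 0 in this example, which illustrates why maximizing Q favours splits along sparse connections.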
Step two: take any user node to be detected in the information relationship network graph as an example, and let it initially form a community by itself. Then, for each neighboring user node to be detected, calculate the modularity gain that adding that neighbor to this community would bring. If these gains are not all zero, select, from all neighboring user nodes to be detected, the one that brings the largest modularity gain and add it to the community.
Step three: after step two has been performed for all user nodes to be detected in the information relationship network graph, treat each resulting community as a user node to be detected for the next round, and repeat steps one and two.
In short, the idea of the Louvain algorithm is simple:
1) treat each user node to be detected in the information relationship network graph as an independent community, so that initially the number of communities equals the number of nodes;
2) for each user node i to be detected, try assigning it in turn to the community of each of its neighbors, compute the modularity change ΔQ before and after each assignment, and record the neighbor with the largest ΔQ; if max ΔQ > 0, assign node i to the community of that neighbor, otherwise leave it unchanged;
3) repeat 2) until the community membership of all nodes no longer changes;
4) compress the graph: merge all nodes of each community into a new node, turn the weights of the edges inside a community into the weight of a self-loop on the new node, and turn the weights of the edges between communities into the weights of the edges between the new nodes;
5) repeat from 1) until the modularity of the whole graph no longer changes.
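The five steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the graph is an undirected weighted edge dict without self-loops (each pair listed once), nodes are visited in insertion order, and a node moves only when the move strictly increases modularity. The move gain is the standard Louvain quantity k_{i,in}/m − Σ_tot·k_i/(2m²), compared here up to the common factor 1/m. For real workloads, NetworkX (2.8 and later) provides `networkx.community.louvain_communities`.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Adj = Dict[Hashable, Dict[Hashable, float]]


def local_moving(adj: Adj, m: float):
    """Steps 1-3: start from singleton communities and move nodes while modularity improves."""
    comm = {v: v for v in adj}                       # step 1: one community per node
    k = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    tot = dict(k)                                    # Sigma_tot of each community
    moved_any, improved = False, True
    while improved:                                  # step 3: repeat until nothing moves
        improved = False
        for i in adj:                                # step 2: try every node
            ci = comm[i]
            links = defaultdict(float)               # k_{i,in} towards each neighbouring community
            for j, w in adj[i].items():
                if j != i:
                    links[comm[j]] += w
            tot[ci] -= k[i]                          # take i out of its own community
            best_c = ci
            best_gain = links.get(ci, 0.0) - tot[ci] * k[i] / (2 * m)
            for c, k_i_in in links.items():
                gain = k_i_in - tot[c] * k[i] / (2 * m)   # m * Delta Q of joining c
                if gain > best_gain + 1e-12:
                    best_c, best_gain = c, gain
            tot[best_c] += k[i]
            if best_c != ci:
                comm[i] = best_c
                improved = moved_any = True
    return comm, moved_any


def aggregate(adj: Adj, comm) -> Adj:
    """Step 4: one node per community; internal weight becomes a self-loop A_CC."""
    new_adj = defaultdict(lambda: defaultdict(float))
    for i, nbrs in adj.items():
        for j, w in nbrs.items():
            new_adj[comm[i]][comm[j]] += w
    return {c: dict(nbrs) for c, nbrs in new_adj.items()}


def louvain(edges: Dict[Tuple[Hashable, Hashable], float]) -> Dict[Hashable, Hashable]:
    """Step 5: alternate local moving and aggregation until no node moves."""
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    adj = dict(adj)
    m = float(sum(edges.values()))
    membership = {v: v for v in adj}                 # original node -> current super-node
    if m == 0:
        return membership
    while True:
        comm, moved = local_moving(adj, m)
        if not moved:
            return membership
        membership = {v: comm[s] for v, s in membership.items()}
        adj = aggregate(adj, comm)


if __name__ == "__main__":
    edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (3, 4): 1, (3, 5): 1, (4, 5): 1, (2, 3): 1}
    print(louvain(edges))  # nodes 0-2 share one label, nodes 3-5 another
```

On the two-triangle example, the first pass merges each triangle into one community; on the aggregated two-node graph no move has a positive gain, so the loop stops, matching the stopping rule in 5).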
As this flow shows, the algorithm produces a hierarchical community structure. Most of the computation time is spent on the community division of the bottom layer; once nodes have been compressed by community, the numbers of nodes and edges drop sharply. In addition, the modularity change when node i is assigned to the community of a neighbor j depends only on the communities of i and j and not on any other community, so it is fast to compute. Following the original Louvain paper, the modularity change when node i is assigned to the community c of its neighbor j is:
$$\Delta Q = \left[\frac{\Sigma_{in} + 2k_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + k_i}{2m}\right)^2\right] - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2 - \left(\frac{k_i}{2m}\right)^2\right] \tag{1}$$
where $\Sigma_{in}$ and $\Sigma_{tot}$ refer to community $c$, and $k_{i,in}$ is the sum of the weights of the edges between node $i$ and the nodes of community $c$. Note that $k_{i,in}$ is multiplied by 2, because once node $i$ joins $c$ each of these edges becomes an internal edge counted from both endpoints in $\Sigma_{in}$. $\Delta Q$ has two parts: the first bracket is the modularity contribution of community $c$ after node $i$ has joined it, and the second is the contribution of node $i$ as an independent community together with that of community $c$. This $\Delta Q$ does not include the step of removing node $i$ from its original community, which becomes relevant in later iterations, when the community containing node $i$ may already contain several nodes.
Further, the modularity change can be simplified in implementation: expanding formula (1), several terms cancel, leaving
$$\Delta Q = \frac{k_{i,in}}{m} - \frac{\Sigma_{tot}\, k_i}{2m^2}$$
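As a sanity check on the full and simplified expressions for ΔQ, the sketch below evaluates both for a small example and compares them with the modularity difference computed directly from the per-community form of Q. The example graph and all function names are illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum_c [Sigma_in/(2m) - (Sigma_tot/(2m))^2]; edges: {(i, j): w}, each pair once."""
    m = sum(edges.values())
    s_in, s_tot = defaultdict(float), defaultdict(float)
    for (i, j), w in edges.items():
        s_tot[community[i]] += w
        s_tot[community[j]] += w
        if community[i] == community[j]:
            s_in[community[i]] += 2 * w
    return sum(s_in[c] / (2 * m) - (s_tot[c] / (2 * m)) ** 2 for c in s_tot)


def delta_q_full(s_in, s_tot, k_i_in, k_i, m):
    """Full form: isolated node i joins community c."""
    after = (s_in + 2 * k_i_in) / (2 * m) - ((s_tot + k_i) / (2 * m)) ** 2
    before = s_in / (2 * m) - (s_tot / (2 * m)) ** 2 - (k_i / (2 * m)) ** 2
    return after - before


def delta_q_simplified(s_tot, k_i_in, k_i, m):
    """Expanded and cancelled form: k_i_in / m - s_tot * k_i / (2 m^2)."""
    return k_i_in / m - s_tot * k_i / (2 * m ** 2)


if __name__ == "__main__":
    edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (3, 4): 1, (3, 5): 1, (4, 5): 1, (2, 3): 1}
    before = {0: "c", 1: "c", 2: "c", 3: "i", 4: "x", 5: "x"}   # node 3 on its own
    after = {**before, 3: "c"}                                  # node 3 joins c = {0, 1, 2}
    # Community c: Sigma_in = 6, Sigma_tot = 7; node 3: k_i = 3, k_{i,in} = 1 (edge 2-3); m = 7
    print(modularity(edges, after) - modularity(edges, before),
          delta_q_full(6, 7, 1, 3, 7),
          delta_q_simplified(7, 1, 3, 7))   # all three equal -1/14
```

The negative value means that node 3 would lower modularity by joining {0, 1, 2}, so step 2) would not make that particular move.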
S300: screen out communities to be detected from the plurality of divided communities based on preset screening conditions. The main purpose of step S300 is to select, from the many communities, those that are meaningful for detection. Screening out communities to be detected from the divided communities based on preset screening conditions includes:
setting a plurality of community screening conditions based on the number of user nodes to be detected, the types of node relationship edges, the number of node relationship edges, and the weights of the node relationship edges; and, for any community among the plurality of communities, determining that it is a community to be detected if it meets at least one community screening condition.
When community screening conditions are set, the conditions on the number of user nodes to be detected, the types of node relationship edges (shared device, shared payment account, shared WiFi, shared GPS location, shared login IP, shared registration IP, and shared registration mobile phone number), the number of node relationship edges, and the weights of the node relationship edges can be used individually or in any combination.
When used individually, for example, a number of user nodes to be detected in a community greater than a first threshold (e.g., 20) serves as a first community screening condition, a number of node relationship edges in the community greater than a second threshold (e.g., 5) serves as a second community screening condition, and so on.
When several screening conditions are combined, referring to Table 1, all relationship edge types to be used are first listed in a table and a relationship edge weight is configured for each type. For example, the weight of the "user shared device" edge type is set to 2, that of "user shared payment account" to 2, that of "user shared WiFi" to 1, and that of "user shared registration mobile phone number" to 2. An edge type that is strongly correlated with time, such as "user shared GPS", can additionally be combined with the time interval between occurrences of the GPS address (within 7 days): for example, the edge weight is set to 0.7 when users share a GPS location with a time interval < 12 h, to 0.6 when the interval is < 24 h, and so on. An exemplary community screening condition table of the invention is as follows:
Table 1: Exemplary community screening conditions (reproduced as images in the original publication)
Further, the relationship edge weight configured for each edge type in the technical scheme of the present invention can be adjusted to the actual application scenario, because fraud takes different forms in different scenarios and the same edge type can therefore contribute differently to fraud. For example, in a telephone fraud scenario, the "user shared mobile phone number" edge type can be given a higher weight because it contributes more to fraud, whereas in a payment fraud scenario the "user shared payment account" edge type can be given a higher weight because it contributes more to fraud there.
In step S300, because user nodes to be detected are linked by multiple kinds of relationships, different community screening conditions are set according to the importance of each relationship in fraud, so as to mine communities that are meaningful for fraud detection. Small communities are very unlikely to contain potential fraud nodes and do not fit the basic definition of a fraud group; the invention therefore sets at least one community screening condition by jointly considering the weight design of the different relationships, the relationship composition, and the community scale. Only communities that meet one or more of these conditions undergo subsequent detection and analysis, which greatly reduces the amount of data processed during detection.
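A sketch of the screening step, assuming a community is given as a list of relationship edges `(user_u, user_v, edge_type, gps_interval_hours)`. The node and edge thresholds (20 and 5) and the per-type weights (2, 2, 1, 2, and 0.7/0.6 for shared GPS within 12 h/24 h) are the example values from the description. The total-weight threshold of 6.0, the zero weight for unlisted types and for GPS intervals of 24 h or more, and all identifiers are hypothetical, since Table 1 is not reproduced in this copy.

```python
from typing import List, Optional, Tuple

# (user_u, user_v, edge_type, gps_interval_hours or None)
Edge = Tuple[str, str, str, Optional[float]]

# Example weights from the description; unlisted types default to 0 (assumption)
TYPE_WEIGHTS = {
    "shared_device": 2.0,
    "shared_payment_account": 2.0,
    "shared_wifi": 1.0,
    "shared_registration_phone": 2.0,
}
MIN_NODES = 20          # first threshold (example value)
MIN_EDGES = 5           # second threshold (example value)
MIN_TOTAL_WEIGHT = 6.0  # hypothetical combined-condition threshold


def edge_weight(edge_type: str, gps_interval_hours: Optional[float] = None) -> float:
    if edge_type == "shared_gps":
        # Time-dependent weight: < 12 h -> 0.7, < 24 h -> 0.6, otherwise 0 (assumed)
        if gps_interval_hours is None:
            return 0.0
        if gps_interval_hours < 12:
            return 0.7
        if gps_interval_hours < 24:
            return 0.6
        return 0.0
    return TYPE_WEIGHTS.get(edge_type, 0.0)


def is_community_to_detect(edges: List[Edge]) -> bool:
    """A community qualifies if it meets at least one screening condition."""
    nodes = {u for u, _, _, _ in edges} | {v for _, v, _, _ in edges}
    total_weight = sum(edge_weight(t, h) for _, _, t, h in edges)
    conditions = (
        len(nodes) > MIN_NODES,
        len(edges) > MIN_EDGES,
        total_weight >= MIN_TOTAL_WEIGHT,
    )
    return any(conditions)


if __name__ == "__main__":
    community = [("a", "b", "shared_device", None), ("b", "c", "shared_gps", 10.0)]
    print(is_community_to_detect(community))  # weight 2.7, 3 nodes, 2 edges -> False
```

Keeping the weights in a table such as `TYPE_WEIGHTS` mirrors the scenario-specific tuning described above: a telephone-fraud or payment-fraud deployment would only change the table values.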
S400: perform fraudulent-user verification on each user node to be detected in each community to be detected, so as to screen out, from the communities to be detected, fraud risk communities containing fraudulent user nodes.
In this step, to perform fraudulent-user verification on each user node to be detected and screen out the fraud risk communities, the user information (including user data and relationship data) of the user nodes to be detected in each community to be detected is first acquired. Specifically, the user data in the user information are used, namely the user status attribute, which takes one of three states: restricted, normal, or deregistered.
For each user node to be detected, its verification status is extracted from its user information, and it is determined whether the status is the restricted state; if so, the user node is determined to be a fraudulent user node. Specifically, the keyword "verifystatus" is searched for in the text file of the user data, the current status attribute of the user node is read after the keyword, and if the status is "limited", the user node is determined to be a fraudulent user node.
For each community to be detected, if it contains a fraudulent user node, it is determined to be a fraud risk community. In this embodiment, the communities to be detected are thus screened further by checking whether they contain fraudulent user nodes (a community may contain one or more). The technical scheme determines directly from the verification status in a node's user information whether the node is a fraudulent user node: for example, if the verification status is the restricted state, the user node is abnormal and is very likely to be a fraudulent user node.
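A sketch of the S400 verification, assuming each user's data is available as raw text in which the status value follows the `verifystatus` keyword (for example `"verifystatus": "limited"` or `verifystatus=normal`). The exact file layout is not given in the description, so the regular expression and the function names are hypothetical.

```python
import re
from typing import Dict, List

# The first word after the "verifystatus" keyword is taken as the status (assumed layout)
VERIFY_STATUS = re.compile(r"verifystatus\W*(\w+)", re.IGNORECASE)


def is_fraudulent(user_data_text: str) -> bool:
    """True if the status read after 'verifystatus' is 'limited' (the restricted state)."""
    match = VERIFY_STATUS.search(user_data_text)
    return match is not None and match.group(1).lower() == "limited"


def fraud_risk_communities(
    communities: Dict[str, List[str]], user_data: Dict[str, str]
) -> Dict[str, List[str]]:
    """Keep only communities containing at least one fraudulent user node."""
    result = {}
    for community_id, members in communities.items():
        fraudulent = [u for u in members if is_fraudulent(user_data.get(u, ""))]
        if fraudulent:
            result[community_id] = fraudulent
    return result


if __name__ == "__main__":
    data = {"a": '{"verifystatus": "limited"}', "b": "verifystatus=normal",
            "c": "verifystatus: logout", "d": "name=x"}
    print(fraud_risk_communities({"c1": ["a", "b"], "c2": ["c", "d"]}, data))
    # -> {'c1': ['a']}
```

Returning the fraudulent members alongside each community keeps them at hand for the risk value calculation in S500.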
S500, calculating the risk value of each non-fraudulent user node in the fraud risk community. In this step, the calculation specifically includes:
judging, for each user node to be detected, whether it is a fraudulent user node; if so, its risk value is 1; if not, it is determined to be a non-fraudulent user node. Because the risk value of a non-fraudulent user node is derived from its degree of association with fraudulent user nodes, the non-fraudulent user nodes in the fraud risk community must be identified first; to grade the risk levels, the risk value of a fraudulent user node is defined as 1, i.e., the highest risk level;
obtaining at least one fraudulent user node in the fraud risk community that has a direct link relationship with the non-fraudulent user node. A fraud risk community may contain many fraudulent user nodes, but not all of them exert a fraudulent influence on a given non-fraudulent user node, so only the fraudulent user nodes directly linked to it are selected. In another embodiment, fraudulent user nodes with a one-level indirect link relationship may also be included: for example, if non-fraudulent user node A is linked to non-fraudulent user node B, and B is linked to fraudulent user node C, then A and C have a one-level indirect link relationship. Because C is still fairly likely to influence A through B, in this alternative the selected fraudulent user nodes may also include those indirectly linked to the non-fraudulent user node in this way;
and calculating at least one association value between the non-fraudulent user node and the at least one fraudulent user node, and accumulating these association values to obtain the fraud risk value of the non-fraudulent user node. For example, suppose a fraud risk community contains 10 fraudulent user nodes. To calculate the risk value of any non-fraudulent user node in it, the fraudulent user nodes directly linked to that node are first found among the 10; the association value between the non-fraudulent user node and each of these directly linked fraudulent user nodes is then calculated one by one; finally, the association values are summed, and the total is taken as the risk value of the non-fraudulent user node.
The risk value of a fraudulent user node serves as the reference for evaluating non-fraudulent user nodes: a value of 1 represents the highest risk. The risk value of each non-fraudulent user node in the fraud risk community is obtained from its degree of association with the fraudulent user nodes linked to it; the higher the association, the more likely the non-fraudulent user node is to be involved in fraud. Through this step, the risk level of every non-fraudulent user node in every fraud risk community can be evaluated.
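The neighbour selection and accumulation rules above can be sketched as follows. This is a minimal illustration, not the patented implementation: the community graph is assumed to be an undirected adjacency dict, the association function `assoc` is a placeholder parameter (the description defines it separately with a PageRank-based formula), and the flag `one_level_indirect` is a hypothetical switch for the alternative embodiment that also counts fraudulent nodes reached through one non-fraudulent intermediate.

```python
def fraud_neighbors(adj, node, fraud, one_level_indirect=False):
    """Fraudulent nodes linked to `node`.

    adj:   dict node -> set of neighbour nodes (undirected graph)
    fraud: set of fraudulent nodes
    With one_level_indirect=True, fraudulent nodes reached through exactly
    one non-fraudulent intermediate (A - B - C) are included as well.
    """
    direct = {n for n in adj.get(node, ()) if n in fraud}
    if not one_level_indirect:
        return direct
    indirect = set()
    for mid in adj.get(node, ()):
        if mid in fraud:
            continue  # the intermediate B must itself be non-fraudulent
        indirect |= {n for n in adj.get(mid, ()) if n in fraud}
    return direct | indirect


def risk_values(adj, members, fraud, assoc, one_level_indirect=False):
    """Risk value per node: 1.0 for fraudulent nodes; for the others, the
    sum of assoc(node, t) over the selected fraudulent neighbours t."""
    risks = {}
    for node in members:
        if node in fraud:
            risks[node] = 1.0
        else:
            targets = fraud_neighbors(adj, node, fraud, one_level_indirect)
            risks[node] = sum(assoc(node, t) for t in targets)
    return risks
```

Because the description sums the association values without normalisation, a non-fraudulent node linked to several fraudulent nodes can end up with a risk value above 1; whether to normalise is not specified in the source.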
Specifically, the calculation formula of the association value of each non-fraudulent user node of the present invention is:
$$R(A) = (1 - d) + d \cdot \frac{PR(T)}{L(T)}$$
where d is a damping coefficient with a value range of 0.1 to 0.9, R(A) is the association value of any non-fraudulent user node A, PR(T) is the PageRank value of any fraudulent user node T in the fraud risk community that has a direct link relationship with A, and L(T) is the number of links of T.
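The association-value formula can be written as a small function. This sketch assumes the per-pair form implied by the worked example in this description (one association value per fraudulent neighbour T, summed afterwards); the function name and the input checks are illustrative additions.

```python
def association_value(pr_t, links_t, d):
    """R(A) = (1 - d) + d * PR(T) / L(T) for one fraudulent neighbour T.

    pr_t:    PageRank value PR(T) of the fraudulent node T
    links_t: number of links L(T) of node T (must be positive)
    d:       damping coefficient, 0.1 <= d <= 0.9 per the description
    """
    if not 0.1 <= d <= 0.9:
        raise ValueError("d must lie in [0.1, 0.9]")
    if links_t <= 0:
        raise ValueError("L(T) must be positive")
    return (1 - d) + d * pr_t / links_t


# Values from the worked example: d = 0.6, PR(T) = 0.9, L(T) = 2
print(round(association_value(0.9, 2, 0.6), 2))  # 0.67
```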
In this embodiment, the association value is calculated with the PR (PageRank) algorithm, which was originally designed to compute the importance of web pages from their links. When the algorithm starts, each fraudulent user node and each non-fraudulent user node in the fraud risk community is treated as a web page; the PageRank value of every node in the community is then calculated with the PR algorithm, and finally the association value between each non-fraudulent user node and each fraudulent user node is calculated with the association-value formula of the invention.
For example, in a specific scenario, d is 0.6, the PR algorithm gives a PageRank value of 0.9 for a fraudulent user node T directly linked to a non-fraudulent user node, and the relationship network diagram shows that T has 2 links to neighbouring (fraudulent or non-fraudulent) user nodes, so L(T) = 2. The association value of any non-fraudulent user node A directly linked to T is then R(A) = (1 - 0.6) + 0.6 × (0.9 / 2) = 0.4 + 0.27 = 0.67. The technical solution thus derives the association value of A from the PageRank value of T. If many fraudulent user nodes T point to a non-fraudulent user node A, then A is easily influenced by them; that is, the more strongly A is influenced by fraudulent user nodes, the higher its fraud risk.
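The PR step described above (treating every node of the fraud risk community as a web page and computing its PageRank) can be sketched with a plain power iteration. The description does not say which PageRank variant is used; this sketch assumes the normalised form whose values sum to 1, treats each undirected relationship edge as a pair of links (so L(T) is simply the node's degree), and spreads the rank of isolated nodes uniformly, which is a common convention rather than something stated in the source.

```python
def pagerank(adj, d=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on an undirected graph.

    adj: dict node -> set of neighbours; must be symmetric and every
         neighbour must also be a key. Each undirected edge counts as two
         directed links, so L(u) = len(adj[u]).
    Returns dict node -> PageRank value (values sum to 1).
    """
    nodes = list(adj)
    n = len(nodes)
    if n == 0:
        return {}
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # rank held by isolated nodes is redistributed uniformly
        dangling = sum(pr[v] for v in nodes if not adj[v])
        new = {}
        for v in nodes:
            incoming = sum(pr[u] / len(adj[u]) for u in adj[v])
            new[v] = (1 - d) / n + d * (incoming + dangling / n)
        if sum(abs(new[v] - pr[v]) for v in nodes) < tol:
            return new
        pr = new
    return pr
```

For a fraudulent node `t`, the pair `(pr[t], len(adj[t]))` then supplies PR(T) and L(T) for the association-value formula.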
S600, visually displaying the fraud risk communities and the risk values in the form of a web page. In this embodiment, the fraud detection results are displayed visually as a web page. Besides the risk values, the technical solution may also calculate other indicators of each fraud community, such as relationship statistics within the community, statistics on the profiles of community users, and statistics on the basic attributes of accounts in the community.
In summary, when detecting the fraud risk of a target user group, the method of the invention uses knowledge-graph technology: it first builds an information relationship network diagram of the target user group from the user information set, then divides the diagram into communities with the Louvain algorithm, then uses preset screening conditions to select, from the resulting communities, the communities to be detected that are worth examining, and finally performs fraud risk detection on those communities only. The method can therefore quickly single out a small number of communities to be detected from data sets of tens of millions of records or more. Because few communities remain, screening the fraud risk communities out of them involves little work, the fraud risk value of each user node in a fraud risk community can be calculated more quickly, and the detection is more targeted.
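The per-community indicators mentioned for the web display can be prepared as a JSON payload. This is a hedged sketch: the choice of fields (node count, edge counts by relationship type, fraudulent nodes, top-ranked risk values) and the function name `community_summary` are assumptions for illustration; the description does not specify the page's data format.

```python
import json
from collections import Counter


def community_summary(cid, members, edges, fraud, risks, top_k=5):
    """Build a JSON-serialisable summary of one fraud risk community.

    members: iterable of user ids in the community
    edges:   list of (u, v, edge_type) tuples inside the community
    fraud:   set of fraudulent user ids
    risks:   dict user id -> risk value
    """
    members = list(members)
    # highest risk first; ties broken by user id for a stable order
    ranked = sorted(members, key=lambda n: (-risks.get(n, 0.0), n))
    return {
        "community": cid,
        "node_count": len(members),
        "edge_count": len(edges),
        "edges_by_type": dict(Counter(t for _, _, t in edges)),
        "fraud_nodes": sorted(n for n in members if n in fraud),
        "top_risk": [[n, round(risks.get(n, 0.0), 4)] for n in ranked[:top_k]],
    }


summary = community_summary(
    "c1", ["u1", "u2", "u3"],
    [("u1", "u2", "shared_device"), ("u2", "u3", "shared_device"), ("u1", "u3", "shared_ip")],
    fraud={"u1"}, risks={"u1": 1.0, "u2": 0.67, "u3": 0.5},
)
payload = json.dumps(summary)  # string a web page could fetch and render
```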
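The Louvain algorithm used for community division repeatedly moves nodes into the neighbouring community that yields the largest modularity gain and then aggregates each community into a single node, repeating until modularity stops improving. A full Louvain implementation is beyond a short sketch; the function below only computes the modularity Q that Louvain optimises, assuming an undirected weighted edge list (for example, relationship edges weighted by the number of shared attributes).

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected weighted graph.

    edges:        list of (u, v, weight) with u != v, each edge listed once
    community_of: dict node -> community id
    Q = sum_c [ W_in(c) / m - (S(c) / (2m))**2 ], where m is the total edge
    weight, W_in(c) the weight of edges inside c and S(c) the total
    weighted degree of the nodes in c.
    """
    m = sum(w for _, _, w in edges)
    if m == 0:
        return 0.0
    inside = defaultdict(float)
    degree_sum = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += w
        degree_sum[cv] += w
        if cu == cv:
            inside[cu] += w
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, splitting them into two communities gives Q = 5/14, while putting every node into one community gives Q = 0, which is why a modularity-maximising method such as Louvain separates them.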
Referring to fig. 2, a second aspect of the present invention provides a device for detecting a risk of fraud for a user group, including:
the building module 10 is configured to obtain a user information set of a target user group, and build an information relationship network diagram of the target user group based on the user information set;
the dividing module 20 is configured to perform community division on the information relationship network graph by using a Louvain algorithm;
the screening module 30 is configured to screen out communities to be detected from the plurality of divided communities based on preset screening conditions;
the verification module 40 is configured to perform fraud user verification on each user node to be detected in each community to be detected, so as to screen out a fraud risk community including the fraud user node from the community to be detected;
a calculating module 50, configured to calculate a risk value of each non-fraudulent user node in the fraud risk community;
and a display module 60, configured to visually display the fraud risk community and the risk value in a webpage form.
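The building module's task (turning the user information set into a relationship network) can be sketched by linking users who share an attribute value. The relationship types follow claim 6 of this document (shared device, payment account, Wi-Fi, GPS, login IP, registration IP, registration mobile number), but the field names below and the use of a per-type count as edge weight are hypothetical choices for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical field names for the relationship types listed in claim 6.
RELATION_FIELDS = ("device_id", "pay_account", "wifi", "gps",
                   "login_ip", "register_ip", "register_phone")


def build_relation_graph(users):
    """Link users that share an attribute value.

    users: dict user_id -> dict of attribute values
    Returns dict (u, v) -> {edge_type: count}, with u < v.
    """
    owners = defaultdict(set)  # (field, value) -> users having that value
    for uid, info in users.items():
        for field in RELATION_FIELDS:
            value = info.get(field)
            if value:
                owners[(field, value)].add(uid)
    edges = defaultdict(lambda: defaultdict(int))
    for (field, _), uids in owners.items():
        for u, v in combinations(sorted(uids), 2):
            edges[(u, v)][field] += 1
    return {pair: dict(types) for pair, types in edges.items()}
```

The pairwise expansion is quadratic in the number of users sharing one value, so very common values (a public Wi-Fi network, for instance) would need capping in practice.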
In an alternative embodiment of the second aspect of the present invention, the screening module 30 comprises:
a setting unit, configured to set a plurality of community screening conditions based on the number of user nodes to be detected, the type of node relationship edges, the number of node relationship edges, and the weight of node relationship edges;
and a determining unit, configured to determine, for any community among the plurality of communities, that the community is a community to be detected if it meets at least one community screening condition.
In an alternative embodiment of the second aspect of the present invention, the verification module 40 includes:
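The "at least one condition" rule of the screening module can be sketched as a list of predicates over per-community statistics. The four criteria (node count, edge type, edge count, edge weight) come from the description; the threshold values, the `pay_account` edge type and the `CommunityStats` structure are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class CommunityStats:
    node_count: int
    edge_types: set
    edge_count: int
    total_weight: float


# Hypothetical thresholds: the description names the criteria, not the values.
CONDITIONS = [
    lambda s: s.node_count >= 5,             # number of user nodes
    lambda s: "pay_account" in s.edge_types,  # node relationship edge type
    lambda s: s.edge_count >= 10,            # number of relationship edges
    lambda s: s.total_weight >= 20.0,        # weight of relationship edges
]


def communities_to_detect(stats_by_community, conditions=CONDITIONS):
    """A community is kept if it meets at least one screening condition."""
    return [cid for cid, s in stats_by_community.items()
            if any(cond(s) for cond in conditions)]
```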
the first acquisition unit is used for acquiring user information of the user nodes to be detected in each community to be detected;
the first judging unit is configured to extract, for each user node to be detected, its verification state from its user information, judge whether the verification state is the restricted state, and, if so, judge that the user node to be detected is a fraudulent user node;
and a determination unit, configured to judge that a community to be detected is a fraud risk community if it contains a fraudulent user node.
In an alternative embodiment of the second aspect of the present invention, the calculation module 50 includes:
a second judging unit, configured to judge, for each to-be-detected user node, whether the to-be-detected user node is a fraudulent user node, where if yes, a risk value of the fraudulent user node is 1; if not, determining that the user node to be detected is a non-fraudulent user node;
the second obtaining unit is configured to obtain at least one fraudulent user node in the fraud risk community that has a direct link relationship with the non-fraudulent user node;
and the calculating unit is used for calculating at least one associated value corresponding to the non-fraudulent user node and the at least one fraudulent user node, and accumulating the at least one associated value to obtain a fraud risk value corresponding to the non-fraudulent user node.
In an alternative embodiment of the second aspect of the present invention, the association value is calculated as:
$$R(A) = (1 - d) + d \cdot \frac{PR(T)}{L(T)}$$
where d is a damping coefficient, R(A) is the association value of any non-fraudulent user node A, PR(T) is the PageRank value of any fraudulent user node T in the fraud risk community that has a direct link relationship with A, and L(T) is the number of links of T.
Fig. 3 is a schematic structural diagram of a device for detecting the fraud risk of a user group according to an embodiment of the present invention. The device may vary considerably depending on its configuration or performance, and may include one or more processors (CPUs) 70, a memory 80, and one or more storage media 90 (e.g., one or more mass storage devices) storing application programs or data. The memory and the storage medium may provide transient or persistent storage. The program stored on the storage medium may include one or more modules (not shown), each of which may include a series of instruction operations for the device. Further, the processor may be configured to communicate with the storage medium and to execute, on the device, the series of instruction operations stored in the storage medium.
The apparatus for detecting the risk of fraud in a user group may also include one or more power supplies 100, one or more wired or wireless network interfaces 110, one or more input-output interfaces 120, and/or one or more operating systems, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. It will be appreciated by those skilled in the art that the arrangement of the detection apparatus for risk of fraud for a user group shown in figure 3 does not constitute a limitation of the detection apparatus for risk of fraud for a user group and may comprise more or fewer components than shown, or some components may be combined, or a different arrangement of components.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, or a volatile computer-readable storage medium, having stored therein instructions, which, when executed on a computer, cause the computer to perform the steps of the method for detecting a risk of fraud for a user group.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses, and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for detecting a risk of fraud for a group of users, comprising:
acquiring a user information set of a target user group, and constructing an information relationship network diagram of the target user group based on the user information set;
carrying out community division on the information relation network graph by using a Louvain algorithm;
screening out communities to be tested from the plurality of divided communities based on preset screening conditions;
carrying out fraud user verification on each user node to be detected in each community to be detected so as to screen out a fraud risk community containing the fraud user node from the community to be detected;
calculating a risk value of each non-fraudulent user node in the fraud risk community;
and visually displaying the fraud risk community and the risk value in a webpage form.
2. The method according to claim 1, wherein the calculating the risk value of each non-fraudulent user node in the fraud risk community comprises:
judging, for each user node to be detected, whether it is a fraudulent user node, wherein if so, the risk value of the fraudulent user node is 1, and if not, the user node to be detected is determined to be a non-fraudulent user node;
obtaining at least one fraudulent user node in the fraud risk community that has a direct link relationship with the non-fraudulent user node;
and calculating at least one associated value corresponding to the non-fraudulent user node and the at least one fraudulent user node, and accumulating the at least one associated value to obtain a fraud risk value corresponding to the non-fraudulent user node.
3. The method of claim 2, wherein the association value is calculated by the following formula:
$$R(A) = (1 - d) + d \cdot \frac{PR(T)}{L(T)}$$
wherein d is a damping coefficient, R(A) is the association value of any non-fraudulent user node A, PR(T) is the PageRank value of any fraudulent user node T in the fraud risk community that has a direct link relationship with A, and L(T) is the number of links of T.
4. The method for detecting the fraud risk of the user group according to claim 1, wherein the step of screening the communities to be tested from the plurality of divided communities based on the preset screening conditions comprises:
setting a plurality of community screening conditions based on the number of user nodes to be detected, the type of node relationship edges, the number of node relationship edges and the weight of the node relationship edges;
and for any community in the plurality of communities, if the community meets at least one community screening condition, determining that the community is a community to be tested.
5. The method for detecting the fraud risk of the user group according to claim 1, wherein the step of performing fraud user verification on each user node to be detected in each community to be detected so as to screen out a fraud risk community containing fraud user nodes from the community to be detected comprises:
acquiring user information of user nodes to be detected in each community to be detected;
extracting, for each user node to be detected, the verification state from its user information, judging whether the verification state is the restricted state, and if so, judging that the user node to be detected is a fraudulent user node;
and aiming at each community to be detected, if the community to be detected contains the fraud user node, judging that the community to be detected is a fraud risk community.
6. The method of claim 4, wherein the node relationship edge types comprise: users sharing a device, users sharing a payment account, users sharing a Wi-Fi network, users sharing a GPS location, users sharing a login IP, users sharing a registration IP, and users sharing a registration mobile phone number.
7. The method of claim 1, wherein the set of user information includes at least one user information, the user information including login information, transaction information, and usage behavior information.
8. An apparatus for detecting a risk of fraud for a user group, the apparatus comprising:
a construction module, configured to acquire a user information set of a target user group and construct an information relationship network diagram of the target user group based on the user information set;
the dividing module is used for carrying out community division on the information relation network graph by using a Louvain algorithm;
the screening module is used for screening out communities to be tested from the plurality of divided communities based on preset screening conditions;
the verification module is used for carrying out fraud user verification on each user node to be detected in each community to be detected so as to screen out a fraud risk community containing the fraud user node from the community to be detected;
the calculation module is used for calculating the risk value of each non-fraud user node in the fraud risk community;
and the display module is used for visually displaying the fraud risk community and the risk value in a webpage form.
9. A device for detecting a risk of fraud in a user group, the device comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor invoking the instructions in the memory to cause the fraud risk detection apparatus to perform the user group fraud risk detection method of any of claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program, which, when being executed by a processor, carries out the method for detecting a risk of fraud for a group of users according to any one of claims 1 to 7.
CN202111321581.5A 2021-11-09 2021-11-09 Method, device, equipment and storage medium for detecting fraud risk of user group Pending CN114037514A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111321581.5A CN114037514A (en) 2021-11-09 2021-11-09 Method, device, equipment and storage medium for detecting fraud risk of user group

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111321581.5A CN114037514A (en) 2021-11-09 2021-11-09 Method, device, equipment and storage medium for detecting fraud risk of user group

Publications (1)

Publication Number Publication Date
CN114037514A (en) 2022-02-11

Family

ID=80137080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111321581.5A Pending CN114037514A (en) 2021-11-09 2021-11-09 Method, device, equipment and storage medium for detecting fraud risk of user group

Country Status (1)

Country Link
CN (1) CN114037514A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114938370A (en) * 2022-04-08 2022-08-23 重庆扬成大数据科技有限公司 Working method for carrying out community administration data security access through cloud platform
CN117575782A (en) * 2024-01-15 2024-02-20 杭银消费金融股份有限公司 Leiden community discovery algorithm-based group fraud identification method
CN117575782B (en) * 2024-01-15 2024-05-07 杭银消费金融股份有限公司 Leiden community discovery algorithm-based group fraud identification method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114938370A (en) * 2022-04-08 2022-08-23 重庆扬成大数据科技有限公司 Working method for carrying out community administration data security access through cloud platform
CN114938370B (en) * 2022-04-08 2024-03-29 重庆扬成大数据科技有限公司 Working method for safely accessing community management data through cloud platform
CN117575782A (en) * 2024-01-15 2024-02-20 杭银消费金融股份有限公司 Leiden community discovery algorithm-based group fraud identification method
CN117575782B (en) * 2024-01-15 2024-05-07 杭银消费金融股份有限公司 Leiden community discovery algorithm-based group fraud identification method

Similar Documents

Publication Publication Date Title
CN103412918A (en) Quality of service (QoS) and reputation based method for evaluating service trust levels
CN110689084B (en) Abnormal user identification method and device
CN101997709A (en) Root alarm data analysis method and system
CN111596924B (en) Micro-service dividing method and device
CN114282011B (en) Knowledge graph construction method and device, and graph calculation method and device
CN114037514A (en) Method, device, equipment and storage medium for detecting fraud risk of user group
CN111612085A (en) Method and device for detecting abnormal point in peer-to-peer group
CN115203496A (en) Project intelligent prediction and evaluation method and system based on big data and readable storage medium
US20170032707A1 (en) Method for determining a fruition score in relation to a poverty alleviation program
CN107563588A (en) A kind of acquisition methods of personal credit and acquisition system
KR20070070062A (en) Service evaluation method, system, and computer program product
CN114140221A (en) Fraud risk early warning method, device and equipment
CN109428760B (en) User credit evaluation method based on operator data
CN111428092B (en) Bank accurate marketing method based on graph model
CN115204881A (en) Data processing method, device, equipment and storage medium
CN115437965B (en) Data processing method suitable for test management platform
CN110807171A (en) Method and device for analyzing adequacy of seat personnel in business based on weight division
CN116739742A (en) Monitoring method, device, equipment and storage medium of credit wind control model
CN115225543A (en) Flow prediction method and device, electronic equipment and storage medium
CN108880835B (en) Data analysis method and device and computer storage medium
CN110489568B (en) Method and device for generating event graph, storage medium and electronic equipment
CN113850669A (en) User grouping method and device, computer equipment and computer readable storage medium
CN114331665A (en) Training method and device for credit judgment model of predetermined applicant and electronic equipment
CN109919811B (en) Insurance agent culture scheme generation method based on big data and related equipment
CN112070548A (en) User layering method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination