CN112667863B - Financial fraud group identification method based on hypergraph segmentation - Google Patents

Financial fraud group identification method based on hypergraph segmentation

Info

Publication number
CN112667863B
CN112667863B CN202110058766.5A CN202110058766A CN112667863B CN 112667863 B CN112667863 B CN 112667863B CN 202110058766 A CN202110058766 A CN 202110058766A CN 112667863 B CN112667863 B CN 112667863B
Authority
CN
China
Prior art keywords
hypergraph
call
node
data
construction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110058766.5A
Other languages
Chinese (zh)
Other versions
CN112667863A (en)
Inventor
张涛
张宗旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110058766.5A
Publication of CN112667863A
Application granted
Publication of CN112667863B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a financial fraud group identification method based on hypergraph segmentation, which decomposes the hypergraph segmentation process into eight sub-processes: feature extraction, data normalization, data cleaning, data storage, index selection and call network construction, adjacency matrix construction, non-negative matrix factorization, and result acquisition. The invention is built on the hypergraph concept, in which a single edge connects multiple nodes. This property lets the model approximate the globally optimal solution, whereas most current group fraud identification schemes rely on traditional graph algorithms and can only reach a locally optimal solution; the detection of fraud groups depends more heavily on the globally optimal solution. For high-dimensional computation, a hypergraph regularization term is added to the loss function of the non-negative matrix factorization, so that high-dimensional information can be encoded and iteration efficiency is improved.

Description

Financial fraud group identification method based on hypergraph segmentation
Technical Field
The invention belongs to the fields of financial anti-fraud and machine learning, and relates to an effective method for identifying financial fraud groups.
Background
In internet finance, fraud is the leading cause of losses for lending institutions. Research has found that credit fraud is often committed by groups, and that the members of such groups are necessarily directly linked to each other.
Analyzing the social behavior of clients through operator (telecom carrier) data is a relatively feasible and effective way to trace fraud groups. However, operator communication data is huge, and general statistical methods cannot analyze it effectively, so machine learning techniques are used to partition the client population and find fraud groups. A hypergraph goes beyond the ordinary graph, which describes only binary relationships: one hyperedge can contain multiple vertices, which makes the hypergraph better suited to describing multi-way relationships. So far, no efficient hypergraph-based community segmentation method has been applied to the field of financial anti-fraud.
A hypergraph is a generalized graph in which one edge can connect any number of vertices, whereas an edge of an ordinary graph connects exactly two vertices. When an ordinary graph is extended to a hypergraph, the relations among the nodes become higher-order. A method based on non-negative matrix factorization has two benefits: first, the network can be projected into a latent low-dimensional space in which the result can be expressed discriminatively; second, the computation becomes efficient and feasible.
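The difference between an ordinary edge and a hyperedge can be illustrated with an incidence matrix. The following is a minimal sketch, assuming toy data: the node names, the two hyperedges, and the helper `incidence_matrix` are hypothetical and do not come from the patent.

```python
import numpy as np

# Hypothetical toy data: five contacts and two hyperedges, each hyperedge
# grouping three contacts at once (an ordinary edge could only join two).
nodes = ["a", "b", "c", "d", "e"]
hyperedges = [{"a", "b", "c"}, {"c", "d", "e"}]


def incidence_matrix(nodes, hyperedges):
    """Return the |V| x |E| matrix whose entry (i, e) is 1 if node i lies in hyperedge e."""
    index = {v: i for i, v in enumerate(nodes)}
    B = np.zeros((len(nodes), len(hyperedges)), dtype=int)
    for e, members in enumerate(hyperedges):
        for v in members:
            B[index[v], e] = 1
    return B


B = incidence_matrix(nodes, hyperedges)
edge_sizes = B.sum(axis=0)    # vertices per hyperedge (always 2 in an ordinary graph)
node_degrees = B.sum(axis=1)  # number of hyperedges containing each vertex
```

In an ordinary graph every column of `B` would sum to exactly 2; here each column sums to 3, which is what makes the relation higher-order.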
Disclosure of Invention
The technical problem to be solved by the invention is to provide a financial fraud group identification method based on hypergraph segmentation that addresses two issues: traditional methods, because they use ordinary graphs, consider only local optimality; and computation on a higher-order information network must be made efficient. An information network is constructed from call records among individuals, and the network is clustered with a hypergraph segmentation method. Because group fraud is characterized by aggregation, the clustering yields the group identification result.
In order to solve the problems, the invention adopts the following technical scheme:
A financial fraud group identification method based on hypergraph segmentation comprises the following steps:
Step S1, feature extraction: factors such as contact ID, call duration, and number of calls are extracted from the raw call-record data and assembled into a JSON string to ease subsequent data processing;
Step S2, data normalization: the first rule is that a record whose caller ID is the same as its callee ID does not meet the specification and is deleted; the second rule is that a record is deleted if the end time minus the start time does not equal the call duration, or if the end time is earlier than the start time;
Step S3, data cleaning: call records involving harassment, express delivery, meal delivery, promotion, invalid numbers, service numbers, and the like are deleted. Such interfering call records can cause unrelated contact IDs to be clustered together directly, which distorts the cluster recognition result;
Step S4, data storage: the cleaned data is stored in JanusGraph, which facilitates system-level development;
Step S5, index selection and call network construction: contact IDs are taken as the nodes of the network and call associations as its edges, and indexes such as number of calls, duration, information entropy, and time interval are selected to complete the construction of the call information network;
Step S6, adjacency matrix construction: weights are calculated from the indexes in S5, and the adjacency matrix of the undirected graph is constructed from these weights, where each weight ∈ [0, 1];
Step S7, non-negative matrix factorization: the adjacency matrix $A$ from S6 is decomposed as $A \approx WH^{T}$, where [formula not reproduced in the source text]. Unlike a traditional graph, in which one edge has only two nodes, all nodes of the hypergraph lie in a high-dimensional space, so a hypergraph regularization term is added to the non-negative matrix factorization to encode the higher-order relations. The loss function is: [formula not reproduced in the source text], where $k$ is the number of communities, $a_{ij}$ is the probability that node $v_i$ and node $v_j$ are connected, and $w_{il}$ and $h_{il}$ are the probabilities that the in-degree and out-degree of $v_i$ belong to community $l$. $W = [w_{il}] \in R^{n \times k}$ and $H = [h_{il}] \in R^{n \times k}$ are non-negative matrices. $z_{ij}$ represents the relation between node $v_i$ and node $v_j$, and $Z = [z_{ij}] \in R^{n \times n}$ is reliable a priori information (if $v_i$ and $v_j$ are not related, $z_{ij} = 0$; if $v_i$ and $v_j$ belong to the same community, $z_{ij} = 1$ and $h_i$ and $h_j$ are approximately equal). $\lambda$ is the adjustment parameter between the regularization term and the loss function.
Step S8, result acquisition: iterate on the loss function in S7 using the update rules for $W$ and $H$ until the function converges, then obtain the result of the community detection algorithm according to [formula not reproduced in the source text]. Because fraud groups exhibit clustering in the call network, the clustered nodes are identified as members of a fraud group.
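Steps S7 and S8 can be sketched in code. The patent's loss function and update rules are not reproduced in this text, so the sketch assumes one common form of graph-regularized NMF, $\|A - WH^{T}\|_F^2 + \lambda\,\mathrm{tr}(H^{T}(D - Z)H)$ with $D$ the diagonal matrix of row sums of $Z$, together with the standard multiplicative updates for that form. The function name `regularized_nmf`, the toy matrices, the value of `lam`, and the row-wise `argmax` community assignment are all assumptions made for illustration and may differ from the patented method.

```python
import numpy as np


def regularized_nmf(A, Z, k, lam=0.1, iters=500, seed=0, eps=1e-9):
    """Factor A ~= W @ H.T with an assumed penalty lam * tr(H.T @ (D - Z) @ H).

    A: (n, n) non-negative adjacency matrix.
    Z: (n, n) symmetric prior matrix, z_ij = 1 if i and j are known to share a community.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    W = rng.random((n, k)) + eps
    H = rng.random((n, k)) + eps
    D = np.diag(Z.sum(axis=1))
    history = []
    for _ in range(iters):
        # Multiplicative updates keep W and H non-negative by construction.
        W *= (A @ H) / (W @ (H.T @ H) + eps)
        H *= (A.T @ W + lam * (Z @ H)) / (H @ (W.T @ W) + lam * (D @ H) + eps)
        fit = np.linalg.norm(A - W @ H.T) ** 2
        penalty = lam * np.trace(H.T @ (D - Z) @ H)
        history.append(fit + penalty)
    return W, H, history


# Toy call network: two fully connected groups of three contacts each.
A = np.kron(np.eye(2), np.ones((3, 3)))
# Assumed prior knowledge: contacts 0 and 1 belong together, as do 3 and 4.
Z = np.zeros((6, 6))
Z[0, 1] = Z[1, 0] = 1.0
Z[3, 4] = Z[4, 3] = 1.0

W, H, history = regularized_nmf(A, Z, k=2)
labels = H.argmax(axis=1)  # assumed assignment rule: strongest community per node
```

The penalty term pulls the rows $h_i$ and $h_j$ together whenever $z_{ij} = 1$, which matches the description that related nodes should have approximately equal $h_i$ and $h_j$. The number of iterations is fixed here for simplicity; a convergence check on `history` would be a natural replacement.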
Preferably, the indexes and other information used by the model are obtained by feature engineering on the raw data;
preferably, through the construction of the call information network, the research result is applied to the development of a real system and plays a key role in building an anti-fraud engine;
preferably, in contrast to the traditional graph, whose edges join pairs of nodes, a hypergraph concept in which one edge connects multiple nodes is proposed, so that the algorithm's results are more accurate;
preferably, the computation is made feasible by means of non-negative matrix factorization;
preferably, a loss function with an added hypergraph regularization term encodes the high-dimensional information, which greatly improves the efficiency of the hypergraph-based segmentation algorithm.
The hypergraph-segmentation-based group fraud identification method described above is decoupled into several sub-processes; through task encapsulation, allocation, and flow control it supports model iteration and engine development in big data scenarios, and can intercept more than 99% of group fraud.
The technical scheme of the invention is implemented as follows: the hypergraph segmentation process is broken down into eight sub-processes: feature extraction, data normalization, data cleaning, data storage, index selection and call network construction, adjacency matrix construction, non-negative matrix factorization, and result acquisition.
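The first sub-processes, the normalization rules of S2 and the cleaning of S3, can be sketched as simple record filters. This is a minimal illustration rather than the patent's implementation: the record field names (`caller`, `callee`, `start`, `end`, `duration`), the timestamp format, the sample records, and the `BLOCKLIST` of interfering numbers are all assumptions introduced here.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

# Hypothetical call records; durations are in seconds.
records = [
    {"caller": "A", "callee": "B", "start": "2021-01-01 10:00:00", "end": "2021-01-01 10:02:00", "duration": 120},
    {"caller": "A", "callee": "A", "start": "2021-01-01 10:05:00", "end": "2021-01-01 10:06:00", "duration": 60},
    {"caller": "B", "callee": "C", "start": "2021-01-01 11:00:00", "end": "2021-01-01 10:59:00", "duration": 60},
    {"caller": "C", "callee": "D", "start": "2021-01-01 12:00:00", "end": "2021-01-01 12:01:00", "duration": 90},
    {"caller": "D", "callee": "10086", "start": "2021-01-01 13:00:00", "end": "2021-01-01 13:00:30", "duration": 30},
]

# Hypothetical set of service, delivery, or harassment numbers to drop in S3.
BLOCKLIST = {"10086"}


def is_well_formed(rec):
    """S2: reject self-calls and records whose times contradict the stated duration."""
    if rec["caller"] == rec["callee"]:
        return False  # rule 1: caller and callee are the same
    start = datetime.strptime(rec["start"], FMT)
    end = datetime.strptime(rec["end"], FMT)
    if end < start:
        return False  # rule 2: end time earlier than start time
    return (end - start).total_seconds() == rec["duration"]  # rule 2: duration mismatch


def is_interference(rec, blocklist):
    """S3: flag records that involve a blocklisted number on either side."""
    return rec["caller"] in blocklist or rec["callee"] in blocklist


cleaned = [r for r in records if is_well_formed(r) and not is_interference(r, BLOCKLIST)]
```

Of the five sample records, only the first survives: the others are a self-call, a call ending before it starts, a duration mismatch, and a call to a blocklisted number.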
Compared with the prior art, the invention has the following obvious advantages and beneficial effects:
(1) The invention is built on the hypergraph concept, in which a single edge connects multiple nodes. This property lets the model approximate the globally optimal solution, whereas most current group fraud identification schemes rely on traditional graph algorithms and can only reach a locally optimal solution; the detection of fraud groups depends more heavily on the globally optimal solution.
(2) For high-dimensional computation, a hypergraph regularization term is added to the loss function of the non-negative matrix factorization, so that high-dimensional information can be encoded and iteration efficiency is improved.
Drawings
FIG. 1 is a detailed flow chart of the method according to the invention.
FIG. 2 is a schematic diagram of encoding, in a new space, the node information of the hypergraph's high-dimensional space according to the invention.
FIG. 3 is a flow chart of an embodiment covering model construction and practical application.
Detailed Description
The present invention will be described in detail below with reference to the drawings and examples.
The technical scheme adopted by the invention is a financial fraud group identification method based on hypergraph segmentation, which comprises the following steps:
Step S1, feature extraction: factors such as contact ID, call duration, and number of calls are extracted from the raw call-record data and assembled into a JSON string;
Step S2, data normalization: the first rule is that a record whose caller ID is the same as its callee ID does not meet the specification and is deleted; the second rule is that a record is deleted if the end time minus the start time does not equal the call duration, or if the end time is earlier than the start time;
Step S3, data cleaning: call records involving harassment, express delivery, meal delivery, promotion, invalid numbers, service numbers, and the like are deleted;
Step S4, data storage: the cleaned data is stored in JanusGraph, which facilitates system-level development;
Step S5, index selection and call network construction: contact IDs are taken as the nodes of the network and call associations as its edges, and indexes such as number of calls, duration, information entropy, and time interval are selected to complete the construction of the call information network;
Step S6, adjacency matrix construction: weights are calculated from the indexes in S5 through AHP (analytic hierarchy process), and the adjacency matrix of the undirected graph is constructed from these weights;
Step S7, non-negative matrix factorization: the adjacency matrix $A$ from S6 is decomposed as $A \approx WH^{T}$, where [formula not reproduced in the source text]. Unlike a traditional graph, in which one edge has only two nodes, all nodes of the hypergraph lie in a high-dimensional space, so a hypergraph regularization term is added to the non-negative matrix factorization to encode the higher-order relations. The loss function is: [formula not reproduced in the source text].
Step S8, result acquisition: iterate on the loss function in S7 using the update rules for $W$ and $H$ until the function converges, then obtain the result of the community detection algorithm according to [formula not reproduced in the source text].
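The weighting in S6 can be sketched as follows. The text names AHP but gives no comparison matrix or scaling rule, so this sketch assumes the usual principal-eigenvector form of AHP, min-max scaling of each index, and that a larger index value means a stronger tie. The helper names, the pairwise comparison matrix, and the sample index values are hypothetical.

```python
import numpy as np


def ahp_weights(pairwise):
    """Return AHP weights as the normalized principal eigenvector of the comparison matrix."""
    values, vectors = np.linalg.eig(pairwise)
    principal = np.abs(vectors[:, np.argmax(values.real)].real)
    return principal / principal.sum()


def min_max(x):
    """Scale a 1-D array to [0, 1]; a constant column maps to zeros."""
    span = x.max() - x.min()
    return np.zeros_like(x, dtype=float) if span == 0 else (x - x.min()) / span


def build_adjacency(n, edges, features, weights):
    """Build a symmetric n x n matrix whose entries are weighted index scores in [0, 1]."""
    scaled = np.column_stack([min_max(features[:, c]) for c in range(features.shape[1])])
    scores = scaled @ weights
    A = np.zeros((n, n))
    for (i, j), s in zip(edges, scores):
        A[i, j] = A[j, i] = s  # undirected graph: mirror each weight
    return A


# Hypothetical relative importance of: call count, duration, entropy, time interval.
importance = np.array([4.0, 3.0, 2.0, 1.0])
pairwise = importance[:, None] / importance[None, :]  # a perfectly consistent comparison matrix
weights = ahp_weights(pairwise)

# Hypothetical per-edge index values, one row per call association.
edges = [(0, 1), (1, 2), (2, 3)]
features = np.array([
    [10.0, 600.0, 0.9, 1.0],
    [2.0, 60.0, 0.2, 5.0],
    [6.0, 300.0, 0.5, 3.0],
])
A = build_adjacency(4, edges, features, weights)
```

Because the weights sum to 1 and every scaled index lies in [0, 1], each entry of `A` also lies in [0, 1], consistent with the weight range stated for the adjacency matrix.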
Finally, it should be noted that: the above examples are only for illustrating the invention and are not intended to limit the technical solutions described by the invention; thus, while the invention has been described in detail with reference to the examples set forth above, it will be appreciated by those skilled in the art that modifications and equivalents may be made thereto; all technical solutions and modifications thereof that do not depart from the spirit and scope of the invention are intended to be covered by the scope of the appended claims.

Claims (4)

1. A financial fraud group identification method based on hypergraph segmentation, characterized by comprising the following steps:
step S1, feature extraction: the contact ID, call duration, and number-of-calls factors are extracted from the raw call-record data and assembled into a JSON string to ease subsequent data processing;
step S2, data normalization: the first rule is that a record whose caller ID is the same as its callee ID does not meet the specification and is deleted; the second rule is that a record is deleted if the end time minus the start time does not equal the call duration, or if the end time is earlier than the start time;
step S3, data cleaning: harassment, express delivery, meal delivery, promotion, invalid-number, and service-number call records are deleted; such interfering call records can cause unrelated contact IDs to be clustered together directly, which distorts the cluster recognition result;
step S4, data storage: the cleaned data is stored in JanusGraph, which facilitates system-level development;
step S5, index selection and call network construction: contact IDs are taken as the nodes of the network and call associations as its edges, and the number-of-calls, duration, information entropy, and time interval indexes are selected to complete the construction of the call information network;
step S6, construction of an adjacency matrix $A$: weights are calculated from the indexes in S5, and the adjacency matrix of the undirected graph is constructed from these weights, where each weight ∈ [0, 1];
step S7, non-negative matrix factorization: the adjacency matrix $A$ from S6 is decomposed as $A \approx WH^{T}$, where [formula not reproduced in the source text]; all nodes of the hypergraph lie in a high-dimensional space, and a hypergraph regularization term is added to the non-negative matrix factorization to encode the higher-order relations; the loss function is: [formula not reproduced in the source text], where $k$ is the number of communities, $a_{ij}$ is the probability that node $v_i$ and node $v_j$ are connected, and $w_{il}$ and $h_{il}$ are the probabilities that the in-degree and out-degree of $v_i$ belong to community $l$; $W = [w_{il}] \in R^{n \times k}$ and $H = [h_{il}] \in R^{n \times k}$ are non-negative matrices; $z_{ij}$ represents the relation between node $v_i$ and node $v_j$, and $Z = [z_{ij}] \in R^{n \times n}$ is reliable a priori information: if $v_i$ and $v_j$ are not related, $z_{ij} = 0$; if $v_i$ and $v_j$ belong to the same community, $z_{ij} = 1$ and $h_i$ and $h_j$ are approximately equal; $\lambda$ is the adjustment parameter between the regularization term and the loss function;
step S8, result acquisition: iterate on the loss function in S7 using the update rules for $W$ and $H$ until the function converges, then obtain the result of the community detection algorithm according to [formula not reproduced in the source text]; because fraud groups exhibit clustering in the call network, the clustered nodes are identified as members of a fraud group.
2. The financial fraud group identification method based on hypergraph segmentation according to claim 1, characterized in that: the index information used by the method is obtained by feature engineering on the raw data.
3. The financial fraud group identification method based on hypergraph segmentation according to claim 1, characterized in that: the construction of the call information network plays a key role in building the anti-fraud engine.
4. The financial fraud group identification method based on hypergraph segmentation according to claim 1, characterized in that: in contrast to the traditional graph, whose edges join pairs of nodes, a hypergraph concept in which one edge connects multiple nodes is proposed.
CN202110058766.5A 2021-01-16 2021-01-16 Financial fraud group identification method based on hypergraph segmentation Active CN112667863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110058766.5A CN112667863B (en) 2021-01-16 2021-01-16 Financial fraud group identification method based on hypergraph segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110058766.5A CN112667863B (en) 2021-01-16 2021-01-16 Financial fraud group identification method based on hypergraph segmentation

Publications (2)

Publication Number Publication Date
CN112667863A (en) 2021-04-16
CN112667863B (en) 2024-02-02

Family

ID=75415399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110058766.5A Active CN112667863B (en) 2021-01-16 2021-01-16 Financial fraud group identification method based on hypergraph segmentation

Country Status (1)

Country Link
CN (1) CN112667863B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840545A (en) * 2018-12-26 2019-06-04 江苏理工学院 A kind of robustness structure Non-negative Matrix Factorization clustering method based on figure regularization
CN111861756A (en) * 2020-08-05 2020-10-30 哈尔滨工业大学(威海) Group partner detection method based on financial transaction network and implementation device thereof

Also Published As

Publication number Publication date
CN112667863A (en) 2021-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant