CN116862667A - Fraud detection and credit assessment method based on comparison learning and graph neural decoupling - Google Patents
Fraud detection and credit assessment method based on comparison learning and graph neural decoupling
- Publication number
- CN116862667A CN116862667A CN202311033556.6A CN202311033556A CN116862667A CN 116862667 A CN116862667 A CN 116862667A CN 202311033556 A CN202311033556 A CN 202311033556A CN 116862667 A CN116862667 A CN 116862667A
- Authority
- CN
- China
- Prior art keywords
- contrast
- graph
- node
- fraud detection
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks. Step 1, build a multi-relation graph; step 2, construct the fraud detection network FTNet-1; step 2.1, compute the contrast similarity; step 2.2, perform contrastive learning; step 2.3, sample consistent neighbors; step 2.4, perform enhanced multi-relation aggregation; step 3, construct the credit evaluation network FTNet-2; step 3.1, compute view-specific embeddings; step 3.2, perform cross-view fusion; step 4, train FTNet, which consists of two subnetworks, FTNet-1 being the fraud detection network and FTNet-2 the credit evaluation network. The method can be widely applied in fields such as electronic commerce and finance, provides a new technical solution for protecting transaction security, improving user trust and promoting economic development, and has significant application value and social significance.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and in particular relates to a fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks.
Background
With the rapid development of electronic commerce and online transactions, the economic and social impact of fraud has become increasingly significant. Conventional fraud detection and credit assessment methods rely mainly on rules, statistics and classical machine learning techniques, but these methods have clear limitations when coping with complex fraud schemes and large-scale data processing. Developing an efficient and accurate fraud detection and credit assessment method is therefore of great importance for protecting transaction security and improving user trust.
We propose a new fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks. The method automatically learns and extracts high-level features and patterns related to fraud and user credit from large-scale data. By constructing a suitable deep neural network model and combining contrastive learning with a decoupled graph neural network, the method achieves accurate prediction and evaluation of fraudulent behavior and user credit.
Compared with traditional methods, the proposed method has the following advantages. First, by exploiting deep learning, features can be learned and extracted automatically from large-scale data, reducing the dependence on manual feature engineering. Second, by introducing contrastive learning and a decoupled graph neural network, the complex relationship between fraudulent behavior and user credit can be captured more effectively, improving the accuracy and robustness of fraud detection and credit evaluation. The method can be widely applied in fields such as electronic commerce and finance, provides a new technical solution for protecting transaction security, improving user trust and promoting economic development, and has significant application value and social significance.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks, implementing both fraud detection and credit assessment. The invention proposes a deep learning model named FTNet (Fraud Trust Network), with the following specific steps:
step 1, establishing a multi-relation diagram;
step 2, constructing a fraud detection network FTNet-1;
step 3, constructing a credit evaluation network FTNet-2;
and 4, training the FTNet, wherein the FTNet consists of two subnets, the FTNet-1 is a fraud detection network, and the FTNet-2 is a credit evaluation network.
The step 1 specifically comprises the following steps:
step 1.1, constructing a multi-relation diagram.
We define a multi-relation graph G = (V, X, {ε_r}, {A_r}, Y), where V is the set of all nodes in the graph; X ∈ R^(N×d) is the feature matrix of all nodes (d is the dimension of the node features); ε_r denotes the set of all edges under relation r, with r ∈ {1, ..., R}; A_r is the adjacency matrix corresponding to relation r; and Y is the label set of all nodes in V.
For a given graph G, the fraud label set Y consists of y_L and y_U, where y_L is the set of labeled nodes and y_U the set of unlabeled nodes. Each fraud label takes one of two values, 0 or 1: 0 denotes a benign node and 1 denotes a fraud node. In addition to the fraud labels, the given graph also carries a credit evaluation label set P consisting of p_L (labeled nodes) and p_U (unlabeled nodes). Each credit label is a discrete value between 0 and 1 representing a credit rating, with 0 the lowest credit and 1 the highest.
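The multi-relation graph of step 1 can be sketched as a small data structure. The class and field names below are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

class MultiRelationGraph:
    """Hypothetical container for G = (V, X, {eps_r}, {A_r}, Y, P) from step 1."""

    def __init__(self, features, adjacency, fraud_labels, credit_labels):
        self.X = features       # (N, d) node feature matrix
        self.A = adjacency      # list of R adjacency matrices, each (N, N)
        self.y = fraud_labels   # 0 = benign, 1 = fraud (None if unlabeled)
        self.p = credit_labels  # credit score in [0, 1] (None if unlabeled)

    @property
    def num_nodes(self):
        return self.X.shape[0]

    @property
    def num_relations(self):
        return len(self.A)

# Toy example: 4 nodes with 3-dim features under 2 relations.
X = np.random.rand(4, 3)
A = [np.eye(4), np.ones((4, 4))]
g = MultiRelationGraph(X, A, [0, 1, None, None], [0.9, 0.1, None, None])
```

The labeled/unlabeled split (y_L vs. y_U, p_L vs. p_U) is represented here simply by `None` entries for unlabeled nodes.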
The step 2 specifically comprises the following steps:
step 2.1, calculating the contrast similarity.
In the fraud detection setting, the network framework measures the similarity between a node and the positive (fraud) prototype using supervised contrastive learning. For the central node v under relation ε_r, we first project it into the latent space:
z_v^(l) = σ(W_c^(l) · h_v^(l-1)),
where h_v^(l-1) is the latent embedding of node v at layer l-1 under relation ε_r, W_c^(l) is a learnable weight parameter, σ is an activation function, and z_v^(l) is the contrast embedding.
We then compute the similarity between node v and the positive prototype c_r of the layer under relation ε_r, and take it as the contrast similarity s_v^(r):
s_v^(r) = D(z_v^(l), c_r),
where D(·,·) is a distance function and the positive prototype c_r is a learnable parameter in the latent space. Here we use the cosine distance.
This contrast similarity suffices to gauge how likely a node's embedding is fraudulent. In the subsequent contrastive learning process, the contrast embeddings of fraud nodes are pulled close to the positive prototype.
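A minimal sketch of the step 2.1 computation, assuming tanh as the activation σ and cosine similarity as the distance D; the weight W and the prototype would be learnable parameters in practice:

```python
import numpy as np

def contrast_similarity(h_v, W, prototype):
    """Project node embedding h_v into latent space and return its cosine
    similarity to the positive (fraud) prototype, as in step 2.1.
    tanh for sigma and cosine for D are assumptions for this sketch."""
    z_v = np.tanh(W @ h_v)  # contrast embedding z = sigma(W h)
    num = float(z_v @ prototype)
    den = np.linalg.norm(z_v) * np.linalg.norm(prototype) + 1e-12
    return num / den

# Toy example: an embedding already aligned with the prototype scores ~1.0.
W = np.eye(3)
proto = np.array([1.0, 0.0, 0.0])
s = contrast_similarity(np.array([2.0, 0.0, 0.0]), W, proto)
```

A similarity near 1 indicates a contrast embedding close to the fraud prototype; contrastive training (step 2.2) pushes fraud nodes toward that regime.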
Step 2.2, contrastive learning.
The contrast similarity computation is learnable because of the learnable parameters W_c^(l) and c_r. To save storage and computation, the prototype-supervised contrastive loss compares each sample against the class prototypes rather than against all other samples, which also makes it better suited to long-tail-distributed data. In this binary-classification setting we combine prototypes with a supervised contrastive loss, computed under each relation ε_r: the loss contrasts each l2-normalized contrast embedding against the positive prototype with a temperature hyperparameter τ, in an InfoNCE-style form. Note that, because the graph is large, mini-batch training is used in practice; since nodes are randomly assigned to batches in each epoch, each central node may be contrasted with any other node. To update the parameters under all relations simultaneously, the overall contrastive learning loss is the sum of the per-relation losses over r = 1, ..., R.
The contrastive loss treats the positive prototype and the positive examples as positive pairs, and the positive prototype and all negative examples as negative pairs. Its purpose is to pull all positive pairs closer together in the latent space and push all negative pairs farther apart. The parameters W_c^(l) and c_r are updated directly by this loss.
Step 2.3, consistent neighbor sampling.
Having obtained the contrast similarity between the central node and its neighbor nodes, we sample consistent neighbors with respect to the positive prototype under the same relation. Fraudsters are adept at disguising their relationships, for example by connecting to benign entities, which makes the context information inconsistent. To address the problems of inconsistent context information and feature camouflage, we propose a consistent neighbor sampling method with two parts: consistency contrast-similarity sampling and consistency feature sampling.
Based on the computed contrast similarities, we sample the neighbor nodes most similar to the central node. Under relation ε_r, we define the contrast-similarity difference set between the central node v and its neighbors as the set of absolute differences |s_v^(r) - s_u^(r)| over all neighbors u of v.
We pick the top-k neighbors with the smallest contrast-similarity difference; the sampled neighbor set is denoted S_sim.
The contrast-similarity difference only evaluates the difference between the central node's contrast embedding and those of its neighbors with respect to the positive prototype; it does not measure the similarity of the nodes to each other. We therefore also sample neighbors according to the consistency of their features, defining a feature similarity set between the central node and its neighbors.
The top-k neighbors with the largest feature similarity are sampled; this neighbor set is denoted S_feat.
Finally, we take the intersection S_sim ∩ S_feat of the two sampled sets; these shared neighbors are the consistent neighbors.
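The two-stage sampling of step 2.3 can be sketched as follows, assuming cosine similarity for the feature-consistency part; function and argument names are illustrative:

```python
import numpy as np

def consistent_neighbors(center_sim, neigh_sims, neigh_feats, center_feat, k=2):
    """Keep neighbors that are BOTH among the top-k with the smallest
    contrast-similarity difference to the center node AND among the top-k
    with the largest feature (cosine) similarity — the intersection is the
    consistent neighbor set of step 2.3."""
    diff = np.abs(neigh_sims - center_sim)
    by_diff = set(np.argsort(diff)[:k])          # smallest difference first
    feat_sim = neigh_feats @ center_feat / (
        np.linalg.norm(neigh_feats, axis=1) * np.linalg.norm(center_feat) + 1e-12)
    by_feat = set(np.argsort(-feat_sim)[:k])     # largest similarity first
    return [int(i) for i in sorted(by_diff & by_feat)]

# Toy example: neighbors 0 and 2 are consistent on both criteria.
idx = consistent_neighbors(
    center_sim=0.9,
    neigh_sims=np.array([0.88, 0.1, 0.89, 0.2]),
    neigh_feats=np.array([[1.0, 0.1], [0.0, 1.0], [1.0, 0.0], [0.5, 1.0]]),
    center_feat=np.array([1.0, 0.0]),
    k=2)
```

Camouflaged neighbors that look similar on only one criterion (e.g. features copied from benign nodes but a divergent contrast similarity) are filtered out by the intersection.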
Step 2.4, enhanced multi-relation aggregation.
After neighbor sampling we obtain the consistent neighbor set, and we then aggregate the information of the consistent neighbors under all relations. We use a multi-relation aggregator to exploit the semantic information of the multi-relation graph and generate discriminative embeddings. To better identify the characteristics of fraud nodes, we simultaneously aggregate the contrast embeddings of the consistent neighbors with the intra-relation features of the previous layer:
h_v^(l) = σ(W_r^(l) · (h_v^(l-1) ⊕ AGG_r^(l)({h_u^(l-1) : u a consistent neighbor of v}))),
where AGG_r^(l) is the aggregator at layer l under relation ε_r, ⊕ denotes the concatenation (splicing) operation, and W_r^(l) is a weight matrix.
After aggregating information within one relation, we aggregate the information from all relations.
When aggregation is complete, the embedded features of the last layer are fed into a one-layer MLP to obtain the probability vector p_v, which denotes the probability of being a fraudster.
We train the model using a cross-entropy loss function, and we additionally add the contrastive loss:
L_FTNet-1 = L_CE + λ_1 · L_contrast,
where L_CE is the cross-entropy loss function, L_contrast is the contrastive loss function, λ_1 is an adjustable hyperparameter, and L_FTNet-1 is the total loss function of the FTNet-1 model.
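A sketch of the per-relation aggregation of step 2.4. Mean pooling for AGG and ReLU for the activation are assumptions — the patent leaves the aggregator abstract — and averaging across relations stands in for the unspecified cross-relation combination:

```python
import numpy as np

def aggregate(center, neigh_by_rel, W_rel):
    """For each relation: mean-pool the consistent neighbors' embeddings,
    concatenate with the center embedding (the splice operation), apply a
    relation-specific weight and ReLU; then average the relation outputs."""
    outs = []
    for neigh, W in zip(neigh_by_rel, W_rel):
        pooled = neigh.mean(axis=0)            # AGG over consistent neighbors
        h = np.concatenate([center, pooled])   # center ⊕ pooled
        outs.append(np.maximum(W @ h, 0.0))    # ReLU(W h), assumed activation
    return np.mean(outs, axis=0)               # combine across relations

# Toy example: 2-dim embeddings, two relations with 3 and 2 consistent neighbors.
center = np.ones(2)
neigh_by_rel = [np.ones((3, 2)), np.zeros((2, 2))]
W_rel = [np.ones((2, 4)), np.ones((2, 4))]
h = aggregate(center, neigh_by_rel, W_rel)
```

The final-layer output of such an aggregator would then go through the one-layer MLP mentioned above to produce the fraud probability p_v.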
The step 3 specifically comprises the following steps:
step 3.1 specific view embedding.
In the graph fraud detection scenario, traditional messaging along neighboring nodes is not suitable for graph signals, making it more difficult for fraudsters to distinguish. To alleviate the inconsistency problem, we will decouple the graph neural network, separate the topology information and the attribute information, and encode them in parallel.
For a given rogue networkIt can be separated into topology information +.>And attribute information->For this purpose, we have devised encoders f for two kinds of view information A and fX . In particular, we use a multi-layer perceptron as an encoder to obtain embedded features Z for a particular view A ,/>(where N is the number of nodes and d is the characteristic dimension of the node).
Z A =f A (A)
Z X =f X (X)
Step 3.2, cross-view fusion.
Given the two view-specific embeddings Z_A and Z_X, we use attention for cross-view fusion. The attention value ω_i for view i can be expressed as ω_i = q^T · σ(W · Z_i^T + b),
where q is a learnable attention vector, W is a weight matrix, b is a bias vector, and σ is a nonlinear activation.
We thus obtain attention values ω_A and ω_X for the view-specific embeddings Z_A and Z_X. They are then normalized by the softmax function to give the final weights α_A and α_X.
The larger the weight α_i, the more important the corresponding embedding; the weights are determined by the particular dataset.
The final output embedding is then the combination of the two view-specific embeddings, weighted by the corresponding attention weights: Z = α_A · Z_A + α_X · Z_X.
we then put it into a linear classifier while training through the MSE loss function:
where W 'and b' are the weight matrix and bias vector, respectively, of the linear classifier, σ is the softmax activation function,is a set of nodes that are trained.
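The decoupled encoding and cross-view fusion of steps 3.1–3.2 can be sketched as follows; the view-level pooling of attention scores (mean over nodes) is an assumption, as the text does not fix how the per-node scores become one weight per view:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_view_fusion(Z_A, Z_X, q, W, b):
    """Fuse the topology embedding Z_A and attribute embedding Z_X with a
    learned attention vector: omega_i = q^T tanh(W Z_i^T + b) per node,
    pooled per view, softmax-normalized, then used as mixing weights."""
    omegas = []
    for Z in (Z_A, Z_X):
        scores = q @ np.tanh(W @ Z.T + b)  # one attention score per node
        omegas.append(scores.mean())       # pool to a view-level value (assumed)
    alpha = softmax(np.array(omegas))      # final weights alpha_A, alpha_X
    return alpha[0] * Z_A + alpha[1] * Z_X, alpha

# Sanity check: identical views must receive equal weights.
Z = np.ones((2, 2))
fused, alpha = cross_view_fusion(Z, Z, q=np.ones(2), W=np.eye(2), b=np.zeros((2, 1)))
```

When the two views carry identical information, the softmax assigns each a weight of 0.5 and the fused embedding equals either view, matching the intuition that α_i reflects a view's importance for the dataset.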
The step 4 specifically comprises the following steps:
step 4.1 calculates FTNet loss function.
The loss function of FTNet consists of two parts, namely, fraud detection loss of FTNet-1 and credit assessment loss of FTNet-2, as follows:
fraud detection loss of FTNet-1, < ->Is the credit assessment penalty for FTNet-2.
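Assembling the pieces, the total objective of step 4.1 is a plain sum of the two subnet losses; the optional λ₂ weight on the credit loss is an assumption added here for flexibility (the patent states an unweighted sum, recovered with the default λ₂ = 1):

```python
def ftnet_total_loss(ce_loss, contrast_loss, mse_loss, lam1=0.5, lam2=1.0):
    """L_FTNet = L_FTNet-1 + L_FTNet-2, where
    L_FTNet-1 = CE + lam1 * contrastive loss  (step 2.4) and
    L_FTNet-2 = MSE credit loss               (step 3.2).
    lam1 is the patent's adjustable hyperparameter; lam2 is an assumption."""
    loss_ftnet1 = ce_loss + lam1 * contrast_loss
    loss_ftnet2 = lam2 * mse_loss
    return loss_ftnet1 + loss_ftnet2

total = ftnet_total_loss(1.0, 2.0, 3.0, lam1=0.5, lam2=1.0)
```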
Step 4.2 training FTNet.
The network is trained on an RTX 3090 with 24 GB of memory; the batch size and number of epochs are set to 4 and 300, respectively. We use a poly learning-rate strategy, in which the initial learning rate is multiplied by a polynomial decay factor after each epoch. For the optimizer we use SGD with momentum 0.9 and weight decay 0.0001. This finally yields the trained neural network.
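The standard "poly" schedule mentioned above decays the learning rate as lr = base_lr · (1 − epoch/max_epochs)^power. The exponent 0.9 is a common default and an assumption here; the patent names the policy but not the exponent:

```python
def poly_lr(base_lr, epoch, max_epochs, power=0.9):
    """Polynomial learning-rate decay: starts at base_lr and reaches 0
    at max_epochs. power=0.9 is a conventional choice, not from the patent."""
    return base_lr * (1.0 - epoch / max_epochs) ** power

lr_start = poly_lr(0.1, 0, 300)    # first epoch: full base rate
lr_mid = poly_lr(0.1, 150, 300)    # halfway: decayed
lr_end = poly_lr(0.1, 300, 300)    # last epoch boundary: zero
```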
Drawings
FIG. 1 is a flow chart of the fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks.
Detailed Description
The invention will be further described with reference to the drawings and the specific examples.
To solve the problems encountered in fraud detection and trust evaluation, the invention designs FTNet, a fraud detection and trust evaluation network based on contrastive learning and decoupled graph neural networks, used for fraud detection and trust evaluation during transactions. Specifically, for the input transaction network information, we first build a transaction multi-relation network. We then measure the distinction between fraud nodes and normal nodes using the contrast similarity and contrastive learning, and gather information between a node and its neighbors using multi-relation aggregation and consistent neighbor sampling. In the credit evaluation process, we use the decoupled graph neural network to alleviate the inconsistency problem of the graph structure: the topology information and the attribute information are separated and encoded in parallel.
Example 1 creates a multiple relationship graph.
(1) Defining a graph model
(2) Construction of multiple relationship graphs
Example 2 a fraud detection network FTNet-1 was constructed.
(1) And calculating the contrast similarity.
(2) Contrast learning
(3) Consistent neighbor sampling
(4) Enhanced multiple relational aggregation
Example 3 a credit evaluation network FTNet-2 was constructed.
(1) A particular view is embedded.
(2) Cross-view fusion.
Example 4 FTNet network model was trained.
(1) Calculating FTNet loss function
(2) FTNet is trained.
Claims (10)
1. A fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks, characterized by comprising the following steps:
step 1, establishing a multi-relation diagram;
step 2, constructing a fraud detection network FTNet-1;
step 3, constructing a credit evaluation network FTNet-2;
and 4, training the FTNet, wherein the FTNet consists of two subnets, the FTNet-1 is a fraud detection network, and the FTNet-2 is a credit evaluation network.
2. The fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks according to claim 1, characterized in that step 1 is specifically implemented as follows:
We define a multi-relation graph G = (V, X, {ε_r}, {A_r}, Y), where V is the set of all nodes in the graph; X ∈ R^(N×d) is the feature matrix of all nodes (d is the dimension of the node features); ε_r denotes the set of all edges under relation r, with r ∈ {1, ..., R}; A_r is the adjacency matrix corresponding to relation r; and Y is the label set of all nodes in V.
For a given graph G, the fraud label set Y consists of y_L and y_U, where y_L is the set of labeled nodes and y_U the set of unlabeled nodes. Each fraud label takes one of two values, 0 or 1: 0 denotes a benign node and 1 denotes a fraud node. In addition to the fraud labels, the given graph also carries a credit evaluation label set P consisting of p_L (labeled nodes) and p_U (unlabeled nodes). Each credit label is a discrete value between 0 and 1 representing a credit rating, with 0 the lowest credit and 1 the highest.
3. The fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks according to claim 1, characterized in that step 2 comprises the following steps:
step 2.1, calculating contrast similarity;
step 2.2, performing contrast learning;
step 2.3, sampling the consistency neighbors;
step 2.4 enhances multi-relational aggregation.
4. The fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks according to claim 3, characterized in that the contrast similarity is calculated as follows:
In the fraud detection setting, the network framework measures the similarity between a node and the positive (fraud) prototype using supervised contrastive learning. For the central node v under relation ε_r, we first project it into the latent space:
z_v^(l) = σ(W_c^(l) · h_v^(l-1)),
where h_v^(l-1) is the latent embedding of node v at layer l-1 under relation ε_r, W_c^(l) is a learnable weight parameter, σ is an activation function, and z_v^(l) is the contrast embedding.
We then compute the similarity between node v and the positive prototype c_r of the layer under relation ε_r, and take it as the contrast similarity s_v^(r) = D(z_v^(l), c_r), where D(·,·) is a distance function and the positive prototype c_r is a learnable parameter in the latent space. Here we use the cosine distance.
This contrast similarity suffices to gauge how likely a node's embedding is fraudulent. In the subsequent contrastive learning process, the contrast embeddings of fraud nodes are pulled close to the positive prototype.
5. The fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks according to claim 3, characterized in that contrastive learning is performed as follows:
The contrast similarity computation is learnable because of the learnable parameters W_c^(l) and c_r. To save storage and computation, the prototype-supervised contrastive loss compares each sample against the class prototypes rather than against all other samples, which also makes it better suited to long-tail-distributed data. In this binary-classification setting we combine prototypes with a supervised contrastive loss, computed under each relation ε_r: the loss contrasts each l2-normalized contrast embedding against the positive prototype with a temperature hyperparameter τ. Note that, because the graph is large, mini-batch training is used in practice; since nodes are randomly assigned to batches in each epoch, each central node may be contrasted with any other node. To update the parameters under all relations simultaneously, the overall contrastive learning loss is the sum of the per-relation losses over all relations.
The contrastive loss treats the positive prototype and the positive examples as positive pairs, and the positive prototype and all negative examples as negative pairs. Its purpose is to pull all positive pairs closer together in the latent space and push all negative pairs farther apart. The parameters W_c^(l) and c_r are updated directly by this loss.
6. The fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks according to claim 3, characterized in that consistent neighbor sampling is performed as follows:
Having obtained the contrast similarity between the central node and its neighbor nodes, we sample consistent neighbors with respect to the positive prototype under the same relation. Fraudsters are adept at disguising their relationships, for example by connecting to benign entities, which makes the context information inconsistent. To address the problems of inconsistent context information and feature camouflage, we propose a consistent neighbor sampling method with two parts: consistency contrast-similarity sampling and consistency feature sampling.
Based on the computed contrast similarities, we sample the neighbor nodes most similar to the central node. Under relation ε_r, we define the contrast-similarity difference set between the central node v and its neighbors as the set of absolute differences |s_v^(r) - s_u^(r)| over all neighbors u of v.
We pick the top-k neighbors with the smallest contrast-similarity difference; the sampled neighbor set is denoted S_sim.
The contrast-similarity difference only evaluates the difference between the central node's contrast embedding and those of its neighbors with respect to the positive prototype; it does not measure the similarity of the nodes to each other. We therefore also sample neighbors according to the consistency of their features, defining a feature similarity set between the central node and its neighbors.
The top-k neighbors with the largest feature similarity are sampled; this neighbor set is denoted S_feat.
Finally, we take the intersection S_sim ∩ S_feat of the two sampled sets; these shared neighbors are the consistent neighbors.
7. The fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks according to claim 3, characterized in that enhanced multi-relation aggregation is performed as follows:
After neighbor sampling we obtain the consistent neighbor set, and we then aggregate the information of the consistent neighbors under all relations. We use a multi-relation aggregator to exploit the semantic information of the multi-relation graph and generate discriminative embeddings. To better identify the characteristics of fraud nodes, we simultaneously aggregate the contrast embeddings of the consistent neighbors with the intra-relation features of the previous layer:
h_v^(l) = σ(W_r^(l) · (h_v^(l-1) ⊕ AGG_r^(l)({h_u^(l-1) : u a consistent neighbor of v}))),
where AGG_r^(l) is the aggregator at layer l under relation ε_r, ⊕ denotes the concatenation (splicing) operation, and W_r^(l) is a weight matrix.
After aggregating information within one relation, we aggregate the information from all relations.
When aggregation is complete, the embedded features of the last layer are fed into a one-layer MLP to obtain the probability vector p_v, which denotes the probability of being a fraudster.
We train the model using a cross-entropy loss function, and we additionally add the contrastive loss: L_FTNet-1 = L_CE + λ_1 · L_contrast, where L_CE is the cross-entropy loss function, L_contrast is the contrastive loss function, λ_1 is an adjustable hyperparameter, and L_FTNet-1 is the total loss function of the FTNet-1 model.
8. The fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks according to claim 1, characterized in that step 3 comprises the following steps:
step 3.1, view-specific embedding;
step 3.2, cross-view fusion.
9. The fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks according to claim 8, characterized in that view-specific embedding is performed as follows:
In the graph fraud detection scenario, traditional message passing along neighboring nodes is ill-suited to the graph signal, which makes fraud nodes harder to distinguish. To alleviate this inconsistency problem, we decouple the graph neural network: the topology information and the attribute information are separated and encoded in parallel.
A given fraud network G can be separated into topology information A and attribute information X. For this purpose, we design encoders f_A and f_X for the two kinds of view information. Specifically, we use multi-layer perceptrons as encoders to obtain the view-specific embedded features Z_A, Z_X ∈ R^(N×d) (where N is the number of nodes and d is the feature dimension of a node):
Z_A = f_A(A)
Z_X = f_X(X)
10. The fraud detection and credit assessment method based on contrastive learning and decoupled graph neural networks according to claim 8, characterized in that cross-view fusion is performed as follows:
Given the two view-specific embeddings Z_A and Z_X, we use attention for cross-view fusion. The attention value ω_i for view i can be expressed as ω_i = q^T · σ(W · Z_i^T + b),
where q is a learnable attention vector, W is a weight matrix, b is a bias vector, and σ is a nonlinear activation.
We thus obtain attention values ω_A and ω_X for the view-specific embeddings Z_A and Z_X. They are then normalized by the softmax function to give the final weights α_A and α_X.
The larger the weight α_i, the more important the corresponding embedding; the weights are determined by the particular dataset.
The final output embedding is then the combination of the two view-specific embeddings, weighted by the corresponding attention weights: Z = α_A · Z_A + α_X · Z_X.
We then feed it into a linear classifier and train it with the MSE loss function:
L_FTNet-2 = (1/|V_L|) · Σ_{v ∈ V_L} (σ(W′ · z_v + b′) − p_v)²,
where W′ and b′ are the weight matrix and bias vector of the linear classifier, σ is the softmax activation function, and V_L is the set of training nodes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311033556.6A CN116862667A (en) | 2023-08-16 | 2023-08-16 | Fraud detection and credit assessment method based on comparison learning and graph neural decoupling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311033556.6A CN116862667A (en) | 2023-08-16 | 2023-08-16 | Fraud detection and credit assessment method based on comparison learning and graph neural decoupling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116862667A true CN116862667A (en) | 2023-10-10 |
Family
ID=88234268
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311033556.6A Pending CN116862667A (en) | 2023-08-16 | 2023-08-16 | Fraud detection and credit assessment method based on comparison learning and graph neural decoupling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116862667A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110555455A (en) * | 2019-06-18 | 2019-12-10 | Donghua University | Online transaction fraud detection method based on entity relationship |
CN112541639A (en) * | 2020-12-22 | 2021-03-23 | Yibin Research Institute of University of Electronic Science and Technology of China | Recommendation system scoring prediction method based on graph neural network and attention mechanism |
CN114463141A (en) * | 2022-02-09 | 2022-05-10 | Xiamen University of Technology | Medical insurance fraud detection algorithm and system based on multilayer attention mechanism graph neural network |
US20220383127A1 (en) * | 2021-06-01 | 2022-12-01 | Basmah ALTAF | Methods and systems for training a graph neural network using supervised contrastive learning |
US20220398402A1 (en) * | 2021-06-15 | 2022-12-15 | Lemon Inc. | Detecting objects in a video using attention models |
Non-Patent Citations (2)
Title |
---|
ZEXUAN DENG ET AL.: "Contrastive graph neural network-based camouflaged fraud detector", Information Sciences, 30 October 2022 (2022-10-30), pages 1-6 * |
ZHIXUN LI ET AL.: "The Devil is in the Conflict: Disentangled Information Graph Neural Networks for Fraud Detection", 2022 IEEE International Conference on Data Mining (ICDM), 1 February 2023 (2023-02-01), pages 1-6 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liang et al. | Data fusion approach for collaborative anomaly intrusion detection in blockchain-based systems | |
CN108681936B (en) | Fraud group identification method based on modularity and balanced label propagation | |
CN110555455A (en) | Online transaction fraud detection method based on entity relationship | |
Mukherjee et al. | Deep learning-based multilabel classification for locational detection of false data injection attack in smart grids | |
CN110213244A (en) | A kind of network inbreak detection method based on space-time characteristic fusion | |
CN112906770A (en) | Cross-modal fusion-based deep clustering method and system | |
CN109754258B (en) | Online transaction fraud detection method based on individual behavior modeling | |
CN109902740B (en) | Re-learning industrial control intrusion detection method based on multi-algorithm fusion parallelism | |
CN115660688B (en) | Financial transaction anomaly detection method and cross-regional sustainable training method thereof | |
CN111861756B (en) | Group partner detection method based on financial transaction network and realization device thereof | |
Wang et al. | Temporal-aware graph neural network for credit risk prediction | |
CN113283909B (en) | Ethereum phishing account detection method based on deep learning | |
CN116340524B (en) | Method for supplementing small sample temporal knowledge graph based on relational adaptive network | |
Xiao et al. | Addressing Overfitting Problem in Deep Learning‐Based Solutions for Next Generation Data‐Driven Networks | |
Zhu et al. | Anomaly detection with deep graph autoencoders on attributed networks | |
Zhang | Financial data anomaly detection method based on decision tree and random forest algorithm | |
Zhou et al. | Credit card fraud identification based on principal component analysis and improved AdaBoost algorithm | |
Sheng et al. | Network traffic anomaly detection method based on chaotic neural network | |
CN116862667A (en) | Fraud detection and credit assessment method based on comparison learning and graph neural decoupling | |
CN115965466A (en) | Subgraph-contrast-based Ethereum account identity inference method and system | |
CN116151409A (en) | Urban daily water demand prediction method based on neural network | |
Li et al. | [Retracted] Abnormal Data Detection in Sensor Networks Based on DNN Algorithm and Cluster Analysis | |
Müller et al. | An Integrated Graph Neural Network for Supervised Non-obvious Relationship Detection in Knowledge Graphs. | |
CN114266925B (en) | DLSTM-RF-based user electricity stealing detection method and system | |
WO2024120186A1 (en) | Internet of things intrusion detection method and apparatus, device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||