CN114742564A

CN114742564A - False reviewer group detection method fusing complex relationships

Info

Publication number: CN114742564A
Application number: CN202210449853.8A
Authority: CN
Inventors: 于硕; 李世豪; 雷启航; 夏锋
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2022-04-27
Filing date: 2022-04-27
Publication date: 2022-07-12

Abstract

The invention belongs to the field of artificial intelligence and provides a false reviewer group detection method fusing complex relationships, for detecting false reviewer groups on online trading platforms. The method comprises three stages: node representation updating, model training, and false reviewer group detection. Applied to a real data set, the trained model can identify false reviewers and clearly distinguish false reviewer groups from normal reviewers. The method builds on the complex relationship features of the nodes, makes full use of the valuable relationship information among reviewers, and integrates the embedding process with the clustering and detection process to obtain a goal-directed false reviewer group detection model, overcoming the poor generality and low detection performance of existing group detection methods.

Description

False reviewer group detection method fusing complex relationships

Technical Field

The invention relates to the field of artificial intelligence, in particular to a false reviewer group detection method fusing complex relationships.

Background

The rapid spread of online comment systems has made comments an important basis for purchasing decisions: more and more people check the comments on a platform before buying a product and post their own evaluations afterwards. These comments give customers useful information and first-hand product experience, so the quality of online comments is particularly important. False comments that do not reflect the facts about a product can damage its reputation and mislead buyers.

Most existing false comment detection technologies rely on big data and artificial intelligence methods. Traditional detection technology classifies reviewers using hand-crafted features: behavioral features, linguistic features of the comments, and relationship features between users captured by constructing graphs. In the past, researchers focused mainly on detecting individual false reviewers. A false reviewer group, however, often does more harm to an online comment system, and its members are difficult to find: a comment posted by a group member may look like a normal comment when considered on its own, so earlier techniques for detecting individual false comments work poorly. In addition, relationships between false reviewers are difficult to establish, although such complex relationships would allow a model to capture the connections between reviewers within a group and thereby assist false reviewer group detection.

Current false reviewer group detection methods fall into the following categories:

Detection methods based on clustering algorithms. These methods generally learn node embeddings with an algorithm such as a graph neural network, then cluster the nodes with a clustering algorithm, and finally detect false reviewer groups with a detection method. Common clustering algorithms include the partition-based algorithm KMeans and the density-based algorithm DBSCAN.

(1) The KMeans clustering algorithm partitions all points in the sample space into K groups, usually measuring similarity by Euclidean distance. Its main flow is as follows: K centroids are placed at random, one per cluster. The distance from each point to every centroid is calculated, and each data point is assigned to its nearest centroid, forming clusters. The positions of the K centroids are then recalculated, and the assignment and update steps are iterated.

(2) The DBSCAN clustering algorithm first determines the type of each point in the data set, which may be a core point or a boundary point. A data point is a core point if at least M points lie within a specified radius R of it. It is a boundary point if fewer than M points lie within radius R of it but it can be reached from a core point, i.e., it lies within distance R of a core point. Neighboring core points are connected and placed in the same cluster, and each boundary point is assigned to the cluster of a core point from which it can be reached.
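The point types described in (2) can be illustrated with a minimal NumPy sketch. The function name `classify_points`, the toy data, and the choice to count a point as a member of its own neighborhood are assumptions made for illustration; points that are neither core nor boundary points are labelled as noise here.

```python
import numpy as np

def classify_points(X, radius, min_pts):
    """Label each row of X as 'core', 'boundary', or 'noise' (DBSCAN-style)."""
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Neighborhood membership; a point counts as its own neighbor (assumption).
    within = dist <= radius
    is_core = within.sum(axis=1) >= min_pts
    # A non-core point is a boundary point if some core point lies within `radius`.
    near_core = (within & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(near_core, "boundary", "noise"))

# Toy data: a dense triple, one point on its edge, and one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.3, 0.0], [5.0, 5.0]])
labels = classify_points(X, radius=0.25, min_pts=3)
```

In this toy example the first three points are core points, the fourth is a boundary point attached to the second, and the last is isolated.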

Graph-based detection methods. These methods start from subgraphs and judge the suspiciousness of a group from the attributes of its nodes or of the subgraph, thereby carrying out the whole detection process. Some methods aggregate relationships according to differences in graph topology, time, and scores, and use joint probabilities to detect false reviewer groups. Such methods ignore the structural features of the nodes and do not consider the complex relationships among them. Other methods target several main characteristics of a group, such as synchronicity, mildness, and dispersion, and detect abnormal groups by computing specific indicators. These methods lack generality in practice: specific indicators must be designed for each network or data set for the detection task to be completed well, and detection precision drops sharply when a method is applied elsewhere. In addition, such methods consider only features within the group and still fail to take the complex relationships between reviewers into account.

Disclosure of Invention

In existing false reviewer group detection methods, the embedding process is separated from the subsequent clustering and detection process, so training lacks guidance from the target task; if the result of representation learning is not suitable for detection, the resulting false reviewer group detection is poor. In addition, the complex relationships in the comment network are ignored, so the valuable relationship information among the reviewers within a group cannot be exploited.

To address these problems in the prior art, the invention provides a false reviewer group detection method fusing complex relationships, for detecting false reviewer groups on online selling platforms. The method is goal-directed: based on the complex relationship features of the nodes, it learns complex relationship representations of the nodes and reconstructs the topological information of the graph with an autoencoder; to integrate the embedding process with the clustering and detection process, it trains the model in a self-supervised manner and uses the clustering and detection results to guide the optimization of the model.

To achieve this purpose, the invention adopts the following technical scheme: a false reviewer group detection method fusing complex relationships, which updates the representations of the reviewer nodes in the comment network using an attention-based graph neural network; designs a graph reconstruction loss and a self-supervised distribution loss for model training to obtain an optimal model; and applies the optimal model to false reviewer group detection to identify the false reviewer groups in the comment network; the specific steps are as follows:

firstly, updating the node representation to obtain the reconstructed graph; the model extracts the adjacency matrix and the attribute matrix of the comment network and derives the complex relation matrix from the adjacency matrix. Once the complex relation matrix is obtained, the attention encoder fuses the complex relationships into the message-passing process, effectively encodes the high-order structural information and the node attribute information of the network, and then updates the node representation. An attention-based graph neural network is used as the encoder; the initial features of the nodes are taken as their initial embeddings, and the complex relationships of the nodes are fused into the attention-based graph neural network so that the node representations express high-order structural features and attribute features at the same time;

1.1) calculating node similarity; to simplify the calculation and reduce the number of model parameters, the computation is restricted to the first-order neighbors of the central node, and the calculation formula is as follows:

$$c_{ij} = a(W h_i, W h_j) \tag{1}$$

in the formula, $c_{ij}$ represents the importance of node j to node i; W represents a weight matrix; $h_i$ and $h_j$ represent the feature vectors of node i and node j, respectively; a represents a function that computes node similarity;

1.2) calculating a complex relation matrix; the comment network has complex structural relationships, and the complex relationships among the nodes of the comment network contain valuable information. Obtaining a complex relation matrix of the node by considering a high-order neighbor node of the node:

$$M = (B + B^2 + \dots + B^t)/t \tag{2}$$

where B is the transition matrix: $B_{ij} = 1/d_i$ when an edge exists between node i and node j, where $d_i$ is the degree of node i, and $B_{ij} = 0$ when there is no edge between node i and node j; M represents the complex relation matrix, and $M_{ij}$ is the complex relationship between node i and node j within order t;

1.3) fusing complex relationships; a single-layer feedforward neural network is used as the computation, and the complex relation matrix M is fused with the attention-based graph neural network; specifically, the complex relation matrix is multiplied with the node similarity, so that computing the similarity between nodes takes into account not only the similarity between node representations but also the influence of the complex relationships between the nodes; LeakyReLU is chosen as the activation function to add nonlinearity to the model and thereby strengthen its feature expression capability. After the complex relationships are fused, the importance of node j to node i is rewritten as follows:

1.4) updating the node representation; the softmax function is used to normalize the importance of the neighbor nodes, so that the importance of each first-order neighbor to the central node lies in [0, 1], and the features of the neighbor nodes are aggregated to update the node representation;

in formula (4), $\alpha_{ij}$ represents the normalized attention coefficient and $N_i$ represents the first-order neighbor set of node i; in formula (5), $h_j^{(l)}$ is the representation of neighbor node j of node i at layer l, and $h_i^{(l+1)}$ is the representation of node i at layer l+1; the final node representation is obtained by multi-layer aggregation;

secondly, training the model; the model first calculates a loss from the topology information reconstructed by the encoder: the difference between the original and the reconstructed adjacency matrices gives the first part of the loss. The second part of the loss is obtained by self-supervised training: the model determines core points in the comment network using the DBSCAN clustering algorithm, calculates the distances between all nodes and the core points, and uses the KL divergence as the second part of the loss. The final loss function combines the two parts and is used to train the model jointly. After the loss is calculated, the model parameters are updated by a gradient descent algorithm, completing the training.

A graph reconstruction loss function and a self-supervised distribution loss function are designed, and the parameters of the attention-based graph neural network model are updated to complete the training, with the following specific steps:

2.1) calculating the graph reconstruction loss function; from the topology information of the graph reconstructed by the encoder, the difference between the adjacency matrices is calculated to obtain the reconstruction loss between the reconstructed graph and the original graph; the formula is as follows:

in the formula, $\hat{A}$ is the reconstructed adjacency matrix; H is the updated node representation matrix; σ is an activation function;

in the training process, cross entropy is adopted as a loss function:

where y represents the value of an element in the adjacency matrix and $\hat{y}$ represents the corresponding element in the reconstructed adjacency matrix. This part of the training requires minimizing the reconstruction loss, which is defined as follows:

2.2) calculating the self-supervised distribution loss function; one of the challenges of false comment detection is training the model without label guidance; the model is therefore trained in a self-supervised manner, using pseudo-labels to optimize the node embeddings; the nodes are first clustered with a clustering algorithm, for which the model uses the K-Means algorithm:

in the formula, $\mu_i$ is the mean of all nodes in cluster $S_i$, and k is the number of clusters.

After all the false comment groups are obtained, determining core points in the comment network by adopting a DBSCAN clustering algorithm, and calculating the distance distribution between each node and the core points;

during training, the data distribution must be learned continuously to distinguish normal nodes from abnormal nodes; $p_{iu}$ represents the pseudo-label calculated by the model, and $q_{iu}$ represents the distance distribution between the representations of all nodes and the core points detected by DBSCAN. $q_{iu}$ is defined as follows:

in the formula, $u_u$ is the representation of a core point detected by DBSCAN; $z_i$ is the representation of the node currently being processed; $u_k$ is the representation of the core point of the kth class. The formula measures the distance between the representation of a node and that of a core point: if the node is close enough to the core point, it can be considered to belong to that group and is regarded as a normal node; if the node is far from the core points, it can be regarded as an outlier, i.e., as belonging to a false reviewer group. The node label is obtained by the following formula:

$$S_i = \arg\max_u q_{iu} \tag{11}$$

the KL divergence is used as the loss function to measure the difference between the distance distribution between the nodes and the core points and its pseudo-label;

the KL divergence measures the difference between a probability distribution Q and a reference probability distribution P. Unlike the label obtained from equation (11), the target distribution $p_{iu}$ is treated as the true label. It is calculated from Q during training, so P depends on Q and is updated in stages; within each stage, P serves as the self-supervised label. The main role of the target distribution is to supervise the learning of the model and guide the update of the distribution Q. The formula for P is as follows:

in the formula, $q_{ik}$ represents the distance distribution between the representations of all nodes and the core point of the kth class. The loss function for the self-supervised embedding optimization is as follows:

2.3) calculating a joint loss function; the joint loss function expression is:

$$L = L_r + \beta L_c \tag{14}$$

in the formula, $L_r$ is the graph reconstruction loss function, $L_c$ is the self-supervised distribution loss function, and β is the weight between the two loss functions;

2.4) model training, setting initial parameters of the graph neural network model based on the attention mechanism, and iterating the training process based on the joint loss function to obtain the optimal parameters of the graph neural network model based on the attention mechanism;

thirdly, detecting a false comment group; and detecting the real comment network by adopting the attention-based graph neural network model obtained in the second step, and storing the detection result.

The graph reconstruction loss function adopts a cross entropy loss function; the clustering algorithm for clustering the nodes adopts a KMeans clustering algorithm.

The specific method for the model training in 2.4) is as follows:

setting initial parameters of the graph neural network model based on the attention mechanism, wherein the initial parameters comprise the number of aggregation layers, node embedding dimensions, the number of clustering of a KMeans clustering algorithm, training iteration times and the like of the graph neural network model based on the attention mechanism;

continuously adjusting parameters in the training process of the model, and determining optimal parameters according to the descending condition of the joint loss function in the training process or the final detection result of the model;

specifically: the comment network and its adjacency matrix are input into the model, the model is run and trained, and its detection performance is recorded after training; training is repeated several times under the same set of hyper-parameters, and the average detection precision is taken as the final detection precision; after training under one set of parameters is completed, the parameters are adjusted by the control variable method: one parameter is adjusted in the direction that increases the average precision while the others are kept unchanged; the adjustment is repeated, the parameter setting that gives the highest average detection precision is retained, and the model training is finished.

The beneficial effects of the invention are as follows: the method can identify false reviewers and clearly distinguish false reviewer groups from normal reviewers. It builds on the complex relationship features of the nodes, makes full use of the valuable relationship information among reviewers, and integrates the embedding process with the clustering and detection process to obtain a goal-directed false reviewer group detection model, overcoming the poor generality and low detection performance of existing group detection methods.

Drawings

FIG. 1 is a basic framework diagram of the present invention;

FIG. 2 is a flow chart of the present invention;

FIG. 3 is a graph of recall rate changes during training in accordance with an embodiment of the present invention;

FIG. 4 is a graph of the variation of the loss function during training according to an embodiment of the present invention;

FIG. 5 is a visualization diagram of the population detection result according to an embodiment of the present invention.

Detailed Description

The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.

A false reviewer group detection method fusing complex relationships comprises three stages: updating the node representation; training the model; and detecting false reviewer groups.

In the first step, the node representation is updated. In this stage, an attention-based graph neural network is used as the encoder, the initial features of the nodes serve as their initial embeddings, and the complex relationships of the nodes are fused into the attention-based graph neural network, so that the node representations can express both high-order structural features and attribute features.

1.1) calculating node similarity. To simplify the calculation and reduce the number of model parameters, the computation is restricted to the one-hop neighbors of the central node, and the calculation formula is as follows:

$$c_{ij} = a(W h_i, W h_j) \tag{1}$$

in the formula, $c_{ij}$ represents the importance of node j to node i; W represents a weight matrix; $h_i$ and $h_j$ represent the feature vectors of node i and node j, respectively; a represents a function that computes node similarity;
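Formula (1) can be sketched in NumPy as follows. The text leaves the similarity function a unspecified; the sketch assumes a single attention vector applied to the concatenation of the two projected feature vectors, and the names `node_similarity`, `H`, and `adj` as well as the random toy inputs are illustrative only.

```python
import numpy as np

def node_similarity(H, W, a, adj):
    """Raw scores c[i, j] = a . [W h_i || W h_j], kept for first-order neighbors only."""
    Z = H @ W                                # projected features, shape (n, d_out)
    d_out = Z.shape[1]
    # a . [z_i || z_j] splits into a_left . z_i + a_right . z_j.
    left = Z @ a[:d_out]
    right = Z @ a[d_out:]
    scores = left[:, None] + right[None, :]
    # Pairs that are not first-order neighbors are masked with NaN.
    return np.where(adj > 0, scores, np.nan)

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))                  # 3 nodes, 4 input features
W = rng.normal(size=(4, 2))                  # projection to 2 dimensions
a = rng.normal(size=4)                       # attention vector (assumed form of a)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
c = node_similarity(H, W, a, adj)
```

Masking non-neighbors up front mirrors the restriction to one-hop neighbors stated in 1.1).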

1.2) calculating a complex relation matrix. The comment network has complex structural relationships, and the complex relationships among the nodes of the comment network contain valuable information. By considering the higher-order neighbors of a node, a complex relationship matrix of the node can be obtained:

$$M = (B + B^2 + \dots + B^t)/t \tag{2}$$

where B is the transition matrix: $B_{ij} = 1/d_i$ if there is an edge between node i and node j, where $d_i$ is the degree of node i, and $B_{ij} = 0$ when there is no edge between node i and node j. M represents the complex relation matrix, and $M_{ij}$ is the complex relationship between node i and node j within order t.
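Formula (2) translates directly into a few lines of NumPy. The function name `complex_relation_matrix` and the three-node path graph are illustrative assumptions; isolated nodes, which the text does not discuss, are given an all-zero row here.

```python
import numpy as np

def complex_relation_matrix(adj, t):
    """M = (B + B^2 + ... + B^t) / t with B[i, j] = 1 / d_i on edges, else 0."""
    deg = adj.sum(axis=1, keepdims=True)
    # Row-normalise the adjacency matrix; rows of isolated nodes stay zero.
    B = np.divide(adj, deg, out=np.zeros_like(adj, dtype=float), where=deg > 0)
    M = np.zeros_like(B)
    power = np.eye(adj.shape[0])
    for _ in range(t):
        power = power @ B          # B^1, B^2, ..., B^t
        M += power
    return M / t

# Path graph 0 - 1 - 2.
adj = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
M = complex_relation_matrix(adj, t=2)
```

For this graph every row of M equals (0.25, 0.5, 0.25): with t = 2, the end nodes acquire a non-zero relationship to each other even though they share no edge.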

1.3) fusing complex relationships. Specifically, a single-layer feedforward neural network is chosen as the computation, and the complex relation matrix M is fused with the graph attention network by multiplying it with the similarity between nodes. This means that computing the similarity between nodes takes into account not only the similarity between node representations but also the influence of the complex relationships between the nodes. Finally, LeakyReLU is chosen as the activation function to add nonlinearity to the model and thereby strengthen its feature expression capability. After the complex relationships are fused, the importance of node j to node i is rewritten as follows:

1.4) updating the node representation. So that the importance of the neighbor nodes to the central node lies in [0, 1], the importance of the neighbor nodes is normalized with the softmax function, and the features of the neighbor nodes are aggregated to update the node representation.

In formula (4), $\alpha_{ij}$ represents the normalized attention coefficient and $N_i$ represents the first-order neighbor set of node i. In formula (5), $h_j^{(l)}$ is the representation of neighbor node j of node i at layer l, and $h_i^{(l+1)}$ is the representation of node i at layer l+1. The final node representation is obtained by multi-layer aggregation.
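Steps 1.3) and 1.4) can be sketched as one attention layer. Formulas (3) to (5) are not reproduced in this text, so the sketch rests on assumptions: the raw similarity is multiplied by $M_{ij}$ before LeakyReLU, the softmax runs over first-order neighbors, and no output nonlinearity is applied after aggregation. All names and toy inputs are illustrative.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def fused_attention_layer(H, W, a, adj, M):
    """One attention layer with the complex relation matrix M fused into the scores.

    Assumes every node has at least one neighbor in `adj`.
    """
    Z = H @ W
    d = Z.shape[1]
    c = (Z @ a[:d])[:, None] + (Z @ a[d:])[None, :]   # raw similarity c_ij
    e = leaky_relu(M * c)                  # multiply by M_ij, then LeakyReLU (assumed order)
    e = np.where(adj > 0, e, -np.inf)      # keep first-order neighbors only
    e = e - e.max(axis=1, keepdims=True)   # stabilise the softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)   # attention coefficients in [0, 1]
    return alpha, alpha @ Z                # aggregate neighbor features

rng = np.random.default_rng(0)
adj = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
B = adj / adj.sum(axis=1, keepdims=True)
M = (B + B @ B) / 2                        # complex relation matrix with t = 2
H = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
a = rng.normal(size=4)
alpha, H_next = fused_attention_layer(H, W, a, adj, M)
```

Stacking several such layers would correspond to the multi-layer aggregation mentioned above.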

In the second step, the model is trained. A loss function is designed first; after the loss is calculated with it, the model parameters are updated to complete the training. The model first reconstructs the original network with a decoder and calculates the loss from the difference between the adjacency matrices of the original and the reconstructed networks. Because nodes in the false reviewer group detection task have no labels, the embedding is optimized by self-supervised training: core points in the comment network are generated with the DBSCAN clustering algorithm, the distances between the core points and the other nodes are computed, and the KL divergence then measures the difference between the pseudo-labels and the learned embedding distribution. After the loss is calculated, the model parameters are updated by a gradient descent algorithm, completing the training.

2.1) calculating the graph reconstruction loss. The original graph is reconstructed by an inner product, and the reconstruction formula is as follows:

in the formula, H is the matrix of learned node embeddings and $\hat{A}$ is the adjacency matrix of the reconstructed graph; the goal is to make the reconstructed adjacency matrix $\hat{A}$ as similar as possible to the input adjacency matrix. In the training process, cross entropy is used as the loss function:

where y represents the value of an element in the adjacency matrix and $\hat{y}$ represents the corresponding element in the reconstructed adjacency matrix.
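A minimal sketch of inner-product reconstruction with a cross-entropy loss is given below. The text does not name the activation function or say whether the loss is summed or averaged; the sketch assumes a sigmoid and a mean over all matrix entries, and the function names and toy matrices are illustrative.

```python
import numpy as np

def reconstruct_adjacency(H):
    """Inner-product decoder: sigmoid(H H^T) (sigmoid is an assumed activation)."""
    return 1.0 / (1.0 + np.exp(-(H @ H.T)))

def reconstruction_loss(A, A_hat, eps=1e-12):
    """Mean binary cross entropy between adjacency A and its reconstruction."""
    A_hat = np.clip(A_hat, eps, 1 - eps)   # avoid log(0)
    return -np.mean(A * np.log(A_hat) + (1 - A) * np.log(1 - A_hat))

# Nodes 0 and 1 share an embedding direction; node 2 is orthogonal to them.
H = np.array([[3.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
A = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
A_hat = reconstruct_adjacency(H)
loss = reconstruction_loss(A, A_hat)
```

Embeddings whose inner products agree with the edges give a lower loss than the same embeddings scored against the complementary graph `1 - A`.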

2.2) calculating the distribution loss. One of the challenges of false comment detection is training the model without label guidance. The model is trained in a self-supervised manner and uses pseudo-labels to optimize the node embeddings. Because the nodes in the graph are independent, all nodes are first clustered during training, for which the model uses the K-Means algorithm:
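The clustering objective itself is not reproduced in this text. Assuming the standard K-Means formulation, with x denoting a node embedding (a symbol introduced here for illustration), it can be written as:

```latex
\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^{2}
```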

in the formula, $\mu_i$ is the mean of all nodes in cluster $S_i$, and k is the number of clusters. After all the candidate false reviewer groups are obtained, the DBSCAN algorithm is used to detect abnormal groups. DBSCAN first distinguishes the core points and boundary points in the graph; the detected core points are taken as the core points of the training model, and the distances between the representations of the other nodes and those of the core points are calculated. During training, the data distribution must be learned continuously to distinguish normal nodes from abnormal nodes; $p_{iu}$ represents the pseudo-label calculated by the model, and $q_{iu}$ represents the distance distribution between the representations of all nodes and the core points detected by DBSCAN. $q_{iu}$ is defined as follows:

in the formula, $u_u$ is the representation of a core point detected by DBSCAN. The formula measures the distance between the representation of a node and that of a core point: if the node is close enough to the core point, it can be considered to belong to that group and is regarded as a normal node; if the node is far from the core points, it can be regarded as an outlier, i.e., as belonging to a false reviewer group. The node label is obtained by the following formula:

$$S_i = \arg\max_u q_{iu} \tag{11}$$
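The distance distribution $q_{iu}$ and the label of formula (11) can be sketched as follows. The exact definition of $q_{iu}$ is not reproduced in this text; the sketch assumes a Student's t kernel on squared Euclidean distance, normalized over the core points, which is a common choice in deep clustering. The names `soft_assignment`, `Z`, and `centers` are illustrative.

```python
import numpy as np

def soft_assignment(Z, centers):
    """q[i, u]: soft assignment of node i to core point u (assumed Student's t kernel)."""
    # Squared distance between every node embedding and every core-point embedding.
    sq_dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    kernel = 1.0 / (1.0 + sq_dist)
    return kernel / kernel.sum(axis=1, keepdims=True)   # each row sums to 1

Z = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0]])      # node embeddings
centers = np.array([[0.0, 0.0], [4.0, 4.0]])            # core-point embeddings
Q = soft_assignment(Z, centers)
labels = Q.argmax(axis=1)                               # formula (11)
```

Nodes close to a core point receive most of their probability mass from it, so the first two nodes are assigned to core point 0 and the third to core point 1.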

The model uses the KL divergence to measure the difference between the pseudo-labels and the learned distribution; the KL divergence measures the difference between a probability distribution Q and a reference probability distribution P. Unlike the label obtained from equation (11), the target distribution $p_{iu}$ is treated as the true label. It is calculated from Q during training, so P depends on Q and is updated in stages; within each stage, P serves as the self-supervised label. The main role of the target distribution is to supervise the learning of the model and guide the update of the distribution Q. The formula for P is as follows:

the loss function for the self-supervised optimization embedding is as follows:
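The target distribution and the KL loss can be sketched as below. The formulas for P and for the loss are not reproduced in this text; the sketch assumes the target distribution commonly used in deep clustering, which squares Q, divides by the per-cluster soft frequency, and renormalizes each row, and it takes the loss as KL(P || Q) summed over all nodes. Function names and the toy Q are illustrative.

```python
import numpy as np

def target_distribution(Q):
    """Sharpen Q into P: square, divide by soft cluster frequency, renormalise rows."""
    weight = Q ** 2 / Q.sum(axis=0, keepdims=True)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over all nodes and core points."""
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))

Q = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
P = target_distribution(Q)   # recomputed only at the start of each stage
loss = kl_divergence(P, Q)
```

Because P emphasizes the assignments that are already confident, minimizing the loss pulls Q toward sharper group memberships between two updates of P.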

2.3) calculating the joint loss function. The overall loss function of the model consists of the graph reconstruction loss function and the self-supervised distribution loss function, and the final loss function is expressed as follows:

$$L = L_r + \beta L_c \tag{14}$$

in the formula, $L_r$ is the reconstruction loss, $L_c$ is the distribution loss, and β controls the weight between the two losses.

2.4) model training. The training of the model is carried out according to the following steps: setting initial hyper-parameters including the aggregation layer number of the graph attention network, the node embedding dimension, the clustering number of the KMeans clustering algorithm, the training iteration number and the like.

During training, the hyper-parameters need to be adjusted manually so that the detection performance of the model is optimal. In general, the hyper-parameters are determined according to how the loss function decreases during training or according to the final detection result of the model. After the hyper-parameters are set, information such as the comment network and its adjacency matrix is input into the model, the model is run until training finishes, and its detection performance is recorded; the process is repeated several times under the same set of hyper-parameters, and the average detection precision is taken as the final detection precision. After training under one set of hyper-parameters is completed, the hyper-parameters are adjusted by the control variable method: one hyper-parameter is adjusted in the direction that increases the average precision while the others are kept unchanged. The adjustment is repeated, the hyper-parameter setting that gives the highest average detection precision is retained, and the model training is finished.
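The tuning procedure above can be sketched as a one-parameter-at-a-time search. The callable `train_and_evaluate`, the parameter names, and the stand-in scoring function `fake_run` are hypothetical; a real run would train the model and return its detection precision.

```python
from statistics import mean

def tune_one_at_a_time(train_and_evaluate, initial, candidates, repeats=3):
    """Vary one hyper-parameter at a time, keeping the others fixed."""
    def average_score(params):
        # Repeat training under the same setting and average the precision.
        return mean(train_and_evaluate(params) for _ in range(repeats))

    best = dict(initial)
    best_score = average_score(best)
    for name, values in candidates.items():
        for value in values:
            trial = {**best, name: value}
            score = average_score(trial)
            if score > best_score:         # keep a change only if it helps
                best, best_score = trial, score
    return best, best_score

# Stand-in for a real training run (hypothetical): peaks at layers=2, dim=16.
def fake_run(params):
    return 1.0 - 0.1 * abs(params["layers"] - 2) - 0.01 * abs(params["dim"] - 16)

best, score = tune_one_at_a_time(
    fake_run,
    initial={"layers": 1, "dim": 8},
    candidates={"layers": [1, 2, 3], "dim": [8, 16, 32]},
)
```

With the stand-in scorer, the search settles on two layers and an embedding dimension of 16.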

And thirdly, detecting a false comment group. And detecting the real comment network by using the model trained in the last step and the hyper-parameter, and storing the detection result of the model on the comment network.

Table 1 the algorithm runs

The experimental analysis carried out in conjunction with the scheme of the invention is as follows:

The invention verifies the detection of false reviewer groups on an Amazon data set processed by researchers; the basic statistics of the data set are shown in Table 2. In the table, the relationship type U-P-U indicates that two users have commented on at least one product in common; U-S-U indicates that two reviewers gave the same score within one week; and U-V-U indicates that two reviewers wrote similar comments. The experiment was performed on four data sets: one for each of the three relationships above and one combining all three, named Amazon_p, Amazon_s, Amazon_v, and Amazon, respectively.

TABLE 2 basic cases of false comment data sets used in the experiment

The experimental analysis of the false reviewer group detection method fusing complex relationships is divided into two parts: first, the method is compared with existing false reviewer group detection methods, with the recall rate as the evaluation index, to verify its superiority; second, visualization experiments are performed on the training process and the detection results, to analyze more intuitively the soundness of the model design and the effectiveness of the detection.
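As a reminder of the evaluation index, recall is the fraction of the truly false reviewers that the method detects. The sketch below uses made-up reviewer identifiers; the function name `recall` and the toy sets are illustrative.

```python
def recall(true_fake, predicted_fake):
    """Fraction of the truly false reviewers that appear among the detected ones."""
    true_fake, predicted_fake = set(true_fake), set(predicted_fake)
    if not true_fake:
        raise ValueError("recall is undefined without any false reviewers")
    return len(true_fake & predicted_fake) / len(true_fake)

# Two of the four false reviewers are detected; "u9" is a false alarm.
r = recall({"u1", "u2", "u3", "u4"}, {"u2", "u3", "u9"})
```

Recall does not penalize false alarms, which is why the extra prediction "u9" leaves the value unchanged.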

(1) Test result comparison experiment

Several false comment population detection methods that researchers have proposed are compared with the present method, wherein Graph-developer uses a Graph-based approach to find the target item and on this basis detects a group of false reviewers, and the population detection problem is solved by a 2-hop diagram. Collueage uses a Markov random field to detect colluded false reviewers and false comment activity. The method comprises the steps that the DeFrauder detects candidate fraud groups by utilizing a product review graph and combining behavior signals, maps the candidate fraud groups into an embedding space, assigns scores to each group, and finally determines a false reviewer group according to the scores. Besides the comparison method, in order to verify the effectiveness of the modules in the method, two decoupling detection methods are additionally added in the experiment: GCN + KMeans + DBSCAN and GAT + KMeans + DBSCAN. The first method is to embed the initial data set by GCN, the second method is to embed the initial data set by GAT, and after the embedding is obtained, the embedding results are detected by both KMeans clustering method and DBSCAN method.

The experimental results of the present method and the comparison methods are shown in Table 3. A longitudinal comparison of the results (across methods) shows that the present method clearly outperforms the other methods and substantially improves the detection effect. GAT + KMeans + DBSCAN outperforms GCN + KMeans + DBSCAN, which demonstrates the effectiveness of using GAT as the graph encoder: compared with GCN, GAT aggregates neighbor features according to the similarity between the central node and its neighbors, so the representation of a normal node does not absorb a large amount of information from false nodes. A transverse comparison of the results (across datasets) shows that the present method obtains the best result whether each of the three relationships is considered separately or all relationships are considered together. This indicates that fusing the KMeans clustering algorithm into the deep learning model and iteratively updating the core points during training yields more accurate detection results.

TABLE 3 Test results

(2) Visual experiment of training process and detection result

The visualization experiment aims to demonstrate the rationality of the method's design by analyzing how the loss and the recall rate change during training, and to show the effectiveness of the detection intuitively by visualizing the detection results.

Fig. 3 shows how the recall rate changes during training. Overall, the recall rate of the detection result improves steadily as training proceeds, which verifies the rationality of the model design.

The change of the loss function during training is shown in Fig. 4. Taking Fig. 3 and Fig. 4 together, the recall rate rises steadily as the loss function decreases, which indicates that while the method keeps updating the representation learning result, the learned representation remains suitable for false reviewer group detection. Conversely, the loss function designed in the method feeds the loss back to the model effectively and supervises its learning, which resolves the problem that the representation learning result may not suit the detection method.

FIG. 5 shows the clustering result of the model on the Amazon dataset; it can be seen that the method is effective at detecting false reviewer groups. The black points represent the false reviewer group, concentrated mainly in the lower left, and the gray points represent normal reviewer nodes, concentrated mainly in the upper right.

The above embodiments merely illustrate implementations of the present invention and should not be construed as limiting the scope of the patent. It should be noted that those skilled in the art can make a number of variations and modifications without departing from the concept of the present invention, all of which fall within the protection scope of the present invention.

Claims

1. A false reviewer group detection method fusing complex relationships, characterized in that the method uses a graph neural network based on an attention mechanism to update the representations of the comment nodes in a comment network; designs a graph reconstruction loss and a self-supervised distribution loss for model training to obtain an optimal model; and applies the optimal model to false reviewer group detection to identify the false reviewer groups in the comment network; the specific steps are as follows:

first, updating the node representations to obtain a reconstructed graph; a graph neural network based on an attention mechanism is used as the encoder; the initial features of the nodes are taken as their initial embeddings, and the complex relationships of the nodes are fused into the attention-based graph neural network so that the node representations express high-order structural features and attribute features at the same time;

1.1) calculating node similarity; node j is restricted to the first-order neighbors of the central node i, and the calculation formula is as follows:

$$c_{ij} = a(W h_i, W h_j) \tag{1}$$

in the formula, $c_{ij}$ represents the importance of node j to node i; W represents a weight matrix; $h_i$ and $h_j$ represent the feature vectors of node i and node j, respectively; a represents a function for computing node similarity;

1.2) calculating a complex relation matrix; obtaining a complex relation matrix of the node by considering a high-order neighbor node of the node:

$$M = \frac{B + B^2 + \cdots + B^t}{t} \tag{2}$$

where B is the transition matrix: $B_{ij} = 1/d_i$ when an edge exists between node i and node j, where $d_i$ is the degree of node i, and $B_{ij} = 0$ when there is no edge between node i and node j; the matrix M is the complex relationship matrix, and $M_{ij}$ is the complex relationship between node i and node j within order t;
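The computation in formula (2) can be sketched in plain Python as follows. The helper names and the three-node example graph are assumptions for illustration only; an isolated node is given an all-zero row, which is also an assumption.

```python
def matmul(X, Y):
    """Plain-Python product of two dense matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transition_matrix(adj):
    """B_ij = 1/d_i when edge (i, j) exists, otherwise 0."""
    B = []
    for row in adj:
        degree = sum(row)
        # Assumption: an isolated node (degree 0) gets an all-zero row.
        B.append([(a / degree if degree else 0.0) for a in row])
    return B


def complex_relation_matrix(adj, t):
    """M = (B + B^2 + ... + B^t) / t, following formula (2)."""
    if t < 1:
        raise ValueError("t must be at least 1")
    B = transition_matrix(adj)
    n = len(adj)
    power = B                      # current power B^k, starting at k = 1
    total = [row[:] for row in B]  # running sum B + ... + B^k
    for _ in range(t - 1):
        power = matmul(power, B)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return [[value / t for value in row] for row in total]


# Hypothetical path graph 0 - 1 - 2 (nodes 0 and 2 are not directly connected).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
M = complex_relation_matrix(adj, t=2)
```

With t = 2, nodes 0 and 2 receive a non-zero entry in M even though no edge joins them, which is how the matrix carries higher-order neighbor information.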

1.3) fusing complex relationships; fusing a complex relation matrix M and a graph neural network based on an attention mechanism by taking a single-layer feedforward neural network as a calculation mode, specifically multiplying the complex relation matrix by the similarity of nodes; and selecting LeakyReLU as an activation function, fusing the complex relationship, and rewriting the importance expression of the node j to the node i into:

in formula (5), the symbols denote, respectively, the representation at layer l of a neighbor node j of node i, and the representation of node i at layer l+1; the final representation of a node is obtained by multi-layer aggregation;
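A minimal sketch of one aggregation layer with fused complex relationships is given below. Only the multiplication of $M_{ij}$ by the node similarity, the single-layer feedforward form, and the LeakyReLU activation come from the text. The concatenation form $a^T[Wh_i \Vert Wh_j]$, the softmax normalization over the neighbor set, the omission of an output nonlinearity, and all names and values are assumptions in the style of a standard graph attention layer.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def fused_attention_layer(H, W, a, M, neighbors):
    """One attention layer whose scores are scaled by the relation matrix M.

    H: node feature vectors; W: weight matrix (rows are output dimensions);
    a: attention vector of length 2 * out_dim (assumed single-layer form);
    M: complex relation matrix; neighbors[i]: neighbor indices of node i.
    """
    WH = [matvec(W, h) for h in H]
    out = []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            out.append(WH[i][:])  # assumption: a node without neighbors keeps W h_i
            continue
        # Fused importance: LeakyReLU(M_ij * a^T [W h_i || W h_j]).
        scores = [leaky_relu(M[i][j] * sum(x * y for x, y in zip(a, WH[i] + WH[j])))
                  for j in nbrs]
        # Assumed softmax normalization, shifted by the maximum for stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        alphas = [e / norm for e in exps]
        # Weighted sum of the transformed neighbor features.
        out.append([sum(al * WH[j][k] for al, j in zip(alphas, nbrs))
                    for k in range(len(WH[i]))])
    return out


# Hypothetical inputs.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 0.5, 0.5]
M = [[0.25, 0.5, 0.25]] * 3
neighbors = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
H_next = fused_attention_layer(H, W, a, M, neighbors)
```

Stacking several such layers would correspond to the multi-layer aggregation mentioned above. When the attention vector is all zeros, every neighbor receives the same weight and the layer reduces to a plain average.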

second, training the model; a graph reconstruction loss function and a self-supervised distribution loss function are designed, the parameters of the attention-based graph neural network model are updated, and training is completed, specifically as follows:

2.1) calculating the graph reconstruction loss function; the difference between the adjacency matrices is calculated from the topological information of the graph reconstructed by the encoder, giving the reconstruction loss between the reconstructed graph and the original graph; the formula is as follows:

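As a hedged illustration of a graph reconstruction loss, the sketch below uses a binary cross-entropy between the original adjacency matrix and a reconstructed one. The inner-product decoder $\hat{A}_{ij} = \mathrm{sigmoid}(z_i \cdot z_j)$, the averaging over all node pairs, and all names and values are assumptions; the text itself only states that the difference between adjacency matrices is measured.

```python
import math


def sigmoid(x):
    # Numerically stable form for both signs of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reconstruction_loss(Z, adj, eps=1e-12):
    """Mean binary cross-entropy between adj and sigmoid(Z Z^T).

    Z: node embeddings (list of vectors); adj: 0/1 adjacency matrix.
    The inner-product decoder is an assumption made for this sketch.
    """
    n = len(Z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = sigmoid(sum(x * y for x, y in zip(Z[i], Z[j])))
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= adj[i][j] * math.log(p) + (1 - adj[i][j]) * math.log(1.0 - p)
    return total / (n * n)


# Hypothetical graph with self-loops: nodes 0 and 1 are linked, node 2 stands alone.
adj = [[1, 1, 0],
       [1, 1, 0],
       [0, 0, 1]]
loss_good = reconstruction_loss([[2.0, 0.0], [2.0, 0.0], [0.0, 2.0]], adj)
loss_bad = reconstruction_loss([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]], adj)
```

Embeddings that place linked nodes close together (`loss_good`) give a lower loss than embeddings that group the wrong nodes (`loss_bad`).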

2.2) calculating the self-supervised distribution loss function; a self-supervised training mode is adopted, in which pseudo-labels are used to optimize the node embeddings; the nodes are clustered with a clustering algorithm, the core points in the comment network are determined with the DBSCAN clustering algorithm, and the distance distribution between each node and the core points is calculated; the KL divergence is used as the loss function to measure the difference between the node-to-core-point distance distribution and its pseudo-label;
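One common way to realize such a self-supervised loss is sketched below. The Student's t kernel for the distance distribution Q, the sharpened target distribution P used as the pseudo-label, and all names and values are assumptions borrowed from deep embedded clustering; the text only specifies a distance distribution to the core points, a pseudo-label, and the KL divergence. The core points are passed in directly rather than computed with DBSCAN.

```python
import math


def soft_assignment(Z, centers):
    """Q_iu: normalized similarity of node i to core point u (assumed Student's t kernel)."""
    Q = []
    for z in Z:
        row = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(z, c))) for c in centers]
        norm = sum(row)
        Q.append([v / norm for v in row])
    return Q


def target_distribution(Q):
    """P_iu: pseudo-label obtained by squaring Q and renormalizing (assumed form)."""
    freq = [sum(row[u] for row in Q) for u in range(len(Q[0]))]
    P = []
    for row in Q:
        weights = [q * q / freq[u] for u, q in enumerate(row)]
        norm = sum(weights)
        P.append([w / norm for w in weights])
    return P


def kl_divergence(P, Q):
    """KL(P || Q) summed over all nodes and core points."""
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0)


# Hypothetical embeddings and core points.
Z = [[0.0, 0.0], [0.2, 0.0], [3.0, 3.0], [3.2, 3.0]]
centers = [[0.1, 0.0], [3.1, 3.0]]
Q = soft_assignment(Z, centers)
P = target_distribution(Q)
loss_c = kl_divergence(P, Q)
```

Because P is a sharpened version of Q, minimizing the divergence pulls each node toward the core point it is already closest to.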

2.3) calculating a joint loss function; the joint loss function expression is:

$$L = \cdot\, L_r + \beta L_c \tag{7}$$

2. The false reviewer group detection method fusing complex relationships according to claim 1, wherein the graph reconstruction loss function adopts a cross-entropy loss function, and the clustering algorithm for clustering the nodes adopts the KMeans clustering algorithm.

3. The false reviewer group detection method fusing complex relationships according to claim 2, wherein the model training in 2.4) is as follows:

setting the initial parameters of the attention-based graph neural network model, including the number of aggregation layers, the node embedding dimension, the number of clusters of the KMeans clustering algorithm, and the number of training iterations of the attention-based graph neural network model;

continuously adjusting the parameters during model training, and determining the optimal parameters according to the decrease of the joint loss function or the final detection result of the model during training;