CN113807616B

CN113807616B - Information diffusion prediction system based on space-time attention and heterogeneous graph convolution network

Info

Publication number: CN113807616B
Application number: CN202111235561.6A
Authority: CN
Inventors: 刘小洋; 苗琛香; 高绿苑; 张子扬
Original assignee: Chongqing University of Technology
Current assignee: Chongqing University of Technology
Priority date: 2021-10-22
Filing date: 2021-10-22
Publication date: 2022-11-04
Anticipated expiration: 2041-10-22
Also published as: CN113807616A

Abstract

The invention provides an information diffusion prediction system based on spatio-temporal attention and a heterogeneous graph convolutional network. The system comprises a data representation fusion module, an embedded prediction module, a prediction diffusion module and a data optimization module. The data output end of the data representation fusion module is connected to the data input end of the embedded prediction module; the data output end of the embedded prediction module is connected to the data input end of the prediction diffusion module; and the data output end of the prediction diffusion module is connected to the data input end of the data optimization module. The data representation fusion module uses a multi-layer graph convolutional network to learn user representations from the structures of the behavior graph and the influence graph, and fuses the learned representations. The embedded prediction module embeds the time sequence into the heterogeneous graph so that information can be predicted in real time. The prediction diffusion module performs information diffusion prediction using a multi-head attention mechanism. The invention improves the efficiency of encoding, learning and capturing context-dependent information in the user context and improves the accuracy of information diffusion prediction.

Description

Information diffusion prediction system based on space-time attention and heterogeneous graph convolution network

Technical Field

The invention relates to the field of information propagation, in particular to an information diffusion prediction system based on space-time attention and a heterogeneous graph convolution network.

Background

Social networks are now an indispensable part of daily life. They make communication more convenient and make it easier to publish or forward information. The rapid development of wireless communication technology and the Internet, together with increasingly convenient and intelligent communication devices, has made information dissemination and interaction faster and easier. Predicting information dissemination in online social networks therefore plays a significant role in practical applications.

Information diffusion prediction studies how information spreads among people in order to judge how it will develop next, so that measures can be taken to promote or suppress its spread; ideally, an accurate prediction is obtained in the shortest possible time. It is an important and challenging task that aims to predict future properties or behavior of information cascades, such as the eventual size of the spread or the next infected user. Its applications are widespread and include epidemiology, viral marketing, media advertising and news dissemination. Modeling and analyzing information diffusion helps reveal the characteristics and evolution patterns of information propagation in social networks and in viral marketing, so that the spread of information can be intervened in and controlled effectively in real time.

Research on information diffusion prediction falls into two groups. Some scholars mainly use the social relationship network among users, that is, social influence, to predict diffusion. Others mainly use the past diffusion behavior of users to learn user representations and propose prediction models built on diffusion paths. A user's diffusion behavior graph reveals the user's current topics of interest and followed accounts. People are more likely to forward content they are interested in, so analyzing a user's diffusion behavior indicates whether the user is interested in a piece of information, which improves prediction accuracy. Past diffusion paths also reflect the propagation trend of information over a period of time, so these models can predict the user diffusion sequence reasonably well. Tracing the diffusion sequence improves prediction accuracy, and once the propagation trend is known it is easier to decide whether to promote or suppress the information.

Besides past diffusion behavior, some scholars use user influence to predict information diffusion. By the principle of homophily, similar individuals are more likely to share hobbies and to act similarly in the same situation. As the saying goes, "birds of a feather flock together": researchers in the same field readily establish social relationships, attend the same academic talks and conferences, and are more likely to become friends. Social relationships in turn shape individual characteristics, and users with different influence reach different audiences, so the predicted propagation sequences and trends differ. On this assumption, many studies use the social network among users to learn homophily and influence and thereby improve prediction performance. Time also affects diffusion. Recent information tends to be remembered, whereas older information may be ignored. For example, a trending topic is likely to be noticed and forwarded while it is trending, but it gradually fades from view, and years later it may be largely forgotten and have little influence. Because information is time-sensitive, time must also be considered to improve prediction accuracy.

Most earlier research focused on traditional relational models and assumed that the diffusion process follows a prior diffusion model, such as the independent cascade model or the linear threshold model. Although these models can fit the influence relationships between users, they inevitably introduce noise and capture only part of the relationship features, and they cannot learn complex, deep relationship features. Real social networks involve complex dependencies among instances. The effectiveness of these methods depends on the assumed prior diffusion model, which is difficult to verify in practice, so prediction accuracy is low.

Disclosure of Invention

The invention aims to at least solve the technical problems in the prior art, and particularly provides an information diffusion prediction system based on space-time attention and a heterogeneous graph convolution network.

In order to achieve the above object, the present invention provides an information diffusion prediction system based on spatio-temporal attention and heterogeneous graph convolution network, comprising a data representation fusion module, an embedded prediction module, a prediction diffusion module and a data optimization module;

the data output end of the data representation fusion module is connected with the data input end of the embedded prediction module, the data output end of the embedded prediction module is connected with the data input end of the prediction diffusion module, and the data output end of the prediction diffusion module is connected with the data input end of the data optimization module;

the data representation fusion module is used for learning user representations from the structures of the behavior graph and the influence graph by using a multi-layer graph convolutional network, and for fusing the learned user representations;

the embedded prediction module is used for predicting information in real time and embedding the time sequence into the heterogeneous graph;

and the prediction diffusion module performs information diffusion prediction by using a multi-head attention network mechanism.

Further, the learning mechanism in the data representation fusion module includes:

$X_F^{(n+1)} = \sigma\left(F_A X^{(n)} W_F^{(n)}\right)$

$X_R^{(n+1)} = \sigma\left(R_A^{t_i} X^{(n)} W_R^{(n)}\right)$

wherein $X_F^{(n+1)}$ is the user representation of the follow relationship at the (n + 1)th layer;

$\sigma(\cdot)$ is the activation function;

$F_A$ is the adjacency matrix of the follow relationship in the influence graph;

$X^{(n)}$ is the user representation at the nth layer;

$W_F^{(n)}$ is the learnable parameter of the follow relationship at the nth layer;

$X_R^{(n+1)}$ is the user representation of the forwarding relationship at the (n + 1)th layer;

$R_A^{t_i}$ is the adjacency matrix of the forwarding relationship at time $t_i$;

$t_i$ is a time interval of the users' heterogeneous network;

$W_R^{(n)}$ is the learnable parameter of the forwarding relationship at the nth layer.
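As a minimal sketch of the layer-wise learning described above, the following pure-Python example applies one graph convolution layer of the standard form σ(A X W) to a follow adjacency matrix and to a forwarding adjacency matrix. The toy graph, the fixed weight values and the choice of ReLU as the activation are illustrative assumptions, not values taken from the patent.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Element-wise ReLU, standing in here for the activation sigma (an assumption)."""
    return [[max(0.0, x) for x in row] for row in m]

def gcn_layer(adj, x, w):
    """One layer: X^(n+1) = sigma(A X^(n) W^(n))."""
    return relu(matmul(matmul(adj, x), w))

# Toy heterogeneous graph with 3 users; every number below is illustrative.
follow_adj = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]      # follow relationship (unweighted)
repost_adj_t1 = [[0, 2, 0], [0, 0, 0], [0, 1, 0]]   # forwarding counts in one interval
x0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]           # initial user embeddings, d = 2
w_follow = [[0.5, -0.2], [0.1, 0.3]]                # learnable in practice, fixed here
w_repost = [[0.2, 0.4], [-0.3, 0.1]]

x_follow = gcn_layer(follow_adj, x0, w_follow)      # follow-relationship representation
x_repost = gcn_layer(repost_adj_t1, x0, w_repost)   # forwarding-relationship representation
```

Stacking several calls to `gcn_layer` gives the multi-layer network; in a real implementation the weights would be trained and the adjacency matrices would usually be normalized.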

Further, the user representation fusion in the data representation fusion module comprises:

S-A, calculating, for node $v_i$, the weight between its follow relationship in the influence graph and its forwarding relationship in the behavior graph;

S-B, learning the node features with an attention network, and taking the Hadamard product of the resulting weight matrix and the user relationship representations to obtain the final user representation.
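Steps S-A and S-B can be illustrated with a small gated fusion. This is a sketch under stated assumptions: the text does not reproduce the scoring network, so a per-dimension sigmoid gate is assumed here, and the names `fuse`, `w_f`, `w_r` and `bias` are hypothetical. Only the use of a weight matrix combined by Hadamard (element-wise) product comes from the text.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fuse(x_follow, x_repost, w_f, w_r, bias):
    """Fuse two |V| x d representations with an element-wise gate.

    gate  = sigmoid(x_follow * w_f + x_repost * w_r + bias)      (assumed scoring)
    fused = gate (Hadamard) x_follow + (1 - gate) (Hadamard) x_repost
    """
    fused = []
    for row_f, row_r in zip(x_follow, x_repost):
        out = []
        for f, r, wf, wr, b in zip(row_f, row_r, w_f, w_r, bias):
            gate = sigmoid(f * wf + r * wr + b)
            out.append(gate * f + (1.0 - gate) * r)
        fused.append(out)
    return fused

x_follow = [[0.2, 0.8], [1.0, 0.0]]   # follow-relationship representation (toy values)
x_repost = [[0.6, 0.4], [0.0, 1.0]]   # forwarding-relationship representation

# With zero weights the gate is 0.5 everywhere, so the result is a plain average.
fused_avg = fuse(x_follow, x_repost, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
# With a large bias the gate saturates at 1 and the follow representation dominates.
fused_follow = fuse(x_follow, x_repost, [0.0, 0.0], [0.0, 0.0], [50.0, 50.0])
```

The gate lets each user and each dimension weigh the two relationships differently, which is the purpose of computing a weight per node in step S-A.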

Further, the method of time sequence embedding in the embedded prediction module comprises:

an approximation strategy or an attention mechanism strategy;

the attention mechanism strategy includes:

$t' = \mathrm{mixTogether}(t_i)$ (6)

$\alpha_i = \mathrm{softmax}\left(t'^{\top} v_{t_i} + k_i\right)$

$v' = \sum_{i=1}^{T} \alpha_i v_{t_i}$

wherein $t'$ is the result of converting the time interval into a time embedding;

$\mathrm{mixTogether}(\cdot)$ is the function that embeds the time interval;

$\alpha_i$ is a weight coefficient;

$\mathrm{softmax}(\cdot)$ is a normalization function;

$v_{t_i}$ represents the user representation at time $t_i$;

$k_i$ is a mask matrix;

$v'$ is the final user representation;

$T$ is the total number of times.
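The attention mechanism strategy can be sketched as a masked softmax over a user's interval-specific representations. The dot-product score between the time embedding and each representation is an assumption made for illustration, as are the function name `temporal_attention` and the toy values; the mask, the softmax weights and the weighted combination follow the definitions above.

```python
import math

def softmax(scores):
    """Numerically stable softmax; masked entries (-inf) receive weight 0."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(t_embed, user_reps, mask):
    """Combine one user's T interval-specific representations.

    t_embed   : time embedding t' (length d)
    user_reps : T vectors, the user's representation in each interval
    mask      : T entries, 0.0 to keep an interval, -inf to hide it
                (at least one interval must stay unmasked)
    """
    scores = [sum(a * b for a, b in zip(t_embed, rep)) + k
              for rep, k in zip(user_reps, mask)]
    alpha = softmax(scores)
    fused = [sum(alpha[i] * user_reps[i][j] for i in range(len(user_reps)))
             for j in range(len(t_embed))]
    return alpha, fused

NEG_INF = float("-inf")
reps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # T = 3 intervals, d = 2 (toy values)
alpha, v_final = temporal_attention([0.5, 0.5], reps, [0.0, 0.0, NEG_INF])
```

Here the third interval is masked out, for example because it lies after the prediction time, so only the first two intervals contribute to the final representation.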

Further, the formula of the information diffusion prediction in the prediction diffusion module is as follows:

$b_h = \mathrm{softmax}\left(\frac{(v' W_h^Q)(v' W_h^K)^{\top}}{\sqrt{d_r}} + C_{ij}\right) v' W_h^V$

$M = [b_1; b_2; \ldots; b_G]\, W^O$

wherein $\mathrm{softmax}(\cdot)$ is a normalization function;

$v'$ represents the diffusion sequence;

$\cdot^{\top}$ represents the transpose of a matrix;

$d_r = d/G$, d being the dimension of the user embedding and G being the number of heads of the multi-head attention;

$C_{ij}$ is a mask matrix;

$M$ represents the final predicted user representation;

$[b_1; b_2; \ldots; b_G]$ represents the concatenation of the heads $b_h$, where $b_h$ is the h-th attention head, $b_h \in [b_1, b_2, \ldots, b_G]$;

$W_h^Q, W_h^K, W_h^V \in \mathbb{R}^{d \times d_r}$ and $W^O$ are learnable parameters,

where $\mathbb{R}^{d \times d_r}$ denotes the real matrices of dimension $d \times d_r$;

after the predicted $M$ is obtained, the probability of information diffusion is calculated with a two-layer fully connected neural network as follows:

$p = W' \sigma\left(W'' M^{\top} + \lambda_1\right) + \lambda_2$ (11)

wherein $p$ represents the probability of information diffusion;

$W' \in \mathbb{R}^{|V| \times d}$ and $W'' \in \mathbb{R}^{d \times d}$ are learnable parameters;

$\mathbb{R}^{|V| \times d}$ denotes the real matrices of dimension $|V| \times d$;

$\mathbb{R}^{d \times d}$ denotes the real matrices of dimension $d \times d$;

$d$ is the dimension of the user embedding;

$|V|$ represents the number of users;

$\lambda_1$ is the first learnable parameter and $\lambda_2$ is the second learnable parameter, both being constants;

$\cdot^{\top}$ represents the transpose of a matrix.
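The prediction step can be sketched in pure Python as masked multi-head self-attention followed by the two-layer network of equation (11). Several details are assumptions for illustration only: the scaled dot-product form with separate query, key and value projections, the causal shape of the mask, ReLU as σ, and all function names and toy weights.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_mask(length):
    """Mask entry is 0 if j <= i, else -inf, so position i cannot see later users."""
    return [[0.0 if j <= i else float("-inf") for j in range(length)]
            for i in range(length)]

def attention_head(seq, w_q, w_k, w_v, mask):
    """One head over a sequence of user vectors (L x d -> L x d_r)."""
    q, k, v = matmul(seq, w_q), matmul(seq, w_k), matmul(seq, w_v)
    d_r = len(w_q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_r) + c for s, c in zip(s_row, c_row)])
               for s_row, c_row in zip(scores, mask)]
    return matmul(weights, v)

def multi_head(seq, heads, w_o):
    """Concatenate G heads and project with w_o of shape (G * d_r) x d."""
    mask = causal_mask(len(seq))
    outs = [attention_head(seq, w_q, w_k, w_v, mask) for w_q, w_k, w_v in heads]
    concat = [[x for out in outs for x in out[i]] for i in range(len(seq))]
    return matmul(concat, w_o)

def predict(m, w_prime, w_dprime, lam1, lam2):
    """Equation (11) with ReLU as sigma: p = W' relu(W'' M^T + lam1) + lam2."""
    hidden = [[max(0.0, x + lam1) for x in row]
              for row in matmul(w_dprime, transpose(m))]
    return [[x + lam2 for x in row] for row in matmul(w_prime, hidden)]

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                       # L = 3 users, d = 2
heads = [([[1.0], [0.0]], [[1.0], [0.0]], [[1.0], [0.0]]),       # G = 2 heads, d_r = 1
         ([[0.0], [1.0]], [[0.0], [1.0]], [[0.0], [1.0]])]
w_o = [[1.0, 0.0], [0.0, 1.0]]
m = multi_head(seq, heads, w_o)                                  # L x d
w_prime = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]      # |V| = 4 users
w_dprime = [[1.0, 0.0], [0.0, 1.0]]
p = predict(m, w_prime, w_dprime, 0.0, 0.0)                      # |V| x L scores
```

Each column of `p` scores every user as the next one to be activated after the corresponding position in the sequence. As written, equation (11) yields unnormalized scores; a softmax over users would turn each column into a probability distribution.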

The data output end of the embedded prediction module is connected with the data input end of the prediction diffusion module, and the data output end of the prediction diffusion module is connected with the data input end of the data optimization module;

the prediction diffusion module carries out information diffusion prediction by utilizing a multi-head attention network mechanism;

the data optimization module is used for optimizing the prediction diffusion module.

Further, the method for optimizing in the data optimization module comprises:

$J(\chi) = -\sum_{i=1}^{N} \sum_{k=1}^{|V|} p_{ik} \log\left(\hat{p}_{ik}\right)$

wherein $N$ represents the number of diffusion time intervals;

$|V|$ represents the number of users;

$p_{ik}$ denotes the probability that a forwarding action occurs between $v_i$ and $v_k$, where $v_i$ denotes the ith user and $v_k$ denotes the kth user;

$\log(\cdot)$ is the logarithmic function;

$\hat{p}_{ik}$ represents the estimated value of $p_{ik}$;

$\chi$ represents the learnable parameters, namely all parameters to be learned in the model;
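The symbols above (a true probability, its estimate, a logarithm and a double sum) describe a cross-entropy objective. The following sketch computes such a loss for toy values; the function name `cross_entropy`, the clamping constant `eps` and the example matrices are illustrative assumptions.

```python
import math

def cross_entropy(targets, predictions, eps=1e-12):
    """Sum of -p_ik * log(p_hat_ik) over all rows i and users k.

    targets     : true values p_ik (here one-hot rows)
    predictions : estimated values p_hat_ik (each row sums to 1)
    eps         : lower clamp that avoids log(0)
    """
    total = 0.0
    for t_row, p_row in zip(targets, predictions):
        for t, p in zip(t_row, p_row):
            if t > 0.0:
                total -= t * math.log(max(p, eps))
    return total

targets = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # which user was actually activated
good = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]      # confident and correct estimates
bad = [[0.8, 0.1, 0.1], [0.8, 0.1, 0.1]]       # confident but wrong estimates

loss_good = cross_entropy(targets, good)
loss_bad = cross_entropy(targets, bad)
```

The loss is small when the estimated probability of the truly activated user is high, and grows as probability mass moves to other users, which is what training the parameters χ minimizes.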

the optimizer calculation is as follows:

$\theta_{t+1} = \theta_t + \Delta x$ (16)

wherein $l_t$ represents the second moment of the gradient $h(t)$;

$\beta_2$ represents the introduced second-moment decay parameter and is a constant;

$\beta_2^{\infty}$ represents the parameter $\beta_2$ under the L-infinity norm;

$V_{t-2}$ represents the sum of the squared gradients over the first t - 2 time steps;

$h(t)$ is the gradient of the parameter at time t;

$|h(t)|^{\infty}$ represents $|h(t)|$ under the L-infinity norm;

$\epsilon$ is a smoothing term parameter;

$\eta$ represents the correction of the first moment of the gradient $h(t)$;

$\theta_{t+1}$ represents the optimization result at time t + 1, namely the final optimization result;

$\theta_t$ represents the optimization result at time t.
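The references to a second-moment decay parameter, an L-infinity norm and a corrected first moment resemble the Adamax variant of Adam. The sketch below implements the standard Adamax update as an illustration; whether the patent's optimizer matches it term by term cannot be confirmed from the text, and the hyperparameter defaults and the quadratic test function are assumptions.

```python
def adamax_step(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adamax update for a list of parameters.

    m : exponential moving average of the gradient (first moment)
    u : exponentially weighted infinity norm, max(beta2 * u, |grad|)
    t : 1-based step counter used to bias-correct the first moment
    """
    m = [beta1 * mi + (1.0 - beta1) * g for mi, g in zip(m, grad)]
    u = [max(beta2 * ui, abs(g)) for ui, g in zip(u, grad)]
    step = lr / (1.0 - beta1 ** t)
    theta = [th - step * mi / (ui + eps) for th, mi, ui in zip(theta, m, u)]
    return theta, m, u

# Minimize f(x) = x^2, whose gradient is 2x, starting from x = 1.0.
theta, m, u = [1.0], [0.0], [0.0]
theta, m, u = adamax_step(theta, [2.0 * theta[0]], m, u, t=1)
first_step = theta[0]                      # moves by about lr on the first step
for t in range(2, 201):
    theta, m, u = adamax_step(theta, [2.0 * theta[0]], m, u, t, lr=0.05)
```

Because the denominator is a running maximum of gradient magnitudes rather than a root mean square, the size of each update is bounded by roughly the learning rate, which tends to make the method robust to occasional large gradients.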

The system further comprises a display module. The data output end of the data representation fusion module is connected to the data input end of the embedded prediction module, the data output end of the embedded prediction module is connected to the data input end of the prediction diffusion module, the data output end of the prediction diffusion module is connected to the data input end of the data optimization module, and the data output end of the data optimization module is connected to the data input end of the display module. The display module is used for displaying the diffusion prediction information. To prevent the spread of false information, a prompt about the false information is sent to the user.

In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:

(1) A GCN is used to learn the structural features of the users' follow relationship and forwarding relationship in the heterogeneous network, and a new fusion method for social network user representations is proposed that uses an attention mechanism to fuse the learned user representations.

(2) To improve the efficiency of encoding, learning and capturing context-dependent information in the user context, a multi-head attention mechanism with masked attention is proposed.

(3) To improve the accuracy of information diffusion prediction, the ASTHGCN model, a heterogeneous graph convolutional network based on a spatio-temporal attention mechanism, is constructed. The proposed ASTHGCN model uses an attention mechanism and a graph convolutional neural network to fuse the time factor with spatial factors such as influence and diffusion behavior.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

fig. 1 is a schematic diagram of an information diffusion process.

Fig. 2 is a frame diagram of the ASTHGCN model proposed by the present invention.

FIG. 3 is a schematic diagram of the matrices stored for the users at time $t_i$ in the present invention.

Fig. 4 is a schematic diagram of the information diffusion process of the present invention.

FIG. 5 is a schematic diagram of learning the user follow relationship by graph convolution according to the present invention.

FIG. 6 is a schematic diagram of the MSLE metric on the Douban, Memetracker and Twitter datasets according to the present invention.

FIG. 7 is a schematic diagram of the ablation experiments performed by the present invention on modules such as the heterogeneous graph, the behavior relationship, the social network, the temporal attention mechanism and the heuristic fusion mechanism.

FIG. 8 is a schematic diagram of the comparative analysis of the performance indexes of the ASTHGCN model at different time intervals.

FIG. 9 is a schematic diagram of comparative analysis of performance indexes of ASTHGCN models of different head numbers.

FIG. 10 is a schematic diagram of the comparative analysis of performance indicators of different dimensions according to the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention and are not to be construed as limiting the present invention.

1 related work

Existing research on information diffusion prediction in social networks follows two approaches. One predicts diffusion from user behavior; the other predicts it from the influence relationships among users.

1.1 methods based on user behavior

Methods based on user behavior learn the interpersonal relationships of users from their observed diffusion sequences and use them to predict information dissemination. User behavior reflects, to some extent, the user's interest in the information and the propagation trend of the information, and it is usually quantified by posting, following and forwarding behavior. Earlier researchers often adopted traditional models built on a prior diffusion model, such as the independent cascade (IC) model and the linear threshold model. Such propagation models suit homogeneous networks and cannot be applied effectively to real social networks, which are heterogeneous and scale-free, because they cannot learn complex, deep relationship features. Although these models can fit the relationships between users, they introduce noise and capture only part of the relationship features. Real diffusion networks are complicated, the effectiveness of these methods depends on the assumed diffusion model, and the validity of that assumption is hard to verify. Traditional models also need many parameters to be specified in advance. For example, the IC model requires the diffusion probability of every link in the network, which cannot be known in advance for a real network, so the approach is limited.

In recent years, deep learning has matured in fields such as computer vision, and some studies apply it to predict diffusion from users' past behavior. DeepCas was the first to convert the cascade graph into node sequences by random walks, and it learns a cascade graph representation end to end for prediction. DeepHawkes ignores the structural information in the cascade graph, converts the cascade graph into diffusion paths that describe how information propagates between users, and predicts the information cascade using the predictive power of end-to-end deep learning. Topo-LSTM extends the original LSTM model to account for diffusion time and the complex structure of the diffusion process: it takes a dynamic directed acyclic graph (DAG) as input and outputs a topology-aware embedding for each node in the DAG in order to learn the chain structure of the information diffusion sequence. CYAN-RNN and DeepDiffuse take timestamps into account. CYAN-RNN maps the chain structure to a diffusion tree and proposes an attention-based RNN model that captures cross-dependence within the cascade. There are also models based on attention mechanisms, such as DAN, Hi-DAN and NDM. A user's diffusion behavior helps in learning the user's social relationships and interest in a message, and hence whether the user will follow or forward similar messages, so these models can predict the user diffusion sequence, and thus the propagation trend of the information, reasonably well.

Existing behavior-based methods cast the problem as a sequence prediction task and explore how the observed sequence of user diffusion behavior influences future diffusion. However, the social relationships among users and their influence also strongly affect information diffusion, so methods that ignore user influence are less accurate and cannot reliably judge the diffusion trend of information.

1.2 user influence-based model

Models based on user influence determine the influence probability parameters of the model from the characteristics of information propagation and use them to predict diffusion. By homophily, similar individuals are more likely to share hobbies and to act similarly on similar information. In a social network, different individuals exert different social influence. For example, verified influencers ("big V" accounts) on Douyin or Weibo, celebrities and ordinary users differ in influence and therefore produce different predicted diffusion sequences; the more popular the account, the greater its influence and the higher the probability that information diffuses. Information can therefore be predicted from user influence. Several existing studies take this approach. CoupledGNN uses two coupled graph neural networks to capture the interaction between nodes and predicts dissemination from the information initiators and the social network relationships. HDGNN extends heterogeneous GNNs and combines temporal evolution with the complex relationships among nodes to predict information dynamically. DyHGCN proposes a heterogeneous graph convolutional network that considers both the social network and the diffusion path network. HDD applies meta-path representation learning to encode the heterogeneous network for information diffusion. Chen et al. propose CasCN, a semi-supervised method within an end-to-end deep learning framework, which combines cascade directionality with a time-decay effect, avoids complex features and learns only structural and temporal features to predict diffusion; this further improves prediction performance, and the authors extend the method to other application scenarios.

Research based on user influence is of practical value. To publicize goods and earn higher profits, merchants often invite influential celebrities to endorse products so that the celebrities' fans see them; once exposure reaches a certain level, sales rise as well. In other words, the higher the propagation probability of a product recommended by a celebrity, the greater the merchant's benefit. However, this approach considers only user influence. It ignores user behavior and the temporal factors of the information, cannot capture global relationships, and cannot model the complexity of the diffusion sequence well, which reduces prediction accuracy.

2 preliminary introduction

This section formalizes the diffusion problem and presents a heterogeneous graph convolutional network model based on spatio-temporal attention for information propagation prediction. The model first uses a multi-layer graph convolutional neural network to learn from the users' diffusion behavior graph and influence graph, then fuses the results into user representations, embeds the time sequence into the heterogeneous graph, and finally applies a multi-head attention mechanism to predict information propagation.

Given a set of users (nodes) $V = \{v_1, v_2, v_3, \ldots, v_N\}$ and a set of information items $M = \{m_1, m_2, m_3, \ldots, m_K\}$, where N is the number of users and K is the number of information items, suppose that information $m_k$ propagates among the nodes in V. The invention treats each piece of information as a document, so the propagation of $m_k$ between users can be seen as a process in which nodes are successively activated. The diffusion process of $m_k$ can be recorded as

$d_k = \{(v_1, t_1^k), (v_2, t_2^k), (v_3, t_3^k), \ldots, (v_{N_m}, t_{N_m}^k)\}$

where $(v_3, t_3^k)$ indicates that user $v_3$ forwarded or posted information $m_k$ at time $t_3^k$, $(v_{N_m}, t_{N_m}^k)$ indicates that user $v_{N_m}$ forwarded or posted $m_k$ at time $t_{N_m}^k$, and $N_m$ is the number of cascade entries of $m_k$, that is, the maximum length of the diffusion sequence. In general, $d_k$ is the set of forwarding events of $m_k$, and each element $(v_c, t_c^k)$ is a tuple indicating that user $v_c$, the c-th user, forwarded or posted $m_k$ at time $t_c^k$. So that the time sequence can be taken into account in prediction, each node is assumed to be activated only once. The information diffusion process is shown in fig. 1.
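The cascade notation can be made concrete with a small data structure. The sketch below stores a cascade as a time-ordered list of (user, time) tuples and enforces the assumption that each node is activated only once; the names `Activation` and `build_cascade` and the sample events are illustrative, not taken from the patent.

```python
from typing import Iterable, List, NamedTuple, Tuple

class Activation(NamedTuple):
    user: str     # the user who forwarded or posted the information
    time: float   # the time at which the user did so

def build_cascade(events: Iterable[Tuple[str, float]]) -> List[Activation]:
    """Return the cascade ordered by time, keeping only each user's first activation."""
    seen = set()
    cascade = []
    for user, time in sorted(events, key=lambda event: event[1]):
        if user not in seen:
            seen.add(user)
            cascade.append(Activation(user, time))
    return cascade

# Raw forwarding log for one piece of information; v2 forwards it twice.
events_m1 = [("v2", 3.0), ("v1", 1.0), ("v3", 7.5), ("v2", 9.0)]
cascade_m1 = build_cascade(events_m1)
users_in_order = [a.user for a in cascade_m1]
```

The length of `cascade_m1` corresponds to the cascade length of the information, and the ordered user list is the diffusion sequence that the prediction model consumes.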

In FIG. 1, the left side shows the propagation of information $m_1$, $m_2$ and $m_3$. The propagation of $m_1$ can be expressed as $d_1 = \{(v_1, t_1^1), (v_2, t_2^1), (v_3, t_3^1)\}$, that is, user $v_1$ forwards $m_1$ at time $t_1^1$, user $v_2$ forwards $m_1$ at time $t_2^1$, and user $v_3$ forwards $m_1$ at time $t_3^1$. The upper right of FIG. 1 shows the users' social influence graph and the lower side shows the user diffusion behavior graph; the behavior of a user or the propagation trend of a piece of information is predicted from the existing behavior graph and influence graph. The solid black lines represent user behavior at time t, and the dashed red lines depict, based on the information at time t, the forwarding behavior that may occur from user $v_5$ at time t' and the user nodes that may be infected.

Information diffusion prediction means predicting the state of diffusion at time t + 1 from the state of diffusion at time t combined with other factors. The invention predicts user behavior at time t + 1 from the propagation behavior graph of information among users at time t and the users' social influence graph, and determines along which path, to which users and when the information will be forwarded.

As shown in fig. 1, the information diffusion process carries a good deal of information, such as when a user forwarded and which information was forwarded. The invention analyzes the factors of information propagation and studies the prediction problem mainly in terms of user influence, user diffusion behavior and time. On the right-hand side of FIG. 1, the influence graph and the behavior graph both provide evidence about propagation. For example, suppose user $v_5$ receives the information at time t; how will the information propagate at time t + 1? The behavior graph shows that no user has forwarded information from $v_5$, so from behavior alone all users are equally likely to become the next activated user. The influence graph, however, shows that $v_2$ and $v_6$ follow $v_5$, and since each node can be activated only once, the information is more likely to propagate to $v_2$ and $v_6$ at the next time, making them the next activated nodes. Moreover, because $v_3$ has forwarded information from $v_2$ twice and the influence graph shows that $v_3$ follows $v_6$, the information may also propagate to $v_3$ through $v_2$ or $v_6$. Considering influence and diffusion behavior together therefore covers the possible diffusion paths more completely and improves prediction accuracy.
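The reasoning in this example can be expressed as a simple candidate filter: the most likely next activated users are the followers of the current user who have not been activated yet. The function name `likely_next_users` and the dictionary layout are illustrative assumptions; the follow relations mirror the FIG. 1 example as described in the text.

```python
def likely_next_users(current, followers, activated):
    """Followers of `current` that are not yet activated, in sorted order."""
    return sorted(u for u in followers.get(current, set()) if u not in activated)

# followers[x] is the set of users who follow x (v2 and v6 follow v5; v3 follows v6).
followers = {"v5": {"v2", "v6"}, "v6": {"v3"}}

first_hop = likely_next_users("v5", followers, activated={"v5"})
second_hop = likely_next_users("v6", followers, activated={"v5", "v2", "v6"})
```

A learned model replaces this hard filter with scores, but the filter shows why the influence graph adds information when the behavior graph has no forwarding history for a user.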

3 the model

S1, constructing a heterogeneous network containing a forwarding relation and a follow relation, modeling the users' behavior graph and influence graph with a GCN to learn the user structure better, and fusing the learned user representations with an attention mechanism.

S2, embedding time into the heterogeneous network, for example with a temporal attention mechanism, to obtain more accurate user representations.

S3, learning the context information with a multi-head attention mechanism with masked attention to perform the information diffusion prediction, which also addresses the context dependence of the current diffusion path.

3.1 model architecture

The invention uses the deep-learning-based ASTHGCN framework shown in FIG. 2 for information prediction. The framework has three main parts and predicts information in real time by combining the users' influence graph, the behavior graph and the time factor. First, a multi-layer graph convolutional network learns user representations from the structures of the behavior graph and the influence graph and fuses them. Second, to predict information in real time, the time sequence is embedded into the heterogeneous graph so that the user representation is more complete. Finally, a multi-head attention mechanism predicts the information and addresses the problem of context dependence.

3.2 learning user representations

As the saying goes, "birds of a feather flock together": similar people often have similar interests. If a verified Weibo influencer (a "big V") or a celebrity forwards a post, that user's fans are very likely to forward it too, so a person's influence is useful for predicting whether a user will forward or publish information. In addition, if a user has forwarded similar information, the user is interested in that kind of content or in its author and may later forward or publish such information or content from that author. Past forwarding and publishing behavior is therefore also useful for prediction. The invention accordingly learns user representations by combining the users' influence relationships and behavior relationships, so as to predict information accurately in real time.

The network used by the invention is a heterogeneous network with one node type (user) and two relationship types (the attention, i.e. follow, relationship and the forwarding relationship), as shown in FIG. 1. At a given time $t_i$, $i \in [1, n]$, adjacency matrices represent the information of the heterogeneous graph, as shown in FIG. 3: $F_A \in R^{|V| \times |V|}$ is the adjacency matrix of the attention relationship in the influence graph, a second adjacency matrix represents the forwarding relationship at time $t_i$, and $|V|$ is the number of users. The invention stores the influence relationships among users in a directed, unweighted influence graph, and stores the users' forwarding activity at each time in a directed, weighted behavior graph. To better represent the information diffusion process, the invention uses a heterogeneous diffusion graph of the users for each time interval $t_i$, as shown in FIG. 4.
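The two graphs described above can be illustrated with a minimal sketch. The edge direction (row follows column, row forwards from column), the use of forwarding counts as weights, and the function names `follow_matrix` and `forward_matrix` are assumptions made for illustration; the text only states that the influence graph is directed and unweighted and the behavior graph is directed and weighted.

```python
def follow_matrix(num_users, follows):
    """Directed, unweighted influence graph: entry [u][v] = 1 if u follows v (assumed direction)."""
    mat = [[0] * num_users for _ in range(num_users)]
    for u, v in follows:
        mat[u][v] = 1
    return mat


def forward_matrix(num_users, forwards, t_end):
    """Directed, weighted behavior graph built from forwarding events seen before t_end.

    Each event is (src, dst, timestamp); the weight is an event count (an assumption).
    """
    mat = [[0] * num_users for _ in range(num_users)]
    for src, dst, ts in forwards:
        if ts < t_end:
            mat[src][dst] += 1
    return mat


# Toy data: 3 users, two follow edges, three forwarding events.
follows = [(0, 1), (2, 1)]
forwards = [(1, 0, 3.0), (1, 0, 7.0), (1, 2, 12.0)]
F_A = follow_matrix(3, follows)
snapshot = forward_matrix(3, forwards, t_end=10.0)
```

Calling `forward_matrix` once per interval boundary would give one behavior-graph snapshot per time interval, while `F_A` stays fixed.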

After the information structure is constructed, the invention uses a multi-layer graph convolutional neural network to learn the structure of spatial factors such as influence and diffusion behavior, learns the structural features of each user, and fuses them into a new heterogeneous graph. Influence and diffusion behavior are studied in the spatial dimension. The relationships among different users are complex and often hidden, and they are a major factor in the accuracy of information prediction. A user's influence directly affects how far information spreads, and the users' social relationships can be learned from the order in which they diffuse information. Analyzing the influence relationship therefore helps diffusion: a message that is forwarded by, or recommended to, highly influential users propagates quickly. As shown in FIG. 5, the invention uses a multi-layer graph convolutional network to adaptively capture the dynamic social and forwarding relationships between users and, starting from the existing social relationships, learns further user features to obtain a more complete structural representation of each user.

To learn the user structure representation, separate multi-layer graph convolutional networks learn the attention relationship in the influence graph and the forwarding relationship in the behavior graph, forming new user structure representations of the attention relationship and the forwarding relationship that carry all their features. The learning mechanism is as follows.

Here $X^{(n)}$ is the user representation at layer $n$. The weight matrices are learnable parameters, with one learnable parameter for the layer-$n$ user attention relationship and one for the layer-$n$ user forwarding relationship. $t_i \in R^d$ is the time interval of the user's heterogeneous network, $d$ is the dimension of the user embedding, and $n$ denotes the GCN layer. Each layer outputs a user representation of the attention relationship and a user representation of the forwarding relationship. $X^{(0)} \in R^{|V| \times d}$ is the user embedding, randomly initialized from a normal distribution. $\sigma(\cdot)$ is the ReLU activation function, which mitigates the vanishing-gradient problem and trains quickly.
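The layer update described above can be sketched as follows, assuming each branch takes the common graph-convolution form ReLU(adjacency · X · W) without adjacency normalization. The names `F_T`, `W_A`, `W_T`, `X_A` and `X_T` are hypothetical labels chosen for this sketch, since the corresponding symbols do not survive in the text, and the fixed input `X0` stands in for the normally distributed random initialization.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in m]


def gcn_layer(adj, x, w):
    """One graph-convolution step, assumed here to be ReLU(adj @ x @ w)."""
    return relu(matmul(matmul(adj, x), w))


# Toy snapshot with 3 users and embedding dimension d = 2.
F_A = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]    # attention (follow) edges, unweighted
F_T = [[0, 2, 0], [0, 0, 0], [1, 0, 0]]    # forwarding counts at one interval, weighted
X0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-in for the random initial embedding
W_A = [[1.0, 0.0], [0.0, 1.0]]             # hypothetical weights of the attention branch
W_T = [[0.5, 0.0], [0.0, 0.5]]             # hypothetical weights of the forwarding branch

X_A = gcn_layer(F_A, X0, W_A)  # representation from the attention relationship
X_T = gcn_layer(F_T, X0, W_T)  # representation from the forwarding relationship
```

Stacking several such layers per branch, with the fused output of one layer feeding the next, would correspond to the multi-layer network in the description.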

3.3 user representation fusion mechanism

The attention relationship is derived from the influence graph and the forwarding relationship is derived from the behavior graph. With these two important factors in hand, the next question is how to fuse them. Take a microblog "big V" as an example: content forwarded by a big V is seen by more people, and the account's influence may lead more onlookers to follow or forward the event. But how does the information reach users who are both influential and likely to forward it? If a user has previously forwarded a similar article or video, or already follows the topic, that user is more likely to forward the content again on seeing it. Both the attention relationship and the forwarding relationship are therefore very important. To fuse the two factors and generate more accurate output, the invention combines attention with the user relationships. For a node $v_i$, it first computes the weights between the attention relationship in the influence graph and the forwarding relationship in the behavior graph, using an attention network to learn the node features, and then takes the Hadamard product of the resulting weight matrix and the user relationship representations to obtain the final user representation.

$e_{ij} = a(Wh_i \,\|\, Wh_j), \quad j \in X_T$ (3)

$\alpha_{ij} = \mathrm{softmax}(e_{ij})$ (4)

Here $a(\cdot)$ maps high-dimensional node features to a real number, $Wh_i \,\|\, Wh_j$ denotes the concatenation of $Wh_i$ and $Wh_j$, $h_i$ and $h_j$ are the feature matrices of the user attention relationship and forwarding relationship, and $W$ is a learnable parameter and is a constant. $\alpha_{ij}$ is the attention weight coefficient between the attention relationship and the forwarding relationship, softmax(·) is a normalization function, exp(·) is the exponential function with the natural constant $e$ as its base, and LeakyReLU(·) is the leaky rectified linear unit. $\odot$ denotes the Hadamard product, and the fused result is the user representation at layer $n+1$ at time $t_i$. $\alpha_{iA}$ is the attention relationship weight of user $v_i$, $\alpha_{iT}$ is the forwarding relationship weight of user $v_i$, and $X_T$ denotes the obtained forwarding relationship. The two quantities being fused are the layer-$(n+1)$ user representation of the attention relationship and the layer-$(n+1)$ user representation of the forwarding relationship. The algorithm for learning user representations from the different heterogeneous dynamic graphs is shown as Algorithm 1 below.
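A minimal sketch of the fusion step for a single user follows. It assumes that each branch is scored with LeakyReLU of a dot product against a scoring vector, that the two scores are normalized with softmax, and that the fused vector is the weighted sum of the two branch vectors, with each scalar weight applied element-wise in the manner of a Hadamard product with a constant vector. The scoring vector `a_vec`, the slope 0.2 and the function name `fuse_user` are hypothetical; the projection $W$ is taken as the identity for brevity.

```python
import math


def leaky_relu(x, slope=0.2):
    """Leaky rectified linear unit; the slope value is an assumption."""
    return x if x > 0 else slope * x


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_user(h_a, h_t, a_vec):
    """Fuse one user's attention-branch vector h_a and forwarding-branch vector h_t."""
    e_a = leaky_relu(sum(a * h for a, h in zip(a_vec, h_a)))  # score of attention branch
    e_t = leaky_relu(sum(a * h for a, h in zip(a_vec, h_t)))  # score of forwarding branch
    alpha_a, alpha_t = softmax([e_a, e_t])
    fused = [alpha_a * x + alpha_t * y for x, y in zip(h_a, h_t)]
    return (alpha_a, alpha_t), fused


weights, fused = fuse_user([1.0, 2.0], [1.0, 2.0], [0.5, 0.5])
```

When both branches carry the same vector the weights are equal and the fused vector is unchanged, which is a quick sanity check on the normalization.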

In Algorithm 1, the attention relationship matrix $F_A$ and the forwarding relationship matrix are constructed from the influence graph and the behavior graph. A multi-layer graph convolutional network performs feature learning on the attention and forwarding relationships, an attention mechanism computes the weights, and the results are fused into a new user representation. Time is divided into several intervals, the new user representation learned in each interval is fused into a new heterogeneous graph by the attention mechanism, and the user representations at all times are obtained.

3.4 time embedding strategy

After the influence relationship and the behavior relationship (that is, the diffusion behavior) are fused to obtain the user representation, time is embedded into the information for real-time prediction. Two different time embedding strategies are used.

3.4.1 approximation strategy

Under the approximation strategy, each user's behavior graph differs from one time interval to the next, but a user's focus and interests do not change instantaneously and time is continuous. So when predicting the diffusion at a given time, the user representation from the most recent snapshot (here, the one for the preceding time) is taken directly as the user's final representation. For example, when predicting for $t \in [t_7, t_8)$, the information propagation at time $t_7$ is used to predict the propagation trend at time $t$.
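The snapshot lookup behind the approximation strategy can be sketched as below, assuming the interval start times are kept in a sorted list. The function name `approx_user_repr` and the string placeholders for the representations are hypothetical.

```python
import bisect


def approx_user_repr(boundaries, reprs, t):
    """Return the representation of the latest interval that starts at or before t.

    boundaries: sorted interval start times t_1 < t_2 < ...
    reprs:      one user representation per interval start
    """
    idx = bisect.bisect_right(boundaries, t) - 1
    if idx < 0:
        raise ValueError("t precedes the first interval")
    return reprs[idx]


boundaries = [0, 10, 20, 30]
reprs = ["r0", "r1", "r2", "r3"]
chosen = approx_user_repr(boundaries, reprs, 25)  # 25 lies in [20, 30)
```

The lookup is a hard selection: everything the user did in other intervals is ignored, which is the limitation the attention-based strategy addresses.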

3.4.2 attention mechanism strategy

The approximation strategy designates the user representation at a single time as the final representation, so it cannot fully exploit the user's behavior over the whole period to learn a more accurate representation for time $t$. The invention therefore uses an attention mechanism to estimate the user representation at time $t$ from the user representations at all times in the sequence. The purpose of the graph attention mechanism is to aggregate the node feature representations from each time point onto the central vertex and learn a new node feature representation.

From the steps above, for a given user $v$, the multi-layer graph convolutional network yields the user's representations in all time intervals, each being the representation produced by the graph convolution layers for one time. With these representations, if the user forwards the message at a time $t \in [t_i, t_{i+1})$, the influence of the user's behavior before $t_i$ is taken into account, and the attention-based time embedding method is designed as follows.

$t' = \mathrm{mixTogether}(t_i)$ (6)

Here $t'$ is the time embedding obtained from the time interval, $\alpha_i$ is the weight coefficient calculated by equation (7), and $T$ is the total number of time instants. $v'$ is the final user representation, that is, the user representation after time embedding.

In the formula above, $k_i$ is a mask matrix: when $t' < t_i$, $k_i = -\infty$, so the softmax assigns zero weight and attention is switched off for that time range. The mixTogether function embeds the time intervals and is initialized from a normal distribution. The final user representation $v'$ is obtained by multiplying the embedding weights with the user representations over time. The algorithm for the final time embedding is as follows.

In Algorithm 2, the mixTogether function embeds the user's times before the current moment and generates the user embedding weight for each time. The mask matrix $k$ determines whether the user embedding at a given time is valid. The user representation obtained with these weights better matches the user's state at the moment of forwarding, which supports the prediction in the next step.
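A sketch of the masked temporal attention is given below. It assumes that the score of each snapshot is a scaled dot product between the time embedding and the snapshot's user representation, that later snapshots are masked with negative infinity, and that the final representation is the weighted sum of the snapshots. The scaling by the square root of the dimension and the function name `time_attention` are assumptions, as the corresponding equations are not reproduced in the text.

```python
import math


def time_attention(t_emb, snapshots, current_idx):
    """Combine per-interval user representations with masked softmax weights.

    t_emb:       embedding of the current time interval (list of floats)
    snapshots:   user representation for each interval, oldest first
    current_idx: index of the current interval; later intervals are masked out
    """
    d = len(t_emb)
    scores = []
    for i, v in enumerate(snapshots):
        if i > current_idx:
            scores.append(float("-inf"))  # mask: softmax weight becomes zero
        else:
            scores.append(sum(a * b for a, b in zip(t_emb, v)) / math.sqrt(d))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * v[k] for w, v in zip(weights, snapshots)) for k in range(d)]
    return weights, fused


snaps = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
alphas, v_final = time_attention([0.0, 0.0], snaps, current_idx=1)
```

With a zero time embedding the unmasked snapshots receive equal weight, and the third snapshot contributes nothing because it lies after the current interval.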

3.5 information propagation prediction

After the user node representations are obtained, and in order to better predict information and capture context-dependent information, the user representations are arranged as a diffusion sequence $V' = \{v'_1, v'_2, \ldots, v'_N\}$. The attention network applies a linear mapping with shared parameters to the nodes to raise their dimension, and a masked attention operation combines the user node representations with the attention mechanism. Masked attention means that the attention computation is applied only to the nodes that satisfy the condition, not to all nodes. The information prediction formula is as follows:

Here $\cdot^T$ denotes the matrix transpose, $M$ denotes the user representation used for the final prediction, and $b$ denotes a matrix. $C_{ij}$ in the formula above is a mask matrix: when $i > j$, $C_{ij} = -\infty$, so the softmax assigns zero weight and attention is switched off for that range, and only eligible nodes are operated on. The remaining weights in the formula are learnable parameters, $d_r = d/G$, and $G$ is the number of heads of the multi-head attention.
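The masked multi-head attention over the diffusion sequence can be sketched as follows. The sketch assumes scaled dot-product self-attention in which each position attends only to itself and earlier positions, and it omits the learnable projections (they are treated as the identity), so each head simply works on a slice of width $d_r = d/G$. The function names are hypothetical.

```python
import math


def masked_attention(seq):
    """Single-head scaled dot-product self-attention; position i sees only j <= i."""
    d = len(seq[0])
    out = []
    for i, q in enumerate(seq):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if j <= i else float("-inf")
            for j, k in enumerate(seq)
        ]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum((e / z) * v[c] for e, v in zip(exps, seq)) for c in range(d)])
    return out


def multi_head(seq, heads):
    """Split each vector into `heads` slices of width d_r = d / heads and concatenate results."""
    d = len(seq[0])
    if d % heads != 0:
        raise ValueError("d must be divisible by the number of heads")
    d_r = d // heads
    per_head = [
        masked_attention([v[h * d_r:(h + 1) * d_r] for v in seq]) for h in range(heads)
    ]
    return [sum((per_head[h][i] for h in range(heads)), []) for i in range(len(seq))]


sequence = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
attended = multi_head(sequence, heads=2)
```

The first position can only attend to itself, so its output equals its input; later positions mix in information from the users infected before them.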

After the predicted $M$ is obtained, a two-layer fully connected neural network calculates the probability of information diffusion:

$p = W'\sigma(W''M^T + \lambda_1) + \lambda_2$ (11)

In the formula above, the output represents the probability of information diffusion; $W'$ and $W''$ are learnable parameters; $\lambda_1$ and $\lambda_2$ are learnable parameters, all of which are constants; $|V|$ is the number of users; $d$ is the dimension of the user embedding; and $\sigma(\cdot)$ is the activation function, for which the invention uses ReLU. $\cdot^T$ denotes the matrix transpose.
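Equation (11) can be traced with a small sketch for a single sequence position. The shapes are assumptions: $W''$ is taken as a hidden-by-$d$ matrix and $W'$ as a $|V|$-by-hidden matrix, so that the output holds one score per candidate user. The function names are hypothetical.

```python
def relu_vec(v):
    """Element-wise ReLU on a vector."""
    return [max(0.0, x) for x in v]


def affine(w, x, b):
    """Compute w @ x + b for a matrix w (list of rows) and vectors x, b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def predict_scores(m, w_inner, lam1, w_outer, lam2):
    """Two-layer network p = W' ReLU(W'' m + lambda_1) + lambda_2 for one position."""
    hidden = relu_vec(affine(w_inner, m, lam1))
    return affine(w_outer, hidden, lam2)


m_vec = [1.0, -2.0]                              # one row of M, d = 2
W_inner = [[1.0, 0.0], [0.0, 1.0]]               # plays the role of W''
W_outer = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]   # plays the role of W', |V| = 3
scores = predict_scores(m_vec, W_inner, [0.0, 0.0], W_outer, [0.5, 0.5, 0.5])
```

The three output values are per-user scores; turning them into a ranking over uninfected users is what the evaluation metrics later rely on.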

The invention uses the cross-entropy loss as its objective function, given by the following formula.

Here $|V|$ is the number of users and $p_{ik}$ is the probability that a forwarding action occurs between $v_i$ and $v_k$, with the model output serving as the estimate of $p_{ik}$. $p_{ik} = 0$ means that no information diffusion occurs, and $p_{ik} = 1$ means that diffusion occurs. $\chi$ denotes the learnable parameters, that is, all parameters the model needs to learn, which are updated by the Adamax optimizer. The optimizer is calculated as follows.
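The objective can be illustrated with the standard cross-entropy between the 0/1 targets $p_{ik}$ and their estimates, summed over all user pairs. This standard form is an assumption, since the loss formula itself is not reproduced in the text; the clipping constant `eps` and the function name are also hypothetical.

```python
import math


def cross_entropy(targets, estimates, eps=1e-12):
    """Sum of -p_ik * log(estimate_ik) over all pairs; estimates are clipped at eps."""
    return -sum(
        p * math.log(max(q, eps))
        for row_p, row_q in zip(targets, estimates)
        for p, q in zip(row_p, row_q)
    )


targets = [[0, 1], [1, 0]]              # p_ik: 1 where diffusion occurred
estimates = [[0.5, 0.5], [1.0, 0.0]]    # model estimates of p_ik
loss = cross_entropy(targets, estimates)
```

Only the entries where diffusion actually occurred contribute, so a confident correct estimate adds nothing and an uncertain one adds a positive penalty.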

$l_t = \max(\beta_2 \cdot V_{t-1}, |h(t)|)$ (14)

$\theta_{t+1} = \theta_t + \Delta x$ (16)

Here $l_t$ is the coefficient of the optimizer's update rule, $\max(\cdot, \cdot)$ takes the larger of its two arguments, and $\beta_2$ is an introduced parameter. The value of $l_t$ is given by equations (13) and (14), where (14) is a simplified version of (13) written in the $L_\infty$ (infinity-norm) form. $h(t)$ is the parameter gradient at time $t$, the second-order momentum $V(t)$ is the sum of the squared gradients, $\varepsilon$ is a smoothing term that keeps the denominator from being 0, $\beta_2 \in [0.9, 0.999]$, and $\varepsilon = 10^{-9}$. $\theta_{t+1}$ is the optimization result at time $t+1$, that is, the final optimization result.
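For orientation, one scalar Adamax update in its commonly published form is sketched below. The first-moment coefficient `beta1`, the bias correction and the exact placement of `eps` come from that standard formulation and are assumptions here, not details confirmed by the text; the learning rate and `eps` defaults follow the values stated in this description.

```python
def adamax_step(theta, grad, m, u, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-9):
    """One Adamax update for a single scalar parameter.

    m: running first moment of the gradient
    u: running infinity-norm term, u_t = max(beta2 * u_{t-1}, |grad|)
    t: 1-based step counter used for bias correction
    """
    m = beta1 * m + (1.0 - beta1) * grad
    u = max(beta2 * u, abs(grad))
    delta = -(lr / (1.0 - beta1 ** t)) * m / (u + eps)
    return theta + delta, m, u


theta1, m1, u1 = adamax_step(theta=1.0, grad=2.0, m=0.0, u=0.0, t=1)
```

On the first step the update has magnitude close to the learning rate regardless of the gradient scale, which is the practical effect of normalizing by the infinity-norm term.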

4 Experiments and results analysis

This section presents the data sets used in the experiments, the state-of-the-art deep diffusion baseline models, the ablation experiments and the parameter tuning experiments. The baselines are compared with the ASTHGCN model proposed by the invention, and the evaluation indexes used to assess the ASTHGCN model are introduced.

4.1 preparation of the experiment

The invention uses three public data sets: Douban, Twitter and Memetracker. Their statistics are shown in Table 1 below, where User is the number of users, Link is the number of user attention relationships, Cascades is the number of user forwarding sequences, and Avg. length is the average length of those sequences.

TABLE 1 data set

Dataset	Douban	Twitter	Memetracker
User	23123	12627	4709
Link	348280	309631	NULL
Cascades	10602	3442	12661
Avg. length	2714	3260	1624

Twitter is a social media network that provides a microblogging service. From the Twitter data set, 12627 users active in October 2010 were extracted together with their tweets, attention relationships and diffusion sequences. Each tweet contains a URL in its message body, each URL serves as the unique identifier of a piece of information, and the influence relationship between users is the attention (follow) relationship on Twitter.

Memetracker covers a large volume of online mainstream social media activity. The data set used by the invention consists of millions of news stories and blog articles collected from the web; the URL of each website or blog is treated as a user, and the use of common quotations and phrases is tracked across these users. This data set has no social graph.

Douban is a social networking platform for sharing content about books and movies. Each book or movie is treated as a piece of information, and a user is activated when the user reads or watches it. When two or more users activate the same books or movies more than 20 times, they are considered to be like-minded users.

Following the previous experimental setup, 80% of the data was randomly sampled for training, 10% for validation and 10% for testing.
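The 80/10/10 random split can be sketched as follows. The fixed seed and the function name `split_cascades` are assumptions added so that the example is reproducible.

```python
import random


def split_cascades(cascades, seed=0):
    """Shuffle the cascades and split them 80% / 10% / 10% into train, validation, test."""
    items = list(cascades)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


train_set, val_set, test_set = split_cascades(range(100))
```

Any remainder left by the integer division ends up in the test portion, so no cascade is dropped.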

4.2 reference method

The invention compares several state-of-the-art baseline methods with the proposed ASTHGCN model.

DeepDiffuse: an LSTM-based model that uses node sequences and an attention mechanism and takes user activation timestamps into account; it can predict when a user will be activated based on the preceding cascade sequence.

Topo-LSTM: an LSTM-based information diffusion model that uses a directed acyclic graph (DAG) structure. It takes a dynamic DAG as input to the LSTM and generates topology-aware embeddings as output, and the probability computed from the embeddings is taken as the infection probability at each time.

NDM: a model that needs no diffusion graph and uses a convolutional network and self-attention to alleviate the long-term dependence problem.

SNIDSA: a sequential neural network with structural attention that uses a recurrent neural network to model sequence information and a gating mechanism to capture structural dependence among users.

FOREST: a multi-scale diffusion prediction model that predicts the popularity of information under the guidance of reinforcement learning. It extracts latent social graph information and integrates macroscopic prediction by means of reinforcement learning.

DyHGCN: a model that uses a GCN to learn the structural features of the user social graph and diffusion graph for dynamic information prediction. For time, it uses either a hard selection strategy (DyHGCN-H) or a soft selection strategy (DyHGCN-S).

The method of the invention (ASTHGCN_A, ASTHGCN_T): ASTHGCN_A embeds time with the approximation strategy, and ASTHGCN_T embeds time with the temporal attention mechanism.

4.3 evaluation index and Experimental settings

Following previous studies, there may be any number of potential candidates, so information diffusion prediction can be treated as a retrieval task for the next infected user. Because the SNIDSA and Topo-LSTM models both require an underlying social graph, which the Memetracker data set lacks, they are excluded from the comparison experiments on Memetracker.

The invention uses an intuitive evaluation approach based on ranking metrics from information retrieval. The uninfected nodes are ranked by infection probability, and the ASTHGCN model is evaluated with two widely used metrics, Hits@N and MAP@N, together with the mean squared logarithmic error (MSLE). The experiments evaluate N = 10, 50 and 100.
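The two ranking metrics can be sketched for the next-user setting, assuming each prediction has exactly one true next user. Under that assumption Hits@N is the fraction of predictions whose true user ranks in the top N, and the average precision of a single prediction reduces to the reciprocal rank when the true user is in the top N and zero otherwise. The function names are hypothetical.

```python
def rank_of(scores, target):
    """1-based rank of `target` when candidates are sorted by descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order.index(target) + 1


def hits_and_map_at_n(all_scores, targets, n):
    """Return (Hits@N, MAP@N) averaged over all predictions."""
    hits = 0.0
    ap = 0.0
    for scores, target in zip(all_scores, targets):
        r = rank_of(scores, target)
        if r <= n:
            hits += 1.0
            ap += 1.0 / r
    count = len(targets)
    return hits / count, ap / count


all_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]  # one score list per prediction
targets = [2, 0]                                  # index of the true next user
hits2, map2 = hits_and_map_at_n(all_scores, targets, n=2)
hits1, map1 = hits_and_map_at_n(all_scores, targets, n=1)
```

In the first prediction the true user is ranked second, so it counts as a hit at N = 2 but not at N = 1.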

The experiments are implemented on a GPU (GeForce RTX 3060) with the PyTorch 1.9.1 framework. The Adamax optimizer updates the parameters by mini-batch gradient descent, and the performance of the ASTHGCN model is evaluated on the test set. The selected parameter settings are shown in Table 2.

TABLE 2 parameter settings

Parameters	Value
Batch Size	16
Learning Rate	0.001
β	β∈[0.9,0.999]
Dropout Rate	0.1
Optimizer	Adamax
Num Epoch	50
kernel size	128
d_model	64
time_step	8
n_heads	14

4.4 Experimental results and parameter settings experiments

This section sets up comparison experiments, compares the results of DeepDiffuse and the other models, and analyzes how the parameter settings affect performance.

4.4.1 results of the experiment

The experimental results of the ASTHGCN model and the baseline models on the Douban, Memetracker and Twitter data sets are presented in Tables 3, 4 and 5 respectively. The tables list the evaluation indexes of all models. ASTHGCN's advantage is visible in the Hits@N and MAP@N indexes, showing that the model can successfully predict information propagation.

As Tables 3, 4 and 5 show, DyHGCN was the most advanced model before ASTHGCN was proposed. The results show that ASTHGCN consistently outperforms the most advanced methods, leading to the following conclusions:

TABLE 3 Experimental results on the Douban data set

TABLE 4 Experimental results on the Memetracker data set

TABLE 5 Experimental results on the Twitter data set

(1) Compared with models based on user influence, such as SNIDSA and FOREST, the ASTHGCN-A model improves Hits@10 by nearly 5%, and the ASTHGCN model achieves an absolute improvement of 11% on Hits@50 and Hits@100 on the Twitter and Douban data sets. MAP@10 shows an absolute improvement of 4% on all three data sets. SNIDSA and FOREST consider only user influence and predict information from users' social relationships, without considering the users' diffusion behavior.

(2) Compared with the DeepDiffuse, Topo-LSTM and NDM models, which are based on user diffusion behavior, the ASTHGCN model achieves an absolute improvement of 10% on Hits@10, 17% on Hits@50 and 20% on Hits@100 on the Douban and Twitter data sets. The MAP@N index improves by 7% in absolute terms. DeepDiffuse, Topo-LSTM and NDM predict information from users' previous diffusion behavior without considering factors such as user influence. User influence reflects the capacity and speed of information propagation, and the experiments show that it is a very important factor in information prediction.

(3) Compared with the most advanced DyHGCN model, the ASTHGCN-T model achieves an absolute improvement of 5% on the Hits@N index and 3% on the MAP@N index. Both DyHGCN and ASTHGCN consider the user's forwarding relationship, attention relationship and time factor. However, ASTHGCN also considers the dependency on user context when learning and fusing the user relationship structure, learning the user representation by combining the attention mechanism with graph convolution, which further improves prediction performance. The experiments show that the user context dependency has a marked effect on information prediction performance.

Next, the mean squared logarithmic error (MSLE) was measured on the three data sets. For ease of comparison, the results are shown in the radar chart of FIG. 6.

FIG. 6 shows the MSLE results on the Douban, Twitter and Memetracker data sets; the lower the score, the better. The Topo-LSTM model scores above 10, significantly higher than the other models, so it is omitted to keep the chart readable. SNIDSA is not applicable to the Memetracker data set because that data set has no social graph, so its MSLE value there is set to 0. In summary, the ASTHGCN model considers spatial factors, such as the users' behavior and influence relationships, together with time factors, and uses an attention mechanism to fuse the user representations, which demonstrates the effectiveness and accuracy of the model.
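For reference, MSLE in its usual definition is the mean of the squared differences between log(1 + true value) and log(1 + predicted value). The sketch below uses that standard definition; which quantities the experiments feed into it is not specified in the text, so the sample inputs are purely illustrative.

```python
import math


def msle(y_true, y_pred):
    """Mean squared logarithmic error over paired non-negative values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((math.log1p(a) - math.log1p(b)) ** 2 for a, b in zip(y_true, y_pred))
    return total / len(y_true)


example = msle([0.0, math.e - 1.0], [0.0, 0.0])
```

Because the error is taken on a logarithmic scale, a large absolute miss on a large value is penalized less than the same miss on a small value.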

4.4.2 ablation experiments

To study the effectiveness of each factor in the ASTHGCN model, the invention performs additional ablation experiments on the basis of the DyHGCN model to verify the contribution of each factor. The ablation experiments cover the following aspects:

Heterogeneous graph: the encoding module of the heterogeneous graph is removed, and user representations are learned using only a homogeneous network.

Behavior relationship: the behavior relationship in the heterogeneous graph is removed, along with the convolution operation of user representation learning, and only the user's diffusion behavior relationship is considered.

Social relationship: the social influence relationship in the heterogeneous graph is removed, along with the convolution operation of user representation learning, and only the user's influence relationship is considered.

Time attention mechanism embedding: in place of the temporal attention mechanism, the approximation strategy is used for time embedding.

User representation fusion method: in place of the user attention fusion mechanism, a heuristic fusion strategy is used.

Ablation experiments on the modules of the ASTHGCN model, such as the heterogeneous graph, were performed on the Twitter and Douban data sets, and the results are shown in FIG. 7. As FIG. 7 shows, every module in ASTHGCN is necessary, and each contributes some improvement. First, when the encoding module of the heterogeneous graph is removed and user representations are learned from a homogeneous network only, performance is significantly lower than that of ASTHGCN: the full model is 7 points higher on the Twitter data set and 10 points higher on the Douban data set. This indicates that heterogeneous networks benefit information prediction. Second, in the experiments that remove the behavior relationship, the social relationship or the temporal attention embedding, each remaining configuration still improves on the original basis, but falls clearly short of the full ASTHGCN model; prediction performance improves markedly only when all the influencing factors are considered together. Finally, the model with the heuristic fusion mechanism scores 6 points lower than the model with the temporal attention mechanism, which shows the advantage of attention-based fusion: it fuses user representations more comprehensively and thereby improves model performance. In conclusion, each module of the ASTHGCN model improves the overall information prediction performance, and the study of the model is worthwhile.

4.4.3 parameter tuning experiment

This section uses the Twitter data set to compare different parameter settings and analyze their performance. It mainly tests the number of attention heads and the number of time intervals to verify the optimal settings.

Influence of the number of time intervals: the invention takes the time factor of information propagation into account by dividing the diffusion time sequence into a number of time intervals, and that number may affect the performance of the ASTHGCN model directly or indirectly. As the number of intervals increases, finer-grained user representations can be learned, making the learned representation more comprehensive. Because this setting influences the final performance, a parameter tuning experiment was carried out. The results are shown in FIG. 8.

As FIG. 8 shows, the performance of the ASTHGCN model increases with the number of time intervals, but once the number reaches 8 the performance begins to dip slightly, and further increases change it only marginally. Dividing the user time series into more intervals yields more comprehensive user representations and features, but beyond a certain number of intervals the additional gain is limited, so the performance indexes over the whole propagation process also change little. The invention therefore uses 8 time intervals.

Influence of the number of attention heads: the ASTHGCN model uses a multi-head attention mechanism, in which different heads capture different features through different projections, and this affects the performance of propagation prediction. Because the number of heads influences the model's performance indexes, the invention carried out a parameter tuning experiment, and the results are shown in FIG. 9.

As the number of attention heads increases, the performance of the ASTHGCN model improves steadily, because more heads capture more comprehensive and more accurate information. Performance is best when the number of heads reaches 14 and begins to decline as the number grows further, because with too many heads the model overfits and performance degrades.

Influence of model dimension: the invention studies how the dimension of the node representation affects model performance, evaluating the ASTHGCN model for $d \in \{16, 32, 64, 128\}$. The results are shown in FIG. 10; in general, performance improves as the dimension increases. On the Douban data set, however, the model performs best at dimension 64 and degrades significantly at higher dimensions, possibly because of overfitting. On the Memetracker data set, performance converges at dimension 128 with gradually flattening gains, probably because that data set is larger. Taking the results on all three data sets together, the ASTHGCN model dimension is set to 64.

5 conclusion

The invention studies how spatial factors of a heterogeneous network, such as the influence relationship and the diffusion behavior relationship, and time factors affect information propagation, and designs an information prediction model based on spatio-temporal attention and a heterogeneous graph convolutional network. The ASTHGCN model jointly considers influence, diffusion behavior and time factors and applies an attention-based fusion algorithm, making user fusion and user representation more comprehensive and accurate and improving prediction accuracy. In addition, the multi-head attention mechanism with masked attention addresses timestamp-aware prediction and the context dependence of information. Experimental results on three data sets show that the ASTHGCN model performs best among the compared baseline models. The main conclusions are as follows: (1) By studying a heterogeneous network with influence and diffusion behavior, user features are learned and fused into the heterogeneous graph, so that the learned user representation is closer to real life and provides effective user samples for information prediction research. (2) In user structure learning, the dependency on user context is incorporated and the user representation is learned by combining an attention mechanism with graph convolution, which further improves information prediction performance and yields more accurate user output and higher prediction accuracy.

While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. An information diffusion prediction system based on space-time attention and heterogeneous graph convolution network is characterized by comprising a data representation fusion module, an embedded prediction module, a prediction diffusion module and a data optimization module;

2. The system according to claim 1, wherein the mechanism for learning in the data representation fusion module comprises:

wherein,

is a user representation of the (n + 1) th layer user attention relationship;

$\sigma(\cdot)$ is an activation function;

$X^{(n)}$ represents the user representation at the nth layer;

is a learnable parameter of the nth layer of user attention relationship;

is the user representation of the (n + 1) th layer user forwarding relation;

represents the adjacency matrix of the forwarding relationship at time $t_i$;

$t_i$ is the time interval of the user's heterogeneous network;

is a learnable parameter of the nth layer user forwarding relationship.

3. The system according to claim 1, wherein the user representation fusion in the data representation fusion module comprises:

S-B, performing feature learning of the nodes using an attention network, and taking the Hadamard product of the obtained weight matrix and the user relationship representation to obtain the final user representation.

4. The system for predicting information diffusion based on spatio-temporal attention and heterogeneous graph convolution network according to claim 1, wherein the method for embedding the time series in the embedded prediction module comprises:

an approximation strategy or attention mechanism strategy;

the attention mechanism strategy includes:

$t' = \mathrm{mixTogether}(t_i)$ (6)

mixTogether(·) is the function that embeds time intervals;

$\alpha_i$ is a weight coefficient;

softmax(·) is a normalization function;

$v_{t_i}$ represents the user representation at time $t_i$;

$k_i$ is a mask matrix;

$v'$ is the final user representation;

$T$ represents a total of $T$ times.

5. The system for predicting information diffusion based on spatio-temporal attention and heterogeneous graph convolution network according to claim 1, wherein a formula of information diffusion prediction in the prediction diffusion module is as follows: