CN115858899A

CN115858899A - Network event label popularity prediction method based on multi-label influence

Info

Publication number: CN115858899A
Application number: CN202211605375.1A
Authority: CN
Inventors: 周斌; 田磊; 高立群; 赵学臣; 韩跃; 谢锋; 张中; 李爱平; 江荣; 王晔; 涂宏魁
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2022-12-14
Filing date: 2022-12-14
Publication date: 2023-03-28

Abstract

The invention provides a method for predicting the popularity of network event labels based on multi-label influence. Event label propagation data and related user data are collected for an event; a label propagation relationship network is constructed, and node relationships and node attributes are obtained. A propagation popularity prediction model that simulates the propagation influence process among labels is built from three components: a feature aggregation component, comprising a static semantic feature aggregation process and a dynamic population propagation feature aggregation process; a local aggregation component, consisting of a graph capsule network, which learns a locally aggregated feature representation of the labels; and a dynamic time-sequence representation component, which learns the temporal process of label propagation evolution. The model is trained; the event labels to be predicted and the propagation influence network data related to those labels are then input into the trained model, which outputs the popularity index that the social network event labels of interest may reach in the future. The method can thus predict the future popularity of social network event labels of interest in social media.

Description

Network event label popularity prediction method based on multi-label influence

Technical Field

The invention relates to the technical field of deep learning and social network public opinion analysis, in particular to a network event label popularity prediction method based on multi-label influence.

Background

Social media, as an extension of real events onto the internet, has become an important platform for spreading the content of social events through networks. Official media and self-media publish events occurring in society to social networks in real time through social network services; once propagation turns them into network events, they attract wide user participation and in turn produce social impact. The popularity of a network event is therefore directly related to the degree of impact of the corresponding event in reality. Events in society are usually covered by semantically generalized labels (a # sign followed by a summarizing phrase), and when an event enters an outbreak of propagation, multiple labels are introduced and maintained. The network event continues to spread under the mutual influence of these labels, which widens its reach in the social network. In practice, once a network event gains a certain influence in social media, for example after it becomes a "hot search" event, the development of the event often generates multiple labels that reflect its semantic connotation. The labels related to the event interact with one another, competing or cooperating for propagation traffic, and the strong correlation between labels drives how far the event propagates in the network. The emergence of social network event labels has changed how social hot events propagate in networks: labels have a strong aggregation effect and semantic generalization capability, which promotes the participation of network users in propagation. The propagation of trending network events in social networks is usually accompanied by the formation of multiple labels, which attract traffic as topics under different labels.

Predicting the popularity of social content is an active research problem in network propagation prediction. Popularity prediction can directly evaluate how much user attention an event will receive as it propagates over a future period, provide data support for understanding the propagation patterns of social network events, and predict the propagation scale of an event from social media data. Because each sub-topic generated by a network event is joined by user groups with different interests, and these topics represent how the groups express themselves about the event, the propagation popularity of the event's label set can be regarded as a measure of group attention in the event. Analyzing the propagation influence relations among multiple labels under a trending network event, and predicting popularity from the mutual influence of the labels, can support specific tasks such as public opinion event monitoring and viewpoint and stance prediction.

On the one hand, existing methods are generally built on macroscopic analysis and provide a cascade propagation model for a single label item; they cannot directly capture the joint dynamic propagation of a group of label items that together represent a network event, and so cannot reflect the propagation influence of the many related, complex labels that a network event generates within the same time window during its outbreak period. On the other hand, most existing methods for analyzing label propagation influence in social networks are based on point-process time-series models, which usually contain only a single attribute of an event label (such as forwarding count or temporal order) and use a generative model to predict future label popularity.

When the propagation of a network event is in its hot period, the generated labels can be regarded as semantic representations with population aggregation; that is, different user population variables are implied under the different labels of the event, so the labels carry the propagation properties of those populations. The invention provides an end-to-end deep learning regression prediction model. The model accounts for the mutual propagation influence between different labels during the event label propagation stage, and introduces semantic features and two population indexes as the key features that influence propagation, so that the future social network popularity of a target label is predicted more accurately.

Disclosure of Invention

The invention aims to provide a network event label popularity prediction method based on multi-label influence, and solves the existing problems.

The technical scheme is as follows:

A network event label popularity prediction method based on multi-label influence comprises the following steps:

S1, from real-world data on the propagation of social network events, crawling the labels and texts related to the social network events;

Further, in S1, social texts within a given time period are crawled from the data set of a real-world social network event, using topic words and the like as keywords.

S2, cleaning and organizing the crawled data, and preprocessing the relation features. This specifically comprises the following steps:

S201, data cleaning: after cleaning, the publication record of each tweet is stored in a csv file. The data fields comprise the id of the user who published the tweet, the tweet content, the publication time, and the id of the user with whom the tweet interacts (covering forwarding, original posting and commenting; for an original tweet, the interaction id is the same as the publishing id). Each record represents one tweet publication related to the event;

S202, relation feature preprocessing: according to the characteristics of the tweets in different data sets, titles that summarize the event are additionally extracted during preprocessing, and label relations are established for tweets that contain multiple labels.

S3, collecting the event-related labels of a network event, and using the associations among the labels to construct an event label propagation relation graph, comprising a global label relation graph $G$ and a local influence attribute graph $G_i$; and calculating the propagation feature attributes of the nodes: time is first divided into windows according to the observable data, and the local propagation influence feature graph $G_i$ of network event label $i$ is then constructed from the data in each time window, which includes extracting a static semantic feature graph and a dynamic population propagation timing-graph sequence.

This specifically comprises the following steps:

S301, constructing the event label propagation relation graph. Two indexes are selected as the sources of label relations in the graph: an explicit relation and an implicit semantic relation;

(A) Explicit relation index: if a user explicitly includes two or more labels in one tweet, the labels are considered to have a propagation influence relation during propagation. This is formalized as follows:

where $C_{co\text{-}occur}(i,j)$ is the number of times labels $i$ and $j$ appear together in one tweet, and $N$ is the set of all co-occurrence neighbor nodes of node $i$. The formula expresses the degree of association between label $i$ and label $j$ over the whole event corpus, which can represent explicit propagation influence.

(B) Implicit semantic relation index: when the labels of a network event after its outbreak lack explicit relations, the extracted explicit labels are used to establish associations through semantic similarity within the observation window, so that event labels that are semantically related but not explicitly marked with the # character can be extracted. Specifically, the model uses pointwise mutual information (PMI) to link labels by their semantic relation; PMI expresses the weight of the relation between labels in the event's semantic data and is formalized as follows:

$\mathrm{PMI}(i,j) = \log \frac{d(i,j)\, D}{d(i)\, d(j)}$

where $d(i,j)$ is the number of tweets in the observable window in which label $i$ and label $j$ occur together. Unlike the explicit feature, which counts only labels visibly marked with # in a tweet, $d(i,j)$ also counts co-occurrences in which the label text appears without the #. $d(i)$ and $d(j)$ are the numbers of tweets in the set that contain $i$ and $j$, respectively, at least once, and $D$ is the total number of tweets in the social network event. In general, a positive PMI value means that the labels have high semantic relevance in the event label library.

(C) Establishing the relation weights between social network event labels: the implicit semantic relation only establishes associations between label pairs with positive PMI values. Finally, the relation weight between social network event labels is determined by addition:

$R(i,j) = R_{ex}(i,j) + R_{im}(i,j)$

For a social network event, the above process builds an event label relation graph $G = \langle V, E \rangle$ from the data of the observable time window;

(D) For a target label $i$, a fixed-size sub-network $G_i$ is sampled from the global network event label relation graph $G$, instead of directly processing the global network $G$ itself, which carries much noise information. The sampling method chosen for this step is the random walk with restart (RWR) algorithm; with RWR, the related labels most likely to affect propagation can be selected while the number of associated label nodes is capped, that is:

$G_i = \mathrm{RWR}(G, i, m, R)$

where $G$ is the global event label relation graph, $m$ is the cap on the number of sampled nodes, $i$ is the target label, and $R$ is the weight $R(i,j)$ between nodes of the event label relation graph.
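The graph construction and sampling in S301 can be illustrated with a small sketch. It is not the patent's implementation: the explicit weight is assumed here to be the co-occurrence count normalized over the neighbors of label i (the source formula is not legible), the restart probability 0.15 is an arbitrary choice, and the function names `relation_weights` and `rwr_sample` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np


def relation_weights(tweets):
    """tweets: list of (explicit, mentioned) pairs of label sets.

    explicit  = labels written with '#' in the tweet
    mentioned = labels whose text occurs in the tweet, with or without '#'
    Returns {(i, j): R(i, j)} with R = R_ex + R_im.
    """
    D = len(tweets)
    co, d_single, d_pair = Counter(), Counter(), Counter()
    for explicit, mentioned in tweets:
        co.update(combinations(sorted(explicit), 2))
        d_single.update(mentioned)
        d_pair.update(combinations(sorted(mentioned), 2))

    # ASSUMPTION: R_ex(i, j) = C(i, j) / sum over neighbors k of C(i, k).
    neigh_total = Counter()
    for (i, j), c in co.items():
        neigh_total[i] += c
        neigh_total[j] += c
    R = Counter()
    for (i, j), c in co.items():
        R[(i, j)] += c / neigh_total[i]
        R[(j, i)] += c / neigh_total[j]

    # Implicit weight: PMI, kept only when positive.
    for (i, j), dij in d_pair.items():
        pmi = math.log(dij * D / (d_single[i] * d_single[j]))
        if pmi > 0:
            R[(i, j)] += pmi
            R[(j, i)] += pmi
    return dict(R)


def rwr_sample(R, labels, target, m, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart from `target`; keep at most m labels."""
    idx = {lab: k for k, lab in enumerate(labels)}
    W = np.zeros((len(labels), len(labels)))
    for (i, j), w in R.items():
        W[idx[j], idx[i]] = w            # column i holds the out-weights of i
    col = W.sum(axis=0)
    col[col == 0] = 1.0                  # isolated labels keep a zero column
    P = W / col                          # column-normalized transition matrix
    e = np.zeros(len(labels))
    e[idx[target]] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        nxt = (1 - restart) * (P @ p) + restart * e
        converged = np.abs(nxt - p).sum() < tol
        p = nxt
        if converged:
            break
    ranked = sorted(labels, key=lambda lab: -p[idx[lab]])
    others = [lab for lab in ranked if lab != target and p[idx[lab]] > 0]
    return [target] + others[: m - 1]
```

Labels that the walk never reaches receive zero probability and are dropped, which is one way to realize the noise filtering that the text attributes to RWR.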

S302, constructing the static semantic relation attribute graph: semantic features are taken as static attributes in the label relation graph (label semantics are independent of the time sequence), and the event label semantic relation attribute graph $G^{sem}$ is constructed as follows:

(A) Extracting label semantic features: because most labels are abbreviations that summarize an event, which is especially pronounced in English data sets, it is difficult to obtain enough semantic information directly from the label text. The label semantics are therefore supplemented with the tweets that carry the label within the observable time window: the tweet $s$ carrying the label that was forwarded the most within the observable time is obtained and used as the feature explaining the label semantics.

(B) Embedding the event label semantic features: a text library of semantic interpretations of the labels is built, and a BERT-based SentenceTransformer interface is called to produce the initial semantic embedding vector of each label, formalized as:

$H_s = \{\mathrm{bert2sentence}(i)\}$

where $0 \le i \le |V|$, $H_s \in \mathbb{R}^{|V| \times d}$, $|V|$ is the number of label nodes, and $d$ is the embedding dimension;

(C) Constructing the static semantic relation attribute graph: the local event label propagation influence relation graph $G_i$ from step S301 and the node attributes obtained in the previous step are used to construct the static semantic relation attribute graph, where $V_i$ is the set of nodes associated with target label $i$, $E_i$ is the set of label node relations, and the node attribute set is the collection of semantic feature vectors of the nodes in $V_i$.

S303, constructing the sequence of event label propagation time-series attribute graphs. Based on the event label propagation relation graph, the model divides the graph into time-series subgraphs; that is, it retains the valid nodes and edges within each time window $t$ and, according to the group influence of the label nodes at different moments, calculates the node group feature $H^t$ that influences propagation, where $0 \le t < n$ indexes the serialized time windows. This sampling method reflects the dynamic influence process among different topic labels within a time window. The step is divided into four sub-steps:

(A) Calculating the social influence of the participating population: the social influence of label $i$ in window $t$ is defined and calculated for each label node as follows:

where $N^t(N)$ is the total number of tweets carrying the label within time window $t$, and $N^t(E)$ is the total number of tweets of all associated label nodes in the subgraph within time window $t$. Intuitively, the index expresses the extent to which the population participating in the label influences the event labels as a whole at time $t$.

(B) Calculating the propagation influence of the participating population: the population propagation influence of a hashtag at time $t$ is defined and calculated for each node from two quantities. The first is the sum of the follower counts of the users who published tweets carrying the label within time window $t$; the second is the total number of users who published event-related tweets within time window $t$. The index expresses the degree of interest that the population participating in the event labels shows in the current label.

(C) Constructing the dynamic group attribute $H^{inf}(t)$ of the labels: the dynamic population influence feature $H^{inf}(t)$, which is decisive for propagation, is selected as the node attribute. Because different labels have different propagation influence in different time windows, the two influence indexes above are used as the criteria for evaluating the influence of the population participating in a label over the time sequence, giving the dynamic propagation attribute $H^{inf}(t)$ of target label $i$, formalized as:

$H^{inf}(t) = [O(t), M(t)]$

(D) Constructing the timing-graph sequence: using the dynamic propagation attribute $H^{inf}(t)$, an attribute graph $G_t$ based on dynamic attributes and dynamic relations is constructed, and the sequence of event label propagation time-series attribute graphs over all time windows is obtained. Then, for the label $i$ of each hot event in the data set, the feature graph of target label $i$ is constructed by the method above and used as a sample input to the deep learning model.
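As a rough illustration of sub-steps (A) to (C), the sketch below computes a two-element attribute for one label in one time window. The exact formulas are not legible in the source, so both indexes are assumed here to be simple ratios of the quantities the text defines; the record fields `user`, `followers` and `labels` and the function name `dynamic_attributes` are hypothetical.

```python
def dynamic_attributes(window_tweets, label, subgraph_labels):
    """Return [O, M] for `label` within one time window.

    window_tweets: list of dicts with keys 'user', 'followers', 'labels'.
    ASSUMPTION: O = (#tweets carrying the label) / (#tweets carrying any
    label of the sampled subgraph); M = (sum of follower counts of the
    distinct users who posted the label) / (#distinct users in the window).
    """
    label_tweets = [t for t in window_tweets if label in t["labels"]]
    sub_tweets = [t for t in window_tweets if subgraph_labels & t["labels"]]
    O = len(label_tweets) / len(sub_tweets) if sub_tweets else 0.0

    # Count each publishing user's followers once.
    followers = {t["user"]: t["followers"] for t in label_tweets}
    users = {t["user"] for t in window_tweets}
    M = sum(followers.values()) / len(users) if users else 0.0
    return [O, M]
```

Calling the function once per label and per window yields the node attributes of each graph in the timing-graph sequence.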

S4, constructing semantic features and time-series population features. The model for this step comprises a propagation relation feature embedding layer based on a graph attention network, a feature fusion layer that fuses static and dynamic features, and a local feature aggregation layer based on a graph capsule network; the propagation relation feature embedding layer comprises a static semantic feature aggregation representation and a dynamic propagation feature aggregation representation. The static semantic relation attribute graph is input into the static semantic feature aggregation representation layer to obtain the static semantic node vector representation matrix $H^{sem}$. For the sequence of dynamic propagation timing attribute graphs, the propagation attribute graph of each time window is input into the dynamic propagation feature aggregation representation layer to obtain the dynamic propagation matrix $H'_{dym}$ formed over the $t$ windows of the observable time. $H^{sem}$ and $H'_{dym}$ are input into the feature fusion layer to obtain the fused feature $H_f$, which combines the semantic features and propagation features of the label nodes. The propagation attribute graph is then input into the local feature aggregation layer based on the graph capsule network to obtain the graph embedding vector $h_G$.

The method specifically comprises the following steps:

S401, learning the propagation relation feature embedding layer. This layer is mainly responsible for learning the propagation influence relations among different labels and comprises the static semantic feature aggregation representation and the dynamic propagation feature aggregation representation. The graph attention network (GAT), a graph neural network that can learn influence relation weights, is chosen as the feature representation method: during supervised training, GAT learns different weight coefficients between nodes, so that hidden mutual influence relations are captured in the node representations.

The input to GAT has two parts: the node feature matrix $H \in \mathbb{R}^{|V| \times d}$ and the adjacency matrix $N^t \in \mathbb{R}^{|V| \times |V|}$, where $d$ is the feature dimension and $n$ is the number of nodes in the subgraph. For each node $v_i$ with attribute $h_i$ in the graph:

$H = [h_1, h_2, \ldots, h_n]^T$

$H_o = [h'_1, h'_2, \ldots, h'_n]^T, \quad H_o \in \mathbb{R}^{|V| \times d'}$

where $\alpha \in \mathbb{R}^{2 \times d'}$ and $\Theta$ are trainable weights, $j \in N(i)$ means that label nodes $j$ and $i$ share an edge in the adjacency matrix (that is, $j$ and $i$ are related), LeakyReLU is chosen as the nonlinear activation function, $a_{i,j}$ is the weight of the mutual influence relation between label node $i$ and label node $j$, $H_o$ is the matrix formed by the output embedding vectors, $|V|$ is the number of graph nodes, $d'$ is the dimension of the output node features, and $\|$ denotes the concatenation operation.

For the static semantic relation attribute graph, the graph is input into the propagation relation feature embedding layer to obtain the static semantic node vector representation matrix $H_{sem}$.

For the sequence of dynamic propagation timing attribute graphs, the propagation attribute graph of each time window is taken as input to obtain the tensor $H'_{dym}$ formed over the $t$ windows of the observable time:

$H'_{dym} \in \mathbb{R}^{t \times |V| \times F'_2}$

where $t$ is the number of time windows, $|V|$ is the number of subgraph nodes, and $F'_2$ is the feature dimension of the output nodes.
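For readers unfamiliar with graph attention, a single-head layer can be sketched in NumPy as follows. This follows the commonly published GAT formulation, $e_{i,j} = \mathrm{LeakyReLU}(\alpha^T[\Theta h_i \,\|\, \Theta h_j])$ with $a_{i,j}$ the softmax of $e_{i,j}$ over $j \in N(i)$, and is only assumed to match the layer used in the patent. The attention vector is treated as a flat vector of length $2d'$, self-loops are added, and no output activation is applied; all three are assumptions, as is the function name `gat_layer`.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(H, adj, Theta, alpha):
    """Single-head graph attention layer (forward pass only).

    H:     (n, d)   node features
    adj:   (n, n)   adjacency matrix, non-zero where an edge exists
    Theta: (d, d')  projection weights
    alpha: (2*d',)  attention vector (ASSUMPTION: flat layout)
    Returns (H_o, att): output features (n, d') and attention weights (n, n).
    """
    Z = H @ Theta
    d_out = Z.shape[1]
    # alpha^T [z_i || z_j] splits into a source term and a target term.
    src = Z @ alpha[:d_out]
    dst = Z @ alpha[d_out:]
    e = leaky_relu(src[:, None] + dst[None, :])
    # ASSUMPTION: each node also attends to itself.
    mask = (np.asarray(adj) > 0) | np.eye(len(H), dtype=bool)
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)      # numerical stability
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)
    return att @ Z, att
```

Under this sketch, the static branch would call the layer once on the semantic features, while the dynamic branch would call it once per time window and stack the outputs into a tensor of shape (t, |V|, F'_2).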

S402, feature fusion. The feature fusion module is mainly responsible for fusing the semantic features and the propagation features of the label nodes, and its output serves as the input to the next layer. To keep node semantics invariant over the time sequence, the model concatenates $H_{sem}$ onto the matrix $H'_{dym}$. The feature representation layer is formalized as:

$H_f = H_{sem} \,\|\, H'_{dym}$

where $\|$ denotes the concatenation operation.
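One plausible reading of the fusion step is that the static semantic matrix is repeated for every time window and concatenated with the dynamic tensor along the feature axis. The sketch below shows that reading; the tensor layout (t, |V|, F) and the function name `fuse_features` are assumptions.

```python
import numpy as np


def fuse_features(H_sem, H_dym):
    """Concatenate static features onto every time slice of the dynamic tensor.

    H_sem: (n, F1)     static semantic node features
    H_dym: (t, n, F2)  dynamic propagation features per time window
    Returns H_f with shape (t, n, F1 + F2).
    """
    t = H_dym.shape[0]
    # Repeat the static features for each window without copying memory.
    H_sem_rep = np.broadcast_to(H_sem, (t,) + H_sem.shape)
    return np.concatenate([H_sem_rep, H_dym], axis=-1)
```

Repeating the same semantic block in every window is what keeps the node semantics unchanged across the time sequence.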

S403, local feature aggregation. During the development of an event, label semantics show strong local correlation, so labels exist in the graph in a strongly connected state. To capture this strong local correlation between propagation graphs and to better express the hierarchical relation from local to whole, the model of the invention, inspired by the graph capsule network, applies a routing mechanism in which groups of label nodes vote, which better captures the local-to-whole relations in the graph; the local-to-whole hierarchy is then inferred over several rounds of iteration, and the feature representation of the timing graph is finally obtained. Specifically, this process maps the time-series attribute graph to a graph embedding vector $h_G$ and comprises three steps:

(A) Layering the timing graph: the relation between the lower-level parts and the higher-level whole of the timing graph is established through a voting mechanism. Let $v$ denote a capsule node in the lower-level voting graph and $u$ a capsule node in the higher-level routing graph. The output of the feature fusion layer and the timing graph are first used to form the initial voting matrix, where $N^t$ is the adjacency matrix of the lower-level voting graph and each vote vector $v_{j|i}$ is regarded as the vote of $v_i$ for capsule node $u_j$ in the higher-level cluster and represents the voting weight; $|N|$ is the number of nodes in the capsule, and $F'_3 = F'_1 + F'_2$ is the dimension of the input vector.

(B) Establishing dynamic routing: the task of this process is to iteratively calculate the routing weights $C_{i,j}$ between the lower-level nodes $v$ and the higher-level capsules $u$, that is, which groups of lower-level nodes $v_i$ can activate the higher-level cluster node $u_j$ ($u_j$ can be viewed as a more closely related cluster in the subgraph), so as to obtain the local-to-global activation relations. The votes from step (A) are then weighted to obtain the weights $C_{i,j}$ with which the lower-level graph is routed to the locally aggregated higher-level graph. The routing logits $b_{i,j}$ are initialized to 0, and the dynamic routing weights are then calculated over $R$ iterations of the following three formulas:

$C_{i,j} = \frac{\exp(b_{i,j})}{\sum_k \exp(b_{i,k})}$

$u_j = \mathrm{squash}\left(\sum_i C_{i,j}\, v_{j|i}\right)$

$b_{i,j} = b_{i,j} + v_{j|i} \cdot u_j$

where squash is a nonlinear "squeezing" function that computes the capsule $u_j$ from the votes $v_{j|i}$ cast for it by the nodes $v_i$, and $v_{j|i} \cdot u_j$ measures the agreement between each vote and the higher-level capsule, so that routing focuses on aggregating information from neighbors likely to belong to the same cluster. After $R$ iterations, this process yields the capsule nodes $u$ and the abstract adjacency matrix of the higher-level aggregation graph, expressed as:

$G_{route} = (A, u)$

$A = C^T N C, \quad A \in \mathbb{R}^{|U| \times |U|}$

where $N$ is the adjacency matrix of the lower-level voting graph, $|V|$ is the number of nodes in the lower-level voting graph, $|U|$ is the number of nodes in the higher-level routing graph, and $C \in \mathbb{R}^{|V| \times |U|}$ is the lower-to-higher routing weight matrix formed by the $C_{i,j}$. $A$ can therefore be regarded as the adjacency matrix of the higher-level routing graph, and $u$ is the feature vector of a node in that graph, obtained from the formulas of the previous step. The above process can be summarized as the following transformation:

$U, A = \mathrm{Route}(\mathrm{Vote}(V, N))$

namely:

$U, A = \mathrm{RV}(V, N)$
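The routing step can be sketched with the dynamic routing procedure commonly used in capsule networks. The sketch takes the votes as given, since the source does not legibly specify how they are built from the fused features; the squash form, the default of three iterations and the function names are assumptions.

```python
import numpy as np


def squash(s, eps=1e-9):
    """Shrink each vector to a length below 1 while keeping its direction."""
    sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    ex = np.exp(x)
    return ex / ex.sum(axis=axis, keepdims=True)


def dynamic_routing(votes, N, iterations=3):
    """Route lower-level nodes to higher-level capsules.

    votes: (n_low, n_high, d)  vote v_{j|i} of lower node i for capsule j
    N:     (n_low, n_low)      adjacency matrix of the lower-level graph
    Returns U (n_high, d), C (n_low, n_high) and A = C^T N C (n_high, n_high).
    """
    n_low, n_high, _ = votes.shape
    b = np.zeros((n_low, n_high))             # routing logits start at 0
    for _ in range(iterations):
        C = softmax(b, axis=1)                # each node spreads weight over capsules
        U = squash(np.einsum("ij,ijd->jd", C, votes))
        b = b + np.einsum("ijd,jd->ij", votes, U)   # agreement update
    A = C.T @ N @ C                           # adjacency of the higher-level graph
    return U, C, A
```

Calling the routine a second time with a single higher-level capsule collapses the graph to one node, which corresponds to the whole-graph embedding described in step (C).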

(C) Establishing the time-series attribute graph representation: steps (A) and (B) are repeated to abstract the higher-level cluster graph once more into an embedded representation of the whole graph. On the basis of the graph capsule network, this preserves as far as possible the features of the influence that local hashtag labels exert on propagation, namely:

Here the 1 in the formula indicates that, after abstraction to the higher level, the graph is aggregated into a single node; the feature vector of that node is the propagation influence representation vector of the label relation graph under the current time window $t$. The graph representation process is then carried out with the local aggregation layer for each time-series attribute graph:

S5, the representation vectors $h^t$ of the different time-series subgraphs in a sample are input into an LSTM model for dynamic time-sequence representation learning, the result is passed to a fully connected layer to obtain the prediction, and model learning is guided by the error between the prediction and the true-value label of the sample. This comprises the following three steps:

S501, the representation vectors $h^t$ of the different time-series subgraphs in the sample are processed. To make use of these temporal features and better capture the propagation effect caused by feature changes over time, a long short-term memory (LSTM) unit is applied in this part. The calculation is as follows:

$h_t = \tanh(c_t) \odot o_t$

where $h_t$ is the hidden feature output at time $t$, $\odot$ denotes the Hadamard product, $U_j$, $W_j$, $b_j$ with $j \in \{z, f, o, c\}$ are learnable parameters, and $z_t$, $f_t$ and $o_t$ are respectively the input gate vector, the forgetting gate vector and the output gate vector for the feature of the $t$-th window. Finally, the result at time $t+1$ is predicted through a fully connected layer:

$\Delta y' = \sigma(W h_t)$
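The recurrence can be illustrated with a plain NumPy forward pass over the sequence of graph embeddings. Only the output equation and the final prediction are legible in the source, so the gate equations below follow the standard LSTM cell and are an assumption, as are the parameter dictionary layout and the function name `lstm_predict`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_predict(h_seq, params, W_out):
    """Run a standard LSTM over graph embeddings and predict popularity.

    h_seq:  iterable of input vectors, one per time window
    params: dict with W_j (hidden, in), U_j (hidden, hidden), b_j (hidden,)
            for j in {z, f, o, c}, stored as 'Wz', 'Uz', 'bz', ...
    W_out:  (1, hidden) weights of the fully connected output layer
    Returns (prediction, last hidden state).
    """
    hidden = params["bz"].shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in h_seq:
        z = sigmoid(params["Wz"] @ x + params["Uz"] @ h + params["bz"])  # input gate
        f = sigmoid(params["Wf"] @ x + params["Uf"] @ h + params["bf"])  # forget gate
        o = sigmoid(params["Wo"] @ x + params["Uo"] @ h + params["bo"])  # output gate
        c_new = np.tanh(params["Wc"] @ x + params["Uc"] @ h + params["bc"])
        c = f * c + z * c_new
        h = np.tanh(c) * o                     # h_t = tanh(c_t) * o_t
    return sigmoid(W_out @ h), h
```

In practice a deep learning framework would supply the cell and train the parameters; the loop is written out only to make the data flow from window embeddings to the prediction explicit.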

S502, obtaining the true-value label of a sample: for the local propagation influence feature graph $G_i$ of network event label $i$, the propagation count of the sample in the snapshot at $t+1$ is computed and set as the true-value label $y_i$ of the sample. Because of the particular nature of label propagation in social networks, original tweets that carry the label are also regarded as propagation behavior. The invention therefore uses the forwarding count together with the number of original tweets carrying the label as the indicator of user group attention; that is, the label is computed from two counts: how many people forwarded tweets carrying the event label, and how many original tweets contain the target label.

S503, network popularity prediction is treated as a regression problem, so the mean squared logarithmic error (MSLE) is used here as the objective loss function of the regression model:

where $\Delta y'$ is the predicted popularity index and $y_i$ is the actual propagation index.
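As a minimal sketch, the loss can be written in its usual form, the mean of the squared differences between log(1 + prediction) and log(1 + truth). The natural logarithm and the +1 offset are assumptions, since the source formula is not legible, and the function name `msle` is hypothetical.

```python
import numpy as np


def msle(pred, true):
    """Mean squared logarithmic error between predictions and true values.

    ASSUMPTION: natural log with a +1 offset, i.e. log1p, which keeps the
    loss defined when a label has zero propagation in the target window.
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean((np.log1p(pred) - np.log1p(true)) ** 2))
```

Working in log space keeps a few very popular labels from dominating the loss, which is a common reason for preferring it over plain squared error on heavy-tailed counts.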

A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the network event label popularity prediction method based on multi-label influence described above.

A computer-readable storage medium, on which a program is stored, which, when executed by a processor, implements a multi-tag impact-based network event tag popularity prediction method as described above.

The invention establishes a network event label popularity prediction method based on multi-label influence. First, to address the mutual propagation influence among event-related labels during network propagation, the event-related labels of a network event are collected and their associations are used to construct an event label propagation relation graph, comprising a global label relation graph and a local influence attribute graph. Then, because the propagation of social network events changes dynamically and the generated event labels undergo a semantic aggregation process and an evolution process, semantic features and time-series population features are constructed using a propagation relation feature embedding layer based on a graph attention network, a feature fusion layer that fuses static and dynamic features, and a local feature aggregation layer based on a graph capsule network, which makes the prediction of network event label popularity more reliable. Finally, dynamic time-sequence representation learning is carried out through a time-series model to predict the popularity of hot events. For different social platforms, a network event label popularity prediction model based on multi-label influence can be trained from the platform's data to better solve the hot event popularity prediction problem. The method can be used to predict events with large public opinion impact, such as social hotspot issues, public opinion events, stance events, viewpoint events, sudden events and international events.

Compared with the prior art, the invention has the following technical effects:

1. To obtain the hidden mutual influence relations in node representations during the propagation of posts in social networks, the invention designs a feature aggregation component based on a graph attention network, comprising a static semantic feature aggregation process and a dynamic population propagation feature aggregation process. Semantic features and two population indexes are introduced as the key features influencing propagation, and the component models both the associations between labels and their intrinsic semantic relations.

2. To address the strong local correlation of label semantics as an event develops, the invention provides a method that uses a graph capsule network to learn a representation of the population aggregation features combining static semantic features and dynamic time-series features. It captures the strong local correlation between propagation graphs and better expresses the hierarchical relation from local to whole, modeling the propagation influence network so as to reflect how the populations under different labels jointly influence event propagation.

3. Because the propagation of social network events changes dynamically, and the generated event labels undergo a semantic aggregation process and an evolution process, an LSTM time-series model is applied to learn the feature representation of the propagation evolution process. Semantic correlation and structural correlation are calculated, the semantic aggregation and evolution of event labels are simulated, the latent features of propagation influence among different labels over the time sequence are learned, and the future popularity of the target label is predicted.

4. Because the nodes of the label relation network graph keep increasing and a large number of noise labels may be generated, a random walk algorithm is used to sample a label subgraph with strong influence and to screen out the set of influential key labels. A time-series-based event label relation network subgraph is then constructed, which avoids the node noise of the global graph and reduces the computational complexity of deep learning.

Drawings

FIG. 1 is a schematic diagram illustrating steps of a method for predicting popularity of a multi-tag influenced network event tag according to the present invention;

FIG. 2 is a flowchart illustrating steps of a method for predicting popularity of a multi-tag influenced network event tag according to the present invention;

FIG. 3 is a diagram illustrating an internal structure of a computing device according to an embodiment.

Detailed Description

The following detailed description of the embodiments of the present invention will be provided with reference to the drawings and examples, so that how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as there is no conflict, the embodiments and the features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are within the scope of the present invention.

Additionally, the steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions and, although a logical order is illustrated in the flow charts, in some cases, the steps illustrated or described may be performed in an order different than here.

Referring to fig. 1 and fig. 2, a method for predicting popularity of a network event tag based on multi-tag influence according to the present invention at least includes the following steps:

Step 1: crawling labels and texts related to social network events from a large real-world dataset of social network event propagation;

Step 2: cleaning and sorting the crawled data, and preprocessing the relation features;

Step 3: collecting event-related labels for a network event, and using the associations among the labels to construct an event label propagation relation graph, a global label relation graph $G$ and a local influence attribute graph $G_i$; calculating the propagation feature attributes of the nodes by first dividing time into windows according to the observable data and then constructing, from the data in each time window, the local propagation influence feature graph $G_i$ of network event label $i$, which includes extracting a static semantic feature graph and a sequence of dynamic group propagation timing graphs;

Step 4: constructing semantic features and time-series group features: the static semantic feature graph is input into the static semantic feature aggregation representation layer to represent the static semantic node vectors as a matrix $H_{sem}$; the sequence of dynamic group propagation timing graphs is input into the dynamic propagation feature aggregation representation layer to obtain the dynamic propagation matrix $H'_{dym}$ formed over the $t$ windows in the observable time; $H_{sem}$ and $H'_{dym}$ are input into the feature fusion layer to obtain the fused feature $H_f$, which combines the semantic features and propagation features of the label nodes; the propagation attribute graph is then input into the local feature aggregation layer based on the graph capsule network to obtain the graph embedding vector $h_G$;

Step 5: inputting the representation vectors $h_t$ of the different timing subgraphs in the sample into an LSTM model for dynamic time-series representation learning, inputting the result into a fully connected layer to obtain a prediction result, and guiding model learning by the error between the prediction result and the true value label of the sample.

Specifically, in one embodiment of the present invention, the method comprises the following steps:

Step 1: for event propagation on a real-world social network, social texts within a given time period are crawled using topic words as keywords.

Step 2: cleaning and sorting the crawled data, and preprocessing the relation features. This specifically comprises the following steps:

Step 201: data cleaning. After cleaning, the publication records of the tweets are stored in a csv file. The data fields comprise the id of the user who published the tweet, the tweet content, the publication time, and the id of the interacting user (interactions comprise retweets, original posts and comments; for an original post, the interacting user id is the same as the publishing user id). Each record represents one tweet publication related to the event;

Step 202: relation feature preprocessing. According to the characteristics of the tweets in different datasets, titles that summarize the event are additionally extracted during preprocessing, and a label relation is established for tweets that contain multiple labels;

Step 3: collecting event-related labels for the network event, and using the associations among the labels to construct an event label propagation relation graph, a global label relation graph $G$ and a local influence attribute graph $G_i$; calculating the propagation feature attributes of the nodes by first dividing time into windows according to the observable data and then constructing, from the data in each time window, the local propagation influence feature graph $G_i$ of network event label $i$, which includes extracting a static semantic feature graph and a sequence of dynamic group propagation timing graphs.

Step 301: constructing the event label propagation relation graph. Two indexes are selected as the sources of label relations in the graph: an explicit relation and an implicit semantic relation;

Step A: explicit relation index. If a user explicitly includes two or more labels in one tweet, those labels are considered to have a propagation influence relation during propagation. The index, denoted $R_{ex}(i, j)$, is formalized over the following quantities: $C_{co\text{-}occur}(i, j)$ is the frequency with which two labels appear in the same tweet, and $N$ is the set of all neighboring co-occurrence nodes of node $i$. The index expresses the degree of association between label $i$ and label $j$ in the whole event corpus and can represent explicit propagation influence.

Step B: implicit semantic relation index. When network event labels lack explicit relations after an event breaks out, associations are established through semantic similarity within the observation window, using the explicit labels already extracted. The aim is to extract event labels that are semantically related but not explicitly marked with the # character. Specifically, the model uses pointwise mutual information (PMI) to link labels by their semantic relation; PMI expresses the weight of the relation between labels in the event semantic data and is formalized as follows:

$$R_{im}(i, j) = \mathrm{PMI}(i, j) = \log \frac{d(i, j) \cdot D}{d(i) \cdot d(j)}$$

where $d(i, j)$ is the total number of tweets in the observable window in which label $i$ and label $j$ occur together. Unlike the explicit feature, which is an explicit label visibly marked with # in the tweet, $d(i, j)$ counts co-occurrences in the tweet text that do not carry the #. $d(i)$ and $d(j)$ are the total numbers of tweets in the set that contain $i$ and $j$, respectively, at least once, and $D$ is the total number of tweets in the social network event. In general, a positive PMI value means that the labels in the event label library have a high semantic correlation.

Step C: establishing the relation weights between social network event labels. The implicit semantic relation establishes associations only between label pairs with positive PMI values. The relation weight between social network event labels is then determined by addition:

$$R(i, j) = R_{ex}(i, j) + R_{im}(i, j)$$

For a social network event, the above process establishes an event label relation graph $G = \langle V, E \rangle$ from the data in the observable time window;
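The weighting in steps A to C can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the method of the invention: the function name `relation_weights` and its two inputs are hypothetical, the explicit index is assumed to be the co-occurrence count of a pair normalized by the total co-occurrence count of label $i$ with its neighbors (the text names the quantities but not their combination), and the implicit index is the standard PMI kept only when positive.

```python
import math
from collections import Counter
from itertools import combinations


def relation_weights(explicit_sets, text_sets):
    """Return {(i, j): R(i, j)} for ordered label pairs.

    explicit_sets: one set per tweet with the labels written with '#'.
    text_sets: one set per tweet with the labels mentioned without '#'.
    """
    # Explicit relation: co-occurrence counts of '#'-marked labels.
    co = Counter()
    for labels in explicit_sets:
        for i, j in combinations(sorted(labels), 2):
            co[(i, j)] += 1
            co[(j, i)] += 1
    row_sum = Counter()
    for (i, _), count in co.items():
        row_sum[i] += count
    # ASSUMPTION: normalize by the total co-occurrence of i with its neighbors.
    r_ex = {(i, j): count / row_sum[i] for (i, j), count in co.items()}

    # Implicit relation: PMI over plain-text mentions, positive values only.
    total = len(text_sets)
    d_single = Counter()
    d_pair = Counter()
    for labels in text_sets:
        for i in labels:
            d_single[i] += 1
        for i, j in combinations(sorted(labels), 2):
            d_pair[(i, j)] += 1
            d_pair[(j, i)] += 1
    r_im = {}
    for (i, j), dij in d_pair.items():
        pmi = math.log(dij * total / (d_single[i] * d_single[j]))
        if pmi > 0:
            r_im[(i, j)] = pmi

    # Additive combination R = R_ex + R_im.
    keys = set(r_ex) | set(r_im)
    return {k: r_ex.get(k, 0.0) + r_im.get(k, 0.0) for k in keys}
```

Under the assumed normalization the explicit weight is not symmetric, so $R(i, j)$ and $R(j, i)$ can differ; a symmetric variant would work equally well with the additive rule.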

Step D: sampling a fixed-size subnetwork $G_i$ for the target label $i$ from the global network event label relation graph $G$, instead of directly processing the global network $G$ itself, which contains much noise. The sampling method chosen at this step is the random walk with restart (RWR) algorithm. RWR selects the related labels most likely to influence propagation while constraining the upper limit on the number of associated label nodes, that is:

$$G_i = \mathrm{RWR}(G, i, m, R)$$

where $G$ is the global event label relation graph, $m$ is the limit on the number of sampled nodes, $i$ is the target label, and $R$ is the weight $R(i, j)$ between nodes of the event label relation graph. In the present embodiment, $m = 15$;
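A minimal sketch of the sampling in step D follows. It assumes the power-iteration form of random walk with restart and keeps the $m$ nodes with the highest visiting probability; the restart probability, the iteration count and the function name `rwr_sample` are illustrative assumptions, since the text specifies only the inputs $G$, $i$, $m$ and $R$.

```python
import numpy as np


def rwr_sample(R, target, m, restart=0.15, iters=100):
    """Return (sorted indices of the m sampled nodes, visiting probabilities)."""
    R = np.asarray(R, dtype=float)
    col_sum = R.sum(axis=0, keepdims=True)
    col_sum[col_sum == 0] = 1.0          # isolated nodes keep an all-zero column
    P = R / col_sum                      # column-normalized transition matrix
    e = np.zeros(R.shape[0])
    e[target] = 1.0                      # the walk restarts at the target label
    p = e.copy()
    for _ in range(iters):
        p = (1.0 - restart) * (P @ p) + restart * e
    order = np.argsort(-p, kind="stable")
    # Always keep the target, then the m - 1 most visited other labels.
    keep = [target] + [int(k) for k in order if k != target][: m - 1]
    return sorted(keep), p
```

The subgraph $G_i$ is then the graph induced on the returned indices, for example `R[np.ix_(nodes, nodes)]`.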

step 302: constructing a static semantic relation attribute graph, and constructing an event label semantic relation attribute graph G by taking semantic features as static attributes (label semantics are not related to time sequence) in the label relation graph ^sem To passThe process is as follows:

step A: the extraction of the semantic features of the tags is difficult to obtain enough semantic information directly from the texts of the tags because most of the tags are abbreviations of event generalization formulas and are particularly obvious in English data sets. And supplementing the semantic information of the labels in the observable time window with the pushtext where the labels are located, so as to obtain the pushtext s with the labels, the maximum number of which is forwarded in the observable time, and taking the pushtext s as the characteristic for explaining the label semantics.

And B: embedding and expressing event label semantic features, after establishing a text base of semantic interpretation of labels, calling a sensor transform 5 interface based on a bert model to embed semantic initial vectors into the labels, and formalizing the semantic initial vectors into the labels:

H _s ＝{bert2sentence(i)}

wherein i is more than or equal to 0 and less than or equal to | V |, H _s ∈R ^|v|×d And | V | represents the number of all label nodes, d represents the embedding dimension, in this embodiment, d =32, | V | =20;

and C: static semantic relationship attribute graph construction, using event tags in step 301 to propagate local influence relationship graph G _i Constructing static semantic relation attribute graph with the node attribute obtained in the last step

Is node V _i Represents a collection of semantic features, an

Step 303: constructing the sequence of event label propagation timing attribute graphs. Based on the event label propagation timing attribute graph, the model splits the graph into temporal subgraphs, that is, it retains the nodes and edges that are valid within time window $t$, where $0 \le t < n$ indexes the serialized time windows. This sampling method reflects the dynamic influence among different topic labels within a time window. The step is divided into the following sub-steps:

Step A: calculating the social influence of the participating group. The social influence $O(t)$ of label $i$ in window $t$ is defined and calculated for each label node from two quantities: $N^t(N)$, the total number of tweets carrying the label in time window $t$, and $N^t(E)$, the total number of tweets of all associated label nodes in the subgraph in time window $t$. Intuitively, the index expresses the overall degree of influence that the group participating in the label exerts on the event label at time $t$.

Step B: calculating the propagation influence of the participating group. The group propagation influence $M(t)$ of a hashtag at time $t$ is defined and calculated for each node from the sum of the follower counts of the users who published tweets carrying the label in time window $t$.

Step C: constructing the dynamic group attribute $H^{inf}(t)$ of the label. The dynamic group influence feature $H^{inf}(t)$, which is decisive for propagation, is selected as the node attribute. Because the propagation influence of different labels differs across time windows, the two key influence indexes above are used as the criteria for evaluating the influence of the group participating in a label over time, which yields the dynamic propagation attribute $H^{inf}(t)$ of the target label $i$, formalized as follows:

$$H^{inf}(t) = [O(t), M(t)]$$

Step D: constructing the timing graph sequence. The dynamic propagation attribute $H^{inf}(t)$ is used to construct an attribute graph $G_t$ based on dynamic attributes and dynamic relations, and the sequence of event label propagation timing attribute graphs over the time windows is then obtained and used as the sample input of the deep learning model.
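The window division and the two group indexes of step 303 can be sketched as follows. The exact formulas are not given in the text, so the sketch makes two loudly labeled assumptions: the social influence is taken as the ratio of the tweets carrying the target label to the tweets carrying any label of the subgraph, and the group propagation influence is taken as the raw sum of follower counts. The `Tweet` record and the function name `dynamic_attributes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    time: float        # publication time, measured from the start of observation
    labels: frozenset  # labels carried by the tweet
    followers: int     # follower count of the publishing user


def dynamic_attributes(tweets, target, subgraph_labels, t_obs, n_windows):
    """Return [[O(t), M(t)] for each of the n_windows equal-width windows]."""
    width = t_obs / n_windows
    h_inf = []
    for w in range(n_windows):
        lo, hi = w * width, (w + 1) * width
        in_window = [tw for tw in tweets if lo <= tw.time < hi]
        n_target = sum(1 for tw in in_window if target in tw.labels)
        n_all = sum(1 for tw in in_window if tw.labels & subgraph_labels)
        followers = sum(tw.followers for tw in in_window if target in tw.labels)
        social = n_target / n_all if n_all else 0.0   # ASSUMED ratio form of O(t)
        group = float(followers)                      # ASSUMED raw sum for M(t)
        h_inf.append([social, group])
    return h_inf
```

Each row of the result would serve as the node attribute of the target label in the attribute graph of the corresponding window.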

Step 4: constructing semantic features and time-series group features. The construction comprises a propagation relation feature embedding layer based on a graph attention network, a feature fusion layer that fuses static and dynamic features, and a local feature aggregation layer based on a graph capsule network. The propagation relation feature embedding layer comprises a static semantic feature aggregation representation and a dynamic propagation feature aggregation representation, which take the static semantic relation attribute graph and the dynamic timing graph sequence as input, respectively. This specifically comprises the following steps:

Step 401: learning the propagation relation feature embedding layer. This layer is mainly responsible for learning the propagation influence relations among different labels, and comprises the static semantic feature aggregation representation and the dynamic propagation feature aggregation representation. The graph attention network (GAT), a graph neural network that can learn influence relation weights, is selected for feature representation: during supervised learning, GAT learns different weight coefficients between nodes, so that the hidden mutual influence relations are captured in the node representations.

The input of GAT contains two parts: the node feature matrix $H \in \mathbb{R}^{|V| \times d}$ and the adjacency matrix $N^t \in \mathbb{R}^{|V| \times |V|}$, where $d$ is the feature dimension and $n$ is the number of nodes in the subgraph. For each node $v_i$ in the graph with attribute $h_i$:

$$H = [h_1, h_2, \ldots, h_n]^T$$

$$H_O = [h'_1, h'_2, \ldots, h'_n]^T, \quad H_O \in \mathbb{R}^{|V| \times d'}$$

where $\alpha \in \mathbb{R}^{2 \times d'}$ is a trainable weight matrix, $j \in N(i)$ means that label nodes $j$ and $i$ share an edge in the adjacency matrix (that is, $j$ and $i$ are related), LeakyReLU is chosen as the nonlinear activation function, $a_{i,j}$ is the weight of the mutual influence between label node $i$ and label node $j$, $H_O$ is the matrix formed by the output embedding vectors, $|V|$ is the number of graph nodes, $d'$ is the dimension of the output node features, and $\|$ denotes concatenation. In the present embodiment, $|V| = 20$ and $d' = 64$;
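For illustration, a single-head attention layer of the standard GAT form is sketched below in NumPy. It assumes the usual formulation, in which the score of an edge is the LeakyReLU of a learned vector applied to the concatenated projections of its two endpoints and the scores are normalized by a softmax over each node's neighbors. The projection matrix `W`, the flat attention vector `a` of length $2d'$, the added self-loops and the omitted output nonlinearity are assumptions of the sketch, not details taken from the text.

```python
import numpy as np


def gat_layer(H, adj, W, a, slope=0.2):
    """Single-head graph attention.

    H: (n, d) node features, adj: (n, n) adjacency, W: (d, d_out), a: (2 * d_out,).
    Returns the (n, d_out) output features and the (n, n) attention weights.
    """
    Z = H @ W                                    # projected node features
    d_out = Z.shape[1]
    # score[i, j] = a^T [Z_i || Z_j], split into a source and a target part
    score = (Z @ a[:d_out])[:, None] + (Z @ a[d_out:])[None, :]
    score = np.where(score > 0, score, slope * score)      # LeakyReLU
    mask = (np.asarray(adj) + np.eye(len(adj))) > 0        # neighbors plus self-loop
    score = np.where(mask, score, -np.inf)
    score = score - score.max(axis=1, keepdims=True)       # numerical stability
    att = np.exp(score)
    att = att / att.sum(axis=1, keepdims=True)   # a_{i,j}: softmax over j in N(i)
    return att @ Z, att
```

With $|V| = 20$, $d = 32$ and $d' = 64$ as in the embodiment, the output has shape `(20, 64)` and each row of `att` sums to 1.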

The static semantic relation attribute graph is taken as input to obtain the static semantic node matrix $H_{sem}$, and the sequence of timing attribute graphs is taken as input to obtain the tensor $H'_{dym}$ formed over the $t$ windows in the observable time, where $t$ is the number of time windows, $|V|$ is the number of subgraph nodes, and $F'_1$ and $F'_2$ are the feature dimensions of the output nodes. In the present embodiment, $|V| = 20$, $t = 8$, $F'_1 = 32$ and $F'_2 = 32$;

Step 402: feature fusion. The feature fusion module is mainly responsible for fusing the semantic features and propagation features of the label nodes to serve as input of the next layer. To keep node semantics invariant over the time sequence, the model concatenates $H_{sem}$ with the matrix $H'_{dym}$ of each time window. The feature fusion layer is formalized as follows:

$$H_f = H_{sem} \,\|\, H'_{dym}$$

where $\|$ denotes concatenation. In the present embodiment, $|V| = 20$, $t = 8$, $F'_1 = 32$ and $F'_2 = 32$;
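The concatenation of step 402 can be sketched in a few lines of NumPy. The shapes are assumptions inferred from the embodiment values: the static matrix is taken to be $|V| \times F'_1$, the dynamic tensor $t \times |V| \times F'_2$, and the static features are repeated unchanged in every window so that the semantics stay fixed over time. The function name `fuse` is hypothetical.

```python
import numpy as np


def fuse(H_sem, H_dym):
    """Concatenate static and dynamic node features per time window.

    H_sem: (n_nodes, F1) static semantic features.
    H_dym: (t, n_nodes, F2) dynamic propagation features.
    Returns an array of shape (t, n_nodes, F1 + F2).
    """
    t = H_dym.shape[0]
    # Repeat the static features for every window without copying the data.
    H_sem_rep = np.broadcast_to(H_sem, (t,) + H_sem.shape)
    return np.concatenate([H_sem_rep, H_dym], axis=-1)
```

With the embodiment values the fused tensor has shape `(8, 20, 64)`, which matches an input dimension of 64 for the next layer.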

Step 403: local feature aggregation. As the event develops, label semantics are strongly correlated locally and the labels are strongly connected in the graph. To capture the strong local correlation between propagation graphs and better express the hierarchical local-to-global structure, the model, inspired by the graph capsule network, applies a routing mechanism in which label nodes vote on group effects, infers the local-to-global hierarchy over multiple rounds of iteration, and finally obtains the feature representation of the timing graph. Specifically, the process maps the timing attribute graph to a graph embedding vector $h_G$ in three steps:

Step A: layering the timing graph. The relation between the lower-layer parts and the upper layer of the timing graph is established through a voting mechanism, where $v$ denotes a capsule node in the lower-level voting graph and $u$ denotes a capsule node in the higher-level routing graph. The output of the feature fusion layer and the timing graph are first used to form the initial voting matrix, where $N^t$ is the adjacency matrix of the lower-level voting graph. The votes of the initial lower-level network in the graph capsule network are representation vectors, each of which is regarded as the vote of $v_i$ for capsule node $u_j$ in the higher-level cluster and carries the voting weight; $|N|$ is the number of nodes in the capsule, and $F'_3 = F'_1 + F'_2$ is the dimension of the input vector. In the present embodiment, $|N| = 10$, $F'_1 = 32$, $F'_2 = 32$ and $F'_3 = 64$;

Step B: establishing dynamic routing. The task of this process is to iteratively calculate the routing weights $C_{i,j}$ between the lower-level nodes $v$ and the higher-level capsules $u$, that is, to determine which groups of lower-level nodes $v_i$ can activate the high-level cluster node $u_j$ ($u_j$ can be viewed as a cluster of more closely related labels in the subgraph), so as to obtain the local-to-global activation relations. The votes from step A are then weighted by the routing weights $C_{i,j}$ from the nodes of the lower-level graph to the locally aggregated higher-level graph. The routing logits $b_{i,j}$ are initialized to 0, and the dynamic routing weights are calculated over $R$ iterations with three formulas, which derive the routing weights $C_{i,j}$ from $b_{i,j}$, compute the high-level capsule $u_j$ with the squash function, and update the logits:

$$b_{i,j} = b_{i,j} + v_{j|i} \cdot u_j$$

where the squash function (a nonlinear "squeezing" function) computes the capsule $u_j$ from the votes $v_{j|i}$, that is, the votes of nodes $v_i$ for $u_j$. The product $v_{j|i} \cdot u_j$ measures the agreement between each vote and the higher-level capsule, so that routing focuses more on aggregating information from neighbors that are likely to be in the same cluster. After $R$ iterations, this process yields the capsule nodes $u$ and the high-level abstract adjacency matrix of the high-level aggregation graph, expressed as:

$$G_{route} = (A, u)$$

$$A = C^T N C, \quad A \in \mathbb{R}^{|U| \times |U|}$$

where $N$ is the adjacency matrix of the lower-level voting graph, $|V|$ is the number of nodes of the lower-level voting graph, $|U|$ is the number of nodes of the higher-level routing graph, and $C \in \mathbb{R}^{|V| \times |U|}$ is the lower-to-higher routing weight matrix formed by the $C_{i,j}$. $A$ can therefore be regarded as the adjacency matrix of the high-level routing graph, and $u$ holds the feature vectors of the nodes of the high-level routing graph obtained by the formulas of the previous step. The above process can thus be simplified to the following transformation:

U,A＝Route(Vote(V,N))

namely:

U,A＝RV(V,N)
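The routing half of this transformation can be sketched in NumPy as below. The sketch assumes the standard dynamic routing scheme of capsule networks: a softmax over the higher-level capsules turns the logits into routing weights, the weighted votes are squashed into capsules, and the logits are updated by the agreement term given above. How the votes are produced from the fused features is not reproduced here, so `votes` is taken as a given input; the function names are hypothetical.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Nonlinear squeezing: keeps the direction and maps the norm into [0, 1)."""
    norm2 = np.sum(s * s, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)


def route(votes, N, rounds=3):
    """Dynamic routing from low-level nodes to high-level capsules.

    votes: (n_low, n_high, F), votes[i, j] is the vote of low-level node i
           for high-level capsule j.
    N:     (n_low, n_low) adjacency matrix of the low-level voting graph.
    Returns the capsules u (n_high, F), the high-level adjacency A
    (n_high, n_high) and the routing weights C (n_low, n_high).
    """
    n_low, n_high, _ = votes.shape
    b = np.zeros((n_low, n_high))                      # routing logits start at 0
    for _ in range(rounds):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c = c / c.sum(axis=1, keepdims=True)           # routing weights C_{i,j}
        u = squash(np.einsum("ij,ijf->jf", c, votes))  # high-level capsules u_j
        b = b + np.einsum("ijf,jf->ij", votes, u)      # agreement v_{j|i} . u_j
    A = c.T @ N @ c                                    # A = C^T N C
    return u, A, c
```

Because `c` has shape `(n_low, n_high)`, the product `c.T @ N @ c` is `n_high` by `n_high`, which is the size an adjacency matrix of the high-level routing graph must have.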

Step C: establishing the timing attribute graph representation. Step A and step B are repeated to abstract the high-level cluster graph once more into the embedding representation of the whole graph. On the basis of the graph capsule network, this retains to the greatest extent the features of the influence of local hashtag labels on propagation.

Step 5: inputting the representation vectors $h_t$ of the different timing subgraphs in the sample into an LSTM model for dynamic time-series representation learning, inputting the result into a fully connected layer to obtain a prediction result, and guiding model learning by the error between the prediction result and the true value label of the sample. The step comprises the following three sub-steps. The output of the LSTM unit is:

$$h_t = \tanh(c_i) \odot o_i$$

where $h_t$ is the hidden feature output at time $t$, $\odot$ denotes the Hadamard product, $U_j$, $W_j$ and $b_j$ with $j \in \{z, f, o, c\}$ are learnable parameters, and $z_i$, $f_i$ and $o_i$ are the forget gate, input gate and output gate vectors, respectively, of the feature of the $t$-th window. Finally, the result at time $t + 1$ is predicted through a fully connected layer:

$$\Delta y' = \sigma(W h_t)$$

The true value label of the sample is denoted $y_i$. Because of the particular way labels propagate in social networks, both original tweets and retweets that carry the label are considered to contribute to propagation. The invention therefore uses the sum of the number of retweets and the number of original tweets that carry the label as the indicator of the attention of the user group: the number of retweets counts how many users forwarded tweets carrying the event label, and the number of originals counts how many original tweets contain the target label.
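A minimal NumPy sketch of the recurrent prediction in step 5 follows, assuming a standard LSTM cell. Several choices are assumptions of the sketch: `W_k` is applied to the input and `U_k` to the previous hidden state, the gate roles follow the common convention (`z` as input gate, `f` as forget gate, `o` as output gate), and $\sigma$ is taken to be the logistic function. The function name `lstm_predict` and the parameter dictionary layout are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_predict(h_seq, params, W_out):
    """Run an LSTM over the window embeddings and predict from the last state.

    h_seq:  (t, F) one graph embedding per time window.
    params: dict with W_k (hidden, F), U_k (hidden, hidden), b_k (hidden,)
            for k in 'z', 'f', 'o', 'c'.
    W_out:  (hidden,) weights of the fully connected output layer.
    """
    hidden = params["U_c"].shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in h_seq:
        def pre(k):
            # Pre-activation of gate k for the current input and previous state.
            return params["W_" + k] @ x + params["U_" + k] @ h + params["b_" + k]

        z = sigmoid(pre("z"))                 # input gate (ASSUMED role of z)
        f = sigmoid(pre("f"))                 # forget gate
        o = sigmoid(pre("o"))                 # output gate
        c = f * c + z * np.tanh(pre("c"))     # cell state update
        h = np.tanh(c) * o                    # h_t = tanh(c) * o (Hadamard product)
    return sigmoid(W_out @ h), h
```

Since the logistic output lies in (0, 1), the popularity target would have to be scaled into that range, or the output activation replaced, in an actual training setup.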

Step 503: network popularity prediction is treated as a regression problem, so MLSE is used as the objective loss function of the regression model.
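The label and loss can be sketched as follows, assuming that MLSE denotes the mean squared logarithmic error commonly used for popularity regression. The base-2 logarithm and the offset of 1 inside the logarithm are assumptions of the sketch, as are the function names.

```python
import numpy as np


def popularity_label(n_retweets, n_originals):
    """Popularity of a label: retweets plus original tweets that carry it."""
    return n_retweets + n_originals


def msle(pred, true):
    """Mean squared logarithmic error between predicted and true popularity."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    # ASSUMPTION: log base 2 with a +1 offset so that zero counts are allowed.
    return float(np.mean((np.log2(pred + 1.0) - np.log2(true + 1.0)) ** 2))
```

For example, predicting a popularity of 1 for a label whose true popularity is 3 gives a squared log difference of $(\log_2 2 - \log_2 4)^2 = 1$.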

Such an architecture has two advantages:

(1) Stronger feature modeling capability. The propagation influence mechanism among event labels is considered, the data patterns of tweet propagation are studied, and a method for constructing the event label propagation association relation is designed. The group indexes behind event label propagation and the semantic features that trigger label aggregation are then extracted, and a network event label popularity prediction model based on multi-label influence is designed for labels that become hot events in the social network. The model takes into account the interaction of labels during network propagation, and predicts propagation popularity in the social network through the semantics and group relations implied by the labels.

(2) More accurate label popularity prediction. The method improves performance markedly on the task of predicting event label popularity. Experiments verify that, on the core index MLSE for event labels during propagation, the method outperforms the best existing baseline popularity prediction model, with improvements of 25.9% and 29.3% over the best baseline on the two datasets used. This indicates that the proposed label propagation influence relation and semantic features help the popularity prediction model considerably, that mutual propagation influence has a clear effect on popularity, and that the assumption underlying the model is reliable and effective.

In this embodiment, label features, static semantics, dynamic groups and semantic aggregation in information propagation are used to predict network event labels under multi-label influence more accurately. Deep learning model parameters targeted at different scenarios can be obtained by training on different social texts, which better solves prediction problems in semantic categories, such as propagation prediction of opinions and stances, and public opinion event prediction.

The method provided by this embodiment can be used for online public opinion event prediction, opinion and stance propagation prediction, public opinion event monitoring, rumor monitoring, public emergency prevention and the like. In particular, it can predict hot events influenced by multiple labels in a social network, such as public opinion hot events, opinion hot events and stance hot events. It can also be used by enterprises for network information supervision, to predict whether information released by the enterprise will spread widely in the future.

In an embodiment of the present invention, there is also provided a computer apparatus, including a memory and a processor, where the memory stores a computer program, and the processor implements the method for predicting popularity of network event tags based on multi-tag influence as described above when executing the computer program.

The computer apparatus may be a terminal, and its internal structure diagram may be as shown in fig. 3. The computer device comprises a processor, a memory, a network interface, a display screen and an input device which are connected through a bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for connecting and communicating with an external terminal through a network. The computer program when executed by a processor implements a network event tag popularity prediction method based on multi-tag impact. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a shell of the computer device, an external keyboard, a touch pad or a mouse and the like.

The memory may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory is used for storing programs, and the processor executes the programs after receiving the execution instructions.

The processor may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP) and the like. The processor may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on, and can implement or execute the methods, steps and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor.

Those skilled in the art will appreciate that the configuration shown in fig. 3 is a block diagram of only a portion of the configuration associated with the present application and is not intended to limit the computing device to which the present application may be applied, and that a particular computing device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In an embodiment of the present invention, there is also provided a computer-readable storage medium having a program stored thereon, wherein the program, when executed by a processor, implements the network event label popularity prediction method based on multi-label influence as described above.

As will be appreciated by one of skill in the art, embodiments of the present invention may be provided as a method, computer apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, computer apparatus, or computer program products according to embodiments of the invention. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart and/or flowchart block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart.

The application of the multi-tag influence-based network event tag popularity prediction method, the computer device and the computer-readable storage medium provided by the invention are described in detail above, and a specific example is applied in the text to explain the principle and the implementation of the invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A network event label popularity prediction method based on multi-label influence is characterized by comprising the following steps:

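S1, crawling labels and texts related to social network events from a large real-world dataset of social network event propagation;

S2, cleaning and sorting the crawled data, and preprocessing the relation features;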

the method specifically comprises the following steps:

s201, data cleaning is carried out, publication conditions of the pushed text are recorded in a csv file after the data cleaning, and data fields comprise a user id for publishing the pushed text, pushed text content, publication time and a user id for pushed text interaction; each record represents a tweet release related to an event;

s202, relation characteristic preprocessing, namely additionally extracting titles representing general events in the preprocessing process according to the characteristics of the text pushing in different data sets, and establishing label relations for the text pushing with a plurality of labels in one text pushing;

and a dynamic population propagation timing diagram sequence;

S4, constructing semantic features and time sequence group features;

constructing the semantic features and time sequence group features involves: a propagation relation feature embedding layer based on a graph attention network, a feature fusion layer that fuses static and dynamic features, and a local feature aggregation layer based on a graph capsule network, wherein the propagation relation feature embedding layer comprises a static semantic feature aggregation representation and a dynamic propagation feature aggregation representation;

for the static semantic relation attribute graph, the graph

is taken as input to the dynamic propagation feature aggregation representation layer, obtaining the dynamic propagation matrix H'_dym formed over the t windows in the observable time;

the static semantic node vector representation matrix H^sem and the dynamic propagation matrix H'_dym are taken as input to the feature fusion layer, obtaining the fused feature H_f that combines the semantic features and propagation features of the label nodes; the propagation attribute graph

is taken as input to the local feature aggregation layer based on the graph capsule network, obtaining the graph embedding vector h_G;

S5, inputting the representation vectors h_t of the different time sequence subgraphs in the sample into an LSTM model for dynamic time sequence representation learning, inputting the result into a fully connected layer to obtain a prediction result, and guiding model learning by the error between the prediction result and the true value label of the sample.
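As an illustration of steps S201 and S202 of this claim, the following minimal Python sketch reads cleaned tweet records from csv text and counts, for each pair of labels, how many tweets contain both. The column names (`user_id`, `content`, `time`, `interact_user_id`), the `#(\w+)` hashtag pattern, and the sample rows are assumptions made for the example; the claim does not specify them.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")

def load_records(csv_text):
    """Parse cleaned tweet records; the column names are hypothetical."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def label_pairs(records):
    """Count, per unordered label pair, the tweets in which both labels appear."""
    pairs = Counter()
    for rec in records:
        labels = sorted({tag.lower() for tag in HASHTAG.findall(rec["content"])})
        pairs.update(combinations(labels, 2))
    return pairs

# Illustrative data only; not taken from the patent.
sample = """user_id,content,time,interact_user_id
u1,Flooding downtown #storm #flood,2022-01-01T10:00,u2
u2,Stay safe #storm,2022-01-01T10:05,u3
u3,#flood #storm #rescue teams arrive,2022-01-01T11:00,u1
"""
records = load_records(sample)
pairs = label_pairs(records)
```

Tweets with a single label contribute no pair, which matches the statement that label relations are established only for tweets carrying a plurality of labels.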

2. The method for predicting popularity of network event tags based on multi-tag influence according to claim 1, wherein S3 specifically includes:

S301, constructing an event label propagation relation graph, wherein two indexes are selected as the sources of label relations in the relation graph, namely an explicit relation and an implicit semantic relation;

S302, constructing a static semantic relation attribute graph, wherein the event label semantic relation attribute graph G^sem is constructed by taking semantic features as static attributes in the label relation graph;

and edges

wherein 0 ≤ t < n, t representing a serialized time window.

3. The method for predicting popularity of network event tags based on multi-tag influence according to claim 2, wherein S301 specifically includes:

(A) An explicit relation index, which indicates that the tags have a propagation influence relation in the propagation process if a user explicitly aggregates more than two tags in a tweet; the specific formalization is as follows:

wherein C is _o-occur (i, j) represents the frequency of two labels appearing in one tweet at the same time, N represents the co-occurrence node set of all neighbors of the node i, and the formula expresses the association degree of the labels i and the labels j in the whole event corpus and represents the explicit propagation influence;

(B) Implicit semantic relation indexes, when the network event labels after explosion lack explicit relations, establishing relation association by using the extracted explicit labels and semantic similarity in an observation window, and extracting event labels which are semantically related but do not have explicit marked # characters;

specifically, the model establishes a link for the semantic relationship between the tags by using a point-by-point mutual information PMI method, and the specific formalization is as follows:

wherein d (i, j) is the total number of tweets in which the event tag i and the event tag j of the observable window occur simultaneously, where, unlike the explicit feature, the explicit feature is an explicit tag with a # number in the tweets explicitly, and where d (i, j) is a co-occurrence that appears in the tweets without containing # s; d (i) and d (j) are the total number of the tweets in the set, wherein the tweets at least comprise i and j once; d is the total number of tweets in the social networking event;

(C) Establishing relation weight between social network event tags, wherein an implicit semantic relation only establishes association between tag pairs with positive PMI values; and finally, determining the relationship weight between the social network event tags by an addition method:

R(i,j) = {R_ex(i,j) + R_im(i,j)}

for social network events, the above process establishes an event label relationship graph G = < V, E > in the data of the observable time window;

(D) Sampling, for a target label i, a fixed-size sub-network G_i from the global network event label relation graph G, instead of directly processing the global network G itself, which contains much noise; the sampling method in this step is the random walk with restart (RWR) algorithm, which is used to select the related labels most likely to affect propagation and to constrain the upper limit on the number of associated label nodes, namely:

G_i = RWR(G, i, m, R)

wherein G represents the event label global relation graph, m represents the upper limit on the number of sampled nodes, i represents the target label, and R represents the weights R(i, j) between the nodes of the event label relation graph.
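Steps (A) to (D) of this claim can be sketched in Python as follows. The formula images for the explicit index and for PMI did not survive in this text, so both forms here are assumptions: the explicit weight is taken as the co-occurrence count of (i, j) normalized over all neighbors of i, and PMI is taken in its standard form log(p(i, j) / (p(i) p(j))) with p(i, j) = d(i, j) / D and p(i) = d(i) / D. The RWR parameters `restart`, `steps`, and `seed`, and the choice to rank nodes by visit count, are likewise hypothetical.

```python
import math
import random
from collections import defaultdict

def explicit_weight(cooc, i, j):
    """Assumed R_ex: co-occurrence of (i, j) normalized over all neighbors of i."""
    row = cooc.get(i, {})
    total = sum(row.values())
    return row.get(j, 0) / total if total else 0.0

def pmi(d_ij, d_i, d_j, D):
    """Standard pointwise mutual information computed from tweet counts."""
    if d_ij == 0:
        return float("-inf")
    return math.log((d_ij / D) / ((d_i / D) * (d_j / D)))

def relation_weight(r_ex, pmi_value):
    """R(i, j) = R_ex + R_im, where only a positive PMI creates an implicit link."""
    return r_ex + max(pmi_value, 0.0)

def rwr_sample(weights, target, m, restart=0.3, steps=1000, seed=0):
    """Random walk with restart from `target`; keeps at most m nodes by visit count."""
    rng = random.Random(seed)
    visits = defaultdict(int)
    node = target
    for _ in range(steps):
        neighbors = weights.get(node, {})
        if not neighbors or rng.random() < restart:
            node = target  # jump back to the target label
        else:
            names = list(neighbors)
            node = rng.choices(names, weights=[neighbors[n] for n in names])[0]
        visits[node] += 1
    ranked = sorted((n for n in visits if n != target), key=lambda n: -visits[n])
    return [target] + ranked[: m - 1]

# Illustrative values only.
cooc = {"storm": {"flood": 3, "rescue": 1}}
r_ex = explicit_weight(cooc, "storm", "flood")
r = relation_weight(r_ex, pmi(2, 4, 4, 16))
graph = {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0}, "d": {"e": 1.0}}
sub = rwr_sample(graph, "a", m=2)
```

Because the walk restarts at the target label, nodes that are unreachable from it (such as `d` in the toy graph) are never sampled, and the cap `m` bounds the size of the sub-network G_i.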

4. The method for predicting popularity of network event tags based on multi-tag influence according to claim 2, wherein S302 specifically includes:

(A) Extracting semantic features of the labels;

the semantic information of a label within the observable time window is supplemented by the tweets in which the label appears: the tweet s that carries the label and has the largest number of retweets within the observable time is acquired and taken as the feature explaining the label semantics;

(B) Embedding the semantic features of the event labels: a text library of semantic interpretations of the labels is established, and a sentenceTransformer5 interface based on a BERT model is called to embed semantic initial vectors for the labels, formalized as:

H_s = {bert2sentence(i)}

wherein 0 ≤ i ≤ |V|, H_s ∈ R^(|V|×d), |V| represents the number of all label nodes, and d represents the embedding dimension;

(C) Constructing the static semantic relation attribute graph: the static semantic relation attribute graph is constructed from the event label propagation local influence relation graph G_i of step S301 and the node attributes obtained in the previous step:

wherein V_i is the set of nodes associated with the target label i, E_i is the set of label node relations,

is the set of representations of the nodes in V_i, and

5. The method for predicting popularity of network event tags based on multi-tag influence according to claim 1, wherein S303 specifically includes:

(A) Calculating the social influence of the participating group: the social influence of the group of label i in window t is defined, and for each label node it is calculated as follows:

wherein N^t(N) represents the total number of tweets containing the label in time window t, N^t(E) represents the total number of tweets of all associated label nodes in the subgraph in time window t, and the index expresses the overall degree of influence of the group participating in the label on the event label at time t;

(B) Calculating the influence of participating population propagation, and defining the population propagation influence of hashtag at the time t; for each node the following calculations are performed:

wherein

represents the total number of follower users of the tweets published with the label in time window t;

represents the total number of users who published event-related tweets in time window t, and the index expresses the degree of group attention that the groups participating in the event label pay to the current label;

(C) Constructing the label dynamic group attribute H^inf(t): the dynamic population influence feature H^inf(t), which is decisive for propagation, is selected as the node attribute; the two key label influence indexes above are used as the influence evaluation standard of the groups participating in the label over the time sequence, giving the dynamic propagation attribute H^inf(t) of the target label i, formalized as follows:

H^inf(t) = [O(t), M(t)]

(D) Constructing a sequence of timing diagrams

wherein

As sample input for the deep learning model.
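The dynamic group attributes of steps (A) to (C) can be sketched as below. The formula images for O(t) and M(t) are missing from this text, so the simple ratio forms used here (label tweets over subgraph tweets, and follower count over event users) are assumptions inferred from the surrounding definitions; the dictionary keys are hypothetical names.

```python
def social_influence(label_tweets, subgraph_tweets):
    """O(t): assumed to be the ratio N^t(N) / N^t(E)."""
    return label_tweets / subgraph_tweets if subgraph_tweets else 0.0

def propagation_influence(label_followers, event_users):
    """M(t): assumed ratio of the label's follower users to all event users."""
    return label_followers / event_users if event_users else 0.0

def dynamic_attributes(windows):
    """Return H^inf(t) = [O(t), M(t)] for every time window t."""
    return [
        [
            social_influence(w["label_tweets"], w["subgraph_tweets"]),
            propagation_influence(w["label_followers"], w["event_users"]),
        ]
        for w in windows
    ]

# Illustrative counts for two time windows; not taken from the patent.
windows = [
    {"label_tweets": 5, "subgraph_tweets": 20, "label_followers": 300, "event_users": 100},
    {"label_tweets": 0, "subgraph_tweets": 0, "label_followers": 0, "event_users": 0},
]
h_inf = dynamic_attributes(windows)
```

Each row of `h_inf` would serve as the node attribute of the target label in the corresponding time window of the timing diagram sequence.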

6. The method for predicting the popularity of the network event tags based on the multi-tag influence as claimed in claim 1, wherein S4 specifically comprises:

S401, learning the propagation relation feature embedding layer, which is responsible for learning the propagation influence relations among different labels and comprises the static semantic feature aggregation representation and the dynamic propagation feature aggregation representation; the graph attention network (GAT), a graph neural network capable of learning influence relation weights, is selected as the feature representation method; during supervised learning the GAT learns different weight coefficients between nodes, so that the hidden mutual influence relations are captured in the node representations;

the input to the GAT contains two parts, the node feature vectors H ∈ R^(|V|×d) and the adjacency matrix N^t ∈ R^(|V|×|V|), where d represents the feature dimension and n represents the number of nodes in the subgraph; for each node v_i in the graph with attribute h_i:

H = [h_1, h_2, ..., h_n]^T

H_o = [h'_1, h'_2, ..., h'_n]^T, H_o ∈ R^(|V|×d')

wherein α ∈ R^(2×d'), Θ represents a trainable weight matrix, and j ∈ N(i) indicates that label nodes j and i share an edge in the adjacency matrix, that is, j and i are related; LeakyReLU is chosen as the nonlinear activation function, a_i,j represents the weight of the mutual influence relation between label node i and label node j, H_o is the matrix formed by the output embedding vectors, |V| represents the number of graph nodes, d' represents the dimension of the output node features, and || represents the concatenation operation;

for the static semantic relation attribute graph,

is taken as input, obtaining the tensor H'_dym formed over the t windows in the observable time:

where t represents the number of time windows, |V| represents the number of subgraph nodes, and F'_2 represents the feature dimension of the output nodes;

S402, the feature fusion module is responsible for fusing the semantic features and propagation features of the label nodes, and its output serves as the input of the next layer; the model concatenates H_sem with the matrix H'_dym, and the feature representation layer is formalized as follows:

H_f = H_sem || H'_dym

where || represents the concatenation operation, then

S403, since strong local correlations among label semantics arise as the event situation develops, the model uses a routing mechanism to vote on the group effect of the label nodes and capture the local-to-whole relation in the graph; the hierarchical local-to-whole relation is then inferred over multiple rounds of iteration, finally yielding the feature representation of the timing diagram.
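The attention step of S401 and the fusion step of S402 can be illustrated with a single-head sketch in plain Python. The GAT formula images are missing from this text, so the standard graph attention form is assumed here: scores LeakyReLU(α · [Θh_i || Θh_j]) are normalized by softmax over the neighbors of i, and the output is the attention-weighted sum of Θh_j. Treating α as a flat vector of length 2d', adding a self-loop for every node, and omitting the output nonlinearity are simplifications made for the example.

```python
import math

def matvec(theta, h):
    """Apply the weight matrix Θ (list of rows) to a feature vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in theta]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(H, adj, theta, alpha):
    """One attention head; returns (H_out, attention) for node features H."""
    Z = [matvec(theta, h) for h in H]  # Θ h_i for every node
    n = len(H)
    attention = [[0.0] * n for _ in range(n)]
    H_out = []
    for i in range(n):
        # Neighbors of i plus i itself (self-loop is an assumption).
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        scores = [
            leaky_relu(sum(a * z for a, z in zip(alpha, Z[i] + Z[j])))
            for j in nbrs
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for j, e in zip(nbrs, exps):
            attention[i][j] = e / total
        H_out.append(
            [sum(attention[i][j] * Z[j][k] for j in nbrs) for k in range(len(Z[i]))]
        )
    return H_out, attention

def fuse(H_sem, H_dym):
    """H_f = H_sem || H'_dym: concatenate the two feature rows of each node."""
    return [s + d for s, d in zip(H_sem, H_dym)]

# Toy inputs: three label nodes on a path graph, identity Θ, zero α.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H_out, att = gat_layer(H, adj, [[1.0, 0.0], [0.0, 1.0]], [0.0] * 4)
H_f = fuse([[1, 2]], [[3]])
```

With a zero α all scores are equal, so attention is uniform over each neighborhood; a trained α would instead give larger a_i,j to the labels that influence node i more strongly.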

7. The method for predicting popularity of network event tags based on multi-tag influence as claimed in claim 6, wherein in S403 the process of mapping the time sequence attribute graph

to the graph embedding vector h_G comprises three steps:

(A) Layering a time sequence diagram, and establishing the relation between a low-level local part and a high-level whole part of the time sequence diagram through a voting mechanism; where v denotes a capsule node in the lower-level voting graph and u denotes a capsule node in the higher-level routing graph, the feature fusion layer and the sequence chart are first used as an initial voting matrix, that is:


wherein N^t represents the adjacency matrix of the lower-level voting graph,

is regarded as the voting weight of v_i for the capsule node u_j in the high-level cluster, |N| represents the number of nodes in the capsule, and F'_3 = F'_1 + F'_1 represents the dimension of the input vector;

(B) Establishing dynamic routing: the task of this process is to iteratively compute the routing weights C_i,j between the lower-level nodes v and the higher-level capsules u, i.e., to determine which groups of lower-level nodes v_j can activate the high-level cluster node u_j, so as to obtain the local-to-global activation relations; the votes from step (A) are weighted to obtain the routing weights C_i,j from the nodes v_j in the low-level graph to the locally aggregated high-level graph, namely:

the three formulas are specifically as follows:

b_i,j = b_i,j + v_j|i · u_j

wherein the nonlinear "squash" (squeezing) function computes the capsule u_j from the votes v_j|i, i.e., the votes of the nodes v_i|j for u_j, and v_i|j · u_j computes the consistency between each set of votes and the high-level capsule; after R iterations, this process yields the capsule nodes u of the high-level aggregation graph and its high-level abstract adjacency matrix, expressed as:

G_route = (A, u)

A = C^T N C, A ∈ R^(|U|×|U|)

wherein N is the adjacency matrix of the lower-level voting graph, |V| represents the number of nodes of the lower-level voting graph, |U| represents the number of nodes of the higher-level routing graph, C is the routing weight matrix from the lower layer to the higher layer formed by the C_i,j, C ∈ R^(|V|×|U|), A is the adjacency matrix of the high-level routing graph, and u is the feature vector of the nodes in the high-level routing graph; step (B) is defined as the following transformation:

U,A＝Route(Vote(V,N))

namely:

U,A＝RV(V,N)

(C) Establishing a time sequence attribute graph representation, then repeating the step (A) and the step (B), abstracting the high-level cluster graph to the whole graph embedded representation again, and keeping the characteristics of the influence of a local hash tag on propagation on the basis of a graph capsule network, namely:

1 in the formula is a graph representation which is obtained by only aggregating one node after abstraction to a higher layer, and the feature vector of the node is represented as a propagation influence representation vector of the label relation graph under the current time window t; for each timing attribute graph, a graph representation process is performed using a local aggregation layer:
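As an illustration of steps (A) and (B) of this claim, the routing procedure can be sketched in plain Python. Two of the three routing formulas are missing from this text, so the standard capsule dynamic routing form is assumed: C = softmax(b) over the high-level capsules, u_j = squash(Σ_i C_i,j v_j|i), followed by the surviving update b_i,j = b_i,j + v_j|i · u_j. The vote tensor layout `votes[i][j]` and the default of three rounds are hypothetical choices for the example.

```python
import math

def squash(s):
    """Shrink vector s to a length in [0, 1) while keeping its direction."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0] * len(s)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]

def softmax(row):
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(votes, rounds=3):
    """votes[i][j] is the vote vector of low-level node i for high-level capsule j."""
    n, m, dim = len(votes), len(votes[0]), len(votes[0][0])
    b = [[0.0] * m for _ in range(n)]  # routing logits b_i,j
    for _ in range(rounds):
        C = [softmax(row) for row in b]  # routing weights C_i,j
        U = [
            squash([sum(C[i][j] * votes[i][j][k] for i in range(n)) for k in range(dim)])
            for j in range(m)
        ]
        for i in range(n):
            for j in range(m):
                # Agreement between the vote and the capsule raises the logit.
                b[i][j] += sum(v * u for v, u in zip(votes[i][j], U[j]))
    return C, U

def coarsen(C, N):
    """A = C^T N C: adjacency matrix among the high-level capsules."""
    n, m = len(C), len(C[0])
    NC = [[sum(N[i][k] * C[k][j] for k in range(n)) for j in range(m)] for i in range(n)]
    return [[sum(C[k][p] * NC[k][q] for k in range(n)) for q in range(m)] for p in range(m)]

# Toy example: three low-level nodes voting for two high-level capsules.
votes = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.0, 1.0], [1.0, 0.0]],
]
N = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
C, U = dynamic_routing(votes)
A = coarsen(C, N)
```

Repeating the same vote-and-route pass on the coarsened graph until a single capsule remains corresponds to step (C), where the remaining capsule vector is used as the graph representation for the current time window.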

8. The method for predicting popularity of network event tags based on multi-tag influence according to claim 1, wherein S5 specifically includes:

S501, applying a long short-term memory (LSTM) kernel to the representation vectors h^t of the different time sequence subgraphs in the sample; the specific calculation formulas are as follows:

h_t = tanh(c_i) * o_i

wherein h_t is the hidden feature output at time t,

represents the Hadamard product, U_j, W_j and b_j, j ∈ {z, f, o, c}, are learnable parameters, and z_i, f_i and o_i are respectively the forget gate vector, the input gate vector and the output gate vector of the t-th window feature;

and finally, the result at time t+1 is predicted through a fully connected layer:

Δy' = σ(W h_t)

and this value is set as the true value label y_i of the sample;

the sum of the number of retweets and the number of original tweets carrying the label is used as the indicator of user group attention, i.e.

wherein

indicates how many people retweeted tweets with the event label, and

indicates how many original tweets contain the target label;

S503, the popularity prediction is treated as a regression problem, and MLSE is used as the objective loss function of the regression model:

where Δy' represents the predicted popularity index and y_i represents the actual propagation index.
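The ground-truth label and the loss of this claim can be sketched as follows. The loss formula image is missing, and this sketch assumes that "MLSE" refers to the mean squared logarithmic error commonly used for popularity regression; the log(1 + x) form used to keep zero counts well defined is a further assumption, as are the function names.

```python
import math

def popularity(retweets, originals):
    """Ground-truth label: retweets plus original tweets carrying the label."""
    return retweets + originals

def msle(predicted, actual):
    """Mean squared logarithmic error (assumed meaning of 'MLSE' in the claim)."""
    terms = [(math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)]
    return sum(terms) / len(terms)

# Illustrative numbers only.
y_true = [popularity(120, 30), popularity(8, 2)]
loss_perfect = msle([150.0, 10.0], y_true)
loss_off = msle([math.e - 1.0], [0.0])
```

Working on a logarithmic scale keeps a few highly popular labels from dominating the loss, which is a common reason for preferring this kind of error over a plain squared error in popularity prediction.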

9. A computer apparatus comprising a memory and a processor, the memory storing a computer program, characterized in that: the processor, when executing the computer program, implements a network event tag popularity prediction method based on multi-tag impact as described above.

10. A computer-readable storage medium on which a program is stored, characterized in that: the program when executed by a processor implements a method for predicting popularity of a tag for a network event based on multi-tag impact as described above.