CN113626724A - Propagation network reconstruction method and device based on node state observation result - Google Patents
- Publication number
- CN113626724A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention relates to a propagation network reconstruction method and device based on node state observation results. The method comprises the following steps: calculating Bayesian mutual information values between any two nodes according to the node infection state data collected after the propagation processes have finished; clustering the Bayesian mutual information values using a K-Means algorithm to obtain a segmentation threshold for the Bayesian mutual information; and acquiring a directed edge set of the influence relationships between the nodes according to the segmentation threshold and the observation data probability distribution parameters of the nodes, and reconstructing the propagation network structure according to the directed edge set. The invention reconstructs the propagation network structure from node infection state data, which are easy to collect and relatively accurate, and accurately obtains the potential influence relationships between nodes, thereby helping to understand the propagation paths of information and supporting intervention strategies for the corresponding propagation events.
Description
Technical Field
The invention relates to the technical field of information retrieval, in particular to a propagation network reconstruction method and device based on node state observation results.
Background
The rapid development of information technology has enriched the ways in which people communicate and increased how often they do so, promoting the formation of complex propagation networks that have people as nodes and that spread content such as opinions and information, thereby influencing the people connected to them. Analyzing and understanding the information propagation paths in such a network helps reveal the intrinsic properties of the propagation network, so that targeted interventions can be made against propagation events in the network. For example, understanding the transmission pathways of a disease in a population can guide the formulation of prevention and treatment measures more effectively and block the transmission of the disease. Propagation network reconstruction mainly infers the potential influence relationships in a network from the historical propagation trajectory data observed in it, so as to better guide responses to future propagation events.
The propagation trajectory data relied on by existing reconstruction methods are mainly the exact times at which each node in the propagation network is affected by a propagation event, and these methods assume that influence relationships exist between nodes infected successively within a certain time window. Inference methods that require exact times are, however, often limited in practice. Observing in a timely manner whether each node has been affected consumes a large amount of time and resources, and the cost can be prohibitive. Even if the monitoring cost can be borne, when there is an incubation period between a node being infected and its showing symptoms of infection, the collected time data can be inaccurate, which reduces the accuracy of inference based on those data. Such methods are therefore difficult to apply in real life. The invention accordingly explores an inference method based on node infection state data: it infers the influence relationships between nodes in the propagation network from data on whether each node is infected after the propagation event has finished, which are relatively easy to acquire and accurate, so as to reconstruct the potential propagation network structure.
Disclosure of Invention
The embodiments of the invention provide a propagation network reconstruction method and device based on node state observation results, which reconstruct a potential propagation network structure using the node infection state data collected after the propagation processes have finished.
In one aspect, a propagation network reconstruction method based on node state observation results is provided, which comprises the following steps: calculating Bayesian mutual information values between any two nodes according to the node infection state data after the propagation process is finished;
clustering the Bayesian mutual information values by using a K-Means algorithm to obtain a segmentation threshold of the Bayesian mutual information;
and acquiring a directed edge set of the influence relationship between the nodes according to the segmentation threshold and the observation data probability distribution parameters of the nodes, and reconstructing the propagation network structure according to the directed edge set.
In some embodiments, calculating a bayesian mutual information value between any two nodes according to the node infection state data after the propagation process is finished includes the steps of:
calculating the Bayesian mutual information value according to a first formula;
putting the calculated Bayesian mutual information into a set BMI;
the first formula is:
where $\mathrm{BayesianMI}(v_i, v_j)$ denotes the Bayesian mutual information; $v_i$ and $v_j$ are any two nodes in the propagation network, all of whose nodes are contained in $V = \{v_1, v_2, \ldots, v_n\}$; $\psi(\cdot)$ denotes the digamma function; $\beta$ denotes the number of propagation processes; $x_i$ and $x_j$ denote the infection states of nodes $v_i$ and $v_j$, respectively, after a propagation process has finished, a value of 1 indicating infected and a value of 0 indicating not infected, with $1 \le i, j \le n$; the three count terms of the formula denote, respectively, the number of times node $v_i$ is in state $x_i$, the number of times node $v_j$ is in state $x_j$, and the number of times node $v_i$ is in state $x_i$ while node $v_j$ is in state $x_j$, each counted after the $\beta$ propagation processes have finished; and $\alpha_{ij}$ denotes the probability distribution parameter of the observation data of nodes $v_i$ and $v_j$.
In some embodiments, obtaining the probability distribution parameter $\alpha_{ij}$ of the observation data of nodes $v_i$ and $v_j$ includes the steps of:
iteratively updating the probability distribution parameter according to a second formula to obtain a new probability distribution parameter, and stopping the update when the number of updates exceeds a preset threshold $\tau_{ite1}$ or when the difference between the parameter values before and after an update is less than a preset threshold $\tau_{alpha1}$, the parameter value at which updating stops being taken as $\alpha_{ij}$;
The second formula is:
In some embodiments, clustering the Bayesian mutual information values using a K-Means algorithm to obtain a segmentation threshold of the Bayesian mutual information includes:
setting the number of clusters to 2 and clustering the Bayesian mutual information values in the set BMI using the K-Means algorithm;
taking, of the two groups of elements obtained by clustering, the maximum value of the group with the smaller mean as the segmentation threshold $\tau$;
obtaining a node set $CP_i = \{v_j \mid v_j \in V \setminus \{v_i\},\ \mathrm{BayesianMI}(v_j, v_i) > \tau\}$.
In some embodiments, obtaining a directed edge set of the influence relationships between nodes according to the segmentation threshold and the observation data probability distribution parameters of the nodes, and reconstructing the propagation network structure according to the directed edge set, includes:
for each node $v_i$, traversing all node combinations in the set $CP_i$ whose number of nodes is less than $\log\beta$ to obtain the node combinations $F_i$;
calculating the score $\mathrm{score}(v_i, F_i)$ of each node combination $F_i$ when it is taken as the parent node set of node $v_i$;
sorting the combinations $F_i$ in descending order of $\mathrm{score}(v_i, F_i)$, and adding the sorted combinations $F_i$ to the set $CF_i$;
for each $v_i$, initializing an empty set $P_i$ and traversing the parent node combinations in the corresponding set $CF_i$, placing all nodes of each traversed parent node combination into $P_i$, until $CF_i$ has been fully traversed or the number of nodes contained in $P_i$ exceeds $\log\beta$;
initializing an edge set $E$ and a graph $G = \{V, E\}$, where $E$ is the set of directed edges representing influence relationships in the propagation network, and an edge $v_i \to v_j$ indicates that an infected node $v_i$ successfully infects node $v_j$ with probability $w_i$;
sequentially traversing the parent node set $P_i$ of each node $v_i$ and updating the edge set of graph $G$ as $E = E \cup \{v_j \to v_i \mid v_j \in P_i\}$, the result after traversal and updating being taken as the reconstructed propagation network structure diagram.
In some embodiments, calculating the score $\mathrm{score}(v_i, F_i)$ includes the steps of:
calculating the score $\mathrm{score}(v_i, F_i)$ according to a third formula;
The third formula is:
where $|F_i|$ denotes the number of nodes contained in the parent node set $F_i$; $N_{ij}$ denotes the number of times the parent node set $F_i$ is in state $p_{ij}$ after the $\beta$ propagation processes have finished, $p_{ij}$ being a vector of length $|F_i|$ that represents the infection state of each node in $F_i$; $N_{ij1}$ ($N_{ij0}$) denotes the number of times node $v_i$ is in the infected (not infected) state while the node set $F_i$ is in state $p_{ij}$ after the $\beta$ propagation processes have finished; $\Gamma(\cdot)$ denotes the gamma function; and $\alpha_{ij1}$ ($\alpha_{ij0}$) denotes the probability distribution parameter of the observation data for node $v_i$ being in the infected (not infected) state while the node set $F_i$ is in state $p_{ij}$.
In some embodiments, calculating $\alpha_{ijk}$ ($k \in \{0, 1\}$) includes the steps of:
iteratively updating the parameter according to a fourth formula to obtain a new parameter, and stopping the update when the number of updates exceeds a preset threshold $\tau_{ite2}$ or when the difference between the parameter values before and after an update is less than a preset threshold $\tau_{alpha2}$, the parameter value at which updating stops being taken as $\alpha_{ijk}$ ($k \in \{0, 1\}$);
The fourth formula is:
In another aspect, a propagation network reconstruction apparatus based on node state observation results is provided, which includes: an infection status data acquisition module configured for:
calculating Bayesian mutual information values between any two nodes according to the node infection state data after the propagation process is finished;
clustering the Bayesian mutual information values by using a K-Means algorithm to obtain a segmentation threshold of the Bayesian mutual information;
and a propagation network structure reconstruction module configured for:
acquiring a directed edge set of the influence relationships between the nodes according to the segmentation threshold and the observation data probability distribution parameters of the nodes, and reconstructing the propagation network structure according to the directed edge set.
In some embodiments, the infection status data acquisition module is further configured for:
calculating the Bayesian mutual information value according to a first formula;
putting the calculated Bayesian mutual information into a set BMI;
the first formula is:
where $\mathrm{BayesianMI}(v_i, v_j)$ denotes the Bayesian mutual information; $v_i$ and $v_j$ are any two nodes in the propagation network, all of whose nodes are contained in $V = \{v_1, v_2, \ldots, v_n\}$; $\psi(\cdot)$ denotes the digamma function; $\beta$ denotes the number of propagation processes; $x_i$ and $x_j$ denote the infection states of nodes $v_i$ and $v_j$, respectively, after a propagation process has finished, a value of 1 indicating infected and a value of 0 indicating not infected, with $1 \le i, j \le n$; the three count terms of the formula denote, respectively, the number of times node $v_i$ is in state $x_i$, the number of times node $v_j$ is in state $x_j$, and the number of times node $v_i$ is in state $x_i$ while node $v_j$ is in state $x_j$, each counted after the $\beta$ propagation processes have finished; and $\alpha_{ij}$ denotes the probability distribution parameter of the observation data of nodes $v_i$ and $v_j$;
setting the number of clusters to 2 and clustering the Bayesian mutual information values in the set BMI using the K-Means algorithm;
taking, of the two groups of elements obtained by clustering, the maximum value of the group with the smaller mean as the segmentation threshold $\tau$;
obtaining a node set $CP_i = \{v_j \mid v_j \in V \setminus \{v_i\},\ \mathrm{BayesianMI}(v_j, v_i) > \tau\}$.
In some embodiments, the propagation network structure reconstruction module is further configured for:
for each node $v_i$, traversing all node combinations in the set $CP_i$ whose number of nodes is less than $\log\beta$ to obtain the node combinations $F_i$;
calculating the score $\mathrm{score}(v_i, F_i)$ of each node combination $F_i$ when it is taken as the parent node set of node $v_i$;
sorting the combinations $F_i$ in descending order of $\mathrm{score}(v_i, F_i)$, and adding the sorted combinations $F_i$ to the set $CF_i$;
for each $v_i$, initializing an empty set $P_i$ and traversing the parent node combinations in the corresponding set $CF_i$, placing all nodes of each traversed parent node combination into $P_i$, until $CF_i$ has been fully traversed or the number of nodes contained in $P_i$ exceeds $\log\beta$;
initializing an edge set $E$ and a graph $G = \{V, E\}$, where $E$ is the set of directed edges representing influence relationships in the propagation network, and an edge $v_i \to v_j$ indicates that an infected node $v_i$ successfully infects node $v_j$ with probability $w_i$;
sequentially traversing the parent node set $P_i$ of each node $v_i$ and updating the edge set of graph $G$ as $E = E \cup \{v_j \to v_i \mid v_j \in P_i\}$, the result after traversal and updating being taken as the reconstructed propagation network structure diagram.
The embodiments of the invention reconstruct the propagation network structure from node infection state data, which are easy to collect and relatively accurate, and accurately obtain the potential influence relationships between nodes, thereby helping to understand the propagation paths of information and supporting intervention strategies for the corresponding propagation events.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a propagation network reconstruction method based on node state observation results according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a propagation network reconstruction apparatus based on node state observation results according to an embodiment of the present invention;
Fig. 3 shows the F values of the influence relationship graphs constructed on artificial networks generated by the LFR algorithm according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a propagation network reconstruction method based on node state observation results, which includes the steps of:
S100, calculating Bayesian mutual information values between any two nodes according to the node infection state data after the propagation process is finished;
S200, clustering the Bayesian mutual information values by using a K-Means algorithm to obtain a segmentation threshold of the Bayesian mutual information;
S300, acquiring a directed edge set of the influence relationships between the nodes according to the segmentation threshold and the observation data probability distribution parameters of the nodes, and reconstructing the propagation network structure according to the directed edge set.
The embodiments of the invention reconstruct the propagation network structure from node infection state data, which are easy to collect and relatively accurate, and accurately obtain the potential influence relationships between nodes, thereby helping to understand the propagation paths of information and supporting intervention strategies for the corresponding propagation events.
In some embodiments, S100 comprises the steps of:
S110, calculating the Bayesian mutual information value according to a first formula;
S120, putting the calculated Bayesian mutual information into a set BMI;
the first formula is:
where $\mathrm{BayesianMI}(v_i, v_j)$ denotes the Bayesian mutual information; $v_i$ and $v_j$ are any two nodes in the propagation network, all of whose nodes are contained in $V = \{v_1, v_2, \ldots, v_n\}$; $\psi(\cdot)$ denotes the digamma function; $\beta$ denotes the number of propagation processes; $x_i$ and $x_j$ denote the infection states of nodes $v_i$ and $v_j$, respectively, after a propagation process has finished, a value of 1 indicating infected and a value of 0 indicating not infected, with $1 \le i, j \le n$; the three count terms of the formula denote, respectively, the number of times node $v_i$ is in state $x_i$, the number of times node $v_j$ is in state $x_j$, and the number of times node $v_i$ is in state $x_i$ while node $v_j$ is in state $x_j$, each counted after the $\beta$ propagation processes have finished; and $\alpha_{ij}$ denotes the probability distribution parameter of the observation data of nodes $v_i$ and $v_j$.
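The count terms described above can be gathered in a single pass over the observation data. The following is a minimal sketch, assuming the observations are stored as one list of 0/1 states per propagation process; the function name `pairwise_state_counts` and the sample data are hypothetical, and the sketch covers only the counting, not the first formula itself.

```python
from collections import Counter

def pairwise_state_counts(observations, i, j):
    """Count single-node and joint infection states of nodes i and j.

    observations: list of per-process state vectors (0 = not infected, 1 = infected).
    Returns three Counters: states of node i, states of node j, and joint state pairs.
    """
    count_i, count_j, count_ij = Counter(), Counter(), Counter()
    for states in observations:
        xi, xj = states[i], states[j]
        count_i[xi] += 1          # times node i is in state xi
        count_j[xj] += 1          # times node j is in state xj
        count_ij[(xi, xj)] += 1   # times both states occur together
    return count_i, count_j, count_ij

# Hypothetical data: beta = 4 propagation processes over 3 nodes.
S = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
]
ci, cj, cij = pairwise_state_counts(S, 0, 1)
```

Each Counter sums to the number of propagation processes, which gives a quick sanity check on the input data.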
In some embodiments, obtaining the probability distribution parameter $\alpha_{ij}$ of the observation data of nodes $v_i$ and $v_j$ in S110 includes the steps of:
S112, iteratively updating the probability distribution parameter according to a second formula to obtain a new probability distribution parameter, and stopping the update when the number of updates exceeds a preset threshold $\tau_{ite1}$ or when the difference between the parameter values before and after an update is less than a preset threshold $\tau_{alpha1}$, the parameter value at which updating stops being taken as $\alpha_{ij}$;
The second formula is:
Preferably, $\tau_{ite1} \in \mathbb{N}^+$ and $\tau_{alpha1} \in (0, 1)$.
In this example, the larger the value of $\tau_{ite1}$, or the smaller the value of $\tau_{alpha1}$, the more update iterations are performed; more iterations give a more accurate result but take longer.
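The two stopping conditions can be illustrated with a generic iteration loop. This is only a sketch: the `update` argument is a hypothetical stand-in for the second formula, which is not reproduced here, and the toy update used in the example (a Babylonian square-root step) is chosen solely because it converges quickly. The loop is capped at `max_iter` updates, which is one reading of "the number of updates exceeds $\tau_{ite1}$".

```python
def iterate_until_converged(update, alpha0, max_iter, tol):
    """Apply `update` repeatedly, stopping on an iteration cap or a small change.

    update:   callable mapping the current parameter to the new one
              (hypothetical placeholder for the second formula).
    max_iter: plays the role of tau_ite1 (a positive integer).
    tol:      plays the role of tau_alpha1 (a value in (0, 1)).
    Returns the final parameter and the number of updates performed.
    """
    alpha = alpha0
    for n_updates in range(1, max_iter + 1):
        new_alpha = update(alpha)
        if abs(new_alpha - alpha) < tol:   # change small enough: stop early
            return new_alpha, n_updates
        alpha = new_alpha
    return alpha, max_iter                 # iteration cap reached

# Toy update, NOT the patent's formula: converges to sqrt(2).
alpha, n = iterate_until_converged(lambda a: 0.5 * (a + 2.0 / a), 1.0, 50, 1e-6)
```

A larger `max_iter` or a smaller `tol` permits more updates, matching the trade-off between accuracy and running time noted above.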
In some embodiments, S200 includes the steps of:
S210, setting the number of clusters to 2 and clustering the Bayesian mutual information values in the set BMI using the K-Means algorithm;
S220, taking, of the two groups of elements obtained by clustering, the maximum value of the group with the smaller mean as the segmentation threshold $\tau$;
S230, obtaining the node set $CP_i = \{v_j \mid v_j \in V \setminus \{v_i\},\ \mathrm{BayesianMI}(v_j, v_i) > \tau\}$.
It should be noted that, in step S220, the mean of each group of elements is calculated after clustering, and the maximum value is the largest Bayesian mutual information value among the elements of the group with the smaller mean.
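Steps S210 to S230 can be sketched in a few lines. The example below is an illustration under stated assumptions, not the patented implementation: it uses a plain one-dimensional Lloyd iteration for K-Means with two clusters, initialises the centroids at the smallest and largest values, and stores the mutual information values in a hypothetical dictionary `bmi` keyed by `(j, i)` node pairs.

```python
def two_means_threshold(values, max_iter=100):
    """Cluster 1-D values into two groups (K-Means, K = 2) and return the
    largest value of the group with the smaller mean."""
    lo, hi = min(values), max(values)      # assumed initial centroids
    if lo == hi:
        raise ValueError("need at least two distinct values")
    for _ in range(max_iter):
        # Assign each value to its nearest centroid.
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if new_lo == lo and new_hi == hi:  # centroids stable: converged
            break
        lo, hi = new_lo, new_hi
    return max(low)

def candidate_parents(bmi, nodes, i, tau):
    """Nodes j != i whose mutual information with node i exceeds tau."""
    return {j for j in nodes if j != i and bmi[(j, i)] > tau}

# Hypothetical mutual information values for a 3-node network.
nodes = [0, 1, 2]
bmi = {(1, 0): 0.45, (2, 0): 0.02, (0, 1): 0.40,
       (2, 1): 0.01, (0, 2): 0.03, (1, 2): 0.50}
tau = two_means_threshold(list(bmi.values()))
cp0 = candidate_parents(bmi, nodes, 0, tau)
```

Because the threshold is itself a member of the low group, the strict inequality in `candidate_parents` excludes every value of that group.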
In some embodiments, S300 includes the steps of:
S310, for each node $v_i$, traversing all node combinations in the set $CP_i$ whose number of nodes is less than $\log\beta$ to obtain the node combinations $F_i$;
S320, calculating the score $\mathrm{score}(v_i, F_i)$ of each node combination $F_i$ when it is taken as the parent node set of node $v_i$;
S330, sorting the combinations $F_i$ in descending order of $\mathrm{score}(v_i, F_i)$, and adding the sorted combinations $F_i$ to the set $CF_i$;
S340, for each $v_i$, initializing an empty set $P_i$ and traversing the parent node combinations in the corresponding set $CF_i$, placing all nodes of each traversed parent node combination into $P_i$, until $CF_i$ has been fully traversed or the number of nodes contained in $P_i$ exceeds $\log\beta$;
S350, initializing an edge set $E$ and a graph $G = \{V, E\}$, where $E$ is the set of directed edges representing influence relationships in the propagation network, and an edge $v_i \to v_j$ indicates that an infected node $v_i$ successfully infects node $v_j$ with probability $w_i$;
S360, sequentially traversing the parent node set $P_i$ of each node $v_i$ and updating the edge set of graph $G$ as $E = E \cup \{v_j \to v_i \mid v_j \in P_i\}$, the result after traversal and updating being taken as the reconstructed propagation network structure diagram.
It should be noted that, in step S360, updating the graph $G$ means adding new edges to it. Each node and its corresponding parent node set are traversed in turn, and an edge is added from every node in the parent node set to the child node, which solves the problem of finding the edges when composing the graph.
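Steps S310 to S360 can be sketched as follows. Several points are assumptions made for illustration: the logarithm is taken as the natural logarithm (the text does not state the base), only non-empty combinations are enumerated, the `score` callable is a hypothetical stand-in for the third formula, and the weights in the example are invented.

```python
import math
from itertools import combinations

def select_parents(candidates, beta, score):
    """S310-S340: enumerate candidate parent combinations with fewer than
    log(beta) nodes, rank them by score, and merge them into a parent set."""
    limit = math.log(beta)                 # assumed natural logarithm
    combos = [frozenset(c)
              for size in range(1, len(candidates) + 1) if size < limit
              for c in combinations(sorted(candidates), size)]
    combos.sort(key=score, reverse=True)   # highest score first
    parents = set()
    for combo in combos:
        if len(parents) > limit:           # parent set is large enough
            break
        parents |= combo
    return parents

def build_edges(parent_sets):
    """S350-S360: add a directed edge (parent, child) for every parent."""
    edges = set()
    for child, parents in parent_sets.items():
        edges |= {(parent, child) for parent in parents}
    return edges

# Hypothetical scoring: sum of invented per-node weights.
weights = {1: 0.9, 2: 0.5, 3: 0.1}
toy_score = lambda combo: sum(weights[v] for v in combo)
p0 = select_parents({1, 2, 3}, beta=5, score=toy_score)
edges = build_edges({0: p0, 1: set()})
```

With `beta=5` the size limit is about 1.61, so only single-node combinations are considered and merging stops once two parents have been collected.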
In some embodiments, calculating the score $\mathrm{score}(v_i, F_i)$ in S320 includes the steps of:
S321, calculating the score $\mathrm{score}(v_i, F_i)$ according to a third formula;
The third formula is:
where $|F_i|$ denotes the number of nodes contained in the parent node set $F_i$; $N_{ij}$ denotes the number of times the parent node set $F_i$ is in state $p_{ij}$ after the $\beta$ propagation processes have finished, $p_{ij}$ being a vector of length $|F_i|$ that represents the infection state of each node in $F_i$; $N_{ij1}$ ($N_{ij0}$) denotes the number of times node $v_i$ is in the infected (not infected) state while the node set $F_i$ is in state $p_{ij}$ after the $\beta$ propagation processes have finished; $\Gamma(\cdot)$ denotes the gamma function; and $\alpha_{ij1}$ ($\alpha_{ij0}$) denotes the probability distribution parameter of the observation data for node $v_i$ being in the infected (not infected) state while the node set $F_i$ is in state $p_{ij}$.
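The counts $N_{ij}$, $N_{ij1}$ and $N_{ij0}$ defined above can be tallied directly from the observation data. The sketch below assumes the same list-of-state-vectors layout as before; the function name `parent_config_counts` and the sample data are hypothetical, and the third formula itself is not computed.

```python
from collections import Counter

def parent_config_counts(observations, child, parents):
    """Tally parent-set configurations and their co-occurrence with the child state.

    Returns (n_config, n_joint):
      n_config[p]      -> times the parent set is in configuration p   (N_ij)
      n_joint[(p, k)]  -> times the child is in state k under p        (N_ijk)
    """
    order = sorted(parents)                # fixed node order inside p
    n_config, n_joint = Counter(), Counter()
    for states in observations:
        config = tuple(states[p] for p in order)
        n_config[config] += 1
        n_joint[(config, states[child])] += 1
    return n_config, n_joint

# Hypothetical data: 4 propagation processes over 3 nodes.
S = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
]
n_config, n_joint = parent_config_counts(S, child=2, parents={0, 1})
```

For every configuration the two child-state counts add up to the configuration count, that is, $N_{ij} = N_{ij0} + N_{ij1}$.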
In some embodiments, calculating $\alpha_{ijk}$ ($k \in \{0, 1\}$) in S320 includes the steps of:
S322, iteratively updating the parameter according to a fourth formula to obtain a new parameter, and stopping the update when the number of updates exceeds a preset threshold $\tau_{ite2}$ or when the difference between the parameter values before and after an update is less than a preset threshold $\tau_{alpha2}$, the parameter value at which updating stops being taken as $\alpha_{ijk}$ ($k \in \{0, 1\}$);
The fourth formula is:
Preferably, $\tau_{ite2} \in \mathbb{N}^+$ and $\tau_{alpha2} \in (0, 1)$.
Note that the larger the value of $\tau_{ite2}$, or the smaller the value of $\tau_{alpha2}$, the more update iterations are performed; more iterations give a more accurate result but take longer.
As shown in Fig. 2, an embodiment of the present invention further provides a propagation network reconstruction device based on node state observation results, including:
an infection status data acquisition module configured for:
calculating Bayesian mutual information values between any two nodes according to the node infection state data after the propagation process is finished;
clustering the Bayesian mutual information values by using a K-Means algorithm to obtain a segmentation threshold of the Bayesian mutual information;
and a propagation network structure reconstruction module configured for:
acquiring a directed edge set of the influence relationships between the nodes according to the segmentation threshold and the observation data probability distribution parameters of the nodes, and reconstructing the propagation network structure according to the directed edge set.
In some embodiments, the infection status data acquisition module is further configured for:
calculating the Bayesian mutual information value according to a first formula;
putting the calculated Bayesian mutual information into a set BMI;
the first formula is:
where $\mathrm{BayesianMI}(v_i, v_j)$ denotes the Bayesian mutual information; $v_i$ and $v_j$ are any two nodes in the propagation network, all of whose nodes are contained in $V = \{v_1, v_2, \ldots, v_n\}$; $\psi(\cdot)$ denotes the digamma function; $\beta$ denotes the number of propagation processes; $x_i$ and $x_j$ denote the infection states of nodes $v_i$ and $v_j$, respectively, after a propagation process has finished, a value of 1 indicating infected and a value of 0 indicating not infected, with $1 \le i, j \le n$; the three count terms of the formula denote, respectively, the number of times node $v_i$ is in state $x_i$, the number of times node $v_j$ is in state $x_j$, and the number of times node $v_i$ is in state $x_i$ while node $v_j$ is in state $x_j$, each counted after the $\beta$ propagation processes have finished; and $\alpha_{ij}$ denotes the probability distribution parameter of the observation data of nodes $v_i$ and $v_j$;
setting the number of clusters to 2 and clustering the Bayesian mutual information values in the set BMI using the K-Means algorithm;
taking, of the two groups of elements obtained by clustering, the maximum value of the group with the smaller mean as the segmentation threshold $\tau$;
obtaining a node set $CP_i = \{v_j \mid v_j \in V \setminus \{v_i\},\ \mathrm{BayesianMI}(v_j, v_i) > \tau\}$.
In some embodiments, the propagation network structure reconstruction module is further configured for:
for each node $v_i$, traversing all node combinations in the set $CP_i$ whose number of nodes is less than $\log\beta$ to obtain the node combinations $F_i$;
calculating the score $\mathrm{score}(v_i, F_i)$ of each node combination $F_i$ when it is taken as the parent node set of node $v_i$;
sorting the combinations $F_i$ in descending order of $\mathrm{score}(v_i, F_i)$, and adding the sorted combinations $F_i$ to the set $CF_i$;
for each $v_i$, initializing an empty set $P_i$ and traversing the parent node combinations in the corresponding set $CF_i$, placing all nodes of each traversed parent node combination into $P_i$, until $CF_i$ has been fully traversed or the number of nodes contained in $P_i$ exceeds $\log\beta$;
initializing an edge set $E$ and a graph $G = \{V, E\}$, where $E$ is the set of directed edges representing influence relationships in the propagation network, and an edge $v_i \to v_j$ indicates that an infected node $v_i$ successfully infects node $v_j$ with probability $w_i$;
sequentially traversing the parent node set $P_i$ of each node $v_i$ and updating the edge set of graph $G$ as $E = E \cup \{v_j \to v_i \mid v_j \in P_i\}$, the result after traversal and updating being taken as the reconstructed propagation network structure diagram.
As shown in Table 1, in one particular embodiment, six networks are used, of which LFRNet1, LFRNet2, LFRNet3, LFRNet4 and LFRNet5 are artificial networks generated using the LFR algorithm. Propagation trajectory data for each network is obtained by randomly selecting 15% of the nodes as initial infection points and running multiple simulated propagation processes according to the IC (independent cascade) model.
TABLE 1 Experimental network
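The data generation described above (random initial infection points, IC spreading, and recording only the final infection states) can be sketched roughly as follows. This is an illustrative sketch rather than the code used in the embodiment: the function names `simulate_ic` and `collect_observations`, the adjacency-dictionary graph format, and the per-node probability map `w` are assumptions introduced here.

```python
import random

def simulate_ic(succ, w, seed_fraction=0.15, rng=None):
    """Run one independent-cascade process and return the final 0/1 states.

    succ: dict mapping each node to a list of its out-neighbours (assumed format).
    w:    dict mapping each node u to the probability that u infects a neighbour.
    """
    rng = rng or random.Random()
    nodes = sorted(succ)
    n_seeds = max(1, round(seed_fraction * len(nodes)))
    infected = set(rng.sample(nodes, n_seeds))   # random initial infection points
    frontier = list(infected)
    while frontier:
        newly_infected = []
        for u in frontier:
            for v in succ[u]:
                # A newly infected node gets one attempt on each out-neighbour.
                if v not in infected and rng.random() < w[u]:
                    infected.add(v)
                    newly_infected.append(v)
        frontier = newly_infected
    return [1 if v in infected else 0 for v in nodes]

def collect_observations(succ, w, beta, seed=0):
    """Simulate beta propagation processes and keep only the final states."""
    rng = random.Random(seed)
    return [simulate_ic(succ, w, rng=rng) for _ in range(beta)]
```

Each returned row corresponds to one propagation process and records only which nodes ended up infected, which is the only information the reconstruction method relies on.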
First, the propagation network structure is defined as $G = \{V, E, W\}$, where $V = \{v_1, v_2, \ldots, v_n\}$ denotes the $n$ nodes in the propagation network; $E$ denotes the set of directed edges representing influence relationships in the propagation network, where an edge $v_i \to v_j$ indicates that an infected node $v_i$ will successfully infect node $v_j$ with a certain probability $w_i$; and $W$ denotes the corresponding weights on the directed edge set $E$. $S = \{S_1, S_2, \ldots, S_\beta\}$ denotes the infection state observation data collected after the $\beta$ propagation processes have finished, where $S_\ell$ ($1 \le \ell \le \beta$) denotes the observation data after the $\ell$-th propagation process has finished, and its $i$-th entry indicates whether node $v_i$ is infected after the $\ell$-th process (1 if infected, 0 if not infected), $1 \le i \le n$.
Step 1: According to the collected infection state data, calculate the Bayesian mutual information $\mathrm{BayesianMI}(v_i, v_j)$ between any two nodes $v_i$ and $v_j$, and put all calculated Bayesian mutual information values into the set BMI, where $\mathrm{BayesianMI}(v_i, v_j)$ is calculated as:
wherein $\psi(\cdot)$ denotes the digamma function; $\beta$ denotes the number of propagation processes; and the three count terms denote, respectively, the number of times node $v_i$ is in state $x_i$ after the $\beta$ propagation processes have finished, the number of times node $v_j$ is in state $x_j$, and the number of times node $v_i$ is in state $x_i$ while node $v_j$ is in state $x_j$;
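The count terms defined above can be tabulated directly from the observation data. The following sketch only illustrates that counting step; the function name `state_counts` and the row-per-process list layout of `S` are assumptions, and the digamma-based mutual information formula itself is not reproduced here.

```python
def state_counts(S, i, j):
    """Count single-node and joint final states of nodes i and j.

    S is a list of beta rows; S[l][i] is 1 if node i ended process l
    infected and 0 otherwise (assumed layout).
    Returns (n_i, n_j, n_ij), where n_i[x] is the number of processes in
    which node i ended in state x, and n_ij[(xi, xj)] is the number of
    processes in which node i ended in xi while node j ended in xj.
    """
    n_i = [0, 0]
    n_j = [0, 0]
    n_ij = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for row in S:
        xi, xj = row[i], row[j]
        n_i[xi] += 1
        n_j[xj] += 1
        n_ij[(xi, xj)] += 1
    return n_i, n_j, n_ij
```

Because every process contributes exactly one joint state, the four joint counts always sum to the number of propagation processes.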
Step 2: Cluster the mutual information values in BMI using the K-Means algorithm with the number of clusters set to 2, and record as the threshold $\tau$ the maximum value in whichever of the two resulting clusters has the smaller mean. Then, for each node $v_i$, use the set $CP_i = \{v_j \mid v_j \in V \setminus \{v_i\},\ \mathrm{BayesianMI}(v_j, v_i) > \tau\}$ to record the nodes in $V$ whose Bayesian mutual information with $v_i$ is greater than the threshold $\tau$.
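Step 2 can be sketched as below, assuming the mutual information values have already been computed. The names `two_means_threshold` and `candidate_parents` are hypothetical, the K-Means step is a hand-written one-dimensional Lloyd iteration initialized at the two extreme values (any standard K-Means implementation could be substituted), and the dictionary keyed by unordered node pairs assumes the mutual information is symmetric.

```python
def two_means_threshold(values, max_iter=100):
    """Cluster 1-D values into two groups (K-Means, k=2) and return the
    largest value in the group with the smaller mean."""
    lo, hi = min(values), max(values)   # assumed initial centroids: the extremes
    if lo == hi:
        return lo                       # all values identical: nothing exceeds tau
    for _ in range(max_iter):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break                       # centroids stopped moving
        lo, hi = new_lo, new_hi
    return max(low)

def candidate_parents(bmi, n, tau):
    """Build CP_i for every node i.

    bmi maps each pair (i, j) with i < j to its mutual information value.
    """
    return {
        i: [j for j in range(n)
            if j != i and bmi[(min(i, j), max(i, j))] > tau]
        for i in range(n)
    }
```

For example, with values 0.9, 0.05 and 0.1 the low cluster is {0.05, 0.1}, so the threshold is 0.1 and only the pair with value 0.9 survives the cut.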
Step 3: For each node $v_i$, traverse the node combinations $F_i$ in the corresponding $CP_i$ whose number of nodes is less than $\log\beta$, calculate the score $\mathrm{score}(v_i, F_i)$ obtained when each node combination $F_i$ is taken as the parent node set of $v_i$, sort the combinations $F_i$ in descending order of score, and add them to the set $CF_i$, where $\mathrm{score}(v_i, F_i)$ is calculated as:
wherein $|F_i|$ denotes the number of nodes contained in the parent node set $F_i$; $N_{ij}$ denotes the number of times the parent node set $F_i$ is in state $\pi_j$ after the $\beta$ propagation processes have finished ($\pi_j$ is a vector of length $|F_i|$ representing the infection state of each node in $F_i$); $N_{ij1}$ ($N_{ij0}$) denotes the number of times node $v_i$ is in the infected (not infected) state while the node set $F_i$ is in state $\pi_j$; $\Gamma(\cdot)$ denotes the gamma function; and $\alpha_{ij1}$ ($\alpha_{ij0}$) denotes the probability distribution parameter of the observed data in which node $v_i$ is in the infected (not infected) state and the node set $F_i$ is in state $\pi_j$.
Step 4: For each node $v_i$, initialize an empty set $P_i$ to record the parent nodes of $v_i$. Then, for each $v_i$, sequentially traverse the parent node combinations in the corresponding $CF_i$ and add their nodes to $P_i$, until all elements of $CF_i$ have been traversed or the number of nodes contained in $P_i$ exceeds $\log\beta$.
Step 5: Initialize the edge set $E = \varnothing$, then sequentially traverse the parent node set $P_i$ of each node $v_i$, update the edge set of graph $G$ as $E = E \cup \{v_j \to v_i \mid v_j \in P_i\}$, and obtain the final propagation network structure graph $G$ after the traversal and updating.
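Steps 3 to 5 can be sketched as follows. This is an assumption-laden outline rather than the patented implementation: `select_parents` and `build_edges` are hypothetical names, the scoring function is passed in as a callable because the scoring formula is not reproduced in this text, the logarithm is taken as the natural logarithm because the base is not stated, and the empty combination is skipped.

```python
import math
from itertools import combinations

def select_parents(candidates, score, beta):
    """Return P_i for one node.

    candidates: the list CP_i of candidate parent nodes.
    score:      callable taking a frozenset F_i and returning score(v_i, F_i)
                (a stand-in for the scoring formula).
    beta:       number of propagation processes.
    """
    limit = math.log(beta)   # assumption: natural logarithm
    # Step 3: enumerate combinations with fewer than log(beta) nodes.
    combos = [frozenset(c)
              for size in range(1, len(candidates) + 1) if size < limit
              for c in combinations(candidates, size)]
    ranked = sorted(combos, key=score, reverse=True)   # CF_i, best first
    # Step 4: merge combinations until CF_i is exhausted or P_i grows past the limit.
    parents = set()
    for combo in ranked:
        parents |= combo
        if len(parents) > limit:
            break
    return parents

def build_edges(parent_sets):
    """Step 5: turn {node i: P_i} into the directed edge set {(j, i)}."""
    return {(j, i) for i, parents in parent_sets.items() for j in parents}
```

Restricting the combination size to fewer than $\log\beta$ nodes keeps the enumeration small relative to the amount of observation data, since the number of parent state vectors $\pi_j$ grows exponentially with $|F_i|$.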
As shown in Fig. 3, based on 100 pieces of propagation trace data generated on each artificial network LFRNet$i$ ($i = 1, 2, \ldots, 5$), the method provided by the present invention is used to reconstruct the potential influence relationship graph of LFRNet$i$, and the F value is used to measure the accuracy of the reconstructed influence relationship graph.
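The text does not spell out how the F value is computed. A common reading, assumed here, is the harmonic mean of precision and recall over directed edges, which the following sketch illustrates; the function name `f_score` and the tuple representation of edges are hypothetical.

```python
def f_score(true_edges, inferred_edges):
    """Harmonic mean of edge precision and recall (assumed definition of the F value).

    Both arguments are iterables of directed edges given as (source, target) tuples.
    """
    true_edges, inferred_edges = set(true_edges), set(inferred_edges)
    tp = len(true_edges & inferred_edges)      # correctly inferred edges
    if tp == 0:
        return 0.0
    precision = tp / len(inferred_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)
```

Under this definition an edge inferred in the wrong direction counts as both a false positive and a false negative.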
The embodiments of the invention provide a propagation network reconstruction method and device based on node state observation results, which use only the node infection state data available after the propagation processes have finished and mine the potential influence relationships among nodes through the learned probability distribution parameters of the observed data.
In the description of the present invention, it should be noted that the terms "upper", "lower", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, which are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and operate, and thus, should not be construed as limiting the present invention. Unless expressly stated or limited otherwise, the terms "mounted," "connected," and "connected" are intended to be inclusive and mean, for example, that they may be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
It is to be noted that, in the present invention, relational terms such as "first" and "second", and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A propagation network reconstruction method based on node state observation results is characterized by comprising the following steps:
calculating Bayesian mutual information values between any two nodes according to the node infection state data after the propagation process is finished;
clustering the Bayesian mutual information values by using a K-Means algorithm to obtain a segmentation threshold of the Bayesian mutual information;
and acquiring a directed edge set of the influence relationship between the nodes according to the segmentation threshold and the observation data probability distribution parameters of the nodes, and reconstructing the propagation network structure according to the directed edge set.
2. The propagation network reconstruction method based on node state observation results as claimed in claim 1, wherein a bayesian mutual information value between any two nodes is calculated according to node infection state data after the propagation process is finished, comprising the steps of:
calculating the Bayesian mutual information value according to a first formula;
putting the calculated Bayesian mutual information into a set BMI;
the first formula is:
wherein $\mathrm{BayesianMI}(v_i, v_j)$ is the Bayesian mutual information; $v_i$ and $v_j$ are any two nodes in the propagation network, all nodes of which are contained in $V = \{v_1, v_2, \ldots, v_n\}$; $\psi(\cdot)$ denotes the digamma function; $\beta$ denotes the number of propagation processes; $x_i$ and $x_j$ denote the infection states of nodes $v_i$ and $v_j$, respectively, after a propagation process has finished, where a value of 1 indicates infected and a value of 0 indicates not infected, with $1 \le i, j \le n$; the three count terms in the formula denote, respectively, the number of times node $v_i$ is in state $x_i$ after the $\beta$ propagation processes have finished, the number of times node $v_j$ is in state $x_j$, and the number of times node $v_i$ is in state $x_i$ while node $v_j$ is in state $x_j$; $\alpha_{ij}$ denotes the probability distribution parameter of the observed data of nodes $v_i$ and $v_j$.
3. The propagation network reconstruction method based on node state observation results as claimed in claim 2, wherein obtaining the probability distribution parameter $\alpha_{ij}$ of the observed data of nodes $v_i$ and $v_j$ comprises the following steps:
iteratively updating the probability distribution parameter according to a second formula to obtain a new probability distribution parameter, and stopping the updating when the number of updates exceeds a preset threshold $\tau_{ite1}$ or the difference between the parameter values before and after an update is less than a preset threshold $\tau_{alpha1}$, the parameter value at the time the updating stops being taken as said $\alpha_{ij}$;
The second formula is:
4. the method as claimed in claim 2, wherein the step of clustering the bayesian mutual information values to obtain the segmentation threshold of the bayesian mutual information using the K-Means algorithm comprises the steps of:
Setting the clustering number to be 2 and clustering Bayesian mutual information values in the BMI by using a K-Means algorithm;
taking the maximum value of one group of elements with smaller average value in the two groups of clustered elements as the segmentation threshold value tau;
obtaining the node set $CP_i = \{v_j \mid v_j \in V \setminus \{v_i\},\ \mathrm{BayesianMI}(v_j, v_i) > \tau\}$.
5. The propagation network reconstruction method based on node state observation results as claimed in claim 4, wherein acquiring the directed edge set of the influence relationships between nodes according to the segmentation threshold and the observation data probability distribution parameters of the nodes, and reconstructing the propagation network structure according to the directed edge set, comprises:
for each node $v_i$, traversing all node combinations $F_i \subseteq CP_i$ whose number of nodes is less than $\log\beta$;
computing, for each node combination $F_i$, the score $\mathrm{score}(v_i, F_i)$ obtained when $F_i$ is taken as the parent node set of node $v_i$;
sorting the combinations $F_i$ in descending order of $\mathrm{score}(v_i, F_i)$ and adding the sorted combinations to the set $CF_i$;
for each $v_i$, initializing an empty set $P_i$ and traversing the parent node combinations in the corresponding set $CF_i$, adding all nodes of each traversed combination to $P_i$, until every element of $CF_i$ has been traversed or the number of nodes contained in $P_i$ exceeds $\log\beta$;
initializing the edge set $E = \varnothing$ and the graph $G = \{V, E\}$, where $E$ is the set of directed edges representing influence relationships in the propagation network, and an edge $v_i \to v_j$ indicates that an infected node $v_i$ successfully infects node $v_j$ with probability $w_i$;
successively traversing the parent node set $P_i$ of each node $v_i$, updating the edge set of graph $G$ as $E = E \cup \{v_j \to v_i \mid v_j \in P_i\}$, and taking the result after the traversal and updating as the reconstructed propagation network structure graph.
6. The propagation network reconstruction method based on node state observation results as claimed in claim 5, wherein calculating the score $\mathrm{score}(v_i, F_i)$ comprises the following steps:
calculating the score $\mathrm{score}(v_i, F_i)$ according to a third formula;
the third formula is:
wherein $|F_i|$ denotes the number of nodes contained in the parent node set $F_i$; $N_{ij}$ denotes the number of times the parent node set $F_i$ is in state $\pi_j$ after the $\beta$ propagation processes have finished, $\pi_j$ being a vector of length $|F_i|$ that represents the infection state of each node in $F_i$; $N_{ij1}$ ($N_{ij0}$) denotes the number of times node $v_i$ is in the infected (not infected) state while the node set $F_i$ is in state $\pi_j$; $\Gamma(\cdot)$ denotes the gamma function; and $\alpha_{ij1}$ ($\alpha_{ij0}$) denotes the probability distribution parameter of the observed data in which node $v_i$ is in the infected (not infected) state and the node set $F_i$ is in state $\pi_j$.
7. The propagation network reconstruction method based on node state observation results as claimed in claim 6, wherein calculating $\alpha_{ijk}$ ($k \in \{0, 1\}$) comprises the steps of:
iteratively updating the parameter according to a fourth formula to obtain a new parameter, and stopping the updating when the number of updates exceeds a preset threshold $\tau_{ite2}$ or the difference between the parameter values before and after an update is less than a preset threshold $\tau_{alpha2}$, the parameter value at the time the updating stops being taken as said $\alpha_{ijk}$ ($k \in \{0, 1\}$);
The fourth formula is:
8. a propagation network reconstruction device based on node state observation results is characterized by comprising:
an infection status data acquisition module to:
calculating Bayesian mutual information values between any two nodes according to the node infection state data after the propagation process is finished;
clustering the Bayesian mutual information values by using a K-Means algorithm to obtain a segmentation threshold of the Bayesian mutual information;
a propagation network structure reconstruction module to:
and acquiring a directed edge set of the influence relationship between the nodes according to the segmentation threshold and the observation data probability distribution parameters of the nodes, and reconstructing the propagation network structure according to the directed edge set.
9. The propagation network reconstruction device based on node state observation results as claimed in claim 8, wherein
the infection state data acquisition module is further configured to perform:
calculating the Bayesian mutual information value according to a first formula;
putting the calculated Bayesian mutual information into a set BMI;
the first formula is:
wherein $\mathrm{BayesianMI}(v_i, v_j)$ is the Bayesian mutual information; $v_i$ and $v_j$ are any two nodes in the propagation network, all nodes of which are contained in $V = \{v_1, v_2, \ldots, v_n\}$; $\psi(\cdot)$ denotes the digamma function; $\beta$ denotes the number of propagation processes; $x_i$ and $x_j$ denote the infection states of nodes $v_i$ and $v_j$, respectively, after a propagation process has finished, where a value of 1 indicates infected and a value of 0 indicates not infected, with $1 \le i, j \le n$; the three count terms in the formula denote, respectively, the number of times node $v_i$ is in state $x_i$ after the $\beta$ propagation processes have finished, the number of times node $v_j$ is in state $x_j$, and the number of times node $v_i$ is in state $x_i$ while node $v_j$ is in state $x_j$; $\alpha_{ij}$ denotes the probability distribution parameter of the observed data of nodes $v_i$ and $v_j$;
setting the number of clusters to 2 and clustering the Bayesian mutual information values in the set BMI by using the K-Means algorithm;
taking, as the segmentation threshold $\tau$, the maximum value in whichever of the two resulting clusters has the smaller mean;
obtaining the node set $CP_i = \{v_j \mid v_j \in V \setminus \{v_i\},\ \mathrm{BayesianMI}(v_j, v_i) > \tau\}$.
10. The propagation network reconstruction device based on node state observation results as claimed in claim 9, wherein
the propagation network structure reconstruction module is configured to perform the following steps:
for each node $v_i$, traversing all node combinations $F_i \subseteq CP_i$ whose number of nodes is less than $\log\beta$;
computing, for each node combination $F_i$, the score $\mathrm{score}(v_i, F_i)$ obtained when $F_i$ is taken as the parent node set of node $v_i$;
sorting the combinations $F_i$ in descending order of $\mathrm{score}(v_i, F_i)$ and adding the sorted combinations to the set $CF_i$;
for each $v_i$, initializing an empty set $P_i$ and traversing the parent node combinations in the corresponding set $CF_i$, adding all nodes of each traversed combination to $P_i$, until every element of $CF_i$ has been traversed or the number of nodes contained in $P_i$ exceeds $\log\beta$;
initializing the edge set $E = \varnothing$ and the graph $G = \{V, E\}$, where $E$ is the set of directed edges representing influence relationships in the propagation network, and an edge $v_i \to v_j$ indicates that an infected node $v_i$ successfully infects node $v_j$ with probability $w_i$;
successively traversing the parent node set $P_i$ of each node $v_i$, updating the edge set of graph $G$ as $E = E \cup \{v_j \to v_i \mid v_j \in P_i\}$, and taking the result after the traversal and updating as the reconstructed propagation network structure graph.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110777832.4A CN113626724B (en) | 2021-07-09 | 2021-07-09 | Propagation network reconstruction method and device based on node state observation result |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113626724A (en) | 2021-11-09 |
CN113626724B (en) | 2023-10-20 |
Family
ID=78379382
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110777832.4A (Active; CN113626724B) | 2021-07-09 | 2021-07-09 | Propagation network reconstruction method and device based on node state observation result |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113626724B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114465893A (en) * | 2022-02-28 | 2022-05-10 | 武汉大学 | Propagation network reconstruction method, device, equipment and storage medium |
CN118096418A (en) * | 2024-04-29 | 2024-05-28 | 江西求是高等研究院 | Incremental updating method and system for propagation network topology |
CN118096417A (en) * | 2024-04-28 | 2024-05-28 | 江西求是高等研究院 | Propagation network mode discovery method, system, computer and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113626724B (en) | 2023-10-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||