CN116032828B

CN116032828B - Betweenness centrality approximate calculation method and device

Info

Publication number: CN116032828B
Application number: CN202310167081.3A
Authority: CN
Inventors: 王怀习; 束妮娜; 马涛; 王晨; 冯也来; 黄郡; 沈培佳; 杨成武
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2023-02-27
Filing date: 2023-02-27
Publication date: 2023-06-13
Anticipated expiration: 2043-02-27
Also published as: CN116032828A

Abstract

The invention discloses a method and a device for approximate calculation of betweenness centrality. The method comprises the following steps: computing the eigenvector centrality of a network G; using the eigenvector centrality to calculate the node weight vector of network G and construct a multiple network G₂; selecting sample nodes from the multiple network G₂ to obtain a duplicate-free sample node set S; and computing the betweenness centrality of the duplicate-free sample node set S to obtain an approximate betweenness centrality value for network G. The invention thus exploits the low computational complexity of eigenvector centrality and selects the sample nodes for the betweenness centrality calculation on that basis, so that the shortest paths between sample nodes better represent all shortest paths between network nodes. The approximate betweenness centrality of the network is obtained from the shortest paths between the sample nodes, which enables fast calculation of betweenness centrality in large-scale networks.

Description

Betweenness centrality approximate calculation method and device

Technical Field

The invention relates to the technical field of networks, in particular to a method and a device for approximate calculation of betweenness centrality.

Background

Since the beginning of the 21st century, the informatization of human society has advanced rapidly. The mobile internet has become widespread and the Internet of Things has been deployed quickly; the number of network devices has grown significantly, and both the number of physical network devices and the complexity of their connections have increased. Real networks in fields such as power networks, telecommunication networks, traffic networks, social networks, communication networks, and the Internet of Things together form an all-encompassing networked world.

Real networks provide rich research objects for network science. Abstracting different real networks yields abstract networks that share common network properties, and the systematic study of the properties and laws of abstract networks has formed complex network science. The main research topics of network science include network centrality measures and global network characteristics, the structure and function analysis of network models, network link prediction and recommendation algorithms, network dynamics, and network control and optimization; among these, the study of network centrality measures plays a fundamental role. To characterize the importance of nodes and edges in a network, researchers have proposed various network centrality measures that evaluate nodes and edges from different angles. These measures provide theoretical support for identifying key nodes and key edges in real networks.

Betweenness centrality is an index that describes the importance of a node or an edge from the perspective of network connectivity. It is an important research topic in complex networks, and research on fast calculation methods for it has great practical significance. For example, ranking the core nodes of a social network by influence makes it possible to attract users and carry out targeted network marketing; protecting key servers in a network from viruses or hackers keeps the whole network operating normally; and isolating sources of infection effectively prevents the spread of infectious viruses. In such practical applications, the importance of each node in the network must be known in order to find the key nodes. However, as the number of nodes increases and the connections between nodes become denser, the network topology becomes more complex, and conventional betweenness centrality calculation methods can hardly meet the requirements of large-scale networks. A fast method for calculating betweenness centrality is therefore urgently needed.

Disclosure of Invention

In view of the above problems, the present invention aims to provide a method and a device for approximate calculation of betweenness centrality. The invention exploits the low computational complexity of eigenvector centrality and selects the sample nodes for the betweenness centrality calculation on that basis, so that the shortest paths between sample nodes better represent all shortest paths between network nodes. The approximate betweenness centrality value is then obtained from the shortest paths between the sample nodes, which facilitates fast calculation of betweenness centrality in large-scale networks.

To achieve the above object, a first aspect of the present invention discloses a betweenness centrality approximate calculation method, the method comprising:

computing the eigenvector centrality of a network G to obtain the eigenvector centrality $\vec{x} = (x_1, x_2, \ldots, x_n)$, where n is the number of components of the vector $\vec{x}$ and equals the total number of nodes in network G;

using the eigenvector centrality to calculate the node weight vector of network G and construct a multiple network G₂;

selecting sample nodes from the multiple network G₂ to obtain a duplicate-free sample node set S;

computing the betweenness centrality of the duplicate-free sample node set S to obtain an approximate betweenness centrality value for network G.

As an optional implementation, in the first aspect of the embodiment of the present invention, using the eigenvector centrality to calculate the node weight vector of network G and construct the multiple network G₂ comprises:

calculating the mean of the components of the eigenvector centrality to obtain the component mean $x_0 = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i \in \{x_1, x_2, \ldots, x_n\}$;

judging whether the component mean $x_0$ is smaller than a preset average weight c to obtain a judgment result. The average weight c takes values in the range $\ldots \le c \le 5\langle k\rangle$, where $\langle k\rangle$ is the average node degree of network G; this range ensures that samples drawn on the basis of eigenvector centrality are both general and diverse;

obtaining a vector coefficient τ from a preset vector coefficient calculation model according to the judgment result;

multiplying the eigenvector centrality by the vector coefficient τ to obtain a second eigenvector: with the eigenvector centrality $\vec{x} = (x_1, x_2, \ldots, x_n)$, the second eigenvector is $\tau\vec{x} = (\tau x_1, \tau x_2, \ldots, \tau x_n)$;

rounding up each component of the second eigenvector to obtain the node weight vector $\vec{\mu} = (\mu_1, \mu_2, \ldots, \mu_n)$, where $\mu_i = \lceil \tau \cdot x_i \rceil$, $1 \le i \le n$, each $\mu_i$ is a non-negative integer, and the i-th component $\mu_i$ is the weight (multiplicity) of the i-th node $v_i$ of network G;

processing network G with the node weight vector to obtain the multiple network $G_2 = \{\mu_1 \cdot v_1, \mu_2 \cdot v_2, \ldots, \mu_n \cdot v_n\}$, in which node $v_i$ appears $\mu_i$ times.
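The rounding and expansion steps above can be illustrated with a minimal Python sketch. It assumes that the multiple network G₂ can be represented as a plain list in which node v_i is repeated μ_i times; the function names `node_weights` and `build_multiple_network` are hypothetical and are not part of the patent text.

```python
import math

def node_weights(x, tau):
    """Return mu_i = ceil(tau * x_i) for every eigenvector-centrality component."""
    return [math.ceil(tau * xi) for xi in x]

def build_multiple_network(nodes, weights):
    """Represent G2 as a list in which node v_i is repeated mu_i times (assumption)."""
    multiset = []
    for v, mu in zip(nodes, weights):
        multiset.extend([v] * mu)
    return multiset

# Illustrative values only; they are not taken from the patent.
x = [0.5, 0.25, 0.25]
nodes = ["v1", "v2", "v3"]
mu = node_weights(x, tau=8)
g2 = build_multiple_network(nodes, mu)
```

With this representation, drawing an element uniformly at random from the list selects node v_i with probability proportional to its weight μ_i.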

As an optional implementation, in the first aspect of the embodiment of the present invention, obtaining the vector coefficient τ from the preset vector coefficient calculation model according to the judgment result comprises:

when the judgment result is yes, calculating the minimum integer $z_1$ for which a preset first vector coefficient calculation model holds, and taking the minimum integer $z_1$ as the value of the vector coefficient τ.

The first vector coefficient calculation model is

[formula not reproduced in the source text]

where $x_i \in \{x_1, x_2, \ldots, x_n\}$; $\lceil\cdot\rceil$ denotes rounding up; n is the total number of nodes in network G; $z_1$ is an integer greater than 0; and $c_1$ is the preset average weight c.

When the judgment result is no, calculating the maximum integer $z_2$ for which a preset second vector coefficient calculation model holds, and taking the reciprocal of the maximum integer $z_2$ as the value of the vector coefficient τ.

The second vector coefficient calculation model is

[formula not reproduced in the source text]

where $x_i \in \{x_1, x_2, \ldots, x_n\}$; $\lceil\cdot\rceil$ denotes rounding up; n is the total number of nodes in network G; $z_2$ is an integer greater than 0; and $c_2$ is the preset average weight c.
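The selection of τ can be sketched in Python under a loudly stated assumption: the two model formulas are not available in this text, so the sketch assumes that the first model is "the mean of ⌈z₁·x_i⌉ is at least c" and the second model is "the mean of ⌈x_i/z₂⌉ is at least c". These inequalities, the function names, and the `z_limit` safeguard are hypothetical and may differ from the patent's actual models.

```python
import math

def mean_weight(x, tau):
    """Mean of ceil(tau * x_i) over all components."""
    return sum(math.ceil(tau * xi) for xi in x) / len(x)

def vector_coefficient(x, c, z_limit=10**6):
    """Pick tau under the ASSUMED models described in the lead-in."""
    x0 = sum(x) / len(x)
    if x0 < c:
        # Assumed first model: smallest integer z1 > 0 with mean ceil(z1 * x_i) >= c.
        z = 1
        while mean_weight(x, z) < c:
            z += 1
            if z > z_limit:
                raise ValueError("no suitable z1 found below z_limit")
        return float(z)
    # Assumed second model: largest integer z2 > 0 with mean ceil(x_i / z2) >= c.
    # The mean is non-increasing in z2, so a linear scan finds the largest value.
    z = 1
    while z < z_limit and mean_weight(x, 1.0 / (z + 1)) >= c:
        z += 1
    return 1.0 / z

tau_up = vector_coefficient([0.5, 0.25, 0.25], c=2)    # component mean below c
tau_down = vector_coefficient([8.0, 4.0, 4.0], c=2)    # component mean above c
```

In both branches the resulting average weight stays at or just above c, which matches the stated goal of keeping the total weight of the multiple network within a controlled range.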

As an optional implementation, in the first aspect of the embodiment of the present invention, selecting sample nodes from the multiple network G₂ to obtain the duplicate-free sample node set S comprises:

selecting h nodes from the multiple network G₂ according to a uniform random probability distribution to obtain a first sample node set, where $h = \lceil p \cdot n \rceil$; $\lceil\cdot\rceil$ denotes rounding up; n is the total number of nodes in network G; and p is a preset sampling proportion with $0.1 \le p \le 0.2$, chosen by weighing the representativeness of the sample against the efficiency of the approximate calculation method;

judging whether the first sample node set contains repeated nodes to obtain a second judgment result;

if the second judgment result is yes, deleting the repeated nodes from the first sample node set, selecting new nodes from the multiple network G₂ according to the uniform random probability distribution, and adding them to the first sample node set until the total number of nodes in the first sample node set reaches h, and then triggering the step of judging whether the first sample node set contains repeated nodes to obtain the second judgment result;

if the second judgment result is no, confirming the first sample node set as the duplicate-free sample node set S.
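The draw, check, and redraw loop above can be sketched as follows. The sketch assumes that G₂ is a list with repeated nodes and adds a guard, not mentioned in the source, that stops when fewer than h distinct nodes exist; the function name and parameters are hypothetical.

```python
import math
import random

def sample_without_duplicates(multiset, n, p, rng=None):
    """Draw h = ceil(p * n) distinct nodes from the multiple network G2."""
    rng = rng or random.Random()
    # Note: floating-point p * n can land slightly above an integer for some inputs.
    h = math.ceil(p * n)
    if h > len(set(multiset)):
        raise ValueError("G2 contains fewer than h distinct nodes")
    # First sample node set: h uniform draws from the multiset.
    sample = [rng.choice(multiset) for _ in range(h)]
    # Second judgment: repeat while the set still contains repeated nodes.
    while len(set(sample)) < h:
        sample = list(dict.fromkeys(sample))      # delete repeated nodes
        while len(sample) < h:                    # refill up to h nodes
            sample.append(rng.choice(multiset))
    return set(sample)

g2 = ["a"] * 5 + ["b"] * 2 + ["c"]
S = sample_without_duplicates(g2, n=10, p=0.2, rng=random.Random(0))
```

Because each draw is uniform over the multiset, nodes with larger weights are more likely to end up in S, while the redraw loop guarantees that S contains exactly h distinct nodes.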

As an optional implementation, in the first aspect of the embodiment of the present invention, computing the eigenvector centrality of network G to obtain the eigenvector centrality comprises:

constructing the adjacency matrix A of network G, where $A = (a_{i,j})_{n \times n}$; if there is an edge between node $v_i$ and node $v_j$ in network G, then $a_{i,j} = 1$; otherwise $a_{i,j} = 0$;

constructing a first characteristic equation based on the adjacency matrix A. The first characteristic equation is $A\vec{x} = \lambda\vec{x}$, where A is the adjacency matrix of the network, λ is an eigenvalue, and $\vec{x}$ is an eigenvector;

solving the first characteristic equation to obtain the eigenvector corresponding to the largest eigenvalue, and taking that eigenvector as the eigenvector centrality.

As an optional implementation, in the first aspect of the embodiment of the present invention, computing the betweenness centrality of the duplicate-free sample node set S to obtain the approximate betweenness centrality value of network G comprises:

processing the duplicate-free sample node set S with a betweenness centrality calculation model to obtain the betweenness centrality of the sample node set S.

The betweenness centrality calculation model is:

$$Bet(v) = \sum_{v_i, v_j \in S,\ v_i \neq v \neq v_j} \frac{\sigma(v_i, v_j \mid v)}{\sigma(v_i, v_j)}, \qquad Bet(e) = \sum_{v_i, v_j \in S,\ v_i \neq v_j} \frac{\sigma(v_i, v_j \mid e)}{\sigma(v_i, v_j)}$$

where $Bet(v)$ is the betweenness centrality of node v; $Bet(e)$ is the betweenness centrality of edge e; $\sigma(v_i, v_j)$ is the number of shortest paths from node $v_i$ to node $v_j$; $\sigma(v_i, v_j \mid v)$ is the number of shortest paths from node $v_i$ to node $v_j$ that pass through node v; $\sigma(v_i, v_j \mid e)$ is the number of shortest paths from node $v_i$ to node $v_j$ that pass through edge e; and S is the duplicate-free sample node set.

Determining the betweenness centrality of the sample node set S as the approximate betweenness centrality of network G, thereby obtaining the approximate betweenness centrality value of network G.
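As a minimal sketch of the sample-based calculation, the following Python code computes node betweenness using only shortest paths between pairs of sample nodes. It assumes an undirected, unweighted network given as an adjacency dictionary and counts each unordered sample pair once; the function names are hypothetical, and the exact summation convention of the patent's model may differ.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, s):
    """Return (dist, sigma): distances from s and numbers of shortest paths from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:              # first visit fixes the distance
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def sampled_betweenness(adj, sample):
    """Node betweenness restricted to shortest paths between sample-node pairs."""
    info = {s: bfs_counts(adj, s) for s in sample}
    bet = {v: 0.0 for v in adj}
    for s, t in combinations(list(sample), 2):
        dist_s, sig_s = info[s]
        dist_t, sig_t = info[t]
        if t not in dist_s:
            continue                       # s and t are not connected
        for v in adj:
            if v == s or v == t or v not in dist_s or v not in dist_t:
                continue
            # v lies on a shortest s-t path iff the two distances add up.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                bet[v] += sig_s[v] * sig_t[v] / sig_s[t]
    return bet

path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
approx = sampled_betweenness(path, {"a", "d"})
```

Only |S| breadth-first searches are needed instead of n, which is where the saving over the exact calculation comes from in this sketch.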

A second aspect of the embodiment of the present invention discloses a betweenness centrality approximate calculation device, the device comprising:

a first computing module, configured to compute the eigenvector centrality of network G to obtain the eigenvector centrality $\vec{x} = (x_1, x_2, \ldots, x_n)$, where n is the number of components of the vector $\vec{x}$ and equals the total number of nodes in network G;

a first network construction module, configured to use the eigenvector centrality to calculate the node weight vector of network G and construct a multiple network G₂;

a second network construction module, configured to select sample nodes from the multiple network G₂ to obtain a duplicate-free sample node set S;

a second computing module, configured to compute the betweenness centrality of the duplicate-free sample node set S to obtain an approximate betweenness centrality value for network G.

In the second aspect of the embodiment of the present invention, the first computing module computes the eigenvector centrality of network G specifically by:

constructing the adjacency matrix A of network G, where $A = (a_{i,j})_{n \times n}$.

The first characteristic equation is $A\vec{x} = \lambda\vec{x}$, where A is the adjacency matrix of the network, λ is an eigenvalue, and $\vec{x}$ is an eigenvector.

In the second aspect of the embodiment of the present invention, the first network construction module uses the eigenvector centrality to calculate the node weight vector of network G and construct the multiple network G₂, which specifically includes:

calculating the mean of the components of the eigenvector centrality to obtain the component mean $x_0 = \frac{1}{n}\sum_{i=1}^{n} x_i$.

The average weight c takes values in the range $\ldots \le c \le 5\langle k\rangle$, where $\langle k\rangle$ is the average node degree of network G; this range ensures that samples drawn on the basis of eigenvector centrality are both general and diverse.

Multiplying the eigenvector centrality $\vec{x}$ by the vector coefficient τ to obtain a second eigenvector $\tau\vec{x} = (\tau x_1, \tau x_2, \ldots, \tau x_n)$.

The node weight vector is $\vec{\mu} = (\mu_1, \mu_2, \ldots, \mu_n)$, where $\mu_i = \lceil \tau \cdot x_i \rceil$, $1 \le i \le n$, each $\mu_i$ is a non-negative integer, and the i-th component $\mu_i$ is the weight (multiplicity) of the i-th node $v_i$ of network G.

Processing network G with the node weight vector to obtain the multiple network $G_2 = \{\mu_1 \cdot v_1, \mu_2 \cdot v_2, \ldots, \mu_n \cdot v_n\}$.

In the second aspect of the embodiment of the present invention, the first network construction module obtains the vector coefficient τ from the preset vector coefficient calculation model according to the judgment result, which specifically includes:

The first vector coefficient calculation model is

[formula not reproduced in the source text]

where $x_i \in \{x_1, x_2, \ldots, x_n\}$; $\lceil\cdot\rceil$ denotes rounding up; n is the total number of nodes in network G; $z_1$ is an integer greater than 0; and $c_1$ is the preset average weight c.

The second vector coefficient calculation model is

[formula not reproduced in the source text]

where $x_i \in \{x_1, x_2, \ldots, x_n\}$.

As an optional implementation, in the second aspect of the embodiment of the present invention, the second network construction module selects sample nodes from the multiple network G₂ to obtain the duplicate-free sample node set S, which specifically includes:

selecting h nodes from the multiple network G₂ according to a uniform random probability distribution to obtain a first sample node set, where $h = \lceil p \cdot n \rceil$; $\lceil\cdot\rceil$ denotes rounding up; n is the total number of nodes in network G; and p is a preset sampling proportion with $0.1 \le p \le 0.2$, chosen by weighing the representativeness of the sample against the efficiency of the approximate calculation method.

When the second judgment result is yes, deleting the repeated nodes from the first sample node set, selecting new nodes from the multiple network G₂ according to the uniform random probability distribution, and adding them to the first sample node set until the total number of nodes in the first sample node set reaches h, and then triggering the step of judging whether the first sample node set contains repeated nodes to obtain the second judgment result.

When the second judgment result is no, confirming the first sample node set as the duplicate-free sample node set S.

As an optional implementation, in the second aspect of the embodiment of the present invention, the second computing module computes the betweenness centrality of the duplicate-free sample node set S to obtain the approximate betweenness centrality value of network G, which specifically includes:

processing the duplicate-free sample node set S with a betweenness centrality calculation model to obtain the betweenness centrality of the sample node set S.

The betweenness centrality calculation model is:

$$Bet(v) = \sum_{v_i, v_j \in S,\ v_i \neq v \neq v_j} \frac{\sigma(v_i, v_j \mid v)}{\sigma(v_i, v_j)}, \qquad Bet(e) = \sum_{v_i, v_j \in S,\ v_i \neq v_j} \frac{\sigma(v_i, v_j \mid e)}{\sigma(v_i, v_j)}$$

Determining the betweenness centrality of the sample node set S as the approximate betweenness centrality of network G, thereby obtaining the approximate betweenness centrality value of network G.

A third aspect of the present invention discloses another betweenness centrality approximate calculation device, the device comprising:

a memory storing executable program code;

a processor coupled to the memory;

wherein the processor invokes the executable program code stored in the memory to perform some or all of the steps of the betweenness centrality approximate calculation method disclosed in the first aspect of the embodiment of the present invention.

A fourth aspect of the present invention discloses a computer storage medium storing computer instructions which, when invoked, perform some or all of the steps of the betweenness centrality approximate calculation method disclosed in the first aspect of the present invention.

The beneficial effects of the invention are as follows:

the invention relates to a method for approximate calculation of betweenness centrality, which computes the eigenvector centrality of network G; uses the eigenvector centrality to calculate the node weight vector of network G and construct a multiple network G₂; selects sample nodes from the multiple network G₂ to obtain a duplicate-free sample node set S; and computes the betweenness centrality of the duplicate-free sample node set S to obtain an approximate betweenness centrality value for network G. The invention thus exploits the low computational complexity of eigenvector centrality and selects the sample nodes for the betweenness centrality calculation on that basis, so that the shortest paths between sample nodes better represent all shortest paths between network nodes. The approximate betweenness centrality of the network is obtained from the shortest paths between the sample nodes, which enables fast calculation of betweenness centrality in large-scale networks.

Drawings

FIG. 1 is a schematic flow chart of a betweenness centrality approximate calculation method disclosed in an embodiment of the present invention;

FIG. 2 is a schematic diagram of an exemplary three-layer routing network according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a betweenness centrality approximate calculation device disclosed in an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of another betweenness centrality approximate calculation device disclosed in an embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

For the sake of easy understanding of the embodiments of the present application, the following will briefly introduce related concepts:

Betweenness centrality. Betweenness centrality is generally divided into node betweenness centrality and edge betweenness centrality, and it is an important centrality index for identifying key nodes and key links. Given a network G = (V, E), where V and E are the node set and the edge set of the network, n = |V| is the number of nodes in the network and m = |E| is the number of edges in the network.

Node betweenness is defined as the ratio of the number of shortest paths passing through the node to the total number of shortest paths in the network. The betweenness of node v in the network is defined as:

$$Bet(v) = \sum_{v_i \neq v \neq v_j} \frac{\sigma(v_i, v_j \mid v)}{\sigma(v_i, v_j)}$$

where $\sigma(v_i, v_j)$ is the number of shortest paths from node $v_i$ to node $v_j$, and $\sigma(v_i, v_j \mid v)$ is the number of shortest paths from node $v_i$ to node $v_j$ that pass through node v.

Edge betweenness is defined as the ratio of the number of shortest paths passing through the edge to the total number of shortest paths in the network. The betweenness of edge e is defined as:

$$Bet(e) = \sum_{v_i \neq v_j} \frac{\sigma(v_i, v_j \mid e)}{\sigma(v_i, v_j)}$$

where $\sigma(v_i, v_j \mid e)$ is the number of shortest paths from node $v_i$ to node $v_j$ that pass through edge e.

For ease of research, node and edge betweenness centrality in a network is often normalized, and normalized betweenness centrality in an undirected network is defined as:

The closer the normalized value is to 1, the more frequently the node lies on shortest paths between network nodes and the more important it is; when the normalized value is 0, the node lies on no shortest path between other nodes and its importance is very low.

Theorem: in a connected network, the sum of the normalized node betweenness values equals the average shortest distance l of the network minus 1, and the sum of the normalized edge betweenness values equals the average shortest distance l; that is, $\sum_{v \in V} \overline{Bet}(v) = l - 1$ and $\sum_{e \in E} \overline{Bet}(e) = l$, where $\overline{Bet}$ denotes the normalized betweenness.

This betweenness centrality identity gives the relationship between node betweenness centrality, edge betweenness centrality, and the average shortest distance of the network. It reveals how betweenness centrality and the average shortest path of the network imply each other, and it establishes a link between betweenness centrality and small-world networks. The betweenness centrality identity holds not only for connected networks but also for general networks.
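The identity can be checked by hand on a three-node path network. The worked example below assumes that normalization divides by the number of unordered node pairs, $n(n-1)/2$, and that each unordered pair is counted once; the normalization formula itself is not reproduced in this text, so this convention is an assumption chosen to be consistent with the theorem.

```latex
\begin{aligned}
& G:\ v_1 - v_2 - v_3,\quad n = 3,\quad \tfrac{n(n-1)}{2} = 3 \text{ node pairs} \\
& l = \tfrac{1}{3}\bigl(d(v_1,v_2) + d(v_2,v_3) + d(v_1,v_3)\bigr) = \tfrac{1+1+2}{3} = \tfrac{4}{3} \\
& \text{nodes: } Bet(v_1) = 0,\ Bet(v_2) = 1,\ Bet(v_3) = 0
  \ \Rightarrow\ \tfrac{0+1+0}{3} = \tfrac{1}{3} = l - 1 \\
& \text{edges: } Bet(e_{12}) = 2,\ Bet(e_{23}) = 2
  \ \Rightarrow\ \tfrac{2+2}{3} = \tfrac{4}{3} = l
\end{aligned}
```

Here $e_{12}$ and $e_{23}$ are illustrative names for the two edges of the path; each edge carries the shortest paths of two of the three node pairs.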

Betweenness centrality is usually calculated with the Brandes algorithm. For an unweighted network, the algorithm uses breadth-first search and has computational complexity Θ(mn); for a weighted network, it uses Dijkstra's algorithm and has computational complexity Θ(mn + n² log n). When the network is dense, m ≈ n², the computational complexity of betweenness centrality is Θ(n³), which is too high for large-scale networks.
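For reference, the exact baseline can be sketched in Python as the standard Brandes procedure for an undirected, unweighted network: one breadth-first search per source node followed by a backward accumulation of dependencies. The adjacency-dictionary input and the function name are assumptions made for illustration; the patent does not give an implementation.

```python
from collections import deque

def brandes_betweenness(adj):
    """Exact node betweenness of an undirected, unweighted graph (Brandes scheme)."""
    bet = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                          # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:    # v is a predecessor of w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                          # accumulate in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bet[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in bet.items()}

path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
exact = brandes_betweenness(path)
```

The outer loop runs once per node and each search touches every edge, which is the source of the Θ(mn) cost quoted above.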

Eigenvector centrality. Eigenvector centrality is an important measure of node centrality in a network. The eigenvector centrality of a node depends both on the number of its neighboring nodes and on the importance of each neighbor, so it carries richer node information than degree centrality.

Let $A = (a_{i,j})_{n \times n}$ denote the adjacency matrix of network G: if there is an edge between node $v_i$ and node $v_j$, then $a_{i,j} = 1$; otherwise $a_{i,j} = 0$. The eigenvector centrality of node $v_i$ can be defined as:

$$x_i = \frac{1}{\lambda}\sum_{v_j \in M(v_i)} x_j = \frac{1}{\lambda}\sum_{j=1}^{n} a_{i,j}\,x_j$$

where M(v) is the set of all neighboring nodes of node v and λ is a constant.

The definition of eigenvector centrality can be written in matrix-vector notation as $A\vec{x} = \lambda\vec{x}$, where the constant λ is an eigenvalue of the adjacency matrix A and $\vec{x}$ is the eigenvector corresponding to the eigenvalue λ. In general, the adjacency matrix A has several different eigenvalues λ, each with a corresponding eigenvector. However, the requirement that all components of the eigenvector be non-negative means that only the largest eigenvalue yields the required centrality measure. The i-th component of this eigenvector gives the eigenvector centrality value of node $v_i$. Because an eigenvector multiplied by an arbitrary constant is still an eigenvector for the eigenvalue λ, the average weight parameter is introduced to fix the eigenvector values when the multiple network is constructed from the eigenvector centrality. The time complexity of eigenvector centrality is O(n + m), which is low.
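A small worked example may help: for a three-node path network $v_1 - v_2 - v_3$ (chosen here purely for illustration), the eigenvector for the largest eigenvalue can be verified directly.

```latex
A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},\qquad
\vec{x} = (1,\ \sqrt{2},\ 1)^{T},\qquad
A\vec{x} = (\sqrt{2},\ 2,\ \sqrt{2})^{T} = \sqrt{2}\,\vec{x}
```

The eigenvalues of this matrix are $\sqrt{2}$, $0$, and $-\sqrt{2}$, so $\lambda_{\max} = \sqrt{2}$ and the middle node $v_2$ receives the highest centrality. Any positive multiple of $\vec{x}$ satisfies the same equation, which is why a scaling rule is needed before the components can be turned into integer weights.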

Example 1

Referring to FIG. 1, FIG. 1 is a schematic flow chart of a betweenness centrality approximate calculation method disclosed in an embodiment of the present invention. The method described in FIG. 1 is applicable to information networks, social networks, the Internet of Things, and traffic networks, which the embodiment of the invention does not limit. As shown in FIG. 1, the betweenness centrality approximate calculation method based on eigenvector centrality may include the following operations:

101. Compute the eigenvector centrality of network G to obtain the eigenvector centrality.

In the embodiment of the invention, the eigenvector centrality is $\vec{x} = (x_1, x_2, \ldots, x_n)$, where n is the number of components of the vector $\vec{x}$ and represents the total number of nodes in network G.

102. Use the eigenvector centrality to calculate the node weight vector of network G and construct a multiple network G₂.

103. Select sample nodes from the multiple network G₂ to obtain a duplicate-free sample node set S.

104. Compute the betweenness centrality of the duplicate-free sample node set S to obtain an approximate betweenness centrality value for network G.

It can be seen that, by implementing the betweenness centrality approximate calculation method described in the embodiment of the invention, the eigenvector centrality of the original network is computed, the duplicate-free sample node set is selected on the basis of the eigenvector centrality, and the betweenness centrality of the duplicate-free sample node set is computed to obtain the approximate betweenness centrality value of the original network. This simplifies the calculation, reduces the complexity, and enables fast calculation of betweenness centrality in large-scale networks.

In an optional embodiment, computing the eigenvector centrality of network G in step 101 to obtain the eigenvector centrality comprises:

constructing the adjacency matrix A of network G, where $A = (a_{i,j})_{n \times n}$.

The first characteristic equation is $A\vec{x} = \lambda\vec{x}$, where $\vec{x} = (x_1, x_2, \ldots, x_n)$ is an eigenvector.

According to the Perron-Frobenius theorem, the adjacency matrix A has a largest eigenvalue $\lambda_{\max}$ with an associated non-negative eigenvector. Based on this property, eigenvector centrality is usually computed by the matrix power method: an initial vector $\vec{x}^{(0)}$ is given, and $\vec{x}^{(t)} = A\vec{x}^{(t-1)}$ is then computed iteratively. When the number of iterations t reaches a certain threshold, $\vec{x}^{(t)}$ is close to the eigenvector corresponding to the largest eigenvalue $\lambda_{\max}$, that is, the eigenvector centrality.

It can be seen that the eigenvector centrality value characterizes the importance of a node in network G and determines the probability that the node is selected as a sample node.
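The power method described above can be sketched in Python as follows. Two details are assumptions added for numerical robustness and are not stated in the source: each iterate is rescaled so that its largest component is 1, and the iteration uses A + I instead of A, which has the same eigenvectors but avoids oscillation on bipartite networks. The function name and the fixed iteration count are hypothetical.

```python
def eigenvector_centrality(adj_matrix, iterations=100):
    """Approximate the eigenvector of the largest eigenvalue by power iteration."""
    n = len(adj_matrix)
    x = [1.0] * n                                   # initial vector x^(0)
    for _ in range(iterations):
        # One step with (A + I): x_i + sum_j a_ij * x_j  (shift is an assumption).
        nxt = [x[i] + sum(adj_matrix[i][j] * x[j] for j in range(n))
               for i in range(n)]
        scale = max(nxt)
        x = [value / scale for value in nxt]        # rescale so the maximum is 1
    return x

# Three-node path v1 - v2 - v3, used only as an illustration.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
x = eigenvector_centrality(A)
```

For this path the iterate approaches a vector proportional to (1, √2, 1), so the middle node obtains the largest centrality; a convergence test on successive iterates could replace the fixed iteration count.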

In another optional embodiment, using the eigenvector centrality in step 102 to calculate the node weight vector of network G and construct the multiple network G₂ comprises:

The component mean is $x_0 = \frac{1}{n}\sum_{i=1}^{n} x_i$.

The average weight c takes values in the range $\ldots \le c \le 5\langle k\rangle$, where $\langle k\rangle$ is the average node degree of network G. According to the result of judging whether $x_0$ is smaller than c, a vector coefficient τ is obtained from a preset vector coefficient calculation model.

The eigenvector centrality $\vec{x}$ is multiplied by the vector coefficient τ to obtain a second eigenvector $\tau\vec{x} = (\tau x_1, \tau x_2, \ldots, \tau x_n)$.

Each component of the second eigenvector is rounded up to obtain the node weight vector.

The node weight vector is $\vec{\mu} = (\mu_1, \mu_2, \ldots, \mu_n)$, where $\mu_i = \lceil \tau \cdot x_i \rceil$, $1 \le i \le n$, each $\mu_i$ is a non-negative integer, and the i-th component $\mu_i$ is the weight (multiplicity) of the i-th node $v_i$ of network G.

Network G is processed with the node weight vector to obtain the multiple network $G_2 = \{\mu_1 \cdot v_1, \mu_2 \cdot v_2, \ldots, \mu_n \cdot v_n\}$.

It can be seen that a reasonable choice of the average weight ensures that samples drawn on the basis of eigenvector centrality are both general and diverse. The weight of each network node is determined from the component mean of the eigenvector centrality and the average weight, so that the weight of a node in the multiple network is proportional to its eigenvector centrality, and the construction of the multiple network gives more important nodes a higher probability of becoming sample nodes.

In yet another optional embodiment, obtaining the vector coefficient τ by using a preset vector coefficient calculation model according to the judgment result comprises:

When the judgment result is yes, calculating the minimum integer z₁ for which a preset first vector coefficient calculation model holds, and taking z₁ as the value of the vector coefficient τ.

The first vector coefficient calculation model is as follows

。

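Here $x_i \in \{x_1, x_2, \ldots, x_n\}$; $\lceil\cdot\rceil$ denotes rounding up; n is the number of components of the eigenvector centrality; z₁ is an integer greater than 0; and c₁ is the preset average weight c.

When the judgment result is no, calculating the maximum integer z₂ for which a preset second vector coefficient calculation model holds, and taking the reciprocal of z₂ as the value of the vector coefficient τ.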

The second vector coefficient calculation model is as follows

。

Here $x_i \in \{x_1, x_2, \ldots, x_n\}$; $\lceil\cdot\rceil$ denotes rounding up; n is the number of components of the eigenvector centrality; z₂ is an integer greater than 0; and c₂ is the preset average weight c.

It can be seen that the eigenvector centrality values represent the importance of the different nodes, and the average weight controls the value of the vector coefficient used to construct the multiple network G₂. This ensures that important nodes appear in the multiple network with higher weights and secondary nodes with lower weights, while the total weight of the multiple network stays within a reasonable range.

In yet another alternative embodiment, selecting sample nodes from the multiple network G₂ in step 103 to obtain the duplicate-free sample node set S comprises:

Selecting h nodes from the multiple network G₂ according to a uniform random probability distribution to obtain a first sample node set, where $h = \lceil p \cdot n \rceil$; $\lceil\cdot\rceil$ denotes rounding up; n is the number of components of the eigenvector centrality; and p is a preset sampling proportion with $0.1 \le p \le 0.2$, chosen by weighing the representativeness of the sample against the efficiency of the approximate calculation method.

Judging whether the first sample node set contains repeated nodes to obtain a second judgment result.

If the second judgment result is yes, deleting the repeated nodes from the first sample node set, selecting new nodes from the multiple network according to the uniform random probability distribution, adding them to the first sample node set until it again contains h nodes, and then triggering the step of judging whether the first sample node set contains repeated nodes to obtain a second judgment result.

If the second judgment result is no, confirming the first sample node set as the duplicate-free sample node set S.

It can be seen that the sampling proportion is preset by weighing the representativeness of the sample against the efficiency of the approximate calculation method, so that the ratio of the size of the duplicate-free sample node set S to the size of the initial network G lies between 0.1 and 0.2.
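The select, check and refill loop described above might look as follows. This is a sketch under stated assumptions: the multiple network is held as a plain Python list in which each node is repeated according to its weight, the function name `sample_without_duplicates` is hypothetical, and the weights are those of the worked example.

```python
import math
import random

def sample_without_duplicates(multiset_nodes, n, p, rng):
    """Draw h = ceil(p * n) distinct nodes from a multiset given as a list."""
    h = math.ceil(p * n)
    if len(set(multiset_nodes)) < h:
        raise ValueError("fewer than h distinct nodes are available")
    # First sample node set: h uniform draws from the multiple network.
    sample = [rng.choice(multiset_nodes) for _ in range(h)]
    while len(set(sample)) < h:
        # Delete repeated nodes (keeping first occurrences), then top up to h.
        sample = list(dict.fromkeys(sample))
        sample += [rng.choice(multiset_nodes) for _ in range(h - len(sample))]
    return set(sample)

# Multiple network of the worked example: 14 copies of node 1, 9 copies of
# each of nodes 2-6, and 3 copies of each of nodes 7-31.
weights = {1: 14, **{v: 9 for v in range(2, 7)}, **{v: 3 for v in range(7, 32)}}
multiset_nodes = [v for v, mu in weights.items() for _ in range(mu)]

sample = sample_without_duplicates(multiset_nodes, n=31, p=0.2, rng=random.Random(0))
print(len(sample))  # h = ceil(0.2 * 31) = 7 distinct nodes
```

Because every draw is uniform over the node copies, a node with a larger weight is proportionally more likely to enter the sample, which is the purpose of the multiple network.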

In yet another alternative embodiment, calculating the betweenness centrality of the duplicate-free sample node set S in step 104 to obtain the approximate betweenness centrality of network G comprises:

Processing the duplicate-free sample node set S with a betweenness centrality calculation model to obtain the betweenness centrality of the sample node set S.

The betweenness centrality calculation model is:

The betweenness centrality of the sample node set S is determined as the approximate betweenness centrality of network G, yielding the approximate betweenness centrality values of the nodes of network G.

It can be seen that the set of shortest paths between all pairs of nodes, which must be traversed under the exact definition of betweenness centrality, is replaced by the set of shortest paths between the nodes of the duplicate-free sample node set S. This greatly reduces the scale of the path search, while the approximate betweenness centrality values preserve, as far as possible, the relative ordering of the betweenness centrality values of the different nodes and edges. Because the size of the duplicate-free sample node set S is only 0.1 to 0.2 of the size of the initial node set, the computational complexity of the approximation based on S is about 0.01 to 0.04 of that of the exact method, so the cost of calculating betweenness centrality for a large-scale network is sharply reduced.

To explain the method of this embodiment concretely, a typical three-layer routing network G = (V, E) is taken as an example.

The structure of this typical three-layer routing network is shown in Fig. 2, where the numerals 1 to 31 denote the 31 router nodes of routing network G. The 31 router nodes constitute the router node set V; the connections between routers form edges, all of which constitute the edge set E.

Specifically, the router node set V and the edge set E of network G = (V, E) are as follows:

V={1，2，⋯，31}，E={(1，2)，(1，3)，(1，4)，(1，5)，(1，6)，(2，7)，(2，8)，(2，9)，(2，10)，(2，11)，(3，12)，(3，13)，(3，14)，(3，15)，(3，16)，(4，17)，(4，18)，(4，19)，(4，20)，(4，21)，(5，22)，(5，23)，(5，24)，(5，25)，(5，26)，(6，27)，(6，28)，(6，29)，(6，30)，(6，31)}。

To calculate the betweenness centrality of the nodes in the network, the approximate calculation method based on eigenvector centrality is carried out as follows:

Step 1: Compute the eigenvector centrality of the nodes of network G. The eigenvector centrality of each node is shown in Table 1:

Table 1 Eigenvector centrality of each node

Node	Eigenvector centrality	Node	Eigenvector centrality	Node	Eigenvector centrality	Node	Eigenvector centrality
1	0.49999939	9	0.09999988	17	0.09999988	25	0.09999988
2	0.31622815	10	0.09999988	18	0.09999988	26	0.09999988
3	0.31622815	11	0.09999988	19	0.09999988	27	0.09999988
4	0.31622815	12	0.09999988	20	0.09999988	28	0.09999988
5	0.31622815	13	0.09999988	21	0.09999988	29	0.09999988
6	0.31622815	14	0.09999988	22	0.09999988	30	0.09999988
7	0.09999988	15	0.09999988	23	0.09999988	31	0.09999988
8	0.09999988	16	0.09999988	24	0.09999988
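The Table 1 values can be approximated with a short power iteration. The sketch below assumes the standard definition of eigenvector centrality (the principal eigenvector of the adjacency matrix) and normalises it to unit Euclidean length, which is the normalisation the Table 1 values appear to follow. The helper names `example_edges` and `eigenvector_centrality` are hypothetical, and the iteration runs on A + I rather than A because the example network is a tree, on which plain power iteration oscillates.

```python
import math

def example_edges():
    """Edge set E of the 31-node example: root 1, hubs 2-6, five leaves per hub."""
    edges = [(1, hub) for hub in range(2, 7)]
    for hub in range(2, 7):
        first_leaf = 7 + 5 * (hub - 2)
        edges += [(hub, leaf) for leaf in range(first_leaf, first_leaf + 5)]
    return edges

def eigenvector_centrality(nodes, edges, iterations=200):
    """Power iteration on A + I, normalised to unit Euclidean length."""
    neighbours = {v: [] for v in nodes}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # (A + I) x: the identity shift keeps the principal eigenvector of A
        # but removes the sign oscillation seen on bipartite graphs.
        y = {v: x[v] + sum(x[u] for u in neighbours[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    return x

nodes = range(1, 32)
x = eigenvector_centrality(nodes, example_edges())
print(round(x[1], 4), round(x[2], 4), round(x[7], 4))
```

The root, hub and leaf values come out close to 0.5, 0.3162 and 0.1. The small differences from the table (for example 0.49999939) are consistent with an iterative solver stopped at a finite tolerance.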

Step 2: Calculate the node weight vector of network G by using the eigenvector centrality, and construct the multiple network G₂.

Set the average weight: since the average node degree of the network is 2, the average weight is chosen as twice the average degree, c = 2 × 2 = 4.

The node weight distribution is determined from the distribution of the eigenvector centrality, as shown in Table 2 below:

Table 2 Node weight distribution

Node	Weight	Node	Weight	Node	Weight	Node	Weight
1	14	9	3	17	3	25	3
2	9	10	3	18	3	26	3
3	9	11	3	19	3	27	3
4	9	12	3	20	3	28	3
5	9	13	3	21	3	29	3
6	9	14	3	22	3	30	3
7	3	15	3	23	3	31	3
8	3	16	3	24	3

The multiple network G₂ is constructed from the node weights. As shown in Table 2, G₂ contains 14 copies of node 1, 9 copies of node 2, ……, and 3 copies of node 31.

Step 3: Select sample nodes from the multiple network G₂ to obtain the duplicate-free sample node set S.

From the node weight distribution and the uniform random probability distribution, the sampling probability of each node can be calculated, as shown in Table 3:

Table 3 Node sampling probability distribution

Node	Sampling probability	Node	Sampling probability	Node	Sampling probability	Node	Sampling probability
1	0.10447761	9	0.02238806	17	0.02238806	25	0.02238806
2	0.06716418	10	0.02238806	18	0.02238806	26	0.02238806
3	0.06716418	11	0.02238806	19	0.02238806	27	0.02238806
4	0.06716418	12	0.02238806	20	0.02238806	28	0.02238806
5	0.06716418	13	0.02238806	21	0.02238806	29	0.02238806
6	0.06716418	14	0.02238806	22	0.02238806	30	0.02238806
7	0.02238806	15	0.02238806	23	0.02238806	31	0.02238806
8	0.02238806	16	0.02238806	24	0.02238806
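The probabilities in Table 3 follow directly from the weights in Table 2: a uniform draw from the multiple network picks node v_i with probability μ_i divided by the total weight. A small check, with the hypothetical helper name `sampling_probabilities`:

```python
from fractions import Fraction

def sampling_probabilities(weights):
    """Probability of drawing each node in one uniform draw from G2."""
    total = sum(weights.values())
    return {node: Fraction(mu, total) for node, mu in weights.items()}

# Node weights as listed in Table 2.
weights = {1: 14}
weights.update({v: 9 for v in range(2, 7)})
weights.update({v: 3 for v in range(7, 32)})

probs = sampling_probabilities(weights)
print(float(probs[1]), float(probs[2]), float(probs[7]))
```

The total weight is 134, so the three distinct probabilities are 14/134, 9/134 and 3/134, which round to the values listed in Table 3.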

Set the sampling proportion p = 0.2.

From the sampling proportion p, the size of the sample node set S is $|S| = \lceil p \cdot n \rceil = \lceil 0.2 \times 31 \rceil = 7$, and the resulting duplicate-free sample node set is S = {2, 3, 5, 16, 21, 23, 28}.

Step 4: Calculate the betweenness centrality of the duplicate-free sample node set S to obtain the approximate betweenness centrality of network G, as shown in Table 4:

Table 4 Approximate betweenness centrality of network G

Node	Approximate betweenness	Node	Approximate betweenness	Node	Approximate betweenness	Node	Approximate betweenness
1	0.04367816	9	0	17	0	25	0
2	0	10	0	18	0	26	0
3	0.01149425	11	0	19	0	27	0
4	0.01379310	12	0	20	0	28	0
5	0.01149425	13	0	21	0	29	0
6	0.01379310	14	0	22	0	30	0
7	0	15	0	23	0	31	0
8	0	16	0	24	0
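The node values in Table 4 can be reproduced with the sketch below. Several points are assumptions, because the calculation model itself is not legible in this text: the sum runs over unordered pairs of sample nodes, a node is not counted on paths for which it is an endpoint, and the normalisation coefficient 2/((n-1)(n-2)) is inferred from the numbers in Tables 4 and 5 rather than quoted from the model. The function names are hypothetical, and only node betweenness is computed.

```python
from collections import deque
from itertools import combinations

def example_edges():
    """Edge set E of the 31-node example: root 1, hubs 2-6, five leaves per hub."""
    edges = [(1, hub) for hub in range(2, 7)]
    for hub in range(2, 7):
        first_leaf = 7 + 5 * (hub - 2)
        edges += [(hub, leaf) for leaf in range(first_leaf, first_leaf + 5)]
    return edges

def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def sampled_betweenness(adj, sample):
    """Node betweenness restricted to shortest paths between sample nodes."""
    n = len(adj)
    info = {s: bfs_counts(adj, s) for s in sample}
    bet = {v: 0.0 for v in adj}
    for s, t in combinations(sorted(sample), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t lie in different components
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v lies on a shortest s-t path iff the two distances add up.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                bet[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    scale = 2.0 / ((n - 1) * (n - 2))  # ASSUMPTION: inferred from Tables 4 and 5
    return {v: b * scale for v, b in bet.items()}

adj = {v: [] for v in range(1, 32)}
for a, b in example_edges():
    adj[a].append(b)
    adj[b].append(a)

S = {2, 3, 5, 16, 21, 23, 28}
approx = sampled_betweenness(adj, S)
print(round(approx[1], 8), round(approx[3], 8), round(approx[4], 8))
```

Breadth-first search is run only from the sample nodes, which is where the saving comes from. Passing the full node set V instead of S yields the exact node values listed in Table 5 under the same assumptions.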

The approximate and exact values of betweenness centrality use the same normalization coefficient, but the node sets involved are S and V respectively, so comparing the actual numerical values of the two has no practical significance. The selection of key nodes and key edges depends mainly on the relative ordering of the betweenness centrality values. From the approximate betweenness centrality distribution, the nodes rank in descending order of importance as: 1, 4, 6, 3, 5, 2, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31.

To analyze the effect of the betweenness centrality approximate calculation method, Table 5 gives the exact betweenness centrality values, from which the nodes rank in descending order of importance as: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31.

Table 5 Exact betweenness centrality of network G

Node	Exact betweenness	Node	Exact betweenness	Node	Exact betweenness	Node	Exact betweenness
1	0.82758621	9	0	17	0	25	0
2	0.31034483	10	0	18	0	26	0
3	0.31034482	11	0	19	0	27	0
4	0.31034482	12	0	20	0	28	0
5	0.31034482	13	0	21	0	29	0
6	0.31034482	14	0	22	0	30	0
7	0	15	0	23	0	31	0
8	0	16	0	24	0

Comparing the ranking given by the approximate betweenness centrality values with the ranking given by the exact values, only the relative importance of nodes 2, 3 and 5 changes, while the order of the other nodes is preserved; the accuracy exceeds 90 percent.

When no duplicate-free sample node set is selected, the exact calculation of betweenness centrality must traverse the shortest paths between every pair of nodes in network G, count how often the shortest paths pass through each node and edge, and compute the betweenness centrality of the nodes and edges from those counts. Because the number of node pairs in the network grows on the order of $n^2$, where n is the number of nodes in the network, the time complexity of the exact calculation of betweenness centrality is $\Theta(mn + n^2 \log n) = \Theta(n^3)$, which is too high for large-scale networks. Here $n = |V|$ is the number of nodes in the network and $m = |E|$ is the number of edges in the network.

To solve the problem of rapidly calculating betweenness centrality in a large-scale network, the method replaces network G with the duplicate-free sample node set S, so that the number of node pairs falls from the order of $n^2$ to the order of $(pn)^2$, and the shortest-path traversal workload is reduced to about $p^2$ of the global shortest-path traversal workload.

Meanwhile, the sample nodes obtained by sampling on the basis of eigenvector centrality represent the important nodes in the network, and the shortest paths between the sample nodes represent typical shortest paths in the network. The approximate betweenness centrality calculated from the shortest paths between the sample nodes therefore preserves well the relative order of the exact betweenness centrality values of the nodes and edges. This relative order is what matters most in real network applications, so the order-preserving property of the approximation makes it effective for solving practical network problems.
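As a rough check of the $p^2$ claim on the worked example, the pair counts can be compared directly. The calculation assumes that unordered node pairs are counted (the ratio is the same for ordered pairs) and uses n = 31, p = 0.2 and h = ⌈pn⌉ = 7 from the example:

```latex
\frac{\binom{h}{2}}{\binom{n}{2}}
  = \frac{h(h-1)}{n(n-1)}
  = \frac{7 \cdot 6}{31 \cdot 30}
  = \frac{42}{930}
  \approx 0.045
  \approx p^{2} = 0.04
```

The small excess over $p^2$ comes from rounding h up to an integer; for large n the ratio approaches $p^2$.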

Embodiment two

Referring to Fig. 3, Fig. 3 is a schematic structural diagram of a betweenness centrality approximate calculation device according to an embodiment of the invention. The device described in Fig. 3 is applicable to information networks, social networks, the Internet of Things and traffic networks, and the embodiment of the invention is not limited in this respect. As shown in Fig. 3, the device may include:

a first computing module for computing the eigenvector centrality of network G; the eigenvector centrality is the vector $(x_1, x_2, \ldots, x_n)$, where n is the number of components of the vector and is equal to the total number of nodes in network G;

a first network construction module for calculating the node weight vector of network G by using the eigenvector centrality and constructing the multiple network G₂;

a second network construction module for selecting sample nodes from the multiple network G₂ to obtain the duplicate-free sample node set S;

a second calculation module for calculating the betweenness centrality of the duplicate-free sample node set S to obtain the approximate betweenness centrality of network G.

Therefore, by implementing the betweenness centrality approximate calculation device described in Fig. 3, a duplicate-free sample node set can be selected reasonably and its betweenness centrality used as the approximate betweenness centrality of the original network. This greatly reduces the scale of the path search, while the approximate values preserve, as far as possible, the relative ordering of the betweenness centrality values of the different nodes and edges.

In another alternative embodiment, as shown in Fig. 3, the first computing module computes the eigenvector centrality of network G specifically by:

Constructing the adjacency matrix A of network G.

$A = (a_{i,j})_{n \times n}$, where $a_{i,j} = 1$ if there is an edge between node $v_i$ and node $v_j$ in network G, and $a_{i,j} = 0$ otherwise.

Based on the adjacency matrix A, a first characteristic equation is constructed.

The first characteristic equation is

。

is a feature vector.

In yet another alternative embodiment, as shown in Fig. 3, the first network construction module calculates the node weight vector of network G by using the eigenvector centrality and constructs the multiple network G₂ specifically by:

Calculating the mean of the components of the eigenvector centrality to obtain the component mean $x_0 = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Judging whether the component mean $x_0$ is smaller than a preset average weight c to obtain a judgment result.

The average weight c takes a value in a preset range whose upper limit is 5 times the average node degree of network G.

Obtaining the vector coefficient τ by using a preset vector coefficient calculation model according to the judgment result;

Multiplying the eigenvector centrality by the vector coefficient τ to obtain a second feature vector $(\tau x_1, \tau x_2, \ldots, \tau x_n)$;

Rounding up each component of the second feature vector to obtain the node weight vector $(\mu_1, \mu_2, \ldots, \mu_n)$, where $\mu_i = \lceil \tau x_i \rceil$ for $1 \le i \le n$; each $\mu_i$ is a non-negative integer, and the i-th component $\mu_i$ is the weight of the i-th node $v_i$ in network G.

In yet another optional embodiment, the first network construction module obtains the vector coefficient τ by using a preset vector coefficient calculation model according to the judgment result specifically as follows:

The first vector coefficient calculation model is as follows

。

Here $x_i \in \{x_1, x_2, \ldots, x_n\}$; $\lceil\cdot\rceil$ denotes rounding up; n is the number of components of the eigenvector centrality; z₁ is an integer greater than 0; and c₁ is the preset average weight c.

The second vector coefficient calculation model is as follows

。

Here $x_i \in \{x_1, x_2, \ldots, x_n\}$; $\lceil\cdot\rceil$ denotes rounding up; n is the number of components of the eigenvector centrality; z₂ is an integer greater than 0; and c₂ is the preset average weight c.

In yet another alternative embodiment, as shown in Fig. 3, the second network construction module selects sample nodes from the multiple network G₂ to obtain the duplicate-free sample node set S specifically by:

Selecting h nodes from the multiple network G₂ according to a uniform random probability distribution to obtain a first sample node set, where $h = \lceil p \cdot n \rceil$.

Judging whether the first sample node set contains repeated nodes to obtain a second judgment result.

If the second judgment result is no, confirming the first sample node set as the duplicate-free sample node set S.

In yet another alternative embodiment, as shown in Fig. 3, the second calculation module calculates the betweenness centrality of the duplicate-free sample node set S to obtain the approximate betweenness centrality of network G specifically by:

Processing the duplicate-free sample node set S with the betweenness centrality calculation model to obtain the betweenness centrality of the sample node set S.

The betweenness centrality calculation model is:

where Bet(v) denotes the betweenness centrality of node v; Bet(e) denotes the betweenness centrality of edge e; σ(v_i, v_j) is the number of shortest paths from node v_i to node v_j; σ(v_i, v_j | v) is the number of shortest paths from node v_i to node v_j that pass through node v; σ(v_i, v_j | e) is the number of shortest paths from node v_i to node v_j that pass through edge e; and S denotes the duplicate-free sample node set.

The betweenness centrality of the sample node set S is determined as the approximate betweenness centrality of network G, yielding the approximate betweenness centrality values of the nodes of network G.

Embodiment three

Referring to Fig. 4, Fig. 4 is a schematic structural diagram of another betweenness centrality approximate calculation device according to an embodiment of the invention. The device described in Fig. 4 is applicable to information networks, social networks, the Internet of Things and traffic networks, and the embodiment of the invention is not limited in this respect. As shown in Fig. 4, the device may include:

a memory 401 storing executable program codes;

a processor 402 coupled with the memory 401;

the processor 402 invokes the executable program code stored in the memory 401 to perform the steps of the eigenvector-centrality-based betweenness centrality approximate calculation method described in embodiment one.

Embodiment four

The embodiment of the invention discloses a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the steps of the eigenvector-centrality-based betweenness centrality approximate calculation method described in embodiment one.

Embodiment five

The embodiment of the invention discloses a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the steps of the eigenvector-centrality-based betweenness centrality approximate calculation method described in embodiment one.

Finally, it should be noted that the betweenness centrality approximate calculation method and device disclosed in the embodiments of the invention are merely preferred embodiments, used only to illustrate the technical solution of the invention and not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. A betweenness centrality approximate calculation method, comprising:

step 101, computing the eigenvector centrality of network G; the number of components of the eigenvector centrality is equal to the total number of nodes in network G;

step 102, calculating a node weight vector of network G by using the eigenvector centrality to construct a multiple network G₂; specifically:

calculating the mean of the components of the eigenvector centrality to obtain a component mean;

judging whether the component mean is smaller than a preset average weight c to obtain a judgment result;

obtaining a vector coefficient τ by using a preset vector coefficient calculation model according to the judgment result;

multiplying the eigenvector centrality by the vector coefficient τ to obtain a second feature vector;

rounding up each component of the second feature vector to obtain the node weight vector;

processing network G with the node weight vector to obtain the multiple network G₂;

step 103, selecting sample nodes from the multiple network G₂ to obtain a duplicate-free sample node set S; specifically:

selecting h nodes from the multiple network G₂ according to a uniform random probability distribution to obtain a first sample node set, where h = ⌈p⋅n⌉; p denotes a preset sampling proportion; n is the total number of nodes in network G;

judging whether the first sample node set contains repeated nodes to obtain a second judgment result;

when the second judgment result is yes, deleting the repeated nodes from the first sample node set, selecting new nodes from the multiple network according to the uniform random probability distribution, adding the new nodes to the first sample node set so that the total number of nodes in the first sample node set reaches h, and triggering the step of judging whether the first sample node set contains repeated nodes to obtain a second judgment result;

when the second judgment result is no, confirming the first sample node set as the duplicate-free sample node set S;

step 104, calculating the betweenness centrality of the duplicate-free sample node set S to obtain the approximate betweenness centrality of network G; specifically:

processing the duplicate-free sample node set S with a betweenness centrality calculation model to obtain the betweenness centrality of the sample node set S;

the betweenness centrality calculation model is as follows:

where Bet(v) denotes the betweenness centrality of node v; Bet(e) denotes the betweenness centrality of edge e; σ(v_i, v_j) is the number of shortest paths from node v_i to node v_j; σ(v_i, v_j | v) is the number of shortest paths from node v_i to node v_j that pass through node v; σ(v_i, v_j | e) is the number of shortest paths from node v_i to node v_j that pass through edge e; S denotes the duplicate-free sample node set;

2. The betweenness centrality approximate calculation method of claim 1, wherein obtaining the vector coefficient τ by using a preset vector coefficient calculation model according to the judgment result comprises:

when the judgment result is yes, calculating the minimum integer z₁ for which a preset first vector coefficient calculation model holds, and taking the minimum integer z₁ as the value of the vector coefficient τ;

the first vector coefficient calculation model is

；

where x_i is any component of the eigenvector centrality; ⌈⋅⌉ denotes rounding up; n is the number of components of the eigenvector centrality; z₁ is an integer greater than 0; c₁ is the preset average weight c;

when the judgment result is no, calculating the maximum integer z₂ for which a preset second vector coefficient calculation model holds, and taking the reciprocal of the maximum integer z₂ as the value of the vector coefficient τ;

the second vector coefficient calculation model is

；

where x_i is any component of the eigenvector centrality; ⌈⋅⌉ denotes rounding up; n is the number of components of the eigenvector centrality; z₂ is an integer greater than 0; c₂ is the preset average weight c.

3. The betweenness centrality approximate calculation method of claim 1, wherein computing the eigenvector centrality of network G comprises:

constructing the adjacency matrix A of network G;

constructing a first characteristic equation based on the adjacency matrix A;

the first characteristic equation is

；

is a feature vector;

4. A betweenness centrality approximate calculation device, the device comprising:

a first computing module for computing the eigenvector centrality of network G; the number of components of the eigenvector centrality is equal to the total number of nodes in network G;

a first construction module for calculating a node weight vector of network G by using the eigenvector centrality to construct a multiple network G₂; specifically:

a second construction module for selecting sample nodes from the multiple network G₂ to obtain a duplicate-free sample node set S; specifically:

selecting h nodes from the multiple network G₂ according to a uniform random probability distribution to obtain a first sample node set, where h = ⌈p⋅n⌉; p denotes a preset sampling proportion; n is the total number of nodes in network G;

a second calculation module for calculating the betweenness centrality of the duplicate-free sample node set S to obtain the approximate betweenness centrality of network G; specifically:

the betweenness centrality calculation model is as follows:

where Bet(v) denotes the betweenness centrality of node v; Bet(e) denotes the betweenness centrality of edge e; σ(v_i, v_j) is the number of shortest paths from node v_i to node v_j; σ(v_i, v_j | v) is the number of shortest paths from node v_i to node v_j that pass through node v; σ(v_i, v_j | e) is the number of shortest paths from node v_i to node v_j that pass through edge e; S denotes the duplicate-free sample node set;

5. A betweenness centrality approximate calculation device, the device comprising:

a memory storing executable program code;

a processor coupled to the memory;

the processor invokes the executable program code stored in the memory to perform the betweenness centrality approximate calculation method of any one of claims 1-3.

6. A computer storage medium storing a computer program, the computer program stored in the storage medium being called by a processor to perform the betweenness centrality approximation calculation method according to any one of claims 1-3.