CN114692200B

CN114692200B - Privacy-protected distributed graph data feature decomposition method and system

Info

Publication number: CN114692200B
Application number: CN202210341719.6A
Authority: CN
Inventors: 郑宜峰; 王松磊
Original assignee: Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2022-04-02
Filing date: 2022-04-02
Publication date: 2024-06-14
Anticipated expiration: 2042-04-02
Also published as: CN114692200A

Abstract

The invention discloses a privacy-preserving feature decomposition (eigendecomposition) method and system for distributed graph data. In the method, randomly sampled graph nodes that hold local graph data encrypt their own degree information and send the encrypted degree information to a first computing terminal and a second computing terminal. The two computing terminals cooperatively compute, in the ciphertext domain, first encryption degree distribution information and second encryption degree distribution information. From this information each graph node determines the target interval to which its degree belongs, selects a suitable sampling sensitivity, samples noise accordingly, and adds false edges with a weight of 0 to the real graph adjacency matrix. The matrix is represented sparsely as a set of triples; the triple set containing the false edges is encrypted, and the ciphertexts are sent to the first computing terminal and the second computing terminal respectively for encrypted feature decomposition. In this way, node privacy is protected while the sparsity of the graph data is preserved and the effectiveness of the feature decomposition is ensured.

Description

Privacy-protected distributed graph data feature decomposition method and system

Technical Field

The invention relates to the technical field of information security, and in particular to a privacy-preserving distributed graph data feature decomposition method and system.

Background

Graph data can describe complex interrelationships between entities, and a wide variety of analysis tasks can be performed on such information-rich graph data. These tasks become more challenging when the graph data is presented in a distributed form, meaning that each entity can only acquire part of the data about the entire graph (called local graph data). For example, in a phonebook network, each user is a graph node, and each user's phonebook represents the contacts (i.e., edges in the graph data) between that user and other users. If the phonebook network is modeled as a graph, no single entity can directly acquire the information of the whole graph; instead, each user knows only part of the connection relations (namely, the contacts contained in the user's own phonebook).

Collecting such distributed graph data for graph task analysis can raise significant privacy concerns (e.g., no one would like to share his own phonebook). Thus, if the local graph data owned by each user is not protected, they may be unwilling to participate in the analysis of such graph tasks. It is therefore necessary to introduce privacy preserving mechanisms in the task analysis performed on such distributed graph data so that valuable graph analysis tasks can be performed without compromising each user's sensitive and private local graph data.

In graph analysis, feature decomposition (eigendecomposition) is a very common basic task. Feature decomposition acts on the adjacency matrix of the graph data to generate eigenvalues and eigenvectors, which provide basic information for various graph analysis tasks such as community structure detection, discovery of important community members, graph partitioning, and web page ranking. However, no scheme currently exists for privacy-preserving feature decomposition on distributed graph data.

Accordingly, there is a need for improvement and advancement in the art.

Disclosure of Invention

In view of the above defects in the prior art, the invention provides a privacy-preserving distributed graph data feature decomposition method and system, aiming to solve the problem that the prior art offers no privacy-preserving feature decomposition scheme for distributed graph data.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows:

In a first aspect of the present invention, there is provided a method for decomposing a feature of privacy-preserving distributed graph data, the method comprising:

Generating an initial set by target graph nodes in a global graph according to local graph data, wherein the initial set comprises a plurality of groups of triples, and each group of triples comprises a node mark of the target graph node, a node mark of one adjacent graph node of the target graph node and a weight of a connecting edge of the target graph node and the adjacent graph node;

The target graph node encrypts the degree of the target graph node based on function secret sharing to obtain first encryption degree information and second encryption degree information, the first encryption degree information is sent to a first computing terminal, and the second encryption degree information is sent to a second computing terminal;

the first computing terminal and the second computing terminal generate first encryption degree distribution information and second encryption degree distribution information of global graph data according to the first encryption degree information and the second encryption degree information sent by a plurality of target graph nodes;

The target graph node determines the target interval to which its degree belongs according to the received first encryption degree distribution information and second encryption degree distribution information, determines a target sampling sensitivity according to boundary information of the target interval, samples noise from a Laplace distribution according to the target sampling sensitivity, and adds false triples to the initial set according to the noise to generate a target set, wherein the weight in each false triple is 0;

The target graph node encrypts the target set based on additive secret sharing to obtain a first encryption set and a second encryption set, the first encryption set is sent to a first computing terminal, and the second encryption set is sent to a second computing terminal;

And the first computing terminal and the second computing terminal conduct feature decomposition on the global graph data according to the first encryption set and the second encryption set corresponding to each node in the global graph.
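The node-side steps above (build the initial triple set, pad it with weight-0 false triples, then split it into two additive shares) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the privacy budget `epsilon`, the ring size `MODULUS`, the rule that maps the sampled noise to a number of false triples (round and clip at zero), and the choice to share all three fields of each triple are hypothetical.

```python
import random

MODULUS = 2 ** 32  # assumed share ring; the text does not fix one


def build_triples(node_id, neighbours):
    """neighbours: {neighbour_id: weight} -> initial set of (i, j, w) triples."""
    return [(node_id, j, w) for j, w in sorted(neighbours.items())]


def add_false_triples(triples, node_id, num_nodes, sensitivity, epsilon, rng):
    """Pad the initial set with weight-0 triples; the count is driven by Laplace noise."""
    scale = sensitivity / epsilon  # assumed Laplace scale
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    used = {j for _, j, _ in triples} | {node_id}
    free = [j for j in range(num_nodes) if j not in used]
    count = min(len(free), max(0, round(noise)))  # assumed mapping noise -> count
    padded = triples + [(node_id, j, 0) for j in rng.sample(free, count)]
    rng.shuffle(padded)  # hide which positions hold real edges
    return padded


def share_triples(triples, rng):
    """Split every field of every triple into two additive shares."""
    set1, set2 = [], []
    for triple in triples:
        masks = [rng.randrange(MODULUS) for _ in triple]
        set1.append(tuple(masks))
        set2.append(tuple((v - m) % MODULUS for v, m in zip(triple, masks)))
    return set1, set2


rng = random.Random(7)
initial = build_triples(0, {2: 1, 5: 1})
target = add_false_triples(initial, 0, num_nodes=10, sensitivity=4, epsilon=1.0, rng=rng)
set1, set2 = share_triples(target, rng)  # sent to the first / second computing terminal
```

Because the false triples carry weight 0, they change the length of the transmitted set (hiding the true degree) without changing the adjacency matrix that the two share sets reconstruct to.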

The method for decomposing the data characteristics of the distributed graph with the privacy protection, before the target graph node encrypts the degree of the target graph node based on function secret sharing, comprises the following steps:

The first computing terminal and/or the second computing terminal randomly selects part of nodes from all nodes of the global graph to send a degree encryption request;

And after the target graph node receives the degree encryption request, the target graph node encrypts the degree of the target graph node based on function secret sharing.

The privacy-protected distributed graph data feature decomposition method, wherein the target graph node encrypts the degree of the target graph node based on function secret sharing to obtain first encryption degree information and second encryption degree information, comprises the following steps:

the target graph node obtains the first encryption degree information and the second encryption degree information output by a first preset algorithm in function secret sharing, wherein the input of the first preset algorithm comprises the degree of the target graph node.

The method for decomposing the characteristics of the distributed graph data with privacy protection, wherein the first computing terminal and the second computing terminal generate first encryption degree distribution information and second encryption degree distribution information of global graph data according to the first encryption degree information and the second encryption degree information sent by a plurality of target graph nodes, comprises the following steps:

The first computing terminal inputs the first encryption degree information and a target degree of the target graph node into a second preset algorithm in function secret sharing to obtain first encryption degree comparison information between the degree of the target graph node and the target degree, and the second computing terminal inputs the second encryption degree information and the target degree of the target graph node into the second preset algorithm to obtain second encryption degree comparison information between the degree of the target graph node and the target degree;

when the degree of the target graph node is equal to the target degree, the sum of the first encryption degree comparison information and the second encryption degree comparison information is 1; otherwise, the sum is 0;

The first computing terminal acquires first encryption degree histogram information and the second computing terminal acquires second encryption degree histogram information, wherein the first encryption degree histogram information comprises first encryption map node quantity information corresponding to each target degree, each first encryption map node quantity information being the sum of all first encryption degree comparison information corresponding to one target degree, and the second encryption degree histogram information comprises second encryption map node quantity information corresponding to each target degree, each second encryption map node quantity information being the sum of all second encryption degree comparison information corresponding to one target degree;

the first computing terminal acquires the first encryption degree comparison information between the degrees of the plurality of target graph nodes and each target degree as the first encryption degree histogram information, and the second computing terminal acquires the second encryption degree comparison information between the degrees of the plurality of target graph nodes and each target degree as the second encryption degree histogram information;

The first computing terminal and the second computing terminal determine the first encryption degree distribution information and the second encryption degree distribution information according to the first encryption degree histogram information and the second encryption degree histogram information.
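The histogram step can be illustrated with a deliberately naive stand-in for the two preset algorithms: each node additively shares the one-hot vector of its degree, and each terminal sums the shares it receives per target degree. Real function secret sharing keys are far more compact than these tables; `MAX_DEGREE`, `MODULUS` and all function names are assumptions made for this sketch only.

```python
import random

MODULUS = 2 ** 32
MAX_DEGREE = 8  # hypothetical upper bound on node degree


def gen_degree_keys(degree, rng):
    """Stand-in for the first preset algorithm: share the one-hot vector of `degree`."""
    key1 = [rng.randrange(MODULUS) for _ in range(MAX_DEGREE + 1)]
    key2 = [((1 if t == degree else 0) - key1[t]) % MODULUS
            for t in range(MAX_DEGREE + 1)]
    return key1, key2


def eval_equal(key, target_degree):
    """Stand-in for the second preset algorithm: one share of [degree == target_degree]."""
    return key[target_degree]


def histogram_share(keys):
    """One terminal sums its comparison shares for every target degree."""
    return [sum(eval_equal(k, t) for k in keys) % MODULUS
            for t in range(MAX_DEGREE + 1)]


rng = random.Random(1)
degrees = [3, 1, 3, 0, 5, 3]                      # degrees of the sampled nodes
pairs = [gen_degree_keys(d, rng) for d in degrees]
hist1 = histogram_share([k1 for k1, _ in pairs])  # first computing terminal
hist2 = histogram_share([k2 for _, k2 in pairs])  # second computing terminal
histogram = [(a + b) % MODULUS for a, b in zip(hist1, hist2)]
```

Neither `hist1` nor `hist2` reveals anything on its own; only their sum is the degree histogram, and in the method that sum is never opened by the terminals.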

The privacy-preserving distributed graph data feature decomposition method, wherein each bit value in the first encryption degree distribution information and the second encryption degree distribution information is 0 or 1, and wherein the first computing terminal and the second computing terminal determining the first encryption degree distribution information and the second encryption degree distribution information according to the first encryption degree histogram information and the second encryption degree histogram information comprises the following steps:

the first computing terminal and the second computing terminal determine the number of target nodes in each interval according to the number of target graph nodes for sending encryption degree information and the preset interval number;

the first computing terminal sequentially adds the first encryption map node quantity information in the first encryption degree histogram information to a first accumulator in order of the corresponding target degree, and the second computing terminal sequentially adds the second encryption map node quantity information in the second encryption degree histogram information to a second accumulator in order of the corresponding target degree;

After the first encryption map node number information and the second encryption map node number information are respectively added to the first accumulator and the second accumulator each time, the first computing terminal obtains a first encryption comparison result according to the first accumulator, generates a new bit value in the first encryption degree distribution information according to the first encryption comparison result, obtains a second encryption comparison result according to the second accumulator, and generates a new bit value in the second encryption degree distribution information according to the second encryption comparison result, wherein when the sum of the first accumulator and the second accumulator is not smaller than the target node number, the exclusive-OR gate operation result of the first encryption comparison result and the second encryption comparison result is 1, and when the sum of the first accumulator and the second accumulator is smaller than the target node number, the exclusive-OR gate operation result of the first encryption comparison result and the second encryption comparison result is 0;

The first computing terminal flips the latest bit value in the first encryption degree distribution information to obtain a flipped bit; the first computing terminal obtains a first secret sharing share and the second computing terminal obtains a second secret sharing share, both computed based on additive secret sharing, wherein the sum of the first secret sharing share and the second secret sharing share is the product of a first value and a second value, the first value is the exclusive-OR of the flipped bit and the latest bit in the second encryption degree distribution information, and the second value is the sum of the first accumulator and the second accumulator;

the first computing terminal updates the value of the first accumulator to the first secret sharing share, adds next first encryption map node quantity information to the first accumulator, and the second computing terminal updates the value of the second accumulator to the second secret sharing share, and adds next second encryption map node quantity information to the second accumulator.
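The accumulator logic of these steps is easier to follow in plaintext. The sketch below mirrors it on an unencrypted histogram: counts are accumulated in order of degree, a bit of 1 is emitted whenever the running total reaches the per-interval target, and the accumulator is multiplied by the negated bit, which resets it after a 1 and keeps it after a 0. In the method the same arithmetic runs on secret shares. The integer division used for the target and the helper `interval_of` are assumptions for illustration.

```python
def degree_distribution_bits(histogram, num_intervals):
    """Plaintext reference: one bit per degree, 1 marks the upper end of an interval."""
    target = sum(histogram) // num_intervals  # assumed: sampled nodes / interval count
    bits, acc = [], 0
    for count in histogram:
        acc += count
        bit = 1 if acc >= target else 0
        bits.append(bit)
        acc = (1 - bit) * acc  # reset when the bit is 1, keep otherwise
    return bits


def interval_of(degree, bits):
    """Hypothetical node-side lookup: bounds of the interval containing `degree`."""
    low = 0
    for high, bit in enumerate(bits):
        if bit == 1:
            if degree <= high:
                return low, high
            low = high + 1
    return low, len(bits) - 1


bits = degree_distribution_bits([1, 1, 0, 3, 0, 1, 0, 0, 0], num_intervals=3)
bounds = interval_of(3, bits)
```

A node whose degree falls in a given interval can then derive its sampling sensitivity from the interval bounds rather than from the global maximum degree.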

The method for decomposing the data characteristics of the privacy-preserving distributed graph, wherein the first computing terminal obtains a first encryption comparison result according to the first accumulator, generates a new one-bit value in the first encryption degree distribution information according to the first encryption comparison result, and the second computing terminal obtains a second encryption comparison result according to the second accumulator, generates a new one-bit value in the second encryption degree distribution information according to the second encryption comparison result, and comprises the following steps:

The third computing terminal generates a first secret key, a second secret key, a first random number and a second random number according to a third preset algorithm in function secret sharing and the target node number, sends the first secret key and the first random number to the first computing terminal, and sends the second secret key and the second random number to the second computing terminal;

The first computing terminal sends the sum of the first random number and the first accumulator to the second computing terminal, and the second computing terminal sends the sum of the second random number and the second accumulator to the first computing terminal, so that both computing terminals obtain a scrambled input value, the scrambled input value being the sum of the first random number, the second random number, the first accumulator and the second accumulator;

The first computing terminal inputs the scrambled input value and the first secret key into a fourth preset algorithm in function secret sharing to obtain a first encryption bit, and the second computing terminal inputs the scrambled input value and the second secret key into the fourth preset algorithm to obtain a second encryption bit, wherein when the sum of the first accumulator and the second accumulator is smaller than the target node number, the exclusive-OR of the first encryption bit and the second encryption bit is 1, and when the sum of the first accumulator and the second accumulator is not smaller than the target node number, the exclusive-OR of the first encryption bit and the second encryption bit is 0;

Either the first computing terminal takes the first encryption bit as the new value in the first encryption degree distribution information and the second computing terminal takes the flipped second encryption bit as the new value in the second encryption degree distribution information; or the first computing terminal takes the flipped first encryption bit as the new value in the first encryption degree distribution information and the second computing terminal takes the second encryption bit as the new value in the second encryption degree distribution information.

The method for decomposing the data characteristics of the distributed graph with privacy protection comprises the steps that the first computing terminal obtains a first encryption comparison result according to the first accumulator, generates a new one-bit value in the first encryption degree distribution information according to the first encryption comparison result, and the second computing terminal obtains a second encryption comparison result according to the second accumulator, generates a new one-bit value in the second encryption degree distribution information according to the second encryption comparison result, and comprises the following steps:

the first computing terminal acquires a first random number, the second computing terminal acquires a second random number, and the sum of the first random number and the second random number is the number of the target nodes;

The first computing terminal acquires first bit data, the second computing terminal acquires second bit data, the first bit data is bit data corresponding to the difference between the first accumulator and the first random number, and the second bit data is bit data corresponding to the difference between the second accumulator and the second random number;

The first computing terminal and the second computing terminal input bit data held by the first computing terminal and the second computing terminal into a parallel prefix adding circuit, execute exclusive-OR gate calculation and AND gate calculation to respectively obtain the most significant bit of the first bit data and the most significant bit of the second bit data, when the sum of the first accumulator and the second accumulator is smaller than the number of target nodes, the exclusive-OR gate operation result of the most significant bit of the first bit data and the second bit data is 1, and when the sum of the first accumulator and the second accumulator is not smaller than the number of target nodes, the exclusive-OR gate operation result of the most significant bit of the first bit data and the second bit data is 0;

Either the first computing terminal takes the most significant bit of the first bit data as the new value in the first encryption degree distribution information and the second computing terminal takes the flipped most significant bit of the second bit data as the new value in the second encryption degree distribution information; or the first computing terminal takes the flipped most significant bit of the first bit data as the new value in the first encryption degree distribution information and the second computing terminal takes the most significant bit of the second bit data as the new value in the second encryption degree distribution information.
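The comparison in this variant rests on a two's-complement fact: if the two random numbers sum to the target node number, then the two differences sum to (accumulator total minus target) modulo 2^k, and the most significant bit of that sum is 1 exactly when the total is below the target. The plaintext sketch below checks this logic using only XOR and AND gates. For brevity it uses a ripple-carry adder in place of the parallel prefix circuit, and it adds the bits in the clear rather than on Boolean shares; `BITS` and the function names are assumptions.

```python
import random

BITS = 16               # assumed word size; values must stay below 2**(BITS - 1)
MODULUS = 1 << BITS


def to_bits(value):
    """Little-endian bit list of a BITS-wide word."""
    return [(value >> k) & 1 for k in range(BITS)]


def msb_of_sum(a_bits, b_bits):
    """Most significant bit of (a + b) mod 2**BITS using XOR and AND gates only."""
    carry = 0
    for k in range(BITS - 1):
        a, b = a_bits[k], b_bits[k]
        # The two carry terms can never both be 1, so XOR acts as OR here.
        carry = (a & b) ^ (carry & (a ^ b))
    return a_bits[-1] ^ b_bits[-1] ^ carry


def less_than_target(acc1, acc2, target, rng):
    """Return 1 if acc1 + acc2 (mod 2**BITS) is below `target`, else 0."""
    r1 = rng.randrange(MODULUS)       # first random number
    r2 = (target - r1) % MODULUS      # second random number, r1 + r2 = target
    d1 = (acc1 - r1) % MODULUS        # first bit data
    d2 = (acc2 - r2) % MODULUS        # second bit data
    return msb_of_sum(to_bits(d1), to_bits(d2))


rng = random.Random(3)
acc1 = rng.randrange(MODULUS)
acc2 = (5 - acc1) % MODULUS           # accumulator shares of the value 5
below = less_than_target(acc1, acc2, target=7, rng=rng)
```

Flipping one party's output bit, as the last step of the claim describes, turns this "smaller than" indicator into the "not smaller than" bit stored in the degree distribution information.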

The method for decomposing the characteristics of the distributed graph data with privacy protection, wherein the first computing terminal and the second computing terminal decompose the characteristics of the global graph data according to the first encryption set and the second encryption set corresponding to each node in the global graph, comprises the following steps:

The first computing terminal obtains a first encryption adjacency matrix according to the first encryption set corresponding to each node in the global graph, and the second computing terminal obtains a second encryption adjacency matrix according to the second encryption set corresponding to each node in the global graph;

The first computing terminal and the second computing terminal perform dimension reduction on the sum of the first encryption adjacency matrix and the second encryption adjacency matrix based on additive secret sharing to obtain a dimension reduction matrix;

the first computing terminal and the second computing terminal execute a QR algorithm on the dimension reduction matrix based on additive secret sharing to obtain encrypted eigenvalues and encrypted eigenvectors of the global graph data;

For square root operation in the dimension reduction process, the first computing terminal and the second computing terminal obtain the reciprocal of the square root through iterative computation of a second computing formula based on additive secret sharing;

The second calculation formula is as follows:

wherein y'_n represents the reciprocal of the square root computed in the n-th iteration, and x' represents the number whose square root is to be taken (the radicand).
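A common iteration that matches this description is Newton's method for the reciprocal square root, y_{n+1} = y_n × (3 − x × y_n²) / 2. It is shown here in plaintext as an assumption for illustration and is not necessarily the second calculation formula of the method. Its appeal in a secret-sharing setting is that each step uses only additions and multiplications. The initial guess `y0` and the iteration count are hypothetical parameters; the iteration converges when 0 < y0 < sqrt(3 / x).

```python
def inverse_sqrt(x, y0, iterations=20):
    """Newton iteration for 1/sqrt(x); uses only additions and multiplications."""
    y = y0
    for _ in range(iterations):
        y = 0.5 * y * (3.0 - x * y * y)
    return y


approx = inverse_sqrt(16.0, y0=0.1)  # approaches 0.25
root = 16.0 * approx                 # sqrt(x) = x * (1 / sqrt(x)), approaches 4.0
```

The square root itself follows from one further multiplication, since sqrt(x) = x × (1 / sqrt(x)).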

The privacy-protected distributed graph data feature decomposition method, wherein the first computing terminal and the second computing terminal executing a QR algorithm on the dimension reduction matrix based on additive secret sharing comprises the following steps:

In the i-th iteration of the QR algorithm, the first computing terminal acquires a first encryption matrix and the second computing terminal acquires a second encryption matrix, wherein the sum of the first encryption matrix and the second encryption matrix is a target matrix, and the target matrix is the matrix formed by the elements at positions (i, i), (i, i+1), (i+1, i), (i+1, i+1) of the plaintext Givens rotation matrix used in the i-th iteration;

For matrix multiplication operation in the QR algorithm, the first computing terminal and the second computing terminal realize multiplication operation in additive secret sharing by taking the first encryption matrix and the second encryption matrix as two secret shares of a Givens rotation matrix in the QR algorithm based on a randomly generated multiplication tuple matrix.
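The reason only a 2×2 target matrix needs to be shared is that a Givens rotation differs from the identity in just those four entries, so multiplying by it touches two rows only. The plaintext sketch below shows this; in the method the same products would be evaluated on additive shares. The function names and the 0-based indexing are choices made for this sketch.

```python
import math


def givens_block(a, b):
    """2x2 block [[c, s], [-s, c]] that rotates the pair (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    c, s = a / r, b / r
    return [[c, s], [-s, c]]


def apply_block(matrix, i, block):
    """Left-multiply rows i and i+1 of `matrix` by the 2x2 block, in place."""
    row_i, row_j = matrix[i], matrix[i + 1]
    matrix[i] = [block[0][0] * x + block[0][1] * y for x, y in zip(row_i, row_j)]
    matrix[i + 1] = [block[1][0] * x + block[1][1] * y for x, y in zip(row_i, row_j)]


H = [[4.0, 3.0, 0.0],
     [3.0, 2.0, 1.0],
     [0.0, 1.0, 5.0]]
G = givens_block(H[0][0], H[1][0])
apply_block(H, 0, G)  # zeroes the sub-diagonal entry H[1][0]; row 2 is untouched
```

Computing `c` and `s` requires 1/r, the reciprocal of a square root, which is the same kind of quantity the dimension reduction step obtains by iteration.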

In a second aspect of the present invention, there is provided a privacy-preserving distributed graph data feature decomposition system, the system comprising a target graph node, a first computing terminal and a second computing terminal, the target graph node, the first computing terminal and the second computing terminal being adapted to perform relevant steps in the privacy-preserving distributed graph data feature decomposition method as provided in the first aspect of the present invention.

Compared with the prior art, the invention provides a privacy-preserving distributed graph data feature decomposition method and system. In the method, randomly sampled graph nodes that hold local graph data encrypt their own degree information and send it to a first computing terminal and a second computing terminal, which cooperatively generate first encryption degree distribution information and second encryption degree distribution information in the ciphertext domain. Each graph node can then determine the target interval to which its own degree belongs, select a suitable sampling sensitivity, sample noise, and add false edges with a weight of 0 to the real graph adjacency matrix, which is represented sparsely in the form of matrix triples. The graph node encrypts the triple set containing the false edges to obtain a first encryption set and a second encryption set, and sends them to the first computing terminal and the second computing terminal respectively. The two computing terminals perform feature decomposition based on the first encryption set and the second encryption set. Node privacy is thus protected while the sparsity of the graph data is preserved and the effectiveness of the feature decomposition is ensured.

Drawings

FIG. 1 is a flow chart of an embodiment of a distributed graph data feature decomposition method for privacy protection provided by the present invention;

FIG. 2 is a schematic diagram of an application scenario of an embodiment of a privacy preserving distributed graph data feature decomposition method provided by the present invention;

FIG. 3 is a density function of Laplace distributions of different sensitivities;

FIG. 4 is a schematic diagram of a secure degree histogram estimation algorithm in an embodiment of a method for decomposing data features of a distributed graph for privacy protection according to the present invention;

FIG. 5 is a schematic diagram of a secure degree distribution information generation algorithm in an embodiment of a method for decomposing data features of a distributed graph for privacy protection according to the present invention;

FIG. 6 is a schematic diagram of a parallel prefix-add circuit in an embodiment of a distributed graph data feature decomposition method for privacy protection provided by the present invention;

FIG. 7 is a schematic diagram of a partial graph data encryption algorithm in an embodiment of a distributed graph data feature decomposition method for privacy protection provided by the present invention;

FIG. 8 is a schematic diagram of the Arnoldi algorithm in plaintext;

FIG. 9 is a schematic diagram of the Lanczos algorithm in plaintext;

FIG. 10 is a schematic diagram of Arnoldi algorithm of ciphertext domain in an embodiment of a distributed graph data feature decomposition method for privacy protection provided by the present invention;

FIG. 11 is a schematic diagram of an iterative process of a conventional QR algorithm;

FIG. 12 is a schematic diagram of a QR algorithm of a ciphertext domain in an embodiment of a distributed graph data feature decomposition method of privacy protection provided by the present invention;

FIG. 13 is a schematic diagram of an iterative process of a QR algorithm in an embodiment of a method for decomposing a data feature of a distributed graph for privacy protection according to the present invention.

Detailed Description

In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Example 1

This embodiment provides a privacy-preserving feature decomposition method for distributed graph data, which aims to perform feature decomposition (eigendecomposition) on distributed graph data in the ciphertext domain. Feature decomposition acts on the adjacency matrix of the graph data to generate eigenvalues and eigenvectors. As shown in FIG. 2, the method involves three kinds of entities: N users U_i (i ∈ [1, N]), two cloud servers (computing terminals) CS₁ and CS₂, and analysts. Each user holds local graph data (e.g., the address book held by each user in the address book scenario), and together these form the complete (global) graph. In the graph, each user represents a graph node, and the relationships between users (such as contact entries in the address book scenario) represent edges. The size of the local data held by each user represents the degree of the corresponding node (e.g., the number of contacts of each user in the address book scenario).

The distributed graph can be represented as an adjacency matrix A of size N × N, where each row A[i, :] (i ∈ [1, N]) represents the local graph data held by user U_i. For example, in an unweighted graph, A[i, j] = 1 indicates that a relationship exists between users U_i and U_j; in a weighted graph, A[i, j] = v indicates that a relationship exists between users U_i and U_j and that its closeness is v. The users allow the cloud servers to perform analysis tasks on the union of their local graph data (i.e., the complete graph data). However, due to privacy concerns, no user U_i wants to disclose its private local graph data A[i, :] in plaintext at any point during the feature decomposition process, since doing so would leak the user's sensitive information. For this purpose, a privacy-preserving feature decomposition method on a distributed graph is provided.

In the method provided by this embodiment, two participants (denoted CS₁ and CS₂) provide the cloud computing service, and they come from different trust domains. In a realistic industrial scenario, these can be two competing cloud service providers. CS₁ and CS₂ assist in performing the feature decomposition task but cannot obtain the sensitive information of any user at any point, nor can they obtain the result of the feature decomposition in the clear, because the result is also encrypted. Both cloud service providers are assumed to be "honest but curious" and non-colluding. That is, each cloud server, acting as a computing terminal, faithfully executes the designed security protocol, while independently trying to infer users' sensitive data from the data collection and feature decomposition processes. In this scenario, the method provided by the invention consists of the following two parts:

1. Secure distributed graph data collection: at this stage, the cloud servers collect the encrypted local graph data A[i, :] (i ∈ [1, N]) of each user U_i, forming a complete encrypted adjacency matrix to support the subsequent feature decomposition. Moreover, the method provided by this embodiment preserves the sparsity of the graph data during collection, which greatly reduces the computation and communication cost of the subsequent feature decomposition on the encrypted adjacency matrix.

2. Secure feature decomposition: after collecting the complete encrypted adjacency matrix, the cloud servers cooperatively perform feature decomposition in the ciphertext domain. Specifically, the dimension of the matrix is first reduced in the ciphertext domain, and the QR algorithm is then executed in the ciphertext domain to obtain the complete eigenvalues and eigenvectors of the small dimension-reduced matrix.

The method provided in this embodiment employs two cryptographic techniques, additive secret sharing and function secret sharing, which are described below.

Additive secret sharing

The additive secret sharing (ASS) of a private value x has two forms:

1. Arithmetic secret sharing: x = <x>_1 + <x>_2, where <x>_1 and <x>_2 are held by the two computing parties, respectively.

2. Boolean secret sharing: x = <x>_1 ⊕ <x>_2, where the Boolean shares <x>_1 and <x>_2 are held by the two computing parties, respectively.

With the secret sharing described above, two computing participants can securely perform linear and multiplicative computations without obtaining plaintext data.

1) Safe linear calculation: linear computation in secret sharing requires only two computing parties to perform local computation. I.e., if alpha, beta, gamma are constants in the clear,And/>Is a secret shared value, then

Each party can use the ciphertext they hold to make local calculations.

2) Secure multiplication computation: calculating the product of two secret-shared values requires one round of communication between the two parties. That is, to calculate [z]^A = [x]^A × [y]^A, the two parties need to share one multiplication tuple ([u]^A, [v]^A, [w]^A) with w = u × v in advance. Each party P_i then locally calculates <e>_i = <x>_i − <u>_i and <f>_i = <y>_i − <v>_i. The parties then send <e>_i and <f>_i to each other to obtain the plaintext e and f. Finally, the product ciphertext held by P_i, i ∈ {0, 1}, is

<z>_i = i×e×f + f×<u>_i + e×<v>_i + <w>_i

The linear and multiplication operations in Boolean secret sharing are similar to those in arithmetic sharing, except that exclusive-or (XOR, "⊕") replaces addition and AND ("⊗") replaces multiplication.
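The two arithmetic operations above can be sketched in a few lines of Python. This is a minimal illustration rather than the embodiment's implementation: the ring size `K = 32`, the function names, and the fact that the multiplication tuple is generated inside `beaver_mul` (instead of being distributed in advance by a separate party) are all assumptions made for brevity. Parties are indexed 0 and 1, following the product formula.

```python
import secrets

K = 32                      # assumed bit width of the ring Z_{2^K}
MOD = 1 << K

def share(x):
    """Split x into two additive shares whose sum is x modulo 2^K."""
    r = secrets.randbelow(MOD)
    return [(x - r) % MOD, r]

def reconstruct(sh):
    return (sh[0] + sh[1]) % MOD

def linear(alpha, x_sh, beta, y_sh, gamma):
    """Shares of alpha*x + beta*y + gamma; purely local, gamma added by party 0 only."""
    return [(alpha * x_sh[i] + beta * y_sh[i] + (gamma if i == 0 else 0)) % MOD
            for i in range(2)]

def beaver_mul(x_sh, y_sh):
    """Shares of x*y from one multiplication tuple (u, v, w) with w = u*v."""
    # Tuple generation is done inline here only to keep the sketch self-contained.
    u, v = secrets.randbelow(MOD), secrets.randbelow(MOD)
    u_sh, v_sh, w_sh = share(u), share(v), share(u * v % MOD)
    # Each party masks its input shares locally.
    e_sh = [(x_sh[i] - u_sh[i]) % MOD for i in range(2)]
    f_sh = [(y_sh[i] - v_sh[i]) % MOD for i in range(2)]
    # One round of communication opens the masked values e and f.
    e, f = reconstruct(e_sh), reconstruct(f_sh)
    return [(i * e * f + f * u_sh[i] + e * v_sh[i] + w_sh[i]) % MOD
            for i in range(2)]
```

Reconstructing the two output shares of `beaver_mul` yields the product of the inputs modulo 2^K, while each individual share is uniformly distributed.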

Function secret sharing

Function Secret Sharing (FSS) is an extension of additive secret sharing that can accomplish secure function computation with low communication. FSS therefore has a large performance advantage over ordinary secret sharing in high-latency networks. In general, a two-party FSS scheme for a private function f consists of two abstract algorithms:

1. (k₁, k₂) ← Gen(1^λ, f): given a security parameter λ and a function description f, two FSS keys k₁, k₂ are output, one for each computing party.

2. <f(x)>_i ← Eval(k_i, x): given an FSS key k_i and an evaluation point x, an additive secret share <f(x)>_i of the evaluation result is output.

FSS ensures that an attacker who learns only one of the two FSS keys cannot obtain any information about the target function or about the computed output f(x).

As shown in fig. 1, the method for decomposing the data features of the distributed graph for privacy protection provided in this embodiment includes the steps of:

S100, generating an initial set by target graph nodes in a global graph according to local graph data, wherein the initial set comprises a plurality of groups of triples, and each group of triples comprises a node mark of the target graph node, a node mark of one adjacent graph node of the target graph node and weights of connecting edges of the target graph node and the adjacent graph node.

The target graph node may be any node in the global graph. For the feature decomposition task on distributed graph data, the local graph data of each user must first be collected to form a complete encrypted adjacency matrix A of the graph, on which the subsequent feature decomposition in the ciphertext domain is performed. Specifically, each row A[i, :] of the adjacency matrix represents the local graph data of one user. At this stage, each user U_i shares his local graph data A[i, :] in ciphertext form with the two cloud servers CS₁ and CS₂. To achieve higher efficiency, each user U_i encrypts A[i, :] using ASS. Naively applying this technique to the whole row would incur a high overhead, because distributed graph data is typically sparse: most elements of each row A[i, :] are 0 and only a small part is valid data (e.g., the phonebook of each user contains only a small number of phone numbers). This embodiment therefore uses a sparse representation of the matrix, whose basic idea is to process and submit only the (encrypted) values of the non-zero elements together with their positions. Specifically, each user U_i stores only the positions and weights of the non-zero elements: {(i, j, A[i, j])}. Each element of the set is a matrix triplet (i, j, A[i, j]), where i is the node label of the target graph node, j is the node label of one neighboring node of the target graph node, and A[i, j] is the weight of the edge between node (i.e., user) U_i and node U_j. The number of elements in the set is the degree of node U_i. Thereafter, each user U_i encrypts the weights A[i, j] using ASS. Specifically, given an edge weight A[i, j], the user generates a random number r; the two arithmetic shares of the weight are then <A[i, j]>₁ = A[i, j] − r and <A[i, j]>₂ = r. Finally, the user U_i sends {(i, j, <A[i, j]>₁)} and {(i, j, <A[i, j]>₂)} to the first computing terminal CS₁ and the second computing terminal CS₂, respectively. However, since the number of non-zero elements equals the number of edges (i.e., the degree), this simple encryption leaks the degree of each node to the computing terminals, and existing literature indicates that a computing terminal can infer various private information about user U_i from the degree. Moreover, if the distributed graph is an unweighted graph (i.e., every element of the adjacency matrix is 0 or 1), encrypting only the edges with non-zero weight is meaningless: the mere existence of a triplet reveals that the weight of the edge is 1, so the computing terminals can recover the complete adjacency matrix. The challenge is thus how to protect the degree information of each user U_i while using the sparse matrix triplet encoding, without affecting the effectiveness of the subsequent feature decomposition.

To address this challenge, this embodiment provides a method that strikes a principled balance (trade-off) between protecting users' degree information and preserving matrix sparsity. Specifically, each user U_i adds some false edges with weight 0 (i.e., (i, j, 0)) at random empty positions in {(i, j, A[i, j])}, and then applies ASS to the weights of the real edges and the false edges alike. In ASS, ciphertexts remain indistinguishable even if the same value (e.g., 0) is encrypted multiple times. Therefore, the cloud servers cannot distinguish real edges from false edges, and the effectiveness of the subsequent secure feature decomposition is not affected (because the weight of every false edge is 0). At the same time, the degree information of user U_i is protected (because some false edges are added). The remaining challenge is how to choose an appropriate number of false edges so as to balance sparsity and privacy. Too many false edges weaken the sparsity of the collected ciphertext adjacency matrix and increase the subsequent system overhead, while too few false edges result in weaker privacy protection.

In one possible implementation, each user U_i samples a noise value n_i from a discrete Laplace distribution (Definition 2) and shares only its noisy local data.
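The padding step can be illustrated with a short Python sketch. It is a simplified illustration under several assumptions that are not taken from the text: node labels run from 1 to `n_nodes`, the number of false edges `n_fake` is supplied by the caller, any column without a real edge counts as an "empty position", and weights are shared over an assumed ring Z_{2^32}. The function name `pad_and_share` is hypothetical.

```python
import secrets

MOD = 1 << 32               # assumed ring size for the arithmetic shares

def share(x):
    """Additive shares <x>_1 = x - r and <x>_2 = r (mod 2^32)."""
    r = secrets.randbelow(MOD)
    return (x - r) % MOD, r

def pad_and_share(i, row, n_nodes, n_fake):
    """row maps neighbour label j to the real weight A[i, j].

    Returns the two triplet sets sent to CS1 and CS2. False edges carry
    weight 0, so they do not change the adjacency matrix once reconstructed.
    """
    rng = secrets.SystemRandom()
    empty = [j for j in range(1, n_nodes + 1) if j not in row]
    fake = rng.sample(empty, min(n_fake, len(empty)))
    triples = [(i, j, w) for j, w in row.items()] + [(i, j, 0) for j in fake]
    rng.shuffle(triples)    # hide which positions were real
    to_cs1, to_cs2 = [], []
    for _, j, w in triples:
        s1, s2 = share(w)
        to_cs1.append((i, j, s1))
        to_cs2.append((i, j, s2))
    return to_cs1, to_cs2
```

Each server sees `len(row) + n_fake` triplets with uniformly random weight shares, so it learns only the padded degree.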

The Laplace distribution is one of the most popular noise distributions. It can be defined as follows:

A discrete random variable obeys the Laplace distribution Lap(ε, Δ) when its probability function satisfies the following equation:

where μ is the mean of the Laplace distribution and Δ is the sensitivity of the function f:

Δ = max |f(x) − f(x')|

The sensitivity measures how much the data of a single entity can change the output in the worst case.

According to the definition of the Laplace distribution, if no further measures are taken, when each user U_i samples a noise value n_i from the discrete Laplace distribution (Definition 2), the sensitivity Δ should be set to Δ = d_max − d_min, where d_max and d_min are the maximum and minimum possible node degrees in the distributed graph, respectively. Then n_i false edges with weight 0 (i.e., (i, j, 0)) are added at random positions in the local graph data. Finally, ASS is applied to encrypt the weights of both the real edges and the false edges, and the ciphertext is sent to the cloud servers. In this way, the degree of each node is protected, and so is the existence of each edge.

While this approach is effective, it leads to a large sensitivity Δ (theoretically N, the number of nodes in the graph), so the sampled Laplace noise n_i will be very large. Each user would then need to add nearly N false edges, which severely harms the sparsity of the graph and the performance of the subsequent feature decomposition. FIG. 3 shows the probability density functions of the discrete Laplace distribution for different Δ. The figure reveals that a large sensitivity Δ makes the density function of the Laplace distribution flatter, so the larger Δ is, the higher the probability that user U_i draws a large noise |n_i|. Conversely, a small sensitivity Δ (e.g., Δ = 50 in FIG. 3) makes the probability density function more concentrated, which means that user U_i draws a small noise |n_i| with high probability. Thus, if all users U_i sample noise from a Laplace distribution with a large sensitivity Δ, each user will add a large number of false edges to their local graph data, severely degrading the sparsity of the collected graph data.
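The effect of Δ on the noise magnitude can be checked numerically. The sketch below assumes one common form of the discrete Laplace distribution, the two-sided geometric distribution with probability proportional to exp(−ε·|n − μ|/Δ); this form, the parameter names `eps`, `sens`, and `mu`, and the helper names are assumptions for illustration and are not taken from the definition in this document.

```python
import math

def discrete_laplace_pmf(n, eps, sens, mu=0):
    """Assumed form: P(n) = (1 - a) / (1 + a) * a^|n - mu|, with a = exp(-eps / sens)."""
    a = math.exp(-eps / sens)
    return (1 - a) / (1 + a) * a ** abs(n - mu)

def mass_within(radius, eps, sens, mu=0):
    """Probability that the sampled noise lies within `radius` of the mean."""
    return sum(discrete_laplace_pmf(n, eps, sens, mu)
               for n in range(mu - radius, mu + radius + 1))
```

Under this assumed form with ε = 1, the probability that |n| ≤ 100 is far higher for Δ = 50 than for Δ = 500, which matches the qualitative behaviour described for FIG. 3: smaller sensitivities concentrate the noise near the mean.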

To obtain better sparsity, this embodiment uses privacy protection based on the idea of "bucketing": the nodes of the graph are divided into different "buckets", and the degree information of a node is protected within its bucket. Specifically, the nodes of the graph are divided into several buckets; equivalently, the range between the minimum and maximum possible node degree is divided into several degree intervals. Each bucket contains an approximately equal number of users, and the degrees of all nodes in a bucket lie within one small interval [d_p, d_q]. Thus, all users within the same bucket can use a smaller sensitivity Δ = d_q − d_p. To bucket the nodes securely, the method provided by this embodiment further includes the steps of:

S200, encrypting the degree of the target graph node based on function secret sharing to obtain first encryption degree information and second encryption degree information, sending the first encryption degree information to a first computing terminal, and sending the second encryption degree information to a second computing terminal;

S300, the first computing terminal and the second computing terminal generate first encryption degree distribution information and second encryption degree distribution information of global map data according to the first encryption degree information and the second encryption degree information sent by the target map nodes.

The first encryption degree distribution information and the second encryption degree distribution information are the two secret shares of a bucket map. Neither the first computing terminal nor the second computing terminal can obtain the plaintext bucket map from the share it holds alone. After the first and second encryption degree distribution information are sent to a node, the node can decrypt them to obtain the bucket map, determine the bucket to which it belongs, and set the sensitivity used when sampling noise. In this embodiment, the bucket map is a bit string in which each 1 marks the boundary of a bucket. For example, given d_max = 10, the bucket map inter = 0001000001 shows that the users are divided into two buckets (bins): users with degree ∈ [1, 4] and users with degree ∈ [5, 10]. The following describes how the bucket map is calculated in the ciphertext domain.

For the first computing terminal CS₁ and the second computing terminal CS₂ to securely divide the range of all possible node degrees in the global graph into intervals, i.e., to classify the users, they first need to estimate the encrypted degree histogram of the nodes without obtaining the plaintext degree of any node. That is, they estimate the number of nodes for each possible degree value. Specifically, given a public degree d_i and the degree d_j of a particular user U_j, CS_{1,2} need to test whether d_j = d_i while both d_j and the test result remain in ciphertext. To achieve this, this embodiment uses an FSS-based distributed point function (hereinafter abbreviated as DPF). A DPF f_{α,β}(x) outputs β if x = α and 0 otherwise.
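On the user side, reading the decrypted bucket map is straightforward. The sketch below shows one way a node might locate its bucket and derive its sampling sensitivity Δ = d_q − d_p; the function names are hypothetical, and the bit string is assumed to be indexed so that position d (counting from 1) corresponds to degree d.

```python
def buckets_from_map(inter):
    """Turn a bucket map such as '0001000001' into a list of (d_p, d_q) intervals.

    A '1' at position d (1-based) marks degree d as the upper bound of a bucket.
    """
    buckets, low = [], 1
    for d, bit in enumerate(inter, start=1):
        if bit == "1":
            buckets.append((low, d))
            low = d + 1
    return buckets

def sensitivity_for(degree, inter):
    """Sensitivity d_q - d_p of the bucket that contains `degree`."""
    for low, high in buckets_from_map(inter):
        if low <= degree <= high:
            return high - low
    raise ValueError("degree is not covered by the bucket map")
```

With the example from the text, `buckets_from_map("0001000001")` gives the intervals [1, 4] and [5, 10], so a node of degree 3 would use sensitivity 3 instead of d_max − d_min = 9.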

Similar to the general framework of FSS, a two-sided DPF mechanism based on FSS consists of two algorithms:

1. (k₁, k₂) ← Gen(1^λ, α, β): given a security parameter λ and the values α, β, two DPF keys k₁, k₂ are output, one for each of the cloud servers CS₁ and CS₂.

2. <f_{α,β}(x)>_i ← Eval(k_i, x): given a DPF key k_i and an evaluation point x, a secret share <f_{α,β}(x)>_i of the evaluation result is output.

The pseudo code of the secure degree histogram estimation algorithm in this embodiment is shown in fig. 4.

This scheme uses a sampling strategy, because having all nodes send their encrypted degree d_i directly to the computing terminals would incur a high overhead. That is, the computing terminals randomly sample S users (denoted {SU_j}, j ∈ [1, S]) from the whole user population and let only these sampled users send their encrypted degree information. That is, before the target graph node encrypts the degree of the target graph node based on the function secret sharing, the method includes:

The target graph node encrypts the degree of the target graph node based on function secret sharing to obtain first encryption degree information and second encryption degree information, and the method comprises the following steps:

The first computing terminal and the second computing terminal generate first encryption degree distribution information and second encryption degree distribution information of global graph data according to the first encryption degree information and the second encryption degree information sent by the target graph nodes, and the method comprises the following steps:

Specifically, each sampled user SU_j generates a pair of DPF keys based on his degree d_j (line 1 of Algorithm 3 in FIG. 4), with the DPF parameters α, β set to d_j and 1, respectively. Each user SU_j then sends the key k_{j,1} (i.e., the first encryption degree information) to the first computing terminal CS₁ and the key k_{j,2} (i.e., the second encryption degree information) to the second computing terminal CS₂. After all sampled users have sent their DPF keys, each cloud server CS_t, t ∈ {1, 2}, evaluates Eval(k_{j,t}, i) with every key in {k_{j,t}}, j ∈ [1, S], for all possible degrees i ∈ [1, d_max]. Finally, by summing these evaluations (line 6 of Algorithm 3), CS_t obtains, for each i ∈ [1, d_max], an encrypted share of the exact number of sampled users whose degree d_j equals i. Correctness follows from:

Σ_{j=1..S} (Eval(k_{j,1}, i) + Eval(k_{j,2}, i)) = Σ_{j=1..S} f_{d_j,1}(i) = |{ j : d_j = i }|

Through the above steps, the first computing terminal and the second computing terminal each hold a share of the encrypted degree histogram estimate. The two terminals then generate a bucket map in the ciphertext domain. The pseudo-code of the algorithm that generates the bucket map is shown in FIG. 5. Algorithm 4 in FIG. 5 outputs the encrypted bucket map [inter]^B (the bit string is encrypted with Boolean secret sharing), in which each 1 marks the boundary of a bucket. For example, given d_max = 10, inter = 0001000001 shows that the users are divided into two buckets: users with degree ∈ [1, 4] and users with degree ∈ [5, 10]. After computing the encrypted bucket map, cloud servers CS₁ and CS₂ send [inter]^B to each user, and each user determines which bucket he belongs to according to his degree. How Algorithm 4 is implemented is described in detail below.
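The data flow of the histogram estimation can be sketched as follows. Important caveat: `toy_dpf_gen` below is not a real DPF. It simply secret-shares the whole truth table of the point function, so its keys grow linearly with d_max, whereas genuine DPF keys are compact. It is used here only as a stand-in with the same Gen/Eval interface so that the summation step can be shown end to end; all names and the ring size are assumptions.

```python
import secrets

MOD = 1 << 32               # assumed ring for the additive output shares

def toy_dpf_gen(alpha, beta, d_max):
    """Toy stand-in for DPF Gen: additive shares of the truth table of f_{alpha,beta}."""
    k1, k2 = {}, {}
    for x in range(1, d_max + 1):
        r = secrets.randbelow(MOD)
        k1[x] = r
        k2[x] = ((beta if x == alpha else 0) - r) % MOD
    return k1, k2

def toy_dpf_eval(key, x):
    """Toy stand-in for DPF Eval: returns this party's share of f_{alpha,beta}(x)."""
    return key[x]

def histogram_share(keys, d_max):
    """Run by one server: sum the evaluations of all received keys for each degree i."""
    return [sum(toy_dpf_eval(k, i) for k in keys) % MOD
            for i in range(1, d_max + 1)]
```

For sampled degrees 3, 3, 5 and 1 with d_max = 6, adding the two servers' outputs entry by entry reconstructs the histogram [1, 0, 2, 0, 1, 0], while each server's own output looks uniformly random.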

Algorithm 4 (line 1) first lets CS_{1,2} calculate the plaintext bucket size sizeB (i.e., how many users each bucket needs to contain) and then initializes an encrypted accumulator [accu]^A. CS_{1,2} then add the entries of the previously estimated encrypted degree histogram to the accumulator one by one (line 4 of Algorithm 4). After each addition, CS_{1,2} determine in the ciphertext domain whether accu ≥ sizeB and write the comparison result (encrypted with Boolean secret sharing) into the bucket map [inter]^B (line 5 of Algorithm 4). Specifically, if accu ≥ sizeB, then inter[i] = 1, which marks a bucket boundary; otherwise inter[i] = 0. Then, based on the encrypted comparison result, CS_{1,2} decide in the ciphertext domain whether the encrypted accumulator [accu]^A needs to be reset to 0. Specifically, if inter[i] = 1, a bucket boundary has occurred and the accumulator [accu]^A is set to 0 in preparation for accumulating the next bucket. If inter[i] = 0, no bucket boundary has occurred and the accumulator is left unchanged. These steps are shown in line 6 of Algorithm 4, where "!" denotes the NOT operation, which can be performed by having one of CS_{1,2} flip its secret share <inter[i]>₁ or <inter[i]>₂. Finally, CS_{1,2} output the encrypted bucket map [inter]^B.

In Algorithm 4, the addition is supported natively by additive secret sharing, but the ciphertext-domain comparison accu ≥ sizeB is not. This embodiment therefore provides two methods for performing the comparison accu ≥ sizeB in the ciphertext domain. The first method is based on function secret sharing (FSS). It is better suited to high-latency networks, because it requires the minimum number of interaction rounds between the servers (at the cost of more local computation). The second method is based on additive secret sharing (ASS). It requires less local computation but more online communication and more communication rounds between the two servers, and is therefore more suitable for low-latency networks.
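As a plaintext reference for the control flow just described, the following sketch computes the bucket map from an unencrypted histogram. It is only a reference for checking the logic, since in the protocol every value below is secret-shared. Writing the reset as a multiplication by (1 − bit) is an assumption about how line 6 of Algorithm 4 is realised, and the function name is hypothetical.

```python
def bucket_map_plain(hist, size_b):
    """hist[i-1] is the number of sampled users with degree i; returns the bit string."""
    inter, accu = [], 0
    for count in hist:
        accu += count                       # accumulate the next histogram entry
        bit = 1 if accu >= size_b else 0    # comparison accu >= sizeB
        inter.append(bit)
        accu *= 1 - bit                     # reset to 0 on a boundary, else keep
    return "".join(str(b) for b in inter)
```

For instance, with ten sampled users distributed as `[1, 1, 1, 2, 1, 1, 1, 1, 0, 1]` over degrees 1 to 10 and `size_b = 5`, the function returns the bucket map used as the running example in the text.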

In a first method, the first computing terminal obtains a first encryption comparison result according to the first accumulator, generates a new value of one bit in the first encryption degree distribution information according to the first encryption comparison result, and the second computing terminal obtains a second encryption comparison result according to the second accumulator, generates a new value of one bit in the second encryption degree distribution information according to the second encryption comparison result, including:

Specifically, the first method uses a distributed comparison function (hereinafter referred to as DCF) from function secret sharing to implement the comparison in the ciphertext domain. A DCF g_{α,β}(x) outputs β if x < α and 0 otherwise. Similar to the general framework of FSS, a two-party FSS-based DCF consists of two algorithms:

1. (k₁, k₂, r₁, r₂) ← Gen(1^λ, α, β): given a security parameter λ and the values α, β, two DCF keys k₁, k₂ and two random numbers r₁, r₂ are output, one key and one random number for each of the two parties (where r₁ + r₂ = r^in).

2. <g_{α,β}(x)>_i ← Eval(k_i, x + r^in): given a DCF key k_i and a masked input x + r^in, a secret share <g_{α,β}(x)>_i of the evaluation result is output.

The secure evaluation of a DCF requires only one round of online communication: computing terminals CS₁ and CS₂ send <x>_i + r_i, i ∈ {1, 2}, to each other to reveal the masked input x + r^in without compromising the privacy of the encrypted input x. The following describes how the comparison accu ≥ sizeB is completed based on the DCF.

To perform the comparison, the parameters are set to α = sizeB and β = 1, with the Boolean sharing domain as the output domain. The generated DCF keys are then sent to cloud servers CS₁ and CS₂, respectively. Note that in a practical deployment, this offline key preparation may be done by a third-party server. After obtaining its DCF key, each CS_t, t ∈ {1, 2}, first exchanges <accu>_t + r_t to reveal accu + r^in. Each then locally evaluates Eval(k_t, accu + r^in), which outputs <1>_t if accu < sizeB and <0>_t otherwise. Since Algorithm 4 requires CS_{1,2} to output [1]^B if accu ≥ sizeB, one of CS_{1,2} flips its evaluation result locally to take the NOT of the result.

In a second method, the first computing terminal obtains a first encryption comparison result according to the first accumulator, generates a new value of one bit in the first encryption degree distribution information according to the first encryption comparison result, and the second computing terminal obtains a second encryption comparison result according to the second accumulator, generates a new value of one bit in the second encryption degree distribution information according to the second encryption comparison result, including:

The second method is based on "bit decomposition" in the secret sharing domain. Specifically, the most significant bit (hereinafter denoted msb) of the two's complement representation of a number x indicates the sign of x (i.e., msb = 0 means x ≥ 0, and msb = 1 means x < 0). Consider two numbers A and B in two's complement representation, which may be the two secret shares of one number, held by computing terminals CS₁ and CS₂, respectively. The most significant bit of A + B can be computed securely by a customized parallel prefix addition circuit. A customized 8-bit parallel prefix addition circuit is shown in FIG. 6.

Given a ciphertext [x]^A whose shares are held by computing terminals CS₁ and CS₂, the cloud servers first locally decompose <x>₁ and <x>₂ into bits: <x>_i = x_i[1], …, x_i[k], i ∈ {1, 2}. CS₁ and CS₂ then feed their bits into the customized parallel prefix addition circuit and securely evaluate its XOR ("⊕") and AND ("⊗") gates. As described above, XOR and AND are natively supported in Boolean secret sharing. The cloud servers CS₁ and CS₂ can therefore securely compute the most significant bit of the ciphertext data and obtain, in encrypted form, the relation between the private input x and 0. Based on the parallel prefix addition circuit, CS₁ and CS₂ can compute the msb of accu − sizeB, i.e., output [0]^B if accu ≥ sizeB and [1]^B otherwise. Since Algorithm 4 requires CS_{1,2} to output [1]^B if accu ≥ sizeB, one of CS_{1,2} flips its evaluation result locally to take the NOT of the result.
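The sign-bit idea can be demonstrated with a small Python sketch that uses only XOR and AND on individual bits. Two simplifications are assumed and should not be read as the embodiment's circuit: a simple ripple-carry adder replaces the parallel prefix adder of FIG. 6 (same result, more sequential depth), and the bits are processed in the clear, whereas in the protocol they are Boolean-shared and every AND gate requires interaction. The bit width `K = 16` and all function names are illustrative.

```python
K = 16                       # assumed bit width of the two's complement ring
MASK = (1 << K) - 1

def bits(x):
    """Little-endian bit decomposition of a K-bit value."""
    return [(x >> i) & 1 for i in range(K)]

def msb_of_sum(a, b):
    """MSB of (a + b) mod 2^K, computed with XOR and AND gates only."""
    carry, s = 0, 0
    for ai, bi in zip(bits(a), bits(b)):
        s = ai ^ bi ^ carry                       # sum bit at this position
        carry = (ai & bi) ^ (carry & (ai ^ bi))   # carry into the next position
    return s                                      # sum bit of the top position

def is_geq(accu_shares, size_b):
    """1 if accu >= size_b, else 0, given additive shares of accu modulo 2^K."""
    a = (accu_shares[0] - size_b) & MASK   # public size_b subtracted by one party
    b = accu_shares[1] & MASK
    return 1 ^ msb_of_sum(a, b)            # flip: msb = 0 means accu - size_b >= 0
```

The final XOR with 1 corresponds to the local flip performed by one of the two servers.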

From the foregoing description, each newly generated bit of the bucket map falls into one of two cases: the bit reconstructed from the shares held by the first and second computing terminals is either 0 or 1. When it is 1, the accumulator needs to be cleared; when it is 0, the accumulator must not be cleared. The clearing (or not clearing) has to take place in the ciphertext domain: the first accumulator and the second accumulator are each updated, but neither computing terminal may learn whether the accumulator was actually cleared. In the method provided by this embodiment, after the newest bit of the first and second encryption degree distribution information has been obtained, one of the two computing terminals inverts the newest bit of the encryption degree distribution information it holds locally; for example, the first computing terminal inverts the newest bit in the first encryption degree distribution information. After this inversion, if the newest bit of the bucket map is 1, the XOR of the two terminals' shares of that bit is 0, and if the newest bit of the bucket map is 0, the XOR is 1. Multiplying the encrypted accumulator by this inverted bit in the ciphertext domain therefore clears the accumulator exactly when a bucket boundary has occurred and leaves it unchanged otherwise, while neither terminal learns which case applied.

Referring to fig. 1 again, the method for decomposing the data features of the distributed graph for privacy protection provided in this embodiment further includes the steps of:

S400, the target graph node determines the target interval to which it belongs according to the received first encryption degree distribution information and second encryption degree distribution information, determines a target sampling sensitivity according to the boundary information of the target interval, samples noise from a Laplace distribution according to the target sampling sensitivity, and adds false triplets to the initial set according to the noise to generate a target set, wherein the weight value in each false triplet is 0;

S500, the target graph node encrypts the target set based on the additive secret sharing to obtain a first encryption set and a second encryption set, the first encryption set is sent to a first computing terminal, and the second encryption set is sent to a second computing terminal.

The combination of the first encryption degree distribution information and the second encryption degree distribution information, held locally by the first and second computing terminals respectively, is the bucket map. The two computing terminals send their locally held encryption degree distribution information to the nodes of the graph; each node decrypts it to obtain the plaintext bucket map and then encrypts its own local graph data based on the bucket map. As shown in FIG. 7, after the encrypted bucket map has been decrypted on the user side, each user U_i encrypts its local graph data using Algorithm 5. The main point to note is that the noise n_i sampled from the Laplace distribution may be negative, which would mean that user U_i needs to delete some edges. Obviously, this would seriously affect the accuracy of the subsequent feature decomposition. To solve this problem, in this embodiment each user U_i truncates n_i so that no real edge is removed (line 4 of Algorithm 5). The user then forms the set of real and false edges, i.e., the local graph data after false edges with weight 0 have been added. Finally, user U_i applies ASS to the weight A[i, j] of every (real or false) edge to obtain the final ciphertext of the local graph data, and sends the two shares of the ciphertext to CS₁ and CS₂, respectively.

Compared with directly encrypting every element of each user's local graph data A[i, :], this scheme for encrypting the local graph data can save 90% of the ciphertext storage space, and in the subsequent feature decomposition it can save 80% of the online communication and 50% of the computation time.

S600, the first computing terminal and the second computing terminal conduct feature decomposition on the global graph data according to the first encryption set and the second encryption set corresponding to each node in the global graph.

The first computing terminal receives the first encryption set sent by each node in the global graph, the second computing terminal receives the second encryption set sent by each node in the global graph, and the two terminals perform the feature decomposition of the global graph data in the ciphertext domain. The plaintext feature decomposition of graph data is described first:

Given an N × N adjacency matrix, the complexity of performing a complete feature decomposition on it is O(N³). This is unnecessary, because most graph analysis applications only require the top-k eigenvalues and eigenvectors (k is much smaller than N). Thus, in a practical application scenario, given a large-scale adjacency matrix A, the first step in computing its top-k eigenvalues and eigenvectors is to reduce its dimension from N to M (M is usually slightly larger than k), generating a new small matrix for further processing. The most popular dimension reduction algorithms are the Arnoldi algorithm (Algorithm 1, pseudo-code shown in FIG. 8) and the Lanczos algorithm (Algorithm 2, pseudo-code shown in FIG. 9), which work on asymmetric and symmetric matrices, respectively. After dimension reduction, the eigenvalues and eigenvectors of the new small matrix are typically calculated using the QR algorithm. Finally, the top-k eigenvalues of the small matrix represent the top-k eigenvalues of the original matrix A, and the eigenvectors corresponding to the top-k eigenvalues of the small matrix can be converted into the eigenvectors V corresponding to the top-k eigenvalues of A by means of the matrix P obtained in line 11 of Algorithm 1 and Algorithm 2. The plaintext QR algorithm is described below.

The QR algorithm proceeds iteratively. Formally, given a target matrix L, let T_0 = L. In the k-th iteration (k ∈ [1, K]), the result T_{k-1} of the previous iteration is taken as input and a QR decomposition T_{k-1} = Q_{k-1} R_{k-1} is calculated, where Q_{k-1} is an orthogonal matrix and R_{k-1} is an upper triangular matrix; the iteration outputs T_k = R_{k-1} Q_{k-1}. When the QR algorithm ends, the diagonal elements of the output matrix T_K of the last iteration are the eigenvalues of the target matrix L, and the columns of the matrix S = Q_1 ... Q_K are the eigenvectors of L (one eigenvector per column). One QR decomposition may be accomplished using Givens rotations. Formally, given an M × M upper Hessenberg matrix T_{k-1}, orthogonal Givens rotation matrices G_i, i ∈ [1, M−1], can be created.

Here H^(1) = T_{k-1}. At the end of this QR decomposition,

In addition, Q_{k-1} = G_1 ... G_{M-1}. FIG. 11 illustrates the process of performing a QR decomposition of a 4 × 4 upper Hessenberg matrix using a series of Givens rotation matrices.
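A plaintext version of the QR algorithm with Givens rotations can be written in pure Python as below. This is a minimal sketch for small matrices, not the ciphertext-domain procedure: it uses floating-point division and square roots directly, applies no shifts or convergence test, and runs a fixed number of iterations. In the code, Q is accumulated as the product of the transposed rotations so that T = QR holds under the row-rotation convention used here. All function names are illustrative.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def givens_qr(t):
    """QR decomposition of an upper Hessenberg matrix t; returns (Q, R) with t = Q R."""
    m = len(t)
    r = [row[:] for row in t]
    q = [[float(i == j) for j in range(m)] for i in range(m)]
    for i in range(m - 1):
        a, b = r[i][i], r[i + 1][i]
        h = math.hypot(a, b)
        if h == 0.0:
            continue                      # nothing to eliminate in this column
        c, s = a / h, b / h
        for j in range(m):                # rotate rows i, i+1: zeroes r[i+1][i]
            ri, rj = r[i][j], r[i + 1][j]
            r[i][j] = c * ri + s * rj
            r[i + 1][j] = -s * ri + c * rj
        for k in range(m):                # accumulate Q by rotating columns i, i+1
            qi, qj = q[k][i], q[k][i + 1]
            q[k][i] = c * qi + s * qj
            q[k][i + 1] = -s * qi + c * qj
    return q, r

def qr_algorithm(l, iterations=200):
    """Returns (diagonal of T_K, accumulated product S of the Q factors)."""
    t = [[float(v) for v in row] for row in l]
    m = len(t)
    s = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(iterations):
        q, r = givens_qr(t)
        t = matmul(r, q)                  # T_k = R_{k-1} Q_{k-1}
        s = matmul(s, q)
    return [t[i][i] for i in range(m)], s
```

For the symmetric tridiagonal matrix [[2, 1, 0], [1, 2, 1], [0, 1, 2]], the returned diagonal approaches the eigenvalues 2 + √2, 2 and 2 − √2, and the columns of the accumulated matrix approach the corresponding eigenvectors.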

The first computing terminal and the second computing terminal perform feature decomposition on the global graph data according to the first encryption set and the second encryption set corresponding to each node in the global graph, including:

The second calculation formula is as follows:

y'_{n+1} = (1/2) · y'_n · (3 − x' · y'_n²)

wherein y'_n represents the result of the n-th iteration of the reciprocal square root calculation, and x' represents the number whose square root is to be taken.

Taking the Arnoldi algorithm as an example, the process by which the first and second computing terminals perform the dimension reduction in the ciphertext domain is described below. The operations in lines 1-7 of FIG. 8 all consist of additions and multiplications, which are natively supported in the additive secret sharing domain. However, performing the operations of lines 8 and 9 in the ciphertext domain is challenging, because they require a square root operation (the L₂ norm requires a square root) and a division operation, respectively.

In this embodiment, the square root and division operations are decomposed, by approximation, into a series of additions and multiplications supported in the ciphertext domain. Specifically, to compute the square root $\sqrt{x'}$ in the ciphertext domain, the reciprocal of the square root, $1/\sqrt{x'}$, is first approximated by the iteration

$$y'_{n+1} = \frac{1}{2}\, y'_n \left(3 - x'\, {y'_n}^2\right)$$

where $y'_n$ denotes the approximation of the reciprocal square root after the $n$-th iteration and $x'$ denotes the number whose square root is to be taken; the iteration converges to $1/\sqrt{x'}$. Clearly, both subtraction and multiplication are natively supported in the secret sharing domain. In addition, to obtain faster convergence, the following initial value may be used:

$$y'_0 = 3e^{0.5 - x'} + 0.003$$

The square root can then be obtained by multiplying $x'$ by the approximated reciprocal square root, since $x' \cdot (1/\sqrt{x'}) = \sqrt{x'}$. For a division in the ciphertext domain, the main challenge is computing the reciprocal of the divisor. However, the reciprocal needed in the division of Algorithm 1 (line 9) is exactly the reciprocal of the result (a square root) of line 8. Therefore, the reciprocal square root computed above can be used directly as this reciprocal, and line 9 can be completed in the ciphertext domain simply by multiplying it by $p_k$. Thus, Algorithm 1 can be performed entirely in the ciphertext domain. The resulting ciphertext-domain Arnoldi algorithm is shown in FIG. 10. For other dimension reduction algorithms, such as the Lanczos method, the operations that need to be performed securely are the same as in the secure Arnoldi method, so the description of the secure Lanczos method is omitted here.
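The approximation described above can be sketched in plaintext Python as follows. The sketch assumes the standard Newton step for the reciprocal square root, $y \leftarrow \frac{1}{2} y (3 - x y^2)$, together with the initial value given in the text; the function names, the iteration count and the sample inputs are illustrative assumptions. The exponential in the initial value is evaluated in the clear here, and how it would be evaluated on secret shares is outside the scope of the sketch. The inputs are chosen in a range where the iteration converges from this starting value.

```python
import math

def inv_sqrt(x, iterations=30):
    """Approximate 1/sqrt(x) using only subtraction and multiplication."""
    y = 3.0 * math.exp(0.5 - x) + 0.003    # initial value from the text
    for _ in range(iterations):
        y = 0.5 * y * (3.0 - x * y * y)    # assumed Newton step
    return y

def normalize(p, iterations=30):
    """Return (L2 norm of p, p scaled to unit length) without sqrt or division."""
    squared = sum(v * v for v in p)        # additions and multiplications only
    r = inv_sqrt(squared, iterations)      # reciprocal of the L2 norm
    norm = squared * r                     # sqrt(x) = x * (1 / sqrt(x))
    return norm, [v * r for v in p]        # "division" done as multiplication
```

`normalize` mirrors the two steps discussed above: the norm is recovered from the reciprocal square root, and the vector is divided by the norm through a multiplication with the same reciprocal.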

The QR algorithm mainly consists of matrix multiplications. For a multiplication between two $M \times M$ matrices, the direct method in the secret sharing domain is element-by-element multiplication, which requires $M^3$ multiplications and requires $2M^3$ elements to be communicated between the two servers. Instead, the multiplication tuples required for a multiplication between two matrices can be vectorized as $Z = UV$, where $U$ and $V$ mask the two input matrices during the secure multiplication. Specifically, for two ciphertext matrices $\langle A \rangle$ and $\langle B \rangle$ of size $N \times N$, to compute the ciphertext matrix product $\langle C \rangle = \langle A \rangle \langle B \rangle$, existing secret sharing protocols configure one multiplication tuple for each scalar multiplication in the matrix product rather than one per matrix element, so $3N^3$ elements must be prepared in advance (multiplying two $N \times N$ matrices requires $N^3$ multiplications, with one multiplication tuple for each); this approach is inefficient and unnecessary. With the multiplication tuple vectorization adopted in this embodiment, independent multiplication tuples are not generated at random one by one; instead, a matrix multiplication tuple $(\langle U \rangle, \langle V \rangle, \langle Z \rangle)$ is prepared, after which the ciphertext matrix multiplication can be performed directly. The two cloud servers $P_i$, $i \in \{0, 1\}$, use their own secret shares to compute $\langle E \rangle_i = \langle A \rangle_i - \langle U \rangle_i$ and $\langle F \rangle_i = \langle B \rangle_i - \langle V \rangle_i$, respectively. Each party $P_i$ then sends $\langle E \rangle_i$ and $\langle F \rangle_i$ to the other, so that both obtain $E$ and $F$ in plaintext. Finally, the share of the product held by $P_i$, $i \in \{0, 1\}$, is $\langle C \rangle_i = i \cdot E F + \langle U \rangle_i F + E \langle V \rangle_i + \langle Z \rangle_i$. The two parties therefore need to communicate only $2N^2$ elements online, namely the two matrices $\langle E \rangle_i$ and $\langle F \rangle_i$ that each sends to the other. At the same time, the multiplication tuple that must be prepared in advance becomes $(\langle U \rangle, \langle V \rangle, \langle Z \rangle)$, i.e., $3N^2$ elements.
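The vectorized multiplication described above can be sketched with additive secret sharing over a ring as follows. This is a minimal single-process simulation under stated assumptions: the ring size `MOD`, the helper names, and the generation of the matrix tuple (U, V, Z = UV) by a trusted dealer inside `secure_matmul` are all illustrative choices, not details taken from the patent. The products are written in matrix order, so the term with the share of U multiplies F from the left.

```python
import random

MOD = 2 ** 32  # assumed ring size for the additive shares

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % MOD for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[(x + y) % MOD for x, y in zip(r, s)] for r, s in zip(a, b)]

def sub(a, b):
    return [[(x - y) % MOD for x, y in zip(r, s)] for r, s in zip(a, b)]

def rand_matrix(n, rng):
    return [[rng.randrange(MOD) for _ in range(n)] for _ in range(n)]

def share(m, rng):
    """Split a matrix into two additive shares that sum to m modulo MOD."""
    s0 = rand_matrix(len(m), rng)
    return s0, sub(m, s0)

def secure_matmul(a_sh, b_sh, rng):
    """Return the two parties' shares of A x B using one matrix tuple."""
    n = len(a_sh[0])
    # Offline phase (assumed trusted dealer): 3 N^2 elements in total.
    u, v = rand_matrix(n, rng), rand_matrix(n, rng)
    u_sh, v_sh, z_sh = share(u, rng), share(v, rng), share(matmul(u, v), rng)
    # Online phase: masked shares are exchanged, so E and F become public.
    e = add(sub(a_sh[0], u_sh[0]), sub(a_sh[1], u_sh[1]))
    f = add(sub(b_sh[0], v_sh[0]), sub(b_sh[1], v_sh[1]))
    c_sh = []
    for i in (0, 1):
        c = add(add(matmul(u_sh[i], f), matmul(e, v_sh[i])), z_sh[i])
        if i == 1:
            c = add(c, matmul(e, f))  # the i * E * F term, added by one party
        c_sh.append(c)
    return c_sh
```

Adding the two returned shares modulo `MOD` reconstructs the product, because (E + U)(F + V) = EF + EV + UF + UV.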

The pseudocode of the secure QR algorithm in this embodiment is shown in FIG. 12. Its input is the encrypted dimension-reduced matrix, and its outputs are the encrypted matrices $T_K$ and $S$. Recalling the plaintext feature decomposition described above, the top-$k$ (largest $k$) diagonal elements of the output matrix $T_K$ are the top-$k$ eigenvalues of the original matrix, and the columns of $S$ are eigenvectors of the small dimension-reduced matrix, which can be converted by a formula into eigenvectors of the original matrix using the output matrix of the secure dimension reduction algorithm.

Further, the inventors have found that in each Givens rotation $G_i H^{(i)}$ or $H^{(i)} G_i$, only the $i$-th and $(i+1)$-th rows of $H^{(i)}$ are updated. Therefore, to save overhead, the Givens rotation matrix $G_i$ of equation (1) can be simplified to a 2×2 matrix $g_i$. FIG. 13 illustrates a QR decomposition of a 4×4 matrix completed with the optimized Givens rotation matrices. Compared with FIG. 11, a large number of multiplications are saved. Similarly, in the calculation in line 15 of Algorithm 7, $G_i$ can also be reduced to $g_i$. After the simplification, the Givens rotation matrix is multiplied in a traversal. As shown in FIG. 13, the Givens rotation matrix $g_1$ is first multiplied by the four elements of the top-left 2×2 block of the $H$ matrix, H[1:2, 1:2], and then by the next sets of elements, H[1:2, 2:3] and H[1:2, 3:4]. The Givens rotation matrix is then updated to obtain $g_2$ for the next pair of rows, i.e., H[2:3, 1:2], H[2:3, 2:3], H[2:3, 3:4], and so on (see the shaded portion in FIG. 13).

It should be noted that the 4×4 matrix is only an example; in practical applications, $H$ may be of any dimension, and the 2×2 Givens matrix can be used to update it by multiplying in the manner described above.
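The saving from the simplified rotation can be illustrated in plaintext with the sketch below, which applies a 2×2 rotation to only two rows of $H$ instead of multiplying by a full rotation matrix. The function names and the sign convention are assumptions made for illustration, and the sketch updates the two rows column by column rather than following the exact window-by-window traversal of the figure.

```python
import math

def rotate_rows(h, i):
    """Apply the 2 x 2 rotation g_i to rows i and i+1 of h (0-based), in place.

    Returns the rotation parameters (c, s) so that they can be reused later.
    """
    r = math.hypot(h[i][i], h[i + 1][i])
    if r == 0.0:
        return 1.0, 0.0                    # nothing to eliminate
    c, s = h[i][i] / r, h[i + 1][i] / r
    for j in range(len(h)):                # four multiplications per column
        top, bottom = h[i][j], h[i + 1][j]
        h[i][j] = c * top + s * bottom
        h[i + 1][j] = -s * top + c * bottom
    return c, s

def triangularize(h):
    """Reduce an upper Hessenberg matrix to upper triangular form."""
    r = [row[:] for row in h]              # leave the caller's matrix intact
    rotations = [rotate_rows(r, i) for i in range(len(r) - 1)]
    return r, rotations
```

Each rotation touches 2M entries of an M×M matrix, so it costs on the order of M multiplications instead of the M³ needed for a product with a full M×M rotation matrix.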

After the above simplification, the simplified Givens rotation matrix $g_i$ must be multiplied many times with elements from two rows of the large matrix $H$; that is, it is reused across multiple multiplications. Coupled multiplication tuples can therefore be used to reduce the traffic between the two computing terminals. For example, suppose one ciphertext matrix is to be multiplied by $k$ different ciphertext matrices. Only one random matrix is needed to mask it, rather than $k$ different random matrices. Thus, whenever the same matrix serves as a multiplier in multiple multiplications, the present invention uses only one random matrix to mask it. Finally, the transpose $g_i^{T}$ of $g_i$ can be obtained by having the computing terminals CS₁ and CS₂ locally transpose their own secret shares $\langle g_i \rangle_1$ and $\langle g_i \rangle_2$, which further saves communication overhead.

After the above optimizations, the secure QR algorithm provided in this embodiment achieves a substantial performance improvement. Specifically, the basic secure QR algorithm requires the computing terminals CS₁ and CS₂ to communicate $6K(M-1)M^2$ elements online (ignoring the traffic needed to approximate the square root, because that part is not optimized; $K$ and $M$ correspond to K and M in Algorithm 7, respectively), whereas the optimized secure QR algorithm only requires communicating $K(M-1)(6M+4)$ elements online. According to experiments, compared with the basic secure QR algorithm, the optimized secure QR algorithm saves up to 97% of online communication and 9.3% of computation time.
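The two communication counts stated above can be compared directly. The short Python sketch below evaluates both expressions; the function names are hypothetical, and any values of K and M passed to them are arbitrary illustrations rather than the parameters used in the reported experiments.

```python
def basic_elements(K, M):
    """Online elements for the basic secure QR algorithm: 6K(M-1)M^2."""
    return 6 * K * (M - 1) * M ** 2

def optimized_elements(K, M):
    """Online elements for the optimized secure QR algorithm: K(M-1)(6M+4)."""
    return K * (M - 1) * (6 * M + 4)

def saving(K, M):
    """Fraction of online communication saved by the optimization."""
    return 1 - optimized_elements(K, M) / basic_elements(K, M)
```

Because the common factor K(M-1) cancels, the saved fraction equals 1 - (6M+4)/(6M²); it depends only on M and grows as M increases.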

In summary, this embodiment provides a privacy-preserving method for the feature decomposition of distributed graph data. In the method, graph nodes holding local graph data encrypt their own degree information and send it to a first computing terminal and a second computing terminal, which cooperatively generate first encryption degree distribution information and second encryption degree distribution information in the ciphertext domain, so that each graph node can determine the target interval to which its own degree belongs and then select an appropriate sampling sensitivity for sampling noise. False edges with a weight of 0 are added to the real graph adjacency matrix, the matrix is represented sparsely as a set of matrix triplets, and encrypted feature decomposition is performed on the adjacency matrix with the false edges added. In this way, while the privacy of the nodes is protected, the sparsity of the graph data is maintained, the effectiveness of the feature decomposition is ensured, and privacy-preserving feature decomposition of distributed graph data is achieved.

It should be understood that, although the steps in the flowcharts shown in the drawings of the present specification are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least a portion of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order in which the sub-steps or stages are performed is not necessarily sequential, and may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps or other steps.

Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

Embodiment Two

Based on the above embodiment, the invention further provides a privacy-preserving distributed graph data feature decomposition system, which comprises a target graph node, a first computing terminal and a second computing terminal; the target graph node, the first computing terminal and the second computing terminal are configured to cooperatively execute the relevant steps of the privacy-preserving distributed graph data feature decomposition method of Embodiment One.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for decomposing a feature of privacy-preserving distributed graph data, the method comprising:

2. The privacy-preserving distributed graph data feature decomposition method of claim 1, wherein the target graph node comprises, prior to encrypting the degrees of the target graph node based on functional secret sharing:

3. The privacy-preserving distributed graph data feature decomposition method of claim 1, wherein the target graph node encrypts the degree of the target graph node based on function secret sharing to obtain first encryption degree information and second encryption degree information, comprising:

4. The privacy-preserving distributed graph data feature decomposition method of claim 3, wherein the first computing terminal and the second computing terminal generate first encryption degree distribution information and second encryption degree distribution information of global graph data from the first encryption degree information and the second encryption degree information transmitted by the plurality of target graph nodes, comprising:

5. The privacy-preserving distributed graph data feature decomposition method of claim 4, wherein each bit value in the first and second encryption degree distribution information is 0 or 1; the first computing terminal and the second computing terminal determine the first encryption degree distribution information and the second encryption degree distribution information according to the first encryption degree histogram information and the second encryption degree histogram information, and the method comprises the following steps:

6. The method according to claim 5, wherein the first computing terminal obtains a first encryption comparison result according to the first accumulator, generates a new value in the first encryption degree distribution information according to the first encryption comparison result, and the second computing terminal obtains a second encryption comparison result according to the second accumulator, generates a new value in the second encryption degree distribution information according to the second encryption comparison result, and includes:

7. The method according to claim 5, wherein the first computing terminal obtains a first encryption comparison result according to the first accumulator, generates a new value in the first encryption degree distribution information according to the first encryption comparison result, and the second computing terminal obtains a second encryption comparison result according to the second accumulator, generates a new value in the second encryption degree distribution information according to the second encryption comparison result, and includes:

8. The method for feature decomposition of privacy-preserving distributed graph data according to claim 1, wherein the first computing terminal and the second computing terminal perform feature decomposition on the global graph data according to the first encrypted set and the second encrypted set corresponding to each node in the global graph, and the method comprises:

The second calculation formula is as follows:

9. The privacy preserving distributed graph data feature decomposition method of claim 8, wherein the first computing terminal and the second computing terminal perform a QR algorithm on the dimension reduction matrix based on additive secrets, comprising:

10. A privacy-preserving distributed graph data feature decomposition system, wherein the system comprises a target graph node, a first computing terminal and a second computing terminal; the target graph node, the first computing terminal and the second computing terminal cooperate to complete the privacy-preserving distributed graph data feature decomposition method according to any one of claims 1-9.