CN111737740A - Multi-party sequence data issuing method and system meeting difference privacy - Google Patents

Multi-party sequence data issuing method and system meeting difference privacy

Info

Publication number
CN111737740A
Authority
CN
China
Prior art keywords
sequence
terminal
node
data
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010541485.0A
Other languages
Chinese (zh)
Other versions
CN111737740B (en
Inventor
唐朋
郭山清
鞠雷
刘高源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202010541485.0A priority Critical patent/CN111737740B/en
Publication of CN111737740A publication Critical patent/CN111737740A/en
Application granted granted Critical
Publication of CN111737740B publication Critical patent/CN111737740B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a multi-party sequence data publishing method and system satisfying differential privacy, belonging to the technical field of data processing. Each data owner preprocesses its data: for each sequence it adds start and end marker symbols, truncates the sequence to a length limit, and collects statistics such as the number of symbol types. Under differential privacy, the data owners and a third party use a batch processing method: starting from layer zero, they apply a node splitting discrimination protocol to all nodes of each layer and split the nodes whose scores exceed a given threshold, until the prediction suffix tree is constructed. The third party then generates a new data set for publication from the constructed prediction suffix tree. The method can publish a sequence data set with higher data utility while satisfying differential privacy, and it effectively reduces communication overhead.

Description

Multi-party sequence data issuing method and system meeting difference privacy
Technical Field
The disclosure relates to the technical field of data processing, in particular to a multi-party sequence data issuing method and system meeting differential privacy.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Sequence data is a common type of data. Given an alphabet \(\mathcal{I}\), a sequence of length \(len\) can be represented as \(S = x_1 x_2 \ldots x_{len}\), where each \(x_i\) is a symbol (element) of \(\mathcal{I}\). Sequence data common in daily life includes citizens' travel trajectories, web users' browsing records, and the like. At present, traditional sequence data publishing methods based on differential privacy mainly address the single-party scenario, in which a single data owner holds all of the sequence data and publishes its sequence data set under differential privacy. One such method is based on a prefix tree model: a prefix tree is constructed from the original sequence data under differential privacy, and new sequence data is then generated from the model. Another is based on a variable-length n-gram model: an n-gram model is constructed from the original sequence data under differential privacy, and new sequence data is then generated from the constructed model. For both approaches, however, if the model being built is too deep, the utility of the resulting data set may suffer. To solve this problem, researchers proposed PrivTree, an optimized sequence data set publishing method based on a prediction suffix tree model. It introduces a new Laplace mechanism that exploits the monotonicity of the node statistics in a prediction suffix tree. This mechanism makes the magnitude of the Laplace noise added to the non-leaf nodes of the suffix tree independent of the depth of the tree, which significantly reduces the noise and improves the utility of the published sequence data set.
However, the inventors of the present disclosure found that in a multi-party scenario the data belongs to multiple data owners. When these owners jointly publish their local data sets, the published overall data is prone to reveal individuals' sensitive information from each local data set, and each data owner may also reveal individuals' sensitive information from its own local data set to the other data owners.
Disclosure of Invention
To overcome the deficiencies of the prior art, the present disclosure provides a multi-party sequence data publishing method and system satisfying differential privacy, in which the data owners and the data publisher jointly construct a prediction suffix tree under differential privacy, and the data publisher then generates a new data set for publication from the constructed prediction suffix tree.
In order to achieve the purpose, the following technical scheme is adopted in the disclosure:
A first aspect of the disclosure provides a multi-party sequence data publishing method satisfying differential privacy.
A multi-party sequence data publishing method satisfying differential privacy is applied to a first terminal and comprises the following steps:
preprocessing the held data sequences;
receiving, from a second terminal, a prediction suffix tree containing only a root node together with a node queue; judging, in batch mode and under differential privacy, whether the nodes in the node queue need to be split; and sending the judgment results to the second terminal so that the second terminal obtains the final structure of the prediction suffix tree;
and, under differential privacy, calculating the prediction histograms of the nodes to obtain the parameters of the prediction suffix tree, and sending the parameters to the second terminal, so that the second terminal generates a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
A second aspect of the present disclosure provides a data providing apparatus.
A data providing device comprising a processor communicatively coupled to an external second terminal, the processor configured to:
preprocessing the held data sequence;
receiving a predicted suffix tree and a node queue which are sent by a second terminal and only comprise root nodes, judging whether nodes in the node queue need to be split or not by adopting a batch processing mode under the condition of meeting differential privacy, and sending a judgment result to the second terminal so that the second terminal obtains the final structure of the predicted suffix tree;
and, under differential privacy, calculating the prediction histograms of the nodes to obtain the parameters of the prediction suffix tree, and sending the parameters to the second terminal, so that the second terminal generates a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
The third aspect of the disclosure provides a multi-party sequence data publishing method meeting differential privacy.
A multi-party sequence data publishing method satisfying differential privacy is applied to a second terminal and comprises the following steps:
initializing a prediction suffix tree only containing a root node, initializing a node queue for storing nodes which are not traversed, and inserting the root node into the queue;
receiving a node splitting judgment result sent by a first terminal, and obtaining a final structure of the prediction suffix tree when all nodes are split;
and receiving a prediction histogram of a node sent by the first terminal to obtain parameters of a prediction suffix tree, and generating a group of new overall sequence data sets according to the structure and the parameters of the prediction suffix tree.
A fourth aspect of the present disclosure provides a multi-party sequence data issuing apparatus satisfying differential privacy.
A multi-party sequence data dissemination device satisfying differential privacy comprising a processor communicatively coupled to a first terminal, the processor configured to:
initializing a prediction suffix tree only containing a root node, initializing a node queue for storing nodes which are not traversed, and inserting the root node into the queue;
receiving a node splitting judgment result sent by a first terminal, and obtaining a final structure of the prediction suffix tree when all nodes are split;
and receiving a prediction histogram of a node sent by the first terminal to obtain parameters of a prediction suffix tree, and generating a group of new overall sequence data sets according to the structure and the parameters of the prediction suffix tree.
The fifth aspect of the present disclosure provides a multiparty sequence data distribution method satisfying differential privacy.
A multi-party sequence data publishing method meeting differential privacy comprises the following steps:
each first terminal preprocesses the held data sequence and keeps the preprocessed data sequence at the first terminal;
the second terminal initializes a prediction suffix tree containing only a root node, initializes a node queue for storing nodes that have not been traversed, and inserts the root node into the queue;
the second terminal is combined with the first terminal, whether the nodes in the node queue need to be split or not is judged in a batch processing mode under the condition that differential privacy is met, and the first terminal sends a judgment result to the second terminal;
when all the nodes in the node queue are completely split, the second terminal obtains the final structure of the prediction suffix tree;
the second terminal, together with the first terminals, calculates the prediction histograms of the nodes under differential privacy to obtain the parameters of the prediction suffix tree, and the parameters are sent to the second terminal;
the second terminal generates a new set of overall sequence data based on the structure and parameters of the predicted suffix tree.
A sixth aspect of the present disclosure provides a multiparty sequence data distribution system satisfying differential privacy.
A multi-party sequence data issuing system meeting differential privacy comprises at least two first terminals and at least one second terminal, wherein each first terminal is in communication connection with the second terminal;
each first terminal preprocesses the held data sequence and keeps the preprocessed data sequence at the first terminal;
the second terminal initializes a prediction suffix tree only containing a root node, initializes a node queue for storing nodes which are not traversed, and inserts the root node into the queue;
the second terminal is combined with the first terminal, whether the nodes in the node queue need to be split or not is judged in a batch processing mode under the condition that differential privacy is met, and the first terminal sends a judgment result to the second terminal;
when all the nodes in the node queue are completely split, the second terminal obtains the final structure of the prediction suffix tree;
the second terminal, together with the first terminals, calculates the prediction histograms of the nodes under differential privacy to obtain the parameters of the prediction suffix tree, and the parameters are sent to the second terminal;
the second terminal generates a new set of overall sequence data based on the structure and parameters of the predicted suffix tree.
Compared with the prior art, the beneficial effect of this disclosure is:
1. according to the method, the device or the system, the first terminal and the second terminal are firstly used for jointly constructing the predicted suffix tree under the differential privacy condition, and then the second terminal generates a group of new data for publishing according to the constructed predicted suffix tree, so that the problem that a data collector in a traditional data publishing method can steal and reveal sensitive information of a user is solved, and meanwhile, the communication overhead is effectively reduced.
2. Compared with the traditional sequence data issuing method meeting the differential privacy, the method, the device or the system expands the traditional single-party scene to the multi-party scene, ensures that the personal privacy data in the local data set of each participant can not be obtained by other participants, and effectively solves the problem of personal sensitive information leakage in the multi-party data fusion process.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a flowchart of a publishing method provided in embodiment 1 of the present disclosure.
Fig. 2 is a schematic diagram of a predicted suffix tree provided in embodiment 1 of the present disclosure.
Fig. 3 is an example of a node task ranking result provided in embodiment 1 of the present disclosure.
Fig. 4 is an example of generating blocks in batch processing provided in embodiment 1 of the present disclosure.
Fig. 5 is a comparison graph of data utility of the method provided in embodiment 1 of the present disclosure and three methods, Independent, PrivTree, and nonprivacy, under different privacy budgets.
Fig. 6 is a comparison graph of data utility of the method provided in embodiment 1 of the present disclosure and the Independent and PrivTree methods under different numbers of participants.
Fig. 7 is an effect diagram of an improved node splitting discrimination protocol provided in embodiment 1 of the present disclosure.
Fig. 8 is a comparison diagram of the running time of the batch processing method and the node method for node splitting according to embodiment 1 of the present disclosure.
Fig. 9 is a schematic structural diagram of a distribution system provided in embodiment 6 of the present disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example 1:
as shown in fig. 1, an embodiment 1 of the present disclosure provides a multi-party sequence data distribution method satisfying differential privacy, applied to a first terminal (data owner), including the following steps:
preprocessing the held data sequence, and reserving the preprocessed data sequence at the first terminal;
receiving a predicted suffix tree and a node queue which are sent by a second terminal and only comprise root nodes, judging whether nodes in the node queue need to be split or not by adopting a batch processing mode under the condition of meeting differential privacy, and sending a judgment result to the second terminal so that the second terminal obtains the final structure of the predicted suffix tree;
and, under differential privacy, calculating the prediction histograms of the nodes to obtain the parameters of the prediction suffix tree, and sending the parameters to the second terminal, so that the second terminal generates a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
The detailed steps are as follows:
Step S1: Given a reasonable sequence length threshold \(len_{max}\), each data owner first adds a start symbol $ and an end symbol & to the beginning and end of every original sequence, and then truncates every sequence whose length exceeds \(len_{max}\).
For example, an original sequence \(S = x_1 x_2 \ldots x_n\) is converted into
\(\$\, x_1 x_2 \ldots x_n\, \&\)
Then \(n + 2\) is compared with \(len_{max}\): if \(n + 2 > len_{max}\), the sequence is truncated into
Figure BDA0002539084090000072
Otherwise, the sequence is retained unchanged.
In detail, in the sequence data distribution problem satisfying the differential privacy, the noise injection amount of the distribution method is proportional to the longest sequence length of the original data, but the too large noise injection amount may reduce the effectiveness of the distribution data. Therefore, the sequence length of the original data set is reasonably limited, and the information loss caused by the change of the sequence length is reduced as much as possible, so that the aim of reducing the noise injection amount is fulfilled.
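The preprocessing of step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `preprocess` is hypothetical, and because the truncated form is given only in a figure, the sketch assumes that truncation keeps the first `len_max` symbols of the marked sequence.

```python
START, END = "$", "&"  # start and end markers named in step S1


def preprocess(sequence, len_max):
    """Add start/end markers, then truncate to at most len_max symbols."""
    marked = [START, *sequence, END]  # length n + 2
    if len(marked) > len_max:
        # Assumption: keep the first len_max symbols (the end marker is lost).
        return marked[:len_max]
    return marked


if __name__ == "__main__":
    print(preprocess(list("abc"), 10))     # short sequence, kept whole
    print(preprocess(list("abcdef"), 4))   # long sequence, truncated
```

Capping the length bounds how much one sequence can contribute to the tree, which is what lets the noise scale be limited as described above.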
Step S2: Based on the information provided by the K data owners, the third party initializes a prediction suffix tree \(\tau\) containing only the root node \(v_1\), initializes a node queue Q for storing untraversed nodes, and inserts \(v_1\) into Q.
Step S3: Check the queue Q. If Q is empty, the structure of the prediction suffix tree \(\tau\) is complete and step S5 is executed; otherwise there are still nodes that need to be judged for splitting, and step S4 is executed.
Step S4: The third party and the K data owners judge, in batch mode and under differential privacy, whether the nodes in Q need to be split. Specifically, the third party takes a certain number of nodes out of Q; the third party and the K data owners jointly execute the node splitting discrimination protocol, and the discrimination results are sent to the third party; the third party splits the nodes that need splitting according to the results and puts the newly created nodes into Q; the procedure then returns to step S3.
Step S5: Parameters are filled into the prediction suffix tree structure generated in step S4. That is, for each node v, the data owners and the third party, under differential privacy, use
Figure BDA0002539084090000081
to compute the prediction histogram hist(v) of v, where the prediction histogram of a node is a vector of length \(|\mathcal{I}|\), each element of which corresponds to one symbol of the alphabet \(\mathcal{I}\).
Step S6: the third party generates a new set of release data sets D' based on the generated predicted suffix tree tau, and
Figure BDA0002539084090000084
For step S2, the prediction suffix tree (PST) is a Markov model commonly used to characterize the statistics of sequence data. In a PST, each node v is associated with a prediction sequence dom(v) and a prediction histogram hist(v). Specifically, dom(v) is a sequence of symbols from the alphabet \(\mathcal{I}\), and hist(v) is a vector of length \(|\mathcal{I}|\) in which each element corresponds to one symbol x of \(\mathcal{I}\) and is denoted hist(v)[x].
For example, FIG. 2 depicts a PST constructed on a data set D. In a PST, adding one symbol of the alphabet \(\mathcal{I}\) to the front of the prediction sequence of a node v yields a child node v' of v. Thus, for each child node v' of v, dom(v) is a suffix of dom(v').
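To make dom(v) and hist(v) concrete, here is a small sketch on plain (non-private) data. It assumes the usual PST reading in which hist(v)[x] counts how often dom(v) is immediately followed by x; the text above does not spell this out, and the function names are hypothetical.

```python
def prediction_histogram(dataset, dom, alphabet):
    """Count, for each symbol x, how often `dom` is immediately followed by x."""
    hist = {x: 0 for x in alphabet}
    k = len(dom)
    for seq in dataset:
        for i in range(len(seq) - k):
            # seq[i:i+k] is a candidate occurrence of dom; seq[i+k] follows it.
            if seq[i:i + k] == dom and seq[i + k] in hist:
                hist[seq[i + k]] += 1
    return hist


def children(dom, alphabet):
    """Child prediction strings: one alphabet symbol added at the front."""
    return [x + dom for x in alphabet]


if __name__ == "__main__":
    data = ["$ab&", "$aab&"]
    print(prediction_histogram(data, "a", "ab&"))
    print(children("a", "ab"))
```

Every child string ends with the parent's string, which is the suffix relation stated above.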
The node with \(dom(v_1) = \emptyset\) is the root node \(v_1\). Starting from \(v_1\), the data owners and the third party iteratively split nodes. Specifically, they determine whether the score of the current node is greater than a given threshold. If it is, the node is split into \(|\mathcal{I}|\) child nodes, and the prediction string of each child is obtained by inserting one symbol of \(\mathcal{I}\) at the front of the node's prediction string (\(dom(v_1)\) for the root); otherwise, the node is regarded as a leaf node.
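The queue-driven construction of steps S2 to S4 can be sketched as a loop. This is an illustration under stated assumptions: `should_split` is a hypothetical callback standing in for the joint node splitting discrimination protocol, and `batch_size` and `max_depth` are illustrative parameters that do not come from the text.

```python
from collections import deque


def build_structure(alphabet, should_split, batch_size=4, max_depth=3):
    """Return (internal, leaves) prediction strings of the finished tree."""
    queue = deque([""])          # the root has an empty prediction string
    internal, leaves = [], []
    while queue:                 # step S3: stop when the queue is empty
        size = min(batch_size, len(queue))
        batch = [queue.popleft() for _ in range(size)]
        decisions = should_split(batch)      # step S4, one answer per node
        for dom, split in zip(batch, decisions):
            if split and len(dom) < max_depth:
                internal.append(dom)
                # A child adds one symbol at the front of dom.
                queue.extend(x + dom for x in alphabet)
            else:
                leaves.append(dom)
    return internal, leaves


if __name__ == "__main__":
    print(build_structure("ab", lambda batch: [dom == "" for dom in batch]))
```

Deciding a whole batch per call is what lets the protocol's communication rounds be shared across nodes.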
For step S4, taking the splitting node v as an example, specifically:
step S4.1: for K data owners, each data owner PkFrom its own local data set DkComputing
Figure BDA0002539084090000097
Wherein: sum 'of'ik=∑n≠lhist(v)[xn]-depth(v)*/K+Lap(λ)k,θ′k=(θ-)/K+Lap(λ)k
where
Figure BDA00025390840900000910
\(\varepsilon_1\) is the privacy budget allocated to the PST structure construction step, and \(Lap(\lambda)_k = g_k - h_k\), where \(g_k\) and \(h_k\) are two independent Gamma-distributed random variables generated by \(P_k\), whose density function is:
Figure BDA0002539084090000099
To meet the differential privacy requirement, the privacy budget \(\varepsilon\) is divided equally into \(\varepsilon_1\) and \(\varepsilon_2\), i.e., \(\varepsilon_1 = \varepsilon_2 = \varepsilon/2\); \(\varepsilon_1\) is allocated to the PST structure construction step and \(\varepsilon_2\) to the PST parameter acquisition step;
step S4.2: and circularly calling a minimum value protocol by the K data owners and the third party to obtain:
Figure BDA0002539084090000101
To meet the privacy protection requirement,
Figure BDA0002539084090000102
is split into a sum of K terms, namely:
Figure BDA0002539084090000103
Data owner \(P_K\) holds
Figure BDA0002539084090000104
and every other data owner \(P_i\) holds \(s_i = -r_i\), where \(r_i\) is a random number generated locally by \(P_i\) and known only to \(P_i\);
Step S4.3: the K data owners and the third party call a maximum value protocol to obtain:
Figure BDA0002539084090000105
the result is split into the sum of K terms, namely:
Figure BDA0002539084090000106
and assign them to K data owners;
Step S4.4: Each data owner \(P_k\) generates a fresh random number \(r_k\) and encrypts it as \(E(r_k)\);
Step S4.5: each data owner computing together
Figure BDA0002539084090000107
Step S4.6: data owner PkComputing
Figure BDA0002539084090000108
Step S4.7: each data owner computing together
Figure BDA0002539084090000109
and jointly decrypt the result to obtain
Figure BDA00025390840900001010
If
Figure BDA00025390840900001011
the node is split; otherwise, the node does not need to be split.
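The per-party noise terms \(Lap(\lambda)_k = g_k - h_k\) of step S4.1 can be illustrated with the standard decomposition of Laplace noise into Gamma differences. The density appears only as a figure in the source, so the sketch assumes shape 1/K and scale \(\lambda\), the choice under which the K shares sum to a Laplace variable with scale \(\lambda\). The function names are hypothetical.

```python
import random


def split_budget(epsilon):
    """Equal split of the privacy budget between structure and parameters."""
    return epsilon / 2, epsilon / 2


def noise_share(lam, num_parties, rng):
    """One party's share g_k - h_k of a jointly generated Laplace noise."""
    # Assumption: g_k, h_k ~ Gamma(shape=1/K, scale=lam), independent.
    g = rng.gammavariate(1.0 / num_parties, lam)
    h = rng.gammavariate(1.0 / num_parties, lam)
    return g - h


if __name__ == "__main__":
    rng = random.Random(0)
    K, lam = 5, 2.0
    total = sum(noise_share(lam, K, rng) for _ in range(K))
    print(split_budget(1.0), total)
```

No single party knows the total noise, yet the sum over all K shares has the Laplace distribution (variance \(2\lambda^2\)), since a sum of K Gamma(1/K, \(\lambda\)) variables is exponential with scale \(\lambda\).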
For the minimum value protocol of step S4.2, specifically:
Step S4.2.1: Initialize the variables \(s_1 = c_{11}, \ldots, s_K = c_{1K}\);
Step S4.2.2: Each data owner \(P_k\) generates a fresh random number \(r_k\) and encrypts it as \(E(r_k)\); \(P_k\) computes
Figure BDA00025390840900001012
where
Figure BDA00025390840900001013
is sent from \(P_{k-1}\) to \(P_k\), until \(P_K\) obtains
Figure BDA00025390840900001014
Step S4.2.3: Each data owner \(P_k\) computes
Figure BDA0002539084090000111
and sends it to the third party, which computes
Figure BDA0002539084090000112
Joint decryption then yields
Figure BDA0002539084090000113
Step S4.2.4: If
Figure BDA0002539084090000114
Calculate E (g)1)=1-|r|/r,E(g1) 1+ | r |/r; otherwise, calculate E (g)1)=1+|r|/r,E(g1)=1-|r|/r;
Step S4.2.5: data owner PkComputing
Figure BDA0002539084090000115
Figure BDA0002539084090000116
and sends the results to the third party, where \(u_k\) is a newly generated random number;
step S4.2.6: third party computing
Figure BDA0002539084090000117
Figure BDA0002539084090000118
All parties decrypt together to obtain
Figure BDA0002539084090000119
Figure BDA00025390840900001110
Step S4.2.7: Update \(s_1 = -u_1, \ldots, s_{K-1} = -u_{K-1}, s_K = temp/2 - u_K\).
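The way a value is held as K shares (the others holding \(s_i = -r_i\)) can be illustrated with plain additive secret sharing. This sketch leaves out the encryption and the comparison steps entirely. It assumes that the last party's share is the value plus the sum of the other parties' random numbers, which is what makes the shares add up to the value. The function name and the `bound` parameter are hypothetical.

```python
import random


def additive_shares(value, num_parties, rng, bound=10**6):
    """Split `value` into num_parties shares that sum back to `value`."""
    r = [rng.randrange(-bound, bound) for _ in range(num_parties - 1)]
    shares = [-r_i for r_i in r]      # parties 1..K-1 hold s_i = -r_i
    shares.append(value + sum(r))     # assumption: party K holds the remainder
    return shares


if __name__ == "__main__":
    rng = random.Random(1)
    shares = additive_shares(42, 4, rng)
    print(shares, sum(shares))
```

Each share on its own reveals nothing useful about the value; only the sum of all K shares does.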
For step S5, specifically:
Step S5.1: For a leaf node v, to meet the differential privacy requirement, the data owners and the third party use
Figure BDA00025390840900001111
to compute the prediction histogram hist(v) of v, where the prediction histogram of a node is a vector of length \(|\mathcal{I}|\) whose elements correspond to the symbols of the alphabet \(\mathcal{I}\). To satisfy differential privacy, Laplace noise \(\eta\) with scale \(1/\varepsilon_2\), i.e., \(\eta = Lap(1/\varepsilon_2)\), is injected into each element of hist(v). The noisy histogram is denoted
Figure BDA00025390840900001114
Step S5.2: For a non-leaf node v', its histogram
Figure BDA00025390840900001115
is computed from the histograms of its child nodes. Specifically,
Figure BDA00025390840900001116
where v is any child node of v'.
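Step S5 can be sketched on a single machine as follows. The joint multi-party computation is not modelled here. The aggregation rule for non-leaf nodes is given only as a figure, so the sketch assumes an element-wise sum over the children. All function names are hypothetical.

```python
import random


def laplace(scale, rng):
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_leaf_histogram(hist, epsilon2, rng):
    """Add Lap(1 / epsilon2) noise to every element of a leaf histogram."""
    return {x: c + laplace(1.0 / epsilon2, rng) for x, c in hist.items()}


def parent_histogram(child_hists):
    """Assumption: a non-leaf histogram is the element-wise sum of its children."""
    total = {}
    for hist in child_hists:
        for x, c in hist.items():
            total[x] = total.get(x, 0.0) + c
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    leaf = noisy_leaf_histogram({"a": 3, "b": 1}, epsilon2=0.5, rng=rng)
    print(leaf)
    print(parent_histogram([leaf, {"a": 2.0, "b": 2.0}]))
```

Deriving non-leaf histograms from already-noisy leaves spends no extra privacy budget, because it is post-processing of released values.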
For step S6, specifically:
Step S6.1: Compute
Figure BDA00025390840900001117
and set a counter for the total number of sequences to be generated;
Step S6.2: The third party first initializes a sequence \(s_0 = \$\) and then iteratively appends symbols to it;
Step S6.3: In the i-th iteration, the third party has already obtained the partial sequence \(s_{i-1}\) and determines the node v in \(\tau\) whose prediction string matches \(s_{i-1}\). According to the probability distribution \(\Pr[x_i = x] = hist(v)[x] / \|hist(v)\|_1\), the third party selects a symbol \(x_i\) from the alphabet \(\mathcal{I}\) and appends it to \(s_{i-1}\), obtaining the sequence \(s_i\);
Step S6.4: If \(x_i\) is &, \(s_i\) is taken as a complete sequence and its generation ends; otherwise, step S6.3 is repeated;
Step S6.5: Check whether the counter has reached the total; if not, steps S6.2 to S6.4 are repeated; otherwise, generation of the published data set D' is finished.
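The sampling loop of steps S6.2 to S6.4 can be sketched as below. The sketch makes several assumptions that are not in the text: the tree is a dict from prediction string to histogram that includes the root under the empty string, node lookup uses the longest matching suffix, negative noisy counts are clamped to zero, and `len_max` caps the length.

```python
import random


def find_node(pst, partial):
    """Prediction string in `pst` that is the longest suffix of `partial`."""
    for start in range(len(partial) + 1):
        if partial[start:] in pst:
            return partial[start:]
    raise KeyError("the tree must contain the root (empty string)")


def generate_sequence(pst, rng, len_max=50):
    s = "$"                                    # step S6.2
    while len(s) < len_max:
        hist = pst[find_node(pst, s)]          # step S6.3
        symbols = list(hist)
        weights = [max(hist[x], 0.0) for x in symbols]  # clamp noisy counts
        if sum(weights) == 0:
            break
        x = rng.choices(symbols, weights=weights, k=1)[0]
        s += x
        if x == "&":                           # step S6.4
            break
    return s


if __name__ == "__main__":
    tree = {"": {"a": 1, "&": 1}, "$": {"a": 1}, "a": {"&": 1}}
    print(generate_sequence(tree, random.Random(0)))
```

Calling `generate_sequence` until the counter of step S6.5 is reached yields the published data set.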
The batch-processed node splitting judgment of this embodiment is described further below, with a view to solving the problem of excessive communication overhead.
In the traditional approach, splitting is judged one node at a time, and this sequential manner incurs excessive communication and computation cost. Specifically, if the fan-out of the PST equals l, the number of nodes in the PST reaches \((l^h - 1)/(l - 1)\), where h is the height of the PST. Judging whether one node should be split takes \(l + 1\) communication rounds. The total number of rounds needed to construct a complete PST is therefore as high as \((l^h - 1)/(l - 1) \cdot (l + 1) \approx l^h\), which results in significant communication and computation cost. This embodiment addresses the problem from two aspects, as follows:
First, the node splitting judgment needs to select the minimum among l sums. To this end, the sums are first divided into
Figure BDA0002539084090000121
pairs, and the smaller sum of every pair is selected simultaneously. Then the
Figure BDA0002539084090000122
selected sums are divided into
Figure BDA0002539084090000123
pairs and the smaller value of each pair is selected, and so on until the minimum value is obtained. The number of communication rounds per node is thereby reduced to
Figure BDA0002539084090000131
Taking FIG. 3 as an example, the minimum value is obtained according to the scheme described above.
Assume there are l = 8 sums at the leaf level. In the first round, 4 smaller sums are selected from 4 pairs. In the second round, two smaller sums are selected from two pairs. In the last round (round 3), the minimum sum is selected. The results of the selections are denoted \(C_1, \ldots, C_7\). Furthermore, nodes in different subtrees, that is, nodes at the same level, can be judged simultaneously. Taking the example shown in FIG. 2 and assuming the tree is a PST, after \(C_7\) is split into \(C_5\) and \(C_6\), the splitting judgment can be performed on \(C_5\) and \(C_6\) simultaneously.
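The pairwise selection can be checked with a small sketch that counts rounds and comparisons on plain values. In the protocol itself the comparisons run on secret-shared, encrypted sums, which is not modelled here. The function names are hypothetical.

```python
def tournament_min(values):
    """Return (minimum, rounds, comparisons) of a pairwise tournament."""
    current = list(values)
    rounds = comparisons = 0
    while len(current) > 1:
        # All pairs of one round can be compared at the same time.
        winners = [min(current[i], current[i + 1])
                   for i in range(0, len(current) - 1, 2)]
        comparisons += len(winners)
        if len(current) % 2:           # an unpaired value advances unchanged
            winners.append(current[-1])
        current = winners
        rounds += 1
    return current[0], rounds, comparisons


def pst_node_count(fan_out, height):
    """Number of nodes in a complete tree: (l^h - 1) / (l - 1)."""
    return (fan_out ** height - 1) // (fan_out - 1)


if __name__ == "__main__":
    print(tournament_min([5, 3, 8, 1, 9, 2, 7, 6]))
    print(pst_node_count(8, 3))
```

For 8 sums this takes 3 rounds and 7 comparisons, matching the selections \(C_1, \ldots, C_7\) in the example above.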
Based on the above discussion, a batch-based construction method is proposed. Judging whether a node needs to be split requires multiple interactions (a series of minimum and maximum computations); the computation completed in one interaction is called a task, so the interactions can be viewed as a series of ordered tasks. The batch processing scheme must, on the one hand, keep the number of tasks in each batch as equal as possible and, on the other hand, keep the required marking information (for example, which node each task belongs to and how many tasks that node has) as small as possible. To address these two requirements, this embodiment provides the following solutions:
To keep the number of tasks in each batch as equal as possible, the concept of a "block" is introduced and a "stitching" method is proposed: each block contains multiple tasks that come from different nodes, the number of tasks contributed by each node differs, and the total number of tasks in a block is fixed. The details are as follows.
First, the tasks of each node are ordered. As shown in FIG. 3, a node contains 8 values \(m_1, m_2, \ldots, m_8\); selecting the minimum among these 8 values requires 7 comparisons \(T_1, T_2, \ldots, T_7\), so the node has 7 tasks. These 8 values and 7 tasks form a tree of depth 4, where the depth is calculated by
Figure BDA0002539084090000132
The leaf layer holds the values and the non-leaf layers hold the tasks. Let the layer containing the leaves be layer 3 and the layer containing the root be layer 0. On any path from a leaf to the root, a task at layer i must wait for the task at layer (i+1) to complete. Tasks from different nodes are then stitched together to form a block. Specifically, as shown in the left diagram of FIG. 4, a block contains the layer-2 tasks of the i-th node, the layer-1 tasks of the (i-1)-th node, and the layer-0 task of the (i-2)-th node. The whole block thus contains 7 tasks from 3 nodes.
To reduce the required marking information, a "sliding" method is proposed, which is described in detail below.
The tasks from a given node slide steadily downward through successive blocks, so that the task at the lowest end of any block is the last task of some node. As shown in the right diagram of FIG. 4, over 3 consecutive blocks the tasks from the i-th node occupy the top three positions in the 1st block, the two middle positions in the 2nd block, and the single lowest position in the 3rd block. The block number and the position of a task within its block therefore determine which node the task comes from and how many tasks that node has, which reduces the required marking information.
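The way the block number and the position inside a block identify a task can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `locate_task` and the parameter `layer_sizes` are hypothetical, and `layer_sizes` lists the number of tasks per layer from the top of the block to the bottom (for example `[4, 2, 1]` for the 8-value node of FIG. 3, or `[3, 2, 1]` for the three/two/one layout described for FIG. 4).

```python
from typing import List, Tuple


def locate_task(block_index: int, position: int, layer_sizes: List[int]) -> Tuple[int, int]:
    """Map (block number, position in block) to (node index, layer).

    `layer_sizes` gives the task count of each segment of a block, top to
    bottom: the deepest task layer first and the root layer last.
    No per-task marking is needed; the result follows from arithmetic alone.
    """
    if not 0 <= position < sum(layer_sizes):
        raise IndexError("position outside the block")
    start = 0
    for offset, size in enumerate(layer_sizes):
        if position < start + size:
            node = block_index - offset              # older nodes sit lower in the block
            layer = len(layer_sizes) - 1 - offset    # the root layer is layer 0
            return node, layer
        start += size
    raise AssertionError("unreachable")
```

With `layer_sizes = [4, 2, 1]`, node 5 occupies positions 0 to 3 of block 5, positions 4 to 5 of block 6, and position 6 of block 7, which matches the downward sliding described above.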
Formal analysis shows that the multi-party sequence data distribution method satisfying differential privacy (DPST) of the present embodiment provides higher utility of the published data and lower communication overhead while satisfying differential privacy.
Comparison methods were set up for the experiments. Analysis of the experimental results shows that the data published by the proposed DPST method has better data utility and that the communication overhead of the publishing process is lower. The experimental methods are listed in Table 1.
Table 1: Experimental methods
[Table 1 is given as images in the original: Figure BDA0002539084090000141, Figure BDA0002539084090000151]
To better illustrate the advantages of the algorithm of this embodiment, data utility is tested with two widely used metrics: the precision of top-k frequent sequence mining results and the sequence length distribution error (total variation distance), both computed on the data published by each algorithm. Algorithm performance, i.e. communication overhead, is tested by comparing the running times of the algorithms under the same input environment. For each set of experiments, each algorithm was run 100 times and the average of the results was recorded.
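The two utility metrics can be written down as a short Python sketch. The definitions used here are the common ones (precision as the fraction of the published top-k that also appears in the true top-k, and total variation distance as half the L1 distance between two length distributions); the function names are hypothetical and the original disclosure does not give its exact formulas in this section.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def topk_precision(true_topk: Sequence[Hashable], published_topk: Sequence[Hashable]) -> float:
    """Fraction of the published top-k sequences that are also in the true top-k."""
    if not published_topk:
        raise ValueError("published_topk must not be empty")
    return len(set(true_topk) & set(published_topk)) / len(published_topk)


def length_distribution(sequences: Iterable[Sequence]) -> Dict[int, float]:
    """Empirical distribution of sequence lengths."""
    counts = Counter(len(s) for s in sequences)
    total = sum(counts.values())
    return {length: c / total for length, c in counts.items()}


def total_variation_distance(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Half the L1 distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)
```

A precision of 1 and a total variation distance of 0 correspond to published data that reproduces the original top-k result and length distribution exactly.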
The data used in the experiments come from two real data sets; the characteristics of each data set are shown in Table 2 below.
Table 2: Data characteristics of the data sets
[Table 2 is given as images in the original: Figure BDA0002539084090000152, Figure BDA0002539084090000161]
The usability of the DPST algorithm is illustrated below by analyzing the experimental data.
First, the data utility of the data published by each algorithm is analyzed experimentally, as shown in FIG. 5 and FIG. 6.
The data utility of the method described in this embodiment (i.e. the DPST algorithm) is compared with that of the Independent, PrivTree, and NoPrivacy algorithms under different privacy budgets, where the privacy parameter takes the values {0.1, 0.2, 0.4, 0.8, 1.0, 1.6} and the number of data owners is fixed at 2.
FIG. 5(a) to 5(d) show the precision of the top-k frequent sequence mining results on the data published by each algorithm; FIG. 5(e) to 5(f) show the sequence length distribution error (total variation distance) of the data published by each algorithm. The precision and total variation distance of NoPrivacy do not depend on the privacy budget and represent the best results that can be achieved.
In all experiments, DPST achieves the same good utility as PrivTree. This is because DPST uses a node splitting protocol that lets it construct the prediction tree in the same way as PrivTree, and both methods inject the same amount of noise. DPST also outperforms Independent in all cases, because Independent requires every party to inject its own share of noise into its data set so that the data set satisfies differential privacy, which leads to poor data utility in the final integrated data set.
The data utility of the method of this embodiment (i.e. the DPST algorithm) is also compared with that of the Independent and PrivTree algorithms for different numbers of participants, where the number of data owners takes the values {2, 3, 4, 6, 8, 10} and the privacy parameter is fixed at 0.4.
FIG. 6(a) to 6(d) show the precision of the top-k frequent sequence mining results on the data published by each algorithm; FIG. 6(e) to 6(f) show the sequence length distribution error (total variation distance) of the data published by each algorithm. DPST achieves the same utility as PrivTree in all experiments, and changing the number of participants has no effect on either method. DPST also outperforms Independent in all cases. The experimental results show that Independent is sensitive to the number of participants: the amount of noise that the Independent algorithm injects into a node grows with the number of participants, so its performance is worse than that of DPST.
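Why noise that grows with the number of participants hurts utility can be made concrete with a small Python calculation. The sketch rests on assumptions that are not stated in this section: each party in the Independent setting adds independent Laplace noise of scale sensitivity/ε to the same count, while a jointly computed count carries a single such noise share. It uses only the standard fact that a Laplace variable of scale b has variance 2b². The function names are hypothetical.

```python
def laplace_variance(scale: float) -> float:
    """Variance of a Laplace(0, scale) random variable."""
    return 2.0 * scale ** 2


def merged_noise_variance(num_parties: int, epsilon: float,
                          sensitivity: float = 1.0,
                          independent: bool = True) -> float:
    """Noise variance on one merged count.

    independent=True : every party adds its own Laplace noise (assumed model
                       of the Independent baseline), so variances add up.
    independent=False: a single noise share is injected for the joint count.
    """
    scale = sensitivity / epsilon
    shares = num_parties if independent else 1
    return shares * laplace_variance(scale)
```

Under these assumptions, with ε = 0.4 and 10 participants the merged count of the Independent setting carries ten times the noise variance of a jointly noised count, and the gap widens linearly with the number of participants.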
Secondly, the performance of the improved node splitting discrimination protocol IMSP is evaluated by comparing its running time with that of BMSP, as shown in FIG. 7. FIG. 7(a) shows the running times (in seconds) of IMSP and BMSP for different PST fan-outs, with the number of participants set to 2; FIG. 7(b) shows the running times (in seconds) of IMSP and BMSP for different numbers of participants, with the PST fan-out set to 10. In all cases the running time of IMSP is much lower than that of BMSP, and the difference becomes more pronounced as the PST fan-out or the number of participants increases. Moreover, the running time of IMSP tends to grow linearly with the fan-out, while that of BMSP grows quadratically with the fan-out.
The effectiveness of node splitting based on batch processing is then evaluated by comparing the run times of the BBM method and the NBM method on the data set, as shown in fig. 8. Fig. 8 (a) and fig. 8 (b) show the run times (in minutes) of BBM and NBM at different privacy budgets, with the number of participants set to 2; fig. 8 (c) and fig. 8 (d) show the run time (in minutes) of BBM and NBM at different numbers of participants, with the privacy budget set to 0.4. It can be observed that as the privacy budget increases, the runtime of both BBM and NBM becomes longer. This is because the PSTs constructed by BBM and NBM have larger heights and more nodes under a larger privacy budget. Furthermore, it can be observed that in all cases the runtime of the BBM is much smaller than NBM, and the difference becomes more pronounced as the privacy budget or number of participants increases. The reason is that the communication time between the parties is remarkably reduced by performing node splitting judgment by using batch processing; by utilizing parallel computing, the computing time of the parties is significantly reduced.
Example 2:
Embodiment 2 of the present disclosure provides a data providing device comprising a processor, where the processor is communicatively connected to an external second terminal and is configured to:
preprocess the held data sequence;
receive a prediction suffix tree and a node queue, sent by the second terminal, that contain only the root node, judge in a batch processing mode and under the condition that differential privacy is satisfied whether the nodes in the node queue need to be split, and send the judgment result to the second terminal so that the second terminal obtains the final structure of the prediction suffix tree;
and, under the condition that differential privacy is satisfied, calculate the prediction histogram of the node to obtain the parameters of the prediction suffix tree, and send the parameters to the second terminal, so that the second terminal generates a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
The device works in the same way as the method in Embodiment 1, and the description is not repeated here.
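The histogram step can be illustrated by a single-machine Python sketch: Laplace noise is added to every dimension of each leaf histogram, and the histogram of a non-leaf node is the sum of the noisy histograms of the leaves in its subtree, as set out in the claims. The multi-party aspect of the method is not modelled here, and the function names, the sensitivity of 1, and the way ε is passed in are assumptions made for illustration.

```python
import random
from typing import Dict, List

Histogram = Dict[str, float]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponential
    variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_leaf_histogram(counts: Dict[str, int], epsilon: float,
                         rng: random.Random, sensitivity: float = 1.0) -> Histogram:
    """Add independent Laplace noise to every dimension of a leaf histogram."""
    scale = sensitivity / epsilon
    return {symbol: count + laplace_noise(scale, rng) for symbol, count in counts.items()}


def sum_histograms(histograms: List[Histogram]) -> Histogram:
    """Histogram of a non-leaf node: dimension-wise sum over the leaves of its subtree."""
    total: Histogram = {}
    for histogram in histograms:
        for symbol, value in histogram.items():
            total[symbol] = total.get(symbol, 0.0) + value
    return total
```

Because non-leaf histograms are derived only from already-noised leaf histograms, computing them consumes no additional privacy budget in this sketch.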
Example 3:
Embodiment 3 of the present disclosure provides a multi-party sequence data issuing method satisfying differential privacy, applied to a second terminal and comprising the following steps:
initializing a prediction suffix tree containing only a root node, initializing a node queue for storing nodes that have not been traversed, and inserting the root node into the queue;
receiving the node splitting judgment result sent by a first terminal, and obtaining the final structure of the prediction suffix tree once all nodes have been split;
and receiving the prediction histogram of the node sent by the first terminal to obtain the parameters of the prediction suffix tree, and generating a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
The detailed method is the same as in Embodiment 1 and is not described again here.
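The last step, generating a sequence from the structure and parameters of the prediction suffix tree, can be sketched in Python as follows. The sketch follows the generation procedure recited in the claims (start symbol $, end symbol &, one symbol appended per step), with several assumptions of its own: the tree is represented as a dictionary from prediction sequence to histogram, the matching node is taken to be the one whose prediction sequence is the longest suffix of the current sequence, negative noisy counts are clamped to zero before sampling, and `max_len` is a hypothetical safety bound.

```python
import random
from typing import Dict

START, END = "$", "&"
Tree = Dict[str, Dict[str, float]]  # prediction sequence -> histogram over next symbols


def find_context(tree: Tree, sequence: str) -> str:
    """Return the longest suffix of `sequence` that is a node of the tree.
    The empty string stands for the root node."""
    for start in range(len(sequence) + 1):
        suffix = sequence[start:]
        if suffix in tree:
            return suffix
    raise KeyError("the tree must contain the root context ''")


def generate_sequence(tree: Tree, rng: random.Random, max_len: int = 50) -> str:
    """Start from '$' and append one sampled symbol per step until '&' is drawn."""
    sequence = START
    while len(sequence) < max_len:
        histogram = tree[find_context(tree, sequence)]
        symbols = list(histogram)
        weights = [max(histogram[s], 0.0) for s in symbols]  # clamp negative noisy counts
        if sum(weights) <= 0:
            break  # no usable probability mass at this node
        symbol = rng.choices(symbols, weights=weights, k=1)[0]
        sequence += symbol
        if symbol == END:
            break
    return sequence
```

For example, with the toy tree `{"": {"&": 1.0}, "$": {"a": 1.0}, "a": {"&": 2.0}}` the only sequence that can be generated is `$a&`.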
Example 4:
Embodiment 4 of the present disclosure provides a multi-party sequence data issuing device satisfying differential privacy, comprising a processor that is communicatively connected to a first terminal and is configured to:
initialize a prediction suffix tree containing only a root node, initialize a node queue for storing nodes that have not been traversed, and insert the root node into the queue;
receive the node splitting judgment result sent by the first terminal, and obtain the final structure of the prediction suffix tree once all nodes have been split;
and receive the prediction histogram of the node sent by the first terminal to obtain the parameters of the prediction suffix tree, and generate a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
The device works in the same way as the method in Embodiment 1, and the description is not repeated here.
Example 5:
Embodiment 5 of the present disclosure provides a multi-party sequence data issuing method satisfying differential privacy, comprising the following steps:
each first terminal preprocesses the data sequence it holds and keeps the preprocessed data sequence at the first terminal;
the second terminal initializes a prediction suffix tree containing only a root node, initializes a node queue for storing nodes that have not been traversed, and inserts the root node into the queue;
the second terminal, jointly with the first terminal, judges in a batch processing mode and under the condition that differential privacy is satisfied whether the nodes in the node queue need to be split, and the first terminal sends the judgment result to the second terminal;
when all the nodes in the node queue have been split, the second terminal obtains the final structure of the prediction suffix tree;
the second terminal, jointly with the first terminal and under the condition that differential privacy is satisfied, calculates the prediction histogram of the node to obtain the parameters of the prediction suffix tree, and the parameters are sent to the second terminal;
the second terminal generates a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
The detailed method is the same as in Embodiment 1 and is not described again here.
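The queue-driven construction of the tree structure described in these steps can be outlined with the following Python sketch. The joint, differentially private split decision is replaced by a hypothetical callback `judge_batch`, which receives a batch of nodes and returns one boolean per node; the representation of a node by its prediction sequence, the creation of children by prepending one symbol, and the parameters `batch_size` and `max_depth` are likewise assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence


def build_pst_structure(alphabet: Sequence[str],
                        judge_batch: Callable[[List[str]], List[bool]],
                        batch_size: int = 4,
                        max_depth: int = 5) -> Dict[str, List[str]]:
    """Return a mapping from each node (its prediction sequence) to its children.

    `judge_batch` stands in for the joint node splitting discrimination
    protocol: it receives a batch of nodes and answers, per node, whether
    the node should be split.
    """
    root = ""
    children: Dict[str, List[str]] = {root: []}
    queue = deque([root])  # nodes that have not been traversed yet
    while queue:
        # Take up to batch_size nodes so that they are judged in one round.
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        decisions = judge_batch(batch)
        for context, split in zip(batch, decisions):
            if split and len(context) < max_depth:
                for symbol in alphabet:
                    child = symbol + context  # extend the context one step into the past
                    children[context].append(child)
                    children[child] = []
                    queue.append(child)
    return children
```

When the queue is empty, every node has either been split or marked as a leaf, and the returned mapping is the final structure to which the histogram parameters are then attached.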
Example 6:
As shown in FIG. 9, Embodiment 6 of the present disclosure provides a multi-party sequence data distribution system satisfying differential privacy, comprising at least two first terminals and at least one second terminal, where each first terminal is communicatively connected to the second terminal;
each first terminal preprocesses the data sequence it holds and keeps the preprocessed data sequence at the first terminal;
the second terminal initializes a prediction suffix tree containing only a root node, initializes a node queue for storing nodes that have not been traversed, and inserts the root node into the queue;
the second terminal, jointly with the first terminal, judges in a batch processing mode and under the condition that differential privacy is satisfied whether the nodes in the node queue need to be split, and the first terminal sends the judgment result to the second terminal;
when all the nodes in the node queue have been split, the second terminal obtains the final structure of the prediction suffix tree;
the second terminal, jointly with the first terminal and under the condition that differential privacy is satisfied, calculates the prediction histogram of the node to obtain the parameters of the prediction suffix tree, and the parameters are sent to the second terminal;
the second terminal generates a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
The system works in the same way as the method in Embodiment 1, and the description is not repeated here.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. A multi-party sequence data issuing method satisfying differential privacy, applied to a first terminal and comprising the following steps:
preprocessing the held data sequence;
receiving a prediction suffix tree and a node queue, sent by a second terminal, that contain only the root node; judging, according to the preprocessed data sequence, in a batch processing mode and under the condition that differential privacy is satisfied, whether the nodes in the node queue need to be split; and sending the judgment result to the second terminal so that the second terminal obtains the final structure of the prediction suffix tree;
and, under the condition that differential privacy is satisfied, calculating a prediction histogram of the node to obtain the parameters of the prediction suffix tree, and sending the parameters to the second terminal, so that the second terminal generates a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
2. The multi-party sequence data issuing method satisfying differential privacy as claimed in claim 1, wherein the preprocessing specifically is: adding a start symbol and an end symbol for the data sequence, and truncating the data sequence with the length larger than a preset threshold value;
or the first terminal and the second terminal carry out data interaction, and jointly execute a node splitting discrimination protocol to judge whether each node needs to be split or not;
or, the batch processing mode specifically includes: dividing tasks in a data block mode, wherein each block comprises a plurality of tasks, the tasks in each block are from different nodes, the number of the tasks from the different nodes is different, and the total number of the tasks in each block is fixed;
for tasks from a certain node, their positions in the blocks are continuously slid downwards, so that the task at the lowest end of any one block is the last task of a certain node;
or, calculating the prediction histogram of the node specifically comprises:
calculating the suffix histograms of all leaf nodes and injecting Laplace noise into each dimension of each suffix histogram, wherein every noise injection satisfies differential privacy;
for every non-leaf node, the suffix histogram is the sum of the suffix histograms of all leaf nodes in the subtree rooted at that node;
or, each sequence of the new overall sequence data set is generated as follows:
initializing the sequence s_0 = $, and then inserting symbols one at a time at the end of the sequence;
the i-th insertion is: for the currently generated sequence s_{i-1} = $x_1x_2…x_{i-1}, finding the node in τ whose prediction sequence equals the currently generated sequence, selecting a symbol x_i according to a preset probability distribution, and inserting x_i at the end of s_{i-1}, which generates the new subsequence s_i;
if x_i ≠ &, continuing the insertion process; otherwise, ending the sequence generation, where $ is the start symbol and & is the end symbol.
3. A data providing apparatus, comprising a processor communicatively coupled to an external second terminal, the processor configured to:
preprocessing the held data sequence;
receiving a predicted suffix tree and a node queue which are sent by a second terminal and only comprise root nodes, judging whether the nodes in the node queue need to be split or not by adopting a batch processing mode according to a preprocessed data sequence under the condition of meeting differential privacy, and sending a judgment result to the second terminal so that the second terminal obtains the final structure of the predicted suffix tree;
and, under the condition that differential privacy is satisfied, calculating a prediction histogram of the node to obtain the parameters of the prediction suffix tree, and sending the parameters to the second terminal, so that the second terminal generates a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
4. The data providing device according to claim 3, wherein the preprocessing is specifically: adding a start symbol and an end symbol for the data sequence, and truncating the data sequence with the length larger than a preset threshold value;
or the processor and the second terminal perform data interaction, jointly execute a node splitting discrimination protocol and judge whether each node needs to be split or not;
or, the batch processing mode specifically includes: the task division is carried out in a data block mode, each block comprises a plurality of tasks, the tasks in each block are from different nodes, the number of the tasks from the different nodes is different, the total number of the tasks in each block is fixed, and for the tasks from a certain node, the positions of the tasks in the block continuously slide downwards, so that the task at the lowest end of any one block is the last task of the certain node;
or, calculating the prediction histogram of the node specifically comprises:
calculating the suffix histograms of all leaf nodes and injecting Laplace noise into each dimension of each suffix histogram, wherein every noise injection satisfies differential privacy;
for every non-leaf node, the suffix histogram is the sum of the suffix histograms of all leaf nodes in the subtree rooted at that node;
or, each sequence of the new overall sequence data set is generated as follows:
initializing the sequence s_0 = $, and then inserting symbols one at a time at the end of the sequence;
the i-th insertion is: for the currently generated sequence s_{i-1} = $x_1x_2…x_{i-1}, finding the node in τ whose prediction sequence equals the currently generated sequence, selecting a symbol x_i according to a preset probability distribution, and inserting x_i at the end of s_{i-1}, which generates the new subsequence s_i;
if x_i ≠ &, continuing the insertion process; otherwise, ending the sequence generation, where $ is the start symbol and & is the end symbol.
5. A multi-party sequence data issuing method satisfying differential privacy, applied to a second terminal and comprising the following steps:
initializing a prediction suffix tree only containing a root node, initializing a node queue for storing nodes which are not traversed, and inserting the root node into the queue;
receiving a node splitting judgment result sent by a first terminal, and obtaining a final structure of the prediction suffix tree when all nodes are split;
and receiving a prediction histogram of a node sent by the first terminal to obtain parameters of a prediction suffix tree, and generating a group of new overall sequence data sets according to the structure and the parameters of the prediction suffix tree.
6. The multi-party sequence data issuing method satisfying differential privacy as claimed in claim 5, wherein the preprocessing specifically includes: adding a start symbol and an end symbol for the data sequence, and truncating the data sequence with the length larger than a preset threshold value;
or the second terminal is combined with the first terminal to jointly execute a node splitting discrimination protocol and judge whether each node needs to be split or not;
or, the batch processing mode specifically includes: the task division is carried out in a data block mode, each block comprises a plurality of tasks, the tasks in each block are from different nodes, the number of the tasks from the different nodes is different, the total number of the tasks in each block is fixed, and for the tasks from a certain node, the positions of the tasks in the block continuously slide downwards, so that the task at the lowest end of any one block is the last task of the certain node;
or, calculating the prediction histogram of the node specifically comprises:
calculating the suffix histograms of all leaf nodes and injecting Laplace noise into each dimension of each suffix histogram, wherein every noise injection satisfies differential privacy;
for every non-leaf node, the suffix histogram is the sum of the suffix histograms of all leaf nodes in the subtree rooted at that node;
or, each sequence of the new overall sequence data set is generated as follows:
initializing the sequence s_0 = $, and then inserting symbols one at a time at the end of the sequence;
the i-th insertion is: for the currently generated sequence s_{i-1} = $x_1x_2…x_{i-1}, finding the node in τ whose prediction sequence equals the currently generated sequence, selecting a symbol x_i according to a preset probability distribution, and inserting x_i at the end of s_{i-1}, which generates the new subsequence s_i;
if x_i ≠ &, continuing the insertion process; otherwise, ending the sequence generation, where $ is the start symbol and & is the end symbol.
7. A multi-party sequence data distribution device satisfying differential privacy, comprising a processor communicatively coupled to a first terminal, the processor configured to:
initializing a prediction suffix tree only containing a root node, initializing a node queue for storing nodes which are not traversed, and inserting the root node into the queue;
receiving a node splitting judgment result sent by a first terminal, and obtaining a final structure of the prediction suffix tree when all nodes are split;
and receiving a prediction histogram of a node sent by the first terminal to obtain parameters of a prediction suffix tree, and generating a group of new overall sequence data sets according to the structure and the parameters of the prediction suffix tree.
8. The multi-party sequence data distribution device meeting the differential privacy requirement of claim 7, wherein the preprocessing specifically comprises: adding a start symbol and an end symbol for the data sequence, and truncating the data sequence with the length larger than a preset threshold value;
or the second terminal is combined with the first terminal to jointly execute a node splitting discrimination protocol and judge whether each node needs to be split or not;
or, the batch processing mode specifically includes: the task division is carried out in a data block mode, each block comprises a plurality of tasks, the tasks in each block are from different nodes, the number of the tasks from the different nodes is different, the total number of the tasks in each block is fixed, and for the tasks from a certain node, the positions of the tasks in the block continuously slide downwards, so that the task at the lowest end of any one block is the last task of the certain node;
or, calculating the prediction histogram of the node specifically comprises:
calculating the suffix histograms of all leaf nodes and injecting Laplace noise into each dimension of each suffix histogram, wherein every noise injection satisfies differential privacy;
for every non-leaf node, the suffix histogram is the sum of the suffix histograms of all leaf nodes in the subtree rooted at that node;
or, each sequence of the new overall sequence data set is generated as follows:
initializing the sequence s_0 = $, and then inserting symbols one at a time at the end of the sequence;
the i-th insertion is: for the currently generated sequence s_{i-1} = $x_1x_2…x_{i-1}, finding the node in τ whose prediction sequence equals the currently generated sequence, selecting a symbol x_i according to a preset probability distribution, and inserting x_i at the end of s_{i-1}, which generates the new subsequence s_i;
if x_i ≠ &, continuing the insertion process; otherwise, ending the sequence generation, where $ is the start symbol and & is the end symbol.
9. A multi-party sequence data issuing method satisfying differential privacy, characterized by comprising the following steps:
each first terminal preprocesses the data sequence it holds;
the second terminal initializes a prediction suffix tree containing only a root node, initializes a node queue for storing nodes that have not been traversed, and inserts the root node into the queue;
the second terminal, jointly with the first terminal, judges according to the preprocessed data sequence, in a batch processing mode and under the condition that differential privacy is satisfied, whether the nodes in the node queue need to be split, and the first terminal sends the judgment result to the second terminal;
when all the nodes in the node queue have been split, the second terminal obtains the final structure of the prediction suffix tree;
the second terminal, jointly with the first terminal and under the condition that differential privacy is satisfied, calculates the prediction histogram of the node to obtain the parameters of the prediction suffix tree, and the parameters are sent to the second terminal;
the second terminal generates a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
10. A multi-party sequence data distribution system satisfying differential privacy, characterized by comprising at least two first terminals and at least one second terminal, wherein each first terminal is communicatively connected to the second terminal;
each first terminal preprocesses the data sequence it holds and keeps the preprocessed data sequence at the first terminal;
the second terminal initializes a prediction suffix tree containing only a root node, initializes a node queue for storing nodes that have not been traversed, and inserts the root node into the queue;
the second terminal, jointly with the first terminal, judges according to the preprocessed data sequence, in a batch processing mode and under the condition that differential privacy is satisfied, whether the nodes in the node queue need to be split, and the first terminal sends the judgment result to the second terminal;
when all the nodes in the node queue have been split, the second terminal obtains the final structure of the prediction suffix tree;
the second terminal, jointly with the first terminal and under the condition that differential privacy is satisfied, calculates the prediction histogram of the node to obtain the parameters of the prediction suffix tree, and the parameters are sent to the second terminal;
the second terminal generates a new overall sequence data set according to the structure and parameters of the prediction suffix tree.
CN202010541485.0A 2020-06-15 2020-06-15 Multi-party sequence data issuing method and system meeting difference privacy Active CN111737740B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010541485.0A CN111737740B (en) 2020-06-15 2020-06-15 Multi-party sequence data issuing method and system meeting difference privacy

Publications (2)

Publication Number Publication Date
CN111737740A true CN111737740A (en) 2020-10-02
CN111737740B CN111737740B (en) 2022-11-01

Family

ID=72649122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010541485.0A Active CN111737740B (en) 2020-06-15 2020-06-15 Multi-party sequence data issuing method and system meeting difference privacy

Country Status (1)

Country Link
CN (1) CN111737740B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114218602A (en) * 2021-12-10 2022-03-22 南京航空航天大学 Differential privacy heterogeneous multi-attribute data publishing method based on vertical segmentation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050065823A1 (en) * 2003-09-23 2005-03-24 Siemens Medical Solutions Usa, Inc. Method and apparatus for privacy checking
CN110874488A (en) * 2019-11-15 2020-03-10 哈尔滨工业大学(深圳) Stream data frequency counting method, device and system based on mixed differential privacy and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
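Tang Peng (唐朋): "Research on Multi-Party Data Publishing Technology Satisfying Differential Privacy" (满足差分隐私的多方数据发布技术研究), China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology series *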

Also Published As

Publication number Publication date
CN111737740B (en) 2022-11-01

Similar Documents

Publication Publication Date Title
CN109189991B (en) Duplicate video identification method, device, terminal and computer readable storage medium
CN110532369B (en) Question and answer pair generation method and device and server
WO2018188576A1 (en) Resource pushing method and device
EP2924594A1 (en) Data encoding and corresponding data structure in a column-store database
JP7457125B2 (en) Translation methods, devices, electronic equipment and computer programs
CN107608773A (en) task concurrent processing method, device and computing device
Zhang et al. SUMMA: subgraph matching in massive graphs
CN111737740B (en) Multi-party sequence data issuing method and system meeting difference privacy
CN110275889B (en) Feature processing method and device suitable for machine learning
Shen et al. Deep learning convolutional neural networks with dropout-a parallel approach
CN109147868A (en) Protein function prediction technique, device, equipment and storage medium
CN108399266B (en) Data extraction method and device, electronic equipment and computer readable storage medium
CN110704424A (en) Sorting method and device applied to database and related equipment
CN106911777A (en) A kind of data processing method and server
Moghaddam et al. A general framework for sorting large data sets using independent subarrays of approximately equal length
CN113821657A (en) Artificial intelligence-based image processing model training method and image processing method
CN112766390A (en) Method, device and equipment for determining training sample
CN108304467A (en) For matched method between text
CN106569986A (en) Character string replacement method and device
CN110069772A (en) Predict device, method and the storage medium of the scoring of question and answer content
EP3663890A1 (en) Alignment method, device and system
CN112381169B (en) Image identification method and device, electronic equipment and readable storage medium
CN114741363A (en) Method, device and program product for processing and acquiring skeleton resources of virtual image
CN108763871B (en) Hole filling method and device based on third-generation sequencing sequence
CN113139102A (en) Data processing method, data processing device, nonvolatile storage medium and processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant