CN112995076B - Discrete data frequency estimation method, user side, data center and system - Google Patents

Discrete data frequency estimation method, user side, data center and system

Info

Publication number
CN112995076B
Authority
CN
China
Prior art keywords
discrete data
codes
data
code
discrete
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911298496.4A
Other languages
Chinese (zh)
Other versions
CN112995076A (en)
Inventor
刘莹
朱洪斌
刘圣龙
赵涛
王衡
周鑫
王迪
毛一凡
崔硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Big Data Center Of State Grid Corp Of China
Original Assignee
Big Data Center Of State Grid Corp Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Big Data Center Of State Grid Corp Of China
Priority to CN201911298496.4A
Publication of CN112995076A
Application granted
Publication of CN112995076B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L25/00 Baseband systems
    • H04L25/02 Details; arrangements for supplying electrical power along data transmission lines
    • H04L25/08 Modifications for reducing interference; Modifications for reducing effects due to line faults; Receiver end arrangements for detecting or overcoming line faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a discrete data frequency estimation method, a user side, a data center and a system. The method comprises the following steps: the user side generates a discrete data code according to the type of the discrete data sent to the data center; the user side obtains the scrambling code corresponding to the discrete data code and sends it to the data center; the data center receives the scrambling codes corresponding to the discrete data codes of the user sides; and the data center determines the occurrence frequency of each type of discrete data according to those scrambling codes. In this scheme, the user side injects less noise into the original data by following the definition of loose local differential privacy, which reduces data distortion as much as possible while still satisfying local differential privacy, improves the usability of the scrambled data, and in turn improves the accuracy of the statistical result.

Description

Discrete data frequency estimation method, user side, data center and system
Technical Field
The invention relates to the field of power grid information control, in particular to a discrete data frequency estimation method, a user side, a data center and a system.
Background
In the field of production control, including but not limited to power grid information control, service data from different regions and departments often needs to be collected at a data center, where joint analysis yields the occurrence frequency of a given service event for further service analysis. This involves a separation of data ownership from data usage rights: the data is owned by the individual regions and departments, while the analysis results can be shared. Joint data analysis therefore has to be carried out while keeping each party's data confidential.
At present, the business data of different departments in the same region is collected directly at the data center, which carries a risk of leaking sensitive data; as the key node of the parties' joint work, the data center bears a heavy data protection responsibility. In addition, to keep their data secure and avoid liability for data security, the parties become much less willing to share data, which hinders the development of data services. A technique is therefore urgently needed in which each independent party applies local differential privacy processing to its own data, so that joint analysis can be performed while each party's data privacy is protected.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to reduce the injection of noise on the original data according to the definition of loose local differential privacy, reduce the distortion degree of the data as much as possible on the basis of meeting the local differential privacy, improve the usability of the disturbed data and further improve the accuracy of the statistical result.
The purpose of the invention is realized by adopting the following technical scheme:
the invention provides a discrete data frequency estimation method, which is applied to a user terminal, and the improvement is that the method comprises the following steps:
generating discrete data codes according to the types of the discrete data sent to the data center;
and acquiring a scrambling code corresponding to the discrete data code, and sending the scrambling code corresponding to the discrete data code to a data center.
Preferably, the length of the discrete data codes is equal to the total number of discrete data types.
Further, the discrete data code is (v_1, ..., v_i, ..., v_n), where n is the total number of discrete data types and v_i is the code value corresponding to the i-th type of discrete data: if the type of the discrete data sent by the user side to the data center is the i-th type, v_i = 1; otherwise, v_i = 0.
Preferably, the obtaining of the scrambling code corresponding to the discrete data code includes:
acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
and determining the scrambling codes corresponding to the discrete data codes based on the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes.
Further, the obtaining of the conversion probability of the code value corresponding to each type of discrete data in the discrete data coding includes:
determining, according to the following formula, the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 0:
[formula image BDA0002321239460000021]
determining, according to the following formula, the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 1:
[formula image BDA0002321239460000022]
in the above formulas, ε is the privacy protection budget; δ is a parameter under loose local differential privacy, with a value between 0 and 1; [formula image BDA0002321239460000023] is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code; [formula image BDA0002321239460000024] is the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 0; and [formula image BDA0002321239460000025] is the probability that this code value is converted into 1.
Further, the determining, based on the transition probability of the code value corresponding to each type of discrete data in the discrete data codes, a scrambling code corresponding to the discrete data codes includes:
from the set {0,1}, drawing 0 with probability [formula image BDA0002321239460000026] and drawing 1 with probability [formula image BDA0002321239460000027]; if 0 is drawn, then [formula image BDA0002321239460000028]; if 1 is drawn, then [formula image BDA0002321239460000029].
The invention provides a user terminal applied to discrete data frequency estimation, and the improvement is that the user terminal comprises:
the generating module is used for generating discrete data codes according to the types of the discrete data sent to the data center;
the acquisition module is used for acquiring a scrambling code corresponding to the discrete data code;
and the sending module is used for sending the scrambling codes corresponding to the discrete data codes to the data center.
Preferably, the length of the discrete data codes is equal to the total number of discrete data types.
Further, the discrete data code is (v_1, ..., v_i, ..., v_n), where n is the total number of discrete data types and v_i is the code value corresponding to the i-th type of discrete data: if the type of the discrete data sent by the user side to the data center is the i-th type, v_i = 1; otherwise, v_i = 0.
Preferably, the obtaining module includes:
the acquisition unit is used for acquiring the conversion probability of the code value corresponding to each type of discrete data in the discrete data codes;
and the determining unit is used for determining the scrambling codes corresponding to the discrete data codes based on the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes.
Further, the obtaining unit is specifically configured to:
determining, according to the following formula, the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 0:
[formula image BDA0002321239460000031]
determining, according to the following formula, the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 1:
[formula image BDA0002321239460000032]
in the above formulas, ε is the privacy protection budget; δ is a parameter under loose local differential privacy, with a value between 0 and 1; [formula image BDA0002321239460000033] is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code; [formula image BDA0002321239460000034] is the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 0; and [formula image BDA0002321239460000035] is the probability that this code value is converted into 1.
Further, the determining unit is specifically configured to:
from the set {0,1}, drawing 0 with probability [formula image BDA0002321239460000036] and drawing 1 with probability [formula image BDA0002321239460000037]; if 0 is drawn, then [formula image BDA0002321239460000038]; if 1 is drawn, then [formula image BDA0002321239460000039].
The invention provides a discrete data frequency estimation method, which is applied to a data center, and the improvement is that the method comprises the following steps:
receiving a scrambling code corresponding to the discrete data code of each user side;
and determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
Preferably, the determining the occurrence frequency of each type of discrete data according to the scrambling code corresponding to the discrete data code of each user side includes:
counting, among the scrambling codes corresponding to the discrete data codes of the user sides, the frequency [formula image BDA0002321239460000041] with which the scrambling code value corresponding to the i-th type of discrete data is 0 and the frequency [formula image BDA0002321239460000042] with which it is 1;
establishing, based on [formula image BDA0002321239460000043] and [formula image BDA0002321239460000044], the occurrence frequency equation set of the i-th type of discrete data;
and solving the occurrence frequency equation set of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data.
Further, the occurrence frequency equation set of the i-th type of discrete data is as follows:
[formula image BDA0002321239460000045]
in the above formula, f_0(i) is the frequency with which the i-th type of discrete data does not occur, f_1(i) is the frequency with which the i-th type of discrete data occurs, ε is the privacy protection budget, and δ is a parameter under loose local differential privacy, with a value between 0 and 1.
The present invention provides a data center for use in discrete data frequency estimation, the improvement wherein the data center comprises:
the receiving module is used for receiving the scrambling codes corresponding to the discrete data codes of the user sides;
and the determining module is used for determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
Preferably, the determining module includes:
the statistics unit is used for counting, among the scrambling codes corresponding to the discrete data codes of the user sides, the frequency [formula image BDA0002321239460000046] with which the scrambling code value corresponding to the i-th type of discrete data is 0 and the frequency [formula image BDA0002321239460000047] with which it is 1;
the establishing unit is used for establishing, based on [formula image BDA0002321239460000048] and [formula image BDA0002321239460000049], the occurrence frequency equation set of the i-th type of discrete data;
and the solving unit is used for solving the occurrence frequency equation set of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data.
Further, the occurrence frequency equation set of the i-th type of discrete data is as follows:
[formula image BDA00023212394600000410]
in the above formula, f_0(i) is the frequency with which the i-th type of discrete data does not occur, f_1(i) is the frequency with which the i-th type of discrete data occurs, ε is the privacy protection budget, and δ is a parameter under loose local differential privacy, with a value between 0 and 1.
The invention provides a method of discrete data frequency estimation, the improvement being that the method comprises:
the user side generates a discrete data code according to the type of the discrete data sent to the data center;
the user side obtains the scrambling code corresponding to the discrete data code and sends the scrambling code corresponding to the discrete data code to the data center;
the data center receives the scrambling codes corresponding to the discrete data codes of the user sides;
and the data center determines the occurrence frequency of each type of discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
The present invention provides a discrete data frequency estimation system, the improvement wherein said system comprises: the user side and the data center.
Compared with the closest prior art, the invention has the following beneficial effects:
in the technical scheme provided by the invention, the user side generates a discrete data code according to the type of the discrete data sent to the data center, randomly scrambles the code values corresponding to the various types of discrete data in the discrete data code, and sends the resulting scrambling code to the data center; the data processed in this way meets the privacy requirement, and the risk of privacy disclosure is avoided.
After the data center receives the scrambling codes corresponding to the discrete data codes of the user sides, it determines the occurrence frequency of each type of discrete data according to those scrambling codes.
Drawings
FIG. 1 is a flow chart of the discrete data frequency estimation method provided by the present invention;
FIG. 2 is a schematic structural diagram of the user side applied to discrete data frequency estimation provided by the present invention;
FIG. 3 is a schematic structural diagram of the data center applied to discrete data frequency estimation provided by the present invention;
FIG. 4 is a schematic structural diagram of the discrete data frequency estimation system provided by the present invention.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In order to carry out joint data analysis while keeping each party's data confidential, the discrete data frequency estimation method provided by the invention introduces the definition of loose local differential privacy on the basis of the existing scheme, and provides a discrete data frequency estimation scheme that satisfies loose local differential privacy. The main idea is that the user side injects less noise into the original data according to the definition of loose local differential privacy, reducing data distortion as much as possible while still satisfying local differential privacy, which improves the usability of the scrambled data and in turn the accuracy of the statistical result. As shown in FIG. 1, the method includes:
Step 101: the user side generates a discrete data code according to the type of the discrete data sent to the data center;
Step 102: the user side obtains the scrambling code corresponding to the discrete data code and sends the scrambling code corresponding to the discrete data code to the data center;
Step 103: the data center receives the scrambling codes corresponding to the discrete data codes of the user sides;
Step 104: the data center determines the occurrence frequency of each type of discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
Wherein the length of the discrete data code is equal to the total number of the discrete data types.
The discrete data code is (v_1, ..., v_i, ..., v_n), where n is the total number of discrete data types and v_i is the code value corresponding to the i-th type of discrete data: if the type of the discrete data sent by the user side to the data center is the i-th type, v_i = 1; otherwise, v_i = 0.
For example, each user side holds one item of discrete data from the discrete data set S. Each user side first performs one-hot encoding on its own data d_i, that is, it obtains a unit vector v_i of length m in which only the position corresponding to d_i is 1 and all other positions are 0. Specifically, if d_i is the j-th item (j ≤ m) in the discrete data set, the j-th bit of the unit vector v_i is 1 and the remaining bits are 0.
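The one-hot encoding in this example can be sketched as follows. This is a minimal illustration only: the function name `one_hot_encode` and the event types in `CATEGORIES` are hypothetical and do not come from the patent.

```python
from typing import List, Sequence


def one_hot_encode(value: str, categories: Sequence[str]) -> List[int]:
    """Return a 0/1 code whose length equals the number of discrete data types.

    Only the position of `value` in `categories` is set to 1.
    """
    if value not in categories:
        raise ValueError(f"unknown discrete data type: {value!r}")
    return [1 if category == value else 0 for category in categories]


# Hypothetical discrete data set S, used only for illustration.
CATEGORIES = ["outage", "overload", "maintenance", "normal"]

code = one_hot_encode("overload", CATEGORIES)
print(code)
```

The code length equals the number of types, so every user side sends a vector of the same shape regardless of which type it holds.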
Specifically, in the embodiment provided by the present invention, step 101 and step 102 may be applied to the user side, where in step 102, acquiring the scrambling code corresponding to the discrete data code includes:
acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
and determining the scrambling codes corresponding to the discrete data codes based on the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes.
Further, the obtaining of the conversion probability of the code value corresponding to each type of discrete data in the discrete data coding includes:
determining, according to the following formula, the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 0:
[formula image BDA0002321239460000071]
determining, according to the following formula, the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 1:
[formula image BDA0002321239460000072]
in the above formulas, ε is the privacy protection budget; δ is a parameter under loose local differential privacy, with a value between 0 and 1; [formula image BDA0002321239460000073] is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code; [formula image BDA0002321239460000074] is the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 0; and [formula image BDA0002321239460000075] is the probability that this code value is converted into 1.
Wherein, δ is generally a value greater than 0 and much smaller than 1, and when δ is 0, the privacy protection mechanism satisfies the local differential privacy under strict definition. In this application, a discrete data frequency estimation method satisfying loose local differential privacy is mainly discussed.
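For reference, the relaxed definition is commonly written as below. This is the standard (ε, δ) formulation of local differential privacy, which is assumed here to be what the text calls loose local differential privacy; the symbols M (the randomized scrambling mechanism), v and v' (any two possible inputs of a user side) and S (any set of possible outputs) are notation introduced for this sketch and do not appear in the original.

```latex
\Pr[M(v) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(v') \in S] + \delta
\qquad \text{for all inputs } v, v' \text{ and all output sets } S
```

Setting δ = 0 removes the additive slack and gives the strict definition, which matches the remark above.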
Further, the determining, based on the transition probabilities of the code values corresponding to various types of discrete data in the discrete data codes, a scrambling code corresponding to the discrete data codes includes:
from the set {0,1}, drawing 0 with probability [formula image BDA0002321239460000076] and drawing 1 with probability [formula image BDA0002321239460000077]; if 0 is drawn, then [formula image BDA0002321239460000078]; if 1 is drawn, then [formula image BDA0002321239460000079].
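The per-bit draw described above can be sketched as follows. The conversion probability formulas themselves are not recoverable from this text, so the sketch takes them as inputs: `p_keep_one` (probability that a code value of 1 is reported as 1) and `p_flip_zero` (probability that a code value of 0 is reported as 1) are hypothetical parameter names, and the numeric values used below are placeholders, not values derived from ε and δ.

```python
import random
from typing import List, Optional, Sequence


def perturb_code(
    code: Sequence[int],
    p_keep_one: float,
    p_flip_zero: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Scramble each code value independently by drawing from {0, 1}.

    p_keep_one:  assumed probability that an original 1 becomes 1.
    p_flip_zero: assumed probability that an original 0 becomes 1.
    """
    if rng is None:
        rng = random.Random()
    scrambled = []
    for bit in code:
        # Probability of drawing 1 depends on the original code value.
        p_one = p_keep_one if bit == 1 else p_flip_zero
        scrambled.append(1 if rng.random() < p_one else 0)
    return scrambled


# Placeholder probabilities; in the described method they would be
# computed from the privacy budget and the relaxation parameter.
scrambled = perturb_code([0, 1, 0, 0], p_keep_one=0.75, p_flip_zero=0.25,
                         rng=random.Random(42))
print(scrambled)
```

Because every position is drawn independently, the scrambling code may contain several 1s or none, which is why the data center has to correct the raw counts rather than read them directly.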
Based on the technical solutions of step 101 and step 102, the present invention provides a user side for discrete data frequency estimation, as shown in FIG. 2, the user side includes:
the generating module is used for generating discrete data codes according to the types of the discrete data sent to the data center;
the acquisition module is used for acquiring a scrambling code corresponding to the discrete data code;
and the sending module is used for sending the scrambling codes corresponding to the discrete data codes to the data center.
Preferably, the length of the discrete data codes is equal to the total number of discrete data types.
Further, the discrete data code is (v_1, ..., v_i, ..., v_n), where n is the total number of discrete data types and v_i is the code value corresponding to the i-th type of discrete data: if the type of the discrete data sent by the user side to the data center is the i-th type, v_i = 1; otherwise, v_i = 0.
Preferably, the obtaining module includes:
the acquisition unit is used for acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
and the determining unit is used for determining the scrambling codes corresponding to the discrete data codes based on the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes.
Further, the obtaining unit is specifically configured to:
determining, according to the following formula, the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 0:
[formula image BDA0002321239460000081]
determining, according to the following formula, the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 1:
[formula image BDA0002321239460000082]
in the above formulas, ε is the privacy protection budget; δ is a parameter under loose local differential privacy, with a value between 0 and 1; [formula image BDA0002321239460000083] is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code; [formula image BDA0002321239460000084] is the probability that the code value corresponding to the i-th type of discrete data in the discrete data code is converted into 0; and [formula image BDA0002321239460000085] is the probability that this code value is converted into 1.
Further, the determining unit is specifically configured to:
from the set {0,1}, drawing 0 with probability [formula image BDA0002321239460000086] and drawing 1 with probability [formula image BDA0002321239460000087]; if 0 is drawn, then [formula image BDA0002321239460000088]; if 1 is drawn, then [formula image BDA0002321239460000089].
In the embodiment provided by the present invention, step 103 and step 104 may be applied to a data center, where step 104 includes:
counting, among the scrambling codes corresponding to the discrete data codes of the user sides, the frequency [formula image BDA00023212394600000810] with which the scrambling code value corresponding to the i-th type of discrete data is 0 and the frequency [formula image BDA00023212394600000811] with which it is 1;
establishing, based on [formula image BDA00023212394600000812] and [formula image BDA00023212394600000813], the occurrence frequency equation set of the i-th type of discrete data;
and solving the occurrence frequency equation set of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data.
Further, the occurrence frequency equation set of the i-th type of discrete data is as follows:
[formula image BDA0002321239460000091]
in the above formula, f_0(i) is the frequency with which the i-th type of discrete data does not occur, f_1(i) is the frequency with which the i-th type of discrete data occurs, ε is the privacy protection budget, and δ is a parameter under loose local differential privacy, with a value between 0 and 1.
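The counting and solving steps can be sketched as below. Since the patent's equation set is only available as an image, this sketch assumes a generic two-equation form: the observed number of 1s equals f_1·p + f_0·q and f_0 + f_1 equals the number of user sides, where p (`p_keep_one`) and q (`p_flip_zero`) are the same hypothetical conversion probabilities used on the user side. The function name and parameters are illustrative, not taken from the original.

```python
from typing import List, Sequence, Tuple


def estimate_counts(
    reports: Sequence[Sequence[int]],
    p_keep_one: float,
    p_flip_zero: float,
) -> List[Tuple[float, float]]:
    """Return (f0, f1) per discrete data type from the scrambling codes.

    Solves, for each type i, the assumed linear system
        ones_i = f1 * p_keep_one + f0 * p_flip_zero
        total  = f0 + f1
    """
    if not reports:
        raise ValueError("no scrambling codes received")
    if p_keep_one == p_flip_zero:
        raise ValueError("probabilities must differ for the system to be solvable")
    total = len(reports)
    n_types = len(reports[0])
    estimates = []
    for i in range(n_types):
        ones = sum(report[i] for report in reports)  # frequency of value 1
        f1 = (ones - total * p_flip_zero) / (p_keep_one - p_flip_zero)
        f0 = total - f1
        estimates.append((f0, f1))
    return estimates


# Four hypothetical scrambling codes over two discrete data types.
reports = [[1, 0], [0, 1], [1, 0], [1, 0]]
print(estimate_counts(reports, p_keep_one=0.75, p_flip_zero=0.25))
```

With a small number of reports an estimate can fall below 0 or above the number of user sides; a deployment would need to decide whether to clip such values.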
Based on the technical solutions of step 103 and step 104, the present invention provides a data center for discrete data frequency estimation, as shown in fig. 3, the data center includes:
the receiving module is used for receiving the scrambling codes corresponding to the discrete data codes of the user sides;
and the determining module is used for determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides.
Preferably, the determining module includes:
the statistics unit is used for counting, among the scrambling codes corresponding to the discrete data codes of the user sides, the frequency [formula image BDA0002321239460000092] with which the scrambling code value corresponding to the i-th type of discrete data is 0 and the frequency [formula image BDA0002321239460000093] with which it is 1;
the establishing unit is used for establishing, based on [formula image BDA0002321239460000094] and [formula image BDA0002321239460000095], the occurrence frequency equation set of the i-th type of discrete data;
and the solving unit is used for solving the occurrence frequency equation set of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data.
Further, the occurrence frequency equation set of the i-th type of discrete data is as follows:
[formula image BDA0002321239460000096]
in the above formula, f_0(i) is the frequency with which the i-th type of discrete data does not occur, f_1(i) is the frequency with which the i-th type of discrete data occurs, ε is the privacy protection budget, and δ is a parameter under loose local differential privacy, with a value between 0 and 1.
Meanwhile, the present invention also provides a discrete data frequency estimation system, as shown in fig. 4, the system includes: the user side and the data center.
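To illustrate how the user side and the data center interact as a system, the following sketch simulates steps 101 to 104 end to end. It is a simplified model under stated assumptions: the conversion probabilities 0.8 and 0.1 are placeholders rather than values computed from ε and δ, the true counts are invented for the simulation, and the correction formula is the generic linear one rather than the patent's own equation set.

```python
import random
from typing import List, Sequence


def simulate(
    true_counts: Sequence[int],
    p_keep_one: float,
    p_flip_zero: float,
    seed: int = 0,
) -> List[float]:
    """Simulate user-side scrambling and data-center estimation.

    true_counts[k] is the number of user sides holding type k.
    Returns the estimated occurrence count of each type.
    """
    rng = random.Random(seed)
    n_types = len(true_counts)
    total = sum(true_counts)
    ones = [0] * n_types  # observed frequency of scrambled value 1 per type
    for kind, count in enumerate(true_counts):
        for _ in range(count):
            # Steps 101-102: one-hot code, then per-bit random scrambling.
            for i in range(n_types):
                p_one = p_keep_one if i == kind else p_flip_zero
                if rng.random() < p_one:
                    ones[i] += 1
    # Steps 103-104: correct the raw counts for the injected noise.
    return [
        (ones[i] - total * p_flip_zero) / (p_keep_one - p_flip_zero)
        for i in range(n_types)
    ]


true_counts = [10000, 6000, 4000]  # invented ground truth
estimates = simulate(true_counts, p_keep_one=0.8, p_flip_zero=0.1)
for truth, estimate in zip(true_counts, estimates):
    print(f"true={truth} estimated={estimate:.0f}")
```

The data center never sees an individual's original code, only the aggregated scrambled counts, yet the corrected estimates stay close to the true counts when many user sides report.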
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that, although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications and equivalent substitutions may be made to the embodiments of the invention without departing from its spirit and scope, and that all such modifications and substitutions are intended to be covered by the claims.

Claims (6)

1. A method for discrete data frequency estimation, the method being applied to a user side, the method comprising:
generating discrete data codes according to the types of the discrete data sent to the data center;
obtaining a scrambling code corresponding to the discrete data code, and sending the scrambling code corresponding to the discrete data code to a data center;
the obtaining of the scrambling code corresponding to the discrete data code includes:
acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
determining a scrambling code corresponding to the discrete data code based on the conversion probability of the code value corresponding to each type of discrete data in the discrete data code;
the obtaining of the conversion probability of the code value corresponding to each type of discrete data in the discrete data code includes:
determining the probability of converting the coded value corresponding to the ith type of discrete data in the discrete data coding into 0 according to the following formula:
Figure FDA0003695328640000011
determining the probability of converting the coded value corresponding to the ith type of discrete data into 1 in the discrete data coding according to the following formula:
Figure FDA0003695328640000012
in the above formulas, ε is the privacy protection budget; δ is a parameter under relaxed local differential privacy, with a value between 0 and 1;
Figure FDA0003695328640000013
is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code;
Figure FDA0003695328640000014
is the probability of converting the code value corresponding to the i-th type of discrete data in the discrete data code into 0; and
Figure FDA0003695328640000015
is the probability of converting the code value corresponding to the i-th type of discrete data in the discrete data code into 1;
the length of the discrete data codes is equal to the total number of the discrete data types;
the discrete data code is (v_1, ..., v_i, ..., v_n), where n is the total number of discrete data types and v_i is the code value corresponding to the i-th type of discrete data; if the type of the discrete data sent to the data center by the user side is the i-th type, then v_i = 1; otherwise, v_i = 0;
the determining, based on the conversion probability of the code value corresponding to each type of discrete data in the discrete data code, of a scrambling code corresponding to the discrete data code includes:
drawing a value from the set {0,1}, where 0 is drawn with probability
Figure FDA0003695328640000021
and 1 is drawn with probability
Figure FDA0003695328640000022
wherein, if 0 is drawn, then
Figure FDA0003695328640000023
and, if 1 is drawn, then
Figure FDA0003695328640000024
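As an illustrative, non-limiting sketch of the user-side steps recited above, the following Python fragment builds the length-n code (v_1, ..., v_n) and then draws each scrambling code value from {0,1}. The conversion probability formulas of the claim are contained in figures that are not reproduced in this text, so the sketch does not implement them: the parameters `p_one_given_1` and `p_one_given_0`, as well as the function names `one_hot` and `perturb`, are hypothetical placeholders chosen for illustration only. Note also that the sketch uses 0-based type indices, whereas the claim numbers the types from 1.

```python
import random


def one_hot(type_index, n):
    """Return a length-n code with 1 at the reported type and 0 elsewhere.

    type_index is 0-based here (an assumption of this sketch).
    """
    if not 0 <= type_index < n:
        raise ValueError("type_index must be in range(n)")
    return [1 if i == type_index else 0 for i in range(n)]


def perturb(code, p_one_given_1, p_one_given_0, rng=random):
    """Draw each scrambling code value from {0, 1}.

    p_one_given_1: assumed probability that a code value of 1 is reported as 1.
    p_one_given_0: assumed probability that a code value of 0 is reported as 1.
    Both are placeholders; the claimed formulas in epsilon and delta are
    not reproduced in this sketch.
    """
    scrambled = []
    for v in code:
        p_one = p_one_given_1 if v == 1 else p_one_given_0
        # rng.random() is uniform on [0, 1), so this yields 1 with probability p_one.
        scrambled.append(1 if rng.random() < p_one else 0)
    return scrambled
```

For example, `perturb(one_hot(2, 5), 0.75, 0.25)` returns a random list of five values in {0,1}; only this scrambled list, and not the original code, would be sent to the data center.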
2. A user terminal for discrete data frequency estimation, the user terminal comprising:
the generating module is used for generating discrete data codes according to the types of the discrete data sent to the data center;
the acquisition module is used for acquiring a scrambling code corresponding to the discrete data code;
the sending module is used for sending the scrambling code corresponding to the discrete data code to a data center;
the acquisition module includes:
the acquisition unit is used for acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
the determining unit is used for determining the scrambling codes corresponding to the discrete data codes based on the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
the obtaining unit is specifically configured to:
determining the probability of converting the coded value corresponding to the ith type of discrete data in the discrete data coding into 0 according to the following formula:
Figure FDA0003695328640000025
determining the probability of converting the coded value corresponding to the ith type of discrete data into 1 in the discrete data coding according to the following formula:
Figure FDA0003695328640000026
in the above formulas, ε is the privacy protection budget; δ is a parameter under relaxed local differential privacy, with a value between 0 and 1;
Figure FDA0003695328640000027
is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code;
Figure FDA0003695328640000028
is the probability of converting the code value corresponding to the i-th type of discrete data in the discrete data code into 0; and
Figure FDA0003695328640000029
is the probability of converting the code value corresponding to the i-th type of discrete data in the discrete data code into 1;
the length of the discrete data codes is equal to the total number of the discrete data types;
the discrete data code is (v_1, ..., v_i, ..., v_n), where n is the total number of discrete data types and v_i is the code value corresponding to the i-th type of discrete data; if the type of the discrete data sent to the data center by the user side is the i-th type, then v_i = 1; otherwise, v_i = 0;
the determining unit is specifically configured to:
draw a value from the set {0,1}, where 0 is drawn with probability
Figure FDA0003695328640000031
and 1 is drawn with probability
Figure FDA0003695328640000032
wherein, if 0 is drawn, then
Figure FDA0003695328640000033
and, if 1 is drawn, then
Figure FDA0003695328640000034
3. A discrete data frequency estimation method applied to a data center is characterized by comprising the following steps:
receiving a scrambling code corresponding to the discrete data code of each user side;
determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides;
the determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides comprises:
counting, among the scrambling codes corresponding to the discrete data codes of the user sides, the frequency with which the scrambling code value corresponding to the i-th type of discrete data is 0,
Figure FDA0003695328640000035
and the frequency with which that scrambling code value is 1,
Figure FDA0003695328640000036
based on
Figure FDA0003695328640000037
and
Figure FDA0003695328640000038
establishing an occurrence frequency equation set of the i-th type of discrete data;
solving the occurrence frequency equation set of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data;
the occurrence frequency equation set of the i-th type of discrete data is as follows:
Figure FDA0003695328640000039
in the above formula, f_0(i) is the frequency with which the i-th type of discrete data does not occur, f_1(i) is the occurrence frequency of the i-th type of discrete data, ε is the privacy protection budget, and δ is a parameter under relaxed local differential privacy, with a value between 0 and 1.
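As an illustrative, non-limiting sketch of the data-center steps recited above, the following Python fragment counts the scrambling code values 0 and 1 at one position and then solves a two-equation linear system for the non-occurrence and occurrence counts. The claimed equation set is contained in a figure that is not reproduced in this text; the system below is a generic one that assumes each code value is reported as 1 with probability `p` when the true value is 1 and with probability `q` when the true value is 0. The names `count_bits` and `solve_frequencies`, and the parameters `p` and `q`, are hypothetical and stand in for the claimed expressions in ε and δ.

```python
def count_bits(reports, i):
    """Count how many scrambling codes have value 0 and value 1 at position i."""
    ones = sum(report[i] for report in reports)
    zeros = len(reports) - ones
    return zeros, ones


def solve_frequencies(zeros, ones, p, q):
    """Solve the assumed system for (f0, f1) by Cramer's rule.

    Assumed system (not the claimed one):
        ones  = p * f1 + q * f0
        zeros = (1 - p) * f1 + (1 - q) * f0
    where f1 is the number of users whose true code value is 1 and f0 the
    number whose true code value is 0.
    """
    det = p - q  # determinant of the 2x2 coefficient matrix
    if det == 0:
        raise ValueError("p and q must differ for the system to be solvable")
    f1 = (ones * (1 - q) - q * zeros) / det
    f0 = (p * zeros - (1 - p) * ones) / det
    return f0, f1
```

Because the observed counts are random, the solved values are estimates and may fall slightly outside the range from 0 to the number of users; clipping them is a design choice left open by this sketch.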
4. A data center for use in discrete data frequency estimation, the data center comprising:
the receiving module is used for receiving the scrambling codes corresponding to the discrete data codes of the user sides;
the determining module is used for determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides;
the determining module includes:
a statistic unit, used for counting, among the scrambling codes corresponding to the discrete data codes of the user sides, the frequency with which the scrambling code value corresponding to the i-th type of discrete data is 0,
Figure FDA00036953286400000310
and the frequency with which that scrambling code value is 1,
Figure FDA00036953286400000311
a building unit, used for, based on
Figure FDA00036953286400000312
and
Figure FDA00036953286400000313
establishing an occurrence frequency equation set of the i-th type of discrete data;
the solving unit is used for solving the occurrence frequency equation set of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data;
the occurrence frequency equation set of the i-th type of discrete data is as follows:
Figure FDA0003695328640000041
in the above formula, f_0(i) is the frequency with which the i-th type of discrete data does not occur, f_1(i) is the occurrence frequency of the i-th type of discrete data, ε is the privacy protection budget, and δ is a parameter under relaxed local differential privacy, with a value between 0 and 1.
5. A method of discrete data frequency estimation, the method comprising:
the user side generates discrete data codes according to the types of the discrete data sent to the data center;
the user side obtains a scrambling code corresponding to the discrete data code and sends the scrambling code corresponding to the discrete data code to the data center;
the data center receives the scrambling codes corresponding to the discrete data codes of the user sides;
the data center determines the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides;
the obtaining of the scrambling code corresponding to the discrete data code includes:
acquiring the conversion probability of the code values corresponding to various types of discrete data in the discrete data codes;
determining a scrambling code corresponding to the discrete data code based on the conversion probability of the code value corresponding to each type of discrete data in the discrete data code;
the obtaining of the conversion probability of the code value corresponding to each type of discrete data in the discrete data codes includes:
determining the probability of converting the coded value corresponding to the ith type of discrete data in the discrete data coding into 0 according to the following formula:
Figure FDA0003695328640000042
determining the probability of converting the coded value corresponding to the ith type of discrete data into 1 in the discrete data coding according to the following formula:
Figure FDA0003695328640000043
in the above formulas, ε is the privacy protection budget; δ is a parameter under relaxed local differential privacy, with a value between 0 and 1;
Figure FDA0003695328640000044
is the scrambling code value corresponding to the i-th type of discrete data in the scrambling code corresponding to the discrete data code;
Figure FDA0003695328640000045
is the probability of converting the code value corresponding to the i-th type of discrete data in the discrete data code into 0; and
Figure FDA0003695328640000051
is the probability of converting the code value corresponding to the i-th type of discrete data in the discrete data code into 1;
the determining the occurrence frequency of various discrete data according to the scrambling codes corresponding to the discrete data codes of the user sides comprises:
counting, among the scrambling codes corresponding to the discrete data codes of the user sides, the frequency with which the scrambling code value corresponding to the i-th type of discrete data is 0,
Figure FDA0003695328640000052
and the frequency with which that scrambling code value is 1,
Figure FDA0003695328640000053
based on
Figure FDA0003695328640000054
and
Figure FDA0003695328640000055
establishing an occurrence frequency equation set of the i-th type of discrete data;
solving the occurrence frequency equation set of the i-th type of discrete data to obtain the occurrence frequency of the i-th type of discrete data;
the occurrence frequency equation set of the i-th type of discrete data is as follows:
Figure FDA0003695328640000056
in the above formula, f_0(i) is the frequency with which the i-th type of discrete data does not occur, f_1(i) is the occurrence frequency of the i-th type of discrete data, ε is the privacy protection budget, and δ is a parameter under relaxed local differential privacy, with a value between 0 and 1;
the length of the discrete data codes is equal to the total number of the discrete data types;
the discrete data code is (v_1, ..., v_i, ..., v_n), where n is the total number of discrete data types and v_i is the code value corresponding to the i-th type of discrete data; if the type of the discrete data sent to the data center by the user side is the i-th type, then v_i = 1; otherwise, v_i = 0;
the determining, based on the conversion probability of the code value corresponding to each type of discrete data in the discrete data code, of a scrambling code corresponding to the discrete data code includes:
drawing a value from the set {0,1}, where 0 is drawn with probability
Figure FDA0003695328640000057
and 1 is drawn with probability
Figure FDA0003695328640000058
wherein, if 0 is drawn, then
Figure FDA0003695328640000059
and, if 1 is drawn, then
Figure FDA00036953286400000510
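As an illustrative, non-limiting sketch of the combined user-side and data-center flow recited above, the following Python fragment simulates many user sides reporting scrambled codes and a data center recovering the relative occurrence frequency of each type. The claimed probability formulas and equation set are contained in figures not reproduced in this text, so the sketch assumes a generic scheme in which a code value of 1 is reported as 1 with probability `p` and a code value of 0 is reported as 1 with probability `q`; the function name `simulate` and all parameter names are hypothetical.

```python
import random


def simulate(true_types, n, p, q, seed=0):
    """Simulate the full flow and return estimated relative frequencies.

    true_types: list of 0-based type indices, one per user side.
    n: total number of discrete data types.
    p, q: assumed probabilities of reporting 1 for a true 1 and a true 0.
    """
    if p == q:
        raise ValueError("p and q must differ")
    rng = random.Random(seed)
    ones = [0] * n  # per-position count of scrambling code values equal to 1
    for t in true_types:
        for i in range(n):
            v = 1 if i == t else 0          # code value v_i of this user side
            p_one = p if v == 1 else q      # assumed conversion probability
            if rng.random() < p_one:        # scrambled value drawn from {0, 1}
                ones[i] += 1
    total = len(true_types)
    # Solving observed = p * f + q * (1 - f) for the relative frequency f.
    return [(count / total - q) / (p - q) for count in ones]
```

With a large number of user sides, the returned estimates tend to lie close to the true relative frequencies, while any single scrambled code reveals little about the type reported by its user side.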
6. A discrete data frequency estimation system, the system comprising: the user terminal according to claim 2 and the data center according to claim 4.
CN201911298496.4A 2019-12-17 2019-12-17 Discrete data frequency estimation method, user side, data center and system Active CN112995076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911298496.4A CN112995076B (en) 2019-12-17 2019-12-17 Discrete data frequency estimation method, user side, data center and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911298496.4A CN112995076B (en) 2019-12-17 2019-12-17 Discrete data frequency estimation method, user side, data center and system

Publications (2)

Publication Number Publication Date
CN112995076A CN112995076A (en) 2021-06-18
CN112995076B true CN112995076B (en) 2022-09-27

Family

ID=76341887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911298496.4A Active CN112995076B (en) 2019-12-17 2019-12-17 Discrete data frequency estimation method, user side, data center and system

Country Status (1)

Country Link
CN (1) CN112995076B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107302521A (en) * 2017-05-23 2017-10-27 全球能源互联网研究院 The sending method and method of reseptance of a kind of privacy of user data
CN108509627A (en) * 2018-04-08 2018-09-07 腾讯科技(深圳)有限公司 data discretization model training method and device, data discrete method
CN109299436A (en) * 2018-09-17 2019-02-01 北京邮电大学 A kind of ordering of optimization preference method of data capture meeting local difference privacy
CN110022531A (en) * 2019-03-01 2019-07-16 华南理工大学 A kind of localization difference privacy municipal refuse data report and privacy calculation method
WO2019172837A1 (en) * 2018-03-05 2019-09-12 Agency For Science, Technology And Research Method and system for deriving statistical information from encrypted data
CN110569286A (en) * 2019-09-11 2019-12-13 哈尔滨工业大学(威海) activity time sequence track mining method based on local differential privacy

Also Published As

Publication number Publication date
CN112995076A (en) 2021-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant