CN115118448A

CN115118448A - Data processing method, device, equipment and storage medium

Info

Publication number: CN115118448A
Application number: CN202210427247.6A
Authority: CN
Inventors: 范晓亮; 蒋杰; 杨昱睿; 刘煜宏; 陈鹏; 陶阳宇
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-04-21
Filing date: 2022-04-21
Publication date: 2022-09-27
Anticipated expiration: 2042-04-21
Also published as: CN115118448B

Abstract

The application provides a data processing method, a device, equipment and a storage medium, which relate to the technical field of data processing, and the method comprises the following steps: determining M data groups of a first attribute data set, and for each data group, sending a plurality of data identifier sets and first ciphertexts of the data group to a second server, wherein the second server holds a second attribute data set of a data object, the second server is used for determining a plurality of second ciphertexts of the data group, each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identifier set in the plurality of data identifier sets, the first ciphertexts and the second attribute data set, one data identifier set is composed of data identifiers corresponding to unmarked first attribute data randomly selected by the first server from the data group, the plurality of second ciphertexts of the data group sent by the second server are received, and an aggregation result of the second attribute data of the data group is obtained according to the aggregation algorithm, a private key and the plurality of second ciphertexts of the data group.

Description

Data processing method, device, equipment and storage medium

Technical Field

The embodiment of the application relates to the technical field of data processing, in particular to a data processing method, a data processing device, data processing equipment and a storage medium.

Background

Currently, in many data processing application scenarios, data is stored and maintained independently by different data parties (e.g., different organizations or different departments of the same organization), and different data parties often hold different types of data about the same object. For example, data party A holds the name of a data object, and data party B holds an attribute score of the data object.

Grouping aggregation refers to splitting a data set into groups according to some criterion, applying an aggregation function or method to each group, and combining the resulting values into a result object. Due to factors such as privacy protection and data protection, data scattered across different data parties cannot be directly gathered together for grouping and aggregation. When the data of two data parties is grouped and aggregated for joint data statistics, the data of each party needs privacy protection and cannot be revealed to the other party.

Therefore, a data processing method is needed that can group and aggregate the different types of data of data objects held respectively by two data parties while protecting the data privacy of both parties.

Disclosure of Invention

The application provides a data processing method, a data processing device, data processing equipment and a storage medium, which are used for performing grouping aggregation on the different types of data of data objects held respectively by two data parties, while protecting the data privacy of both parties and improving the security of data processing.

In a first aspect, the present application provides a data processing method, which is applied to a first server, where the first server holds a first attribute data set of a data object, and includes:

determining M data groups of the first attribute data set, wherein each data group comprises the same first attribute data and a data identifier corresponding to each first attribute data, and M is a positive integer;

for each data group in the M data groups, sending a plurality of data identifier sets of the data group and a first ciphertext encrypted by using a public key to a second server, where the second server holds a second attribute data set of the data object, the second server is configured to determine a plurality of second ciphertexts of the data group, each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identifier set in the plurality of data identifier sets, the first ciphertext and the second attribute data set, and one data identifier set is composed of data identifiers corresponding to unmarked first attribute data randomly selected by the first server from the data group;

sequentially receiving a plurality of second ciphertexts of the data group sent by the second server;

and obtaining an aggregation result of the second attribute data of the data group according to the aggregation algorithm, the private key and the plurality of second ciphertexts of the data group.

In a second aspect, the present application provides a data processing method, which is applied to a second server holding a second attribute data set of a data object, and includes:

receiving a plurality of data identifier sets of a data group sent by a first server and a first ciphertext encrypted by using a public key, wherein the first server holds a first attribute data set of the data object, the data group is any one of M data groups of the first attribute data set, the data group comprises identical first attribute data and a data identifier corresponding to each first attribute data, M is a positive integer, and one data identifier set consists of data identifiers corresponding to unmarked first attribute data randomly selected from the data group by the first server;

determining a plurality of second ciphertexts of the data group, wherein each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identifier set in the plurality of data identifier sets, the first ciphertext and the second attribute data set;

and sequentially sending the plurality of second ciphertexts of the data group to the first server, wherein the plurality of second ciphertexts are used for obtaining the aggregation result of the second attribute data of the data group by the first server according to the aggregation algorithm, the private key and the plurality of second ciphertexts of the data group.

In a third aspect, the present application provides a data processing apparatus holding a first set of attribute data for a data object, the apparatus comprising:

a determining module, configured to determine M data groups of the first attribute data set, where each data group includes the same first attribute data and a data identifier corresponding to each first attribute data, and M is a positive integer;

an obtaining module, configured to send, to a second server, multiple data identifier sets of the data group and a first ciphertext encrypted by using a public key for each data group of the M data groups, where the second server holds a second attribute data set of the data object, the second server is configured to determine multiple second ciphertexts of the data group, each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identifier set of the multiple data identifier sets, the first ciphertext, and the second attribute data set, and one data identifier set is composed of data identifiers corresponding to unmarked-state first attribute data randomly selected by the first server from the data groups;

a receiving module, configured to sequentially receive a plurality of second ciphertexts of the data group sent by the second server;

and a processing module, configured to obtain an aggregation result of the second attribute data of the data group according to the aggregation algorithm, the private key and the plurality of second ciphertexts of the data group.

In a fourth aspect, the present application provides a data processing apparatus holding a second set of attribute data for a data object, the apparatus comprising:

a receiving module, configured to receive multiple data identifier sets of a data group sent by a first server and a first ciphertext encrypted by using a public key, where the first server holds a first attribute data set of the data object, the data group is any one of M data groups of the first attribute data set, the data group includes the same first attribute data and a data identifier corresponding to each first attribute data, M is a positive integer, and one data identifier set is composed of data identifiers corresponding to unmarked first attribute data randomly selected by the first server from the data group;

a determining module, configured to determine a plurality of second ciphertexts of the data group, where each second ciphertext is determined by the second server according to a preset aggregation algorithm, one of the plurality of data identifier sets, the first ciphertext, and the second attribute data set;

and a sending module, configured to sequentially send the plurality of second ciphertexts of the data group to the first server, where the plurality of second ciphertexts are used by the first server to obtain an aggregation result of the second attribute data of the data group according to the aggregation algorithm, the private key and the plurality of second ciphertexts of the data group.

In a fifth aspect, the present application provides a data processing apparatus comprising: a processor and a memory, the memory for storing a computer program, the processor for invoking and executing the computer program stored in the memory to perform the method of the first aspect or the second aspect.

In a sixth aspect, the present application provides a computer readable storage medium for storing a computer program for causing a computer to perform the method of the first or second aspect.

In a seventh aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method of the first or second aspect.

In summary, in the present application, a first server determines M data groups of a first attribute data set and, for each of the M data groups, obtains a plurality of second ciphertexts of the data group from a second server. Each second ciphertext is determined by the second server according to a preset aggregation algorithm, a data identifier set, a first ciphertext encrypted by the first server using a public key, and a second attribute data set, and each data identifier set is composed of the data identifiers corresponding to unmarked first attribute data randomly selected by the first server from the data group. Finally, an aggregation result of the second attribute data of the data group is obtained according to the aggregation algorithm, a private key, and the plurality of second ciphertexts of the data group. The first server and the second server interact only through the encrypted first ciphertext and the encrypted second ciphertexts. In addition, the data identifiers corresponding to the first attribute data of each data group are not sent to the second server all at once; instead, they are sent over multiple rounds, each round carrying a randomly selected plurality of data identifiers (namely, a data identifier set). As a result, the second server cannot learn the grouping information of the first attribute data set, the first server cannot learn the details of the second attribute data set, and the first server can still obtain the aggregation result of the second attribute data of each of the M data groups. Grouping aggregation is thus achieved while the data privacy of both data parties is protected and the security of data processing is improved.

Further, in the present application, the use of a homomorphic encryption algorithm can further ensure the correctness of the grouping aggregation result and the data security of the interaction process.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a block diagram of a data processing system according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a data processing process according to an embodiment of the present application;

FIG. 3 is an interaction flowchart of a data processing method according to an embodiment of the present application;

FIG. 4 is an interaction flowchart of a data processing method according to an embodiment of the present application;

FIG. 5 is a flowchart illustrating a data processing method according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 8 is a schematic block diagram of a data processing apparatus 700 provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Before the technical scheme of the application is introduced, the related knowledge of the application is introduced as follows:

1. Cloud computing is a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can acquire computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". To the user, resources in the "cloud" appear to be infinitely expandable, available at any time and on demand, and paid for according to use.

As a basic capability provider of cloud computing, a cloud computing resource pool (referred to as an IaaS (Infrastructure as a Service) platform for short) is established, and multiple types of virtual resources are deployed in the resource pool for external clients to select and use.

According to logical function division, a PaaS (Platform as a Service) layer can be deployed on the IaaS layer, and a SaaS (Software as a Service) layer can be deployed on the PaaS layer; SaaS can also be deployed directly on IaaS. PaaS is a platform on which software runs, such as a database or a web container. SaaS is business software of various kinds, such as a web portal or a bulk SMS sender. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.

2. Database: in short, a database can be regarded as an electronic file cabinet, a place for storing electronic files, in which a user can add, query, update, and delete data. A "database" is a collection of data that is stored together in a certain manner, can be shared by multiple users, has as little redundancy as possible, and is independent of applications. A Database Management System (DBMS) is a computer software system designed for managing databases, and generally has basic functions such as storage, retrieval, security assurance, and backup. Database management systems can be classified according to the database model they support, such as relational or XML (Extensible Markup Language); according to the type of computer they support, such as server cluster or mobile phone; according to the query language they use, such as SQL (Structured Query Language) or XQuery; according to their performance emphasis, such as maximum scale or maximum operating speed; or in other ways.

3. Homomorphic encryption is a cryptographic technique in which processing homomorphically encrypted data yields an output that, once decrypted, is the same as the result obtained by processing the unencrypted original data in the same way.
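The property described in item 3 can be written as a formula. The notation below is introduced here for illustration and is not taken from the application: Enc and Dec denote encryption under a public key pk and decryption under a private key sk, f is the processing applied to the data, and Eval is the corresponding operation on ciphertexts. The second line is the special case of an additively homomorphic scheme, which is an assumption, since the text does not name a specific scheme.

```latex
\mathrm{Dec}_{sk}\bigl(\mathrm{Eval}(f, \mathrm{Enc}_{pk}(x_1), \ldots, \mathrm{Enc}_{pk}(x_k))\bigr) = f(x_1, \ldots, x_k)

\mathrm{Dec}_{sk}\bigl(\mathrm{Enc}_{pk}(a) \oplus \mathrm{Enc}_{pk}(b)\bigr) = a + b
```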

4. Grouping, which may also be referred to as data grouping, is the division of the data in a data table of a database into groups by a column or a row.

Based on the above description, refer to FIG. 1 by way of example. FIG. 1 is a schematic architecture diagram of a data processing system provided in an embodiment of the present application. The data processing system includes a first server 10 and a second server 20, which may be connected directly or indirectly through wired or wireless communication; the present application is not limited in this respect.

The first server 10 or the second server 20 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform.

Optionally, the first server 10 or the second server 20 in this embodiment may also be any other computing device with computing capability, for example, a terminal. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like.

In an application scenario of the embodiment of the present application, on the premise that the first server 10 and the second server 20 cannot acquire attribute data of a data object held by the other side, the data object is divided into a plurality of data groups according to the first attribute data set of the data object, and the second attribute data of the data object in each data group is aggregated according to a preset aggregation algorithm, so as to obtain an aggregation result of the second attribute data of each data group. The first attribute data set is a set formed by the first attribute data, and the second attribute data set is a set formed by the second attribute data.

In some embodiments, the data object may be a user object. In different application scenarios, the data object may also be another type of data object, such as a commodity object or an order object, and each data object may specifically be represented by an identifier of the object. The following embodiments of the present application take a scenario in which the data object is a user object as an example; other types of data objects are handled similarly and are not described again.

Illustratively, Table one below shows a first attribute data set (a1-ai) of data objects stored by the first server, where the first attribute data in the first attribute data set corresponds to the data identifiers ID1-IDN, and Table two shows a second attribute data set (b1-bN) of data objects stored by the second server, where the second attribute data in the second attribute data set also corresponds to the data identifiers ID1-IDN. That is, the first attribute data set and the second attribute data set correspond to the same set of data identifiers.

Table one

Table two

Data identifier	Second attribute data set
ID1	b1
ID2	b2
……	……
IDN	bN

More specifically, taking students as the data objects, the first server 10 stores the names of the students (i.e., the first attribute data set of the students), as shown in Table three below:

Table three: names of students

Data identifier	Name
1	a
2	a
3	b
4	b
5	a
6	b
7	c
8	c
9	a

The second server 20 stores the examination scores of the students (i.e., the second attribute data set of the students), as shown in Table four below:

Table four: examination scores of students

The first attribute data set of the students and the second attribute data set of the students correspond to the same set of data identifiers (i.e., 1-9). In this embodiment of the application, on the premise that neither the first server 10 nor the second server 20 can obtain the attribute data of the data object held by the other party, the data objects are divided into a plurality of data groups according to the first attribute data set (i.e., into three data groups, student a, student b, and student c, according to the names of the students), and the second attribute data of the data objects in each data group is aggregated according to a preset aggregation algorithm. For example, if the aggregation algorithm is summation, the scores in each data group are summed to obtain the aggregation result of the second attribute data of each data group, that is, the score sum of each student. FIG. 2 is a schematic diagram of a data processing process provided by an embodiment of the present application. As shown in FIG. 2, the first attribute data set is divided into three data groups, student a, student b, and student c, according to the names of the students, and the scores in each group are then summed to obtain the aggregation result of the second attribute data of each data group: the score sum of student a is 340, the score sum of student b is 260, and the score sum of student c is 130.
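As a point of reference, the result that the two servers are meant to reach jointly can be sketched as an ordinary, non-private group-by sum. This is only a plaintext baseline that assumes both tables are visible in one place, which is exactly what the method avoids. The names follow the student-name table above; the individual scores are hypothetical, because the score table is not reproduced in the text, and were chosen only so that the per-name sums match the sums quoted for FIG. 2. The function name `plaintext_group_sum` is likewise an illustration.

```python
from collections import defaultdict

# Names from the student-name table in the text (identifier -> name).
names = {1: "a", 2: "a", 3: "b", 4: "b", 5: "a", 6: "b", 7: "c", 8: "c", 9: "a"}

# HYPOTHETICAL scores: the score table is not reproduced in the text, so these
# values are invented to be consistent with the sums quoted for FIG. 2.
scores = {1: 80, 2: 90, 3: 90, 4: 80, 5: 85, 6: 90, 7: 60, 8: 70, 9: 85}


def plaintext_group_sum(names, scores):
    """Sum the scores of all identifiers that share the same name."""
    totals = defaultdict(int)
    for data_id, name in names.items():
        totals[name] += scores[data_id]
    return dict(totals)


result = plaintext_group_sum(names, scores)
print(result)
```

With these hypothetical scores the sums are 340 for a, 260 for b, and 130 for c, matching the figures quoted in the text.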

It should be noted that Table one and Table two are only simple examples, and the same data object may have more attribute data sets. For example, the first server may hold two attribute data sets of the data object and the second server may hold three attribute data sets of the data object, which is not limited in this embodiment of the present application.

To solve the technical problem of performing grouping aggregation on the different types of data of data objects held respectively by two data parties while protecting the data privacy of both parties and improving the security of data processing, in the embodiment of the present application the first server 10 holds a first attribute data set of a data object and the second server 20 holds a second attribute data set of the same data object. The first server 10 first determines M data groups of the first attribute data set and, for each of the M data groups, obtains a plurality of second ciphertexts of the data group from the second server 20. Each second ciphertext is determined by the second server 20 according to a preset aggregation algorithm, a data identifier set, a first ciphertext encrypted by the first server 10 using a public key, and the second attribute data set, and each data identifier set is composed of the data identifiers corresponding to unmarked first attribute data randomly selected by the first server 10 from the data group. Finally, the aggregation result of the second attribute data of the data group is obtained according to the aggregation algorithm, the private key, and the plurality of second ciphertexts of the data group. The first server 10 and the second server 20 interact only through the encrypted first ciphertext and the encrypted second ciphertexts. In addition, the data identifiers corresponding to the first attribute data of each data group are not sent to the second server 20 all at once; instead, they are sent over multiple rounds, each round carrying a randomly selected plurality of data identifiers (i.e., a data identifier set). As a result, the second server 20 cannot learn the grouping information of the first attribute data set, the first server 10 cannot learn the details of the second attribute data set, and the first server 10 can still obtain the aggregation result of the second attribute data of each of the M data groups. Grouping aggregation is thus achieved while the data privacy of both data parties is protected and the security of data processing is improved.

The technical scheme of the application is explained in detail as follows:

FIG. 3 is an interaction flowchart of a data processing method according to an embodiment of the present application, in which a first server holds a first attribute data set of a data object and a second server holds a second attribute data set of the same data object. As shown in FIG. 3, the method includes the following steps:

S101, the first server determines M data groups of the first attribute data set, where each data group includes identical first attribute data and the data identifier corresponding to each first attribute data, and M is a positive integer.

Specifically, in the embodiment of the present application, the first attribute data set and the second attribute data set correspond to the same group of data identifiers. If the first server holds the first attribute data set of the data object, the second server holds the second attribute data set of the same data object, and the two attribute data sets have not yet been intersected, then optionally, before S101, the method of this embodiment may further include:

the first server generates a data identifier set corresponding to both the first attribute data set and the second attribute data set, and the second server also generates this data identifier set. For example, by performing a secure join, the first server and the second server generate the set of data identifiers corresponding to both the first attribute data set and the second attribute data set. The result may be, for example, a virtual table T that contains only the intersection of the first attribute data set and the second attribute data set, where each record of T is joined on the Id column (which may be a single Id or a composite Id) of the first attribute data set and the second attribute data set. On the basis of table T, the method of the present embodiment performs federated grouping and performs the aggregation operation within each group.
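The content of the virtual table T can be illustrated with a plain identifier intersection. This is a sketch of the result only, not of the secure join itself: it exposes both identifier lists to whoever runs it, whereas the text assumes a privacy-preserving join whose construction is not described here. The function name and the sample identifier lists are hypothetical.

```python
def build_virtual_table_ids(first_ids, second_ids):
    """Return the identifiers present on both sides, in sorted order.

    NOTE: plain set intersection for illustration only. It is NOT a secure
    join, because it requires both identifier lists in the clear.
    """
    return sorted(set(first_ids) & set(second_ids))


# Hypothetical identifier lists held by the two servers.
first_server_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
second_server_ids = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

table_t_ids = build_virtual_table_ids(first_server_ids, second_server_ids)
print(table_t_ids)
```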

The first server determines M data groups of the first attribute data set, where each data group includes identical first attribute data and the data identifier corresponding to each first attribute data. Taking the first attribute data set shown in Table one as an example, the two entries of first attribute data a1 and their corresponding data identifiers ID1 and ID2 form one data group.
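A minimal sketch of S101, using the student names from the text as the first attribute data set: identifiers that share the same first attribute value are collected into one data group. The function name `determine_data_groups` and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict


def determine_data_groups(first_attribute_set):
    """Map each distinct first-attribute value to the identifiers that carry it."""
    groups = defaultdict(list)
    for data_id, value in first_attribute_set.items():
        groups[value].append(data_id)
    return dict(groups)


# Student names from the text (identifier -> name).
student_names = {1: "a", 2: "a", 3: "b", 4: "b", 5: "a", 6: "b", 7: "c", 8: "c", 9: "a"}

data_groups = determine_data_groups(student_names)
M = len(data_groups)  # number of data groups
print(M, data_groups)
```

For the student example this gives M = 3, with identifiers 1, 2, 5, 9 in group a, identifiers 3, 4, 6 in group b, and identifiers 7, 8 in group c.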

S102, for each of the M data groups, the first server sends a plurality of data identifier sets of the data group and a first ciphertext encrypted by using a public key to the second server, where one data identifier set is composed of the data identifiers corresponding to first attribute data in an unmarked state randomly selected by the first server from the data group.

Specifically, for each of the M data groups, the aggregation result of the second attribute data of the data group can be obtained through S102 to S105.

In an implementable manner, for each of the M data groups, sending, to the second server, a plurality of data identification sets of the data group and a first ciphertext encrypted using a public key may specifically include:

S1021, the first server determines one data identifier set of the multiple data identifier sets in a first manner, where the first manner is: randomly selecting a plurality of first attribute data in an unmarked state from the data group, and forming a data identifier set D from the data identifiers corresponding to the selected first attribute data.

Optionally, the first server may randomly select the plurality of first attribute data in an unmarked state from the data group as follows:

the first server determines a random number r within a preset value range, and randomly selects r first attribute data in an unmarked state from the data group.

For example, the value range of the random number may be 80 < r < 100. The random number is refreshed each time a data identifier set is determined, which can further ensure data security.

S1022, the first server sends D and the first ciphertext to the second server.

S1023, when the first server receives a second ciphertext sent by the second server, the first server sets the first attribute data in the first attribute data set that corresponds to the data identifiers in D to a marked state, where the second ciphertext is sent by the second server when it determines that D meets a preset condition.

S1024, if the first server determines that first attribute data in an unmarked state still exists in the data group, the first server continues to determine one data identifier set of the multiple data identifier sets in the first manner and to send D and the first ciphertext to the second server, until no first attribute data in an unmarked state exists in the data group or no data identifier set meeting the preset condition exists.

Optionally, the method of this embodiment may further include: the first server receives the first ciphertext sent by the second server, where the first ciphertext is sent by the second server when the second server determines that D does not meet the preset condition. The first server may leave the first ciphertext unprocessed when it receives it.

Optionally, the preset condition may be: whether the number N of data identifiers in D is greater than a preset n.

Here, n is a preset positive integer greater than 1. Setting n to be greater than 1 prevents the second attribute data of a data object stored in the second server from being leaked to the first server.
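The selection loop of S1021 to S1024 can be sketched as follows. The exchange with the second server is not modeled: the preset condition (more than n identifiers in D) is simply evaluated locally to decide whether D would be answered with a second ciphertext. Several details are assumptions made for illustration and are not stated in the text: the function and parameter names, capping r at the number of identifiers still unmarked, and requiring the lower bound of r to exceed n so that a set that is too small can only occur when too few identifiers remain.

```python
import random


def select_identifier_set(unmarked_ids, r_min, r_max, rng):
    """S1021: draw a fresh random r and pick up to r unmarked identifiers."""
    r = rng.randint(r_min, r_max)      # refreshed for every identifier set
    r = min(r, len(unmarked_ids))      # assumption: cap at what is left
    return set(rng.sample(sorted(unmarked_ids), r))


def batch_data_group(group_ids, n, r_min, r_max, seed=None):
    """Split one data group into identifier sets D (S1021 to S1024).

    Returns the sets that satisfy len(D) > n and the identifiers left unmarked.
    """
    if r_min <= n:
        raise ValueError("this sketch assumes r_min > n")
    rng = random.Random(seed)
    unmarked = set(group_ids)
    accepted = []
    while unmarked:
        d = select_identifier_set(unmarked, r_min, r_max, rng)
        if len(d) <= n:
            # Fewer than n + 1 identifiers remain, so no further set can
            # meet the preset condition: stop (S1024).
            break
        accepted.append(d)
        unmarked -= d                  # S1023: set to the marked state
    return accepted, unmarked


# Small hypothetical values; the text's example range is 80 < r < 100.
accepted, leftover = batch_data_group(range(1, 21), n=2, r_min=3, r_max=5, seed=0)
print(len(accepted), leftover)
```

Each identifier appears in at most one set D, so the second server never sees the whole data group in a single message.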

S103, the second server determines a plurality of second ciphertexts of the data group, where each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identifier set of the plurality of data identifier sets, the first ciphertext and the second attribute data set.

The preset aggregation algorithm may be any one of summation (Sum), maximum (Max), minimum (Min), average (Avg), and count (Count), and may also be a user-defined aggregation function (UDAF).

Optionally, the aggregation algorithm in this embodiment may be preset in both the first server and the second server, or may be preset only in the first server, in which case the first server sends the required aggregation algorithm to the second server each time it sends D and the first ciphertext.

Optionally, the determining, by the second server, a plurality of second ciphertexts of the data group may specifically include:

S1031, determining, in a target manner, the second ciphertext corresponding to each data identifier set D in the plurality of data identifier sets, where the target manner includes: when it is determined that D meets the preset condition, determining the second ciphertext according to the aggregation algorithm, D, the second attribute data set and the first ciphertext.

Optionally, determining a second ciphertext according to the aggregation algorithm, D, the second attribute data set, and the first ciphertext may specifically be:

Searching the second attribute data set for the second attribute data corresponding to the data identifiers belonging to D, aggregating that second attribute data according to the aggregation algorithm to obtain a first aggregation result, and encrypting the first aggregation result according to the first ciphertext, the public key and the calculation logic corresponding to the aggregation algorithm to obtain the second ciphertext.

Specifically, for example, the first ciphertext is [state], the first aggregation result is delta, and the second ciphertext is merge([state], delta), where merge is the calculation logic corresponding to the aggregation algorithm. If the aggregation algorithm is summation (Sum), the calculation logic of merge is addition and the corresponding second ciphertext is [state] + delta; if the aggregation algorithm is maximization (Max), the calculation logic of merge is taking the maximum and the corresponding second ciphertext is max([state], delta), that is, the maximum of [state] and delta. Because [state] is a homomorphic ciphertext, the second ciphertext obtained by aggregating with it is also a homomorphic ciphertext, so when the second server sends the second ciphertext to the first server, its content is not leaked to a third party.

S1032, taking the second ciphertexts corresponding to the respective data identifier sets as the plurality of second ciphertexts of the data group.
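The second-server processing for the Sum case can be sketched with a toy additively homomorphic scheme. The text does not name a concrete homomorphic encryption algorithm; the sketch below swaps in a textbook Paillier construction with deliberately tiny keys as an assumption, and the names `encrypt`, `decrypt`, `add_plain` and `second_server_sum` are hypothetical. Max and Min are not covered, because an additively homomorphic scheme does not directly support comparison on ciphertexts.

```python
import math
import secrets

# Toy Paillier keys built from two Mersenne primes. Far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                                            # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # private key (lcm)
MU = pow(LAM, -1, N)                                 # private key

def encrypt(m):
    """Encrypt integer m under the public key N (generator g = N + 1)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2

def decrypt(c):
    """Recover the plaintext with the private key (LAM, MU)."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

def add_plain(c, k):
    """Homomorphic [state] + delta: multiply by g^k = 1 + k*N (mod N^2)."""
    return c * (1 + k * N) % N2

def second_server_sum(d, first_ct, table, n_min=1):
    """Return the second ciphertext for identifier set d, or the unchanged
    first ciphertext when d does not meet the preset condition."""
    if len(d) <= n_min:
        return first_ct
    delta = sum(table[i] for i in d)      # first aggregation result
    return add_plain(first_ct, delta)     # merge([state], delta) for Sum

table = {"id1": 10, "id2": 20, "id3": 30}   # hypothetical second attribute data
state = encrypt(0)                          # first ciphertext for Sum
second_ct = second_server_sum(["id1", "id2", "id3"], state, table)
print(decrypt(second_ct))  # 60
```

Only the holder of the private key can read the merged value, and the second server never needs to decrypt [state] to add delta to it.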

S104, the second server sequentially sends the plurality of second ciphertexts of the data group to the first server.

Optionally, the method of this embodiment may further include: when it is determined that D does not meet the preset condition, the second server sends the first ciphertext to the first server.

S105, the first server obtains an aggregation result of the second attribute data of the data group according to the aggregation algorithm, the private key and the plurality of second ciphertexts of the data group.

In an implementable manner, S105 may specifically include:

S1051, respectively decrypting the plurality of second ciphertexts of the data group by using the private key to obtain a plurality of first aggregation results of the data group.

S1052, aggregating the plurality of first aggregation results by using the aggregation algorithm to obtain an aggregation result of the second attribute data of the data group.

Optionally, the method of this embodiment may further include: the first server generates a public key and a private key and sends the public key to the second server.

Optionally, the first ciphertext is a homomorphic encrypted ciphertext, and a value of the first ciphertext is preset according to an aggregation algorithm. Accordingly, the second ciphertext is also the homomorphic encrypted ciphertext. For example, the value of the first ciphertext may be set to 0 when the aggregation algorithm is Sum (Sum), average (Avg), or Count (Count), may be set to a smaller value when the aggregation algorithm is Max, and may be set to a larger value when the aggregation algorithm is Min. When the aggregation algorithm is a user-defined aggregation function (UDAF), the value of the first ciphertext may be set according to the UDAF.
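The choice of initial value can be illustrated on plaintext values: the initial state should be neutral for the chosen algorithm, so that merging it with the first aggregation result returns that result unchanged. In this sketch the table `AGGREGATIONS` and the helper `merge_with_initial` are hypothetical, and `math.inf` stands in for the "smaller" and "larger" values mentioned in the text; a real ciphertext would need a finite sentinel.

```python
import math

# (initial plaintext state, merge function) per aggregation algorithm.
# math.inf is an illustrative stand-in for the "smaller"/"larger" value.
AGGREGATIONS = {
    "sum":   (0, lambda state, delta: state + delta),
    "count": (0, lambda state, delta: state + delta),
    "max":   (-math.inf, max),
    "min":   (math.inf, min),
}

def merge_with_initial(name, delta):
    """Merge a first aggregation result into the initial state."""
    state, merge = AGGREGATIONS[name]
    return merge(state, delta)

for name in AGGREGATIONS:
    # Each line shows the algorithm name and the merged value.
    print(name, merge_with_initial(name, 7))
```

For every listed algorithm the merged value equals the input 7, which is the property the preset first ciphertext is meant to provide.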

In this embodiment, using a homomorphic encryption algorithm further ensures the correctness of the grouped aggregation result and the security of the data.

In the data processing method provided by this embodiment, the first server determines M data groups of the first attribute data set and, for each of the M data groups, obtains a plurality of second ciphertexts of the data group from the second server. Each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identifier set, the first ciphertext encrypted by the first server using the public key, and the second attribute data set, and each data identifier set is composed of the data identifiers corresponding to unmarked first attribute data randomly selected by the first server from the data group. Finally, the first server obtains the aggregation result of the second attribute data of the data group according to the aggregation algorithm, the private key and the plurality of second ciphertexts of the data group. The first server and the second server exchange only the encrypted first ciphertext and second ciphertexts, and the data identifiers corresponding to the first attribute data of each data group are not sent to the second server all at once; instead, over multiple rounds, a random subset of those data identifiers (that is, a data identifier set) is selected and sent each time. As a result, the second server cannot learn the grouping information of the first attribute data set, the first server cannot learn the details of the second attribute data set, and the first server can still obtain the aggregation result of the second attribute data of each of the M data groups. Grouped aggregation is thus achieved while the data privacy of both parties is protected and the security of data processing is improved.

In this embodiment, a multi-party scenario with, for example, N participants can be converted into N-1 two-party scenarios. Specifically, assume that there are 4 parties, party A, party B, party C and party D, and that the grouping column belongs to party A. The 4-party scenario can then be converted into 3 two-party scenarios, namely party A with party B, party A with party C, and party A with party D, and privacy-preserving grouped aggregation can be performed in each two-party scenario using the data processing method above.
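The decomposition into two-party scenarios amounts to pairing the party that holds the grouping column with each of the other parties. A minimal sketch, with the hypothetical helper name `two_party_scenarios`:

```python
def two_party_scenarios(group_owner, parties):
    """Pair the party holding the grouping column with every other party."""
    return [(group_owner, p) for p in parties if p != group_owner]

print(two_party_scenarios("A", ["A", "B", "C", "D"]))
# [('A', 'B'), ('A', 'C'), ('A', 'D')]
```

For N parties the list always has N-1 entries, and each pair can run the two-party protocol independently.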

The data processing method provided by the present application is described in detail below with reference to a specific embodiment.

Fig. 4 is an interaction flowchart of a data processing method provided in an embodiment of the present application, and fig. 5 is a flowchart of a data processing method provided in an embodiment of the present application. As shown in fig. 5, in this embodiment, the first server stores a first attribute data set (a1-ai) of a data object, the second server stores a second attribute data set (b1-bn) of the same data object, and the second attribute data set and the first attribute data set have the same data identifiers (Id1-Idn). The encryption algorithm in this embodiment is a homomorphic encryption algorithm. As shown in fig. 4, the method may include the following steps:

S301, the first server generates a homomorphic encryption public key and a private key and sends the homomorphic encryption public key to the second server.

S302, the first server determines M data groups of the first attribute data set, each data group comprises the same first attribute data and a data identifier corresponding to each first attribute data, and an aggregation result of second attribute data of each data group is obtained through the following steps S3021-S3032.

For example, the records whose first attribute data is a1, together with their corresponding data identifiers ID1, ID2 and ID3, form one data group.
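Determining the M data groups amounts to grouping data identifiers by their first attribute value. The sketch below is illustrative only: the function name `build_data_groups` is hypothetical, and the record ID4 with value a2 is added purely so that more than one group appears.

```python
from collections import defaultdict

def build_data_groups(first_attribute_by_id):
    """Group data identifiers by identical first attribute data."""
    groups = defaultdict(list)
    for data_id, value in first_attribute_by_id.items():
        groups[value].append(data_id)
    return dict(groups)

# Hypothetical first attribute data set keyed by data identifier.
records = {"ID1": "a1", "ID2": "a1", "ID3": "a1", "ID4": "a2"}
groups = build_data_groups(records)
print(groups)  # {'a1': ['ID1', 'ID2', 'ID3'], 'a2': ['ID4']}
```

Here M equals `len(groups)`, and each list of identifiers is the pool from which the data identifier sets D are later drawn.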

S3021, the first server determines a random number r according to a preset value range of the random number.

Specifically, for example, 80< r < 100.

S3022, the first server randomly selects r pieces of first attribute data in an unmarked state from the target data group of the first attribute data set, and forms data identifiers corresponding to the r pieces of first attribute data in the unmarked state into a data identifier set D.

Specifically, the target data group is any one of the M data groups. For example, a data group contains 50 pieces of first attribute data, all of which are user a, with 50 data identifiers (id1-id50). The random number r is determined anew each time and may differ between rounds. For example, if the random number r determined the first time is 5, then 5 pieces of first attribute data in the unmarked state (i.e., not yet selected) are randomly selected from the 50, and the corresponding data identifiers form a data identifier set D, for example, D includes id1, id2, id5, id7 and id9. The second selection is then made from the data identifiers corresponding to the remaining 45 pieces of first attribute data.

S3023, the first server sends the D and the first ciphertext encrypted by using the homomorphic encryption public key to the second server.

The value of the first ciphertext may be set according to a preset aggregation algorithm, for example, when the aggregation algorithm is Sum (Sum), average (Avg), or Count (Count), the value of the first ciphertext may be set to 0, when the aggregation algorithm is Max, the value of the first ciphertext may be set to a smaller value, and when the aggregation algorithm is Min, the value of the first ciphertext may be set to a larger value. When the aggregation algorithm is a user-defined aggregation function (UDAF), the value of the first ciphertext may be set according to the UDAF.

S3024, the second server determines whether D meets a preset condition, that is, determines whether the number N of data identifiers in D is greater than a preset n.

Here n is a preset positive integer greater than 1. Setting n greater than 1 prevents the second attribute data of the data object held by the second server from being leaked to the first server.

If N is greater than n, S3025 is executed; if N is less than or equal to n, S3029 described below is executed.

S3025, the second server searches for the second attribute data corresponding to the data identifier belonging to D from the second attribute data set.

Specifically, as shown in fig. 5, if D includes Id1, Id2 and Id3, the second attribute data corresponding to Id1, Id2 and Id3 is searched for in the second attribute data set; for example, the second attribute data corresponding to Id1, Id2 and Id3 are b1, b2 and b3, respectively.

S3026, the second server aggregates the second attribute data corresponding to the data identifiers belonging to D according to the preset aggregation algorithm to obtain a first aggregation result.

S3027, the second server encrypts the first aggregation result according to the first ciphertext, the public key and the calculation logic corresponding to the aggregation algorithm to obtain a second ciphertext.

S3028, the second server sends the second ciphertext and first indication information to the first server, where the first indication information is used for indicating that D meets the preset condition.

S3029, the second server sends the first ciphertext and second indication information to the first server, where the second indication information is used for indicating that D does not meet the preset condition.

S3030, if the first server receives the second ciphertext and the first indication information, the first server sets the first attribute data in the first attribute data set that corresponds to the data identifiers in D to a marked state.

Specifically, setting the first attribute data corresponding to the data identifiers in D to the marked state indicates that this first attribute data has already been selected.

S3031, if it is determined that the first attribute data in the unmarked state still exists in the target data group, the first server continues to execute S3021 to S3030 until the first attribute data in the unmarked state does not exist in the target data group or the data identifier set meeting the preset condition does not exist.

S3032, the first server decrypts the plurality of second ciphertexts of the target data group respectively by using the private key to obtain a plurality of first aggregation results of the data group, and aggregates the plurality of first aggregation results by using an aggregation algorithm to obtain an aggregation result of the second attribute data of the target data group.

Specifically, through S3031, the first server receives all the second ciphertexts of the target data group. For example, suppose 10 second ciphertexts of the target data group are received, each corresponding to one first aggregation result. The first server decrypts each of the 10 second ciphertexts using the homomorphic encryption private key to obtain 10 first aggregation results. If the preset aggregation algorithm is summation (Sum), the first server adds the 10 first aggregation results to obtain the aggregation result of the second attribute data of the target data group. If the preset aggregation algorithm is maximization (Max), the first server takes the largest of the 10 first aggregation results as the aggregation result of the second attribute data of the target data group.
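The final step on the first server can be sketched as decrypting each second ciphertext and combining the partial results with the same aggregation algorithm. In this illustration the function name `final_aggregate` is hypothetical, and the "ciphertexts" are plain integers with an identity function standing in for decryption with the private key.

```python
def final_aggregate(second_ciphertexts, decrypt, algorithm):
    """Decrypt every second ciphertext, then combine the partial results."""
    partials = [decrypt(c) for c in second_ciphertexts]   # first aggregation results
    combine = {"sum": sum, "max": max, "min": min}[algorithm]
    return combine(partials)

# Stand-in data: ten partial results, with identity "decryption".
partials = [3, 9, 4, 1, 7, 2, 8, 5, 6, 10]
print(final_aggregate(partials, lambda c: c, "sum"))  # 55
print(final_aggregate(partials, lambda c: c, "max"))  # 10
```

Passing `decrypt` as a parameter keeps the sketch independent of any particular homomorphic encryption scheme.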

Optionally, in this embodiment, the second server and the first server may preset the same aggregation algorithm. Alternatively, the first server may preset the aggregation algorithm and send it to the second server before or when sending D, so that both sides perform the same aggregation processing.

In summary, through the above-described S3021 to S3032, the aggregation result of the second attribute data of each data group in the first attribute data set can be obtained.

In the data processing method provided by this embodiment, the first server and the second server exchange only the encrypted first ciphertext and second ciphertexts, and the data identifiers corresponding to the first attribute data of each data group are not sent to the second server all at once; instead, over multiple rounds, a random subset of those data identifiers (i.e., a data identifier set) is selected and sent each time. As a result, the second server cannot learn the grouping information of the first attribute data set, the first server cannot learn the details of the second attribute data set, and the first server can still obtain the aggregation result of the second attribute data of each of the M data groups. Grouped aggregation is thus achieved while the data privacy of both parties is protected and the security of data processing is improved.

Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, where the data processing apparatus holds a first attribute data set of a data object, and as shown in fig. 6, the apparatus may include: a determination module 11, a sending module 12, a receiving module 13 and a processing module 14, wherein,

the determining module 11 is configured to determine M data groups of the first attribute data set, where each data group includes the same first attribute data and a data identifier corresponding to each first attribute data, and M is a positive integer;

the sending module 12 is configured to send, to a second server, multiple data identifier sets of a data group and a first ciphertext encrypted by using a public key for each data group of M data groups, where the second server holds a second attribute data set of a data object, the second server is configured to determine multiple second ciphertexts of the data group, each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identifier set of the multiple data identifier sets, the first ciphertext, and the second attribute data set, and the one data identifier set is composed of data identifiers corresponding to unmarked state first attribute data randomly selected by the first server from the data group;

the receiving module 13 is configured to sequentially receive a plurality of second ciphertexts of the data group sent by the second server;

the processing module 14 is configured to obtain an aggregation result of the second attribute data of the data group according to the aggregation algorithm, the private key, and the plurality of second ciphertexts of the data group.

Optionally, the sending module 12 is configured to: determining one data identifier set in a plurality of data identifier sets in a first mode, wherein the first mode is as follows: randomly selecting a plurality of first attribute data in an unmarked state from the data group, and forming data identifications corresponding to the first attribute data in the unmarked state into a data identification set D;

sending the D and the first ciphertext to a second server;

when a second ciphertext sent by the second server is received, setting the first attribute data in the first attribute data set that corresponds to the data identifiers in D to a marked state, where the second ciphertext is sent by the second server when the second server determines that D meets a preset condition;

if it is determined that the first attribute data in the unmarked state still exists in the data group, one data identifier set in the multiple data identifier sets is continuously determined in the first mode, and the D and the first ciphertext are sent to the second server until the first attribute data in the unmarked state does not exist in the data group or the data identifier set meeting the preset condition does not exist in the data group.

Optionally, the receiving module 13 is further configured to:

and receiving a first ciphertext sent by the second server, wherein the first ciphertext is sent by the second server when the second server determines that the D does not meet the preset condition.

Optionally, the sending module 12 is specifically configured to: determining a random number r according to a value range of a preset random number;

and randomly selecting r first attribute data in an unmarked state from the data group.

Optionally, the sending module 12 is further configured to: and generating a public key and a private key and sending the public key to the second server.

Optionally, the processing module 14 is configured to decrypt, using a private key, the multiple second ciphertexts of the data group respectively to obtain multiple first aggregation results of the data group;

and aggregating the plurality of first aggregation results by using an aggregation algorithm to obtain an aggregation result of the second attribute data of the data group.

Optionally, the first ciphertext is a homomorphic encrypted ciphertext, and a value of the first ciphertext is preset according to an aggregation algorithm.

Fig. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, where the data processing apparatus holds a second attribute data set of a data object, and as shown in fig. 7, the apparatus may include: a receiving module 21, a determining module 22 and a sending module 23, wherein,

the receiving module 21 is configured to receive multiple data identifier sets of a data group sent by a first server and a first ciphertext encrypted by using a public key, where the first server holds a first attribute data set of a data object, the data group is any one of M data groups of the first attribute data set, the data group includes the same first attribute data and a data identifier corresponding to each first attribute data, M is a positive integer, and one data identifier set is composed of data identifiers corresponding to unmarked first attribute data randomly selected by the first server from the data group;

the determining module 22 is configured to determine a plurality of second ciphertexts of the data group, where each second cipher text is determined by the second server according to a preset aggregation algorithm, one data identifier set of the plurality of data identifier sets, the first cipher text, and the second attribute data set;

the sending module 23 is configured to send the plurality of second ciphertexts of the data group to the first server in sequence, so that the first server obtains an aggregation result of the second attribute data of the data group according to the aggregation algorithm, the private key, and the plurality of second ciphertexts of the data group.

Optionally, the determining module 22 is configured to: respectively determining a second ciphertext corresponding to each data identification set D in the plurality of data identification sets in a target mode, wherein the target mode comprises the following steps: when the fact that the D meets the preset condition is determined, determining a second ciphertext according to the aggregation algorithm, the D, the second attribute data set and the first ciphertext;

and determining the second ciphertexts respectively corresponding to the data identification sets as a plurality of second ciphertexts of the data group.

Optionally, the determining module 22 is specifically configured to: searching second attribute data corresponding to the data identifier belonging to D from the second attribute data set;

according to an aggregation algorithm, aggregating second attribute data corresponding to the data identification belonging to the D to obtain a first aggregation result;

and encrypting the first aggregation result according to the first ciphertext, the public key and the calculation logic corresponding to the aggregation algorithm to obtain a second ciphertext.

Optionally, the sending module 23 is further configured to: and when the D is determined not to meet the preset condition, sending a first ciphertext to the first server.

It is to be understood that apparatus embodiments and method embodiments may correspond to one another and that similar descriptions may refer to method embodiments. To avoid repetition, further description is omitted here. Specifically, the apparatus shown in fig. 6 may execute the method embodiment corresponding to the first server, the apparatus shown in fig. 7 may execute the method embodiment corresponding to the second server, and the foregoing and other operations and/or functions of each module in the apparatus are respectively for implementing the method embodiment corresponding to the data processing device, and are not described herein again for brevity.

The data processing apparatus of the embodiments of the present application is described above in connection with the drawings from the perspective of functional modules. It should be understood that the functional modules may be implemented by hardware, by instructions in software, or by a combination of hardware and software modules. Specifically, the steps of the method embodiments in the present application may be implemented by integrated logic circuits of hardware in a processor and/or instructions in the form of software, and the steps of the method disclosed in conjunction with the embodiments in the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, registers, and the like, as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps in the above method embodiments in combination with hardware thereof.

As shown in fig. 8, the data processing apparatus 700 may include:

a memory 710 and a processor 720, the memory 710 being configured to store a computer program and to transfer the program code to the processor 720. In other words, the processor 720 may call and run a computer program from the memory 710 to implement the method in the embodiment of the present application.

For example, the processor 720 may be configured to perform the above-described method embodiments according to instructions in the computer program.

In some embodiments of the present application, the processor 720 may include, but is not limited to:

general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like.

In some embodiments of the present application, the memory 710 includes, but is not limited to:

volatile memory and/or non-volatile memory. The non-volatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of example, and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).

In some embodiments of the present application, the computer program can be divided into one or more modules, which are stored in the memory 710 and executed by the processor 720 to perform the methods provided herein. The one or more modules may be a series of computer program instruction segments capable of performing certain functions, the instruction segments being used to describe the execution of the computer program in the data processing apparatus.

As shown in fig. 8, the data processing apparatus may further include:

a transceiver 730, the transceiver 730 being connectable to the processor 720 or the memory 710.

The processor 720 may control the transceiver 730 to communicate with other devices, and specifically, may transmit information or data to the other devices or receive information or data transmitted by the other devices. The transceiver 730 may include a transmitter and a receiver. The transceiver 730 may further include an antenna, and the number of antennas may be one or more.

It will be appreciated that the various components in the data processing device are connected by a bus system which includes, in addition to a data bus, a power bus, a control bus and a status signal bus.

The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. In other words, the present application also provides a computer program product containing instructions, which when executed by a computer, cause the computer to execute the method of the above method embodiments.

When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the present application occur, in whole or in part, when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Video Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.

Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the module is merely a logical division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.

Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. For example, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and all the changes or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A data processing method applied to a first server holding a first set of attribute data of a data object, the method comprising:

2. The method of claim 1, wherein sending, to a second server, for each of the M data groups, a plurality of data identifier sets of the data group and a first ciphertext encrypted using a public key comprises:

determining one of the plurality of data identifier sets in a first manner, wherein the first manner is: randomly selecting a plurality of first attribute data in an unmarked state from the data group, and forming the data identifiers corresponding to the selected first attribute data into a data identifier set D;

sending D and the first ciphertext to the second server;

when a second ciphertext sent by the second server is received, setting the first attribute data in the first attribute data set that corresponds to the data identifiers in D to a marked state, wherein the second ciphertext is sent by the second server when the second server determines that D meets a preset condition;

if it is determined that first attribute data in the unmarked state still exists in the data group, continuing to determine one of the plurality of data identifier sets in the first manner and to send D and the first ciphertext to the second server, until no first attribute data in the unmarked state remains in the data group or no data identifier set meeting the preset condition exists.
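
As a non-limiting illustration of the loop recited in claim 2, the following Python sketch shows one way the first server might iterate over a single data group. The names `collect_second_ciphertexts`, `query_second_server`, `batch_size` and `max_rejections` are hypothetical and do not appear in the application. In particular, the cap on consecutive rejections is only an assumed way of deciding that no data identifier set meeting the preset condition remains, and the second server is assumed to echo the first ciphertext back on rejection, as in claim 3.

```python
import random


def collect_second_ciphertexts(group_ids, first_ciphertext, query_second_server,
                               batch_size=3, max_rejections=10, rng=None):
    """Sketch of the claim 2 loop for one data group.

    query_second_server(D, first_ciphertext) is a hypothetical transport call.
    It is assumed to return a second ciphertext when D meets the preset
    condition and to return first_ciphertext unchanged otherwise.
    """
    rng = rng or random.Random()
    unmarked = set(group_ids)          # identifiers still in the unmarked state
    second_ciphertexts = []
    rejections = 0                     # consecutive rejected sets (assumed stop rule)

    while unmarked and rejections < max_rejections:
        k = min(batch_size, len(unmarked))
        # First manner: randomly pick unmarked entries and form the set D.
        D = frozenset(rng.sample(sorted(unmarked), k))
        reply = query_second_server(D, first_ciphertext)
        if reply == first_ciphertext:
            rejections += 1            # D rejected; its entries stay unmarked
            continue
        unmarked -= D                  # set the entries of D to the marked state
        second_ciphertexts.append(reply)
        rejections = 0

    return second_ciphertexts, unmarked
```

The function returns the second ciphertexts gathered for the group together with any identifiers that could not be placed in an accepted set.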

3. The method of claim 2, further comprising:

receiving the first ciphertext returned by the second server, wherein the second server returns the first ciphertext when it determines that D does not meet the preset condition.

4. The method of claim 2, wherein randomly selecting a plurality of first attribute data in the unmarked state from the data group comprises:

determining a random number r according to a preset value range for the random number;

and randomly selecting r first attribute data in the unmarked state from the data group.
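
As a non-limiting illustration of claim 4, the selection step could look like the Python sketch below. The function name `pick_unmarked` and the parameters `r_min` and `r_max` are hypothetical; clamping r to the number of remaining unmarked entries is an assumption that the claim does not address.

```python
import random


def pick_unmarked(unmarked_ids, r_min, r_max, rng=None):
    """Draw r from the preset range [r_min, r_max], then pick r unmarked identifiers."""
    rng = rng or random.Random()
    pool = sorted(unmarked_ids)        # a sequence is required by random.sample
    r = rng.randint(r_min, r_max)      # random number r within the preset range
    r = min(r, len(pool))              # assumption: never ask for more than remain
    return set(rng.sample(pool, r))
```

Because r changes from round to round, the size of each data identifier set is not fixed in advance.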

5. The method of claim 2, further comprising:

generating the public key and the private key, and sending the public key to the second server.

6. The method of claim 1, wherein obtaining an aggregation result of second attribute data of the data group according to the aggregation algorithm, a private key, and a plurality of second ciphertexts of the data group comprises:

decrypting the plurality of second ciphertexts of the data group respectively by using the private key to obtain a plurality of first aggregation results of the data group;

and aggregating the plurality of first aggregation results by using the aggregation algorithm to obtain an aggregation result of the second attribute data of the data group.
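
As a non-limiting illustration of claim 6, the two steps can be sketched in Python as follows. The name `aggregate_group` and its parameters are hypothetical; `decrypt` stands in for private-key decryption under whichever cryptosystem is used, and summation as the default aggregation algorithm is an assumption.

```python
from functools import reduce
import operator


def aggregate_group(second_ciphertexts, decrypt, combine=operator.add, identity=0):
    """Decrypt each second ciphertext, then aggregate the per-set results.

    decrypt:  callable applying the private key to one ciphertext (stand-in).
    combine:  binary form of the aggregation algorithm (assumed: addition).
    identity: neutral element of `combine`, returned for an empty input.
    """
    # Step 1: one first aggregation result per data identifier set.
    first_results = [decrypt(c) for c in second_ciphertexts]
    # Step 2: aggregate the first aggregation results for the whole data group.
    return reduce(combine, first_results, identity)
```

With a different aggregation algorithm, only `combine` and `identity` would change, for example `max` and negative infinity for a maximum.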

7. The method of claim 1, wherein the first ciphertext is a homomorphically encrypted ciphertext, and a value of the first ciphertext is preset according to the aggregation algorithm.

8. A data processing method applied to a second server holding a second set of attribute data of a data object, the method comprising:

determining a plurality of second ciphertexts of the data group, wherein each second ciphertext is determined by the second server according to a preset aggregation algorithm, one of the plurality of data identifier sets, the first ciphertext and the second attribute data set;

9. The method of claim 8, wherein determining the plurality of second ciphertexts of the data group comprises:

determining, in a target manner, a second ciphertext corresponding to each data identifier set D of the plurality of data identifier sets, wherein the target manner comprises: upon determining that D meets a preset condition, determining a second ciphertext according to the aggregation algorithm, D, the second attribute data set and the first ciphertext;

and taking the second ciphertexts respectively corresponding to the data identifier sets as the plurality of second ciphertexts of the data group.

10. The method of claim 9, wherein determining a second ciphertext according to the aggregation algorithm, D, the second attribute data set, and the first ciphertext comprises:

looking up, in the second attribute data set, the second attribute data corresponding to the data identifiers belonging to D;

aggregating, according to the aggregation algorithm, the second attribute data corresponding to the data identifiers belonging to D to obtain a first aggregation result;

and encrypting the first aggregation result according to the first ciphertext, the public key and the calculation logic corresponding to the aggregation algorithm to obtain the second ciphertext.
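
As a non-limiting illustration of claim 10, the Python sketch below uses a toy Paillier cryptosystem with summation as the aggregation algorithm. Both choices are assumptions made for this example: the claims require only a homomorphic ciphertext and a preset aggregation algorithm. All function names are hypothetical, the primes are far too small for real use, and skipping identifiers that are absent from the second attribute data set is likewise an assumption. Under these assumptions the first ciphertext encrypts 0, so homomorphically adding the first aggregation result to it yields a ciphertext of that result.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair with generator g = n + 1 (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)               # modular inverse, Python 3.8+
    return n, (n, lam, mu)             # (public key, private key)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2


def decrypt(priv, c):
    """Recover the plaintext with the private key (first server side)."""
    n, lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n


def second_ciphertext(n, D, second_attrs, first_ciphertext):
    """Second server side: look up, aggregate, then fold into the first ciphertext."""
    # Look up the second attribute data for the identifiers in D.
    values = [second_attrs[i] for i in D if i in second_attrs]
    # Aggregate them (assumed: sum) to obtain the first aggregation result.
    first_aggregation = sum(values)
    # Paillier addition: multiplying ciphertexts adds the plaintexts.
    return first_ciphertext * encrypt(n, first_aggregation) % (n * n)
```

For example, with `pub, priv = keygen()` and `c0 = encrypt(pub, 0)` prepared by the first server, the second server can call `second_ciphertext(pub, D, second_attrs, c0)` without ever holding the private key, and only the first server can recover the sum with `decrypt(priv, ...)`.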

11. The method of claim 8, wherein the first ciphertext is a homomorphically encrypted ciphertext, and a value of the first ciphertext is preset according to the aggregation algorithm.

12. A data processing apparatus, wherein the data processing apparatus holds a first set of attribute data for a data object, the apparatus comprising:

a sending module, configured to send, for each data group of the M data groups, a plurality of data identifier sets of the data group and a first ciphertext encrypted using a public key to a second server, wherein the second server holds a second attribute data set of the data object and is configured to determine a plurality of second ciphertexts of the data group, each second ciphertext is determined by the second server according to a preset aggregation algorithm, one data identifier set of the plurality of data identifier sets, the first ciphertext, and the second attribute data set, and each data identifier set is composed of the data identifiers corresponding to first attribute data in an unmarked state randomly selected by the first server from the data group;

13. A data processing apparatus, wherein the data processing apparatus holds a second set of attribute data for a data object, the apparatus comprising:

14. A data processing apparatus, characterized by comprising:

a processor and a memory, the memory for storing a computer program, the processor for invoking and executing the computer program stored in the memory to perform the method of any one of claims 1 to 7 or 8 to 11.

15. A computer-readable storage medium for storing a computer program which causes a computer to perform the method of any one of claims 1 to 7 or 8 to 11.