CN112560088A - Knowledge federation-based data security exchange method and device and storage medium - Google Patents


Info

Publication number
CN112560088A
CN112560088A (application CN202011443118.3A)
Authority
CN
China
Prior art keywords
data
participating
aggregation
server
model
Prior art date
Legal status
Granted
Application number
CN202011443118.3A
Other languages
Chinese (zh)
Other versions
CN112560088B (en)
Inventor
韦达
孟丹
李宏宇
李晓林
Current Assignee
Tongdun Holdings Co Ltd
Original Assignee
Tongdun Holdings Co Ltd
Priority date
Filing date
Publication date
Application filed by Tongdun Holdings Co Ltd filed Critical Tongdun Holdings Co Ltd
Priority to CN202011443118.3A priority Critical patent/CN112560088B/en
Publication of CN112560088A publication Critical patent/CN112560088A/en
Application granted granted Critical
Publication of CN112560088B publication Critical patent/CN112560088B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries


Abstract

The invention provides a knowledge federation-based data security exchange method, device, and storage medium, in which the following steps are executed in a loop until a preset stop condition is reached: a participating end trains its participating-end model with pre-processed training data to obtain model data; the participating end obtains the contribution value of each parameter in the model data and generates a mask covering the parameters whose contribution values rank after the K-th position; it obtains the indexes of the parameters whose contribution values rank within the top K and combines those indexes with the corresponding gradient data to form participating-end aggregated data; the federated server receives the aggregated data of each participating end, aggregates the gradients in the aggregated data of all participating ends by index to obtain server-side aggregated data, and sends the server-side aggregated data to each participating end; each participating end performs local aggregation of the server-side aggregated data with its own participating-end aggregated data to obtain final aggregated data; and each participating end updates its participating-end model with the final aggregated data so obtained.

Description

Knowledge federation-based data security exchange method and device and storage medium
Technical Field
The invention relates to a data processing technology, in particular to a knowledge federation-based data security exchange method, a knowledge federation-based data security exchange device and a storage medium.
Background
Since the draft Personal Information Protection Law of the People's Republic of China was published for public comment, enterprises have become increasingly strict about information security. In the past, data was exchanged directly between enterprises; now the ways of acquiring and applying data are increasingly restricted, and under these regulatory limits data forms "islands".
Under these conditions, enterprises increasingly turn to methods such as secure multi-party computation (MPC) and federated learning to exchange data, legally acquiring more data to improve model performance while ensuring security. However, adding encryption schemes to the data exchange process, while protecting personal information, greatly increases communication and computation pressure: the volume of encrypted data is often more than a hundred times that of the source data, and training the same model under encrypted computation often requires tens of times the previous time and hardware resources. The resulting demands on computing power, memory, bandwidth, and power place a heavy burden on enterprises.
At present, to ensure stable model performance and secure information exchange, the volume of data that must be exchanged among the participating ends is large, data processing efficiency is low, and costs of all kinds increase.
Disclosure of Invention
Embodiments of the present invention provide a knowledge federation-based data security exchange method, apparatus, and storage medium, which reduce the amount of data transmitted during data exchange by each participating end, improve data processing efficiency, and reduce the associated costs.
In a first aspect of the embodiments of the present invention, a knowledge federation-based data security exchange method is provided, including executing the following steps in a loop until a preset stop condition is reached:
training a participating end model by using the pre-processed training data by the participating end to obtain model data;
the participating end obtains the contribution value of each parameter in the model data and generates a mask covering the parameters whose contribution values rank after the K-th position; it obtains the indexes of the parameters whose contribution values rank within the top K and combines those indexes with the corresponding gradient data to form participating-end aggregated data;
the federated server receives the aggregation data of each participating terminal, aggregates the gradients in the aggregation data of all participating terminals by using the indexes to obtain the aggregation data of the server, and sends the aggregation data of the server to each participating terminal;
each participating terminal carries out local aggregation according to the server side aggregated data and the participating terminal aggregated data to obtain final aggregated data;
and each participating end updates the corresponding participating end model by using the obtained final aggregation data.
Optionally, in a possible implementation of the first aspect, generating the mask for the parameters whose contribution values rank after the K-th position includes calculating the value of K according to the following formula:
K = S × m%
where S is the total number of parameters and m% is the model hyper-parameter specifying the percentage of parameters to be transmitted.
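As a minimal sketch, the K value can be computed as follows (floor rounding is an assumption; the formula does not specify how a fractional K is handled):

```python
def top_k_count(total_params, m_percent):
    """K = S * m%: the number of highest-contribution parameters
    whose gradients are transmitted. Floor rounding is assumed."""
    return int(total_params * m_percent / 100)
```

For example, with S = 4 parameters and m = 50, K = 2, which matches the numeric example given later in the description.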
Optionally, in a possible implementation of the first aspect, after the federated server receives the aggregated data of each participating end, the method includes:
after receiving the aggregated data of all participating ends, the federated server aggregates the indexes of all participating ends to obtain an aggregate index, and then aggregates the gradients of all participating ends according to the aggregate index to obtain the server-side aggregated data.
Optionally, in a possible implementation manner of the first aspect, the index aggregation for all participating terminals is performed every third cycle.
Optionally, in a possible implementation manner of the first aspect, the aggregating the gradients of all participating terminals according to an aggregation index includes: and carrying out average aggregation processing on the gradients of all the participating terminals according to the aggregation indexes.
Optionally, in a possible implementation manner of the first aspect, the performing, by each participating end, local aggregation according to the server-side aggregated data and the participating-end aggregated data includes:
querying the indexes of the participating-end aggregated data and the server-side aggregated data, and performing the following operations according to the query result:
for an index found in both the participating-end aggregated data and the server-side aggregated data, taking the gradient value corresponding to that index in the server-side aggregated data as the new gradient value;
for an index found only in the server-side aggregated data, performing nonlinear processing on the gradient value corresponding to that index in the server-side aggregated data to obtain the new gradient value;
and for the indexes found in neither the participating-end aggregated data nor the server-side aggregated data, keeping the gradient values unchanged.
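The three query cases above can be sketched as follows. The function names are illustrative, and the `nonlinear` callable is injected as a stand-in because the patent's nonlinear formula appears only as an image in the original publication:

```python
def local_aggregate(local_data, server_data, nonlinear):
    """Merge server-side aggregated gradients into a participant's own
    aggregated data. `local_data` and `server_data` map parameter
    index -> gradient value; `nonlinear` stands in for the patent's
    nonlinear processing of server-only gradients."""
    merged = dict(local_data)  # indexes found only locally keep their values
    for idx, g_c in server_data.items():
        if idx in local_data:
            # index present on both sides: take the server-side value
            merged[idx] = g_c
        else:
            # index present only on the server side: nonlinear processing
            merged[idx] = nonlinear(g_c)
    return merged
```

With a hypothetical halving as the nonlinear step, `local_aggregate({0: 1.0, 3: -2.0}, {0: 1.5, 2: 1.0, 3: -2.0}, lambda g: g / 2)` keeps index 3 from the server, takes 1.5 for index 0, and halves the server-only index 2.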
Optionally, in a possible implementation of the first aspect, the nonlinear processing of the gradient value corresponding to the index in the server-side aggregated data to obtain a new gradient value is calculated based on the following formula:
(formula shown only as an image in the original publication; not reproduced here)
where G is the new gradient value corresponding to the index, and G_C is the gradient value corresponding to that index in the server-side aggregated data.
Optionally, in a possible implementation of the first aspect, before the participating end trains the participating-end model with the pre-processed training data to obtain model data, the method includes: the federated server initializes a model and sends it to one or more participating ends; and each participating end aligns its training data, and the initialized model is trained locally on the aligned training data to obtain model data.
A second aspect of the embodiments of the present invention provides a knowledge federation-based data security exchange apparatus, comprising the following modules, which execute in a loop until a preset stop condition is reached:
the participating end training module is used for training a participating end model by using the pre-processed training data of the participating end to obtain model data or updating the participating end model by using final aggregated data;
the participating-end aggregation module, used by the participating end to obtain the contribution value of each parameter in the model data and to generate a mask covering the parameters whose contribution values rank after the K-th position, and to obtain the indexes of the parameters whose contribution values rank within the top K and combine those indexes with the corresponding gradient data to form participating-end aggregated data;
the federated aggregation module is used for receiving the aggregation data of each participating terminal by the federated server, performing aggregation processing on gradients in the aggregation data of all participating terminals by using indexes to obtain the aggregation data of the server, and sending the aggregation data of the server to each participating terminal;
and the local aggregation module, used by each participating end to perform local aggregation according to the server-side aggregated data and the participating-end aggregated data to obtain final aggregated data.
In a third aspect of the embodiments of the present invention, a readable storage medium is provided, in which a computer program is stored which, when executed by a processor, implements the method according to the first aspect of the present invention and its various possible designs.
The knowledge federation-based data security exchange method, device, and storage medium provided by the invention have the following advantages:
(1) In the knowledge federation-based data security exchange, before each participating end transmits data, a mask of the optimal-gain gradients is dynamically generated, forming a method that transmits only the optimal-gain gradient data. This ensures the effectiveness and quality of data transmission, reduces the volume of data transmitted, improves data processing efficiency, and reduces costs of all kinds.
(2) The federated server provides exchange data for the different participating ends and uses a nonlinear aggregation scheme, so that each participating-end model can learn the model-gain information of the other participating ends while the stability of every participating-end model is protected.
(3) By this method, the bandwidth and memory overhead of each participating-end model during training can be reduced while model performance remains stable.
Drawings
FIG. 1 is a flow chart of a first embodiment of a knowledge federation-based data security exchange method;
FIG. 2 is a flow chart of a second embodiment of a knowledge federation-based data security exchange method;
fig. 3 is a block diagram of a first embodiment of a knowledge federation-based data security exchange apparatus.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, and in the preceding drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that, in the present application, "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It is to be understood that, in the present invention, "a plurality" means two or more. "And/or" merely describes an association between objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the preceding and following objects. "Comprises A, B and C" and "comprises A, B, C" mean that all three of A, B, C are comprised; "comprises A, B or C" means that one of A, B, C is comprised; and "comprises A, B and/or C" means that any one, two, or all three of A, B, C are comprised.
It should be understood that in the present invention, "B corresponding to A", "A corresponding to B" or "B corresponding to A" means that B is associated with A, and B can be determined according to A. Determining B from a does not mean determining B from a alone, but may be determined from a and/or other information. And the matching of A and B means that the similarity of A and B is greater than or equal to a preset threshold value.
As used herein, "if" may be interpreted as "when", "upon", or "in response to determining", depending on the context.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
An embodiment of the present invention provides a knowledge federation-based data security exchange method applicable to multiple terminals that need to exchange data. As shown in fig. 1, assume there are two participating ends, participating end A and participating end B, which perform federated training through a partial-exchange method for federated data. The coordinator (C) end may be a third party (the federated server in the present invention) or any party participating in the federation; among the three parties, the C end only provides the nonlinear aggregation function. Only desensitized data is transmitted among participating end A, participating end B, and federated server C, while all sensitive data of participating ends A and B remains local.
The invention provides a mask-based knowledge federation data security exchange method. When training a multi-party model, different masks are dynamically generated according to the characteristics of the different models; that is, the data each participating end transmits and withholds differs. Specifically, the mask is tied to the importance of each part of the model, and that importance changes over time. The method can therefore dynamically adjust the data to be exchanged, ensuring that each data exchange among the multi-party models obtains the maximum gain. Meanwhile, because the data exchanged by different participating ends differs in importance and in which model components it covers, the method provides a corresponding data aggregation scheme that can aggregate data from different parts of multiple parties, letting data from other parties produce effective gain on a party's model while ensuring that the party's own data retains the larger gain on its own model.
The invention is mainly divided into four stages: 1. data preparation; 2. local model training; 3. model gain calculation; 4. model aggregation.
The first phase is the data preparation phase. The stage is to carry out data preprocessing according to the requirement of the federal.
Data preprocessing includes data cleaning and feature processing; feature processing includes merging similar features and removing features that are insignificant or contribute little.
For horizontal (cross-sample) knowledge federation-based data security exchange: the different participating ends encrypt the feature dimensions of the model and align them. For better training, feature dimensions that contribute little or nothing to the model are also removed after alignment.
For vertical (longitudinal) knowledge federation-based data security exchange: the different participating ends encrypt their data ids and compare the encrypted ids; the records with matching ids are aligned, yielding aligned data with the same number of samples and more feature dimensions. Furthermore, redundant feature dimensions are deleted after alignment, and feature dimensions that contribute little or nothing to the model are removed to achieve a better training effect.
The second stage is the local training stage of the model, which is the same as an ordinary model training stage. In this stage, the Coordinator-initialized model is trained locally on the data of each participating end. For a deep learning model, this means one forward propagation and one backward propagation.
The third stage is the model's data transfer stage. In this stage, the model gain is calculated from the model training result of the second stage. Assuming the model's contribution measure is the product of the gradient G and the weight W, the contribution is computed by this method for all parameters (or nodes) of the model. The corresponding K value is then obtained from the model hyper-parameter m%, a mask (M) is generated for the parameters (or nodes) that need not be transmitted, and participating-end aggregated data is generated for the data that must be transmitted.
The fourth stage is the model aggregation stage. This stage is divided into three steps: a) index aggregation; b) C-end aggregation; c) local aggregation at each participating end.
a) Index aggregation combines the indexes of the different contents transmitted by each participating end, ensuring that the subsequent aggregation is orderly and efficient.
b) C-end aggregation averages the contents transmitted by each participating end at the C end according to the index-aggregation result (that is, when the gradient of the same model parameter or node is transmitted to the C end by several parties, it is averaged over the number of uploads).
c) Local aggregation at each participating end re-aggregates the C-end-aggregated gradients according to local information, avoiding the problem that an excessively high gain from other participating ends for certain parameters (or nodes) excessively deforms the corresponding local parameters (or nodes).
G_C represents the gradient of a parameter (or node) after C-end aggregation, and G represents the gradient of the local model. If the local model's G participated in the C-end model aggregation, then G = G_C. If the local model's G did not participate in the C-end aggregation, then G is obtained by the nonlinear formula (shown only as an image in the original publication; not reproduced here).
The second stage and steps b) and c) of the fourth stage above are repeated, so the model securely exchanges the partially transmitted data. At a certain interval, the third stage and step a) of the fourth stage are performed to dynamically update the model's mask, so that the training of the whole model remains efficient and low-loss.
In one embodiment, as shown in fig. 2, the knowledge federation-based data security exchange method of the present invention includes the following steps:
and S10, the participating end trains the participating end model by using the pre-processed training data to obtain model data.
In one embodiment, the initial model may be initialized by the federal service side based on the seed and the random number and then sent to each participant side, and at this time, the training process starts from the federal service side to initialize the model.
In another embodiment, the initial model may also be generated by the participating end locally based on a form of seed (random number) and random table. In the cross-sample knowledge federation-based data security exchange, in order to ensure that the initial models of the participating terminals are consistent, the seeds and the random table of each participating terminal should be the same.
The method includes performing alignment processing on the training data based on received alignment information, and performing model training on the processed training data to obtain model data, as follows:
In this embodiment, the initial model is generated by the federated server, which initializes the model and sends it, together with the dimension information, to one or more participating ends. Here, the federated server sends the initialized model to participating end A and participating end B.
Each participating end receives the dimension information, aligns its training data, and trains the initialized model locally on it. Participating end A and participating end B align their training data based on the dimension information, reducing the data volume during interaction, and then train the model on the aligned data.
The steps from obtaining the contribution value of each parameter in the model data through the participating end training its model again after receiving the server-side aggregated data are repeated for a preset number of rounds. After the preset rounds of computation and training, the best-contributing new parameters (or nodes) of each participating end have been aggregated with those of the other participating ends, and the information of the other ends' high-gain parameters (or nodes) has been absorbed. After multiple rounds of training and aggregation, a fully trained local model is finally obtained that has absorbed the gains of the other participating ends in the knowledge federation-based data security exchange.
In a possible embodiment, before step S10, the following steps of preprocessing the training data are further included:
s01, the participating terminals encrypt the local training data, then the encrypted data are aligned transversely, and when the transverse data are aligned, the characteristic dimensions of each participating terminal are corresponding. For example: the participating end A has the following characteristic dimensions: "A, B, C, D". Then the feature dimensions among participating peers B are also: "A, B, C, D". For the feature dimension, the participating terminals a, B must be identical and the order is the same. If the feature dimension in A is: "A, B, C, D, E". B the characteristic dimensions are: "A, B, C, D", then the training data after alignment processing to the training data in A and B are "A, B, C, D", respectively, deleted the penta, the penta contributes little or even does not contribute the characteristic dimension for the model.
Through horizontal alignment, feature dimensions that contribute little or nothing to the model can be removed; on the one hand this achieves a better training effect, and on the other hand it reduces the size of the data interaction and thereby improves data transmission efficiency.
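A minimal sketch of this horizontal alignment (a hypothetical helper; the encryption of the feature dimensions is omitted for brevity):

```python
def horizontal_align(features_a, features_b):
    """Keep only the feature dimensions both parties share, preserving
    party A's ordering; non-shared dimensions (like "E" in the example
    above) are dropped as contributing little or nothing."""
    shared = set(features_b)
    return [f for f in features_a if f in shared]
```

For the example above, aligning A's "A, B, C, D, E" against B's "A, B, C, D" leaves both parties with "A, B, C, D".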
In one possible embodiment, in step S10, the method further includes the following steps:
and S02, the participating end encrypts the data and then longitudinally aligns the encrypted data. When the longitudinal data are aligned, the data id of each participating end is corresponding. The data sample ids of a and B are matched with each other in an encrypted manner. For example: the data sample id of the participating end A is a, B, c and d, and the data sample id of the participating end B is c, d, e and f. The data sample ids of the two parties are encrypted to obtain a1, b1, c1, d1, c1, d1, e1 and f 1. We can get matched data samples c1 and d 1. Finally, the aligned data is put into a model training, wherein a1, B1 in the participating end A and e1 and f1 in the participating end B are considered as feature dimensions with small or no contribution.
Different participating terminals encrypt the id of the data and then compare the encrypted id, and after the same id is recorded, the data are aligned to obtain aligned data with the same quantity and more characteristic dimensions. And then, deleting the aligned feature dimensions, and removing a part of feature dimensions which contribute little or even do not contribute to the model so as to achieve a better training effect, reduce the size of data interaction and further improve the transmission efficiency of data.
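A sketch of this id-matching step; the salted hash is only an illustrative stand-in for whatever id encryption the parties actually use, and the names are hypothetical:

```python
import hashlib

def blind_id(sample_id, salt="shared-secret"):
    """One-way transform applied to a sample id before exchange
    (a salted hash here; the patent does not fix the scheme)."""
    return hashlib.sha256((salt + sample_id).encode()).hexdigest()

def vertical_align(ids_a, ids_b):
    """Return the sample ids held by both parties, compared only in
    blinded form so raw ids never cross the party boundary."""
    blinded_b = {blind_id(i) for i in ids_b}
    return sorted(i for i in ids_a if blind_id(i) in blinded_b)
```

Replaying the example above, aligning ids a, b, c, d against c, d, e, f leaves the matched samples c and d.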
S20: the participating end obtains the contribution value of each parameter in the model data and generates a mask for the parameters whose contribution values rank after the K-th position; it obtains the indexes of the parameters whose contribution values rank within the top K and combines those indexes with the corresponding gradient data to form participating-end aggregated data. In this step, the participating end calculates each parameter's contribution value, sorts the parameters by contribution, generates a mask for the parameters ranked after position K, and obtains the indexes of the parameters ranked within the top K. In this way the data is separated and distributed, and during transmission only the indexes and gradients of the top-K parameters are sent to the federated server.
In implementation, K can be set as required or obtained by calculation. In one possible embodiment, K is calculated based on the following formula:
K = S × m%
where S is the total number of parameters and m% is the model hyper-parameter.
In one possible embodiment, obtaining the contribution value of each parameter in the model data comprises:
obtaining the gradient G and the weight W of each parameter in the model, and calculating the contribution value of each parameter based on the following formula:
B = |GW|
where B is the contribution value.
All parameters are sorted from high to low by contribution value, and a mask M is generated for the parameters whose contribution values rank after position K.
In one embodiment, assume the initialized model is (x1 + 2x2 + 3x3 + 4x4), distributed to participating ends A and B and trained locally by each. After local training, the updated model of participating end A is (2x1 + 1x2 + 3x3 + 2x4), and the updated model of participating end B is (3x1 + 0x2 + 4x3 + 4x4). The gradients of x1, x2, x3, x4 for the models of A and B are (1, -1, 0, -2) and (2, -2, 1, 0), respectively, so the contribution values of the trained model parameters are (2, 1, 0, 4) and (6, 0, 4, 0), respectively. With the partial-transmission amount m% (i.e., the transmission percentage) set to 50% (i.e., m = 50), the value of K is 2, so each participating end selects its 2 largest contributions for transmission. The two largest contribution values at participating end A are 4 and 2, with indexes (3, 0); the two largest at participating end B are 6 and 4, with indexes (0, 2). The data finally transmitted by participating end A to the federated server is "0: 1, 3: -2", and the content transmitted by participating end B is "0: 2, 2: 1". In this way, the original or encrypted data of much larger volume is reduced: low-contribution parameters become masked data and only high-contribution data is transmitted, lowering the transmitted data volume and improving transmission and processing efficiency.
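The numeric example above can be replayed in a short script; the helper name is illustrative, with contribution B = |G·W| and K = 2 as in the text:

```python
def top_k_payload(grads, weights, k):
    """Select the k parameter indexes with the largest contribution
    |G * W| and pair each with its gradient value, as a participating
    end would upload them to the federated server."""
    contrib = [abs(g * w) for g, w in zip(grads, weights)]
    chosen = sorted(range(len(contrib)), key=lambda i: -contrib[i])[:k]
    return {i: grads[i] for i in sorted(chosen)}

# Party A: trained weights (2, 1, 3, 2), gradients (1, -1, 0, -2)
payload_a = top_k_payload([1, -1, 0, -2], [2, 1, 3, 2], k=2)
# Party B: trained weights (3, 0, 4, 4), gradients (2, -2, 1, 0)
payload_b = top_k_payload([2, -2, 1, 0], [3, 0, 4, 4], k=2)
```

Party A's payload comes out as {0: 1, 3: -2} and party B's as {0: 2, 2: 1}, matching the "0: 1, 3: -2" and "0: 2, 2: 1" strings in the example.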
Step S30, the federated server receives the aggregated data of each participating end, aggregates the gradients in the aggregated data of all participating ends by using the indexes to obtain the server-side aggregated data, and sends the server-side aggregated data to each participating end.
In step S30, the indexes of all participating ends are first aggregated to obtain an aggregation index, and then the gradients of all participating ends are aggregated according to the aggregation index to obtain the server-side aggregated data.
In step S30, average-aggregating the gradients of all participating ends according to the aggregation index to obtain the server-side aggregated data includes:
calculating the server-side aggregated data of each parameter according to the following formula:
G_{s,t} = (1/n) Σ_{j=1}^{n} G_{j,t}
where t is a parameter, n is the total number of gradient values from all participating ends (parties) for parameter t, G_{s,t} is the server-side aggregated data of parameter t, and G_{j,t} is the jth gradient value for parameter t from a participating end. In other words, parameter (or node) t has n gradient values from the participating ends, and the average aggregation takes the mean of these values over all participating ends for that node.
In the present embodiment, when obtaining the server-side aggregated data according to the aggregation index, weighted aggregation, average aggregation, and the like may be used, and the choice may be adjusted according to the application scenario, which is not elaborated here.
In the embodiment of this step, the aggregation index obtained by aggregating the indexes of participating end A and participating end B is first "0: 2, 1: 0, 2: 1, 3: 1", i.e., index 0 was uploaded twice, index 1 not at all, and indexes 2 and 3 once each. The model gradients are then aggregated according to the aggregation index to obtain the server-side aggregated data "0: 1.5, 2: 1, 3: -2". The server-side aggregated data and gradients are then transmitted back to participating end A and participating end B.
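The index aggregation and average aggregation of step S30 can be sketched as follows (the function name is hypothetical; the sparse {index: gradient} dictionaries mirror the "0: 1, 3: -2"-style strings above):

```python
from collections import defaultdict

def server_aggregate(participant_uploads):
    """Average-aggregate sparse {index: gradient} uploads from all
    participating ends: G_{s,t} = (1/n) * sum of the n gradients for index t."""
    sums, counts = defaultdict(float), defaultdict(int)
    for upload in participant_uploads:
        for idx, grad in upload.items():
            sums[idx] += grad
            counts[idx] += 1          # counts per index form the aggregation index
    return {idx: sums[idx] / counts[idx] for idx in sorted(sums)}

# Uploads from the embodiment: A sends "0: 1, 3: -2", B sends "0: 2, 2: 1"
print(server_aggregate([{0: 1, 3: -2}, {0: 2, 2: 1}]))
# -> {0: 1.5, 2: 1.0, 3: -2.0}
```

Index 0 appears in both uploads, so its gradients are averaged; indexes 2 and 3 appear once each and pass through unchanged.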
Step S40, each participating end performs local aggregation according to the server-side aggregated data and its own participating-end aggregated data to obtain the final aggregated data.
In the embodiment of this step, the server-side aggregated data produced by the federated server is combined with nonlinear processing at participating end A and participating end B: the final aggregated data obtained by participating end A after local aggregation is (1.5, -1, 0.366, -2), and the final aggregated data obtained by participating end B after local aggregation is (1.5, -2, 1, -0.119).
The local aggregation performed by each participating end according to the server-side aggregated data and the participating-end aggregated data comprises:
querying the indexes of the participating-end aggregated data and of the server-side aggregated data, and performing the following operations according to the query result:
and regarding the indexes inquired in the participating end aggregation data and the server end aggregation data, taking the gradient value corresponding to the index in the server end aggregation data as a new gradient value. For example, the model data has parameters (W, X, Y, Z), the index-corresponding parameter in the participating-side aggregated data is (X, Y, Z), and the gradient of each parameter in the participating-side aggregated data is (X)1,Y1,Z1) The indices of the parameters (X, Y, Z) are (0, 1, 2), respectively. Indexes which can be found in the aggregation data of the participating terminals and the aggregation data of the service terminals, namely gradients of parameters corresponding to the indexes, namely local aggregation and server-side aggregation are participated, the service terminals aggregate gradients of the parameters (X, Y, Z) corresponding to the indexes (0, 1, 2) and gradients of other participating terminals, and gradient (X) of the indexes (0, 1, 2) in the aggregation data of the service terminals is obtained2,Y2,Z2) And sent to the participating end, which sends the gradient (X)1,Y1,Z1) Is updated to (X)2,Y2,Z2). By the above mode, the gradient value corresponding to the server side aggregated data is used as a new gradient value, and the gradient value is updated at the participating side after the server side is aggregated.
For an index found only in the server-side aggregated data, nonlinear processing is applied to the gradient value corresponding to that index in the server-side aggregated data to obtain a new gradient value.
The new gradient value is calculated based on the following formula:
G = G_C / (2(1 + e^(-G_C)))
where G is the new gradient value corresponding to the index and G_C is the gradient value corresponding to that index in the server-side aggregated data. In this way, nonlinear processing is applied to the gradient values of indexes that the server can find but that are not contained in the participating end's own upload. Through this step, parameters that did not participate in the federated server's aggregation still receive gradient updates, so the gradients of some parameters in the local data keep updating instead of stalling for one or more rounds.
For example, suppose the model data has parameters (W, X, Y, Z), the indexed parameters in the participating-end aggregated data are (X, Y, Z), the gradients of these parameters in the participating-end aggregated data are (X1, Y1, Z1), the indexes of parameters (X, Y, Z) are (0, 1, 2), respectively, and the server-side aggregated data contains (W2, X2, Y2, Z2). In this case, for the gradient W2, although this participating end did not upload its own gradient W1, other participating ends may have uploaded gradients for parameter W. The gradient of parameter W can therefore be updated based on the server-side data.
For an index found in neither the participating-end aggregated data nor the server-side aggregated data, the gradient value is kept unchanged. For example, suppose the model data has parameters (W, X, Y, Z), the indexed parameters in the participating-end aggregated data are (X, Y, Z), the gradients of these parameters in the participating-end aggregated data are (X1, Y1, Z1), the indexes of parameters (X, Y, Z) are (0, 1, 2), respectively, and the server-side aggregated data contains (X2, Y2, Z2). In this case, neither the participating-end aggregated data nor the server-side aggregated data contains an index corresponding to parameter W, i.e., the server-side aggregated data holds no updated gradient for parameter W, so the gradient of parameter W is kept unchanged.
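The three query cases above can be sketched together as follows (the helper names are hypothetical, and the nonlinearity G_C/(2(1+e^(-G_C))) is an assumption inferred so as to reproduce the values 0.366 and -0.119 of this embodiment):

```python
import math

def nonlinear(g_c):
    """Nonlinear processing for indexes found only in the
    server-side aggregated data: G = G_C / (2 * (1 + e^(-G_C)))."""
    return g_c / (2 * (1 + math.exp(-g_c)))

def local_aggregate(local_grads, uploaded, server_data):
    """Merge server-side aggregated data into the local gradient vector.

    local_grads : full local gradient vector (one entry per parameter)
    uploaded    : {index: gradient} this participating end sent to the server
    server_data : {index: gradient} returned by the federated server
    """
    final = list(local_grads)
    for idx in range(len(final)):
        if idx in server_data and idx in uploaded:
            # case 1: index in both -> take the server-side value
            final[idx] = server_data[idx]
        elif idx in server_data:
            # case 2: index only on the server -> nonlinear processing
            final[idx] = nonlinear(server_data[idx])
        # case 3: index in neither -> keep the local gradient unchanged
    return final

# Participating end A from the embodiment:
# local gradients (1, -1, 0, -2), upload "0: 1, 3: -2",
# server-side aggregated data "0: 1.5, 2: 1, 3: -2"
print(local_aggregate([1, -1, 0, -2], {0: 1, 3: -2}, {0: 1.5, 2: 1, 3: -2}))
# -> approximately [1.5, -1, 0.366, -2]
```

Index 2 was uploaded only by participating end B, so at end A it falls into case 2 and receives the nonlinearly processed server gradient; index 1 appears nowhere and keeps its local value.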
In this embodiment, the federated server performs average gradient aggregation on the parameters of multiple participating ends, while each participating end locally applies nonlinear processing to update the gradients of the parameters that did not participate in the average aggregation. The average-aggregation results and the nonlinear updates of the local non-participating parameters thus complement each other, so that the two aggregations cooperate to maximally balance the gain that each data exchange brings to each participating end.
Step S50, each participating end updates its corresponding participating-end model with the obtained final aggregated data, and after updating the model trains its model again, until a preset stop condition is reached.
The stop condition is reaching a set number of iterations or model convergence.
In this embodiment, after the preset number of rounds of the above calculation and training, the parameters (or nodes) with the best contribution at each participating end are aggregated with those of the other participating ends, and the information of the high-gain parameters (or nodes) of the other participating ends is absorbed. After multiple rounds of training and aggregation, each participating end finally obtains a fully learned local model that has absorbed the gains of the other participating ends in this knowledge-federation-based data security exchange.
In one embodiment, in the process of aggregating the indexes of all participating ends, the index aggregation of each participating end is executed once every at least two cycles. The frequency of index aggregation depends on the number of loops of the whole of steps S10 to S40, but it is not synchronized with the loop count. For example, if the preset stop condition is that steps S10 to S40 loop 100 times, then preferably each participating end performs index aggregation once every 5 to 10 loops of steps S10 to S40.
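The scheduling described here, where index aggregation runs only on some loops of steps S10 to S40, can be illustrated with a small helper (the function name is hypothetical):

```python
def index_aggregation_rounds(total_rounds, every):
    """Rounds of the S10-S40 loop on which each participating end
    performs index aggregation, given that it runs once every
    `every` loops (e.g. every 5 loops out of 100)."""
    return [r for r in range(1, total_rounds + 1) if r % every == 0]

# Preset stop condition of 100 loops, index aggregation every 5 loops:
rounds = index_aggregation_rounds(100, 5)
print(rounds[:4], len(rounds))
# -> [5, 10, 15, 20] 20
```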
A second embodiment of the present invention provides a knowledge-federation-based data security exchange apparatus, as shown in fig. 3, comprising the following modules, executed in a loop until a preset stop condition is reached:
a participating-end training module, used by the participating end to train a participating-end model with pre-processed training data to obtain model data, or to update the participating-end model with the final aggregated data;
a participating-end aggregation module, used by the participating end to obtain the contribution value of each parameter in the model data and generate a mask to cover the parameters whose contribution values are ranked after the Kth position; and to obtain the indexes of the parameters whose contribution values are ranked within the top K and combine the indexes with the corresponding gradient data to form participating-end aggregated data;
a federated aggregation module, used by the federated server to receive the aggregated data of each participating end, aggregate the gradients in the aggregated data of all participating ends by using the indexes to obtain server-side aggregated data, and send the server-side aggregated data to each participating end;
and a final aggregation module, used by each participating end to perform local aggregation according to the server-side aggregated data and the participating-end aggregated data to obtain final aggregated data.
A third embodiment of the present invention provides a readable storage medium storing a computer program which, when executed by a processor, implements the knowledge-federation-based data security exchange method described above.
The readable storage medium may be a computer storage medium or a communication medium. Communication media includes any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC). Additionally, the ASIC may reside in user equipment. Of course, the processor and the readable storage medium may also reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, and the execution of the execution instructions by the at least one processor causes the device to implement the methods provided by the various embodiments described above.
In the above embodiments of the terminal or the server, it should be understood that the Processor may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A knowledge-federation-based data security exchange method, characterized by comprising the following steps, executed in a loop until a preset stop condition is reached:
a participating end training a participating-end model with pre-processed training data to obtain model data;
the participating end obtaining the contribution value of each parameter in the model data, and generating a mask to cover the parameters whose contribution values are ranked after the Kth position; obtaining the indexes of the parameters whose contribution values are ranked within the top K, and combining the indexes with the corresponding gradient data to form participating-end aggregated data;
the federated server receiving the aggregated data of each participating end, aggregating the gradients in the aggregated data of all participating ends by using the indexes to obtain server-side aggregated data, and sending the server-side aggregated data to each participating end;
each participating end performing local aggregation according to the server-side aggregated data and the participating-end aggregated data to obtain final aggregated data;
and each participating end updating the corresponding participating-end model with the obtained final aggregated data.
2. The knowledge-federation-based data security exchange method of claim 1, wherein generating the mask from the parameters whose contribution values are ranked after K comprises calculating the value of K according to the following formula:
K = S·m%
where S is the total number of parameters and m% is the model hyperparameter specifying the percentage to be transmitted.
3. The knowledge-federation-based data security exchange method of claim 1, wherein the receiving, by the federated server, of the aggregated data of each participating end comprises:
after receiving the aggregated data of all participating ends, the federated server aggregating the indexes of all participating ends to obtain an aggregation index, and then aggregating the gradients of all participating ends according to the aggregation index to obtain the server-side aggregated data.
4. The knowledge-federation-based data security exchange method of claim 1, wherein the aggregation of the indexes of all participating ends is performed once every at least two cycles.
5. The knowledge-federation-based data security exchange method of claim 3, wherein the aggregating of the gradients of all participating ends according to the aggregation index comprises: performing average aggregation on the gradients of all participating ends according to the aggregation index.
6. The knowledge-federation-based data security exchange method of claim 1, wherein the local aggregation performed by each participating end according to the server-side aggregated data and the participating-end aggregated data comprises:
querying the indexes of the participating-end aggregated data and of the server-side aggregated data, and performing the following operations according to the query result:
for an index found in both the participating-end aggregated data and the server-side aggregated data, taking the gradient value corresponding to that index in the server-side aggregated data as the new gradient value;
for an index found only in the server-side aggregated data, applying nonlinear processing to the gradient value corresponding to that index in the server-side aggregated data to obtain a new gradient value;
for an index found in neither the participating-end aggregated data nor the server-side aggregated data, keeping the gradient value unchanged.
7. The knowledge-federation-based data security exchange method of claim 6, wherein the nonlinear processing of the gradient value corresponding to the index in the server-side aggregated data to obtain a new gradient value is calculated based on the following formula:
G = G_C / (2(1 + e^(-G_C)))
where G is the new gradient value corresponding to the index and G_C is the gradient value corresponding to that index in the server-side aggregated data.
8. The knowledge-federation-based data security exchange method of claim 1, wherein before the participating end trains the participating-end model with the pre-processed training data to obtain model data, the method comprises: the federated server initializing a model and sending the model to one or more participating ends; and each participating end performing alignment processing on its training data.
9. A knowledge-federation-based data security exchange apparatus, characterized by comprising the following modules, executed in a loop until a preset stop condition is reached:
a participating-end training module, used by the participating end to train a participating-end model with pre-processed training data to obtain model data, or to update the participating-end model with the final aggregated data;
a participating-end aggregation module, used by the participating end to obtain the contribution value of each parameter in the model data and generate a mask to cover the parameters whose contribution values are ranked after the Kth position; and to obtain the indexes of the parameters whose contribution values are ranked within the top K and combine the indexes with the corresponding gradient data to form participating-end aggregated data;
a federated aggregation module, used by the federated server to receive the aggregated data of each participating end, aggregate the gradients in the aggregated data of all participating ends by using the indexes to obtain server-side aggregated data, and send the server-side aggregated data to each participating end;
and a final aggregation module, used by each participating end to perform local aggregation according to the server-side aggregated data and the participating-end aggregated data to obtain final aggregated data.
10. A readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 8.
CN202011443118.3A 2020-12-11 2020-12-11 Knowledge federation-based data security exchange method, device and storage medium Active CN112560088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011443118.3A CN112560088B (en) 2020-12-11 2020-12-11 Knowledge federation-based data security exchange method, device and storage medium


Publications (2)

Publication Number Publication Date
CN112560088A true CN112560088A (en) 2021-03-26
CN112560088B CN112560088B (en) 2024-05-28

Family

ID=75062798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011443118.3A Active CN112560088B (en) 2020-12-11 2020-12-11 Knowledge federation-based data security exchange method, device and storage medium

Country Status (1)

Country Link
CN (1) CN112560088B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633225A (en) * 2017-09-18 2018-01-26 北京金山安全软件有限公司 Information obtaining method and device
CN110572253A (en) * 2019-09-16 2019-12-13 济南大学 Method and system for enhancing privacy of federated learning training data
CN110601814A (en) * 2019-09-24 2019-12-20 深圳前海微众银行股份有限公司 Federal learning data encryption method, device, equipment and readable storage medium
CN110674528A (en) * 2019-09-20 2020-01-10 深圳前海微众银行股份有限公司 Federal learning privacy data processing method, device, system and storage medium
CN110955907A (en) * 2019-12-13 2020-04-03 支付宝(杭州)信息技术有限公司 Model training method based on federal learning
CN111144108A (en) * 2019-12-26 2020-05-12 北京百度网讯科技有限公司 Emotion tendency analysis model modeling method and device and electronic equipment
CN111212110A (en) * 2019-12-13 2020-05-29 清华大学深圳国际研究生院 Block chain-based federal learning system and method
CN111340242A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Model joint training method and device for protecting privacy
CN111553484A (en) * 2020-04-30 2020-08-18 同盾控股有限公司 Method, device and system for federal learning
CN111552986A (en) * 2020-07-10 2020-08-18 鹏城实验室 Block chain-based federal modeling method, device, equipment and storage medium
CN111897800A (en) * 2020-08-05 2020-11-06 全球能源互联网研究院有限公司 Electric vehicle charging facility recommendation method and system based on federal learning
US20200380340A1 (en) * 2017-11-29 2020-12-03 Ecole Polytechnique Federale De Lausanne (Epfl) Byzantine Tolerant Gradient Descent For Distributed Machine Learning With Adversaries


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHENG Yi: "Research on Privacy-Preserving Data Aggregation Technology in a Federated Learning Environment", China Master's Theses Full-text Database, Information Science and Technology, no. 07, 15 July 2020 (2020-07-15), pages 138-68 *
DONG Rongsheng et al.: "Multi-region Cross-weighted Aggregation of Deep Convolutional Features for Image Retrieval", Journal of Computer-Aided Design & Computer Graphics, vol. 30, no. 04, 15 April 2018 (2018-04-15), pages 658-665 *

Also Published As

Publication number Publication date
CN112560088B (en) 2024-05-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant