CN112560088B - Knowledge federation-based data security exchange method, device and storage medium - Google Patents


Info

Publication number
CN112560088B
CN112560088B
Authority
CN
China
Prior art keywords
data
aggregation
participating
model
aggregation data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011443118.3A
Other languages
Chinese (zh)
Other versions
CN112560088A (en)
Inventor
韦达
孟丹
李宏宇
李晓林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongdun Holdings Co Ltd
Original Assignee
Tongdun Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongdun Holdings Co Ltd filed Critical Tongdun Holdings Co Ltd
Priority to CN202011443118.3A
Publication of CN112560088A
Application granted
Publication of CN112560088B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries


Abstract

The invention provides a knowledge-federation-based data security exchange method, device, and storage medium, comprising cyclically executing the following steps until a preset stop condition is reached: each participating end trains its local model on preprocessed training data to obtain model data; the participating end computes a contribution value for each parameter in the model data, generates a mask to hide the parameters ranked after the top K by contribution, obtains the indexes of the top-K parameters, and combines those indexes with the corresponding gradient data to form the participating-end aggregated data; the federation server receives the aggregated data of every participating end, aggregates the gradients in all participating-end aggregated data by index to obtain the server-side aggregated data, and sends the server-side aggregated data to each participating end; each participating end performs local aggregation of the server-side aggregated data with its own aggregated data to obtain the final aggregated data; and each participating end updates its local model with the final aggregated data.

Description

Knowledge federation-based data security exchange method, device and storage medium
Technical Field
The present invention relates to data processing technologies, and in particular, to a method and apparatus for secure data exchange based on knowledge federation, and a storage medium.
Background
With the draft Personal Information Protection Law of the People's Republic of China released for public comment, enterprises are protecting information security ever more strictly. In the past, enterprises exchanged data with one another directly; such methods of acquiring data and applying big data are now increasingly restricted, and under regulatory constraints the data form isolated "islands".
Against this background, enterprises increasingly exchange data through methods such as secure multi-party computation (MPC) and federated learning, legally acquiring more data to improve model performance while guaranteeing security. However, the various encryption schemes added to the exchange process greatly increase communication and computation pressure even as they protect personal information: the volume of encrypted data is often more than a hundred times that of the source data, and training the same model under encrypted computation often requires tens of times the previous time and hardware resources. The enormous demands on computing power, memory, bandwidth, and electricity place a heavy burden on enterprises.
At present, ensuring stable model performance and secure information exchange requires a large volume of data to be exchanged between the participating ends, so data processing efficiency is low and costs of every kind rise.
Disclosure of Invention
The embodiments of the invention provide a knowledge-federation-based data security exchange method, device, and storage medium that reduce the amount of data transmitted when the participating ends exchange data, improve data processing efficiency, and reduce costs.
In a first aspect of the embodiments of the present invention, a knowledge-federation-based data security exchange method is provided, comprising cyclically executing the following steps until a preset stop condition is reached:
each participating end trains its local model on preprocessed training data to obtain model data;
the participating end computes the contribution value of each parameter in the model data and generates a mask to hide the parameters ranked after the top K by contribution; it obtains the indexes of the top-K parameters and combines the indexes with the corresponding gradient data to form the participating-end aggregated data;
the federation server receives the aggregated data of all participating ends, aggregates the gradients in all participating-end aggregated data by index to obtain the server-side aggregated data, and sends the server-side aggregated data to every participating end;
each participating end performs local aggregation of the server-side aggregated data and its own aggregated data to obtain the final aggregated data;
and each participating end updates its local model with the final aggregated data.
Optionally, in a possible implementation of the first aspect, generating a mask for the parameters ranked after the top K by contribution includes calculating the value of K according to the formula:
K = S × m%
where S is the total number of parameters and m% is the hyperparameter giving the percentage of model parameters to transmit.
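As a minimal sketch (the function name and integer truncation are assumptions, not from the patent), the K computation can be written as:

```python
def top_k_count(total_params: int, m_percent: float) -> int:
    """K = S * m%: how many highest-contribution parameters to transmit."""
    return int(total_params * m_percent / 100)
```

With S = 4 parameters and m = 50, this gives K = 2, matching the worked example later in the description.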
Optionally, in a possible implementation of the first aspect, after the federation server receives each participating end's aggregated data, the method includes:
the federation server aggregating the indexes of all participating ends to obtain an aggregated index, and aggregating the gradients of all participating ends according to the aggregated index to obtain the server-side aggregated data.
Optionally, in a possible implementation of the first aspect, the index aggregation across all participating ends is performed only once every at least two loop iterations.
Optionally, in a possible implementation of the first aspect, aggregating the gradients of all participating ends according to the aggregated index includes: averaging the gradients of all participating ends according to the aggregated index.
Optionally, in a possible implementation of the first aspect, each participating end performing local aggregation according to the server-side aggregated data and its own aggregated data includes:
looking up each index in the participating-end aggregated data and the server-side aggregated data, and acting on the lookup result as follows:
for an index found in both the participating-end aggregated data and the server-side aggregated data, taking the gradient value for that index in the server-side aggregated data as the new gradient value;
for an index found only in the server-side aggregated data, applying nonlinear processing to the gradient value for that index in the server-side aggregated data to obtain the new gradient value;
and for an index found in neither the participating-end aggregated data nor the server-side aggregated data, keeping the gradient value unchanged.
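The three lookup cases above can be sketched as follows. This is a hypothetical illustration: gradients are kept in plain dicts keyed by parameter index, and `nonlinear` stands in for the patent's nonlinear processing function, whose exact formula is not reproduced here:

```python
def local_aggregate(local_grads, uploaded_index, server_agg, nonlinear):
    """Apply the three local-aggregation rules.

    local_grads:    {index: gradient} for every local parameter
    uploaded_index: set of indexes this end sent to the federation server
    server_agg:     {index: gradient} received from the federation server
    nonlinear:      processing applied when the index was not uploaded locally
    """
    new_grads = dict(local_grads)  # case 3: untouched indexes stay unchanged
    for idx, g_c in server_agg.items():
        if idx in uploaded_index:
            new_grads[idx] = g_c             # case 1: found in both
        else:
            new_grads[idx] = nonlinear(g_c)  # case 2: only in server data
    return new_grads
```

For example, with server data {0: 1.5, 2: 1.0, 3: -2.0} and locally uploaded indexes {0, 3}, indexes 0 and 3 take the server values, index 2 goes through the nonlinear step, and index 1 keeps its local gradient.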
Optionally, in a possible implementation of the first aspect, the nonlinear processing that turns the gradient value for the index in the server-side aggregated data into a new gradient value is calculated by the following formula:
where G is the new gradient value for the index, and G_C is the gradient value for that index in the server-side aggregated data.
Optionally, in a possible implementation of the first aspect, before a participating end trains its model on preprocessed training data to obtain model data, the method includes: the federation server initializing a model and sending it to one or more participating ends; and the participating end aligning its training data, the initialized model being trained locally on the aligned training data to obtain model data.
A second aspect of the embodiments of the invention provides a knowledge-federation-based data security exchange device, comprising the following modules, which execute cyclically until a preset stop condition is reached:
a participating-end training module, for training the participating-end model on preprocessed training data to obtain model data, or for updating the participating-end model with the final aggregated data;
a participating-end aggregation module, for computing the contribution value of each parameter in the model data, generating a mask to hide the parameters ranked after the top K by contribution, obtaining the indexes of the top-K parameters, and combining the indexes with the corresponding gradient data to form the participating-end aggregated data;
a federation aggregation module, for receiving the aggregated data of all participating ends at the federation server, aggregating the gradients in all participating-end aggregated data by index to obtain the server-side aggregated data, and sending the server-side aggregated data to every participating end;
and a final aggregation module, for performing local aggregation at each participating end according to the server-side aggregated data and the participating-end aggregated data to obtain the final aggregated data.
In a third aspect of the embodiments of the present invention, there is provided a readable storage medium having stored therein a computer program for implementing the method of the first aspect and the various possible designs of the first aspect when the computer program is executed by a processor.
The knowledge-federation-based data security exchange method, device, and storage medium provided by the invention have the following advantages:
(1) Before each data transmission, every participating end dynamically generates a mask selecting the gradients with the best gain, forming a scheme in which only the best-gain gradient data are transmitted. This guarantees the effectiveness and quality of transmission, reduces the amount of data transmitted, improves data processing efficiency, and reduces costs.
(2) The federation server applies a nonlinear aggregation scheme to the exchanged data from different participating ends, so each participating end's model can learn the model-gain information of the other participating ends while the stability of its own model is protected.
(3) Each participating end's model thus reduces bandwidth and memory overhead during training while maintaining stable model performance.
Drawings
FIG. 1 is a flow chart of a first embodiment of a knowledge federation-based data security exchange method;
FIG. 2 is a flow chart of a second embodiment of a knowledge federation-based data security exchange method;
FIG. 3 is a block diagram of a first embodiment of a knowledge-federation-based data security exchange device.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein.
It should be understood that, in the various embodiments of the present invention, the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
It should be understood that in the present invention, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to the steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present invention, "plurality" means two or more. "And/or" merely describes an association between objects, meaning that three relationships are possible; e.g., A and/or B may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship. "Comprising A, B and C" or "comprising A, B, C" means that all three of A, B, C are comprised; "comprising A, B or C" means that one of A, B, C is comprised; and "comprising A, B and/or C" means that any one, any two, or all three of A, B, C are comprised.
It should be understood that in the present invention, "B corresponding to A" or "A corresponding to B" means that B is associated with A and can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. A matching B means that the similarity between A and B is greater than or equal to a preset threshold.
As used herein, "if" may be interpreted, depending on context, as "when", "upon", "in response to determining", or "in response to detecting".
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
An embodiment of the present invention provides a knowledge-federation-based data security exchange method suitable for use among multiple terminals that need to exchange data. As shown in FIG. 1, suppose there are two participating ends, A and B, that perform federated training through a partial-exchange method for federation data, and a coordinator end C, which may be a third party (the federation server of the present invention) or any party participating in the exchange. Among these three parties, the C end only provides the nonlinear aggregation function. Only desensitized data are transmitted among participating end A, participating end B, and federation server C; all sensitive data remain at participating ends A and B.
The invention provides a data security exchange method based on masks and knowledge federation. When training a multi-party model, the method dynamically generates different masks according to the characteristics of the different models; that is, which data each participating end transmits (and withholds) differs from round to round. Specifically, the mask is tied to the importance of each part of the model, which varies across parts. The method can therefore dynamically adjust the data to be exchanged, ensuring that each exchange between the multi-party models yields the maximum gain. Because the data exchanged by the different participating ends and the different model components differ in importance, the method also provides a corresponding data aggregation scheme: it aggregates data of different parts from multiple parties and, on the premise of preserving the large gain of the local data for the local model, lets the data from other parties produce effective gain as well.
The invention is mainly divided into four stages: 1. data preparation; 2. local model training; 3. model gain calculation; 4. model aggregation.
The first phase is a data preparation phase. The data is preprocessed according to federal requirements at this stage.
Data preprocessing includes data cleaning and feature processing; feature processing includes the merging of similar features and the removal of features that are unimportant or contribute little.
For horizontal (cross-sample) knowledge-federation-based data security exchange: the different participating ends encrypt their feature dimensions and then align them. To obtain better training results, feature dimensions that contribute little or nothing to the model are also removed after alignment.
For vertical (cross-feature) knowledge-federation-based data security exchange: the different participating ends encrypt their data ids, compare the encrypted ids, record the matching ids, and then align the data, obtaining aligned data with the same sample ids but more feature dimensions. The aligned feature dimensions are then pruned, removing those that contribute little or nothing to the model, to achieve a better training effect.
The second stage is the local training stage of the model, which is the same as ordinary model training. In this stage, the model initialized by the coordinator is trained locally on each participating end's data. For a deep learning model, this is one forward and one backward pass.
The third stage is the data transmission stage of the model. In this stage the model gain is calculated from the training result of the second stage. Assuming the contribution of a parameter is computed as the product of its gradient G and weight W, the contribution is calculated this way for all parameters (or nodes) of the model. The value K is then obtained from the model hyperparameter m%, a mask (M) is generated for the parameters (or nodes) that need not be transmitted, and the participating-end aggregated data are generated from the data that do need to be transmitted.
The fourth stage is the model aggregation stage, divided into three steps: a) index aggregation; b) C-end aggregation; c) local aggregation at each participating end.
a) The main purpose of index aggregation is to aggregate the indexes of the different contents transmitted by each participating end, ensuring that the subsequent aggregation is orderly and efficient.
b) C-end aggregation averages, at the C end, the contents transmitted by the participating ends according to the index aggregation result (i.e., when the gradient of the same model parameter or node is uploaded to the C end by several participating ends, the values are averaged over the number of uploads).
c) Local aggregation at each participating end re-aggregates the C-end-aggregated gradients with local information, avoiding the problem of an excessively large gain from other participating ends for a certain parameter (or node) distorting that parameter (or node) locally.
G_C represents the gradient of a parameter (or node) after C-end aggregation, and G represents the corresponding gradient of the local model. If the local G participated in the C-end aggregation, G keeps its role in the update; if the local G did not participate in the C-end aggregation, G = G_C.
The second stage and steps b) and c) of the fourth stage are repeated to perform the secure exchange of the partially transmitted model data. At a certain interval, the third stage and step a) of the fourth stage are also performed to dynamically update the model's mask, so that training of the whole model is efficient and low-loss.
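The schedule just described — local training and aggregation every round, mask and index refresh only every few rounds — can be illustrated with a hypothetical helper (the stage names and the refresh interval are assumptions for illustration):

```python
def training_schedule(total_rounds: int, refresh: int):
    """For each round, list which stages run. Stages 2, 4b and 4c run every
    round; stage 3 and stage 4a (mask and index refresh) run only on
    rounds divisible by `refresh`."""
    plan = []
    for r in range(total_rounds):
        stages = ["stage2_local_train"]
        if r % refresh == 0:
            stages += ["stage3_gain_and_mask", "stage4a_index_aggregation"]
        stages += ["stage4b_c_end_aggregation", "stage4c_local_aggregation"]
        plan.append(stages)
    return plan
```

With `refresh=2`, rounds 0 and 2 recompute the mask and aggregated index, while round 1 only trains and aggregates with the existing mask.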
In one embodiment, as shown in fig. 2, the knowledge federation-based data security exchange method of the present invention includes the steps of:
S10: each participating end trains its model on the preprocessed training data to obtain model data.
In one embodiment, the initial model may be initialized by the federation server from a seed and random numbers and then sent to each participating end; in that case the training process starts with the federation server initializing the model.
In another embodiment, the initial model may also be generated locally by each participating end from a seed (random number) and a random table. In cross-sample knowledge-federation-based data security exchange, to ensure that the initial models of all participating ends are consistent, the seed and random table of each participating end should be the same.
The participating end aligns its training data based on the received alignment information and trains the model on the processed data to obtain model data, as follows:
In this embodiment, the initial model is generated by the federation server, which initializes the model and sends it, together with the dimension information, to one or more participating ends. The federation server sends the initialized model to participating ends A and B.
The participating end receives the dimension information and aligns its training data, and the initialized model is trained locally on the aligned data. Participating ends A and B align their training data based on the dimension information, reducing the amount of data during interaction, and then train the model on the aligned data.
The step in which the participating end obtains the contribution value of each parameter in the model data is repeated; after the participating end receives the server-side aggregated data, the participating-end model is trained again for a preset number of rounds. Through repeated calculation and training over the preset rounds, each participating end's best-contributing new parameters (or nodes) are aggregated with those of the other participating ends, and the information in the other ends' high-gain parameters (or nodes) is absorbed. After multiple rounds of training and aggregation, the final result is a knowledge-federation-based data security exchange model that has fully learned the local model and absorbed the gains of the other participating ends.
In one possible embodiment, before step S10, the following step of preprocessing the training data is also included:
S01: the participating ends encrypt their local training data and then horizontally align the encrypted data; during horizontal alignment, the feature dimensions of the participating ends are put in correspondence. For example, if the feature dimensions of participating end A are f1, f2, f3, f4, then the feature dimensions of participating end B must also be f1, f2, f3, f4 — identical and in the same order. If A's feature dimensions are f1, f2, f3, f4, f5 while B's are f1, f2, f3, f4, then after alignment the training data of both A and B use f1, f2, f3, f4, and f5 — a feature dimension contributing little or nothing to the model — is deleted.
Through horizontal alignment, feature dimensions that contribute little or nothing to the model can be removed; this both achieves a better training effect and reduces the size of the data interaction, improving data transmission efficiency.
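A minimal sketch of the horizontal alignment step (the intersection-and-order logic only; the function name is an assumption, and a real deployment would compare encrypted feature identifiers rather than plain names):

```python
def align_features(feature_lists):
    """Keep only the feature dimensions present at every participating end,
    in a fixed order, so parameter indexes line up across participants."""
    common = set(feature_lists[0]).intersection(*feature_lists[1:])
    # preserve the first list's ordering for the shared features
    return [f for f in feature_lists[0] if f in common]
```

Given one end with features f1..f5 and another with f1..f4, both ends end up training on f1..f4 and f5 is dropped, as in the example above.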
In one possible embodiment, the method further includes the following step:
S02: the participating ends encrypt their data and then vertically align the encrypted data; during vertical alignment, the data ids of the participating ends are put in correspondence, with the data sample ids of A and B matched in encrypted form. For example, suppose the data sample ids of participating end A are a, b, c, d and those of participating end B are c, d, e, f. Encrypting both sides' ids gives a1, b1, c1, d1 and c1, d1, e1, f1, from which the matched data samples c1 and d1 are obtained. Model training is finally performed on the aligned data; a1, b1 at participating end A and e1, f1 at participating end B are treated as contributing little or nothing.
The different participating ends encrypt their data ids, compare the encrypted ids, record the matching ids, and then align the data, obtaining aligned data with the same sample ids but more feature dimensions. The aligned feature dimensions are then pruned, removing those that contribute little or nothing to the model; this achieves a better training effect, reduces the size of the data interaction, and further improves data transmission efficiency.
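The id-matching step can be sketched as follows. This is a hypothetical illustration: a salted SHA-256 hash stands in for the patent's unspecified id encryption, and each end only ever exposes the hashed ids:

```python
import hashlib

def blind(sample_id: str, shared_salt: str) -> str:
    """Hash an id so the raw value never leaves the participating end."""
    return hashlib.sha256((shared_salt + sample_id).encode()).hexdigest()

def align_samples(ids_a, ids_b, shared_salt="demo-salt"):
    """Intersect the hashed ids and recover the matching plain ids locally."""
    enc_a = {blind(i, shared_salt): i for i in ids_a}
    enc_b = {blind(i, shared_salt): i for i in ids_b}
    shared = enc_a.keys() & enc_b.keys()
    return sorted(enc_a[h] for h in shared)
```

With A's ids a, b, c, d and B's ids c, d, e, f this returns the matched samples c and d, mirroring the example above.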
S20: the participating end obtains the contribution value of each parameter in the model data and generates a mask to hide the parameters ranked after the top K by contribution; it obtains the indexes of the top-K parameters and combines the indexes with the corresponding gradient data to form the participating-end aggregated data. In this step the participating end calculates each parameter's contribution value, sorts the parameters by contribution, generates a mask for the parameters ranked after the top K, and obtains the indexes of the top-K parameters. Data separation and distribution are completed in this way; during transmission, only the indexes (and gradients) of the top-K parameters are sent to the federation server.
In this embodiment, K may be set as needed or calculated. In one possible embodiment, K is calculated by:
K = S × m%
where S is the total number of parameters and m% is the model hyperparameter (the percentage to transmit).
In one possible embodiment, obtaining the contribution value of each parameter in the model data includes:
obtaining the gradient G and weight W of each parameter of the model, and computing each parameter's contribution value as
B = |G × W|
where B is the contribution value.
All parameters are sorted from high to low by contribution value, and the mask M is generated for the parameters ranked after the top K.
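Step S20 and the contribution formula above can be sketched together. This is a hypothetical illustration (ties are broken by sort order, and the transmitted payload is a plain index-to-gradient dict):

```python
def split_by_contribution(grads, weights, m_percent):
    """B = |G * W| per parameter; keep the top-K indexes (K = S * m%) as
    the participating-end aggregated data and mask everything else."""
    contrib = [abs(g * w) for g, w in zip(grads, weights)]
    k = int(len(contrib) * m_percent / 100)
    ranked = sorted(range(len(contrib)), key=lambda i: contrib[i], reverse=True)
    keep = set(ranked[:k])
    payload = {i: grads[i] for i in sorted(keep)}                # transmitted
    mask = [0 if i in keep else 1 for i in range(len(contrib))]  # 1 = masked
    return payload, mask
```

Run on the worked example below (A's gradients (1, -1, 0, -2) with weights (2, 1, 3, 2), m = 50), this keeps indexes 0 and 3 and produces the payload {0: 1, 3: -2}.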
In one embodiment, suppose the initialized model is (x1 + 2x2 + 3x3 + 4x4) and is distributed to participating ends A and B, each of which trains it locally. After local training, A's updated model is (2x1 + 1x2 + 3x3 + 2x4) and B's updated model is (3x1 + 0x2 + 4x3 + 4x4). The parameter gradients of A's and B's models for x1, x2, x3, x4 are (1, -1, 0, -2) and (2, -2, 1, 0), respectively, and the computed contribution values of the trained model parameters are (2, 1, 0, 4) and (6, 0, 4, 0). With the partial-transmission amount m% (the transmission percentage) set to 50% (i.e., m = 50), K = 2, so each participating end transmits its 2 largest contributions. A's two largest contribution values are 4 and 2, with indexes (3, 0); B's two largest are 6 and 4, with indexes (0, 2). Participating end A finally transmits "0: 1, 3: -2" to the federation server, and participating end B transmits "0: 2, 2: 1". In this way, the original or encrypted data of large volume are replaced by the mask, and only high-contribution data are transmitted, reducing the amount of data transmitted and improving the transmission and processing efficiency of the data.
S30: the federation server receives the aggregation data of all participating ends, aggregates the gradients in the aggregation data of all participating ends using the indexes to obtain the server-side aggregation data, and sends the server-side aggregation data to all participating ends.
In step S30, the indexes of all participating ends are first aggregated to obtain an aggregation index, and then the gradients of all participating ends are aggregated by average aggregation according to the aggregation index to obtain the server-side aggregation data.
In step S30, the step of performing average aggregation processing on the gradients of all participating ends according to the aggregation index to obtain the server-side aggregation data includes:

calculating the server-side aggregation data of each parameter according to the following formula:

G_{s,t} = (1/n) · Σ_{j=1}^{n} G_{j,t}

wherein t is a parameter, n is the total number of gradient values for parameter t from all participating ends, G_{s,t} is the server-side aggregation data of parameter t, and G_{j,t} is the j-th gradient value from a participating end for parameter t. The formula expresses that there are n gradient values from participating ends at parameter (or node) t, and that the average aggregation takes the mean of the values from all participating ends at that node.
In this embodiment the server-side aggregation data is obtained by average aggregation over the aggregation index, but weighted aggregation and the like may also be performed, adjusted according to the application scenario; this is not detailed here.
In this embodiment of the step, the aggregation index obtained by aggregating the indexes reported by participating end A and participating end B is "0: 2, 1: 0, 2: 1, 3: 1" (each entry counting how many participating ends reported that index). The model gradients are then aggregated according to the aggregation index to obtain the server-side aggregation data "0: 1.5, 2: 1, 3: -2". The aggregated server-side aggregation data (gradients) is then transmitted to participating end A and participating end B.
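The average aggregation of step S30 on these two uploads can be sketched as follows. This is an illustrative sketch; `server_aggregate` and the explicit parameter count are our assumptions:

```python
# Server-side average aggregation: merge the sparse index->gradient uploads,
# count how many participating ends reported each index (the aggregation
# index), and average each reported gradient over its reporting ends.
def server_aggregate(updates, n_params):
    sums, counts = {}, {}
    for upd in updates:                      # one dict per participating end
        for i, g in upd.items():
            sums[i] = sums.get(i, 0.0) + g
            counts[i] = counts.get(i, 0) + 1
    agg_index = {i: counts.get(i, 0) for i in range(n_params)}
    # server-side aggregation data: G_{s,t} = mean over reporting ends
    agg_data = {i: sums[i] / counts[i] for i in sorted(sums)}
    return agg_index, agg_data

idx, data = server_aggregate([{0: 1, 3: -2}, {0: 2, 2: 1}], n_params=4)
print(idx)   # {0: 2, 1: 0, 2: 1, 3: 1}
print(data)  # {0: 1.5, 2: 1.0, 3: -2.0}
```

Index 0 was reported by both ends and is averaged to 1.5; indexes 2 and 3 each came from a single end and pass through unchanged.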
S40: each participating end performs local aggregation according to the server-side aggregation data and its own participating-end aggregation data to obtain the final aggregation data.

In the embodiment of this step, participating end A and participating end B perform nonlinear processing on the server-side aggregation data aggregated by the federation server; after local aggregation, participating end A obtains the final aggregation data (1.5, -1, 0.366, -2), and participating end B obtains the final aggregation data (1.5, -2, 1, -0.119).
The local aggregation performed by each participating end according to the server-side aggregation data and the participating-end aggregation data comprises the following steps:

querying each index against the participating-end aggregation data and the server-side aggregation data, and performing the following operations according to the query result:
For an index found in both the participating-end aggregation data and the server-side aggregation data, the gradient value corresponding to that index in the server-side aggregation data is taken as the new gradient value. For example, suppose the model has parameters (W, X, Y, Z), the indexes in the participating-end aggregation data correspond to the parameters (X, Y, Z), the gradients of those parameters in the participating-end aggregation data are (X1, Y1, Z1), and the indexes of the parameters (X, Y, Z) are (0, 1, 2) respectively. An index found in both the participating-end aggregation data and the server-side aggregation data means that the gradient of the corresponding parameter took part both in the local aggregation and in the server-side aggregation: the server aggregates the gradients of the parameters (X, Y, Z) at indexes (0, 1, 2) with the gradients of the other participating ends, obtains the gradients (X2, Y2, Z2) for indexes (0, 1, 2) in the server-side aggregation data, and sends them to the participating end, which updates the gradients (X1, Y1, Z1) to (X2, Y2, Z2). In this way, the gradient value from the server-side aggregation data serves as the new gradient value, and the participating end updates its gradients after the server-side aggregation.
For an index found only in the server-side aggregation data, nonlinear processing is performed on the gradient value corresponding to that index in the server-side aggregation data to obtain a new gradient value.

The nonlinear processing of the gradient value corresponding to the index in the server-side aggregation data to obtain a new gradient value is calculated based on the following formula:

wherein G is the new gradient value corresponding to the index, and G_C is the gradient value corresponding to the index in the server-side aggregation data. In this way, nonlinear calculation is performed on the gradient values of indexes that can be queried at the server side but are not contained in the participating end's own aggregation data. Through this step, gradients can also be updated for parameters that did not take part in the federation server's aggregation processing, so that the continuous gradient updating of some parameters in the local data does not stall for one or more rounds of the cycle.
For example, suppose the model has parameters (W, X, Y, Z), the indexes in the participating-end aggregation data correspond to the parameters (X, Y, Z) with gradients (X1, Y1, Z1) and indexes (0, 1, 2) respectively, and the server-side aggregation data includes (W2, X2, Y2, Z2). In this case, for the gradient W2, only the present participating end did not upload its corresponding gradient W1; other participating ends may have uploaded gradients for the parameter W. The gradient of the parameter W can then be updated directly based on the server-side data.
For indexes found in neither the participating-end aggregation data nor the server-side aggregation data, the gradient value is kept unchanged. For example, suppose the model has parameters (W, X, Y, Z), the indexes in the participating-end aggregation data correspond to the parameters (X, Y, Z) with gradients (X1, Y1, Z1) and indexes (0, 1, 2) respectively, and the server-side aggregation data includes (X2, Y2, Z2). In this case no index corresponding to the parameter W exists in either the participating-end aggregation data or the server-side aggregation data; that is, the gradient of the parameter W is not updated and is kept unchanged.
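The three query cases can be sketched as follows. Note that the patent's concrete nonlinear formula is not reproduced in this text, so the `nonlinear` function below is an arbitrary placeholder, and the function names are ours:

```python
import math

def nonlinear(g_c):
    # Placeholder for the patent's nonlinear formula G = f(G_C); the
    # actual f is given by the formula in the embodiment, not shown here.
    return g_c * math.exp(-1)

def local_aggregate(local_grads, uploaded, server_data):
    final = list(local_grads)
    for i in range(len(final)):
        if i in uploaded and i in server_data:
            final[i] = server_data[i]              # found in both -> server value
        elif i in server_data:
            final[i] = nonlinear(server_data[i])   # server only -> nonlinear update
        # found in neither -> keep the local gradient unchanged
    return final

# Participating end A: local gradients (1, -1, 0, -2), uploaded {0: 1, 3: -2},
# server-side aggregation data {0: 1.5, 2: 1.0, 3: -2.0}.
result = local_aggregate([1, -1, 0, -2], {0: 1, 3: -2}, {0: 1.5, 2: 1.0, 3: -2.0})
```

With the placeholder nonlinearity, index 2 becomes about 0.368 rather than the 0.366 of the embodiment, since the real formula differs; indexes 0 and 3 take the server values 1.5 and -2, and index 1 keeps its local gradient -1, matching the example.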
In this embodiment, gradient average aggregation is performed on the parameters of the multiple participating ends at the federation server, and nonlinear processing is performed locally to update the gradients of the parameters that did not take part in the average aggregation; the combination of these two aggregation interactions maximally balances the gain that each data exchange brings to every participating end.
S50: each participating end updates its corresponding model using the obtained final aggregation data, and after the model is updated, the participating end trains its model again, until the preset stop condition is reached.
The stop condition is that the set number of iterations is reached or that the model converges.
In this embodiment, through repeated calculation and training for the preset number of rounds, the parameters (or nodes) with the best contribution at each participating end are aggregated with those of the other participating ends, absorbing the information of the high-gain parameters (or nodes) of the other participating ends. After multiple rounds of training and aggregation, a knowledge-federation-based data security exchange model is finally obtained that fully learns the local model while absorbing the gains of the other participating ends.
In one embodiment, the index aggregation of all participating ends is performed only once every at least two cycles. The frequency of index aggregation depends on the number of cycles of the overall steps S10 to S40 of the present invention, but the index aggregation is not performed in every cycle. For example, if the preset stop condition of the invention is 100 executions of steps S10 to S40, then preferably the index aggregation of each participating end is performed once every 5 to 10 executions of steps S10 to S40. In this way, the federation server can still generate server-side aggregation data in each cycle, while the aggregation frequency of each participating end is reduced, which reduces the computation at the participating end and improves its processing efficiency.
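The schedule described here can be sketched as follows (`index_rounds` is an illustrative name of ours, and the period of 5 is the lower end of the 5-to-10 range suggested above):

```python
# With a stop condition of 100 cycles and index aggregation once every
# 5 cycles, the indexes are re-aggregated in only 20 of the 100 cycles,
# while server-side aggregation data is still produced every cycle.
def index_rounds(total_cycles, index_period):
    return [r for r in range(total_cycles) if r % index_period == 0]

rounds = index_rounds(100, 5)
print(len(rounds))  # 20
print(rounds[:4])   # [0, 5, 10, 15]
```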
A second embodiment of the invention provides a knowledge-federation-based data security exchange device, as shown in FIG. 3, which includes the following modules that execute cyclically until a preset stop condition is reached:
The participating-end training module is used for the participating end to train the participating-end model with the preprocessed training data to obtain model data, or to update the participating-end model with the final aggregation data;

the participating-end aggregation module is used for the participating end to obtain the contribution value of each parameter in the model data and to generate a mask to mask the parameters whose contribution values rank after the K-th place; and to obtain the indexes of the parameters whose contribution values rank before the K-th place and combine the indexes with the corresponding gradient data to form the participating-end aggregation data;

the federation aggregation module is used for the federation server to receive the aggregation data of all participating ends, to aggregate the gradients in the aggregation data of all participating ends using the indexes to obtain the server-side aggregation data, and to send the server-side aggregation data to all participating ends;

and the final aggregation module is used for each participating end to perform local aggregation according to the server-side aggregation data and the participating-end aggregation data to obtain the final aggregation data.
A third embodiment of the present invention provides a readable storage medium in which a computer program is stored, the computer program implementing the knowledge-federation-based data security exchange method when executed by a processor.
The readable storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. For example, a readable storage medium is coupled to the processor so that the processor can read information from, and write information to, the readable storage medium. Alternatively, the readable storage medium may be integral to the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC). In addition, the ASIC may reside in a user device. The processor and the readable storage medium may also reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, magnetic tape, a floppy disk, an optical data storage device, and so on.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, the execution instructions being executed by the at least one processor to cause the device to implement the methods provided by the various embodiments described above.
In the above embodiments of the terminal or the server, it should be understood that the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the present invention may be embodied directly as executed by a hardware processor, or executed by a combination of hardware and software modules in a processor.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention, not for limiting it; although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (9)

1. A knowledge-federation-based data security exchange method, characterized by comprising the following steps, executed cyclically until a preset stop condition is reached:
The participant trains the participant model by utilizing the preprocessed training data to obtain model data;
The participating end obtains the contribution value of each parameter in the model data, and generates a mask to mask the parameters whose contribution values rank after the K-th place; obtains the indexes of the parameters whose contribution values rank before the K-th place, and combines the indexes with the corresponding gradient data to form the participating-end aggregation data;
The federation server receives aggregation data of all the participating terminals, and utilizes indexes to aggregate gradients in all the aggregation data of the participating terminals to obtain aggregation data of the server terminals, and the aggregation data of the server terminals are sent to all the participating terminals;
each participating end performs local aggregation according to the server-side aggregation data and the participating-end aggregation data to obtain the final aggregation data, comprising the following steps: querying each index against the participating-end aggregation data and the server-side aggregation data, and performing the following operations according to the query result: for an index found in both the participating-end aggregation data and the server-side aggregation data, taking the gradient value corresponding to the index in the server-side aggregation data as the new gradient value; for an index found only in the server-side aggregation data, performing nonlinear processing on the gradient value corresponding to the index in the server-side aggregation data to obtain the new gradient value; for indexes found in neither the participating-end aggregation data nor the server-side aggregation data, keeping the gradient value unchanged;
and each participating terminal updates the corresponding participating terminal model by using the obtained final aggregation data.
2. The knowledge-federation-based data security exchange method according to claim 1, wherein generating the mask for the parameters whose contribution values rank after K comprises calculating the value of K according to the following formula:

K = S·m%

wherein S is the total number of parameters and m% is the percentage of the model parameters to be transmitted (a hyperparameter).
3. The knowledge federation-based data security exchange method according to claim 1, wherein after the federation server receives the aggregated data of each participating end, the method comprises:
and the federation server receives the aggregation data of each participating end, aggregates the indexes of all the participating ends to obtain an aggregation index, and aggregates gradients of all the participating ends according to the aggregation index to obtain the aggregation data of the server.
4. The knowledge federal based data security exchange method according to claim 3, wherein the index aggregation to all participating ends is performed once every at least two cycles.
5. The knowledge federation-based secure data exchange method of claim 3, wherein aggregating gradients of all participating ends according to an aggregation index comprises: and carrying out average aggregation treatment on gradients of all the participating terminals according to the aggregation index.
6. The knowledge-federation-based data security exchange method according to claim 1, wherein the nonlinear processing performed on the gradient value corresponding to the index in the server-side aggregation data to obtain a new gradient value is calculated based on the following formula:

wherein G is the new gradient value corresponding to the index, and G_C is the gradient value corresponding to the index in the server-side aggregation data.
7. The knowledge federation-based data security exchange method according to claim 1, wherein before the participant trains the participant model using the pre-processed training data to obtain model data, comprising: the federal server side initialization model sends the model to one or more participating sides respectively; the participating end performs alignment processing on the training data.
8. The knowledge federation-based data security exchange device is characterized by comprising the following modules for circularly executing until reaching a preset stop condition:
The participating end training module is used for training the participating end model by using the preprocessed training data to obtain model data or updating the participating end model by final aggregated data;
the participating-end aggregation module is used for the participating end to obtain the contribution value of each parameter in the model data and to generate a mask to mask the parameters whose contribution values rank after the K-th place; and to obtain the indexes of the parameters whose contribution values rank before the K-th place and combine the indexes with the corresponding gradient data to form the participating-end aggregation data;
The federation aggregation module is used for receiving aggregation data of all the participating terminals by the federation server, carrying out aggregation processing on gradients in all the aggregation data of the participating terminals by using indexes to obtain aggregation data of the server, and sending the aggregation data of the server to all the participating terminals;
the final aggregation module is used for each participating end to perform local aggregation according to the server-side aggregation data and the participating-end aggregation data to obtain the final aggregation data, comprising: querying each index against the participating-end aggregation data and the server-side aggregation data, and performing the following operations according to the query result: for an index found in both the participating-end aggregation data and the server-side aggregation data, taking the gradient value corresponding to the index in the server-side aggregation data as the new gradient value; for an index found only in the server-side aggregation data, performing nonlinear processing on the gradient value corresponding to the index in the server-side aggregation data to obtain the new gradient value; and for indexes found in neither the participating-end aggregation data nor the server-side aggregation data, keeping the gradient value unchanged.
9. A readable storage medium, characterized in that the readable storage medium has stored therein a computer program for implementing the method of any of claims 1 to 7 when being executed by a processor.
CN202011443118.3A 2020-12-11 2020-12-11 Knowledge federation-based data security exchange method, device and storage medium Active CN112560088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011443118.3A CN112560088B (en) 2020-12-11 2020-12-11 Knowledge federation-based data security exchange method, device and storage medium

Publications (2)

Publication Number Publication Date
CN112560088A CN112560088A (en) 2021-03-26
CN112560088B true CN112560088B (en) 2024-05-28

Family

ID=75062798


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633225A (en) * 2017-09-18 2018-01-26 北京金山安全软件有限公司 Information obtaining method and device
CN110572253A (en) * 2019-09-16 2019-12-13 济南大学 Method and system for enhancing privacy of federated learning training data
CN110601814A (en) * 2019-09-24 2019-12-20 深圳前海微众银行股份有限公司 Federal learning data encryption method, device, equipment and readable storage medium
CN110674528A (en) * 2019-09-20 2020-01-10 深圳前海微众银行股份有限公司 Federal learning privacy data processing method, device, system and storage medium
CN110955907A (en) * 2019-12-13 2020-04-03 支付宝(杭州)信息技术有限公司 Model training method based on federal learning
CN111144108A (en) * 2019-12-26 2020-05-12 北京百度网讯科技有限公司 Emotion tendency analysis model modeling method and device and electronic equipment
CN111212110A (en) * 2019-12-13 2020-05-29 清华大学深圳国际研究生院 Block chain-based federal learning system and method
CN111340242A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Model joint training method and device for protecting privacy
CN111553484A (en) * 2020-04-30 2020-08-18 同盾控股有限公司 Method, device and system for federal learning
CN111552986A (en) * 2020-07-10 2020-08-18 鹏城实验室 Block chain-based federal modeling method, device, equipment and storage medium
CN111897800A (en) * 2020-08-05 2020-11-06 全球能源互联网研究院有限公司 Electric vehicle charging facility recommendation method and system based on federal learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019105543A1 (en) * 2017-11-29 2019-06-06 Ecole Polytechnique Federale De Lausanne (Epfl) Byzantine tolerant gradient descent for distributed machine learning with adversaries

Non-Patent Citations (2)

Title
Multi-region cross-weighted aggregation of deep convolutional features for image retrieval; Dong Rongsheng et al.; Journal of Computer-Aided Design & Computer Graphics; 2018-04-15; Vol. 30, No. 4; 658-665 *
Research on privacy-preserving data aggregation technology in a federated learning environment; Cheng Yi; China Master's Theses Full-text Database, Information Science and Technology; 2020-07-15; No. 7; I138-68 *


Similar Documents

Publication Publication Date Title
Sattler et al. Robust and communication-efficient federated learning from non-iid data
CN110572253B (en) Method and system for enhancing privacy of federated learning training data
Beigi et al. Simplified instantaneous non-local quantum computation with applications to position-based cryptography
US20200349435A1 (en) Secure Training of Multi-Party Deep Neural Network
CN112910631B (en) Efficient privacy set intersection calculation method and system based on cloud server assistance
US20230109352A1 (en) Node group-based data processing method and system, device, and medium
Shankar et al. RGB-based secure share creation in visual cryptography using optimal elliptic curve cryptography technique
Jha et al. Towards practical privacy for genomic computation
CN113536379B (en) Private data query method and device and electronic equipment
CN115310121B (en) Real-time reinforced federal learning data privacy security method based on MePC-F model in Internet of vehicles
CN112183767A (en) Multi-key lower model aggregation federal learning method and related equipment
CN115842627A (en) Decision tree evaluation method, device, equipment and medium based on secure multi-party computation
Beguier et al. Safer: Sparse secure aggregation for federated learning
CN112560088B (en) Knowledge federation-based data security exchange method, device and storage medium
Beguier et al. Efficient sparse secure aggregation for federated learning
Nayak et al. Rigidity of superdense coding
CN116865938A (en) Multi-server federation learning method based on secret sharing and homomorphic encryption
CN116681141A (en) Federal learning method, terminal and storage medium for privacy protection
Cheung et al. CS 758 project: secure computation with playing cards
CN114337990A (en) Two-round multiple chameleon Hash function calculation method and system
Xu et al. Efficient and privacy-preserving federated learning with irregular users
Sutradhar et al. An efficient simulation of quantum secret sharing
WO2021120227A1 (en) Method for information recommendation and device
CN116132029B (en) Wild card symbol pattern matching method and system based on three-choice-one-blank transmission protocol
Chen et al. Practical multi-party private set intersection cardinality and intersection-sum protocols under arbitrary collusion 1

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant