CN117521783A - Federal machine learning method, apparatus, storage medium and processor - Google Patents


Info

Publication number
CN117521783A
Authority
CN
China
Prior art keywords
training
client node
round
current
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311576433.7A
Other languages
Chinese (zh)
Inventor
江军 (Jiang Jun)
王炜 (Wang Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd filed Critical Beijing Topsec Technology Co Ltd
Priority to CN202311576433.7A
Publication of CN117521783A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present application provide a federal machine learning method, apparatus, storage medium and processor. The method comprises the following steps: determining the model parameters issued by the global model for the current training round; acquiring the historical single contribution and historical cumulative contribution of each client node's local model from the previous training round; determining the current cumulative contribution of each client node according to the total number of historical training rounds of the global model and the historical single contribution and historical cumulative contribution of each client node; determining, according to the current cumulative contribution of each client node, the number of client nodes participating in the current training round and the probability of each client node being randomly selected in the current round; randomly selecting that number of client nodes according to each client node's selection probability; and sending the issued model parameters to each client node participating in the current training round to train its local model, thereby improving the model performance of federal machine learning.

Description

Federal machine learning method, apparatus, storage medium and processor
Technical Field
The present application relates to the field of computer technology, and in particular, to a federal machine learning method, apparatus, storage medium, processor, and computer device.
Background
At present, with the rapid development of artificial intelligence technology, its applications have become increasingly widespread, covering fields such as image analysis, speech recognition, text processing, intelligent recommendation and security detection, and privacy computing technology represented by federal machine learning has become a new frontier and hot field. Federal machine learning can effectively help multiple institutions use data and build machine learning models while meeting the requirements of user privacy protection, data security and government regulations.
However, while protecting user privacy, conventional federal machine learning carries out training across all client nodes. Some client nodes, however, offer no clear benefit for improving the performance of federal machine learning; blindly selecting certain client nodes, or using all client nodes for federal machine learning, reduces the training effect of federal machine learning and results in poor model performance.
Disclosure of Invention
An object of an embodiment of the present application is to provide a federal machine learning method, apparatus, storage medium, processor and computer device, so as to solve the problem of low federal machine learning performance in the prior art.
To achieve the above object, a first aspect of the present application provides a federal machine learning method, applied to a server node, including:
determining the final model parameters obtained after the previous training round of the global model on the server node as the issued model parameters of the global model for the current training round;
acquiring the historical single contribution and historical cumulative contribution of each client node's local model, obtained after that local model was trained with the model parameters issued by the global model in the previous training round;
determining the current cumulative contribution of each client node according to the total number of historical training rounds of the global model and the historical single contribution and historical cumulative contribution of each client node;
determining, according to the current cumulative contribution of each client node, the number of client nodes participating in the current training round and the probability of each client node being randomly selected in the current round;
randomly selecting, from the plurality of client nodes, that number of client nodes to participate in the current training round according to each client node's selection probability;
and sending the issued model parameters to each client node participating in the current training round, so that each participating client node trains its corresponding local model with the issued model parameters.
In an embodiment of the present application, determining the number of client nodes participating in the current training round and the probability of each client node being randomly selected according to the current cumulative contribution of each client node includes: determining the total number of client nodes whose current cumulative contribution is greater than a first preset value; determining the smaller of this total number and the planned number of participating client nodes for the current training round as the number of client nodes participating in the current training round; selecting one client node from the plurality of client nodes as the selected client node, and determining the sum of the current cumulative contributions of the plurality of client nodes; determining the ratio between the current cumulative contribution of the selected client node and that sum as the probability of the selected client node being randomly selected; and removing the selected client node from the plurality of client nodes and returning to the step of selecting one client node from the plurality of client nodes as the selected client node, until the selection probability of every client node has been determined.
In an embodiment of the present application, the method further includes: after sending the issued model parameters to each client node participating in the current training round, receiving the local update gradient of the local model sent by each such client node; determining, according to the local update gradient of each local model, the current single contribution of each participating client node relative to a target client node, where the target client node is selected from the plurality of client nodes according to a preset service requirement; updating the issued model parameters according to all the current single contributions to obtain the final model parameters of the global model for the current training round; and, if the final model parameters of the current training round and/or the total number of training rounds of the global model do not satisfy the training termination condition, returning to the step of acquiring the historical single contribution and historical cumulative contribution of each client node's local model obtained after training with the model parameters issued by the global model in the previous training round, until the newly determined final model parameters and/or total number of training rounds satisfy the termination condition.
In an embodiment of the present application, determining, according to the local update gradient of each client node participating in the current training round, the current single contribution of each participating client node relative to the target client node includes: determining, according to the local update gradients of the participating client nodes, the global update gradient of the global model and a first test parameter for each participating client node; determining a second test parameter of the global model according to the global update gradient; sending the first test parameter of each participating client node and the second test parameter to the target client node; acquiring the first predicted loss and the second predicted loss, sent by the target client node, corresponding respectively to each first test parameter and to the second test parameter; and determining, according to the first predicted loss and the second predicted loss of each participating client node, the current single contribution of each participating client node relative to the target client node.
In an embodiment of the present application, determining, according to the first predicted loss and the second predicted loss of each participating client node, the current single contribution of each participating client node relative to the target client node includes: determining, for each participating client node, the loss difference between its first predicted loss and the second predicted loss; and determining, for each participating client node, the ratio between that loss difference and the second predicted loss as the current single contribution of that client node relative to the target client node.
In an embodiment of the present application, the first test parameter is determined by formula (1):

w'_ik = w_{k-1} - η·Σ_{j≠i} (|D_j| / Σ_{j'≠i} |D_{j'}|)·∇w_j^k    (1)
wherein w'_ik refers to the first test parameter of the i-th participating client node at the k-th training round, w_{k-1} refers to the final model parameters of the global model at the (k-1)-th training round, η refers to the learning rate, m refers to the total number of client nodes participating in the current training round at the k-th training round, |D_j| refers to the total amount of data in the sample data set of the j-th participating client node at the k-th training round, the j-th participating client node refers to a participating client node other than the i-th participating client node among the m participating client nodes, and ∇w_j^k refers to the local update gradient of the j-th participating client node at the k-th training round.
In an embodiment of the present application, updating the issued model parameters according to all the current single contributions to obtain the final model parameters of the global model for the current training round includes: selecting the participating client nodes whose current single contribution is greater than a second preset value as candidate client nodes; and updating the issued model parameters according to the local update gradients of the candidate client nodes to obtain the final model parameters of the global model for the current training round.
In an embodiment of the present application, the final model parameters of the global model for the current training round are determined by formula (2):

w_k = w_{k-1} - η·Σ_{i=1..O_k} (|D_i| / Σ_{i'=1..O_k} |D_{i'}|)·∇w_i^k    (2)
wherein w_k refers to the final model parameters of the global model at the k-th training round, w_{k-1} refers to the final model parameters of the global model at the (k-1)-th training round, η refers to the learning rate, O_k refers to the total number of candidate client nodes at the k-th training round, |D_i| refers to the total amount of data in the sample data set of the i-th participating client node, and ∇w_i^k refers to the local update gradient of the i-th participating client node at the k-th training round.
In an embodiment of the present application, determining the current cumulative contribution of each client node according to the total number of historical training rounds of the global model and the historical single contribution and historical cumulative contribution of each client node includes: for each client node, determining the current cumulative contribution of the client node to be a third preset value when the total number of historical training rounds equals a preset number; for each client node, when the total number of historical training rounds is greater than the preset number, determining a first weight coefficient for the historical single contribution and a second weight coefficient for the historical cumulative contribution of the client node; and, for each client node, performing a weighted summation of the historical single contribution and the historical cumulative contribution of the client node based on the first and second weight coefficients to obtain the current cumulative contribution of the client node.
A second aspect of the present application provides a processor configured to perform the federal machine learning method described above.
A third aspect of the present application provides a machine-readable storage medium having stored thereon instructions for causing a machine to perform the federal machine learning method described above.
A fourth aspect of the present application provides a federal machine learning device comprising a processor as described above.
A fifth aspect of the present application provides a computer device comprising:
a memory configured to store instructions; and
the federal machine learning device described above.
According to the above technical solution, the probability of each client node participating in the current training round is determined from its current cumulative contribution, so that the client nodes participating in each training round are adjusted dynamically. The local models of the participating client nodes are then trained with the model parameters issued by the global model for the current training round, making full use of the effective client nodes to optimize federal machine learning and thereby improving its training effect and model performance.
Additional features and advantages of embodiments of the present application will be set forth in the detailed description that follows.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the present application and are incorporated in and constitute a part of this specification, illustrate embodiments of the present application and together with the description serve to explain, without limitation, the embodiments of the present application. In the drawings:
FIG. 1 schematically illustrates an application environment schematic of a federal machine learning method according to an embodiment of the present application;
FIG. 2 schematically illustrates a flow diagram of a federal machine learning method according to an embodiment of the present application;
FIG. 3 schematically illustrates a flow diagram of a federal machine learning method according to another embodiment of the present application;
FIG. 4 schematically illustrates a timing diagram of a federal machine learning method according to an embodiment of the present application;
fig. 5 schematically shows an internal structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the specific implementations described herein are only for illustrating and explaining the embodiments of the present application, and are not intended to limit the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present application based on the embodiments herein.
It should be noted that, where descriptions such as "first" and "second" appear in the embodiments of the present application, they are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features concerned. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with one another, but only on the basis that the combination can be realized by those skilled in the art; when technical solutions are contradictory or cannot be realized, their combination should be regarded as non-existent and outside the scope of protection of the present application.
The federal machine learning method provided by the present application can be applied to the application environment shown in fig. 1, in which the server node 101 communicates with the client nodes 102 over a network. The client nodes 102 may be, but are not limited to, personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, and there are a plurality of client nodes 102. The server node 101 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
FIG. 2 schematically illustrates a flow diagram of a federal machine learning method according to an embodiment of the present application. As shown in fig. 2, in an embodiment of the present application, a federal machine learning method is provided, and this embodiment is mainly exemplified by the application of the method to the server node 101 in fig. 1, and includes the following steps:
step 201: and determining final model parameters obtained after the global model on the server node is trained in the last round as issuing model parameters of the global model in the current training round.
The server node can determine the final model parameters obtained after the global model on the server node is trained in the previous round as the issuing model parameters of the global model in the current training round. The final model parameters refer to global model parameters obtained after the global model is trained in the previous round.
Step 202: and acquiring historical single contribution degree and historical accumulated contribution degree of the local model of each client node after model parameter training issued in the previous training based on the global model.
The local model of each client node can be trained based on model parameters issued by the global model in the previous training round, and corresponding historical single contribution and historical cumulative contribution are obtained. The server node may obtain a historical single contribution and a historical cumulative contribution for the local model of each client node.
Step 203: and determining the current accumulated contribution degree of each client node according to the historical training total times of the global model, the historical single contribution degree and the historical accumulated contribution degree of each client node.
The server node may determine a current cumulative contribution for each client node based on a total number of historical training of the global model and the historical single contribution and the historical cumulative contribution for each client node.
In an embodiment of the present application, determining the current cumulative contribution of each client node according to the total number of historical training rounds of the global model and the historical single contribution and historical cumulative contribution of each client node includes: for each client node, determining the current cumulative contribution of the client node to be a third preset value when the total number of historical training rounds equals a preset number; for each client node, when the total number of historical training rounds is greater than the preset number, determining a first weight coefficient for the historical single contribution and a second weight coefficient for the historical cumulative contribution of the client node; and, for each client node, performing a weighted summation of the historical single contribution and the historical cumulative contribution of the client node based on the first and second weight coefficients to obtain the current cumulative contribution of the client node.
In the case where the total number of historical training of the global model is equal to the preset number, the server node may determine that the current cumulative contribution of the client node is a third preset value. The preset times and the third preset value can be customized according to actual conditions. For example, the preset number of times may be 1, and the third preset value may be 0. And under the condition that the total historical training times of the global model are larger than the preset times, the server node can respectively determine a first weight coefficient of the historical single contribution degree and a second weight coefficient of the historical accumulated contribution degree of the client node. The server node may perform weighted summation on the historical single contribution and the historical accumulated contribution of the client node based on the first weight coefficient and the second weight coefficient to obtain the current accumulated contribution of the client node.
In an embodiment of the present application, the current cumulative contribution of each client node may be determined by the following formula:

Φ̄_j^k = α·Φ_j^{k-1} + (1 - α)·Φ̄_j^{k-1}

wherein Φ̄_j^k refers to the current cumulative contribution of the j-th client node at the k-th training round, Φ̄_j^{k-1} refers to the historical cumulative contribution of the j-th client node at the (k-1)-th training round, Φ_j^{k-1} refers to the historical single contribution of the j-th client node at the (k-1)-th training round, α refers to the first weight coefficient for the historical single contribution of the client node, 1 - α refers to the second weight coefficient for the historical cumulative contribution of the client node, k refers to the total number of historical training rounds of the global model, α is a constant, k and j are positive integers, 0 < j ≤ N_c, and N_c refers to the total number of federal learning client nodes.
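As an informal illustration of this weighted update (not part of the patent: the function name, the default value of α and the treatment of the first round are assumptions), a Python sketch might look as follows:

```python
def update_cumulative_contribution(single_prev, cumulative_prev, k, alpha=0.5,
                                   preset_rounds=1, third_preset=0.0):
    """Hypothetical helper: roll each client's contributions forward.

    single_prev     -- dict {client_id: historical single contribution from round k-1}
    cumulative_prev -- dict {client_id: historical cumulative contribution from round k-1}
    k               -- total number of historical training rounds of the global model
    alpha           -- first weight coefficient (0 < alpha < 1); 1 - alpha weights
                       the historical cumulative contribution
    """
    if k == preset_rounds:
        # First round: every client starts from the third preset value (e.g. 0).
        return {cid: third_preset for cid in cumulative_prev}
    return {cid: alpha * single_prev[cid] + (1.0 - alpha) * cumulative_prev[cid]
            for cid in cumulative_prev}
```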
Step 204: the number of client nodes participating in the current training round and the probability that the current random of each client node is selected are determined according to the current accumulated contribution of each client node.
The server node may determine the number of client nodes participating in the current training round and the probability that the current random of each client node is selected based on the current cumulative contribution of each client node.
In an embodiment of the present application, determining the number of client nodes participating in the current training round and the probability of each client node being randomly selected according to the current cumulative contribution of each client node includes: determining the total number of client nodes whose current cumulative contribution is greater than a first preset value; determining the smaller of this total number and the planned number of participating client nodes for the current training round as the number of client nodes participating in the current training round; selecting one client node from the plurality of client nodes as the selected client node, and determining the sum of the current cumulative contributions of the plurality of client nodes; determining the ratio between the current cumulative contribution of the selected client node and that sum as the probability of the selected client node being randomly selected; and removing the selected client node from the plurality of client nodes and returning to the step of selecting one client node from the plurality of client nodes as the selected client node, until the selection probability of every client node has been determined.
The server node may determine, from the plurality of client nodes and according to the current cumulative contribution of each client node, those client nodes whose current cumulative contribution is greater than the first preset value, and count them to obtain the total number of such nodes. The server node may obtain the planned number of participating client nodes for the current training round, and may determine the smaller of this total number and the planned number as the number of client nodes participating in the current training round, that is, the actual number of participating client nodes for the current round.
In an embodiment of the present application, the number of client nodes participating in the current training round may be determined according to the following formula:

m = min{N_fc, |S_k|}

wherein m refers to the number of client nodes participating in the current training round, N_fc refers to the planned number of participating client nodes for the current training round, and |S_k| refers to the total number of client nodes whose current cumulative contribution is greater than the first preset value.
The server node may determine the probability of each client node being selected on the basis of independent random sampling without replacement. The server node may select one client node from the plurality of client nodes as the selected client node and determine the sum of the current cumulative contributions of the plurality of client nodes. The server node may determine the ratio between the current cumulative contribution of the selected client node and that sum, and take this ratio as the probability of the selected client node being randomly selected. The server node may then remove the selected client node from the plurality of client nodes and return to the step of selecting one client node from the plurality of client nodes as the selected client node. At this point the server node may select one client node from the remaining client nodes as a new selected client node, determine the new sum of the current cumulative contributions of the remaining client nodes, and determine the probability of the new selected client node being randomly selected, until the selection probability of every client node has been determined.
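A minimal Python sketch of this selection step follows (not from the patent; the use of random.choices and the treatment of non-positive contributions as ineligible are illustrative assumptions):

```python
import random

def select_clients(cumulative, planned_count, first_preset=0.0, seed=None):
    """Hypothetical sketch: contribution-proportional sampling without replacement.

    cumulative    -- dict {client_id: current cumulative contribution}
    planned_count -- N_fc, the planned number of participating clients this round
    Returns the list of client ids selected for the current training round.
    """
    rng = random.Random(seed)
    # S_k: clients whose cumulative contribution exceeds the first preset value.
    pool = {cid: c for cid, c in cumulative.items() if c > first_preset}
    m = min(planned_count, len(pool))            # m = min{N_fc, |S_k|}

    selected = []
    for _ in range(m):
        total = sum(pool.values())
        cids = list(pool)
        # Probability of each remaining node = its contribution / sum over remaining nodes.
        weights = [pool[cid] / total for cid in cids]
        picked = rng.choices(cids, weights=weights, k=1)[0]
        selected.append(picked)
        del pool[picked]                         # sampling without replacement
    return selected
```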
Step 205: and selecting the client nodes participating in the current training round from a plurality of client nodes according to the probability that the current random of each client node is selected.
The server node may select the number of client nodes participating in the current training round from the plurality of client nodes according to a probability that the current random of each client node is selected.
Step 206: and sending the issuing model parameters to each client node participating in the round of training, so that each client node participating in the round of training trains a corresponding local model through the issuing model parameters.
The server node may send the model parameters to each of the client nodes participating in the present round of training, so that each of the client nodes participating in the present round of training trains a corresponding local model through the model parameters.
In an embodiment of the present application, the method further includes: after sending the issued model parameters to each client node participating in the current training round, receiving the local update gradient of the local model sent by each such client node; determining, according to the local update gradient of each local model, the current single contribution of each participating client node relative to a target client node, where the target client node is selected from the plurality of client nodes according to a preset service requirement; updating the issued model parameters according to all the current single contributions to obtain the final model parameters of the global model for the current training round; and, if the final model parameters of the current training round and/or the total number of training rounds of the global model do not satisfy the training termination condition, returning to the step of acquiring the historical single contribution and historical cumulative contribution of each client node's local model obtained after training with the model parameters issued by the global model in the previous training round, until the newly determined final model parameters and/or total number of training rounds satisfy the termination condition.
After the issued model parameters are sent, each client node participating in the current training round may update its corresponding local model with the issued model parameters, and train the local model with its local training data set using a stochastic gradient descent algorithm to obtain the local update gradient of the local model. Each participating client node may send the local update gradient of its local model to the server node. The server node may receive the local update gradients sent by all participating client nodes.
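The client-side step can be pictured with the following sketch (illustrative only; the patent does not prescribe a particular model or framework, and the linear model, the mean-squared-error loss and the convention used for the uploaded gradient are assumptions):

```python
import numpy as np

def local_training(issued_params, X, y, lr=0.05, epochs=1, batch_size=32, seed=0):
    """Hypothetical client-side step: update the local model with the issued
    parameters, train it by stochastic gradient descent on local data, and
    return a local update gradient to upload to the server.

    The "local update gradient" is taken here as (w_issued - w_trained) / lr;
    this convention is an illustrative assumption, as is the linear model.
    """
    rng = np.random.default_rng(seed)
    w = issued_params.copy()                     # update the local model with issued params
    n = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = order[start:start + batch_size]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)   # minibatch gradient of MSE
            w -= lr * grad                               # stochastic gradient descent step
    return (issued_params - w) / lr                      # local update gradient to upload
```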
The server node may determine the current single contribution of each participating client node relative to the target client node according to the local update gradients of the local models. The current single contribution of a participating client node refers to the degree to which the local update gradient produced by that node affects the prediction loss of the current round's global model on the target client node. The target client node is selected from the plurality of client nodes according to a preset service requirement; it may be any one of the plurality of client nodes and may be selected before the initial training round.
The server node may update the issued model parameters according to all the current single contributions to obtain the final model parameters of the global model for the current training round. If the final model parameters of the current training round and/or the total number of training rounds of the global model do not satisfy the training termination condition, the server node returns to the step of acquiring the historical single contribution and historical cumulative contribution of each client node's local model obtained after training with the model parameters issued by the global model in the previous training round, until the newly determined final model parameters and/or total number of training rounds satisfy the termination condition. The termination condition is that the final model parameters of the current training round converge and/or that the total number of training rounds of the global model is greater than or equal to a preset number.
In an embodiment of the present application, determining, according to the local update gradient of each client node participating in the current training round, the current single contribution of each participating client node relative to the target client node includes: determining, according to the local update gradients of the participating client nodes, the global update gradient of the global model and a first test parameter for each participating client node; determining a second test parameter of the global model according to the global update gradient; sending the first test parameter of each participating client node and the second test parameter to the target client node; acquiring the first predicted loss and the second predicted loss, sent by the target client node, corresponding respectively to each first test parameter and to the second test parameter; and determining, according to the first predicted loss and the second predicted loss of each participating client node, the current single contribution of each participating client node relative to the target client node.
The server node may determine the global update gradient of the global model and a first test parameter for each participating client node according to the local update gradients of the participating client nodes. In an embodiment of the present application, the global update gradient of the global model may be determined by the following formula:

∇w^k = Σ_{i=1..m} (|D_i| / Σ_{i'=1..m} |D_{i'}|)·∇w_i^k

wherein ∇w^k refers to the global update gradient of the global model at the k-th training round, |D_i| refers to the total amount of data in the sample data set of the i-th participating client node at the k-th training round, ∇w_i^k refers to the local update gradient of the i-th participating client node at the k-th training round, 0 < i ≤ m, and m is the total number of client nodes participating in the current training round at the k-th training round.
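A short sketch of this data-size-weighted aggregation (illustrative only; the FedAvg-style normalization shown matches the formula as reconstructed above and is an assumption):

```python
import numpy as np

def global_update_gradient(local_grads, data_sizes):
    """Hypothetical aggregation of local update gradients into the global
    update gradient, weighted by each client's data set size |D_i|.

    local_grads -- dict {client_id: np.ndarray local update gradient}
    data_sizes  -- dict {client_id: |D_i|}
    """
    total = sum(data_sizes[cid] for cid in local_grads)
    return sum((data_sizes[cid] / total) * local_grads[cid] for cid in local_grads)
```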
In an embodiment of the present application, the first test parameter of each participating client node is determined by formula (1):

w'_ik = w_{k-1} - η·Σ_{j≠i} (|D_j| / Σ_{j'≠i} |D_{j'}|)·∇w_j^k    (1)

wherein w'_ik refers to the first test parameter of the i-th participating client node at the k-th training round, w_{k-1} refers to the final model parameters of the global model at the (k-1)-th training round, η refers to the learning rate, m refers to the total number of client nodes participating in the current training round at the k-th training round, |D_j| refers to the total amount of data in the sample data set of the j-th participating client node at the k-th training round, the j-th participating client node refers to a participating client node other than the i-th participating client node among the m participating client nodes, and ∇w_j^k refers to the local update gradient of the j-th participating client node at the k-th training round.
The server node may determine a second test parameter of the global model according to the global update gradient. In an embodiment of the present application, the second test parameter of the global model may be determined by the following formula:

w'_k = w_{k-1} - η·∇w^k

wherein w'_k refers to the second test parameter of the global model at the k-th training round, w_{k-1} refers to the final model parameters of the global model at the (k-1)-th training round, η refers to the learning rate, 0 < η < 1, and ∇w^k refers to the global update gradient of the global model at the k-th training round.
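Putting the two test parameters together, a sketch follows (illustrative; the leave-one-out re-normalization of the weights in w'_ik is an assumption consistent with formula (1) as reconstructed above):

```python
import numpy as np

def test_parameters(w_prev, local_grads, data_sizes, eta=0.05):
    """Hypothetical computation of the second test parameter w'_k and the
    first test parameters w'_ik (one per participating client, leaving that
    client's gradient out and re-weighting over the rest).

    w_prev      -- w_{k-1}, final global model parameters of the previous round
    local_grads -- dict {client_id: local update gradient}
    data_sizes  -- dict {client_id: |D_i|}
    """
    total = sum(data_sizes[cid] for cid in local_grads)
    global_grad = sum((data_sizes[cid] / total) * local_grads[cid] for cid in local_grads)
    w_second = w_prev - eta * global_grad        # second test parameter w'_k

    w_first = {}
    for i in local_grads:
        rest = [j for j in local_grads if j != i]
        rest_total = sum(data_sizes[j] for j in rest)
        loo_grad = sum((data_sizes[j] / rest_total) * local_grads[j] for j in rest)
        w_first[i] = w_prev - eta * loo_grad     # first test parameter w'_ik, client i left out
    return w_first, w_second
```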
The server node may send the first test parameter of each participating client node and the second test parameter to the target client node. The target client node may update its local model with each first test parameter and with the second test parameter in turn, and evaluate the resulting models with its local test data set to obtain the first predicted loss corresponding to each first test parameter and the second predicted loss corresponding to the second test parameter. The target client node may send the first predicted losses and the second predicted loss to the server node, and the server node may obtain them. The server node may then determine, according to the first predicted loss and the second predicted loss of each participating client node, the current single contribution of each participating client node relative to the target client node.
In an embodiment of the present application, determining, according to the first predicted loss and the second predicted loss of each participating client node, the current single contribution of each participating client node relative to the target client node includes: determining, for each participating client node, the loss difference between its first predicted loss and the second predicted loss; and determining, for each participating client node, the ratio between that loss difference and the second predicted loss as the current single contribution of that client node relative to the target client node.
For each client node participating in the current training round, the server node may determine the loss difference between the first predicted loss of that node and the second predicted loss. The server node may determine the ratio between the loss difference and the second predicted loss, and take this ratio as the current single contribution of that participating client node relative to the target client node. The current single contribution of a participating client node refers to the degree to which the local update gradient produced by that node affects the prediction loss of the current round's global model on the target client node.
In an embodiment of the present application, the current single contribution of a participating client node relative to the target client node may be determined by the following formula:

Φ_i^k = (L'_ik - L'_k) / L'_k

wherein Φ_i^k refers to the current single contribution of the i-th participating client node at the k-th training round, L'_ik refers to the first predicted loss corresponding to the first test parameter of the i-th participating client node at the k-th training round, and L'_k refers to the second predicted loss corresponding to the second test parameter of the global model at the k-th training round.
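For concreteness, a sketch of this contribution computation (a hypothetical helper; the loss evaluation on the target client node is abstracted away as already-reported values):

```python
def single_contributions(first_losses, second_loss):
    """Hypothetical computation of each participating client's current single
    contribution, (L'_ik - L'_k) / L'_k, from the losses reported by the
    target client node.

    first_losses -- dict {client_id: first predicted loss L'_ik}
    second_loss  -- second predicted loss L'_k of the global test parameter
    """
    return {cid: (loss - second_loss) / second_loss
            for cid, loss in first_losses.items()}
```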
In an embodiment of the present application, updating the issued model parameters according to all the current single contributions to obtain the final model parameters of the global model for the current training round includes: selecting the participating client nodes whose current single contribution is greater than a second preset value as candidate client nodes; and updating the issued model parameters according to the local update gradients of the candidate client nodes to obtain the final model parameters of the global model for the current training round.
The server node may select, as candidate client nodes, the participating client nodes whose current single contribution is greater than the second preset value. The second preset value may be set according to actual conditions; for example, the second preset value may be 0. The server node may update the issued model parameters according to the local update gradients of the candidate client nodes to obtain the final model parameters of the global model for the current training round.
In an embodiment of the present application, the final model parameters of the global model for the current training round may be determined by formula (2):

w_k = w_{k-1} - η·Σ_{i∈P_k} (|D_i| / Σ_{i'∈P_k} |D_{i'}|)·∇w_i^k    (2)

wherein w_k refers to the final model parameters of the global model at the k-th training round, w_{k-1} refers to the final model parameters of the global model at the (k-1)-th training round, η refers to the learning rate, P_k refers to the set of participating client nodes whose current single contribution is greater than the second preset value at the k-th training round, ∇w_i^k refers to the local update gradient of the i-th client node in P_k at the k-th training round, and |D_i| refers to the total amount of data in the sample data set of the i-th client node in P_k.
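A sketch of this final aggregation step (illustrative; it reuses the data-size weighting of formula (2) as reconstructed above):

```python
import numpy as np

def update_global_parameters(w_prev, local_grads, data_sizes, contributions,
                             eta=0.05, second_preset=0.0):
    """Hypothetical server-side update for the current round: aggregate only
    the clients whose current single contribution exceeds the second preset
    value (e.g. 0), weighted by data set size."""
    candidates = [cid for cid, c in contributions.items() if c > second_preset]
    if not candidates:
        return w_prev                            # no useful update this round
    total = sum(data_sizes[cid] for cid in candidates)
    agg = sum((data_sizes[cid] / total) * local_grads[cid] for cid in candidates)
    return w_prev - eta * agg                    # final model parameters w_k
```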
According to the above technical solution, the probability of each client node participating in the current training round is determined from its current cumulative contribution, so that the client nodes participating in each training round are adjusted dynamically, and the local models of the participating client nodes are trained with the model parameters issued by the global model for the current training round. In this way the effective client nodes are fully used to optimize federal machine learning, improving its training effect and model performance. Moreover, the global model for the current training round is aggregated only from local update gradients provided by client nodes whose contribution in the current round is greater than the second preset value, which effectively avoids the negative effect that client nodes with a negative current-round contribution would otherwise have on the global model.
As shown in fig. 3, a flow chart of another federal machine learning method is provided, comprising the steps of:
step 301: the server node S may initialize the global model parameters. Assume that the total number of federal learning client nodes is N c . The server may initialize the global model parameters. The server side is referred to as server node S. The server node S can initialize the parameters of the adopted deep neural network model, wherein the initialization parameters are w 0
Step 302: the server may issue the global model parameters to a portion of the client nodes randomly chosen based on the cumulative contribution. The model parameter issuing comprises the steps of selecting training nodes and synchronizing model parameters. Wherein the step of selecting the training node comprises: in the kth round of model training, the server node S may determine each client node C j Cumulative contribution of (2)Wherein (1)> Refers to the historical single contribution degree of the jth client node during the kth-1 training, and +.>Refers to the historical accumulated contribution degree of the jth client node in the kth-1 training, k refers to the model training round, alpha is a weight constant, and 0<α<1. The server node S is according to each client node C j Cumulative contribution- >From N c And randomly selecting m client nodes from the client nodes. In particular implementations, the above-described selection process may be split into m times of single client node non-return independent random sampling processes. Each time sampling is performed, the probability of each client node which is not yet pumped in the current time is the ratio of the cumulative contribution of the node to the sum of the cumulative contribution of all the nodes which are not yet pumped in.
If the cumulative contribution of a certain client node is not greater than 0, the probability that node is picked is 0. The selected federal learning client node is C i I is the client node sequence number, i is more than 0 and less than or equal to m. Wherein m=min { N fc ,|S k I, m is the number of client nodes actually extracted, N fc Representing clients of each round of plan extractionThe planned number of end nodes is 0 < N fc ≤N c Min { x, y } is a function of the minimum of both x and y, 0 < m.ltoreq.N cS k Is a set of all "sequence numbers of client nodes with cumulative contribution greater than 0, |S k I represents the set S k The number of elements in (a). Therein i, j, k, N fc And N c Are all positive integers. The step of model parameter synchronization includes: the server node S uses the parameters w of the global model of the deep neural network model k-1 To the selected federal learning client node C i . Wherein w is k-1 Refers to the final model parameters of the global model at the k-1 th training.
Step 303: the selected client node may train a local model based on the local training data and upload a local update gradient to the server node. The model local training step comprises the following steps: each selected federal learning client node C during the kth round of model training i Using global model parameters w received from server node S k-1 Updating the local model, training the local model by using the local training data set and adopting a random gradient descent algorithm, and calculating a local updating gradient of the local modelEach federal learning client node C i Local update gradient of local model +.>To the server node S.
Step 304: the server can synthesize the test parameters and send the test parameters to the target client nodes, calculate the contribution degree of each client node according to the test result of the target client nodes, and aggregate the contribution degree to generate global model parameters. During the training of the kth round of model, the server node S firstly receives the update gradient uploaded by each client nodeThen according to->Determining a global update gradient of a global model>Wherein (1)>Refers to the global update gradient, |D, of the global model during the kth training i I refers to the total amount of data in the sample dataset for the ith client node participating in the present round of training at the kth training,the local update gradient of the ith client node participating in the present training in the kth training is equal to or greater than 0 and less than or equal to m, and m is the total number of the client nodes participating in the present training in the kth training.
The server node S may compute the second test parameter of the global model as w'_k = w_{k-1} - η·∇w^k, wherein w'_k refers to the second test parameter of the global model at the k-th training round, w_{k-1} refers to the final model parameters of the global model at the (k-1)-th training round, η refers to the learning rate, 0 < η < 1, and ∇w^k refers to the global update gradient of the global model at the k-th training round. The server node S may compute the first test parameter w'_ik corresponding to each client node C_i according to formula (1), wherein w'_ik refers to the first test parameter of the i-th participating client node at the k-th training round, w_{k-1} refers to the final model parameters of the global model at the (k-1)-th training round, η refers to the learning rate, m refers to the total number of client nodes participating in the current training round at the k-th training round, |D_j| refers to the total amount of data in the sample data set of the j-th participating client node at the k-th training round, the j-th participating client node refers to a participating client node other than the i-th participating client node among the m participating client nodes, and ∇w_j^k refers to the local update gradient of the j-th participating client node at the k-th training round.
The server node S may send w'_k and w'_ik to the target client node, wherein D_i is the sample data set stored locally at client node C_i, |D_i| is the number of elements contained in the set D_i, η is the learning rate, and 0 < η < 1. The target client node tests the models corresponding to w'_k and w'_ik with its local test data set to obtain the predicted losses L'_k and L'_ik, and sends them to the server node S. The server node S may compute the contribution of each client node C_i to the target client node in the k-th round of model training as

Φ_i^k = (L'_ik - L'_k) / L'_k

wherein Φ_i^k refers to the current single contribution of the i-th participating client node at the k-th training round, L'_ik refers to the first predicted loss corresponding to the first test parameter of the i-th participating client node at the k-th training round, and L'_k refers to the second predicted loss corresponding to the second test parameter of the global model at the k-th training round.
The server node S may aggregate the local update gradients ∇w_i^k uploaded by the client nodes with positive contribution and generate the final global model parameters w_k according to formula (2), wherein w_k refers to the final model parameters of the global model at the k-th training round, w_{k-1} refers to the final model parameters of the global model at the (k-1)-th training round, P_k refers to the set of participating client nodes whose current single contribution is greater than 0, ∇w_i^k refers to the local update gradient of the i-th client node in P_k at the k-th training round, and |D_i| refers to the total amount of data in the sample data set of the i-th client node in P_k.
Step 305: the server side can judge the training termination condition, if the training termination condition is not met, the server side issues global model parameters, and the training process is repeated. For example, if the parameters w of the global model k Convergence or k.gtoreq.E TH The training process is terminated. Wherein E is TH Is a positive integer. If the parameters w of the global model k Unconverged or k < E TH And returning to the step that the execution server side transmits the model parameters to part of client nodes selected randomly according to the accumulated contribution degree, and starting the (k+1) th round of model training process.
In one embodiment, the server node initializes the parameters of the deep neural network model employed, the initialization parameters being w_0.
In the k-th round of model training, the server node S randomly selects m = 1000 client nodes from N_c = 10000 client nodes according to the cumulative contribution Φ̄_j^k of each client node C_j. The server node S sends the global model parameters w_{k-1} of the deep neural network model to the selected federal learning client nodes. Private user behavior data is stored in each client node.
In the k-th round of model training, each selected federal learning client node C_i updates its local model with the global model parameters w_{k-1} received from the server node S, trains the local model with its local training data set using a stochastic gradient descent algorithm, computes the local update gradient ∇w_i^k of its local model, and sends ∇w_i^k to the server node S.
In the k-th round of model training, the server node S first receives the update gradient ∇w_i^k uploaded by each client node, then computes the second test parameter w'_k of the global model and the first test parameter w'_ik corresponding to each client node C_i, and sends w'_k and w'_ik to the target client node. Here η may have an initial value of 0.05 and decays gradually towards 0 as k increases.
The target client node tests the models corresponding to w'_k and w'_ik with its local test data set to obtain the predicted losses L'_k and L'_ik, and sends them to the server node S. The server node S computes the contribution Φ_i^k of each client node C_i to the target client node in the k-th round of model training. The server node S aggregates the local update gradients uploaded by the client nodes with positive contribution to generate the final global model parameters w_k.
If the global model parameters w_k converge or k ≥ E_TH = 100, the training process is terminated; otherwise, the server node S returns to the step of randomly selecting m = 1000 client nodes from the 10000 client nodes according to the cumulative contribution Φ̄_j^k of each client node C_j, and the (k+1)-th round of model training begins.
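To summarize steps 301 to 305, the following server-side sketch wires together the hypothetical helpers sketched earlier in this description (illustrative only; the client and target-client interfaces, the fallback to all clients when no node is yet eligible, the convergence test and the round limit E_TH = 100 are all assumptions):

```python
import numpy as np

def federated_training(clients, target_client, w0, n_planned=1000, eta=0.05,
                       alpha=0.5, max_rounds=100, tol=1e-6):
    """Hypothetical server loop for contribution-aware federal machine learning.

    clients       -- dict {client_id: object with .data_size and .local_update(w) -> gradient}
    target_client -- object with .evaluate(w) -> predicted loss on its local test set
    """
    w = w0.copy()
    single = {cid: 0.0 for cid in clients}       # historical single contributions
    cumulative = {cid: 0.0 for cid in clients}   # historical cumulative contributions
    sizes = {cid: c.data_size for cid, c in clients.items()}

    for k in range(1, max_rounds + 1):
        # Step 302: cumulative contributions and contribution-proportional selection.
        cumulative = update_cumulative_contribution(single, cumulative, k, alpha)
        selected = select_clients(cumulative, n_planned)
        if not selected:                         # e.g. first round: fall back to all clients
            selected = list(clients)

        # Step 303: each selected client trains locally and uploads its update gradient.
        grads = {cid: clients[cid].local_update(w) for cid in selected}

        # Step 304: test parameters, target-client losses, contributions, aggregation.
        w_first, w_second = test_parameters(w, grads, sizes, eta)
        first_losses = {cid: target_client.evaluate(w_first[cid]) for cid in selected}
        second_loss = target_client.evaluate(w_second)
        round_contrib = single_contributions(first_losses, second_loss)
        single.update(round_contrib)
        w_new = update_global_parameters(w, grads, sizes, round_contrib, eta)

        # Step 305: terminate on convergence or when the round limit is reached.
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```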
As shown in FIG. 4, a timing diagram of a federal machine learning method is provided.
The server node 101 determines the model parameters to be issued by the global model in the current training round, and obtains, for each client node 102, the historical single contribution degree and the historical cumulative contribution degree of its local model after training on the model parameters issued by the global model in the previous training round. The server node 101 then determines the current cumulative contribution degree of each client node 102 from the total number of historical training rounds of the global model and each client node's historical single contribution degree and historical cumulative contribution degree.
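A minimal sketch of this cumulative-contribution update (the initial value and the weight coefficients are illustrative; the text leaves them open, requiring only a fixed value on the first round and a weighted sum afterwards) might be:

```python
def update_cumulative_contributions(total_rounds, single_hist, cumulative_hist,
                                    first_round_value=1.0, alpha=0.5):
    """Return the current cumulative contribution for every client: a preset
    value when only the first round has been trained, otherwise a weighted sum
    of each client's historical single contribution and historical cumulative
    contribution."""
    if total_rounds <= 1:                       # first round: no history yet
        return [first_round_value] * len(cumulative_hist)
    return [alpha * s + (1.0 - alpha) * c
            for s, c in zip(single_hist, cumulative_hist)]
```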
The server node 101 determines, according to the current cumulative contribution degree of each client node 102, the number of client nodes participating in the current training round and the probability that each client node 102 is currently randomly selected, and selects that number of client nodes from the plurality of client nodes 102 according to those probabilities. The server node 101 sends the issued model parameters to each client node participating in the current training round. Each client node participating in the current training round updates its local model with the issued model parameters and trains the local model on its local training data set using a stochastic gradient descent algorithm to obtain the local update gradient of the local model. Each client node participating in the current training round then sends the local update gradient of its local model to the server node 101.
The server node 101 may determine a global update gradient of the global model and a first test parameter for each client node participating in the current round of training from the local update gradients of the client nodes participating in the current round of training, and may determine a second test parameter of the global model from the global update gradient. The server node 101 may send the first test parameter of each client node participating in the current round of training and the second test parameter to the target client node. The target client node may update its local model with the first test parameters and with the second test parameter, respectively, and test the resulting models on its local test data set to obtain the first predicted loss corresponding to each first test parameter and the second predicted loss corresponding to the second test parameter. The target client node may send the first predicted losses and the second predicted loss to the server node 101. The server node 101 may then determine the current single contribution degree of each client node participating in the current round of training relative to the target client node based on that client node's first predicted loss and the second predicted loss.
The server node 101 may select, as candidate client nodes, the client nodes participating in the current training round whose current single contribution degree is greater than the second preset value. The server node 101 may update the issued model parameters according to the local update gradients of the candidate client nodes to obtain the final model parameters of the global model for the current training round. The server node 101 may then determine whether the final model parameters of the current training round and/or the total number of training rounds of the global model satisfy the termination condition. If they do not, the method returns to the step of obtaining the historical single contribution degree and the historical cumulative contribution degree of each client node 102 after training on the model parameters issued in the previous round, and repeats until the newly determined final model parameters and/or total number of training rounds satisfy the termination condition. The termination condition is that the final model parameters of the current training round have converged and/or the total number of training rounds of the global model is greater than or equal to a preset number.
FIGS. 2 and 3 are flow diagrams of a federal machine learning method in one embodiment. It should be understood that, although the steps in the flowcharts of FIGS. 2 and 3 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in FIGS. 2 and 3 may include multiple sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times; nor do these sub-steps or stages necessarily need to be performed sequentially, and they may be performed in turn or alternately with at least a portion of other steps, or of sub-steps or stages of other steps.
In one embodiment, a processor configured to perform the above-described federal machine learning method is provided.
In one embodiment, a machine-readable storage medium having instructions stored thereon for causing a machine to perform the federal machine learning method described above is provided.
In one embodiment, a federal machine learning device is provided, comprising the processor described above.
In one embodiment, there is provided a computer device comprising:
a memory configured to store instructions; and
the federal machine learning device described above.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor a01, a network interface a02, a memory (not shown) and a database (not shown) connected by a system bus. Wherein the processor a01 of the computer device is adapted to provide computing and control capabilities. The memory of the computer device includes internal memory a03 and nonvolatile storage medium a04. The nonvolatile storage medium a04 stores an operating system B01, a computer program B02, and a database (not shown in the figure). The internal memory a03 provides an environment for the operation of the operating system B01 and the computer program B02 in the nonvolatile storage medium a04. The database of the computer equipment is used for storing data such as the model parameters and the like issued by the current training round. The network interface a02 of the computer device is used for communication with an external terminal through a network connection. The computer program B02, when executed by the processor a01, implements a federal machine learning method.
It will be appreciated by those skilled in the art that the structure shown in fig. 5 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
The embodiment of the application provides equipment, which comprises a processor, a memory and a program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the federal machine learning method.
The present application also provides a computer program product adapted to perform a program initialized with the steps of the federal machine learning method described above when executed on a data processing apparatus.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read-Only Memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, Phase-Change Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technologies, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (13)

1. A federal machine learning method, for use with a server node, the method comprising:
determining final model parameters obtained after the global model on the server node is trained in the last round as issuing model parameters of the global model in the current training round;
acquiring a historical single contribution degree and a historical accumulated contribution degree of the local model of each client node, obtained after the local model is trained based on the model parameters issued by the global model in the previous training round;
determining the current accumulated contribution of each client node according to the total historical training times of the global model and the historical single contribution and the historical accumulated contribution of each client node;
determining the number of the client nodes participating in the current training round and the probability of the current random selection of each client node according to the current accumulated contribution degree of each client node;
selecting the determined number of client nodes to participate in the current training round from the plurality of client nodes according to the probability that each client node is currently randomly selected;
and sending the issuing model parameters to each client node participating in the round of training, so that each client node participating in the round of training trains a corresponding local model through the issuing model parameters.
2. The federal machine learning method according to claim 1, wherein the determining, according to the current cumulative contribution of each client node, the number of client nodes participating in the current training round and the probability that each client node is currently randomly selected comprises:
Determining the total number of nodes of the client nodes with the current accumulated contribution degree larger than a first preset value;
determining a smaller value of the total number of nodes and the planned participation number of the client nodes of the current training round as the number of the client nodes participating in the current training round;
selecting one client node from the plurality of client nodes as a selected client node, and determining a current cumulative contribution sum of the plurality of client nodes;
determining a ratio between the current cumulative contribution of the selected client node and the sum of the current cumulative contributions as the probability that the selected client node is currently randomly selected;
removing the selected client node from the plurality of client nodes and returning to the step of selecting one client node from the plurality of client nodes as the selected client node, until the probability that each client node is currently randomly selected has been determined.
3. The federal machine learning method of claim 1, further comprising:
after the parameters of the issued model are sent to each client node participating in the training of the round, receiving a local update gradient of a local model sent by each client node participating in the training of the round;
Determining the current single contribution degree of each client node participating in the current round of training relative to a target client node according to the local updating gradient of each local model, wherein the target client node is selected from a plurality of client nodes according to preset service requirements;
updating the parameters of the issuing model according to all the current single contribution degrees to obtain final model parameters of the global model in the current training round;
and, in a case where the final model parameters of the current training round and/or the total number of training times of the global model do not satisfy a termination training condition, returning to the step of acquiring the historical single contribution degree and the historical accumulated contribution degree of the local model of each client node, obtained after the local model is trained based on the model parameters issued by the global model in the previous training round, until the redetermined final model parameters and/or total number of training times satisfy the termination training condition.
4. A federal machine learning method according to claim 3, wherein said determining the current single contribution of each client node participating in the present round of training relative to the target client node based on the local update gradient of each client node participating in the present round of training comprises:
Determining a global update gradient of the global model and a first test parameter of each client node participating in the round of training according to the local update gradient of each client node participating in the round of training;
determining a second test parameter for the global model according to the global update gradient;
transmitting the first test parameters and the second test parameters of each client node participating in the training round to the target client node;
acquiring a first prediction loss and a second prediction loss which are respectively corresponding to each first test parameter and each second test parameter and are transmitted by the target client node;
and determining the current single contribution degree of each client node participating in the round of training relative to the target client node according to the first prediction loss and the second prediction loss of each client node participating in the round of training.
5. The federal machine learning method according to claim 4, wherein the determining a current single contribution of each client node participating in the present round of training relative to the target client node based on the first predicted loss and the second predicted loss of each client node participating in the present round of training comprises:
Determining, for each client node participating in the current round of training, a loss difference between a first predicted loss and the second predicted loss of the client node participating in the current round of training;
for each client node participating in the present round of training, determining a ratio between the loss difference value and the second predicted loss as a current single contribution of the client node participating in the present round of training relative to the target client node.
6. The federal machine learning method according to claim 4, wherein the first test parameter is determined by equation (1):

w'_{ik} = w_{k-1} - η · Σ_{j≠i} ( |D_j| / Σ_{j≠i} |D_j| ) · g_j^k          (1)

wherein w'_{ik} refers to the first test parameter of the ith client node participating in the present round of training at the kth training, w_{k-1} refers to the final model parameters of the global model at the (k-1)th training, η refers to the learning rate, m refers to the total number of client nodes participating in the present round of training at the kth training, |D_j| refers to the total amount of data in the sample data set of the jth client node participating in the present round of training at the kth training, the jth client node participating in the present round of training refers to any client node, among the m client nodes participating in the present round of training, other than the ith client node participating in the present round of training, and g_j^k refers to the local update gradient of the jth client node participating in the present round of training at the kth training.
7. A federal machine learning method according to claim 3, wherein updating the issued model parameters based on all current single contributions to obtain final model parameters for the global model at a current training round comprises:
selecting the client nodes which participate in the round of training and have the current single contribution degree larger than a second preset value as candidate client nodes;
and updating the issuing model parameters according to the local updating gradient of the candidate client node so as to obtain final model parameters of the global model in the current training round.
8. The federal machine learning method according to claim 7, wherein the final model parameters of the global model at the current training round are determined by equation (2):

w_k = w_{k-1} - η · Σ_{i=1}^{P_k} ( |D_i| / Σ_{j=1}^{P_k} |D_j| ) · g_i^k          (2)

wherein w_k refers to the final model parameters of the global model at the kth training, w_{k-1} refers to the final model parameters of the global model at the (k-1)th training, η refers to the learning rate, P_k refers to the total number of candidate client nodes at the kth training, |D_i| refers to the total amount of data in the sample data set of the ith client node participating in the present round of training, and g_i^k refers to the local update gradient of the ith client node participating in the present round of training at the kth training.
9. The federal machine learning method according to claim 1, wherein the determining the current cumulative contribution of each client node from the total number of historical training of the global model and the historical single contribution and the historical cumulative contribution of each client node comprises:
for each client node, determining that the current accumulated contribution of the client node is a third preset value under the condition that the total historical training times are equal to preset times;
for each client node, respectively determining a first weight coefficient of historical single contribution degree and a second weight coefficient of historical accumulated contribution degree of the client node under the condition that the total number of historical training times is larger than the preset number of times;
for each client node, weighting and summing the historical single contribution degree and the historical accumulated contribution degree of the client node based on the first weight coefficient and the second weight coefficient to obtain the current accumulated contribution degree of the client node.
10. A processor configured to perform the federal machine learning method according to any one of claims 1 to 9.
11. A machine-readable storage medium having stored thereon instructions for causing a machine to perform the federal machine learning method according to any one of claims 1 to 9.
12. A federal machine learning apparatus, comprising the processor of claim 10.
13. A computer device, comprising:
a memory configured to store instructions; and
the federal machine learning device according to claim 12.
CN202311576433.7A 2023-11-23 2023-11-23 Federal machine learning method, apparatus, storage medium and processor Pending CN117521783A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311576433.7A CN117521783A (en) 2023-11-23 2023-11-23 Federal machine learning method, apparatus, storage medium and processor

Publications (1)

Publication Number Publication Date
CN117521783A true CN117521783A (en) 2024-02-06


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416615A (en) * 2018-02-02 2018-08-17 北京邮电大学 A kind of resource allocation methods based on tree
CN112926897A (en) * 2021-04-12 2021-06-08 平安科技(深圳)有限公司 Client contribution calculation method and device based on federal learning
CN112990478A (en) * 2021-02-22 2021-06-18 上海嗨普智能信息科技股份有限公司 Federal learning data processing system
CN113239879A (en) * 2021-06-01 2021-08-10 平安科技(深圳)有限公司 Federal model training and certificate detection method, device, equipment and medium
WO2023036184A1 (en) * 2021-09-08 2023-03-16 Huawei Cloud Computing Technologies Co., Ltd. Methods and systems for quantifying client contribution in federated learning
CN116187483A (en) * 2023-02-10 2023-05-30 清华大学 Model training method, device, apparatus, medium and program product
CN116451593A (en) * 2023-06-14 2023-07-18 北京邮电大学 Reinforced federal learning dynamic sampling method and equipment based on data quality evaluation
CN116629350A (en) * 2023-06-16 2023-08-22 陕西科技大学 Improved horizontal synchronous federal learning aggregation acceleration method
CN116776948A (en) * 2023-03-13 2023-09-19 江苏大学 Federal learning method, system and medium based on customer selection and weight distribution
CN116957106A (en) * 2023-07-18 2023-10-27 北京交通大学 Federal learning model training method based on dynamic attention mechanism



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination