CN117875454B - Multistage intelligent linkage-based data heterogeneous federation learning method and storage medium - Google Patents

Multistage intelligent linkage-based data heterogeneous federation learning method and storage medium

Info

Publication number
CN117875454B
CN117875454B · CN202410159303.1A
Authority
CN
China
Prior art keywords
client
model
clients
training
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410159303.1A
Other languages
Chinese (zh)
Other versions
CN117875454A (en)
Inventor
詹伟德
杨宝瑶
方志祥
李东哲
胡子逸
吴恺盈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202410159303.1A priority Critical patent/CN117875454B/en
Publication of CN117875454A publication Critical patent/CN117875454A/en
Application granted granted Critical
Publication of CN117875454B publication Critical patent/CN117875454B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data heterogeneous federated learning method based on multistage intelligent linkage and a storage medium. The core steps of the method comprise a dynamic clustering process and a cascade optimization process. On the one hand, the clients are dynamically clustered in real time according to the latest gradient update information of their models, and intra-cluster model integration reduces excessive fluctuation of model performance during integration; after each client transmits the latest gradient update information of its model to the server, the server performs real-time dynamic clustering according to this gradient update information to realize logical grouping, which effectively avoids the poor clustering effect that would otherwise be caused by client model updates. On the other hand, combined with the dynamic clustering result, the clients perform intra-group serial training and inter-group parallel training, and finally the proxy client of each group uploads its model to the server for global aggregation. This cascade optimization achieves a trade-off between model accuracy and training time under heterogeneous data conditions.

Description

Multistage intelligent linkage-based data heterogeneous federation learning method and storage medium
Technical Field
The invention relates to the technical field of machine learning, and in particular to a data heterogeneous federated learning method based on multistage intelligent linkage and a storage medium.
Background
Federated learning is an emerging foundational artificial intelligence technology. Its design goal is to carry out efficient machine learning among multiple participants or computing nodes while guaranteeing information security during big-data exchange, protecting terminal data and personal data privacy, and ensuring legal compliance.
As shown in fig. 1, most existing federated learning methods are based on parallel training. These methods perform well in terms of model training time, but under heterogeneous data their model accuracy is poor and fluctuates noticeably. Methods based on serial training, by contrast, perform well in model accuracy, but their training time overhead is extremely large and can even become intolerable under limited computing resources.
The prior art does not need to train on centralized data: each participant only needs to have the model parameters trained on its local data aggregated by a central server for collaborative training, thereby preserving the data security of network security vendors. Compared with the original federated learning method, the prior art divides participants with similar data distributions into clusters, so that the data distribution within each cluster is close to independent and identically distributed, and then trains the clusters serially, which gives the federated learning algorithm a better effect on non-independent and identically distributed (non-IID) data. However, although serial training improves the processing accuracy on heterogeneous data, it greatly increases training time, so model accuracy and training time still cannot be well balanced.
Disclosure of Invention
The invention provides a data heterogeneous federated learning method based on multistage intelligent linkage and a storage medium, to overcome the defect in the prior art that model accuracy and training time of federated learning are difficult to balance under data heterogeneity. A serial mode is adopted within each dynamic group whose members have similar data distributions, which reduces the accuracy loss caused by direct aggregation and effectively improves model performance; meanwhile, a parallel mode is adopted between groups with large differences in data distribution, which effectively reduces training time, thereby balancing model accuracy and training time under heterogeneous data conditions.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a multi-level intelligent association-based data heterogeneous federation learning method comprises the following steps:
S1: selecting a server and a plurality of clients, and constructing a global model by the server;
S2: the server sends the global model to each client, each client takes the received global model as a local model, and updates and trains the local model by utilizing local preset heterogeneous data of the local model to acquire gradient update information of the local model of each client;
S3: the method comprises the steps that gradient update information of a local model of each client is sent to a server, the server utilizes a preset clustering algorithm to conduct clustering division on all clients, all the clients are divided into a plurality of logic groups, corresponding proxy clients are selected for each logic group, and a dynamic clustering division result is obtained;
s4: the server sends the dynamic clustering division result to each client, cascade optimization training is carried out on the local model after updating training in all the clients, and each client respectively and correspondingly acquires the local model after cascade optimization;
the cascade optimization training comprises: for clients of the same logic group, performing intra-group serial training, and for clients of different logic groups, performing inter-group parallel training;
s5: uploading the local model after cascade optimization of each proxy client to a server, and aggregating all the received local models after cascade optimization by the server to obtain a new global model;
S6: and repeating the steps S2 to S5 until the new global model reaches a preset stopping condition, and completing the data isomerism federation learning of the server and the client.
Preferably, in the step S2, each client constructs its locally preset heterogeneous data according to different Dirichlet distribution function parameters and different sampling rates;
The Dirichlet distribution function parameter is used to represent the heterogeneity of data between different clients; the sampling rate is used to represent the ratio of the maximum amount of local data available to each client to its total amount of data.
Preferably, in the step S2, the gradient update information of the local model of each client is specifically gradient update information of the last layer of the local model.
Preferably, the gradient update information of the last layer of each client local model is specifically:
g_Li = η·∇L_i(w_i; D_i), the gradient being taken with respect to the parameters of the last network layer only;
wherein g_Li is the gradient update information of the last layer of the i-th client local model; η is the learning step size; w_i denotes the local model parameters of the i-th client; D_i is the heterogeneous data locally preset by the i-th client; L_i is the loss function of the i-th client.
Preferably, the loss function L i of the ith client is a negative log likelihood loss function, specifically:
L_i(y, f(x)) = -log(f(x)[y])
Wherein f(x) is the predicted output probability of the client local model on the input x; y is the true class label; f(x)[y] is the probability that the client local model assigns to class y.
Preferably, the clustering algorithm preset in step S3 includes: any one of a condensation hierarchical clustering algorithm, a K-means clustering algorithm, a DBSCAN clustering algorithm, a spectral clustering algorithm, an OPTICS clustering algorithm and a MEAN SHIFT clustering algorithm.
Preferably, in the step S4, for the clients of the same logical group, performing intra-group serial training includes:
if the number of clients in the same logical group is greater than 1, after one client finishes cascade optimization training, the cascade-optimized local model parameters are transmitted to the next client, and the formula of serial training is:
w_{i,j}^{t+1} = w_{i-1,j}^{t+1} - η·∇L_i(w_{i-1,j}^{t+1}; D_i)
if the number of clients in the same logical group is 1, the formula of serial training is:
w_{i,j}^{t+1} = w_j^{t} - η·∇L_i(w_j^{t}; D_i)
Wherein w_{i,j}^{t+1} represents the model parameters of the i-th client in the j-th group at the (t+1)-th round of cascade optimization iteration, w_{i-1,j}^{t+1} represents the model parameters of the (i-1)-th client, and D_i represents the heterogeneous data locally preset by the i-th client.
Preferably, in the step S5, the local model after cascade optimization of each proxy client is uploaded to a server, and the server aggregates all the received local models after cascade optimization, where the specific formula is as follows:
w^{t} = Σ_{k=1}^{K} (n_k / n)·w_k^{t}
Wherein w^{t} is the global model at the t-th round of iteration; n_k is the number of clients in the k-th logical group; n is the total number of clients; w_k^{t} is the cascade-optimized local model of the proxy client corresponding to the k-th logical group at the t-th round of iteration.
Preferably, in the step S6, the preset stopping condition is: the number of repetitions of steps S2-S5 reaches a preset value, or the new global model meets the preset accuracy requirement.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a data isomerism federation learning method based on multistage intelligent association and a storage medium, wherein a server and a plurality of clients are selected, the server constructs a global model and sends the global model to each client; each client takes the received global model as a local model, and updates and trains the local model by utilizing local preset heterogeneous data of the local model to acquire gradient update information of the local model of each client; the method comprises the steps that gradient update information of a local model of each client is sent to a server, the server utilizes a preset clustering algorithm to conduct clustering division on all clients, all the clients are divided into a plurality of logic groups, corresponding proxy clients are selected for each logic group, and a dynamic clustering division result is obtained; the server sends the dynamic clustering division result to each client, cascade optimization training is carried out on the local model after updating training in all the clients, and each client respectively and correspondingly acquires the local model after cascade optimization; the cascade optimization training comprises: for clients of the same logic group, performing intra-group serial training, and for clients of different logic groups, performing inter-group parallel training; uploading the local model after cascade optimization of each proxy client to a server, and aggregating all the received local models after cascade optimization by the server to obtain a new global model; repeating the steps until the new global model reaches a preset stopping condition, and completing data heterogeneous federation learning of the server and the client;
The invention provides a novel data isomerism federation learning algorithm, which fully integrates the advantages of serial training (high model precision) and parallel training (short training time), and can better balance the model precision and the training time; secondly, the core steps of the method comprise 'client dynamic clustering' and 'cascade optimization', the clients with similar data distribution are clustered into a group by using model gradient information, and compared with the traditional parallel training model, the method has higher precision by using the proposed cascade optimization module, and a novel model training framework is provided for federal learning of data isomerism; meanwhile, the 'last layer network layer gradient update information' of the client model is used as the input of model aggregation, so that the communication cost and the calculation cost are greatly reduced, and different data distributions are better mapped by the clustering result; in addition, due to the great reduction of cost, the method and the device can realize dynamic clustering, avoid the problem of poor clustering effect caused by the change of the data distribution of the client, sense the change of the data distribution of the client in real time and improve the stability and the precision of the model.
Drawings
Fig. 1 is a schematic diagram of a conventional parallel federal learning framework provided in the background art.
Fig. 2 is a flowchart of a data heterogeneous federation learning method based on multi-level intelligence provided in embodiment 1.
Fig. 3 is a schematic diagram of a multi-level intelligent link-based data heterogeneous federal learning framework provided in example 2.
Fig. 4 is a graph of the performance of the methods in the first verification experiment provided in example 3 (α=0.1 and α=10.0).
Fig. 5 is a graph of the performance of the methods in a second validation experiment provided in example 3 (α=0.1 and α=10.0).
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 2, the embodiment provides a multi-level intelligent link-based data heterogeneous federation learning method, which includes the following steps:
S1: selecting a server and a plurality of clients, and constructing a global model by the server;
S2: the server sends the global model to each client, each client takes the received global model as a local model, and updates and trains the local model by utilizing local preset heterogeneous data of the local model to acquire gradient update information of the local model of each client;
S3: the method comprises the steps that gradient update information of a local model of each client is sent to a server, the server utilizes a preset clustering algorithm to conduct clustering division on all clients, all the clients are divided into a plurality of logic groups, corresponding proxy clients are selected for each logic group, and a dynamic clustering division result is obtained;
s4: the server sends the dynamic clustering division result to each client, cascade optimization training is carried out on the local model after updating training in all the clients, and each client respectively and correspondingly acquires the local model after cascade optimization;
the cascade optimization training comprises: for clients of the same logic group, performing intra-group serial training, and for clients of different logic groups, performing inter-group parallel training;
s5: uploading the local model after cascade optimization of each proxy client to a server, and aggregating all the received local models after cascade optimization by the server to obtain a new global model;
S6: and repeating the steps S2 to S5 until the new global model reaches a preset stopping condition, and completing the data isomerism federation learning of the server and the client.
In the specific implementation process, firstly, a server and a plurality of clients are selected, the server builds a global model and sends the global model to each client; each client takes the received global model as a local model, and updates and trains the local model by utilizing local preset heterogeneous data of the local model to acquire gradient update information of the local model of each client; the method comprises the steps that gradient update information of a local model of each client is sent to a server, the server utilizes a preset clustering algorithm to conduct clustering division on all clients, all the clients are divided into a plurality of logic groups, corresponding proxy clients are selected for each logic group, and a dynamic clustering division result is obtained; the server sends the dynamic clustering division result to each client, cascade optimization training is carried out on the local model after updating training in all the clients, and each client respectively and correspondingly acquires the local model after cascade optimization; the cascade optimization training comprises: for clients of the same logic group, performing intra-group serial training, and for clients of different logic groups, performing inter-group parallel training; uploading the local model after cascade optimization of each proxy client to a server, and aggregating all the received local models after cascade optimization by the server to obtain a new global model; repeating the steps until the new global model reaches a preset stopping condition, and completing data heterogeneous federation learning of the server and the client;
The method provides a novel data isomerism federation learning algorithm, fully integrates the advantages of serial training (high model precision) and parallel training (short training time), and can better balance the model precision and the training time; secondly, the core steps of the method comprise 'client dynamic clustering' and 'cascade optimization', the clients with similar data distribution are clustered into a group by using model gradient information, and compared with the traditional parallel training model, the method has higher precision by using the proposed cascade optimization module, and a novel model training framework is provided for federal learning of data isomerism; meanwhile, gradient update information of a client model is used as input of model aggregation, so that communication cost and calculation cost are greatly reduced, and different data distribution is better mapped by a clustering result; in addition, due to the fact that cost is greatly reduced, dynamic clustering can be achieved, the problem that clustering effect is poor due to change of data distribution of a client is avoided, data distribution change of the client can be perceived in real time, and stability and accuracy of a model are improved.
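To make the overall flow of steps S1 to S6 concrete, the following is a minimal runnable sketch under simplifying assumptions: each client model is a plain linear regressor (so its single weight layer also serves as the "last layer"), local training is a single gradient-descent pass, and the clustering is cut into a fixed number of groups rather than by a distance threshold. All names and the synthetic data are illustrative assumptions, not the patented implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# S1: server builds a global model; each client holds heterogeneous local data
n_clients, dim, eta = 6, 5, 0.1
global_w = np.zeros(dim)
data = []
for i in range(n_clients):
    X = rng.normal(size=(20, dim))
    w_true = rng.normal(size=dim)                       # per-client distribution shift
    data.append((X, X @ w_true + 0.1 * rng.normal(size=20)))

def grad(w, X, y):                                      # gradient of squared loss on local data
    return X.T @ (X @ w - y) / len(y)

for rnd in range(10):
    # S2: each client computes gradient-update information at the received global model
    g_updates = [eta * grad(global_w, X, y) for X, y in data]

    # S3: server clusters clients by their gradient updates into logical groups
    Z = linkage(np.stack(g_updates), method="ward")
    labels = fcluster(Z, t=2, criterion="maxclust")     # fixed 2 groups for simplicity

    # S4: intra-group serial training; the groups themselves could run in parallel
    proxies, sizes = [], []
    for k in sorted(set(labels)):
        members = [i for i in range(n_clients) if labels[i] == k]
        w = global_w.copy()
        for i in members:                               # pass the model client to client
            X, y = data[i]
            w = w - eta * grad(w, X, y)
        proxies.append(w)
        sizes.append(len(members))

    # S5: server aggregates the proxy models, weighted by group size
    global_w = sum(n_k / n_clients * w_k for n_k, w_k in zip(sizes, proxies))
    # S6: repeat until a stopping condition (a fixed round budget here)
```

In this sketch the groups are processed in a simple loop; in an actual deployment each group's serial chain would run concurrently on different clients, which is where the inter-group parallelism comes from.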
Example 2
The embodiment provides a data isomerism federation learning method based on multistage intelligent association, which comprises the following steps:
S1: selecting a server and a plurality of clients, and constructing a global model by the server;
S2: the server sends the global model to each client, each client takes the received global model as a local model, and updates and trains the local model by utilizing local preset heterogeneous data of the local model to acquire gradient update information of the local model of each client;
S3: the method comprises the steps that gradient update information of a local model of each client is sent to a server, the server utilizes a preset clustering algorithm to conduct clustering division on all clients, all the clients are divided into a plurality of logic groups, corresponding proxy clients are selected for each logic group, and a dynamic clustering division result is obtained;
s4: the server sends the dynamic clustering division result to each client, cascade optimization training is carried out on the local model after updating training in all the clients, and each client respectively and correspondingly acquires the local model after cascade optimization;
the cascade optimization training comprises: for clients of the same logic group, performing intra-group serial training, and for clients of different logic groups, performing inter-group parallel training;
s5: uploading the local model after cascade optimization of each proxy client to a server, and aggregating all the received local models after cascade optimization by the server to obtain a new global model;
S6: repeating the steps S1 to S5 until the new global model reaches a preset stopping condition, and completing data isomerism federal learning of the server and the client;
In the step S2, each client constructs its locally preset heterogeneous data according to different Dirichlet distribution function parameters and different sampling rates;
In this embodiment, the Dirichlet distribution parameter is used to represent the heterogeneity of data between different clients, and a smaller value indicates higher statistical heterogeneity; the sampling rate is used to represent the proportion of the maximum local data volume available to each client relative to its total data volume, and the smaller this value, the greater the task difficulty. Regarding the sampling rate, specifically, suppose each client currently has 100 data samples available; with a sampling rate of 0.1, the local data of the client is at most 100×0.1 = 10 samples. On this basis, after additionally satisfying the Dirichlet distribution required for data heterogeneity, the final local data of the client may, for example, be 7 samples;
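As an illustration of this data construction, the following is a hedged sketch in which each client's class mix is drawn from a Dirichlet(α) distribution and the usable local volume is capped by the sampling rate; the function name, the toy label array, and the per-client quota rule are assumptions made for illustration only.

```python
import numpy as np

def build_client_data(labels, n_clients, alpha, sampling_rate, rng):
    """Return, for each client, the indices of its locally available samples."""
    n_classes = int(labels.max()) + 1
    # maximum local data volume per client, controlled by the sampling rate
    quota = int(sampling_rate * len(labels) / n_clients)
    # smaller alpha -> more skewed class proportions -> higher statistical heterogeneity
    proportions = rng.dirichlet(alpha * np.ones(n_classes), size=n_clients)
    client_indices = []
    for i in range(n_clients):
        counts = (proportions[i] * quota).astype(int)
        idx = []
        for c in range(n_classes):
            pool = np.flatnonzero(labels == c)
            take = min(counts[c], len(pool))
            idx.extend(rng.choice(pool, size=take, replace=False))
        client_indices.append(np.array(idx))            # may fall short of the quota
    return client_indices

rng = np.random.default_rng(0)
toy_labels = rng.integers(0, 10, size=1000)             # stand-in for MNIST labels
parts = build_client_data(toy_labels, n_clients=5, alpha=0.1, sampling_rate=0.5, rng=rng)
```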
in the step S2, the gradient update information of the local model of each client is specifically gradient update information of the last layer of the local model;
the gradient update information of the last layer of each client local model is specifically:
g_Li = η·∇L_i(w_i; D_i), the gradient being taken with respect to the parameters of the last network layer only;
wherein g_Li is the gradient update information of the last layer of the i-th client local model; η is the learning step size; w_i denotes the local model parameters of the i-th client; D_i is the heterogeneous data locally preset by the i-th client; L_i is the loss function of the i-th client;
the loss function L i of the ith client is a negative log likelihood loss function, specifically:
L_i(y, f(x)) = -log(f(x)[y])
Wherein f(x) is the predicted output probability of the client local model on the input x; y is the true class label; f(x)[y] is the probability that the client local model assigns to class y;
The clustering algorithm preset in the step S3 includes: any one of a condensation hierarchical clustering algorithm, a K-means clustering algorithm, a DBSCAN clustering algorithm, a spectral clustering algorithm, an OPTICS clustering algorithm and a MEAN SHIFT clustering algorithm; the embodiment specifically relates to a condensation hierarchical clustering algorithm;
in the step S4, for the clients of the same logical group, performing intra-group serial training includes:
if the number of clients in the same logical group is greater than 1, after one client finishes cascade optimization training, the cascade-optimized local model parameters are transmitted to the next client, and the formula of serial training is:
w_{i,j}^{t+1} = w_{i-1,j}^{t+1} - η·∇L_i(w_{i-1,j}^{t+1}; D_i)
if the number of clients in the same logical group is 1, the formula of serial training is:
w_{i,j}^{t+1} = w_j^{t} - η·∇L_i(w_j^{t}; D_i)
Wherein w_{i,j}^{t+1} represents the model parameters of the i-th client in the j-th group at the (t+1)-th round of cascade optimization iteration, w_{i-1,j}^{t+1} represents the model parameters of the (i-1)-th client, and D_i represents the heterogeneous data locally preset by the i-th client;
In the step S5, the local model after cascade optimization of each proxy client is uploaded to a server, and the server aggregates all the received local models after cascade optimization, and the specific formula is as follows:
w^{t} = Σ_{k=1}^{K} (n_k / n)·w_k^{t}
Wherein w^{t} is the global model at the t-th round of iteration; n_k is the number of clients in the k-th logical group; n is the total number of clients; w_k^{t} is the cascade-optimized local model of the proxy client corresponding to the k-th logical group at the t-th round of iteration;
in the step S6, the preset stopping condition is: the number of repetitions of steps S2-S5 reaches a preset value, or the new global model meets the preset accuracy requirement.
In the implementation process, as shown in fig. 3, a server and a plurality of clients are selected first; the server builds a global model M_g and sends the global model M_g to each client; the local model of the i-th client is denoted M_ci.
Each client M_ci takes the received global model M_g as its local model and updates and trains the local model using its locally preset heterogeneous data, obtaining the gradient update information of the last layer of each client's local model, namely g_L1, g_L2, …, g_Lk; the gradient update is calculated as follows:
The purpose of using the gradient update information of the last layer of the model network is to extract richer global information, so that the result of the subsequent dynamic clustering better maps different data distribution; in addition, by adopting gradient update information, communication cost and calculation cost can be greatly reduced, and a calculation formula of the gradient update information of the last layer of the model network is expressed as follows:
g_Li = η·∇L_i(w_i; D_i), the gradient being taken with respect to the parameters of the last network layer only;
wherein g_Li is the gradient update information of the last layer of the i-th client local model; η is the learning step size; w_i denotes the local model parameters of the i-th client; D_i is the heterogeneous data locally preset by the i-th client; L_i is the loss function of the i-th client, for which this embodiment adopts the negative log likelihood loss function, specifically:
L_i(y, f(x)) = -log(f(x)[y])
Wherein f(x) is the predicted output probability of the client local model on the input x; y is the true class label; f(x)[y] is the probability that the client local model assigns to class y; the goal of this loss function is to maximize the probability that the model predicts the correct class;
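The following is a minimal sketch of this step under an assumed two-layer classifier: it computes the negative log likelihood loss and keeps only the last layer's gradient as the update information g_Li. The architecture, sizes, and function names are illustrative assumptions (the hidden width of 32 with 10 classes is chosen so that the last layer carries 32×10+10 = 330 parameters, the figure quoted later in this embodiment).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    def __init__(self, d_in=784, d_hid=32, n_cls=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hid)
        self.fc2 = nn.Linear(d_hid, n_cls)          # "last layer" whose gradient is uploaded

    def forward(self, x):
        return F.log_softmax(self.fc2(F.relu(self.fc1(x))), dim=1)

def last_layer_gradient_update(model, x, y, eta=0.01):
    model.zero_grad()
    log_probs = model(x)                            # log f(x)
    loss = F.nll_loss(log_probs, y)                 # mean of -log f(x)[y] over the batch
    loss.backward()
    # keep only the last layer's gradient, flattened into one vector (330 values here)
    g = torch.cat([model.fc2.weight.grad.flatten(), model.fc2.bias.grad.flatten()])
    return eta * g

model = SmallNet()
x, y = torch.randn(16, 784), torch.randint(0, 10, (16,))
g_Li = last_layer_gradient_update(model, x, y)      # sent to the server for clustering
```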
The gradient update information g_Li of the last layer of each client's local model is sent to the server; the server uses an agglomerative hierarchical clustering (HAC) algorithm to cluster all the clients, dividing them into a plurality of logical groups G_1, G_2, …, G_k, which group clients with different data distributions; a corresponding proxy client is then selected for each logical group, yielding the dynamic clustering division result;
The aim of introducing a condensation hierarchical clustering algorithm is to group clusters according to gradient update information of each client; in the practical problem, different advanced clustering algorithms can be adopted according to the specific solved problem, and the method is orthogonal with the current advanced clustering algorithm, so that the expansibility and the orthogonality of the framework are also reflected;
the step of a condensation hierarchical clustering algorithm:
1) Initializing: treating each client as a separate cluster;
2) Calculating a distance matrix D; the distance matrix is used for measuring the adjacency between clusters (increment of square error caused by combining two clusters), so that the variance of the clusters being combined is reduced to the greatest extent;
3) Finding two clusters i and j with the distance smaller than a preset threshold value, and merging;
4) Updating the distance matrix;
5) Repeating the steps 3) and 4) until the distance between any two clusters is greater than a preset threshold value;
Wherein the distance matrix entry D_ij represents the distance between the i-th cluster and the j-th cluster; because the method adopts only the gradient update information of the last layer of the client model, the communication and calculation cost is greatly reduced (from the 19744 parameters of the traditional method down to 330), so the method can realize real-time dynamic clustering and avoid the problem of a non-ideal clustering grouping effect after the data distribution of each client changes;
the gradient update information of each client side is used for carrying out real-time dynamic clustering to obtain a cluster group capable of representing different data distribution;
The server returns a logical group G_i to each client; logical grouping does not require physically changing any attribute of the clients, which makes it highly convenient; the gradient update information of the last layer carries more abstract and richer global information, so the clustering result maps different data distributions better; a sketch of this clustering step is given below;
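A hedged sketch of the dynamic clustering step follows: clients are grouped by their last-layer gradient updates with Ward-linkage agglomerative clustering cut at a preset distance threshold, and the first member of each group stands in as its proxy client. The threshold value, the proxy-selection rule, and the synthetic gradients are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def dynamic_clustering(gradient_updates, distance_threshold=5.0):
    """gradient_updates: one 1-D array per client. Returns group labels and proxy clients."""
    G = np.stack(gradient_updates)
    hac = AgglomerativeClustering(n_clusters=None,
                                  distance_threshold=distance_threshold,
                                  linkage="ward")   # merge the pair that least increases variance
    labels = hac.fit_predict(G)
    # pick one proxy client per logical group (here simply the first member)
    proxies = {int(k): int(np.flatnonzero(labels == k)[0]) for k in np.unique(labels)}
    return labels, proxies

rng = np.random.default_rng(0)
# six fake clients whose 330-dimensional last-layer gradients form two data distributions
fake_grads = [rng.normal(loc=i % 2, scale=0.05, size=330) for i in range(6)]
labels, proxies = dynamic_clustering(fake_grads)
```

Re-running this on every communication round is what makes the clustering dynamic: the groups track the clients' current gradient behaviour rather than a one-off partition.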
The server sends the dynamic clustering division result to each client, cascade optimization training is carried out on the local model after updating training in all the clients, and each client respectively and correspondingly acquires the local model after cascade optimization; the cascade optimization training comprises: for clients of the same logic group, performing intra-group serial training, and for clients of different logic groups, performing inter-group parallel training;
As shown in fig. 3, in order to fully combine the advantages of serial training and parallel training, the present embodiment designs a multi-level model aggregation scheme using the grouping result G 1,G2,…,Gk of dynamic clustering; the clients with similar data distribution (same group G i) are subjected to serial training, and the clients with larger data distribution difference (different groups) are subjected to parallel training; the advantages of model precision of serial training and training efficiency of parallel training are fully utilized;
The serial iterative update is designed so that the models inside each group obtained by dynamic clustering can be computed serially, fully combining the advantages of serial training under heterogeneous data: high accuracy and a stable model. Specifically, after receiving the model, a client performs local training and then transmits the updated model parameters to the next client; the serial iterative update formula is:
w_{i,j}^{t+1} = w_{i-1,j}^{t+1} - η·∇L_i(w_{i-1,j}^{t+1}; D_i)
Wherein w_{i,j}^{t+1} represents the model parameters of the i-th client in the j-th group at iteration t+1, w_{i-1,j}^{t+1} represents the model parameters of the (i-1)-th client, and D_i denotes the local training data of the i-th client;
Furthermore, when i equals 0, let w_{0,j}^{t+1} = w_j^{t}; this means that the first client (or the only client in a group) starts each iteration from the model parameters of the previous iteration, so the serial iteration formula for the first client may be written as:
w_{1,j}^{t+1} = w_j^{t} - η·∇L_1(w_j^{t}; D_1)
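A minimal sketch of this intra-group serial pass follows, assuming a plain least-squares local training step and the handover rule above (the first client starts from the previous round's model w_j^t). The function names and the single-pass training rule are assumptions.

```python
import numpy as np

def local_train(w, X, y, eta=0.1, epochs=1):
    """One or more plain gradient-descent passes on a client's local least-squares loss."""
    for _ in range(epochs):
        w = w - eta * X.T @ (X @ w - y) / len(y)
    return w

def intra_group_serial(w_prev_round, group_data, eta=0.1):
    """Chain the clients of one logical group: each starts from its predecessor's model."""
    w = w_prev_round.copy()                 # w_{0,j}^{t+1} = w_j^{t}
    for X, y in group_data:                 # clients visited in their serial order
        w = local_train(w, X, y, eta)
    return w                                # held by the group's proxy client

rng = np.random.default_rng(0)
group = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(3)]
w_proxy = intra_group_serial(np.zeros(4), group)
```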
The cascade-optimized local model of each proxy client is uploaded to the server, and the server aggregates all the received cascade-optimized local models to obtain a new global model; the purpose of parallel weighted aggregation is to distinguish the contribution of each group to the server's global model, with groups containing more clients receiving a larger weight; the specific formula is:
w^{t} = Σ_{k=1}^{K} (n_k / n)·w_k^{t}
Where n_k is the number of clients in the k-th group, n is the total number of clients participating in training, w^{t} represents the global model at iteration round t, and w_k^{t} represents the proxy model of the k-th group at the t-th iteration;
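A one-function sketch of this weighted aggregation, assuming the proxy models are plain parameter vectors:

```python
import numpy as np

def aggregate(proxy_models, group_sizes):
    """w^t = sum_k (n_k / n) * w_k^t : weight each group's proxy model by its client share."""
    n = sum(group_sizes)
    return sum((n_k / n) * np.asarray(w_k) for n_k, w_k in zip(group_sizes, proxy_models))

w_new = aggregate([np.ones(4), np.zeros(4)], group_sizes=[3, 2])   # -> 0.6 * ones(4)
```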
Repeating the steps until the new global model reaches a preset stopping condition, wherein the stopping condition is that iterative training reaches the maximum number of times, or the new global model reaches a preset precision requirement, and completing data isomerism federal learning of the server and the client;
the method mainly solves the following difficulties in the traditional technology:
1) How to group clients with similar data distribution into a group under the condition that the data of each client is inaccessible;
2) How to avoid the cluster effect from being poor due to the change of the data distribution of the client (with time); the data distribution of each client in a short period is relatively fixed, but when the time span is increased, the data distribution of the client may change, and if the change of the data distribution cannot be timely perceived according to the previous clustering result, the model effect is poor;
3) How to balance time efficiency and model accuracy in the training process;
4) How to match the new training framework with the real hardware computing power;
For the above problems, the solution of the method is:
1) The method can realize grouping of the clients by using the information of the model, so that the clients with similar data distribution can be grouped into the same group; grouping is carried out through gradient update information of the last layer of the model network layer, and client grouping division is carried out by utilizing the richer and more abstract global information, so that clients with similar data distribution are divided into one group, and communication cost and calculation cost are greatly reduced;
2) Based on the update condition of gradient information, the data distribution change of the client is perceived in real time in a dynamic clustering mode;
3) The combination of serial and parallel is realized by designing a multi-stage intelligent model training frame, the serial and parallel are realized in the group, the parallel is realized between the groups, and the advantages of serial training and parallel training are effectively combined;
4) By dynamically changing the clustering result, the algorithm's accuracy can be made to approach that of serial training arbitrarily closely, or its training-time efficiency to approach that of parallel training arbitrarily closely; in practical applications, when accuracy is the priority, adjusting the parameters of the framework of the invention brings the accuracy close to that of serial training, and when time efficiency is the priority, adjusting the framework parameters brings the model training time close to the efficiency of parallel training;
The method provides a novel model training paradigm except for integrating the advantages of serial training and parallel training, and compared with the existing related federal learning algorithm, the method not only greatly reduces the communication cost, but also can sense the change of data distribution in real time.
Example 3
This example provides 2 validation experiments based on the numeric and extended alphabetical datasets (MNIST and EMNIST) to validate the method presented in example 2.
In a specific implementation process, the first verification experiment is:
On the digit dataset MNIST, 5 clients are set up, and each client constructs heterogeneously distributed local data through a Dirichlet(α) distribution function and different sampling rates (ratios); α controls the data heterogeneity between clients, with smaller values representing higher statistical heterogeneity; the sampling rate (ratio) represents the proportion of each client's local data volume relative to its total data volume, and the smaller this value, the greater the task difficulty; the first verification experiment tests federated learning performance with α set to 0.1, 1.0 and 10.0 respectively and a sampling rate of 0.5, the local data of all clients being mutually heterogeneous and the number of communication rounds being 200;
the global prediction accuracy of the model after federal learning is recorded in table 1;
TABLE 1 model accuracy and training time for different algorithms under MNIST
The experimental results are compared with a representative serial federated learning framework (CWT) and a representative parallel federated learning framework (FedAvg) (most other federated learning algorithms are detail optimizations built on the CWT and FedAvg frameworks); the advantages of the parallel federated learning algorithm (FedAvg), the serial federated learning algorithm (CWT) and the algorithm of this embodiment in terms of model accuracy and training time overhead are examined respectively; to ensure fairness of model training, the number of local iterations of the FedAvg and CWT algorithms is 20, while that of the method of example 2 is 10; to show the experimental results more intuitively, the performance curves of the methods are shown in fig. 4 (upper left: accuracy curve; lower right: time curve);
From the results of table 1 and fig. 4, it can be seen that the model accuracy of the method of example 2 in federal learning of data isomerization is between CWT (serial) and Fedavg (parallel), and also between the two in model training time, which fully demonstrates that the method of example 2 balances the accuracy and time advantages of serial and parallel well; under the condition of larger data isomerism, the method also has better model convergence and small fluctuation degree;
the second validation experiment was:
On the extended-letter dataset EMNIST, heterogeneously distributed local data are constructed through a Dirichlet(α) distribution function and different sampling rates (ratios); α controls the data heterogeneity between clients, with smaller values representing higher statistical heterogeneity; the sampling rate (ratio) represents the proportion of each client's local data volume relative to its total data volume, and the smaller this value, the greater the task difficulty; federated learning performance is tested with α set to 0.1, 1.0 and 10.0 respectively and a sampling rate of 0.1; to ensure fairness of model training, the number of local iterations of the FedAvg and CWT algorithms is 20, that of the method of embodiment 2 is 10, and the data of all clients are mutually heterogeneous; to show the experimental results more intuitively, the performance curves of the methods are shown in fig. 5 (upper left: accuracy curve; lower right: time curve);
the global prediction accuracy of the model after federal learning is recorded in table 2;
model accuracy and training time for different algorithms under Table 2EMNIST
As can be seen from table 2 and fig. 5, the method of the present embodiment better balances the relationship between model accuracy and training time in EMNIST datasets; in addition, the method of the embodiment 2 can dynamically adapt to the calculation capability of real hardware in a real physical environment by adjusting the super-parameters of a condensation hierarchical clustering algorithm, so that the model precision approaches to the model precision of serial training infinitely, the training time approaches to the training time of parallel training infinitely, and the algorithm has stronger adaptability and flexibility;
The novel framework provided by the invention is a novel federal learning training framework for effectively balancing model precision and training time by combining two verification experiments, and fully integrates the precision advantage of serial training and the time efficiency advantage of parallel training.
The same or similar reference numerals correspond to the same or similar components;
the terms describing the positional relationship in the drawings are merely illustrative, and are not to be construed as limiting the present patent;
It is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (10)

1. The multi-level intelligent linkage-based data heterogeneous federation learning method is characterized by comprising the following steps of:
S1: selecting a server and a plurality of clients, and constructing a global model by the server;
S2: the server sends the global model to each client, each client takes the received global model as a local model, and updates and trains the local model by utilizing local preset heterogeneous data of the local model to acquire gradient update information of the local model of each client;
S3: the method comprises the steps that gradient update information of a local model of each client is sent to a server, the server utilizes a preset clustering algorithm to conduct clustering division on all clients, all the clients are divided into a plurality of logic groups, corresponding proxy clients are selected for each logic group, and a dynamic clustering division result is obtained;
s4: the server sends the dynamic clustering division result to each client, cascade optimization training is carried out on the local model after updating training in all the clients, and each client respectively and correspondingly acquires the local model after cascade optimization;
the cascade optimization training comprises: for clients of the same logic group, performing intra-group serial training, and for clients of different logic groups, performing inter-group parallel training;
s5: uploading the local model after cascade optimization of each proxy client to a server, and aggregating all the received local models after cascade optimization by the server to obtain a new global model;
S6: and repeating the steps S2 to S5 until the new global model reaches a preset stopping condition, and completing the data isomerism federation learning of the server and the client.
2. The multi-level intelligent link-based data heterogeneous federation learning method according to claim 1, wherein in the step S2, each client builds the locally preset heterogeneous data through different dirichlet allocation function parameters and different sampling rates;
The dirichlet allocation function parameter is used for representing the isomerism of data between different clients; the sampling rate is used to represent the ratio of the maximum amount of local data available to each client to its total amount of data.
3. The multi-level intelligent link-based data heterogeneous federation learning method according to claim 1, wherein in step S2, the gradient update information of each client local model is specifically gradient update information of the last layer of the local model.
4. A multi-level intelligent link-based data heterogeneous federal learning method according to claim 3, wherein the gradient update information of the last layer of each client local model is specifically:
g_Li = η·∇L_i(w_i; D_i), the gradient being taken with respect to the parameters of the last network layer only;
wherein g_Li is the gradient update information of the last layer of the i-th client local model; η is the learning step size; w_i denotes the local model parameters of the i-th client; D_i is the heterogeneous data locally preset by the i-th client; L_i is the loss function of the i-th client.
5. The multi-level intelligent link-based data heterogeneous federation learning method according to claim 4, wherein the loss function L i of the ith client is a negative log likelihood loss function, specifically:
L_i(y, f(x)) = -log(f(x)[y])
Wherein f(x) is the predicted output probability of the client local model on the input x; y is the true class label; f(x)[y] is the probability that the client local model assigns to class y.
6. The multi-level intelligent link-based data heterogeneous federation learning method according to claim 1, wherein the clustering algorithm preset in step S3 comprises: any one of a condensation hierarchical clustering algorithm, a K-means clustering algorithm, a DBSCAN clustering algorithm, a spectral clustering algorithm, an OPTICS clustering algorithm and a MEAN SHIFT clustering algorithm.
7. The multi-level intelligent link-based data heterogeneous federation learning method according to claim 1, wherein in the step S4, for the clients of the same logical group, performing intra-group serial training includes:
if the number of clients in the same logical group is greater than 1, after one client finishes cascade optimization training, the cascade-optimized local model parameters are transmitted to the next client, and the formula of serial training is:
w_{i,j}^{t+1} = w_{i-1,j}^{t+1} - η·∇L_i(w_{i-1,j}^{t+1}; D_i)
if the number of clients in the same logical group is 1, the formula of serial training is:
w_{i,j}^{t+1} = w_j^{t} - η·∇L_i(w_j^{t}; D_i)
Wherein w_{i,j}^{t+1} represents the model parameters of the i-th client in the j-th group at the (t+1)-th round of cascade optimization iteration, w_{i-1,j}^{t+1} represents the model parameters of the (i-1)-th client, and D_i represents the heterogeneous data locally preset by the i-th client.
8. The multi-level intelligent federation-based data heterogeneous federation learning method according to claim 1, wherein in step S5, the local model after cascade optimization of each proxy client is uploaded to a server, and the server aggregates all the received local models after cascade optimization, and the specific formula is as follows:
w^{t} = Σ_{k=1}^{K} (n_k / n)·w_k^{t}
Wherein w^{t} is the global model at the t-th round of iteration; n_k is the number of clients in the k-th logical group; n is the total number of clients; w_k^{t} is the cascade-optimized local model of the proxy client corresponding to the k-th logical group at the t-th round of iteration.
9. The multi-level intelligent link-based data heterogeneous federal learning method according to claim 1, wherein in the step S6, the preset stopping condition is: the number of repetitions of steps S2-S5 reaches a preset value, or the new global model meets the preset accuracy requirement.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1-9.
CN202410159303.1A 2024-02-04 2024-02-04 Multistage intelligent linkage-based data heterogeneous federation learning method and storage medium Active CN117875454B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410159303.1A CN117875454B (en) 2024-02-04 2024-02-04 Multistage intelligent linkage-based data heterogeneous federation learning method and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410159303.1A CN117875454B (en) 2024-02-04 2024-02-04 Multistage intelligent linkage-based data heterogeneous federation learning method and storage medium

Publications (2)

Publication Number Publication Date
CN117875454A CN117875454A (en) 2024-04-12
CN117875454B true CN117875454B (en) 2024-06-21

Family

ID=90583049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410159303.1A Active CN117875454B (en) 2024-02-04 2024-02-04 Multistage intelligent linkage-based data heterogeneous federation learning method and storage medium

Country Status (1)

Country Link
CN (1) CN117875454B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115660116A (en) * 2022-12-22 2023-01-31 阿里巴巴(中国)有限公司 Sparse adapter-based federated learning method and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210073639A1 (en) * 2018-12-04 2021-03-11 Google Llc Federated Learning with Adaptive Optimization
CN115858675A (en) * 2022-12-05 2023-03-28 西安电子科技大学 Non-independent same-distribution data processing method based on federal learning framework
CN117195019A (en) * 2023-02-07 2023-12-08 西北工业大学 VANET-oriented lightweight federal learning framework optimization method
CN116362327A (en) * 2023-03-30 2023-06-30 北京天弛网络有限公司 Model training method and system and electronic equipment
CN116226540B (en) * 2023-05-09 2023-09-26 浙江大学 End-to-end federation personalized recommendation method and system based on user interest domain
CN116579443A (en) * 2023-05-22 2023-08-11 广东工业大学 Personalized federal learning method oriented to data isomerism and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115660116A (en) * 2022-12-22 2023-01-31 阿里巴巴(中国)有限公司 Sparse adapter-based federated learning method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Baoyao Yang et al.; "Allosteric Feature Collaboration for Model-Heterogeneous Federated Learning"; IEEE Transactions on Neural Networks and Learning Systems; 2023-12-31; full text *

Also Published As

Publication number Publication date
CN117875454A (en) 2024-04-12

Similar Documents

Publication Publication Date Title
Fu et al. Client selection in federated learning: Principles, challenges, and opportunities
Zhang et al. MR-DRO: A fast and efficient task offloading algorithm in heterogeneous edge/cloud computing environments
CN113902021B (en) Energy-efficient clustered federal edge learning strategy generation method and device
Liu et al. Resource-constrained federated edge learning with heterogeneous data: Formulation and analysis
CN114091667A (en) Federal mutual learning model training method oriented to non-independent same distribution data
CN115858675A (en) Non-independent same-distribution data processing method based on federal learning framework
CN115204416A (en) Heterogeneous client-oriented joint learning method based on hierarchical sampling optimization
CN114943342A (en) Optimization method of federated learning system
CN114357676A (en) Aggregation frequency control method for hierarchical model training framework
Liu et al. Finch: Enhancing federated learning with hierarchical neural architecture search
Huang et al. Active client selection for clustered federated learning
CN116050540A (en) Self-adaptive federal edge learning method based on joint bi-dimensional user scheduling
Zhang et al. Toward more efficient locality‐sensitive hashing via constructing novel hash function cluster
CN117875454B (en) Multistage intelligent linkage-based data heterogeneous federation learning method and storage medium
CN116244484B (en) Federal cross-modal retrieval method and system for unbalanced data
CN117354330A (en) Improved edge computing IoT big data analysis architecture
Shi et al. Efficient federated learning with enhanced privacy via lottery ticket pruning in edge computing
CN110941767A (en) Network community detection countermeasure enhancement method based on multi-similarity integration
Zheng et al. Adaptive Federated Learning via New Entropy Approach
CN117093885A (en) Federal learning multi-objective optimization method integrating hierarchical clustering and particle swarm
CN116911459A (en) Multi-input multi-output ultra-short-term power load prediction method suitable for virtual power plant
CN115345320A (en) Method for realizing personalized model under layered federal learning framework
Dang et al. Hybrid IoT device selection with knowledge transfer for federated learning
Nanor et al. FedSULP: A communication-efficient federated learning framework with selective updating and loss penalization
Bai et al. An efficient skyline query algorithm in the distributed environment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant