CN116776948A - Federated learning method, system and medium based on client selection and weight distribution - Google Patents

Federated learning method, system and medium based on client selection and weight distribution

Info

Publication number
CN116776948A
CN116776948A (application number CN202310236329.7A)
Authority
CN
China
Prior art keywords
layer
user
model
client
selection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310236329.7A
Other languages
Chinese (zh)
Inventor
孙国辉 (Sun Guohui)
李星毅 (Li Xingyi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University
Priority to CN202310236329.7A
Publication of CN116776948A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a federated learning method, system, and medium based on client selection and weight distribution, the method comprising the steps of: layering the clients according to their real-time accuracy; adjusting the selection probability of each layer so that poorly performing layers tend to be selected under the new probability distribution; adjusting the selection probability of each client within a layer; selecting users for training and uploading model parameters; adjusting model weights using an attention mechanism; and carrying out global model parameter aggregation. The application controls the convergence direction of the global model when processing Non-IID data, selects users favorable to model convergence without violating the fairness of the final model, improves the convergence speed of the model while controlling the computation cost of the clients, and solves the problems of slow model convergence and uncontrollable convergence direction caused by random selection when client data are strongly heterogeneous.

Description

Federated learning method, system and medium based on client selection and weight distribution
Technical Field
The application belongs to the technical field of federated learning, and particularly relates to a federated learning method, system, and medium based on client selection and weight distribution.
Background
With the development of artificial intelligence, data privacy has become increasingly important, and the data-island problem in fields such as healthcare and finance poses a great challenge to machine learning techniques that require collecting raw data for training. Federated learning is a distributed machine learning technique that protects user data privacy: multiple participants (called clients) each train a model locally on their own data without transmitting the raw data, and upload the trained models to a central server (called the server) for model aggregation, thereby protecting the privacy of user data.
Limited by client communication capabilities, only a portion of the users can be randomly selected to participate in each round of federated training. In practice, the user data faced by federated learning are often non-independent and identically distributed (Non-IID): the data sizes and data characteristics differ greatly among users. Existing federated learning models cannot identify in real time the users that most influence the convergence of the current model, and under random selection the convergence direction and the number of training rounds of the federated model are uncontrollable. In addition, some research efforts have proposed contribution metrics that assign high contributions to clients with better device hardware and network capabilities, or with greater impact on the model during previous training, in order to increase their selection probabilities; this can bias the overall model increasingly towards these users, resulting in a loss of fairness for the final model.
Disclosure of Invention
In view of the above technical problems, the application provides a federated learning method, system, and medium based on client selection and weight distribution, which controls the convergence direction of the global model when processing Non-IID data, selects users favorable to model convergence without violating the fairness of the final model, improves the convergence speed of the model while controlling the computation cost of the clients, and solves the problems of slow model convergence and uncontrollable convergence direction caused by random selection when client data are strongly heterogeneous.
The present application achieves the above technical objects by the following means.
A federated learning method based on client selection and weight distribution, comprising the steps of:
Step S1: layer the clients according to their real-time accuracy: divide the descending accuracy list S of the clients into m layers of equal size, each layer containing N users;
Step S2: calculate the average accuracy $A_{l,t}$ of each layer from step S1, and use the average accuracy of the users within a layer as the basis for adjusting the selection probability $p_{l,t}$ of each layer, so that poorly performing user layers tend to be picked under the new probability distribution;
Step S3: use the data proportion $P_k$ to adjust the selection probability of each client within the layer selected in step S2, so that users with larger data volumes tend to be selected;
Step S4: train the clients selected in step S3, and upload the local model parameters to the server after all selected clients finish training;
Step S5: adjust the weights of the models uploaded to the server using an attention mechanism: compute the attention value between each model and the server's previous-round global model, normalize the similarities $sim_{k,l}$ of all clients after the parameters are computed, use the normalized similarities $s_{k,l}$ to adjust the weight of each client, and take each client's share of the similarity-weighted data volume as its assigned weight $\alpha_{k,l}$;
Step S6: perform layer-wise global model parameter aggregation over all users participating in the current round according to the weights assigned in step S5.
In the above scheme, the average accuracy $A_{l,t}$ of each layer in step S2 is calculated by the following formula:

$$A_{l,t} = \frac{1}{N}\sum_{n=1}^{N} a_n$$

where $l$ denotes the user layer, $A_{l,t}$ the average accuracy of layer $l$ in the $t$-th global training round, $a_n$ the accuracy of each user in the layer, and $N$ the number of users in the layer.
In the above scheme, the selection probability $p_{l,t}$ of each layer in step S2 is calculated by the following formula:

$$p_{l,t} = \frac{1 - A_{l,t}}{\sum_{l'=1}^{m} (1 - A_{l',t})}$$

where $p_{l,t}$ denotes the probability of selecting layer $l$ in round $t$, which is proportional to $1 - A_{l,t}$; that is, layers with a high average accuracy are assigned a lower selection probability, so that selection tends towards poorly performing user layers. $l$ denotes the user layer and $A_{l,t}$ the average accuracy of layer $l$ in the $t$-th global training round.
In the above scheme, the data proportion $P_k$ is used in step S3 to adjust the selection probability of users within a layer, calculated as follows:

$$P_{k,t} = \frac{n_k}{\sum_{j \in S_l} n_j}$$

where $P_{k,t}$ denotes the probability that user $k$ is selected in round $t$, equal to the ratio of the user's data volume to the total data volume of its layer $S_l$, and $n_k$ denotes the data volume of user $k$.
In the above scheme, step S4 specifically includes the following steps:
Step S4.1: the server begins user selection and determines a target selection number C;
Step S4.2: select according to the probabilities adjusted in steps S1, S2 and S3: first select a layer according to the layer probability distribution, then select a client within that layer according to the client probability distribution;
Step S4.3: each execution of step S4.2 selects one client; repeat step S4.2 C times to complete the client selection for the current round;
Step S4.4: the selected clients perform local training, running batch gradient descent for the number of local epochs E specified by the server;
Step S4.5: upload the local model parameters to the server after all selected clients finish training.
In the above scheme, $sim_{k,l}$ in step S5 is calculated by the following formula:

$$sim_{k,l} = \frac{\omega_l \cdot \omega_{k,l}}{\lVert \omega_l \rVert \, \lVert \omega_{k,l} \rVert}$$

where $sim_{k,l}$ denotes the similarity between layer $l$ of user $k$'s model and layer $l$ of the global model parameters, $\omega_l$ denotes the layer-$l$ parameters of the global model, and $\omega_{k,l}$ denotes the layer-$l$ parameters of user $k$'s model;

the normalized similarity $s_{k,l}$ is calculated by the following formula:

$$s_{k,l} = \frac{e^{sim_{k,l}}}{\sum_{j=1}^{K} e^{sim_{j,l}}}$$

where $s_{k,l}$ denotes the normalized parameter similarity.
In the above scheme, the weight $\alpha_{k,l}$ assigned in step S5 is calculated by the following formula:

$$\alpha_{k,l} = \frac{s_{k,l}\, n_k}{\sum_{j=1}^{K} s_{j,l}\, n_j}$$

where $\alpha_{k,l}$ denotes the weight of user $k$ at layer $l$ in the aggregation phase, $n_k$ denotes the data volume of user $k$, and $K$ is the total number of users participating in the model aggregation.
In the above scheme, the parameter aggregation in step S6 is calculated by the following formula:

$$\omega_l^{t+1} = \sum_{k=1}^{K} \alpha_{k,l}\, \omega_{k,l}$$

where $\omega_l^{t+1}$ denotes the aggregated layer-$l$ global model parameters and $\omega_{k,l}$ denotes the layer-$l$ parameters of user $k$.
Step S6.2: after the aggregation is completed, the current round of training ends, and the next round begins by repeating the above steps.
A system for the federated learning method based on client selection and weight distribution applies the federated learning method described above and comprises a user layering module, a user selection module and an attention mechanism module;
the user layering module is used for layering the clients according to their real-time accuracy: it divides the descending accuracy list S of the clients into m layers of equal size, each layer containing N users, calculates the average accuracy $A_{l,t}$ of each layer, and uses the average accuracy of the users within a layer as the basis for adjusting the selection probability $p_{l,t}$ of each layer, so that poorly performing user layers tend to be picked under the new probability distribution;
the user selection module is used for using the data proportion $P_k$ to adjust the selection probability of each client within the selected layer, so that users with larger data volumes tend to be selected, for training the selected clients, and for uploading the local model parameters to the server after all selected clients finish training;
the attention mechanism module is used for adjusting the weights of the models uploaded to the server using an attention mechanism: it computes the attention value between each model and the server's previous-round global model, normalizes the similarities $sim_{k,l}$ of all clients after the parameters are computed, uses the normalized similarities $s_{k,l}$ to adjust the weight of each client, takes each client's share of the similarity-weighted data volume as its assigned weight $\alpha_{k,l}$, and performs layer-wise global model parameter aggregation over all users participating in the current round according to the assigned weights $\alpha_{k,l}$.
A storage medium storing a program which, when executed by a processor, implements the federated learning method based on client selection and weight distribution.
Compared with the prior art, the application has the following beneficial effects:
The application proposes an attention mechanism to redistribute the aggregation weights among the clients: according to model similarity, it reduces the weights of clients whose models differ too much from the global model, which keeps the convergence direction of the global model from deviating and makes the convergence process more stable. The application layers the clients and adjusts their selection probabilities so as to select clients that are more beneficial to the training of the current model, thereby accelerating the convergence of the global model and making the performance of the final model fairer across all clients. The application can process Non-IID data more effectively at a lower additional computation cost.
Drawings
FIG. 1 is a flow chart of a classical federal learning algorithm.
Fig. 2 is a diagram of a model architecture according to an embodiment of the present application.
FIG. 3 is a flowchart of a federal learning algorithm according to an embodiment of the present application.
Fig. 4 is an exemplary diagram of MNIST data set and CIFAR-10 data set according to an embodiment of the present application, wherein fig. 4 (a) is an exemplary diagram of MNIST data set and fig. 4 (b) is an exemplary diagram of CIFAR-10 data set.
Fig. 5 is a graph of training accuracy of MNIST data set according to an embodiment of the present application.
FIG. 6 is a chart of training results of CIFAR-10 data set according to an embodiment of the present application.
FIG. 7 is a graph of accuracy rate variation for CIFAR-10 data sets in accordance with one embodiment of the present application.
Fig. 8 is a diagram showing training results of MNIST data set according to an embodiment of the present application.
FIG. 9 is a chart of training results of CIFAR-10 data set according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
Referring to fig. 1, the flow of a classical federated learning algorithm is as follows:
(1) The central server on the server side initializes the parameters of the global model (a neural network).
(2) In each training round, a fixed proportion C of clients is randomly selected to participate in that round.
(3) The parameters are issued to the selected clients for local training, and each client trains E local epochs.
(4) After a client finishes training, it uploads its model parameters to the server; the server assigns weights according to the clients' data volumes and then performs weighted model aggregation. The above process is repeated until the target number of rounds or a stop condition is reached.
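For concreteness, the following minimal sketch illustrates one round of this classical FedAvg flow. It is an illustrative reconstruction rather than code from the application: the client objects and their `local_train`/`num_samples` members are hypothetical, and model parameters are treated as flattened numpy arrays.

```python
# Minimal sketch of one classical FedAvg round (illustrative assumptions:
# client.local_train(params, epochs) returns an updated parameter vector,
# client.num_samples is the size of the client's local dataset).
import numpy as np

def fedavg_round(global_params, clients, frac_c=0.1, local_epochs=5,
                 rng=np.random.default_rng(0)):
    m = max(1, int(frac_c * len(clients)))
    selected = rng.choice(np.array(clients, dtype=object), size=m, replace=False)
    updates, sizes = [], []
    for client in selected:
        updates.append(client.local_train(global_params, local_epochs))  # E local epochs
        sizes.append(client.num_samples)
    weights = np.array(sizes, dtype=float) / sum(sizes)  # weight by data volume
    return sum(w * u for w, u in zip(weights, updates))  # weighted aggregation
```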
the algorithm flow of the present application is shown in connection with fig. 2 and 3: the model is based on a traditional transverse federal learning framework server-client model structure, and three new modules are added: the system comprises a user layering module, a user selection module and an attention mechanism module.
The application provides a layering strategy based on user accuracy when a user selects, and the user is layered in real time, and is selected by taking a layer as a unit when the user is selected, and then the user is selected from the layers.
When the server side models are aggregated, the method introduces an attention mechanism to calculate the similarity between each client model and the global model, and adjusts weight distribution according to the similarity.
The application provides a user layering module and a user selecting module aiming at random equal probability selection, and provides a weight distribution scheme of an attention mechanism module based on an attention mechanism aiming at a weight scheme based on a data volume ratio.
The specific flow is as follows. At the start of training, the central server initializes the global model parameters and builds a list of clients ordered by client accuracy. In step (1), the central server issues the global parameters to the user selection module and then activates the user layering module. In step (2), the user layering module layers all users according to the server's ordered list, placing users with similar accuracy in the same layer, calculates the average accuracy of each layer, and adjusts the layer's selection probability according to that average. In step (3), after the layering module finishes its calculation, the user selection module selects a layer according to the probabilities assigned in the previous step, and then selects a client within that layer according to the clients' data-volume ratios. Step (3) is repeated until the number of selected clients reaches the target value, after which the model is issued to the selected clients for local training. In step (4), all selected clients upload their trained models to the server. In step (5), the attention mechanism module adjusts the weights of all models based on the attention mechanism and passes them to the central server, which aggregates all model parameters according to the new weights to obtain the final global model of the round. The above steps are repeated until a termination condition is reached. Fig. 3 shows the algorithm flow chart and fig. 2 the model architecture diagram.
Examples
A federated learning method based on client selection and weight distribution, comprising the steps of:
Step S1: layer the clients according to their real-time accuracy: divide the descending accuracy list S of the clients into m layers of equal size, each layer containing N users;
Step S2: calculate the average accuracy $A_{l,t}$ of each layer from step S1, and use the average accuracy of the users within a layer as the basis for adjusting the selection probability $p_{l,t}$ of each layer, so that poorly performing user layers tend to be picked under the new probability distribution;
Step S3: use the data proportion $P_k$ to adjust the selection probability of each client within the layer selected in step S2, so that users with larger data volumes tend to be selected;
Step S4: train the clients selected in step S3, and upload the local model parameters to the server after all selected clients finish training;
Step S5: adjust the weights of the models uploaded to the server using an attention mechanism: compute the attention value between each model and the server's previous-round global model, normalize the similarities $sim_{k,l}$ of all clients after the parameters are computed, use the normalized similarities $s_{k,l}$ to adjust the weight of each client, and take each client's share of the similarity-weighted data volume as its assigned weight $\alpha_{k,l}$;
Step S6: perform layer-wise global model parameter aggregation over all users participating in the current round according to the weights assigned in step S5.
The application proposes a layering strategy based on user accuracy for user selection: users are layered in real time, selection first picks a layer, and a user is then selected from within that layer. When the server aggregates the models, an attention mechanism is introduced to calculate the similarity between each client model and the global model and to adjust the weight distribution accordingly, which solves the problems of slow model convergence and uncontrollable convergence direction caused by random selection when client data are strongly heterogeneous.
Step S1 specifically includes the following steps:
Layering the clients according to their real-time accuracy:
Step S1.1: at the start of training, the accuracy of all clients is set to 0. In the subsequent experiments, each participating user reports its post-training local accuracy to the server in real time; the server maintains a descending list S of all clients' accuracies, and after each accuracy update performs one insertion-sort pass to keep the list in descending order.
Step S1.2: simple equidistant cutting is used to divide the descending accuracy list S into m layers of equal size, each layer containing N users.
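The layering step can be sketched as follows; this is an illustrative reconstruction, and the `acc` dictionary (client id to latest reported accuracy) is an assumed data layout rather than a structure named by the application:

```python
# Sketch of step S1: sort clients by accuracy (descending) and cut the list
# into m equal-size layers. Assumes len(acc) is divisible by m, matching the
# equidistant cut described above; any remainder would need a convention.
def build_layers(acc, m):
    ranked = sorted(acc, key=acc.get, reverse=True)  # descending accuracy list S
    n_per = len(ranked) // m                         # N users per layer
    return [ranked[i * n_per:(i + 1) * n_per] for i in range(m)]
```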
Step S2 specifically includes the following steps:
Adjusting the selection probability of each layer so that poorly performing layers tend to be selected under the new probability distribution:
Step S2.1: first calculate the average accuracy of each layer:

$$A_{l,t} = \frac{1}{N}\sum_{n=1}^{N} a_n$$

where $l$ denotes the user layer, $A_{l,t}$ the average accuracy of layer $l$ in the $t$-th global training round, $a_n$ the accuracy of each user in the layer, and $N$ the number of users in the layer.
Step S2.2: the goal of the scheme is to select clients that benefit the training of the current model; users who have not participated in training for a long time and whose accuracy is low can contribute greatly to the convergence of the current model, so the average accuracy of the users within each layer is used as the basis for adjusting the layer's probability:

$$p_{l,t} = \frac{1 - A_{l,t}}{\sum_{l'=1}^{m} (1 - A_{l',t})}$$

where $p_{l,t}$ denotes the probability of selecting layer $l$ in round $t$, which is proportional to $1 - A_{l,t}$; that is, layers with a high average accuracy are assigned a lower selection probability, so that selection tends towards poorly performing user layers.
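The layer-probability adjustment might look as follows; this sketch reuses the `build_layers` output and `acc` dictionary assumed above, and additionally assumes at least one layer has average accuracy below 1 so the distribution is well defined:

```python
import numpy as np

# Sketch of step S2: each layer's selection probability is proportional to
# 1 - A_{l,t}, so layers with lower average accuracy are picked more often.
def layer_probabilities(layers, acc):
    avg = np.array([np.mean([acc[k] for k in layer]) for layer in layers])  # A_{l,t}
    inv = 1.0 - avg
    return inv / inv.sum()  # normalize into a probability distribution p_{l,t}
```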
Step S3 specifically includes the following steps:
Adjusting the selection probability of each client within a layer:
Step S3.1: data heterogeneity is also reflected in the imbalance of client data volumes: users within the same layer have similar accuracy but clearly different data volumes, and a user with a smaller data volume contributes less to the convergence of the current model, so the selection probability of each user needs to be adjusted in favor of users with larger data volumes.
Step S3.2: the data proportion $P_k$ is used to adjust the selection probability of users within the layer:

$$P_{k,t} = \frac{n_k}{\sum_{j \in S_l} n_j}$$

where $P_{k,t}$ denotes the probability that user $k$ is selected in round $t$, equal to the ratio of the user's data volume $n_k$ to the total data volume of its layer $S_l$.
Step S4 specifically includes the following steps:
Selecting users for training and uploading model parameters:
Step S4.1: the server begins user selection and determines a target selection number C.
Step S4.2: select according to the probabilities adjusted in steps S1, S2 and S3: first select a layer according to the layer probability distribution, then select a client within that layer according to the client probability distribution.
Step S4.3: each execution of step S4.2 selects one client; repeat step S4.2 C times to complete the client selection for the current round (the two-stage loop is sketched after step S4.5).
Step S4.4: the selected clients perform local training, running batch gradient descent for the number of local epochs E specified by the server.
Step S4.5: after all selected clients finish training, the local model parameters are uploaded to the server.
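Putting steps S4.1 to S4.3 together, the two-stage selection loop might look as follows. This sketch reuses the helpers from the previous sketches; the duplicate check is an assumption, since the text does not specify how repeated draws of the same client are handled, and C is assumed not to exceed the number of clients:

```python
import numpy as np

# Sketch of steps S4.1-S4.3: draw a layer from the layer distribution, then a
# client from that layer's data-volume distribution, until C clients are chosen.
def select_clients(layers, p_layer, data_size, C, rng=np.random.default_rng(0)):
    chosen = []
    while len(chosen) < C:
        l = rng.choice(len(layers), p=p_layer)  # pick a layer (step S4.2)
        probs = within_layer_probabilities(layers[l], data_size)
        k = rng.choice(np.array(layers[l], dtype=object), p=probs)
        if k not in chosen:                     # skip duplicate draws (assumption)
            chosen.append(k)
    return chosen
```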
Step S5 specifically includes the following steps:
Adjusting the model weights using an attention mechanism:
Step S5.1: calculate the attention value between each model and the server's previous-round global model. Since the parameter lengths differ from layer to layer, the calculation is performed layer by layer over the model. The attention scoring function uses cosine similarity:

$$sim_{k,l} = \frac{\omega_l \cdot \omega_{k,l}}{\lVert \omega_l \rVert \, \lVert \omega_{k,l} \rVert}$$

where $sim_{k,l}$ denotes the similarity between layer $l$ of user $k$'s model and layer $l$ of the global model parameters, $\omega_l$ denotes the layer-$l$ parameters of the global model, and $\omega_{k,l}$ denotes the layer-$l$ parameters of user $k$'s model.
Step S5.2: after calculating the parameters, normalize the similarities of all clients to map them onto a distribution over the interval from 0 to 1, using the softmax function:

$$s_{k,l} = \frac{e^{sim_{k,l}}}{\sum_{j=1}^{K} e^{sim_{j,l}}}$$

where $s_{k,l}$ denotes the normalized parameter similarity.
Step S5.3: adjust the weight of each client using the normalized similarity from step S5.2:

$$\alpha_{k,l} = \frac{s_{k,l}\, n_k}{\sum_{j=1}^{K} s_{j,l}\, n_j}$$

where $\alpha_{k,l}$ denotes the weight of user $k$ at layer $l$ in the aggregation phase, $n_k$ denotes the data volume of user $k$, and $K$ is the total number of users participating in the model aggregation. Each user's share of the similarity-weighted total data volume is taken as its assigned weight.
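The layer-wise attention weighting of step S5 can be sketched as below, with each layer's parameters flattened into a numpy vector; the function name and data layout are illustrative assumptions, and the three sub-steps are combined into one helper:

```python
import numpy as np

# Sketch of step S5: cosine similarity between each client's layer-l parameters
# and the previous-round global layer-l parameters, softmax-normalized, then
# scaled by data volume and renormalized into aggregation weights alpha_{k,l}.
def attention_weights(global_layer, client_layers, data_sizes):
    sims = np.array([
        np.dot(global_layer, w) / (np.linalg.norm(global_layer) * np.linalg.norm(w))
        for w in client_layers                       # sim_{k,l}, step S5.1
    ])
    s = np.exp(sims) / np.exp(sims).sum()            # softmax, step S5.2
    scaled = s * np.asarray(data_sizes, dtype=float) # s_{k,l} * n_k
    return scaled / scaled.sum()                     # alpha_{k,l}, step S5.3
```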
Step S6 specifically includes the following steps:
Performing global model parameter aggregation:
Step S6.1: according to the weights assigned in step S5, perform layer-wise parameter aggregation over all users participating in the current round:

$$\omega_l^{t+1} = \sum_{k=1}^{K} \alpha_{k,l}\, \omega_{k,l}$$

where $\omega_l^{t+1}$ denotes the aggregated layer-$l$ global model parameters and $\omega_{k,l}$ denotes the layer-$l$ parameters of user $k$.
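A sketch of this layer-wise aggregation, reusing `attention_weights` from the previous sketch; each model is assumed to be a list of flattened per-layer parameter vectors:

```python
# Sketch of step S6.1: each global layer is the alpha-weighted sum of the
# corresponding client layers.
def aggregate(client_models, global_model, data_sizes):
    new_global = []
    for l, global_layer in enumerate(global_model):
        layer_params = [model[l] for model in client_models]  # omega_{k,l}
        alpha = attention_weights(global_layer, layer_params, data_sizes)
        new_global.append(sum(a * w for a, w in zip(alpha, layer_params)))
    return new_global
```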
Step S6.2: after the aggregation is completed, the current round of training ends, and the next round begins by repeating the above steps.
In order to verify the effectiveness of the present application in processing heterogeneous data, the following experiments were performed.
The experiments were conducted on two image datasets under different Non-IID data-heterogeneity strategies.
Two baseline models were used: the classical federated learning algorithm FedAvg and the federated learning optimization algorithm FedProx.
Datasets: MNIST, CIFAR-10.
The MNIST dataset contains a large number of handwritten digit pictures and is widely used in benchmark experiments for various algorithms in the machine learning field. In one embodiment of the application, the MNIST dataset contains 60000 handwritten sample pictures, of which 50000 are assigned to the training set and 10000 to the test set. Each picture in the MNIST dataset is a grayscale image of 28x28 pixels with a single channel and 256 gray levels, depicting the Arabic numerals 0 to 9. The CIFAR-10 dataset is an image dataset comprising 60000 RGB color images; CIFAR-10 is commonly used to train machine learning and computer vision algorithms, and its moderate size makes it one of the most widely used image recognition benchmarks. 50000 of the pictures in the CIFAR-10 dataset form the training set and the other 10000 the test set. Each image in the CIFAR-10 dataset is 32x32 pixels with three RGB channels. Examples of the two datasets are shown in fig. 4, where fig. 4 (a) shows the MNIST dataset and fig. 4 (b) the CIFAR-10 dataset.
Baseline models: FedAvg, FedProx.
Heterogeneous data partitioning strategies:
(1) IID distribution: the data are randomly shuffled and distributed equally to all clients. This is also the data distribution strategy used in conventional machine learning.
(2) Label distribution skew: under this allocation scheme, no user can obtain data for all labels. For example, the MNIST dataset has 10 labels, 0 to 9, and each user is assigned only a fixed number of labels; concretely, one user's dataset may contain only the digits 1, 2 and 3, while another user's contains only the handwriting data for 4, 5 and 6.
(3) Quantity skew: under this allocation scheme, data volumes may differ greatly between users, which is also a common Non-IID data distribution. A Dirichlet distribution is used to sample the per-class data volume assigned to each client.
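The quantity-skew partition can be sketched as follows. The exact Dirichlet parameterization used in the experiments is not given, so the per-class sharing scheme and the concentration parameter `alpha` below are illustrative assumptions:

```python
import numpy as np

# Sketch of a Dirichlet quantity-skew partition: for each class, sample client
# shares from Dirichlet(alpha, ..., alpha) and split that class's sample
# indices among the clients accordingly.
def dirichlet_partition(labels, n_clients, alpha=0.5, rng=np.random.default_rng(0)):
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        shares = rng.dirichlet([alpha] * n_clients)          # per-client share of class c
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for i, part in enumerate(np.split(idx, cuts)):
            client_idx[i].extend(part.tolist())
    return client_idx
```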
Experimental settings: the specific parameter settings are shown in Table 1.
Table 1: Experimental parameter settings
Experimental results:
for convenience of description, IID distribution, label skew, and data volume skew among the three data distribution schemes formulated above are abbreviated as IID, LS, and QS. The model of the application is abbreviated as FedAM.
Test 1: and (5) analyzing the overall performance of the model.
The results of the accuracy test of the FedAM model and the other two baseline models in two image datasets and three different data distributions are given in Table 2. For more visual comparison of the data, results are shown in fig. 5 and 6.
Table 2: Accuracy of overall model training
As the data in table 2 show, when tested on IID-distributed data from either dataset, the FedAM model and the two baseline models differ little in final performance, with the three models performing essentially identically. This is related to the nature of the FedProx and FedAM models: both are optimizations of how FedAvg processes Non-IID data, whereas FedAvg itself performs very well on IID data. For the two Non-IID distribution schemes, no significant difference was found among the three models under the QS (quantity skew) distribution, whereas under LS (label skew) the FedAM model presented herein performed best, with a clear advantage over the other models: its accuracy is up to 0.9% higher than the baselines on the MNIST dataset, as shown in fig. 5, and 6.7% higher on the more complex CIFAR-10 dataset, as shown in fig. 6. Compared with its performance on IID data, FedAM suffers the smallest performance drop of the three models when processing Non-IID data: only 0.3% on MNIST and only 2.8% accuracy loss on the CIFAR-10 dataset, versus about 10% for the other models. This shows that the FedAM model is robust and converges better when processing Non-IID data.
Experiment 2: performance analysis of the attention-mechanism aggregation algorithm.
Because the MNIST training task is relatively simple and the gaps between the models are small, the CIFAR-10 dataset, on which training results are poorer, was selected as the training data for experiment 2.
As shown in fig. 7, this experiment examines convergence under the QS data distribution, under which the three models show little difference in final accuracy. Judging by the oscillation of the accuracy curves during training, FedAvg is very unstable when processing Non-IID data, whereas the attention mechanism module in the FedAM model automatically adjusts the convergence direction of the model, keeping it stable throughout the training process. This also proves that the attention mechanism module is effective in controlling the model's convergence direction.
Experiment 3: performance analysis of the layered client selection scheme.
Table 3: Number of global rounds needed by each model to reach a given accuracy
This experiment investigates whether the layered client selection scheme can accelerate convergence by selecting valuable clients. The evaluation criterion is the number of global iterations each of the three models needs to reach a given target accuracy; to account for the oscillation of model convergence, a model must reach the accuracy requirement three times cumulatively before the result is recorded as final. The target accuracies and experimental results are shown in table 3; for ease of presentation, the table data are plotted as the bar graphs in fig. 8 and 9.
As the data in fig. 8 and 9 show, all three models converge quickly to the target accuracy under IID data on either dataset. Under Non-IID data, the layering scheme in FedAM performs excellently: in particular, when processing the LS partition of CIFAR-10, the FedAM model reaches the target in only 87 rounds, faster than the second-place FedAvg model, which needs 97 rounds. These data prove that the layering scheme can actively select users with larger potential contributions. Each user holds data for only a few labels, and under the layering scheme, users whose labels have been trained less receive a higher selection probability; this sampling scheme greatly improves the convergence speed of the model in the early stage, so the target accuracy is reached in fewer training rounds. FedAvg and FedProx, by contrast, cannot identify these users, resulting in slower convergence.
In summary, the application proposes an attention mechanism to redistribute the aggregation weights among the clients: according to model similarity, it reduces the weights of clients whose models differ too much from the global model, which keeps the convergence direction of the global model from deviating and makes the convergence process more stable. The application layers the clients and adjusts their selection probabilities so as to select clients that are more beneficial to the training of the current model, thereby accelerating the convergence of the global model and making the performance of the final model fairer across all clients. The application can process Non-IID data more effectively at a lower additional computation cost.
In another embodiment of the present application, a system for the federated learning method based on client selection and weight distribution is further provided, applying the federated learning method based on client selection and weight distribution described above, and comprising a user layering module, a user selection module and an attention mechanism module;
the user layering module is used for layering the clients according to their real-time accuracy: it divides the descending accuracy list S of the clients into m layers of equal size, each layer containing N users, calculates the average accuracy $A_{l,t}$ of each layer, and uses the average accuracy of the users within a layer as the basis for adjusting the selection probability $p_{l,t}$ of each layer, so that poorly performing user layers tend to be picked under the new probability distribution;
the user selection module is used for using the data proportion $P_k$ to adjust the selection probability of each client within the selected layer, so that users with larger data volumes tend to be selected, for training the selected clients, and for uploading the local model parameters to the server after all selected clients finish training;
the attention mechanism module is used for adjusting the weights of the models uploaded to the server using an attention mechanism: it computes the attention value between each model and the server's previous-round global model, normalizes the similarities $sim_{k,l}$ of all clients after the parameters are computed, uses the normalized similarities $s_{k,l}$ to adjust the weight of each client, takes each client's share of the similarity-weighted data volume as its assigned weight $\alpha_{k,l}$, and performs layer-wise global model parameter aggregation over all users participating in the current round according to the assigned weights $\alpha_{k,l}$.
It should be noted that the system provided in the above embodiment is only illustrated by the division of the functional modules described; in practical applications, these functions may be allocated to different functional modules as needed, that is, the internal structure may be divided into different functional modules to perform all or part of the functions described above. The system applies the federated learning method based on client selection and weight distribution of the foregoing embodiment.
In another embodiment of the present application, a storage medium is further provided, storing a program which, when executed by a processor, implements the federated learning method based on client selection and weight distribution, specifically:
Step S1: layer the clients according to their real-time accuracy: divide the descending accuracy list S of the clients into m layers of equal size, each layer containing N users;
Step S2: calculate the average accuracy $A_{l,t}$ of each layer from step S1, and use the average accuracy of the users within a layer as the basis for adjusting the selection probability $p_{l,t}$ of each layer, so that poorly performing user layers tend to be picked under the new probability distribution;
Step S3: use the data proportion $P_k$ to adjust the selection probability of each client within the layer selected in step S2, so that users with larger data volumes tend to be selected;
Step S4: train the clients selected in step S3, and upload the local model parameters to the server after all selected clients finish training;
Step S5: adjust the weights of the models uploaded to the server using an attention mechanism: compute the attention value between each model and the server's previous-round global model, normalize the similarities $sim_{k,l}$ of all clients after the parameters are computed, use the normalized similarities $s_{k,l}$ to adjust the weight of each client, and take each client's share of the similarity-weighted data volume as its assigned weight $\alpha_{k,l}$;
Step S6: perform layer-wise global model parameter aggregation over all users participating in the current round according to the weights assigned in step S5; after the aggregation is completed, the current round of training ends, and the next round begins by repeating the above steps.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
It should be understood that although the present description is organized by embodiments, not every embodiment contains only one independent technical solution; this manner of description is adopted for clarity only. Those skilled in the art should treat the description as a whole; the technical solutions in the various embodiments may be combined appropriately to form other embodiments understandable to those skilled in the art.
The detailed descriptions listed above are only specific illustrations of feasible embodiments of the present application and are not intended to limit its scope of protection; all equivalent embodiments or modifications that do not depart from the spirit of the present application shall fall within its scope of protection.

Claims (10)

1. A federated learning method based on client selection and weight distribution, comprising the steps of:
Step S1: layering the clients according to their real-time accuracy: dividing the descending accuracy list S of the clients into m layers of equal size, each layer containing N users;
Step S2: calculating the average accuracy $A_{l,t}$ of each layer from step S1, and using the average accuracy of the users within a layer as the basis for adjusting the selection probability $p_{l,t}$ of each layer, so that poorly performing user layers tend to be picked under the new probability distribution;
Step S3: using the data proportion $P_k$ to adjust the selection probability of each client within the layer selected in step S2, so that users with larger data volumes tend to be selected;
Step S4: training the clients selected in step S3, and uploading the local model parameters to the server after all selected clients finish training;
Step S5: adjusting the weights of the models uploaded to the server using an attention mechanism: computing the attention value between each model and the server's previous-round global model, normalizing the similarities $sim_{k,l}$ of all clients after the parameters are computed, using the normalized similarities $s_{k,l}$ to adjust the weight of each client, and taking each client's share of the similarity-weighted data volume as its assigned weight $\alpha_{k,l}$;
Step S6: performing layer-wise global model parameter aggregation over all users participating in the current round according to the weights assigned in step S5.
2. The federated learning method based on client selection and weight distribution according to claim 1, wherein the average accuracy $A_{l,t}$ of each layer in step S2 is calculated by the following formula:

$$A_{l,t} = \frac{1}{N}\sum_{n=1}^{N} a_n$$

where $l$ denotes the user layer, $A_{l,t}$ the average accuracy of layer $l$ in the $t$-th global training round, $a_n$ the accuracy of each user in the layer, and $N$ the number of users in the layer.
3. The federated learning method based on client selection and weight distribution according to claim 1, wherein the selection probability $p_{l,t}$ of each layer in step S2 is calculated by the following formula:

$$p_{l,t} = \frac{1 - A_{l,t}}{\sum_{l'=1}^{m} (1 - A_{l',t})}$$

where $p_{l,t}$ denotes the probability of selecting layer $l$ in round $t$, $l$ denotes the user layer, and $A_{l,t}$ the average accuracy of layer $l$ in the $t$-th global training round.
4. The federated learning method based on client selection and weight distribution according to claim 1, wherein the data proportion $P_k$ is used in step S3 to adjust the selection probability of users within a layer, calculated as follows:

$$P_{k,t} = \frac{n_k}{\sum_{j \in S_l} n_j}$$

where $P_{k,t}$ denotes the probability that user $k$ is selected in round $t$, equal to the ratio of the user's data volume to the total data volume of its layer $S_l$, and $n_k$ denotes the data volume of user $k$.
5. The federated learning method based on client selection and weight distribution according to claim 1, wherein step S4 specifically comprises the steps of:
Step S4.1: the server begins user selection and determines a target selection number C;
Step S4.2: selecting according to the probabilities adjusted in steps S1, S2 and S3: first selecting a layer according to the layer probability distribution, then selecting a client within that layer according to the client probability distribution;
Step S4.3: each execution of step S4.2 selects one client; repeating step S4.2 C times completes the client selection for the current round;
Step S4.4: the selected clients perform local training, running batch gradient descent for the number of local epochs E specified by the server;
Step S4.5: uploading the local model parameters to the server after all selected clients finish training.
6. The federated learning method based on client selection and weight distribution according to claim 1, wherein $sim_{k,l}$ in step S5 is calculated by the following formula:

$$sim_{k,l} = \frac{\omega_l \cdot \omega_{k,l}}{\lVert \omega_l \rVert \, \lVert \omega_{k,l} \rVert}$$

where $sim_{k,l}$ denotes the similarity between layer $l$ of user $k$'s model and layer $l$ of the global model parameters, $\omega_l$ denotes the layer-$l$ parameters of the global model, and $\omega_{k,l}$ denotes the layer-$l$ parameters of user $k$'s model;

the normalized similarity $s_{k,l}$ is calculated by the following formula:

$$s_{k,l} = \frac{e^{sim_{k,l}}}{\sum_{j=1}^{K} e^{sim_{j,l}}}$$

where $s_{k,l}$ denotes the normalized parameter similarity.
7. The federated learning method based on client selection and weight distribution according to claim 1, wherein the weight $\alpha_{k,l}$ assigned in step S5 is calculated by the following formula:

$$\alpha_{k,l} = \frac{s_{k,l}\, n_k}{\sum_{j=1}^{K} s_{j,l}\, n_j}$$

where $\alpha_{k,l}$ denotes the weight of user $k$ at layer $l$ in the aggregation phase, $n_k$ denotes the data volume of user $k$, and $K$ is the total number of users participating in the model aggregation.
8. The federated learning method based on client selection and weight distribution according to claim 1, wherein the parameter aggregation in step S6 is calculated by the following formula:

$$\omega_l^{t+1} = \sum_{k=1}^{K} \alpha_{k,l}\, \omega_{k,l}$$

where $\omega_l^{t+1}$ denotes the aggregated layer-$l$ global model parameters and $\omega_{k,l}$ denotes the layer-$l$ parameters of user $k$; and wherein, after the aggregation is completed, the current round of training ends and the next round begins by repeating the above steps.
9. A system for the federated learning method based on client selection and weight distribution, characterized in that it applies the federated learning method based on client selection and weight distribution according to any one of claims 1-8 and comprises a user layering module, a user selection module and an attention mechanism module;
the user layering module is used for layering the clients according to their real-time accuracy: it divides the descending accuracy list S of the clients into m layers of equal size, each layer containing N users, calculates the average accuracy $A_{l,t}$ of each layer, and uses the average accuracy of the users within a layer as the basis for adjusting the selection probability $p_{l,t}$ of each layer, so that poorly performing user layers tend to be picked under the new probability distribution;
the user selection module is used for using the data proportion $P_k$ to adjust the selection probability of each client within the selected layer, so that users with larger data volumes tend to be selected, for training the selected clients, and for uploading the local model parameters to the server after all selected clients finish training;
the attention mechanism module is used for adjusting the weights of the models uploaded to the server using an attention mechanism: it computes the attention value between each model and the server's previous-round global model, normalizes the similarities $sim_{k,l}$ of all clients after the parameters are computed, uses the normalized similarities $s_{k,l}$ to adjust the weight of each client, takes each client's share of the similarity-weighted data volume as its assigned weight $\alpha_{k,l}$, and performs layer-wise global model parameter aggregation over all users participating in the current round according to the assigned weights $\alpha_{k,l}$.
10. A storage medium storing a program, characterized in that the program, when executed by a processor, implements the federated learning method based on client selection and weight distribution according to any one of claims 1-8.
CN202310236329.7A 2023-03-13 2023-03-13 Federated learning method, system and medium based on client selection and weight distribution Pending CN116776948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310236329.7A CN116776948A (en) 2023-03-13 2023-03-13 Federated learning method, system and medium based on client selection and weight distribution


Publications (1)

Publication Number Publication Date
CN116776948A true CN116776948A (en) 2023-09-19

Family

ID=87988410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310236329.7A Pending CN116776948A (en) 2023-03-13 2023-03-13 Federal learning method, system and medium based on customer selection and weight distribution

Country Status (1)

Country Link
CN (1) CN116776948A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117521783A (en) * 2023-11-23 2024-02-06 北京天融信网络安全技术有限公司 Federal machine learning method, apparatus, storage medium and processor



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination