CN115906153A - Federated learning optimization method, device and system in a sample imbalance scenario - Google Patents

Federated learning optimization method, device and system in a sample imbalance scenario

Info

Publication number
CN115906153A
Authority
CN
China
Prior art keywords: model, local, sample, samples, participant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211357345.3A
Other languages
Chinese (zh)
Inventor
肖文杰
汤学海
董扬琛
赵序光
冯远航
张潇丹
韩冀中
虎嵩林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202211357345.3A
Publication of CN115906153A
Legal status: Pending


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a federated learning optimization method, device and system for a sample-imbalance scenario. The method comprises the following steps: acquiring the number of samples in the local training sample set of each participant; generating an initial business model; performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model; and distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, while participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as their final model. The invention better achieves secure data sharing and efficient collaborative modeling among participants in a sample-imbalance scenario.

Description

Federated learning optimization method, device and system in a sample imbalance scenario
Technical Field
The invention relates to the field of privacy-preserving computation, and in particular to a federated learning optimization method, device and system for a sample-imbalance scenario.
Background
With the success of AlphaGo, deep learning has shown great value in industry and daily life, in fields such as recommendation systems, face recognition, and situation awareness. However, most enterprises suffer from small data volumes and poor data quality, which greatly limits the wide application of deep learning. Cross-domain, cross-industry, and cross-region data sharing between organizations is therefore an important driver of resource optimization and improved productivity. In industrial practice, enterprise business data contains enormous business value, personal privacy, and other sensitive elements. At the same time, regulators at home and abroad are strengthening data protection and continually issuing policies that restrict insecure data sharing, such as China's Data Security Law and the European Union's General Data Protection Regulation (GDPR). Breaking down the "data-island barrier", establishing a security mechanism for data sharing, and balancing the industrial deployment of deep learning against data privacy protection has therefore become a major challenge for governments and enterprises.
To address data "islanding" and privacy concerns, federated learning has emerged. Federated learning is a secure distributed machine learning framework proposed by McMahan et al. in 2017, which trains a global model by sharing local model parameters while keeping the raw data available but invisible. In 2019, Google deployed the first production-grade federated learning system: running on mobile phones, it keeps potentially private information generated while users type on the device, and builds an input-prediction model by sharing only the model gradients of the local devices. With the growing demand for intelligent applications and large-scale data sharing among enterprises, hospitals and other organizations, Professor Qiang Yang proposed in 2019 the concept of cross-organization federated learning, including horizontal federated learning, vertical federated learning, and federated transfer learning. In the same year, FATE, the first industrial-grade open-source federated learning framework, developed by the WeBank AI team, provided high-performance secure-computation support for machine learning, deep learning, and transfer learning algorithms, effectively addressing cross-organization AI collaboration while protecting data privacy.
Both the federated learning framework proposed by Google and the one implemented by WeBank assume that the amount and distribution of training samples among the federated participants are balanced. In reality, however, the types of business data and the ways they are collected differ between enterprises, so the sample distributions differ as well, which degrades model accuracy. Moreover, since manual data annotation is costly for some enterprises, when enterprises with small amounts of data participate in federated learning a sample-imbalance scenario arises; this scenario further aggravates the drop in the accuracy of the federated model and at the same time reduces the communication efficiency between each enterprise and the central server. Specifically:
1. Decline of federated learning model accuracy under sample imbalance
The federated averaging (FedAvg) algorithm is the most commonly used federated learning training algorithm. Its general flow is: each participant in the federated network trains a local model on its local sample set using stochastic gradient descent and uploads the model parameters to a trusted third-party server; the server receives the model parameters of all clients, aggregates them by weighted averaging to obtain a new global model, and sends the global model back to the participants; this is iterated until convergence. The objective function of the federated averaging algorithm is shown in Equation 1, where F(w) is the global loss function, F_k(w) is the loss function of the local model of participant k, n_k is the number of samples of participant k, and n is the total number of samples. Equation 1 shows that the contribution of a local model to the global model is directly proportional to its local sample count n_k. If the samples are extremely unbalanced, the global model is biased during training: the sample distribution of participants with many samples dominates the parties with few samples, and model accuracy drops sharply.
F(w) = Σ_k (n_k / n) · F_k(w)    (Equation 1)
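For illustration only, a minimal sketch of the weighted aggregation in Equation 1, assuming each participant uploads its local parameters together with its sample count n_k (the function and variable names are illustrative, not part of the patent):

```python
def fedavg_aggregate(local_params, sample_counts):
    """Weighted average of local model parameters (Equation 1).

    local_params : list of dicts mapping layer name -> numpy array of weights
    sample_counts: list of n_k, the sample count of each participant
    """
    total = float(sum(sample_counts))
    global_params = {}
    for name in local_params[0]:
        # sum_k (n_k / n) * w_k for this layer
        global_params[name] = sum(
            (n_k / total) * params[name]
            for params, n_k in zip(local_params, sample_counts)
        )
    return global_params
```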
2. Decline of federated learning communication efficiency under sample imbalance
In the sample-imbalance scenario, Equation 1 shows that the federated averaging algorithm ignores the inconsistency of sample distributions among real-world enterprises. The paper published by Li et al. tested the effect of different sample distributions on the communication efficiency of federated learning and showed that inconsistent sample distributions slow down convergence. The sample-imbalance scenario aggravates this problem, greatly increasing the number of communication rounds between each enterprise and the third-party server, i.e., reducing communication efficiency.
In summary, solving the problems of reduced model accuracy and reduced communication efficiency under sample imbalance, providing a federated learning optimization method for such scenarios, designing a high-accuracy and high-efficiency federated learning framework, and achieving secure data sharing and friendly cross-domain cooperation has become the technical problem to be solved.
Disclosure of Invention
Aiming at the loss of federated model accuracy and the reduction of communication efficiency in sample-imbalance scenarios, the invention provides a federated learning optimization method, device and system for such scenarios, better achieving secure data sharing and efficient collaborative modeling among participants under sample imbalance.
The technical scheme of the invention comprises the following steps:
A federated learning optimization method in a sample-imbalance scenario, applied to a third-party trusted parameter server, the method comprising the following steps:
acquiring the number of samples in the local training sample set of each participant;
generating an initial business model;
performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model;
distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, and participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as their final model.
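As a sketch of the server-side flow described above; the helper objects (server, participants), their methods and the threshold value are assumptions introduced for illustration, not definitions from the patent:

```python
def run_federated_optimization(server, participants, threshold, max_rounds=100):
    """Illustrative server-side orchestration of the optimization method."""
    sample_counts = {p.id: p.report_sample_count() for p in participants}  # step 1: sample counts
    global_model = server.build_initial_business_model()                   # step 2: initial model

    for _ in range(max_rounds):                                            # step 3: joint training
        local_models = [p.train_locally(global_model) for p in participants]
        global_model = server.aggregate(local_models, list(sample_counts.values()))
        if server.has_converged(global_model):
            break

    for p in participants:                                                 # step 4: distribution
        if sample_counts[p.id] >= threshold:
            p.final_model = global_model          # large-sample party keeps the global model
        else:
            p.final_model = p.retrain_with_active_learning(global_model)   # small-sample party refines locally
    return global_model
```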
Further, performing joint learning training on the initial business model based on each participant's local training sample set to obtain a global model comprises:
distributing the initial business model to each participant, so that each participant trains it on its local training sample set with stochastic gradient descent, obtaining and returning an initial local model T_{α,0}, where α is the participant index;
performing weighted aggregation of the local models T_{α,β-1} to obtain the global model V_β, where β is the training round of joint learning;
distributing the global model V_β to the participants and generating the global model V_{β+1} from the returned local models T_{α,β}, where the local model T_{α,β} is obtained by the participant iteratively optimizing the global model V_β;
if the global model V_{β+1} has not converged, setting β = β + 1 and returning to the step of weighted aggregation of the local models T_{α,β-1} to obtain the global model V_β;
if the global model V_{β+1} has converged, outputting the global model V_{β+1}.
Further, the local model T_{α,β} being obtained by the participant iteratively optimizing the global model V_β comprises:
using the global model V_β to perform predictive classification of the samples in the local training sample set, and updating the weights of the samples in the local training sample set based on the classification results to obtain a weighted sample set;
iteratively training the global model V_β on the weighted sample set to obtain a local model T'_{α,β};
compressing the local model T'_{α,β} to obtain the local model T_{α,β}.
Further, using the global model V_β to perform predictive classification of the samples in the local training sample set and updating the weights of the samples based on the classification results to obtain a weighted sample set comprises:
calculating the error rate ε_t of the global model V_β, where t is the current sample-weight update round;
if the participant's sample count is less than the threshold, updating the weight a_i^{T,t} of each sample x_i^T in its local training sample set based on the error rate ε_t to obtain the updated weight a_i^{T,t+1}, where T denotes a participant whose sample count is less than the threshold, x_i^T denotes the i-th sample in the local training sample set of participant T, y_i^T denotes the true label of sample x_i^T, and Q_t(x_i^T) denotes the prediction for sample x_i^T in round t;
if the participant's sample count is not less than the threshold, updating the weight a_i^{S,t} of each sample x_i^S in its local training sample set based on the total number K of sample-weight update rounds to obtain the updated weight a_i^{S,t+1}, where S denotes a participant whose sample count is not less than the threshold and x_i^S denotes the i-th sample in the local training sample set of participant S;
repeating the above until the error rate ε_t of the global model V_β is below a set value, which yields the weighted sample set.
Further, compressing the local model T'_{α,β} to obtain the local model T_{α,β} comprises:
using network pruning to prune the connections with smaller weights in the local model T'_{α,β} and training the pruned model;
generating a codebook from the trained pruned model and quantizing the weights step by step to low-precision int8 according to the codebook;
fine-tuning the quantized model to recover accuracy, obtaining the local model T_{α,β}.
Further, the participants whose sample count is less than the threshold iteratively optimizing the global model based on local data and using the trained model as the final model comprises:
sampling the unlabeled samples in the local data with an uncertainty strategy and a diversity strategy to construct a new sample subset;
labeling the samples in the new sample subset to generate a new training sample set;
merging the new training sample set with the local training sample set to obtain a retraining sample set;
optimizing the global model V_d on the retraining sample set to obtain the global model V_{d+1}, where the global model V_0 is the distributed global model and d is the iteration round of training the global model;
if the global model V_{d+1} has converged, taking the global model V_{d+1} as the final model;
if the global model V_{d+1} has not converged, setting d = d + 1 and returning to the step of sampling the unlabeled samples in the local data with the uncertainty strategy and the diversity strategy to construct a new sample subset.
Further, sampling the unlabeled samples in the local data with an uncertainty strategy and a diversity strategy to construct a new sample subset comprises:
computing the maximum entropy U(x_{i'}) of each unlabeled sample x_{i'}, where i' is the index of the unlabeled sample and j is the class of unlabeled sample x_{i'};
sorting the unlabeled samples in descending order of maximum entropy U(x_{i'}) to construct an ordered unlabeled sample set;
measuring the similarity between unlabeled samples x_{i'} in the unlabeled sample set with the Euclidean distance, and deleting one of any two unlabeled samples whose similarity exceeds a set threshold;
selecting the first several unlabeled samples x_{i'} of the processed unlabeled sample set to form the new sample subset.
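A minimal sketch of the two sampling strategies above, using the standard entropy of the predicted class distribution as the uncertainty score and pairwise Euclidean distance as the diversity filter; the exact formulas in the patent are given only as images, so this is an assumed instantiation with illustrative names:

```python
import numpy as np

def select_samples_to_label(probs, features, k, sim_threshold):
    """probs:    (N, Y) predicted class probabilities of the unlabeled samples
    features: (N, D) feature vectors used to measure diversity
    k:        number of samples to keep for expert labeling
    """
    # Uncertainty: entropy of the predicted class distribution, sorted descending
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    order = np.argsort(-entropy)

    # Diversity: drop a candidate whose Euclidean distance to an already kept
    # sample is below sim_threshold (i.e., the two samples are too similar)
    kept = []
    for idx in order:
        dists = [np.linalg.norm(features[idx] - features[j]) for j in kept]
        if all(d > sim_threshold for d in dists):
            kept.append(idx)
        if len(kept) == k:
            break
    return kept
```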
An apparatus for federated learning optimization in a sample-imbalance scenario, the apparatus comprising:
a quantity acquisition module, used for acquiring the number of samples in the local training sample set of each participant;
a model generation module, used for generating an initial business model;
a federated training module, used for performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model;
and a model distribution module, used for distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, and participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as the final model.
A federated learning optimization system for a sample-imbalance scenario, the system comprising:
a third-party trusted parameter server, used for acquiring the number of samples in the local training sample set of each participant; generating an initial business model; performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model; and distributing the global model to the participants;
and at least two participants, used for performing joint learning training on the initial business model based on their local training sample sets; using the global model as the final model when the number of samples in the local training sample set is not less than a threshold; and iteratively optimizing the global model based on local data to obtain the final model when the number of samples in the local training sample set is less than the threshold.
A storage medium having a computer program stored therein, wherein the computer program is configured to execute the above federated learning optimization method for a sample-imbalance scenario when run.
Compared with the prior art, the invention has the following positive effects:
1) A dynamic-weight-based federated model accuracy optimization method is proposed, which compensates for the differing contributions of participants to the global model in a sample-imbalance scenario and improves the accuracy of the federated model;
2) A federated learning communication optimization method based on combined model compression is proposed, which reduces the communication bandwidth consumed by local models in a sample-imbalance scenario and improves the communication efficiency of federated learning;
3) A novel federated learning framework for the sample-imbalance scenario is designed, realizing an efficient and accurate federated learning process under sample imbalance.
Drawings
Fig. 1 is an overall architecture diagram of the novel federated learning framework for a sample-imbalance scenario according to an embodiment of the invention.
Fig. 2 is a structural diagram of the dynamic-weight-based federated model accuracy optimization method according to an embodiment of the invention.
Fig. 3 is a structural diagram of the federated learning communication optimization method based on combined model compression according to an embodiment of the invention.
Detailed Description
The principles and features of the invention are described below with reference to the drawings; the examples are provided only to illustrate the invention and are not to be construed as limiting its scope.
The federated learning optimization method for the sample-imbalance scenario of the invention comprises a dynamic-weight-based federated model accuracy optimization method and a federated learning communication optimization method based on combined model compression, providing model accuracy optimization and improved communication efficiency. The federated learning optimization method comprises the following steps:
step 1: initial model distribution: and a third-party credible parameter server in the federated network distributes the business initial model to each participant, and the participants perform local training after receiving the initial model.
The participants select a trusted third party as a model aggregation parameter server, and then jointly design an initial network model on the third party server. The trusted third party server distributes the initial model to the federal network participants.
Step 2: initial training of the local model. After receiving the initial business model, each participant trains it on its local data with stochastic gradient descent, obtaining a local model after several rounds of training. The participants then send their local models to the third-party trusted parameter server.
After receiving the initial network model sent by the server, each participant iteratively trains the model on its local training samples using stochastic gradient descent and sends the trained local model to the third-party parameter server, which performs weighted aggregation of the received local models using Equation 1 from the Background section to obtain a global model.
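A minimal sketch of one round of local training as described above, assuming a PyTorch model and data loader (both of which are hypothetical here; the patent does not prescribe a framework):

```python
import torch

def local_sgd_training(model, data_loader, epochs=5, lr=0.01):
    """Iteratively train the received global/initial model on local samples with SGD."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)   # local loss F_k(w)
            loss.backward()
            optimizer.step()
    return model.state_dict()             # parameters sent to the parameter server
```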
Step 3: secure aggregation by the third party. After receiving the local models of all the participants, the third-party trusted parameter server performs weighted aggregation to obtain a global model and then distributes it to the participants.
Step 4: weighted training of the local model. After receiving the global model, each participant first classifies its own sample set with the global model and assigns weights to the samples according to the classification results. The participant then iteratively trains the model on the weighted sample set.
In one example, after receiving the globally aggregated model for the first time, each participant first performs predictive classification of its own samples with the global model and computes the error rate of the model according to Equation 2. If a sample of a participant with a small amount of data is misclassified, its weight is increased according to Equation 3 so that the sample receives more attention during local model training. If a sample of a participant with a large amount of data is misclassified, it is regarded as belonging to a distribution different from that of the participants with little data, and its weight is reduced according to Equation 4. In Equations 2 to 4, T denotes a participant with a small amount of data, S denotes a participant with a large amount of data, y denotes the true label, a denotes the sample weight (with its weighted, updated form used in the update rules), Q denotes the model's predicted value for a sample, ε denotes the error rate of the current model, t is the current sample-weight update round, K is the total number of training rounds, and n is the current training round.
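The exact forms of Equations 2-4 are given only as images in the original publication; for illustration, the sketch below assumes a simple multiplicative re-weighting that reproduces the behaviour described above (boosting misclassified samples at the small-data participant T and shrinking them at the large-data participant S). The factors and names are assumptions, not the patent's formulas:

```python
import numpy as np

def update_sample_weights(weights, y_true, y_pred, is_small_party,
                          error_rate, total_rounds_K):
    """Assumed multiplicative re-weighting; not the patent's exact Equations 3-4."""
    misclassified = (y_true != y_pred)
    new_w = weights.copy()
    if is_small_party:
        # participant T: boost misclassified samples, more strongly when the error rate is low
        beta_t = max(error_rate, 1e-6) / max(1.0 - error_rate, 1e-6)
        new_w[misclassified] /= beta_t
    else:
        # participant S: shrink misclassified samples (treated as distribution mismatch)
        beta_s = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(weights)) / total_rounds_K))
        new_w[misclassified] *= beta_s
    return new_w / new_w.sum()   # renormalize so the weights form a distribution
```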
Step 5: deep compression of the local model. The participants deeply compress the updated local models with network pruning and weight quantization, reducing the model size and improving communication efficiency. Finally, each participant sends its compressed local model to the third-party parameter server.
In one example, the participant iteratively trains the local model on the weighted sample set. Network pruning is then used to prune the network connections with small weights in the neural-network structure, and the pruned network is retrained. Weight quantization then compresses the pruned model further: a codebook is generated first, the weights are quantized step by step to low-precision int8 according to the codebook, and the quantized network is fine-tuned to restore model accuracy. Finally, the deeply compressed model is sent to the third-party central parameter server, optimizing the efficiency of model transmission.
Step 6: steps 3, 4 and 5 are repeated until the aggregated global model converges, and the converged global model is distributed to the participants.
Step 7: the participants with many samples directly use the received global model as their trained model.
Step 8: after receiving the global model, the participants with few samples sample their unlabeled data using an uncertainty strategy and a diversity strategy, selecting samples worth labeling to form a new sample subset.
Active-learning resampling comprises an uncertainty strategy and a diversity strategy. A participant with few samples uses maximum entropy as the uncertainty sampling strategy to perform importance sampling of the unlabeled samples, which are then given to an expert for labeling. Using the pre-trained model, the participant computes the maximum entropy of each unlabeled sample according to Equation 5 and sorts all results; the larger a sample's entropy, the more it is worth labeling. The participant then uses the Euclidean distance as the diversity strategy to measure the similarity between different samples: for the sorted sample set, the Euclidean distance between two samples is computed according to Equation 6; the smaller the distance, the more similar the two samples, and only one of them is selected for labeling, whereas the larger the distance, the less similar they are, and both need to be labeled. In Equations 5 and 6, U(x_i) denotes the uncertainty of sample x_i in the unlabeled data set, i is the sample index, j is the class of the sample, p is the current model, d_i(j, l) is the Euclidean distance, and Y is the number of sample classes.
Step 9: pre-model retraining. The new sample subset is labeled with the help of labeling experts to obtain a new training sample set, which is merged with the original training samples to form a retraining sample set. Finally, the participant iteratively trains the global model on the retraining sample set.
A participant with few samples selects samples worth labeling from its unlabeled sample set with the uncertainty and diversity strategies, labels them based on expert experience, adds the labeled samples to the training set, and retrains the pre-trained model. This is repeated for several rounds until convergence, yielding a shared model with higher accuracy.
Step 10: steps 8 and 9 are repeated until the model converges, yielding a high-accuracy federated model.
The federated learning optimization method of the invention is described in more detail below with an embodiment, which can be implemented on the framework shown in Fig. 1. The framework is a parameter-server (P-S) distributed architecture composed of a third-party trusted server and the participants, and includes participants with large data volumes, participants with small data volumes, a dynamic training component, a model compression component, a third-party parameter server, and a retraining component. The method comprises the following steps:
Step 1: business model initialization. A participant receives the business model sent by the third-party parameter server and performs the first round of iterative training with its local training sample set. The participant sends the trained local model to the third-party parameter server for secure aggregation, and the server then sends the global model back to the participants;
Step 2: dynamic training of the business model. The participants perform predictive classification of their own sample sets with the global model and assign weights to the samples according to the classification results. The participants then iteratively train the model on the weighted sample sets;
Step 3: deep compression of the business model. The participants compress the trained local models with network pruning and weight quantization, and then send the compressed models to the third-party parameter server;
Step 4: secure aggregation of the business model. The third-party parameter server receives all local model parameters and performs weighted aggregation to obtain the global model, which it then sends back to the participants for the next round of local updates;
Steps 2, 3 and 4 are then repeated until the aggregated global model converges, after which the third-party parameter server sends the global model to the participants with small sample sizes;
Step 5: retraining of the business model. The participants with small sample sizes use the active-learning uncertainty and diversity strategies to perform importance sampling of the unlabeled samples and label the sampled ones. These participants then retrain the model with the newly labeled sample set.
Step 5 is then repeated until the model converges, yielding a higher-accuracy federated model.
A specific procedure of the invention is described below with an example:
Assume there are two participants who share data securely through federated learning, where participant A has little annotated data and participant B has much more. The specific procedure is as follows:
the specific steps of the business model initialization in step 1 are as follows:
step 11: the two participants can provide metadata information of local data, and a shared business model structure is designed by combining metadata information description of source data of the two participants according to common business requirements. In addition, two participants consult and select a third party trusted server as a sharing parameter coordinator.
Step 12: and the third-party trusted server sends the initial service model to two participants, and the participants use the local data set to carry out local training on the model through a random gradient descent algorithm after receiving the model, and iterate for multiple rounds to obtain the local model. And the participants send the local model to a third-party trusted server, and the local model is subjected to weighted aggregation according to a formula 1 to obtain a global model. And finally, the third-party trusted server sends the initialized global model to the two participants.
The specific steps of the dynamic training of the business model in Step 2 are as follows:
Step 21: after receiving the global model, each participant performs predictive classification of its local sample set with the global model and assigns weights to the samples according to the classification results. Participant A classifies its local samples and computes the error rate of the current model according to Equation 2. If a sample of participant A is misclassified by the model, the sample is considered hard to distinguish, meaning the model contains little information about it; participant A therefore increases the weight of that sample according to Equation 3, increasing the proportion of the sample's information in the model so that it receives more attention in the next iteration. In Fig. 2 this is visualized by the squares of A's sample set changing from light to dark. Conversely, if a sample of participant B is misclassified, it is considered to follow a distribution different from participant A's samples, and its weight is reduced according to Equation 4, lowering its negative influence on the next round of model updates; in Fig. 2 the squares of B's sample set change from dark to light. This achieves a dynamic balance of the sample weights in the sample-imbalance scenario.
Step 22: after both participants have weighted their sample sets, they update and train their local models on the weighted sample sets with stochastic gradient descent for the configured number of local training rounds. After model training is finished, deep model compression is performed.
The specific steps of the deep compression of the business model in Step 3 are shown in Fig. 3 and include:
Step 31: take participant A as an example. A first prunes its local model with network pruning, removing sparse filters and channels. A performs regularized sparse training of the local model so that some of the network parameters tend to 0, finally yielding a neural-network model with sparse parameters. Since the weights directly express the importance of the model, the sum of the absolute values of the weights of the filters in a convolutional layer is computed with Equation 9 to obtain information about the overall weights. The participant then computes the importance score of each filter according to Equation 10 using the weight information of the convolutional layer and the BN layer. Participant A sorts the filter score set B = {P_1, P_2, ..., P_n} in ascending order and prunes according to an empirically set pruning rate K, obtaining a more compact network model. Finally, the participant fine-tunes the pruned model to obtain a preliminary compressed model. Participant B's network-pruning operation is identical to participant A's.
E_x = Σ_j |R(W_j)|    (Equation 9)
P_i = a_i · E_i    (Equation 10)
where E_x in Equation 9 is the overall weight of the network layer and R(W_j) is the weight of filter j; P_i in Equation 10 is the importance score of filter (convolution kernel) i, a_i is the variance of the score, and E_i is the network-layer weight.
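For illustration, a sketch of the filter-importance pruning of Step 31, under the assumption that the per-filter weight magnitude (Equation 9) is combined with the corresponding BN-layer statistic a_i to give the score P_i = a_i · E_i of Equation 10; the variable names and this exact interpretation are assumptions:

```python
import numpy as np

def prune_filters(conv_weight, bn_scale, prune_rate):
    """conv_weight: (F, C, kh, kw) convolution weights of one layer
    bn_scale:    (F,) per-filter statistic from the BN layer (a_i in Equation 10)
    prune_rate:  fraction K of filters to remove
    Returns the indices of the filters that are kept.
    """
    # Equation 9 (assumed per-filter form): E_i = sum of absolute weights of filter i
    e = np.abs(conv_weight).reshape(conv_weight.shape[0], -1).sum(axis=1)
    # Equation 10: importance score P_i = a_i * E_i
    scores = bn_scale * e
    order = np.argsort(scores)                 # ascending: least important filters first
    n_prune = int(prune_rate * len(scores))
    keep = np.sort(order[n_prune:])
    return keep
```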
Step 32: after completing network pruning, participant A further compresses the model with weight quantization. Although the pruned network layers have been trimmed effectively, the model still uses floating-point computation, which consumes considerable storage and computation; storing and computing with simpler numeric types therefore further reduces the overall model size. The invention uses grouped weight sharing and grouped quantization to convert the 32-bit model to a low-precision representation. The operation consists of K-means weight grouping, grouped quantization, and retraining. First, weight sharing is applied to the pruned model based on K-means: the weights are quantized into several bins, all weights in the same bin share the same value, and only the index into the shared-weight table is stored. During updating, the gradients are grouped and summed according to the bin of each weight, multiplied by the learning rate, and subtracted from the weight centroid obtained in the previous iteration, yielding the fine-tuned weight centroid. Finally, the model is fine-tuned with local data to reduce the accuracy loss caused by quantization. Participant B's weight-quantization operation is identical to participant A's. Both participants thus obtain deeply compressed local models.
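A sketch of the K-means weight-grouping step described above: a layer's weights are clustered into a small number of bins, each weight is replaced by the index of its bin, and a shared codebook stores the centroid values. This is an illustrative implementation, not the patent's exact procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_layer(weights, n_bins=256):
    """Group a layer's weights into n_bins shared values (256 bins fit a uint8 index).

    Returns (indices, codebook): indices has the layer's shape and stores the bin id
    of every weight; codebook[i] is the shared value of bin i.
    """
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_bins, n_init=1, random_state=0).fit(flat)
    indices = km.labels_.reshape(weights.shape).astype(np.uint8)
    codebook = km.cluster_centers_.ravel()
    return indices, codebook

def dequantize_layer(indices, codebook):
    """Reconstruct the (lossy) weights for fine-tuning or inference."""
    return codebook[indices]
```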
The specific steps of the secure aggregation of the business model in Step 4 are as follows:
Step 41: participants A and B send their dynamically trained and deeply compressed local model parameters to the third-party parameter server. After receiving them, the third party securely aggregates the models with the federated averaging algorithm (Equation 1) to obtain the global model of the current communication round.
Step 42: after completing the secure model aggregation, the third-party parameter server sends the global model of the current communication round to participants A and B, who use it as the starting local model for the next round of training.
Steps 2, 3 and 4 are then repeated until the aggregated global model converges, after which the third-party parameter server sends the global model to participant A, which has fewer labeled samples;
the business model retraining in step 5 comprises the following specific steps:
step 51: after completing the convergence of the pre-model through the step 2, the step 3 and the step 4, the participant a receives the pre-model. And the participant A samples and labels the local unlabeled samples by using an uncertainty strategy and a diversity strategy. Through research, the more uncertain the classification result of the model on the sample, the more worthwhile the sample is labeled. First, a calculates the entropy value of the sample according to the uncertainty strategy by equation 5. Then, sorting the unmarked sample sets in an ascending order according to the entropy values of the samples, wherein the samples with larger entropy values are more worth marking.
Step 52: and after completing the uncertainty sorting, the participant A further samples the unlabeled samples by using a diversity strategy. Through research, the uncertainty strategy only considers the information contained in a single sample, and cannot consider the distribution of the whole sample space, so that redundant samples appear. Therefore, participant a calculates the euclidean distances between the sample selected in step 51 and the other samples in the candidate set by formula 6 according to the euclidean distance-based diversity policy, and then calculates the average thereof. If the average is below 0.5, the sample is considered similar to the other samples, containing too much redundant information, and the sample will be discarded. And finally, labeling the sampled sample set according to expert experience to obtain a new training set. The training set will be input to the pre-model for retraining. After repeated multiple rounds of operation, a higher-precision sharing model is obtained.
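A sketch of the retraining loop of Step 5 for participant A; select_samples_to_label is the sampling sketch given earlier, and expert_label, train and converged are placeholders for expert annotation, local training, and a convergence test (all assumptions, not names from the patent):

```python
def retrain_with_active_learning(model, labeled_set, unlabeled_pool,
                                 budget_per_round, sim_threshold, max_rounds=10):
    """Participant A: alternate sampling, expert labeling and retraining until convergence."""
    for _ in range(max_rounds):
        probs = model.predict_proba(unlabeled_pool.features)
        picked = select_samples_to_label(probs, unlabeled_pool.features,
                                         budget_per_round, sim_threshold)
        newly_labeled = expert_label(unlabeled_pool.take(picked))   # human annotation
        labeled_set = labeled_set.union(newly_labeled)              # retraining sample set
        model = train(model, labeled_set)                           # retrain the pre-model
        if converged(model):
            break
    return model
```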
In conclusion, the federated learning optimization method for the sample-imbalance scenario of the invention provides model accuracy and communication-efficiency optimization for federated learning under sample imbalance, solves the problems of model accuracy loss and reduced communication efficiency in that scenario, and supports secure modeling and data sharing among enterprises as well as the real-world deployment of federated learning.
In an exemplary embodiment, the invention further provides a federated learning optimization apparatus for a sample-imbalance scenario, comprising a quantity acquisition module, a model generation module, a federated training module, and a model distribution module, wherein:
the quantity acquisition module is used for acquiring the number of samples in the local training sample set of each participant;
the model generation module is used for generating an initial business model;
the federated training module is used for performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model;
and the model distribution module is used for distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, and participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as the final model.
In an exemplary embodiment, the invention further provides a federated learning optimization system for a sample-imbalance scenario, comprising a third-party trusted parameter server and at least two participants, wherein:
the third-party trusted parameter server is used for acquiring the number of samples in the local training sample set of each participant; generating an initial business model; performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model; and distributing the global model to the participants;
the at least two participants are used for performing joint learning training on the initial business model based on their local training sample sets; using the global model as the final model when the number of samples in the local training sample set is not less than a threshold; and iteratively optimizing the global model based on local data to obtain the final model when the number of samples in the local training sample set is less than the threshold.
For the specific implementation and beneficial effects of the above apparatus and system, please refer to the description of the method embodiments, which is not repeated here.
In an exemplary embodiment, a computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the program implements the above federated learning optimization method for a sample-imbalance scenario.
In an exemplary embodiment, a computer device is also provided, comprising a memory and a processor, the memory storing a computer program that is loaded and executed by the processor to implement the above federated learning optimization method for a sample-imbalance scenario.
In an exemplary embodiment, a computer program product is also provided which, when run on a computer device, causes the computer device to perform the above federated learning optimization method for a sample-imbalance scenario.
The above description covers only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A federated learning optimization method for a sample-imbalance scenario, applied to a third-party trusted parameter server, characterized by comprising the following steps:
acquiring the number of samples in the local training sample set of each participant;
generating an initial business model;
performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model;
distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, and participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as their final model.
2. The method of claim 1, wherein performing joint learning training on the initial business model based on each participant's local training sample set to obtain a global model comprises:
distributing the initial business model to each participant, so that each participant trains it on its local training sample set with stochastic gradient descent, obtaining and returning an initial local model T_{α,0}, where α is the participant index;
performing weighted aggregation of the local models T_{α,β-1} to obtain the global model V_β, where β is the training round of joint learning;
distributing the global model V_β to the participants and generating the global model V_{β+1} from the returned local models T_{α,β}, where the local model T_{α,β} is obtained by the participant iteratively optimizing the global model V_β;
if the global model V_{β+1} has not converged, setting β = β + 1 and returning to the step of weighted aggregation of the local models T_{α,β-1} to obtain the global model V_β;
if the global model V_{β+1} has converged, outputting the global model V_{β+1}.
3. The method of claim 2, wherein the local model T_{α,β} being obtained by the participant iteratively optimizing the global model V_β comprises:
using the global model V_β to perform predictive classification of the samples in the local training sample set, and updating the weights of the samples in the local training sample set based on the classification results to obtain a weighted sample set;
iteratively training the global model V_β on the weighted sample set to obtain a local model T'_{α,β};
compressing the local model T'_{α,β} to obtain the local model T_{α,β}.
4. The method of claim 3, wherein using the global model V_β to perform predictive classification of the samples in the local training sample set and updating the weights of the samples based on the classification results to obtain a weighted sample set comprises:
calculating the error rate ε_t of the global model V_β, where t is the current sample-weight update round;
if the participant's sample count is less than the threshold, updating the weight a_i^{T,t} of each sample x_i^T in its local training sample set based on the error rate ε_t to obtain the updated weight a_i^{T,t+1}, where T denotes a participant whose sample count is less than the threshold, x_i^T denotes the i-th sample in the local training sample set of participant T, y_i^T denotes the true label of sample x_i^T, and Q_t(x_i^T) denotes the prediction for sample x_i^T in round t;
if the participant's sample count is not less than the threshold, updating the weight a_i^{S,t} of each sample x_i^S in its local training sample set based on the total number K of sample-weight update rounds to obtain the updated weight a_i^{S,t+1}, where S denotes a participant whose sample count is not less than the threshold and x_i^S denotes the i-th sample in the local training sample set of participant S;
repeating the above until the error rate ε_t of the global model V_β is below a set value, which yields the weighted sample set.
5. The method of claim 3, wherein compressing the local model T'_{α,β} to obtain the local model T_{α,β} comprises:
using network pruning to prune the connections with smaller weights in the local model T'_{α,β} and training the pruned model;
generating a codebook from the trained pruned model and quantizing the weights step by step to low-precision int8 according to the codebook;
fine-tuning the quantized model to recover accuracy, obtaining the local model T_{α,β}.
6. The method of any one of claims 1 to 5, wherein the participants whose sample count is less than the threshold iteratively optimizing the global model based on local data and using the trained model as the final model comprises:
sampling the unlabeled samples in the local data with an uncertainty strategy and a diversity strategy to construct a new sample subset;
labeling the samples in the new sample subset to generate a new training sample set;
merging the new training sample set with the local training sample set to obtain a retraining sample set;
optimizing the global model V_d on the retraining sample set to obtain the global model V_{d+1}, where the global model V_0 is the distributed global model and d is the iteration round of training the global model;
if the global model V_{d+1} has converged, taking the global model V_{d+1} as the final model;
if the global model V_{d+1} has not converged, setting d = d + 1 and returning to the step of sampling the unlabeled samples in the local data with the uncertainty strategy and the diversity strategy to construct a new sample subset.
7. The method of claim 6, wherein sampling the unlabeled samples in the local data with an uncertainty strategy and a diversity strategy to construct a new sample subset comprises:
computing the maximum entropy U(x_{i'}) of each unlabeled sample x_{i'}, where i' is the index of the unlabeled sample and j is the class of unlabeled sample x_{i'};
sorting the unlabeled samples in descending order of maximum entropy U(x_{i'}) to construct an ordered unlabeled sample set;
measuring the similarity between unlabeled samples x_{i'} in the unlabeled sample set with the Euclidean distance, and deleting one of any two unlabeled samples whose similarity exceeds a set threshold;
selecting the first several unlabeled samples x_{i'} of the processed unlabeled sample set to form the new sample subset.
8. An apparatus for federated learning optimization in a sample-imbalance scenario, the apparatus comprising:
a quantity acquisition module, used for acquiring the number of samples in the local training sample set of each participant;
a model generation module, used for generating an initial business model;
a federated training module, used for performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model;
and a model distribution module, used for distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, and participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as the final model.
9. A federated learning optimization system for a sample-imbalance scenario, the system comprising:
a third-party trusted parameter server, used for acquiring the number of samples in the local training sample set of each participant; generating an initial business model; performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model; and distributing the global model to the participants;
and at least two participants, used for performing joint learning training on the initial business model based on their local training sample sets; using the global model as the final model when the number of samples in the local training sample set is not less than a threshold; and iteratively optimizing the global model based on local data to obtain the final model when the number of samples in the local training sample set is less than the threshold.
10. A storage medium having a computer program stored thereon, wherein the computer program is arranged to, when executed, perform the method of any of claims 1-7.
CN202211357345.3A 2022-11-01 2022-11-01 Federal learning optimization method, device and system under sample imbalance scene Pending CN115906153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211357345.3A CN115906153A (en) 2022-11-01 2022-11-01 Federal learning optimization method, device and system under sample imbalance scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211357345.3A CN115906153A (en) 2022-11-01 2022-11-01 Federal learning optimization method, device and system under sample imbalance scene

Publications (1)

Publication Number Publication Date
CN115906153A true CN115906153A (en) 2023-04-04

Family

ID=86485278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211357345.3A Pending CN115906153A (en) 2022-11-01 2022-11-01 Federal learning optimization method, device and system under sample imbalance scene

Country Status (1)

Country Link
CN (1) CN115906153A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245172A (en) * 2023-03-14 2023-06-09 南京航空航天大学 Coalition building game method facing individual model performance optimization in cross-island federal learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination