CN115906153A - Federated learning optimization method, device and system in a sample imbalance scenario - Google Patents

Federated learning optimization method, device and system in a sample imbalance scenario

Info

Publication number
CN115906153A
Authority
CN
China
Prior art keywords: model, local, sample, samples, participant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211357345.3A
Other languages
Chinese (zh)
Inventor
肖文杰
汤学海
董扬琛
赵序光
冯远航
张潇丹
韩冀中
虎嵩林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202211357345.3A
Publication of CN115906153A
Legal status: Pending


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a federated learning optimization method, device and system for a sample-imbalance scenario. The method comprises the following steps: acquiring the number of samples in the local training sample set of each participant; generating an initial business model; performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model; and distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, while participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as their final model. The invention better achieves secure data sharing and efficient collaborative modeling among participants in a sample-imbalance scenario.

Description

Federated learning optimization method, device and system in a sample imbalance scenario
Technical Field
The invention relates to the field of privacy-preserving computation, and in particular to a federated learning optimization method, device and system for a sample-imbalance scenario.
Background
With the success of AlphaGo, deep learning has shown great value in industry and daily life, in fields such as recommendation systems, face recognition, and situation awareness. However, most enterprises suffer from small data volumes and poor data quality, which greatly limits the wide application of deep learning. Cross-domain, cross-industry, and cross-region data sharing between organizations is therefore an important driver of resource optimization and improved productivity. In industrial practice, enterprise business data contains enormous business value, personal privacy, and other sensitive elements. At the same time, regulators at home and abroad are strengthening data protection and continually issuing policies that restrict insecure data sharing, such as China's Data Security Law and the European Union's General Data Protection Regulation (GDPR). Breaking down the "data-island barrier", establishing a security mechanism for data sharing, and balancing the industrial deployment of deep learning against data privacy protection has therefore become a major challenge for governments and enterprises.
To address data "islanding" and privacy concerns, federated learning has emerged. Federated learning is a secure distributed machine learning framework proposed by McMahan et al. in 2017, which trains a global model by sharing local model parameters while keeping the raw data available but invisible. In 2019, Google deployed the first production-grade federated learning system: running on mobile phones, it keeps potentially private information generated while users type on the device, and builds an input-prediction model by sharing only the model gradients of the local devices. With the growing demand for intelligent applications and large-scale data sharing among enterprises, hospitals and other organizations, Professor Qiang Yang proposed in 2019 the concept of cross-organization federated learning, including horizontal federated learning, vertical federated learning, and federated transfer learning. In the same year, FATE, the first industrial-grade open-source federated learning framework, developed by the WeBank AI team, provided high-performance secure-computation support for machine learning, deep learning, and transfer learning algorithms, effectively addressing cross-organization AI collaboration while protecting data privacy.
Both the federated learning framework proposed by Google and the one implemented by WeBank assume that the amount and distribution of training samples among the federated participants are balanced. In reality, however, the types of business data and the ways they are collected differ between enterprises, so the sample distributions differ as well, which degrades model accuracy. Moreover, since manual data annotation is costly for some enterprises, when enterprises with small amounts of data participate in federated learning a sample-imbalance scenario arises; this scenario further aggravates the drop in the accuracy of the federated model and at the same time reduces the communication efficiency between each enterprise and the central server. Specifically:
1. Decline of federated learning model accuracy under sample imbalance
The federated averaging (FedAvg) algorithm is the most commonly used federated learning training algorithm. Its general flow is: each participant in the federated network trains a local model on its local sample set using stochastic gradient descent and uploads the model parameters to a trusted third-party server; the server receives the model parameters of all clients, aggregates them by weighted averaging to obtain a new global model, and sends the global model back to the participants; this is iterated until convergence. The objective function of the federated averaging algorithm is shown in Equation 1, where F(w) is the global loss function, F_k(w) is the loss function of the local model of participant k, n_k is the number of samples of participant k, and n is the total number of samples. Equation 1 shows that the contribution of a local model to the global model is directly proportional to its local sample count n_k. If the samples are extremely unbalanced, the global model is biased during training: the sample distribution of participants with many samples dominates the parties with few samples, and model accuracy drops sharply.
F(w) = Σ_k (n_k / n) · F_k(w)    (Equation 1)
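For illustration only, a minimal sketch of the weighted aggregation in Equation 1, assuming each participant uploads its local parameters together with its sample count n_k (the function and variable names are illustrative, not part of the patent):

```python
def fedavg_aggregate(local_params, sample_counts):
    """Weighted average of local model parameters (Equation 1).

    local_params : list of dicts mapping layer name -> numpy array of weights
    sample_counts: list of n_k, the sample count of each participant
    """
    total = float(sum(sample_counts))
    global_params = {}
    for name in local_params[0]:
        # sum_k (n_k / n) * w_k for this layer
        global_params[name] = sum(
            (n_k / total) * params[name]
            for params, n_k in zip(local_params, sample_counts)
        )
    return global_params
```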
2. Decline of federated learning communication efficiency under sample imbalance
In the sample-imbalance scenario, Equation 1 shows that the federated averaging algorithm ignores the inconsistency of sample distributions among real-world enterprises. The paper published by Li et al. tested the effect of different sample distributions on the communication efficiency of federated learning and showed that inconsistent sample distributions slow down convergence. The sample-imbalance scenario aggravates this problem, greatly increasing the number of communication rounds between each enterprise and the third-party server, i.e., reducing communication efficiency.
In summary, solving the problems of reduced model accuracy and reduced communication efficiency under sample imbalance, providing a federated learning optimization method for such scenarios, designing a high-accuracy and high-efficiency federated learning framework, and achieving secure data sharing and friendly cross-domain cooperation has become the technical problem to be solved.
Disclosure of Invention
Aiming at the loss of federated model accuracy and the reduction of communication efficiency in sample-imbalance scenarios, the invention provides a federated learning optimization method, device and system for such scenarios, better achieving secure data sharing and efficient collaborative modeling among participants under sample imbalance.
The technical scheme of the invention comprises the following steps:
A federated learning optimization method in a sample-imbalance scenario, applied to a third-party trusted parameter server, the method comprising the following steps:
acquiring the number of samples in the local training sample set of each participant;
generating an initial business model;
performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model;
distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, and participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as their final model.
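As a sketch of the server-side flow described above; the helper objects (server, participants), their methods and the threshold value are assumptions introduced for illustration, not definitions from the patent:

```python
def run_federated_optimization(server, participants, threshold, max_rounds=100):
    """Illustrative server-side orchestration of the optimization method."""
    sample_counts = {p.id: p.report_sample_count() for p in participants}  # step 1: sample counts
    global_model = server.build_initial_business_model()                   # step 2: initial model

    for _ in range(max_rounds):                                            # step 3: joint training
        local_models = [p.train_locally(global_model) for p in participants]
        global_model = server.aggregate(local_models, list(sample_counts.values()))
        if server.has_converged(global_model):
            break

    for p in participants:                                                 # step 4: distribution
        if sample_counts[p.id] >= threshold:
            p.final_model = global_model          # large-sample party keeps the global model
        else:
            p.final_model = p.retrain_with_active_learning(global_model)   # small-sample party refines locally
    return global_model
```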
Further, performing joint learning training on the initial business model based on each participant's local training sample set to obtain a global model comprises:
distributing the initial business model to each participant, so that each participant trains it on its local training sample set with stochastic gradient descent, obtaining and returning an initial local model T_{α,0}, where α is the participant index;
performing weighted aggregation of the local models T_{α,β-1} to obtain the global model V_β, where β is the training round of joint learning;
distributing the global model V_β to the participants and generating the global model V_{β+1} from the returned local models T_{α,β}, where the local model T_{α,β} is obtained by the participant iteratively optimizing the global model V_β;
if the global model V_{β+1} has not converged, setting β = β + 1 and returning to the step of weighted aggregation of the local models T_{α,β-1} to obtain the global model V_β;
if the global model V_{β+1} has converged, outputting the global model V_{β+1}.
Further, the local model T_{α,β} being obtained by the participant iteratively optimizing the global model V_β comprises:
using the global model V_β to perform predictive classification of the samples in the local training sample set, and updating the weights of the samples in the local training sample set based on the classification results to obtain a weighted sample set;
iteratively training the global model V_β on the weighted sample set to obtain a local model T'_{α,β};
compressing the local model T'_{α,β} to obtain the local model T_{α,β}.
Further, using the global model V_β to perform predictive classification of the samples in the local training sample set and updating the weights of the samples based on the classification results to obtain a weighted sample set comprises:
calculating the error rate ε_t of the global model V_β, where t is the current sample-weight update round;
if the participant's sample count is less than the threshold, updating the weight a_i^{T,t} of each sample x_i^T in its local training sample set based on the error rate ε_t to obtain the updated weight a_i^{T,t+1}, where T denotes a participant whose sample count is less than the threshold, x_i^T denotes the i-th sample in the local training sample set of participant T, y_i^T denotes the true label of sample x_i^T, and Q_t(x_i^T) denotes the prediction for sample x_i^T in round t;
if the participant's sample count is not less than the threshold, updating the weight a_i^{S,t} of each sample x_i^S in its local training sample set based on the total number K of sample-weight update rounds to obtain the updated weight a_i^{S,t+1}, where S denotes a participant whose sample count is not less than the threshold and x_i^S denotes the i-th sample in the local training sample set of participant S;
repeating the above until the error rate ε_t of the global model V_β is below a set value, which yields the weighted sample set.
Further, compressing the local model T'_{α,β} to obtain the local model T_{α,β} comprises:
using network pruning to prune the connections with smaller weights in the local model T'_{α,β} and training the pruned model;
generating a codebook from the trained pruned model and quantizing the weights step by step to low-precision int8 according to the codebook;
fine-tuning the quantized model to recover accuracy, obtaining the local model T_{α,β}.
Further, the participants whose sample count is less than the threshold iteratively optimizing the global model based on local data and using the trained model as the final model comprises:
sampling the unlabeled samples in the local data with an uncertainty strategy and a diversity strategy to construct a new sample subset;
labeling the samples in the new sample subset to generate a new training sample set;
merging the new training sample set with the local training sample set to obtain a retraining sample set;
optimizing the global model V_d on the retraining sample set to obtain the global model V_{d+1}, where the global model V_0 is the distributed global model and d is the iteration round of training the global model;
if the global model V_{d+1} has converged, taking the global model V_{d+1} as the final model;
if the global model V_{d+1} has not converged, setting d = d + 1 and returning to the step of sampling the unlabeled samples in the local data with the uncertainty strategy and the diversity strategy to construct a new sample subset.
Further, sampling the unlabeled samples in the local data with an uncertainty strategy and a diversity strategy to construct a new sample subset comprises:
computing the maximum entropy U(x_{i'}) of each unlabeled sample x_{i'}, where i' is the index of the unlabeled sample and j is the class of unlabeled sample x_{i'};
sorting the unlabeled samples in descending order of maximum entropy U(x_{i'}) to construct an ordered unlabeled sample set;
measuring the similarity between unlabeled samples x_{i'} in the unlabeled sample set with the Euclidean distance, and deleting one of any two unlabeled samples whose similarity exceeds a set threshold;
selecting the first several unlabeled samples x_{i'} of the processed unlabeled sample set to form the new sample subset.
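A minimal sketch of the two sampling strategies above, using the standard entropy of the predicted class distribution as the uncertainty score and pairwise Euclidean distance as the diversity filter; the exact formulas in the patent are given only as images, so this is an assumed instantiation with illustrative names:

```python
import numpy as np

def select_samples_to_label(probs, features, k, sim_threshold):
    """probs:    (N, Y) predicted class probabilities of the unlabeled samples
    features: (N, D) feature vectors used to measure diversity
    k:        number of samples to keep for expert labeling
    """
    # Uncertainty: entropy of the predicted class distribution, sorted descending
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    order = np.argsort(-entropy)

    # Diversity: drop a candidate whose Euclidean distance to an already kept
    # sample is below sim_threshold (i.e., the two samples are too similar)
    kept = []
    for idx in order:
        dists = [np.linalg.norm(features[idx] - features[j]) for j in kept]
        if all(d > sim_threshold for d in dists):
            kept.append(idx)
        if len(kept) == k:
            break
    return kept
```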
An apparatus for federated learning optimization in a sample-imbalance scenario, the apparatus comprising:
a quantity acquisition module, used for acquiring the number of samples in the local training sample set of each participant;
a model generation module, used for generating an initial business model;
a federated training module, used for performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model;
and a model distribution module, used for distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, and participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as the final model.
A federated learning optimization system for a sample-imbalance scenario, the system comprising:
a third-party trusted parameter server, used for acquiring the number of samples in the local training sample set of each participant; generating an initial business model; performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model; and distributing the global model to the participants;
and at least two participants, used for performing joint learning training on the initial business model based on their local training sample sets; using the global model as the final model when the number of samples in the local training sample set is not less than a threshold; and iteratively optimizing the global model based on local data to obtain the final model when the number of samples in the local training sample set is less than the threshold.
A storage medium having a computer program stored therein, wherein the computer program is configured to execute the above federated learning optimization method for a sample-imbalance scenario when run.
Compared with the prior art, the invention has the following positive effects:
1) A dynamic-weight-based federated model accuracy optimization method is proposed, which compensates for the differing contributions of participants to the global model in a sample-imbalance scenario and improves the accuracy of the federated model;
2) A federated learning communication optimization method based on combined model compression is proposed, which reduces the communication bandwidth consumed by local models in a sample-imbalance scenario and improves the communication efficiency of federated learning;
3) A novel federated learning framework for the sample-imbalance scenario is designed, realizing an efficient and accurate federated learning process under sample imbalance.
Drawings
Fig. 1 is an overall architecture diagram of the novel federated learning framework for a sample-imbalance scenario according to an embodiment of the invention.
Fig. 2 is a structural diagram of the dynamic-weight-based federated model accuracy optimization method according to an embodiment of the invention.
Fig. 3 is a structural diagram of the federated learning communication optimization method based on combined model compression according to an embodiment of the invention.
Detailed Description
The principles and features of the invention are described below with reference to the drawings; the examples are provided only to illustrate the invention and are not to be construed as limiting its scope.
The federated learning optimization method for the sample-imbalance scenario of the invention comprises a dynamic-weight-based federated model accuracy optimization method and a federated learning communication optimization method based on combined model compression, providing model accuracy optimization and improved communication efficiency. The federated learning optimization method comprises the following steps:
step 1: initial model distribution: and a third-party credible parameter server in the federated network distributes the business initial model to each participant, and the participants perform local training after receiving the initial model.
The participants select a trusted third party as a model aggregation parameter server, and then jointly design an initial network model on the third party server. The trusted third party server distributes the initial model to the federal network participants.
Step 2: initial training of the local model. After receiving the initial business model, each participant trains it on its local data with stochastic gradient descent, obtaining a local model after several rounds of training. The participants then send their local models to the third-party trusted parameter server.
After receiving the initial network model sent by the server, each participant iteratively trains the model on its local training samples using stochastic gradient descent and sends the trained local model to the third-party parameter server, which performs weighted aggregation of the received local models using Equation 1 from the Background section to obtain a global model.
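A minimal sketch of one round of local training as described above, assuming a PyTorch model and data loader (both of which are hypothetical here; the patent does not prescribe a framework):

```python
import torch

def local_sgd_training(model, data_loader, epochs=5, lr=0.01):
    """Iteratively train the received global/initial model on local samples with SGD."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)   # local loss F_k(w)
            loss.backward()
            optimizer.step()
    return model.state_dict()             # parameters sent to the parameter server
```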
Step 3: secure aggregation by the third party. After receiving the local models of all the participants, the third-party trusted parameter server performs weighted aggregation to obtain a global model and then distributes it to the participants.
Step 4: weighted training of the local model. After receiving the global model, each participant first classifies its own sample set with the global model and assigns weights to the samples according to the classification results. The participant then iteratively trains the model on the weighted sample set.
In one example, after receiving the globally aggregated model for the first time, each participant first performs predictive classification of its own samples with the global model and computes the error rate of the model according to Equation 2. If a sample of a participant with a small amount of data is misclassified, its weight is increased according to Equation 3 so that the sample receives more attention during local model training. If a sample of a participant with a large amount of data is misclassified, it is regarded as belonging to a distribution different from that of the participants with little data, and its weight is reduced according to Equation 4. In Equations 2 to 4, T denotes a participant with a small amount of data, S denotes a participant with a large amount of data, y denotes the true label, a denotes the sample weight (with its weighted, updated form used in the update rules), Q denotes the model's predicted value for a sample, ε denotes the error rate of the current model, t is the current sample-weight update round, K is the total number of training rounds, and n is the current training round.
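The exact forms of Equations 2-4 are given only as images in the original publication; for illustration, the sketch below assumes a simple multiplicative re-weighting that reproduces the behaviour described above (boosting misclassified samples at the small-data participant T and shrinking them at the large-data participant S). The factors and names are assumptions, not the patent's formulas:

```python
import numpy as np

def update_sample_weights(weights, y_true, y_pred, is_small_party,
                          error_rate, total_rounds_K):
    """Assumed multiplicative re-weighting; not the patent's exact Equations 3-4."""
    misclassified = (y_true != y_pred)
    new_w = weights.copy()
    if is_small_party:
        # participant T: boost misclassified samples, more strongly when the error rate is low
        beta_t = max(error_rate, 1e-6) / max(1.0 - error_rate, 1e-6)
        new_w[misclassified] /= beta_t
    else:
        # participant S: shrink misclassified samples (treated as distribution mismatch)
        beta_s = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(weights)) / total_rounds_K))
        new_w[misclassified] *= beta_s
    return new_w / new_w.sum()   # renormalize so the weights form a distribution
```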
Step 5: deep compression of the local model. The participants deeply compress the updated local models with network pruning and weight quantization, reducing the model size and improving communication efficiency. Finally, each participant sends its compressed local model to the third-party parameter server.
In one example, the participant iteratively trains the local model on the weighted sample set. Network pruning is then used to prune the network connections with small weights in the neural-network structure, and the pruned network is retrained. Weight quantization then compresses the pruned model further: a codebook is generated first, the weights are quantized step by step to low-precision int8 according to the codebook, and the quantized network is fine-tuned to restore model accuracy. Finally, the deeply compressed model is sent to the third-party central parameter server, optimizing the efficiency of model transmission.
Step 6: steps 3, 4 and 5 are repeated until the aggregated global model converges, and the converged global model is distributed to the participants.
Step 7: the participants with many samples directly use the received global model as their trained model.
Step 8: after receiving the global model, the participants with few samples sample their unlabeled data using an uncertainty strategy and a diversity strategy, selecting samples worth labeling to form a new sample subset.
Active-learning resampling comprises an uncertainty strategy and a diversity strategy. A participant with few samples uses maximum entropy as the uncertainty sampling strategy to perform importance sampling of the unlabeled samples, which are then given to an expert for labeling. Using the pre-trained model, the participant computes the maximum entropy of each unlabeled sample according to Equation 5 and sorts all results; the larger a sample's entropy, the more it is worth labeling. The participant then uses the Euclidean distance as the diversity strategy to measure the similarity between different samples: for the sorted sample set, the Euclidean distance between two samples is computed according to Equation 6; the smaller the distance, the more similar the two samples, and only one of them is selected for labeling, whereas the larger the distance, the less similar they are, and both need to be labeled. In Equations 5 and 6, U(x_i) denotes the uncertainty of sample x_i in the unlabeled data set, i is the sample index, j is the class of the sample, p is the current model, d_i(j, l) is the Euclidean distance, and Y is the number of sample classes.
Step 9: pre-model retraining. The new sample subset is labeled with the help of labeling experts to obtain a new training sample set, which is merged with the original training samples to form a retraining sample set. Finally, the participant iteratively trains the global model on the retraining sample set.
A participant with few samples selects samples worth labeling from its unlabeled sample set with the uncertainty and diversity strategies, labels them based on expert experience, adds the labeled samples to the training set, and retrains the pre-trained model. This is repeated for several rounds until convergence, yielding a shared model with higher accuracy.
Step 10: steps 8 and 9 are repeated until the model converges, yielding a high-accuracy federated model.
The federated learning optimization method of the invention is described in more detail below with an embodiment, which can be implemented on the framework shown in Fig. 1. The framework is a parameter-server (P-S) distributed architecture composed of a third-party trusted server and the participants, and includes participants with large data volumes, participants with small data volumes, a dynamic training component, a model compression component, a third-party parameter server, and a retraining component. The method comprises the following steps:
Step 1: business model initialization. A participant receives the business model sent by the third-party parameter server and performs the first round of iterative training with its local training sample set. The participant sends the trained local model to the third-party parameter server for secure aggregation, and the server then sends the global model back to the participants;
Step 2: dynamic training of the business model. The participants perform predictive classification of their own sample sets with the global model and assign weights to the samples according to the classification results. The participants then iteratively train the model on the weighted sample sets;
Step 3: deep compression of the business model. The participants compress the trained local models with network pruning and weight quantization, and then send the compressed models to the third-party parameter server;
Step 4: secure aggregation of the business model. The third-party parameter server receives all local model parameters and performs weighted aggregation to obtain the global model, which it then sends back to the participants for the next round of local updates;
Steps 2, 3 and 4 are then repeated until the aggregated global model converges, after which the third-party parameter server sends the global model to the participants with small sample sizes;
Step 5: retraining of the business model. The participants with small sample sizes use the active-learning uncertainty and diversity strategies to perform importance sampling of the unlabeled samples and label the sampled ones. These participants then retrain the model with the newly labeled sample set.
Step 5 is then repeated until the model converges, yielding a higher-accuracy federated model.
A specific procedure of the invention is described below with an example:
Assume there are two participants who share data securely through federated learning, where participant A has little annotated data and participant B has much more. The specific procedure is as follows:
the specific steps of the business model initialization in step 1 are as follows:
step 11: the two participants can provide metadata information of local data, and a shared business model structure is designed by combining metadata information description of source data of the two participants according to common business requirements. In addition, two participants consult and select a third party trusted server as a sharing parameter coordinator.
Step 12: and the third-party trusted server sends the initial service model to two participants, and the participants use the local data set to carry out local training on the model through a random gradient descent algorithm after receiving the model, and iterate for multiple rounds to obtain the local model. And the participants send the local model to a third-party trusted server, and the local model is subjected to weighted aggregation according to a formula 1 to obtain a global model. And finally, the third-party trusted server sends the initialized global model to the two participants.
The specific steps of the dynamic training of the business model in Step 2 are as follows:
Step 21: after receiving the global model, each participant performs predictive classification of its local sample set with the global model and assigns weights to the samples according to the classification results. Participant A classifies its local samples and computes the error rate of the current model according to Equation 2. If a sample of participant A is misclassified by the model, the sample is considered hard to distinguish, meaning the model contains little information about it; participant A therefore increases the weight of that sample according to Equation 3, increasing the proportion of the sample's information in the model so that it receives more attention in the next iteration. In Fig. 2 this is visualized by the squares of A's sample set changing from light to dark. Conversely, if a sample of participant B is misclassified, it is considered to follow a distribution different from participant A's samples, and its weight is reduced according to Equation 4, lowering its negative influence on the next round of model updates; in Fig. 2 the squares of B's sample set change from dark to light. This achieves a dynamic balance of the sample weights in the sample-imbalance scenario.
Step 22: after both participants have weighted their sample sets, they update and train their local models on the weighted sample sets with stochastic gradient descent for the configured number of local training rounds. After model training is finished, deep model compression is performed.
The specific steps of the deep compression of the business model in Step 3 are shown in Fig. 3 and include:
Step 31: take participant A as an example. A first prunes its local model with network pruning, removing sparse filters and channels. A performs regularized sparse training of the local model so that some of the network parameters tend to 0, finally yielding a neural-network model with sparse parameters. Since the weights directly express the importance of the model, the sum of the absolute values of the weights of the filters in a convolutional layer is computed with Equation 9 to obtain information about the overall weights. The participant then computes the importance score of each filter according to Equation 10 using the weight information of the convolutional layer and the BN layer. Participant A sorts the filter score set B = {P_1, P_2, ..., P_n} in ascending order and prunes according to an empirically set pruning rate K, obtaining a more compact network model. Finally, the participant fine-tunes the pruned model to obtain a preliminary compressed model. Participant B's network-pruning operation is identical to participant A's.
E_x = Σ_j |R(W_j)|    (Equation 9)
P_i = a_i · E_i    (Equation 10)
where E_x in Equation 9 is the overall weight of the network layer and R(W_j) is the weight of filter j; P_i in Equation 10 is the importance score of filter (convolution kernel) i, a_i is the variance of the score, and E_i is the network-layer weight.
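For illustration, a sketch of the filter-importance pruning of Step 31, under the assumption that the per-filter weight magnitude (Equation 9) is combined with the corresponding BN-layer statistic a_i to give the score P_i = a_i · E_i of Equation 10; the variable names and this exact interpretation are assumptions:

```python
import numpy as np

def prune_filters(conv_weight, bn_scale, prune_rate):
    """conv_weight: (F, C, kh, kw) convolution weights of one layer
    bn_scale:    (F,) per-filter statistic from the BN layer (a_i in Equation 10)
    prune_rate:  fraction K of filters to remove
    Returns the indices of the filters that are kept.
    """
    # Equation 9 (assumed per-filter form): E_i = sum of absolute weights of filter i
    e = np.abs(conv_weight).reshape(conv_weight.shape[0], -1).sum(axis=1)
    # Equation 10: importance score P_i = a_i * E_i
    scores = bn_scale * e
    order = np.argsort(scores)                 # ascending: least important filters first
    n_prune = int(prune_rate * len(scores))
    keep = np.sort(order[n_prune:])
    return keep
```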
Step 32: after completing network pruning, participant A further compresses the model with weight quantization. Although the pruned network layers have been trimmed effectively, the model still uses floating-point computation, which consumes considerable storage and computation; storing and computing with simpler numeric types therefore further reduces the overall model size. The invention uses grouped weight sharing and grouped quantization to convert the 32-bit model to a low-precision representation. The operation consists of K-means weight grouping, grouped quantization, and retraining. First, weight sharing is applied to the pruned model based on K-means: the weights are quantized into several bins, all weights in the same bin share the same value, and only the index into the shared-weight table is stored. During updating, the gradients are grouped and summed according to the bin of each weight, multiplied by the learning rate, and subtracted from the weight centroid obtained in the previous iteration, yielding the fine-tuned weight centroid. Finally, the model is fine-tuned with local data to reduce the accuracy loss caused by quantization. Participant B's weight-quantization operation is identical to participant A's. Both participants thus obtain deeply compressed local models.
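A sketch of the K-means weight-grouping step described above: a layer's weights are clustered into a small number of bins, each weight is replaced by the index of its bin, and a shared codebook stores the centroid values. This is an illustrative implementation, not the patent's exact procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_layer(weights, n_bins=256):
    """Group a layer's weights into n_bins shared values (256 bins fit a uint8 index).

    Returns (indices, codebook): indices has the layer's shape and stores the bin id
    of every weight; codebook[i] is the shared value of bin i.
    """
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_bins, n_init=1, random_state=0).fit(flat)
    indices = km.labels_.reshape(weights.shape).astype(np.uint8)
    codebook = km.cluster_centers_.ravel()
    return indices, codebook

def dequantize_layer(indices, codebook):
    """Reconstruct the (lossy) weights for fine-tuning or inference."""
    return codebook[indices]
```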
The specific steps of the secure aggregation of the business model in Step 4 are as follows:
Step 41: participants A and B send their dynamically trained and deeply compressed local model parameters to the third-party parameter server. After receiving them, the third party securely aggregates the models with the federated averaging algorithm (Equation 1) to obtain the global model of the current communication round.
Step 42: after completing the secure model aggregation, the third-party parameter server sends the global model of the current communication round to participants A and B, who use it as the starting local model for the next round of training.
Steps 2, 3 and 4 are then repeated until the aggregated global model converges, after which the third-party parameter server sends the global model to participant A, which has fewer labeled samples;
the business model retraining in step 5 comprises the following specific steps:
step 51: after completing the convergence of the pre-model through the step 2, the step 3 and the step 4, the participant a receives the pre-model. And the participant A samples and labels the local unlabeled samples by using an uncertainty strategy and a diversity strategy. Through research, the more uncertain the classification result of the model on the sample, the more worthwhile the sample is labeled. First, a calculates the entropy value of the sample according to the uncertainty strategy by equation 5. Then, sorting the unmarked sample sets in an ascending order according to the entropy values of the samples, wherein the samples with larger entropy values are more worth marking.
Step 52: and after completing the uncertainty sorting, the participant A further samples the unlabeled samples by using a diversity strategy. Through research, the uncertainty strategy only considers the information contained in a single sample, and cannot consider the distribution of the whole sample space, so that redundant samples appear. Therefore, participant a calculates the euclidean distances between the sample selected in step 51 and the other samples in the candidate set by formula 6 according to the euclidean distance-based diversity policy, and then calculates the average thereof. If the average is below 0.5, the sample is considered similar to the other samples, containing too much redundant information, and the sample will be discarded. And finally, labeling the sampled sample set according to expert experience to obtain a new training set. The training set will be input to the pre-model for retraining. After repeated multiple rounds of operation, a higher-precision sharing model is obtained.
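A sketch of the retraining loop of Step 5 for participant A; select_samples_to_label is the sampling sketch given earlier, and expert_label, train and converged are placeholders for expert annotation, local training, and a convergence test (all assumptions, not names from the patent):

```python
def retrain_with_active_learning(model, labeled_set, unlabeled_pool,
                                 budget_per_round, sim_threshold, max_rounds=10):
    """Participant A: alternate sampling, expert labeling and retraining until convergence."""
    for _ in range(max_rounds):
        probs = model.predict_proba(unlabeled_pool.features)
        picked = select_samples_to_label(probs, unlabeled_pool.features,
                                         budget_per_round, sim_threshold)
        newly_labeled = expert_label(unlabeled_pool.take(picked))   # human annotation
        labeled_set = labeled_set.union(newly_labeled)              # retraining sample set
        model = train(model, labeled_set)                           # retrain the pre-model
        if converged(model):
            break
    return model
```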
In conclusion, the federated learning optimization method for the sample-imbalance scenario of the invention provides model accuracy and communication-efficiency optimization for federated learning under sample imbalance, solves the problems of model accuracy loss and reduced communication efficiency in that scenario, and supports secure modeling and data sharing among enterprises as well as the real-world deployment of federated learning.
In an exemplary embodiment, the invention further provides a federated learning optimization apparatus for a sample-imbalance scenario, comprising a quantity acquisition module, a model generation module, a federated training module, and a model distribution module, wherein:
the quantity acquisition module is used for acquiring the number of samples in the local training sample set of each participant;
the model generation module is used for generating an initial business model;
the federated training module is used for performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model;
and the model distribution module is used for distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, and participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as the final model.
In an exemplary embodiment, the invention further provides a federated learning optimization system for a sample-imbalance scenario, comprising a third-party trusted parameter server and at least two participants, wherein:
the third-party trusted parameter server is used for acquiring the number of samples in the local training sample set of each participant; generating an initial business model; performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model; and distributing the global model to the participants;
the at least two participants are used for performing joint learning training on the initial business model based on their local training sample sets; using the global model as the final model when the number of samples in the local training sample set is not less than a threshold; and iteratively optimizing the global model based on local data to obtain the final model when the number of samples in the local training sample set is less than the threshold.
For the specific implementation and beneficial effects of the above apparatus and system, please refer to the description of the method embodiments, which is not repeated here.
In an exemplary embodiment, a computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the program implements the above federated learning optimization method for a sample-imbalance scenario.
In an exemplary embodiment, a computer device is also provided, comprising a memory and a processor, the memory storing a computer program that is loaded and executed by the processor to implement the above federated learning optimization method for a sample-imbalance scenario.
In an exemplary embodiment, a computer program product is also provided which, when run on a computer device, causes the computer device to perform the above federated learning optimization method for a sample-imbalance scenario.
The above description covers only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A federated learning optimization method for a sample-imbalance scenario, applied to a third-party trusted parameter server, characterized by comprising the following steps:
acquiring the number of samples in the local training sample set of each participant;
generating an initial business model;
performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model;
distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, and participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as their final model.
2. The method of claim 1, wherein performing joint learning training on the initial business model based on each participant's local training sample set to obtain a global model comprises:
distributing the initial business model to each participant, so that each participant trains it on its local training sample set with stochastic gradient descent, obtaining and returning an initial local model T_{α,0}, where α is the participant index;
performing weighted aggregation of the local models T_{α,β-1} to obtain the global model V_β, where β is the training round of joint learning;
distributing the global model V_β to the participants and generating the global model V_{β+1} from the returned local models T_{α,β}, where the local model T_{α,β} is obtained by the participant iteratively optimizing the global model V_β;
if the global model V_{β+1} has not converged, setting β = β + 1 and returning to the step of weighted aggregation of the local models T_{α,β-1} to obtain the global model V_β;
if the global model V_{β+1} has converged, outputting the global model V_{β+1}.
3. The method of claim 2, wherein the local model T_{α,β} being obtained by the participant iteratively optimizing the global model V_β comprises:
using the global model V_β to perform predictive classification of the samples in the local training sample set, and updating the weights of the samples in the local training sample set based on the classification results to obtain a weighted sample set;
iteratively training the global model V_β on the weighted sample set to obtain a local model T'_{α,β};
compressing the local model T'_{α,β} to obtain the local model T_{α,β}.
4. The method of claim 3, wherein using the global model V_β to perform predictive classification of the samples in the local training sample set and updating the weights of the samples based on the classification results to obtain a weighted sample set comprises:
calculating the error rate ε_t of the global model V_β, where t is the current sample-weight update round;
if the participant's sample count is less than the threshold, updating the weight a_i^{T,t} of each sample x_i^T in its local training sample set based on the error rate ε_t to obtain the updated weight a_i^{T,t+1}, where T denotes a participant whose sample count is less than the threshold, x_i^T denotes the i-th sample in the local training sample set of participant T, y_i^T denotes the true label of sample x_i^T, and Q_t(x_i^T) denotes the prediction for sample x_i^T in round t;
if the participant's sample count is not less than the threshold, updating the weight a_i^{S,t} of each sample x_i^S in its local training sample set based on the total number K of sample-weight update rounds to obtain the updated weight a_i^{S,t+1}, where S denotes a participant whose sample count is not less than the threshold and x_i^S denotes the i-th sample in the local training sample set of participant S;
repeating the above until the error rate ε_t of the global model V_β is below a set value, which yields the weighted sample set.
5. The method of claim 3, wherein compressing the local model T'_{α,β} to obtain the local model T_{α,β} comprises:
using network pruning to prune the connections with smaller weights in the local model T'_{α,β} and training the pruned model;
generating a codebook from the trained pruned model and quantizing the weights step by step to low-precision int8 according to the codebook;
fine-tuning the quantized model to recover accuracy, obtaining the local model T_{α,β}.
6. The method of any one of claims 1 to 5, wherein the participants whose sample count is less than the threshold iteratively optimizing the global model based on local data and using the trained model as the final model comprises:
sampling the unlabeled samples in the local data with an uncertainty strategy and a diversity strategy to construct a new sample subset;
labeling the samples in the new sample subset to generate a new training sample set;
merging the new training sample set with the local training sample set to obtain a retraining sample set;
optimizing the global model V_d on the retraining sample set to obtain the global model V_{d+1}, where the global model V_0 is the distributed global model and d is the iteration round of training the global model;
if the global model V_{d+1} has converged, taking the global model V_{d+1} as the final model;
if the global model V_{d+1} has not converged, setting d = d + 1 and returning to the step of sampling the unlabeled samples in the local data with the uncertainty strategy and the diversity strategy to construct a new sample subset.
7. The method of claim 6, wherein sampling the unlabeled samples in the local data with an uncertainty strategy and a diversity strategy to construct a new sample subset comprises:
computing the maximum entropy U(x_{i'}) of each unlabeled sample x_{i'}, where i' is the index of the unlabeled sample and j is the class of unlabeled sample x_{i'};
sorting the unlabeled samples in descending order of maximum entropy U(x_{i'}) to construct an ordered unlabeled sample set;
measuring the similarity between unlabeled samples x_{i'} in the unlabeled sample set with the Euclidean distance, and deleting one of any two unlabeled samples whose similarity exceeds a set threshold;
selecting the first several unlabeled samples x_{i'} of the processed unlabeled sample set to form the new sample subset.
8. An apparatus for federated learning optimization in a sample-imbalance scenario, the apparatus comprising:
a quantity acquisition module, used for acquiring the number of samples in the local training sample set of each participant;
a model generation module, used for generating an initial business model;
a federated training module, used for performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model;
and a model distribution module, used for distributing the global model to each participant, so that participants whose sample count is not less than a threshold use the global model as their final model, and participants whose sample count is less than the threshold iteratively optimize the global model based on local data and use the trained model as the final model.
9. A federated learning optimization system for a sample-imbalance scenario, the system comprising:
a third-party trusted parameter server, used for acquiring the number of samples in the local training sample set of each participant; generating an initial business model; performing joint learning training on the initial business model based on the local training sample set of each participant to obtain a global model; and distributing the global model to the participants;
and at least two participants, used for performing joint learning training on the initial business model based on their local training sample sets; using the global model as the final model when the number of samples in the local training sample set is not less than a threshold; and iteratively optimizing the global model based on local data to obtain the final model when the number of samples in the local training sample set is less than the threshold.
10. A storage medium having a computer program stored thereon, wherein the computer program is arranged to, when executed, perform the method of any of claims 1-7.
CN202211357345.3A 2022-11-01 2022-11-01 Federal learning optimization method, device and system under sample imbalance scene Pending CN115906153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211357345.3A CN115906153A (en) 2022-11-01 2022-11-01 Federal learning optimization method, device and system under sample imbalance scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211357345.3A CN115906153A (en) 2022-11-01 2022-11-01 Federal learning optimization method, device and system under sample imbalance scene

Publications (1)

Publication Number Publication Date
CN115906153A true CN115906153A (en) 2023-04-04

Family

ID=86485278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211357345.3A Pending CN115906153A (en) 2022-11-01 2022-11-01 Federal learning optimization method, device and system under sample imbalance scene

Country Status (1)

Country Link
CN (1) CN115906153A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245172A (en) * 2023-03-14 2023-06-09 南京航空航天大学 Coalition building game method facing individual model performance optimization in cross-island federal learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination