CN114742237A - Federal learning model aggregation method and device, electronic equipment and readable storage medium - Google Patents

Federal learning model aggregation method and device, electronic equipment and readable storage medium

Info

Publication number
CN114742237A
Authority
CN
China
Prior art keywords
training
model parameter
weight coefficient
round
aggregation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210569972.7A
Other languages
Chinese (zh)
Inventor
高文灵
王亚奇
邓宇光
吴萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210569972.7A
Publication of CN114742237A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a federated learning model aggregation method and device, electronic equipment and a readable storage medium, and relates to the field of artificial intelligence, in particular to the field of federated learning. The specific implementation scheme is as follows: obtaining a first model parameter and a second model parameter, wherein the first model parameter is sent by at least one first client in the federal learning of the current training round, the second model parameter is sent by at least one second client in the federal learning of the historical training round, the second model parameter does not participate in aggregation operation, and the historical training round is a training round within a preset round number before the current training round; and performing the aggregation operation of the current training round based on the first model parameter and the second model parameter. In the scheme, aggregation operation is performed by using the second model parameters which do not participate in the aggregation operation in the historical training round, so that the second model parameters are effectively utilized, the waste of the model parameters is avoided, and the training efficiency of the model is improved.

Description

Federal learning model aggregation method and device, electronic equipment and readable storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical field of federal learning, and more particularly to a federated learning model aggregation method and apparatus, an electronic device, and a readable storage medium.
Background
Federal learning is a distributed machine learning method that can train models on large amounts of data scattered across mobile devices.
In the existing scheme, the client reports the model parameters obtained by training to the server, and the server performs aggregation operation on the model parameters reported by each client, and updates the global model according to the aggregation result. In practical situations, some clients may not report the trained model parameters to the server in time, and the model parameters that are reported with delay may not participate in the aggregation operation and may be discarded, thereby causing waste of the model parameters and affecting the model training efficiency.
Disclosure of Invention
In order to solve at least one of the above drawbacks, the present disclosure provides a federated learning model aggregation method, an apparatus, an electronic device, and a readable storage medium.
According to a first aspect of the present disclosure, a method for aggregating federated learning models is provided, which includes:
obtaining a first model parameter and a second model parameter, wherein the first model parameter is sent by at least one first client in the federal learning of the current training round, the second model parameter is sent by at least one second client in the federal learning of the historical training round, the second model parameter does not participate in aggregation operation, and the historical training round is a training round within a preset round number before the current training round;
and performing the aggregation operation of the current training round based on the first model parameter and the second model parameter.
According to a second aspect of the present disclosure, there is provided a federated learning model aggregation apparatus, the apparatus comprising:
the model parameter acquisition module is used for acquiring a first model parameter and a second model parameter, wherein the first model parameter is sent by at least one first client in the federal learning of the current training round, the second model parameter is sent by at least one second client in the federal learning of the historical training round, the second model parameter does not participate in aggregation operation, and the historical training round is a training round within a preset round number before the current training round;
and the aggregation processing module is used for carrying out aggregation operation of the current training round based on the first model parameter and the second model parameter.
According to a third aspect of the present disclosure, there is provided an electronic apparatus comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the federated learning model aggregation method.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the above federated learning model aggregation method.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the above federated learning model aggregation method.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flowchart of a method for aggregating federated learning models provided in an embodiment of the present disclosure;
fig. 2 is a schematic diagram of an evaluation index curve of a global model according to an embodiment of the disclosure;
FIG. 3 is a schematic flow chart diagram illustrating another federated learning model aggregation method provided by an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a federated learning model aggregation apparatus provided in the embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of another federated learning model aggregation apparatus provided in an embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing the federated learning model aggregation method of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The client reports the model parameters obtained by training in the current training round to the server, and the server performs aggregation operation on the model parameters reported by each client in the current training round, and then updates the global model according to the aggregation result. If some clients cannot report the model parameters obtained by training in the current training round in time for some reason, the delayed model parameters cannot participate in the aggregation operation of the current training round. In the related art, such delayed model parameters are generally discarded directly, which causes a waste of the model parameters.
When few clients participate in federal learning, few model parameters are available for the aggregation operation. If model parameters that are not uploaded in time are simply discarded, the training effect may degrade and more training rounds may be needed to reach the training target; if the duration of each training round is extended instead, the discarding of model parameters can be reduced to some extent and the training effect preserved, but the overall training time is prolonged.
The federal learning model aggregation method, apparatus, electronic device and readable storage medium provided in the embodiments of the present disclosure are directed to solving at least one of the above technical problems in the prior art.
Fig. 1 shows a schematic flow diagram of a federated learning model aggregation method provided by an embodiment of the present disclosure, and as shown in fig. 1, the method mainly includes:
step S110: the method comprises the steps of obtaining a first model parameter and a second model parameter, wherein the first model parameter is sent by at least one first client in the federal learning of a current training round, the second model parameter is sent by at least one second client in the federal learning of a historical training round, the second model parameter does not participate in aggregation operation, and the historical training round is a training round within a preset round number before the current training round;
step S120: and performing aggregation operation of the current training round based on the first model parameter and the second model parameter.
The client can receive a training task issued by the server and obtain the model parameters by executing the training task. The model parameters may be local model weight parameters of the client, etc.
The client can execute the training task of the current training round and obtain the model parameters, send the model parameters obtained by the current training round to the server, record the model parameters reported by the client in the current training round received by the server as first model parameters, and record the client reporting the first model parameters as the first client.
In the embodiment of the present disclosure, a preset round number may be specified, and a training round within the preset round number completed before the current training round may be determined as a historical training round. For example, if the current training round is the 10 th round and the preset round number is 5, the training round between the 5 th round and the 9 th round may be used as the historical training round.
In each historical training round, the client may not report the model parameters in time, and the part of the model parameters may be reported to the server in a delayed manner and cannot participate in the aggregation operation of the corresponding historical training round. Model parameters which are reported by the client in the historical training turns and do not participate in the aggregation operation and received by the server can be recorded as second model parameters, and the client reporting the second model parameters can be recorded as a second client.
According to the method provided by the embodiment of the disclosure, the aggregation operation of the current training round is performed on the basis of the first model parameter and the second model parameter by acquiring the first model parameter sent by the first client in the federal learning of the current training round and the second model parameter which is sent by the second client in the federal learning of the historical training round and does not participate in the aggregation operation. In the scheme, the second model parameters which do not participate in the aggregation operation in the historical training turns are used for carrying out the aggregation operation, so that the second model parameters are effectively utilized, the waste of the model parameters is avoided, and the training efficiency of the model is improved.
In the embodiment of the disclosure, the first model parameter and the second model parameter are brought into the aggregation operation of the current training round together, so that the number of the model parameters participating in the aggregation operation is increased, and the training effect can be effectively ensured.
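As an illustrative, non-limiting sketch of how a server might retain delayed model parameters for later aggregation (written in Python; the names AggregationServer, stale_buffer and eligible_history are hypothetical and are not taken from this disclosure):

class AggregationServer:
    def __init__(self, preset_rounds=5):
        self.preset_rounds = preset_rounds      # preset round number
        self.stale_buffer = {}                  # round index -> list of (params, n_samples)

    def on_late_report(self, train_round, params, n_samples):
        # A delayed report from a second client is buffered instead of discarded.
        self.stale_buffer.setdefault(train_round, []).append((params, n_samples))

    def eligible_history(self, current_round):
        # Only rounds within the preset round number before the current round qualify.
        lo = current_round - self.preset_rounds
        return {r: v for r, v in self.stale_buffer.items() if lo <= r < current_round}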
In an optional manner of the present disclosure, performing an aggregation operation of a current training round based on a first model parameter and a second model parameter includes:
determining a first sub-aggregation result corresponding to the current training turn based on the first model parameter;
determining a second sub-aggregation result of each historical training round based on the second model parameter of each historical training round;
and performing aggregation operation of the current training round based on the first sub-aggregation result and each second sub-aggregation result.
In the embodiment of the disclosure, when performing aggregation operation on the first model parameters and the second model parameters, aggregation operation may be performed on each first model parameter to obtain a first sub-aggregation result of a current training round, aggregation operation may be performed on each second model parameter to obtain a second sub-aggregation result of each historical training round, and then aggregation operation of the current training round is performed according to the first sub-aggregation result and each second sub-aggregation result to determine an aggregation result.
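As an illustrative sketch of this step, consistent with the sample-size weighting described later in this disclosure (all names are hypothetical):

import numpy as np

def sub_aggregate(reports):
    # reports: list of (params, n_samples); returns the sample-size-weighted mean.
    total = sum(n for _, n in reports)
    return sum((n / total) * np.asarray(p, dtype=float) for p, n in reports)

def sub_aggregation_results(first_reports, history):
    # first_reports: parameters reported by the first clients in the current round.
    # history: {round index: reports from second clients buffered for that round}.
    first_sub = sub_aggregate(first_reports)
    second_subs = {p: sub_aggregate(reports) for p, reports in history.items()}
    return first_sub, second_subs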
In an optional manner of the present disclosure, performing an aggregation operation of a current training round based on a first sub-aggregation result and each second sub-aggregation result includes:
and performing aggregation operation of the current training round based on the first sub-aggregation result and a first weight coefficient corresponding to the current training round and based on each second sub-aggregation result and a second weight coefficient corresponding to each historical training round.
In the embodiment of the disclosure, a first weight coefficient corresponding to a current training turn and a second weight coefficient corresponding to a historical training turn may be determined. And performing weighted operation on the first sub-aggregation result of the current training round and the second sub-aggregation result of each historical training round according to the first weight coefficient and the second weight coefficient to obtain an aggregation result.
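A one-line sketch of this weighted operation (names are hypothetical; gamma denotes the correction coefficient of the second weight coefficients discussed below):

def combine(first_sub, second_subs, alpha, betas, gamma=1.0):
    # alpha: first weight coefficient; betas[p]: second weight coefficient of historical round p.
    return alpha * first_sub + gamma * sum(betas[p] * second_subs[p] for p in second_subs)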
In an alternative aspect of the present disclosure, the second weight coefficient is determined by:
determining a third weight coefficient corresponding to each historical training turn based on the offset degree of the first model parameter and the second model parameter corresponding to each historical training turn;
determining a fourth weight coefficient corresponding to each historical training round based on the total amount of samples participating in training in each historical training round and the total amount of samples participating in training in the current training round;
and determining a second weight coefficient based on the third weight coefficient and the fourth weight coefficient.
In the embodiment of the present disclosure, the second weight coefficient of the historical training round is related to the deviation degree between the global model used in the historical training round and the global model used in the current training round, and in general, when the historical training round is closer to the current training round (i.e., spaced by a smaller number of training rounds), the deviation degree between the global model used in the historical training round and the global model used in the current training round is lower, and the second weight coefficient corresponding to the historical training round should be higher.
Specifically, the degree of deviation between the global model used in each historical training round and the global model used in the current training round can be measured by the degree of deviation between the first model parameter and the second model parameter corresponding to each historical training round.
In this embodiment of the present disclosure, the second weight coefficient of the historical training round is further related to a total amount of samples participating in training in the historical training round, and if the total amount of samples participating in training in the historical training round is smaller, the confidence of the model parameter obtained by training in the historical training round is also lower.
Specifically, the fourth weight coefficient corresponding to each historical training round may be determined by the total amount of samples participating in training in each historical training round and the total amount of samples participating in training in the current training round.
As an example, if the total number of samples participating in training in a historical training round is N_L and the total number of samples participating in training in the current training round is N_K, the fourth weight coefficient corresponding to that historical training round is Ω = min(N_L / N_K, 1.0), i.e., the smaller of N_L / N_K and 1.
As an example, the product of the third weight coefficient and the fourth weight coefficient may be taken as the second weight coefficient.
In practical use, the summation result of the second weight coefficients should satisfy the following formula two:

β = γ · Σ_{p=1..t} θ_p · Ω_p

wherein β represents the summation result of the second weight coefficients and should not be greater than 0.3, and γ is a correction coefficient of the second weight coefficients used to ensure that β does not exceed 0.3. The default value of γ may be 1.0. t represents the preset round number, p represents any historical training round, θ_p represents the third weight coefficient corresponding to the p-th historical training round, and Ω_p represents the fourth weight coefficient corresponding to the p-th historical training round.
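An illustrative computation of the second weight coefficients along these lines, with the 0.3 cap and the default γ = 1.0 taken from the description above (function names are hypothetical):

def fourth_weight(n_hist_total, n_cur_total):
    # Omega = min(N_L / N_K, 1.0), as in the example above.
    return min(n_hist_total / n_cur_total, 1.0)

def second_weight_coefficients(thetas, omegas, cap=0.3):
    # thetas[p]: third weight coefficient (offset-based) of the p-th historical round.
    # omegas[p]: fourth weight coefficient of the p-th historical round.
    betas = [theta * omega for theta, omega in zip(thetas, omegas)]
    total = sum(betas)
    # The correction coefficient gamma keeps beta = gamma * sum(betas) at or below the cap.
    gamma = 1.0 if total <= cap else cap / total
    return betas, gamma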
In an alternative aspect of the present disclosure, the offset is determined by:
constructing a first vector based on the first model parameter;
respectively constructing second vectors based on second model parameters corresponding to each historical training turn;
and determining the offset degree of the second model parameters and the first model parameters corresponding to each historical training turn based on the correlation between each second vector and the first vector.
In the embodiment of the disclosure, a first vector may be constructed based on a first model parameter, second vectors may be constructed based on second model parameters, and a deviation degree between the second model parameter corresponding to each historical training turn and the first model parameter may be determined based on a correlation between each second vector and the first vector.
Specifically, similarity analysis methods such as a vector mean, a euclidean distance, and a cosine distance may be used to determine the correlation between each second vector and the first vector, and then determine the degree of deviation based on the correlation.
As an example, let a be the second vector constructed from the second model parameters of a certain historical training round and b be the first vector constructed from the first model parameters. The second vector a is normalized to obtain a mean value ā, and the first vector b is normalized to obtain a mean value b̄. The degree of offset d between the second model parameters and the first model parameters corresponding to that historical training round can then be determined from ā and b̄ by formula three. The degree of offset determined by formula three lies in the interval [0, 1], and the third weight coefficient may be taken as 1 - d.
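Because formula three is rendered only as an image in the source text, its exact expression is not reproduced here; the sketch below is one assumed instantiation in the spirit of the description, using min-max normalization and the absolute difference of the two means, and is not to be read as the patented formula:

import numpy as np

def offset_degree(second_vec, first_vec):
    # Assumed reading of formula three: min-max normalize each vector, take its mean,
    # and use the absolute difference of the two means, clipped into [0, 1].
    def norm_mean(v):
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        return float(((v - v.min()) / span).mean()) if span > 0 else 0.5
    d = abs(norm_mean(second_vec) - norm_mean(first_vec))
    return min(max(d, 0.0), 1.0)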
In an alternative aspect of the present disclosure, the first weight coefficient is determined by:
and determining the first weight coefficient based on the preset quantitative relation between the first weight coefficient and the second weight coefficient and based on the second weight coefficient.
In the embodiment of the present disclosure, a quantitative relationship that is satisfied between the first weight coefficient and the second weight coefficient may be preconfigured, and after the second weight coefficient is determined, the first weight coefficient is determined according to the second weight coefficient and the quantitative relationship.
As an example, the quantitative relationship satisfied between the first weight coefficient and the second weight coefficients may be represented by the following formula one:

α = 1 - γ · Σ_{p=1..t} β_p

wherein α represents the first weight coefficient, t represents the preset round number, p represents any historical training round, β_p represents the second weight coefficient corresponding to the p-th historical training round, and γ represents the correction coefficient of the second weight coefficients.
In practical use, the relationship α + β = 1 may additionally be used, with α preset to 0.7 and β preset to 0.3.
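Continuing the same illustrative notation, the first weight coefficient could then be derived from the second weight coefficients as follows (a sketch of formula one as reconstructed above; function name is hypothetical):

def first_weight_coefficient(betas, gamma=1.0):
    # Formula one as reconstructed above: alpha = 1 - gamma * sum of the second weight coefficients.
    return 1.0 - gamma * sum(betas)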
The values of α and β can be dynamically adjusted according to the convergence condition of the model during model evaluation. As an example, the value of α may be adjusted repeatedly according to a preset increment policy; for example, the increment policy may be to increase the value of α by 0.1 each time and correspondingly decrease the value of β by 0.1. After each adjustment, the aggregation operation is performed based on the adjusted first weight coefficient and the adjusted second weight coefficients, a finally trained global model is obtained, and whether the global model meets a preset convergence condition is evaluated, for example whether the loss of the model decreases and the accuracy and recall rate increase until these curves level off. If the global model does not satisfy the preset convergence condition, the increase of the value of α may be stopped, and a predetermined value (e.g., 0.5) may be subtracted from the value of α to serve as the first weight coefficient for subsequent use, from which β is then determined.
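The adjustment policy described above could be sketched as follows; train_global_model and evaluate_convergence stand in for project-specific routines and are purely illustrative:

def tune_alpha(train_global_model, evaluate_convergence,
               alpha=0.7, step=0.1, fallback=0.5):
    # Raise alpha (and lower beta accordingly) until the trained global model
    # no longer satisfies the preset convergence condition.
    while alpha + step <= 1.0:
        candidate = alpha + step
        model = train_global_model(alpha=candidate, beta=1.0 - candidate)
        if not evaluate_convergence(model):
            # Stop increasing and back off by the predetermined value.
            return max(candidate - fallback, 0.0)
        alpha = candidate
    return alpha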
In an optional manner of the present disclosure, determining a first sub-aggregation result of the current training round based on each first model parameter includes:
determining a third weight coefficient of each first client in the current training round based on the sum of the sample amount of each first client participating in the training in the current training round and the sample amount of all first clients participating in the training in the current training round;
and determining a first sub-aggregation result of the current training turn based on the third weight coefficient and the first model parameter.
In the embodiment of the present disclosure, in the current training round, the third weight coefficient corresponding to each first client in the current training round is related to the sample size of the first client participating in training in the current training round. The third weight coefficient of each client in the current training round may be determined according to the sample size of the first client participating in the training in the current training round and the sum of the sample sizes of all the first clients participating in the training in the current training round. Specifically, the ratio of the sample size of the first client participating in the training in the current training round to the sum of the sample sizes of the first clients participating in the training in the current training round may be used as the third weight coefficient.
After the third weight coefficient is determined, the first model parameters corresponding to each client may be subjected to weighting operation based on the third weight coefficient, so as to obtain a first sub-aggregation result of the current training round.
In an optional manner of the present disclosure, determining a second sub-aggregation result of each historical training round based on a second model parameter of each historical training round respectively includes:
determining a fourth weight coefficient of the second client in each historical training round based on the sample amount of each second client participating in training in each historical training round and the sum of the sample amounts of all second clients participating in training in each historical training round;
and determining a second sub-aggregation result of each historical training turn based on the fourth weight coefficient and the second model parameter.
In the embodiment of the present disclosure, in each historical training round, a fourth weight coefficient corresponding to each second client in the historical training round is related to a sample size of the second client participating in training. For any historical training turn, a fourth weight coefficient of each second client in the historical training turn may be determined based on the sample size of each second client participating in training in the historical training turn and the sum of the sample sizes of all second clients participating in training in the historical training turn. Specifically, the ratio of the sample size of each second client participating in the training in the historical training turn to the sum of the sample sizes of all the second clients participating in the training in the historical training turn may be used as the fourth weight coefficient.
After the fourth weight coefficient is determined, the second model parameters corresponding to each second client may be subjected to weighting operation based on the fourth weight coefficient, so as to obtain a second sub-aggregation result of each historical training turn.
As an example, the aggregation result in the embodiment of the present disclosure may be calculated by the following formula four:

w_{t+1} = α · Σ_{k=1..K} (n_k / N_K) · w_k + γ · Σ_{p=1..t} β_p · Σ_{l=1..L} (n_l / N_L) · w_l^p

wherein α represents the first weight coefficient, K represents the number of first clients, k denotes any one of the first clients, n_k represents the number of samples participating in model training in the current training round at the k-th first client, N_K represents the total number of samples participating in training across all first clients in the current training round, and w_k represents the first model parameter reported by the k-th first client in the current training round. t represents the preset round number, p represents any historical training round, β_p represents the second weight coefficient corresponding to the p-th historical training round, and γ represents the correction coefficient of the second weight coefficients. L represents the number of second clients, l denotes any one of the second clients, n_l represents the number of samples participating in model training in the p-th historical training round at the l-th second client, N_L represents the total number of samples participating in training across all second clients in the p-th historical training round, and w_l^p represents the second model parameter reported by the l-th second client in the p-th historical training round. w_{t+1} represents the aggregation result.
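Putting the pieces together, one possible implementation of formula four as reconstructed above (illustrative names; model parameters are assumed to be NumPy arrays):

import numpy as np

def aggregate(first_reports, history_reports, betas, alpha, gamma=1.0):
    # first_reports: list of (w_k, n_k) for the K first clients of the current round.
    # history_reports: {p: list of (w_l, n_l) for the second clients of historical round p}.
    # betas: {p: second weight coefficient beta_p of historical round p}.
    n_total_cur = sum(n for _, n in first_reports)
    current_term = alpha * sum((n / n_total_cur) * np.asarray(w, dtype=float)
                               for w, n in first_reports)
    history_term = 0.0
    for p, reports in history_reports.items():
        n_total_hist = sum(n for _, n in reports)
        sub = sum((n / n_total_hist) * np.asarray(w, dtype=float) for w, n in reports)
        history_term = history_term + betas[p] * sub
    return current_term + gamma * history_term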
As an example, the value of the preset round number may also be dynamically adjusted according to the convergence condition of the model during model evaluation. The initial value of the preset round number may be adjusted repeatedly according to a preset adjustment strategy; for example, the adjustment strategy may be to increase the preset round number by 1 each time. After each adjustment, the federated learning process is performed based on the adjusted preset round number, a finally trained global model is obtained, and whether the global model meets a preset convergence condition is evaluated, for example whether the loss of the model decreases and the accuracy and recall rate increase until these curves level off. If the global model does not meet the preset convergence condition, the increase of the preset round number may be stopped, and a preset value (for example, 4) may be used as the preset round number for subsequent use.
Fig. 2 is a schematic diagram of an evaluation index curve of a global model provided by an embodiment of the present disclosure, and includes a loss curve, an accuracy curve, and a recall rate curve. The evaluation index curves in fig. 2 reflect that the loss of the model is in a decreasing trend and the accuracy and the recall rate are in an increasing trend, which indicates that the global model is in a convergence state.
Fig. 3 shows a flow diagram of another federated learning model aggregation method provided in the embodiment of the present disclosure, and as shown in fig. 3, the method mainly includes:
step S310: the method comprises the steps of obtaining a first model parameter and a second model parameter, wherein the first model parameter is sent by at least one first client in the federal learning of a current training round, the second model parameter is sent by at least one second client in the federal learning of a historical training round, the second model parameter does not participate in aggregation operation, and the historical training round is a training round within a preset round number before the current training round;
step S320: determining a first sub-aggregation result of the current training round based on the first model parameter;
step S330: determining a second sub-aggregation result of each historical training round based on the second model parameter of each historical training round;
step S340: and performing aggregation operation of the current training round based on the first sub-aggregation result and a first weight coefficient corresponding to the current training round and based on each second sub-aggregation result and a second weight coefficient corresponding to each historical training round.
The client can receive a training task issued by the server and obtain the model parameters by executing the training task. The model parameters may be local model weight parameters of the client, etc.
The client can execute the training task of the current training round and obtain the model parameters, send the model parameters obtained by the current training round to the server, record the model parameters reported by the client in the current training round received by the server as first model parameters, and record the client reporting the first model parameters as the first client.
In the embodiment of the present disclosure, a preset round number may be specified, and a training round within the preset round number completed before the current training round may be determined as a historical training round. For example, if the current training round is the 10 th round and the preset round number is 5, the training round between the 5 th round and the 9 th round may be used as the historical training round.
In each historical training turn, the client may not report the model parameters in time, and the part of the model parameters may be reported to the server in a delayed manner and cannot participate in the aggregation operation. The model parameters reported by the client in the historical training round received by the server and not participating in the aggregation operation of the corresponding historical training round can be recorded as the second model parameters, and the client reporting the second model parameters can be recorded as the second client.
According to the method provided by the embodiment of the disclosure, the aggregation operation of the current training round is performed by acquiring the first model parameter sent by the first client in the federal learning of the current training round and the second model parameter which is sent by the second client in the federal learning of the historical training round and does not participate in the aggregation operation. In the scheme, the second model parameters which do not participate in the aggregation operation in the historical training turns are used for carrying out the aggregation operation, so that the second model parameters are effectively utilized, the waste of the model parameters is avoided, and the training efficiency of the model is improved.
In the embodiment of the disclosure, the first model parameter and the second model parameter are brought into the aggregation operation of the current training round together, so that the number of the model parameters participating in the aggregation operation is increased, and the training effect can be effectively ensured.
In the embodiment of the present disclosure, when performing aggregation operation on the first model parameters and the second model parameters, aggregation operation may be performed on each first model parameter to obtain a first sub-aggregation result of a current training round, aggregation operation may be performed on each second model parameter to obtain a second sub-aggregation result corresponding to each historical training round, and then aggregation operation of the current training round is performed according to the first sub-aggregation result and each second sub-aggregation result to determine an aggregation result.
In the embodiment of the disclosure, a first weight coefficient corresponding to a current training turn and a second weight coefficient corresponding to a historical training turn may be determined. And performing weighted operation on the first sub-aggregation result of the current training round and the second sub-aggregation result of each historical training round according to the first weight coefficient and the second weight coefficient to obtain an aggregation result.
Based on the same principle as the method shown in fig. 1, fig. 4 shows a schematic structural diagram of a federated learning model aggregation apparatus 40 provided in the embodiment of the present disclosure, and as shown in fig. 4, the federated learning model aggregation apparatus 40 may include:
a model parameter obtaining module 410, configured to obtain a first model parameter and a second model parameter, where the first model parameter is sent by at least one first client in federal learning of a current training round, the second model parameter is sent by at least one second client in federal learning of a historical training round, the second model parameter does not participate in aggregation operation, and the historical training round is a training round within a preset number of rounds before the current training round;
and the aggregation processing module 420 is configured to perform an aggregation operation of the current training round based on the first model parameter and the second model parameter.
According to the device provided by the embodiment of the disclosure, the aggregation operation of the current training round is performed on the basis of the first model parameter and the second model parameter by acquiring the first model parameter sent by the first client in the federal learning of the current training round and the second model parameter which is sent by the second client in the federal learning of the historical training round and does not participate in the aggregation operation. In the scheme, the second model parameters which do not participate in the aggregation operation in the historical training turns are used for carrying out the aggregation operation, so that the second model parameters are effectively utilized, the waste of the model parameters is avoided, and the training efficiency of the model is improved.
Optionally, the aggregation processing module is specifically configured to:
determining a first sub-aggregation result of the current training round based on the first model parameter;
determining a second sub-aggregation result of each historical training round based on the second model parameter of each historical training round;
and performing aggregation operation of the current training round based on the first sub-aggregation result and each second sub-aggregation result.
Optionally, when performing the aggregation operation of the current training round based on the first sub-aggregation result and each second sub-aggregation result, the aggregation processing module is specifically configured to:
and performing aggregation operation of the current training round based on the first sub-aggregation result and the first weight coefficient corresponding to the current training round, and based on each second sub-aggregation result and the second weight coefficient corresponding to each historical training round.
Optionally, the second weight coefficient is determined by:
determining a third weight coefficient corresponding to each historical training turn based on the offset degree of the first model parameter and the second model parameter corresponding to each historical training turn;
determining a fourth weight coefficient corresponding to each historical training round based on the total amount of samples participating in training in each historical training round and the total amount of samples participating in training in the current training round;
and determining a second weight coefficient based on the third weight coefficient and the fourth weight coefficient.
Optionally, the degree of offset is determined by:
constructing a first vector based on the first model parameter;
respectively constructing second vectors based on second model parameters corresponding to the historical training turns;
and determining the offset degree of the second model parameters and the first model parameters corresponding to each historical training turn based on the correlation between each second vector and the first vector.
Optionally, the first weight coefficient is determined by:
and determining the first weight coefficient based on the preset quantitative relation between the first weight coefficient and the second weight coefficient and based on the second weight coefficient.
Optionally, when determining the first sub-aggregation result of the current training round based on the first model parameter, the aggregation processing module is specifically configured to:
determining a third weight coefficient of each first client in the current training round based on the sum of the sample amount of each first client participating in the training in the current training round and the sample amount of all first clients participating in the training in the current training round;
and determining a first sub-aggregation result of the current training turn based on the third weight coefficient and the first model parameter.
Optionally, when the aggregation processing module determines the second sub-aggregation result of each historical training round based on the second model parameter of each historical training round, the aggregation processing module is specifically configured to:
determining a fourth weight coefficient of the second client in each historical training turn based on the sample amount of each second client participating in training in each historical training turn and the sum of the sample amounts of all the second clients participating in training in each historical training turn;
and determining a second sub-aggregation result of each historical training turn based on the fourth weight coefficient and the second model parameter.
It is understood that the above modules of the federal learning model aggregation device in the embodiment of the present disclosure have functions of implementing the corresponding steps of the federal learning model aggregation method in the embodiment shown in fig. 1. The function can be realized by hardware, and can also be realized by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above. The modules can be software and/or hardware, and each module can be implemented independently or by integrating a plurality of modules. For the functional description of each module of the federal learning model aggregation device, reference may be made to the corresponding description of the federal learning model aggregation method in the embodiment shown in fig. 1, which is not described herein again.
Based on the same principle as the method shown in fig. 3, fig. 5 shows a schematic structural diagram of another federated learning model aggregation apparatus 50 provided in the embodiment of the present disclosure, and as shown in fig. 5, the federated learning model aggregation apparatus 50 may include:
a model parameter obtaining module 510, configured to obtain a first model parameter and a second model parameter, where the first model parameter is sent by at least one first client in federal learning of a current training round, the second model parameter is sent by at least one second client in federal learning of a historical training round, the second model parameter is not involved in aggregation operation, and the historical training round is a training round within a preset number of rounds before the current training round;
a first sub-aggregation module 520, configured to determine a first sub-aggregation result of the current training round based on the first model parameter;
a second sub-aggregation module 530, configured to determine, based on the second model parameter of each historical training round, a second sub-aggregation result corresponding to each historical training round;
the aggregation processing module 540 is configured to perform aggregation operation on the current training round based on the first sub-aggregation result and the first weight coefficient corresponding to the current training round, and based on each second sub-aggregation result and the second weight coefficient corresponding to each historical training round.
According to the device provided by the embodiment of the disclosure, the aggregation operation of the current training round is performed on the basis of the first model parameter and the second model parameter by acquiring the first model parameter sent by the first client in the federal learning of the current training round and the second model parameter which is sent by the second client in the federal learning of the historical training round and does not participate in the aggregation operation. In the scheme, the second model parameters which do not participate in the aggregation operation in the historical training turns are used for carrying out the aggregation operation, so that the second model parameters are effectively utilized, the waste of the model parameters is avoided, and the training efficiency of the model is improved.
It is understood that the above modules of the federal learning model aggregation device in the embodiment of the present disclosure have functions of implementing the corresponding steps of the federal learning model aggregation method in the embodiment shown in fig. 3. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above. The modules can be software and/or hardware, and each module can be implemented independently or by integrating a plurality of modules. For the functional description of each module of the federal learning model aggregation device, reference may be made to the corresponding description of the federal learning model aggregation method in the embodiment shown in fig. 3, which is not described herein again.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
The electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the federated learning model aggregation method as provided by embodiments of the present disclosure.
Compared with the prior art, the electronic equipment performs the aggregation operation of the current training round based on the first model parameters and the second model parameters by acquiring the first model parameters sent by the first client in the federal learning of the current training round and the second model parameters which are sent by the second client in the federal learning of the historical training round and do not participate in the aggregation operation. In the scheme, the second model parameters which do not participate in the aggregation operation in the historical training turns are used for carrying out the aggregation operation, so that the second model parameters are effectively utilized, the waste of the model parameters is avoided, and the training efficiency of the model is improved.
The readable storage medium is a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a federated learning model aggregation method as provided by embodiments of the present disclosure.
Compared with the prior art, the readable storage medium carries out aggregation operation on the current training round based on the first model parameters and the second model parameters by acquiring the first model parameters sent by the first client in the federal learning of the current training round and the second model parameters which are sent by the second client in the federal learning of the historical training round and do not participate in the aggregation operation. In the scheme, the second model parameters which do not participate in the aggregation operation in the historical training turns are used for carrying out the aggregation operation, so that the second model parameters are effectively utilized, the waste of the model parameters is avoided, and the training efficiency of the model is improved.
The computer program product, including a computer program, which when executed by a processor implements the federated learning model aggregation method as provided by embodiments of the present disclosure.
Compared with the prior art, the computer program product performs the aggregation operation of the current training round based on the first model parameter and the second model parameter by acquiring the first model parameter sent by the first client in the federal learning of the current training round and the second model parameter which is sent by the second client in the federal learning of the historical training round and does not participate in the aggregation operation. In the scheme, the second model parameters which do not participate in the aggregation operation in the historical training turns are used for carrying out the aggregation operation, so that the second model parameters are effectively utilized, the waste of the model parameters is avoided, and the training efficiency of the model is improved.
FIG. 6 illustrates a schematic block diagram of an example electronic device 60 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the electronic device 60 includes a computing unit 610 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)620 or a computer program loaded from a storage unit 680 into a Random Access Memory (RAM) 630. In the RAM 630, various programs and data required for the operation of the device 60 can also be stored. The computing unit 610, the ROM 620, and the RAM 630 are connected to each other by a bus 640. An input/output (I/O) interface 650 is also connected to bus 640.
Various components in device 60 are connected to I/O interface 650, including: an input unit 660 such as a keyboard, a mouse, etc.; an output unit 670 such as various types of displays, speakers, and the like; a storage unit 680, such as a magnetic disk, optical disk, or the like; and a communication unit 690 such as a network card, modem, wireless communication transceiver, etc. The communication unit 690 allows the device 60 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Computing unit 610 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 610 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 610 performs the federated learning model aggregation methods provided in embodiments of the present disclosure. For example, in some embodiments, performing the federated learning model aggregation methods provided in embodiments of the present disclosure may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 680. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 60 via the ROM 620 and/or the communication unit 690. When loaded into RAM 630 and executed by computing unit 610, may perform one or more steps of the federated learning model aggregation method provided in embodiments of the present disclosure. Alternatively, in other embodiments, the computing unit 610 may be configured in any other suitable manner (e.g., by way of firmware) to perform the federated learning model aggregation approach provided in embodiments of the present disclosure.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, which is not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A federated learning model aggregation method, comprising the following steps:
obtaining a first model parameter and a second model parameter, wherein the first model parameter is sent by at least one first client in federal learning of a current training round, the second model parameter is sent by at least one second client in federal learning of a historical training round, the second model parameter does not participate in aggregation operation, and the historical training round is a training round within a preset round number before the current training round;
and performing the aggregation operation of the current training round based on the first model parameter and the second model parameter.
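As a rough, non-authoritative illustration of the aggregation in claim 1, the Python sketch below combines current-round parameters with previously unaggregated parameters from recent historical rounds; all function and variable names, and the simple averaging used as a placeholder, are assumptions rather than details taken from this disclosure.

```python
# Illustrative sketch only; names and the simple-average placeholder are assumed.
import numpy as np

def aggregate_round(first_params, stale_params_by_round, first_weight=0.7):
    """Aggregate current-round updates with stale updates that missed earlier rounds.

    first_params: list of parameter vectors received in the current round.
    stale_params_by_round: {round_id: list of parameter vectors sent in that
        historical round but never aggregated}.
    """
    current = np.mean(first_params, axis=0)  # placeholder sub-aggregation of the current round
    if not stale_params_by_round:
        return current
    # Placeholder sub-aggregation of each historical round, then across rounds.
    stale = np.mean([np.mean(p, axis=0) for p in stale_params_by_round.values()], axis=0)
    return first_weight * current + (1.0 - first_weight) * stale
```

Here `first_weight` merely stands in for the weighting scheme that claims 3 to 6 refine.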
2. The method of claim 1, wherein the performing the aggregation operation for the current training round based on the first model parameter and the second model parameter comprises:
determining a first sub-aggregation result for the current training round based on the first model parameter;
determining a second sub-aggregation result of each historical training round based on the second model parameter of each historical training round respectively;
and performing the aggregation operation of the current training round based on the first sub-aggregation result and each second sub-aggregation result.
3. The method of claim 2, wherein the performing the aggregation operation for the current training round based on the first sub-aggregation result and each of the second sub-aggregation results comprises:
and performing the aggregation operation of the current training round based on the first sub-aggregation result and a first weight coefficient corresponding to the current training round, and based on each second sub-aggregation result and a second weight coefficient corresponding to each historical training round.
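One way to read claims 2 and 3 is as a weighted sum of per-round sub-aggregation results. The minimal sketch below assumes the weight coefficients are supplied externally; how they are derived is the subject of claims 4 to 6.

```python
def combine_sub_aggregates(first_sub, first_weight, second_subs, second_weights):
    """Weighted combination of the current-round sub-aggregate with the
    sub-aggregates of each historical round (weights assumed to be given)."""
    aggregate = first_weight * first_sub
    for round_id, sub in second_subs.items():
        aggregate = aggregate + second_weights[round_id] * sub
    return aggregate
```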
4. The method of claim 3, wherein the second weight coefficient is determined by:
determining a third weight coefficient corresponding to each historical training round based on a degree of offset between the second model parameter corresponding to that historical training round and the first model parameter;
determining a fourth weight coefficient corresponding to each historical training round based on the total amount of samples participating in training in each historical training round and the total amount of samples participating in training in the current training round;
determining the second weight coefficient based on the third weight coefficient and the fourth weight coefficient.
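Claim 4 combines an offset-based term and a sample-volume term into the second weight coefficient but does not fix the combination rule; the convex mix below is one assumed choice, written out only for illustration.

```python
def second_weight(offset, hist_sample_total, cur_sample_total, alpha=0.5):
    """Illustrative combination of an offset-based term and a sample-volume term.

    The mapping from offset to weight and the convex mixing rule are assumptions.
    """
    third = 1.0 / (1.0 + offset)                            # smaller offset -> larger weight
    fourth = hist_sample_total / max(cur_sample_total, 1)   # relative training volume
    return alpha * third + (1.0 - alpha) * fourth
```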
5. The method of claim 4, wherein the degree of offset is determined by:
constructing a first vector based on the first model parameters;
respectively constructing second vectors based on second model parameters corresponding to the historical training rounds;
and determining the degree of offset between the second model parameter corresponding to each historical training round and the first model parameter based on the correlation between each second vector and the first vector.
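Claim 5 measures the offset through the correlation between a first vector and each second vector. Cosine similarity is one possible correlation measure; the sketch below assumes that choice, with all names invented for illustration.

```python
import numpy as np

def offset_degree(second_params, first_params):
    """Degree of offset as 1 minus the cosine similarity between flattened
    parameter vectors (cosine similarity is an assumed correlation measure)."""
    v1 = np.concatenate([np.ravel(p) for p in first_params])
    v2 = np.concatenate([np.ravel(p) for p in second_params])
    cos = float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))
    return 1.0 - cos
```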
6. The method according to any of claims 3-5, wherein the first weight coefficient is determined by:
and determining the first weight coefficient based on the second weight coefficient and a preset quantitative relation between the first weight coefficient and the second weight coefficient.
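Claim 6 leaves the preset relation between the weight coefficients open. A common and convenient assumption, used only for illustration here, is that the first weight coefficient and all second weight coefficients sum to one.

```python
def first_weight_from_second(second_weights):
    """Assumes the preset relation is that all weights sum to 1; other
    relations would satisfy claim 6 equally well."""
    return max(0.0, 1.0 - sum(second_weights.values()))
```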
7. The method of any of claims 2-6, wherein the determining a first sub-aggregation result for the current training round based on the first model parameter comprises:
determining a third weight coefficient of each first client in the current training round based on the sample amount of each first client participating in training in the current training round and the sum of the sample amounts of all the first clients participating in training in the current training round;
determining a first sub-aggregation result of the current training round based on the third weight coefficient and the first model parameter.
8. The method according to any of claims 2-7, wherein said determining a second sub-aggregate result for each of the historical training rounds based on the second model parameters for each of the historical training rounds, respectively, comprises:
determining a fourth weight coefficient of each second client in each historical training round based on the sample amount of each second client participating in training in that historical training round and the sum of the sample amounts of all second clients participating in training in that historical training round;
and determining a second sub-aggregation result of each historical training round based on the fourth weight coefficient and the second model parameter.
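Claims 7 and 8 describe sample-proportional weighting within a single round, similar in spirit to federated averaging; a minimal sketch with assumed names is given below.

```python
def sub_aggregate(params_by_client, samples_by_client):
    """Sample-proportional average of one round's client updates."""
    total = sum(samples_by_client.values())
    return sum((samples_by_client[c] / total) * params_by_client[c]
               for c in params_by_client)
```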
9. A federated learning model aggregation device, comprising:
a model parameter acquiring module, configured to acquire a first model parameter and a second model parameter, wherein the first model parameter is sent by at least one first client in the federal learning of the current training round, the second model parameter is sent by at least one second client in the federal learning of a historical training round, the second model parameter does not participate in aggregation operation, and the historical training round is a training round within a preset round number before the current training round;
and an aggregation processing module, configured to perform the aggregation operation of the current training round based on the first model parameter and the second model parameter.
10. The apparatus according to claim 9, wherein the aggregation processing module is specifically configured to:
determining a first sub-aggregation result for the current training round based on the first model parameter;
determining a second sub-aggregation result of each historical training round based on the second model parameter of each historical training round respectively;
and performing the aggregation operation of the current training round based on the first sub-aggregation result and each second sub-aggregation result.
11. The apparatus according to claim 10, wherein the aggregation processing module, when performing the aggregation operation of the current training round based on the first sub-aggregation result and each of the second sub-aggregation results, is specifically configured to:
and performing the aggregation operation of the current training round based on the first sub-aggregation result and a first weight coefficient corresponding to the current training round, and based on each second sub-aggregation result and a second weight coefficient corresponding to each historical training round.
12. The apparatus of claim 11, wherein the second weight coefficient is determined by:
determining a third weight coefficient corresponding to each historical training round based on a degree of offset between the second model parameter corresponding to that historical training round and the first model parameter;
determining a fourth weight coefficient corresponding to each historical training round based on the total amount of samples participating in training in each historical training round and the total amount of samples participating in training in the current training round;
determining the second weight coefficient based on the third weight coefficient and the fourth weight coefficient.
13. The apparatus of claim 12, wherein the degree of offset is determined by:
constructing a first vector based on the first model parameters;
respectively constructing second vectors based on second model parameters corresponding to the historical training rounds;
and determining the degree of offset between the second model parameter corresponding to each historical training round and the first model parameter based on the correlation between each second vector and the first vector.
14. The apparatus according to any of claims 11-13, wherein the first weight coefficient is determined by:
and determining the first weight coefficient based on the second weight coefficient and a preset quantitative relation between the first weight coefficient and the second weight coefficient.
15. The apparatus according to any of claims 10-14, wherein the aggregation processing module, when determining the first sub-aggregation result for the current training round based on the first model parameter, is specifically configured to:
determining a third weight coefficient of each first client in the current training round based on the sample amount of each first client participating in training in the current training round and the sum of the sample amounts of all the first clients participating in training in the current training round;
determining a first sub-aggregation result of the current training round based on the third weight coefficient and the first model parameter.
16. The apparatus according to any of claims 10 to 15, wherein the aggregation processing module, when determining the second sub-aggregation result for each of the historical training rounds based on the second model parameter for each of the historical training rounds, is specifically configured to:
determining a fourth weight coefficient of each second client in each historical training round based on the sample amount of each second client participating in training in that historical training round and the sum of the sample amounts of all second clients participating in training in that historical training round;
and determining a second sub-aggregation result of each historical training round based on the fourth weight coefficient and the second model parameter.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202210569972.7A 2022-05-24 2022-05-24 Federal learning model aggregation method and device, electronic equipment and readable storage medium Pending CN114742237A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210569972.7A CN114742237A (en) 2022-05-24 2022-05-24 Federal learning model aggregation method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210569972.7A CN114742237A (en) 2022-05-24 2022-05-24 Federal learning model aggregation method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN114742237A true CN114742237A (en) 2022-07-12

Family

ID=82287137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210569972.7A Pending CN114742237A (en) 2022-05-24 2022-05-24 Federal learning model aggregation method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114742237A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115145966A (en) * 2022-09-05 2022-10-04 山东省计算中心(国家超级计算济南中心) Comparison federal learning method and system for heterogeneous data
CN115145966B (en) * 2022-09-05 2022-11-11 山东省计算中心(国家超级计算济南中心) Comparison federated learning method and system for heterogeneous data
CN116187473A (en) * 2023-01-19 2023-05-30 北京百度网讯科技有限公司 Federal learning method, apparatus, electronic device, and computer-readable storage medium
CN116187473B (en) * 2023-01-19 2024-02-06 北京百度网讯科技有限公司 Federal learning method, apparatus, electronic device, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN109697522B (en) Data prediction method and device
CN114742237A (en) Federal learning model aggregation method and device, electronic equipment and readable storage medium
CN114282670A (en) Neural network model compression method, device and storage medium
CN112506619B (en) Job processing method, job processing device, electronic equipment and storage medium
CN114187459A (en) Training method and device of target detection model, electronic equipment and storage medium
CN114119989A (en) Training method and device for image feature extraction model and electronic equipment
CN114781650A (en) Data processing method, device, equipment and storage medium
CN114492794A (en) Method, apparatus, device, medium and product for processing data
CN113052063A (en) Confidence threshold selection method, device, equipment and storage medium
CN114399513B (en) Method and device for training image segmentation model and image segmentation
CN114999665A (en) Data processing method and device, electronic equipment and storage medium
CN114998649A (en) Training method of image classification model, and image classification method and device
CN114021642A (en) Data processing method and device, electronic equipment and storage medium
CN113010782A (en) Demand amount acquisition method and device, electronic equipment and computer readable medium
CN116614379B (en) Bandwidth adjustment method and device for migration service and related equipment
CN116798592B (en) Method, device, equipment and storage medium for determining facility layout position
CN113407844B (en) Version recommendation method, device and equipment of applet framework and storage medium
CN116095198B (en) Intelligent construction method and system for laboratory data center
CN113360798B (en) Method, device, equipment and medium for identifying flooding data
CN113836242A (en) Data processing method and device, electronic equipment and readable storage medium
CN115622949A (en) Traffic scheduling method, device, equipment and medium
CN115358409A (en) Model aggregation method, server, device and storage medium in federal learning
CN117271882A (en) Multi-target fusion processing method and device, electronic equipment and storage medium
CN117236995A (en) Payment rate estimation method, device, equipment and storage medium
CN114546831A (en) Experiment index correction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination