CN115496121A - Model training method and device based on federal learning - Google Patents

Model training method and device based on federal learning

Info

Publication number
CN115496121A
Authority
CN
China
Prior art keywords
gradient
user
model
round
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210466824.2A
Other languages
Chinese (zh)
Inventor
范晓亮
杨佩蓁
王铮
王程
俞容山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202210466824.2A priority Critical patent/CN115496121A/en
Publication of CN115496121A publication Critical patent/CN115496121A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a model training method and device based on federated learning. In each communication round, the server obtains the model parameters and training losses uploaded by the sampled users and updates a user historical gradient list, which records the latest gradient of every user and the round in which that gradient was recorded. The server computes the average gradient of the users sampled in the current round, computes the minimum cosine of the included angle between each historical gradient of the users not sampled in this round and the average gradient, and obtains the updated global gradient of the round by optimizing a constrained nonlinear programming objective function, from which the final aggregated model is obtained. The global model is therefore optimized without harming the models of the un-sampled users, the aggregated model becomes more representative of the users not sampled in the current round, and the negative influence of the sampling bias introduced when the server selects users is reduced, improving both the accuracy and the fairness of the federated learning model.

Description

Model training method and device based on federal learning
Technical Field
The invention relates to the technical field of deep learning, and in particular to a model training method based on federated learning, a computer-readable storage medium, a computer device, and a model training apparatus based on federated learning.
Background
With the rapid development of information technology, models trained on massive data by machine learning algorithms are applied across many industries. However, as people's privacy awareness grows and related laws take effect, directly collecting user data to train a model is no longer feasible. Federated learning has emerged as a distributed machine learning paradigm with privacy protection: users are no longer required to upload their data and only encrypted model parameters are exchanged, so a model can be trained collaboratively using the users' data and computing power while the privacy of the users' data is protected.
In practical application scenarios, users' specific habits and preferences mean that their data are not independently and identically distributed. In addition, because of limited communication resources, network bottlenecks, and similar constraints, the users participating in each communication round are obtained by sampling, so the aggregated model is only weakly representative of the users that were not sampled and a sampling bias inevitably arises. This not only burdens the earlier rounds of communication but also degrades the experience of users whose models are effectively ignored.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art. One purpose of the present invention is therefore to provide a model training method based on federated learning that optimizes the global model without harming the models of the un-sampled users, improves the representativeness of the model for the users not sampled in the current round, and reduces the negative influence of the sampling bias introduced when the server selects users, thereby improving the accuracy and fairness of the federated learning model.
A second object of the invention is to propose a computer-readable storage medium.
A third object of the invention is to propose a computer device.
A fourth object of the invention is to propose a model training device based on federated learning.
In order to achieve the above object, an embodiment of the first aspect of the present invention provides a model training method based on federated learning, including the following steps: in each communication round, a server side randomly samples a user subset from the user set and sends the global model to the client of each user in the subset; each client trains the global model on the corresponding data set to obtain its model parameters and sends them to the server side; the server side updates a user historical gradient set according to the model parameters and calculates the average gradient of the clients sampled in the current round, wherein the user historical gradient set comprises each gradient and the corresponding round number; the server side obtains the user history gradient array of the clients not sampled in the current round of the user set and the previous-round relative update gradient array of those clients, and updates the gradient weight according to the user history gradient array, the previous-round relative update gradient array and the average gradient, so as to update the gradient according to the updated gradient weight; and the server side obtains an updated global model according to the updated gradient corresponding to each client so as to update the model for the next round, training being completed once the specified number of communication rounds is reached or the average training loss meets the requirement.
According to the model training method based on federated learning of the embodiment of the present invention, in each communication round the server side first randomly samples a user subset from the user set and sends the global model to the client of each user in the subset; each client then trains the global model on the corresponding data set to obtain its model parameters and sends them to the server side; the server side updates the user historical gradient set according to the model parameters and calculates the average gradient of the clients sampled in the current round, wherein the user historical gradient set comprises each gradient and the corresponding round number; the server side then obtains the user history gradient array of the clients not sampled in the current round of the user set and the previous-round relative update gradient array of those clients, and updates the gradient weight according to the user history gradient array, the previous-round relative update gradient array and the average gradient, so as to update the gradient according to the updated gradient weight; finally, the server side obtains an updated global model according to the updated gradient corresponding to each client so as to update the model for the next round, and training is completed once the specified number of communication rounds is reached or the average training loss meets the requirement. The global model is thus optimized without harming the models of the un-sampled users, the representativeness of the model for the users not sampled in the current round is improved, and the negative influence of the sampling bias introduced when the server side selects users is reduced, thereby improving the accuracy and fairness of the federated learning model.
In addition, the model training method based on federated learning proposed by the above embodiment of the present invention may also have the following additional technical features:
Optionally, before the communication rounds are performed, the server side initializes the deep learning model parameters, creates the user historical gradient set, sets the communication round number to zero, and sets the number of rounds for which historical gradients are to be recorded.
Optionally, the server side optimizes and updates the gradient weight according to a constrained objective function (shown only as an image in the original publication), in which ω is the gradient weight, W is the probability simplex over which ω is constrained, T denotes transposition, t is the current round number, G is the previous-round relative update gradient array of the clients not sampled in the current round of the user set, Δ is the user history gradient array of the clients not sampled in the current round of the user set, the quantity being optimized is the update gradient for the round, and the remaining symbol is the average gradient of the clients sampled in the current round.
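Because the objective function itself appears only as an image in the source text, the following formalization is a hedged reconstruction from the written description and the abstract; in particular, the cosine-similarity form and the convex-combination structure of the update gradient d̃^(t) are assumptions, not the literal claimed formula.

```latex
% Hedged reconstruction (assumed form, not the literal claimed objective):
% choose the weight vector \omega on the probability simplex W so that the
% update gradient -- a convex combination of the sampled-round average
% gradient \bar{g}^{(t)} and the non-sampled users' previous-round relative
% update gradients G_j -- has the largest worst-case cosine similarity with
% every non-sampled user's historical gradient \Delta_i.
\begin{equation}
  \max_{\omega \in W} \; \min_{1 \le i \le C} \;
  \frac{\Delta_i^{\mathrm{T}} \, \tilde{d}^{(t)}(\omega)}
       {\lVert \Delta_i \rVert \, \lVert \tilde{d}^{(t)}(\omega) \rVert},
  \qquad
  \tilde{d}^{(t)}(\omega) = \omega_0 \, \bar{g}^{(t)} + \sum_{j=1}^{C} \omega_j \, G_j,
  \qquad
  W = \Bigl\{ \omega \ge 0 : \textstyle\sum_{j=0}^{C} \omega_j = 1 \Bigr\}.
\end{equation}
```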
Optionally, the gradient is updated according to a formula (shown only as an image in the original publication) that produces the updated gradient of the current round from the optimized gradient weight ω*.
In order to achieve the above object, an embodiment of the second aspect of the present invention provides a computer-readable storage medium on which a federated learning-based model training program is stored; when executed by a processor, the program implements the federated learning-based model training method described above.
According to the computer-readable storage medium of the embodiment of the invention, a federated learning-based model training program is stored thereon so that, when the processor executes it, the federated learning-based model training method described above is carried out; the global model is therefore optimized without harming the models of the un-sampled users, the representativeness of the model for the users not sampled in the current round is improved, the negative influence of the sampling bias introduced when the server side selects users is reduced, and the accuracy and fairness of the federated learning model are improved.
In order to achieve the above object, an embodiment of the third aspect of the present invention provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the federated learning-based model training method described above.
According to the computer device of the embodiment of the invention, the memory stores the federated learning-based model training program so that, when the processor executes it, the federated learning-based model training method described above is carried out; the global model is therefore optimized without harming the models of the un-sampled users, the representativeness of the model for the users not sampled in the current round is improved, the negative influence of the sampling bias introduced when the server side selects users is reduced, and the accuracy and fairness of the federated learning model are improved.
In order to achieve the above object, an embodiment of the fourth aspect of the present invention provides a model training apparatus based on federated learning, including: a broadcasting module, used for randomly sampling a user subset from the user set in each communication round and sending the global model to the client of each user in the subset; a local training module, used for training the global model on the corresponding data set to obtain the model parameters of each client and sending them to the server side; a gradient updating module, used for updating the user historical gradient set according to the model parameters and calculating the average gradient of the clients sampled in the current round, wherein the user historical gradient set comprises each gradient and the corresponding round number, and for obtaining the user history gradient array of the clients not sampled in the current round of the user set and the previous-round relative update gradient array of those clients, and updating the gradient weight according to the user history gradient array, the previous-round relative update gradient array and the average gradient so as to update the gradient according to the updated gradient weight; and a model aggregation module, used for obtaining an updated global model according to the updated gradient corresponding to each client so as to update the model in the next round, training being completed once the specified number of communication rounds is reached or the average training loss meets the requirement.
According to the model training apparatus based on federated learning of the embodiment of the present invention, in each communication round the broadcasting module randomly samples a user subset from the user set and sends the global model to the client of each user in the subset; the local training module trains the global model on the corresponding data set to obtain the model parameters of each client and sends them to the server side; the gradient updating module updates the user historical gradient set according to the model parameters and calculates the average gradient of the clients sampled in the current round, obtains the user history gradient array of the clients not sampled in the current round of the user set and the previous-round relative update gradient array of those clients, and updates the gradient weight according to the user history gradient array, the previous-round relative update gradient array and the average gradient so as to update the gradient according to the updated gradient weight; and the model aggregation module obtains an updated global model according to the updated gradient corresponding to each client so as to update the model in the next round, training being completed once the specified number of communication rounds is reached or the average training loss meets the requirement. The global model is thus optimized without harming the models of the un-sampled users, the representativeness of the model for the users not sampled in the current round is improved, and the negative influence of the sampling bias introduced when the server side selects users is reduced, thereby improving the accuracy and fairness of the federated learning model.
Drawings
FIG. 1 is a flow diagram illustrating a federated learning-based model training method according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a federated learning-based model training method in accordance with one embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a comparison before and after considering the gradient of an unsampled user in a current round of a model training method based on federated learning according to an embodiment of the present invention;
FIG. 4 is a block diagram of a model training apparatus based on federated learning in accordance with an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention and should not be construed as limiting the present invention.
In order to better understand the above technical solution, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Fig. 1 is a schematic flow chart of a model training method based on federated learning according to an embodiment of the present invention; as shown in Fig. 1, the model training method based on federated learning includes the following steps:
s101, in each round of communication, a server randomly samples a user subset from a user set, and sends a global model to each corresponding client in the user subset.
That is, the server randomly samples from the user set a subset S^(t) = {c_k}, k ∈ {1, 2, ..., K}, of size K, and broadcasts the model parameters θ^(t) of the current round to the users in S^(t).
As an embodiment, before the communication rounds are performed, the server side initializes the deep learning model parameters, creates the user historical gradient set, sets the communication round number to zero, and sets the number of rounds for which historical gradients are to be recorded.
That is, the server side initializes the deep learning model parameters θ^(0), creates the user history gradient set, sets the communication round number t = 0, and sets the number of rounds τ for which historical gradients are to be recorded.
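As a concrete illustration of this initialization and of the sampling-and-broadcast step S101, a minimal Python sketch is given below; the flat NumPy parameter vector, the uniform sampling without replacement, and the default value of τ are illustrative assumptions rather than requirements of the method.

```python
# Minimal sketch of server-side initialization and the per-round sampling /
# broadcast step (S101). Data representations here are assumptions.
import numpy as np

def init_server(model_dim, tau=10):
    theta = np.zeros(model_dim)    # initial global model parameters theta^(0)
    history = {}                   # user history gradient set: user id -> (gradient, round recorded)
    t = 0                          # communication round counter
    return theta, history, t, tau  # tau: number of rounds for which a recorded gradient is kept

def sample_and_broadcast(theta, num_users, K, rng):
    # Randomly sample a user subset S^(t) of size K and send theta^(t) to each sampled client.
    sampled = list(rng.choice(num_users, size=K, replace=False))
    return sampled, {k: theta.copy() for k in sampled}

# Example usage
rng = np.random.default_rng(0)
theta, history, t, tau = init_server(model_dim=8)
sampled_ids, payload = sample_and_broadcast(theta, num_users=20, K=5, rng=rng)
```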
And S102, each client trains the global model according to the corresponding data set to obtain the model parameters corresponding to each client, and sends the model parameters to the server.
As one embodiment, a user c_k selected in round t receives the global model θ^(t) from the server side; user c_k repeatedly trains the model θ^(t) on its local data set for E local epochs with learning rate η and batch size B, computes its new model parameters and training loss, and sends them to the server side.
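A sketch of this client-side step is shown below under a deliberately simple assumed model (least-squares regression trained by mini-batch SGD), since the patent does not fix the model architecture; the hyperparameters E, η and B are the quantities named above.

```python
# Sketch of one client's local update (S102): E local epochs of mini-batch SGD
# with learning rate eta and batch size B on a stand-in least-squares model.
import numpy as np

def local_train(theta, X, y, E=1, eta=0.01, B=32, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta.copy()
    for _ in range(E):                                  # E local epochs
        order = rng.permutation(len(X))
        for start in range(0, len(X), B):               # mini-batches of size B
            idx = order[start:start + B]
            residual = X[idx] @ theta - y[idx]
            grad = X[idx].T @ residual / len(idx)       # gradient of 0.5 * mean squared error
            theta -= eta * grad                         # SGD step
    loss = 0.5 * float(np.mean((X @ theta - y) ** 2))   # training loss reported to the server
    return theta, loss                                  # uploaded model parameters and training loss
```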
And S103, the server side updates a user historical gradient set according to the model parameters and calculates the average gradient of each client side in the current round, wherein the user historical gradient set comprises the gradient and the corresponding round number.
That is, the server side receives the data uploaded by the users in S^(t) and obtains their updated model parameters; the server then calculates the gradient of each of these users, obtains the set of user gradients, updates the user history gradient set accordingly, and calculates the average gradient of the users sampled in the current round.
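The server-side bookkeeping of S103 could look like the sketch below; because the exact gradient formula appears only as an image in the source, taking the per-user gradient as the difference between the broadcast model and the uploaded parameters is an assumption (a common convention for pseudo-gradients in federated learning).

```python
# Sketch of S103: compute per-user gradients from the uploads, refresh the user
# history gradient set, and average over the users sampled this round.
import numpy as np

def update_history_and_average(theta_global, uploads, history, t):
    """uploads: dict user_id -> (theta_k, loss_k) from the clients sampled in round t."""
    grads = []
    for k, (theta_k, _loss_k) in uploads.items():
        g_k = theta_global - theta_k     # assumed pseudo-gradient convention
        history[k] = (g_k, t)            # latest gradient and the round it belongs to
        grads.append(g_k)
    avg_grad = np.mean(grads, axis=0)    # average gradient of the sampled users
    return avg_grad, history
```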
And S104, the server side obtains the user history gradient array of the clients not sampled in the current round of the user set and the previous-round relative update gradient array of those clients, and updates the gradient weight according to the user history gradient array, the previous-round relative update gradient array and the average gradient, so as to update the gradient according to the updated gradient weight.
It should be noted that Δ denotes the historical gradient array of the users not sampled in the current round, G denotes the previous-round relative update gradient array of those users, and C is the number of non-sampled users.
As an embodiment, the server side optimizes and updates the gradient weight according to a constrained objective function (shown only as an image in the original publication) to obtain the updated gradient weight ω*; in this objective, ω is the gradient weight, W is the probability simplex, T denotes transposition, t is the current round number, G is the previous-round relative update gradient array of the clients not sampled in the current round of the user set, Δ is the user history gradient array of the clients not sampled in the current round of the user set, the quantity being optimized is the update gradient for the round, and the remaining symbol is the average gradient of the clients sampled in the current round.
As an embodiment, the gradient is updated according to a formula (shown only as an image in the original publication) that produces the updated gradient of the current round from the optimized gradient weight ω*. It should be noted that the server modifies the update gradient in order to eliminate the effect of step-size inconsistency on the optimized gradient.
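Since both the objective and the update formula are reproduced only as images, the sketch below is one plausible instantiation of S104 based on the written description: the weights ω are optimized over the probability simplex so that the combined update gradient has the largest worst-case cosine similarity with each un-sampled user's historical gradient Δ_i, and the result is rescaled to remove step-size inconsistency. The convex-combination structure, the SLSQP solver, and the norm-matching rescale are all assumptions.

```python
# Plausible instantiation of S104 (an assumption, not the literal claimed
# formulas): maximize the minimum cosine similarity between each un-sampled
# user's historical gradient Delta_i and a convex combination of the current
# average gradient and the un-sampled users' previous-round relative update
# gradients G, then rescale to the norm of the plain average gradient.
import numpy as np
from scipy.optimize import minimize

def optimize_update_gradient(avg_grad, Delta, G):
    """avg_grad: (d,); Delta and G: (C, d) arrays for the C un-sampled users."""
    components = np.vstack([avg_grad[None, :], G])   # directions that may be combined
    m = components.shape[0]

    def combined(w):
        return w @ components                        # update gradient to be optimized

    def neg_min_cos(w):
        d = combined(w)
        cos = Delta @ d / (np.linalg.norm(Delta, axis=1) * np.linalg.norm(d) + 1e-12)
        return -cos.min()                            # negate because SLSQP minimizes

    w0 = np.full(m, 1.0 / m)                         # start from uniform weights on the simplex
    res = minimize(neg_min_cos, w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * m,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    w_star = res.x
    d_opt = combined(w_star)
    # Rescale so the step size matches the plain average gradient -- one simple
    # way to remove the step-size inconsistency mentioned in the description.
    d_opt *= np.linalg.norm(avg_grad) / (np.linalg.norm(d_opt) + 1e-12)
    return d_opt, w_star
```

The non-smooth max-min objective is handed to SLSQP here only to keep the sketch short; a production implementation would more likely smooth the minimum or solve the equivalent constrained program directly.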
And S105, the server side obtains an updated global model according to the updated gradient corresponding to each client so as to update the model for the next round, and training is completed once the specified number of communication rounds is reached or the average training loss meets the requirement.
That is, the server side updates the model parameters, issues the updated model to all the clients for the next round, and stops training once the number of training rounds reaches the specified number T or the average training loss meets the requirement.
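A minimal sketch of this final step follows; the server-side learning rate of 1 and the subtraction direction (matching the pseudo-gradient convention assumed in the S103 sketch) are assumptions.

```python
# Sketch of S105: apply the aggregated update and check the stopping criterion.
import numpy as np

def apply_global_update(theta_global, update_grad, server_lr=1.0):
    # New global parameters theta^(t+1), issued to the clients for the next round.
    return theta_global - server_lr * update_grad

def training_finished(t, round_losses, T, loss_target):
    avg_loss = float(np.mean(round_losses))   # average training loss reported this round
    return t >= T or avg_loss <= loss_target
```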
In summary, as shown in Figs. 2 and 3, in each communication round the server obtains the model parameters and training losses uploaded by the users and updates the user history gradient list, which records the latest gradient of every user and the round to which that gradient belongs; the server calculates the average gradient of the users sampled in the current round; and the server calculates the minimum cosine of the included angle between each historical gradient of the users not sampled in this round and the average gradient, and obtains the updated global gradient of the round by optimizing a constrained nonlinear programming objective function, thereby obtaining the finally aggregated model. With this federated learning aggregation optimization method, which accounts for the sampling bias of the server, the aggregated model takes into account not only the sampled user models but also the models of the users not sampled in the current round; that is, the global model is optimized without harming the un-sampled user models, the representativeness of the model for the users not sampled in the current round is improved, the negative influence of the sampling bias introduced when the server selects users is reduced, and the accuracy and fairness of the federated learning model are improved.
To implement the above embodiments, an embodiment of the present invention provides a computer-readable storage medium on which a federated learning-based model training program is stored; when executed by a processor, the program implements the federated learning-based model training method described above.
According to the computer-readable storage medium of the embodiment of the invention, a federated learning-based model training program is stored thereon so that, when the processor executes it, the federated learning-based model training method described above is carried out; the global model is therefore optimized without harming the models of the un-sampled users, the representativeness of the model for the users not sampled in the current round is improved, the negative influence of the sampling bias introduced when the server side selects users is reduced, and the accuracy and fairness of the federated learning model are improved.
In order to implement the foregoing embodiments, an embodiment of the present invention provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the federated learning-based model training method described above.
According to the computer device of the embodiment of the invention, the memory stores the federated learning-based model training program so that, when the processor executes it, the federated learning-based model training method described above is carried out; the global model is therefore optimized without harming the models of the un-sampled users, the representativeness of the model for the users not sampled in the current round is improved, the negative influence of the sampling bias introduced when the server side selects users is reduced, and the accuracy and fairness of the federated learning model are improved.
In order to implement the foregoing embodiments, an embodiment of the present invention further provides a model training apparatus based on federated learning; as shown in Fig. 4, the model training apparatus based on federated learning includes: a broadcasting module 10, a local training module 20, a gradient updating module 30, and a model aggregation module 40.
The broadcasting module 10 is configured to randomly sample a user subset from the user set in each communication round and send the global model to the client of each user in the subset; the local training module 20 is configured to train the global model on the corresponding data set to obtain the model parameters of each client and send them to the server side; the gradient updating module 30 is configured to update the user historical gradient set according to the model parameters, calculate the average gradient of the clients sampled in the current round (the user historical gradient set comprising each gradient and the corresponding round number), obtain the user history gradient array of the clients not sampled in the current round of the user set and the previous-round relative update gradient array of those clients, and update the gradient weight according to the user history gradient array, the previous-round relative update gradient array and the average gradient so as to update the gradient according to the updated gradient weight; and the model aggregation module 40 is configured to obtain an updated global model according to the updated gradient corresponding to each client so as to update the model in the next round, training being completed once the specified number of communication rounds is reached or the average training loss meets the requirement.
As an embodiment, the broadcasting module 10 is further configured to have the server side initialize the deep learning model parameters before the communication rounds are performed, create the user history gradient set, set the communication round number to zero, and set the number of rounds for which historical gradients are to be recorded.
As an embodiment, the server side optimizes and updates the gradient weight according to a constrained objective function (shown only as an image in the original publication), in which ω is the gradient weight, W is the probability simplex, T denotes transposition, t is the current round number, G is the previous-round relative update gradient array of the clients not sampled in the current round of the user set, Δ is the user history gradient array of the clients not sampled in the current round of the user set, the quantity being optimized is the update gradient for the round, and the remaining symbol is the average gradient of the clients sampled in the current round.
As an embodiment, the gradient is updated according to a formula (shown only as an image in the original publication) that produces the updated gradient of the current round from the optimized gradient weight ω*.
In summary, according to the model training apparatus based on federated learning of an embodiment of the present invention, in each communication round the broadcasting module randomly samples a user subset from the user set and sends the global model to the client of each user in the subset; the local training module trains the global model on the corresponding data set to obtain the model parameters of each client and sends them to the server side; the gradient updating module updates the user historical gradient set according to the model parameters and calculates the average gradient of the clients sampled in the current round, obtains the user history gradient array of the clients not sampled in the current round of the user set and the previous-round relative update gradient array of those clients, and updates the gradient weight according to the user history gradient array, the previous-round relative update gradient array and the average gradient so as to update the gradient according to the updated gradient weight; and the model aggregation module obtains an updated global model according to the updated gradient corresponding to each client so as to update the model in the next round, training being completed once the specified number of communication rounds is reached or the average training loss meets the requirement. The global model is thus optimized without harming the models of the un-sampled users, the representativeness of the model for the users not sampled in the current round is improved, and the negative influence of the sampling bias introduced when the server side selects users is reduced, thereby improving the accuracy and fairness of the federated learning model.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so forth does not indicate any ordering; these words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "above," and "over" a second feature may be directly on or obliquely above the second feature, or simply mean that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature may be directly under or obliquely under the first feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the terminology used in the description presented above should not be understood as necessarily referring to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A model training method based on federated learning, characterized by comprising the following steps:
in each round of communication, a server randomly samples a user subset from a user set, and sends a global model to each corresponding client in the user subset;
each client side trains the global model according to the corresponding data set to obtain a model parameter corresponding to each client side, and sends the model parameter to the server side;
the server side updates a user historical gradient set according to the model parameters and calculates the average gradient of the client sides sampled in the current round, wherein the user historical gradient set comprises each gradient and the corresponding round number;
the server side obtains a user history gradient array of the client sides not sampled in the current round of the user set and a previous-round relative update gradient array of those client sides, and updates the gradient weight according to the user history gradient array, the previous-round relative update gradient array and the average gradient, so as to update the gradient according to the updated gradient weight;
and the server side obtains an updated global model according to the updated gradient corresponding to each client side so as to update the model for the next round, training being completed once the specified number of communication rounds is reached or the average training loss meets the requirement.
2. The method of claim 1, wherein, before the communication rounds are performed, the deep learning model parameters are initialized at the server side, the user historical gradient set is created, the communication round number is set to zero, and the number of rounds for which historical gradients are to be recorded is set.
3. The model training method based on federated learning according to claim 1, wherein the server side optimizes and updates the gradient weight according to a constrained objective function (shown only as an image in the original publication), in which ω is the gradient weight, W is the probability simplex, T denotes transposition, t is the current round number, G is the previous-round relative update gradient array of the clients not sampled in the current round of the user set, Δ is the user history gradient array of the clients not sampled in the current round of the user set, the quantity being optimized is the update gradient for the round, and the remaining symbol is the average gradient of the clients sampled in the current round.
4. The model training method based on federated learning according to claim 3, wherein the gradient is updated according to a formula (shown only as an image in the original publication) that produces the updated gradient of the current round from the optimized gradient weight ω*.
5. A computer-readable storage medium having stored thereon a federated learning-based model training program which, when executed by a processor, implements the federated learning-based model training method as claimed in any one of claims 1-4.
6. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the federated learning-based model training method as claimed in any one of claims 1-4.
7. A model training device based on federated learning, comprising:
the broadcasting module is used for randomly sampling a user subset from a user set in each round of communication and sending a global model to each corresponding client in the user subset;
the local training module is used for training the global model according to the corresponding data set to obtain model parameters corresponding to each client, and sending the model parameters to the server;
the gradient updating module is used for updating a user historical gradient set according to the model parameters and calculating the average gradient of the clients sampled in the current round, wherein the user historical gradient set comprises each gradient and the corresponding round number; and for acquiring a user history gradient array of the clients not sampled in the current round of the user set and a previous-round relative update gradient array of those clients, and updating the gradient weight according to the user history gradient array, the previous-round relative update gradient array and the average gradient so as to update the gradient according to the updated gradient weight;
and the model aggregation module is used for obtaining an updated global model according to the updated gradient corresponding to each client so as to update the model in the next round, training being completed once the specified number of communication rounds is reached or the average training loss meets the requirement.
8. The model training device based on federated learning as claimed in claim 7, wherein the broadcasting module is further used for initializing the deep learning model parameters at the server side before the communication rounds are performed, creating the user historical gradient set, setting the communication round number to zero, and setting the number of rounds for which historical gradients are to be recorded.
9. The model training device based on federated learning according to claim 7, wherein the server side optimizes and updates the gradient weight according to a constrained objective function (shown only as an image in the original publication), in which ω is the gradient weight, W is the probability simplex, T denotes transposition, t is the current round number, G is the previous-round relative update gradient array of the clients not sampled in the current round of the user set, Δ is the user history gradient array of the clients not sampled in the current round of the user set, the quantity being optimized is the update gradient for the round, and the remaining symbol is the average gradient of the clients sampled in the current round.
10. The model training device based on federated learning according to claim 9, wherein the gradient is updated according to a formula (shown only as an image in the original publication) that produces the updated gradient of the current round from the optimized gradient weight ω*.
CN202210466824.2A 2022-04-29 2022-04-29 Model training method and device based on federal learning Pending CN115496121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210466824.2A CN115496121A (en) 2022-04-29 2022-04-29 Model training method and device based on federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210466824.2A CN115496121A (en) 2022-04-29 2022-04-29 Model training method and device based on federal learning

Publications (1)

Publication Number Publication Date
CN115496121A (en) 2022-12-20

Family

ID=84464681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210466824.2A Pending CN115496121A (en) 2022-04-29 2022-04-29 Model training method and device based on federal learning

Country Status (1)

Country Link
CN (1) CN115496121A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436515A (en) * 2023-12-07 2024-01-23 四川警察学院 Federal learning method, system, device and storage medium
CN117436515B (en) * 2023-12-07 2024-03-12 四川警察学院 Federal learning method, system, device and storage medium

Similar Documents

Publication Publication Date Title
CN109871702B (en) Federal model training method, system, apparatus, and computer-readable storage medium
CN110942154B (en) Data processing method, device, equipment and storage medium based on federal learning
CN110378488B (en) Client-side change federal training method, device, training terminal and storage medium
CN110929880A (en) Method and device for federated learning and computer readable storage medium
CN111260061B (en) Differential noise adding method and system in federated learning gradient exchange
CN113326949A (en) Model parameter optimization method and system for federal learning
CN112906046B (en) Model training method and device using single bit compressed sensing technology
CN115496121A (en) Model training method and device based on federal learning
CN111327355A (en) Unmanned aerial vehicle sensing and transmission time balancing method, device, medium and equipment
CN114912626A (en) Method for processing distributed data of federated learning mobile equipment based on the Shapley value
CN117392483A (en) Album classification model training acceleration method, system and medium based on reinforcement learning
CN111459780B (en) User identification method and device, readable medium and electronic equipment
CN113361618A (en) Industrial data joint modeling method and system based on federal learning
CN115296927B (en) Block chain-based federal learning credible fusion excitation method and system
CN116720592A (en) Federal learning model training method and device, nonvolatile storage medium and electronic equipment
CN115796275A (en) Block chain-based federal learning method and device, electronic equipment and storage medium
CN116340841A (en) Data classification model training method based on transverse federal learning and related components thereof
CN117557870B (en) Classification model training method and system based on federal learning client selection
CN111461403A (en) Vehicle path planning method and device, computer readable storage medium and terminal
CN116778363B (en) Low-traffic reservoir area water environment risk identification method based on federal learning
CN113810212B (en) Root cause positioning method and device for 5G slice user complaints
CN116610868B (en) Sample labeling method, end-edge cloud cooperative training method and device
CN112506673B (en) Intelligent edge calculation-oriented collaborative model training task configuration method
CN114881229B (en) Personalized collaborative learning method and device based on parameter gradual freezing
CN116502237B (en) Digital twin platform security collaboration method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination